SQLite Full Text Search: Implementing Text-Based Queries

Searching textual data efficiently is critical for many applications – whether it’s a database of documents, a catalog of products, a collection of support articles, or anything else that contains long-form text.

Finding relevant records using textual keywords can be slow and cumbersome without full-text search capabilities.

SQLite’s full-text search module enables you to perform advanced linguistic queries against text stored in SQLite database tables. Instead of basic LIKE queries that do substring matching, SQLite FTS allows you to search for words, phrases, and keywords in documents stored across multiple columns.

In this comprehensive guide, you’ll learn:

  • How to configure full-text search virtual tables in SQLite databases
  • Different types of textual queries supported by FTS modules
  • Matching records by words, clauses, prefixes, and proximity
  • Ranking results by relevance and other metrics
  • Optimizing full-text search performance

By the end, you can add robust text search and retrieval capabilities to SQLite applications – no external search servers are required!

What is Full Text Search in SQLite

SQLite supports full-text indexes and searches in its built-in FTS5 module. FTS5 allows you to run advanced linguistic queries by converting textual content into indexed tokens. Some key capabilities provided by FTS5 include:

  • Tokenization – Splitting text into meaningful words/tokens using custom rules
  • Term matching – Quickly look at exact words or prefixes.
  • Boolean queries – Combine terms with AND, OR, and NOT operators
  • Phrase search – Match on specific word sequences
  • Proximity search – Query for terms within a given distance
  • Ranking – Order matches by relevance to query
  • Efficient indexing – Uses compressed trie-based indexes

Integrating full-text search into SQLite allows you to add search capabilities directly inside the database engine without needing external tools.

A full-text search is valid when you query tables containing large volumes of text – like catalogs of products, document databases, corpuses of support articles, QA pairs, and more. Matching text with LIKE queries can be inefficient and limit the search logic you can implement.

Let’s look at examples of creating and querying SQLite FTS5 tables!

Creating SQLite Full-Text Search Virtual Tables

To use full-text search in SQLite, you first need to create special virtual tables that will store the indexed tokens for your text. This involves just a bit of SQL:

-- Create table
CREATE VIRTUAL TABLE documents USING FTS5();

-- Insert data
INSERT INTO documents (title, body) VALUES
  ('SQL Tutorial', 'Intro to working with SQLite databases...'),
  ('Full Text Search Guide', 'How to add FTS capabilities to SQLite tables...'); 

That’s it! With just the USING FTS5() directive, SQLite will convert the text values into indexed tokens behind the scenes.

By default, FTS5 will tokenize multiple words, strip affixes to reduce to root form, remove stop words, normalize case, and handle compound tokenization – providing a great starting point for search relevancy.

We could also specify custom tokenization rules and configuration options:

CREATE VIRTUAL TABLE docs USING FTS5(
  tokenize='porter alphanumeric',
  prefix='2,4',
  columnsize=20, notindexed=title
);

We’ve configured a custom Porter stemming tokenizer, specified min/max index prefixes, set the storage column size, and chosen not to index the title column.

Creating Multi-Column Full Text Indexes

A useful capability provided by SQLite’s FTS modules is the ability to create multi-column full-text indexes. This allows rows to match queries by content across multiple columns.

For example, in an articles database, we want to match both title and content:

CREATE VIRTUAL TABLE articles USING FTS5(title, content);

INSERT INTO articles (title, content) VALUES
  ('Intro SQL', 'Tutorial using SQLite databases...'),
  ('SQLite FTS', 'Guide to full text search in SQLite...');
  
SELECT * FROM articles
WHERE articles MATCH 'sqlite database guide';

Creating multi-column FTS indexes greatly expands the documents and content that will match search queries.

Now let’s look at examples of different types of textual searches possible with SQLite’s full-text modules!

Basic Exact Match Text Queries in SQLite

The most basic FTS query in SQLite attempts to match exact search terms or phrases in documents:

-- Single term
SELECT * FROM docs WHERE docs MATCH 'databases';

-- Multiple search terms
SELECT * FROM docs WHERE docs MATCH 'sqlite full text'; 

-- Exact match phrase
SELECT * FROM docs WHERE docs MATCH '"sqlite full text"';

By default these will match any documents containing all of the words/tokens provided. Results are ordered by an internal ranking – which we can customize using special FTS functions.

The real power comes from combining this simple term matching with rich prefix queries, boolean logic, phrase matching, and more – which we’ll cover next!

Flexible Prefix Queries in SQLite

Finding documents by partial words or prefixes is a common need for providing search auto-completions. SQLite FTS Allows flexible prefix matching using the * wildcard.

For example, getting search suggestion terms while using WHERE.

SELECT * FROM docs WHERE docs MATCH 'sqli*';

Results match any documents with words starting with “sqli”, like “sqlite” or “sqli tutorial”.

We can also specify a minimum prefix length – ensuring only reasonably complete words match:

SELECT * FROM docs WHERE docs MATCH 'sqli*2';

Now at least 2 characters must match to return a result. Prefix queries give users search suggestions as they type and can improve relevancy.

Boolean Search Logic in SQLite

Matching multiple search criteria is critical for relevancy. SQLite FTS supports boolean ORANDNOT operators:

-- OR - match EITHER condition
SELECT * FROM docs WHERE docs MATCH 'database OR sqlite';

-- AND - match BOTH conditions  
SELECT * FROM docs WHERE docs MATCH 'database AND sqlite';

-- NOT - exclude a term
SELECT * FROM docs WHERE docs MATCH 'database NOT postgres';

Arbitrary boolean logic combinations are possible for advanced matching!

Negation with NOT is helpful for “anti-search” to exclude irrelevant content from queries.

Sometimes, we want to match documents only where all terms appear together in a complete phrase.

Wrap search phrases in double quotes:

SELECT * FROM docs WHERE docs MATCH '"sqlite full text search"'; 

This requires the exact sequence of multiple words to appear together to count as a match.

Phrase search prevents matching unrelated words that happen to occur anywhere in the document text.

A more advanced search query supported by FTS5 is proximity or adjacency matching. This allows specifying a maximum separation distance between search terms:

-- Terms must be within 5 tokens  
SELECT * FROM docs WHERE docs MATCH 'sqlite NEAR/5 search';

Proximity search is extremely useful for improving relevancy by requiring related words appear physically close together within documents.

It can also be specified using a minimum/maximum range:

SELECT * FROM docs WHERE docs MATCH 'sqlite NEAR/1 5 search';

Now, words must be between 1-5 tokens away. Try proximity queries in FTS5 for relevance!

Custom Ranking and Scoring in SQLite

By default, FTS will assign an internal ranking score to query matches representing relevance. But we can customize how matches are sorted using special SQLite functions!

To order matches by a document attribute like date:

SELECT *, rank FROM docs 
WHERE docs MATCH 'search tutorial'
ORDER BY date DESC;

More advanced ranking functions are available in FTS5, like bm25() which uses term frequency and inverse document frequency to compute statistical relevance:

SELECT *, bm25(matchinfo(docs)) AS rank 
FROM docs
WHERE docs MATCH 'search tutorial'
ORDER BY rank DESC;

Many ranking formulas are available – allowing fully custom ordering based on Lucene-style scoring algorithms.

Optimizing SQLite Full-Text Queries

To improve full-text search efficiency in SQLite, there are some database-level configuration options we should consider:

  • Faster tokenizers – Simple tokenizers are faster. Standard Unicode tokenization has a high overhead in some cases.
  • Smaller column sizes – Reducing storage column sizes improves index efficiency. Only index prefix substrings are needed for search terms.
  • Asynchronous indexing – A write-ahead log and background threads for tokenization/indexing reduce blocking.
  • Separate content and indexes – Store tokenized content and indexes in separate database files to improve performance.
  • Read-only indexes – Immutable indexes allow memory mapping for faster reads. Rebuild indexes only when necessary.

Try tweaking the FTS optimization PRAGMAs for better performance!

Choosing SQLite FTS4 vs FTS5

SQLite has two core full-text search modules: FTS4 and FTS5. FTS5 is the latest generation with advanced features, but FTS4 may make sense for some applications.

Some critical differences between SQLite FTS4 vs FTS5 include:

FTS4FTS5
StorageLess efficient storage, larger db sizeMore efficient compressed storage
CustomizationFewer tokenizer optionsAdvanced tokenizer configuration
QueriesBasic MATCH queriesAdvanced positional/proximity queries
RankingNo builtin ranking funcsAdvanced ranking formulas
PerformanceSlower at write & query timeFaster indexing & queries
SQLite FTS4 vs FTS5

For most new applications, FTS5 is recommended for full-text search in SQLite unless you have special customization needs supported better in FTS4.

Both versions allow core tokenization and term matching – but FTS5 has advantages for large text corpuses, better efficiency, and advanced queries.

Applying Full-Text Search in SQLite

Now that you understand SQLite’s FTS capabilities, what are some real-world use cases?

Here are a few examples of full-text search applications:

  • Document Databases – Store and query large corpuses of text documents like research papers, news articles, manuals, etc.
  • FAQ Search – Allow self-service querying of support knowledge bases and forums.
  • Product Catalog – Enable detailed attribute faceted search for ecommerce stores.
  • Code Search – Index source code symbols and documentation for codebase search.
  • Chat Search – Application searching message history in chat apps.

Virtually any application that needs text retrieval can benefit from SQLite FTS to add search functionality!

Comparison of SQLite’s Full-Text Search Capabilities

To summarize, here is a comparison of FTS versus regular SQLite queries:

FeatureRegular QueriesFTS Queries
IndexingManual text processing/indexingAutomatic tokenized indexing
StorageTables contain original textExtra virtual tables with indexed tokens
Basic term matchSlow LIKE queries on textFast indexed exact term lookup
Multiple termsComplex concatenation logicSimple AND / OR boolean logic
Prefix matchingSlow LIKE startwith filtersFast wildcard prefix searches
Phrase matchingImpossibleFast multi-term phrase matching
Proximity searchVery slow custom logicBuiltin proximity filters on tokens
RankingHard to implementCustom ranking formulas
PerformanceSlow on large textOptimized for search workloads

SQLite’s full-text search modules enable significantly more advanced, efficient, and customizable text querying capabilities than standard SQL queries.

Summary

I hope this guide has provided a comprehensive overview of implementing performant full-text search functionality within SQLite databases. The key concepts to remember are:

  • Use FTS virtual tables to enable indexed text tokens
  • Configure tokenization appropriately for your domain
  • Utilize simple match queries, boolean logic, phrase search, proximity filters, and custom ranking
  • Optimize with write-ahead logging, external content storage, and read-only indexes

Now you can add powerful text search and retrieval to your SQLite database applications! Let me know if you have any other questions.