Searching textual data efficiently is critical for many applications – whether it’s a database of documents, a catalog of products, a collection of support articles, or anything else that contains long-form text.
Finding relevant records using textual keywords can be slow and cumbersome without full-text search capabilities.
SQLite’s full-text search module enables you to perform advanced linguistic queries against text stored in SQLite database tables. Instead of basic LIKE
queries that do substring matching, SQLite FTS allows you to search for words, phrases, and keywords in documents stored across multiple columns.
In this comprehensive guide, you’ll learn:
- How to configure full-text search virtual tables in SQLite databases
- Different types of textual queries supported by FTS modules
- Matching records by words, clauses, prefixes, and proximity
- Ranking results by relevance and other metrics
- Optimizing full-text search performance
By the end, you can add robust text search and retrieval capabilities to SQLite applications – no external search servers are required!
What is Full Text Search in SQLite
SQLite supports full-text indexes and searches in its built-in FTS5 module. FTS5 allows you to run advanced linguistic queries by converting textual content into indexed tokens. Some key capabilities provided by FTS5 include:
- Tokenization – Splitting text into meaningful words/tokens using custom rules
- Term matching – Quickly look at exact words or prefixes.
- Boolean queries – Combine terms with AND, OR, and NOT operators
- Phrase search – Match on specific word sequences
- Proximity search – Query for terms within a given distance
- Ranking – Order matches by relevance to query
- Efficient indexing – Uses compressed trie-based indexes
Integrating full-text search into SQLite allows you to add search capabilities directly inside the database engine without needing external tools.
A full-text search is valid when you query tables containing large volumes of text – like catalogs of products, document databases, corpuses of support articles, QA pairs, and more. Matching text with LIKE queries can be inefficient and limit the search logic you can implement.
Let’s look at examples of creating and querying SQLite FTS5 tables!
Creating SQLite Full-Text Search Virtual Tables
To use full-text search in SQLite, you first need to create special virtual tables that will store the indexed tokens for your text. This involves just a bit of SQL:
-- Create table
CREATE VIRTUAL TABLE documents USING FTS5();
-- Insert data
INSERT INTO documents (title, body) VALUES
('SQL Tutorial', 'Intro to working with SQLite databases...'),
('Full Text Search Guide', 'How to add FTS capabilities to SQLite tables...');
That’s it! With just the USING FTS5()
directive, SQLite will convert the text values into indexed tokens behind the scenes.
By default, FTS5 will tokenize multiple words, strip affixes to reduce to root form, remove stop words, normalize case, and handle compound tokenization – providing a great starting point for search relevancy.
We could also specify custom tokenization rules and configuration options:
CREATE VIRTUAL TABLE docs USING FTS5(
tokenize='porter alphanumeric',
prefix='2,4',
columnsize=20, notindexed=title
);
We’ve configured a custom Porter stemming tokenizer, specified min/max index prefixes, set the storage column size, and chosen not to index the title column.
Creating Multi-Column Full Text Indexes
A useful capability provided by SQLite’s FTS modules is the ability to create multi-column full-text indexes. This allows rows to match queries by content across multiple columns.
For example, in an articles database, we want to match both title
and content
:
CREATE VIRTUAL TABLE articles USING FTS5(title, content);
INSERT INTO articles (title, content) VALUES
('Intro SQL', 'Tutorial using SQLite databases...'),
('SQLite FTS', 'Guide to full text search in SQLite...');
SELECT * FROM articles
WHERE articles MATCH 'sqlite database guide';
Creating multi-column FTS indexes greatly expands the documents and content that will match search queries.
Now let’s look at examples of different types of textual searches possible with SQLite’s full-text modules!
Basic Exact Match Text Queries in SQLite
The most basic FTS query in SQLite attempts to match exact search terms or phrases in documents:
-- Single term
SELECT * FROM docs WHERE docs MATCH 'databases';
-- Multiple search terms
SELECT * FROM docs WHERE docs MATCH 'sqlite full text';
-- Exact match phrase
SELECT * FROM docs WHERE docs MATCH '"sqlite full text"';
By default these will match any documents containing all of the words/tokens provided. Results are ordered by an internal ranking – which we can customize using special FTS functions.
The real power comes from combining this simple term matching with rich prefix queries, boolean logic, phrase matching, and more – which we’ll cover next!
Flexible Prefix Queries in SQLite
Finding documents by partial words or prefixes is a common need for providing search auto-completions. SQLite FTS Allows flexible prefix matching using the *
wildcard.
For example, getting search suggestion terms while using WHERE.
SELECT * FROM docs WHERE docs MATCH 'sqli*';
Results match any documents with words starting with “sqli”, like “sqlite” or “sqli tutorial”.
We can also specify a minimum prefix length – ensuring only reasonably complete words match:
SELECT * FROM docs WHERE docs MATCH 'sqli*2';
Now at least 2 characters must match to return a result. Prefix queries give users search suggestions as they type and can improve relevancy.
Boolean Search Logic in SQLite
Matching multiple search criteria is critical for relevancy. SQLite FTS supports boolean OR
, AND
, NOT
operators:
-- OR - match EITHER condition
SELECT * FROM docs WHERE docs MATCH 'database OR sqlite';
-- AND - match BOTH conditions
SELECT * FROM docs WHERE docs MATCH 'database AND sqlite';
-- NOT - exclude a term
SELECT * FROM docs WHERE docs MATCH 'database NOT postgres';
Arbitrary boolean logic combinations are possible for advanced matching!
Negation with NOT
is helpful for “anti-search” to exclude irrelevant content from queries.
Phrase Search in SQLite
Sometimes, we want to match documents only where all terms appear together in a complete phrase.
Wrap search phrases in double quotes:
SELECT * FROM docs WHERE docs MATCH '"sqlite full text search"';
This requires the exact sequence of multiple words to appear together to count as a match.
Phrase search prevents matching unrelated words that happen to occur anywhere in the document text.
Proximity Search in SQLite
A more advanced search query supported by FTS5 is proximity or adjacency matching. This allows specifying a maximum separation distance between search terms:
-- Terms must be within 5 tokens
SELECT * FROM docs WHERE docs MATCH 'sqlite NEAR/5 search';
Proximity search is extremely useful for improving relevancy by requiring related words appear physically close together within documents.
It can also be specified using a minimum/maximum range:
SELECT * FROM docs WHERE docs MATCH 'sqlite NEAR/1 5 search';
Now, words must be between 1-5 tokens away. Try proximity queries in FTS5 for relevance!
Custom Ranking and Scoring in SQLite
By default, FTS will assign an internal ranking score to query matches representing relevance. But we can customize how matches are sorted using special SQLite functions!
To order matches by a document attribute like date:
SELECT *, rank FROM docs
WHERE docs MATCH 'search tutorial'
ORDER BY date DESC;
More advanced ranking functions are available in FTS5, like bm25() which uses term frequency and inverse document frequency to compute statistical relevance:
SELECT *, bm25(matchinfo(docs)) AS rank
FROM docs
WHERE docs MATCH 'search tutorial'
ORDER BY rank DESC;
Many ranking formulas are available – allowing fully custom ordering based on Lucene-style scoring algorithms.
Optimizing SQLite Full-Text Queries
To improve full-text search efficiency in SQLite, there are some database-level configuration options we should consider:
- Faster tokenizers – Simple tokenizers are faster. Standard Unicode tokenization has a high overhead in some cases.
- Smaller column sizes – Reducing storage column sizes improves index efficiency. Only index prefix substrings are needed for search terms.
- Asynchronous indexing – A write-ahead log and background threads for tokenization/indexing reduce blocking.
- Separate content and indexes – Store tokenized content and indexes in separate database files to improve performance.
- Read-only indexes – Immutable indexes allow memory mapping for faster reads. Rebuild indexes only when necessary.
Try tweaking the FTS optimization PRAGMAs for better performance!
Choosing SQLite FTS4 vs FTS5
SQLite has two core full-text search modules: FTS4 and FTS5. FTS5 is the latest generation with advanced features, but FTS4 may make sense for some applications.
Some critical differences between SQLite FTS4 vs FTS5 include:
FTS4 | FTS5 | |
---|---|---|
Storage | Less efficient storage, larger db size | More efficient compressed storage |
Customization | Fewer tokenizer options | Advanced tokenizer configuration |
Queries | Basic MATCH queries | Advanced positional/proximity queries |
Ranking | No builtin ranking funcs | Advanced ranking formulas |
Performance | Slower at write & query time | Faster indexing & queries |
For most new applications, FTS5 is recommended for full-text search in SQLite unless you have special customization needs supported better in FTS4.
Both versions allow core tokenization and term matching – but FTS5 has advantages for large text corpuses, better efficiency, and advanced queries.
Applying Full-Text Search in SQLite
Now that you understand SQLite’s FTS capabilities, what are some real-world use cases?
Here are a few examples of full-text search applications:
- Document Databases – Store and query large corpuses of text documents like research papers, news articles, manuals, etc.
- FAQ Search – Allow self-service querying of support knowledge bases and forums.
- Product Catalog – Enable detailed attribute faceted search for ecommerce stores.
- Code Search – Index source code symbols and documentation for codebase search.
- Chat Search – Application searching message history in chat apps.
Virtually any application that needs text retrieval can benefit from SQLite FTS to add search functionality!
Comparison of SQLite’s Full-Text Search Capabilities
To summarize, here is a comparison of FTS versus regular SQLite queries:
Feature | Regular Queries | FTS Queries |
---|---|---|
Indexing | Manual text processing/indexing | Automatic tokenized indexing |
Storage | Tables contain original text | Extra virtual tables with indexed tokens |
Basic term match | Slow LIKE queries on text | Fast indexed exact term lookup |
Multiple terms | Complex concatenation logic | Simple AND / OR boolean logic |
Prefix matching | Slow LIKE startwith filters | Fast wildcard prefix searches |
Phrase matching | Impossible | Fast multi-term phrase matching |
Proximity search | Very slow custom logic | Builtin proximity filters on tokens |
Ranking | Hard to implement | Custom ranking formulas |
Performance | Slow on large text | Optimized for search workloads |
SQLite’s full-text search modules enable significantly more advanced, efficient, and customizable text querying capabilities than standard SQL queries.
Summary
I hope this guide has provided a comprehensive overview of implementing performant full-text search functionality within SQLite databases. The key concepts to remember are:
- Use FTS virtual tables to enable indexed text tokens
- Configure tokenization appropriately for your domain
- Utilize simple match queries, boolean logic, phrase search, proximity filters, and custom ranking
- Optimize with write-ahead logging, external content storage, and read-only indexes
Now you can add powerful text search and retrieval to your SQLite database applications! Let me know if you have any other questions.