SQLite is a popular open-source database that is embedded in many applications. As a relational database, SQLite allows you to query data using SQL syntax. One common SQL clause is DISTINCT, which removes duplicate rows from query results. In this beginner’s guide, we’ll cover the basics of using DISTINCT in SQLite.
To demonstrate DISTINCT, we’ll create a table called
CREATE TABLE users ( id INTEGER PRIMARY KEY, name TEXT, age INTEGER ); INSERT INTO users VALUES (1, 'John', 30), (2, 'Jane', 25), (3, 'Bob', 20), (4, 'Mary', 30);
This gives us a small table with some sample user data:
id|name|age 1|John|30 2|Jane|25 3|Bob|20 4|Mary|30
Now let’s see how DISTINCT works in practice.
What's in this article
Removing Duplicate Rows with DISTINCT
The DISTINCT clause eliminates duplicate rows from the result set. This is useful when you only want unique values returned. For example, to get a list of unique ages in the
SELECT DISTINCT age FROM users;
This would return:
age 30 25 20
Without the DISTINCT clause, the query would return every row:
SELECT age FROM users;
age 30 25 20 30
As you can see, DISTINCT removed the duplicate value
30. This condenses the result set down to only the unique values.
DISTINCT applies to the entire selected row. So if two rows have the same values (but in a different order), only one will be returned.
Combining SQLite DISTINCT with Other Clauses
For example, getting unique names for users over the age of 25:
SELECT DISTINCT name FROM users WHERE age > 25 ORDER BY name;
name Jane John Mary
The WHERE clause filtered the rows, DISTINCT removed duplicates, and ORDER BY sorted the final results.
Another example using DISTINCT with LIMIT:
SELECT DISTINCT age FROM users ORDER BY age DESC LIMIT 2;
age 30 25
DISTINCT removed duplicates, then ORDER BY sorted the ages descending, and finally LIMIT restricted it to 2 rows.
Counting Unique Values with SQLite DISTINCT
A common use case for DISTINCT is counting the number of unique values in a column. For example, to count unique ages:
SELECT COUNT(DISTINCT age) FROM users;
This counts the number of unique ages, excluding any duplicates. Without DISTINCT, it would count all rows, including duplicates:
SELECT COUNT(age) FROM users;
Being able to count distinct values efficiently is useful for analytics queries.
Performance Considerations of SQLite DISTINCT
It’s important to note that DISTINCT can negatively impact performance if used improperly. This is because removing duplicates requires SQLite to sort and scan the entire result set.
For large tables, it’s often faster to use an indexed column in the WHERE clause instead. For example:
SELECT age FROM users WHERE age > 25;
This returns unique ages over 25 by filtering on an indexed column. The results are not scanned and sorted like they would be with DISTINCT.
In general, only use SQLite DISTINCT when you specifically need to remove duplicates. Avoid using it for arbitrary filtering or limiting results – that’s better handled in the WHERE clause.
DISTINCT is a simple but powerful tool for condensing result sets down to unique values. We learned:
- DISTINCT removes duplicate rows from a query
- It can be combined with WHERE, ORDER BY, LIMIT, etc.
- COUNT(DISTINCT col) calculates the number of unique values in a column
- DISTINCT can impact performance if used injudiciously
With this foundation, you can now apply DISTINCT to simplify your queries and analytics in SQLite. Try it out on your own data to see the results. And remember, only use DISTINCT when you specifically need unique values – don’t make it a habit without cause.