How to select unique records by SQL
Get the unique rows in SQL by using the DISTINCT keyword inside your SELECT query. For unique values in a single column, such as email, use:
To retrieve unique rows combined across several columns, such as first_name and last_name, use:
These queries avoid duplicate values, with the second one treating uniqueness as a composite of the two columns.
Getting the hang of DISTINCT
Utilizing DISTINCT is a piece of 🍰 for basic cases. Here's how to handle joins and multiple columns, because we don't want those pesky duplicates:
This query guarantees unique row combinations when joining tables, keeping the data clean and crisp.
Meet the GROUP BY command
When working with aggregate functions or large datasets, GROUP BY walks in like a boss:
This query returns the population of each city in the customers table. GROUP BY clubs rows with the same column value into a single group.
ROW_NUMBER(): the secret sauce
Need spicier uniqueness? Window functions coupled with ROW_NUMBER() within a Common Table Expression (CTE) can save the day:
This beast of a query tags customers based on their signup date, then selects those rocking the most recent unique emails.
Mastering SQL's dialect
Every SQL dialect has unique twists. The DISTINCT ON operation for PostgreSQL returns unique events based on location:
When in doubt, always validate your approach against the SQL syntax of the database management system (DBMS) you're using.
Addressing performance speed bumps
When handling large datasets, size does matter. Compare the efficiency of DISTINCT against GROUP BY or subqueries:
Although a column with an index will generally make DISTINCT run faster, testing is key when deciding between using DISTINCT, GROUP BY, or subqueries.
Existing in the real world
In the real world, concrete use-cases guide your choice of columns in DISTINCT:
In a scenario where different customers share the same address, choosing addresses indiscriminately with DISTINCT could bind them. Examine your data model with a fine-tooth comb.
Juggling multiple columns
When handling compound keys—combinations of columns that should be unique together—DISTINCT is still your friend:
This query gives you unique city records for each country.
Was this article helpful?