
How to select unique records by SQL

by Nikita Barsukov · Oct 20, 2024
TLDR

Get the unique rows in SQL by using the DISTINCT keyword inside your SELECT query. For unique values in a single column, such as email, use:

SELECT DISTINCT email FROM customers;

To retrieve rows that are unique across a combination of columns, such as first_name and last_name, use:

SELECT DISTINCT first_name, last_name FROM customers;

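Both queries drop duplicate rows from the result. The second treats uniqueness as the composite of the two columns: two customers named John Smith collapse into one row, while John Smith and John Doe are both kept.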
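To see both queries in action, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The customers table and its rows are made up for illustration, and ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# In-memory SQLite database with a small, hypothetical customers table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT, last_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("John", "Smith", "john@example.com"),
        ("John", "Smith", "john@example.com"),  # exact duplicate row
        ("John", "Doe", "jdoe@example.com"),
        ("Jane", "Smith", "john@example.com"),  # reuses John Smith's email
    ],
)

# One column: each email appears once
emails = conn.execute("SELECT DISTINCT email FROM customers ORDER BY email").fetchall()

# Two columns: uniqueness is judged on the (first_name, last_name) pair
names = conn.execute(
    "SELECT DISTINCT first_name, last_name FROM customers "
    "ORDER BY first_name, last_name"
).fetchall()
conn.close()

print(emails)  # [('jdoe@example.com',), ('john@example.com',)]
print(names)   # [('Jane', 'Smith'), ('John', 'Doe'), ('John', 'Smith')]
```

The shared email is listed only once in the first result, while the second result still keeps Jane Smith, because there only the name pair counts.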

Getting the hang of DISTINCT

Utilizing DISTINCT is a piece of 🍰 for basic cases. Here's how to handle joins and multiple columns, because we don't want those pesky duplicates:

-- "Uniqueness level: John Wick"
SELECT DISTINCT c.customer_id, o.order_id
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;

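DISTINCT applies to the whole combination of selected columns, so this query returns each (customer_id, order_id) pair exactly once, even when the join produces repeated rows, keeping the data clean and crisp.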
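A join can fan out into repeated rows. The sketch below assumes a hypothetical orders table that stores one row per line item, so an order_id can repeat, and runs the same join with and without DISTINCT in SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical schema: one row per line item, so an order_id can appear more than once
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "pen"), (10, 1, "ink"), (11, 1, "pad"), (12, 2, "mug")],
)

# The same join, toggling only the DISTINCT keyword (constant text, no user input)
JOIN_SQL = """
    SELECT {modifier} c.customer_id, o.order_id
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_id
"""
plain = conn.execute(JOIN_SQL.format(modifier="")).fetchall()
unique = conn.execute(JOIN_SQL.format(modifier="DISTINCT")).fetchall()
conn.close()

print(plain)   # [(1, 10), (1, 10), (1, 11), (2, 12)]
print(unique)  # [(1, 10), (1, 11), (2, 12)]
```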

Meet the GROUP BY clause

When you need aggregate functions such as COUNT() alongside your unique values, GROUP BY walks in like a boss:

-- "GROUP-ing like it's Friday night!"
SELECT city, COUNT(*) AS population
FROM customers
GROUP BY city;

This query returns the number of customers in each city, under the alias population. GROUP BY clubs rows with the same city value into a single group, and COUNT(*) counts the rows in each group.
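Here is a minimal sketch of the GROUP BY query against a tiny, hypothetical customers table in SQLite (through Python's sqlite3 module), next to the plain DISTINCT version that lists the cities without counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", "Oslo"), ("Bob", "Oslo"), ("Cid", "Rome"),
     ("Dee", "Oslo"), ("Eve", "Rome"), ("Fay", "Lima")],
)

# GROUP BY: one row per city plus an aggregate computed over each group
per_city = conn.execute(
    "SELECT city, COUNT(*) AS population FROM customers GROUP BY city ORDER BY city"
).fetchall()

# DISTINCT: the same unique cities, but no per-group count
cities = conn.execute("SELECT DISTINCT city FROM customers ORDER BY city").fetchall()
conn.close()

print(per_city)  # [('Lima', 1), ('Oslo', 3), ('Rome', 2)]
print(cities)    # [('Lima',), ('Oslo',), ('Rome',)]
```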

ROW_NUMBER(): the secret sauce

Need spicier uniqueness, such as keeping one full row per group of duplicates? The window function ROW_NUMBER() inside a Common Table Expression (CTE) can save the day:

-- "Lord of the Rows - The Return of the Row_Number"
WITH ranked_customers AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY email ORDER BY signup_date DESC) AS rn
  FROM customers
)
SELECT * FROM ranked_customers WHERE rn = 1;

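This beast of a query numbers the rows within each email, newest signup_date first, then keeps only the rows where rn = 1: the most recent customer record for every unique email. Note that SELECT * also returns the helper rn column.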
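The sketch below runs the same CTE against a hypothetical customers table in SQLite via Python's sqlite3 module. It assumes signup_date is stored as ISO-8601 text, which sorts chronologically, and an SQLite build of 3.25 or later, the first version with window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, signup_date TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "2023-01-05"),
        (2, "a@example.com", "2024-03-10"),  # newer signup for the same email
        (3, "b@example.com", "2022-07-01"),
    ],
)

latest = conn.execute(
    """
    WITH ranked_customers AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY signup_date DESC) AS rn
        FROM customers
    )
    SELECT id, email, signup_date   -- list columns explicitly to leave out rn
    FROM ranked_customers
    WHERE rn = 1
    ORDER BY email
    """
).fetchall()
conn.close()

print(latest)  # [(2, 'a@example.com', '2024-03-10'), (3, 'b@example.com', '2022-07-01')]
```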

Mastering SQL dialects

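Every SQL dialect has its own twists. PostgreSQL's DISTINCT ON keeps only the first row for each distinct location, where "first" is decided by ORDER BY (so the DISTINCT ON expressions must match the leftmost ORDER BY expressions). Here, that means the most recent event per location: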

-- "The Perfect DISTINCTion"
SELECT DISTINCT ON (location) location, event_date
FROM events
ORDER BY location, event_date DESC;

When in doubt, always validate your approach against the SQL syntax of the database management system (DBMS) you're using.
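SQLite has no DISTINCT ON. When you only need each location and its latest date, one portable alternative is GROUP BY with MAX(), sketched here with Python's sqlite3 module and made-up events; for this two-column case it returns the same rows as the PostgreSQL query above. If you also need other columns from that latest row, the ROW_NUMBER() pattern is the more general substitute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (location TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("Berlin", "2024-05-01"), ("Berlin", "2024-09-12"),
     ("Paris", "2024-02-20"), ("Paris", "2023-11-30")],
)

# Portable stand-in for: SELECT DISTINCT ON (location) location, event_date
#                        FROM events ORDER BY location, event_date DESC;
latest_events = conn.execute(
    """
    SELECT location, MAX(event_date) AS event_date
    FROM events
    GROUP BY location
    ORDER BY location
    """
).fetchall()
conn.close()

print(latest_events)  # [('Berlin', '2024-09-12'), ('Paris', '2024-02-20')]
```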

Addressing performance speed bumps

When handling large datasets, size does matter. Compare the efficiency of DISTINCT against GROUP BY or subqueries:

-- "Fast and DISTINCTious: SQL Drift"
SELECT DISTINCT title FROM books;

An index on the column (here, title) often makes DISTINCT faster, but test before choosing between DISTINCT, GROUP BY, or subqueries, for example by checking the query plan with your DBMS's EXPLAIN command.
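One way to run that test is a quick side-by-side timing. The sketch below, using Python's sqlite3 module with an invented books table, checks that DISTINCT and GROUP BY return the same titles and records how long each query takes. The timings depend entirely on your data, engine, and hardware, so treat them as a starting point for your own measurements, not as a verdict.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
# Hypothetical data: 50,000 rows spread over 100 distinct titles
conn.executemany(
    "INSERT INTO books (title) VALUES (?)",
    ((f"Title {i % 100}",) for i in range(50_000)),
)
conn.execute("CREATE INDEX idx_books_title ON books (title)")

queries = {
    "distinct": "SELECT DISTINCT title FROM books ORDER BY title",
    "group_by": "SELECT title FROM books GROUP BY title ORDER BY title",
}
results, timings = {}, {}
for label, sql in queries.items():
    start = time.perf_counter()
    results[label] = conn.execute(sql).fetchall()
    timings[label] = time.perf_counter() - start
conn.close()

# Same rows either way; only the timings are worth comparing
print(len(results["distinct"]), results["distinct"] == results["group_by"])  # 100 True
print(timings)
```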

DISTINCT in the real world

In the real world, concrete use-cases guide your choice of columns in DISTINCT:

-- "When life gives you addresses, make DISTINCT addresses."
SELECT DISTINCT addresses FROM customers;

When different customers share the same address, DISTINCT collapses them into a single row, so the result no longer shows how many customers live there or who they are. Examine your data model with a fine-tooth comb before deciding which columns go into DISTINCT.
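A small sketch, assuming a hypothetical customers table with one address column per customer, shows what DISTINCT throws away and how GROUP BY with COUNT(*) keeps part of it. It uses SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", "1 Elm St"), ("Bob", "1 Elm St"), ("Cid", "9 Oak Ave")],
)

# DISTINCT: one row per address; Ann and Bob are merged into a single row
unique_addresses = conn.execute(
    "SELECT DISTINCT address FROM customers ORDER BY address"
).fetchall()

# GROUP BY: still one row per address, but the resident count survives
residents = conn.execute(
    "SELECT address, COUNT(*) AS residents FROM customers "
    "GROUP BY address ORDER BY address"
).fetchall()
conn.close()

print(unique_addresses)  # [('1 Elm St',), ('9 Oak Ave',)]
print(residents)         # [('1 Elm St', 2), ('9 Oak Ave', 1)]
```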

Juggling multiple columns

When handling compound keys—combinations of columns that should be unique together—DISTINCT is still your friend:

-- "DISTINCTly Radiant"
SELECT DISTINCT country, city FROM addresses;

This query returns each unique (country, city) pair once, so a city name that exists in two countries appears once per country.
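The sketch below, using SQLite via Python's sqlite3 module and an invented addresses table, shows how the pair is judged: repeated (country, city) pairs collapse, while the same city name in another country stays.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (street TEXT, city TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?, ?)",
    [
        ("1 Main St", "Paris", "France"),
        ("5 Rue Lepic", "Paris", "France"),       # same pair: collapsed
        ("3 Elm St", "Paris", "United States"),   # same city, different country: kept
        ("7 Oak Rd", "Austin", "United States"),
    ],
)

pairs = conn.execute(
    "SELECT DISTINCT country, city FROM addresses ORDER BY country, city"
).fetchall()
conn.close()

print(pairs)  # [('France', 'Paris'), ('United States', 'Austin'), ('United States', 'Paris')]
```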