Explain Codes LogoExplain Codes Logo

How to select only the first rows for each unique value of a column?

sql
prompt-engineering
join
best-practices
Nikita BarsukovbyNikita BarsukovยทJan 3, 2025
โšกTLDR

Quickly fetch the first row per unique column value by leveraging the ROW_NUMBER() function together with PARTITION BY. This technique assigns a row number within each unique column value, enabling the easy identification of the first row.

SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY unique_column ORDER BY order_criteria) AS rn FROM your_table ) WHERE rn = 1;

Just replace unique_column with your specific column and order_criteria with the criteria for determining the "first" row. Piece of cake, right? ๐Ÿฐ

Step-it-up: Advanced applications and alternatives

Precise ordering: Multilevel sorting

You can add multiple sorting criteria in your ORDER BY clause. Excellent for those situations when the "first" row depends on more than one column:

ORDER BY primary_criteria, secondary_criteria DESC

This lets your SQL statement say "Ok, now we're getting serious!"

Old school: When window functions are not your friend

Dealing with an older database or SQL version without ROW_NUMBER()? No problem, let's go retro with GROUP BY and MIN():

SELECT your_table.* FROM your_table JOIN ( SELECT unique_column, MIN(order_criteria) AS first_criteria FROM your_table GROUP BY unique_column ) AS subquery ON your_table.unique_column = subquery.unique_column AND your_table.order_criteria = subquery.first_criteria;

Who said old-school isn't cool? ๐Ÿ˜Ž

Time travelers: Selecting rows based on insertion time

"First" row might mean the earliest inserted row. If that's your jam, use an inserted_time column, if available:

ORDER BY inserted_time ASC

Your SQL statement is now a time machine! ๐Ÿ•ฐ

Speedy Gonzales: Considerations for performance

Remember that window functions on large datasets can pressure the SQL hamsters ๐Ÿน running your server. Improve their lives by indexing the columns used in PARTITION BY and ORDER BY.

What if...?: Situations and scenarios

PostgreSQL and DISTINCT ON

If you're using PostgreSQL, you might like DISTINCT ON. It's like SQL's version of a clean shave:

SELECT DISTINCT ON (unique_column) * FROM your_table ORDER BY unique_column, order_criteria;

Data integrity: Fighting duplicates

Guard your table against sneakily-inserted duplicates that might confuse our "first row" logic with constraints or indexes.

JOINing the party: Handling complex conditions

Need to consider another table as part of your conditions? Fear not, SQL's got you covered with JOINS:

WITH RankedData AS ( SELECT t1.*, t2.some_column, ROW_NUMBER() OVER (PARTITION BY t1.unique_column ORDER BY t1.order_criteria) AS rn FROM your_table t1 JOIN other_table t2 ON t1.foreign_key = t2.primary_key ) SELECT * FROM RankedData WHERE rn = 1;

It's like getting a VIP pass backstage! ๐ŸŽ‰

Bells and whistles: Making your queries shine

A clean house: Selecting specific columns

Instead of *, select only the required columns. Your query runs faster and wins beauty contests. A real SQL superstar! ๐ŸŒŸ

Pitfall prevention: Avoiding common traps

Beware of the hidden sinkholes with GROUP BY - improper use might result in mixing oranges and apples. Not a tasty SQL salad! ๐Ÿฝ

Extra spice: Combining aggregate functions with partitioning

Give your queries extra kick by combining aggregate functions with PARTITION BY:

SUM(column_name) OVER (PARTITION BY unique_column ORDER BY order_criteria)

This method lets you say to your data "Who's your aggregate function?"