
Get top 1 row of each group

by Nikita Barsukov · Feb 8, 2025
TLDR

To fetch the top row per group, use the ROW_NUMBER() window function with an OVER (PARTITION BY ... ORDER BY ...) clause. It numbers the rows within each group in the order you specify, so the row numbered 1 is the top row of its group.

Here's a quick sample:

WITH CTE AS (
    SELECT
        *,
        -- Number the rows of each group in the chosen order (the mighty Great Wall, brick by brick)
        ROW_NUMBER() OVER (PARTITION BY GroupColumn ORDER BY OrderCriteria) AS Rnk
    FROM YourTable
)
-- Show me the bricks! (aka WHERE Rnk = 1)
SELECT *
FROM CTE
WHERE Rnk = 1;

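Replace GroupColumn with the column that identifies your groups, OrderCriteria with the column to sort by (add DESC if the largest or latest value should come first), and YourTable with your table. The final SELECT returns the leading row from each group. If two rows tie on OrderCriteria, ROW_NUMBER() picks one of them arbitrarily, so add a tiebreaker column to the ORDER BY when the choice matters.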

Efficient alternatives

If your dataset is larger than the Great Wall, or if ROW_NUMBER() does not quite fit your case, consider:

  • RANK() or DENSE_RANK() in place of ROW_NUMBER() when several rows can tie for first place in a group (for example, multiple entries per day) and you want all of them returned
  • CROSS APPLY with a TOP (1) subquery (SQL Server), which can be faster when there are few groups and an index supports the lookup
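
Both alternatives can be sketched as follows. This is a minimal illustration in SQL Server syntax that reuses the placeholder names GroupColumn, OrderCriteria, and YourTable from the sample above; deriving the list of groups with SELECT DISTINCT is an assumption made here, and a dedicated table of groups would normally serve better if you have one.

```sql
-- Variant 1: keep every row that ties for first place in its group.
WITH Ranked AS (
    SELECT
        *,
        DENSE_RANK() OVER (PARTITION BY GroupColumn ORDER BY OrderCriteria) AS Rnk
    FROM YourTable
)
SELECT *
FROM Ranked
WHERE Rnk = 1;

-- Variant 2: one TOP (1) lookup per group via CROSS APPLY.
SELECT t.*
FROM (SELECT DISTINCT GroupColumn FROM YourTable) AS g  -- assumed source of groups
CROSS APPLY (
    SELECT TOP (1) y.*
    FROM YourTable AS y
    WHERE y.GroupColumn = g.GroupColumn
    ORDER BY y.OrderCriteria
) AS t;
```

Whether the second variant is actually faster depends on the number of groups and the available indexes, so compare it against ROW_NUMBER() on your own data.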

Database design: Normalization vs. Denormalization

The structure of your database can greatly impact query performance. Keeping a CurrentStatus field in your main table (denormalization) can streamline queries, but it stores redundant data that has to be kept in sync. A normalized structure, on the other hand, may require more complex queries but keeps a single source of truth and the full history intact.

Step it up a notch: Routine efficiency

Advanced grouping

If your goal is to retrieve more than just the top row, say the top N rows of each group, CROSS APPLY can execute a correlated TOP (N) subquery once for each row of an outer query.
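
A rough sketch of the top-N case in SQL Server syntax, using the same placeholder names as the TLDR sample. The variable @N and the SELECT DISTINCT used to list the groups are assumptions for illustration only.

```sql
DECLARE @N int = 3;  -- hypothetical: how many rows to keep per group

SELECT t.*
FROM (SELECT DISTINCT GroupColumn FROM YourTable) AS g  -- assumed source of groups
CROSS APPLY (
    -- Runs once per group and returns that group's first @N rows
    SELECT TOP (@N) y.*
    FROM YourTable AS y
    WHERE y.GroupColumn = g.GroupColumn
    ORDER BY y.OrderCriteria
) AS t;
```

The same result can be obtained with the ROW_NUMBER() query from the TLDR by changing the filter to Rnk <= @N.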

Scaling performance

For systems that must scale, benchmark the different approaches against each other: ROW_NUMBER(), CROSS APPLY, and TOP (1) WITH TIES. Always inspect the execution plans and test with realistic data volumes, so the query you ship sprints like a cheetah rather than crawling at a snail's pace.
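
As one way to run such a comparison in SQL Server, the sketch below turns on I/O and timing statistics and then runs the TOP (1) WITH TIES form, again with the placeholder names from the TLDR sample. Run the other candidate queries in the same session and compare the reported reads and elapsed times.

```sql
SET STATISTICS IO ON;    -- report logical and physical reads per table
SET STATISTICS TIME ON;  -- report CPU and elapsed time

-- Every row whose per-group row number is 1 ties for first place
-- in the outer ORDER BY, so WITH TIES returns one row per group.
SELECT TOP (1) WITH TIES *
FROM YourTable
ORDER BY ROW_NUMBER() OVER (PARTITION BY GroupColumn ORDER BY OrderCriteria);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```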

The current status column

If you frequently need the current status of an item, like "has my pizza arrived yet?", consider adding a denormalized column to the main table and using a trigger to keep it updated. On large datasets this usually makes reads faster than recomputing the latest status each time, at the cost of extra work on every write.
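
A minimal sketch of such a trigger in SQL Server syntax follows. The schema is entirely hypothetical: a main table dbo.Items (ItemId, CurrentStatus) and a history table dbo.StatusHistory (ItemId, Status, ChangedAt). Adapt the names and columns to your own design.

```sql
-- Assumed tables (hypothetical):
--   dbo.Items(ItemId, CurrentStatus)
--   dbo.StatusHistory(ItemId, Status, ChangedAt)
CREATE TRIGGER trg_StatusHistory_SetCurrent
ON dbo.StatusHistory
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- For every item touched by this insert, copy its latest status
    -- from the history table into the denormalized column.
    UPDATE i
    SET CurrentStatus = latest.Status
    FROM dbo.Items AS i
    CROSS APPLY (
        SELECT TOP (1) h.Status
        FROM dbo.StatusHistory AS h
        WHERE h.ItemId = i.ItemId
        ORDER BY h.ChangedAt DESC
    ) AS latest
    WHERE i.ItemId IN (SELECT ItemId FROM inserted);
END;
```

Reading the latest row from the history table, rather than trusting the inserted rows alone, keeps the column correct even when a late-arriving row carries an older ChangedAt value.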

Performance

When handling numerous entries, query performance becomes important, much like handling submissions efficiently during an art contest. An index that covers the grouping column followed by the ordering column lets the database find each group's top row without sorting the whole table, which speeds up retrieval the way a Ferrari speeds up a lap.
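
For the TLDR query, such an index might look like the sketch below. The index name is made up for illustration; the column order (grouping column first, ordering column second) is the part that matters.

```sql
-- Supports PARTITION BY GroupColumn ORDER BY OrderCriteria
CREATE INDEX IX_YourTable_GroupColumn_OrderCriteria
    ON YourTable (GroupColumn, OrderCriteria);
```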

Design decisions

Imagine having the top artwork from each category already highlighted. In SQL, using a trigger to maintain a current status flag gives the same kind of instant access to the most up-to-date row, without searching for it in every query.

Top selection criteria

The criteria for selecting top entries in an art competition can be complex. In SQL, you can list several columns in the ORDER BY of a window function, or use aggregate functions such as MAX(), to pinpoint the desired row, just like judging artwork on multiple dimensions.
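
As an illustration, the sketch below picks one winner per category from a hypothetical Entries table (Category, Score, SubmittedAt, EntryId; none of these names come from the article). The highest score wins, the earlier submission breaks a tie, and EntryId makes the result deterministic.

```sql
WITH Ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY Category
            ORDER BY Score DESC,       -- primary criterion: best score first
                     SubmittedAt ASC,  -- tiebreaker: earlier submission wins
                     EntryId ASC       -- final tiebreaker for a stable result
        ) AS Rnk
    FROM Entries
)
SELECT *
FROM Ranked
WHERE Rnk = 1;
```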