
MySQL: Select rows on first occurrence of each unique value

sql
window-functions
mysql-8
performance
by Anton Shumikhin · Dec 21, 2024
TLDR

To select the first occurrence of each unique value, join the table to a subquery that returns the minimum id for each value.

SELECT t1.*
FROM your_table AS t1
JOIN (
    SELECT MIN(id) AS MinId
    FROM your_table
    GROUP BY unique_column
) AS t2 ON t1.id = t2.MinId;

MIN(id) finds the first occurrence within each group (assuming id increases with insertion order), and GROUP BY unique_column produces one group per distinct value. I like to call it the simple but effective method for instant problem-solving.
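To see the query at work, here is a small sketch with made-up data. The payload column and the sample rows are assumptions for illustration only; the query itself is the one from the TLDR.

```sql
-- Hypothetical sample table, for illustration only.
CREATE TABLE your_table (
    id            INT PRIMARY KEY,
    unique_column VARCHAR(20) NOT NULL,
    payload       VARCHAR(50)
);

INSERT INTO your_table (id, unique_column, payload) VALUES
    (1, 'apple',  'first apple'),
    (2, 'banana', 'first banana'),
    (3, 'apple',  'second apple'),
    (4, 'banana', 'second banana'),
    (5, 'cherry', 'only cherry');

-- Keep only the row with the smallest id for each unique_column value.
SELECT t1.*
FROM your_table AS t1
JOIN (
    SELECT MIN(id) AS MinId
    FROM your_table
    GROUP BY unique_column
) AS t2 ON t1.id = t2.MinId
ORDER BY t1.id;
-- Returns the rows with id 1 (apple), 2 (banana) and 5 (cherry).
```

The final ORDER BY is only there to make the output order predictable; it is not needed for the deduplication itself.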

Window functions magic: The advanced way

Window functions, available since MySQL 8.0, let you pick the first row per group in a single pass and decide what "first" means through ORDER BY:

SELECT *
FROM (
    SELECT *,
           -- "I see you've mastered the art of row numbering, young Padawan!"
           ROW_NUMBER() OVER (PARTITION BY unique_column ORDER BY sorting_column) AS rn
    FROM your_table
) AS sub
WHERE sub.rn = 1;

ROW_NUMBER() assigns a sequence number starting at 1 inside each partition (one partition per unique_column value in this case), and filtering on rn = 1 grabs the first row from each group.
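One thing to watch: if two rows in the same partition share the same sorting_column value, which of them gets rn = 1 is not guaranteed. A minimal sketch of a deterministic variant follows, assuming the table also has a unique id column to break ties (that column is an assumption, not something the query above requires):

```sql
SELECT sub.*
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY unique_column
               -- id is an assumed unique column, used only as a tie-breaker
               ORDER BY sorting_column, id
           ) AS rn
    FROM your_table AS t
) AS sub
WHERE sub.rn = 1;
```

Qualifying the star as t.* also sidesteps the parse errors MySQL can raise when an unqualified * is mixed with other select-list items.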

Current MySQL limitations: A reality check

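Be careful when selecting non-grouped columns without wrapping them in an aggregate function or listing them in GROUP BY. With the ONLY_FULL_GROUP_BY SQL mode enabled (the default since MySQL 5.7.5), MySQL rejects such queries. With it disabled, MySQL might just forgive you for being naughty with its nonstandard SQL behavior! The flipside? Each non-grouped column is taken from an arbitrary row of its group, so the results can be unexpected and inaccurate.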
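A rough sketch of what this looks like in practice. Here other_column is a hypothetical non-grouped column, and the exact behavior depends on your server's sql_mode:

```sql
-- Check whether ONLY_FULL_GROUP_BY is part of the current session's SQL mode.
SELECT @@SESSION.sql_mode;

-- Typically rejected when ONLY_FULL_GROUP_BY is enabled.
-- When it is disabled, other_column comes from an arbitrary row of each group.
SELECT unique_column, other_column
FROM your_table
GROUP BY unique_column;

-- Makes the "any row will do" intent explicit; accepted in either mode.
-- Note that this still does NOT guarantee the first occurrence.
SELECT unique_column, ANY_VALUE(other_column) AS some_value
FROM your_table
GROUP BY unique_column;
```

If you need values from the first row specifically, stick with the MIN(id) join or the ROW_NUMBER() approach rather than relying on either of the GROUP BY forms above.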

Common pitfalls to avoid: What not to do

Using the MAX() function when looking for the first occurrence is like asking for chocolate and getting vanilla - you'll end up with the last occurrence instead of the first. Always use MIN() so that the earliest row in each group wins.

Advanced queries for smart solutions

Sometimes "first" is defined by a column other than id, such as a timestamp, and you still need the entire row. In such cases, your knight in shining armor is a JOIN with a subquery on both the grouping column and the minimum value. Note that if several rows in a group share the same minimum time, all of them are returned.

SELECT t1.*
FROM your_table AS t1
INNER JOIN (
    -- "ohnosecond" - The smallest measurable amount of time in the universe, approximately equal to the time between making a mistake and realizing it.
    SELECT unique_column, MIN(time) AS MinTime
    FROM your_table
    GROUP BY unique_column
) AS t2 ON t1.unique_column = t2.unique_column
       AND t1.time = t2.MinTime;

Real-life application scenarios

You can use this technique in various real-world situations:

  • To find the first purchase of every customer.
  • To identify the initial login of every user.
  • To retrieve the earliest record of sensor data per device.
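As an illustration of the first scenario, here is a minimal sketch for "first purchase of every customer". The purchases table and its id, customer_id and purchased_at columns are hypothetical names chosen for this example:

```sql
-- Hypothetical schema: purchases(id, customer_id, purchased_at, ...)
SELECT p.*
FROM (
    SELECT pu.*,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id       -- one group per customer
               ORDER BY purchased_at, id      -- earliest purchase first, id breaks ties
           ) AS rn
    FROM purchases AS pu
) AS p
WHERE p.rn = 1;
```

The other two scenarios follow the same shape: swap in the user or device column for PARTITION BY and the login or reading timestamp for ORDER BY.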

Performance matters: Optimize your queries

To keep these queries fast on large datasets, make sure the columns used in JOIN, WHERE, GROUP BY, and ORDER BY clauses are indexed. For first-occurrence queries, a composite index on the grouping column followed by the ordering column is a natural candidate. Remember, well-indexed SQL queries make for happy databases!
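A possible starting point, assuming the placeholder column names used throughout this article (the index name is made up, and whether the optimizer actually uses the index depends on your data and MySQL version):

```sql
-- Composite index: grouping column first, then the column that defines "first".
CREATE INDEX idx_unique_sorting
    ON your_table (unique_column, sorting_column);

-- Inspect the execution plan to check whether the index is being used.
EXPLAIN
SELECT t1.*
FROM your_table AS t1
JOIN (
    SELECT unique_column, MIN(sorting_column) AS MinSort
    FROM your_table
    GROUP BY unique_column
) AS t2 ON t1.unique_column = t2.unique_column
       AND t1.sorting_column = t2.MinSort;
```

Compare the EXPLAIN output before and after creating the index rather than assuming it helps; an index also adds write overhead on every insert and update.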