
Distinct for only one column

by Anton Shumikhin · Jan 14, 2025
TLDR

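To get unique values in one column and still display the others, PostgreSQL offers DISTINCT ON:

SELECT DISTINCT ON (col1) col1, col2, col3
FROM your_table
ORDER BY col1, col2;

This returns one row per distinct col1 value: the first row of each group under the ORDER BY, which here is the row with the lowest col2. DISTINCT ON is a PostgreSQL extension, not a window function; in other databases, the window-function approach below gives the same result.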

Practical uses of window functions

To fetch distinct values from a specific column while including additional columns in your result set, window functions are your friend. Specifically, ROW_NUMBER() combined with PARTITION BY numbers the rows within each group that shares a value in the partitioned column, starting at 1 and following the ORDER BY you give it. Keep only the rows numbered 1 and you are left with exactly one row per value.
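To see the numbering itself, here is a minimal sketch that runs the idea through Python's built-in sqlite3 module. The Users table and its rows are made up for illustration, and the sketch assumes the bundled SQLite is version 3.25 or later, which is when window functions arrived.

```python
import sqlite3

# Hypothetical sample data: table name, columns, and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO Users (id, email, timestamp) VALUES (?, ?, ?)",
    [
        (1, "ann@example.com", "2024-01-01 09:00:00"),
        (2, "ann@example.com", "2024-03-01 09:00:00"),
        (3, "bob@example.com", "2024-02-01 09:00:00"),
        (4, "ann@example.com", "2024-02-01 09:00:00"),
    ],
)

# Number the rows inside each email partition, newest timestamp first.
numbered = conn.execute(
    """
    SELECT id, email,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY timestamp DESC) AS rn
    FROM Users
    ORDER BY email, rn
    """
).fetchall()

for row in numbered:
    print(row)
```

Each email restarts its count at 1, so the rows with rn = 1 are the newest entry per email.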

Handling unique email entries

Let's say a Users table has ended up with multiple entries for each email, distinguished only by their timestamps. Need the most recent entry for each email? No problem, check this out:

/* Let's do something magical: number the emails. Abracadabra! */
WITH RankedEmails AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY timestamp DESC) AS rn
    FROM Users
)
SELECT id, email, timestamp
FROM RankedEmails
WHERE rn = 1; /* Just like in Highlander: in the end, there can be only one! */

Filter first, ask questions later

There may be scenarios where you want to filter rows before applying the window function, so that excluded rows never compete for first place within their partition:

/* A conjuring trick: separate the active users before numbering emails. Poof! */
WITH FilteredUsers AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY timestamp DESC) AS rn
    FROM Users
    WHERE isActive = 1 /* Vanish inactive users. Now you see them, now you don't! */
)
SELECT id, email, timestamp
FROM FilteredUsers
WHERE rn = 1; /* Just like in Survivor: outwit, outplay, outlast. */
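Where the filter sits changes the answer. The sketch below, again using Python's sqlite3 with invented rows, compares filtering inside the CTE against filtering after the numbering. The second CTE name, RankedUsers, is hypothetical and only there for the comparison; SQLite 3.25+ is assumed.

```python
import sqlite3

# Hypothetical data: ann's newest row is inactive, her older row is active.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT, timestamp TEXT, isActive INTEGER)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?)",
    [
        (1, "ann@example.com", "2024-01-01 09:00:00", 1),
        (2, "ann@example.com", "2024-03-01 09:00:00", 0),
        (3, "bob@example.com", "2024-02-01 09:00:00", 1),
    ],
)

# Filter first: inactive rows are gone before the numbering happens.
filter_first = conn.execute(
    """
    WITH FilteredUsers AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY timestamp DESC) AS rn
        FROM Users
        WHERE isActive = 1
    )
    SELECT id, email FROM FilteredUsers WHERE rn = 1 ORDER BY id
    """
).fetchall()

# Filter last: ann's inactive row takes rn = 1 and is then discarded.
filter_last = conn.execute(
    """
    WITH RankedUsers AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY timestamp DESC) AS rn
        FROM Users
    )
    SELECT id, email FROM RankedUsers WHERE rn = 1 AND isActive = 1 ORDER BY id
    """
).fetchall()

print(filter_first)
print(filter_last)
```

Filtering first answers "the latest active row per email"; filtering last answers "the latest row per email, if it happens to be active". Pick the one that matches the question you are asking.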

The joy of readability with CTEs

Common Table Expressions (CTEs) are the magic wand that breaks a complex query into named steps, making your spell far easier to read.

/* Behold, a spell to produce the most recent user data. */
WITH EmailsWithRank AS (
    SELECT email, MAX(timestamp) AS MaxTimestamp
    FROM Users
    GROUP BY email
),
LatestUserData AS (
    SELECT U.*
    FROM Users U
    INNER JOIN EmailsWithRank E
        ON U.email = E.email
       AND U.timestamp = E.MaxTimestamp
)
SELECT *
FROM LatestUserData; /* Abracadabra, and just like magic, it all makes sense now. */
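One caveat with joining back on MAX(timestamp): if two rows for the same email share the latest timestamp, both survive the join. The following sketch reproduces that with invented data in Python's sqlite3; it is a demonstration under those assumptions, not a claim about how your data looks.

```python
import sqlite3

# Hypothetical data: rows 2 and 3 tie on ann's latest timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        (1, "ann@example.com", "2024-01-01 09:00:00"),
        (2, "ann@example.com", "2024-03-01 09:00:00"),
        (3, "ann@example.com", "2024-03-01 09:00:00"),
        (4, "bob@example.com", "2024-02-01 09:00:00"),
    ],
)

# Same shape as the query above: find the max per email, then join back.
rows = conn.execute(
    """
    WITH EmailsWithRank AS (
        SELECT email, MAX(timestamp) AS MaxTimestamp
        FROM Users
        GROUP BY email
    )
    SELECT U.id, U.email
    FROM Users U
    INNER JOIN EmailsWithRank E
        ON U.email = E.email
       AND U.timestamp = E.MaxTimestamp
    ORDER BY U.id
    """
).fetchall()

print(rows)  # ann appears twice because of the tie
```

If exactly one row per email is a hard requirement, ROW_NUMBER() with a tie-breaking column in its ORDER BY is the safer spell.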

GROUP BY technique with accompanying acts

Not fond of window functions? Fear not. GROUP BY and aggregate functions will come to your rescue:

SELECT email, MAX(timestamp) AS LatestTimestamp
FROM Users
GROUP BY email; /* Let's group emails like a herd of sheep. Baa baa black sheep! */

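Remember though, GROUP BY applies to the whole result: every selected column must either appear in the GROUP BY clause or be wrapped in an aggregate. That means you cannot simply pull the remaining columns of the latest row alongside MAX(timestamp); to keep related data intact, join the grouped result back to the table or use a window function.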
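As a small illustration of pairing GROUP BY with several aggregates, the sketch below summarizes each email in one row. The table contents and the FirstTimestamp and Entries aliases are invented for the example; it runs on Python's sqlite3.

```python
import sqlite3

# Hypothetical sample data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        (1, "ann@example.com", "2024-01-01 09:00:00"),
        (2, "ann@example.com", "2024-03-01 09:00:00"),
        (3, "bob@example.com", "2024-02-01 09:00:00"),
    ],
)

# Every selected column is either the grouping column or an aggregate.
summary = conn.execute(
    """
    SELECT email,
           MAX(timestamp) AS LatestTimestamp,
           MIN(timestamp) AS FirstTimestamp,
           COUNT(*)       AS Entries
    FROM Users
    GROUP BY email
    ORDER BY email
    """
).fetchall()

for row in summary:
    print(row)
```

Note that each aggregate is computed independently, so this gives per-email facts, not the columns of any single row.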

Faster queries with magical indexes

By applying indexes on the columns involved in filtering, partitioning, and grouping, your queries might just fire off faster than a speeding bullet. For the GROUP BY and window function quests above, that typically means indexing the email and timestamp columns, for example with a single composite index on (email, timestamp).
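Here is a hedged sketch of what that could look like in SQLite via Python's sqlite3. The index name idx_users_email_timestamp is hypothetical, and the query plan text differs between engines and versions, so the sketch prints the plan for inspection instead of promising a particular output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        (1, "ann@example.com", "2024-01-01 09:00:00"),
        (2, "ann@example.com", "2024-03-01 09:00:00"),
        (3, "bob@example.com", "2024-02-01 09:00:00"),
    ],
)

# Composite index: email first (grouping/partitioning), then timestamp (ordering).
# The index name is an assumption made for this example.
conn.execute("CREATE INDEX idx_users_email_timestamp ON Users (email, timestamp)")

query = "SELECT email, MAX(timestamp) FROM Users GROUP BY email ORDER BY email"

# Ask the engine how it intends to run the query; the last column holds the description.
plan_details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print(plan_details)

latest = conn.execute(query).fetchall()
index_names = [
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'Users'"
    )
]
```

Whether the planner actually uses the index depends on the engine, the data volume, and the query shape, so check the plan on your own database before celebrating.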

Post-query considerations

Deciphering the "last" row factor

When attempting to select the "last" row per partition, be precise about what "last" means. If it's based on a timestamp, sort on that column in descending order and make sure the stored values compare chronologically. If you depend on identity numbers, confirm that higher numbers really do indicate later entries.
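The two notions of "last" can disagree. In the sketch below (invented data, Python's sqlite3, SQLite 3.25+ assumed), a row is inserted late but carries an older timestamp, as might happen with a backfill, so ordering by id and ordering by timestamp crown different winners.

```python
import sqlite3

# Hypothetical data: row 2 was inserted later but describes an older event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        (1, "ann@example.com", "2024-03-01 09:00:00"),
        (2, "ann@example.com", "2024-01-01 09:00:00"),
    ],
)

# The ORDER BY text is a fixed, trusted string here, so plain formatting is acceptable.
QUERY = """
    WITH Ranked AS (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY {order_by}) AS rn
        FROM Users
    )
    SELECT id FROM Ranked WHERE rn = 1
"""

by_timestamp = conn.execute(QUERY.format(order_by="timestamp DESC")).fetchone()[0]
by_id = conn.execute(QUERY.format(order_by="id DESC")).fetchone()[0]

print(by_timestamp, by_id)
```

Neither answer is wrong in itself; they simply answer different questions, so choose the ordering column deliberately.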

DISTINCT navigation with ORDER BY

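Don't count on SELECT DISTINCT followed by ORDER BY to pick one row per value. DISTINCT compares the whole selected row, not a single column, and ORDER BY merely sorts the output unless it is paired with a limit such as TOP 1. Without that limit you still get every distinct row, just in a different order.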

Whole-row impact of DISTINCT

The DISTINCT charm applies to every column in the select list, not only the first one. So, if you need one row per value of a specific column along with other columns, DISTINCT alone is insufficient. Instead, choose from the spectrum of targeted techniques illustrated earlier.
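A quick way to convince yourself is to count rows. This sketch uses Python's sqlite3 and made-up data that includes one exact duplicate entry.

```python
import sqlite3

# Hypothetical data: rows 1 and 2 are identical apart from their id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        (1, "ann@example.com", "2024-01-01 09:00:00"),
        (2, "ann@example.com", "2024-01-01 09:00:00"),
        (3, "ann@example.com", "2024-03-01 09:00:00"),
        (4, "bob@example.com", "2024-02-01 09:00:00"),
    ],
)

# DISTINCT on one column: one row per email.
emails_only = conn.execute("SELECT DISTINCT email FROM Users").fetchall()

# DISTINCT on two columns: only the exact (email, timestamp) duplicate collapses.
email_and_timestamp = conn.execute(
    "SELECT DISTINCT email, timestamp FROM Users"
).fetchall()

print(len(emails_only), len(email_and_timestamp))
```

Adding the second column brings ann back twice, which is exactly why DISTINCT cannot give you one row per email once other columns are in the select list.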