Explain Codes LogoExplain Codes Logo

Distinct ON in an aggregate function in postgres

sql
prompt-engineering
join
lateral-subqueries
Alex KataevbyAlex Kataev·Sep 26, 2024
TLDR

Implement DISTINCT ON in a subquery or Common Table Expression (CTE):

WITH DistinctRows AS ( SELECT DISTINCT ON (column_A) * FROM your_table ORDER BY column_A, column_B ) SELECT SUM(column_C) FROM DistinctRows;

This WITH statement grabs only distinct "column_A" values in "your_table". The SUM(column_C) will then add up all the distinct entries.

Dealing with multiple rows (AKA herding cats)

Ever tried to take a selfie with more than one cat? Same problem when dealing with multiple rows. We need to make every row sit nice and unique, but still group together. For such "herding cats" tasks, use LATERAL subqueries. Here's how you'd do it:

SELECT p.photo_id, COALESCE(tag_agg.tags, '[]') as tags, COALESCE(comment_agg.comments, '[]') as comments FROM photos p LEFT JOIN LATERAL ( /* Make the tags sit nice, right here */ SELECT JSON_AGG(DISTINCT t.name) as tags FROM tags t WHERE p.photo_id = t.photo_id ) tag_agg ON TRUE LEFT JOIN LATERAL ( /* Make the comments join the fun */ SELECT JSON_AGG(DISTINCT c.text) as comments FROM comments c WHERE p.photo_id = c.photo_id ) comment_agg ON TRUE;

The LATERAL keyword gives you the ability to make subqueries reference the main query, a tool as helpful as catnip in a selfie session!

One marble per bucket (Visualisation not for cats)

Let's visualize columns as buckets and rows as marbles:

Buckets (⚙️): [Brand, Year, Model] Marbles (🔵): [Different Car Instances]

We're after only ONE marble per Brand (the Newest Model). We can use DISTINCT ON with MAX(Model):

SELECT DISTINCT ON (Brand) Brand, Year, MAX(Model) FROM Cars GROUP BY Brand, Year;

Outcome:

🏆 Brand Bucket: [...Newest Model per Brand]

The DISTINCT ON clause nabs the first row for each group based off the specified ordering (MAX(Model) in our case). It's similar to picking the prettiest marble from each bucket. (And no, cat, you can't play with the marbles).

Refined JSON responses (Hold the cat hair)

Some cats like their dishes served in a particular way (just ask Garfield!). json_build_object lets you custom-create JSON objects with a fine touch:

SELECT photo_id, COALESCE(json_agg(DISTINCT tag.tag_name), '[]') AS tags, COALESCE( (SELECT json_agg(DISTINCT comment) FROM ( SELECT comment_id, text FROM comments WHERE photo_id = p.photo_id ) comment ), '[]' ) AS comments FROM photos p LEFT JOIN tags t USING (photo_id) GROUP BY photo_id;

This serves up a custom JSON object that would make even a finicky cat purr. Remember: COALESCE keeps things neat and tidy by swapping null arrays with empty arrays.

Troubleshooting like a boss-cat: Avoid common hairballs

  • Avoid duplicates: Ensure all distinct fields are in your GROUP BY clause.
  • Code grooming: Use CTEs for better readability.
  • Performance tuning: Balance complexity with indexing and execution plan considerations.
  • Null avoidance: Treat nulls with COALESCE to handle nulls in JSON arrays.