Distinct ON in an aggregate function in postgres

sql

prompt-engineering

join

lateral-subqueries

byAlex Kataev·Sep 26, 2024

Implement DISTINCT ON in a subquery or Common Table Expression (CTE):

WITH DistinctRows AS (
  SELECT DISTINCT ON (column_A) *
  FROM your_table
  ORDER BY column_A, column_B
)
SELECT SUM(column_C) FROM DistinctRows;

This WITH statement grabs only distinct "column_A" values in "your_table". The SUM(column_C) will then add up all the distinct entries.

Dealing with multiple rows (AKA herding cats)

Ever tried to take a selfie with more than one cat? Same problem when dealing with multiple rows. We need to make every row sit nice and unique, but still group together. For such "herding cats" tasks, use LATERAL subqueries. Here's how you'd do it:

SELECT p.photo_id, COALESCE(tag_agg.tags, '[]') as tags, COALESCE(comment_agg.comments, '[]') as comments
FROM photos p
LEFT JOIN LATERAL (  /* Make the tags sit nice, right here */
  SELECT JSON_AGG(DISTINCT t.name) as tags
  FROM tags t
  WHERE p.photo_id = t.photo_id
) tag_agg ON TRUE
LEFT JOIN LATERAL ( /* Make the comments join the fun */
  SELECT JSON_AGG(DISTINCT c.text) as comments
  FROM comments c
  WHERE p.photo_id = c.photo_id
) comment_agg ON TRUE;

The LATERAL keyword gives you the ability to make subqueries reference the main query, a tool as helpful as catnip in a selfie session!

One marble per bucket (Visualisation not for cats)

Let's visualize columns as buckets and rows as marbles:

Buckets (⚙️): [Brand, Year, Model] Marbles (🔵): [Different Car Instances]

We're after only ONE marble per Brand (the Newest Model). We can use DISTINCT ON with MAX(Model):

SELECT DISTINCT ON (Brand) Brand, Year, MAX(Model)
FROM Cars
GROUP BY Brand, Year;

Outcome:

🏆 Brand Bucket: [...Newest Model per Brand]

The DISTINCT ON clause nabs the first row for each group based off the specified ordering (MAX(Model) in our case). It's similar to picking the prettiest marble from each bucket. (And no, cat, you can't play with the marbles).

Refined JSON responses (Hold the cat hair)

Some cats like their dishes served in a particular way (just ask Garfield!). json_build_object lets you custom-create JSON objects with a fine touch:

SELECT photo_id, 
       COALESCE(json_agg(DISTINCT tag.tag_name), '[]') AS tags,
       COALESCE(
         (SELECT json_agg(DISTINCT comment)
          FROM (
            SELECT comment_id, text
            FROM comments WHERE photo_id = p.photo_id
          ) comment
         ), '[]'
       ) AS comments
FROM photos p
LEFT JOIN tags t USING (photo_id)
GROUP BY photo_id;

This serves up a custom JSON object that would make even a finicky cat purr. Remember: COALESCE keeps things neat and tidy by swapping null arrays with empty arrays.

Troubleshooting like a boss-cat: Avoid common hairballs

Avoid duplicates: Ensure all distinct fields are in your GROUP BY clause.
Code grooming: Use CTEs for better readability.
Performance tuning: Balance complexity with indexing and execution plan considerations.
Null avoidance: Treat nulls with COALESCE to handle nulls in JSON arrays.