Explain Codes LogoExplain Codes Logo

Select a column in SQL not in Group By

sql
join
subqueries
ctes
Alex KataevbyAlex Kataev·Sep 13, 2024
TLDR

Utilize SQL aggregate functions like MAX() or MIN(). Use them to include columns not in GROUP BY, obtaining one value from multiple grouped rows. In case of specific values tied to grouped records, use a correlated subquery. These two are keys when you need a column not specified in GROUP BY.

Snippet with aggregate function, easy peasy:

--We all have that one favorite customer, isn't it? SELECT customer_id, MAX(last_order_date) FROM orders GROUP BY customer_id;

Snippet with correlated subquery:

--Peek-a-boo: I see all your orders, dear customer! SELECT customer_id, (SELECT MAX(order_date) FROM orders o2 WHERE o2.customer_id = o1.customer_id) FROM orders o1 GROUP BY customer_id;

Journey deeper into complexity

CTEs (Common Table Expressions) and window functions like ROW_NUMBER() or RANK() can handle complex data or row-level details with grouped data.

CTE and window functions: Heads up!

--So, who's the ONE? Or should we say RANK = 1? WITH RankedOrders AS ( SELECT order_id, customer_id, order_date, RANK() OVER (PARTITION BY customer_id ORDER BY order_date DESC) as rank FROM orders ) SELECT order_id, customer_id, order_date FROM RankedOrders WHERE rank = 1;

Avoiding pitfalls

Make sure that all columns in the SELECT statement are in an aggregate function or the GROUP BY clause. Don't you forget about those duplicate rows when the table is not unique. Your WHERE clause needs its coffee: wake it up with exact filter conditions when you are narrowing down the results.

Bridging the gaps: the join strategy

To pull in columns not included in the GROUP BY, join the results with the original dataset. Maintain the row-level details along with the grouped data as per your requirements.

Join on, folks!

--Order in orders, please! SELECT o1.customer_id, o1.order_id, o1.order_details, o.max_date FROM orders o1 JOIN (SELECT customer_id, MAX(order_date) AS max_date FROM orders GROUP BY customer_id) o ON o1.customer_id = o.customer_id AND o1.order_date = o.max_date;

Dodge bullets: duplicates ahead!

When the max date and group ID combination aren't unique, duplicates may creep in. To avoid a thrilling "duplicate chase", wrap additional columns in aggregate functions, or use them as extra criteria in the JOIN condition.

Subqueries: your secret weapon

When you need to isolate the grouped data, a subquery or CTE can be your saving grace. Join it with the main table to fetch non-grouped columns.

--Grouped but not forgotten! SELECT o.order_id, o.customer_id, o.order_details, a.max_order_date FROM orders o JOIN (SELECT customer_id, MAX(order_date) AS max_order_date FROM orders GROUP BY customer_id) a ON o.customer_id = a.customer_id;