Explain Codes LogoExplain Codes Logo

In SQL, how can you "group by" in ranges?

sql
group-by
bucketing
sql-queries
Anton ShumikhinbyAnton Shumikhin·Feb 24, 2025
TLDR

Perform range grouping using SQL CASE, creating data buckets:

SELECT CASE WHEN total_sale < 100 THEN '0 - 99' WHEN total_sale < 200 THEN '100 - 199' WHEN total_sale < 300 THEN '200 - 299' ELSE '300+' END AS sale_range, COUNT(*) AS order_count FROM orders GROUP BY sale_range;

This approach categorizes sales into defined ranges and counts the orders falling in these ranges.

Basics of Bucketing

Relying on a CASE statement in SQL permits the creation of defined bucket boundaries and aggregated data. This way, scores can be grouped by 10-interval ranges:

SELECT FLOOR(score/10)*10 AS score_range_start, COUNT(*) AS total_counts FROM results GROUP BY score_range_start ORDER BY score_range_start; -- Coffee time! All these scores have found their bucket home!

Molding Ranges to Your Needs

Real-world data is often diverse, requiring dynamic range groupings. With the CASE statement, handle the varying ranges and custom intervals efficiently:

SELECT CASE WHEN age < 20 THEN 'Under 20' WHEN age BETWEEN 20 AND 29 THEN '20s' WHEN age BETWEEN 30 AND 39 THEN '30s' WHEN age BETWEEN 40 AND 49 THEN '40s' WHEN age >= 50 THEN '50+' END AS age_group, COUNT(*) AS count FROM users GROUP BY age_group; -- Finally, age is not just a number anymore. It's a group!

Up Your Game with Advanced Groupings

For more complex scenarios, subqueries or Common Table Expressions (CTEs) can preprocess your data for efficient grouping. Imagine the power of defining your groupings before the actual count:

WITH ScoreRanges AS ( SELECT id, CASE WHEN score < 50 THEN 'Failing' WHEN score < 70 THEN 'Passing' WHEN score >= 70 THEN 'Above Average' END AS ScoreGroup FROM StudentScores ) SELECT ScoreGroup, COUNT(id) AS NumberOfStudents FROM ScoreRanges GROUP BY ScoreGroup; -- CTEs be like: "A little prep won't hurt. Ready, Set, Group!"

Dealing with Dynamic Grouping

While static ranges are handy, sometimes you need dynamic bucketing. For instance, using NTILE distributes data into equal-sized buckets—a real lifesaver for percentile or quantile analyses:

SELECT NTILE(4) OVER (ORDER BY value) AS quartile, COUNT(*) AS count FROM measurements GROUP BY quartile; -- SQL-QA: "How many buckets you need, boss?" You: "Just NTILE it!"

Winning Over Outliers

Outliers, our uninvited guests! When dealing with outliers or skewed data, don't rely on static ranges. Group using a statistical measure, like standard deviations:

SELECT CASE WHEN value < (AVG(value) - STDDEV(value)) THEN 'Below Average' WHEN value BETWEEN (AVG(value) - STDDEV(value)) AND (AVG(value) + STDDEV(value)) THEN 'Average' ELSE 'Above Average' END AS ValueRange, COUNT(*) AS Frequency FROM data_points GROUP BY ValueRange; -- Outliers walking in... SQL: "Hold my query, watch this!"