Explain Codes LogoExplain Codes Logo

Using union and count(*) together in an SQL query

sql
join
performance
best-practices
Anton ShumikhinbyAnton Shumikhin·Oct 9, 2024
TLDR

To aggregate data from different datasets or apply different conditions, use UNION ALL with COUNT(*) in subqueries, and sum the results in a top-level query. A sample SQL script is as follows:

SELECT SUM(sub.count) AS total_count FROM ( SELECT COUNT(*) AS count FROM table1 WHERE condition1 UNION ALL SELECT COUNT(*) AS count FROM table2 WHERE condition2 ) sub

Note: UNION ALL is used to include all output records, even duplications. Using UNION only can result in decreased aggregates due to its implicit deduplication.

Mastering the basics

Your fast-track guide to understanding and using UNION and UNION ALL. While UNION returns the unique records, UNION ALL preserves all the records, including duplicates. It's really a point to understand when you are after maintaining the entire count from multiple tables or queries.

Across multiple tables

Joining counts from several tables poses a common challenge. You can use the following SQL statement, which properly orders and keeps the GROUP BY function intact:

SELECT name, SUM(count) FROM ( SELECT name, COUNT(*) AS count FROM table1 GROUP BY name UNION ALL SELECT name, COUNT(*) AS count FROM table2 GROUP BY name ) AS combined GROUP BY name ORDER BY name;

Need specific counts?

You may need to tally distinct values across several unions. Dive in with nested subqueries to solve this issue:

SELECT DISTINCT(sub.item), COUNT(*) FROM ( SELECT DISTINCT item FROM table1 UNION ALL SELECT DISTINCT item FROM table2 ) sub GROUP BY sub.item

Common pitfalls and remedies

Compatibility across databases is essential. Did you know Oracle Database doesn't allow AS for table or column aliasing with UNION statements? So, be friendly and welcome all databases with open arms by omitting AS with UNION:

-- Universally compatible, much like the author's sense of humor. SELECT column_name FROM table1 UNION ALL SELECT column_name FROM table2

Hint: If you're wrangling with multiple columns and functions, always ensure consistency in data types and the count of columns among all UNION parts.

Optimization essentials

With larger data sets, indexing important columns is ideal to minimize query execution time. Keep your transactions low by considering batch processing for large tables—less locking and fewer overflows.

Simplifying complex scenes

Sometimes combining counts and performing additional computations become essential. This is where derived tables and CASE statements can be powerful:

SELECT CASE WHEN sub.category = 'A' THEN 'Primary' WHEN sub.category = 'B' THEN 'Secondary' ELSE 'Other' END AS CategoryType, SUM(sub.count) AS TotalCount FROM ( SELECT 'A' as category, COUNT(*) as count FROM table1 WHERE condition_A UNION ALL SELECT 'B', COUNT(*) FROM table2 WHERE condition_B ) sub GROUP BY sub.category;

This brings complicated scenarios under control with elegance and humor.