
GROUP BY - do not group NULL

sql
null-handling
group-by
sql-queries
by Alex Kataev · Aug 26, 2024
TLDR

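GROUP BY puts every NULL into one shared group. To keep each NULL row in a group of its own, replace each NULL with a unique value using COALESCE and a per-row identifier such as UUID() (MySQL) or NEWID() (SQL Server). Here it is in practice, in MySQL syntax:

-- UUID() returns a new value on each call, so each NULL gets its own key.
SELECT COALESCE(YourColumn, CONCAT('Unique_', UUID())) AS GroupKey, COUNT(*) FROM YourTable GROUP BY GroupKey;

On SQL Server, swap in NEWID(), compute the key in a subquery, and group by its alias in the outer query, because SQL Server does not accept column positions such as GROUP BY 1.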

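This is the magic wand moment: COALESCE replaces each NULL with a unique string built from 'Unique_' and a freshly generated UUID, so every NULL row gets an identity of its own and lands in a separate group. If YourColumn is not a string type, cast it to one first so that both COALESCE arguments share a type.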
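To see the effect on real rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has neither NEWID() nor UUID(), so an assumed integer primary key named id stands in as the per-row unique value; the sample data is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is an assumed primary key, used here in place of a generated UUID.
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, YourColumn TEXT)")
conn.executemany(
    "INSERT INTO YourTable (YourColumn) VALUES (?)",
    [("a",), ("a",), (None,), (None,), ("b",)],
)

# Plain GROUP BY: both NULL rows collapse into a single group.
plain = conn.execute(
    "SELECT YourColumn, COUNT(*) FROM YourTable GROUP BY YourColumn"
).fetchall()

# Replace each NULL with a key derived from the row's primary key,
# so every NULL row forms its own group.
separated = conn.execute(
    """
    SELECT COALESCE(YourColumn, 'Unique_' || id) AS GroupKey, COUNT(*)
    FROM YourTable
    GROUP BY GroupKey
    ORDER BY GroupKey
    """
).fetchall()

print(plain)      # three groups, one of them (None, 2)
print(separated)  # [('Unique_3', 1), ('Unique_4', 1), ('a', 2), ('b', 1)]
```

The plain query returns three groups, while the rewritten one returns four: the two NULL rows no longer stick together.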

Handling more complex NULL situations

A plain COALESCE with a UUID does not fit every situation. Two variations:

  • When replacement keys must stay unique across sessions, include a session- or context-specific identifier in the key.
  • If UUIDs aren't your style or your database doesn't generate them, use a sequence number or a hash of row-specific data instead.

-- A per-row number claims to be as unique as a snowflake in a snowstorm. Let's see how it works!
-- ROW_NUMBER() is used because SQL Server does not allow NEXT VALUE FOR inside COALESCE or in a GROUP BY query.
SELECT GroupKey, COUNT(*) FROM (SELECT COALESCE(YourColumn, CONCAT('Unique_', ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))) AS GroupKey FROM YourTable) AS Keyed GROUP BY GroupKey;
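The session-specific identifier from the list above could look roughly like this. It is a sketch under assumptions: SQLite via Python's sqlite3 stands in for your database, the id primary key and the grouped_counts helper are hypothetical, and the session identifier is simply a UUID generated on the client side.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, YourColumn TEXT)")
conn.executemany(
    "INSERT INTO YourTable (YourColumn) VALUES (?)",
    [("a",), (None,), (None,)],
)

def grouped_counts(conn, session_id):
    """Group rows, giving each NULL a key that embeds the session id (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT COALESCE(YourColumn, 'Unique_' || ? || '_' || id) AS GroupKey, COUNT(*)
        FROM YourTable
        GROUP BY GroupKey
        """,
        (session_id,),
    ).fetchall()
    return dict(rows)

# Two "sessions", each with its own identifier.
run1 = grouped_counts(conn, uuid.uuid4().hex)
run2 = grouped_counts(conn, uuid.uuid4().hex)

# Real values appear in both runs; the NULL replacement keys do not overlap.
shared_keys = set(run1) & set(run2)
print(shared_keys)
```

Because the session identifier is part of every replacement key, results from several runs can be combined without two different NULL rows ending up under the same key.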

Caring for NULLs in multiple columns

When several grouping columns can be NULL, give each column its own replacement key (MySQL syntax):

SELECT COALESCE(Column1, CONCAT('UniqueC1_', UUID())) AS Key1, COALESCE(Column2, CONCAT('UniqueC2_', UUID())) AS Key2, COUNT(*) FROM YourTable GROUP BY Key1, Key2;
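A small sqlite3 sketch of the two-column case, again assuming a hypothetical id primary key in place of a generated UUID and using made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (id INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)"
)
conn.executemany(
    "INSERT INTO YourTable (Column1, Column2) VALUES (?, ?)",
    [("x", None), ("x", None), (None, None), ("x", "y"), ("x", "y")],
)

# Each nullable column gets its own prefix, so replacement keys from
# Column1 and Column2 can never be mistaken for one another.
groups = conn.execute(
    """
    SELECT COALESCE(Column1, 'UniqueC1_' || id) AS Key1,
           COALESCE(Column2, 'UniqueC2_' || id) AS Key2,
           COUNT(*)
    FROM YourTable
    GROUP BY Key1, Key2
    ORDER BY Key1, Key2
    """
).fetchall()

for row in groups:
    print(row)
```

Only the two fully populated ('x', 'y') rows share a group; every row with a NULL in either column stands alone.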

Let's get more technical

Utilizing GROUP_CONCAT for aggregation

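If NULL rows do end up sharing a group, MySQL's GROUP_CONCAT can still list each row's value from another column, so no data gets left behind. Keep in mind that MySQL truncates the result at group_concat_max_len, which defaults to 1024 bytes: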

SELECT YourColumn, GROUP_CONCAT(OtherColumn ORDER BY OtherColumn) FROM YourTable GROUP BY YourColumn;
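SQLite also ships a GROUP_CONCAT aggregate, which makes it easy to try this out from Python. The sketch below is an approximation of the MySQL query: it leaves out the ORDER BY inside the aggregate (older SQLite versions do not support it) and sorts in Python instead; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (YourColumn TEXT, OtherColumn TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [("a", "r1"), ("a", "r2"), (None, "r3"), (None, "r4")],
)

rows = conn.execute(
    "SELECT YourColumn, GROUP_CONCAT(OtherColumn, ',') "
    "FROM YourTable GROUP BY YourColumn"
).fetchall()

# Concatenation order is not guaranteed here, so sort after splitting.
members = {key: sorted(joined.split(",")) for key, joined in rows}
print(members)
```

The NULL rows still share one group, but the concatenated list shows exactly which rows went into it.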

Working smarter with CASE statements

A CASE expression in the grouping key does the same job and allows further customization, such as different replacements under different conditions (MySQL syntax):

SELECT CASE WHEN YourColumn IS NULL THEN CONCAT('Unique_', UUID()) ELSE YourColumn END AS GroupKey, COUNT(*) FROM YourTable GROUP BY GroupKey;

Having WHERE clause as a filtering buddy

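To keep rows where YourColumn is NULL while restricting which rows are grouped by another field, filter in the WHERE clause. The filter runs before grouping, and the remaining NULLs in YourColumn still share one group: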

SELECT YourColumn, COUNT(*) FROM YourTable WHERE AnotherColumn IS NOT NULL GROUP BY YourColumn;
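A quick sqlite3 sketch with invented sample rows shows what this query keeps and what it drops:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (YourColumn TEXT, AnotherColumn TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [("a", "x"), ("a", None), (None, "y"), (None, "z"), ("b", None)],
)

# WHERE removes rows before grouping; NULLs in YourColumn are untouched.
counts = dict(
    conn.execute(
        """
        SELECT YourColumn, COUNT(*)
        FROM YourTable
        WHERE AnotherColumn IS NOT NULL
        GROUP BY YourColumn
        """
    ).fetchall()
)
print(counts)
```

Group 'b' disappears because its only row has a NULL AnotherColumn, group 'a' shrinks to one row, and the two rows with a NULL YourColumn remain together in a single group.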

Identifiers must really be unique

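Make sure the generated identifiers really are unique. If a replacement key matches another key or a real value in the column, those rows merge into one group and the NULLs feel a lot less special. A distinctive prefix plus a UUID makes this unlikely.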
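Here is a small illustration of such a collision, plus one possible way to check for it up front. It is a sketch with assumptions: SQLite via Python's sqlite3, a hypothetical id primary key as the unique part of the key, and sample data deliberately built so that a real value looks like a replacement key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, YourColumn TEXT)")
conn.executemany(
    "INSERT INTO YourTable (YourColumn) VALUES (?)",
    # The third row holds a real value that happens to look like a key.
    [("a",), (None,), ("Unique_2",)],
)

# Row 2 is NULL and becomes 'Unique_2', colliding with row 3's real value.
grouped = dict(
    conn.execute(
        """
        SELECT COALESCE(YourColumn, 'Unique_' || id) AS GroupKey, COUNT(*)
        FROM YourTable
        GROUP BY GroupKey
        """
    ).fetchall()
)

# Pre-flight check: count real values that already start with the prefix.
clashes = conn.execute(
    "SELECT COUNT(*) FROM YourTable WHERE substr(YourColumn, 1, 7) = 'Unique_'"
).fetchone()[0]

print(grouped)
print(clashes)
```

If the check returns anything other than zero, pick a different prefix or a longer random component before relying on the grouped counts.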