Explain Codes LogoExplain Codes Logo

Remove duplicates from SQL union

sql
sql-union
sql-best-practices
sql-performance
Alex KataevbyAlex Kataev·Aug 28, 2024
TLDR

For quick duplicate extermination, use UNION instead of UNION ALL when combining datasets:

SELECT col FROM table1 UNION SELECT col FROM table2;

The above code pieces together table1 and table2, producing only unique rows. Unlike, UNION ALL, which, like welcoming vampires into your house, happily accepts duplicates. UNION operates like garlic to your vampire-duplicates!

UNION and UNION ALL, the twin warriors

Remember the twin warriors; one loves everybody (UNION ALL) even clones, another loves uniqueness (UNION). The latter is like an oracle that can invisibly run SELECT DISTINCT on your result set to deliver unique records.

After planning your party (merging datasets), if you find surprise guests (duplicates), you want the twin who loves uniqueness. He spots those duplicates faster than an eagle spotting a field mouse! But don’t forget, operations like JOIN and conditions in WHERE clauses can also invite these party crashers.

When UNION ALL crashes your party

Some debugging scenarios to consider:

  • Double trouble with JOINs: Make sure your JOIN keys are correct and you're not accidentally inviting twins (duplicates). Some guests have the same name (join keys) but not the same identity!
  • WHERE’s the filter: If you have overlapping conditions in WHERE clauses across the UNIONed queries, remember, you might invite the SAME guest twice! Use mutually exclusive WHERE clauses for a duplicates-free environment.
  • Subquery chaos: Use clear aliasing to ensure you're combining distinct rows correctly. It's like making a proper guest list.

The art of query grooming

Making queries is like grooming bonsai; trim excess, bring shape, and maintain beauty. If your UNION queries are knotted like messy hair, it’s time for a cut. Simplify to promote efficiency and improvement in readability. Ask yourself, is there an OR condition that could replace a UNION?

As your query stands tall, handsome, and readable, it runs faster, saving precious time for a cup of coffee.

Being exact with UNION

Beware! When using UNION, it checks for exact duplicate rows, not just a single column. It’s like a bouncer who checks entire dresses, not just ties. A tiny difference will mark it distinct, even null values (SQL’s way of saying "I don’t know"). Because for SQL, NULL is not equal to NULL!

Know Your SQL

Each SQL environment schooling has its own perks. Some have specific UNION shortcuts and quirks. Refer to the UNION manual of your SQL school: