Explain Codes LogoExplain Codes Logo

How to select all records from one table that do not exist in another table?

sql
join
performance
best-practices
Anton ShumikhinbyAnton Shumikhin·Sep 29, 2024
TLDR
SELECT t1.* FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id WHERE t2.id IS NULL;

Quick and dirty LEFT JOIN gives us those sweet unique records from table1 not matched in table2. Prep your WHERE clause with t2.id IS NULL to catch only the finance department's (cough I mean unmatched) records. Now that's what I call a surgical extraction!

Alternative tactics

Ain't no one-size-fits-all, folks! Here are some other attacks you can deploy, completely tailored to your own battlefield conditions.

Plan B: NOT EXISTS

SELECT t1.* FROM table1 t1 WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t1.id = t2.id);

We've got a correlated subquery here, folks! Like a good covert operative, this bad boy ain't seen or heard, especially handy when dealing with big ol' datasets over 800k records.

Option C: NOT IN

SELECT t1.* FROM table1 t1 WHERE t1.id NOT IN (SELECT id FROM table2);

NOT IN gets straight to the point. Just be careful with NULL values, they're like the black sheep of SQL family reunions, they can turn your performance into a nightmare faster than drunk Uncle Bob spilling secrets!

Strategy D: EXCEPT

SELECT id FROM table1 EXCEPT SELECT id FROM table2;

You like elegant solutions? Here's one. Like a bouncer at a fancy club, EXCEPT operator separates the A-listers from the wannabes, returning only the unique identifiers from table1.

Select wisely

Ensure you're comparing proper keys or unique identifiers. Otherwise, you'll end up like me searching for my keys every morning. It's all about accuracy, people!

Test to perform better

Performance tuning isn't voodoo, folks, you need to test the different methods and understand their execution plans.

Here's some tips:

  • Decode the execution plan for potential bottlenecks.
  • Index keys or join columns to burn some SQL rubber!
  • Understand your data's cardinality to choose the best-fit exclusion strategy.
  • Use real-world datasets for benchmarking to avoid nasty surprises

Pondering over large datasets

Brick-like datasets require different handling:

  • Deleting in batches isn't a party trick. It's a survival technique for mammoth datasets.
  • Utilise indexes to drive up execution speed.
  • Consider using temporal tables. It's like mining for gold in a mountain of data!

Choices and compromises

Remember, NOT EXISTS, NOT IN, and LEFT JOIN/IS NULL are just tools in your toolbox. Use them wisely with discretion:

  • NOT EXISTS typically outperforms NOT IN when handling NULL values in the subquery result set.
  • LEFT JOIN/IS NULL may outmuscle NOT EXISTS depending on your SQL engine's optimizer.
  • Apply NOT IN with caution when there are nullable columns. Null values got a knack for ruining parties!