Explain Codes LogoExplain Codes Logo

Select NOT IN multiple columns

sql
join
performance
best-practices
Nikita BarsukovbyNikita Barsukov·Dec 10, 2024
TLDR

Filtering out rows with a set of values in multiple columns? The NOT EXISTS function coupled with a subquery can elegantly deal with this. The script below illustrates an efficient method of excluding certain values from your dataset in a table called your_table:

-- Get me a pizza, but hold the pineapple! SELECT * FROM your_table WHERE NOT EXISTS ( -- Exclude 'pineapple' from my pizza order! SELECT 1 FROM (VALUES ('val1'), ('val2')) AS exc(value) WHERE your_table.col1 = exc.value OR your_table.col2 = exc.value )

Here the subquery creates a derived table (exc) and checks that the values in col1 or col2 of your_table do not match those in exc.

Alternatives to consider and potential pitfalls

For all the SQL aficionados out there willing to explore different approaches, let's discuss alternative techniques and some cautionary notes:

  • Concatenating columns: Consider the cocktail of concatenating columns for situations of multi-column NOT IN. However, just like good cocktails are mixed with caution, so should this technique be used. High on performance cost and prone to false negatives if your delimiter naturally exists in the data.
-- This isn't rocket science, Unless you're trying to build a rocket 😊 SELECT * FROM your_table WHERE CONCAT(col1, ',', col2) NOT IN ( SELECT CONCAT(val1, ',', val2) FROM exclusion_table )
  • Using LEFT JOIN to dodge NULL: The good old LEFT JOIN can serve as a great method to dodge NULL bullets when using NOT IN. NULL comparisons in SQL can lead to unexpected results. A NULL NOT IN anything is FALSE!
-- Trust me, I'm a LEFT JOIN'er! SELECT DISTINCT t1.* FROM your_table t1 LEFT JOIN exclusion_table t2 ON t1.col1 = t2.col1 AND t1.col2 = t2.col2 WHERE t2.col1 IS NULL AND t2.col2 IS NULL
  • Performance Optimization: Let's talk about speed! NOT EXISTS and LEFT JOIN get the medal here with more efficient query execution plans, plus they are index friendly!

Picking your technique

Let's summarize and help you decide the optimal technique, considering the ongoing situation:

  • Small dataset? NOT IN with concatenated columns could be the easy way out.
  • Large dataset? Performance affected? Opt for NOT EXISTS or LEFT JOIN.
  • Worrying about NULLs? Use LEFT JOIN or NOT EXISTS.
  • Want readable code over slight performance gains? Choose the method that best expresses your intent in code.

Remember, always make an informed decision! Look at the specifics of your use case, measure against real-world scenarios, and select the method that provides maximum efficiency and lower maintenance costs.