Explain Codes LogoExplain Codes Logo

Sql query to find record with ID not in another table

sql
join
performance
best-practices
Nikita BarsukovbyNikita Barsukov·Oct 3, 2024
TLDR

Identify records in Table1 but absent in Table2 using the LEFT JOIN method or the NOT EXISTS query.

LEFT JOIN (often cherished for its readability):

-- I'm opposite of a stage magician. I love when things disappear. So let's vanish those IDs found in the second table. SELECT t1.* FROM Table1 t1 LEFT JOIN Table2 t2 ON t1.ID = t2.ID WHERE t2.ID IS NULL;

NOT EXISTS (a precise subquery way):

-- Checking for non-existent things. Sometimes feels like looking for my motivation to exercise. SELECT * FROM Table1 t1 WHERE NOT EXISTS ( SELECT 1 FROM Table2 t2 WHERE t2.ID = t1.ID );

Tailor your choice based on your coding style and performance demands; both methods yield identical results—a catalog of Table1 IDs missing in Table2.

Dive into alternatives and their idiosyncrasies

When dealing with substantial datasets or complex requests, understanding the nuances of different approaches can be a tremendous resource saver.

NOT IN is another prevalent method, but it has its eccentricities:

-- List of all my friends who are astronauts, celebrities, and lottery winners. SELECT ID, Name FROM Table1 WHERE ID NOT IN (SELECT ID FROM Table2);

Yet, NOT IN can misbehave if the subquery returns NULL values, potentially turning your result set into a ghost town.

Consider EXCEPT keyword:

-- An EXCEPTionally good way to find out how much fun others are having without me. SELECT ID FROM Table1 EXCEPT SELECT ID FROM Table2;

The EXCEPT clause pits the results of two queries against each other and brings back distinct values from the first query that weren't output by the second. However, it's wise to check its compatibility with your RDBMS and compare its performance traits with JOIN operations.

Gear up for better performance

Keep an eagle eye on execution plans for different methods to whip your query performance into shape. RDBMS-specific attributes can often come to the rescue providing significant speed boosts. For PostgreSQL users, NOT EXISTS can often sprint faster than NOT IN, especially when the subquery result could include NULLs or the datasets are chunky. Replacing EXCEPT with a join operation can sometimes get you that much-needed performance gain.

-- I like my queries how I like my race cars: Fast and efficient! SELECT * FROM Table1 t1 WHERE NOT EXISTS ( SELECT TOP 1 NULL FROM Table2 t2 WHERE t2.ID = t1.ID );

The SELECT TOP 1 NULL in the subquery often speeds up the NOT EXISTS check by signaling the query optimizer to hit the brakes as soon as it finds a match.

Practical use cases

By exploring practical scenarios where these SQL patterns come in handy, we can grasp their versatility:

Hunting down elusive customers

In a customer-orders dataset, find customers who haven't placed an order:

-- Operation 'Find Ghost Customers' – Initiated! SELECT CustomerID, CustomerName FROM Customers c WHERE NOT EXISTS ( SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID );

Syncing across systems

During data syncing, like that between a CRM and a mailing list, find records in the CRM that haven't made it to the mailing list yet:

-- Fasten your seatbelts! We're syncing. SELECT CRM_ID FROM CRM_Contacts LEFT JOIN MailingList ON CRM_Contacts.CRM_ID = MailingList.ContactID WHERE MailingList.ContactID IS NULL;

Cleaning up orphan records

Maintain database integrity by identifying orphaned records without a corresponding foreign key reference:

-- Orphanage for Lost Records SELECT * FROM ChildTable WHERE NOT EXISTS ( SELECT 1 FROM ParentTable WHERE ParentTable.ID = ChildTable.ParentID );

Differential queries are sensitive to NULLs and data distribution—factors you must navigate with care:

NULL handling

NULL values can play spoilsport making NOT IN churn out unexpected outcomes. The NOT EXISTS and LEFT JOIN...IS NULL techniques sidestep this issue, making them mission-critical choices.

Data distribution

The size and spread of data can dramatically dictate the query's performance—keeping an eye on indexes, cardinality, and possible duplicate IDs empowers you to make informed choices.