Select rows which are not present in other table

sql

left-join

query-optimization

not-in

byAnton Shumikhin·Aug 17, 2024

To fetch rows from table1 absent in table2 we use NOT EXISTS. By deploying a subquery in table2, we can optimally compare our entries from table1:

SELECT *
FROM table1 t1
WHERE NOT EXISTS (
    SELECT 1
    FROM table2
    WHERE table2.id = t1.id
);

This sprucely omits table1 records that lack a match in table2 based on id.

Expanded Inquiry: Alternatives and Common Pitfalls

The `LEFT JOIN / IS NULL` Trick

The LEFT JOIN / IS NULL technique functions well with SQL engines that can optimize joins effectively:

/* Believe in yourself! You're the SQL whisperer */
SELECT t1.*
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
WHERE t2.id IS NULL;

`EXCEPT` and `ALL`: A Powerful Duo

If distinct rows are your goal, EXCEPT can provide a remarkable shortcut:

/* SQL: The language spoken by nerds to their invisible friends (databases) */
SELECT id 
FROM table1
EXCEPT
SELECT id 
FROM table2;

Remember, EXCEPT returns distinct rows by default. Implement ALL for an unduplicated comparison.

`NOT IN` - Handle with Care

Although NOT IN can be an easy choice, it has its drawbacks. It can be torpedoed by nulls in table2.id, returning no matches, and can also scale poorly with bigger databases:

/* SQL: Helping you solve problems you'd never have without it */
SELECT *
FROM table1
WHERE id NOT IN (SELECT id FROM table2);

The Devil’s in the Details: Query Optimization

Finding the Common Ground

When sifting through data across tables, ensure you’re comparing apples to apples—compare columns with common meanings in both tables.

Power-Packing the WHERE Clause

A thoughtfully crafted WHERE clause holds the key to efficient results. Explicit conditions, help you zoom into your target data subset, enhancing the interpreter's ability to optimize the query.

Scoping your SELECT Statement

Data relevancy is the golden rule for SELECTing columns. Narrowing down to only the columns needed for your inspection helps reduce the load on your server, yielding faster results.

Practical Extensions

For instance, if we need to find the IP addresses from a login_log table not captured in an ip_location table:

/* Because friends don't let friends hard code IP addresses */
SELECT ip
FROM login_log
WHERE NOT EXISTS (
    SELECT 1
    FROM ip_location
    WHERE ip_location.ip = login_log.ip
);

This aids in tracking unregistered IPs which might be set for a security audit or data cleanup.

Logic and Debugging

Syntax errors and inefficiencies may creep in unnoticed. Using built-in tools of your RDMS like EXPLAIN plans can aid in flagging potential performance issues.

Extra Conditions for Extra Precision

Occasionally, appending additional filter conditions can hone in on the most relevant data:

/* Query is the new sexy... */
SELECT ip
FROM login_log
WHERE NOT EXISTS (
   SELECT 1
   FROM ip_location
   WHERE ip_location.ip = login_log.ip
)
AND login_log.date >= '2023-01-01';