Left join without duplicate rows from the left table

sql

left-join

outer-apply

distinct

byAlex Kataev·Aug 15, 2024

Avoiding duplicates in a LEFT JOIN is a breeze with this quick solution. We'll employ a subquery with the window function, ROW_NUMBER(). By utilizing PARTITION BY on the column causing duplicates and filtering via a WHERE clause for the top-ranked row, we escape the realm of repetition.

SELECT clean.*
FROM (
    SELECT t1.*, ROW_NUMBER() OVER (PARTITION BY t1.duplicate_field ORDER BY t1.preferred_order) as row_num
    FROM left_table t1
    LEFT JOIN right_table t2 ON t1.id = t2.foreign_id
) AS clean
WHERE clean.row_num = 1;

duplicate_field is our repeat offender column and preferred_order is our arbitrator deciding which duplicate has the spotlight.

Utilizing `OUTER APPLY` to prevent duplicates

SQL Server provides OUTER APPLY, running a subquery for each row of the left table. Coupling it with TOP 1 ensures only one matching row is returned from the right table, enforcing the no clones zone.

SELECT t1.*, oa.*
FROM left_table t1
OUTER APPLY (
    SELECT TOP 1 *
    FROM right_table t2
    WHERE t1.id = t2.foreign_id
    ORDER BY t2.rank_criteria
) oa;

The power of `DISTINCT`

When your query is straightforward, with a desire to eliminate the doppelgängers after a standard LEFT JOIN, the DISTINCT keyword comes to the rescue. Keep in mind, using DISTINCT is like asking your SQL Server to find a needle in a haystack on larger data sets - it might take some time!

SELECT DISTINCT t1.*
FROM left_table t1
LEFT JOIN right_table t2 ON t1.id = t2.foreign_id;

Fine-tuning with selective joining

In a multiverse of tables, the content-media relationship can get tangled. Select carefully which columns to join on and avoid bringing twin rows into being:

SELECT t1.content_id, t1.title, t2.media_id, t2.url
FROM tbl_Contents t1
LEFT JOIN 
(
    SELECT media_id, content_id, url, ROW_NUMBER() OVER (PARTITION BY content_id ORDER BY priority DESC) row_num
    FROM tbl_Media
) t2
ON t1.content_id = t2.content_id AND t2.row_num = 1;

Handling the rogue NULLs and juggling order

A LEFT JOIN comes with occasional unwanted NULLs and unknowing disorder. Maintain equilibrium using mindful ordering:

SELECT t1.content_id, t1.title, ISNULL(t2.media_id, 'No media') AS media_id
FROM tbl_Contents t1
LEFT JOIN tbl_Media t2 ON t1.content_id = t2.content_id
ORDER BY t1.content_id, t2.priority;

Ensuring completeness

We aim for a query retrieving all relevant rows from the left table without duplicates, even when the right table shows NULLs:

SELECT t1.*, COALESCE(t2.attribute, 'Default value') AS attribute
FROM left_table t1
LEFT JOIN right_table t2 ON t1.matching_column = t2.matching_column
WHERE t2.unique_column IS NULL OR t2.unique_column = (/*your worth a million dollars column*/);