Explain Codes LogoExplain Codes Logo

Left join without duplicate rows from the left table

sql
left-join
outer-apply
distinct
Alex KataevbyAlex Kataev·Aug 15, 2024
TLDR

Avoiding duplicates in a LEFT JOIN is a breeze with this quick solution. We'll employ a subquery with the window function, ROW_NUMBER(). By utilizing PARTITION BY on the column causing duplicates and filtering via a WHERE clause for the top-ranked row, we escape the realm of repetition.

SELECT clean.* FROM ( SELECT t1.*, ROW_NUMBER() OVER (PARTITION BY t1.duplicate_field ORDER BY t1.preferred_order) as row_num FROM left_table t1 LEFT JOIN right_table t2 ON t1.id = t2.foreign_id ) AS clean WHERE clean.row_num = 1;

duplicate_field is our repeat offender column and preferred_order is our arbitrator deciding which duplicate has the spotlight.

Utilizing OUTER APPLY to prevent duplicates

SQL Server provides OUTER APPLY, running a subquery for each row of the left table. Coupling it with TOP 1 ensures only one matching row is returned from the right table, enforcing the no clones zone.

SELECT t1.*, oa.* FROM left_table t1 OUTER APPLY ( SELECT TOP 1 * FROM right_table t2 WHERE t1.id = t2.foreign_id ORDER BY t2.rank_criteria ) oa;

The power of DISTINCT

When your query is straightforward, with a desire to eliminate the doppelgängers after a standard LEFT JOIN, the DISTINCT keyword comes to the rescue. Keep in mind, using DISTINCT is like asking your SQL Server to find a needle in a haystack on larger data sets - it might take some time!

SELECT DISTINCT t1.* FROM left_table t1 LEFT JOIN right_table t2 ON t1.id = t2.foreign_id;

Fine-tuning with selective joining

In a multiverse of tables, the content-media relationship can get tangled. Select carefully which columns to join on and avoid bringing twin rows into being:

SELECT t1.content_id, t1.title, t2.media_id, t2.url FROM tbl_Contents t1 LEFT JOIN ( SELECT media_id, content_id, url, ROW_NUMBER() OVER (PARTITION BY content_id ORDER BY priority DESC) row_num FROM tbl_Media ) t2 ON t1.content_id = t2.content_id AND t2.row_num = 1;

Handling the rogue NULLs and juggling order

A LEFT JOIN comes with occasional unwanted NULLs and unknowing disorder. Maintain equilibrium using mindful ordering:

SELECT t1.content_id, t1.title, ISNULL(t2.media_id, 'No media') AS media_id FROM tbl_Contents t1 LEFT JOIN tbl_Media t2 ON t1.content_id = t2.content_id ORDER BY t1.content_id, t2.priority;

Ensuring completeness

We aim for a query retrieving all relevant rows from the left table without duplicates, even when the right table shows NULLs:

SELECT t1.*, COALESCE(t2.attribute, 'Default value') AS attribute FROM left_table t1 LEFT JOIN right_table t2 ON t1.matching_column = t2.matching_column WHERE t2.unique_column IS NULL OR t2.unique_column = (/*your worth a million dollars column*/);

COALESCE provides a safety net when there's a gaping hole in the right table - holding the integrity of the result set intact.