Explain Codes LogoExplain Codes Logo

Sql Return only duplicate rows

sql
subquery
join
window-functions
Alex KataevbyAlex Kataev·Oct 1, 2024
TLDR

For duplicates in a single column, utilize GROUP BY along with HAVING for selective filtering:

SELECT your_column, COUNT(*) -- Count 'em all, dear SQL FROM your_table GROUP BY your_column HAVING COUNT(*) > 1; -- More than one, have some fun

This simplifies your results, showing only repeated your_column in your_table.

One step deeper: Finding duplicates

Sometimes, we need to move beyond GROUP BY for duplicate identification, involving all columns. A subquery or a self-join is your hero in that case:

SELECT * FROM your_table WHERE (SELECT COUNT(*) -- Count duplicates FROM your_table as sub WHERE sub.your_column = your_table.your_column) > 1; -- Keep only the 'cheaters'

Here, the subquery matches full duplicates across all columns.

When 'all columns' matters

When considering duplicates across all columns, here's your lifesaver:

SELECT * FROM ( SELECT your_columns, COUNT(*) OVER (PARTITION BY your_columns) as duplicate_count -- Count per costume FROM your_table ) as subquery WHERE duplicate_count > 1; -- Match the twins

This application of window functions gives you detailed control over the entire row.

Practical walk-through

Detailing with Self-joins

A self-join can help get a richer picture, one that screams context:

SELECT a.* FROM your_table a JOIN your_table b ON a.your_column = b.your_column AND a.id <> b.id; -- Exclude self-matches, don't become a meme

This self-join doesn't match a row with itself, instead, it prefers its non-identical twin brother.

Subqueries for efficient picks

To get away from the costly scan a join can result in, subqueries come to the rescue:

SELECT * FROM your_table WHERE your_column IN ( SELECT your_column FROM your_table GROUP BY your_column HAVING COUNT(*) > 1 -- Keep the 'double agents' );

This fetches just those rows that have duplicate buddies. The subquery does the heavy lifting.

Accounting for edge cases

If your database design means nullable fields or uneven data, be sure to plan for those. Your duplicates query must not overlook these cases.