Sql Return only duplicate rows

sql

subquery

join

window-functions

byAlex Kataev·Oct 1, 2024

For duplicates in a single column, utilize GROUP BY along with HAVING for selective filtering:

SELECT your_column, COUNT(*)  -- Count 'em all, dear SQL
FROM your_table
GROUP BY your_column
HAVING COUNT(*) > 1;  -- More than one, have some fun

This simplifies your results, showing only repeated your_column in your_table.

One step deeper: Finding duplicates

Sometimes, we need to move beyond GROUP BY for duplicate identification, involving all columns. A subquery or a self-join is your hero in that case:

SELECT *
FROM your_table
WHERE (SELECT COUNT(*)   -- Count duplicates 
       FROM your_table as sub
       WHERE sub.your_column = your_table.your_column) > 1;  -- Keep only the 'cheaters'

Here, the subquery matches full duplicates across all columns.

When 'all columns' matters

When considering duplicates across all columns, here's your lifesaver:

SELECT * 
FROM (
  SELECT your_columns, 
         COUNT(*) OVER (PARTITION BY your_columns) as duplicate_count  -- Count per costume
  FROM your_table
) as subquery
WHERE duplicate_count > 1;  -- Match the twins

This application of window functions gives you detailed control over the entire row.

Practical walk-through

Detailing with Self-joins

A self-join can help get a richer picture, one that screams context:

SELECT a.*
FROM your_table a
JOIN your_table b ON a.your_column = b.your_column AND a.id <> b.id;  -- Exclude self-matches, don't become a meme

This self-join doesn't match a row with itself, instead, it prefers its non-identical twin brother.

Subqueries for efficient picks

To get away from the costly scan a join can result in, subqueries come to the rescue:

SELECT *
FROM your_table
WHERE your_column IN (
  SELECT your_column
  FROM your_table
  GROUP BY your_column
  HAVING COUNT(*) > 1  -- Keep the 'double agents'
);