How to find duplicate records in PostgreSQL

sql

prompt-engineering

best-practices

join

byAnton Shumikhin·Sep 30, 2024

To highlight duplicates in PostgreSQL, combine GROUP BY with HAVING to single out rows that occur more than once:

SELECT column, COUNT(*)
FROM table
GROUP BY column
HAVING COUNT(*) > 1;

Swap column and table with your specific field and data source. This SQL line detects and flags duplicates efficiently.

Expanding the basic query

Our fast answer works well, but what if duplicates are spread across multiple columns? Adjust your GROUP BY to include all relevant fields:

SELECT column1, column2, COUNT(*)
FROM yourTable
GROUP BY column1, column2
HAVING COUNT(*) > 1;

For a clearer picture, add unique identifiers to each duplicate group. Bring into play the ROW_NUMBER() window function, in tandem with PARTITION BY:

SELECT *, ROW_NUMBER()
OVER(PARTITION BY column ORDER BY id) AS Row
FROM yourTable
WHERE Row > 1;

Efficient duplicate extermination

Sometimes, you not only need to spot duplicates, but also to wipe them out. Here's how to siege the castle of duplication and keep the oldest prisoner:

DELETE FROM yourTable
USING (
    SELECT MIN(id) as id, column 
    FROM yourTable 
    GROUP BY column HAVING COUNT(*) > 1
) AS t1
WHERE yourTable.id > t1.id AND yourTable.column = t1.column;

"Off with their heads!" — Keep in mind this method keeps the earliest record (i.e., the "head" and let go of the rest).

Dealing with multi-field duplicates

Duplication can often occur across multiple pillars of your database—namely, multiple fields. Here is how to tackle this Duplicate Hydra with your SELECT sword:

SELECT id, year, COUNT(*)
FROM yourTable
GROUP BY id, year
HAVING COUNT(*) > 1 AND year = '2023';

In this example, duplicates are grouped both by id and year, slicing only into current year records. Now you have a sharp blade to chop off any multi-field duplicate heads.

Advanced maneuvers for duplicate detection

Situations may arise when you need more precision. You might want a lineup of the culprits or manage them in a set order. Here's your SQL flashlight in the darkness of duplicate dungeons:

WITH DuplicateRecords AS (
      SELECT * , 
      ROW_NUMBER() OVER (PARTITION BY column ORDER BY id) AS rn
      FROM yourTable
)
SELECT * 
FROM DuplicateRecords
WHERE rn > 1;

The rn column is now your duplicate detective, catching them in the act and numbering them. Now you can prosecute these dupers individually or in bulk.

Framing the duplicate findings

Lastly, it may prove useful to display your findings. Here's an example of how to do just that:

SELECT column, ARRAY_AGG(id) AS duplicate_ids
FROM (
     SELECT column, id, ROW_NUMBER() 
     OVER (PARTITION BY column ORDER BY id) AS rn
     FROM yourTable
) AS t
WHERE rn > 1
GROUP BY column;

With ARRAY_AGG, you can assemble all duplicate ids into a neat little line-up, linked to the prime 'id' deputy. Now you're ready to display or interrogate them as you see fit.

explain-codes / Sql / How to find duplicate records in PostgreSQL

Linked

How to delete duplicate rows in SQL Server?



Mysql Select rows on first occurrence of each unique value



Delete all Duplicate Rows except for One in MySQL?



Left join without duplicate rows from the left table



Sql Return only duplicate rows

