How to find duplicate records in PostgreSQL
To highlight duplicates in PostgreSQL, combine GROUP BY with HAVING to single out rows that occur more than once:
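A minimal sketch, where column_name and table_name are stand-ins for your own identifiers:

SELECT column_name, COUNT(*) AS occurrences
FROM table_name
GROUP BY column_name              -- collapse identical values into one group
HAVING COUNT(*) > 1;              -- keep only groups that appear more than once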
Swap column_name and table_name with your specific field and data source. This query detects and flags duplicates efficiently.
Expanding the basic query
Our fast answer works well, but what if duplicates are spread across multiple columns? Adjust your GROUP BY to include all relevant fields:
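For illustration, a sketch against a hypothetical users table where a row counts as a duplicate only when both first_name and last_name repeat:

SELECT first_name, last_name, COUNT(*) AS occurrences
FROM users
GROUP BY first_name, last_name    -- a group now spans every relevant field
HAVING COUNT(*) > 1;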
For a clearer picture, add unique identifiers to each duplicate group. Bring into play the ROW_NUMBER() window function, in tandem with PARTITION BY:
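A sketch on the same hypothetical users table; each duplicate group gets its rows numbered from 1 upward:

SELECT id,
       first_name,
       last_name,
       ROW_NUMBER() OVER (
           PARTITION BY first_name, last_name   -- restart the count for each duplicate group
           ORDER BY id                          -- number rows lowest-id first
       ) AS rn
FROM users;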
Efficient duplicate extermination
Sometimes, you not only need to spot duplicates, but also to wipe them out. Here's how to lay siege to the castle of duplication and keep only the oldest prisoner:
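One way to sketch it, still assuming the hypothetical users table with an id primary key; the lowest id in each group survives:

DELETE FROM users AS u
USING users AS older
WHERE u.first_name = older.first_name
  AND u.last_name  = older.last_name
  AND u.id > older.id;              -- any row with an older (lower-id) twin gets removed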
"Off with their heads!" — Keep in mind this method keeps the earliest record (i.e., the "head" and let go of the rest).
Dealing with multi-field duplicates
Duplication can often occur across multiple pillars of your database, namely multiple fields. Here is how to tackle this Duplicate Hydra with your SELECT sword:
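A sketch assuming a hypothetical orders table that carries id and year columns:

SELECT id, year, COUNT(*) AS occurrences
FROM orders
WHERE year = EXTRACT(YEAR FROM CURRENT_DATE)   -- slice to current-year records only
GROUP BY id, year
HAVING COUNT(*) > 1;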
In this example, duplicates are grouped by both id and year, slicing into only current-year records. Now you have a sharp blade to chop off any multi-field duplicate heads.
Advanced maneuvers for duplicate detection
Situations may arise when you need more precision. You might want a lineup of the culprits or manage them in a set order. Here's your SQL flashlight in the darkness of duplicate dungeons:
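A sketch, again on the assumed users table, that parades the extra copies in a set order:

WITH numbered AS (
    SELECT id,
           first_name,
           last_name,
           ROW_NUMBER() OVER (
               PARTITION BY first_name, last_name
               ORDER BY id                      -- the set order: lowest id first
           ) AS rn
    FROM users
)
SELECT *
FROM numbered
WHERE rn > 1                                    -- originals stay home; only the copies line up
ORDER BY first_name, last_name, rn;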
The rn column is now your duplicate detective, catching them in the act and numbering them. Now you can prosecute these dupers individually or in bulk.
Framing the duplicate findings
Lastly, it may prove useful to display your findings. Here's an example of how to do just that:
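A possible shape, still on the hypothetical users table, pairing each group's full line-up of ids with its prime (lowest) id:

SELECT first_name,
       last_name,
       MIN(id)                   AS prime_id,       -- the "deputy" leading each group
       ARRAY_AGG(id ORDER BY id) AS duplicate_ids   -- every id that shares these values
FROM users
GROUP BY first_name, last_name
HAVING COUNT(*) > 1;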
With ARRAY_AGG, you can assemble all duplicate ids into a neat little line-up, linked to the prime 'id' deputy. Now you're ready to display or interrogate them as you see fit.