Explain Codes LogoExplain Codes Logo

How do I find duplicate values in a table in Oracle?

sql
duplicate-values
data-quality
sql-queries
Anton ShumikhinbyAnton Shumikhin·Nov 6, 2024
TLDR

To rapidly locate duplicates in an Oracle table, leverage SQL commands GROUP BY and HAVING. Here's the basic SQL query to find repetitive entries in any column named dup_col:

SELECT dup_col, COUNT(*) FROM your_table GROUP BY dup_col HAVING COUNT(*) > 1;

This little magic spell will target dup_col values with multiple appearances and their occurrence frequency. Time to find out who's the doppelgänger!

Identifying and Handling Duplicates

Before diving in, let's identify our duplicates. In SQL, duplicates are rows having same values in some or all columns. However, identifying and handling duplicates could vary depending on context:

  • Simple: Same value in one column
  • Compound: Same values across multiple columns
  • Complete: All column values are identical

Single Column Duplicates

When your mission is to detect suspects with the same name (or same whatever), the simplest weapon of a Jedi is GROUP BY and HAVING:

SELECT name, COUNT(*) FROM jedi GROUP BY name HAVING COUNT(*) > 1;

Who said being a Jedi was tricky? Use the SELECT FORCE, Luke!

Multi-column and Complete Duplicates

Join the Jedi council, aka multiple columns, into your mission:

SELECT col1, col2, ..., colN, COUNT(*) FROM jedi GROUP BY col1, col2, ..., colN HAVING COUNT(*) > 1;

Now you're thinking with columns!

To detect complete clone troopers (complete row duplicates):

SELECT a.* FROM troopers a JOIN (SELECT col1, col2, ..., colN FROM troopers GROUP BY col1, col2, ..., colN HAVING COUNT(*) > 1) b ON a.col1 = b.col1 AND a.col2 = b.col2 AND ... AND a.colN = b.colN WHERE a.ROWID > b.ROWID;

Remove the helmet and see who’s underneath! Now we're storming with JOINs!

Efficient Duplicate Handling

Time to put our Clone Wars knowledge to use and delete duplicates, keeping one of each:

DELETE FROM troopers WHERE ROWID NOT IN ( SELECT MIN(ROWID) FROM troopers GROUP BY col1, col2, ..., colN );

Death Star: Duplicates! Unlike Stormtroopers, this statement never misses its targets!

Exploring duplication patterns

Track when and how these duplicates are appearing. Consistent duplicated data could indicate underlying issues with your data ingestion processes or system logic. Exploring and examining your duplicates can lead you to the root of your issue, improve your data quality and save you from the dark side of tomorrow’s duplicates.