Select statement to find duplicates on certain fields

sql

duplicates

subqueries

window-functions

byAnton Shumikhin·Aug 11, 2024

Quickly identify duplicates using GROUP BY for target fields and HAVING for counts above one.

-- "Cause nothing says fun like GROUP BY"
SELECT field1, field2, COUNT(*)
FROM table
GROUP BY field1, field2
HAVING COUNT(*) > 1;

In the script above, the records are grouped by field1 and field2 and then filtered to select groups—hence, duplicates—with more than one record.

Advanced duplicate handling

As pirates love to say, "It's more than 'arrr' in GROUP BY."

Dealing with obstinate duplicates

Sometimes, duplicates refuse to walk the plank. Use subqueries and the EXISTS clause to ensure these renegades walk the plank (get excluded), especially the first duplicate entry based on a unique identifier (e.g., id).

-- "Let's play hide 'n' seek with duplicates"
SELECT *
FROM table t1
WHERE EXISTS (
    SELECT 1 
    FROM table t2 
    WHERE t1.field1 = t2.field1 AND 
          t1.field2 = t2.field2 AND 
          t1.id > t2.id
);

Harnessing window functions to manage duplicates

Window functions, like RANK() or ROW_NUMBER(), offer more control, finer detail, and a fancier hat—every pirate's dream!

-- "Ranking duplicates, cause every pirate needs a rank!"
SELECT field1, field2, field3, 
       RANK() OVER(PARTITION BY field1, field2, field3 ORDER BY id) AS rank
FROM table;

Remember: null values can alter duplication checks, so handle with care—like a fragile treasure map!

Dealing with unique Duplicate Scenarios

Preserving that one special duplicate

Sometimes a pirate needs to keep one duplicate for leverage—or, y'know, memories. In such cases:

-- "Keep one, throw the rest overboard"
SELECT MIN(id) as UniqueID, field1, field2
FROM table
GROUP BY field1, field2

Structuring with Common Table Expressions (CTEs)

CTEs are like treasure maps—they guide you to your goal, in this case, handling duplicates:

-- "Treasure maps for duplicates, arrrr!"
WITH DuplicateRecords AS (
    SELECT field1, field2, field3, 
           ROW_NUMBER() OVER (PARTITION BY field1, field2, field3 ORDER BY id) AS row_num
    FROM table
)
SELECT * FROM DuplicateRecords
WHERE row_num > 1;