How to delete duplicate rows in SQL Server?

By Alex Kataev · Oct 10, 2024

To quickly delete duplicate rows in SQL Server, use a CTE (Common Table Expression) with ROW_NUMBER(). ROW_NUMBER() numbers the rows inside each group of duplicates, so every row numbered above 1 can be removed.

WITH CTE AS (
   -- Here we make sure every record feels unique by giving them a number.
   SELECT ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column2) AS rn
   FROM your_table
)
-- Sadly, not everyone makes the cut. 😔
DELETE FROM CTE WHERE rn > 1;

Replace column1 with the column (or columns) that define a duplicate, and column2 with the column that decides which row to keep. The row that sorts first in each group gets rn = 1 and is retained; the rest are removed.
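To see the pattern end to end, here is a minimal sketch on a throwaway temp table. The table #customers and its columns (id, email, created_at) are hypothetical names invented for this illustration; two rows count as duplicates when they share an email, and the earliest row is kept.

```sql
-- Hypothetical demo table; every name here is an assumption for illustration.
CREATE TABLE #customers (
    id INT IDENTITY(1,1) PRIMARY KEY,
    email VARCHAR(100) NOT NULL,
    created_at DATETIME2 NOT NULL
);

INSERT INTO #customers (email, created_at) VALUES
    ('a@example.com', '2024-01-01'),
    ('a@example.com', '2024-02-01'),  -- duplicate email, created later
    ('b@example.com', '2024-01-15');

-- Number the rows per email, oldest first, then delete everything after the first.
WITH CTE AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
    FROM #customers
)
DELETE FROM CTE WHERE rn > 1;

-- Rows with id 1 and id 3 remain.
SELECT id, email, created_at FROM #customers ORDER BY id;

DROP TABLE #customers;
```

Deleting through the CTE works here because the CTE reads from a single table, so SQL Server can map each numbered row back to the row it came from.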

Detailed explanation

Handle duplicates with multiple columns

When a duplicate is defined by a combination of columns, list every one of them in your PARTITION BY clause. Rows then count as duplicates only when all of the listed columns match.
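As a rough sketch of the multi-column case, assume a table where a duplicate means the same first_name, last_name, and email, and where an id column decides which copy survives. All four column names are assumptions for this example, not part of the original query.

```sql
WITH CTE AS (
    SELECT ROW_NUMBER() OVER (
               -- A row is a duplicate only if all three columns match.
               PARTITION BY first_name, last_name, email
               -- The lowest id in each group gets rn = 1 and is kept.
               ORDER BY id
           ) AS rn
    FROM your_table
)
DELETE FROM CTE WHERE rn > 1;
```

Leaving a column out of PARTITION BY widens the definition of "duplicate", so rows that differ only in that column would be deleted too.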

Verify before you delete

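It's essential to double-check which rows you're deleting. Keep the WITH block, swap DELETE with SELECT, and add the columns you want to inspect to the CTE's SELECT list to see which records are flagged for removal.

-- Run this right after the WITH CTE AS (...) block; a CTE only lives for one statement.
SELECT * FROM CTE WHERE rn > 1;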
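Put together, a dry run might look like the sketch below. It assumes the same placeholder names as above (your_table, column1, column2). The first statement previews the flagged rows with all their columns; the second runs the real delete inside a transaction that is rolled back, so nothing is lost while you check the row count.

```sql
-- 1) Preview: show every column of the rows that would be deleted.
WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column2) AS rn
    FROM your_table
)
SELECT * FROM CTE WHERE rn > 1;

-- 2) Rehearsal: delete inside a transaction and report how many rows went.
BEGIN TRANSACTION;

WITH CTE AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column2) AS rn
    FROM your_table
)
DELETE FROM CTE WHERE rn > 1;

SELECT @@ROWCOUNT AS deleted_rows;

-- Undo the delete; change this to COMMIT TRANSACTION once the count looks right.
ROLLBACK TRANSACTION;
```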

Alternative techniques

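Sometimes ROW_NUMBER() may not serve your purposes. RANK() is an alternative with one important difference: rows that tie on the ORDER BY column share the same rank. Use it when you want to keep every row tied for first place and delete only the rows that sort strictly later. If the duplicates are identical in column2, they all receive rank 1 and none of them is deleted.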

WITH CTE AS (
   -- Just like in school, we are ranking everyone!
   SELECT RANK() OVER (PARTITION BY column1 ORDER BY column2) AS rnk
   FROM your_table
)
-- Time to bid farewell to anyone not on top 😥
DELETE FROM CTE WHERE rnk > 1;
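The difference between the two functions shows up when rows tie on the ORDER BY column. The sketch below uses three inline rows (made up for illustration) to compare them side by side; it reads nothing from your tables and changes nothing.

```sql
SELECT v.column1,
       v.column2,
       ROW_NUMBER() OVER (PARTITION BY v.column1 ORDER BY v.column2) AS rn,
       RANK()       OVER (PARTITION BY v.column1 ORDER BY v.column2) AS rnk
FROM (VALUES ('A', 1), ('A', 1), ('A', 2)) AS v(column1, column2);
-- rn  comes out as 1, 2, 3: a "WHERE rn > 1" delete would remove two rows.
-- rnk comes out as 1, 1, 3: a "WHERE rnk > 1" delete would remove only the last row.
```

So with RANK(), rows tied for first place all survive, which is only what you want if those ties are not the duplicates you are trying to remove.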

Dealing with more complex duplicates

Group by and max ID

When the table has a unique id column, you can use GROUP BY with MAX(id) to keep the row with the highest id in each group of duplicates and delete the rest.

DELETE y
FROM your_table y
LEFT JOIN (
   -- We're giving a special pass to rows with MaxID
   SELECT MAX(id) as MaxID FROM your_table GROUP BY column1, column2, column3
) AS KeepRows ON y.id = KeepRows.MaxID
WHERE KeepRows.MaxID IS NULL;
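Here is a small sketch of the same query against a temp table, to show which rows survive. The table #orders and its sample values are hypothetical; the column names mirror the placeholders used above.

```sql
-- Hypothetical demo table; names and values are assumptions for illustration.
CREATE TABLE #orders (
    id INT PRIMARY KEY,
    column1 VARCHAR(20),
    column2 VARCHAR(20),
    column3 INT
);

INSERT INTO #orders (id, column1, column2, column3) VALUES
    (1, 'x', 'y', 1),
    (2, 'x', 'y', 1),  -- same values as id 1
    (3, 'z', 'y', 2);

-- Keep the highest id per group; delete rows that match no kept id.
DELETE y
FROM #orders y
LEFT JOIN (
    SELECT MAX(id) AS MaxID FROM #orders GROUP BY column1, column2, column3
) AS KeepRows ON y.id = KeepRows.MaxID
WHERE KeepRows.MaxID IS NULL;

-- Rows with id 2 and id 3 remain.
SELECT id, column1, column2, column3 FROM #orders ORDER BY id;

DROP TABLE #orders;
```

Swapping MAX(id) for MIN(id) keeps the lowest id in each group instead.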