
How to delete duplicate rows in SQL Server?

by Alex Kataev · Oct 10, 2024
TLDR

To quickly delete duplicate rows in SQL Server, use a CTE (Common Table Expression) with ROW_NUMBER(). This ranks duplicate records, making them easy to remove.

WITH CTE AS (
    -- Here we make sure every record feels unique by giving them a number.
    SELECT ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column2) AS rn
    FROM your_table
)
-- Sadly, not everyone makes the cut. 😔
DELETE FROM CTE
WHERE rn > 1;

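Replace column1 with the column whose repeated values define a duplicate, and column2 with the column that decides which copy survives. Within each group of duplicates, the row that sorts first by column2 gets rn = 1 and is kept; every other row in the group is deleted.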
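To see the pattern end to end, here is a minimal sketch on a throwaway temp table. The table #contacts and its columns are made up for illustration; swap in your own names.

```sql
-- Hypothetical sample table; the names are illustrative only.
CREATE TABLE #contacts (
    email      VARCHAR(100),
    created_at DATE
);

INSERT INTO #contacts (email, created_at) VALUES
    ('a@example.com', '2024-01-01'),
    ('a@example.com', '2024-02-01'),  -- same email, later date
    ('b@example.com', '2024-01-15');

-- Number the rows inside each email group, oldest first.
WITH CTE AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
    FROM #contacts
)
-- Deleting through the CTE removes the matching rows from #contacts.
DELETE FROM CTE
WHERE rn > 1;

-- Two rows remain: the 2024-01-01 row for a@example.com and the b@example.com row.
SELECT email, created_at
FROM #contacts
ORDER BY email;

DROP TABLE #contacts;
```

If you would rather keep the newest copy, ORDER BY created_at DESC should flip which row gets rn = 1.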

Detailed explanation

Handle duplicates with multiple columns

When a duplicate is defined by more than one column, list every one of those columns in your PARTITION BY clause. Two rows then count as duplicates only when all of the listed columns match.
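A short sketch of a two-column partition, using an assumed #orders table where a duplicate means the same customer_id and the same product:

```sql
-- Hypothetical sample table; the names are illustrative only.
CREATE TABLE #orders (
    customer_id INT,
    product     VARCHAR(50),
    loaded_at   DATE
);

INSERT INTO #orders (customer_id, product, loaded_at) VALUES
    (1, 'widget', '2024-01-01'),
    (1, 'widget', '2024-01-02'),  -- duplicate of the row above
    (1, 'gadget', '2024-01-03'),  -- same customer, different product: not a duplicate
    (2, 'widget', '2024-01-04');  -- same product, different customer: not a duplicate

WITH CTE AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY customer_id, product  -- both columns must match
               ORDER BY loaded_at
           ) AS rn
    FROM #orders
)
DELETE FROM CTE
WHERE rn > 1;

-- Three rows remain; only the 2024-01-02 widget row for customer 1 is gone.
SELECT customer_id, product, loaded_at
FROM #orders
ORDER BY customer_id, product;

DROP TABLE #orders;
```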

Verify before you delete

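It's essential to double-check which rows you're deleting. Keep the WITH CTE AS (...) definition exactly as it is and swap the final DELETE for a SELECT to see which records are flagged for removal. To see the actual row data rather than just rn, add the columns you want to inspect to the CTE's select list.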

SELECT * FROM CTE WHERE rn > 1;
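The SELECT only works as the tail of the same statement that defines the CTE. A complete preview might look like the sketch below, again using a made-up #contacts table:

```sql
-- Hypothetical sample table; the names are illustrative only.
CREATE TABLE #contacts (
    email      VARCHAR(100),
    created_at DATE
);

INSERT INTO #contacts (email, created_at) VALUES
    ('a@example.com', '2024-01-01'),
    ('a@example.com', '2024-02-01'),
    ('b@example.com', '2024-01-15');

-- Same CTE as the delete, with * added so the preview shows the row data.
WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
    FROM #contacts
)
-- Nothing is deleted here; this only lists the rows a DELETE would remove.
SELECT *
FROM CTE
WHERE rn > 1;  -- returns the 2024-02-01 row for a@example.com

DROP TABLE #contacts;
```

Once the preview looks right, changing that final SELECT * back to DELETE gives you the real thing.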

Alternative techniques

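Sometimes ROW_NUMBER() may not serve your purposes. ROW_NUMBER() always keeps exactly one row per group, while RANK() gives tied rows the same rank. With RANK(), every row tied for first place on column2 gets rank 1 and survives, so use it only when you want to keep all top-ranked rows. It will not remove exact duplicates, because those always tie.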

WITH CTE AS (
    -- Just like in school, we are ranking everyone!
    SELECT RANK() OVER (PARTITION BY column1 ORDER BY column2) AS rnk
    FROM your_table
)
-- Time to bid farewell to anyone not on top 😥
DELETE FROM CTE
WHERE rnk > 1;
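Because RANK() hands tied rows the same number, it is worth comparing the two functions side by side before choosing. A small sketch on an assumed #scores table:

```sql
-- Hypothetical sample table; the names are illustrative only.
CREATE TABLE #scores (
    player VARCHAR(20),
    score  INT
);

INSERT INTO #scores (player, score) VALUES
    ('ann', 10),
    ('ann', 10),  -- exact duplicate: ties with the row above
    ('bob', 7);

SELECT player,
       score,
       ROW_NUMBER() OVER (PARTITION BY player ORDER BY score) AS rn,   -- ann: 1 and 2
       RANK()       OVER (PARTITION BY player ORDER BY score) AS rnk   -- ann: 1 and 1
FROM #scores
ORDER BY player, rn;

DROP TABLE #scores;
```

On this data, a filter of rn > 1 catches one of the ann rows, while rnk > 1 catches nothing at all.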

Dealing with more complex duplicates

Group by and max ID

When a table has a unique id column but repeats the same values in its other columns, you can use GROUP BY with MAX(id) to delete the duplicates while keeping one row per group.

DELETE y
FROM your_table y
LEFT JOIN (
    -- We're giving a special pass to rows with MaxID
    SELECT MAX(id) AS MaxID
    FROM your_table
    GROUP BY column1, column2, column3
) AS KeepRows
    ON y.id = KeepRows.MaxID
WHERE KeepRows.MaxID IS NULL;

This solution discards duplicates while preserving the row with the highest ID.
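Here is how that plays out on a few sample rows. The #items table and its columns are assumptions for the sake of the example:

```sql
-- Hypothetical sample table with a unique id; the names are illustrative only.
CREATE TABLE #items (
    id    INT PRIMARY KEY,
    name  VARCHAR(50),
    color VARCHAR(20)
);

INSERT INTO #items (id, name, color) VALUES
    (1, 'pen', 'blue'),
    (2, 'pen', 'blue'),   -- duplicate of id 1
    (3, 'pen', 'red'),
    (4, 'cup', 'white'),
    (5, 'cup', 'white');  -- duplicate of id 4

-- Rows whose id is not the highest in their (name, color) group find no
-- match in KeepRows, so MaxID is NULL for them and they get deleted.
DELETE i
FROM #items i
LEFT JOIN (
    SELECT MAX(id) AS MaxID
    FROM #items
    GROUP BY name, color
) AS KeepRows
    ON i.id = KeepRows.MaxID
WHERE KeepRows.MaxID IS NULL;

-- Remaining ids: 2, 3 and 5.
SELECT id, name, color
FROM #items
ORDER BY id;

DROP TABLE #items;
```

Swapping MAX(id) for MIN(id) should keep the lowest id in each group instead.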

Deletion with no unique key identifiers

Without a unique ID, the GROUP BY with MAX(id) approach has nothing to join on, but the CTE combined with ROW_NUMBER() still works. The delete goes through the CTE itself, so no key column is needed to tell identical rows apart.
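For rows that are identical in every column, one possible sketch partitions by all columns and uses ORDER BY (SELECT NULL), since there is nothing meaningful to sort by. The #tags table is hypothetical:

```sql
-- Hypothetical sample table with no key at all; the names are illustrative only.
CREATE TABLE #tags (
    name     VARCHAR(50),
    category VARCHAR(50)
);

INSERT INTO #tags (name, category) VALUES
    ('sql', 'db'),
    ('sql', 'db'),
    ('sql', 'db'),   -- three identical rows
    ('go',  'lang');

WITH CTE AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY name, category  -- every column in the table
               ORDER BY (SELECT NULL)       -- no preference: the copies are identical
           ) AS rn
    FROM #tags
)
DELETE FROM CTE
WHERE rn > 1;

-- Two rows remain: one ('sql', 'db') and one ('go', 'lang').
SELECT name, category
FROM #tags
ORDER BY name;

DROP TABLE #tags;
```

Which of the identical copies survives is arbitrary here, which is fine precisely because they are identical.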

Best practices for SQL masters

Back it up

Make a habit of backing up your table before deleting data. Think of the backup as "the Undo button SQL Server forgot."
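One lightweight way to take that copy is SELECT ... INTO, sketched below on temp tables. The backup table name is an assumption, and this approach copies columns and data only, not indexes or constraints, so treat it as a safety net rather than a full backup.

```sql
-- Hypothetical sample table standing in for your_table.
CREATE TABLE #your_table (
    column1 VARCHAR(50),
    column2 INT
);

INSERT INTO #your_table (column1, column2) VALUES
    ('a', 1),
    ('a', 2),
    ('b', 1);

-- Copy every row into a new table before touching the original.
-- #your_table_backup is an assumed name; SELECT ... INTO creates it.
SELECT *
INTO #your_table_backup
FROM #your_table;

-- Quick sanity check: both counts should match before you delete anything.
SELECT
    (SELECT COUNT(*) FROM #your_table)        AS original_rows,
    (SELECT COUNT(*) FROM #your_table_backup) AS backup_rows;

DROP TABLE #your_table_backup;
DROP TABLE #your_table;
```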

Test, test

Execute in a non-production environment first. Your production environment isn't a playground!
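While testing, it can also help to wrap the delete in a transaction and roll it back, so you can check the effect without keeping it. A minimal sketch on a made-up temp table:

```sql
-- Hypothetical sample table standing in for your_table.
CREATE TABLE #your_table (
    column1 VARCHAR(50),
    column2 INT
);

INSERT INTO #your_table (column1, column2) VALUES
    ('a', 1),
    ('a', 2),
    ('b', 1);

BEGIN TRANSACTION;

WITH CTE AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column2) AS rn
    FROM #your_table
)
DELETE FROM CTE
WHERE rn > 1;

-- Inside the transaction the duplicate is gone: 2 rows left.
SELECT COUNT(*) AS rows_left FROM #your_table;

-- Undo the delete. Use COMMIT TRANSACTION instead once you trust the result.
ROLLBACK TRANSACTION;

-- Back to the original 3 rows.
SELECT COUNT(*) AS rows_after_rollback FROM #your_table;

DROP TABLE #your_table;
```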