How to delete duplicate rows in SQL Server?
To quickly delete duplicate rows in SQL Server, use a CTE (Common Table Expression) with ROW_NUMBER(). ROW_NUMBER() numbers the rows within each group of duplicates, so every row numbered above 1 can be deleted.
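A minimal sketch of this pattern follows. The names table_name, column1, and column2 are placeholders, not objects from a real schema, and the query assumes the CTE reads from a single base table so that SQL Server can delete through it.

```sql
-- Placeholder names: table_name, column1, column2.
WITH CTE AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY column1   -- rows sharing this value count as duplicates
            ORDER BY column2       -- decides which row is numbered 1 and kept
        ) AS rn
    FROM table_name
)
DELETE FROM CTE
WHERE rn > 1;  -- remove every row after the first in each group
```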
Replace column1 with the column whose values define a duplicate and column2 with the column that sets your sort preference. This code keeps the first row in each group of duplicates and removes the rest.
Detailed explanation
Handle duplicates with multiple columns
When a duplicate is defined by a combination of columns, list all of them in your PARTITION BY clause. Rows are then treated as duplicates only when they match on every listed column.
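As an illustration, the sketch below assumes a hypothetical customers table in which a duplicate means the same first_name, last_name, and email, and the earliest created_at row should be kept. All of these names are invented for the example.

```sql
-- Hypothetical table and columns: customers, first_name, last_name, email, created_at.
WITH CTE AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY first_name, last_name, email  -- all columns that define a duplicate
            ORDER BY created_at                        -- earliest row is numbered 1
        ) AS rn
    FROM customers
)
DELETE FROM CTE
WHERE rn > 1;
```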
Verify before you delete
It's essential to double-check which rows you're deleting. Swap DELETE with SELECT to see which records are flagged for removal.
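A preview query might look like the following, again using the placeholder names table_name, column1, and column2. It returns exactly the rows that the DELETE version would remove and changes no data.

```sql
-- Placeholder names: table_name, column1, column2.
WITH CTE AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY column1
            ORDER BY column2
        ) AS rn
    FROM table_name
)
SELECT *            -- SELECT instead of DELETE: nothing is modified
FROM CTE
WHERE rn > 1        -- the rows flagged for removal
ORDER BY column1, rn;
```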
Alternative techniques
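Sometimes, ROW_NUMBER() may not serve your purposes. It always gives each row in a partition a distinct number, even when rows tie on the ORDER BY column. If rows that tie for first place should all be kept, the RANK() function could be a preferable alternative, because it assigns tied rows the same rank.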
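A sketch of the RANK() variant, under the assumption that you want to keep every row tied for the highest column2 value in each group. The names table_name, column1, and column2 are placeholders.

```sql
-- Placeholder names: table_name, column1, column2.
WITH CTE AS (
    SELECT
        *,
        RANK() OVER (
            PARTITION BY column1
            ORDER BY column2 DESC   -- highest column2 gets rank 1
        ) AS rnk
    FROM table_name
)
DELETE FROM CTE
WHERE rnk > 1;  -- rows tied at rank 1 all survive
```

Note that if several rows in a group share the top column2 value, none of them is deleted, so this is not a strict "keep exactly one row" method.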
Dealing with more complex duplicates
Group by and max ID
When the table has a unique identifier column such as id, you can use GROUP BY with MAX(id) to delete duplicates while keeping one row per group.
This solution discards duplicates while preserving the row with the highest ID.
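One way to write this, assuming a placeholder table table_name with a non-null unique id column and duplicates defined by column1 and column2:

```sql
-- Placeholder names: table_name, id, column1, column2.
-- Assumes id is unique and NOT NULL; a NULL in the subquery result
-- would make NOT IN match nothing.
DELETE FROM table_name
WHERE id NOT IN (
    SELECT MAX(id)              -- the one id to keep per group
    FROM table_name
    GROUP BY column1, column2   -- columns that define a duplicate
);
```

Using MIN(id) instead would keep the row with the lowest ID in each group.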
Deletion with no unique key identifiers
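Without unique IDs, the MAX(id) approach is not available, but the CTE combined with ROW_NUMBER() still works: partition by every column that defines a duplicate, and the DELETE removes the extra copies directly through the CTE. GROUP BY together with aggregate functions such as COUNT(*) can still show which value combinations are duplicated.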
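A sketch for a table with no key at all, using placeholder names table_name, column1, column2, and column3. Because fully identical rows have no meaningful sort order, ORDER BY (SELECT NULL) is used to satisfy the syntax without choosing one; which copy survives is arbitrary.

```sql
-- Placeholder names: table_name, column1, column2, column3.
WITH CTE AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY column1, column2, column3  -- list every column of the table
            ORDER BY (SELECT NULL)                  -- no preference among identical rows
        ) AS rn
    FROM table_name
)
DELETE FROM CTE
WHERE rn > 1;
```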
Best practices for SQL masters
Back it up
Make a habit of backing up your table before deleting data. Think of it as "the Undo button SQL Server forgot."
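A quick table-level copy can be sketched with SELECT ... INTO. The names table_name and table_name_backup are placeholders, and the backup table must not already exist.

```sql
-- Placeholder names: table_name, table_name_backup.
-- Creates table_name_backup and copies every row into it.
SELECT *
INTO table_name_backup
FROM table_name;
```

This copies the data and column definitions but not indexes or constraints, so it is a safety net for the rows rather than a substitute for a full database backup.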
Test, test
Execute in a non-production environment first. Your production environment isn't a playground!