How to delete duplicate rows in SQL Server?
To quickly delete duplicate rows in SQL Server, use a CTE (Common Table Expression) with ROW_NUMBER(). This ranks duplicate records, making them easy to remove.
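A minimal sketch of that pattern might look like the following. The table name dbo.your_table is a placeholder assumed for illustration, and column1 and column2 stand in for your own columns.

```sql
-- Hypothetical names: dbo.your_table, column1, column2.
WITH ranked AS (
    SELECT
        column1,
        column2,
        ROW_NUMBER() OVER (
            PARTITION BY column1   -- rows sharing this value form one duplicate group
            ORDER BY column2       -- the first row in this order gets rn = 1
        ) AS rn
    FROM dbo.your_table
)
-- Deleting through the CTE removes the matching rows from dbo.your_table.
DELETE FROM ranked
WHERE rn > 1;
```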
Replace column1 with the column (or columns) that define a duplicate and column2 with your sort preference. This code retains the first row in each group, as ordered by column2, and removes the rest.
Detailed explanation
Handle duplicates with multiple columns
When a duplicate is defined by several columns, list every relevant column in the PARTITION BY clause. Rows then count as duplicates only when they match on all of the listed columns.
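As an illustration, assuming a hypothetical dbo.customers table in which two rows are duplicates when first_name, last_name, and email all match, the partition could be written like this (created_at is also an assumed column, used only to decide which row survives):

```sql
-- Hypothetical table and columns, for illustration only.
WITH ranked AS (
    SELECT
        first_name,
        last_name,
        email,
        ROW_NUMBER() OVER (
            PARTITION BY first_name, last_name, email  -- all three must match
            ORDER BY created_at                        -- oldest row is kept
        ) AS rn
    FROM dbo.customers
)
DELETE FROM ranked
WHERE rn > 1;
```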
Verify before you delete
It's essential to double-check which rows you're deleting. Swap DELETE with SELECT to see which records are flagged for removal.
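A preview query along these lines keeps the CTE unchanged and only swaps the final statement. The names dbo.your_table, column1, and column2 are placeholders.

```sql
-- Same CTE as the delete, but the final statement only reads.
WITH ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY column1
            ORDER BY column2
        ) AS rn
    FROM dbo.your_table
)
-- Every row returned here is one the DELETE version would remove.
SELECT *
FROM ranked
WHERE rn > 1
ORDER BY column1, column2;
```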
Alternative techniques
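Sometimes, ROW_NUMBER() may not serve your purposes, because it keeps exactly one row per group and breaks ties on the ORDER BY columns arbitrarily. If rows that tie for first place should all survive, the RANK() function could be a preferable alternative, since it gives tied rows the same rank.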
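The difference shows up when rows tie on the ORDER BY columns: RANK() gives tied rows the same rank, so every row tied for first place gets rank 1 and is kept. A sketch, assuming a hypothetical dbo.your_table with email and updated_at columns, where all rows carrying the latest updated_at per email should remain:

```sql
-- Hypothetical names: dbo.your_table, email, updated_at.
WITH ranked AS (
    SELECT
        email,
        updated_at,
        RANK() OVER (
            PARTITION BY email
            ORDER BY updated_at DESC   -- latest timestamp ranks first
        ) AS rnk
    FROM dbo.your_table
)
-- Rows tied on the latest updated_at all have rnk = 1 and are kept.
DELETE FROM ranked
WHERE rnk > 1;
```

Note that this variant can leave more than one row per group, so it only suits cases where tied rows are meant to stay.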
Dealing with more complex duplicates
Group by and max ID
When a table has a unique id column but rows still repeat in the other columns, you can use GROUP BY with MAX(id) to delete the duplicates while preserving one row per group.
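One way to write this, assuming a hypothetical dbo.your_table whose id column is unique and NOT NULL, and whose duplicates are defined by column1 and column2:

```sql
-- Hypothetical names: dbo.your_table, id, column1, column2.
-- Assumes id is unique and never NULL, so NOT IN behaves as expected.
DELETE FROM dbo.your_table
WHERE id NOT IN (
    SELECT MAX(id)            -- the survivor of each duplicate group
    FROM dbo.your_table
    GROUP BY column1, column2
);
```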
This solution discards duplicates while preserving the row with the highest ID.
Deletion with no unique key identifiers
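Without unique IDs, you can still deduplicate with the methods above: a CTE combined with ROW_NUMBER(), which deletes through the CTE and so needs no key, or GROUP BY together with aggregate functions such as COUNT(*) to find the repeated groups.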
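For a table with no key at all, one possible sketch partitions by every column, so only fully identical rows share a group. The names dbo.your_table and column1 through column3 are placeholders, and ORDER BY (SELECT NULL) is used because identical rows have no meaningful order to prefer.

```sql
-- Hypothetical names; list every column of the table in PARTITION BY.
WITH ranked AS (
    SELECT
        ROW_NUMBER() OVER (
            PARTITION BY column1, column2, column3
            ORDER BY (SELECT NULL)   -- rows are identical, so any order works
        ) AS rn
    FROM dbo.your_table
)
-- Keeps one copy of each fully identical row.
DELETE FROM ranked
WHERE rn > 1;
```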
Best practices for SQL masters
Back it up
Make a habit of backing up your table before data deletion. It's also known as "the Undo button SQL Server forgot."
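A quick table-level copy can be made with SELECT ... INTO, as sketched here. The table names are placeholders, and the target table must not already exist.

```sql
-- Hypothetical names: dbo.your_table, dbo.your_table_backup.
-- Copies the rows and column definitions into a new table.
-- Indexes, constraints, and triggers are not copied.
SELECT *
INTO dbo.your_table_backup
FROM dbo.your_table;
```

This is a convenience copy rather than a substitute for a proper database backup.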
Test, test
Execute in a non-production environment first. Your production environment isn't a playground!
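Even on a test copy, it can help to wrap the delete in a transaction and roll it back on the first run. This is a complementary safeguard, not a replacement for a separate environment; the names below are placeholders.

```sql
-- Hypothetical names: dbo.your_table, column1, column2.
BEGIN TRANSACTION;

WITH ranked AS (
    SELECT
        ROW_NUMBER() OVER (
            PARTITION BY column1
            ORDER BY column2
        ) AS rn
    FROM dbo.your_table
)
DELETE FROM ranked
WHERE rn > 1;

-- Number of rows the DELETE just removed.
SELECT @@ROWCOUNT AS rows_deleted;

-- Undo the delete; switch to COMMIT TRANSACTION once the count looks right.
ROLLBACK TRANSACTION;
```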