Delete all Duplicate Rows except for One in MySQL?
To eliminate duplicates efficiently, you can use a self-join deletion along with a grouping subquery. Here's a concise solution:
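One way to write such a statement is sketched below. The names your_table, unique_column, and id are placeholders, and the aliases t, k, and keep_id are illustrative only. Rows whose unique_column is NULL never match the join, so this sketch leaves them untouched.

```sql
-- Minimal sketch: keep the lowest id per duplicate group, delete the rest.
-- your_table, unique_column and id are placeholders for your own names.
DELETE t
FROM your_table AS t
JOIN (
    -- One row per group: the id that should survive.
    SELECT unique_column, MIN(id) AS keep_id
    FROM your_table
    GROUP BY unique_column
) AS k
  ON k.unique_column = t.unique_column
WHERE t.id > k.keep_id;
```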
Replace your_table with the name of your table, unique_column with the column responsible for duplication, and id with the unique identifier. The script selects the lowest id (the keeper) for each duplicate group and then deletes entries with higher id values, ensuring one unique record per duplicate group.
Which Approach to Pick for Large Tables?
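For large datasets, execution speed matters. An alternative that can be both quicker and safer is to INSERT INTO a new table using SELECT DISTINCT. Note that SELECT DISTINCT only collapses rows that are identical in every selected column, so if the table has a unique id, select only the columns that define a duplicate. This strategy creates a new table with unique records only: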
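A minimal sketch of that copy step, assuming the rows are duplicated in every column. The table name your_table_dedup is hypothetical. If your_table has a unique id column, DISTINCT * removes nothing, and you would list the duplicate-defining columns explicitly instead.

```sql
-- Create an empty table with the same columns and indexes as the original.
-- your_table_dedup is a hypothetical name chosen for this sketch.
CREATE TABLE your_table_dedup LIKE your_table;

-- Copy each distinct row once; assumes duplicates match in every column.
INSERT INTO your_table_dedup
SELECT DISTINCT *
FROM your_table;
```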
Swap the old table with this new distinct one:
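The swap can be done with a single RENAME TABLE statement, which renames both tables in one atomic step. This sketch assumes the de-duplicated copy is called your_table_dedup and uses your_table_old as a hypothetical name for the retired original.

```sql
-- Swap the tables in one atomic statement.
-- your_table_dedup and your_table_old are hypothetical names.
RENAME TABLE your_table       TO your_table_old,
             your_table_dedup TO your_table;

-- Only after verifying the new table:
-- DROP TABLE your_table_old;
```

Keeping your_table_old around until the result has been checked gives a simple way back if something looks wrong.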
Remember, always test this approach on a backup copy to avoid any data-integrity issues.
Version Matters - Efficiency Variation in MySQL
The efficiency of de-duplication queries depends on the MySQL version, because the optimizer's handling of subqueries and derived tables has changed between releases. Test in a staging environment that runs the same version as production to avoid performance surprises.
Handling MySQL Error 1093
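You may encounter MySQL error 1093 ("You can't specify target table ... for update in FROM clause") when a DELETE uses a subquery that reads from the same table it is deleting from. The workaround is to wrap the subquery in an extra SELECT layer: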
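As a minimal sketch of that workaround, with your_table, unique_column, and id as placeholders and keepers and keep_id as illustrative alias names:

```sql
-- Sketch of the extra-SELECT workaround for error 1093.
-- The inner query is wrapped in a derived table (keepers), so the DELETE
-- does not read directly from the table it is modifying.
DELETE FROM your_table
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(id) AS keep_id
        FROM your_table
        GROUP BY unique_column
    ) AS keepers
);
```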
The extra layer turns the subquery into a derived table that MySQL materializes before the delete runs, so the rows to keep stay intact and only the duplicates are removed.
Don't Just Delete! Always Test First
Before running any deletion against the main table, test it on a cloned table first. Remember, prevention is better than cure, or rather than data recovery in this case!
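A clone for testing can be made in two statements. The sketch below uses the hypothetical name your_table_clone and finishes with a read-only query that previews which groups contain duplicates.

```sql
-- Clone structure and data; your_table_clone is a hypothetical name.
CREATE TABLE your_table_clone LIKE your_table;
INSERT INTO your_table_clone SELECT * FROM your_table;

-- Preview the duplicate groups before deleting anything.
SELECT unique_column, COUNT(*) AS copies
FROM your_table_clone
GROUP BY unique_column
HAVING COUNT(*) > 1;
```

Running the delete against the clone and repeating the preview query should return no rows if the de-duplication worked.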
More Ways to De-duplicate
Forget Self-Join, Try GROUP BY and HAVING
For those who frown upon self-joins, grouping and filtering to the rescue:
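One possible shape for such a statement, as a sketch rather than a definitive form: HAVING COUNT(*) > 1 restricts the work to groups that actually contain duplicates, and each subquery is wrapped in a derived table to avoid error 1093. The aliases dup_groups, keepers, and keep_id are illustrative.

```sql
-- Sketch: within groups that have duplicates, keep MAX(id) and delete the rest.
DELETE FROM your_table
WHERE unique_column IN (
        -- Values that occur more than once.
        SELECT unique_column
        FROM (
            SELECT unique_column
            FROM your_table
            GROUP BY unique_column
            HAVING COUNT(*) > 1
        ) AS dup_groups
      )
  AND id NOT IN (
        -- The highest id in each duplicated group survives.
        SELECT keep_id
        FROM (
            SELECT MAX(id) AS keep_id
            FROM your_table
            GROUP BY unique_column
            HAVING COUNT(*) > 1
        ) AS keepers
      );
```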
This command preserves the records with the highest id and deletes the rest.
Staging Area with Temporary Tables
Alternatively, use a temporary table as a staging area:
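A minimal sketch of one staging workflow, assuming the goal is to keep the lowest id per group. The temporary table name keep_ids is hypothetical.

```sql
-- Stage the ids that should survive; keep_ids is a hypothetical name.
CREATE TEMPORARY TABLE keep_ids AS
SELECT MIN(id) AS id
FROM your_table
GROUP BY unique_column;

-- Delete every row whose id was not staged.
DELETE FROM your_table
WHERE id NOT IN (SELECT id FROM keep_ids);

-- Temporary tables vanish when the session ends; dropping early is optional.
DROP TEMPORARY TABLE keep_ids;
```

Because the staged ids live in a separate table, the DELETE never reads from its own target, which sidesteps error 1093 without an extra SELECT layer.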
User Variables - The Unconventional Way
Get innovative and target sequential duplicates using user variables:
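The sketch below flags a row as a duplicate when its unique_column equals that of the previous row in id order. Treat it with caution: MySQL does not guarantee the order in which expressions that read and assign the same user variable are evaluated, and assigning user variables inside SELECT is deprecated as of MySQL 8.0.13. The names @prev, seq_scan, is_dup, and current_value are illustrative, and the result should be verified on a copy of the table.

```sql
-- Sketch only: relies on rows being read in id order and on left-to-right
-- evaluation of the select list, neither of which MySQL guarantees.
SET @prev := NULL;

-- seq_scan is a hypothetical staging table.
CREATE TEMPORARY TABLE seq_scan AS
SELECT id,
       (@prev <=> unique_column) AS is_dup,        -- same value as previous row?
       (@prev := unique_column)  AS current_value  -- remember value for next row
FROM your_table
ORDER BY id;

-- Remove only the rows flagged as repeating the previous row.
DELETE FROM your_table
WHERE id IN (SELECT id FROM seq_scan WHERE is_dup = 1);

DROP TEMPORARY TABLE seq_scan;
```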