Explain Codes LogoExplain Codes Logo

Delete all but top n from database table in SQL

sql
performance
best-practices
data-integrity
Alex KataevbyAlex Kataev·Nov 29, 2024
TLDR

Let's quickly dive into retaining the top 'n' records and deleting everything else using a SQL Server-based CTE and DELETE statement:

WITH CTE AS ( SELECT TOP (n) [PrimaryKey] FROM [YourTable] ORDER BY [Criteria] DESC ) DELETE FROM [YourTable] WHERE [PrimaryKey] NOT IN (SELECT [PrimaryKey] FROM CTE);

Just replace n with your desired number of rows to keep, [YourTable] with your table's name, [Criteria] for sorting, and [PrimaryKey] for row identification.

Step-by-step guide and various techniques

Performance considerations and alternatives

If you process a large dataset, the CTE method may lack efficiency. For more scalable solutions, consider using a temporary table, reducing performance drain when handling big data:

CREATE TABLE #TempTable (PrimaryKey DataType); -- Our swift little helper INSERT INTO #TempTable -- Taking 'n' row-id vacation SELECT TOP (n) [PrimaryKey] FROM [YourTable] ORDER BY [Criteria] DESC; DELETE FROM [YourTable] WHERE [PrimaryKey] NOT IN (SELECT [PrimaryKey] FROM #TempTable); -- Bye-bye, unwanted records DROP TABLE #TempTable; -- We're tidy, not messy!

Remember to replace DataType with the correct data type of your primary key.

Maintaining data integrity with unique identifiers

It’s crucial to keep top 'n' records relying on specific sorting criteria using a unique identifier or an alias for the primary sorting column, safeguarding data integrity:

-- When the going gets SQL, the SQL gets unique DELETE t1 FROM [YourTable] t1 LEFT OUTER JOIN ( SELECT UNIQUE_IDENTIFIER FROM ( SELECT UNIQUE_IDENTIFIER, ROW_NUMBER() OVER (ORDER BY [Criteria] DESC) AS rn FROM [YourTable] ) t WHERE rn <= n ) t2 ON t1.UNIQUE_IDENTIFIER = t2.UNIQUE_IDENTIFIER WHERE t2.UNIQUE_IDENTIFIER IS NULL; -- NULL, the ultimate cockblocker

Optimized handling of frequent deletions

If you’re frequently performing this operation, consider partitioning the table for more efficient deletions. It's like regular cleaning – less clutter, more efficiency!

Differing syntax and techniques per database

Specific databases have distinct manners. With PostgreSQL, for instance, use LIMIT instead of TOP:

DELETE FROM [YourTable] WHERE [PrimaryKey] NOT IN ( SELECT [PrimaryKey] FROM [YourTable] ORDER BY [Criteria] DESC LIMIT n );

Remember to replace [YourTable], [PrimaryKey], and [Criteria] with your table's name, primary key column, and sorting order.

Edge cases, potential issues, and SQL-life hacks

ORDER BY needs careful, loving handling

To determine top 'n' correctly, ensure the ORDER BY column has unique values or you may not get 'n' records due to duplicated sorting criteria — like ordering by favorite color in a monochrome world.

Avoid performance traps

Avoid using SELECT TOP n directly in DELETE statement. It's like being promised a fast lane but ending up in a traffic jam.

Always test before executing

Perform a trial run of your DELETE queries using SELECT, ensuring the right records are targeted for removal — it's like SQL's simulation mode, all the excitement, none of the drama.

Backup - Your Fail-Safe Guardian

Before bulk deletion, always have a backup. Because Ctrl+Z isn't a luxury we enjoy in the SQL cosmos.