Explain Codes LogoExplain Codes Logo

Sql query: Delete all records from the table except latest N?

sql
subquery
delete-records
database-performance
Nikita BarsukovbyNikita Barsukov·Dec 12, 2024
TLDR

Trim down your database table and retain only the newest N rows using a DELETE operation, which sparingly retains the newest entries. Set up this selective criteria by utilizing ORDER BY to target the latest entries and LIMIT N to preserve your desired count. Here's how to implement it:

DELETE FROM my_table WHERE id NOT IN ( SELECT id FROM ( SELECT id FROM my_table ORDER BY updated_at DESC -- like ranking the 'cool kids' LIMIT N -- The 'in' crowd. Don't touch them! ) fresh_data );

Substitute my_table with your actual table name, updated_at with the column of date/timestamp you're keeping track of, and N with the count of recent rows you want to keep.

Advanced techniques are also available, like the LEFT JOIN/IS NULL method and Alex Barrett's double subquery/<= method, which may be applied when dealing with complex scenarios or facing performance constraints.

Mastering the double subquery technique

The NOT IN operator might seem a bit laggy while working with vast tables. A better performing alternative is using a double subquery along with a comparison operation:

DELETE FROM my_table WHERE id <= ( --<=: not just for kids! SELECT id FROM ( SELECT id FROM my_table ORDER BY updated_at DESC -- who's cool now! LIMIT 1 OFFSET N-1 -- crowd control ) inner_table );

This pattern identifies the Nth newest entry then trashes anything older, as a result sparing the latest N records. Ensure that id is a sequential key for this approach to be foolproof.

Erring on the side of caution

Deleting can be tricky, therefore preventing disastrous data losses by running a SELECT statement prior to any DELETE operation is strongly recommended:

SELECT * FROM my_table WHERE id NOT IN ( -- Peek before you remove SELECT id FROM ( SELECT id FROM my_table ORDER BY updated_at DESC --Popularity contest again! LIMIT N ) draft_table );

Double-checking that this matches your expected outcome before proceeding with the DELETE command is considered best practice.

Pro tips for optimal operations and security

Safety first: Ensure backups are in place before a mass delete. For very large datasets, consider these advanced patterns:

Left join maneuvers for mindful removals

Very useful when integrating complex foreign key associations:

DELETE a FROM my_table a LEFT JOIN ( SELECT id FROM my_table ORDER BY updated_at DESC -- Sorting the 'cool' from the 'not-so-cool' LIMIT N -- The cool kids club ) b ON a.id = b.id WHERE b.id IS NULL; -- Null? Sorry, out of the club!

Subquery-triggered temporary tables

In some cases, especially with certain RDBMS such as MySQL, using subqueries can create temporary tables leading to performance gains during deletion:

DELETE FROM my_table WHERE id IN ( SELECT id FROM ( SELECT id FROM my_table ORDER BY updated_at DESC LIMIT N, 18446744073709551615 -- A super large number to select the 'uncool' ) as temp_table );

Note the use of a largely intimidating number to select all other remaining records. Modify this according to your RDBMS's upper limit.