Explain Codes LogoExplain Codes Logo

Delete duplicate rows from small table

sql
delete-duplicates
postgresql
data-management
Anton ShumikhinbyAnton ShumikhinยทOct 6, 2024
โšกTLDR

To eliminate duplicate entries in a small SQL table like ninja swipes out enemies, consider using CTE (Common Table Expressions) with ROW_NUMBER(). This approach assigns a numeric rank to every row in duplicate groups, permitting targeted removal:

-- Quick solution for small tables; like a ninja with shurikens WITH RankedDuplicates AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY unique_column ORDER BY id) AS rn FROM table_name ) DELETE FROM RankedDuplicates WHERE rn > 1; -- There can be only one! (Highlander reference ๐Ÿ—ก๏ธ)

unique_column should be replaced with the column you're deduplicating, and swap table_name with your actual table name. This process retains the first occurrence of duplicates and removes the rest.

Approach breakdown: How-to guides for various scenarios

Take pre-emptive action with UNIQUE constraints

To actively discourage new duplicates like a schoolyard monitor, add UNIQUE constraints to keep your data clean:

-- Adding unique constraints are like strict teachers in a schoolyard; no funny business allowed! ALTER TABLE table_name ADD CONSTRAINT constraint_name UNIQUE(unique_column1, unique_column2);

This exercise is highly effective in warding off new duplicates.

Utilise CTID to efficiently handle duplicates

When dealing with PostgreSQL, consider CTID - the pole vault stick that leaps over duplicates. Use it this way:

-- CTID is like your personal pole vaulter that easily leaps over duplicates DELETE FROM table_name WHERE ctid NOT IN ( SELECT MIN(ctid) FROM table_name GROUP BY unique_column );

The ctid field is a unique row identifier and a particularly handy tool in PostgreSQL for deleting duplicates.

Large dataset? No worries

On encountering a large table, consider creating a new one housing distinct entries - like vacuuming only the wanted particles:

-- For large tables, creating a new one with distinct entries is like Thanos' snap; half of the duplicates disappear CREATE TABLE new_table AS SELECT DISTINCT ON (unique_column) * FROM table_name;

After which, rename the new_table to replace the old. It's efficient, especially if your original table is a hot mess of duplicates.

Hot potato: dealing with cases without keys

Addressing a table with no keys

If your table started life sans keys, panic not. EXIST operators or NOT EXISTS can come to your rescue in DELETE statements:

-- This is like using a metal detector to find hidden treasures DELETE FROM table_name t1 WHERE EXISTS ( SELECT 1 FROM table_name t2 WHERE t2.unique_column = t1.unique_column AND t2.id < t1.id );

Adding keys post-design

Missing primary keys from the onset? No sweat! Add them later for smoother data manipulation:

-- Adding primary key later is like giving a name to a pet; they feel part of the family now! ALTER TABLE table_name ADD COLUMN id SERIAL PRIMARY KEY;

User-friendly guidance with references

Gear up with the PostgreSQL wiki and DB Fiddle resources: they are your guiding stars:

  • Loaded with practical examples.
  • Offers step-by-step guides for implementing giant-killing SQL solutions.
  • Understand how to maintain sharp and efficient data management.