Explain Codes LogoExplain Codes Logo

Bulk/batch update/upsert in PostgreSQL

sql
bulk-inserts
performance-optimization
postgresql
Nikita BarsukovbyNikita Barsukov·Oct 18, 2024
TLDR

Execute a batch upsert in PostgreSQL with INSERT ... ON CONFLICT, which either inserts new rows or updates existing ones based on a unique constraint like id. Here's a concise example:

INSERT INTO your_table(id, column1, column2) VALUES (1, 'ValueA', 'ValueB'), (2, 'ValueC', 'ValueD') ON CONFLICT (id) DO UPDATE SET column1 = EXCLUDED.column1, column2 = EXCLUDED.column2;

As highlighted here, if an id already exists, the corresponding column1 and column2 will be updated with the new values. New rows are inserted when there isn't any conflict. The EXCLUDED pseudo-table signifies the propositioned insert.

The unnest() method for bulk inserts

PostgreSQL 8.4 and above support unnest() function, a handy tool for unpacking arrays into rows for bulk inserts. Let's dive into a simple illustration:

INSERT INTO your_table(id, column1, column2) SELECT * FROM unnest( array[1, 2], array['ValueA', 'ValueC'], array['ValueB', 'ValueD'] ); -- You have just got "unnested"! Like an Easter egg but less chocolate, more data.

This array trick reduces SQL operations, improving performance with large datasets.

Employ UPDATE...FROM for various conditions

For a more verbose update scenario where various conditions have to be met, UPDATE...FROM has got your back:

UPDATE your_table SET column1 = data.column1, column2 = data.column2 FROM (VALUES (1, 'NewValueA', 'NewValueB'), (2, 'NewValueC', 'NewValueD') ) AS data(id, column1, column2) WHERE your_table.id = data.id; -- The SQL equivalent of "Kill many birds with one stone"...just less birdy.

This UPDATE...FROM technique minimizes unnecessary operations and accelerates your work.

Optimizing with Hibernate's batch

When coupling JDBC with Hibernate, strategically flushing your session and configuring Hibernate’s @BatchSize can do wonders:

@BatchSize(size = 50) // Choose your batch size public class YourEntity { ... } ... session.save(newEntity); if (i % 50 == 0) { session.flush(); session.clear(); } // Hibernate and chill: Rock your SQL operations Netflix style.

By doing this, we are dramatically improving the SQL operations’ efficiency.

JSON data with batch updates

For the PostgreSQL 9.3 users, json_array_elements() is a gift you should unwrap to perform batch updates with JSON data:

UPDATE your_table SET column1 = data.value->>'column1', column2 = data.value->>'column2' FROM json_array_elements('[ {"id": 1, "column1": "JsonValA", "column2": "JsonValB"}, {"id": 2, "column1": "JsonValC", "column2": "JsonValD"} ]') AS data WHERE your_table.id = (data.value->>'id')::int; -- JSON meets PostgreSQL: A fairytale for devs.

This command converts a JSON array into rows, making your update life easier.

Multiple rows insertion with VALUES

The VALUES keyword encourages multiple row insertions in one query:

INSERT INTO your_table(id, column1, column2) VALUES (3, 'ValueE', 'ValueF'), (4, 'ValueG', 'ValueH'), (5, 'ValueI', 'ValueJ'); -- SQL's BUY ONE, GET TWO: Insert three rows at the cost of one query!

The keyword streamlines your bulk operations and keeps your database smiling.

Complicated batch? Use DO block

PostgreSQL brings ‘DO’ block to the rescue whenever a more complicated update involving loops is required:

DO $$ DECLARE rec RECORD; BEGIN FOR rec IN SELECT * FROM your_table WHERE condition LOOP UPDATE your_table SET column1 = rec.column1 || '_updated' WHERE id = rec.id; END LOOP; END $$; -- Recursion, the SQL way. The loop goes round and round, and your updates get done.

But remember, tucking logic in database might turn your maintenance into a hurdle.

Hibernate's 'IN' operator to reduce SELECTs

Instead of executing many SELECT statements, get smart and use IN operator:

List<YourEntity> entities = session.createQuery("FROM YourEntity WHERE id IN :ids") .setParameter("ids", idList) .list(); // The Hibernate "IN" move: Less SELECTs, more time for SELECT-ing your next Netflix series.

This lowers query count and matters when id list is already known.

Specific column update with column-list syntax

PostgreSQL's column-list syntax permits targeted updates via row-valued expressions, which optimizes SQL execution:

UPDATE your_table SET (column1, column2) = (row_data.column1, row_data.column2) FROM (VALUES (1, 'TargetValueA', 'TargetValueB'), (2, 'TargetValueC', 'TargetValueD') ) AS row_data(id, column1, column2) WHERE your_table.id = row_data.id; -- Get to the point. Update without upsetting other columns.

This feature works wonders when you intend to update several but not all columns.