
How to speed up insertion performance in PostgreSQL

sql
performance
best-practices
optimization
by Alex Kataev · Nov 26, 2024
TLDR

Accelerate PostgreSQL insertion performance by employing bulk operations with the COPY command for sizable data loads. This approach significantly reduces per-statement overhead compared to executing many individual INSERT commands.

-- Who needs slow INSERTs when you have COPY, right?
COPY my_table FROM '/file.csv' WITH (FORMAT csv);

In the Python ecosystem, use Psycopg2's copy_expert cursor method to stream a CSV file-like object for in-application bulk insertions:

import psycopg2

conn = psycopg2.connect(dsn)
cursor = conn.cursor()
with open('data.csv', 'r') as file:
    cursor.copy_expert("COPY my_table FROM STDIN WITH CSV", file)
conn.commit()  # Python and PostgreSQL, BFFs forever!

Tune memory settings such as work_mem and maintenance_work_mem (the latter governs index rebuilds after the load), and calibrate batch sizes to balance throughput against resource usage.
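As a sketch, these settings can be raised for the loading session only; the values below are illustrative, not recommendations:

-- Session-level memory settings for a bulk load; tune to your hardware
SET maintenance_work_mem = '1GB';  -- speeds up index (re)builds after the load
SET work_mem = '256MB';            -- memory for sorts and hashes in this session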

Setting server parameters for improved insertion strategy

Tweak PostgreSQL settings such as max_wal_size to reduce checkpoint frequency during a bulk load. Setting synchronous_commit to off lets a transaction report success before its WAL records are flushed to disk, delivering a notable write performance boost:

-- Sorry, COMMIT, we're breaking up. It's not you, it's my performance.
ALTER SYSTEM SET synchronous_commit TO off;
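A minimal sketch of the max_wal_size tweak mentioned above (the value is illustrative):

-- Fewer checkpoints during the load
ALTER SYSTEM SET max_wal_size = '8GB';
SELECT pg_reload_conf();  -- apply without a restart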

Fine-tune Write-Ahead Log (WAL) behavior by increasing commit_delay when many concurrent transactions commit at once, which lets PostgreSQL flush several commits as one group. Also consider placing the WAL on a separate disk for better I/O performance.
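A hedged sketch of the group-commit knobs (values are illustrative; commit_delay is in microseconds):

-- Wait up to 1 ms for other transactions to join the WAL flush,
-- but only when at least 5 other transactions are already open
ALTER SYSTEM SET commit_delay = 1000;
ALTER SYSTEM SET commit_siblings = 5;
-- reload the configuration afterwards, as above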

Optimizing tables and indexes for faster data loading

During heavy data insertions, exploit UNLOGGED tables to temporarily bypass the overhead of WAL logging:

-- Faster than Flash on caffeine
CREATE UNLOGGED TABLE my_table_temp AS
SELECT * FROM my_table WITH NO DATA;
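Once the load finishes, the staging table can be made crash-safe again (PostgreSQL 9.5+); note that this step rewrites the whole table into the WAL:

-- Flip the staging table back to normal, WAL-logged storage
ALTER TABLE my_table_temp SET LOGGED;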

Drop indexes before a mass data load and recreate them afterwards to keep insertions fast. Likewise, disable triggers before the load and re-enable them once it completes:

-- Who likes manual labor anyway?
DROP INDEX my_index;
-- ... bulk insert here ...
CREATE INDEX my_index ON my_table USING btree (my_column);
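The trigger side of that advice can be sketched with ALTER TABLE (the table name is illustrative); DISABLE TRIGGER USER leaves foreign-key checks intact, while ALL pauses those too but requires superuser rights:

-- Pause user-defined triggers for the duration of the load
ALTER TABLE my_table DISABLE TRIGGER USER;
-- ... bulk insert here ...
ALTER TABLE my_table ENABLE TRIGGER USER;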

Result: Enjoy a speedier journey (insertion) from point A to point B for your data! 🏎️💨

Choosing the right gear: Hardware considerations

Invest in top-tier SSDs for swifter data writes. Dedicate separate disks for data and WAL to distribute the I/O load. Upgrading other hardware elements, like increasing RAM, can also amplify the overall database performance.
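From SQL, one way to act on the separate-disks advice is a tablespace on the faster device (the path and names here are hypothetical; relocating the WAL itself is done at initdb time or via a symlink, not through SQL):

-- Put hot tables on a dedicated SSD via a tablespace
CREATE TABLESPACE fast_ssd LOCATION '/mnt/nvme/pg_tablespace';
CREATE TABLE my_table_fast (LIKE my_table INCLUDING ALL) TABLESPACE fast_ssd;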

Power move: Using advanced data formats

Use the COPY command with its binary format to squeeze out extra data loading speed. In Python, psycopg2's copy_expert can stream binary COPY data just as it does CSV.
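A minimal sketch, assuming the file was previously produced by COPY ... TO ... WITH (FORMAT binary) (the path is illustrative):

-- Binary COPY skips text parsing, but the file must be in PostgreSQL's binary COPY format
COPY my_table FROM '/path/to/data.bin' WITH (FORMAT binary);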

Python insertions, optimized and styled

For those favoring Python, server-side prepared statements can intensify performance. Keep autocommit off and commit once per batch to minimize transaction overhead, and experiment with batch helpers such as psycopg2.extras.execute_batch or execute_values (the ODBC world offers SQLBulkOperations for the same purpose).

# Python, PostgreSQL, and performance. The three Ps of awesomeness!
cursor.execute('PREPARE bulk_insert (int, text) AS INSERT INTO my_table VALUES ($1, $2);')
for record in records:
    cursor.execute('EXECUTE bulk_insert (%s, %s)', record)
conn.commit()

Perfecting indexing strategies for UUIDs

If UUIDs are your choice, generate them with gen_random_uuid() (built into PostgreSQL 13+, available earlier via the pgcrypto extension). Keep in mind that random UUIDs land in random spots of the B-tree index, and that shuffling of index pages can hamper insertion performance.
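A minimal sketch of such a setup (the table name is hypothetical):

-- Random UUID primary keys: convenient, but inserts scatter across the index
CREATE TABLE events (
    id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    body text
);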