How to update selected rows with values from a CSV file in Postgres?

sql

best-practices

data-integrity

performance

byAlex Kataev·Sep 29, 2024

To update rows in Postgres using CSV data, import the CSV into a temporary staging table then join it with your actual table to update matching rows. Here's an example code snippet:

-- Temporary table for CSV
CREATE TEMP TABLE temp_csv(id INT, new_value TEXT);

-- Load CSV
COPY temp_csv FROM '/file.csv' DELIMITER ',' CSV HEADER;

-- Update main table
UPDATE main
SET column = temp_csv.new_value
FROM temp_csv
WHERE main.id = temp_csv.id;

-- Cleanup by dropping the temporary table
DROP TABLE temp_csv;

Strategy and best practices for CSV imports in PostgreSQL

Importing CSV data to update PostgreSQL tables can require some tactical strategies. Below are the best practices to ensure a smooth and efficient update operation.

Using a staging table is a smart move

For agility in data management and processing, create a temporary staging table. This way, you can isolate and pre-process the CSV data before affecting the main table.

Turbo-boost your performance

For large data imports, optimize your performance by adjusting the temp_buffers:

-- Pump up the RAM for temporary tables
SET temp_buffers = '256MB'; -- Adjust the size accordingly

Goes without saying: indexing your temp table will boost your UPDATE speed:

-- Yes, temporary tables like cool indexes too
CREATE INDEX temp_csv_id_idx ON temp_csv(id);

Before updating, run ANALYZE. This is like stretching before a race: it warms up the planner for better performance:

-- Who needs a big plan? temp_csv does
ANALYZE temp_csv;

Bash scripting - the multipurpose toolbox

When you're automating this process as a script, ensure all SQL commands are executed in the same psql session. It's like keeping all your tools in the same toolbox; easier to manage:

psql -d yourdatabase << EOF
CREATE TEMP TABLE...
COPY...
UPDATE...
DROP TABLE...
EOF

And of course, the user role running the script must have the necessary permissions for the SQL COPY command.

Preserving data integrity

No lost sheep here. Use the UPDATE ... FROM ... WHERE syntax to accurately target and update rows.

Check, double-check, and triple-check the file path and extension before importing to avoid that dreaded "file not found" error.

During scripting, ensure proper error handling to maintain data integrity in case of any hiccups.

Advanced usage: in dark corners of CSV imports

Your use-cases can be diverse and complex. Let's unravel some additional techniques that you can use.

Transfiguration class: UPDATE with transformations

Transform your data as you update. PostgreSQL's library of functions can manipulate and convert data formats, or apply logic during update.

-- All spells are taught at Hogwarts
UPDATE main
SET column = UPPER(temp_csv.new_value)
FROM temp_csv
WHERE main.id = temp_csv.id;

Keeping the peace during high concurrency

High concurrency environments require careful juggling. Use explicit locking strategies or serializable transactions to maintain harmony.

After the party: the post-update cleanup

After updating your tables, dropping temporary tables helps maintain peak database performance. It's akin to cleaning up after a party: the sooner you do, the better.

Further considerations: Beyond the horizon

Let's glance a little further over the horizon and see how we can handle more complex scenarios.

Using Foreign Data Wrappers

If you have CSV data in external databases, apply Foreign Data Wrappers (FDWs) to join and update tables directly, thereby bypassing intermediate CSV file usage.

Periodic updates

If you're making periodic updates, place your operation logic inside a function. Subsequent updates then become a simple matter of invoking the function. It's like pre-programming your smart home.

explain-codes / Sql / How to update selected rows with values from a CSV file in Postgres?

Linked

Is it better to create an index before filling a table with data, or after the data is in place?



Python pandas to_sql with sqlalchemy: how to speed up exporting to MS SQL?



Finding and Deleting Duplicate Rows in PostgreSQL



Getting random row through SQLAlchemy



How to speed up insertion performance in PostgreSQL

