Explain Codes LogoExplain Codes Logo

Update or Insert (multiple rows and columns) from subquery in PostgreSQL

sql
subqueries
upsert
performance
Alex KataevbyAlex Kataev·Nov 11, 2024
TLDR

When it comes to PostgreSQL, you can upsert data using the INSERT and ON CONFLICT clauses. Here's the typical SQL pattern for upserting:

INSERT INTO target_table (column_list) SELECT data_to_insert FROM source_data ON CONFLICT (conflict_columns) DO UPDATE SET column_to_update = EXCLUDED.column_to_update;

Tweak this template to fit your context:

  • Substitute target_table with your actual table name.
  • column_list represents the list of columns involved in your insert or update operation.
  • source_data denotes the subquery or table providing the new data.
  • conflict_columns identify unique rows, often equating to the primary key.
  • column_to_update: The column to be updated in the event of a conflict.

Impressively, the DO UPDATE SET part ensures existing rows receive new values. EXCLUDED is used to point to the row that would have been inserted barring any duplication.

Deep dive into the upsert operation

The term "upsert" cleverly combines "update" and "insert". Essentially, it checks for the existence of a row. If found, it performs an update; if not, it inserts a new row. PostgreSQL carries out this dual functionality in a single SQL statement. Notably, the efficiency in this approach lies in condensing two potential operations into one, thereby saving resources and execution time.

Mastering complex subqueries

In handling subqueries that veer towards the complex end of the spectrum:

  • Always use an alias for your subquery as a fail-proof way to make referencing it in INSERT statements simpler.
  • Double-check data types and column matches—mismatched data types between main table and subquery are a common source of errors.
  • Be precise with your join conditions within the subquery to prevent undesirable data updates.

Using temporary tables to enhance performance

When dealing with intense data manipulation (large datasets or complex joins, for instance), resource usage can skyrocket. Creating a temporary table can give a much-needed boost to your upsert operation's performance. This temporary table should contain the outcome of your intricate operations, resulting in a smoother insert or update process:

CREATE TEMP TABLE tmp_subquery AS SELECT complex_operation_result FROM some_large_dataset; INSERT INTO target_table (column_list) SELECT * FROM tmp_subquery ON CONFLICT (conflict_columns) DO UPDATE SET column_to_update = EXCLUDED.column_to_update; DROP TABLE tmp_subquery; -- clean-up time: don't leave your toys out!

Optimization tricks at your fingertips

Harnessing JOINs in subqueries

Make JOINs your best friend when dealing with subqueries. They allow updates across multiple tables and collate data from different sources. Here, data reconciliation is taken care of within the subquery itself, which is more resource-saving than doing so post-insertion.

Streamlining INSERTs minus VALUES

When you're performing numerous INSERTions via subquery, you can ditch the VALUES syntax for a SELECT subquery, resulting in a smoother:

INSERT INTO target_table (column1, column2) SELECT column1, column2 FROM source_table;

Such an approach is far more straightforward and lean, especially when dealing with a large number of records.

For each truck:

  • Check if it already has the SAME data box
  • If yes: Update the box inside (🔄)
  • If no: Load a new data box (➕)

**UPSERT visualised** (via `ON CONFLICT`):
```markdown
🚚🔄🚚➕🚚➕...
# Each truck gets updated with NEW data or receives ADDITIONAL data boxes

The key takeaway here is to efficiently handle these updates and insertions in bulk—ensure each box meets the right truck just once!

Making every update precise and efficient

The art of correct column referencing

When detailing columns in the SET clause, it's pivotal to match them properly between the main table and the excluded (conflicted) subquery. Incorrect cross-references can thwart the operation.

How the WHERE clause refines updates

In certain scenarios, not every conflict should automatically switch to an update. Use the WHERE clause to filter these conditions and guarantee that only specific rows are updated:

ON CONFLICT (conflict_columns) DO UPDATE SET column_to_update = EXCLUDED.column_to_update WHERE (condition_to_update);

The virtue of understanding transaction isolation

A solid grasp of PostgreSQL's transaction isolation levels can empower you to design your upsert operations to be both performant and reliable. Side bonus: it helps to minimize risks of deadlocks and anomalies.