
Insert ... ON DUPLICATE KEY (do nothing)

By Alex Kataev · Sep 25, 2024
TLDR

To INSERT into a MySQL table and do nothing when a duplicate key is encountered, you have two options: INSERT IGNORE, or the ON DUPLICATE KEY UPDATE clause followed by a no-op update, that is, an assignment that sets a column to its own value.

Here's how you use INSERT IGNORE:

-- Inserting values while Hogwarts School IGNORES "you-know-who":
INSERT IGNORE INTO table_name (column1, column2) VALUES (value1, value2);

And here's how you use ON DUPLICATE KEY UPDATE (with a no-op):

-- Win the duplicate key battle by doing nothing. Genius, isn't it?
INSERT INTO table_name (column1, column2) VALUES (value1, value2)
ON DUPLICATE KEY UPDATE column1 = column1;

Both methods let the insert go ahead for new rows and leave the existing row untouched when a duplicate key is detected. INSERT IGNORE reports the duplicate as a warning instead of an error; the no-op update reports nothing at all.

Strategies for Warner Bros. (Preventing Duplicate Inserts)

To keep your INSERT operations both fast and reliable, weigh two strategies against the design of your database and the requirements of your application:

Strategy 1: For Lightweights - INSERT IGNORE

INSERT IGNORE takes the gentle approach: the database quietly skips any row that would violate a primary or unique key. Be aware that IGNORE does not stop at duplicates. It downgrades other errors to warnings as well, so a value that is too long for its column, for example, can be truncated and stored instead of aborting the statement. Check the warnings if those cases matter to you.
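To see what INSERT IGNORE is actually swallowing, you can ask for the warnings right after the statement. The sketch below assumes MySQL with strict SQL mode enabled (the default in recent versions); the demo_ignore table and its values are made up for illustration.

```sql
-- Hypothetical scratch table; TEMPORARY keeps it private to this session.
CREATE TEMPORARY TABLE demo_ignore (
    id   INT NOT NULL PRIMARY KEY,
    name VARCHAR(5) NOT NULL
);

INSERT INTO demo_ignore (id, name) VALUES (1, 'Harry');

-- Duplicate primary key: the row is skipped and a warning is recorded.
INSERT IGNORE INTO demo_ignore (id, name) VALUES (1, 'Ron');
SHOW WARNINGS;

-- Not a duplicate at all: the value is too long for VARCHAR(5).
-- A plain INSERT would fail in strict mode; with IGNORE the value is
-- truncated to fit and stored, and only a warning is left behind.
INSERT IGNORE INTO demo_ignore (id, name) VALUES (2, 'Hermione');
SHOW WARNINGS;

-- Inspect what was really stored.
SELECT id, name FROM demo_ignore ORDER BY id;
```

The second case is the one to watch: nothing fails, yet the stored value is no longer what the application sent.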

Strategy 2: For Cautious Optimists - ON DUPLICATE KEY UPDATE

The ON DUPLICATE KEY UPDATE strategy, paired with a harmless self-assignment such as column1 = column1, turns the duplicate case into a no-op: the existing row keeps its current values and no error is raised. Unlike INSERT IGNORE, it handles only the duplicate-key case, so every other error still surfaces as an error.
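One way to confirm that the no-op really did nothing is to read the affected-rows count after each statement. This is a minimal sketch with a made-up demo_noop table; it assumes a default client connection, since a client that sets the CLIENT_FOUND_ROWS flag is documented to report 1 instead of 0 for the unchanged row.

```sql
-- Hypothetical scratch table, private to this session.
CREATE TEMPORARY TABLE demo_noop (
    id   INT NOT NULL PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

-- First attempt: no conflict, so the row is inserted.
INSERT INTO demo_noop (id, name) VALUES (1, 'Harry')
ON DUPLICATE KEY UPDATE name = name;
SELECT ROW_COUNT();   -- 1: a new row was inserted

-- Second attempt: id 1 already exists, so the no-op update runs instead.
INSERT INTO demo_noop (id, name) VALUES (1, 'Ron')
ON DUPLICATE KEY UPDATE name = name;
SELECT ROW_COUNT();   -- 0: the existing row was set to its current values

-- The original row is untouched.
SELECT id, name FROM demo_noop;
```

An application can use that count to tell "inserted" apart from "already there" without a separate SELECT.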

Choose Wisely: You Shall not Pass (Duplicate Keys)

Before using either strategy, make sure your unique keys correctly reflect the uniqueness of your data. Both statements react only to PRIMARY KEY and UNIQUE index violations, so a missing or badly chosen key means duplicates slip through unnoticed. Getting the keys right protects the integrity of your data against user error and application glitches.
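As a rough illustration of keys that match the data, the sketch below defines two hypothetical tables: one where a single column must be unique, and one where only the combination of two columns must be. Table and column names are assumptions, not part of any real schema.

```sql
-- One row per email address: the surrogate id is the primary key,
-- and the UNIQUE index on email is what duplicate detection relies on.
CREATE TABLE subscribers (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email VARCHAR(255) NOT NULL,
    name  VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_subscribers_email (email)
);

-- A composite key treats the pair as unique, not each column on its own:
-- a student may take many courses, but each course only once.
CREATE TABLE course_enrollments (
    student_id INT UNSIGNED NOT NULL,
    course_id  INT UNSIGNED NOT NULL,
    PRIMARY KEY (student_id, course_id)
);
```

Without uq_subscribers_email, an insert into subscribers that omits id would never hit a duplicate key, and neither strategy would have anything to skip.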

Speak Softly, Carry a Big Stick: Handling Errors

INSERT IGNORE raises no error for a duplicate key, but it also turns other errors into warnings. The ON DUPLICATE KEY UPDATE id=id technique skips duplicates just as quietly while leaving every other error intact. The best choice depends on your error-handling approach: if your application never inspects warnings, the no-op update is the safer default.

Handling Bulk Inserts and Performance Impacts

For dealing with high volumes of data, understanding the potential performance differences between the two methods is crucial:

INSERT IGNORE: Light as a Feather

INSERT IGNORE is usually seen as the more frugal choice because it never enters the update path when a duplicate is detected. The saving per row is small, so measure it on your own workload, but across millions of rows small savings can add up.

No-Op UPDATE: Safety First

On the other hand, ON DUPLICATE KEY UPDATE id=id favors reliability over marginal performance gains. A slight drop in speed is often a small price to pay for an operation that still reports real errors.
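For bulk loads, both forms accept multiple rows in a single statement, which typically matters more for throughput than the choice between them. The following is a sketch with a made-up demo_bulk table, again assuming a default client connection for the affected-rows counts.

```sql
-- Hypothetical scratch table, private to this session.
CREATE TEMPORARY TABLE demo_bulk (
    id    INT NOT NULL PRIMARY KEY,
    house VARCHAR(20) NOT NULL
);

INSERT INTO demo_bulk (id, house) VALUES (1, 'Gryffindor');

-- Three rows in one statement; id 1 already exists and is skipped.
INSERT IGNORE INTO demo_bulk (id, house) VALUES
    (1, 'Slytherin'),
    (2, 'Ravenclaw'),
    (3, 'Hufflepuff');
SELECT ROW_COUNT();   -- 2: only the two new rows were inserted

-- The same batch shape with the no-op update; ids 2 and 3 now exist too.
INSERT INTO demo_bulk (id, house) VALUES
    (3, 'Slytherin'),
    (4, 'Slytherin')
ON DUPLICATE KEY UPDATE id = id;
SELECT ROW_COUNT();   -- 1: id 4 inserted, id 3 left as it was
```

Timing both variants against a realistic batch from your own data is the only reliable way to decide whether the difference is worth caring about.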

A Special Note for InnoDB Users: innodb_autoinc_lock_mode

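If you're using InnoDB tables, check the innodb_autoinc_lock_mode setting. It controls how InnoDB locks and allocates auto-increment values, which affects the concurrency of your inserts. Also note that a skipped duplicate, with either INSERT IGNORE or ON DUPLICATE KEY UPDATE, can still consume an auto-increment value and leave gaps in the id sequence.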
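You can read the current setting and watch for gaps yourself. The sketch below uses a made-up demo_autoinc table; whether a gap appears, and how large it is, depends on your server version and lock mode, so treat the comments as what to look for rather than guaranteed output.

```sql
-- 0 = traditional, 1 = consecutive, 2 = interleaved.
SELECT @@innodb_autoinc_lock_mode;

-- Hypothetical scratch table, private to this session.
CREATE TEMPORARY TABLE demo_autoinc (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_demo_autoinc_email (email)
) ENGINE = InnoDB;

INSERT INTO demo_autoinc (email) VALUES ('a@example.com');

-- Skipped as a duplicate, but an id may already have been reserved for it.
INSERT IGNORE INTO demo_autoinc (email) VALUES ('a@example.com');

INSERT INTO demo_autoinc (email) VALUES ('b@example.com');

-- If the skipped row consumed a value, the ids will not be consecutive.
SELECT id, email FROM demo_autoinc ORDER BY id;
```

Gaps are harmless for correctness, but on a table that sees many skipped duplicates they can use up a small integer type faster than the row count suggests.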

Checklist Before Making Your Move

Whichever strategy you choose, a thorough review of its impacts on your existing systems is a must, especially when considering load performance, application-layer error handling, and data uniqueness.