How to change the CHARACTER SET (and COLLATION) throughout a database?
To promptly convert your entire database to a new CHARACTER SET and COLLATION, here's a handy script using ALTER TABLE commands. This one changes everything to utf8mb4
and utf8mb4_unicode_ci
:
Output of the command above needs to be executed on your SQL client. Don't forget to replace 'your_db_name'
with the name of your database. Like every film hero would say, "Safety first". Backup your database before diving into the process.
Primer on CHARACTER SET and COLLATION
So you're getting cozy with utf8mb4
and utf8mb4_0900_ai_ci
. The utf8mb4
CHARACTER SET takes UTF-8 to its fullest potential, even including emojis and a wide array of Asian languages. It's effectively a power-up, a Super Mario Mushroom if you will, for the older utf8
set, capped at a three-byte Unicode subset.
With respect to COLLATION, utf8mb4_0900_ai_ci
implements accent-insensitive and case-insensitive sorting. It's the referee that decides sorting and comparison rules for your text data, and varies from its utf8_general_ci
and utf8_unicode_ci
precursors.
And like ordering a quadruple cheeseburger, getting more means consuming more - data size requirements could increase with utf8mb4
. Don't forget to reevaluate and adjust your table column lengths before proceeding.
Suit Up for the Migration
Batch tackling tables and columns
An ensemble of ALTER TABLE MODIFY
commands is needed for each column that'll change. Flex your scripting skills with Perl, or a batch SQL script, to create these commands. And here's a pro tip: bump up the group_concat_max_len
session variable to handle lengthy ALTER statements, kind of like loosening your belt before Thanksgiving dinner.
System schemas and default values
Remember the 'sys' and 'mysql' schemas? The lifeblood of MySQL functionalities, they're best left unaltered in this process. And while you're at it, make sure default values are properly escaped to avoid accidental changes, kind of like making sure your coffee doesn't spill on your laptop.
Enum and set data types
Say hello to our special guests: enum and set data types. They tend to demand explicit redefinition during the collation change, like a celebrity with a lengthy rider.
Exclude views, prevent corruption
When changing tables, views must be excluded. Think of them as mirrors reflecting the underlying tables, altering them directly might just shatter the reflection. Once the tables are updated, the changes ripple to the views too.
Before and after the collation changes, ensure you maintain the purity of your data. Consider storing a representation of your data in memory or someplace safe for detailed comparison post-migration. It's like a before and after photo in a weight loss ad.
From Latin1 to UTF-8
If you're moving from latin1
to utf8mb4
, brace yourself for a significant character support upgrade. Though not foolproof, there's potential for data loss. As always, secure a backup before proceeding.
Was this article helpful?