Explain Codes LogoExplain Codes Logo

Convert output of MySQL query to utf8

sql
utf8-encoding
mysql-query
data-integrity
Nikita BarsukovbyNikita Barsukov·Dec 5, 2024
TLDR

To ensure utf8 encoding for MySQL outputs, start the database connection using SET NAMES 'utf8'. Further, modify the table character set to utf8 using ALTER TABLE tablename CONVERT TO CHARACTER SET utf8:

SET NAMES 'utf8'; -- Wrapping things in a utf8 blanket ALTER TABLE tablename CONVERT TO CHARACTER SET utf8; -- utf8 makeover for the table

Utf8-encoding a specific column

At times, utf8 encoding might be required for certain columns in a query which can be achieved using the CONVERT function:

SELECT column1, CONVERT(column2 USING utf8) FROM my_table WHERE my_condition;

This approach is typically useful when the table's default encoding cannot be modified.

Complex encodings

If the direct CONVERT isn't effective due to complex encodings, you can perform a two-step conversion with CAST to BINARY and then converting to utf8:

SELECT column1, CONVERT(CAST(column2 AS BINARY) USING utf8) FROM my_table;

Here, decoding the column to binary prior to utf8 conversion can help resolve certain complex character set issues.

Configuring your environment

Before starting the encoding conversion, ensure your MySQL server supports UTF8. Check the current configuration using SHOW variables LIKE 'character_set_%';.

To better accommodate for unicode characters and emoji, convert your database character set to utf8mb4:

ALTER DATABASE dbname CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

Pitfalls of encoding conversion

Converting encodings can lead to data corruption, particularly when the original encoding isn't backed up. Always back up data before undertaking conversions.

UTF-8 vs. UTF8MB4, CAST vs. CONVERT

  • utf8 in MySQL does not fully support the standard UTF-8 encoding. Always use utf8mb4.
  • In case of binary safety, stick with CAST. For charset conversion, CONVERT is your soldier.

Data integrity maintenance

  • Test and examine your data post-conversion to ensure data integrity.
  • Character sets should be consistent at server, database, table, and column levels.
  • Connection, database, tables, and columns should be set to utf8mb4 to avoid encoding issues.