
How to get UTF-8 working in Java webapps?

java
utf-8
encoding
internationalization
by Anton Shumikhin · Oct 27, 2024
TLDR

To ensure UTF-8 support in Java web apps, set both request and response encodings to UTF-8 using a Filter:

import java.io.IOException;
import javax.servlet.*; // jakarta.servlet.* on Jakarta EE 9+

public class Utf8Filter implements Filter {
    public void init(FilterConfig config) {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        req.setCharacterEncoding("UTF-8"); // must run before any parameter is read
        res.setCharacterEncoding("UTF-8"); // sets the charset without forcing a content type
        chain.doFilter(req, res);
    }

    public void destroy() {}
}

In your web pages, specify UTF-8 encoding with a meta tag:

<meta charset="UTF-8">

Together, the filter and the meta tag make the servlet container decode request bodies as UTF-8 and tell the browser to render pages as UTF-8, which heads off the most common encoding problems.

Digging deeper into UTF-8 implementation

Beyond the immediate fix, full UTF-8 support has to cover every layer that handles text: the servlet container, JSP pages, the JDBC driver, and the database itself.

UTF-8 in Tomcat's server.xml

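In Tomcat's server.xml, set URIEncoding on the connector so that request URIs and query strings are decoded as UTF-8. Tomcat 8 and later default to UTF-8; older versions default to ISO-8859-1.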

<Connector port="8080" URIEncoding="UTF-8" ... />

Setting UTF-8 encoding for JSP pages in web.xml

In web.xml, specify UTF-8 as the encoding for JSP pages.

<jsp-config>
    <jsp-property-group>
        <url-pattern>*.jsp</url-pattern>
        <page-encoding>UTF-8</page-encoding> <!-- UTF-8 wins the encoding pageant -->
    </jsp-property-group>
</jsp-config>

JDBC connections speak UTF-8, too

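In context.xml, pass the encoding to the JDBC driver through the connection pool's connectionProperties. The characterEncoding property shown here is the MySQL Connector/J name; other drivers may use a different one.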

<Resource ... connectionProperties="characterEncoding=UTF-8;" />
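If some code opens connections directly rather than through Tomcat's pool, the same driver property can be passed from Java. This is a minimal sketch, assuming MySQL Connector/J is on the classpath; the class name Utf8Jdbc and its method names are made up for illustration, and the URL and credentials are placeholders supplied by the caller.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper, not part of any library.
class Utf8Jdbc {

    // Builds driver properties; characterEncoding is a MySQL Connector/J property.
    static Properties utf8Properties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    // Opens a connection with the UTF-8 property applied.
    static Connection connect(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, utf8Properties(user, password));
    }
}
```

Keeping the properties in one place means pooled and hand-opened connections stay in agreement about the encoding.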

MySQL: UTF-8 is not an afterthought

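When creating your MySQL database and tables, specify utf8mb4 rather than utf8. In MySQL, utf8 is an alias for utf8mb3, which stores at most three bytes per character and so cannot hold emoji or other characters outside the Basic Multilingual Plane: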

CREATE DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
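To see why the four-byte variant matters, it helps to count how many bytes UTF-8 needs for different characters. The sketch below does that from Java; the class name Utf8ByteLengths is just an illustration, and the characters are written as Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
class Utf8ByteLengths {

    // Number of bytes the UTF-8 encoding of the text occupies.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));            // 1 (ASCII)
        System.out.println(utf8Length("\u00e9"));       // 2 (e with acute accent)
        System.out.println(utf8Length("\u20ac"));       // 3 (euro sign)
        System.out.println(utf8Length("\uD83D\uDE00")); // 4 (emoji U+1F600)
    }
}
```

A character set limited to three bytes per character can hold the first three of these but not the last one.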

Default character set in MySQL server: UTF-8

In your MySQL server configuration (my.cnf or my.ini), set character_set_server to utf8mb4:

[mysqld]
character_set_server=utf8mb4

Consistency is key in MySQL procedures

For UTF-8 consistency, specify the character set while defining MySQL procedures and functions.

CREATE FUNCTION myfunction() RETURNS VARCHAR(100) CHARACTER SET utf8mb4
BEGIN
    -- Function body goes here; the return value is a utf8mb4 string
    RETURN '';
END;

Special cases in string handling

GET requests with non-ASCII characters

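By default, request.setCharacterEncoding only affects the request body, so it does nothing for GET parameters. Those travel in the query string, which Tomcat decodes according to the connector's URIEncoding setting (see the server.xml section above).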
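The effect of the decoding charset is easy to reproduce outside a container. The sketch below uses java.net.URLDecoder as a stand-in for what the container does with a query string (Tomcat has its own decoder, so this only illustrates the outcome); the class name QueryDecodingDemo is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; URLDecoder stands in for the container's decoder.
class QueryDecodingDemo {

    // Percent-decodes a value using the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "caf%C3%A9"; // "café" percent-encoded as UTF-8 by the browser
        String right = decode(raw, "UTF-8");      // four characters, ending in U+00E9
        String wrong = decode(raw, "ISO-8859-1"); // five characters, ending in U+00C3 U+00A9
        System.out.println(right.length() + " vs " + wrong.length()); // 4 vs 5
    }
}
```

The same bytes yield different text depending on the charset, which is why the connector setting has to match what browsers send.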

UTF-8 in Apache+Tomcat+mod_JK settings

When Apache httpd fronts Tomcat through mod_jk, requests reach Tomcat over the AJP connector, so URIEncoding has to be set on that connector too.

In server.xml for Tomcat:

<Connector port="8009" protocol="AJP/1.3" URIEncoding="UTF-8" ... />

In httpd.conf for Apache:

AddDefaultCharset UTF-8

UTF-8: Going the extra mile

Internationalization libraries and UTF-8

Consider a library such as ICU (ICU4J for Java) for richer Unicode and globalization support, such as up-to-date collation rules and locale data.

File I/O with UTF-8 in Java

When performing file I/O, name the charset explicitly rather than relying on the platform default, which can differ from one machine to the next:

// Needs java.io.* and java.nio.charset.StandardCharsets
try (Writer writer = new OutputStreamWriter(
        new FileOutputStream("output.txt"), StandardCharsets.UTF_8)) {
    writer.write("UTF-8 fun facts inside");
}
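Reading needs the same care as writing. Here is a small round-trip sketch using java.nio.file.Files, which takes the charset as an argument on both sides; the class name Utf8FileRoundTrip and the temp file prefix are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class.
class Utf8FileRoundTrip {

    // Writes the text to the file as UTF-8, replacing any existing content.
    static void write(Path file, String text) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    // Reads the first line of the file, decoding it as UTF-8.
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("utf8-demo", ".txt");
        String text = "Gr\u00fc\u00dfe"; // German "Grüße", written with escapes
        write(file, text);
        System.out.println(text.equals(readFirstLine(file))); // true
        Files.delete(file);
    }
}
```

Because both ends name UTF-8, the round trip gives the same result whatever the JVM's default charset happens to be.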

Logs in UTF-8

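Configure your logging backend (e.g., Log4j or Logback; SLF4J is only a facade that delegates to one of these) to write UTF-8, so that log files display and search correctly whatever the server's default charset is.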
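Each logging library has its own charset setting, so check its documentation for the exact name. As a neutral illustration of the idea, the sketch below uses the JDK's built-in java.util.logging rather than Log4j or Logback, and sets the encoding on a handler before anything is written; the class name Utf8LogHandlerDemo is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

// Hypothetical demo class built on java.util.logging, used here as a stand-in.
class Utf8LogHandlerDemo {

    // Publishes one record through a UTF-8 handler and returns the bytes it wrote.
    static byte[] logAsUtf8(String message) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        StreamHandler handler = new StreamHandler(sink, new SimpleFormatter());
        try {
            handler.setEncoding("UTF-8"); // set before the first record is published
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        handler.publish(new LogRecord(Level.INFO, message));
        handler.flush();
        return sink.toByteArray();
    }
}
```

In a real application the sink would be a file or console handler, but the principle is the same: the handler, not the logger, decides how characters become bytes.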

Troubleshooting UTF-8 implementation

Database migration to UTF-8

Be thorough during database migration: data, stored procedures, and backups need proper recoding to UTF-8.
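One problem that often surfaces during migration is text that was stored as UTF-8 bytes but read back as ISO-8859-1, which turns "é" into "Ã©". A minimal repair sketch follows, assuming the damage is exactly that one mis-decoding step; the class name MojibakeRepair is hypothetical. Run it only on values known to be garbled this way, since applying it to text that is already correct will corrupt it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; assumes UTF-8 bytes were mis-decoded as ISO-8859-1 exactly once.
class MojibakeRepair {

    // Recovers the original bytes via ISO-8859-1, then decodes them as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "caf\u00c3\u00a9"; // how "café" looks after the wrong decoding
        System.out.println("caf\u00e9".equals(repair(garbled))); // true
    }
}
```

This works because ISO-8859-1 maps every byte value to exactly one character, so the first step is lossless. Other legacy charsets, such as windows-1252, do not share that property.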

Reverse proxies and UTF-8

If you're using reverse proxies, ensure they don't disrupt encoding. Adjust Nginx or Apache settings to preserve UTF-8.

Email and exports in UTF-8

Set UTF-8 encoding for emails and exports (CSV, Excel, etc.) - because they deserve to be beautiful on the inside too.
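For CSV exports, spreadsheet programs such as Excel commonly rely on a byte order mark to recognise a file as UTF-8. The sketch below writes one ahead of the data. It assumes the values contain no commas, quotes, or line breaks (a real export needs proper CSV quoting), and the class name Utf8CsvExport is made up for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; does no CSV quoting or escaping.
class Utf8CsvExport {

    // Writes the rows as comma-separated lines, preceded by a UTF-8 byte order mark.
    static void writeCsv(OutputStream out, String[][] rows) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write('\uFEFF'); // encoded as the bytes EF BB BF
        for (String[] row : rows) {
            writer.write(String.join(",", row));
            writer.write("\r\n");
        }
        writer.flush(); // the caller owns the stream and closes it
    }
}
```

In a servlet, the stream would typically be the response's output stream, with the response's Content-Type declaring charset=UTF-8 as well.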
