Write to UTF-8 file in Python
Use open
with 'w'
and encoding='utf-8'
:
The above code precisely writes an UTF-8 encoded file.txt
. But what about the byte order mark (BOM) you ask? That's where 'utf-8-sig'
comes to the rescue.
UTF-8 with BOM: friend not foe
To create a UTF-8 file complete with a BOM, use 'utf-8-sig'
:
'utf-8-sig'
quietly adds BOM, no manual labor necessary.
Why stop at the basics?
Inspecting file encoding
Python doesn't include a built-in tool for detecting file encoding, but you can run external commands in a pinch:
External commands: Where there's a subprocess
, there's a way.
Unicode in disguise
To add the BOM manually, go for:
Or, better yet, you can summon it by name:
Declaring script encoding: because manners matter
Start your Python script with an encoding declaration to ensure UTF-8 handling without a hitch:
No non-ASCII character left behind!
Remember to clean up
Python's context managers like with
close files for you, but when using codecs.open
or file
, remember to close()
:
Your OS will thank you for not leaving file descriptors hanging.
Venturing into special cases
Keeping it simple: UTF-8 without BOM
A BOM can sometimes shake things up, leading to problems. Keep it simple with encoding='utf-8'
:
UnicodeDecodeError: Not on my watch
Occasionally a UnicodeDecodeError can sneak by, often when bytes and strings are mistaken for each other. Make sure your input encoding matches your output.
Writing exotic characters: Tame the beast
Python isn't fazed by unusual characters. For the truly exotic, use Unicode escape sequences or named characters:
Was this article helpful?