Write to UTF-8 file in Python
Use open with 'w' and encoding='utf-8':
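A minimal sketch of that call follows. The temporary directory and the sample text are assumptions made for illustration, so the example leaves nothing behind in your working directory.

```python
import os
import tempfile

# Assumption: write into a throwaway directory; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "file.txt")

# 'w' opens the file for writing text; encoding='utf-8' controls how
# the characters are turned into bytes on disk.
with open(path, "w", encoding="utf-8") as f:
    f.write("Grüße, 世界!")

# Read the raw bytes back to see what was actually stored.
with open(path, "rb") as f:
    raw = f.read()

print(raw)
```

Passing `encoding` explicitly matters because the default depends on the platform and locale when it is left out.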
The code above writes file.txt as UTF-8, without a byte order mark (BOM). But what about the BOM, you ask? That's where 'utf-8-sig' comes to the rescue.
UTF-8 with BOM: friend not foe
To create a UTF-8 file complete with a BOM, use 'utf-8-sig':
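Here is a small sketch, assuming a hypothetical file name in a temporary directory. It writes with 'utf-8-sig' and then checks the raw bytes against `codecs.BOM_UTF8` to confirm the BOM is there.

```python
import codecs
import os
import tempfile

# Assumption: hypothetical file name inside a throwaway directory.
bom_path = os.path.join(tempfile.mkdtemp(), "file_with_bom.txt")

# 'utf-8-sig' prepends the three-byte UTF-8 BOM on the first write.
with open(bom_path, "w", encoding="utf-8-sig") as f:
    f.write("Hello, BOM!")

with open(bom_path, "rb") as f:
    raw_with_bom = f.read()

# Reading with 'utf-8-sig' strips the BOM again.
with open(bom_path, "r", encoding="utf-8-sig") as f:
    text = f.read()

print(raw_with_bom.startswith(codecs.BOM_UTF8), text)
```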
'utf-8-sig' quietly adds the BOM when writing and strips it when reading, no manual labor necessary.
Why stop at the basics?
Inspecting file encoding
Python doesn't include a built-in tool for detecting file encoding, but you can run external commands in a pinch:
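One way to do this, sketched below, is to call the Unix `file` utility through `subprocess`. This assumes `file` is installed and supports `--mime-encoding`. The helper name `guess_encoding` is made up for this example, and the result is only the tool's guess, not a guarantee.

```python
import os
import shutil
import subprocess
import tempfile


def guess_encoding(path):
    """Return the `file` utility's encoding guess, or None if unavailable."""
    # Assumption: the external `file` command exists on this system.
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", "--mime-encoding", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


# Hypothetical sample file to inspect.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("naïve café")

guess = guess_encoding(sample_path)
print(guess)
```

Returning `None` when the command is missing keeps the script usable on systems such as a stock Windows install, where `file` is usually absent.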
External commands: Where there's a subprocess, there's a way.
Unicode in disguise
To add the BOM manually, go for:
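A sketch of the manual route: write the character U+FEFF first, then the content, using plain 'utf-8'. The file name is a placeholder chosen for this example.

```python
import os
import tempfile

# Assumption: hypothetical file name inside a throwaway directory.
manual_path = os.path.join(tempfile.mkdtemp(), "manual_bom.txt")

with open(manual_path, "w", encoding="utf-8") as f:
    f.write("\ufeff")        # U+FEFF encodes to the bytes EF BB BF in UTF-8
    f.write("content")

with open(manual_path, "rb") as f:
    manual_raw = f.read()

print(manual_raw[:3])
```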
Or, better yet, you can summon it by name:
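The sketch below uses the formal Unicode name of U+FEFF, `ZERO WIDTH NO-BREAK SPACE`, in a `\N{...}` escape. The constant name `BOM` is a choice made for this example only.

```python
import codecs

# The formal Unicode name of U+FEFF; as the first character of a
# UTF-8 file it acts as the byte order mark.
BOM = "\N{ZERO WIDTH NO-BREAK SPACE}"

payload = BOM + "content"
encoded = payload.encode("utf-8")

print(BOM == "\ufeff", encoded.startswith(codecs.BOM_UTF8))
```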
Declaring script encoding: because manners matter
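Start your Python script with an encoding declaration so the interpreter reads the source file itself as UTF-8. Python 3 already assumes UTF-8 for source files, so the declaration mainly matters for Python 2 and for editors that honor it: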
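A sketch of such a script is shown here, with a made-up greeting as the non-ASCII content. The declaration is only recognized on the first or second line of the file.

```python
# -*- coding: utf-8 -*-
# The declaration above describes how this source file is encoded;
# it does not change the encoding used by open().

greeting = "¡Hola, señor!"
print(greeting)
```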
No non-ASCII character left behind!
Remember to clean up
The with statement closes files for you. If you call open or codecs.open without it (or use the old Python 2 file built-in), remember to call close():
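A sketch of manual cleanup with the built-in `open` follows. The `try`/`finally` wrapper is added here so the file is closed even if the write fails. The file name is a placeholder.

```python
import os
import tempfile

# Assumption: hypothetical file name inside a throwaway directory.
manual_close_path = os.path.join(tempfile.mkdtemp(), "manual_close.txt")

f = open(manual_close_path, "w", encoding="utf-8")
try:
    f.write("closed by hand")
finally:
    # Runs whether or not write() raised, so the descriptor is released.
    f.close()

print(f.closed)
```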
Your OS will thank you for not leaving file descriptors hanging.
Venturing into special cases
Keeping it simple: UTF-8 without BOM
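A BOM can sometimes cause trouble: tools that don't expect it, such as a Unix shell reading a shebang line, treat those three bytes as content. Keep it simple with encoding='utf-8', which writes no BOM: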
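The sketch below writes a shebang line with plain 'utf-8' and checks that the file really starts with `#!` and not with a BOM. The script name and contents are assumptions for illustration.

```python
import codecs
import os
import tempfile

# Assumption: hypothetical shell script inside a throwaway directory.
script_path = os.path.join(tempfile.mkdtemp(), "run.sh")

with open(script_path, "w", encoding="utf-8") as f:
    f.write("#!/bin/sh")

with open(script_path, "rb") as f:
    first_bytes = f.read(3)

# Plain 'utf-8' never emits a BOM, so the shebang stays at byte 0.
has_bom = first_bytes == codecs.BOM_UTF8
print(has_bom, first_bytes)
```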
UnicodeDecodeError: Not on my watch
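A UnicodeDecodeError appears when bytes are decoded with an encoding that doesn't match the data, and in older code it often shows up when bytes and strings are mistaken for each other. Make sure the encoding you read with matches the encoding the file was written in.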
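To make the mismatch concrete, this sketch writes a file as Latin-1, fails to read it as UTF-8, then reads it correctly. The file name and the sample word are assumptions for illustration.

```python
import os
import tempfile

# Assumption: hypothetical legacy file written in Latin-1.
legacy_path = os.path.join(tempfile.mkdtemp(), "legacy.txt")

with open(legacy_path, "w", encoding="latin-1") as f:
    f.write("café")  # 'é' becomes the single byte 0xE9

error = None
try:
    # 0xE9 starts a multi-byte sequence in UTF-8, but nothing follows it.
    with open(legacy_path, "r", encoding="utf-8") as f:
        f.read()
except UnicodeDecodeError as exc:
    error = exc

# Reading with the encoding the file was written in works.
with open(legacy_path, "r", encoding="latin-1") as f:
    recovered = f.read()

print(type(error).__name__, recovered)
```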
Writing exotic characters: Tame the beast
Python isn't fazed by unusual characters. For the truly exotic, use Unicode escape sequences or named characters:
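A short sketch of the escape forms follows: `\u` for four hex digits, `\U` for eight, and `\N{...}` for a character's Unicode name. The characters and the file name were picked arbitrarily for this example.

```python
import os
import tempfile

snowman = "\u2603"            # four-digit escape: SNOWMAN
snake = "\U0001F40D"          # eight-digit escape: SNAKE
named_snowman = "\N{SNOWMAN}"  # the same snowman, looked up by name

# Assumption: hypothetical file name inside a throwaway directory.
exotic_path = os.path.join(tempfile.mkdtemp(), "exotic.txt")

with open(exotic_path, "w", encoding="utf-8") as f:
    f.write(snowman + snake)

with open(exotic_path, "r", encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == snowman + snake)
```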