Unicode (UTF-8) reading and writing to files in Python
Dealing with UTF-8 in Python is easy as pie: just use open and don't forget encoding='utf-8'.
Read file:
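Here is a minimal sketch of reading, assuming a hypothetical file called example_read.txt. The snippet creates that file first, purely so it can run on its own.

```python
# Hypothetical sample file, created here only so the example is runnable.
# "\u00e1" is LATIN SMALL LETTER A WITH ACUTE.
with open("example_read.txt", "wb") as f:
    f.write("Capit\u00e1n\n".encode("utf-8"))

# Text mode plus an explicit encoding: the bytes on disk are decoded to str.
with open("example_read.txt", "r", encoding="utf-8") as f:
    text = f.read()

print(len(text))  # number of characters, not bytes
```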
Write file:
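Writing could look roughly like this. The file name example_write.txt is made up for illustration, and the second half only reads the raw bytes back to show what ended up on disk.

```python
# U+00E1 (a with acute) and U+2713 (check mark) are both outside ASCII.
text = "Capit\u00e1n \u2713"

# Text mode plus an explicit encoding: str is encoded to UTF-8 on write.
with open("example_write.txt", "w", encoding="utf-8") as f:
    f.write(text)

# Binary mode returns the encoded bytes exactly as stored.
with open("example_write.txt", "rb") as f:
    raw = f.read()

print(raw)
```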
Remember, 'utf-8' is your best friend when dealing with Unicode characters in files.
Python 2 and 3: Dealing with differences
Using Python 2? io.open is your lifesaver
Python 2's built-in open() has no encoding parameter. Fear not, io.open (available since Python 2.6) is here to rescue:
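A small sketch of the io.open approach, with example_io.txt as a hypothetical file name. The same code also runs on Python 3, where io.open is simply the built-in open.

```python
import io

# io.open accepts encoding= on Python 2.6+ and on Python 3.
with io.open("example_io.txt", "w", encoding="utf-8") as f:
    # The u prefix is required on Python 2 and harmless on Python 3.3+.
    f.write(u"Capit\u00e1n")

with io.open("example_io.txt", "r", encoding="utf-8") as f:
    text = f.read()  # unicode on Python 2, str on Python 3
```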
Python 3 encoding: Simple and elegant
Python 3 saw encoding and said, "I got this, fam!" The built-in open() takes encoding directly:
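For instance, a round trip over a couple of lines might be sketched like this (the file name example_py3.txt is an assumption). If encoding is left out, open() falls back to a platform-dependent default, so spelling it out is the safer habit.

```python
lines = ["na\u00efve\n", "caf\u00e9\n"]

# Built-in open() with an explicit encoding; no extra imports needed.
with open("example_py3.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)

# Iterating over the file object yields decoded lines one at a time.
with open("example_py3.txt", encoding="utf-8") as f:
    read_back = [line for line in f]
```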
codecs: An old friend in need
codecs.open serves as an older alternative to io.open and can do wonders for you:
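A rough sketch with codecs.open, using a hypothetical example_codecs.txt. For new code the built-in open() (or io.open) is generally the better default; this mostly matters when you meet codecs.open in older scripts.

```python
import codecs

# codecs.open wraps the file so writes are encoded and reads are decoded.
with codecs.open("example_codecs.txt", "w", encoding="utf-8") as f:
    f.write(u"Capit\u00e1n")

with codecs.open("example_codecs.txt", "r", encoding="utf-8") as f:
    text = f.read()
```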
One word of caution: mixing read() and readline() could brew a chaotic concoction with codecs.open.
Encounter of the encoding kind
Handling errors: An art
The errors parameter in open might just save your day if encoding/decoding errors arise:
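To make that concrete, here is a sketch that deliberately writes a byte sequence that is not valid UTF-8 and then reads it back under three errors settings. The file name example_errors.txt is made up.

```python
# 0xE1 is "a with acute" in Latin-1, but on its own it is invalid UTF-8.
with open("example_errors.txt", "wb") as f:
    f.write(b"Capit\xe1n")

# errors="strict" is the default: the bad byte raises UnicodeDecodeError.
try:
    with open("example_errors.txt", encoding="utf-8") as f:
        f.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD (the replacement character).
with open("example_errors.txt", encoding="utf-8", errors="replace") as f:
    replaced = f.read()

# errors="ignore" silently drops the undecodable byte.
with open("example_errors.txt", encoding="utf-8", errors="ignore") as f:
    ignored = f.read()
```

Both "replace" and "ignore" lose information, so they suit logging or best-effort display better than data you need to round-trip.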
When bytes bite
If special characters are involved and you want full control over decoding, open the file in binary mode and decode the bytes yourself:
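One way this might look, again with a hypothetical file name (example_bytes.txt):

```python
# Create a small UTF-8 file so the example is self-contained.
with open("example_bytes.txt", "wb") as f:
    f.write("Capit\u00e1n".encode("utf-8"))

# "rb" returns bytes; nothing is decoded behind your back.
with open("example_bytes.txt", "rb") as f:
    raw = f.read()

# Decode explicitly once you know (or have decided on) the encoding.
text = raw.decode("utf-8")

print(len(raw), len(text))  # byte count vs. character count
```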
Python way of escape
It's all about escape
When a file holds plain ASCII text with escape sequences such as \xc3\xa1 standing in for Unicode characters, turning those sequences back into real characters is no less than a magic trick:
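In Python 2.x, the string_escape codec does the job: '\\xc3\\xa1'.decode('string_escape') yields the two bytes \xc3\xa1, which are the UTF-8 encoding of á.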
In Python 3.x:
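A sketch of one possible approach, assuming the text contains literal backslash sequences (the sample strings are invented for illustration). The unicode_escape codec interprets the escapes; when they spell out UTF-8 bytes rather than code points, a Latin-1 round trip recovers those bytes so they can be decoded properly.

```python
# Case 1: the escapes spell out UTF-8 *bytes* (\xc3\xa1 is a with acute).
escaped_bytes = "Capit\\xc3\\xa1n"  # backslash, x, c, 3, ... as plain characters

step = escaped_bytes.encode("ascii").decode("unicode_escape")
# step now holds the code points U+00C3 and U+00A1 (mojibake), so map each
# code point back to a single byte and decode those bytes as UTF-8.
decoded = step.encode("latin-1").decode("utf-8")

# Case 2: the escape names a *code point* directly, so one step is enough.
escaped_codepoint = "Capit\\u00e1n"
from_u = escaped_codepoint.encode("ascii").decode("unicode_escape")
```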
The Great Unicode Divide
Knowledge is key: Understanding Unicode handling in Python 2.x and 3.x can save a day's worth of headache. Python 2 gives you a u prefix for Unicode strings. But in Python 3, every string is a beautiful unicorn... I mean, Unicode.
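A tiny Python 3 illustration of the divide (the sample word is arbitrary): str holds characters, bytes holds their encoded form, and the u prefix is accepted but changes nothing.

```python
s = "caf\u00e9"          # str: a sequence of Unicode code points
b = s.encode("utf-8")    # bytes: the UTF-8 encoding of those code points

print(type(s).__name__, len(s))  # str 4
print(type(b).__name__, len(b))  # bytes 5

# On Python 3.3+ the u prefix is allowed for compatibility and is a no-op.
same = (u"caf\u00e9" == s)
```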