UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)
To combat a UnicodeEncodeError caused by a character like u'\xa0' or others, call your_string.encode('utf-8') to encode the string explicitly instead of relying on the default ASCII codec. This turns the Unicode text into a byte string in UTF-8, which can represent every Unicode character, whereas ASCII covers only code points 0 to 127.
Example:
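A minimal sketch of the idea, written so it runs on Python 3 (the u'' prefix is kept only to match the error message above). The sample text in your_string is hypothetical; any string containing a non-breaking space behaves the same way.

```python
# Hypothetical sample text containing a non-breaking space (U+00A0).
your_string = u"Price:\xa0100 EUR"

# Encoding to ASCII reproduces the kind of error shown in the title.
ascii_error = None
try:
    your_string.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = str(exc)

# Encoding to UTF-8 succeeds: U+00A0 becomes the two bytes 0xC2 0xA0.
encoded = your_string.encode("utf-8")

print(ascii_error)
print(repr(encoded))
```

Decoding encoded with 'utf-8' gives back the original text, so nothing is lost in the conversion.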
This solution converts your_string, with its menagerie of non-ASCII characters, to bytes without raising the error.
Spotting and Squashing Escape Characters
A prevalent instigator of UnicodeEncodeError in Python is the non-breaking space character u'\xa0'. This character is a common stowaway when copying text from web pages or word processors. Not your usual space, it's designed to prevent a line break at its position.
It's best either to replace these non-breaking spaces with regular spaces before encoding, or to use an encoding such as UTF-8 that can represent the character.
Example:
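Both options can be sketched side by side. The input text is a hypothetical stand-in for something pasted from a web page.

```python
# Hypothetical input containing a non-breaking space between the words.
text = u"Hello\xa0world"

# Option 1: swap the non-breaking space for a regular space,
# after which even a strict ASCII encode succeeds.
cleaned = text.replace(u"\xa0", u" ")
ascii_bytes = cleaned.encode("ascii")

# Option 2: keep the character and use an encoding that can represent it.
utf8_bytes = text.encode("utf-8")

print(repr(ascii_bytes))
print(repr(utf8_bytes))
```

Option 1 changes the text (the no-break behaviour is gone), so it fits best when the non-breaking space was never intended in the first place.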
Locale Tyranny and Encoding Usurpers
An often overlooked factor in encoding issues is your locale settings and your Python environment's default encoding. On some Unix-like systems the locale defaults to "C" or "POSIX", which implies an ASCII character set rather than UTF-8. You can uncover this by running the locale command in a shell, and rectify it by switching to a UTF-8 locale that is installed on the system, for example by exporting LC_ALL=en_US.UTF-8 or LANG=en_US.UTF-8.
For scripts that have to run on a variety of systems, always make sure that you're either configuring the environment correctly, or defining the encoding explicitly within your Python code.
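The same checks can be made from inside Python. The sketch below reports what the interpreter sees and then writes a file with the encoding stated explicitly, so the result does not depend on the locale. The helper name describe_encoding_environment and the temporary file name are made up for illustration.

```python
import io
import locale
import os
import sys
import tempfile


def describe_encoding_environment():
    """Collect the settings that usually decide the default encoding."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


for name, value in sorted(describe_encoding_environment().items()):
    print("%s = %r" % (name, value))

# Defining the encoding explicitly: io.open takes an encoding argument,
# so this write and read do not depend on the locale.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"Price:\xa0100 EUR")
with io.open(path, "r", encoding="utf-8") as handle:
    round_tripped = handle.read()
```

If preferred_encoding reports something like ANSI_X3.4-1968 or US-ASCII, the process is most likely running under a "C" or "POSIX" locale.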
Your Arsenal for Python Unicode Handling
In addition to encoding to UTF-8, a simple way to avoid the UnicodeEncodeError is to avoid calling str() on Unicode text in Python 2, where it implicitly encodes with the ASCII codec. Don't mix byte strings and Unicode strings unintentionally. Deal with Unicode strings consistently throughout your application and only encode at the boundaries, when you have to.
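One way to picture "only encode when you have to" is to decode bytes as soon as they arrive, work on Unicode text in the middle, and encode again only on the way out. The function name shout and its sample input are hypothetical; this is a sketch of the pattern, not a prescribed API.

```python
def shout(raw_bytes, encoding="utf-8"):
    """Decode at the input boundary, process as text, encode at the output."""
    text = raw_bytes.decode(encoding)   # bytes -> Unicode text
    text = text.upper()                 # all processing happens on text
    return text.encode(encoding)        # Unicode text -> bytes


# "café au lait" as UTF-8 bytes, e.g. as read from a socket or a binary file.
incoming = b"caf\xc3\xa9 au lait"
outgoing = shout(incoming)
print(repr(outgoing))
```

Because the uppercase step sees real characters rather than raw bytes, the accented letter is handled as one unit instead of two unrelated bytes.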
There also exist encoding dark arts like your_string.encode('ascii', 'ignore') which, while tempting, lead to data loss. It's worth underlining that although these remedies make the error go away, they do so by silently stripping characters from your text, some of which may be vital.
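To see what is lost, compare 'ignore' with the other standard error handlers that str.encode accepts. The sample text is hypothetical.

```python
# "naïve text" with a non-breaking space between the two words.
text = u"na\xefve\xa0text"

# 'ignore' silently drops every character ASCII cannot represent.
ignored = text.encode("ascii", "ignore")

# 'replace' substitutes a question mark, so the loss is at least visible.
replaced = text.encode("ascii", "replace")

# These two keep the information in an escaped, ASCII-only form.
escaped = text.encode("ascii", "backslashreplace")
xml_refs = text.encode("ascii", "xmlcharrefreplace")

for result in (ignored, replaced, escaped, xml_refs):
    print(repr(result))
```

With 'ignore' the two words run together and the accent disappears without a trace, which is why it is best kept for cases where the dropped characters truly do not matter.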
Take careful consideration of multilingual content and internationalization; mishaps in encoding can create substantial headaches for users in different locales.