How do I check if a string is Unicode or ASCII?

by Nikita Barsukov · Feb 11, 2025
TLDR

To check whether a string contains only ASCII characters in Python 3.7+, use the str.isascii() method:

    def is_ascii(s):
        return s.isascii()

    # Usage
    print(is_ascii("ASCII"))    # Prints "True", like a well-behaved function
    print(is_ascii("Únícodę"))  # Prints "False", it's a Unicode imposter!

This built-in method is fast, clean, and direct - returning True for pure ASCII and False otherwise.
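
If you are stuck on an interpreter older than 3.7, str.isascii() is not available. The sketch below is one possible fallback, not the only one; the name is_ascii_compat is made up for this example. It tries the built-in first and otherwise checks every code point by hand:

```python
def is_ascii_compat(s):
    """Return True if every character in s is in the ASCII range (0-127)."""
    try:
        return s.isascii()  # Available on str since Python 3.7
    except AttributeError:
        # Fallback for older interpreters: inspect each code point directly
        return all(ord(ch) < 128 for ch in s)


print(is_ascii_compat("ASCII"))    # True
print(is_ascii_compat("Únícodę"))  # False
```

An empty string counts as ASCII in both branches, which matches the behaviour of str.isascii().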

Checking string types

From Python 3 onwards, all strings are Unicode, but they might contain only ASCII characters or a mix of ASCII and non-ASCII characters. isinstance() can be your knight in shining armor for type checking.

Spotting a Unicode string

With the help of isinstance(obj, str), you can find the true identity of a Unicode string in disguise:

    def is_unicode_string(obj):
        return isinstance(obj, str)  # The name's String, Unicode String

Catching a bytestring

Bytestrings resemble chameleons, storing ASCII, encoded Unicode, or non-textual data. To catch them red-handed, add isinstance(obj, bytes) to your toolkit. Note that this check matches bytes only; the mutable bytearray is a separate type.

    def is_bytestring(obj):
        return isinstance(obj, bytes)  # It ain't heavy, it's my bytestring
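
A bytearray slips past an isinstance(obj, bytes) check, because bytearray is not a subclass of bytes. If you want to net both, one option is to pass a tuple of types. The helper name is_bytes_like is hypothetical, chosen just for this sketch:

```python
def is_bytes_like(obj):
    """Return True for both immutable bytes and mutable bytearray objects."""
    return isinstance(obj, (bytes, bytearray))


print(isinstance(bytearray(b"abc"), bytes))  # False: bytearray is its own type
print(is_bytes_like(bytearray(b"abc")))      # True
print(is_bytes_like("abc"))                  # False: str is text, not bytes
```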

Deciphering bytestrings

Use .decode() for Turing-like decoding. If the bytestring is a UTF-8 or ASCII agent, decoding succeeds; if not, .decode() raises UnicodeDecodeError, which we catch:

    def decode_bytestring(byte_data, encoding='utf-8'):
        try:
            return byte_data.decode(encoding)  # Surrender your secrets, bytestring!
        except UnicodeDecodeError:
            return None  # Or swallow your secrets if it's indigestible
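
To see the decoding trick in action, here is a small self-contained sketch. The function try_decode mirrors the idea above under a different (made-up) name so the block runs on its own, and the sample byte values are illustrative:

```python
def try_decode(byte_data, encoding="utf-8"):
    """Return the decoded text, or None if the bytes are invalid for encoding."""
    try:
        return byte_data.decode(encoding)
    except UnicodeDecodeError:
        return None


valid_utf8 = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'
invalid_utf8 = b"\xff\xfe"                # 0xff never appears in valid UTF-8

print(try_decode(valid_utf8))            # café
print(try_decode(invalid_utf8))          # None
print(try_decode(valid_utf8, "ascii"))   # None: 0xc3 is outside ASCII
print(valid_utf8.isascii())              # False (bytes.isascii(), Python 3.7+)
```

A successful decode only tells you the bytes are valid in that encoding, not that the sender actually used it, so treat the result as a plausibility check.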

Dealing with Python 2 & 3

Though Python 2 has retired, it occasionally troubles us. Legacy code is still in the wild, so accurate checks need to account for both versions.

Checking strings in Python 2

In the world of Python 2, we had the byte-based str and the Unicode-aware unicode. Both inherit from basestring, so to check for any string or string-like being, call isinstance(x, basestring):

    # Python 2 only: basestring does not exist in Python 3
    def is_string_like_in_python2(x):
        return isinstance(x, basestring)  # Are you string disguised as basestring?
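
Since basestring vanished in Python 3, calling the function above there raises NameError. One common workaround, sketched here with the illustrative names string_types and is_string_like, is to probe for basestring once and fall back to str:

```python
try:
    string_types = (basestring,)  # Python 2: covers both str and unicode
except NameError:
    string_types = (str,)         # Python 3: basestring no longer exists


def is_string_like(x):
    """Return True if x is a text string on the running interpreter."""
    return isinstance(x, string_types)


print(is_string_like("text"))  # True on both versions
print(is_string_like(42))      # False on both versions
```

Keep in mind that on Python 3 a bytes object is not string-like under this check, whereas on Python 2 a plain byte str is.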

Juggling between Python 2 and 3

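A wise Pythonist once said, "In the face of ambiguity, use a try-except block." It's a golden rule, especially when dealing with encodings. In Python 3, str has no .decode() method, so an AttributeError tells you the data is already text: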

    byte_data = b"caf\xc3\xa9"  # Example input; may be bytes or already text

    try:
        unicode_data = byte_data.decode('utf-8')  # Dance, my bytes, as Unicode now!
    except AttributeError:
        unicode_data = byte_data  # When bytes refuse to dance, accept the reality
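
If catching AttributeError feels too broad (it would also hide a genuinely wrong object, such as None), an explicit type check is an alternative. This is a minimal sketch, and to_text is a name invented for the example:

```python
def to_text(data, encoding="utf-8"):
    """Decode bytes or bytearray to text; return anything else unchanged."""
    if isinstance(data, (bytes, bytearray)):
        return data.decode(encoding)
    return data


print(to_text(b"abc"))                      # abc
print(to_text("already text"))              # already text
print(to_text("h\u00e9llo".encode("utf-8")))  # héllo
```

Invalid input still raises UnicodeDecodeError here, which is usually what you want: a loud failure beats silently corrupted text.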

When to transform bytes to Unicode?

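Transforming between bytes and Unicode is like handling radioactive material. Bytes sometimes wear the mask of encoded text, but they can just as well be images or other binary data, so don't decode everything on sight. Decode only when you know the data is text and you actually need to treat it as text. Here's how: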

    if is_bytestring(data):
        text = decode_bytestring(data)  # Only delegate decoding when it's showtime!
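
One way to keep that discipline is to decode once where bytes enter your program and encode once where text leaves it. The sketch below uses io.BytesIO as a stand-in for a file or socket; read_text and write_text are hypothetical helper names:

```python
import io


def read_text(binary_stream, encoding="utf-8"):
    """Read raw bytes from a binary stream and decode them once, at the boundary."""
    raw = binary_stream.read()
    return raw.decode(encoding)


def write_text(binary_stream, text, encoding="utf-8"):
    """Encode text once, right before it leaves the program."""
    binary_stream.write(text.encode(encoding))


buffer = io.BytesIO()  # Stand-in for a file opened in binary mode
write_text(buffer, "na\u00efve caf\u00e9")
buffer.seek(0)
print(read_text(buffer))  # naïve café
```

Everything between those two calls can then work with plain str and never worry about encodings.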

String encoding: the Pythonic approach

Python's elegance lies in its simplicity. Instead of walking on a tightrope to discern between string types, lean on Python's built-in capabilities (encode() and decode()) to handle encodings.

Encoding with .encode()

The .encode() method turns a gentle string into fierce bytes when needed:

    unicode_string = "Hello, Unicode!"
    byte_string = unicode_string.encode('utf-8')  # Go bytes! Carry the Unicode text as UTF-8

Decoding with .decode()

Similarly, .decode() kindly morphs a bytestring back into a peaceful Unicode string:

    recovered_string = byte_string.decode('utf-8')  # Bytes shed their cloak and become text again
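
Both methods accept an errors argument that decides what happens when a character does not fit the target encoding. As a small illustration (the sample text is made up), here is how the standard handlers behave when squeezing non-ASCII text into ASCII:

```python
text = "na\u00efve caf\u00e9"  # "naïve café"

try:
    text.encode("ascii")  # Default errors="strict" raises on non-ASCII
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

replaced = text.encode("ascii", errors="replace")          # b'na?ve caf?'
ignored = text.encode("ascii", errors="ignore")            # b'nave caf'
escaped = text.encode("ascii", errors="backslashreplace")  # b'na\\xefve caf\\xe9'

print(strict_failed, replaced, ignored, escaped)
```

Only "strict" guarantees a lossless round trip; "replace" and "ignore" throw information away for good.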

Avoid the type trap

Avoid the type(x) == y comparison. It's a sneaky trap: it ignores subclasses, and the string type names themselves differ between Python 2 and 3. Instead, rely on isinstance() checks.
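
A quick demonstration of the subclass problem. Label is a hypothetical str subclass that exists only for this example:

```python
class Label(str):
    """A made-up str subclass, used only for illustration."""


label = Label("hello")

print(type(label) == str)      # False: the exact type is Label, not str
print(isinstance(label, str))  # True: Label is still a string
print(label.isascii())         # True: it behaves like any other str
```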