How to check if a string in Python is in ASCII?

by Anton Shumikhin · Jan 11, 2025
TLDR

To check whether a string is ASCII in Python 3.7 and above, use the str.isascii() method:

print("ASCII" if my_string.isascii() else "Not ASCII")

In earlier versions of Python, ASCII validation can be done using encoding:

print("ASCII" if my_string.encode('ascii', 'ignore') == my_string.encode() else "Not ASCII")

Select .isascii() for ease or encode() for extended compatibility.
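As a minimal sketch, the two options can be wrapped in one helper that prefers isascii() and falls back to encoding on older interpreters. The name is_ascii is a hypothetical helper chosen for illustration, not part of the standard library.

```python
def is_ascii(text):
    """Hypothetical helper: True if every character in text is ASCII."""
    if hasattr(text, "isascii"):
        # str.isascii() is available on Python 3.7 and later.
        return text.isascii()
    # Fallback for older Python 3: encoding raises on any non-ASCII character.
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(is_ascii("hello"))  # True
print(is_ascii("héllo"))  # False
print(is_ascii(""))       # True: an empty string has no non-ASCII characters
```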

Probing into Python strings

When working with strings in Python, it's essential to understand that a str carries no flag marking it as ASCII or UTF-8. In Python 3, every string is a sequence of Unicode code points; an encoding only comes into play when the text is converted to or from bytes.

One way to check whether a string contains only ASCII characters is the built-in str.isascii() method, available in Python 3.7 and onwards.

Working with decode() and exceptions

In Python 3, a str has no decode() method; decode() belongs to bytes. To test a str, try to encode it as ASCII and catch the error. To test bytes, decode them as ASCII and catch UnicodeDecodeError instead.

try:
    my_string.encode('ascii')
    print("ASCII")
except UnicodeEncodeError:
    print("Not ASCII")

Here, we are trying to encode the string as ASCII. If it fails, it's not ASCII - simple!
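In Python 3, decode() is a method of bytes rather than str, so the decoding variant of this check applies to raw bytes, for example data read from a file or a socket. The sketch below assumes a hypothetical helper named bytes_are_ascii.

```python
def bytes_are_ascii(data):
    """Hypothetical helper: True if the bytes decode cleanly as ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


raw_ok = "hello".encode("utf-8")
raw_bad = "h\u00e9llo".encode("utf-8")  # 'héllo', where é becomes two bytes

print(bytes_are_ascii(raw_ok))   # True
print(bytes_are_ascii(raw_bad))  # False
```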

ord() function

The ord() function is handy to check whether individual characters fall within the ASCII range. But, hold your horses! On its own it only looks at one character at a time, so to check an entire string you need to combine it with all():

is_ascii = all(ord(c) < 128 for c in my_string)

This approach will check whether each character in your string belongs to the ASCII family.

Digging deeper - understanding your string's origin

Each string has a life story to tell. It may have originated from a file, from a user's keyboard input, or even from some data fetched from a website. Understanding the source of your string helps shine some light on how it's encoded. Let's go Sherlock Holmes on this string!

Additional methods and edge cases

Nonetheless, it's always worth knowing more than one way to solve a problem. So, let's explore some more approaches to check if a string is ASCII.

In the world of encoding

is_ascii = len(my_string) == len(my_string.encode())

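Here, we're trying to see if encoding the string to UTF-8 changes its length. Every ASCII character takes exactly one byte in UTF-8, while any other character takes two to four, so the lengths match only for pure ASCII. It's like asking if the string gained some weight after a heavy meal!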
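To make the length comparison concrete, here is a small sketch. The function name same_length_in_utf8 is a hypothetical helper, and the non-ASCII samples are written with escape sequences so the exact code points are unambiguous.

```python
def same_length_in_utf8(text):
    """Hypothetical helper: True if the UTF-8 encoding adds no extra bytes."""
    return len(text) == len(text.encode("utf-8"))


plain = "cafe"
accented = "caf\u00e9"  # 'café' with a precomposed é

print(len(plain), len(plain.encode("utf-8")))        # 4 4
print(len(accented), len(accented.encode("utf-8")))  # 4 5
print(same_length_in_utf8(plain))                    # True
print(same_length_in_utf8(accented))                 # False
```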

Diving in another direction - looking at Unicode

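Wait a minute! What's that I see? encode() with 'ignore' as an argument? That's right: with this error handler, encode() silently drops every character it cannot represent in ASCII instead of raising an error. If decoding the result gives back the original string, nothing was dropped, so the string was ASCII all along.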

ascii_string = my_string.encode('ascii', 'ignore')
is_ascii = ascii_string.decode('ascii') == my_string
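A short sketch of what the error handlers do to a non-ASCII string may help here. Besides 'ignore', it shows 'replace' and 'backslashreplace', two other standard handlers that the text above does not use; they are included only for comparison.

```python
text = "caf\u00e9"  # 'café' with a precomposed é

ignored = text.encode("ascii", "ignore")            # drops the é
replaced = text.encode("ascii", "replace")          # substitutes '?'
escaped = text.encode("ascii", "backslashreplace")  # writes an escape sequence

print(ignored)   # b'caf'
print(replaced)  # b'caf?'
print(escaped)   # b'caf\\xe9'

# The round trip only matches the original when nothing was dropped.
print(ignored.decode("ascii") == text)  # False
```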

ord() revisited

Going back to ord(c) < 128: the same test can also pinpoint the first character that is not ASCII.

for c in my_string:
    if ord(c) >= 128:
        print(f"Non-ASCII character found: {c}")  # Who's that non-ASCII Pokémon?
        break
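If you want every offender rather than just the first, a variation on the loop can collect them with their positions. This is a sketch; find_non_ascii is a hypothetical name.

```python
def find_non_ascii(text):
    """Hypothetical helper: (index, character, code point) per non-ASCII character."""
    return [
        (i, c, f"U+{ord(c):04X}")
        for i, c in enumerate(text)
        if ord(c) >= 128
    ]


sample = "na\u00efve caf\u00e9"  # 'naïve café' with precomposed characters
print(find_non_ascii(sample))
# [(2, 'ï', 'U+00EF'), (9, 'é', 'U+00E9')]
```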

Advanced concept: normalizing Unicode

Normalization ensures that Unicode strings that look the same are represented by the same sequence of code points, and therefore encode to the same bytes. This can be useful when checking for ASCII strings:

import unicodedata
normalized_string = unicodedata.normalize('NFD', my_string)
is_ascii = normalized_string.isascii()

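Here, we're normalizing the string to 'NFD', which splits each composed character into its canonical parts. A few characters, such as the Kelvin sign (U+212A), decompose to plain ASCII. An accented letter like é, however, becomes e followed by a combining accent (U+0301), which is still non-ASCII, so the check stays False until those combining marks are removed.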
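The sketch below shows what NFD does to an accented character and one possible way to get an ASCII result: dropping the combining marks after decomposition. The helper name strip_accents is hypothetical, and discarding accents changes the text, so treat it as an illustration rather than a recommended default.

```python
import unicodedata


def strip_accents(text):
    """Hypothetical helper: decompose with NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


word = "caf\u00e9"  # 'café' with a precomposed é
decomposed = unicodedata.normalize("NFD", word)

print(len(word), len(decomposed))     # 4 5
print(decomposed.isascii())           # False: U+0301 is still present
print(strip_accents(word))            # cafe
print(strip_accents(word).isascii())  # True
```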