How to check if a string in Python is in ASCII?
To validate ASCII in Python 3.7 and above, use the str.isascii() method.
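A minimal sketch of str.isascii() on a few sample strings (the sample values are made up for illustration). Note that the method also returns True for the empty string.

```python
# str.isascii() (Python 3.7+) returns True if every character is in U+0000..U+007F,
# and it also returns True for the empty string.
samples = ["hello", "na\u00efve", "123 abc!", ""]  # "\u00ef" is 'ï'

results = {s: s.isascii() for s in samples}
for s, ok in results.items():
    print(repr(s), ok)
```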
In earlier versions of Python, you can validate ASCII by trying to encode the string and catching the error.
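A sketch of the encoding approach, assuming Python 3 before 3.7 (Python 2's str semantics differ). The helper name is_ascii is hypothetical: str.encode('ascii') raises UnicodeEncodeError as soon as it meets a character it cannot represent.

```python
def is_ascii(s):
    """Hypothetical helper: True if s can be encoded as ASCII (any Python 3 version)."""
    try:
        s.encode("ascii")
    except UnicodeEncodeError:
        # At least one character has a code point of 128 or above
        return False
    return True

print(is_ascii("hello"))         # True
print(is_ascii("na\u00efve"))    # False ('\u00ef' is 'ï')
```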
Choose .isascii() for simplicity, or encode() for compatibility with older versions.
Probing into Python strings
When working with strings in Python, it's essential to understand that a Python string carries no property marking it as ASCII or UTF-8. In Python 3, every str is simply a sequence of Unicode code points; an encoding only comes into play when you convert the string to bytes.
One way to check whether your string is ASCII is to use the built-in str.isascii() method, available in Python 3.7 and later.
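To illustrate that a str holds code points rather than encoded bytes, here is a small sketch that inspects the code points of a sample string and then compares it with its UTF-8 bytes. It also shows that bytes objects have their own isascii() method, which was added in Python 3.7 alongside str.isascii().

```python
text = "caf\u00e9"  # 'café' with a precomposed 'é' (U+00E9)

# A str stores code points, not bytes, so no encoding is attached to it
print([hex(ord(c)) for c in text])   # ['0x63', '0x61', '0x66', '0xe9']
print(text.isascii())                # False: 'é' is U+00E9, above 0x7F

# Encoding produces bytes; the UTF-8 form of 'é' is the two bytes 0xC3 0xA9
data = text.encode("utf-8")
print(data)                          # b'caf\xc3\xa9'
print(data.isascii())                # False
print(b"cafe".isascii())             # True
```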
Working with decode() and exceptions
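When your data is bytes, the decode() method offers a check: bytes.decode('ascii') raises UnicodeDecodeError if any byte is 0x80 or above. (In Python 3, str has no decode() method; for a str, use encode('ascii'), which raises UnicodeEncodeError instead.) If decoding fails, it's not ASCII. Simple!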
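A minimal sketch of the decode-based check for bytes input. The helper name bytes_are_ascii is hypothetical; it relies only on bytes.decode() raising UnicodeDecodeError under the default strict error handler.

```python
def bytes_are_ascii(data):
    """Hypothetical helper: True if every byte in data is below 0x80."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        # Some byte is 0x80 or higher, so it cannot be ASCII
        return False
    return True

print(bytes_are_ascii(b"plain text"))                 # True
print(bytes_are_ascii("caf\u00e9".encode("utf-8")))   # False: bytes 0xC3 0xA9 are not ASCII
```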
ord() function
The ord() function is handy for checking whether an individual character falls within the ASCII range (code points 0 to 127). But hold your horses! On its own, it's not your knight in shining armor for checking an entire string, because it only looks at one character at a time.
To check the whole string, apply it to every character and confirm that each one belongs to the ASCII family.
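A sketch of the per-character approach, combining ord() with all(). The helper name is_ascii_ord is hypothetical; all() short-circuits at the first non-ASCII character and returns True for an empty string.

```python
def is_ascii_ord(s):
    # ord() returns the character's code point; ASCII covers 0 through 127
    return all(ord(c) < 128 for c in s)

print(is_ascii_ord("hello"))        # True
print(is_ascii_ord("na\u00efve"))   # False: 'ï' is U+00EF (239)
print(is_ascii_ord(""))             # True: all() of an empty sequence is True
```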
Digging deeper - understanding your string's origin
Each string has a life story to tell. It may have originated from a file, from a user's keyboard input, or even from some data fetched from a website. Knowing where your string came from sheds light on how its underlying bytes were encoded before they became a str. Let's go Sherlock Holmes on this string!
Additional methods and edge cases
Still, it's always worth knowing more than one way to solve a problem. So let's explore a few more approaches to checking whether a string is ASCII.
In the world of encoding
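Another approach is to encode the string to UTF-8 and see whether its length changes. UTF-8 stores each ASCII character in a single byte and every other character in two to four bytes, so the byte count equals the character count only when the whole string is ASCII. It's like asking whether the string gained some weight after a heavy meal!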
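A sketch of the length comparison (the helper name is_ascii_utf8_len is hypothetical). Two caveats worth stating: it builds a full encoded copy of the string, and a string containing a lone surrogate would make the strict UTF-8 encoder raise UnicodeEncodeError rather than return False.

```python
def is_ascii_utf8_len(s):
    # Code points 0-127 encode to one UTF-8 byte; all others need 2-4 bytes,
    # so the lengths match only when every character is ASCII.
    return len(s.encode("utf-8")) == len(s)

print(is_ascii_utf8_len("hello"))       # True: 5 bytes, 5 characters
print(is_ascii_utf8_len("caf\u00e9"))   # False: 5 bytes, 4 characters
```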
Diving in another direction - looking at Unicode
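Wait a minute! What's that I see? encode() with 'ignore' as the error handler? That's right: encode('ascii', 'ignore') silently drops every character that ASCII cannot represent instead of raising an error. It strips non-ASCII characters rather than converting them, so comparing the result's length with the original tells you whether anything was dropped.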
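A sketch of the 'ignore' variant (the helper name is_ascii_ignore is hypothetical). The printed bytes show that the non-ASCII characters are removed outright, not transliterated.

```python
def is_ascii_ignore(s):
    # errors="ignore" drops characters ASCII cannot represent;
    # if nothing was dropped, the byte length equals the character count.
    return len(s.encode("ascii", errors="ignore")) == len(s)

text = "na\u00efve caf\u00e9"                    # 'naïve café'
print(text.encode("ascii", errors="ignore"))    # b'nave caf'
print(is_ascii_ignore(text))                    # False
print(is_ascii_ignore("naive cafe"))            # True
```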
ord() revisited
Going back to ord(c) < 128: this comparison identifies whether an individual character is ASCII, which also makes it handy for locating the characters that are not.
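A small sketch that uses the same comparison to report where the offending characters are (the helper name non_ascii_chars is hypothetical), which is useful when a check fails and you need to know why.

```python
def non_ascii_chars(s):
    # Collect (index, character, code point) for every character outside 0-127
    return [(i, c, ord(c)) for i, c in enumerate(s) if ord(c) >= 128]

print(non_ascii_chars("na\u00efve caf\u00e9"))
# [(2, 'ï', 239), (9, 'é', 233)]
```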
Advanced concept: normalizing Unicode
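Normalization ensures that Unicode strings that look the same (for example, a precomposed 'é' versus an 'e' followed by a combining acute accent) are reduced to the same sequence of code points. This can be especially useful when checking for ASCII strings.
Normalizing the string to 'NFD' splits characters such as 'é' into an ASCII base letter followed by a combining mark. The combining mark itself is still non-ASCII, so NFD alone does not make a string ASCII; it exposes the base letters, which survive once the marks are removed.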
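A sketch of NFD followed by stripping the leftover marks with encode('ascii', 'ignore'). The helper name to_ascii_nfd is hypothetical. Characters with no canonical decomposition (such as 'ß' or 'ø') have no ASCII base letter and are simply dropped.

```python
import unicodedata

def to_ascii_nfd(s):
    # NFD splits precomposed characters, e.g. 'é' (U+00E9) -> 'e' + U+0301
    decomposed = unicodedata.normalize("NFD", s)
    # Remove the combining marks (and any other non-ASCII leftovers)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")

text = "na\u00efve caf\u00e9"                          # 'naïve café'
print(unicodedata.normalize("NFD", text).isascii())   # False: combining marks remain
ascii_text = to_ascii_nfd(text)
print(ascii_text)             # naive cafe
print(ascii_text.isascii())   # True
```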