How do I do a case-insensitive string comparison?

python

case-insensitive-comparison

unicode-normalization

python-string-methods

byAlex Kataev·Aug 29, 2024

For a quick case-insensitive comparison in Python, transform both strings to the same case by using either lower() or upper():

is_equal = "Hello".lower() == "hello".lower()

The result, is_equal, will return a value of True if the two strings match, irrespective of the original case.

In Python 3, for a more accurate case-insensitive comparison, particularly for Unicode strings, use casefold():

is_equal = "Hello".casefold() == "hello".casefold()

Working with Python string methods

Applying the `casefold()`

The best strategy for case-insensitive comparisons in Python 3 is using casefold(). This method is the Jedi Master in managing more complex Unicode scenarios, far beyond the simple English alphabet.

Accented characters and Unicode considerations

With Unicode strings, particularly those with accents or diacritics, unicodedata.normalize is your knight in shining armour:

import unicodedata
normalized_str1 = unicodedata.normalize('NFKD', "Héllo").casefold() # Hello, Héllo! 🇫🇷
normalized_str2 = unicodedata.normalize('NFKD', "hello").casefold()
is_equal = normalized_str1 == normalized_str2

Pitfalls of upper() and lower()

Beware the Dark Side of using a combination of upper() and lower(). It might seem friendly, but:

# This may not be equivalent to `casefold()`
is_equal = "Straße".upper().lower() == "strasse".upper().lower() # German nightmare 🇩🇪

In the labyrinth of Unicode, using casefold() is your safest bet against unforeseen traps.

Advanced Python techniques

Embracing Unicode standard

Python's casefold() treasures the wisdom of the Unicode standard Section 3.13 on caseless matching.

Meeting canon: Canonical and Compatibility matching

Want unmatched precision? Explore canonical or compatibility caseless matching. Various normalization forms like 'NFD' got your back for matching seemingly different but same characters.

Dictionary lookups: A normalization must

Dealings with dictionaries demand normalization functions for accurate case-insensitive lookups. No more surprise outcomes!

Gotta watch out for...

Subtle normalization nuances

Knowing casefold() doesn't guarantee normalization in all cases will save you a lot of debugging time. Some characters may require extra attention to achieve an accurate comparison.

Unintuitive Unicode surprises

Due to Unicode's intricate web, text.lower() not being equal to text.upper().lower() can occur. You've been warned! ⚠️

Localized operations

For case-insensitive comparison in locale-aware scenarios, dive into locale.strxfrm(). For case-insensitive regex matching, re.IGNORECASE plays a vital role.

explain-codes / Python / How do I do a case-insensitive string comparison?

Linked

Elegant Python function to convert CamelCase to snake_case?



How do I check if a string is Unicode or ASCII?



What is the best way to remove accents (normalize) in a Python unicode string?



Convert Unicode to ASCII without errors in Python



How can I percent-encode URL parameters in Python?

