
Remove characters except digits from string using Python?

Tags: python, regex, string-manipulation, functions
by Alex Kataev · Nov 24, 2024
TLDR
# Who needs words when you have numbers, amirite?
import re
result = re.sub(r'\D', '', 'input123text')  # '123'

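Smoothly eradicate all non-digits using Python's re module. re.sub() replaces every match of the pattern \D (any character that is not a digit) with the empty string, so only the digits survive. Write the pattern as a raw string, r'\D', so Python does not complain about an invalid escape sequence.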

Digesting the regex recipe

Think of regular expressions as a chef's knife for string manipulation in Python. They slice and dice text based on patterns. Looking to discard everything but the beloved digits? Pass the pattern r'\D' to re.sub() with an empty replacement string:

import re
digit_string = re.sub(r'\D', '', '123abc456')  # Returns: '123456'

Now you're cooking with gas! Or electricity, no appliance shaming here.

Throw in some curveballs:

  • Blank slate: re.sub(r'\D', '', '') returns ''. If there's nothing, you get nothing. Deep, right?
  • Zero digits: re.sub(r'\D', '', 'abc') returns '', too. Lack of numbers begets lack of numbers. Sad truth.
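
To make the edge cases above concrete, here is a minimal sketch that wraps the substitution in a helper. The name keep_digits and its ascii_only flag are illustrative choices, not part of any library. It also shows one subtlety: for str patterns in Python 3, \D is Unicode-aware, so digits from other scripts survive unless re.ASCII is passed.

```python
import re

def keep_digits(text, ascii_only=False):
    """Return text with every non-digit character removed (hypothetical helper)."""
    # With re.ASCII, \D means "anything except 0-9"; without it,
    # decimal digits from other scripts also count as digits.
    flags = re.ASCII if ascii_only else 0
    return re.sub(r'\D', '', text, flags=flags)

arabic_indic = 'a\u0661\u0662\u0663b7'  # Arabic-Indic digits one, two, three

print(repr(keep_digits('')))            # ''
print(repr(keep_digits('abc')))         # ''
print(repr(keep_digits('123abc456')))   # '123456'
print(len(keep_digits(arabic_indic)))   # 4 (three Arabic-Indic digits plus '7')
print(repr(keep_digits(arabic_indic, ascii_only=True)))  # '7'
```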

Not just one door to the party, folks!

The dynamic duo: filter() and join

In Python 3, filter() and str.isdigit bond like PB&J to commit a heist and take only the digits. filter() returns a lazy iterator rather than a string, so mix in ''.join() to glue the loot back together:

res = ''.join(filter(str.isdigit, 'input123text')) # '123'
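
One thing to keep in mind: str.isdigit() is broader than 0-9 and also accepts characters such as the superscript two. If only ASCII digits should pass, a custom predicate is one option. The sketch below assumes that requirement; is_ascii_digit is a made-up name for illustration.

```python
def is_ascii_digit(ch):
    # Hypothetical predicate: True only for the ten ASCII digits.
    return ch in '0123456789'

sample = 'x\u00b2 = 49'  # contains SUPERSCRIPT TWO (U+00B2)

print(''.join(filter(str.isdigit, sample)))     # ²49
print(''.join(filter(is_ascii_digit, sample)))  # 49
```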

The mighty mapper: str.translate()

Python 3 invites you to a masquerade ball with str.maketrans(). Its third argument lists the characters to delete, so hand it the non-digits that appear in your text, and str.translate() sends them to string heaven in a single pass:

text = 'input123text'
trans_table = str.maketrans('', '', ''.join(set(text) - set('0123456789')))
masked_string = text.translate(trans_table)  # '123'

Unleashing raw power: generator expressions

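A generator expression does the same filtering inline, with no filter() call and room for any condition you like. Just don't expect it to power a remote cabin: str.join() collects the items before joining them, so the memory savings on a single string are small.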

res = ''.join(c for c in 'input123text' if c.isdigit()) # '123'
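
Where a generator expression does help is when the text arrives in pieces, because it can walk the pieces one at a time without first concatenating them. A minimal sketch, assuming the input is any iterable of strings (digits_from_chunks is a hypothetical helper name):

```python
def digits_from_chunks(chunks):
    """Collect digits from any iterable of strings (hypothetical helper)."""
    # The nested generator visits each chunk in turn; the chunks are
    # never joined into one large intermediate string.
    return ''.join(ch for chunk in chunks for ch in chunk if ch.isdigit())

lines = ['order 12', 'batch 7b', '', 'ref-305']
print(digits_from_chunks(lines))  # 127305
```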

Exotica: wrangling special number formats

Floating-point numbers are like balloons - light, airy, but need careful handling to prevent accidents. Here's a piece of regex magic sprinkled in:

float_str = '-123.45abc'
result = ''.join(re.findall(r'[\d\.-]', float_str))  # '-123.45'

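And voilà! Decimal digits living happily with their rightful negative sign. Be aware that this character class keeps every dot and hyphen in the input, wherever it appears, so the result is only a valid number when the input is well behaved.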
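
One way to guard against stray dots and hyphens, such as the ones in 'v1.2-3', is to match a whole number instead of individual characters. The sketch below is one possible approach, not the only one; the pattern and the first_number helper are illustrative and assume a plain decimal format with '.' as the separator and no exponent.

```python
import re

# Optional minus sign, one or more digits, then an optional fractional part.
NUMBER = re.compile(r'-?\d+(?:\.\d+)?')

def first_number(text):
    """Return the first signed decimal found in text as a float, or None."""
    match = NUMBER.search(text)
    return float(match.group()) if match else None

print(first_number('-123.45abc'))  # -123.45
print(first_number('v1.2-3'))      # 1.2
print(first_number('no digits'))   # None
```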

And now, for the Unicode Strings (Python 3)...

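Remember that fancy translate() method? It's back but with a different mask. Every str in Python 3 is Unicode, and translate() also accepts a plain dictionary keyed by code point (the ord() of each character). Map a code point to None and that character is deleted:

text = 'input123text'
remove_dict = dict.fromkeys(ord(c) for c in text if c not in '0123456789')
res = text.translate(remove_dict)  # '123'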
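
A table that works for any input, without listing the characters to delete, can be sketched with a dict subclass whose __missing__ method returns None for every code point it has not been told to keep. The class name KeepOnly is an assumption made for this example.

```python
class KeepOnly(dict):
    """Translation table that deletes every code point not given up front."""

    def __init__(self, keep):
        # Map each kept character's code point to itself.
        super().__init__((ord(ch), ch) for ch in keep)

    def __missing__(self, code_point):
        # str.translate() deletes characters that map to None.
        return None

DIGITS_ONLY = KeepOnly('0123456789')

print('input123text'.translate(DIGITS_ONLY))         # 123
print('tél: +33 (0)1 23 45'.translate(DIGITS_ONLY))  # 33012345
```

Because the table is built once, it can be reused across many strings, including ones with accented or non-Latin characters.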

Best practices in preserving value

Be the valiant protector of number integrity. Stripping every non-digit from a formatted amount like "$1,234.56" yields '123456', which silently loses the decimal point. Decide up front what should happen to decimal points, minus signs, and nasty leading zeros, so the result is still a usable numerical value. Good job!
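
As a rough illustration of the point about formatted amounts, the sketch below keeps digits, the decimal point, and the minus sign, then hands the result to decimal.Decimal. It assumes a US-style format (',' for thousands, '.' for decimals); parse_amount is a hypothetical helper, and malformed input will raise decimal.InvalidOperation.

```python
import re
from decimal import Decimal

def parse_amount(text):
    """Parse a US-style amount such as '$1,234.56' (hypothetical helper)."""
    # Drop currency symbols, thousands separators, and spaces,
    # but keep digits, the decimal point, and a minus sign.
    cleaned = re.sub(r'[^0-9.\-]', '', text)
    return Decimal(cleaned)

print(parse_amount('$1,234.56'))  # 1234.56
print(parse_amount('-$42.10'))    # -42.10

# Leading zeros only survive while the value stays a string.
zip_code = re.sub(r'\D', '', 'ZIP 00501')
print(zip_code)       # 00501
print(int(zip_code))  # 501
```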