
How to extract numbers from a string in Python?

python
regex
string-extraction
data-cleaning
by Nikita Barsukov · Feb 17, 2025
TLDR

Quickly extract numbers from a string in Python using re.findall() from the re module with the simple regex pattern \d+, which matches each run of consecutive digits:

import re

numbers = re.findall(r'\d+', "Example string 123 with 456")
print(numbers)  # Outputs: ['123', '456']

To convert the output to integers, use list(map(int, numbers)).
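As a minimal sketch of that conversion step (the sample string and the name int_numbers are illustrative only):

```python
import re

text = "Example string 123 with 456"  # sample input, for illustration only

# re.findall returns every match as a string
numbers = re.findall(r'\d+', text)

# Convert each matched string to an int
int_numbers = list(map(int, numbers))
print(int_numbers)  # Outputs: [123, 456]

# An equivalent list comprehension, if you prefer that style
same_numbers = [int(n) for n in numbers]
```

Both forms give the same list; which one to use is mostly a matter of taste.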

More complex extraction: Negative numbers, floats

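Positive integers are a piece of cake, but what if the game gets tougher with negative numbers, floats, and scientific notation? Our regex pattern needs an upgrade: [-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?. It accepts an optional sign, comma-grouped thousands, an optional decimal part, and an optional exponent.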

import re

complex_numbers = re.findall(
    r'[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?',
    "Example -123 with +456.78 and 3.14e-10",
)
print(complex_numbers)  # Outputs: ['-123', '+456.78', '3.14e-10']

To convert strings into numerical values, we can do:

numeric_values = [
    float(num) if '.' in num or 'e' in num.lower() else int(num)
    for num in complex_numbers
]
print(numeric_values)  # Outputs: [-123, 456.78, 3.14e-10]
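One caveat: the pattern also matches comma-grouped numbers such as "1,234", which int() and float() reject as written. A minimal sketch of a conversion helper that strips the separators first (to_number, NUMBER_PATTERN, and the sample string are hypothetical names introduced here, not part of the original example):

```python
import re

# Same pattern as in the example above
NUMBER_PATTERN = r'[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?'

def to_number(token):
    """Hypothetical helper: convert one regex match to int or float."""
    token = token.replace(',', '')  # drop thousands separators such as "1,234"
    if '.' in token or 'e' in token.lower():
        return float(token)
    return int(token)

sample = "Totals: 1,234 units, -5.5 degrees, 2e3 cycles"  # illustrative input
tokens = re.findall(NUMBER_PATTERN, sample)
values = [to_number(t) for t in tokens]

print(tokens)  # Outputs: ['1,234', '-5.5', '2e3']
print(values)  # Outputs: [1234, -5.5, 2000.0]
```

This assumes commas are always thousands separators, which may not hold for every data source.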

The isdigit approach: Extraction via list comprehension

In simpler situations where only positive integers are needed, a list comprehension with the str.isdigit() method is an alternative to regex:

extracted_numbers = [int(part) for part in "123 main st.".split() if part.isdigit()]
print(extracted_numbers)  # Outputs: [123]
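Note that isdigit() is applied to whole whitespace-separated tokens, so a number with punctuation attached (such as "42,") is skipped. One possible workaround, sketched here with an illustrative input string, is to strip surrounding punctuation before testing each token:

```python
import string

text = "Room 42, floor 3."  # illustrative input

# Strip leading/trailing punctuation from each token.
# Caveat: this also removes a leading '-', so it only suits non-negative integers.
cleaned_parts = [part.strip(string.punctuation) for part in text.split()]

extracted = [int(part) for part in cleaned_parts if part.isdigit()]
print(extracted)  # Outputs: [42, 3]
```

Without the strip step, the same input would yield an empty list, because neither "42," nor "3." passes isdigit().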

Negotiating with formatted strings: Handling commas and separators

Ran into formatted numbers like "1,000" or "2.5M" in your journey? Worry not! With a few extra tricks, we can get past this obstacle:

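import re

formatted_numbers = "The event gathered 1,000 enthusiasts and raised 2.5M dollars."

# Pre-process: remove thousands separators
cleaned_string = formatted_numbers.replace(',', '')

# Extract each number together with its optional "M" (million) suffix.
# Scaling by the suffix keeps the decimal part intact, whereas replacing
# 'M' with '000000' would turn "2.5M" into "2.5000000".
matches = re.findall(r'(\d+(?:\.\d+)?)(M?)', cleaned_string)
extracted_values = [
    round(float(num) * (1_000_000 if suffix else 1)) for num, suffix in matches
]
print(extracted_values)  # Outputs: [1000, 2500000]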

Remember to adjust this strategy to tackle international number formats or specific application needs.
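As one illustration of such an adjustment, here is a minimal sketch for text that uses "." to group thousands and "," as the decimal mark. The helper name parse_european, the pattern, and the sample sentence are all assumptions made for this example:

```python
import re

def parse_european(token):
    """Hypothetical helper: parse a number written like '1.234,56'."""
    # Assumption: '.' groups thousands and ',' marks the decimal point
    return float(token.replace('.', '').replace(',', '.'))

text = "The price is 1.234,56 EUR and the discount is 12,5 percent."

# Digits, optional '.'-separated groups of three, optional ',' decimal part
tokens = re.findall(r'\d+(?:\.\d{3})*(?:,\d+)?', text)
values = [parse_european(t) for t in tokens]

print(tokens)  # Outputs: ['1.234,56', '12,5']
print(values)  # Outputs: [1234.56, 12.5]
```

The separator convention cannot be inferred reliably from a single number, so it is usually safer to know the source's format in advance than to guess it.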

Robust extraction for Real-world applications: Dealing with Edge cases

Real-world data can be messy and unpredictable. To make our extraction process more robust, we should handle potential errors and unexpected inputs using try-except blocks:

raw_data = ["123", "4.5", "-inf", "NaN", "seven"]

def safe_convert(value):
    try:
        return float(value)
    except ValueError:
        # Oops! Float conversion failed. Return None.
        return None

converted_data = [safe_convert(item) for item in raw_data]

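This lets us handle non-numeric strings such as "seven" gracefully: they become None instead of raising an exception. Note that float() itself accepts "-inf" and "NaN", so those inputs are converted to the special floats -inf and nan rather than rejected. If such values are not valid in your data, filter them out separately, for example with math.isfinite().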
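Because float() parses "-inf" and "NaN" without raising ValueError, rejecting them takes an explicit check. A minimal sketch, where safe_convert_finite is a hypothetical variant of the function above:

```python
import math

def safe_convert_finite(value):
    """Hypothetical variant of safe_convert that also rejects inf and NaN."""
    try:
        number = float(value)
    except (ValueError, TypeError):
        # Not parseable as a float (TypeError covers inputs such as None)
        return None
    # math.isfinite is False for inf, -inf, and nan
    return number if math.isfinite(number) else None

raw_data = ["123", "4.5", "-inf", "NaN", "seven"]
finite_data = [safe_convert_finite(item) for item in raw_data]
print(finite_data)  # Outputs: [123.0, 4.5, None, None, None]
```

Whether infinities and NaN should count as valid numbers depends on the application, so treat this filter as optional.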