How can I find all matches to a regular expression in Python?

by Nikita Barsukov · Dec 29, 2024

Harness re.findall() to retrieve every non-overlapping occurrence of a regex pattern in a string, returned as a plain list:

import re

matches = re.findall(r"pattern", "search_string")

For instance, if we want words that kick off with a capital 'S':

matches = re.findall(r"\bS\w+", "The rain in Spain")

The output we get is: ['Spain']. A no-nonsense, effective approach.
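Because findall() only reports non-overlapping matches, scanning resumes right after each match it consumes. The sketch below uses made-up sample strings to show that behaviour and the empty list you get when nothing matches. It also shows a common workaround for overlapping matches: wrapping a capturing group in a zero-width lookahead. That is a general regex technique, not a findall() option:

```python
import re

# Non-overlapping: "aaaa" holds two back-to-back "aa" matches
plain = re.findall(r"aa", "aaaa")

# No match at all gives an empty list, not None
missing = re.findall(r"\bS\w+", "The rain in France")

# Overlapping: the lookahead consumes nothing, so every start position is tried,
# and findall() returns the text captured by the group inside it
overlapping = re.findall(r"(?=(aa))", "aaaa")

print(plain)        # ['aa', 'aa']
print(missing)      # []
print(overlapping)  # ['aa', 'aa', 'aa']
```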

Using finditer for memory-friendly matching

When wrestling with large bodies of text, or when you need more than the matched text itself, re.finditer() is the better fit. It returns an iterator that yields Match objects one at a time instead of building a full list up front:

matches = re.finditer(r"\bS\w+", "The rain in Spain")
for match in matches:
    print(match.group())   # Prints: Spain (the whole match)
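Since finditer() produces matches on demand, you can stop early or count matches without keeping them all in memory. A minimal sketch, using an illustrative tongue-twister as the input text:

```python
import re

text = "She sells seashells by the seashore"

# finditer() hands back an iterator; matches are produced on demand
matches = re.finditer(r"\bs\w+", text)
first = next(matches)                # pull just the first match
rest = [m.group() for m in matches]  # the iterator resumes where it stopped

# Counting matches without storing them in a list
count = sum(1 for _ in re.finditer(r"\bs\w+", text))

print(first.group())  # sells
print(rest)           # ['seashells', 'seashore']
print(count)          # 3
```

"She" is skipped because the pattern is case-sensitive and asks for a lowercase s.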

Squeezing information from Match objects

Each Match object from re.finditer() is a goldmine of details about its match. You can extract them through methods such as .group(), .start(), .end(), .span(), and .groups(). Behold the power of .group():

matches = [m.group() for m in re.finditer(r"(\bS\w+)", "The rain in Spain")]
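The other methods become useful once a pattern has capturing groups. In the sketch below, the "words ending in ain" pattern is just an illustration. For each match it collects the whole match, the first captured group, and the (start, end) positions:

```python
import re

text = "The rain in Spain stays mainly in the plain"

# One capturing group: the letters before "ain" in words that end with "ain"
details = []
for m in re.finditer(r"\b(\w+)ain\b", text):
    # group() is the whole match, group(1) the first captured group,
    # span() is the (start, end) pair that start() and end() return separately
    details.append((m.group(), m.group(1), m.span()))

for entry in details:
    print(entry)
# ('rain', 'r', (4, 8))
# ('Spain', 'Sp', (12, 17))
# ('plain', 'pl', (38, 43))
```

"mainly" is not reported because the trailing \b requires the word to end right after "ain".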

Findall and capturing groups

If your regular expression contains capturing groups, re.findall() brings home only what the groups captured, not the whole match. With a single group you get a list of strings; with several groups you receive a list of tuples:

matches = re.findall(r"(\w+) in (\w+)", "The rain in Spain stays mainly in the plain")

This yields group pairs, one tuple per match: [('rain', 'Spain'), ('mainly', 'the')].
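To see how the number of groups shapes the result, here is a side-by-side sketch on the same sentence. The last pattern uses a non-capturing group (?:...). Because it captures nothing, findall() falls back to returning whole matches:

```python
import re

text = "The rain in Spain stays mainly in the plain"

# Two capturing groups -> one tuple per match
pairs = re.findall(r"(\w+) in (\w+)", text)

# One capturing group -> a flat list holding just that group
lefts = re.findall(r"(\w+) in \w+", text)

# Only a non-capturing group -> the whole match is returned
whole = re.findall(r"(?:\w+) in \w+", text)

print(pairs)  # [('rain', 'Spain'), ('mainly', 'the')]
print(lefts)  # ['rain', 'mainly']
print(whole)  # ['rain in Spain', 'mainly in the']
```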

Beware of regex's greedy nature

Regex can get a bit too eager sometimes. Quantifiers such as * and + are greedy: they grab as much text as they can. Appending ? (as in .*? or +?) makes them non-greedy, so they stop at the shortest possible match:

# Greedy match
matches = re.findall(r"<.*>", "<tag>content</tag>")
# Spits: ['<tag>content</tag>']

# Non-greedy match
matches = re.findall(r"<.*?>", "<tag>content</tag>")
# Spits: ['<tag>', '</tag>']
# Greediness cured!
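A lazy quantifier is not the only cure. The sketch below also shows a different technique: a negated character class such as [^>] that simply cannot run past the closing bracket. It then contrasts + and +? on a plain run of digits:

```python
import re

html = "<tag>content</tag>"

# Negated character class: "one or more characters that are not '>'"
tags = re.findall(r"<[^>]+>", html)

# Greedy vs lazy on a plain quantifier
greedy_digits = re.findall(r"\d+", "12345")   # as many digits as possible
lazy_digits = re.findall(r"\d+?", "12345")    # as few as possible, one each

print(tags)           # ['<tag>', '</tag>']
print(greedy_digits)  # ['12345']
print(lazy_digits)    # ['1', '2', '3', '4', '5']
```

For anything beyond toy markup like this, an HTML parser is usually a safer choice than regex.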

Flags: the secret spices of regex

Flags such as re.IGNORECASE change how a pattern matches. Think of them as secret spices: pass one as the third argument (flags) of re.findall():

matches = re.findall(r"spain", "The rain in Spain", re.IGNORECASE)
# Feeds you: ['Spain']
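Flags can be combined with the | operator, written inline at the start of a pattern, or baked into a compiled pattern. A short sketch of all three, with illustrative input strings:

```python
import re

text = "Spain\nspain\nSPAIN is sunny"

# Combine flags with |: case-insensitive, and ^/$ anchor at every line
line_hits = re.findall(r"^spain$", text, re.IGNORECASE | re.MULTILINE)

# The same case-insensitivity as an inline flag at the start of the pattern
inline = re.findall(r"(?i)spain", "The rain in Spain")

# Flags can also be attached to a compiled pattern and reused
pattern = re.compile(r"spain", re.IGNORECASE)
compiled = pattern.findall("Spain SPAIN spain")

print(line_hits)  # ['Spain', 'spain']
print(inline)     # ['Spain']
print(compiled)   # ['Spain', 'SPAIN', 'spain']
```

The third line of text is not matched because $ requires the line to end right after "SPAIN".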

Dealing with the Unicode dragon

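The Unicode dragon is mostly tame in Python 3: patterns applied to str objects are Unicode-aware by default, so \w already matches letters such as é. The re.UNICODE flag is still accepted, but for str patterns it is redundant and kept only for backward compatibility: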

matches = re.findall(r"\w+", "café", re.UNICODE)
# Lets you enjoy: ['café']  # A hot cup of ‘café’!
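The flag that actually changes behaviour here is re.ASCII, which restricts \w to [a-zA-Z0-9_]. In the sketch below, é is written as an escape (the precomposed U+00E9) so the result does not depend on how the source file encodes it:

```python
import re

word = "caf\u00e9"  # "café" with a precomposed é

# str patterns are Unicode-aware by default, with or without re.UNICODE
default = re.findall(r"\w+", word)
with_unicode = re.findall(r"\w+", word, re.UNICODE)

# re.ASCII limits \w to ASCII word characters, so é no longer counts
ascii_only = re.findall(r"\w+", word, re.ASCII)

print(default)       # ['café']
print(with_unicode)  # ['café']
print(ascii_only)    # ['caf']
```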
