
How to test if a string contains one of the substrings in a list, in pandas?

python
pandas
regex
string-matching
by Alex Kataev · Feb 22, 2025
TLDR

For a quick substring check within a Pandas series, craft a regex pattern from your list, like ['substr1', 'substr2', ...], and pass it to str.contains:

```python
import pandas as pd

# DataFrame with a column to check
df = pd.DataFrame({'column': ['text1', 'text2', ...]})

# List of substrings to search for
substrings = ['substr1', 'substr2', ...]

# The "Sherlock Holmes" one-liner to detect substrings
df['matches'] = df['column'].str.contains('|'.join(substrings))
```

Crazy to think that Sherlock Holmes could solve cases in one line – quite elementary, my dear Watson! When your substrings have special characters, use re.escape to avoid regex smelling a rat:

```python
import re

# CSI team: Escaping special characters in substrings
escaped_substrings = [re.escape(substring) for substring in substrings]
regex_pattern = '|'.join(escaped_substrings)

# Sherlock Holmes strikes again with accurate matches
df['matches'] = df['column'].str.contains(regex_pattern)
```

Detecting substrings: The detective's guide

Matching substrings can feel like a detective mystery. Let's decipher it:

Discarding case sensitivity

Give your detective code a laid-back, Hawaiian attitude with the case parameter:

```python
# It's always sunny in Philadelphia, but case-insensitive in Python
df['matches'] = df['column'].str.contains(regex_pattern, case=False)
```

Interpreting missing values (the missing person's case)

When values go missing (NaN), use the na parameter to decide if they're innocent or guilty:

```python
# Missing values tend to run away. Use 'na' to put them back in the line-up
df['matches'] = df['column'].str.contains(regex_pattern, na=False)
```

Dealing with false positives (The Usual Suspects)

Short substrings like 'pet' could cause mistaken identities (false positives) by matching inside longer words such as 'carpet'. To clear their name, wrap the pattern in word boundaries:

```python
# Time to find Keyser Söze among the usual suspects
regex_pattern = r'\b(?:' + regex_pattern + r')\b'
df['matches'] = df['column'].str.contains(regex_pattern)
```

Pandas detective tricks: From rookie to pro

From the rookie's first day on the beat to the seasoned pro, Pandas presents tools for everyone:

Lambda: For the crafty detective

The crafty detective uses a lambda with apply for those tough-to-crack cases:

```python
# Lambda, Lambda, Lambda! Revenge of the Nerds' detective trick
df['matches'] = df['column'].apply(lambda x: any(sub in x for sub in substrings))
```

Binary storage: No grey areas

For a verdict beyond reasonable doubt, store your results as binary values:

```python
# 1 for guilty, 0 for innocent - welcome to the binary justice system
df['matches'] = df['column'].str.contains(regex_pattern).astype(int)
```

The re.compile hook: When regex strikes back

When regex patterns get twisted, re.compile comes to the rescue:

```python
# Pattern coming through! Make way for your compiled regex
pattern = re.compile(regex_pattern)
df['matches'] = df['column'].str.contains(pattern)
```
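One quirk worth knowing: when str.contains receives a pre-compiled pattern, it refuses the case and flags arguments, so bake any flags into the compile call itself. A minimal sketch (the sample column values and pattern here are made-up examples):

```python
import re
import pandas as pd

# Hypothetical evidence log
df = pd.DataFrame({'column': ['Dog house', 'catalog', 'fish']})

# Case-insensitivity goes into the compiled pattern via re.IGNORECASE,
# not into str.contains(case=...)
pattern = re.compile(r'\b(?:cat|dog)\b', flags=re.IGNORECASE)
df['matches'] = df['column'].str.contains(pattern)
```

Here 'catalog' stays innocent: the word boundary stops 'cat' from matching mid-word, while 'Dog house' is caught despite the capital D.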

The science of detection

In the world of data, we often find ourselves playing the detective. Luckily, with Python's Pandas library, we have a great forensic toolkit at our disposal:

  • .str.contains(): the fingerprinting kit, finding direct evidence of substrings.
  • '|' operator: the forensic combinator, identifying multiple clues at once.
  • re.escape(): the technical expert, ensuring we don't get tripped up by slippery characters.
  • apply with lambda: the advanced investigator, performing complicated forensic examinations.
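Putting the whole forensic toolkit together, here is a minimal end-to-end sketch (the column values and substrings are made-up examples; note 'c++' needs escaping):

```python
import re
import pandas as pd

# Hypothetical case file with mixed case and a missing value
df = pd.DataFrame({'column': ['Cat food', 'dog TOY', None, 'C++ guide']})
substrings = ['cat', 'dog', 'c++']

# Escape special characters, join with '|', match case-insensitively,
# and book missing values as non-matches
pattern = '|'.join(re.escape(s) for s in substrings)
df['matches'] = df['column'].str.contains(pattern, case=False, na=False)
```

The result flags the first, second, and fourth rows, while the missing value is marked False instead of propagating NaN.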

Crafting better queries

Boost your detective skills with these methods:

On-point search with exclusions

Sharpen your findings by excluding unwanted suspects:

```python
# "You are not under arrest" - Negative lookaheads to the rescue
# (?:...) keeps the group non-capturing, so pandas won't warn about match groups
regex_pattern = r'^(?!.*(?:unwanted1|unwanted2)).*'
df['matches'] = df['column'].str.contains(regex_pattern)
```

Scaling up with external resources

The third-party regex library offers extra features beyond the built-in re (such as fuzzy matching and richer Unicode support), and can be faster for some patterns:

```python
import regex

# The art of casting, perfected with regex.search()
df['matches'] = df['column'].apply(lambda x: bool(regex.search(regex_pattern, x)))
```

Extracting information

Beyond mere detection, str.extract helps harvest the matching substring:

```python
# Extract - not just for delicious honey!
df['extracted_substring'] = df['column'].str.extract(f'({regex_pattern})')
```