How to filter rows containing a string pattern from a Pandas dataframe

by Anton Shumikhin · Dec 17, 2024

To filter rows that contain a specific string pattern in your DataFrame, use the str.contains() method. It returns a boolean Series that you can use as a mask to select the matching rows.

# if 'data' is your DataFrame & 'pattern' is your string
filtered_data = data[data['column'].str.contains('pattern', na=False)]  # like finding Waldo in data

This line of Python will sift through your dataframe and return rows where the 'column' contains the 'pattern'.

Extra toppings

Not a fan of case sensitivity? Throw in case=False to ignore it.
For complex sauce, sorry, I mean searches, pass a regular expression: str.contains() treats the pattern as a regex by default.
If there's a risk of NA/NaN values being a party pooper, pass na=False so they count as non-matches.
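The three toppings above can be combined in one call. Here is a minimal sketch; the sample values are made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, including one missing value
data = pd.DataFrame({"column": ["Red ball", "blue BALL", "kite", None]})

# case=False ignores case; na=False treats the missing value as a non-match
ci_mask = data["column"].str.contains("ball", case=False, na=False)
case_insensitive = data[ci_mask]

# A regular expression: values ending in "ball", or exactly "kite"
rx_mask = data["column"].str.contains(r"ball$|^kite$", na=False)
regex_matches = data[rx_mask]

print(case_insensitive["column"].tolist())  # ['Red ball', 'blue BALL']
print(regex_matches["column"].tolist())     # ['Red ball', 'kite']
```

The second search stays case-sensitive, which is why 'blue BALL' drops out of it.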

Going beyond plain vanilla: regex and data type checks

Checking data type before the party

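To avoid a potential mess, make sure 'column' holds strings: the .str accessor raises an AttributeError on non-string columns such as numeric ones. If it doesn't, convert it first. Be aware that, depending on your pandas version, astype(str) can turn missing values into the literal string 'nan', after which na=False no longer catches them; astype('string') keeps them as missing.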

data['column'] = data['column'].astype(str)  # like waving a magic wand and changing frogs into princes!
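One way to make the check explicit is to test the dtype before converting. The sketch below uses is_string_dtype from pandas.api.types and, as an assumption on my part, converts with astype("string") rather than astype(str) so that any missing values stay missing. The numeric sample column is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical numeric column that we want to search as text
data = pd.DataFrame({"column": [101, 202, 310]})

# Convert only when the column is not already string-like
if not is_string_dtype(data["column"]):
    data["column"] = data["column"].astype("string")

# Now the .str accessor is available
filtered = data[data["column"].str.contains("10", na=False)]
print(filtered["column"].tolist())  # ['101', '310']
```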

Sprinkling some regex magic

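Regular expressions allow for complex pattern matching. To match only values that start with 'ball', anchor the pattern with '^' (regex=True is the default and is spelled out here only for emphasis):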

data[data['column'].str.contains('^ball', regex=True)]  # knock knock. Who's there? The regex train!
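Because the pattern is a regex by default, characters such as '.' have special meaning. The sketch below, on made-up values, contrasts an anchored regex with a literal search using regex=False.

```python
import pandas as pd

# Hypothetical values chosen to show the difference
data = pd.DataFrame({"column": ["ballpark", "football", "ball.1", "ballx1"]})

# '^' anchors the match to the start of the string
starts_with_ball = data[data["column"].str.contains(r"^ball", na=False)]

# As a regex, '.' matches any single character
regex_dot = data[data["column"].str.contains("ball.1", na=False)]

# With regex=False the pattern is taken literally
literal_dot = data[data["column"].str.contains("ball.1", regex=False, na=False)]

print(starts_with_ball["column"].tolist())  # ['ballpark', 'ball.1', 'ballx1']
print(regex_dot["column"].tolist())         # ['ball.1', 'ballx1']
print(literal_dot["column"].tolist())       # ['ball.1']
```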

When NA/NaN values show up uninvited

To exclude NA/NaN values from your elite party-search, pass na=False, which counts missing values as non-matches:

data[data['column'].str.contains('pattern', na=False)]  # door policy: no NaNs!
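The na argument simply decides which side of the door missing values land on. A small sketch with hypothetical data shows both settings:

```python
import pandas as pd

# Hypothetical data with one missing value at index 1
data = pd.DataFrame({"column": ["pattern one", None, "other"]})

# na=False: missing values count as non-matches and are dropped
without_nans = data[data["column"].str.contains("pattern", na=False)]

# na=True: missing values count as matches and are kept
with_nans = data[data["column"].str.contains("pattern", na=True)]

print(without_nans.index.tolist())  # [0]
print(with_nans.index.tolist())     # [0, 1]
```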

When Vanilla doesn't cut it: Advanced moves

Filtering in a custom way

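DataFrame.filter() matches labels rather than cell values, so set the column you want to search as the index first. Then .filter(like='pattern', axis=0) keeps the rows whose index label contains 'pattern':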

filtered_data = data.set_index('ids').filter(like='pattern', axis=0)  # like Sherlock looking for clues

Let Regex do its thing

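For regex-powered row filtering, pass regex='your_regex' instead. It is applied to the index labels with re.search semantics, so it matches anywhere in the label unless you anchor it: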

filtered_data = data.set_index('ids').filter(regex='^pattern', axis=0)  # first rule of regex club: start with '^'

This pattern filters for indices starting with 'pattern'.
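Here is a small sketch of both index-based variants side by side. The 'ids' values and the 'value' column are hypothetical and exist only to make the example runnable.

```python
import pandas as pd

# Hypothetical data: 'ids' will become the index
data = pd.DataFrame(
    {"ids": ["pattern_a", "x_pattern", "other"], "value": [1, 2, 3]}
)
indexed = data.set_index("ids")

# like= keeps index labels that contain the substring anywhere
by_substring = indexed.filter(like="pattern", axis=0)

# regex= keeps index labels matched by the regular expression
by_regex = indexed.filter(regex="^pattern", axis=0)

print(by_substring.index.tolist())  # ['pattern_a', 'x_pattern']
print(by_regex.index.tolist())      # ['pattern_a']
```

Note that these calls look only at the index labels; the contents of 'value' play no part in the match.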

The right tool for the job

Useful scenarios

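Need to match a literal phrase exactly? Keep the default case-sensitive search and pass regex=False so special characters are taken literally.
For flexible string matching, get your hands dirty with regex.
Multiple conditions? Combine the boolean masks with & (and), | (or) and ~ (not).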
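Chaining conditions comes down to combining boolean masks. A minimal sketch on made-up data, with na=False on each mask so the missing value cannot upset the logic:

```python
import pandas as pd

# Hypothetical data with one missing value at index 3
data = pd.DataFrame({"column": ["red ball", "blue ball", "red kite", None]})

has_red = data["column"].str.contains("red", na=False)
has_ball = data["column"].str.contains("ball", na=False)

both = data[has_red & has_ball]    # rows matching both patterns
either = data[has_red | has_ball]  # rows matching at least one pattern
not_ball = data[~has_ball]         # rows that do not contain 'ball'

print(both.index.tolist())      # [0]
print(either.index.tolist())    # [0, 1, 2]
print(not_ball.index.tolist())  # [2, 3]
```

When writing the masks inline, wrap each condition in parentheses, because & and | bind more tightly than comparison operators in Python.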

Tricky situations

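Mismatched data types cause filtering failures: the .str accessor raises an error on non-string columns.
Forgetting that str.contains() is case-sensitive by default can silently drop matches.
Not accounting for NaN values can produce unexpected results, such as an error when a mask containing missing values is used to select rows.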
