Drop columns whose name contains a specific string from pandas DataFrame

python
dataframe
list-comprehensions
pandas
by Nikita Barsukov · Feb 6, 2025

TLDR

Remove columns containing a certain substring from a pandas DataFrame using:

df = df.drop([col for col in df.columns if 'substring' in col], axis=1)

This one-liner drops every column whose name contains 'substring' and assigns the result back to df.
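As a minimal runnable sketch of the one-liner above, the snippet below builds a small DataFrame and drops the matching columns. The column names and the substring "tmp" are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are invented for this example.
df = pd.DataFrame({
    "id": [1, 2],
    "price_usd": [9.5, 12.0],
    "tmp_price": [0.0, 0.0],
    "tmp_flag": [True, False],
})

# Collect every column whose name contains "tmp", then drop those columns.
to_drop = [col for col in df.columns if "tmp" in col]
result = df.drop(to_drop, axis=1)

print(list(result.columns))
```

Note that `'tmp' in col` assumes every column label is a string. If some labels are integers or other types, testing against `str(col)` is one way to avoid a TypeError.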

Enhanced column dropping techniques

The variations below cover case-insensitive matching, regular expressions, filter, and prefix or suffix matching.

Match irrespective of case

For case-insensitive substring matches, lowercase both the substring and the column name with lower():

df = df.drop([col for col in df.columns if 'substring'.lower() in col.lower()], axis=1) # Case insensitive: because sometimes, we all scream internally.
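An alternative sketch of the same case-insensitive match uses the `case=False` argument of `str.contains` instead of calling `lower()` on both sides. The column names here are hypothetical.

```python
import pandas as pd

# Hypothetical column names with mixed casing.
df = pd.DataFrame(columns=["ID", "Temp_Reading", "TEMP_flag", "value"])

# case=False ignores case; regex=False treats "temp" as a literal substring.
mask = df.columns.str.contains("temp", case=False, regex=False)

# Keep only the columns where the mask is False.
result = df.loc[:, ~mask]

print(list(result.columns))
```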

Regular expression power move

str.contains interprets its pattern as a regular expression by default (regex=True), so the same call handles regex patterns:

df = df[df.columns[~df.columns.str.contains('substring', regex=True)]] # Regex to the rescue, like Spiderman on a web(geddit?)
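To make the regex behaviour concrete, here is a small sketch with an assumed pattern and assumed column names. It also shows `regex=False`, which may be the safer choice when the substring contains characters such as parentheses or dots that have a special meaning in regular expressions.

```python
import pandas as pd

# Hypothetical column names chosen to exercise both kinds of match.
df = pd.DataFrame(columns=["tmp_a", "b_old", "keep", "old_keep", "rate(%)"])

# Regex: names that start with "tmp_" or end with "_old".
regex_mask = df.columns.str.contains(r"^tmp_|_old$")

# Literal: "(%)" would be parsed as a regex group, so switch regex off.
literal_mask = df.columns.str.contains("(%)", regex=False)

# Drop a column if either mask flags it.
result = df.loc[:, ~(regex_mask | literal_mask)]

print(list(result.columns))
```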

Replace the list comprehension with filter

filter(like='substring') selects the columns whose names contain the substring, so you can pass its columns straight to drop:

df = df.drop(df.filter(like='substring').columns, axis=1) # 'Filter': for when you've filtered enough coffee, but need to filter data.
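The sketch below runs the filter-based drop on hypothetical data and compares it with a variation that is not in the original text: `filter(regex=...)` with a negative lookahead, which keeps the non-matching columns in a single call.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "id": [1, 2],
    "price_usd": [9.5, 12.0],
    "tmp_price": [0.0, 0.0],
})

# Approach from the text: select the matching columns, then drop them.
dropped = df.drop(df.filter(like="tmp").columns, axis=1)

# Variation (an assumption, not the original approach): keep only names
# that do NOT contain "tmp", using a negative lookahead anchored at the start.
kept = df.filter(regex=r"^(?!.*tmp)")

print(list(dropped.columns), list(kept.columns))
```

The lookahead version is more compact but harder to read, so the two-step form is probably the better default.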

Startswith and endswith magic

To remove columns whose names start or end with a string, use the string methods startswith and endswith:

# When you want control: not just any string, but the start or end string!
df = df.drop([col for col in df.columns if col.startswith('substring')], axis=1)
df = df.drop([col for col in df.columns if col.endswith('substring')], axis=1)
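The following sketch shows how prefix and suffix matching differ from a plain substring match, using the vectorized `str.startswith` and `str.endswith` on `df.columns`. The prefix "tmp_", the suffix "_raw", and the column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical column names; some contain the strings only in the middle.
df = pd.DataFrame(columns=["tmp_id", "name", "score_raw", "raw_score", "my_tmp_col"])

# True only where the name begins with "tmp_".
prefix_mask = df.columns.str.startswith("tmp_")

# True only where the name ends with "_raw".
suffix_mask = df.columns.str.endswith("_raw")

# Drop columns flagged by either mask.
result = df.loc[:, ~(prefix_mask | suffix_mask)]

print(list(result.columns))
```

Columns such as "my_tmp_col" and "raw_score" survive here, whereas a substring match on "tmp" or "raw" would have removed them.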

Pass axis=1 (or use the columns= keyword) so that drop targets columns; with the default axis=0, pandas looks for the labels in the row index instead.

Mitigate common pitfalls

A few issues come up repeatedly when dropping columns this way.

Dodging SettingWithCopyWarning

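SettingWithCopyWarning typically appears when you later assign into a DataFrame that pandas considers a slice of another one. drop() already returns a new DataFrame, so copy() here is a defensive step that makes the result explicitly independent: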

df_clean = df.drop([col for col in df.columns if 'substring' in col], axis=1).copy() # Like taking photocopies: now the original document is safe!
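As a hedged illustration of where the warning usually originates, the sketch below first takes a filtered subset, copies it, and only then drops columns and assigns. The data is hypothetical, and whether a warning would appear without `copy()` depends on the pandas version (newer releases with Copy-on-Write behave differently).

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"id": [1, 2, 3], "tmp_a": [0, 0, 0], "value": [10, -5, 7]})

# A boolean-filtered frame is the kind of derived object that can trigger
# the warning on later assignment; copy() makes it explicitly independent.
positive = df[df["value"] > 0].copy()

# Drop the columns containing "tmp" from the independent copy.
df_clean = positive.drop([c for c in positive.columns if "tmp" in c], axis=1)

# Assigning into df_clean does not touch the original df.
df_clean["value"] = df_clean["value"] * 2

print(df_clean)
```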

Dynamic selection expansion

To drop columns that match any of several substrings, combine the boolean masks with |:

df = df.drop(df.columns[(df.columns.str.contains('substring')) | (df.columns.str.contains('another'))], axis=1) # Combining conditions, because who doesn't love a good combo?
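When the list of substrings grows, chaining masks with `|` gets unwieldy. One possible sketch is to join the substrings into a single alternation pattern, escaping each with `re.escape` so that they are matched literally. The substrings and column names are assumptions.

```python
import re

import pandas as pd

# Hypothetical column names and substrings to remove.
df = pd.DataFrame(columns=["id", "tmp_a", "debug_b", "value", "a.b"])
substrings = ["tmp", "debug", "."]

# Escape each substring so characters like "." are matched literally,
# then join them with "|" to mean "any of these".
pattern = "|".join(re.escape(s) for s in substrings)

mask = df.columns.str.contains(pattern)
result = df.drop(df.columns[mask], axis=1)

print(list(result.columns))
```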

Keeping original DataFrame intact

drop returns a new DataFrame rather than modifying df, so assigning the result to a new variable leaves the original untouched:

df_filtered = df.drop([col for col in df.columns if 'substring' in col], axis=1) # Keep the old, bring in the new. Life (and data) sorted!
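A short sketch, on hypothetical data, confirming that the original frame keeps all its columns. It also uses the `columns=` keyword of `drop`, which is equivalent to passing the labels with `axis=1`.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"id": [1, 2], "tmp_a": [0, 0], "value": [3.0, 4.0]})

to_drop = [col for col in df.columns if "tmp" in col]

# Two equivalent spellings of a column drop.
by_axis = df.drop(to_drop, axis=1)
by_keyword = df.drop(columns=to_drop)

# df itself still has every column.
print(list(df.columns), list(by_keyword.columns))
```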

Streamlining your data workflow

Working with pandas is not just about raw power, but also about performing with finesse.

Harness power tools

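str.contains, str.startswith, and str.endswith are vectorized string methods available through the .str accessor on df.columns. Each returns a boolean mask with one entry per column name, which you can negate with ~ or combine with | and &.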

Python wrangling made easy

List comprehensions and lambda functions pair well with pandas: a comprehension builds the list of column names to drop, and a lambda can be passed to df.loc to select columns inside a method chain.
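As one possible sketch of the lambda approach, the helper below wraps the selection in a function so it can be used with `pipe` in a method chain. The name `drop_columns_containing` is hypothetical and not part of pandas.

```python
import pandas as pd


def drop_columns_containing(frame, substring):
    """Hypothetical helper: return frame without columns whose name contains substring."""
    # loc accepts a callable; it receives the frame and returns a boolean mask.
    return frame.loc[:, lambda d: ~d.columns.str.contains(substring, regex=False)]


# Hypothetical sample data.
df = pd.DataFrame({"id": [1, 2], "tmp_a": [0, 0], "value": [3.0, 4.0]})

# pipe passes df as the first argument, so the helper slots into a chain.
result = df.pipe(drop_columns_containing, "tmp")

print(list(result.columns))
```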