How do I select rows from a DataFrame based on column values?

python

dataframe

pandas

performance

byAnton Shumikhin·Sep 30, 2024

Need something quickly? Try boolean indexing to filter DataFrame:

filtered_df = df[df['Column'] == value]

Working with multiple conditionals? Bring in bitwise operators & (and), | (or):

filtered_df = df[(df['Column1'] > value1) & (df['Column2'] < value2)]

Encase conditions in parentheses for a quick subset extraction.

The power of list values and `isin`

We often have a mixture of values for criteria selection. The isin method is like Batman's utility belt for this task – equipped to handle it deftly.

heroes = ['Bruce', 'Clark', 'Peter']  # our subject list
strong_df = df[df['Superhero'].isin(heroes)]  # select with isIn

Bah! Need to banish few villains? isin can do that too with a bit of swag.

strong_df = df[~df['Superhero'].isin(heroes)] # exclude with ~

Leveraging `query` for readability and complexity

When we have more conditions than Infinity stones, the query method unfolds like Dr. Strange's spell:

df.query('Ironman > @ts & Spiderman < @web') # @ts and @web define local variables.

Use backticks when you have columns with spaces:

df.query('`Tony Stark` == "Ironman"') # Backticks to the rescue!

Picking values like Thanos with range selections

Snap your fingers and wipe out the undesired!

infinity_df = df[(df['Stone Power'] >= low) & (df['Stone Power'] <= high)]

Remember! We wrap conditions in parentheses. Like a solid suit, they protect our logical operators from unwanted evils.

Faster than Quicksilver with NumPy

Time to put on your speed-force suit with NumPy when performance matters most:

import numpy as np

mask = np.in1d(df['Speed'].values, speed_list) # Holy Speed!
quicksilver_df = pd.DataFrame(df.values[mask], df.index[mask], df.columns)

The indexing power stone: optimize your data selection

The Infinity Gauntlet of data science – indexing can make your DataFrame selection infinitely quick:

df.set_index('Infinity Stones', inplace=True) # Snap! Infinity stones are your index now.
infinity_df = df.loc[specific_value]

Befriending the unpredictable: avoiding problems with indexing

An unstable index can behave like a Hulk rampage. To calm the beast, sometimes you need to reset the index:

df.reset_index(drop=True, inplace=True)

Text filters: Your lexical Doctor Strange

Working with The Book of Cagliostro a.k.a text data? Pandas can work its magic efficiently:

spelled_df = df[df['Spell Book'].str.contains('Alchemy')] # Filter rows like a proficient magician.

For start and end matches, incantations like .str.startswith() or .str.endswith() are quite handy.

explain-codes / Python / How do I select rows from a DataFrame based on column values?

Linked

Filter dataframe rows if value in column matches a set list of values



Use a list of values to select rows from a Pandas dataframe



Drop columns whose name contains a specific string from pandas DataFrame



Search for "does-not-contain" on a DataFrame in pandas



How to select all columns except one in pandas?



Selecting with complex criteria from pandas.DataFrame



Pandas: filter rows of DataFrame with operator chaining



The power of list values and Leveraging for readability and complexity Picking values like Thanos with range selections Faster than Quicksilver with NumPy The indexing power stone: optimize your data selection Befriending the unpredictable: avoiding problems with indexing Text filters: Your lexical Doctor Strange