
How to drop rows of Pandas DataFrame whose value in a certain column is NaN

python
dataframe
pandas
data-cleaning
by Anton Shumikhin · Oct 13, 2024
TLDR

To get rid of rows with NaNs in a specified column:

df = df.dropna(subset=['your_column'])

While this one-liner essentially solves the issue, nuances lurk beneath the surface. Let's uncover some gems that deal with NaN values, cater to special situations, and adopt best practices for DataFrame sanitization.

Drop NaN with Different Scopes and Conditions

Selective removal, or how I learned to not kill all the NaNs in sight

Weed out NaN values directly from one or multiple columns:

df = df.dropna(subset=['your_column'])          # Single column
df = df.dropna(subset=['col1', 'col2', 'col3']) # Multiple columns
# "We don’t need no nullification; We don’t need no NaN control" 🎵🎸
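To see the difference in action, here's a quick sketch on a made-up DataFrame (the column names and values are purely illustrative):

```python
import pandas as pd
import numpy as np

# Hypothetical data: one NaN in col1, one None in col2
df = pd.DataFrame({
    "col1": [1.0, np.nan, 3.0],
    "col2": ["a", "b", None],
})

only_col1 = df.dropna(subset=["col1"])           # drops only the row with NaN in col1
both_cols = df.dropna(subset=["col1", "col2"])   # drops rows with NaN in either column
```

With `subset=["col1"]`, the row holding `None` in `col2` survives; widen the subset and it's gone too.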

Dropping NaN in-place; because, who likes re-assignments?

Save the result back into df without an extra assignment:

df.dropna(subset=['your_column'], inplace=True) # 🎶 Hit me NaN one more time! 🎶 Oops..it's gone forever. 😉

(Heads-up: modern pandas style leans toward plain re-assignment; inplace=True is discouraged and may eventually be deprecated.)

Cut-off Thresholds; Data cleaning meets the High Jump

Remove rows that don't meet a certain count of non-NaN values (the threshold):

df = df.dropna(thresh=n) # n is the minimum number of non-NaN values a row needs
# Rows failing to jump over our set bar (n) are out of the DataFrame race!
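Here's the threshold bar in action on a toy three-column frame (the data is invented for illustration):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [np.nan, 6.0, 7.0],
})

# thresh=2: keep only rows with at least 2 non-NaN values
kept = df.dropna(thresh=2)
```

Rows 0 and 1 each carry two real values and clear the bar; row 2 has only one and gets cut.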

All or nothing, the NaN version

A stricter approach—discard rows that have NaN values in all columns:

df = df.dropna(how='all') # "All for NaN, and NaN for all!" said no row ever after this round. 😂

The Boolean Mask, not a new Superhero

Create a mask for rows with valid values and apply it:

mask = df['column_of_interest'].notna()
df = df[mask]
# My Boolean Mask brings all the rows to the yard, and NaN’s like: It's better than y'all!

The Whole Shebang: Dropping rows/columns, thresholds and masks

Sweep, don't weep

Get rid of any row that contains at least one NaN value with df.dropna(how='any') — note that 'any' is the default, so a bare df.dropna() does the same:

df = df.dropna(how='any') # NaN values are like uninvited party guests, they don’t get past this bouncer.

X-rays for your DataFrame

Before deleting, it helps to know where the NaNs are. With isna().any(axis=1) you see the affected rows:

nan_rows = df.isna().any(axis=1)
num_nan_rows = nan_rows.sum()
# It’s like playing ‘Where’s Waldo?’ but with NaN values. 🧐
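A small sketch of the X-ray in practice (invented data): the same boolean mask also lets you inspect the offending rows, and isna().sum() gives a per-column tally.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 2.0, 4.0],
})

bad = df[df.isna().any(axis=1)]  # just the rows containing at least one NaN
counts = df.isna().sum()         # NaN count per column
```

Eyeballing `bad` before dropping anything is a cheap sanity check that you're not about to throw away half your dataset.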

Null-cypher: Dropping columns with NaNs

What if columns rather than rows are riddled with NaN values? Just switch the axis in dropna:

df = df.dropna(axis=1) # Drops any column with at least one NaN value

Understand NaNs: Sometimes Absence Makes the Data Grow Fonder

NaNs aren't always a problem; they might signify missing information. Sometimes, imputing with a statistical measure (mean, median) or a constant value can therefore be more beneficial than dropping. The pattern of NaNs can also provide insight into data quality or bias.
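If imputation fits your use case better than dropping, fillna does the job — here's a minimal sketch on a made-up column, filling NaNs with the column mean:

```python
import pandas as pd
import numpy as np

# Hypothetical column with a gap; fill it with the mean instead of dropping the row
df = pd.DataFrame({"score": [10.0, np.nan, 20.0]})
df["score"] = df["score"].fillna(df["score"].mean())
```

Swap `.mean()` for `.median()` (robust to outliers) or a constant like `fillna(0)` depending on what the missing values actually represent.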