
How do I count the NaN values in a column in pandas DataFrame?

python
pandas
dataframe
missing-values
by Anton Shumikhin · Jan 1, 2025
TLDR

To count NaN values in a single column, use isna().sum() with pandas:

import pandas as pd

nan_count = df['column_name'].isna().sum()  # Counts the ghostly NaNs lurking in your data

isna() returns a boolean mask that is True wherever a value is missing, and sum() counts those True values. To extend this to every column, just omit the column selection:

nan_counts = df.isna().sum() # Counts all the invisible NaN ninjas in each column.

Here, nan_counts gives you the NaN count for each column.
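To see both calls in action, here is a minimal sketch on a made-up DataFrame. The column names and values are assumptions for illustration only; swap in your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# NaN count for a single column
age_nans = df["age"].isna().sum()

# NaN count for every column (a Series indexed by column name)
nan_counts = df.isna().sum()

# Summing the per-column counts gives the total for the whole DataFrame
total_nans = nan_counts.sum()

print(age_nans)
print(nan_counts)
print(total_nans)
```

Because nan_counts is an ordinary Series, you can index it by column name, for example nan_counts["city"].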

Counting NaN in big leagues: Handling large datasets

Even if your DataFrame is the size of the Amazon (the forest, not the company), isna() handles it like a champ, because it is a vectorized operation rather than a Python-level loop. And if you'd rather see the NaNs as a percentage of rows, here's what you can serve up:

nan_percent = 100 * df.isna().sum() / len(df) # NaN values are counted, done up and served in percentages.

To see the count and percentage side by side, think of it like an arranged marriage between the two using pd.concat():

nan_summary = pd.concat([df.isna().sum(), nan_percent], axis=1)
nan_summary.columns = ['NaN Count', 'NaN Percentage']  # It's always beneficial to label your twins.
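Here is a small end-to-end sketch of the count-plus-percentage table, using an invented two-column DataFrame. It also shows one possible shortcut for the percentage: the mean of a boolean mask is the fraction of True values, so isna().mean() gives the same numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, 2.0, 3.0, 4.0],
})

nan_count = df.isna().sum()
nan_percent = 100 * nan_count / len(df)

# Alternative: the mean of the boolean mask is the share of missing rows.
nan_percent_alt = df.isna().mean() * 100

# Put count and percentage side by side, one row per column of df.
nan_summary = pd.concat([nan_count, nan_percent], axis=1)
nan_summary.columns = ["NaN Count", "NaN Percentage"]

print(nan_summary)
```

Both percentage calculations divide by the number of rows, so they agree as long as the DataFrame is not empty.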

NaN counts: Alternatives, extensions and pro tips

isna() works great, and its doppelgänger isnull() plays exactly the same role: isnull() is simply an alias of isna(), so the two return identical results and there is no performance difference to chase. And yes, both spot other missing values like None or NaT; they don't discriminate.
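If you want to convince yourself, the following sketch compares the two methods on a few invented Series containing NaN, None and NaT. The variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical Series covering the usual kinds of "missing".
floats = pd.Series([1.5, np.nan, None], dtype="float64")    # None is stored as NaN
dates = pd.Series(pd.to_datetime(["2025-01-01", None]))     # None is stored as NaT
objects = pd.Series(["x", None, "y"], dtype="object")       # None stays None

# isna() and isnull() produce the same boolean mask.
same_mask = floats.isna().equals(floats.isnull())

float_missing = floats.isna().sum()
date_missing = dates.isna().sum()
object_missing = objects.isna().sum()

print(same_mask, float_missing, date_missing, object_missing)
```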

Want to sort the NaNs, sort of like grading your students? .sort_values() is your man (method):

nan_counts_sorted = df.isna().sum().sort_values(ascending=False) # Naughty NaNs, you can't hide anymore!

This is fine, but what about a comprehensive summary? Let's introduce a handy custom function:

def missing_values_summary(dataframe):
    nan_count = dataframe.isna().sum()
    nan_percentage = 100 * nan_count / len(dataframe)
    return pd.DataFrame({'NaN Count': nan_count, 'NaN Percentage': nan_percentage})

summary = missing_values_summary(df)
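As a usage sketch, the summary function can be combined with sort_values() to list only the columns that actually have gaps, worst offenders first. The sample DataFrame and the worst_first name are assumptions made for this example.

```python
import numpy as np
import pandas as pd

def missing_values_summary(dataframe):
    """Return NaN count and NaN percentage per column."""
    nan_count = dataframe.isna().sum()
    nan_percentage = 100 * nan_count / len(dataframe)
    return pd.DataFrame({"NaN Count": nan_count, "NaN Percentage": nan_percentage})

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "full": [1, 2, 3, 4],
    "some": [1.0, np.nan, 3.0, 4.0],
    "most": [np.nan, np.nan, np.nan, 4.0],
})

summary = missing_values_summary(df)

# Keep only columns with at least one NaN, sorted from most to fewest.
worst_first = summary[summary["NaN Count"] > 0].sort_values(
    "NaN Count", ascending=False
)

print(worst_first)
```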

Beyond counting: NaN handling, cleaning data

Sometimes, you may want to get rid of NaNs (they didn't offend you, they're just not quite all there). Use dropna() to kick them out and fillna() to replace them:

# Drop rows with NaNs in 'column_name'
clean_df = df.dropna(subset=['column_name'])  # Because they can't sit with us!

# Fill NaNs with a specified value
filled_df = df.fillna(value=0)  # You either die a hero, or you live long enough to see yourself become a zero.
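A quick sketch of what each call does to the NaN counts, again on invented data. Note that both methods return a new DataFrame and leave the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "column_name": [1.0, np.nan, 3.0],
    "other": [np.nan, 20.0, 30.0],
})

# Drops only the rows where 'column_name' is missing;
# NaNs in other columns survive.
clean_df = df.dropna(subset=["column_name"])

# Replaces every NaN in every column with 0.
filled_df = df.fillna(value=0)

print(clean_df.isna().sum())
print(filled_df.isna().sum())
```

Whether 0 is a sensible fill value depends entirely on your data; it is used here only because the original example does.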

Trickster NaN: When isna() does not behave as expected

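Keep in mind that isna() only recognizes true missing values (NaN, None, NaT). In 'object' dtype columns, empty strings and placeholder text such as 'NA' look like perfectly ordinary data to it (it's like it's wearing 'NaN-blind' glasses). For string columns, use df.replace('', np.nan), with numpy imported as np, to convert empty strings to np.nan before you unleash the mighty isna().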
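The sketch below shows the effect on a made-up string column. The regex variant that also catches whitespace-only strings is an addition of this example, not something the text above prescribes; use it only if blank-looking cells should count as missing in your data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one real None, one empty string, one blank string.
df = pd.DataFrame({"name": ["Ada", "", None, "   ", "Grace"]})

# Out of the box, only the None is seen as missing.
before = df["name"].isna().sum()

# Convert exact empty strings to NaN.
exact = df.replace("", np.nan)
after_exact = exact["name"].isna().sum()

# Optional: also treat whitespace-only strings as missing.
blank = df.replace(r"^\s*$", np.nan, regex=True)
after_blank = blank["name"].isna().sum()

print(before, after_exact, after_blank)
```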

Non-numeric data: Handling categorical NaN

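For your non-numeric data, value_counts() with the dropna=False flag comes in handy. By default value_counts() leaves missing values out; with dropna=False you get the NaN tally as one row among the other value counts rather than as a single number (it's like an indirect WhatsApp message):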

nan_in_strings = df['non_numeric_column'].value_counts(dropna=False)
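Here is a minimal sketch on an invented Series that compares the default behavior with dropna=False and shows one way to pull the NaN tally back out of the result. The sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical categorical-style data with two missing entries.
s = pd.Series(["red", "blue", None, "red", None], name="non_numeric_column")

# Default: missing values are left out of the tally.
counts_default = s.value_counts()

# dropna=False: missing values get their own row.
counts_with_nan = s.value_counts(dropna=False)

# Pull the missing-value tally out of the result by masking the index.
nan_from_counts = counts_with_nan[counts_with_nan.index.isna()].sum()

# The direct route gives the same number.
nan_direct = s.isna().sum()

print(counts_with_nan)
print(nan_from_counts, nan_direct)
```

If all you need is the number of missing values, isna().sum() is the shorter path; value_counts(dropna=False) earns its keep when you want that number in context with the other categories.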