How do I count the NaN values in a column in pandas DataFrame?
To count NaN values in a single column, use isna().sum() with pandas:
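A minimal sketch (the DataFrame df and the column name 'col' are placeholders, not gospel):

```python
import pandas as pd
import numpy as np

# A tiny DataFrame with some missing values sprinkled in
df = pd.DataFrame({'col': [1, np.nan, 3, np.nan],
                   'other': ['a', None, 'c', 'd']})

# isna() marks each missing value as True, sum() adds the Trues up
nan_in_col = df['col'].isna().sum()
print(nan_in_col)  # 2
```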
isna() flags NaNs and sum() adds them up. To extend this to multiple columns, just omit the column specification:
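Continuing with the same placeholder df:

```python
# Omit the column selection to count NaNs in every column at once
nan_counts = df.isna().sum()
print(nan_counts)
# col      2
# other    1
# dtype: int64
```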
Here, nan_counts gives you the NaN count for each column.
Counting NaN in big leagues: Handling large datasets
Even if your DataFrame is the size of the Amazon (the forest, not the company), isna() handles it like a champ. And if you're hungering for a percentage of à la carte NaNs, here's what you can serve up:
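One way to plate the percentages, reusing the placeholder df (taking the mean of the boolean mask is just one recipe; dividing the sum by len(df) works too):

```python
# Percentage of NaNs per column: mean of the boolean mask, times 100
nan_percentages = df.isna().mean() * 100
print(nan_percentages)
# col      50.0
# other    25.0
# dtype: float64
```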
To see the count and percentage side by side, think of it like an arranged marriage between the two using pd.concat():
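A possible pairing, again on the placeholder df (the 'count' and 'percent' labels are merely suggestions):

```python
# Marry the count and the percentage into one summary table
nan_summary = pd.concat(
    [df.isna().sum(), df.isna().mean() * 100],
    axis=1,
    keys=['count', 'percent'],
)
print(nan_summary)
#        count  percent
# col        2     50.0
# other      1     25.0
```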
NaN counts: Alternatives, extensions and pro tips
isna() works great, and its doppelgänger isnull() can play the same role; isnull() is simply an alias of isna(), so the difference is pure semantics rather than performance. And yes, isna() and isnull() can spot other missing values like None or NaT, they don't discriminate.
Want to sort the NaNs, sort of like grading your students? .sort_values() is your man (method):
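For example, on the placeholder df:

```python
# Rank columns from most to least NaN-ridden
sorted_nan_counts = df.isna().sum().sort_values(ascending=False)
print(sorted_nan_counts)
# col      2
# other    1
# dtype: int64
```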
This is fine, but what about a comprehensive summary? Let's introduce a handy custom function:
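One possible helper; the name nan_report and its exact layout are illustrative rather than canonical:

```python
def nan_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return NaN count and percentage per column, worst offenders first."""
    report = pd.DataFrame({
        'nan_count': frame.isna().sum(),
        'nan_percent': frame.isna().mean() * 100,
    })
    return report.sort_values('nan_count', ascending=False)

print(nan_report(df))
```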
Beyond counting: NaN handling, cleaning data
Sometimes, you may want to get rid of NaNs (they didn't offend you, they're just not quite all there). Use dropna() to kick them out and fillna() to replace them:
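A quick sketch of both options on the placeholder df (filling with 0 is just an example):

```python
# Drop every row that contains at least one NaN
cleaned = df.dropna()

# Or keep the rows and fill NaNs with a value of your choosing
filled = df.fillna(0)
```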
Trickster NaN: When isna() does not behave as expected
Keep in mind, isna() only recognizes genuine missing markers like NaN, None and NaT; placeholder values such as empty strings in 'object' dtype columns slip right past it (it's like it's wearing 'NaN-blind' glasses for those). For string columns, use df.replace('', np.nan) to convert empty strings to np.nan before you unleash the mighty isna().
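A small illustration, reusing pandas and numpy from the earlier snippets (the messy DataFrame is made up for demonstration):

```python
# Empty strings are not missing values as far as isna() is concerned
messy = pd.DataFrame({'name': ['Alice', '', None, 'Dana']})
print(messy['name'].isna().sum())   # 1 -- only the None is counted

# Promote empty strings to real NaNs first, then count again
messy = messy.replace('', np.nan)
print(messy['name'].isna().sum())   # 2
```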
Non-numeric data: Handling categorical NaN
For your non-numeric data, value_counts() with the dropna=False flag comes in handy; it lists the NaN count alongside every other value rather than tallying NaNs on their own (it's like an indirect WhatsApp message):
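A sketch with a made-up fruit Series:

```python
# value_counts(dropna=False) gives NaN its own row in the tally
fruit = pd.Series(['apple', 'pear', np.nan, 'apple', np.nan])
print(fruit.value_counts(dropna=False))
# apple    2
# NaN      2
# pear     1
# (order of tied counts may vary)
```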