Python pandas: filtering out NaN from a selection of a column of strings
To exclude rows with NaN values in a pandas DataFrame, create a boolean mask with the .notna()
method and use it to index the DataFrame:
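A minimal sketch of that mask, assuming a hypothetical DataFrame `df` with a string column named 'col' (the sample data and the `qty` column are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'col' is the column of strings being filtered.
df = pd.DataFrame({"col": ["apple", np.nan, "cherry"], "qty": [1, 2, 3]})

# notna() is True for every row where 'col' holds a real value.
mask = df["col"].notna()

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```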
This mask shows NaN who's boss in 'col' while keeping every other row and column of the DataFrame intact.
Spotlights on NaN expulsion techniques
Single vs Multiple Columns: Choose your battlefield
When your battlefront extends to several columns, decide whether to remove a row that plays host to NaN in any of the specified columns, or only a row where all of them are NaN:
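One way to express both choices is `dropna` with its `subset` and `how` parameters. The column names 'a', 'b', and 'c' below are placeholders, not names from the original article:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'a' and 'b' are the columns checked for NaN.
df = pd.DataFrame({
    "a": ["x", np.nan, np.nan],
    "b": ["y", "z", np.nan],
    "c": [1, 2, 3],
})

# Drop a row if ANY of the listed columns is NaN.
any_nan_removed = df.dropna(subset=["a", "b"], how="any")

# Drop a row only if ALL of the listed columns are NaN.
all_nan_removed = df.dropna(subset=["a", "b"], how="all")
```

Column 'c' is never inspected here, so a NaN there would not cause a row to be dropped.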
Exploiting Query Method for Swift Cleanup
Use query to pick the sweet apples from the tree, eschewing those infested by the NaN-worm:
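A sketch of the query approach, again assuming a hypothetical column named 'col'. Passing `engine="python"` is a cautious choice here, since method calls inside a query string are not always supported by the default engine:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col": ["apple", np.nan, "cherry"]})

# Call notna() on the column inside the query expression.
clean = df.query("col.notna()", engine="python")
```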
Root out disguised 'NaN' trespassers with RegEx
Beware! Some NaNs come incognito as 'N/A' or empty, ghost-like strings:
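These impostors are ordinary strings, so `notna()` alone lets them through. A possible sketch uses `str.fullmatch` to flag them. The pattern below is an assumption about which disguises to catch ('N/A', empty, or whitespace-only) and should be adapted to your data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with real NaN plus string impostors.
df = pd.DataFrame({"col": ["apple", "N/A", "", "   ", np.nan, "cherry"]})

# True where the whole string is 'N/A', empty, or only whitespace.
# na=False makes real NaN entries evaluate to False instead of NaN.
disguised = df["col"].str.fullmatch(r"(?:\s*|N/A)", na=False)

# Keep rows that are neither real NaN nor a disguised one.
clean = df[df["col"].notna() & ~disguised]
```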
Custom Filtering: Bringing out the big guns with list comprehensions
Big problems need big solutions. List comprehensions are heavy-duty machinery when you need to tackle multiple conditions or run a custom function:
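A minimal sketch of a list-comprehension filter. The helper `is_usable` and its rule (a non-blank string) are hypothetical stand-ins for whatever custom check you need:

```python
import numpy as np
import pandas as pd

def is_usable(value):
    """Hypothetical custom rule: keep only non-blank strings."""
    return isinstance(value, str) and value.strip() != ""

# Hypothetical sample data.
df = pd.DataFrame({"col": ["apple", np.nan, "  ", "cherry", None]})

# Build a plain Python list of booleans, one per row.
mask = [is_usable(v) for v in df["col"]]

# .loc accepts a boolean list with the same length as the DataFrame.
clean = df.loc[mask]
```

A list comprehension runs in Python rather than in vectorized code, so on large frames a built-in method is usually the faster choice when one fits.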
Advanced Data Ninja techniques
Custom Placeholders: NaN in disguise
Not every NaN is as obvious. Some masquerade as an innocent '--' or a nondescript 'unknown':
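One way to unmask them is to combine `notna()` with `isin()`. The placeholder list below simply reuses the two examples from the text and is an assumption about your data:

```python
import numpy as np
import pandas as pd

# Placeholder strings treated as missing (extend as needed).
placeholders = ["--", "unknown"]

# Hypothetical sample data.
df = pd.DataFrame({"col": ["apple", "--", "unknown", np.nan, "cherry"]})

# Keep rows that are neither real NaN nor a known placeholder.
clean = df[df["col"].notna() & ~df["col"].isin(placeholders)]
```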
Nullable Integer types: NaN's favorite hideout
Check the nullability of your integers. A plain int64 column cannot hold a missing value, but the nullable Int64 dtype lets shy integers hide behind a pd.NA mask:
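A short sketch with a hypothetical Series, showing that `notna()` also works on the nullable integer dtype:

```python
import pandas as pd

# Capital-I "Int64" is the nullable integer dtype; None is stored as pd.NA.
s = pd.Series([1, None, 3], dtype="Int64")

# notna() treats pd.NA as missing, just like NaN.
clean = s[s.notna()]
```

The result keeps the Int64 dtype instead of being converted to float.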
Nullable String types: NaN's secret lair
The newer StringDtype in pandas stores missing strings as pd.NA, and notna() still finds them:
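A minimal sketch, assuming a hypothetical Series created with the "string" dtype alias:

```python
import pandas as pd

# dtype="string" selects StringDtype; None is stored as pd.NA.
s = pd.Series(["apple", None, "cherry"], dtype="string")

# The same notna() mask filters out the pd.NA entry.
clean = s[s.notna()]
```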