Search for "does-not-contain" on a DataFrame in pandas
Need to strip a DataFrame of rows where the chosen column ('column') contains 'exclude_this'?
Use ~ to negate the result of str.contains and express "does-not-contain", keeping every row except those that match 'exclude_this'. The na=False argument tells str.contains to treat NaN as a non-match, so after negation those rows are kept rather than leaving missing values in the mask.
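A minimal sketch of that filter, assuming a small made-up DataFrame (the sample values are illustrative, not from any real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one matching row, one NaN, two clean rows.
df = pd.DataFrame(
    {"column": ["keep me", "exclude_this row", np.nan, "another keeper"]}
)

# str.contains is True where the substring is found; na=False maps NaN to False.
mask = df["column"].str.contains("exclude_this", na=False)

# ~ inverts the mask, so rows without the substring (NaN rows included) survive.
filtered = df[~mask]
print(filtered)
```

Building the mask first and negating it on a separate line makes it easier to inspect what is being dropped.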
Deep dive into filtering strategies
Combo breaker: Tackling multiple patterns
When you've got multiple words to exclude, it's time for a regular expression tag-team, using | (OR).
Make sure your patterns string holds every pattern you want gone, separated by | symbols, so that a single str.contains call matches any of them.
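One way the patterns string might be assembled, as a sketch. The word list and sample rows are hypothetical, and re.escape is an extra precaution not mentioned above: it keeps characters such as . or + in a word from being read as regex syntax.

```python
import re

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"column": ["apple pie", "banana split", "cherry tart", "plain toast"]}
)

# Hypothetical list of words to exclude.
words_to_exclude = ["apple", "cherry"]

# Join the escaped words with | so the regex matches any one of them.
patterns = "|".join(re.escape(word) for word in words_to_exclude)

# Keep only the rows that match none of the patterns.
filtered = df[~df["column"].str.contains(patterns, na=False)]
print(filtered)
```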
Don't yell: Case insensitivity
Should all cases ('EXCLUDE_THIS', 'Exclude_This', and 'exclude_this') be treated as equals? Pass case=False to str.contains for a case-insensitive search.
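A short sketch of the case-insensitive version, again on made-up sample rows:

```python
import pandas as pd

# Hypothetical sample data covering three casings plus one row to keep.
df = pd.DataFrame(
    {"column": ["EXCLUDE_THIS", "Exclude_This", "exclude_this", "keep this"]}
)

# case=False makes the match ignore upper/lower case differences.
mask = df["column"].str.contains("exclude_this", case=False, na=False)

filtered = df[~mask]
print(filtered)
```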
Show-off time: Lambda for complex filtering
When you need logic that a single pattern can't express, such as excluding multiple words with custom rules, a lambda function passed to apply has got your back.
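A sketch of the lambda approach, assuming a hypothetical tuple of banned words and made-up rows. The isinstance check is an added safeguard so that missing values are treated as non-matches instead of raising an error.

```python
import pandas as pd

# Hypothetical sample data, including a missing value.
df = pd.DataFrame(
    {"column": ["alpha beta", "gamma", None, "beta delta", "epsilon"]}
)

# Hypothetical words to exclude.
banned = ("alpha", "delta")

# True for string values that contain at least one banned word.
has_banned = df["column"].apply(
    lambda value: isinstance(value, str) and any(word in value for word in banned)
).astype(bool)

# Negate to keep rows with no banned word (and rows with missing values).
filtered = df[~has_banned]
print(filtered)
```

apply runs Python code row by row, so on large frames the str.contains route is usually the faster choice when a regex can do the job.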
Potential fix: How to deal with NULLs and TypeErrors
Before filtering with negation, clear out null values and mixed data types in the column. Either one can leave non-boolean entries in the mask, which may raise a TypeError when the mask is negated with ~.
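One possible clean-up, sketched on made-up data: fill missing values with an empty string, cast everything to str, then filter. Filling with "" is an assumption here; it means missing rows count as "does not contain" and are kept.

```python
import pandas as pd

# Hypothetical column mixing strings, a number, and a missing value.
df = pd.DataFrame({"column": ["exclude_this here", 42, None, "fine"]})

# Replace missing values first, then make every entry a string.
clean = df["column"].fillna("").astype(str)

# The mask is now purely boolean, so ~ is safe to apply.
filtered = df[~clean.str.contains("exclude_this")]
print(filtered)
```

The cleaned Series is used only to build the mask, so the original values in df stay untouched.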
Boundless conditionality: loc method
Got complicated conditions to meet? Reach for loc, the Swiss Army knife of data filtering, which takes a boolean mask for rows and, optionally, a list of columns to return.
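A sketch of loc combining the negated text filter with a second condition. The score column and its threshold are hypothetical, added only to show how conditions chain with &.

```python
import pandas as pd

# Hypothetical sample data with an extra numeric column.
df = pd.DataFrame(
    {
        "column": ["exclude_this a", "keep b", "keep c", None],
        "score": [90, 80, 10, 70],
    }
)

# Each condition gets its own name; & requires both to hold.
no_match = ~df["column"].str.contains("exclude_this", na=False)
high_score = df["score"] > 50

# loc selects the rows where the mask is True and only the listed columns.
result = df.loc[no_match & high_score, ["column", "score"]]
print(result)
```

When writing the conditions inline instead, wrap each comparison in parentheses, because & binds more tightly than > in Python.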