Search for "does-not-contain" on a DataFrame in pandas
Need to filter a DataFrame down to the rows where a chosen column ('column') does not contain 'exclude_this'?
Use ~ to negate the result of str.contains, which expresses "does-not-contain" and keeps every row except those matching 'exclude_this'. The na=False argument makes str.contains report NaN values as non-matches, so after negation those rows are kept instead of leaving NaN in the mask.
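A minimal sketch of the basic pattern, assuming a small made-up DataFrame whose column is literally named 'column' (the sample values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one matching row, one NaN, two plain rows.
df = pd.DataFrame({
    "column": ["keep me", "please exclude_this row", np.nan, "another keeper"],
})

# str.contains marks rows that contain the substring; na=False maps NaN to False.
# ~ inverts the mask, so NaN rows become True and are kept.
mask = ~df["column"].str.contains("exclude_this", na=False)
filtered = df[mask]

print(filtered)
```

Only the row containing 'exclude_this' is dropped; the NaN row survives because na=False turned it into a non-match before the negation.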
Deep dive into filtering strategies
Combo breaker: Tackling multiple patterns
When you've got multiple words to exclude, it's time for a regular expression tag-team, using | (OR):
Ensure your patterns string is fully armed with the necessary patterns, divided by | symbols to form a powerhouse of pattern removal.
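One way this might look, assuming a hypothetical list of words to exclude. The re.escape call is an extra precaution not mentioned above: it keeps characters such as '.' or '+' in a word from being read as regex syntax.

```python
import re

import pandas as pd

# Hypothetical sample data and exclusion list.
df = pd.DataFrame({
    "column": ["apple pie", "banana split", "cherry tart", "plain toast"],
})
words_to_exclude = ["apple", "cherry"]

# Join the (escaped) words with | so the regex matches any one of them.
patterns = "|".join(re.escape(word) for word in words_to_exclude)

# Keep only rows that match none of the patterns.
filtered = df[~df["column"].str.contains(patterns, na=False)]

print(filtered)
```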
Don't yell: Case insensitivity
If all casings ('EXCLUDE_THIS', 'Exclude_This', and 'exclude_this') should be treated as equal, add the case=False argument for a case-insensitive search:
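A short sketch of the case-insensitive variant, again on invented sample data:

```python
import pandas as pd

# Hypothetical sample data covering three different casings.
df = pd.DataFrame({
    "column": ["EXCLUDE_THIS now", "Exclude_This too", "exclude_this also", "keep"],
})

# case=False makes the match ignore upper/lower case differences.
mask = ~df["column"].str.contains("exclude_this", case=False, na=False)
filtered = df[mask]

print(filtered)
```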
Show-off time: Lambda for complex filtering
When you feel the need to introduce advanced logic—perhaps excluding multiple words—lambda functions have got your back:
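As a rough illustration, assuming a hypothetical tuple of banned words, a lambda passed to apply can hold whatever logic is needed. Here non-string values are deliberately kept, which is a design choice for this sketch rather than a rule:

```python
import pandas as pd

# Hypothetical sample data, including a missing value.
df = pd.DataFrame({
    "column": ["alpha beta", "gamma", None, "delta alpha"],
})
banned = ("alpha", "beta")  # hypothetical words to exclude

# True means "keep this row": the value is either not a string,
# or a string that contains none of the banned words.
keep = df["column"].apply(
    lambda value: not (
        isinstance(value, str) and any(word in value for word in banned)
    )
)
filtered = df[keep]

print(filtered)
```

Because apply calls the lambda once per row in Python, this tends to be slower than str.contains on large frames, so it is probably best reserved for logic a regex cannot express cleanly.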
Potential fix: How to deal with NULLs and TypeErrors
Before filtering with negation, make sure the column is free of null values and mixed data types, both of which can leave non-boolean entries in the mask and cause a TypeError when it is negated:
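One possible cleanup, sketched on invented data: fill nulls with an empty string and cast everything to str before matching. Whether an empty string is the right stand-in for a missing value depends on the data, so treat that as an assumption.

```python
import pandas as pd

# Hypothetical sample data mixing strings, a null, and a number.
df = pd.DataFrame({
    "column": ["exclude_this one", None, 123, "fine"],
})

# Replace nulls with "" and convert every value to a string,
# so str.contains returns a clean boolean mask that ~ can invert.
cleaned = df["column"].fillna("").astype(str)
mask = ~cleaned.str.contains("exclude_this")
filtered = df[mask]

print(filtered)
```

The cleaned Series is only used to build the mask; the filtered result still shows the original, unconverted values.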
Boundless conditionality: loc method
Got complicated conditions to meet? Why not use loc, the Swiss Army knife of data filtering:
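A sketch of combining the negated match with a second condition through loc. The 'score' column and its threshold are hypothetical, added only to have something to combine with:

```python
import pandas as pd

# Hypothetical sample data with an extra numeric column.
df = pd.DataFrame({
    "column": ["exclude_this a", "keep b", "keep c", None],
    "score": [10, 5, 20, 30],
})

# Combine conditions with & (and) or | (or); wrap comparisons in parentheses.
mask = ~df["column"].str.contains("exclude_this", na=False) & (df["score"] > 8)

# loc takes the row mask plus an optional column selection.
result = df.loc[mask, ["column", "score"]]

print(result)
```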