Replacing blank values (white space) with NaN in pandas
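Breeze through white space replacement with replace() using the regex '^\s*$', which matches cells that are empty or contain nothing but whitespace. np.nan is our replacement, and regex=True tells replace() to treat the pattern as a regular expression. Need it only for a specific column named col_name? Call the same replace() on that column alone.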
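A minimal sketch of both variants, assuming a small made-up DataFrame (the columns other and num are hypothetical; col_name is the name used above). The first call cleans every column; the second touches only col_name.

```python
import numpy as np
import pandas as pd

# Made-up data: empty and whitespace-only strings mixed into text columns
df = pd.DataFrame({
    "col_name": ["a", " ", "", "b\t"],
    "other": ["  ", "x", "y", "z"],
    "num": [1, 2, 3, 4],
})

# Whole DataFrame: every cell that is empty or only whitespace becomes NaN.
# Non-string cells (the numeric "num" column) are left untouched.
df_all = df.replace(r"^\s*$", np.nan, regex=True)

# One column only: the same call on a single Series
df_one = df.copy()
df_one["col_name"] = df_one["col_name"].replace(r"^\s*$", np.nan, regex=True)

print(df_all)
print(df_one)
```

Note the raw string r"^\s*$": without the r prefix, recent Python versions warn about the invalid escape sequence \s.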
Behind regex
Understanding the regex '^\s*$': we're searching for strings made of nothing but whitespace. The ^ anchors the match at the beginning, \s* greedily matches zero or more whitespace characters (spaces, tabs, newlines and the like), and $ seals the end. Because \s* may match zero characters, even empty strings are embraced and offered the white-space-to-NaN transformation.
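To see the pattern in isolation, here is a small sketch using Python's standard re module on a few hypothetical sample strings; only the fully blank ones match.

```python
import re

# Start anchor, zero or more whitespace characters, end anchor
BLANK = re.compile(r"^\s*$")

samples = ["", " ", "\t\n", "   x  ", "x"]

# match() anchors at the start; $ then forces the rest to be whitespace only
results = {s: bool(BLANK.match(s)) for s in samples}

for s, is_blank in results.items():
    print(repr(s), is_blank)
```

The empty string matches because \s* is allowed to match nothing, while "   x  " fails as soon as the x appears.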
Why replace() and not loops?
Why df.replace() over loops? Well, loops are like the turtle: sure to get there eventually. replace() is the hare that doesn't stop for a nap: a single call in which pandas does the iterating, instead of your own Python code indexing one cell at a time. It is concise and Pythonic, and on larger datasets it is typically much faster than a hand-written row loop.
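A rough side-by-side sketch on made-up data: a row-by-row loop using .at versus a single replace() call. It only checks that both approaches mark the same cells as missing; it makes no timing claims, so measure on your own data (for example with timeit) if speed matters.

```python
import numpy as np
import pandas as pd

# Made-up data: 4000 rows, half of them blank or whitespace-only
df = pd.DataFrame({"name": ["ann", " ", "", "bob"] * 1000})

# Loop version: Python code visits and rewrites each cell individually
looped = df.copy()
for i in range(len(looped)):
    value = looped.at[i, "name"]
    if isinstance(value, str) and value.strip() == "":
        looped.at[i, "name"] = np.nan

# replace() version: one call, the iteration happens inside pandas
replaced = df.replace(r"^\s*$", np.nan, regex=True)

# Both approaches mark exactly the same cells as missing
assert looped["name"].isna().equals(replaced["name"].isna())
```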
Precautions: Handling data types
Be it str, NaN, or anything else, we sort 'em all! Here's how:
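A minimal sketch, assuming a small hypothetical DataFrame and a helper named blank_to_nan (a name introduced here, not a pandas function). The helper checks isinstance(value, str) before stripping, so None, NaN and numbers pass through unchanged.

```python
import numpy as np
import pandas as pd

def blank_to_nan(value):
    """Return NaN for empty/whitespace-only strings; leave anything else as-is."""
    if isinstance(value, str) and value.strip() == "":
        return np.nan
    return value

# Hypothetical mixed-type data: text with None and blanks, floats with NaN
df = pd.DataFrame({
    "text": ["hi", "   ", None, ""],
    "score": [1.5, np.nan, 3.0, 4.0],
})

# Map the helper over each column; only str values can be changed
cleaned = df.copy()
for col in cleaned.columns:
    cleaned[col] = cleaned[col].map(blank_to_nan)

print(cleaned)
```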
This ensures only strings are given the opportunity for transformation.
On-the-go: Pre-processing with read_csv
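pd.read_csv offers an early-bird advantage: empty fields are already read as NaN by default, and its na_values parameter lets you list whitespace strings that should be treated as NaN straight away: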
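A minimal sketch with hypothetical CSV content held in memory. Keep in mind that na_values compares exact strings, so only the whitespace variants you list (here one space, two spaces and a tab) are caught; blanks of other lengths would still need the replace() step afterwards.

```python
import io

import pandas as pd

# Hypothetical CSV: bob's score is a single space, carol's is empty
csv_text = "name,score,city\nalice,1,Oslo\nbob, ,Rome\ncarol,,Pisa\n"

# Empty fields become NaN by default; na_values adds extra strings to
# treat as missing (exact matches only, not a regex)
df = pd.read_csv(io.StringIO(csv_text), na_values=[" ", "  ", "\t"])

print(df)
```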
Resource management: Focusing on larger data
Got larger datasets? Make only the columns that actually contain blank strings the target of replace(), cutting down on processing time:
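A minimal sketch, assuming a hypothetical list target_cols of the columns you already know contain blanks; every other column is skipped entirely.

```python
import numpy as np
import pandas as pd

# Made-up data: only "city" and "notes" hold blank strings
df = pd.DataFrame({
    "city": ["Oslo", " ", "Rome"],
    "notes": ["", "ok", "  "],
    "id": [1, 2, 3],
})

# Hypothetical list of the columns known to contain blank strings
target_cols = ["city", "notes"]

# Only these columns are scanned and rewritten
df[target_cols] = df[target_cols].replace(r"^\s*$", np.nan, regex=True)

print(df)
```

If you don't know the columns in advance, df.select_dtypes(include="object").columns is one way to narrow the list to text-like columns.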
Diversity: Alternate approach with apply() and lambda
Flirting with apply() and the anonymous love letters of lambda:
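One way to write it, sketched on a hypothetical DataFrame: apply() hands each column to the lambda as a Series, and the lambda runs the same regex replace() on it. This is an illustration of the apply/lambda style, not necessarily the exact code this article originally showed.

```python
import numpy as np
import pandas as pd

# Made-up data with blanks in two text columns and one numeric column
df = pd.DataFrame({
    "a": ["x", " ", ""],
    "b": ["\t", "y", "z"],
    "n": [1, 2, 3],
})

# apply() passes each column (a Series) to the lambda in turn;
# the identical regex replacement runs on every column
out = df.apply(lambda col: col.replace(r"^\s*$", np.nan, regex=True))

print(out)
```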
The replacement applies across the DataFrame, not discerning among columns.
Good practice for efficient coding
Adopt built-in pandas functions for efficiency and readability. Loops may look tired next to these bright pandas.