
Replacing blank values (white space) with NaN in pandas

by Anton Shumikhin · Mar 4, 2025
TLDR

Breeze through white space replacement with replace() using regex:

# Hey, isn't "nothing" something too? Just kidding, let's replace it with NaN.
import numpy as np
df.replace(r'^\s*$', np.nan, regex=True, inplace=True)

'^\s*$' matches any string that is empty or made up entirely of whitespace, and np.nan is our replacement. Need it for a specific column named col_name?

# One column at a time! Remember, "Rome was not built in a day."
df['col_name'] = df['col_name'].replace(r'^\s*$', np.nan, regex=True)
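To see both calls in action, here is a minimal sketch on a made-up DataFrame. The column names and values are assumptions for illustration only, and the whole-frame call returns a new DataFrame rather than using inplace=True.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: blanks show up as '', ' ', and tab/space runs.
df = pd.DataFrame({
    "name": ["Ann", "", "Cy", "  "],
    "city": ["Oslo", " ", "\t ", "Rome"],
    "age": [31, 42, 27, 55],
})

# Whole-frame replacement; the numeric 'age' column is left alone.
cleaned = df.replace(r"^\s*$", np.nan, regex=True)

# Single-column variant: only 'city' is changed in the original frame.
df["city"] = df["city"].replace(r"^\s*$", np.nan, regex=True)

# Count of missing values per column after the whole-frame replacement.
print(cleaned.isna().sum())
```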

Behind regex

Understanding the regex '^\s*$': we're searching for strings that contain nothing but whitespace. ^ anchors the match at the start of the string, \s* greedily matches zero or more whitespace characters (spaces, tabs, newlines), and $ seals the end. Because * allows zero characters, empty strings match too and get the same trip to NaN.
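A quick way to convince yourself is to try the pattern outside pandas. This sketch uses Python's built-in re module on a handful of made-up sample strings; the samples are assumptions chosen to cover the edge cases.

```python
import re

# Same pattern as in the pandas calls above.
BLANK = re.compile(r"^\s*$")

# Hypothetical samples: empty, single space, tab + newline, padded text,
# a zero, and the literal text "NaN".
samples = ["", " ", "\t\n", "  x ", "0", "NaN"]

# Map each sample to whether the pattern matches it.
results = {s: bool(BLANK.match(s)) for s in samples}

for s, matched in results.items():
    print(repr(s), matched)
```

Only the first three samples match. Anything with a visible character, including the text "NaN" or a "0", is left alone.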

Why replace() and not loops?

Why df.replace() over loops? Loops are the turtle of the fable: sure to arrive eventually. replace() is the hare that actually finishes the race. A single call hands the whole job to pandas instead of visiting each cell in your own Python code, which is shorter to write, harder to get wrong, and usually faster on larger datasets.
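For comparison, here is a rough sketch of what the hand-written loop looks like next to the single replace() call. The tiny DataFrame is made up, and the loop is just one plausible way someone might write it, not a benchmark.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few blank cells.
df = pd.DataFrame({"a": ["x", " ", ""], "b": ["", "y", "\t"]})

# Loop version: visit every cell and test it by hand.
looped = df.copy()
for col in looped.columns:
    for idx in looped.index:
        value = looped.at[idx, col]
        if isinstance(value, str) and value.strip() == "":
            looped.at[idx, col] = np.nan

# replace() version: one call, no explicit iteration.
replaced = df.replace(r"^\s*$", np.nan, regex=True)

# Both versions mark the same cells as missing.
print(looped.isna().equals(replaced.isna()))
```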

Precautions: Handling data types

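Be it str, NaN, or anything else, we sort 'em all! When a column mixes strings with numbers or missing values, check the type before testing for blanks:

# If it's not a string, then it's not worth my string! - Legendary String
# strip() also catches the empty string '', which isspace() alone would miss.
# On pandas 2.1+, DataFrame.map is the non-deprecated name for applymap.
df = df.applymap(lambda x: np.nan if isinstance(x, str) and not x.strip() else x)

This way only strings are candidates for replacement; numbers, None, and existing NaN values pass through untouched.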
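Here is a small sketch of that type check on a deliberately messy column. The helper name blank_to_nan and the sample values are assumptions for illustration. The hasattr() switch is there because applymap was deprecated in favour of DataFrame.map in pandas 2.1.

```python
import numpy as np
import pandas as pd

def blank_to_nan(value):
    """Return NaN for empty or whitespace-only strings; pass everything else through."""
    if isinstance(value, str) and value.strip() == "":
        return np.nan
    return value

# Hypothetical mixed-type column: strings, numbers, None, and an existing NaN.
df = pd.DataFrame({"mixed": ["ok", "   ", "", 7, 3.5, None, np.nan]})

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
cleaned = elementwise(blank_to_nan)

print(cleaned["mixed"].isna().tolist())
```

The numbers survive as they are, and the values that were already missing stay missing.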

On-the-go: Pre-processing with read_csv

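pd.read_csv offers an early-bird advantage: it can turn blank fields into NaN while the file is being loaded. Empty fields already become NaN by default, and na_values adds the single-space case. Entries in na_values are matched literally, so fields holding several spaces or tabs still need replace() afterwards.

# "Hit 'em hard and early" - NaN, probably.
df = pd.read_csv('file.csv', na_values=[' '])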
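The sketch below tries this on an in-memory CSV so it runs without a file on disk. The CSV text and column names are made up, and io.StringIO stands in for 'file.csv'. Since na_values entries are compared as literal strings, the example follows up with replace() to catch a longer run of spaces.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV text: row 1 has a single space, row 3 is empty,
# row 4 holds three spaces in the 'note' field.
csv_text = "id,note,score\n1, ,10\n2,ok,20\n3,,30\n4,   ,40\n"

# Treat a single-space field as missing while loading.
loaded = pd.read_csv(io.StringIO(csv_text), na_values=[" "])

# Follow-up pass for blank patterns that na_values did not list.
cleaned = loaded.copy()
cleaned["note"] = cleaned["note"].replace(r"^\s*$", np.nan, regex=True)

print(cleaned["note"].isna().tolist())
```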

Resource management: Focusing on larger data

Got larger datasets? Point replace() only at the columns that can actually contain blank strings, which cuts down on the work:

# Just 'notes' to myself.
df['notes'] = df['notes'].replace(r'^\s*$', np.nan, regex=True)
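The same idea extends to a handful of columns at once. In this sketch the column names and the target_cols list are assumptions; the 'raw' column is left out on purpose to show that untargeted columns are not touched.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where only some text columns need cleaning.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "notes": ["call back", "  ", ""],
    "tags": ["a", " ", "b"],
    "raw": [" ", "keep", ""],
})

# Assumed choice: only these columns are known to need cleaning.
target_cols = ["notes", "tags"]
df[target_cols] = df[target_cols].replace(r"^\s*$", np.nan, regex=True)

print(df.isna().sum())
```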

Diversity: Alternate approach with apply() and lambda

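Flirting with apply() and a lambda: run the replacement column by column, and only where it makes sense:

# "If it walks like a duck and quacks like a duck, it's definitely blockchain!" - Confused Objects
# Only object-dtype (text) columns go through replace(); the rest are returned as-is.
df = df.apply(lambda x: x.replace(r'^\s*$', np.nan, regex=True) if x.dtype == "object" else x)

The lambda runs once per column: object-dtype columns get the regex replacement, and every other column is returned unchanged.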
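A short sketch of the per-column check, using made-up data. The text column is created with dtype=object explicitly, which is an assumption added here so the dtype test behaves the same across pandas versions (newer releases may infer a dedicated string dtype instead).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one text column, two numeric columns.
df = pd.DataFrame({
    "label": pd.Series(["a", " ", ""], dtype=object),
    "count": [1, 2, 3],
    "ratio": [0.1, 0.2, 0.3],
})

# The lambda receives one column (a Series) at a time.
cleaned = df.apply(
    lambda col: col.replace(r"^\s*$", np.nan, regex=True)
    if col.dtype == object
    else col
)

# The numeric columns keep their original dtypes.
print(cleaned.dtypes)
```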

Good practice for efficient coding

Adopt built-in pandas functions for efficiency and readability. Loops may look tired next to these bright pandas.