
Convert Pandas column containing NaNs to dtype int

python
pandas
dataframe
best-practices
by Nikita Barsukov · Sep 15, 2024
TLDR

The command below converts a pandas column that contains NaN values to pandas' nullable integer type, 'Int64'. Note the capital I: plain NumPy int64 cannot hold missing values.

df['column'] = df['column'].astype('Int64')

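After the conversion, NaN values are stored as pd.NA, pandas' missing-value marker, and the remaining values become true integers. The cast only succeeds if every non-missing value is a whole number. A value such as 2.5 raises a TypeError.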
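Here is a small sketch of the TLDR in action on a made-up DataFrame. It shows the resulting dtype and where the missing value ends up. It also shows the error case for a non-whole float.

```python
import numpy as np
import pandas as pd

# Made-up data: pandas already stores these integers as float64 because of the NaN
df = pd.DataFrame({"column": [1, np.nan, 3]})

df["column"] = df["column"].astype("Int64")
print(df["column"].dtype)            # Int64
print(df["column"].isna().tolist())  # [False, True, False]

# A value with a fractional part cannot be cast safely to an integer type
try:
    pd.Series([2.5, np.nan]).astype("Int64")
    fractional_ok = True
except (TypeError, ValueError):
    fractional_ok = False
print(fractional_ok)                 # False
```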

Considering mixed data types for conversion

Your column may mix different kinds of numeric values with NaN, for example integers, floats and numeric strings in an object column. In that case, converting straight to integers can raise an error. A common workaround is to convert to float first and then to 'Int64':

df['column'] = df['column'].astype(float).astype('Int64') # Float like a butterfly, sting like an int.

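Float works as an intermediate type because it can hold every numeric value together with NaN. The second cast then turns whole-number floats into integers and NaN into pd.NA. It still fails if any value has a fractional part, so clean such values first.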
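The following sketch demonstrates the two-step cast. The object column mixing an int, a float, a numeric string and NaN is made up for illustration. It assumes every non-missing value is a whole number, so the second cast succeeds.

```python
import numpy as np
import pandas as pd

# Hypothetical object column mixing an int, a float, a numeric string and NaN
df = pd.DataFrame({"column": pd.Series([1, 2.0, "3", np.nan], dtype=object)})

# Step 1: float is a common type that can hold every number plus NaN
as_float = df["column"].astype(float)

# Step 2: whole-number floats become integers, NaN becomes pd.NA
df["column"] = as_float.astype("Int64")
print(df["column"].dtype)  # Int64
```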

Smart replacement of NaNs

If you want to substitute NaN values with a specific value before conversion:

df['column'] = df['column'].fillna(0).astype('Int64') # Goodbye NaN, hello 0!

However, this turns every NaN into a real 0. Statistics such as the mean and the count will then treat missing entries as genuine zeros, which can skew your analysis. Only do this when 0 is a meaningful replacement.
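To see the skew concretely, here is a sketch that compares keeping the missing value as pd.NA with filling it with 0. It uses a small made-up column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column": [10, np.nan, 20]})

kept_missing = df["column"].astype("Int64")           # NaN -> <NA>
filled_zero = df["column"].fillna(0).astype("Int64")  # NaN -> 0

print(kept_missing.mean())  # 15.0: the missing entry is skipped
print(filled_zero.mean())   # 10.0: the artificial zero drags the mean down
print(kept_missing.count(), filled_zero.count())  # 2 3
```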

Advanced usage scenarios

Float as a viable alternative

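If you do not strictly need an integer type, keep the column as float. Float stores NaN natively, and it is the dtype pandas gives an integer column containing NaN by default: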

df['column'] = df['column'].astype(float) # Floating away from the integer world!
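One trade-off worth knowing before settling on float: float64 represents integers exactly only up to 2**53. The sketch below uses made-up values to show both the default upcast and that precision limit.

```python
import numpy as np
import pandas as pd

# pandas upcasts an integer column containing NaN to float64 on its own
s = pd.Series([1, np.nan, 3])
print(s.dtype)  # float64

# Caveat: large integers lose precision as float64 but not as Int64
big = pd.Series([2**53 + 1, 7])                  # int64, no missing values
print(big.astype(float).iloc[0] == 2**53)        # True: rounded to 2**53
print(big.astype("Int64").iloc[0] == 2**53 + 1)  # True: kept exactly
```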

Handling object columns with strings

If the column has object dtype and contains strings alongside NaNs, deal with the non-numeric strings before the cast:

df['column'] = pd.to_numeric(df['column'], errors='coerce').astype('Int64') # Sorry strings, you're not deemed fit!

With errors='coerce', any value that cannot be parsed as a number becomes NaN, so the column can then be converted to nullable integers without errors.
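Here is a minimal sketch of the coerce-then-cast pattern on a made-up object column. Note that "abc" ends up missing, just like the original NaN.

```python
import numpy as np
import pandas as pd

# Made-up object column with numeric strings, junk text and a NaN
df = pd.DataFrame({"column": ["10", "abc", np.nan, "7"]})

numeric = pd.to_numeric(df["column"], errors="coerce")  # "abc" -> NaN, dtype float64
df["column"] = numeric.astype("Int64")                  # NaN -> <NA>
print(df["column"].isna().tolist())  # [False, True, True, False]
```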

Restore NaNs

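Suppose that before converting, you replaced NaNs with a sentinel value that never occurs in the real data, for example -1. You can swap the sentinel back to a missing value afterwards:

df['column'] = df['column'].replace(-1, pd.NA) # Missing values are back in the game (as pd.NA)!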
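The full round trip might look like the sketch below. The name SENTINEL is hypothetical, and the sketch assumes -1 never appears in the real data. Otherwise genuine -1 values would also be turned into missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical sentinel: assumed never to occur in the real data
SENTINEL = -1
df = pd.DataFrame({"column": [5, np.nan, 8]})

# Fill NaN with the sentinel so the column can be cast to integers
df["column"] = df["column"].fillna(SENTINEL).astype("Int64")

# ... integer-only processing could happen here ...

# Turn the sentinel back into a missing value (pd.NA) in the Int64 column
df["column"] = df["column"].replace(SENTINEL, pd.NA)
print(df["column"].isna().tolist())  # [False, True, False]
```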