
Convert floats to ints in Pandas?

python
dataframe
pandas
data-integrity
by Nikita Barsukov · Oct 1, 2024
TLDR

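To convert floats to ints in Pandas, use the .astype() method with int or 'Int64'. Plain int truncates toward zero and raises an error if the data contains NaN. The nullable 'Int64' dtype keeps missing values as <NA>, but it only accepts floats that are already whole numbers. Either one works on a single column or on the entire DataFrame.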

Consider the following examples:

df['col'] = df['col'].astype(int)  # Single column
df = df.astype(int)                # Entire DataFrame
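To see how the two targets differ, here is a minimal sketch on made-up data (the column name and values are purely illustrative). It shows truncation with plain int, and how a NaN trips up int but not 'Int64':

```python
import numpy as np
import pandas as pd

# Illustrative data, not from any real dataset
df = pd.DataFrame({"col": [1.0, 2.7, -3.9]})
truncated = df["col"].astype(int)  # fractional part is dropped toward zero

with_nan = pd.Series([1.0, np.nan, 3.0])
nullable = with_nan.astype("Int64")  # nullable integer dtype, NaN becomes <NA>

# Plain int cannot represent a missing value, so this conversion raises
try:
    with_nan.astype(int)
    plain_int_failed = False
except ValueError:
    plain_int_failed = True

print(truncated.tolist())
print(nullable)
print(plain_int_failed)
```

Note that 2.7 became 2 and -3.9 became -3: truncation, not rounding.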

Handling missing values and rounding

Missing values and fractional parts both need a decision before you convert. astype(int) fails on NaN, so either fill the gaps first with fillna(0.0) or convert to the nullable 'Int64' dtype, which keeps them as <NA>. astype(int) also truncates toward zero (2.9 becomes 2), so call round() first if you want the nearest integer instead:

# Captain NaN, the stealthy data pest, beware!
df['col'] = df['col'].fillna(0.0).round().astype('Int64')
df = df.fillna(0.0).round().astype('Int64')  # One round to conquer NaNs! 👊
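A small sketch of the difference, using invented values. One thing worth knowing: round() in Pandas rounds halves to the nearest even number, so 2.5 becomes 2 rather than 3.

```python
import numpy as np
import pandas as pd

# Illustrative values chosen to show truncation, rounding, and a missing entry
s = pd.Series([0.5, 1.5, 2.5, 2.7, np.nan])

truncated = s.fillna(0.0).astype("int64")        # drops the fractional part
rounded = s.fillna(0.0).round().astype("int64")  # nearest integer, halves go to even
kept_missing = s.round().astype("Int64")         # keeps the gap as <NA> instead of 0

print(truncated.tolist())
print(rounded.tolist())
print(kept_missing)
```

Whether a missing value should become 0 or stay missing depends on what the column means, so treat fillna(0.0) as a choice rather than a default.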

Bulk conversion: The mighty blow

When the DataFrame grows, converting column by column gets tedious. Use select_dtypes(include=['float64']) to pick out the float columns, then convert them all at once with astype(). You may see applymap(np.int64) recommended for this, but it calls a function on every single cell, offers no extra precision, and has been deprecated since pandas 2.1 in favor of DataFrame.map:

# Gather the float squad
float_cols = df.select_dtypes(include=['float64']).columns
# Train the float squad to be ints
df[float_cols] = df[float_cols].astype('int64')
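Here is a self-contained sketch of the bulk approach on a made-up mixed-type DataFrame. It passes a column-to-dtype mapping to astype(), which is one of several ways to do it, and it leaves the string and integer columns alone:

```python
import pandas as pd

# Illustrative frame with float, string, and integer columns
df = pd.DataFrame({
    "a": [1.0, 2.0],
    "b": [3.2, 4.8],
    "name": ["x", "y"],
    "n": [1, 2],
})

# Select only the float64 columns
float_cols = df.select_dtypes(include=["float64"]).columns

# Round first, then convert just those columns via a dtype mapping
df[float_cols] = df[float_cols].round()
df = df.astype({col: "int64" for col in float_cols})

print(df.dtypes)
```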

Change display: The master of disguise

But what if you want to keep the float data and only display it as integers to the human eye? Set options.display.float_format. This changes how floats are printed, not what is stored:

# "I'm not a float, I swear!" 🕵️‍♂️
pd.options.display.float_format = '{:,.0f}'.format
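If you would rather not change the setting globally, one option is to scope it with pd.option_context. A minimal sketch with an invented price column:

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({"price": [1234.56, 7.4]})

# The format applies only inside the with-block
with pd.option_context("display.float_format", "{:,.0f}".format):
    shown = df.to_string()

print(shown)
print(df["price"].dtype)  # the underlying data is still float
```

The stored values are untouched, so later calculations still use the full precision.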

Controlling the integer type: The control freak

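Sometimes your data needs a particular integer type because of size or sign constraints. For this, you can pass a specific NumPy integer type such as np.int8, np.int16, np.int32, or np.int64 (or an unsigned one like np.uint8). Check that your values fit the target range first, since a value outside it cannot be stored correctly: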

# "I don't want just any integer. I want YOU, int32!" 😍
df['col'] = df['col'].astype(np.int32)
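One way to pick a safe type is to compare the data against the limits reported by np.iinfo before converting. This is a sketch on invented values, and the fallback to int16 is just an illustrative choice:

```python
import numpy as np
import pandas as pd

# Illustrative values: 300 does not fit in int8 (range -128..127)
s = pd.Series([100.0, 300.0])

info = np.iinfo(np.int8)
fits_int8 = bool(s.between(info.min, info.max).all())

# Fall back to a wider type when the narrow one is too small
target = np.int8 if fits_int8 else np.int16
converted = s.astype(target)

print(fits_int8)
print(converted.dtype)
```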

Handle with care: Data integrity

Converting floats to ints can lose information, akin to losing your luggage at the airport: not so fun! Any fractional part is discarded, and missing values must be filled or kept in a nullable dtype. Check what your data actually contains before converting.
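A quick pre-flight check, sketched on made-up values, is to test whether every non-missing entry is already a whole number. If it is, the conversion loses nothing:

```python
import pandas as pd

# Illustrative values: 3.25 would lose its fractional part
s = pd.Series([1.0, 2.0, 3.25])

present = s.dropna()
is_whole = present % 1 == 0          # True where the value has no fractional part
lossless = bool(is_whole.all())      # safe to convert without rounding?
offenders = present[~is_whole]       # the values that would change

print(lossless)
print(offenders.tolist())
```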

Data import: Type specification

When importing data, pass a dtype mapping to read_csv so the column is read as 'Int64' from the start, with empty cells kept as <NA>. It’s like labeling your luggage: you know what you packed!

# "Welcome aboard, Int64 travelers!" 🛫
df = pd.read_csv('file.csv', dtype={'col': 'Int64'})
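To try this without a file on disk, here is a sketch that reads from an in-memory string. The CSV content and column names are invented for illustration:

```python
import io

import pandas as pd

# Illustrative CSV text with one empty cell in "col"
csv_text = "id,col\n1,10\n2,\n3,30\n"

df = pd.read_csv(io.StringIO(csv_text), dtype={"col": "Int64"})

print(df)
print(df.dtypes)
```

Without the dtype argument, the empty cell would force the column to float64.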

Post-conversion check: Count the luggage

Receipt check! Did you get all your luggage intact? Use df.dtypes to check:

# "Yes, my data is all here!" – Relieved Data Scientist 🧑‍🔬
print(df.dtypes)
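If you want the check to be automatic rather than eyeballed, one possibility is pd.api.types.is_integer_dtype, which recognizes both NumPy and nullable integer dtypes. A sketch on an invented frame:

```python
import pandas as pd

# Illustrative frame: "b" has a missing value, so it gets the nullable dtype
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, None]})
df = df.astype({"a": "int64", "b": "Int64"})

# True only if every column ended up with an integer dtype
all_integer = all(pd.api.types.is_integer_dtype(t) for t in df.dtypes)

print(df.dtypes)
print(all_integer)
```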

Best performance: Master conversion

Vectorized methods such as round() and astype() operate on whole columns at once and are usually much faster than applymap(), which calls a Python function on every cell. Don’t forget to reassign the converted columns back to the DataFrame:

# "Vectorized, more like a magic map!" – Potentially Confused Data Scientist
df[float_cols] = df[float_cols].round().astype('int64')
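As a sanity check, the sketch below compares the vectorized route with an elementwise one on invented data. Both give the same integers here, since Python's round() and Pandas' round() both send halves to the nearest even number:

```python
import pandas as pd

# Illustrative data, including two .5 cases
df = pd.DataFrame({"x": [0.4, 1.6, 2.5], "y": [10.2, 11.5, 12.9]})

# Vectorized: whole columns at once
vectorized = df.round().astype("int64")

# Elementwise: a Python function call per cell (shown only for comparison)
elementwise = df.apply(lambda col: col.map(lambda v: int(round(v))))

same = vectorized.values.tolist() == elementwise.values.tolist()
print(vectorized)
print(same)
```

On a frame this small the speed difference is invisible; it tends to matter once you have many rows.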