Change column type in pandas
For a quick type conversion in a pandas DataFrame, use the astype() method. Converting a column to a string:
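A minimal sketch of that conversion. The DataFrame and its `id` column are made up for illustration:

```python
import pandas as pd

# Hypothetical data: an integer column we want as text
df = pd.DataFrame({"id": [1, 2, 3]})

# astype() returns a new Series; assign it back to keep the change
df["id"] = df["id"].astype(str)

print(df["id"].tolist())
```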
Handling NaNs using nullable integers:
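A plain NumPy integer column cannot hold missing values, but the nullable "Int64" dtype (capital I) can. A small sketch with an invented `count` column:

```python
import numpy as np
import pandas as pd

# Hypothetical data: whole numbers stored as floats because of the NaN
df = pd.DataFrame({"count": [1.0, np.nan, 3.0]})

# "Int64" is pandas' nullable integer dtype; NaN becomes <NA>
df["count"] = df["count"].astype("Int64")

print(df["count"])
```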
For better memory efficiency, switch to categorical:
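A sketch of the categorical switch, assuming a made-up `color` column with a handful of repeated values:

```python
import pandas as pd

# Hypothetical data: few distinct values, repeated many times
df = pd.DataFrame({"color": ["red", "green", "red", "blue", "green", "red"]})

# Each distinct value is stored once; rows hold small integer codes
df["color"] = df["color"].astype("category")

print(df["color"].cat.categories.tolist())
```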
And if you have multiple columns to wrangle:
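One way to do this is to pass astype() a dict mapping column names to dtypes. The column names below are placeholders:

```python
import pandas as pd

# Hypothetical data: "a" holds digits as text, "b" is float64
df = pd.DataFrame({"a": ["1", "2"], "b": [1.0, 2.0], "c": ["x", "y"]})

# Columns not named in the dict ("c" here) are left untouched
df = df.astype({"a": "int64", "b": "float32"})

print(df.dtypes)
```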
Step-by-step conversions
Surviving the Jungle of Numeric Conversion
If your data is like a jungle with numerical data hiding in strings, pd.to_numeric() is your machete:
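A short sketch, assuming a hypothetical `price` column where every value is a clean numeric string:

```python
import pandas as pd

# Hypothetical data: numbers hiding in strings
df = pd.DataFrame({"price": ["10", "20.5", "30"]})

# to_numeric picks a numeric dtype that fits the parsed values
df["price"] = pd.to_numeric(df["price"])

print(df["price"].dtype)
```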
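To keep your data intact when a column cannot be parsed, older pandas offered errors='ignore'. That option is deprecated since pandas 2.2, so on current versions catch the exception yourself and fall back to the original column: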
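Because errors='ignore' is deprecated in recent pandas releases (2.2 and later), here is a sketch of the same "leave it alone if it fails" behaviour using try/except. The helper name `to_numeric_or_keep` and the sample columns are invented for this example:

```python
import pandas as pd

def to_numeric_or_keep(series):
    """Hypothetical helper: convert if possible, otherwise return the input unchanged."""
    try:
        return pd.to_numeric(series)
    except (ValueError, TypeError):
        return series

# Hypothetical data: "qty" is clean, "code" contains a non-numeric entry
df = pd.DataFrame({"code": ["7", "8", "n/a"], "qty": ["1", "2", "3"]})

# apply() calls the helper once per column
df = df.apply(to_numeric_or_keep)

print(df.dtypes)
```

The `code` column survives as text because one value cannot be parsed, while `qty` becomes numeric.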
The Gentle Giant of Object Conversion
df.infer_objects() is the gentle giant here: it upgrades 'object' columns to more specific types when the values already are of that type, and leaves everything else alone:
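A sketch of what that looks like, using a made-up frame deliberately built with object dtype. Note that strings are not parsed into numbers here; only values that are already integers get a tighter dtype:

```python
import pandas as pd

# Hypothetical data: real integers in "a", digit strings in "b", both stored as object
df = pd.DataFrame({"a": [1, 2, 3], "b": ["4", "5", "6"]}, dtype="object")

converted = df.infer_objects()

print(df.dtypes["a"], "->", converted.dtypes["a"])
```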
Convert dtypes: Your trusty toolbox
df.convert_dtypes() is your trusty toolbox that identifies the right tool (type) and uses it:
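A small sketch with invented columns that contain missing values, which is where convert_dtypes() tends to be most useful since it moves columns to the nullable dtypes:

```python
import pandas as pd

# Hypothetical data: None forces "n" to float64 and "f" to object
df = pd.DataFrame({
    "n": [1, 2, None],
    "f": [True, False, None],
})

converted = df.convert_dtypes()

print(converted.dtypes)
```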
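Leave out the automatic inference you don't need; convert_dtypes() accepts flags such as convert_integer and convert_string to switch individual conversions off: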
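As a sketch, here the integer conversion is switched off with the `convert_integer` flag, so a plain int64 column keeps its NumPy dtype. The column name is a placeholder:

```python
import pandas as pd

# Hypothetical data: an ordinary int64 column
df = pd.DataFrame({"n": [1, 2, 3]})

default = df.convert_dtypes()                    # would move "n" to nullable Int64
kept = df.convert_dtypes(convert_integer=False)  # leaves integer columns alone

print(default.dtypes["n"], kept.dtypes["n"])
```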
Diving the depths of type casting
Avoid sinking your data by casting it to the right dtype. Ensuring safe casting is crucial:
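One possible safety net is to check the value range before casting to a smaller integer type. The helper `safe_int_cast` below is hypothetical, not a pandas function:

```python
import numpy as np
import pandas as pd

def safe_int_cast(series, dtype):
    """Hypothetical helper: cast only if every value fits in the target integer dtype."""
    info = np.iinfo(dtype)
    if series.min() < info.min or series.max() > info.max:
        raise ValueError(f"values do not fit in {dtype}")
    return series.astype(dtype)

small = safe_int_cast(pd.Series([1, 100, 127]), "int8")

try:
    safe_int_cast(pd.Series([1, 300]), "int8")  # 300 is outside the int8 range
    overflow_caught = False
except ValueError:
    overflow_caught = True

print(small.dtype, overflow_caught)
```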
Knowing Your dtypes
Before any type conversion, take a quick glance at your current dtypes using df.dtypes:
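For example, with a small invented frame:

```python
import pandas as pd

# Hypothetical data with an integer, a float and a text column
df = pd.DataFrame({"id": [1, 2], "price": [9.99, 19.5], "name": ["a", "b"]})

# df.dtypes is a Series: column name -> dtype
print(df.dtypes)
```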
Then convert only the necessary columns:
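A sketch of a targeted conversion, assuming you only want to shrink the float64 columns. `select_dtypes` picks them out and everything else stays as it was:

```python
import pandas as pd

# Hypothetical data: two float columns and one integer column
df = pd.DataFrame({"id": [1, 2], "price": [9.99, 19.5], "tax": [0.8, 1.56]})

# Find only the float64 columns, then build a column -> dtype mapping for them
float_cols = df.select_dtypes(include="float64").columns
df = df.astype({col: "float32" for col in float_cols})

print(df.dtypes)
```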
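There are 'hard' and 'soft' conversions, and the difference matters. A hard conversion such as astype() forces the dtype you name and raises if a value doesn't fit; a soft conversion such as infer_objects() or convert_dtypes() only changes a column when its values already support the new type: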
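A sketch contrasting the two, treating astype() as the "hard" route and infer_objects() as the "soft" one. The sample Series is invented:

```python
import pandas as pd

# Hypothetical data: one value that is not a number
s = pd.Series(["1", "2", "x"], dtype="object")

# Hard: astype() insists on the target dtype and raises on "x"
try:
    s.astype("int64")
    hard_failed = False
except ValueError:
    hard_failed = True

# Soft: infer_objects() changes nothing it cannot justify, so the values survive
soft = s.infer_objects()

print(hard_failed, soft.tolist())
```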
Handling mixed columns
Have columns with mixed types or numbers stored as text? pd.to_numeric() with errors='coerce' clears the clutter by turning anything it cannot parse into NaN:
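A short sketch on a made-up mixed Series. Keep in mind that coercion silently replaces bad values with NaN, so it is worth counting how many you lost:

```python
import pandas as pd

# Hypothetical data: an int, a numeric string, junk text and a float
s = pd.Series([1, "2", "three", 4.5])

num = pd.to_numeric(s, errors="coerce")

# How many values could not be parsed?
n_bad = int(num.isna().sum())

print(num.tolist(), n_bad)
```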
Memory optimization
To save memory, use pd.to_numeric(df['col'], downcast='integer'):
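A sketch of the downcast on an invented int64 Series with small values; pandas picks the smallest integer dtype that can hold them:

```python
import pandas as pd

# Hypothetical data: small numbers stored in 8-byte integers
s = pd.Series([1, 2, 3], dtype="int64")

small = pd.to_numeric(s, downcast="integer")

# Compare bytes used by the values (index excluded)
before = s.memory_usage(index=False)
after = small.memory_usage(index=False)

print(small.dtype, before, after)
```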
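Converting low-cardinality text columns to categoricals can cut memory too! Pandas' dedicated string dtype may also help, depending on your version and storage backend, so measure before and after with memory_usage(deep=True).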
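A quick way to check whether a categorical actually pays off is to measure both versions. This sketch uses an invented column with only three distinct values, which is the case where categoricals tend to win:

```python
import pandas as pd

# Hypothetical data: 3 distinct labels repeated 1000 times each
s = pd.Series(["low", "medium", "high"] * 1000)

as_cat = s.astype("category")

# deep=True counts the actual string storage, not just pointers
text_bytes = s.memory_usage(deep=True)
cat_bytes = as_cat.memory_usage(deep=True)

print(text_bytes, cat_bytes)
```

With many distinct values the saving shrinks or disappears, so it is worth measuring on your own data.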
Pitfalls of bad casting
Avoid data corruption by not forcing the wrong dtype:
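One common example of a forced cast going wrong: casting floats straight to integers drops the fractional part without any warning. A sketch with invented values:

```python
import pandas as pd

# Hypothetical data: prices with decimals
prices = pd.Series([1.7, 2.9, 3.2])

# Forcing int64 truncates silently: the decimals are simply gone
truncated = prices.astype("int64")

# If integers are really what you want, round explicitly first
rounded = prices.round().astype("int64")

print(truncated.tolist(), rounded.tolist())
```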
Powerful performance strategies
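Prioritize columns for type casting: you rarely need to convert everything. Start with the few columns that take the most memory or get used the most, since that is where the performance boost comes from.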
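One way to pick those columns is to rank them by memory use and convert the heaviest first. A sketch on an invented frame, assuming the text column turns out to be the biggest:

```python
import pandas as pd

# Hypothetical data: a repetitive text column next to two numeric ones
df = pd.DataFrame({
    "city": ["San Francisco", "Rio de Janeiro"] * 5000,
    "flag": [True, False] * 5000,
    "n": range(10000),
})

# Rank columns by bytes used, largest first
usage = df.memory_usage(deep=True, index=False).sort_values(ascending=False)
heaviest = usage.index[0]

# Convert only the heaviest column and measure again
before = usage[heaviest]
df[heaviest] = df[heaviest].astype("category")
after = df[heaviest].memory_usage(deep=True, index=False)

print(heaviest, before, after)
```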