Change column type in pandas
For a quick type conversion in a pandas DataFrame, use the astype()
method. Converting a column to a string:
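A minimal sketch of the string conversion, assuming a hypothetical DataFrame with a single integer column named col:

```python
import pandas as pd

# Illustrative data; the column name "col" is an assumption.
df = pd.DataFrame({"col": [1, 2, 3]})

# astype returns a new Series, so assign it back to the column.
df["col"] = df["col"].astype(str)
```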
Handling NaNs using nullable integers:
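As a sketch, assuming a hypothetical float column that holds whole numbers plus a missing value, the nullable "Int64" dtype (capital I) keeps the integers and stores the gap as pd.NA:

```python
import pandas as pd

# Illustrative data: the missing value forces float64 on construction.
df = pd.DataFrame({"col": [1.0, 2.0, None]})

# The nullable extension dtype "Int64" can hold integers alongside missing values.
df["col"] = df["col"].astype("Int64")
```

A plain astype("int64") would raise here, because NumPy integers cannot represent missing values.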
For better memory efficiency, switch to categorical:
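A short sketch, assuming a hypothetical column with only a few distinct labels, which is where the categorical dtype tends to pay off:

```python
import pandas as pd

# Illustrative data with repeated labels.
df = pd.DataFrame({"col": ["red", "green", "red", "green"]})

# Each distinct label is stored once; rows hold small integer codes.
df["col"] = df["col"].astype("category")
```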
And if you have multiple columns to wrangle:
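One way to do this is to pass astype() a dict that maps column names to target dtypes. The column names a, b and c below are placeholders:

```python
import pandas as pd

# Illustrative data; columns not listed in the mapping keep their dtype.
df = pd.DataFrame({"a": ["1", "2"], "b": [1, 2], "c": ["x", "y"]})

df = df.astype({"a": "int64", "b": "float64", "c": "category"})
```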
Step-by-step conversions
Surviving the Jungle of Numeric Conversion
If your data is like a jungle with numerical data hiding in strings, pd.to_numeric() is your machete:
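A minimal sketch, assuming a hypothetical column of numbers stored as text:

```python
import pandas as pd

# Illustrative data: numeric values hiding in strings.
df = pd.DataFrame({"col": ["1", "2.5", "3"]})

# to_numeric parses each string; one fractional value makes the result float64.
df["col"] = pd.to_numeric(df["col"])
```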
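To keep your data intact when a column cannot be parsed, errors='ignore' returns the input unchanged. That option is deprecated as of pandas 2.2, so in newer code catch the exception yourself: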
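A sketch of keeping the column intact by catching the parse error explicitly instead of passing errors='ignore' (the sample column col is illustrative):

```python
import pandas as pd

# Illustrative data: "three" cannot be parsed as a number.
df = pd.DataFrame({"col": ["1", "2", "three"]})

try:
    df["col"] = pd.to_numeric(df["col"])
except (ValueError, TypeError):
    # Parsing failed, so the original column is left untouched.
    pass
```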
The Gentle Giant of Object Conversion
df.infer_objects() is the gentle giant that can upgrade 'object' dtype to more specific types:
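For illustration, assume a hypothetical DataFrame whose numeric columns were stored with object dtype:

```python
import pandas as pd

# Illustrative data: numbers forced into object columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5]}, dtype="object")

# infer_objects inspects the stored Python objects and picks a tighter dtype.
df = df.infer_objects()
```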
Convert dtypes: Your trusty toolbox
df.convert_dtypes() is your trusty toolbox that identifies the right tool (type) and uses it:
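A small sketch, assuming a hypothetical DataFrame with missing values. Here convert_dtypes() moves the columns to pandas' nullable extension dtypes:

```python
import pandas as pd

# Illustrative data: missing values make "a" float64 on construction.
df = pd.DataFrame({"a": [1, 2, None], "b": ["x", "y", None]})

# Whole-number floats become nullable Int64; text becomes a string dtype.
df = df.convert_dtypes()
```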
Skip automatic object inference when you don't need it by passing infer_objects=False:
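A hedged sketch of the difference, assuming an object column that happens to hold integers. With infer_objects=False that column is expected to stay as object:

```python
import pandas as pd

# Illustrative data: integers stored in an object column.
df = pd.DataFrame({"a": [1, 2, 3]}, dtype="object")

inferred = df.convert_dtypes()                          # object -> Int64
not_inferred = df.convert_dtypes(infer_objects=False)   # object column left alone
```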
Diving the depths of type casting
Avoid sinking your data: cast it to a dtype that can actually hold every value. Checking before you cast is crucial:
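One possible guard, sketched with a hypothetical helper named safe_int_cast (not a pandas function), is to compare the column's range against the target integer type before casting:

```python
import numpy as np
import pandas as pd


def safe_int_cast(series: pd.Series, dtype: str) -> pd.Series:
    """Cast an integer Series to `dtype` only if every value fits its range."""
    info = np.iinfo(np.dtype(dtype))
    if series.min() < info.min or series.max() > info.max:
        raise OverflowError(f"values do not fit in {dtype}")
    return series.astype(dtype)


# Illustrative data: all values fit in int8 (-128..127).
ages = pd.Series([23, 35, 61])
ages_small = safe_int_cast(ages, "int8")
```

The helper assumes an integer Series without missing values; other inputs would need extra checks.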
Knowing Your dtypes
Before any type conversion, take a quick glance at your current dtypes using df.dtypes:
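A minimal sketch with made-up columns. df.dtypes returns a Series indexed by column name:

```python
import pandas as pd

# Illustrative data with three differently typed columns.
df = pd.DataFrame({"id": [1, 2], "price": [9.99, 19.5], "name": ["a", "b"]})

# One dtype per column; print it before deciding what to convert.
print(df.dtypes)
```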
Then convert only the necessary columns:
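As a sketch, suppose only the hypothetical id column has the wrong dtype. Listing the columns to fix keeps the rest of the frame untouched:

```python
import pandas as pd

# Illustrative data: "id" arrived as text, the other columns are already fine.
df = pd.DataFrame({"id": ["1", "2"], "price": [9.99, 19.5], "name": ["a", "b"]})

cols_to_fix = ["id"]  # assumed list of columns that need converting
df = df.astype({col: "int64" for col in cols_to_fix})
```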
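Conversions come in two kinds. 'Hard' conversions such as astype() and pd.to_numeric() force a target type, while 'soft' conversions such as infer_objects() only change a column when its values already fit a more specific type: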
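The contrast can be sketched on a hypothetical object column of digit strings. The soft conversion leaves the text alone, while the hard one parses it:

```python
import pandas as pd

# Illustrative data: digits stored as strings in an object column.
df = pd.DataFrame({"a": ["1", "2", "3"]}, dtype="object")

soft = df.infer_objects()   # values are strings, so nothing becomes numeric
hard = df.astype("int64")   # forces parsing of every string to an integer
```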
Handling mixed columns
Have columns with mixed types or numeric literals stored as text? pd.to_numeric() with errors='coerce' clears the clutter, turning anything it cannot parse into NaN:
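A minimal sketch, assuming a hypothetical column that mixes numeric strings, junk text and a missing value:

```python
import pandas as pd

# Illustrative data: "abc" and None cannot be parsed as numbers.
df = pd.DataFrame({"col": ["1", "2", "abc", None]})

# Unparseable entries become NaN instead of raising an error.
df["col"] = pd.to_numeric(df["col"], errors="coerce")
```

Coercion hides bad values, so counting the NaNs afterwards with df["col"].isna().sum() is a cheap sanity check.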
Memory optimization
To save memory, use pd.to_numeric(df['col'], downcast='integer'), which picks the smallest integer dtype that fits the values:
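A short sketch with a hypothetical column of small integers, which starts as int64 and ends as int8:

```python
import pandas as pd

# Illustrative data: small values stored as 64-bit integers by default.
df = pd.DataFrame({"col": [1, 2, 3]})

# downcast="integer" chooses the smallest signed integer dtype that fits.
df["col"] = pd.to_numeric(df["col"], downcast="integer")
```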
Converting repetitive text columns to categoricals can cut memory too. Savings from the string dtype depend on the storage backend.
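To check the effect rather than assume it, memory_usage(deep=True) can be compared before and after. The sketch below uses a made-up column with three labels repeated many times:

```python
import pandas as pd

# Illustrative data: 3,000 rows but only three distinct labels.
df = pd.DataFrame({"col": ["red", "green", "blue"] * 1000})

before = df["col"].memory_usage(deep=True)
after = df["col"].astype("category").memory_usage(deep=True)

print(f"text: {before} bytes, category: {after} bytes")
```

With many distinct values the gain shrinks, so it is worth measuring on your own data.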
Pitfalls of bad casting
Avoid data corruption by not forcing the wrong dtype:
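Two illustrative failure modes on made-up data: a float-to-integer cast that silently drops the fractional part, and a cast that fails outright on non-numeric text:

```python
import pandas as pd

# Casting floats to int truncates toward zero without any warning.
prices = pd.Series([1.9, 2.5, 3.99])
truncated = prices.astype("int64")

# Casting unparseable text raises instead of corrupting the data.
raised = False
try:
    pd.Series(["1", "x"]).astype("int64")
except ValueError:
    raised = True
```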
Powerful performance strategies
Prioritize columns for type casting. Converting only the few large or frequently used columns first delivers most of the performance boost!