
Keep only date part when using pandas.to_datetime

python
pandas
datetime
date-formatting
by Alex Kataev · Oct 17, 2024
TLDR

To drop the time part from pandas datetime values, use the .dt.date accessor:

df['date_only'] = pd.to_datetime(df['column']).dt.date

This returns a Series of Python datetime.date objects (object dtype) with no time component.
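A quick check with a small hypothetical DataFrame (the column name `column` mirrors the snippet above; the values are made up) shows what `.dt.date` actually produces:

```python
import datetime

import pandas as pd

# Hypothetical sample data: timestamps stored as strings
df = pd.DataFrame({"column": ["2024-10-17 13:45:00", "2024-10-18 08:00:00"]})
df["date_only"] = pd.to_datetime(df["column"]).dt.date

print(df["date_only"].dtype)  # object
# Each element is a plain datetime.date, not a pandas Timestamp
print(type(df["date_only"].iloc[0]) is datetime.date)  # True
```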

If you want to keep the datetime64[ns] data type and simply set the time to midnight, use .dt.normalize():

df['date_normalized'] = pd.to_datetime(df['column']).dt.normalize()
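A minimal sketch, again on hypothetical sample data, confirms that `.dt.normalize()` keeps a datetime64 column and only zeroes out the time of day:

```python
import pandas as pd

# Hypothetical sample data; "column" mirrors the snippet above
df = pd.DataFrame({"column": ["2024-10-17 13:45:00", "2024-10-18 08:00:00"]})
df["date_normalized"] = pd.to_datetime(df["column"]).dt.normalize()

print(df["date_normalized"].dtype)     # still a datetime64 dtype
print(df["date_normalized"].tolist())  # Timestamps at 00:00:00
```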

Converting to date while preserving datetime64 type

Keeping the datetime64[ns] data type matters for time-series work: the .dt accessor, date arithmetic and date-based grouping all keep working on it. So use .dt.normalize() to set the time to midnight while preserving the datetime64 dtype:

df['dates_normalized'] = df['dates'].dt.normalize()
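As an illustration of why the dtype matters, here is a sketch with a hypothetical event log (the `events` frame and its timestamps are invented for the example). After normalizing, the column can still be grouped by day and shifted with a Timedelta, which would not work on strings:

```python
import pandas as pd

# Hypothetical event log with timestamps spread over two days
events = pd.DataFrame({
    "dates": pd.to_datetime([
        "2024-10-17 08:15", "2024-10-17 17:40", "2024-10-18 09:05",
    ]),
})
events["dates_normalized"] = events["dates"].dt.normalize()

# Count events per calendar day
counts = events.groupby("dates_normalized").size()

# Date arithmetic and .dt accessors still work on the normalized column
next_day = events["dates_normalized"] + pd.Timedelta(days=1)

print(counts)
print(next_day.dt.day_name().tolist())  # ['Friday', 'Friday', 'Saturday']
```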

Dodging object dtype conversion

Calling .dt.date changes the column to object dtype, which stores one Python object per row and is slower and more memory-hungry than datetime64[ns]. On large datasets this can noticeably hurt performance, so be cautious:

# Watch out! This produces an object-dtype column
df['date_objects'] = df['dates'].dt.date
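To put a rough number on "less efficient", this sketch compares the memory footprint of the two dtypes on a hypothetical column of 10,000 timestamps. The exact byte counts depend on your platform and pandas version, so treat them as indicative only:

```python
import pandas as pd

# Hypothetical column of 10,000 timestamps, 15 minutes apart
dates = pd.Series(pd.date_range("2024-01-01", periods=10_000, freq="15min"))
date_objects = dates.dt.date  # object dtype: one Python date object per row

# deep=True also counts the Python objects referenced by an object column
dt64_bytes = dates.memory_usage(deep=True, index=False)
obj_bytes = date_objects.memory_usage(deep=True, index=False)

print(f"datetime64: {dt64_bytes} bytes, object: {obj_bytes} bytes")
```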

Instead of the above, consider using .dt.floor('D') to round each timestamp down to the start of its day:

df['dates_floored'] = df['dates'].dt.floor('D')

This method keeps your data in datetime64[ns], ensuring the efficiency of operations is not compromised.
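For tz-naive data, flooring to the `'D'` (calendar day) alias gives the same result as `.dt.normalize()`. A small sketch with hypothetical timestamps:

```python
import pandas as pd

# Hypothetical tz-naive timestamps
df = pd.DataFrame({
    "dates": pd.to_datetime(["2024-10-17 13:45:00", "2024-10-18 23:59:59"]),
})
df["dates_floored"] = df["dates"].dt.floor("D")
df["dates_normalized"] = df["dates"].dt.normalize()

# Both stay datetime64 and both land on midnight
print(df["dates_floored"].equals(df["dates_normalized"]))  # True
```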

.to_csv date formatting tricks

When you need to save data to a CSV file sans the time component, you can use the date_format parameter of to_csv:

# Abracadabra! No time component in the output file.
df['dates'].to_csv('dates.csv', date_format='%Y-%m-%d')
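To check the output without touching the file system, you can call to_csv with no path, in which case it returns the CSV text as a string. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical timestamps that include a time of day
df = pd.DataFrame({
    "dates": pd.to_datetime(["2024-10-17 13:45:00", "2024-10-18 08:00:00"]),
})

# No path argument: to_csv returns the CSV content instead of writing a file
csv_text = df["dates"].to_csv(index=False, date_format="%Y-%m-%d")
print(csv_text)
```

Only the formatting of the written text changes; the column in memory still holds full timestamps.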

Why vectorized methods are neat

Vectorized methods should always be your first choice for operations spanning entire columns. They are significantly faster than converting to datetime.date objects row by row:

# A tortoise could outrun this piece of code!
df['dates'] = [d.date() for d in pd.to_datetime(df['column'])]
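A rough timing sketch, on a hypothetical Series of 50,000 timestamps, lets you compare the two styles on your own machine. The timings vary by hardware and pandas version, so none are claimed here; the vectorized call is typically much faster:

```python
import timeit

import pandas as pd

# Hypothetical column of 50,000 minute-level timestamps
s = pd.Series(pd.date_range("2024-01-01", periods=50_000, freq="min"))

def row_by_row():
    # Python-level loop: builds a Timestamp and then a date for every row
    return [d.date() for d in s]

def vectorized():
    # One call that processes the whole datetime64 array at once
    return s.dt.normalize()

loop_seconds = timeit.timeit(row_by_row, number=1)
vec_seconds = timeit.timeit(vectorized, number=1)
print(f"row by row: {loop_seconds:.4f}s, vectorized: {vec_seconds:.4f}s")
```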

strftime and its superpowers

You can also format dates with strftime directives. Keep in mind that .dt.strftime returns strings rather than dates, so use it for display or export, not as a substitute for vectorized date operations:

# Use the superpower wisely, or it might backfire!
df['formatted_dates'] = df['dates'].dt.strftime('%Y-%m-%d')
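The sketch below, using hypothetical timestamps, shows that the formatted column holds plain strings; if you later need date arithmetic again, you have to parse them back with to_datetime:

```python
import pandas as pd

# Hypothetical timestamps
df = pd.DataFrame({
    "dates": pd.to_datetime(["2024-10-17 13:45:00", "2024-10-18 08:00:00"]),
})
df["formatted_dates"] = df["dates"].dt.strftime("%Y-%m-%d")
print(df["formatted_dates"].tolist())  # ['2024-10-17', '2024-10-18']

# Strings have no .dt accessor; parse them back to get datetime64 again
df["parsed_back"] = pd.to_datetime(df["formatted_dates"], format="%Y-%m-%d")
```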

Embracing pandas' progress

The pandas library keeps improving its date and time handling. The .dt accessor itself was introduced in pandas 0.15.0 and has gained many methods since, so check the documentation for the version you have installed.

On Precision

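If your strings hold dates only, pass a matching format to to_datetime so pandas parses them exactly; the result is datetime64[ns] at midnight. Note that format describes the layout of the input strings and does not strip a time or a time zone: strings that also contain a time will not match '%Y-%m-%d' (pandas 2.x raises a ValueError), so parse those first and then call .dt.normalize(). For date-only strings: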

df['date_precision'] = pd.to_datetime(df['datetime_column'], format='%Y-%m-%d')
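Both cases side by side, as a sketch on hypothetical string data: date-only strings parse straight to midnight, while strings with a time are parsed with a format that covers the whole string and then normalized:

```python
import pandas as pd

# Hypothetical date-only strings: the format matches them exactly
date_only = pd.Series(["2024-10-17", "2024-10-18"])
parsed = pd.to_datetime(date_only, format="%Y-%m-%d")

# Hypothetical strings that also carry a time: describe the full layout,
# then drop the time with normalize()
with_time = pd.Series(["2024-10-17 13:45:00", "2024-10-18 08:00:00"])
parsed_then_normalized = pd.to_datetime(
    with_time, format="%Y-%m-%d %H:%M:%S"
).dt.normalize()

print(parsed.tolist())
print(parsed_then_normalized.tolist())
```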

Time zones: not just for world travelers

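Time zones can complicate dates. Passing utc=True converts every timestamp to UTC, so data with mixed UTC offsets parses into one consistent datetime column instead of an object column or an error (depending on your pandas version). Be aware that .dt.date then returns the UTC calendar date, which can differ from the local date near midnight: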

df['utc_date'] = pd.to_datetime(df['datetime_column'], utc=True).dt.date
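To see that difference concretely, here is a minimal sketch with one hypothetical timestamp. The offset and the `America/Chicago` zone are illustrative assumptions; converting to a named zone relies on the time zone database that pandas normally installs alongside itself:

```python
import pandas as pd

# Hypothetical timestamp: 23:30 on Oct 17 at a UTC-05:00 offset
s = pd.Series(["2024-10-17 23:30:00-05:00"])

local_dates = pd.to_datetime(s).dt.date         # date at the original offset
utc_dates = pd.to_datetime(s, utc=True).dt.date  # date after converting to UTC

# If the local calendar date matters, convert to that zone before taking .dt.date
chicago_dates = (
    pd.to_datetime(s, utc=True)
    .dt.tz_convert("America/Chicago")
    .dt.date
)

print(local_dates.iloc[0], utc_dates.iloc[0], chicago_dates.iloc[0])
# 2024-10-17 2024-10-18 2024-10-17
```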