
Add missing dates to pandas dataframe

python
dataframe
pandas
datetime
by Nikita Barsukov · Jan 28, 2025
TLDR

To fill in missing dates in a DataFrame indexed by dates, use pd.date_range() to generate the full date range and call reindex() to conform your DataFrame to that range. Here's an example:

```python
import pandas as pd

# df is assumed to already have a DatetimeIndex (no telepathy involved)
date_range = pd.date_range(start=df.index.min(), end=df.index.max(), freq='D')
df_filled = df.reindex(date_range, fill_value=0)
```

This fills every gap between the earliest and latest dates in your data, inserting a row of zeroes for each missing date.
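A minimal, self-contained sketch of the recipe above, assuming the dates start out as strings in a hypothetical 'date' column (the sample values are invented for illustration). It moves them into a DatetimeIndex first, because the recipe relies on df.index.min() and df.index.max():

```python
import pandas as pd

# Invented sample data: 2025-01-03 and 2025-01-04 are missing
df = pd.DataFrame(
    {"date": ["2025-01-01", "2025-01-02", "2025-01-05"], "value": [10, 20, 50]}
)

# Parse the strings and move the dates into a DatetimeIndex
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# Build the complete daily range and conform the DataFrame to it
date_range = pd.date_range(start=df.index.min(), end=df.index.max(), freq="D")
df_filled = df.reindex(date_range, fill_value=0)

print(df_filled)
```

The rows for 2025-01-03 and 2025-01-04 come back with a value of 0, while the original rows are left as they were.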

Data skeletons: fixing missing dates

Having missing dates in your data can be like trying to solve a puzzle with missing pieces. Hence, it's crucial to know how to add those pieces back.

Rise of the undead: eliminating duplicates and missing dates

Sometimes your data has duplicate dates, and reindex() refuses to work on an index with duplicate labels (it raises a ValueError). Here's how you can behead the duplicates and reindex in one stroke:

```python
# duplicates = zombies: keep the first row per date, then fill the gaps
df = df[~df.index.duplicated(keep='first')].reindex(date_range, fill_value=0)
```
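To see why the duplicates have to go first, here is a small sketch with invented data in which one date appears twice. It assumes, as elsewhere in this article, that the index is already a DatetimeIndex; the reindex_failed flag exists only for illustration:

```python
import pandas as pd

# Invented data: 2025-01-02 appears twice and 2025-01-03 is missing
idx = pd.to_datetime(["2025-01-01", "2025-01-02", "2025-01-02", "2025-01-04"])
df = pd.DataFrame({"value": [1, 2, 99, 4]}, index=idx)
date_range = pd.date_range(start=df.index.min(), end=df.index.max(), freq="D")

# Reindexing straight away fails because of the repeated label
reindex_failed = False
try:
    df.reindex(date_range, fill_value=0)
except ValueError as err:
    reindex_failed = True
    print("reindex failed:", err)

# Keep the first row per date (dropping the 99), then fill the gap
df_clean = df[~df.index.duplicated(keep="first")].reindex(date_range, fill_value=0)
print(df_clean)
```

keep='first' is a choice rather than a rule: keep='last', or combining duplicates with df.groupby(level=0).sum(), may suit your data better.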

Bewitching timestamps to datetime

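To make sure reindexing doesn't turn your hair grey, cast your index to a DatetimeIndex. String labels such as '2025-01-01' generally won't match the Timestamps produced by pd.date_range(), so the rows won't line up: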

```python
# magical transformation: string labels -> DatetimeIndex
df.index = pd.to_datetime(df.index)
```
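A minimal sketch of the conversion in context, using invented data: the index starts out as date strings, is converted with pd.to_datetime(), and only then is reindexed against a pd.date_range():

```python
import pandas as pd

# Invented data whose index holds date strings rather than timestamps
df = pd.DataFrame({"value": [1, 3]}, index=["2025-01-01", "2025-01-03"])
print(type(df.index).__name__)  # a plain Index of strings

# Convert the labels so they compare equal to the Timestamps from date_range
df.index = pd.to_datetime(df.index)
print(type(df.index).__name__)  # DatetimeIndex

date_range = pd.date_range(start=df.index.min(), end=df.index.max(), freq="D")
df_filled = df.reindex(date_range, fill_value=0)
print(df_filled)
```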

Marching to the beat: adjusting frequency

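The DataFrame.asfreq() method conforms a DatetimeIndex to a fixed frequency, such as daily ('D'), inserting rows for the missing dates. Its fill_value fills only those newly inserted rows; NaNs that were already present are left untouched: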

```python
# Conform to a daily frequency; newly inserted dates get 0
df_asfreq = df.asfreq('D', fill_value=0)
```
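Here is a small sketch, with invented values, that shows the fill_value behaviour: the missing 2025-01-02 is inserted as 0, while the NaN already stored for 2025-01-03 stays NaN. It assumes a DatetimeIndex without duplicates:

```python
import numpy as np
import pandas as pd

# Invented data: 2025-01-02 is missing, 2025-01-03 already holds a NaN
idx = pd.to_datetime(["2025-01-01", "2025-01-03", "2025-01-04"])
df = pd.DataFrame({"value": [1.0, np.nan, 4.0]}, index=idx)

# New rows get 0; the pre-existing NaN is not touched
df_asfreq = df.asfreq("D", fill_value=0)
print(df_asfreq)
```

If the pre-existing NaNs should become 0 as well, follow up with fillna(0).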

Forming the date Avengers: resample and fillna()

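When you need sums or averages over regular intervals, form the perfect duo: resample first, then deal with the empty intervals. resample('D').sum() already returns 0 for days with no data, whereas mean() returns NaN, which is where fillna(0) earns its keep: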

```python
# Resampling: Bruce Banner. fillna(0): Hulk.
# (With sum(), empty days are already 0; fillna(0) matters for mean() and similar.)
df_resampled = df.resample('D').sum().fillna(0)
```
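The difference between sum() and mean() on empty days is easiest to see side by side. This sketch uses an invented event log with no events on 2025-01-02:

```python
import pandas as pd

# Invented event log: two events on Jan 1, none on Jan 2, one on Jan 3
idx = pd.to_datetime(["2025-01-01 09:00", "2025-01-01 17:00", "2025-01-03 12:00"])
df = pd.DataFrame({"amount": [5.0, 7.0, 3.0]}, index=idx)

daily_sum = df.resample("D").sum()          # empty day -> 0.0
daily_mean = df.resample("D").mean()        # empty day -> NaN
daily_mean_filled = daily_mean.fillna(0)    # NaN -> 0.0

print(daily_sum)
print(daily_mean_filled)
```

Whether an empty day's mean should really be 0 is a modelling decision; for averages, leaving NaN is often the more honest choice.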

Time-sort Tetris

Make sure your DataFrame is sorted by date before any wizardry. Row-order operations such as ffill() and bfill() assume chronological order, so stacking things up properly first can save loads of debugging time:

```python
# sorting = Tetris for DataFrames
df = df.sort_index()
```
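A minimal sketch, with invented rows that arrive out of order, of why sorting matters for row-order fills: ffill() copies from the previous row, whatever date that row happens to carry:

```python
import numpy as np
import pandas as pd

# Invented rows that arrived out of order; Jan 3 is missing its value
idx = pd.to_datetime(["2025-01-01", "2025-01-03", "2025-01-02"])
df = pd.DataFrame({"value": [10.0, np.nan, 20.0]}, index=idx)

# ffill follows row order, so the unsorted frame gives Jan 3 the Jan 1 value
unsorted_fill = df.ffill()

# After sorting, Jan 3 inherits the most recent earlier value (Jan 2)
df = df.sort_index()
sorted_fill = df.ffill()

print(unsorted_fill)
print(sorted_fill)
```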

Time tactics

Single method, multiple effects

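Depending on whether your data is trend-focused or event-driven, pick the filler accordingly: ffill() repeats the last known value (suited to levels such as prices or states), bfill() uses the next known value, and fillna(0) treats a missing date as "nothing happened" (suited to event counts).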
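The three fillers compared on the same gappy series. The data is invented, and which result is right depends entirely on what your values mean:

```python
import pandas as pd

# Invented daily series with Jan 2 and Jan 3 missing
idx = pd.to_datetime(["2025-01-01", "2025-01-04"])
df = pd.DataFrame({"value": [1.0, 4.0]}, index=idx)
date_range = pd.date_range(start=df.index.min(), end=df.index.max(), freq="D")
gappy = df.reindex(date_range)  # the new rows hold NaN

carried_forward = gappy.ffill()   # trend-style: repeat the last known value
carried_back = gappy.bfill()      # use the next known value instead
zero_filled = gappy.fillna(0)     # event-style: no record means nothing happened

print(carried_forward, carried_back, zero_filled, sep="\n\n")
```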

Plot-pokes

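To avoid plot-potholes in your visualisations, make sure the x-axis has a consistent timeline by reindexing to the full date range so that every date is present. Leave the new rows as NaN if you want gaps to show up as breaks in a line plot, or fill them if you want a continuous line.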
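One way to check the timeline before plotting, sketched with a hypothetical helper, has_consistent_daily_axis(), which is our own name and not a pandas function. It assumes daily data and simply verifies that consecutive index entries are exactly one day apart:

```python
import pandas as pd


def has_consistent_daily_axis(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if df has a sorted DatetimeIndex with one-day steps."""
    if not isinstance(df.index, pd.DatetimeIndex):
        return False
    steps = df.index.to_series().diff().dropna()
    return df.index.is_monotonic_increasing and bool((steps == pd.Timedelta(days=1)).all())


# Invented data with 2025-01-02 missing
df = pd.DataFrame({"value": [1, 3]}, index=pd.to_datetime(["2025-01-01", "2025-01-03"]))
print(has_consistent_daily_axis(df))  # False: Jan 2 is missing

# Reindexing adds Jan 2 as NaN, which a line plot shows as a break
full = pd.date_range(start=df.index.min(), end=df.index.max(), freq="D")
df_plot_ready = df.reindex(full)
print(has_consistent_daily_axis(df_plot_ready))  # True
```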

Adaptable solutions

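Prefer dynamic solutions that adapt to growing data without manual updates: derive the date range from the data itself (df.index.min() and df.index.max()) instead of hard-coding start and end dates.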
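As a sketch of that idea, here is a hypothetical helper, fill_missing_dates() (our name, not a pandas API), that derives its range from the data on every call. It assumes a unique DatetimeIndex, and the sample batches are invented:

```python
import pandas as pd


def fill_missing_dates(df: pd.DataFrame, freq: str = "D", fill_value=0) -> pd.DataFrame:
    """Hypothetical helper: reindex df to a complete range taken from its own index.

    Nothing is hard-coded, so newly appended rows extend the range automatically.
    """
    full_range = pd.date_range(
        start=df.index.min(), end=df.index.max(), freq=freq, name=df.index.name
    )
    return df.reindex(full_range, fill_value=fill_value)


# First batch of invented data
batch = pd.DataFrame({"value": [1, 3]}, index=pd.to_datetime(["2025-01-01", "2025-01-03"]))
first = fill_missing_dates(batch)

# Same call after more data arrives; the range grows without any edits
extra = pd.DataFrame({"value": [6]}, index=pd.to_datetime(["2025-01-06"]))
second = fill_missing_dates(pd.concat([batch, extra]))

print(len(first), len(second))
```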

Naming your path

Identify your date column or date index clearly during date manipulation, ideally by giving it an explicit name, so that every operation targets the right data and the results stay readable.
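A minimal sketch of keeping the name intact, using an invented order_date column. As far as we can tell, a DatetimeIndex built by pd.date_range() has no name unless you pass name=, so after reindex() and reset_index() you could otherwise end up with a generic column called 'index':

```python
import pandas as pd

# Invented raw data with an explicitly named date column
raw = pd.DataFrame({"order_date": ["2025-01-01", "2025-01-03"], "orders": [4, 2]})
raw["order_date"] = pd.to_datetime(raw["order_date"])

df = raw.set_index("order_date")  # the index now carries the name "order_date"

# Passing name= keeps that label on the new, complete index
full = pd.date_range(start=df.index.min(), end=df.index.max(), freq="D", name=df.index.name)
df_filled = df.reindex(full, fill_value=0)

# Back to a regular column, still called "order_date"
result = df_filled.reset_index()
print(result.columns.tolist())
```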