Explain Codes LogoExplain Codes Logo

How do I combine two dataframes?

python
dataframe
pandas
data-merging
Anton ShumikhinbyAnton Shumikhin·Feb 11, 2025
TLDR

Merge DataFrames vertically or horizontally using pd.concat:

combined_vertical = pd.concat([df1, df2]) combined_horizontal = pd.concat([df1, df2], axis=1)

Join them based on a common key using pd.merge:

combined = pd.merge(df1, df2, on='key')

With pd.concat you can stack dataframes, whereas pd.merge allows you to join dataframes similarly to SQL joins. Just remember to choose the right keys for joining and decide whether to keep original index or reset them in concat.

Key techniques of Combining DataFrames

To keep original indexes after appending dataframes, you may use ignore_index parameter:

df_merged = df1.append(df2, ignore_index=False) # Original index from both dataframes is preserved. # 'Cuz we love originals, don't we? Just like Pythons love indentation!

While dealing with multiple dataframes, just wrap them into a list before concatenation:

dataframes = [df1, df2, df3] combined = pd.concat(dataframes, ignore_index=True) # Combining three dataframes - because life is better in threes!

For updating data from df2 into df1, just use the update() function:

df1.update(df2) # df1 gets a makeover, courtesy of df2!

Remember, before updating, set the corresponding index first.

Special Cases to Consider

Combining with Duplicates

Want to remove duplicates while combining? Use drop_duplicates(). Just remember, pandas won't do it by default (it's a bit lazy like us, sometimes).

combined = pd.concat([df1, df2]).drop_duplicates().reset_index(drop=True) # De-duplication: because we like to keep things neat and tidy!

Dealing with Overlapping Data

DataFrame got some intersecting data? Use combine_first() to fill null values with values from another DataFrame.

df_combined = df1.combine_first(df2) # Some don't like overlapping... Unless it fills in the gaps!

Merging on Multiple Keys

Joining dataframes on more than one key? Use pd.merge() and define multiple keys:

combined = pd.merge(df1, df2, left_on=['key1', 'key2'], right_on=['key3', 'key4'], how='inner') # Because two keys are better than one... usually!