How do I combine two dataframes?
Stack DataFrames vertically or horizontally using pd.concat.
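A minimal sketch of both directions, assuming two small illustrative frames (the column names `id` and `name` are made up for this example):

```python
import pandas as pd

# Illustrative data; the column names are assumptions for this sketch.
df1 = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["Cid", "Dee"]})

# axis=0 (the default) stacks the rows of df2 under df1.
stacked = pd.concat([df1, df2])

# axis=1 places the frames side by side, aligning rows on the index.
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked)
print(side_by_side)
```

Note that the vertical result keeps each frame's own index labels, so labels can repeat.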
Join DataFrames on a common key using pd.merge.
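As a rough illustration, assuming two hypothetical tables that share a `customer_id` column:

```python
import pandas as pd

# Hypothetical tables; names and values are assumptions for this sketch.
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "name": ["Ann", "Bob", "Cid"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3],
                       "amount": [20.0, 35.5, 12.0]})

# The default is an inner join: only keys present in both frames survive.
joined = pd.merge(customers, orders, on="customer_id")

# how="left" keeps every customer, filling missing order data with NaN.
left_joined = pd.merge(customers, orders, on="customer_id", how="left")

print(joined)
print(left_joined)
```

The `how` argument also accepts `"right"` and `"outer"`, mirroring the corresponding SQL join types.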
With pd.concat you can stack DataFrames, whereas pd.merge lets you join them much like SQL joins. Choose the right join keys for pd.merge, and for pd.concat decide whether to keep the original index or reset it.
Key Techniques for Combining DataFrames
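By default, pd.concat keeps each DataFrame's original index labels, which can leave duplicate labels in the result. To discard them and get a fresh 0 to n-1 index instead, pass ignore_index=True.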
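The difference is easiest to see side by side. A small sketch, using a made-up `value` column:

```python
import pandas as pd

df1 = pd.DataFrame({"value": [10, 20]})
df2 = pd.DataFrame({"value": [30, 40]})

# Default (ignore_index=False): original labels are kept, so 0 and 1 repeat.
kept = pd.concat([df1, df2])

# ignore_index=True: labels are discarded and renumbered from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```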
When combining more than two DataFrames, put them all in a list and pass it to pd.concat in a single call.
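For instance, assuming one small frame per month (the `month` and `sales` columns are invented for this sketch):

```python
import pandas as pd

# Build a list of frames; in practice these might come from separate files.
frames = [
    pd.DataFrame({"month": [month], "sales": [sales]})
    for month, sales in [("Jan", 100), ("Feb", 120), ("Mar", 90)]
]

# One concat call over the whole list, rather than combining pair by pair.
combined = pd.concat(frames, ignore_index=True)

print(combined)
```

A single call over the list generally avoids the repeated copying that comes with concatenating inside a loop.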
To update df1 with values from df2, use the update() method.
Remember that update() aligns rows on the index, so set the matching key column as the index on both DataFrames first.
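A minimal sketch of that workflow, assuming an `id` key column and a `price` column chosen purely for illustration:

```python
import pandas as pd

# Set the shared key as the index on both frames so rows line up.
df1 = pd.DataFrame({"id": [1, 2, 3],
                    "price": [10.0, 20.0, 30.0]}).set_index("id")
df2 = pd.DataFrame({"id": [2, 3],
                    "price": [25.0, 35.0]}).set_index("id")

# update() modifies df1 in place and returns None.
df1.update(df2)

print(df1)
```

Only rows and columns present in df2 are touched, and NaN values in df2 do not overwrite existing values in df1.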
Special Cases to Consider
Combining with Duplicates
Want to remove duplicates while combining? Call drop_duplicates() on the result. Just remember, pandas won't do it by default (it's a bit lazy like us, sometimes).
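A short sketch, assuming two frames that share one identical row (the data is invented):

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})
df2 = pd.DataFrame({"id": [2, 3], "name": ["Bob", "Cid"]})

# concat keeps every row, including the repeated (2, "Bob").
combined = pd.concat([df1, df2], ignore_index=True)

# drop_duplicates() removes fully identical rows, keeping the first one seen.
deduped = combined.drop_duplicates().reset_index(drop=True)

print(deduped)
```

To treat rows as duplicates based on only some columns, pass them via `subset`, for example `drop_duplicates(subset=["id"])`.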
Dealing with Overlapping Data
DataFrames got some overlapping data? Use combine_first() to fill null values in one DataFrame with values from the other.
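To make this concrete, here is a small sketch with made-up columns `a` and `b`, where df1 has gaps and df2 supplies the fallback values:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})
df2 = pd.DataFrame({"a": [10.0, 20.0, 30.0], "b": [40.0, 50.0, 60.0]})

# Values from df1 win; df2 is used only where df1 is null.
filled = df1.combine_first(df2)

print(filled)
```

Alignment happens on both index and column labels, so the result covers the union of the two frames' rows and columns.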
Merging on Multiple Keys
Joining DataFrames on more than one key? Use pd.merge() and pass a list of column names as the keys.
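One way this might look, assuming hypothetical `store` and `year` key columns:

```python
import pandas as pd

# Hypothetical data; a row matches only if both store and year agree.
sales = pd.DataFrame({"store": ["A", "A", "B"],
                      "year": [2023, 2024, 2023],
                      "revenue": [100, 150, 80]})
targets = pd.DataFrame({"store": ["A", "B", "B"],
                        "year": [2023, 2023, 2024],
                        "target": [90, 85, 95]})

# Passing a list to on= joins on the combination of columns.
merged = pd.merge(sales, targets, on=["store", "year"])

print(merged)
```

If the key columns are named differently in each frame, `left_on` and `right_on` accept lists in the same way.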