Split (explode) pandas dataframe string entry to separate rows
Want to split a column of comma-separated strings into multiple rows swiftly? Use explode() in pandas (available as both Series.explode() and DataFrame.explode()). Given a DataFrame df with a 'Values' column, use .str.split(',') to turn each string into a list, then call .explode():
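A minimal sketch of the two-step recipe, assuming a small DataFrame df built inline to match the example that follows:

```python
import pandas as pd

# Sample frame: one comma-separated string per row
df = pd.DataFrame({"Values": ["a,b,c", "d,e,f"]})

# Step 1: split each string into a Python list.
# Step 2: explode gives every list element its own row and
#         repeats the original index label for each new row.
df = df.assign(Values=df["Values"].str.split(",")).explode("Values")
print(df)
```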
This morphs:

      Values
    0  a,b,c
    1  d,e,f

into these separate rows:

      Values
    0      a
    0      b
    0      c
    1      d
    1      e
    1      f
You'll need pandas 0.25.0 or later for this trick, and exploding several columns at once requires pandas 1.3.0. On older versions you have to build the rows yourself, for example by repeating the index and flattening the split lists.
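For pandas versions before 0.25.0, one common workaround (a sketch, not the only option) repeats each row with Index.repeat and flattens the split lists with itertools.chain. It assumes the column being split has no missing values; the id column and sample data are made up for illustration:

```python
import itertools

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "Values": ["a,b,c", "d,e"]})

# Split each string into a list (this step works on old pandas too)
lists = df["Values"].str.split(",")

# Number of new rows each original row will produce
lengths = lists.str.len()

# Repeat the remaining columns once per token, keeping the original index,
# then attach the flattened tokens in the same row order
out = (
    df.drop(columns="Values")
    .loc[df.index.repeat(lengths.values)]
    .assign(Values=list(itertools.chain.from_iterable(lists)))
)
print(out)
```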
Digging deeper: Advanced techniques and pitfalls
From singularity to multiverse: Single vs Multi-column explode
From version 1.3.0 onward, pandas lets you explode multiple columns simultaneously, as long as each row's lists have the same length in every exploded column:
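A short sketch with two made-up columns, letters and numbers. Passing a list of column names explodes them in lockstep; if a row's lists differ in length, pandas raises a ValueError instead:

```python
import pandas as pd

# Two string columns whose tokens line up row by row
df = pd.DataFrame({
    "letters": ["a,b", "c,d,e"],
    "numbers": ["1,2", "3,4,5"],
})

# Turn both columns into lists
split = df.assign(
    letters=df["letters"].str.split(","),
    numbers=df["numbers"].str.split(","),
)

# A list of columns explodes them together (requires pandas >= 1.3.0)
out = split.explode(["letters", "numbers"])
print(out)
```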
Don't mix up: Handling non-string columns
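Non-string columns (numbers, dates, IDs) need no special treatment, so exploding them won't explode your sanity. Split only the string column and leave the others unchanged: explode repeats their values for every new row.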
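A sketch with hypothetical id, price and tags columns. Only tags is split; the numeric columns keep their values and dtypes and are repeated per token. A missing entry in tags comes through .str.split as NaN, and explode keeps it as a single NaN row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],                        # integer column, never split
    "price": [9.5, 3.0, 7.25],              # float column, never split
    "tags": ["red,blue", "green", np.nan],  # the only column to split
})

# Split just the string column; explode copies id and price onto each new row
out = df.assign(tags=df["tags"].str.split(",")).explode("tags")
print(out)
print(out.dtypes)
```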
The need for speed: Account for efficiency
Performance matters with large datasets, so run %timeit and pick the approaches that won't have you growing old waiting:
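%timeit is an IPython/Jupyter magic; the sketch below uses the standard-library timeit module instead so it runs as a plain script. The data is synthetic and the numbers depend entirely on your machine and pandas version, so treat it as a template for measuring your own data rather than a benchmark result:

```python
import timeit

import pandas as pd

# Synthetic data for timing only: 20,000 rows of five tokens each
df = pd.DataFrame({"Values": ["a,b,c,d,e"] * 20_000})


def with_explode(frame):
    # Vectorized split followed by explode
    return frame.assign(Values=frame["Values"].str.split(",")).explode("Values")


def with_stack(frame):
    # Spread tokens across columns, then fold them back into one long column
    wide = frame["Values"].str.split(",", expand=True)
    long = wide.stack().dropna().reset_index(level=1, drop=True)
    return long.to_frame("Values")


for fn in (with_explode, with_stack):
    seconds = timeit.timeit(lambda: fn(df), number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per run")
```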
Popping the stack and the concat can
When reshaping is the game, stacking and concatenating offer alternative plays for getting your data into formation:
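A sketch of both plays on a made-up two-column frame. The stack route pads shorter rows with missing values when splitting with expand=True, so it calls dropna() (note this also drops rows whose original value was missing). The concat route builds one small frame per row in a Python loop, so expect it to be slow on large data:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "Values": ["a,b,c", "d,e"]})

# 1) stack: tokens spread across columns, then stacked back into rows.
#    dropna() removes the padding cells from shorter rows.
stacked = (
    df["Values"].str.split(",", expand=True)
    .stack()
    .dropna()
    .reset_index(level=1, drop=True)
    .rename("Values")
)
via_stack = df.drop(columns="Values").join(stacked)

# 2) concat: one small frame per original row, glued together at the end
pieces = []
for idx, row_id, text in zip(df.index, df["id"], df["Values"]):
    tokens = text.split(",")
    pieces.append(pd.DataFrame({"id": row_id, "Values": tokens}, index=[idx] * len(tokens)))
via_concat = pd.concat(pieces)

print(via_stack)
print(via_concat)
```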
iterrows are bad, m'kay?
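Although looping with iterrows may seem like a good idea, it's a trap. The pandas documentation notes that iterating over rows is generally slow and that iterrows does not preserve dtypes across rows, so avoid this clunky method on larger datasets. When in doubt, vectorize!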
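For contrast, a sketch of the iterrows loop this section warns against next to the vectorized version; both produce the same rows on this made-up frame, but the loop runs Python code once per row and per token:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "Values": ["a,b,c", "d,e"]})

# Anti-pattern: Python-level loop, building the output one record at a time
records = []
for _, row in df.iterrows():
    for token in row["Values"].split(","):
        records.append({"id": row["id"], "Values": token})
slow = pd.DataFrame(records)

# Vectorized equivalent: one split call and one explode call
fast = df.assign(Values=df["Values"].str.split(",")).explode("Values", ignore_index=True)

print(slow["Values"].tolist() == fast["Values"].tolist())
```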
Explode like a pro
Tidy data and explode: Best buddies
In tidy dataframe land, each variable is a column and each observation is a row. After the explode command, make sure your dataframe is still tidy: explode repeats index labels, and the split tokens may carry stray whitespace or empty strings.
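A sketch of some post-explode cleanup, assuming your data has the kinds of mess shown in the made-up sample (spaces around tokens, doubled commas). ignore_index=True (pandas 1.1.0 or later) gives a fresh 0..n-1 index instead of repeated labels:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "Values": ["a, b ,c", "d,,e"]})

tidy = (
    df.assign(Values=df["Values"].str.split(","))
    .explode("Values", ignore_index=True)               # unique 0..n-1 index
    .assign(Values=lambda d: d["Values"].str.strip())   # trim spaces around tokens
)
# Drop empty tokens produced by ",," and renumber the rows
tidy = tidy[tidy["Values"] != ""].reset_index(drop=True)
print(tidy)
```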
Beyond pandas: tidy_split
Helper functions written by the community, such as tidy_split (not part of pandas itself), retain column order and handle missing values.
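tidy_split's own code isn't reproduced here. As a rough stand-in, the hypothetical helper below, split_to_rows, wraps split and explode (so it needs pandas 0.25.0 or later), keeps the original column order, leaves a missing value as a single NaN row, and trims whitespace from each token:

```python
import numpy as np
import pandas as pd


def split_to_rows(df, column, sep=","):
    """Hypothetical helper (not tidy_split itself): one row per token in `column`.

    Column order is preserved, missing values stay as one NaN row,
    and whitespace around each token is trimmed.
    """
    exploded = df.assign(**{column: df[column].str.split(sep)}).explode(column)
    return exploded.assign(**{column: exploded[column].str.strip()})


df = pd.DataFrame({"Values": ["a, b", np.nan, "c"], "id": [1, 2, 3]})
out = split_to_rows(df, "Values")
print(out)
```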
Picking the right tool
Explore the Python community's wisdom to choose the solution that's best for your specific case. Consider data structure, dataframe size, and the Pandas version.