How to iterate over columns of a pandas dataframe
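To iterate over the columns of a pandas DataFrame, use the df.items() method. (Older tutorials use df.iteritems(), which did the same thing but was deprecated in pandas 1.5 and removed in 2.0.)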
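A minimal sketch of the basic loop, assuming a small made-up DataFrame (the column names and values are illustrative only):

```python
import pandas as pd

# Illustrative data; any DataFrame works the same way
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

seen = []
for name, column in df.items():
    # name is the column label, column is a pandas Series
    seen.append(name)
    print(name, column.dtype, column.tolist())
```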
Each pass through the loop gives you the column name and the associated Series.
Turning knobs: Performance and flexibility with df.items() and df.apply()
df.items() covers the basic loop. df.iteritems() did exactly the same thing and is gone as of pandas 2.0, so switching between the two never bought any speed. For more flexibility, turn a different knob: df.apply() runs a function on each column without an explicit loop.
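As a small illustration of df.apply() on columns (the data and the range calculation are assumptions chosen for the example), the function receives each column as a Series:

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 40]})

# axis=0 (the default) passes one column at a time to the function
ranges = df.apply(lambda col: col.max() - col.min())
print(ranges)
```

The result is a Series indexed by column name, which is usually tidier than collecting values by hand inside a loop.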
Working with regression? No problem, use df.apply() to get residuals. Or spin a wheel with traditional loops:
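The original example is not shown here, so the following is only a sketch under stated assumptions: each column is regressed on a shared predictor x with a straight-line fit from np.polyfit, and the helper name residuals is hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed setup: one shared predictor x, one response per column
x = np.arange(5, dtype=float)
df = pd.DataFrame({"a": 2 * x + 1, "b": [1.0, 3.0, 2.0, 5.0, 4.0]})

def residuals(col):
    """Residuals of a least-squares line fitted to one column (hypothetical helper)."""
    slope, intercept = np.polyfit(x, col.to_numpy(), deg=1)
    return col - (slope * x + intercept)

# Option 1: let apply() hand over each column
resid_apply = df.apply(residuals)

# Option 2: the traditional loop over (name, Series) pairs
resid_loop = pd.DataFrame(index=df.index)
for name, col in df.items():
    resid_loop[name] = residuals(col)
```

Both routes produce the same table of residuals; apply() is shorter, while the loop leaves room for per-column special cases.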
Slicing more your style? Use df.columns:
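A short sketch of looping over a slice of df.columns, using made-up column names:

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# Loop over the first two column labels only
first_two = []
for name in df.columns[:2]:
    first_two.append(name)
    print(name, df[name].sum())

# A step works too: every second column label
every_other = list(df.columns[::2])
```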
Dislike disorganized column manipulation? enumerate to the rescue!
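One way this might look, assuming you want both the position and the label of each column (the data is illustrative):

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

positions = {}
for pos, name in enumerate(df.columns):
    # pos is the 0-based position, name is the column label
    positions[name] = pos
    print(pos, name, df[name].tolist())
```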
And yes, we steer clear of .ix, which was deprecated in pandas 0.20 and removed in 1.0. We're fans of .loc (label-based) and .iloc (position-based).
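A brief sketch of column access with .loc and .iloc, on illustrative data:

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

by_label = df.loc[:, "b"]      # select a column by its label
by_position = df.iloc[:, 1]    # select the same column by position

# Iterating by position when labels are awkward or duplicated
totals = []
for pos in range(df.shape[1]):
    totals.append(int(df.iloc[:, pos].sum()))
```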
Need to treat columns like rows? df.transpose() is all you need.
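A small sketch with made-up data: after transposing, each original column becomes a row, so row-wise tools such as iterrows() walk over what used to be columns. Note that transposing a DataFrame with mixed dtypes generally produces object columns, so this fits best when the data is homogeneous.

```python
import pandas as pd

# Illustrative data with explicit row labels
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r0", "r1"])

transposed = df.transpose()  # df.T is the shorthand

rows_seen = []
for label, row in transposed.iterrows():
    # label is an original column name, row holds that column's values
    rows_seen.append(label)
    print(label, row.tolist())
```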
Of course, add error checks because we are not savages!
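What those checks look like depends on your data. As one possible sketch, the hypothetical helper below (numeric_column_means is not a pandas function) validates its input and skips non-numeric columns instead of failing halfway through the loop:

```python
import pandas as pd

def numeric_column_means(df):
    """Mean of each numeric column (hypothetical helper for illustration)."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("expected a pandas DataFrame")
    if df.empty:
        raise ValueError("DataFrame has no rows or no columns")

    means = {}
    for name, col in df.items():
        # Skip text and other non-numeric columns
        if not pd.api.types.is_numeric_dtype(col):
            continue
        means[name] = col.mean()
    return means

# Illustrative data
df = pd.DataFrame({"a": [1, 2, 3], "label": ["x", "y", "z"]})
means = numeric_column_means(df)
```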
Techniques for the initiated
Working with large dataframes
With large dataframes, watch the memory footprint:
- Generator expressions can help with memory efficiency.
- Vectorized operations are your friends!
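The points above can be sketched as follows, on a tiny illustrative DataFrame (on real data the difference only shows at scale):

```python
import numpy as np
import pandas as pd

# Illustrative data
df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3), columns=["a", "b", "c"])

# Check the footprint in bytes (deep=True also measures object columns)
total_bytes = int(df.memory_usage(deep=True).sum())

# Generator expression: handles one column at a time, no intermediate list
largest = max(col.max() for _, col in df.items())

# Vectorized: subtract each column's mean from all columns in one operation
centered = df - df.mean()
```

When a vectorized form exists, it is usually preferable to any Python-level loop over columns.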
The Art of slicing
To improve your slicing game with df.columns:
- Negative indexing to skip the last column(s)
- Conditional slicing to filter columns
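Both ideas might look like this, assuming illustrative column names; select_dtypes and the .str accessor on the column index are two common ways to filter conditionally:

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({
    "id": [1, 2],
    "price": [9.5, 3.0],
    "qty": [1, 4],
    "note": ["x", "y"],
})

# Negative indexing: everything except the last column
all_but_last = df.columns[:-1]

# Conditional slicing by dtype: numeric columns only
numeric_cols = df.select_dtypes(include="number").columns

# Conditional slicing by name: labels starting with "q"
q_cols = df.columns[df.columns.str.startswith("q")]

for name in all_but_last:
    print(name, df[name].tolist())
```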
The need for speed in regression
Running regressions? Here's how to win the race:
- Pre-allocation to store regression results
- Multiprocessing for parallel computation
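A rough sketch of both ideas, assuming a straight-line fit of every column against a shared predictor x. The helper names fit_column, fit_preallocated, and fit_parallel are hypothetical, and process-based parallelism only pays off when each fit is expensive enough to outweigh the cost of starting workers and copying data.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import pandas as pd

def fit_column(args):
    """Fit y = slope * x + intercept for one column; returns (slope, intercept)."""
    x, y = args
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

def fit_preallocated(df, x):
    # Allocate the result array once instead of growing a list or DataFrame
    coefs = np.empty((df.shape[1], 2))
    for i, (_, col) in enumerate(df.items()):
        coefs[i] = fit_column((x, col.to_numpy()))
    return pd.DataFrame(coefs, index=df.columns, columns=["slope", "intercept"])

def fit_parallel(df, x, max_workers=2):
    # One task per column, distributed across worker processes
    tasks = [(x, col.to_numpy()) for _, col in df.items()]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fit_column, tasks))
    return pd.DataFrame(results, index=df.columns, columns=["slope", "intercept"])

# Illustrative data: two exact lines
x = np.arange(6, dtype=float)
df = pd.DataFrame({"a": 2 * x + 1, "b": -x + 4})
coefs = fit_preallocated(df, x)

# Worker processes should be started from a guarded entry point in a script:
# if __name__ == "__main__":
#     print(fit_parallel(df, x))
```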