
How to iterate over columns of a pandas dataframe

python
dataframe
performance
vectorized-operations
by Nikita Barsukov · Jan 26, 2025
TLDR

To iterate over pandas DataFrame columns, use the df.items() method (df.iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0):

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

for label, content in df.items():
    print(f'{label}:')
    print(content)

Here, each iteration gives you the column label and the column's values as a Series.

Turning knobs: Performance and flexibility with df.items() and df.apply()

Looping with df.items() is fine, but df.apply() can offer better performance and more flexibility, since it avoids an explicit Python-level loop. So, let's turn some knobs:

for label, content in df.items():
    # label and content at your service. Got speed? Let's roll!
    print(f'{label}: sum={content.sum()}')

Or even better, apply a function on each column without explicitly iterating:

result = df.apply(some_cool_function)  # Just applied some function on all columns. It feels like magic!
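As a minimal sketch (some_cool_function above is only a placeholder for any function that accepts a Series), here is apply computing each column's range:

col_range = df.apply(lambda col: col.max() - col.min())  # one number per column, no loop in sight
print(col_range)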

Working with regression? No problem, use df.apply() to get residuals. Or spin a wheel with traditional loops:

import statsmodels.api as sm

# target is your dependent-variable Series, defined elsewhere
residuals = df.apply(lambda col: sm.OLS(target, col).fit().resid)  # Running regressions faster than Usain Bolt!

# Or old school:
for column in df.columns:
    model = sm.OLS(target, df[column])
    results = model.fit()
    df['residual_' + column] = results.resid  # Residuals stored. Who's got time for residuals inspection?

Slicing more your style? Use df.columns:

sub_df = df[df.columns[1:3]]  # Slicing dataframe like a hot knife through butter!

Dislike disorganized column manipulation? enumerate to the rescue!

for i, column in enumerate(df.columns):
    # Oh, I see, you like things organized! OCD much?
    print(i, column)  # e.g. play_with(i, column)

And yes, we refrain from the 'ix' indexer, which was deprecated and then removed in pandas 1.0. We're fans of .loc and .iloc.

df.loc[:, 'A']   # label-based: all rows, column 'A'
df.iloc[0, :]    # position-based: first row, all columns
# 'ix' who? Never heard of him!

Need to treat columns like rows? df.transpose() is all you need.

for row_label, row_value in df.transpose().iterrows():
    # row_label is a column in disguise!
    print(row_label, row_value.tolist())

Of course, add error checks because we are not savages!
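A minimal sketch of that idea, skipping non-numeric columns and guarding the per-column work (the print is just a stand-in for whatever you actually do):

for label, content in df.items():
    if not pd.api.types.is_numeric_dtype(content):
        continue  # skip text or mixed-type columns instead of blowing up
    try:
        print(f'{label}: mean={content.mean():.2f}')
    except Exception as exc:  # in real code, catch the specific exceptions you expect
        print(f'Skipping {label}: {exc}')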

Techniques for the initiated

Working with large dataframes

With large dataframes, watch the memory footprint:

  • Generator expressions can help with memory efficiency.
  • Vectorized operations are your friends (see the sketch below)!
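A minimal sketch of both ideas, using a made-up large frame big_df (the generator expression consumes one column at a time; the vectorized call lets pandas do the looping in C):

import numpy as np
import pandas as pd

big_df = pd.DataFrame(np.random.rand(1_000_000, 4), columns=list('ABCD'))  # stand-in for your large frame

# Generator expression: columns are processed one at a time, nothing extra is materialized
total_of_means = sum(content.mean() for _, content in big_df.items())

# Vectorized: one call, no Python-level loop over columns
column_means = big_df.mean()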

The art of slicing

To improve your slicing game with df.columns:

  • Negative indexing to skip the last column(s)
  • Conditional slicing to filter columns (see the sketch below)
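A minimal sketch of both tricks (the 'residual_' prefix is just an example condition):

all_but_last = df[df.columns[:-1]]  # negative indexing: everything except the last column
residual_cols = df[[col for col in df.columns if col.startswith('residual_')]]  # conditional slicing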

The need for speed in regression

Running regressions? Here's how to win the race:

  • Pre-allocation to store regression results
  • Multiprocessing for parallel computation (sketched below)
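A minimal sketch of both, assuming statsmodels is installed, df holds the regressors, and target is your dependent-variable Series; the parallel variant is left commented because worker functions must be importable (top-level) to be picklable:

import pandas as pd
import statsmodels.api as sm

def fit_column(args):
    """Fit a one-regressor OLS model and return (column name, residuals)."""
    name, regressor, target = args
    return name, sm.OLS(target, regressor).fit().resid

# Pre-allocation: build the result frame once instead of growing df column by column
residuals = pd.DataFrame(index=df.index, columns=df.columns, dtype=float)

for name in df.columns:
    residuals[name] = fit_column((name, df[name], target))[1]

# Multiprocessing variant (sketch):
# from concurrent.futures import ProcessPoolExecutor
# jobs = [(name, df[name], target) for name in df.columns]
# with ProcessPoolExecutor() as pool:
#     for name, resid in pool.map(fit_column, jobs):
#         residuals[name] = resid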