Explain Codes LogoExplain Codes Logo

Multiple aggregations of the same column using pandas GroupBy.agg()

python
pandas
dataframe
aggregation
Alex KataevbyAlex Kataev·Feb 28, 2025
TLDR

To carry out multiple aggregations on a single DataFrame column in pandas, use the GroupBy.agg() function. This is done by giving a dictionary that maps column names to a list of aggregation functions:

# Watch this magic trick... agg_result = df.groupby('key')['value'].agg({ 'total': 'sum', # Presto! Column transformed into its sum 'average': 'mean', # Alakazam! Now it's an average 'peak': 'max' # Abracadabra! And now it's a max })

And just like that, you get a DataFrame with total, average, and peak of the 'value' column for each unique 'key'. Pretty neat, right?

Simplifying the process with pandas >= 0.25

If you're rocking pandas version 0.25 or newer, lean back and enjoy a more legible syntax thanks to named aggregation:

agg_result = df.groupby('key').agg( total=pd.NamedAgg(column='value', aggfunc='sum'), # This line: sum average=pd.NamedAgg(column='value', aggfunc='mean'), # And this one: mean peak=pd.NamedAgg(column='value', aggfunc='max') # MLM: Max Level Magic )

These two code snippets are equal in the eyes of pandas. But to a human interpreting your code, the latter is as clear as a sunny day in the dry season—thanks to the explicit function mapping introduced by NamedAgg.

Custom aggregation: because why not?

Need something a bit more fancy than just 'sum', 'mean', and 'max'? Have a custom function to apply? No problem. Use lambda expressions in the aggregation dictionary:

# Let's make this exciting... agg_result = df.groupby('key')['value'].agg({ 'range': lambda x: x.max() - x.min(), # Boom! Calculated range 'variance': lambda x: x.var() # Pow! There's your variance... })

Here, our lambda functions determined the range and variance for each group, saving us the trouble of defining and explaining a whole new function—clever and efficient!

Compatibility considerations: pandas version matters

With great power comes great responsibility—especially when you're writing code that might run on different versions of pandas. Aggregation syntax has evolved over time, so heed any deprecation warnings that come up. The goal is version-agnostic code that won't fall apart with newer releases.

Visual demonstration

Think of Multiple aggregations on one DataFrame column as a single trip to a meal buffet. One dish (column) with different flavors (aggregations):

+------ Column 'value' ------+ | | | [🥘] Sum => 💰 | | [🍚] Mean => 📈 | | [🥗] Max => 🚀 | | [🍣] Custom => 🧪 | | | +----------------------------+

Here's how to extract those different flavors using pandas:

# Roll up our sleeves... df.groupby('key')['value'].agg([ ('💰', 'sum'), ('📈', 'mean'), ('🚀', 'max'), ('🧪', lambda x: x.median() - x.mean()) # Don't try this at a real buffet! ])

And just like that, your DataFrame column has multiple tastes. Bon appétit!

Retaining original group index

Keep the grouping column from becoming an index by specifying as_index=False:

# More magic coming up... df.groupby('key', as_index=False)['value'].agg( Mean=('value', 'mean'), Range=lambda x: x.max() - x.min() # Here be dragons!... Actually, just a range. )

This gives you a DataFrame where the grouping key remains a column, ensuring the output blends well with remaining data.

Tidying up MultiIndex columns

After named aggregation, the DataFrame may have MultiIndex columns. For a tidy output, flatten them:

# It's cleaning time... result = df.groupby('key').agg( Mean=pd.NamedAgg(column='value', aggfunc='mean'), Sum=pd.NamedAgg(column='value', aggfunc='sum') ) result.columns = ['_'.join(col).strip() for col in result.columns.values] # Squeaky clean!

Coding efficiently with method chaining

Combine different methods chains to write clean and efficient code:

(df .groupby('key', as_index=False) .agg(mean_value=('value', 'mean'), sum_value=('value', 'sum')) # First the mean and sum .assign(ratio=lambda x: x['sum_value'] / x['mean_value']) # Then the ratio... Voila! )

The methods above run sequentially without needing extra variable storage—a recipe for clean and comprehendible code.