Multiple aggregations of the same column using pandas GroupBy.agg()
To carry out multiple aggregations on a single DataFrame column in pandas, use the GroupBy.agg()
function. This is done by giving a dictionary that maps column names to a list of aggregation functions:
And just like that, you get a DataFrame with total, average, and peak of the 'value' column for each unique 'key'. Pretty neat, right?
Simplifying the process with pandas >= 0.25
If you're rocking pandas version 0.25 or newer, lean back and enjoy a more legible syntax thanks to named aggregation:
These two code snippets are equal in the eyes of pandas. But to a human interpreting your code, the latter is as clear as a sunny day in the dry season—thanks to the explicit function mapping introduced by NamedAgg.
Custom aggregation: because why not?
Need something a bit more fancy than just 'sum', 'mean', and 'max'? Have a custom function to apply? No problem. Use lambda expressions in the aggregation dictionary:
Here, our lambda functions determined the range and variance for each group, saving us the trouble of defining and explaining a whole new function—clever and efficient!
Compatibility considerations: pandas version matters
With great power comes great responsibility—especially when you're writing code that might run on different versions of pandas. Aggregation syntax has evolved over time, so heed any deprecation warnings that come up. The goal is version-agnostic code that won't fall apart with newer releases.
Visual demonstration
Think of Multiple aggregations on one DataFrame column as a single trip to a meal buffet. One dish (column) with different flavors (aggregations):
Here's how to extract those different flavors using pandas:
And just like that, your DataFrame column has multiple tastes. Bon appétit!
Retaining original group index
Keep the grouping column from becoming an index by specifying as_index=False
:
This gives you a DataFrame where the grouping key remains a column, ensuring the output blends well with remaining data.
Tidying up MultiIndex columns
After named aggregation, the DataFrame may have MultiIndex columns. For a tidy output, flatten them:
Coding efficiently with method chaining
Combine different methods chains to write clean and efficient code:
The methods above run sequentially without needing extra variable storage—a recipe for clean and comprehendible code.
Was this article helpful?