Getting group-wise statistics (count, mean, etc.) using pandas GroupBy

by Nikita Barsukov · Dec 5, 2024
TLDR

Here's a bit of Python to make your life easier with groupby and agg:

# DataFrame 'df', don't leave home without it
grouped_stats = df.groupby('Category')['Values'].agg(['count', 'mean', 'sum'])

And boom! Just like that, you've got your count, mean, and sum for each 'Category'.
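To see it run end to end, here is a minimal sketch on a tiny made-up DataFrame. The data is hypothetical; only the column names 'Category' and 'Values' come from the snippet above.

```python
import pandas as pd

# Hypothetical sample data, just enough to have two groups
df = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "B"],
    "Values": [10, 20, 1, 2, 3],
})

# One row per category, one column per requested statistic
grouped_stats = df.groupby("Category")["Values"].agg(["count", "mean", "sum"])
print(grouped_stats)
```

The result is indexed by 'Category', with 'count', 'mean', and 'sum' as columns.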

Unpacking the process

Digging up the numbers

If your curiosity is itching for row count per group, .size() will put the cat right out of its misery:

# It's the circle of counts
group_counts = df.groupby(['col1', 'col2']).size().reset_index(name='counts')

But hey, we're all for overachieving here! Let's go all out and get the mean and count at once:

# It's like killing two birds with one stone, but less grim
multi_stats = df.groupby(['col1', 'col2']).agg({'col3': 'mean', 'col4': 'count'})
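If you'd like to poke at both snippets, the sketch below runs them on a small hypothetical DataFrame. The values are invented for illustration; the column names follow the snippets above.

```python
import pandas as pd

# Hypothetical data: two key columns and two value columns
df = pd.DataFrame({
    "col1": ["x", "x", "x", "y"],
    "col2": ["a", "a", "b", "a"],
    "col3": [1.0, 3.0, 5.0, 7.0],
    "col4": [10, 20, 30, 40],
})

# Row count per (col1, col2) pair, flattened into ordinary columns
group_counts = df.groupby(["col1", "col2"]).size().reset_index(name="counts")

# A different aggregate per column: mean of col3, count of col4
multi_stats = df.groupby(["col1", "col2"]).agg({"col3": "mean", "col4": "count"})

print(group_counts)
print(multi_stats)
```

Only the key combinations that actually occur in the data show up, so this toy frame yields three groups rather than four.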

Just remember, null values can be a heartbreaker. count skips NaN while size() counts every row, mean quietly ignores NaN, and rows whose group key is NaN are dropped by default. Be sure to handle those bad boys properly!
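A small sketch of how nulls can shift the numbers, assuming pandas 1.1 or later (for the dropna argument of groupby) and a hypothetical DataFrame with one missing value and one missing group key:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one NaN in the values, one NaN in the group key
df = pd.DataFrame({
    "col1": ["x", "x", "x", np.nan],
    "col3": [1.0, np.nan, 3.0, 9.0],
})

by_key = df.groupby("col1")["col3"]
size = by_key.size()    # counts rows, NaN values included
count = by_key.count()  # counts non-null values only
mean = by_key.mean()    # NaN is skipped, so this averages 1.0 and 3.0

# Rows with a NaN group key vanish unless dropna=False is passed
with_nan_key = df.groupby("col1", dropna=False)["col3"].size()

print(size["x"], count["x"], mean["x"])
print(with_nan_key)
```

Whether to fill, drop, or keep the nulls depends on what the statistic is supposed to mean for your data, so treat this as a diagnostic rather than a recipe.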

Customizing group statistics

Hand-pick your aggregates

Your ticket to aggregation utopia is a ride on the agg train:

# 'col1' is first class, and 'col2' is just along for the ride
stats = df.groupby('col1')['col2'].agg(['mean', 'std', 'var', 'max'])

Going fancy with a reset_index() moves the group key out of the index and back into a regular column, giving you a nice, flat DataFrame:

# Because who likes multi-tiered structures, amirite?
final_stats = stats.reset_index()
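Here is a short sketch of what reset_index() changes, again on hypothetical data. Before the call the group key lives in the index; afterwards it is an ordinary column.

```python
import pandas as pd

# Hypothetical data with two groups of two rows each
df = pd.DataFrame({
    "col1": ["x", "x", "y", "y"],
    "col2": [1.0, 3.0, 10.0, 20.0],
})

stats = df.groupby("col1")["col2"].agg(["mean", "std", "var", "max"])
final_stats = stats.reset_index()

print(stats.index.name)           # col1
print(list(final_stats.columns))  # ['col1', 'mean', 'std', 'var', 'max']
```

The flat version tends to be easier to merge, export, or hand to a plotting library.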

Comprehensive stats, a touch away

When you want to go full Sherlock on your stats, describe is your magnifying glass:

# It's like a database's tell-all biography
descriptive_stats = df.groupby('col1').describe()

But hey, if you only care about the juicy bits, just snag 'em:

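# Because sometimes, less is more
# describe() returns (column, statistic) pairs as columns, so pick on the second level
focused_descriptions = descriptive_stats.loc[:, (slice(None), ['count', 'mean'])]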
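One thing worth checking for yourself: describe() on a grouped DataFrame returns two-level columns of the form (original column, statistic). The sketch below, on hypothetical data, shows one way to pull out just 'count' and 'mean', plus the simpler single-column route.

```python
import pandas as pd

# Hypothetical data with two numeric columns to describe
df = pd.DataFrame({
    "col1": ["x", "x", "y", "y"],
    "col2": [1.0, 3.0, 10.0, 20.0],
    "col3": [5, 7, 9, 11],
})

descriptive_stats = df.groupby("col1").describe()

# Keep every original column, but only the 'count' and 'mean' statistics
focused = descriptive_stats.loc[:, (slice(None), ["count", "mean"])]

# Describing a single column gives flat column names, so plain selection works
col2_stats = df.groupby("col1")["col2"].describe()[["count", "mean"]]

print(focused)
print(col2_stats)
```

If you only ever need a couple of statistics, calling agg with just those names is usually cheaper than computing the full describe() table and trimming it.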

This ecosystem of stats, complete with their visual cousins - charts and graphs - can bring the true face of your data front and center.