Explain Codes LogoExplain Codes Logo

Count unique values per groups with Pandas

python
pandas
dataframe
groupby
Anton ShumikhinbyAnton Shumikhin·Jan 15, 2025
TLDR

Essence demystified: to count unique values within groups of a Pandas DataFrame, make groupby bff with nunique(). Here's a party of them in Python:

# 'df' is your DataFrame, 'Group' your crew, 'Value' the unique opinion of each member. unique_counts = df.groupby('Group')['Value'].nunique()

💡 Pro Tip: If your DataFrame might be having a party of triplets (i.e., duplications), remember cleanup before you count.

Getting efficiency with groupby

Large datasets? No worries! groupby is fearless when teamed up with nunique(). It won't create a blackhole in your memory:

# 'large_df' is more like a galaxy, 'Category' is your particular constellation and 'Item' a unique star. unique_stars = large_df.groupby('Category')['Item'].nunique()

💡 Pro Tip: Yep, it’s like if you run a cosmic vacuum cleaner through your data, picking up only the stars. (Disclaimer: no actual stars are harmed)

Counting unique IDs: groupby vs nunique

Working with email data? Here is how we count unique users by their email domain:

# 'EmailDomain' is where users live and 'UserID' is their unique name. df.groupby('EmailDomain')['UserID'].nunique()

💡 Fun Fact: Pandas speaks strip. Before counting, remind it to take out the white spaces from your domains, like picking out the seeds from apples.

Grouping with style: Introducing agg

To maintain the original column names in the output (and a bit of sanity), party with agg and nunique:

# Salute to 'Value', our unique guest, maintaining its identity while getting counted. df.groupby('Group').agg(unique_count=('Value', 'nunique'))

🃏 Easter-egg: Agg's got style. Use it wisely, and it'll make your DataFrame look haute couture.

Data cleaning: no duplicates allowed

It's a unique count party. Duplicates? No entry!

# Twins or triplets, 'Drop_Duplicates' gives same-face-value entries an exit ticket before the party. df_clean = df.drop_duplicates(subset=['Domain', 'ID']) df_clean.groupby('Domain')['ID'].nunique()

Single-column party: meet value_counts

Life's made simpler and more exciting when the whole party is concentrated around a single column with value_counts():

# 'Domain' is the spotlight. df['Domain'].value_counts()

💡 Pro Tip: Remember, value_counts() is "The Ruler" when it’s not about multiple groupings.

Distinct values: unique or drop_duplicates

Getting the guest list before the party? Here is where unique() and drop_duplicates() enter the stage:

df['Domain'].unique() # or alternatively df['Domain'].drop_duplicates()

🃏 Joke of the Day: "So, we're unique, like everyone else."

Total uniqueness with nunique

To know total number of unique guests at the party, ask nunique():

df['Domain'].nunique()

💡 Pro Tip: Remember, nunique() is more of an introvert. It won't scream the details, only the total unique count.