Count unique values per groups with Pandas

python

pandas

dataframe

groupby

byAnton Shumikhin·Jan 15, 2025

Essence demystified: to count unique values within groups of a Pandas DataFrame, make groupby bff with nunique(). Here's a party of them in Python:

# 'df' is your DataFrame, 'Group' your crew, 'Value' the unique opinion of each member.
unique_counts = df.groupby('Group')['Value'].nunique()

💡 Pro Tip: If your DataFrame might be having a party of triplets (i.e., duplications), remember cleanup before you count.

Getting efficiency with `groupby`

Large datasets? No worries! groupby is fearless when teamed up with nunique(). It won't create a blackhole in your memory:

# 'large_df' is more like a galaxy, 'Category' is your particular constellation and 'Item' a unique star.
unique_stars = large_df.groupby('Category')['Item'].nunique()

💡 Pro Tip: Yep, it’s like if you run a cosmic vacuum cleaner through your data, picking up only the stars. (Disclaimer: no actual stars are harmed)

Counting unique IDs: `groupby` vs `nunique`

Working with email data? Here is how we count unique users by their email domain:

# 'EmailDomain' is where users live and 'UserID' is their unique name.
df.groupby('EmailDomain')['UserID'].nunique()

💡 Fun Fact: Pandas speaks strip. Before counting, remind it to take out the white spaces from your domains, like picking out the seeds from apples.

Grouping with style: Introducing `agg`

To maintain the original column names in the output (and a bit of sanity), party with agg and nunique:

# Salute to 'Value', our unique guest, maintaining its identity while getting counted.
df.groupby('Group').agg(unique_count=('Value', 'nunique'))

🃏 Easter-egg: Agg's got style. Use it wisely, and it'll make your DataFrame look haute couture.

Data cleaning: no duplicates allowed

It's a unique count party. Duplicates? No entry!

# Twins or triplets, 'Drop_Duplicates' gives same-face-value entries an exit ticket before the party.
df_clean = df.drop_duplicates(subset=['Domain', 'ID'])
df_clean.groupby('Domain')['ID'].nunique()

Single-column party: meet `value_counts`

Life's made simpler and more exciting when the whole party is concentrated around a single column with value_counts():

# 'Domain' is the spotlight.
df['Domain'].value_counts()

💡 Pro Tip: Remember, value_counts() is "The Ruler" when it’s not about multiple groupings.

Distinct values: `unique` or `drop_duplicates`

Getting the guest list before the party? Here is where unique() and drop_duplicates() enter the stage:

df['Domain'].unique()
# or alternatively
df['Domain'].drop_duplicates()

🃏 Joke of the Day: "So, we're unique, like everyone else."

Total uniqueness with `nunique`

To know total number of unique guests at the party, ask nunique():

df['Domain'].nunique()

💡 Pro Tip: Remember, nunique() is more of an introvert. It won't scream the details, only the total unique count.

explain-codes / Python / Count unique values per groups with Pandas

Linked

How to group dataframe rows into list in pandas groupby



Pandas 'count(distinct)' equivalent



Pandas dataframe get first row of each group



Show distinct column values in pyspark dataframe



Count cumulative total in PostgreSQL



Converting a Pandas GroupBy output from Series to DataFrame



How to loop over grouped Pandas dataframe?



Getting efficiency with Counting unique IDs: vs Grouping with style: Introducing Data cleaning: no duplicates allowed Single-column party: meet Distinct values: or Total uniqueness with

Linked

How to group dataframe rows into list in pandas groupby



Pandas 'count(distinct)' equivalent



Pandas dataframe get first row of each group



Show distinct column values in pyspark dataframe



Count cumulative total in PostgreSQL



Converting a Pandas GroupBy output from Series to DataFrame



How to loop over grouped Pandas dataframe?



Was this article helpful?

Count unique values per groups with Pandas

Getting efficiency with groupby

Counting unique IDs: groupby vs nunique

Grouping with style: Introducing agg