
Counting unique values in a column in pandas dataframe like in Qlik?

python
dataframe
pandas
data-analysis
by Anton Shumikhin · Oct 21, 2024
TLDR

Accurately quantify distinct elements in a pandas dataframe column using nunique() for the distinct count or value_counts() for the frequency distribution:

import pandas as pd

# Simulating your real-world data
df = pd.DataFrame({'col': ['X', 'Y', 'X', 'Z', 'Y', 'X']})

# Getting our hands dirty to count distinct elements
distinct_count = df['col'].nunique()

# Rolling up our sleeves to get frequency of each unique value
frequency = df['col'].value_counts()

print(distinct_count)  # You'll see: 3
print(frequency)

Output:

col
X    3
Y    2
Z    1
Name: count, dtype: int64
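Heads up: value_counts() quietly drops NaN by default. If missing values deserve their own bucket, pass dropna=False. A quick sketch on a small column with a None thrown in:

s = pd.Series(['X', 'Y', None, 'X'])
print(s.value_counts())              # NaN silently dropped
print(s.value_counts(dropna=False))  # NaN gets its own row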

Tailored techniques

To master the art of data analysis, you'll need to be dexterous with the queries used to calculate different counts. Let's delve into a few handy techniques:

No nulls, no problem!

On occasion, you'll want to exclude null values when counting. That's when count() knocks on the door:

# Null values? Not on my watch!
non_null_count = df['col'].count()

Embracing nulls

But hey, sometimes nulls can't be discarded and should be included in the counting spree. Reach for the size attribute in such cases:

# All aboard, including nulls!
total_count = df['col'].size
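A minimal sketch contrasting the two on a column with one missing value:

import numpy as np

s = pd.Series(['X', 'Y', np.nan])
print(s.count())  # 2, NaN excluded
print(s.size)     # 3, NaN included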

Conditions, conditions, conditions

Boolean indexing is your stealth weapon to land conditional counts:

# Counting X's but excluding imposters (nulls)
x_count = df[df['col'] == 'X']['col'].count()
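Since True is just 1 in disguise, summing the boolean mask gives the same conditional count in one pass (comparisons against NaN come out False, so nulls still stay out):

# Equivalent one-liner on the TLDR dataframe
x_count = (df['col'] == 'X').sum()  # 3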

A toolbox, not a tool

The agg() function is like a potluck dinner: every function can join the feast:

# One tool, multiple uses: story of `agg()`
count_summary = df.agg({'col': ['count', 'nunique', 'size']})
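For the TLDR dataframe, that call lines up one row per aggregation, roughly this:

print(df.agg({'col': ['count', 'nunique', 'size']}))
#          col
# count      6
# nunique    3
# size       6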

Filter 'n' Count

Filter your data with query() before counting to zero in on exactly the rows you care about:

# Counting after filtering using `query()`: because clean data is happy data
filtered_count = df.query('col == "X"')['col'].count()

Counting marathon across the dataframe

To run a counting marathon across multiple columns, flex the nunique() muscle without specifying any particular column:

# Count of unique values across the dataframe: because variety is the spice of data!
all_unique_counts = df.nunique()
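With a second column bolted on (the 'num' column here is made up purely for illustration), nunique() reports one distinct count per column:

df2 = pd.DataFrame({'col': ['X', 'Y', 'X'], 'num': [1, 1, 2]})
print(df2.nunique())
# col    2
# num    2
# dtype: int64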

Count within groupby()

When clubbing similar data points with groupby(), use count() to tally the rows falling under each umbrella:

# Counting group-wise: segregate, count, repeat!
# (assumes your dataframe has a 'value' column to count)
grouped_counts = df.groupby('col')['value'].count()
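And if what you're really after is Qlik's count(distinct ...) per dimension, swap count() for nunique() inside the groupby. A sketch with hypothetical 'group' and 'value' columns:

df3 = pd.DataFrame({'group': ['A', 'A', 'B'], 'value': [1, 1, 2]})
distinct_per_group = df3.groupby('group')['value'].nunique()
# group
# A    1
# B    1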

Remember the golden rule: always check your dataframe for duplicate entries with df.drop_duplicates(), or your counts may come out inflated!

Counting... in style!

Pandas reserves quite a few arrows in its quiver to tackle complex data scenarios. Let's learn to shoot 'em straight!

Stacking for crisp viewing

groupby() can be paired with unstack() to reshape the counts into a crisp cross-table:

# Paint a better canvas with stacked counts
# (assumes hypothetical 'col1' and 'col2' columns; unstack() needs a MultiIndex)
stacked_counts = df.groupby(['col1', 'col2']).size().unstack(fill_value=0)
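pd.crosstab() reaches the same table in a single call. A runnable sketch with two made-up columns:

df4 = pd.DataFrame({'col1': ['A', 'A', 'B'], 'col2': ['X', 'Y', 'X']})
print(pd.crosstab(df4['col1'], df4['col2']))
# col2  X  Y
# col1
# A     1  1
# B     1  0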

Count uniques via pivot_table

If summarizing data and counting unique values are your goals, then pivot_table is the basket to keep all your eggs in:

# Create a pivot and count on the fly!
# (assumes hypothetical 'col1' and 'col2' columns, plus at least one value column)
pivot_count = df.pivot_table(index='col1', columns='col2', aggfunc='nunique')
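A runnable sketch, with a made-up 'val' column playing the role of the thing being counted:

df5 = pd.DataFrame({'col1': ['A', 'A', 'B'],
                    'col2': ['X', 'Y', 'X'],
                    'val':  [1, 2, 1]})
print(df5.pivot_table(index='col1', columns='col2',
                      values='val', aggfunc='nunique'))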

Efficient handling of large data

When dealing with large datasets, use chunksize to navigate without blowing your memory budget:

# Handle large data with `chunksize`. Because size matters!
# This returns an iterator of dataframe chunks, not the counts themselves
chunks = pd.read_csv('large_dataset.csv', chunksize=10000)
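The iterator alone doesn't count anything yet: you aggregate per chunk and merge the results. A sketch assuming large_dataset.csv has the same 'col' column used throughout:

unique_values = set()
for chunk in pd.read_csv('large_dataset.csv', chunksize=10000):
    unique_values.update(chunk['col'].dropna().unique())
print(len(unique_values))  # distinct count, without loading the whole file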