
How do I count the occurrence of a certain item in an ndarray?

by Alex Kataev · Aug 22, 2024
TLDR

Quickly count an item x in a NumPy array arr:

count = (arr == x).sum()

The comparison arr == x builds a Boolean array, and .sum() adds it up with each True counted as 1, so the result is the number of elements equal to x.
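The same comparison works on arrays of any shape. Here is a minimal sketch, using a made-up sample array, that counts x across a whole 2-D array and also per row by passing axis to sum:

```python
import numpy as np

# Hypothetical sample data: a 2-D array containing the value 2 several times
arr = np.array([[1, 2, 2],
                [3, 2, 4]])
x = 2

mask = arr == x              # Boolean array with the same shape as arr
total = int(mask.sum())      # True counts as 1, so this is the overall count
per_row = mask.sum(axis=1)   # number of matches in each row

print(total)             # 3
print(per_row.tolist())  # [2, 1]
```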

Beyond the basics: alternative counting methods

Comparing and summing is the simplest way to count a single value, but other techniques fit specific scenarios better. Let's explore them:

Count occurrences and get unique items with numpy.unique

Generate an inventory of unique items with corresponding counts:

import numpy as np
unique, counts = np.unique(arr, return_counts=True)
occurrences = dict(zip(unique, counts))

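numpy.unique returns the sorted unique items together with their counts, which is handy when you need both the distinct elements and their frequencies. Wrapping the two arrays in a Python dict makes looking up the count for any item straightforward. Note that np.unique flattens multidimensional input unless you pass an axis.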
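A small sketch with illustrative data shows the lookup in practice. The .tolist() calls are an optional choice to get plain Python ints as keys and values, and dict.get supplies 0 for items that never appear:

```python
import numpy as np

# Sample data (illustrative only)
arr = np.array([3, 1, 3, 2, 3, 1])

unique, counts = np.unique(arr, return_counts=True)
# unique is sorted ([1, 2, 3]) and counts lines up with it ([2, 1, 3])
occurrences = dict(zip(unique.tolist(), counts.tolist()))

print(occurrences)            # {1: 2, 2: 1, 3: 3}
print(occurrences.get(5, 0))  # 0, because 5 never occurs
```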

Count only non-negative integers using numpy.bincount

For a 1-D array of non-negative integers, use:

counts = np.bincount(arr)

np.bincount is an efficient way to count every non-negative integer at once: counts[i] is the number of times i appears in arr, so the count of x is counts[x], as long as x does not exceed arr.max().
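Because the result only has arr.max() + 1 bins, looking up a larger value would raise an IndexError. This sketch guards against that with a hypothetical helper, count_value, and also shows the real minlength parameter, which pads the output with zero bins:

```python
import numpy as np

# Sample non-negative integer data (illustrative only)
arr = np.array([0, 1, 1, 3, 1, 0])

counts = np.bincount(arr)
print(counts.tolist())  # [2, 3, 0, 1]; counts[i] is how often i appears

def count_value(counts, x):
    """Hypothetical helper: count of x, treating values past the last bin as 0."""
    return int(counts[x]) if 0 <= x < len(counts) else 0

print(count_value(counts, 1))  # 3
print(count_value(counts, 7))  # 0

# minlength guarantees at least 6 bins, so labels 0..5 all get a count
padded = np.bincount(arr, minlength=6)
print(padded.tolist())  # [2, 3, 0, 1, 0, 0]
```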

Use a non-NumPy method with collections.Counter

A worthwhile Python standard library alternative is collections.Counter:

from collections import Counter
counter = Counter(arr.ravel())  # you shall not pass without a counter

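Counter builds a single dictionary-like object that maps each unique item in the array to its count, which makes keeping a tally easy. Because it iterates over the elements in pure Python, it is generally slower than the NumPy approaches on large arrays.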
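A short sketch with an illustrative 2-D array: ravel() flattens it so Counter sees individual elements, missing keys return 0 instead of raising KeyError, and most_common ranks the items. Converting with .tolist() first is an optional choice that gives plain Python ints as keys:

```python
from collections import Counter

import numpy as np

# Illustrative 2-D array
arr = np.array([[1, 2, 2],
                [3, 2, 1]])

counter = Counter(arr.ravel().tolist())  # flatten, then tally each element

print(counter[2])              # 3
print(counter[9])              # 0, missing keys default to zero
print(counter.most_common(1))  # [(2, 3)]
```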

Visualization

Let's explore this concept with a hypothetical ndarray.

import numpy as np
# Array representation of your items
items = np.array(['🌹', '🌻', '🌹', '🌼', '🌹', '🌻', '🌼'])
# Counting the roses
roses_count = np.sum(items == '🌹')  # stop and count the roses

Your counting result:

Item type | Occurrences
🌹 | 3
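To tally every flower rather than just the roses, np.unique with return_counts=True works on string arrays too. A small sketch reusing the same items array; np.unique sorts the strings by code point, so the roses come first:

```python
import numpy as np

items = np.array(['🌹', '🌻', '🌹', '🌼', '🌹', '🌻', '🌼'])

flowers, counts = np.unique(items, return_counts=True)
# Build a plain dict mapping each flower to how often it appears
tally = {str(flower): int(n) for flower, n in zip(flowers, counts)}

print(tally)  # {'🌹': 3, '🌻': 2, '🌼': 2}
```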

Tactical counting techniques

There are cases where you need to count items satisfying specific conditions. Here's how to leverage NumPy's capabilities:

Count occurrences with multiple conditions

Count items that fit multiple criteria using:

count = np.sum((arr > x) & (arr < y)) # Goldilocks counting: not too big, not too small

This code returns the count of elements in arr that are both greater than x and less than y.
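The element-wise operators & (and), | (or) and ~ (not) combine Boolean masks, and each comparison needs its own parentheses because & and | bind more tightly than > and <. A sketch with made-up data and bounds:

```python
import numpy as np

# Illustrative data and bounds
arr = np.array([1, 4, 6, 9, 12, 15])
x, y = 5, 12

inside = np.sum((arr > x) & (arr < y))         # 6 and 9
outside = np.sum((arr < x) | (arr > y))        # 1, 4 and 15
not_inside = np.sum(~((arr > x) & (arr < y)))  # everything except 6 and 9

print(inside, outside, not_inside)  # 2 3 4
```

Note that 12 is counted by neither inside nor outside, since both comparisons are strict.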

Efficiently count non-zero elements with numpy.count_nonzero

If you want to count the non-zero elements (for example, the ones in a 0/1 array or the True values in a Boolean mask):

count = np.count_nonzero(arr) # zero is not a hero today

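np.count_nonzero is a dedicated function for this job: it returns the number of non-zero (or True) entries, and its optional axis argument gives per-row or per-column counts.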
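A brief sketch on an illustrative 2-D array, covering the total, a per-column count via axis=0, and counting one specific value by passing a Boolean mask:

```python
import numpy as np

# Illustrative 2-D array
arr = np.array([[0, 3, 0],
                [5, 0, 5]])

total = np.count_nonzero(arr)               # 3 non-zero entries overall
per_column = np.count_nonzero(arr, axis=0)  # one non-zero entry in each column
fives = np.count_nonzero(arr == 5)          # True entries in the mask, i.e. the 5s

print(total)                # 3
print(per_column.tolist())  # [1, 1, 1]
print(fives)                # 2
```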

Powering through with NumPy

NumPy's counting results can also drive further processing:

Update array elements during counting

Sometimes, you may want to update array elements based on their frequency:

unique, counts = np.unique(arr, return_counts=True)
arr[np.isin(arr, unique[counts < threshold])] = new_value  # threshold: the 'You shall not pass!' of numbers

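Here np.unique finds the values whose counts fall below threshold, np.isin marks every element of arr that holds one of those rare values, and the assignment replaces them with new_value in place. Both threshold and new_value are placeholders you define yourself.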
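A runnable sketch with concrete, purely illustrative choices: threshold = 2 treats values seen only once as rare, and new_value = -1 is a hypothetical marker. For an integer array, new_value has to fit the array's dtype:

```python
import numpy as np

# Illustrative data; threshold and new_value are example choices
arr = np.array([1, 1, 1, 2, 3, 3, 4])
threshold = 2   # values seen fewer than 2 times count as rare
new_value = -1  # hypothetical marker for rare values

unique, counts = np.unique(arr, return_counts=True)
rare = unique[counts < threshold]    # values whose total count is below threshold
arr[np.isin(arr, rare)] = new_value  # overwrite every occurrence of a rare value

print(rare.tolist())  # [2, 4]
print(arr.tolist())   # [1, 1, 1, -1, 3, 3, -1]
```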

Count occurrences using NumPy in tandem with Pandas

If Pandas is already part of your workflow, it can do the counting too:

import pandas as pd
# Turning the array into a Pandas series
series = pd.Series(arr.ravel())  # Ravel: not just for classical composers
# Using Pandas to count occurrences
counts = series.value_counts()  # who's counting? Pandas!

Series.value_counts returns each unique value with its count, sorted from most to least frequent. It is handy when you are already working with tabular data, and it shows how Pandas complements NumPy's computational tools.
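A sketch with an illustrative array, assuming Pandas is installed. It looks up a single value with Series.get (which falls back to a default for values that never occur) and uses normalize=True to get relative frequencies instead of raw counts:

```python
import numpy as np
import pandas as pd

# Illustrative data
arr = np.array([[1, 2, 2],
                [3, 2, 1]])
series = pd.Series(arr.ravel())

counts = series.value_counts()  # most frequent value first
print(counts.index[0])          # 2
print(int(counts.get(2, 0)))    # 3
print(int(counts.get(9, 0)))    # 0, since 9 never occurs

shares = series.value_counts(normalize=True)  # fractions that sum to 1
print(round(float(shares[2]), 3))             # 0.5
```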