
How do I check if there are duplicates in a flat list?

by Nikita Barsukov · Jan 26, 2025
TLDR

Here's a quick way to find if there are duplicates in your list, aptly named my_list:

has_duplicates = len(my_list) > len(set(my_list)) # As efficient as a 1-liner gets; elegance in simplicity.

has_duplicates tells you if my_list has any repeat offenders. It'll hold True if it finds them lurking there.
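
For instance, a quick sanity check (the sample values here are made up):

my_list = [1, 2, 3, 3]  # a repeat offender in the ranks
print(len(my_list) > len(set(my_list)))      # True: the set collapses the two 3s
print(len([1, 2, 3]) > len(set([1, 2, 3])))  # False: all unique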

The 'unhashable' headache

The set() trick casts a fine net, but it comes with a catch: it only wrangles elements that are hashable. Slippery troublemakers like lists or dictionaries just won't settle into a set; Python greets them with a TypeError instead.
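
See the tantrum for yourself (a purely illustrative one-liner):

set([[1, 2], [3, 4]])  # TypeError: unhashable type: 'list'

Here's how you square off with those troublemakers: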

def contains_duplicates(seq):  # Here be dragons...
    seen = []
    return any(i in seen or seen.append(i) for i in seq)  # Show yourself, duplicates!

It's not as slick (a quadratic tortoise next to the set-based hare), but hey, it gets the job done for unhashable types.
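
A quick spin with unhashable suspects (sample data is invented):

contains_duplicates([[1, 2], [3, 4], [1, 2]])  # True: the list [1, 2] repeats
contains_duplicates([{'a': 1}, {'b': 2}])      # False: all distinct dicts

Membership tests with in fall back on equality, so even lists and dicts get a fair lineup.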

Quick draw with short-circuiting

Performance matters. A lot. For behemoth lists, short-circuiting saves the day. The any() approach above short-circuits; in plainer terms, it hits the brakes as soon as a duplicate dares to show its face.
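
A back-of-the-napkin demonstration (the sizes here are hypothetical): plant the duplicates near the front, and the scan bails almost immediately.

early_dupes = [7, 7] + list(range(1_000_000))
contains_duplicates(early_dupes)  # True after peeking at just two elements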

Another swift-and-smooth approach wields Dennis Otkidach's all_distinct function. It's known to be as swift as a gazelle:

def all_distinct(iterable):
    seen = set()
    return not any(i in seen or seen.add(i) for i in iterable)  # Look ma, no duplicates!

Just call not all_distinct(my_list) if you're on the hunt for duplicates.
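
For example, with an invented list:

my_list = [3, 1, 4, 1, 5]    # the 1 moonlights twice
not all_distinct(my_list)    # True: duplicates found
not all_distinct([3, 1, 4])  # False: everyone's unique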

Handling Godzilla-sized lists

For lists that weigh as much as Godzilla, memory usage and time complexity turn into an all-out war. Approaches that hoard every element in interim copies do you no good.

Meet the unassuming alternative: the functional programming way using functools.reduce:

from functools import reduce

def reducer(seen, element):
    # See all, know all.
    if element in seen:
        raise ValueError("Duplicate found")  # HALT! Who goes there?
    seen.add(element)
    return seen

try:
    reduce(reducer, my_list, set())  # An empty set to start with. Nothing like a clean slate, eh?
except ValueError:
    has_duplicates = True
else:
    has_duplicates = False

This one still has O(N) complexity, short-circuits, and doesn’t create an army of interim lists.
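
Because reduce consumes its input lazily, the same reducer swallows generators too. A sketch with a hypothetical stream that's guaranteed to repeat:

stream = (n % 1000 for n in range(10_000))  # values start repeating once n hits 1000
try:
    reduce(reducer, stream, set())
    has_duplicates = False
except ValueError:
    has_duplicates = True  # raised at the 1001st element, long before 10,000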

Playing detective with collections.Counter

Sometimes, just flagging duplicates isn’t enough. You seek the full scoop. Thankfully, Python’s got your back with collections.Counter. It hands over a dossier with precise counts:

from collections import Counter

item_count = Counter(my_list)
duplicates = {item: count for item, count in item_count.items() if count > 1}

duplicates now reads like a novel about each duplicate and its carbon copies in my_list.
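
And if you only care about the worst offenders, Counter.most_common sorts the dossier by count (reusing item_count from above; the output shown is illustrative):

item_count.most_common(3)  # e.g. [('a', 4), ('b', 2), ('c', 1)]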

When pandas bring their toolkit

If my_list is data you're kneading, chances are you're in pandas' sandbox. Yes, pandas comes with its constant-factor overhead, but its toolkit is hard to ignore:

import pandas as pd

df = pd.DataFrame(my_list, columns=['values'])
has_duplicates = df.duplicated().any()

A quick glance and you know whether any duplicate rows are hiding in the DataFrame.
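
Want the duplicate rows themselves, not just a verdict? Passing keep=False to duplicated flags every row that has a twin:

dupe_rows = df[df.duplicated(keep=False)]  # all rows involved in a duplication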

Staying in the safe lane with doctests

Make sure your function's white lies don't go unnoticed. Include doctests: covert tests disguised as docstrings. They run like soldiers on a mission, ensuring the code behaves:

def all_distinct(iterable):
    """
    >>> all_distinct([1, 2, 3])  # No duplicate? No problem!
    True
    >>> all_distinct([1, 2, 2])  # Caught ya, duplicate!
    False
    """
    seen = set()
    return not any(i in seen or seen.add(i) for i in iterable)

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Simply hit ‘run’ to cover both your back and the function's!