
How do I check if there are duplicates in a flat list?

by Nikita Barsukov · Jan 26, 2025
TLDR

Here's a quick way to find if there are duplicates in your list, aptly named my_list:

has_duplicates = len(my_list) > len(set(my_list)) # As efficient as a 1-liner gets; elegance in simplicity.

has_duplicates tells you if my_list has any repeat offenders. It'll hold True if it finds them lurking there.
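
For instance, a quick sanity check (the sample values here are made up):

my_list = [1, 2, 3, 3]  # a repeat offender in the ranks
print(len(my_list) > len(set(my_list)))      # True: the set collapses the two 3s
print(len([1, 2, 3]) > len(set([1, 2, 3])))  # False: all unique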

The 'unhashable' headache

The set() trick casts a fine net, but it comes with a catch: it only wrangles elements that are hashable. Slippery troublemakers like lists or dictionaries just won't settle into a set; Python greets them with a TypeError instead.
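
See the tantrum for yourself (a purely illustrative one-liner):

set([[1, 2], [3, 4]])  # TypeError: unhashable type: 'list'

Here's how you square off with those troublemakers: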

def contains_duplicates(seq):  # Here be dragons...
    seen = []
    return any(i in seen or seen.append(i) for i in seq)  # Show yourself, duplicates!

It's not as slick (a quadratic tortoise next to the set-based hare), but hey, it gets the job done for unhashable types.
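
A quick spin with unhashable suspects (sample data is invented):

contains_duplicates([[1, 2], [3, 4], [1, 2]])  # True: the list [1, 2] repeats
contains_duplicates([{'a': 1}, {'b': 2}])      # False: all distinct dicts

Membership tests with in fall back on equality, so even lists and dicts get a fair lineup.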

Quick draw with short-circuiting

Performance matters. A lot. For behemoth lists, short-circuiting saves the day. The any() approach above short-circuits; in plainer terms, it hits the brakes as soon as a duplicate dares to show its face.
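
A back-of-the-napkin demonstration (the sizes here are hypothetical): plant the duplicates near the front, and the scan bails almost immediately.

early_dupes = [7, 7] + list(range(1_000_000))
contains_duplicates(early_dupes)  # True after peeking at just two elements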

Another swift-and-smooth approach wields Dennis Otkidach's all_distinct function. It's known to be as swift as a gazelle:

def all_distinct(iterable):
    seen = set()
    return not any(i in seen or seen.add(i) for i in iterable)  # Look ma, no duplicates!

Just call not all_distinct(my_list) if you're on the hunt for duplicates.
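
For example, with an invented list:

my_list = [3, 1, 4, 1, 5]    # the 1 moonlights twice
not all_distinct(my_list)    # True: duplicates found
not all_distinct([3, 1, 4])  # False: everyone's unique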

Handling Godzilla-sized lists

For lists that weigh as much as Godzilla, memory usage and time complexity turn into an all-out war. Approaches that hoard every element in interim copies do you no good.

Meet the unassuming alternative: the functional programming way using functools.reduce:

from functools import reduce

def reducer(seen, element):
    # See all, know all.
    if element in seen:
        raise ValueError("Duplicate found")  # HALT! Who goes there?
    seen.add(element)
    return seen

try:
    reduce(reducer, my_list, set())  # An empty set to start with. Nothing like a clean slate, eh?
except ValueError:
    has_duplicates = True
else:
    has_duplicates = False

This one still has O(N) complexity, short-circuits, and doesn’t create an army of interim lists.
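
Because reduce consumes its input lazily, the same reducer swallows generators too. A sketch with a hypothetical stream that's guaranteed to repeat:

stream = (n % 1000 for n in range(10_000))  # values start repeating once n hits 1000
try:
    reduce(reducer, stream, set())
    has_duplicates = False
except ValueError:
    has_duplicates = True  # raised at the 1001st element, long before 10,000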

Playing detective with collections.Counter

Sometimes, just flagging duplicates isn’t enough. You seek the full scoop. Thankfully, Python’s got your back with collections.Counter. It hands over a dossier with precise counts:

from collections import Counter

item_count = Counter(my_list)
duplicates = {item: count for item, count in item_count.items() if count > 1}

duplicates now reads like a novel about each duplicate and its carbon copies in my_list.
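
And if you only care about the worst offenders, Counter.most_common sorts the dossier by count (reusing item_count from above; the output shown is illustrative):

item_count.most_common(3)  # e.g. [('a', 4), ('b', 2), ('c', 1)]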

When pandas bring their toolkit

If my_list is data you're kneading, chances are you're in pandas' sandbox. Yes, pandas comes with its constant-factor overhead, but its toolkit is hard to ignore:

import pandas as pd

df = pd.DataFrame(my_list, columns=['values'])
has_duplicates = df.duplicated().any()

A quick glance and you know whether any duplicate rows are hiding in the DataFrame.
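
Want the duplicate rows themselves, not just a verdict? Passing keep=False to duplicated flags every row that has a twin:

dupe_rows = df[df.duplicated(keep=False)]  # all rows involved in a duplication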

Staying in the safe lane with doctests

Make sure your function's white lies don't go unnoticed. Include doctests: covert tests disguised as docstrings. They run like soldiers on a mission, ensuring the code behaves:

def all_distinct(iterable):
    """
    >>> all_distinct([1, 2, 3])  # No duplicate? No problem!
    True
    >>> all_distinct([1, 2, 2])  # Caught ya, duplicate!
    False
    """
    seen = set()
    return not any(i in seen or seen.add(i) for i in iterable)

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Simply hit ‘run’ to cover both your back and the function's!