
How do I find the duplicates in a list and create another list with them?

python
prompt-engineering
best-practices
dataframe
by Alex Kataev · Feb 12, 2025
⚡ TLDR

Use list comprehension and Counter from the collections module to quickly find duplicates in a Python list. Here's how you can generate a list of items that appear more than once:

from collections import Counter

# Original list potentially packed with duplicates
original_list = ['🍎', '🍏', '🍊', '🍏', '🍎']

# An apple a day keeps the doctor away, but 2 apples a day gives us duplicates!
duplicates = [item for item, count in Counter(original_list).items() if count > 1]

print(duplicates)  # Output: ['🍎', '🍏']

Now the duplicates list stores the repeated items.
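If you also want to know how many times each duplicate shows up, Counter hands you that for free — a small extension of the snippet above:

from collections import Counter

original_list = ['🍎', '🍏', '🍊', '🍏', '🍎']
counts = Counter(original_list)

# Map each repeated item to its count
duplicate_counts = {item: count for item, count in counts.items() if count > 1}
print(duplicate_counts)  # {'🍎': 2, '🍏': 2}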

Getting the job done efficiently

For large datasets, efficiency is a big deal. collections.Counter cuts to the chase by operating in O(n) time, unlike the tortoise-like .count() method, which rescans the list for every element. When dealing with non-hashable items like lists or dictionaries, Counter is off the table, so we fall back to a step-by-step (and, alas, quadratic) custom solution:

# A list of dictionaries that may include duplicates
original_list = [{'apple': 1}, {'banana': 2}, {'apple': 1}]

seen = []
duplicates = []

# Each dictionary we've seen before is a duplicate
for dict_item in original_list:
    if dict_item in seen:
        duplicates.append(dict_item)
    else:
        seen.append(dict_item)
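And for the record, the tortoise-like .count() route mentioned above looks roughly like this — every element triggers a full rescan of the list, which is exactly what earns it the O(n²) label (shown for contrast, not as a recommendation):

# Quadratic sketch: fine for tiny lists, painful for big ones
original_list = ['🍎', '🍏', '🍊', '🍏', '🍎']
duplicates = list({item for item in original_list if original_list.count(item) > 1})
print(duplicates)  # e.g. ['🍎', '🍏'] -- set order isn't guaranteed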

Keeping it scalable

It's important to factor in the nature of your data. Ordered data? Generator-based methods could be your best bet:

# duplicates_everseen / unique_everseen ship with recent versions of more_itertools
from more_itertools import duplicates_everseen, unique_everseen

# Identifying duplicates while retaining order
duplicates = list(unique_everseen(duplicates_everseen(original_list)))

An efficient and readable way to handle large lists is to use a set, avoiding list.count() inside a loop, which would drag you into O(n²) complexity. Here's how you do it:

# Back to hashable items (the fruit list from the TLDR)
original_list = ['🍎', '🍏', '🍊', '🍏', '🍎']

unique_items = set()
duplicates = []

# A set has no duplicates, so it can't take a joke
for item in original_list:
    if item in unique_items:
        duplicates.append(item)
    else:
        unique_items.add(item)

This method passes through the list just once, checking each item against the set of items already seen.
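One caveat: if an item appears three or more times, the loop above appends it once per repeat. A quick extra pass collapses those repeats — a minimal sketch, assuming you only want each duplicate reported once:

original_list = ['🍎', '🍏', '🍎', '🍎']  # hypothetical triple-apple list

unique_items = set()
duplicates = []
for item in original_list:
    if item in unique_items:
        duplicates.append(item)  # '🍎' lands here twice
    else:
        unique_items.add(item)

print(duplicates)             # ['🍎', '🍎']
print(list(set(duplicates)))  # ['🍎'] -- each duplicate reported once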

Additional Pythonic solutions

The Counter method is handy, sure, but why stop there?

Scoring with the champion chain

Score some quick wins by chaining a couple of generator utilities from the third-party iteration_utilities package:

from iteration_utilities import duplicates, unique_everseen

# Chaining "duplicates" and "unique_everseen" for that Formula 1 pitstop-level efficiency
unique_duplicates = list(unique_everseen(duplicates(original_list)))

Guard against non-hashables

Dealing with non-hashable elements like dictionaries? Nothing a good loop strategy can’t fix:

seen = []
duplicates = []

# Peek-a-boo with each non-hashable dictionary
for dict_item in original_list:
    if dict_item not in seen:
        seen.append(dict_item)
    else:
        duplicates.append(dict_item)
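If the dictionaries hold only hashable values, you can claw back O(n) time by reducing each one to a hashable fingerprint first — a sketch, assuming tuple(sorted(d.items())) works as a key for your data:

original_list = [{'apple': 1}, {'banana': 2}, {'apple': 1}]

seen_keys = set()
duplicates = []

for dict_item in original_list:
    key = tuple(sorted(dict_item.items()))  # hashable stand-in for the dict
    if key in seen_keys:
        duplicates.append(dict_item)
    else:
        seen_keys.add(key)

print(duplicates)  # [{'apple': 1}]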

Get empowered with Pandas

Because who doesn't love Pandas? This master tool is the go-to option for dealing with varied data types and intricate data structures. It offers robust duplicate handling with tons of added features.

import pandas as pd

# Back to the fruit list from the TLDR
original_list = ['🍎', '🍏', '🍊', '🍏', '🍎']

# Original list on a pandas-powered diet
series = pd.Series(original_list)

# Et voilà! Spot the duplicates like a pro
duplicates_series = series[series.duplicated()]
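Two handy variations on the same call, both standard pandas behaviour — duplicated(keep=False) flags every occurrence of a repeated value, and .unique() lists each duplicated value just once:

# Flag every occurrence of a duplicated value, not just the later repeats
all_occurrences = series[series.duplicated(keep=False)]

# Each duplicated value, reported once
unique_duplicates = series[series.duplicated()].unique().tolist()
print(unique_duplicates)  # ['🍏', '🍎'] -- order follows each value's second appearance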