How do I find the duplicates in a list and create another list with them?
Use a list comprehension and Counter from the collections module to quickly find duplicates in a Python list. Here's how you can generate a list of items that appear more than once:
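A minimal sketch of that approach (the sample data and the duplicates variable name are illustrative):

```python
from collections import Counter

items = ['a', 'b', 'a', 'c', 'b', 'a']

# Count every occurrence in one pass
counts = Counter(items)

# Keep each value that shows up more than once
duplicates = [item for item, count in counts.items() if count > 1]
print(duplicates)  # ['a', 'b']
```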
Now the duplicates list stores the repeated items.
Getting the job done efficiently
For large datasets, efficiency is a big deal. collections.Counter cuts to the chase by operating in O(n) time, unlike the tortoise-like .count() method, which rescans the list for every element. When dealing with non-hashable items like lists or dictionaries, we'll need a step-by-step (quadratic) custom solution:
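One way to write that quadratic fallback (function name and sample data are illustrative) is to compare each element against everything before it with ==, which works even when the elements can't be hashed:

```python
def find_duplicates_unhashable(items):
    """O(n^2) duplicate finder that works for unhashable items (lists, dicts)."""
    duplicates = []
    for index, item in enumerate(items):
        # item is a duplicate if it appeared earlier and isn't recorded yet
        if item in items[:index] and item not in duplicates:
            duplicates.append(item)
    return duplicates

data = [[1, 2], [3, 4], [1, 2], {'a': 1}, {'a': 1}]
print(find_duplicates_unhashable(data))  # [[1, 2], {'a': 1}]
```

The in operator falls back to equality comparisons here, so no hashing is ever required.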
Keeping it scalable
It's important to factor in the nature of your data. Ordered data? Generator-based methods could be your best bet:
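For input that is already sorted, one generator-based sketch (assuming sorted input; the function name is illustrative) groups equal neighbors with itertools.groupby and lazily yields any value whose group has more than one member:

```python
from itertools import groupby

def iter_duplicates(sorted_items):
    """Lazily yield each value that appears more than once in sorted input."""
    for value, group in groupby(sorted_items):
        next(group)                        # consume the first occurrence
        if next(group, None) is not None:  # a second occurrence exists
            yield value

print(list(iter_duplicates([1, 1, 2, 3, 3, 3, 4])))  # [1, 3]
```

Because it never materializes counts for the whole dataset, this streams nicely over large sorted files or database cursors.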
An efficient and readable way to handle large lists is to use sets, avoiding list.count() in a loop, which leads to O(n²) complexity. Here's how you do it:
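A sketch of the set-based single pass (function name and sample data are illustrative):

```python
def find_duplicates(items):
    """Single-pass duplicate finder using O(1) set membership tests."""
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:          # already met once -> it's a duplicate
            duplicates.add(item)
        else:
            seen.add(item)
    return list(duplicates)

print(sorted(find_duplicates([1, 2, 3, 2, 1, 5, 6, 5, 5])))  # [1, 2, 5]
```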
This method passes through the list just once, checking each item against a set of values it has already seen.
Additional Pythonic solutions
The Counter method is handy, sure, but why stop there?
Scoring with the champion chain
Score some quick wins with chain for speedy operations:
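One way to put itertools.chain to work here (an illustrative sketch: Counter does the counting, chain flattens the extra occurrences into a single list):

```python
from collections import Counter
from itertools import chain

items = ['x', 'y', 'x', 'z', 'x', 'y']
counts = Counter(items)

# Chain together one entry per extra occurrence of each duplicated value
duplicates = list(chain.from_iterable(
    [item] * (count - 1) for item, count in counts.items() if count > 1
))
print(duplicates)  # ['x', 'x', 'y']
```

Unlike the earlier recipes, this preserves how many times each value was repeated, which can matter for reporting.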
Guard against non-hashables
Dealing with non-hashable elements like dictionaries? Nothing a good loop strategy can't fix:
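A sketch of that loop strategy for dictionaries (function name and records are illustrative): track already-seen dicts in a plain list, since they can't go into a set, and rely on == comparisons:

```python
def dict_duplicates(records):
    """Collect dictionaries that appear more than once, via == comparisons."""
    seen = []        # a list, not a set, because dicts are unhashable
    duplicates = []
    for record in records:
        if record in seen:
            if record not in duplicates:
                duplicates.append(record)
        else:
            seen.append(record)
    return duplicates

records = [{'id': 1}, {'id': 2}, {'id': 1}]
print(dict_duplicates(records))  # [{'id': 1}]
```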
Get empowered with Pandas
Because who doesn't love Pandas? This master tool is the go-to option for dealing with varied data types and intricate data structures. It offers robust duplicate handling with tons of added features.
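A quick sketch using pandas (the sample Series is illustrative): Series.duplicated() marks every occurrence after the first, so filtering by that mask and deduplicating yields the repeated values:

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 3, 3])

# duplicated() flags the 2nd, 3rd, ... occurrence of each value
dupes = s[s.duplicated()].unique().tolist()
print(dupes)  # [2, 3]
```

The same duplicated() / drop_duplicates() machinery works on whole DataFrames, including subsets of columns via the subset parameter.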