How do I check if there are duplicates in a flat list?
Here's a quick way to find if there are duplicates in your list, aptly named my_list:
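A minimal sketch of that check, assuming every element of my_list is hashable. The sample values are illustrative only:

```python
# Sample data; the values are illustrative only.
my_list = [1, 2, 3, 2, 5]

# A set keeps one copy of each element, so a length mismatch means repeats.
has_duplicates = len(my_list) != len(set(my_list))
print(has_duplicates)  # True
```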
has_duplicates tells you if my_list has any repeat offenders. It'll hold True if it finds them lurking there.
The 'unhashable' headache
The set() approach casts a fine net, but it comes with a catch: it works only with elements that are hashable. Hand it anything else and it raises a TypeError. The usual troublemakers are lists and dictionaries, which just won’t settle into a set. Here's how you square off with them:
It's more tortoise than hare, because without hashing every element has to be compared against the others (roughly O(N²) work), but hey, it gets the job done for unhashable types.
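One possible sketch, assuming the elements support == comparison. It relies on any() with list.count, which needs equality only and never hashes anything; the sample data is made up for illustration:

```python
# Lists cannot be placed in a set, so fall back to equality comparisons.
my_list = [[1, 2], [3, 4], [1, 2]]

# count() compares with ==, and any() stops at the first element seen twice.
has_duplicates = any(my_list.count(item) > 1 for item in my_list)
print(has_duplicates)  # True
```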
Quick draw with short-circuiting
Performance matters. A lot. For behemoth lists, short-circuiting saves the day. The any() approach above short-circuits, or to put it in plainer terms, it hits the brakes as soon as a duplicate dares to show its face.
Another swift-and-smooth approach wields Dennis Otkidach's all_distinct function. It's known to be as swift as a gazelle:
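The original function is not reproduced here, so what follows is only a sketch of how a function with that name might look, assuming hashable elements. The body is an assumption, not the author's code:

```python
def all_distinct(iterable):
    """Return True if no element repeats (elements must be hashable)."""
    seen = set()
    for item in iterable:
        if item in seen:
            return False  # stop at the first repeat
        seen.add(item)
    return True


print(all_distinct([1, 2, 3]))  # True
print(all_distinct([1, 2, 1]))  # False
```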
Just call not all_distinct(my_list) if you're on the hunt for duplicates.
Handling Godzilla-sized lists
For lists that weigh as much as Godzilla, memory usage and time complexity turn into an all-out war. Approaches that build extra copies of the whole list do no good.
Meet the unassuming alternative: the functional programming way using functools.reduce:
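A hedged sketch of one way to do this. functools.reduce has no built-in early exit, so this version leaves the fold by raising an exception; the names _DuplicateFound and _remember are hypothetical helpers invented for the example. It still keeps a set of the elements seen so far, and it assumes they are hashable:

```python
from functools import reduce


class _DuplicateFound(Exception):
    """Hypothetical private signal used to leave reduce() early."""


def _remember(seen, item):
    # Reducer step: raise on a repeat, otherwise record the item and carry on.
    if item in seen:
        raise _DuplicateFound
    seen.add(item)
    return seen


def has_duplicates(iterable):
    """Return True as soon as a repeated hashable element turns up."""
    try:
        reduce(_remember, iterable, set())
    except _DuplicateFound:
        return True
    return False


print(has_duplicates([1, 2, 3, 2]))  # True
print(has_duplicates(range(1000)))   # False
```

Because the input is consumed one element at a time, it works on generators as well as lists.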
This one still has O(N) complexity, short-circuits, and doesn’t create an army of interim lists.
Playing detective with collections.Counter
Sometimes, just flagging duplicates isn’t enough. You seek the full scoop. Thankfully, Python’s got your back with collections.Counter. It hands over a dossier with precise counts:
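A short sketch, assuming hashable elements. The sample data is illustrative, and the intermediate name counts is introduced only for this example:

```python
from collections import Counter

my_list = ["a", "b", "a", "c", "b", "a"]

# Count every element, then keep only those seen more than once.
counts = Counter(my_list)
duplicates = {item: count for item, count in counts.items() if count > 1}
print(duplicates)  # {'a': 3, 'b': 2}
```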
duplicates now maps each repeated item to the number of times it shows up in my_list.
When pandas bring their toolkit
If my_list is part of a data-wrangling job, chances are you’re already in the pandas’ sandbox.
Yes, pandas come with their constant factor overhead, but their toolkit is hard to ignore:
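A sketch assuming pandas is installed and the list is wrapped in a Series; a DataFrame offers the same duplicated() method. The sample values are illustrative only:

```python
import pandas as pd

my_list = [1, 2, 3, 2, 5]
series = pd.Series(my_list)

# duplicated() marks every occurrence after the first as True.
has_duplicates = bool(series.duplicated().any())
print(has_duplicates)  # True

# The repeated values themselves.
print(series[series.duplicated()].tolist())  # [2]
```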
A quick glance and you know whether any duplicates (panda-cubs, if you like) are hiding in the data.
Staying in the safe lane with doctests
Make sure your function's white lies don’t go unnoticed. Include doctests, small tests embedded in the docstring. They run like soldiers on a mission, ensuring the code behaves as documented:
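A small sketch of what that can look like. The function body reuses the set-length check and so assumes hashable elements; the examples in the docstring are illustrative:

```python
def has_duplicates(items):
    """Return True if any hashable element of items appears more than once.

    >>> has_duplicates([1, 2, 3])
    False
    >>> has_duplicates([1, 2, 2])
    True
    >>> has_duplicates([])
    False
    """
    return len(items) != len(set(items))


if __name__ == "__main__":
    import doctest

    doctest.testmod()  # prints nothing when every example passes
```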
Simply hit ‘run’ to cover both your back and the function's!