How do I check if there are duplicates in a flat list?
Here's a quick way to check whether your list, aptly named my_list, contains duplicates:
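The original snippet did not survive here, so what follows is a minimal sketch of the usual set-based check, assuming every element of my_list is hashable. The sample values are made up for illustration.

```python
# Sample data (an assumption, purely for illustration).
my_list = [1, 2, 3, 2]

# A set drops repeated values, so a shorter set means at least one duplicate.
has_duplicates = len(my_list) != len(set(my_list))

print(has_duplicates)
```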
has_duplicates tells you whether my_list has any repeat offenders. It holds True if it finds them lurking there.
The 'unhashable' headache
The set() approach casts a fine net, but it comes with a catch. It only works with hashable elements. For the slippery ones that aren't, it raises a TypeError. Those troublemakers could be lists or dictionaries that just won't settle into a set. Here's how you square off with them:
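The original code is missing at this point. One plausible reconstruction, offered as a sketch rather than the author's exact snippet, relies on any() and list.count(), which compare elements with == and never hash them. The function name has_duplicates_unhashable is hypothetical.

```python
def has_duplicates_unhashable(items):
    """Return True if any element equals another one in the list.

    Uses == comparisons only, so lists and dicts are fine as elements.
    Worst case is quadratic: count() rescans the list for each element.
    """
    return any(items.count(item) > 1 for item in items)


my_list = [[1, 2], {"a": 1}, [1, 2]]
has_duplicates = has_duplicates_unhashable(my_list)
print(has_duplicates)
```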
It's more tortoise than hare, since the elements have to be compared pairwise, but hey, it gets the job done for unhashable types.
Quick draw with short-circuiting
Performance matters. A lot. For behemoth lists, short-circuiting saves the day. The any() approach above short-circuits, or to put it in plainer terms, it hits the brakes as soon as a duplicate dares to show its face.
Another swift-and-smooth approach wields Dennis Otkidach's all_distinct function. It's known to be as swift as a gazelle:
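The function body was lost from this page, so the version below is only a sketch of what an all_distinct helper typically looks like, and may differ from the original. It assumes hashable elements and stops at the first repeat.

```python
def all_distinct(items):
    """Return True if no hashable element appears more than once."""
    seen = set()
    for item in items:
        if item in seen:
            # First repeat found: stop without scanning the rest.
            return False
        seen.add(item)
    return True


my_list = [1, 2, 3, 2]
print(not all_distinct(my_list))
```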
Just call not all_distinct(my_list) if you're on the hunt for duplicates.
Handling Godzilla-sized lists
For lists that weigh as much as Godzilla, memory usage and time complexity turn into an all-out war. Approaches that copy every element into intermediate lists before checking anything do no good.
Meet the unassuming alternative: the functional programming way using functools.reduce:
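The original reduce-based snippet is missing, so this is a hedged sketch of one way to do it. The helper names _DuplicateFound and has_duplicates_reduce are hypothetical. functools.reduce has no built-in early exit, so the sketch raises a private exception to stop at the first duplicate. It assumes hashable elements.

```python
from functools import reduce


class _DuplicateFound(Exception):
    """Hypothetical private signal used to leave reduce() early."""


def has_duplicates_reduce(items):
    def step(seen, item):
        if item in seen:
            # Abort the reduction as soon as a repeat shows up.
            raise _DuplicateFound
        seen.add(item)
        return seen

    try:
        # The accumulator is a single set of values seen so far.
        reduce(step, items, set())
    except _DuplicateFound:
        return True
    return False


my_list = [1, 2, 3, 2]
print(has_duplicates_reduce(my_list))
```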
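This one still has O(N) time complexity and doesn't create an army of interim lists. Be aware that functools.reduce consumes the whole iterable unless the reducing function raises an exception, so it only short-circuits when written that way.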
Playing detective with collections.Counter
Sometimes, just flagging duplicates isn't enough. You seek the full scoop. Thankfully, Python's got your back with collections.Counter. It hands over a dossier with precise counts:
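The snippet that belonged here is missing. A minimal sketch of the Counter approach, assuming hashable elements and made-up sample data, could look like this:

```python
from collections import Counter

# Sample data (an assumption, purely for illustration).
my_list = [1, 2, 2, 3, 3, 3]

# Count every element, then keep only those seen more than once.
counts = Counter(my_list)
duplicates = {item: count for item, count in counts.items() if count > 1}

print(duplicates)
```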
duplicates now reads like a novel about each duplicate and its carbon copies in my_list.
When pandas bring their toolkit
If my_list is part of a data-wrangling job, chances are you're already in the pandas sandbox.
Yes, pandas come with their constant factor overhead, but their toolkit is hard to ignore:
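The pandas example is missing from this page. As a sketch, assuming pandas is installed and the elements are hashable, Series.duplicated() marks every repeat of an earlier value and .any() collapses those marks into a single answer.

```python
import pandas as pd

# Sample data (an assumption, purely for illustration).
my_list = [1, 2, 3, 2]

series = pd.Series(my_list)

# duplicated() flags each value that already appeared earlier in the Series.
has_duplicates = bool(series.duplicated().any())

print(has_duplicates)
```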
A quick glance and you know if there are any panda-cubs in the data-frame.
Staying in the safe lane with doctests
Make sure your function's little white lies don't go unnoticed. Include doctests: covert tests hidden in docstrings. They run like soldiers on a mission, ensuring the code behaves:
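The doctest example did not survive either, so here is a small sketch of what it might look like. The has_duplicates function below is the set-based check for hashable elements, not necessarily the author's original.

```python
def has_duplicates(items):
    """Return True if any hashable element appears more than once.

    >>> has_duplicates([1, 2, 3])
    False
    >>> has_duplicates([1, 2, 2])
    True
    >>> has_duplicates([])
    False
    """
    return len(items) != len(set(items))


if __name__ == "__main__":
    import doctest

    # Runs every example in the docstrings; silent when all of them pass.
    doctest.testmod()
```

Running the file directly triggers doctest.testmod(). Alternatively, python -m doctest followed by the file name runs the same checks without the __main__ block.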
Simply hit ‘run’ to cover both your back and the function's!