
Get unique values from a list in Python

python
list-comprehensions
set-data-structure
unique-values
by Anton Shumikhin · Aug 23, 2024
TLDR

To eliminate duplicates in a jiffy, lean on Python's set:

unique = list(set(original_list))

This magical incantation swiftly turns your original_list into an assembly of uniques, then brings it back to the familiar territory of lists.
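A quick sanity check, using a made-up sample list (the values here are purely for illustration):

```python
original_list = [3, 1, 2, 3, 1]

# set() drops duplicates; list() brings it back to list-land
unique = list(set(original_list))

# The order is arbitrary, but every value appears exactly once
print(sorted(unique))  # [1, 2, 3]
```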

Feel the need to maintain some order in life? Bring in the dict.fromkeys:

unique_ordered = list(dict.fromkeys(original_list))
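With the same made-up sample values, the dict route keeps first-seen order:

```python
original_list = [3, 1, 2, 3, 1]

# dict keys are unique and (since Python 3.7) remember insertion order
unique_ordered = list(dict.fromkeys(original_list))
print(unique_ordered)  # [3, 1, 2]
```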

Keeping things in sequence

Let's face it: order is important. That's why there's a queue at your favorite café, right? A bare set, though, shuffles that order away (sets don't remember insertion order). To hunt for unicorns in your list while preserving the original order, call in dict.fromkeys.

# Like the queue at the café, keeps things in order
unique_ordered = list(dict.fromkeys(original_list))

This format is efficient and maintains order, ever since dictionaries started remembering insertion order (Python 3.7+ 🎉). But what if you're still living in the past (older Python versions, you know)? OrderedDict is your way out:

from collections import OrderedDict

# It's like a time machine, but cooler
unique_ordered = list(OrderedDict.fromkeys(original_list))

One little trade secret — ordering might come at a cost. These methods may not be as fast as the straight-up set method for larger-than-life lists.
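To see the trade-off on your own machine, here's a rough timeit sketch (the sample data is an arbitrary assumption; absolute numbers will vary by machine):

```python
import timeit

# Hypothetical workload: 10,000 distinct values, each repeated three times
data = list(range(10_000)) * 3

set_time = timeit.timeit(lambda: list(set(data)), number=100)
dict_time = timeit.timeit(lambda: list(dict.fromkeys(data)), number=100)

print(f"set:           {set_time:.4f}s")
print(f"dict.fromkeys: {dict_time:.4f}s")
```

Both produce the same unique values; only the ordering guarantee and the runtime differ.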

The smart and quick way

Ever thought about how to build a new list, but ONLY when the conditions are right? Enter stage — list comprehensions and set:

used = set()
# List comprehension, or as I call it, "The Checklist"
unique = [x for x in original_list if x not in used and not used.add(x)]

Apart from making you look like a Pythonista, this method is readable and avoids rescanning the list for duplicates: membership checks hit the used set in O(1) instead of walking the list in O(n).
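The trick hinges on set.add returning None: `not None` is True, so the second clause both records x and lets it through. A tiny demonstration:

```python
used = set()

# set.add mutates the set and returns None
result = used.add("spam")
print(result)           # None
print("spam" in used)   # True

# So in the comprehension, `not used.add(x)` is always truthy,
# and short-circuiting `and` means it only runs when x is new
```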

The need for speed

Sets in Python are your best friends for storing unique elements. They hate duplicates almost as much as you do. How do they fight 'em so fast? It's all in their hash-based fighting style.

You simply add elements to the set with unique_set.add(elem), and the set handles the rest. Now that's what I call teamwork!

unique_set = set()
for elem in original_list:
    # Add 'em up in the set - NO duplicates allowed!
    unique_set.add(elem)

If you don't care about keeping things in order, starting off with a set is like making it to the finish line before even starting the race! 🏎️
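Putting the loop to work on a hypothetical list, then hopping back to a list at the finish line:

```python
original_list = ["spam", "eggs", "spam", "ham"]

unique_set = set()
for elem in original_list:
    unique_set.add(elem)

# Back to a list when you need one; order is arbitrary
unique = list(unique_set)
print(sorted(unique))  # ['eggs', 'ham', 'spam']
```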

Keep count with collections

If you thought Python was just about lists and sets, you're in for a treat! The standard library also offers gems like collections.Counter, which can take the path of counter-strike against duplicates:

from collections import Counter

# "Who's your daddy", says Counter
counts = Counter(original_list)
unique = list(counts)
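As a bonus, Counter also tells you how often each value showed up (the sample values below are illustrative):

```python
from collections import Counter

counts = Counter(["a", "b", "a", "c", "a"])

print(counts["a"])            # 3
print(list(counts))           # ['a', 'b', 'c']  (first-seen order, Python 3.7+)
print(counts.most_common(1))  # [('a', 3)]
```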

Welcome to the realm of list comprehension

You can utilize the power of list comprehension to traverse through a list, while enabling a set to track what's already been added:

used = set()
# Keep your friends close and duplicates closer
unique = [x for x in original_list if x not in used and not used.add(x)]

It provides a neat interface to accumulate a unique collection efficiently. Feels like Python jujutsu, doesn't it? Stay with it to keep those duplicates at bay!

When you want to play by your own rules

What if you need to handle complex objects or apply custom comparison logic while dealing with duplicates? Then it's time to roll your own loop:

unique = []
seen = set()
for o in original_list:
    # Just the way you are - Billy Joel
    identifier = complex_logic_to_unique(o)
    if identifier not in seen:
        # Only the first cut is the deepest
        unique.append(o)
        seen.add(identifier)

Here, complex_logic_to_unique defines how you recognize an item as unique.
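For instance, deduplicating records by an id field (the data and the key choice here are made up for illustration):

```python
people = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bob"},
    {"id": 1, "name": "Ada L."},  # duplicate id, different payload
]

unique = []
seen = set()
for person in people:
    identifier = person["id"]  # stand-in for complex_logic_to_unique
    if identifier not in seen:
        unique.append(person)
        seen.add(identifier)

print([p["name"] for p in unique])  # ['Ada', 'Bob']
```

Dicts aren't hashable, so they can't go straight into a set; hashing a chosen identifier sidesteps that entirely.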

In a galaxy not so far away

Remember, in a galaxy not so far away, giants like NumPy and Pandas offer built-in functionality like numpy.unique and df.drop_duplicates(). They bring more customizable options to the table and are designed with array and DataFrame deduplication in mind. Brace yourself for them!
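A taste of each, assuming NumPy and pandas are installed (note that numpy.unique returns sorted values, unlike the order-preserving tricks above):

```python
import numpy as np
import pandas as pd

arr = np.array([3, 1, 2, 3, 1])
print(np.unique(arr))  # [1 2 3]  (sorted, not first-seen order)

df = pd.DataFrame({"x": [3, 1, 2, 3, 1]})
print(df.drop_duplicates()["x"].tolist())  # [3, 1, 2]  (keeps first occurrence)
```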