
Removing duplicates in lists

by Anton Shumikhin · Aug 26, 2024
TLDR

Here's a quick way to remove duplicates from a list while keeping the first occurrence of each item, using a list comprehension and a helper set:

    original = [1, 2, 2, 3, 2, 1]  # someone left the photocopier on
    seen = set()
    # seen.add(x) returns None, so an unseen x is kept and recorded in one pass
    deduped = [x for x in original if not (x in seen or seen.add(x))]

Voila! The deduped list, [1, 2, 3], is spick and span, and the original list is left unchanged.

When items are unhashable

When our list is a party full of unhashable items such as nested lists or dictionaries, sets and dictionary keys won't work. A plain loop with a membership check is a lifesaver:

    def remove_duplicates(nested_list):
        unique_list = []  # start with an empty bag
        for element in nested_list:
            # double-check if that element is already in the bag
            if element not in unique_list:
                # add it only if it's new
                unique_list.append(element)
        return unique_list

Now, using remove_duplicates(your_nested_list), we get a squeaky clean list.
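The loop above compares each element against everything already kept, so it takes quadratic time on long lists. When the unhashable items are lists of hashable values, one possible shortcut is to use a tuple copy of each inner list as a set key. This is a minimal sketch under that assumption; the name `remove_duplicate_rows` is hypothetical and not part of the original article.

```python
def remove_duplicate_rows(rows):
    """Drop repeated inner lists, keeping the first occurrence of each."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row)  # a tuple is hashable as long as its items are
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


rows = [[1, 2], [3, 4], [1, 2], [5], [3, 4]]
print(remove_duplicate_rows(rows))  # [[1, 2], [3, 4], [5]]
```

If the inner items can themselves be unhashable (lists inside lists, dictionaries), this shortcut does not apply and the plain loop remains the safer choice.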

Order preservation and performance

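Since Python 3.7, order's the new black. Dictionaries are guaranteed to preserve insertion order (CPython 3.6 already did so as an implementation detail), so dictionary keys stay in the order they were first seen: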

    original = [4, 5, 6, 4, 5]  # back to back meetings anyone?
    deduped = list(dict.fromkeys(original))

The deduped list here ends up being [4, 5, 6], keeping its original order. It also runs in roughly linear time, since each dictionary lookup takes constant time on average. It's like magic, but it's just Python doing what it does best.
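To make the difference from a plain set concrete, here is a small illustrative comparison. The variable names `ordered` and `unordered` are made up for this sketch.

```python
original = [4, 5, 6, 4, 5]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
ordered = list(dict.fromkeys(original))
print(ordered)  # [4, 5, 6]

# A set also removes duplicates, but its iteration order is not guaranteed,
# so compare its contents rather than its order.
unordered = set(original)
print(unordered == set(ordered))  # True
```

Both approaches require hashable items; only the dictionary-based one promises a stable order.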

Creating reusable functions

Keep it DRY—Don't Repeat Yourself. Here's how to create a helper function to deduplicate any list:

    def remove_duplicates(lst):
        # Look! No repetition.
        return list(dict.fromkeys(lst))

    unique_list = remove_duplicates(original)

This function can be reused throughout your code, promoting readability and maintainability.

Beyond simple lists: Complex scenarios

For more intricate scenarios, where a list contains custom objects or you need to remove duplicates based on an object attribute, you might need to flex your Python muscles a bit more: decide which value makes two items "the same" and keep track of the values already seen.
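One way to approach attribute-based deduplication is a helper that takes a key function, much like sorted() does. The sketch below is only an illustration: the `User` class and the `unique_by` helper are hypothetical names, and it assumes the key values are hashable.

```python
from dataclasses import dataclass


@dataclass
class User:  # hypothetical class, used only for illustration
    name: str
    email: str


def unique_by(items, key):
    """Keep the first item for each distinct key(item), preserving order."""
    seen = set()
    result = []
    for item in items:
        marker = key(item)  # the value that decides whether two items match
        if marker not in seen:
            seen.add(marker)
            result.append(item)
    return result


users = [
    User("Ann", "ann@example.com"),
    User("Bob", "bob@example.com"),
    User("Ann B.", "ann@example.com"),  # same email as the first user
]
deduped = unique_by(users, key=lambda u: u.email)
print([u.name for u in deduped])  # ['Ann', 'Bob']
```

Because the objects themselves are never hashed, this works even when the class does not define `__hash__`.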

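You can also use the groupby function from the itertools module. It only merges adjacent items with the same key, so sort the list by that key first. Keep in mind that sorting gives up the original order, and that this version keeps only the keys: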

    from itertools import groupby

    original = [('apple', 1), ('banana', 2), ('apple', 3)]  # life's staples
    sorted_original = sorted(original, key=lambda x: x[0])  # sorting by the fruit type
    deduped = [key for key, group in groupby(sorted_original, lambda x: x[0])]

Just like that, deduped gives you ['apple', 'banana'].
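If you want the whole record rather than just the key, a variation on the groupby approach is to take the first tuple from each group. This is a sketch of that variation, not something from the original article; `first_per_fruit` is a name chosen here for illustration.

```python
from itertools import groupby
from operator import itemgetter

original = [('apple', 1), ('banana', 2), ('apple', 3)]

# sorted() is stable, so within each fruit the original relative order survives.
sorted_original = sorted(original, key=itemgetter(0))

# next(group) pulls the first tuple out of each run of equal keys.
first_per_fruit = [
    next(group) for _, group in groupby(sorted_original, key=itemgetter(0))
]
print(first_per_fruit)  # [('apple', 1), ('banana', 2)]
```

The result is ordered by key, not by first appearance, so this fits best when a sorted output is acceptable anyway.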