
How do I remove duplicates from a list, while preserving order?

python
dataframe
pandas
best-practices
by Anton Shumikhin · Oct 19, 2024
TLDR

To eliminate duplicates from a list while keeping the order intact, the following one-liner is quite handy:

remove_dupes = lambda lst: list(dict.fromkeys(lst))
unique_list = remove_dupes([1, 2, 2, 3, 4, 4, 5])  # Behold [1, 2, 3, 4, 5], the purified list!

The dict.fromkeys() method builds a dictionary that preserves insertion order and, because keys are unique, silently drops duplicates; converting it back to a list yields the deduplicated result.
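To see why this works, inspect the intermediate dictionary that dict.fromkeys() builds (each element becomes a key mapped to None):

dict.fromkeys([1, 2, 2, 3, 4, 4, 5])        # {1: None, 2: None, 3: None, 4: None, 5: None}
list(dict.fromkeys([1, 2, 2, 3, 4, 4, 5]))  # [1, 2, 3, 4, 5]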

Diving into several methods

Using dict (Python 3.7+) and OrderedDict

Starting from Python 3.7, the dict object retains the insertion order. Hence, we can use it for eliminating duplicates:

unique_items = list(dict.fromkeys(your_list)) # "Dict" is my favorite four-letter word

In earlier versions of Python (3.6 and below, where plain dict order is not guaranteed), collections.OrderedDict can be employed to achieve the same result:

from collections import OrderedDict
unique_items = list(OrderedDict.fromkeys(your_list))  # A lifeline for "version-challenged" programmers

Both approaches are simple, Pythonic, and efficient as they don't require any external dependencies.

Using list comprehension with set

A set can be combined with a list comprehension to keep the order while getting O(1) average-time membership checks:

seen = set()
unique_list = [x for x in your_list if x not in seen and not seen.add(x)]  # Seen.add() or not seen.add(), that's the question

In the above comprehension, seen.add(x) returns None, so not seen.add(x) always evaluates to True; because and short-circuits, the set is updated only when x has not been seen before.
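If that short-circuit trick feels too clever for production code, the same behavior can be written as an explicit loop; here is a minimal sketch (the dedupe name is illustrative):

def dedupe(items):
    """Return a new list without duplicates, preserving first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)        # remember the item...
            result.append(item)   # ...and keep only its first occurrence
    return result

dedupe([1, 2, 2, 3, 4, 4, 5])  # [1, 2, 3, 4, 5]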

Applying lazy techniques for complex cases

For non-hashable or otherwise tricky items, unique_everseen from the third-party more_itertools package (pip install more-itertools) does a commendable job:

from more_itertools import unique_everseen
unique_list = list(unique_everseen(your_list))  # Because sometimes, L.A.Z.Y is the right approach

This code builds a lazy iterator that eliminates duplicates on-demand, useful for huge datasets.
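When the elements themselves are unhashable (nested lists, for instance), unique_everseen accepts a key callable; a short sketch using key=tuple to make each element hashable:

from more_itertools import unique_everseen

rows = [[1, 2], [1, 2], [3, 4]]
unique_rows = list(unique_everseen(rows, key=tuple))  # [[1, 2], [3, 4]]

Without a key, unique_everseen still handles unhashable items, but it falls back to slower linear searches through a list of seen values.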

Leveraging Pandas for large data scenarios

Pandas provides an efficient, vectorized approach suitable for large lists:

import pandas as pd
unique_list = pd.Series(your_list).drop_duplicates().tolist()  # Panda-monium for data wranglers

This can be especially handy for data wrangling tasks due to its versatility and performance.
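The same drop_duplicates call works directly on a DataFrame column, which is where this approach usually shines; a sketch with a hypothetical df:

import pandas as pd

df = pd.DataFrame({"city": ["Oslo", "Oslo", "Bergen", "Oslo"]})
unique_cities = df["city"].drop_duplicates().tolist()  # ['Oslo', 'Bergen']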

Optimizing performance and clarity

Picking the right method

Though one-liners might seem attractive, always prioritize readability and the context of usage:

  • For simple lists, use dict.fromkeys() or OrderedDict for better readability.
  • For large datasets, Pandas can offer fast operations with optimized functions.
  • For non-hashable items, unique_everseen from more_itertools ensures lazy checks.

Tips for best practices

  • Prefer built-in functions and the standard library to minimize external dependencies.
  • Value clarity over clever logic unless performance genuinely demands otherwise.
  • Benchmark your code to find the best-suited solution for your use case, as in the sketch below.
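
A minimal benchmarking sketch using the standard timeit module (the data and repeat counts are illustrative):

import timeit

data = list(range(1000)) * 10  # 10,000 elements with heavy duplication

def dict_way():
    return list(dict.fromkeys(data))

def set_way():
    seen = set()
    return [x for x in data if x not in seen and not seen.add(x)]

print("dict.fromkeys:      ", timeit.timeit(dict_way, number=1000))
print("set + comprehension:", timeit.timeit(set_way, number=1000))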

Some notes on performance

  • Starting from Python 3.7, a plain dict is at least as fast as an OrderedDict for retaining order.
  • A set-based list comprehension can beat approaches that call a function per element, since it avoids function-call overhead.
  • The short-circuiting and/not trick shown earlier updates the set inline, deduplicating the list in a single pass.