How do I remove duplicates from a list, while preserving order?
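To eliminate duplicates from a list while keeping the order intact, the following one-liner is quite handy: list(dict.fromkeys(items))
The dict.fromkeys() method builds a dictionary whose keys are the list's items in first-seen order (dictionaries preserve insertion order from Python 3.7 onward). Passing that dictionary to list() returns its keys as a deduplicated list.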
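A minimal runnable version of the one-liner follows. The variable names are illustrative only.

```python
# Sample input with repeated values (hypothetical data for illustration)
items = [3, 1, 3, 2, 1, 5]

# dict.fromkeys() keeps the first occurrence of each key, in insertion order
# (guaranteed for dict since Python 3.7); list() returns those keys.
unique_items = list(dict.fromkeys(items))

print(unique_items)  # [3, 1, 2, 5]
```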
Diving into several methods
Using dict (Python 3.7+) and OrderedDict
Starting with Python 3.7, dict objects are guaranteed to preserve insertion order, so list(dict.fromkeys(items)) removes duplicates while keeping the first occurrence of each item.
In earlier versions of Python, where dict ordering is not guaranteed, collections.OrderedDict achieves the same result: list(OrderedDict.fromkeys(items)).
Both approaches are simple and Pythonic, run in linear time, and need only the standard library. They do require the items to be hashable.
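The sketch below puts both variants side by side on the same hypothetical input. They produce identical output; the OrderedDict form is only needed on interpreters older than 3.7.

```python
from collections import OrderedDict

items = ["b", "a", "b", "c", "a"]

# Python 3.7+: a plain dict preserves insertion order
with_dict = list(dict.fromkeys(items))

# Older interpreters: OrderedDict guarantees order regardless of version
with_ordered_dict = list(OrderedDict.fromkeys(items))

print(with_dict)          # ['b', 'a', 'c']
print(with_ordered_dict)  # ['b', 'a', 'c']
```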
Using list comprehension with set
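A set can be combined with a list comprehension to keep the order while making each membership check O(1) on average. The idiom filters with the condition not (x in seen or seen.add(x)), where seen starts as an empty set.
The or operator is what makes this work: for an item already in seen, the expression short-circuits to True and the item is dropped; for a new item, seen.add(x) records it and returns None, which is falsy, so the item is kept.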
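Here is a minimal sketch of that idiom wrapped in a function. The name dedupe_with_set is hypothetical, chosen only for this example.

```python
def dedupe_with_set(items):
    """Return items with duplicates removed, keeping first occurrences."""
    seen = set()
    # `x in seen` is True for repeats, so they are filtered out.
    # For new items, seen.add(x) runs and returns None (falsy), so they are kept.
    return [x for x in items if not (x in seen or seen.add(x))]


print(dedupe_with_set([4, 2, 4, 1, 2]))  # [4, 2, 1]
```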
Applying lazy techniques for complex cases
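For unhashable or otherwise complicated items, unique_everseen from the third-party more_itertools package does a commendable job. It tracks hashable elements in a set, falls back to slower list lookups for unhashable ones, and accepts an optional key function.
It returns a lazy iterator that yields each element the first time it appears, so duplicates are removed on demand. This is useful for huge or streaming datasets.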
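To show the idea without installing anything, here is a standard-library sketch of the same technique. unique_everseen_sketch is a hypothetical name, and this is an illustration of the approach, not the library's actual source: a generator that remembers hashable keys in a set and falls back to a list for unhashable ones.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def unique_everseen_sketch(
    iterable: Iterable[T], key: Optional[Callable[[T], object]] = None
) -> Iterator[T]:
    """Lazily yield elements the first time their key is seen.

    Hashable keys are tracked in a set (fast); unhashable keys fall back
    to a list, where each membership check is a linear scan.
    """
    seen_set: set = set()
    seen_list: list = []
    for element in iterable:
        k = element if key is None else key(element)
        try:
            # Raises TypeError if k is unhashable (e.g. a list or dict)
            is_new = k not in seen_set
            if is_new:
                seen_set.add(k)
        except TypeError:
            # Slower fallback for unhashable keys
            is_new = k not in seen_list
            if is_new:
                seen_list.append(k)
        if is_new:
            yield element


rows = [[1, 2], [3], [1, 2], [3], [4]]
print(list(unique_everseen_sketch(rows)))  # [[1, 2], [3], [4]]

words = ["Apple", "apple", "Banana", "APPLE"]
print(list(unique_everseen_sketch(words, key=str.lower)))  # ['Apple', 'Banana']
```

With more_itertools installed, from more_itertools import unique_everseen should give the same results for unique_everseen(rows) and unique_everseen(words, key=str.lower).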
Leveraging Pandas for large data scenarios
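Pandas provides a hash-based approach, implemented in compiled code, that suits large lists: pd.Series(items).drop_duplicates().tolist() keeps the first occurrence of each value in its original order.
This is especially handy for data-wrangling tasks where the data already lives in a Series or DataFrame. For a plain Python list, the cost of converting to and from pandas may cancel out the speed advantage.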
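A short sketch, assuming pandas is installed. drop_duplicates() keeps the first occurrence by default; pd.Series(items).unique() is an alternative that returns a NumPy array in order of appearance.

```python
import pandas as pd

items = ["x", "y", "x", "z", "y"]

# drop_duplicates() defaults to keep="first", so the surviving values
# stay in their original order; tolist() converts back to a Python list.
unique_items = pd.Series(items).drop_duplicates().tolist()

print(unique_items)  # ['x', 'y', 'z']
```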
Optimizing performance and clarity
Picking the right method
Though one-liners might seem attractive, always prioritize readability and the context of usage:
- For simple lists, use dict.fromkeys() or OrderedDict.fromkeys() for better readability.
- For large datasets, especially ones already held in pandas, drop_duplicates() can be fast.
- For unhashable items, unique_everseen from more_itertools handles them and checks for duplicates lazily.
Tips for best practices
- Use the built-in functions and libraries as much as possible to minimize dependencies.
- Clarity should be valued over complex logic unless code performance demands otherwise.
- Benchmark your code to find the best-suited solution for your use case.
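A minimal benchmarking sketch using the standard timeit module. The data set and function names are hypothetical, and the timings will vary by machine and Python version, so treat the output only as a relative comparison on your own hardware.

```python
import timeit
from collections import OrderedDict

# Hypothetical test data: 100,000 ints drawn from 1,000 distinct values
data = [i % 1000 for i in range(100_000)]


def with_dict(items):
    return list(dict.fromkeys(items))


def with_ordered_dict(items):
    return list(OrderedDict.fromkeys(items))


def with_set_comprehension(items):
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]


candidates = [with_dict, with_ordered_dict, with_set_comprehension]

# Sanity check: every approach must produce the same result before timing it
expected = with_dict(data)
assert all(fn(data) == expected for fn in candidates)

timings = {}
for fn in candidates:
    # Best of 5 repeats of 10 calls each; min() reduces noise from other processes
    timings[fn.__name__] = min(timeit.repeat(lambda: fn(data), number=10, repeat=5))

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:<24} {seconds:.4f} s")
```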
Some notes on performance
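- Starting from Python 3.7, a plain dict preserves order just as OrderedDict does, and it is typically faster and more memory-efficient.
- A list comprehension is usually faster than an equivalent for loop that calls list.append(), because it avoids repeated attribute lookups and method calls.
- The or trick in the set-based comprehension folds the membership test and the insertion into a single expression. It is compact, but because it relies on a side effect, a plain loop may be clearer.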
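For comparison, the set-based technique can also be written as an explicit loop with no side effects hidden inside an expression. dedupe_loop is a hypothetical name for this sketch; it has the same average O(1) membership checks.

```python
def dedupe_loop(items):
    """Remove duplicates, keeping first occurrences, using an explicit loop."""
    seen = set()
    result = []
    for x in items:
        if x not in seen:  # average O(1) membership test
            seen.add(x)
            result.append(x)
    return result


print(dedupe_loop(["a", "b", "a", "c", "b"]))  # ['a', 'b', 'c']
```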