
Remove duplicate dict in a list in Python

python
dataframe
unique-records
performance
by Alex Kataev · Mar 4, 2025
TLDR

Quickly remove duplicate dictionaries from a list by converting each dictionary's items into an immutable frozenset, then using those frozensets as keys in a dict comprehension to enforce distinctness. The result is a compact, practical one-liner:

list_of_dicts = [{'a': 1}, {'a': 1}, {'b': 2}]

# Want a one-liner to save the universe from the duplicate invasion? Here you go!
unique_dicts = list({frozenset(d.items()): d for d in list_of_dicts}.values())
print(unique_dicts)

The dict comprehension keeps only the latest occurrence of each distinct dictionary, guaranteeing unique elements in the final list.

Approach alternatives for duplicate extermination

Removing duplicates isn't one-dimensional: the right tool depends on your data structures, whether order matters, and whether you want to keep the first or the latest appearance. Below are effective alternatives:

Preserving original order

If the sequence of dictionaries matters, opt for collections.OrderedDict (or a plain dict on Python 3.7+, where insertion order is guaranteed) or iteration_utilities.unique_everseen. Here's a quick usage of unique_everseen:

from iteration_utilities import unique_everseen

# Not Harry Potter's spells, but effective for preserving dictionary order!
unique_dicts = list(unique_everseen(list_of_dicts, key=lambda d: frozenset(d.items())))

Beware: dictionaries containing unhashable values require a custom key function for unique_everseen, as sketched below.
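
One minimal sketch of such a key (an assumption on my part, not part of the library's API): serialize each dict to a canonical JSON string, which sidesteps hashability entirely, provided all values are JSON-serializable.

import json
from iteration_utilities import unique_everseen

# Hypothetical key: canonical JSON copes with unhashable values
# (lists, nested dicts), assuming everything is JSON-serializable
unique_dicts = list(
    unique_everseen(list_of_dicts, key=lambda d: json.dumps(d, sort_keys=True))
)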

Handy pandas for bulky data

Handling larger datasets? Pandas comes to the rescue with methods like drop_duplicates():

import pandas as pd

df = pd.DataFrame(list_of_dicts)
df = df.drop_duplicates()

# Pandas DataFrame back to a dict list - transformation complete!
unique_dict_list = df.to_dict('records')

Pandas shines when duplicates should be identified based on selected DataFrame columns, via the subset parameter of drop_duplicates(), as shown below.
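
A minimal sketch of column-based deduplication (the sample records are made up for illustration):

import pandas as pd

records = [
    {'id': 1, 'name': 'Ada', 'score': 10},
    {'id': 1, 'name': 'Ada', 'score': 99},  # same id/name, different score
    {'id': 2, 'name': 'Bob', 'score': 20},
]

df = pd.DataFrame(records)
# Rows count as duplicates when 'id' and 'name' match; keep the first hit
unique_records = df.drop_duplicates(subset=['id', 'name'], keep='first').to_dict('records')

One caveat: round-tripping through a DataFrame fills missing keys with NaN, so dicts with differing key sets come back altered.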

Considering performance

With bulky lists of dictionaries, performance matters. Benchmark common methods - the frozenset-based dict comprehension, iteration_utilities, Pandas - against each other for your use case; a sketch follows.
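
A minimal timeit sketch for such a comparison (the test data and run count are arbitrary assumptions; absolute numbers will vary with your machine and data):

import timeit

list_of_dicts = [{'a': i % 100} for i in range(10_000)]  # lots of duplicates

def dedupe_frozenset(dicts):
    # The TLDR approach: frozenset keys deduplicate by content
    return list({frozenset(d.items()): d for d in dicts}.values())

elapsed = timeit.timeit(lambda: dedupe_frozenset(list_of_dicts), number=100)
print(f"frozenset approach: {elapsed:.3f}s for 100 runs")

Swap the iteration_utilities or Pandas variant in for dedupe_frozenset to compare like for like.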

Tackling tricky edge cases

The basic method works for many cases, but anomalous scenarios may need special handling:

Unhashable elements

Unhashable types like lists defy the typical methods. Convert them into tuples or other suitable hashable forms while preserving the original's content - a recursive helper, as sketched below, does the trick.
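
A hedged sketch of such a helper (make_hashable is a hypothetical name, not a stdlib function): it maps lists to tuples, and sets and dicts to frozensets, leaving everything else untouched.

def make_hashable(value):
    """Recursively convert mutable containers into hashable equivalents."""
    if isinstance(value, dict):
        return frozenset((k, make_hashable(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(make_hashable(v) for v in value)
    if isinstance(value, set):
        return frozenset(make_hashable(v) for v in value)
    return value  # assumed to be hashable already

nested = [{'a': [1, 2]}, {'a': [1, 2]}, {'b': 3}]
unique_dicts = list({make_hashable(d): d for d in nested}.values())
print(unique_dicts)  # [{'a': [1, 2]}, {'b': 3}]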

Checking on content vs references

Ensure your approach compares dict contents rather than object references. An overlooked detail here might retain unwanted duplicates, and tweaking the uniqueness condition might be necessary.
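
A quick, self-contained illustration of the difference (plain Python semantics, nothing assumed):

d1 = {'a': 1}
d2 = {'a': 1}

print(d1 == d2)  # True  - equal contents
print(d1 is d2)  # False - distinct objects

# Deduplicating by identity would keep both; comparing contents drops one
print(list({frozenset(d.items()): d for d in [d1, d2]}.values()))  # [{'a': 1}]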

Potential tuple conversion pitfalls

Converting to tuples assumes all keys and values are hashable. Mutable types like lists or dictionaries appearing as values will break it.
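
For example, the frozenset trick from the TLDR fails as soon as a value is a list:

d = {'a': [1, 2]}  # the list value is mutable, hence unhashable

try:
    frozenset(d.items())  # each (key, value) pair must itself be hashable
except TypeError as err:
    print(err)  # unhashable type: 'list'

In that case, fall back on a recursive conversion helper like the one sketched above.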

Key things to remember

  • Opt for methods that can handle your unhashable elements.
  • Balance the trade-off between maintaining order and ensuring efficiency.
  • For unordered data, mapping dicts to frozensets works well.
  • To keep the original order with performance in check, iteration_utilities is a good choice.
  • In a Pandas-centric workflow, make good use of DataFrame methods like drop_duplicates().
  • Benchmark different approaches to find the most efficient one for your data.