List of unique dictionaries
Remove duplicates from a list of dictionaries by converting each dictionary into a hashable frozenset of its items, adding those frozensets to a set to inherently remove duplicates, then converting the frozenset objects back into dictionaries.
Here’s the Pythonic one-liner:
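A minimal sketch of that one-liner, assuming every value in the dictionaries is hashable (the `dicts` variable is an illustrative example, not from the original):

```python
dicts = [{"a": 1, "b": 2}, {"b": 2, "a": 1}, {"a": 3}]

# frozenset(d.items()) is hashable and ignores key order, so the set
# comprehension collapses duplicates; dict(fs) converts each back.
unique = [dict(fs) for fs in {frozenset(d.items()) for d in dicts}]
```

Note that a set does not preserve the original list order, and the approach fails if any value (e.g. a nested list or dict) is unhashable.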
Result: Each dictionary will now appear only once, regardless of key order.
Techniques for achieving uniqueness
Now let's dive into more detailed techniques for achieving uniqueness in a list of dictionaries while keeping efficiency and data integrity in mind.
Using dict comprehension
In Python 2.7 and above, key-based filtering of duplicates is achieved efficiently using dictionary comprehension. Here's how:
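A short sketch of the idea, assuming each dictionary carries an 'id' key (the `records` data is hypothetical):

```python
records = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 1, "name": "Alicia"},  # same id as the first entry
]

# Keying the comprehension on 'id' means later entries overwrite
# earlier ones, so only the last occurrence of each id survives.
unique = list({d["id"]: d for d in records}.values())
```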
This code preserves the last occurrence of each 'id', an ideal method when every dictionary in the list carries the same distinguishing key.
Applying 'JSON serialization'
When dealing with complex dictionary structures, convert them into JSON strings, which can be used for hashing and comparison:
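One way this can be sketched, assuming all values are JSON-serializable (the sample data is illustrative):

```python
import json

dicts = [{"a": 1, "b": {"c": 2}}, {"b": {"c": 2}, "a": 1}, {"a": 3}]

seen = set()
unique = []
for d in dicts:
    # sort_keys=True gives a canonical string, so key order is irrelevant.
    key = json.dumps(d, sort_keys=True)
    if key not in seen:
        seen.add(key)
        unique.append(d)
```

Unlike the frozenset approach, this handles nested (unhashable) values and preserves the original order of first occurrences.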
Harnessing the power of 'numpy'
If you're dealing with larger datasets, the numpy library can provide a high-performance solution:
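A possible sketch, assuming every dictionary has the same keys and flat, numeric-compatible values (the sample data is hypothetical):

```python
import numpy as np

dicts = [{"x": 1, "y": 2}, {"x": 1, "y": 2}, {"x": 3, "y": 4}]

# Extract values in a fixed key order so each dict becomes one row.
keys = sorted(dicts[0])
arr = np.array([[d[k] for k in keys] for d in dicts])

# np.unique with axis=0 drops duplicate rows (and sorts the result).
unique_rows = np.unique(arr, axis=0)
unique = [dict(zip(keys, row.tolist())) for row in unique_rows]
```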
This technique requires converting the dictionaries to numpy arrays first, but it is quite efficient for larger datasets.
Starting state:
You have a pile of puzzle pieces (your dictionaries):
Goal:
You want a set of unique pieces:
Sorting:
Sort the pieces and keep the unique ones:
Final state:
The result is a set of unique puzzle pieces:
This is your list of unique dictionaries!
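The puzzle-piece steps above can be sketched in plain Python (the `pieces` data is illustrative; dictionaries compare by contents, so `not in` spots repeats):

```python
# The pile of puzzle pieces: dictionaries, some of them repeats.
pieces = [{"a": 1}, {"b": 2}, {"a": 1}]

# Sort the pieces into a comparable order, then keep one of each.
unique = []
for piece in sorted(pieces, key=lambda d: sorted(d.items())):
    if piece not in unique:
        unique.append(piece)
```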
Maintaining consistency in dictionary keys
Identifying mismatched keys
Ensure that the keys of the dictionaries remain consistent throughout. Dictionaries with mismatched key sets can lead to erroneous duplicate filtering.
Leveraging lexicographical sorting
When dictionaries contain similar but not identical sets of keys, sorting the dictionary entries lexicographically could be a viable alternative:
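A minimal sketch of this, assuming keys and values are mutually comparable (the sample data is hypothetical):

```python
dicts = [{"b": 2, "a": 1}, {"a": 1, "b": 2}, {"a": 1, "c": 3}]

seen = set()
unique = []
for d in dicts:
    # Lexicographically sorted items give each dict a canonical,
    # hashable signature, regardless of key insertion order.
    signature = tuple(sorted(d.items()))
    if signature not in seen:
        seen.add(signature)
        unique.append(d)
```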
This sorts the dictionaries for comparability, making it easier to identify and remove duplicates.