
Flatten nested dictionaries, compressing keys

by Nikita Barsukov · Jan 26, 2025
TLDR

Flatten your nested dictionary with a recursive function that joins the keys from each level into a single key. The sep parameter sets the delimiter, an underscore by default. Below is the Python solution:

    def flatten_dict(d, parent_key='', sep='_'):
        flat_dict = {}
        for k, v in d.items():
            # Being a Python keyer is like being an office clerk, but you're the office and the clerk.
            # Prefix the key with its parent's path, unless we are at the top level.
            new_key = f"{parent_key}{sep}{k}" if parent_key else k
            if isinstance(v, dict):
                # Recurse into the nested dict, carrying the path built so far.
                flat_dict.update(flatten_dict(v, new_key, sep))
            else:
                flat_dict[new_key] = v
        return flat_dict

    print(flatten_dict({'a': {'b': {'c': 1}}}))
    # Output: {'a_b_c': 1}

The function flatten_dict merges the keys from every level into a single-layer dictionary with clear, continuous key paths, like a boss 😎.
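Recursion is the simplest way to walk the levels, but very deep nesting can hit Python's recursion limit. As a minimal sketch of an alternative (the name flatten_dict_iterative is only illustrative, not part of the original solution), the same key-joining idea can be written with an explicit stack:

```python
def flatten_dict_iterative(d, sep='_'):
    """Flatten nested dicts using an explicit stack instead of recursion."""
    flat = {}
    # Each stack entry pairs a key-path prefix with a dict still to be visited.
    stack = [('', d)]
    while stack:
        prefix, current = stack.pop()
        for k, v in current.items():
            key = f'{prefix}{sep}{k}' if prefix else str(k)
            if isinstance(v, dict):
                stack.append((key, v))
            else:
                flat[key] = v
    return flat


nested = {'a': {'b': {'c': 1}, 'd': 2}, 'e': 3}
print(flatten_dict_iterative(nested))
print(flatten_dict_iterative(nested, sep='.'))
```

The resulting keys and values match the recursive approach, though the insertion order of the keys may differ because nested dicts are visited after their siblings.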

Flattening higher-level data types

When nested structures contain lists or dict-like types other than plain dict, the basic flattening recipe won't cut it. We can amend the code to handle this:

    from collections.abc import MutableMapping

    def flatten_dict(d, parent_key='', sep='_'):
        items = []
        for k, v in d.items():
            new_key = f'{parent_key}{sep}{k}' if parent_key else k
            if isinstance(v, MutableMapping):
                # When data blows up in your face, gather the pieces and make something new!
                items.extend(flatten_dict(v, new_key, sep=sep).items())
            elif isinstance(v, list):
                # Treat each list element as a child keyed by its index.
                for i, item in enumerate(v):
                    items.extend(flatten_dict({f'{k}{sep}{i}': item}, parent_key, sep=sep).items())
            else:
                items.append((new_key, v))
        return dict(items)

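This version checks values against MutableMapping rather than dict, so dict-like containers such as OrderedDict and defaultdict are flattened too, and list elements are keyed by their index. Importing MutableMapping from collections.abc keeps the code working on current Python versions, since the old aliases in collections were removed in Python 3.10.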
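To make the type check concrete, here is a small sketch showing which containers pass an isinstance test against MutableMapping and against the broader Mapping. The sample objects are chosen purely for illustration:

```python
from collections import OrderedDict, defaultdict
from collections.abc import Mapping, MutableMapping
from types import MappingProxyType

samples = {
    'dict': {},
    'OrderedDict': OrderedDict(),
    'defaultdict': defaultdict(list),
    'MappingProxyType': MappingProxyType({}),  # read-only view of a dict
    'list': [],
}

for name, obj in samples.items():
    is_mutable_mapping = isinstance(obj, MutableMapping)
    is_mapping = isinstance(obj, Mapping)
    print(f'{name}: MutableMapping={is_mutable_mapping}, Mapping={is_mapping}')
```

Read-only mappings such as MappingProxyType pass the Mapping check but not the MutableMapping one, so checking against Mapping is an option if those should be flattened as well.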

Simplifying complex JSONs

When confronted with a bulky and complex JSON structure, give it a panda hug! Pandas offers a json_normalize function that flattens nested dictionaries into columns:

    import pandas as pd

    def pandas_flatten(json_dict):
        # Pandas: for times when Python seems too Pythonic.
        # json_normalize builds a one-row DataFrame; take that row back as a dict.
        return pd.json_normalize(json_dict, sep='_').to_dict(orient='records')[0]

This solution converts the one-row DataFrame back into a plain dictionary. Note that json_normalize only flattens nested dictionaries; list values are left as they are.
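As a quick illustration of that round trip, the sketch below runs json_normalize on a made-up record (the field names are invented for this example) and assumes pandas 1.0 or later, where json_normalize is available at the top level:

```python
import pandas as pd

record = {
    'id': 7,
    'user': {'name': 'Ada', 'address': {'city': 'London'}},
    'tags': ['x', 'y'],
}

# One row per input dict; nested dict keys are joined with sep.
df = pd.json_normalize(record, sep='_')
flat = df.to_dict(orient='records')[0]
print(flat)
```

The nested user dictionary becomes the keys user_name and user_address_city, while the tags list stays a list under its original key.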

Countering key collisions

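Flattening can produce key collisions: two different paths may join into the same key, and the later value then silently overwrites the earlier one. For example, with an underscore separator, {'a_b': 1, 'a': {'b': 2}} yields the key a_b twice. Guard against this by: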

  • Appending unique prefixes to keys (e.g., using their level depth).
  • Adding random strings or hashes to keys to ensure uniqueness.
  • Considering the data’s context and choosing a meaningful concatenation strategy (e.g., using array index numbers for list elements).
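Whichever strategy you pick, it helps to detect collisions instead of letting values overwrite each other. The following is a minimal sketch under that assumption; flatten_checked is a hypothetical helper name, and raising KeyError is just one possible policy:

```python
def flatten_checked(d, parent_key='', sep='_'):
    """Flatten nested dicts, raising KeyError if two paths produce the same key."""
    flat = {}
    for k, v in d.items():
        new_key = f'{parent_key}{sep}{k}' if parent_key else str(k)
        # A nested dict contributes many flattened keys, a leaf exactly one.
        children = flatten_checked(v, new_key, sep) if isinstance(v, dict) else {new_key: v}
        for child_key, child_value in children.items():
            if child_key in flat:
                raise KeyError(f'key collision on {child_key!r}')
            flat[child_key] = child_value
    return flat


colliding = {'a_b': 1, 'a': {'b': 2}}
try:
    flatten_checked(colliding)
except KeyError as err:
    print(err)

# A separator that does not occur in any key avoids this particular collision.
print(flatten_checked(colliding, sep='.'))
```

Choosing a separator that never appears inside the original keys is often the simplest fix, which the last call demonstrates.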

GitHub examples and code snippets

For convenience, a GitHub repository contains all the mentioned code examples. It hosts flattening functions and test implementations for various types of nested dictionaries, including complex JSON structures.

Maximizing itertools and more-itertools

The itertools module in Python's standard library provides efficient building blocks for working with iterators, which comes in handy during flattening operations. The third-party more-itertools library offers further tools:

  • collapse() flattens nested iterables such as lists and tuples into a single stream of items,
  • split_at() splits an iterable into chunks at each item that matches a predicate.

Coupled with the flattening functions, these tools help when the values inside a dictionary are themselves deeply nested lists or tuples.
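To show the idea without a third-party dependency, here is a standard-library sketch. itertools.chain.from_iterable flattens exactly one level, and collapse_sketch is a hypothetical stand-in that only approximates what more-itertools' collapse() does; it descends into lists and tuples only, which is a simplification made for this example:

```python
from itertools import chain


def collapse_sketch(iterable):
    """Yield the leaf items of arbitrarily nested lists and tuples, depth first."""
    for item in iterable:
        if isinstance(item, (list, tuple)):
            yield from collapse_sketch(item)
        else:
            # Strings and other values are treated as leaves.
            yield item


one_level = list(chain.from_iterable([[1, 2], [3], [4, 5]]))
print(one_level)  # [1, 2, 3, 4, 5]

deep = list(collapse_sketch([1, [2, (3, [4])], 'ab']))
print(deep)  # [1, 2, 3, 4, 'ab']
```

For production use, the library's own collapse() is the better choice, since it handles general iterables and offers extra options.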