Flatten nested dictionaries, compressing keys

by Nikita Barsukov · Jan 26, 2025

Flatten a nested dictionary with a recursive function that joins the keys from each level into a single path. Use sep to set the delimiter, such as an underscore. Below is the Python solution:

def flatten_dict(d, parent_key='', sep='_'):
    flat_dict = {}
    for k, v in d.items():
        # Join the path so far and the current key with the separator.
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            # Recurse into nested dictionaries, carrying the path along.
            flat_dict.update(flatten_dict(v, new_key, sep))
        else:
            flat_dict[new_key] = v
    return flat_dict

print(flatten_dict({'a': {'b': {'c': 1}}}))  # Output: {'a_b_c': 1}

The function flatten_dict merges keys from all levels into a single-layer dictionary with clear, continuous key paths, like a boss 😎.
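
Recursion is the simplest way to express this, but very deep nesting can hit Python's recursion limit. As a hedged alternative, here is a minimal sketch that walks the dictionary with an explicit stack instead. The name flatten_dict_iterative and the sample config are made up for illustration, and the sketch handles plain dict values only.

```python
def flatten_dict_iterative(d, sep='_'):
    """Flatten nested dicts without recursion (hypothetical helper name)."""
    flat = {}
    # Each stack entry pairs a key prefix with a dictionary still to be walked.
    stack = [('', d)]
    while stack:
        prefix, current = stack.pop()
        for k, v in current.items():
            new_key = f'{prefix}{sep}{k}' if prefix else str(k)
            if isinstance(v, dict):
                # Defer nested dictionaries instead of recursing into them.
                stack.append((new_key, v))
            else:
                flat[new_key] = v
    return flat

# Illustrative input; sep='.' produces dotted paths instead of underscores.
config = {'db': {'host': 'localhost', 'port': 5432}, 'debug': True}
result = flatten_dict_iterative(config, sep='.')
print(result)
```

Because nested dictionaries are deferred onto the stack, the order of keys in the result can differ from the recursive version, although the key-value pairs are the same.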

Flattening higher-level data types

When nested structures contain lists or other advanced types, the basic flattening recipe won't cut it. We can amend the code to handle this:

from collections.abc import MutableMapping

def flatten_dict(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = f'{parent_key}{sep}{k}' if parent_key else k
        if isinstance(v, MutableMapping):
            # Recurse into any dictionary-like value, carrying the path along.
            items.extend(flatten_dict(v, new_key, sep=sep).items())
        elif isinstance(v, list):
            # Key each list element by its index, joined with the same separator.
            for i, item in enumerate(v):
                items.extend(flatten_dict({f'{k}{sep}{i}': item}, parent_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

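This version checks values against MutableMapping instead of dict, so any dictionary-like object is flattened, not just dict and its subclasses. List elements are keyed by their index. Note that MutableMapping must be imported from collections.abc: the old collections.MutableMapping alias was removed in Python 3.10.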
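
To see why the check targets MutableMapping rather than dict, consider collections.UserDict from the standard library, which behaves like a dictionary but does not inherit from dict. A small sketch (the variable name wrapped is illustrative):

```python
from collections import UserDict
from collections.abc import MutableMapping

# UserDict wraps a dictionary but is not a dict subclass.
wrapped = UserDict({'b': 1})

print(isinstance(wrapped, dict))            # False
print(isinstance(wrapped, MutableMapping))  # True
```

A flattening function that tests isinstance(v, dict) would treat such a value as a leaf, while the MutableMapping test recurses into it.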

Simplifying complex JSONs

When confronted with a bulky and complex JSON structure, give it a panda hug! pandas offers a json_normalize function that can flatten it:

import pandas as pd

def pandas_flatten(json_dict):
    # json_normalize returns a one-row DataFrame here; take that row back as a dict.
    return pd.json_normalize(json_dict, sep='_').to_dict(orient='records')[0]

This solution converts the flattened, single-row DataFrame back into a plain dictionary. Keep in mind that json_normalize flattens nested dictionaries only: list values are left as they are.

Countering key collisions

Flattening can produce key collisions. For example, {'a_b': 1, 'a': {'b': 2}} yields the key 'a_b' twice, and the later value silently overwrites the earlier one. Guard against this by:

Adding unique prefixes to keys (e.g., based on their nesting depth).
Appending random strings or hashes to keys to ensure uniqueness.
Considering the data’s context and choosing a meaningful concatenation strategy (e.g., using array index numbers for list elements).
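
Before picking one of these strategies, it helps to know whether a collision happens at all. The following is a minimal sketch, not a definitive implementation, of a flattening function that raises an error instead of overwriting. The name flatten_strict and its out parameter are hypothetical, and only plain dict values are handled.

```python
def flatten_strict(d, parent_key='', sep='_', out=None):
    """Flatten d, raising if two paths produce the same key (hypothetical helper)."""
    if out is None:
        out = {}
    for k, v in d.items():
        new_key = f'{parent_key}{sep}{k}' if parent_key else str(k)
        if isinstance(v, dict):
            # Share one output dict so collisions across branches are visible.
            flatten_strict(v, new_key, sep, out)
        elif new_key in out:
            raise ValueError(f'key collision on {new_key!r}')
        else:
            out[new_key] = v
    return out

colliding = {'a_b': 1, 'a': {'b': 2}}

try:
    flatten_strict(colliding)
except ValueError as exc:
    print(exc)  # key collision on 'a_b'

# A separator that never appears in the original keys avoids the clash.
print(flatten_strict(colliding, sep='.'))  # {'a_b': 1, 'a.b': 2}
```

Choosing a separator that cannot occur in the source keys is often the least intrusive fix, since the flattened keys stay readable and predictable.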

GitHub examples and code snippets

For convenience, a GitHub repository contains all the mentioned code examples. It hosts flattening functions and test implementations for various types of nested dictionaries, including complex JSON structures.

GitHub Repository - Python Flatten Library

Maximizing itertools and more-itertools

The itertools module in Python's standard library provides efficient building blocks for working with iterators, an essential aspect of flattening operations. The more-itertools library offers further tools: