
Convert list of dictionaries to a pandas DataFrame

python
pandas
dataframe
data-structures
by Anton Shumikhin · Dec 21, 2024
TLDR

Easily convert a list of dictionaries to a DataFrame using pandas:

import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]  # simple, isn't it?
df = pd.DataFrame(data)                      # how pandas ate your data... and didn't regret it!

Feed your list straight to pd.DataFrame() and get a neatly structured DataFrame. The keys of your dicts become the DataFrame's column headers.
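Printing the result confirms each dictionary became its own row (with the default RangeIndex):

print(df)
#    a  b
# 0  1  2
# 1  3  4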

Filling in the gaps and sorting it out

With dictionaries that do not all contain the same keys, pandas fills the missing values with NaN, kind of like those awkward silences in conversations:

import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 3, 'c': 5}]  # "b" and "c" playing hide and seek
df = pd.DataFrame(data)                      # welcome to NaN's club for missing "b" or "c"

In current pandas versions, the column order follows the order in which keys first appear across your dictionaries; only much older releases sorted the keys alphabetically, kind of like alphabetic seating order at school... remember those days?
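If you do want that alphabetic seating order, sort the columns yourself; a quick sketch (the messy list here is made up for illustration):

messy = [{'z': 1, 'a': 2}, {'m': 3}]   # keys deliberately out of order
df = pd.DataFrame(messy)               # columns follow first appearance: z, a, m
df = df[sorted(df.columns)]            # force alphabetical order: a, m, z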

The power of the index, and records to the rescue

In pandas, create your custom index using the index argument:

df = pd.DataFrame(data, index=['first', 'second']) #"first" and "second", because originality!

Or use pd.DataFrame.from_records() for those dictionaries with an attitude:

df = pd.DataFrame.from_records(data) #Transformers: dictionaries in disguise!

Designed for more elaborate transformations, especially when dealing with structured arrays.
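For instance, a NumPy structured array drops straight in, its field names becoming the columns; a small sketch with made-up data:

import numpy as np

arr = np.array([(1, 'Alice'), (2, 'Bob')],
               dtype=[('id', 'i4'), ('name', 'U10')])   # hypothetical structured array
df = pd.DataFrame.from_records(arr)                     # columns: id, name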

Taming the nested beast

Nested dictionaries inside your dictionaries? No worries, just flatten them with pd.json_normalize():

nested_data = [
    {'id': 1, 'info': {'name': 'Alice', 'age': 25}},
    {'id': 2, 'info': {'name': 'Bob', 'age': 30}},
]                                              # data inception!
df = pd.json_normalize(nested_data, sep='_')   # taming the beast, one level at a time...

Columns like info_name and info_age are flat out telling you what's up.
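When the nesting hides lists of records, the record_path and meta arguments explode them into one row per item; a sketch with made-up order data:

orders = [{'id': 1, 'items': [{'sku': 'A', 'qty': 2}, {'sku': 'B', 'qty': 1}]}]
df = pd.json_normalize(orders, record_path='items', meta='id')   # columns: sku, qty, id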

Flex for the column, not row

Different dictionary directions & orientations? pd.DataFrame.from_dict() comes to the rescue:

col_data = {'a': [1, 3], 'b': [2, 4]}                     # each key holds a whole column
df = pd.DataFrame.from_dict(col_data, orient='columns')   # just flipping dictionaries, you know

Ideal for when each dictionary key represents a column, not a row.
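And when the dictionary keys are meant to be rows instead, flip the orientation; a quick sketch with made-up row_data:

row_data = {'first': {'a': 1, 'b': 2}, 'second': {'a': 3, 'b': 4}}
df = pd.DataFrame.from_dict(row_data, orient='index')   # 'first' and 'second' become the index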

Old school CSV conversion for control freaks

Sometimes, you may want to go manual and convert to CSV using csv.DictWriter:

import csv

with open('output.csv', 'w', newline='') as file:              # ready to manual? Get set, go!
    writer = csv.DictWriter(file, fieldnames=['a', 'b', 'c'])
    writer.writeheader()                                        # "a","b","c"... we got alphabet!
    writer.writerows(data)                                      # let's row, row, row the boat...

It's less flexible and may remind you of that grueling 1,000-piece puzzle, but hey, who am I to judge?
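And if you want the puzzle pieces back in pandas, the file you just wrote reads straight into a DataFrame:

df = pd.read_csv('output.csv')   # back into pandas, puzzle reassembled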

Beware of the oddities of pandas

Versions, datatypes, and performance, oh my!

  • Versions: behaviour can shift between releases (column ordering is one past example), so keep an eye on your pandas version or it might turn into a pumpkin.
  • Data Type Handling: a single NaN silently upcasts an integer column to float, which can subtly change calculations; keep your dtype conversions sharper than a chef's knife (see the sketch after this list).
  • Performance: for large datasets, specify dtypes up front and build columns from arrays rather than looping; adopt optimizations like a pro athlete embraces a rigorous training plan.
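A minimal sketch of that upcast gotcha, plus one way around it using the nullable Int64 dtype available in recent pandas:

df = pd.DataFrame([{'a': 1}, {'b': 2}])
print(df.dtypes)           # both columns come out float64: the NaNs forced the upcast
df = df.astype('Int64')    # nullable integers keep whole numbers and missing values together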

Pragmatic treasure chest

When JSON strikes!

Got a JSON file? pandas has you covered:

import pandas as pd

with open('recipes.json', 'r') as file:   # unboxing the json
    data = pd.read_json(file)             # smooth as butter!

Dances with timestamps and dates:

df = pd.DataFrame(data)                                   # wrap it up (if it isn't a DataFrame already)
df['date_column'] = pd.to_datetime(df['date_column'])     # date whisperer

Ensure your dates are well-behaved within pandas.
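To keep misbehaving date strings from crashing the party, coerce them; a sketch, with date_column as a stand-in name:

df['date_column'] = pd.to_datetime(df['date_column'], errors='coerce')   # bad dates become NaT instead of raising
print(df['date_column'].dt.year)                                         # the .dt accessor now works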

Utilize categorical data:

df['category_column'] = df['category_column'].astype('category') #Binning the data!

Converting low-cardinality columns to category improves performance and reduces memory usage, since each distinct value is stored only once.
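To see the payoff, compare the memory footprint against a plain object-dtype copy; a sketch, with category_column as a stand-in name:

raw = df['category_column'].astype('object')             # back to plain strings, for comparison
print(raw.memory_usage(deep=True))                       # before
print(df['category_column'].memory_usage(deep=True))     # after: much smaller when distinct values are few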