
Convert list of dictionaries to a pandas DataFrame

python
pandas
dataframe
data-structures
by Anton Shumikhin · Dec 21, 2024
TLDR

Easily convert a list of dictionaries to a DataFrame using pandas:

import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]  # simple, isn't it?
df = pd.DataFrame(data)                      # how pandas ate your data... and didn't regret it!

Feed your list straight to pd.DataFrame() and get a neatly structured DataFrame. The keys of your dicts become the DataFrame's column headers.
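Printing the result confirms each dictionary became its own row (with the default RangeIndex):

print(df)
#    a  b
# 0  1  2
# 1  3  4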

Filling in the gaps and sorting it out

With dictionaries that do not all contain the same keys, pandas fills the missing values with NaN, kind of like those awkward silences in conversations:

import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 3, 'c': 5}]  # "b" and "c" playing hide and seek
df = pd.DataFrame(data)                      # welcome to NaN's club for missing "b" or "c"

In current pandas versions, the column order follows the order in which keys first appear across your dictionaries; only much older releases sorted the keys alphabetically, kind of like alphabetic seating order at school... remember those days?
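If you do want that alphabetic seating order, sort the columns yourself; a quick sketch (the messy list here is made up for illustration):

messy = [{'z': 1, 'a': 2}, {'m': 3}]   # keys deliberately out of order
df = pd.DataFrame(messy)               # columns follow first appearance: z, a, m
df = df[sorted(df.columns)]            # force alphabetical order: a, m, z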

The power of the index, and records to the rescue

In pandas, create your custom index using the index argument:

df = pd.DataFrame(data, index=['first', 'second']) #"first" and "second", because originality!

Or use pd.DataFrame.from_records() for those dictionaries with an attitude:

df = pd.DataFrame.from_records(data) #Transformers: dictionaries in disguise!

Designed for more elaborate transformations, especially when dealing with structured arrays.
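For instance, a NumPy structured array drops straight in, its field names becoming the columns; a small sketch with made-up data:

import numpy as np

arr = np.array([(1, 'Alice'), (2, 'Bob')],
               dtype=[('id', 'i4'), ('name', 'U10')])   # hypothetical structured array
df = pd.DataFrame.from_records(arr)                     # columns: id, name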

Taming the nested beast

Nested dictionaries inside your dictionaries? No worries, just flatten them with pd.json_normalize():

nested_data = [
    {'id': 1, 'info': {'name': 'Alice', 'age': 25}},
    {'id': 2, 'info': {'name': 'Bob', 'age': 30}},
]                                              # data inception!
df = pd.json_normalize(nested_data, sep='_')   # taming the beast, one level at a time...

Columns like info_name and info_age are flat out telling you what's up.
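When the nesting hides lists of records, the record_path and meta arguments explode them into one row per item; a sketch with made-up order data:

orders = [{'id': 1, 'items': [{'sku': 'A', 'qty': 2}, {'sku': 'B', 'qty': 1}]}]
df = pd.json_normalize(orders, record_path='items', meta='id')   # columns: sku, qty, id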

Flex for the column, not row

Different dictionary directions & orientations? pd.DataFrame.from_dict() comes to the rescue:

col_data = {'a': [1, 3], 'b': [2, 4]}                     # each key holds a whole column
df = pd.DataFrame.from_dict(col_data, orient='columns')   # just flipping dictionaries, you know

Ideal for when each dictionary key represents a column, not a row.
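And when the dictionary keys are meant to be rows instead, flip the orientation; a quick sketch with made-up row_data:

row_data = {'first': {'a': 1, 'b': 2}, 'second': {'a': 3, 'b': 4}}
df = pd.DataFrame.from_dict(row_data, orient='index')   # 'first' and 'second' become the index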

Old school CSV conversion for control freaks

Sometimes, you may want to go manual and convert to CSV using csv.DictWriter:

import csv

with open('output.csv', 'w', newline='') as file:              # ready to manual? Get set, go!
    writer = csv.DictWriter(file, fieldnames=['a', 'b', 'c'])
    writer.writeheader()                                        # "a","b","c"... we got alphabet!
    writer.writerows(data)                                      # let's row, row, row the boat...

It's less flexible and may remind you of that grueling 1,000-piece puzzle, but hey, who am I to judge?
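And if you want the puzzle pieces back in pandas, the file you just wrote reads straight into a DataFrame:

df = pd.read_csv('output.csv')   # back into pandas, puzzle reassembled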

Beware of the oddities of pandas

Versions, datatypes, and performance, oh my!

  • Versions: behaviour can shift between releases (column ordering is one past example), so keep an eye on your pandas version or it might turn into a pumpkin.
  • Data Type Handling: a single NaN silently upcasts an integer column to float, which can subtly change calculations; keep your dtype conversions sharper than a chef's knife (see the sketch after this list).
  • Performance: for large datasets, specify dtypes up front and build columns from arrays rather than looping; adopt optimizations like a pro athlete embraces a rigorous training plan.
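A minimal sketch of that upcast gotcha, plus one way around it using the nullable Int64 dtype available in recent pandas:

df = pd.DataFrame([{'a': 1}, {'b': 2}])
print(df.dtypes)           # both columns come out float64: the NaNs forced the upcast
df = df.astype('Int64')    # nullable integers keep whole numbers and missing values together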

Pragmatic treasure chest

When JSON strikes!

Got a JSON file? pandas has you covered:

import pandas as pd

with open('recipes.json', 'r') as file:   # unboxing the json
    data = pd.read_json(file)             # smooth as butter!

Dances with timestamps and dates:

df = pd.DataFrame(data)                                   # wrap it up (if it isn't a DataFrame already)
df['date_column'] = pd.to_datetime(df['date_column'])     # date whisperer

Ensure your dates are well-behaved within pandas.
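To keep misbehaving date strings from crashing the party, coerce them; a sketch, with date_column as a stand-in name:

df['date_column'] = pd.to_datetime(df['date_column'], errors='coerce')   # bad dates become NaT instead of raising
print(df['date_column'].dt.year)                                         # the .dt accessor now works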

Utilize categorical data:

df['category_column'] = df['category_column'].astype('category') #Binning the data!

Converting low-cardinality columns to category improves performance and reduces memory usage, since each distinct value is stored only once.
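To see the payoff, compare the memory footprint against a plain object-dtype copy; a sketch, with category_column as a stand-in name:

raw = df['category_column'].astype('object')             # back to plain strings, for comparison
print(raw.memory_usage(deep=True))                       # before
print(df['category_column'].memory_usage(deep=True))     # after: much smaller when distinct values are few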