
Appending to an empty DataFrame in Pandas?

python
dataframe
pandas
data-manipulation
by Anton Shumikhin · Dec 23, 2024
TLDR

DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on current pandas you append to an empty DataFrame with pd.concat(). It works the same way for baby-sized data (one row) and adult-sized data (many rows):

import pandas as pd

# Empty DataFrame (basically the Loch Ness Monster of data science)
df = pd.DataFrame()

# Append a single freeloader dict: wrap it in a list so it becomes a one-row DataFrame
# (ignore_index feels like "who needs order?", but it gives you a clean 0..n-1 index)
df = pd.concat([df, pd.DataFrame([{'column1': 'value1', 'column2': 'value2'}])], ignore_index=True)

# Append a company of dicts with concat (because we love company parties)
rows_to_append = [{'column1': 'value1a', 'column2': 'value2a'}, {'column1': 'value1b', 'column2': 'value2b'}]
df = pd.concat([df, pd.DataFrame(rows_to_append)], ignore_index=True)

Switching from append to concat

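If .append() left you in a performance rut (or, on pandas 2.0 and later, with an AttributeError, since the method is gone), don't sweat! pd.concat() is your trusty steed for handling extensive data. The speedup comes from batching: every append or concat call copies the data into a brand-new DataFrame, so collect your rows first and concatenate once instead of growing the DataFrame row by row. That is what turns your code into a lean, mean, performance machine.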

# First, make sure your guest list is in order
rows_to_append_df = pd.DataFrame(rows_to_append)

# Then break out the champagne...
df = pd.concat([df, rows_to_append_df], ignore_index=True)
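To make the batching idea concrete, here is a minimal sketch of the collect-then-concat pattern. The records list and its name/score fields are made up for illustration; swap in whatever your own loop produces.

```python
import pandas as pd

# Hypothetical input: each tuple becomes one row
records = [("alice", 1), ("bob", 2), ("carol", 3)]

rows = []  # a plain Python list; appending to it is cheap
for name, score in records:
    rows.append({"name": name, "score": score})

df = pd.DataFrame()  # the empty starting point

# One concat at the end instead of one per loop iteration
df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)
print(df)
```

The loop only ever touches the list, so the cost of building a DataFrame is paid once, however many rows show up.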

Ensuring your data's in disguise

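Before appending, remember to don your detective hat and validate the data you're adding. This means making sure your data is a DataFrame first and foremost, because you wouldn't mix whiskey with water, would you? pd.concat() only accepts Series and DataFrame objects (a plain list raises a TypeError), and a Series passed as-is gets stacked as a new column of values rather than added as a new row.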

# If it's a Series or list crashing the party, convert to DataFrame first
series_to_append = pd.Series(['value1c', 'value2c'], index=['column1', 'column2'])

# to_frame() makes a one-column DataFrame; .T flips it into a single row
df = pd.concat([df, series_to_append.to_frame().T], ignore_index=True)
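A plain list needs slightly different handling than a Series, because it carries no column labels of its own. One way to do it, assuming the list values arrive in the same order as the existing columns (an assumption of this sketch, not something pandas checks for you):

```python
import pandas as pd

df = pd.DataFrame({"column1": ["value1"], "column2": ["value2"]})
row_as_list = ["value1c", "value2c"]  # hypothetical incoming row

# Wrap the list in another list so it becomes ONE row,
# and reuse the existing column names so the values line up
row_df = pd.DataFrame([row_as_list], columns=df.columns)

df = pd.concat([df, row_df], ignore_index=True)
print(df)
```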

When append plays the trickster

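When tacking rows onto your DataFrame with pd.concat() (the stand-in for the removed .append()), remember to use ignore_index=True. Without it, every piece keeps its own row labels, so you end up with duplicate index values and label-based lookups that return more rows than you bargained for. With it, the result gets a fresh 0..n-1 index.

row_d = pd.DataFrame([{'column1': 'value1d', 'column2': 'value2d'}])

# Concat without ignoring the index: the new row keeps its label 0, awaiting impending doom
doomed = pd.concat([df, row_d])

# The hero we need, but don't deserve
df = pd.concat([df, row_d], ignore_index=True)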
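If you want to see the duplicate-label problem with your own eyes, a small self-contained sketch like the following should do it. The frame names (first, second, kept, fresh) are invented for the demo.

```python
import pandas as pd

first = pd.DataFrame([{"column1": "a", "column2": "b"}])
second = pd.DataFrame([{"column1": "c", "column2": "d"}])

# Each frame keeps its own labels, so label 0 shows up twice
kept = pd.concat([first, second])

# ignore_index=True throws the old labels away and renumbers the rows
fresh = pd.concat([first, second], ignore_index=True)

print(kept.index.tolist())
print(fresh.index.tolist())

# A lookup by label 0 matches both rows in `kept`, but only one in `fresh`
print(len(kept.loc[[0]]), len(fresh.loc[[0]]))
```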

With great power comes great responsibility: pd.concat()

With .append() out of the picture, pd.concat() is the boss fight worth winning. It's not just useful for appending: its axis parameter lets you combine data along rows or columns, and its join parameter decides what happens to columns that just won't play nice because they don't exist in every DataFrame.

# Concat to the rescue
another_df = pd.DataFrame([{'column1': 'value1e', 'column2': 'value2e'}])
df = pd.concat([df, another_df], ignore_index=True)
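Here is a rough sketch of what the axis and join parameters do when the columns don't line up. The frames and their column3 are hypothetical, chosen so the mismatch is easy to spot.

```python
import pandas as pd

left = pd.DataFrame({"column1": ["a", "b"], "column2": ["c", "d"]})
right = pd.DataFrame({"column2": ["e"], "column3": ["f"]})
extra = pd.DataFrame({"column3": ["x", "y"]})

# Default join="outer": keep the union of columns, fill the gaps with NaN
outer = pd.concat([left, right], ignore_index=True)

# join="inner": keep only the columns every frame shares
inner = pd.concat([left, right], ignore_index=True, join="inner")

# axis=1: put frames side by side, aligned on the row index
wide = pd.concat([left, extra], axis=1)

print(outer)
print(inner)
print(wide)
```

The outer join never drops data but can leave holes; the inner join never leaves holes but can drop columns. Which one you want depends on whether the extra columns matter downstream.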

Everything you wanted to know about append vs. concat

The legacy of append

.append() is like your favorite, but worn-out old jeans. It was deprecated in pandas 1.4 and removed in pandas 2.0, so you'll only run into it in older code and older environments. For the long haul, and for anything new, reach for pd.concat().

Moving forward with concat

pd.concat() is the sturdy pair of trousers in data appending. It accepts any mix of DataFrames and Series and stacks many rows in a single call, making it a workhorse for data manipulation.

Avoid the pitfall of data misalignment

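Remember to check the data formats for smooth appending operations. You don't want column shifts or surprise type conversions, right? pd.concat() matches columns by name, so a misspelled column becomes a brand-new column full of NaN, and mixing types in one column can quietly change its dtype. Be sure to inspect df.dtypes (an attribute, not a method, so no parentheses) on both sides before adding data to your DataFrame.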
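One possible way to run that check is sketched below: compare dtypes column by column, then cast the incoming rows to match before concatenating. The id/price columns and the string-typed id are invented for the example, and the cast assumes the incoming values really are convertible.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "price": [9.5, 12.0]})

# Hypothetical incoming rows: id arrives as a string
new_rows = pd.DataFrame({"id": ["3"], "price": [7.25]})

# Find shared columns whose dtypes disagree
mismatched = [
    col for col in df.columns
    if col in new_rows.columns and df[col].dtype != new_rows[col].dtype
]
print(mismatched)

# Cast the new rows to the existing dtypes, then concatenate
new_rows = new_rows.astype(df.dtypes.to_dict())
df = pd.concat([df, new_rows], ignore_index=True)
print(df.dtypes)
```

If the cast fails, astype() raises an error, which is usually what you want: better a loud failure at the door than a column that silently changed type.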