Import multiple CSV files into pandas and concatenate into one DataFrame

by Anton Shumikhin · Feb 23, 2025

pandas and glob can be combined to import data from multiple CSV files and merge it into a single DataFrame. The *.csv pattern matches every CSV file in the current working directory, pd.read_csv() reads each file, and pd.concat() stacks the results:

import pandas as pd
import glob

# Fetch all CSV files (Feels like a CSV treasure hunt!)
csv_files = glob.glob('*.csv')

# Concatenate into one DataFrame (One DataFrame to rule them all)
combined_df = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)

The real world isn't always tidy, and neither are file directories. Use os.path.join() to build paths that work across platforms, and put an r prefix (raw string) on Windows-style paths so that backslashes are not interpreted as escape sequences:

import os

# Full path ahead! (Unlike my career...)
file_path = os.path.join(r'your_directory', '*.csv')
csv_files = glob.glob(file_path)

To track the source of data in the final DataFrame, use assign to add a new identifier column:

combined_df = pd.concat(
    (pd.read_csv(f).assign(filename=os.path.basename(f)) for f in csv_files),
    ignore_index=True
)

For path handling bliss, consider pathlib to turn paths into easy-to-handle objects:

from pathlib import Path

# pathlib handling File paths. (It's ridiculously easy. Trust me!)
p = Path(r'your_directory')
# p.glob() returns a one-shot generator; sorted() makes a reusable, ordered list
csv_files = sorted(p.glob('*.csv'))
combined_df = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)

Let's dig deeper

Taking Concatenation to the Next Level

To keep the full path rather than only the base name, which helps when files in different directories share a name, pass the path itself to assign:

combined_df = pd.concat(
    (pd.read_csv(f).assign(source_file=str(f)) for f in csv_files),
    ignore_index=True
)

Watch Your Memory

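Reading a huge file in one call can blow your memory budget. Pass chunksize to pd.read_csv() to get an iterator of smaller DataFrames, and flatten those chunks with a generator expression (passing the iterator itself to pd.concat() raises a TypeError):

combined_df = pd.concat(
    (chunk for f in csv_files for chunk in pd.read_csv(f, chunksize=10000)),
    ignore_index=True
)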

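The chunksize can be tuned according to your system's memory. Keep in mind that the concatenated result still has to fit in memory, so chunking helps most when each chunk is filtered or aggregated before it is kept.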
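As a minimal sketch of reducing data chunk by chunk, the snippet below filters every chunk before keeping it, so only the surviving rows accumulate. The helper name load_filtered, the value column, and the two throwaway files are assumptions made up for the demo, not part of the original example.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_filtered(csv_files, min_value, chunksize=10_000):
    """Read each file in chunks and keep only rows where `value` >= min_value."""
    kept = []
    for f in csv_files:
        # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows.
        for chunk in pd.read_csv(f, chunksize=chunksize):
            kept.append(chunk[chunk["value"] >= min_value])
    return pd.concat(kept, ignore_index=True)


# Tiny demo with two temporary files (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    pd.DataFrame({"value": [1, 5, 9]}).to_csv(tmp_dir / "a.csv", index=False)
    pd.DataFrame({"value": [2, 8]}).to_csv(tmp_dir / "b.csv", index=False)
    filtered_df = load_filtered(sorted(tmp_dir.glob("*.csv")), min_value=5, chunksize=2)

print(filtered_df)
```

A tiny chunksize of 2 is used here only to force several chunks per file; real workloads would likely use a much larger value.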

Dust off those CSVs before Merging

Sometimes CSV files have to be massaged before they fit well together. Preprocess file data according to your needs:

def preprocess_file(filename):
    # Insert your data massaging magic here
    df = pd.read_csv(filename)
    # ... some more magic ...
    return df

combined_df = pd.concat(
    (preprocess_file(f) for f in csv_files),
    ignore_index=True
)
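
One possible shape for that preprocessing step, sketched under assumptions: the files differ only in header spelling (stray spaces, mixed case) and may contain fully empty rows. The in-memory buffers stand in for real CSV files, and the column names are invented for illustration.

```python
import io

import pandas as pd


def preprocess_file(source):
    """Load one CSV and normalize it so it lines up with the others."""
    df = pd.read_csv(source)
    # Normalize headers: "Name " and "name" should end up as the same column.
    df.columns = df.columns.str.strip().str.lower()
    # Drop rows in which every field is missing.
    return df.dropna(how="all")


# In-memory stand-ins for two files with inconsistent headers (hypothetical data).
file_a = io.StringIO("Name , Score\nAda,90\n")
file_b = io.StringIO("name,score\nLin,85\n")

cleaned_df = pd.concat(
    (preprocess_file(f) for f in (file_a, file_b)),
    ignore_index=True,
)
print(cleaned_df)
```

Without the header normalization, pd.concat() would treat "Name " and "name" as different columns and fill the gaps with NaN.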

What's next in the journey?

Let's look into the DataFrame Mirror

Use combined_df.describe() or combined_df.head() to glance at your beautiful DataFrame creation.
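
A quick sanity check along these lines might also count rows per source file, assuming a filename column was added as shown earlier. The buffers and file names below are placeholders for real CSV files.

```python
import io

import pandas as pd

# In-memory stand-ins for two CSV files (hypothetical names and data).
sources = {
    "a.csv": io.StringIO("x\n1\n2\n"),
    "b.csv": io.StringIO("x\n3\n"),
}

combined_df = pd.concat(
    (pd.read_csv(src).assign(filename=name) for name, src in sources.items()),
    ignore_index=True,
)

print(combined_df.head())      # first rows
print(combined_df.describe())  # summary statistics for numeric columns

# Confirm that every file contributed the number of rows you expect.
rows_per_file = combined_df["filename"].value_counts().to_dict()
print(rows_per_file)
```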

Having Trouble? Let's Debug

Here are some starting points to debug common issues:

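Mismatched columns: pd.concat() aligns on column names and fills the gaps with NaN, so ensure all CSVs share the same column structure.
Encoding issues: Specify the encoding within pd.read_csv(), for example encoding='latin-1'.
File not found errors: Verify the file path and pattern. glob returns an empty list when nothing matches, and pd.concat() then raises a ValueError ("No objects to concatenate").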
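
Two of these checks can be sketched in a few lines. The helper find_column_mismatches and the sample file names are hypothetical; the buffers stand in for real files, and in practice you would pass file paths instead.

```python
import io

import pandas as pd


def find_column_mismatches(sources):
    """Map each source name to the columns it lacks or adds relative to the first source."""
    expected = None
    problems = {}
    for name, src in sources.items():
        # nrows=0 parses only the header row, so this stays cheap for big files.
        cols = set(pd.read_csv(src, nrows=0).columns)
        if expected is None:
            expected = cols
        elif cols != expected:
            problems[name] = {
                "missing": sorted(expected - cols),
                "extra": sorted(cols - expected),
            }
    return problems


sources = {
    "jan.csv": io.StringIO("id,amount\n1,10\n"),
    "feb.csv": io.StringIO("id,total\n2,20\n"),
}
mismatches = find_column_mismatches(sources)
print(mismatches)

# Encoding: bytes written as Latin-1 must be read back with the same encoding.
raw = "city\nMálaga\n".encode("latin-1")
decoded_df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(decoded_df)
```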

Strive for Efficiency

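Use map or a list comprehension to keep the pd.concat call tidy, and call pd.concat once on the whole collection rather than appending inside a loop, which copies the data on every iteration. When you need to add new columns, prefer vectorized operations over row-by-row loops.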
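
A small sketch of both ideas, with made-up price and qty columns and in-memory buffers in place of real files:

```python
import io

import pandas as pd

# In-memory stand-ins for two CSV files (hypothetical data).
sources = [
    io.StringIO("price,qty\n2.5,4\n"),
    io.StringIO("price,qty\n1.0,3\n"),
]

# map applies pd.read_csv to each source; pd.concat consumes the iterator once.
combined_df = pd.concat(map(pd.read_csv, sources), ignore_index=True)

# Vectorized: the multiplication runs over whole columns, with no Python-level loop.
combined_df["total"] = combined_df["price"] * combined_df["qty"]
print(combined_df)
```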
