Import multiple CSV files into pandas and concatenate into one DataFrame
pandas and glob can be combined to import data from multiple CSV files and merge it into a single DataFrame. The *.csv pattern matches every CSV file in a directory, pd.read_csv() reads each file, and pd.concat() brings them all together:
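Here is a minimal sketch of that pipeline. The function name load_csv_folder and the folder argument are placeholders chosen for illustration, and the sketch assumes every file can be read with the pd.read_csv() defaults.

```python
import glob
import os

import pandas as pd


def load_csv_folder(folder):
    """Read every *.csv file in `folder` and stack the rows into one DataFrame."""
    # glob returns matches in arbitrary order, so sort for a reproducible row order.
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not paths:
        # pd.concat raises a less descriptive ValueError on an empty list.
        raise FileNotFoundError(f"No CSV files found in {folder!r}")

    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True replaces the per-file indexes with one continuous index.
    return pd.concat(frames, ignore_index=True)
```

Calling load_csv_folder("data") would combine every CSV in a folder named data, assuming such a folder exists.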
The real world isn't always tidy, and neither are file directories. Build paths with os.path.join() for cross-platform compatibility, and put an r prefix on string literals that contain backslashes so they are not interpreted as escape sequences:
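A small sketch of both ideas together. The Windows-style folder below is made up for illustration; swap in your own directory.

```python
import glob
import os

import pandas as pd

# Hypothetical Windows-style folder. The r prefix keeps "\n" in "\new_data"
# as a backslash followed by "n" instead of turning it into a newline.
data_dir = r"C:\new_data\reports"

# os.path.join inserts the separator that is correct for the current OS.
pattern = os.path.join(data_dir, "*.csv")

paths = sorted(glob.glob(pattern))
frames = [pd.read_csv(path) for path in paths]

# Guard against an empty match, because pd.concat raises on an empty list.
combined_df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Without the r prefix, the same literal would contain a newline and a carriage return, and the pattern would silently match nothing.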
To track the source of data in the final DataFrame, use assign() to add a new identifier column:
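As a quick illustration on a single file, the sketch below reads an in-memory stand-in for a CSV and tags every row. The column name source_file and the file name are assumptions picked for the example.

```python
import io

import pandas as pd

# In-memory stand-in for one CSV file, so the example runs without touching disk.
csv_text = "id,value\n1,10\n2,20\n"
source_name = "sales_january.csv"  # hypothetical file name

# assign() returns a new DataFrame with the extra column; the scalar is
# broadcast to every row.
df = pd.read_csv(io.StringIO(csv_text)).assign(source_file=source_name)
```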
For path-handling bliss, consider pathlib to turn paths into easy-to-handle objects:
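One way this could look, as a sketch rather than the only approach. The function name load_csv_folder_pathlib is a placeholder.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder_pathlib(folder):
    """Combine all CSV files in `folder`, using pathlib instead of glob and os.path."""
    folder = Path(folder)
    # Path.glob yields Path objects; pd.read_csv accepts them directly.
    paths = sorted(folder.glob("*.csv"))
    if not paths:
        raise FileNotFoundError(f"No CSV files found in {folder}")
    return pd.concat((pd.read_csv(path) for path in paths), ignore_index=True)
```

Path objects also expose attributes such as path.name and path.stem, which are handy when you want to record where each row came from.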
Let's dig deeper
Taking Concatenation to the Next Level
Save important metadata, like filenames, using the assign() method:
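Applied across a whole folder, the idea might be sketched like this. The column name filename and the function name are assumptions for the example.

```python
import glob
import os

import pandas as pd


def load_with_filenames(folder):
    """Combine all CSVs in `folder`, recording each row's file of origin."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    frames = [
        # basename drops the directory part, leaving e.g. "jan.csv".
        pd.read_csv(path).assign(filename=os.path.basename(path))
        for path in paths
    ]
    # Note: pd.concat raises ValueError if no files matched.
    return pd.concat(frames, ignore_index=True)
```

With the filename column in place, a later groupby("filename") can summarize each source file separately.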
Watch Your Memory
Reading huge files in one go can blow your memory budget. Pass chunksize to pd.read_csv() and feed the chunks to pd.concat() through a generator expression:
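A rough sketch of chunked reading follows. The default of 100,000 rows per chunk is an arbitrary starting point, not a recommendation from any benchmark. Keep in mind that the combined DataFrame still has to fit in memory, so chunking pays off most when you filter or aggregate each chunk before concatenating.

```python
import glob
import os

import pandas as pd


def load_in_chunks(folder, chunksize=100_000):
    """Combine all CSVs in `folder`, reading each file `chunksize` rows at a time."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # With chunksize set, pd.read_csv returns an iterator of DataFrames
    # instead of loading the whole file at once.
    chunks = (
        chunk
        for path in paths
        for chunk in pd.read_csv(path, chunksize=chunksize)
    )
    # Note: pd.concat raises ValueError if no files matched.
    return pd.concat(chunks, ignore_index=True)
```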
The chunksize value (rows per chunk) can be tuned according to your system's memory.
Dust off those CSVs before Merging
Sometimes CSV files have to be massaged before they fit well together. Preprocess file data according to your needs:
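As one possible shape for this, the sketch below runs a cleanup function on each file before concatenating. The specific steps (normalizing header names and dropping fully empty rows) are illustrative assumptions; replace them with whatever your data needs.

```python
import glob
import os

import pandas as pd


def clean(df):
    """Hypothetical cleanup: normalize header names and drop fully empty rows."""
    df = df.rename(columns=lambda name: name.strip().lower())
    return df.dropna(how="all")


def load_cleaned(folder):
    """Clean each CSV in `folder` individually, then combine the results."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # Note: pd.concat raises ValueError if no files matched.
    return pd.concat((clean(pd.read_csv(path)) for path in paths), ignore_index=True)
```

Normalizing headers per file means "ID " in one file and "id" in another end up in the same column instead of producing two half-empty ones.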
What's next in the journey?
Let's look into the DataFrame Mirror
Use combined_df.describe() or combined_df.head() to glance at your beautiful DataFrame creation.
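For a quick feel of both calls, here is a sketch on a tiny stand-in DataFrame. The data is invented purely for the example.

```python
import pandas as pd

# Small stand-in for the combined DataFrame.
combined_df = pd.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})

# head() shows the first rows (five by default) as a sanity check.
print(combined_df.head())

# describe() reports count, mean, std, min, quartiles and max per numeric column.
summary = combined_df.describe()
print(summary)
```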
Having Trouble? Let's Debug
Here are some starting points to debug common issues:
- Mismatched columns: Ensure all CSVs share the same column structure.
- Encoding issues: Pass the encoding argument to pd.read_csv().
- File not found errors: Verify the file path and pattern.
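The checks in this list can be sketched as a small pre-flight helper. The function name and the utf-8 default are assumptions for illustration; use the encoding your files were actually written in.

```python
import glob

import pandas as pd


def find_column_mismatches(pattern, encoding="utf-8"):
    """Return {path: columns} for files whose header differs from the first file's."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files match pattern {pattern!r}")

    # nrows=0 reads only the header row, so this stays cheap even for large files.
    headers = {
        path: list(pd.read_csv(path, nrows=0, encoding=encoding).columns)
        for path in paths
    }
    reference = headers[paths[0]]
    return {path: cols for path, cols in headers.items() if cols != reference}
```

An empty result means every file shares the first file's header. A UnicodeDecodeError raised here is a hint to try a different encoding value.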
Strive for Efficiency
Try map or a list comprehension with pd.concat for efficient, tidy code. Also, remember the power of vectorized operations when you need to add new columns.
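A compact sketch of both tips. The price and quantity columns are hypothetical and only there to demonstrate a vectorized calculation.

```python
import glob
import os

import pandas as pd


def load_with_map(folder):
    """Combine all CSVs in `folder` by mapping pd.read_csv over the paths."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # Note: pd.concat raises ValueError if no files matched.
    return pd.concat(map(pd.read_csv, paths), ignore_index=True)


def add_total(df):
    """Add a `total` column in one vectorized step (hypothetical columns)."""
    # Column-wise multiplication operates on whole columns at once,
    # avoiding a Python-level loop over rows.
    return df.assign(total=df["price"] * df["quantity"])
```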