
Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

python
pandas
dataframe
excel
by Anton Shumikhin · Dec 9, 2024
TLDR

pd.read_excel() puts all the power in your hands. To fetch every worksheet of an Excel workbook as a dict of DataFrames, pass sheet_name=None.

import pandas as pd

# Abracadabra! All worksheets are in your hands now!
data = pd.read_excel('file.xlsx', sheet_name=None)

For specific sheets only, replace None with a list of the sheet names you want.

# Like a magician, just ask for Sheet1 and Sheet2
data = pd.read_excel('file.xlsx', sheet_name=['Sheet1', 'Sheet2'])
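To see the dict shape sheet_name=None returns, here's a minimal, self-contained sketch that writes a throwaway two-sheet workbook to a temp file and reads it back (it assumes the openpyxl engine is installed; the file name demo.xlsx is just an illustration):

```python
import os
import tempfile

import pandas as pd

# Write a throwaway two-sheet workbook (needs the openpyxl engine)
path = os.path.join(tempfile.mkdtemp(), 'demo.xlsx')
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({'a': [1, 2]}).to_excel(writer, sheet_name='Sheet1', index=False)
    pd.DataFrame({'a': [3, 4]}).to_excel(writer, sheet_name='Sheet2', index=False)

# sheet_name=None -> dict mapping sheet name -> DataFrame
data = pd.read_excel(path, sheet_name=None)
print(sorted(data))                   # ['Sheet1', 'Sheet2']
print(data['Sheet2']['a'].tolist())   # [3, 4]
```

Each dict value is an ordinary DataFrame, so everything below applies to it unchanged.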

Power of pd.ExcelFile for heavy workbooks

Worried about those monstrous Excel files? Use pd.ExcelFile and wave your magic wand: it reads the workbook once and then lets you parse any sheet you need.

xlsx = pd.ExcelFile('file.xlsx')

# For Sheet1, use parse and voila!
sheet1_df = xlsx.parse('Sheet1')

# Oh, you want Sheet2 as well? Done!
sheet2_df = xlsx.parse('Sheet2')

This spells efficiency and performance!

How to juggle sheet indices

Don't remember sheet names but know their positions? Use their indices.

# Ask ExcelFile for index 0 and there's your first sheet
sheet1_df = xlsx.parse(0)

Cover your back by checking all available sheet names:

# Names, please!
print(xlsx.sheet_names)

Handling large beasts (Big data!)

Some beasts are just too big to swallow whole, and you rarely need all that ferocity. Optimize your monster taming:

# Tame the beast by reading only Excel columns A, C and G
df = pd.read_excel('file.xlsx', usecols='A,C,G')

# Or tame it by skipping rows 1-19 (header row 0 is kept)
df = pd.read_excel('file.xlsx', skiprows=range(1, 20))

Note: a list such as usecols=['A', 'C', 'G'] is interpreted as column names; to pick columns by their Excel letters, use the string form 'A,C,G' (ranges like 'A:C' also work).

Unite and conquer (Data integration)

Unite multiple worksheets using pd.concat():

import pandas as pd

# Cupcakes and cheesecake TOGETHER, can you imagine?
combined_df = pd.concat([cupcakes_df, cheesecake_df], ignore_index=True)
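A runnable toy version, with the cupcakes/cheesecake frames made up for illustration:

```python
import pandas as pd

cupcakes_df = pd.DataFrame({'item': ['vanilla', 'choc'], 'price': [2, 3]})
cheesecake_df = pd.DataFrame({'item': ['classic'], 'price': [5]})

# Stack the rows; ignore_index renumbers the result 0..n-1
combined_df = pd.concat([cupcakes_df, cheesecake_df], ignore_index=True)
print(len(combined_df))                 # 3
print(combined_df.index.tolist())       # [0, 1, 2]
```

Without ignore_index=True the original row labels are kept, so the index would contain duplicates (0, 1, 0).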

For more complex unification, march under the flag of merge() and conquer the battlefield:

# Unite them under the flag of 'common_column'
result_df = pd.merge(cupcakes_df, cheesecake_df, on='common_column', how='outer')
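A sketch of what how='outer' actually does, with hypothetical frames sharing a common_column — keys from both sides survive, and cells with no match become NaN:

```python
import pandas as pd

cupcakes_df = pd.DataFrame({'common_column': [1, 2], 'flavor': ['vanilla', 'choc']})
cheesecake_df = pd.DataFrame({'common_column': [2, 3], 'topping': ['berry', 'caramel']})

# Outer join: union of keys from both frames, like a SQL FULL OUTER JOIN
result_df = pd.merge(cupcakes_df, cheesecake_df, on='common_column', how='outer')
print(result_df['common_column'].tolist())   # [1, 2, 3]
```

Key 1 has no cheesecake match, so its 'topping' cell is NaN; likewise key 3 gets NaN under 'flavor'.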

Advanced magic spells (Retrieval techniques)

Complex times require complex spells:

xlsx = pd.ExcelFile('file.xlsx')

# Find the year 2021 in the realm of sheet_names!
dfs = {sheet_name: xlsx.parse(sheet_name)
       for sheet_name in xlsx.sheet_names
       if '2021' in sheet_name}
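The filtering condition itself is plain Python, so you can dry-run it on a hypothetical list of sheet names before touching the workbook:

```python
# Pick only the sheets whose name mentions 2021
sheet_names = ['Sales 2020', 'Sales 2021', 'Summary 2021']
picked = [name for name in sheet_names if '2021' in name]
print(picked)   # ['Sales 2021', 'Summary 2021']
```

Swap the substring test for str.startswith(), a regex, or anything else that returns a bool.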

Keep track of sheet positions with enumerate() — and store the frames in a dict rather than conjuring variables into existence with vars(), which is fragile:

# Remember all sheets in your magical book, keyed by position
dfs_by_index = {index: xlsx.parse(name)
                for index, name in enumerate(xlsx.sheet_names)}

Potion against troubles and optimizations

Keep your pandas updated, only the fresh pandas have the latest magic:

pip install --upgrade pandas
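You can check which pandas you're running from inside Python:

```python
import pandas as pd

# The installed version as a string, e.g. '2.2.3'
print(pd.__version__)
```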

If memory spells trouble, chop your task into chunks. Note that pd.read_excel has no chunksize parameter (that spell belongs to pd.read_csv), but you can emulate chunking with nrows and skiprows:

# Eat your meat in chunks of 50,000 rows
chunk_size = 50000
chunks = []
start = 0
while True:
    # Keep header row 0, skip the rows already read
    chunk = pd.read_excel('file.xlsx',
                          skiprows=range(1, start + 1),
                          nrows=chunk_size)
    if chunk.empty:
        break
    chunks.append(chunk)
    start += chunk_size

# Keep the chunks in your vault, or merge them later
big_df = pd.concat(chunks, ignore_index=True)