Pandas read in table without headers

Tags: python, pandas, dataframe, csv
by Alex Kataev · Aug 14, 2024
TLDR

The quickest way to read a CSV file without headers in Pandas is pd.read_csv('file.csv', header=None). This makes pandas treat the first row as data rather than as column names, and label the columns 0, 1, 2, and so on. You can then assign column names using df.columns = ['name1', 'name2', ...].

import pandas as pd

df = pd.read_csv('file.csv', header=None)  # reading magic happening here
df.columns = ['name1', 'name2', 'name3']   # poof! you've got column names
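To see what header=None actually changes, here is a small sketch that reads the same text twice. The sample rows are made up, and io.StringIO stands in for a real file so the snippet runs anywhere.

```python
import io
import pandas as pd

# Hypothetical three-column file with no header row.
raw = "1,apple,0.5\n2,banana,0.25\n"

# header=None keeps every row as data and labels the columns 0, 1, 2.
with_header_none = pd.read_csv(io.StringIO(raw), header=None)

# Without it, pandas consumes the first row as the column names.
default_read = pd.read_csv(io.StringIO(raw))

print(list(with_header_none.columns))
print(len(with_header_none), "rows with header=None")
print(len(default_read), "row(s) with the default settings")
```

The second read silently loses a row of data, which is the bug header=None protects you from.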

Picking out columns using usecols

Sometimes, you might want to extract just a few columns from a large data file. usecols is the hero you need:

df = pd.read_csv('large_file.csv', header=None, usecols=[3, 6])  # cherry-picking columns
df.columns = ['fourth_column', 'seventh_column']                 # more magic. New names!

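This tells pandas to pick up only the 4th and 7th columns (positions in usecols are zero-based, so 3 and 6). Pandas skips the rest of the file's columns, which saves time and memory. Efficiency at its finest!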
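One detail worth checking for yourself: pandas ignores the order of the positions in usecols and returns the columns in file order. A minimal sketch, with an invented eight-column sample and io.StringIO standing in for the large file:

```python
import io
import pandas as pd

# Hypothetical 8-column file without a header row.
raw = "a0,a1,a2,a3,a4,a5,a6,a7\nb0,b1,b2,b3,b4,b5,b6,b7\n"

# The positions are deliberately listed out of order.
df = pd.read_csv(io.StringIO(raw), header=None, usecols=[6, 3])

kept = list(df.columns)  # labels are the original positions, in file order
df.columns = ['fourth_column', 'seventh_column']

print(kept)
print(df.iloc[0].tolist())
```

So when you rename after usecols, list the new names in file order, not in the order you typed the positions.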

Challenges beyond CSV files

Dealing with tab-separated values (TSV) or other non-comma-separated files? Use read_table. It is read_csv in disguise, with a tab as the default separator.

df = pd.read_table('data.tsv', header=None, sep='\t')

# For other delimiters (e.g. semi-colon), do this:
df = pd.read_csv('data.csv', header=None, sep=';')

Remember to set the sep parameter to match the file's delimiter.
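Here is a quick sketch of what happens when sep does and does not match the file. The two tiny samples are invented, and io.StringIO again plays the part of the files on disk.

```python
import io
import pandas as pd

# Hypothetical samples: the same two rows, tab-separated and semicolon-separated.
tsv = "1\tred\n2\tblue\n"
ssv = "1;red\n2;blue\n"

df_tab = pd.read_table(io.StringIO(tsv), header=None)           # tab is the default
df_semi = pd.read_csv(io.StringIO(ssv), header=None, sep=';')   # explicit delimiter

# Forgetting sep: read_csv looks for commas and finds none.
wrong = pd.read_csv(io.StringIO(ssv), header=None)

print(df_tab.shape, df_semi.shape, wrong.shape)
```

A DataFrame that comes back with a single column is usually the first hint that the delimiter is wrong.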

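Adding headers while reading the file

If you already know the column names, you don't have to assign them afterwards. Pass them through names and pandas applies them as it reads the file:

headers = ['id', 'value1', 'value2']
df = pd.read_csv('no_header_data.csv', header=None, names=headers)

names allows you to assign your own custom column names in the same call. Passing names without header also makes pandas treat the first row as data, but spelling out header=None keeps the intent obvious.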
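A small sketch to confirm how names behaves, using invented sample rows and io.StringIO in place of no_header_data.csv. It compares a read with header=None spelled out against one that passes names alone.

```python
import io
import pandas as pd

# Hypothetical header-less data.
raw = "1,0.5,0.7\n2,0.1,0.9\n"
headers = ['id', 'value1', 'value2']

explicit = pd.read_csv(io.StringIO(raw), header=None, names=headers)
implicit = pd.read_csv(io.StringIO(raw), names=headers)  # header left at its default

print(list(explicit.columns))
print(len(explicit), "rows kept")
print(explicit.equals(implicit))
```

Both reads keep the first row as data; the explicit form is simply easier for the next reader to understand.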

Avoid cloning in headers

Imagine you're naming your newborn twins. It's tempting to give them identical names, but it leads to confusion. Duplicate headers cause the same trouble: pandas accepts them, but df['data'] then returns both columns instead of one.

# Don't do this:
df.columns = ['id', 'data', 'data']

# This is better:
df.columns = ['id', 'data1', 'data2']

Avoid duplicates and keep your sanity!
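If you want to see the confusion first-hand, this sketch shows what selecting a duplicated name gives you. The data is made up and read from an in-memory string.

```python
import io
import pandas as pd

# Hypothetical header-less data with three columns.
raw = "1,10,20\n2,30,40\n"
df = pd.read_csv(io.StringIO(raw), header=None)

df.columns = ['id', 'data', 'data']      # pandas allows this assignment...
dup_selection = df['data']               # ...but the lookup matches two columns

df.columns = ['id', 'data1', 'data2']    # unique names
unique_selection = df['data1']           # a single column

print(type(dup_selection).__name__, dup_selection.shape)
print(type(unique_selection).__name__, df.columns.is_unique)
```

df.columns.is_unique is a cheap check to run right after assigning names.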

Indexing in the wake of non-existent headers

You can still be the master of organization and index your data:

df = pd.read_csv('file.csv', header=None, index_col=0)

By setting index_col=0, you're making the first column the DataFrame's index. Who needs headers, anyway?
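A minimal sketch of the result, with invented rows read from an in-memory string. It shows that the remaining columns keep their original position labels rather than being renumbered from zero.

```python
import io
import pandas as pd

# Hypothetical header-less data: a key followed by two values.
raw = "a,1,2\nb,3,4\n"

df = pd.read_csv(io.StringIO(raw), header=None, index_col=0)

print(list(df.index))     # values of the first column
print(list(df.columns))   # remaining columns keep their labels 1 and 2
print(df.loc['b', 2])     # row label 'b', column label 2
```

Because the labels start at 1 here, renaming the columns after the read is often the clearer choice.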

Mixed and missing data types

Every data story has its dark corners. Dirty data. Missing values. Unwelcome types. But, as Python's last line of defense, you're prepared:

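df = pd.read_csv('file.csv', header=None, dtype={0: 'Int64', 1: 'float'}, na_values=['NA'])

Explicit is better than implicit. dtype saves you from unexpected type conversions, and na_values fights off the missing data monsters! Note the capitalized 'Int64': it is pandas' nullable integer type, which can hold missing values, whereas plain 'int' makes read_csv raise an error if that column contains any. 'NA' is already recognized as missing by default, so na_values is mostly where you list the extra markers your file uses.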
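Here is a sketch of dtype and na_values working together, assuming a made-up file where one missing integer is written as the word "missing" and one missing float as NA. The marker "missing" is purely illustrative.

```python
import io
import pandas as pd

# Hypothetical header-less data with one gap in each column.
raw = "1,2.5\nmissing,3.0\n3,NA\n"

df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    dtype={0: 'Int64', 1: 'float'},   # nullable integer, plain float
    na_values=['missing'],            # extra marker on top of the defaults
)

print(df.dtypes)
print(df.isna().sum().tolist())       # missing values per column
```

Column 0 stays an integer column despite the gap, which a plain 'int' dtype could not do.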