Pandas read in table without headers

by Alex Kataev · Aug 14, 2024

The quickest way to read a CSV file without headers in pandas is pd.read_csv('file.csv', header=None). This treats the first row as data rather than column names and labels the columns 0, 1, 2, and so on. You can then assign column names using df.columns = ['name1', 'name2', ...].

import pandas as pd
df = pd.read_csv('file.csv', header=None) # reading magic happening here
df.columns = ['name1', 'name2', 'name3'] # poof! you've got column names
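
To see why header=None matters, here is a small sketch that reads the same headerless text twice. The sample rows and the column names are made up for illustration, and io.StringIO stands in for a real file on disk.

```python
import io
import pandas as pd

# Hypothetical headerless data standing in for 'file.csv'
raw = "1,apple,0.5\n2,banana,0.25\n"

# Without header=None, the first data row is consumed as column names
wrong = pd.read_csv(io.StringIO(raw))

# With header=None, every row is data and columns get integer labels
df = pd.read_csv(io.StringIO(raw), header=None)
default_labels = list(df.columns)

# Replace the integer labels with readable (made-up) names
df.columns = ['id', 'fruit', 'price']
```

In the first read, one row of data silently disappears into the header, which is the mistake header=None prevents.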

Picking out columns using usecols

Sometimes, you might want to extract just a few columns from a large data file. usecols is the hero you need:

df = pd.read_csv('large_file.csv', header=None, usecols=[3, 6]) # cherry-picking columns
df.columns = ['fourth_column', 'seventh_column'] # more magic. New names!

This tells pandas to load only the 4th and 7th columns (usecols positions are zero-based, so 3 and 6). On wide files that saves both memory and parsing time.
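
A quick sketch of what usecols leaves behind, using a made-up seven-column sample in place of a large file. Note that the kept columns retain their original integer labels until you rename them.

```python
import io
import pandas as pd

# Hypothetical seven-column headerless data
raw = "a,b,c,d,e,f,g\nh,i,j,k,l,m,n\n"

df = pd.read_csv(io.StringIO(raw), header=None, usecols=[3, 6])
kept_labels = list(df.columns)  # original positions, not 0 and 1

df.columns = ['fourth_column', 'seventh_column']

# The order of entries in usecols is ignored; columns come back in file order
reordered = pd.read_csv(io.StringIO(raw), header=None, usecols=[6, 3])
reordered_labels = list(reordered.columns)
```

If you need the columns in a different order, reorder them after reading, for example with df[['seventh_column', 'fourth_column']].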

Challenges beyond CSV files

Dealing with tab-separated values (TSV) or other non-comma-separated files? Use read_table. It is read_csv in disguise, with a tab as the default delimiter.

df = pd.read_table('data.tsv', header=None, sep='\t') # sep='\t' is already the default here
# For other delimiters (e.g. a semicolon), do this:
df = pd.read_csv('data.csv', header=None, sep=';')

Remember to set the sep parameter to match the file's delimiter.
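
As a rough illustration, the sketch below parses the same made-up rows twice, once tab-separated and once semicolon-separated, and ends up with identical DataFrames. The sample strings are assumptions standing in for real files.

```python
import io
import pandas as pd

# The same hypothetical rows with two different delimiters
tsv = "1\tx\t3.5\n2\ty\t4.0\n"
semi = "1;x;3.5\n2;y;4.0\n"

# read_table splits on tabs unless told otherwise
df_tab = pd.read_table(io.StringIO(tsv), header=None)

# read_csv splits on commas unless sep says otherwise
df_semi = pd.read_csv(io.StringIO(semi), header=None, sep=';')

same_result = df_tab.equals(df_semi)
```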

Adding headers while reading the file

If you already know the column names, you don't need a separate df.columns step. Pass them through names and pandas applies them as it reads:

headers = ['id', 'value1', 'value2']
df = pd.read_csv('no_header_data.csv', header=None, names=headers)

names lets you assign your own custom column names at read time. Keeping header=None alongside it makes it explicit that the first row is data, not a header.
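
One detail worth checking for yourself: when names is supplied and header is left out, pandas behaves as if header=None had been passed. A minimal sketch with made-up rows:

```python
import io
import pandas as pd

# Hypothetical headerless data standing in for 'no_header_data.csv'
raw = "1,10,100\n2,20,200\n"
headers = ['id', 'value1', 'value2']

# Spelled out: no header row, use these names
explicit = pd.read_csv(io.StringIO(raw), header=None, names=headers)

# Left implicit: passing names alone has the same effect
implicit = pd.read_csv(io.StringIO(raw), names=headers)

same_result = explicit.equals(implicit)
```

Spelling out header=None is still a reasonable habit, since it tells the next reader that the file really has no header row.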

Avoid cloning in headers

Imagine you're naming your newborn twins. It's tempting to give them identical names, but it can lead to confusion, similar to header duplicates:

# Don't do this:
df.columns = ['id', 'data', 'data']
# This is better:
df.columns = ['id', 'data1', 'data2']

Avoid duplicates and keep your sanity! With repeated labels, df['data'] returns every matching column instead of a single Series.
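
Here is a small sketch of what goes wrong with duplicate names, using made-up rows. It assumes a reasonably recent pandas version, where read_csv rejects duplicates in names with a ValueError; older releases may behave differently.

```python
import io
import pandas as pd

# Hypothetical headerless data
raw = "1,10,100\n2,20,200\n"

df = pd.read_csv(io.StringIO(raw), header=None)
df.columns = ['id', 'data', 'data']  # allowed, but ambiguous

# Selecting the duplicated label returns both columns as a DataFrame
selected = df['data']
labels_unique = df.columns.is_unique

# Passing the same duplicates through names is refused outright
try:
    pd.read_csv(io.StringIO(raw), header=None, names=['id', 'data', 'data'])
    rejected = False
except ValueError:
    rejected = True
```

Checking df.columns.is_unique right after renaming is a cheap way to catch this early.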

Indexing in the wake of non-existent headers

You can still be the master of organization and index your data:

df = pd.read_csv('file.csv', header=None, index_col=0)

By setting index_col=0, you're making the first column the DataFrame's index. Who needs headers, anyway?
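
The following sketch shows index_col on a headerless sample, first by position and then by name in combination with names. The rows and the names key, x, and y are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical headerless data whose first column is a row key
raw = "a,1,2\nb,3,4\n"

# By position: column 0 becomes the index, the rest keep labels 1 and 2
df = pd.read_csv(io.StringIO(raw), header=None, index_col=0)
remaining_labels = list(df.columns)
value = df.loc['b', 2]

# By name: give the columns names first, then index on one of them
named = pd.read_csv(io.StringIO(raw), header=None,
                    names=['key', 'x', 'y'], index_col='key')
named_value = named.loc['b', 'y']
```

Indexing by name tends to read better later on, since named.loc['b', 'y'] says more than df.loc['b', 2].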

Mixed and missing data types

Every data story has its dark corners. Dirty data. Missing values. Unwelcome types. You can prepare for them by telling pandas what to expect:

df = pd.read_csv('file.csv', header=None, dtype={0: 'int', 1: 'float'}, na_values=['NA'])

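Explicit is better than implicit. dtype saves you from unexpected type conversions, and na_values lists extra strings to treat as missing (pandas already recognizes 'NA' by default, so add your file's own markers here). One caveat: a plain 'int' column cannot hold missing values, so use the nullable 'Int64' dtype if that column may contain them.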
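
To make the integer caveat concrete, here is a minimal sketch with made-up rows where both columns contain a missing marker. It assumes a pandas version that supports the nullable 'Int64' dtype in read_csv.

```python
import io
import pandas as pd

# Hypothetical headerless data with an 'NA' marker in each column
raw = "1,0.5\nNA,1.5\n3,NA\n"

# A plain int dtype cannot represent the missing value in column 0
try:
    pd.read_csv(io.StringIO(raw), header=None, dtype={0: 'int'})
    int_failed = False
except ValueError:
    int_failed = True

# The nullable 'Int64' dtype keeps integers and still allows missing values
df = pd.read_csv(io.StringIO(raw), header=None,
                 dtype={0: 'Int64', 1: 'float'}, na_values=['NA'])
missing_in_first = df[0].isna().tolist()
missing_in_second = int(df[1].isna().sum())
```

The dtype keys are the integer column labels 0 and 1 because the file has no header; with names, you would key the dict by column name instead.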
