
How to take column-slices of dataframe in pandas

python
pandas
dataframe
slicing
by Nikita Barsukov · Aug 12, 2024
TLDR

To slice columns in a pandas DataFrame, use loc with column names or iloc with column indices. For label-based slicing:

column_slice = df.loc[:, 'B':'C'] # I'm loc-ing in on you, columns B to C!

For position-based slicing:

column_slice = df.iloc[:, 1:3] # Columns, stay still. iloc-ing you down by your index!

Both methods return columns B and C from the DataFrame df, assuming B and C are its second and third columns.
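Here is a minimal, self-contained sketch of the two snippets above. The sample DataFrame and its column labels A to D are invented for illustration:

```python
import pandas as pd

# Hypothetical sample data; the labels A..D are an assumption for this sketch.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [10, 11, 12],
})

by_label = df.loc[:, "B":"C"]   # label-based: both end labels are included
by_position = df.iloc[:, 1:3]   # position-based: stop position 3 is excluded

print(list(by_label.columns))
print(by_label.equals(by_position))
```

The two results match only because B and C happen to sit at positions 1 and 2 in this frame.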

Selecting exactly what you need

When you don't need consecutive columns, select non-adjacent columns using a list:

selected_columns = df.loc[:, ['A', 'C', 'F']] # A lifetime supply of vitamin A, C, and F!

This selects only the listed columns, in the order you list them, with the precision of a trained sniper!
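A small sketch of list-based selection, assuming a made-up one-row DataFrame with columns A to F. It also shows that the list controls the output order and that a label missing from the frame raises a KeyError in current pandas:

```python
import pandas as pd

# Hypothetical one-row frame with columns A..F (illustrative data only).
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=list("ABCDEF"))

selected = df.loc[:, ["A", "C", "F"]]   # non-adjacent columns
shorthand = df[["A", "C", "F"]]         # equivalent shorthand for columns
reordered = df.loc[:, ["F", "A"]]       # the list order becomes the column order

# A label that is not in the frame is an error, not a silent skip.
try:
    df.loc[:, ["A", "Z"]]
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

print(list(selected.columns), list(reordered.columns), missing_label_raised)
```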

Slicing: inclusive vs. exclusive

Python slices exclude the stop index, and position-based .iloc follows the same rule. Label-based .loc slices are the exception: they include both the start and the end label:

# Standard Python list slicing (exclusive):
python_slice = myList[0:2]  # Python says 'no soup for you' to the last index!

# Pandas DataFrame slicing with .loc (inclusive):
pandas_slice = df.loc[:, 'B':'C']  # Pandas is more generous, 'B' and 'C', you both are invited!
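A quick way to check the endpoint behaviour yourself is to compare a plain list slice, a .loc slice, and an .iloc slice side by side. The list and DataFrame below are illustrative assumptions, not data from the article:

```python
import pandas as pd

# Hypothetical data for comparing endpoint handling.
my_list = ["a", "b", "c", "d"]
df = pd.DataFrame([[1, 2, 3, 4]], columns=list("ABCD"))

list_slice = my_list[0:2]                      # stop index 2 is excluded
loc_cols = list(df.loc[:, "A":"C"].columns)    # end label "C" is included
iloc_cols = list(df.iloc[:, 0:2].columns)      # stop position 2 is excluded

print(list_slice)   # ['a', 'b']
print(loc_cols)     # ['A', 'B', 'C']
print(iloc_cols)    # ['A', 'B']
```

Only the label-based .loc slice keeps its end point; .iloc behaves like ordinary Python slicing.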

The fossils of deprecated features

Always keep your code young and vibrant! The old .ix indexer was deprecated in pandas 0.20 and removed entirely in pandas 1.0, so it no longer works in current versions. Stick to .loc and .iloc to avoid historical artifacts in your code.
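If you inherit code that mixed positions and labels through .ix, one possible translation looks like the sketch below. The DataFrame, its index labels, and the example lookup are assumptions made up for illustration:

```python
import pandas as pd

# Hypothetical frame with string row labels (illustrative data only).
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=["x", "y", "z"],
)

# Goal: first two rows by position, columns B..C by label,
# the kind of mixed lookup that .ix used to accept.

# Option 1: turn the positions into labels, then use .loc.
via_loc = df.loc[df.index[0:2], "B":"C"]

# Option 2: turn the labels into positions, then use .iloc.
via_iloc = df.iloc[0:2, df.columns.get_indexer(["B", "C"])]

print(via_loc.equals(via_iloc))
```

Either option makes it explicit which axis is addressed by label and which by position.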

Slice of Life

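Don't fear the built-in slice object. It's what the 'B':'C' colon syntax creates behind the scenes, and it's the backbone of the sweet syntactic sugar that makes pandas slicing so readable. You can also build one explicitly and pass it to .loc: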

# Slice object explicitly used
df.loc[:, slice('B', 'C')]  # Slice of life looks like this!
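One reason to reach for an explicit slice object is that it can be stored in a variable and reused. A small sketch, assuming a made-up DataFrame with columns A to D:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=list("ABCD"))

middle = slice("B", "C")          # same meaning as 'B':'C' inside the brackets
explicit = df.loc[:, middle]
sugar = df.loc[:, "B":"C"]

everything = df.loc[:, slice(None)]   # slice(None) is the bare ':'

print(explicit.equals(sugar))
print(everything.equals(df))
```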

Be flexible with reindexing

Reorder your life, as well as your DataFrame columns. Use reindexing to follow a new order that makes more sense:

new_order = ['Profits', 'Sales']
reindexed_df = df.reindex(columns=new_order)  # Because it's never too late to prioritise profits!

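Our new DataFrame now puts Profits first. Keep in mind that reindex returns only the columns you list: any column missing from new_order is dropped, and any listed label that doesn't exist in df shows up as a column of NaN.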
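The sketch below shows what reindex does to columns you leave out or misname, plus one way to move a column to the front while keeping the rest. The Sales, Costs, and Profits figures and the Margin label are invented for illustration:

```python
import pandas as pd

# Hypothetical figures; only the column names matter here.
df = pd.DataFrame({"Sales": [100, 200], "Costs": [60, 150], "Profits": [40, 50]})

# Columns not listed (Costs) are dropped from the result.
reindexed = df.reindex(columns=["Profits", "Sales"])

# A label that does not exist (Margin) becomes an all-NaN column.
with_missing = df.reindex(columns=["Profits", "Sales", "Margin"])

# To reorder without losing anything, build the full column list.
rest = [col for col in df.columns if col != "Profits"]
profits_first = df[["Profits"] + rest]

print(list(reindexed.columns))
print(with_missing["Margin"].isna().all())
print(list(profits_first.columns))
```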

Know your extraction game

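With large datasets, getting the data you need is like finding a needle in a haystack. When you know where the columns sit rather than what they are called, use iloc with a list of integer positions to make sure you're targeting the right haystack: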

# Fetching columns by their integer position
data_subset = df.iloc[:, [0, 2, 5]]  # Looking for needles at position 0, 2, and 5!
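A short sketch of positional selection and its edge cases, assuming a hypothetical six-column DataFrame. A list containing an out-of-range position raises an IndexError, while an out-of-range slice is quietly clipped:

```python
import pandas as pd

# Hypothetical one-row frame with six columns.
df = pd.DataFrame([[10, 20, 30, 40, 50, 60]], columns=list("ABCDEF"))

subset = df.iloc[:, [0, 2, 5]]   # positions 0, 2 and 5
last_two = df.iloc[:, -2:]       # negative positions count from the end

# Position 9 does not exist, so a list lookup fails loudly.
try:
    df.iloc[:, [0, 9]]
    out_of_range_raised = False
except IndexError:
    out_of_range_raised = True

# A slice that runs past the end just stops at the last column.
clipped = df.iloc[:, 4:10]

print(list(subset.columns), list(last_two.columns), list(clipped.columns))
```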

Slices without slippages

Tread carefully while slicing. A typo or index error can ruin your day. To avoid those, use the columns attribute of your dataframe:

# Using the DataFrame's columns to avoid errors
df.loc[:, df.columns[1:4]]  # Column names? Naah, real pandas live on the edge with wild positions!

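This approach spares you from typing column names by hand, so there is nothing to misspell, and it picks up whatever labels currently sit at positions 1 to 3. It is still position-based, though: if columns are added, removed, or reordered, the same slice will select different data without complaining.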
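To see both the convenience and the catch, here is a minimal sketch with an invented five-column DataFrame. Renaming a column is handled automatically, but inserting one shifts what the slice picks up:

```python
import pandas as pd

# Hypothetical column names and values, for illustration only.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=["id", "sales", "costs", "profits", "region"],
)

via_columns = df.loc[:, df.columns[1:4]]
via_iloc = df.iloc[:, 1:4]           # same result when labels are unique

# A rename is picked up without touching the slicing code.
renamed = df.rename(columns={"sales": "revenue"})
after_rename = renamed.loc[:, renamed.columns[1:4]]

# Inserting a column at position 1 shifts everything to the right.
shifted = df.copy()
shifted.insert(1, "year", 2024)
after_insert = shifted.loc[:, shifted.columns[1:4]]

print(list(via_columns.columns))
print(list(after_rename.columns))
print(list(after_insert.columns))
```

After the insert, the slice returns year, sales, and costs, and profits silently drops out.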