
What does axis in pandas mean?

by Nikita Barsukov · Dec 12, 2024
TLDR

In pandas, the axis parameter names the direction an operation travels. axis=0 (the default for reductions such as sum() and mean()) moves down the rows, so you get one result per column: a column-wise aggregation. axis=1 moves sideways across the columns, so you get one result per row: a row-wise aggregation.

    # Sum columns: it's as simple as falling down (axis=0)
    column_sum = df.sum()  # Beware, axis=0 is the default, no joker card needed

    # Sum rows: the values SUM up sideways (axis=1)
    row_sum = df.sum(axis=1)  # So, you've decided to row, row, row your boat

This leaves you with column_sum, a Series holding one total per column (labelled by column name), and row_sum, a Series holding one total per row (labelled by the row index).
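To make the two directions concrete, here is a minimal sketch on a tiny DataFrame. The data and the labels x, y, z are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [10, 20, 30]},
    index=["x", "y", "z"],
)

# axis=0 (default): collapse the rows, keep one total per column
column_sum = df.sum()        # A -> 6, B -> 60

# axis=1: collapse the columns, keep one total per row
row_sum = df.sum(axis=1)     # x -> 11, y -> 22, z -> 33

print(column_sum)
print(row_sum)
```

A handy way to remember it: the axis you pass is the one that disappears from the result.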

Exploring with 'index' and 'columns'

When setting the axis parameter, you can use labels instead of numbers: axis='index' is the same as axis=0, and axis='columns' is the same as axis=1. The label names the axis that gets collapsed, so axis='index' yields one mean per column and axis='columns' yields one mean per row:

    # Mean down the index (same as axis=0):
    mean_index = df.mean(axis='index')  # No axes to grind here, just indexing

    # Mean across columns (same as axis=1):
    mean_columns = df.mean(axis='columns')  # Columns? More like colum-bros, they stick together!
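If you want to convince yourself that the string labels and the integers are interchangeable, a quick check like the following should do. The DataFrame is a made-up example.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 6.0, 8.0]})

mean_index = df.mean(axis="index")      # one mean per column: A -> 2.0, B -> 6.0
mean_columns = df.mean(axis="columns")  # one mean per row: 2.5, 4.0, 5.5

# The string labels give the same results as the integer codes
same_down = mean_index.equals(df.mean(axis=0))
same_across = mean_columns.equals(df.mean(axis=1))
print(same_down, same_across)
```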

A deeper dive into advanced usage and pitfalls

Targeting specified rows or columns

Precision is crucial for a pythonista, so let's get precise. Select the rows or columns you care about with df.loc[] first, then pass axis to the reduction to compute the mean over just that selection:

    # Calculate mean for specific columns: 'A' and 'B' are more than just letters.
    specific_mean = df.loc[:, ['A', 'B']].mean(axis=0)

    # Calculate mean for a specific row: rows aren't always trouble, sometimes they're useful
    # (a single row comes back as a Series, so no axis argument is needed)
    specific_row_mean = df.loc['row_label'].mean()
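Here is a small sketch of the same idea with data filled in. The column names, row labels, and values are assumptions chosen so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [3, 4, 5], "C": [2, 6, 10]},
    index=["r1", "r2", "r3"],
)

# Two columns, reduced down the rows: A -> 2.0, B -> 4.0
col_means = df.loc[:, ["A", "B"]].mean(axis=0)

# One row label returns a Series, so mean() yields a single number: (1 + 3 + 2) / 3
one_row_mean = df.loc["r1"].mean()

# A list of row labels returns a DataFrame, so axis=1 gives one mean per selected row
two_row_means = df.loc[["r1", "r2"]].mean(axis=1)   # r1 -> 2.0, r2 -> 4.0
```

Note the difference between df.loc['r1'] (a Series, one axis) and df.loc[['r1']] (a one-row DataFrame, two axes).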

The domino effect... set_index changes

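set_index is like a magic spell: it moves a column into the index, so that column is no longer among the columns being aggregated, and any per-row result is now labelled by the new index. The meaning of axis itself does not change. This comes into play especially when chaining methods: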

    # Changing the index: the gamechanger move
    df_with_new_index = df.set_index('new_column')
    mean_with_new_index = df_with_new_index.mean(axis='index')  # New index, who dis?
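The following sketch shows what actually changes. The column names id and score are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: 'id' is an identifier, 'score' is the value of interest
df = pd.DataFrame({"id": [101, 102, 103], "score": [1.0, 2.0, 3.0]})

# Before set_index, 'id' is an ordinary column and gets averaged too
before = df.mean(axis="index")          # id -> 102.0, score -> 2.0

# After set_index, 'id' lives in the index and drops out of the aggregation
indexed = df.set_index("id")
after = indexed.mean(axis="index")      # score -> 2.0

# Row-wise results are now labelled by the new index
per_row = indexed.mean(axis="columns")  # index: 101, 102, 103
```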

The art of data concatenation

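In the art of concatenation with pd.concat(), the axis parameter plays a vital role. It determines the direction of concatenation, and therefore the shape of your result: axis='index' (the default) stacks the frames on top of each other, while axis='columns' places them side by side: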

    # Concatenate along columns (you get wider...um... yes, more columns):
    concatenated_columns = pd.concat([df1, df2], axis='columns')

    # Concatenate along index (you get taller...um... yes, more rows):
    concatenated_rows = pd.concat([df1, df2], axis='index')

Remember: concatenation aligns the labels on the other axis, and by default any label missing from one of the inputs is filled with NaN.
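A minimal sketch of both directions, using two made-up frames whose labels only partly overlap:

```python
import pandas as pd

# Hypothetical frames: different column names, partly overlapping row labels
df1 = pd.DataFrame({"A": [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({"B": [3, 4]}, index=[1, 2])

# Taller: rows are appended, columns are aligned (A and B both appear)
tall = pd.concat([df1, df2], axis="index")

# Wider: columns are appended, rows are aligned on the index labels 0, 1, 2
wide = pd.concat([df1, df2], axis="columns")

print(tall.shape)   # 4 rows, 2 columns
print(wide.shape)   # 3 rows, 2 columns
print(wide)         # only row 1 has values in both A and B
```

Both results contain NaN wherever one input had no value for a given label, which is why the integer columns come back as floats.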

Perspective is key: Try transposition

Changing perspective by transposing the DataFrame sometimes helps you and your data understand each other a little better. df.T swaps the two axes, so rows become columns and columns become rows:

    # Transposition, a.k.a. the switch:
    transposed_df = df.T  # Did rows becoming cols blow your mind? Don't worry, it's just an axis illusion!
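One way to see the "axis illusion" is to check that a row-wise sum on the original equals a column-wise sum on the transpose. The data below is a made-up example.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]}, index=["x", "y", "z"])

transposed_df = df.T                      # shape goes from (3, 2) to (2, 3)

row_sum = df.sum(axis=1)                  # sideways on the original
via_transpose = transposed_df.sum(axis=0) # downwards on the transpose

# Same labels, same values
print((row_sum == via_transpose).all())
```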

Biases, axis, and pandas flexibility

Numpy: More than just a relative

Where there's pandas, there's NumPy, and pandas takes its axis numbering from NumPy. The defaults differ, though. For NumPy reductions, axis=None (the default) applies the operation to the entire array, that is, to every value at once, whereas DataFrame reductions default to axis=0 and return one value per column:

    # Mean over the entire array in NumPy:
    numpy_array = df.to_numpy()
    np_mean = numpy_array.mean(axis=None)

    # Equivalent operation starting from pandas (it's as flat as a pancake here):
    flat_mean = df.to_numpy().flatten().mean()
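The difference in defaults is easy to trip over, so here is a short side-by-side sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 6.0]})
arr = df.to_numpy()

# NumPy default (axis=None): a single number over all four values
overall = arr.mean()              # (1 + 2 + 3 + 6) / 4 = 3.0

# NumPy with an explicit axis matches the pandas directions
down = arr.mean(axis=0)           # [1.5, 4.5], one value per column
across = arr.mean(axis=1)         # [2.0, 4.0], one value per row

# pandas default (axis=0): one value per column, not a single number
pandas_default = df.mean()        # A -> 1.5, B -> 4.5
```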

'Array' is the way

In NumPy, general n-dimensional arrays are preferred over the strictly two-dimensional np.matrix because they are more flexible. pandas follows the same idea: select a single column from a DataFrame and you get a one-dimensional Series, which has only one axis (axis 0):

    # Here's why arrays are popular:
    array_like = df.iloc[:, 0]  # Just a Series enjoying the single life
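A quick sketch of what "only one axis" means in practice, on made-up data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

first_col = df.iloc[:, 0]   # a Series: one dimension, one axis

print(df.ndim)              # 2
print(first_col.ndim)       # 1

# Reducing a Series needs no axis argument; axis=0 is the only valid choice
total = first_col.sum()     # 1 + 2 + 3 = 6

# Asking a Series for axis=1 fails, because that axis does not exist
try:
    first_col.sum(axis=1)
    axis1_allowed = True
except ValueError:
    axis1_allowed = False
```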

The data science beginner’s guide

Axis is the guidepost in the roadmap of Python data science. Whether you are reducing with mean() and sum(), or reshaping with stack() and unstack() (which move labels from one axis to the other), without a clear picture of the two axes you'd be lost in the pandas forest.
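As a parting sketch, here is how stack() and unstack() shuffle labels between the two axes. The data is made up, and newer pandas releases may print a FutureWarning about the stack() implementation, which does not affect this example.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# stack(): column labels move into the innermost level of the row index
stacked = df.stack()            # Series indexed by (row label, column label)
value_x_b = stacked.loc[("x", "B")]   # the value that sat at row 'x', column 'B'

# unstack(): the innermost row level moves back to the columns
restored = stacked.unstack()

print(stacked)
print(restored)
```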