
How to sort a DataFrame in python pandas by two or more columns?

python
dataframe
sort-values
pandas
By Nikita Barsukov · Oct 28, 2024
TLDR

The pandas DataFrame.sort_values method sorts by multiple columns. Pass the column names as a list to by and a matching list of booleans to ascending (True for ascending, False for descending):

import pandas as pd

# Assuming 'df' contains your data
sorted_df = df.sort_values(by=['f1', 'f2'], ascending=[True, False])

This one-liner sorts df by f1 in ascending order and, within equal f1 values, by f2 in descending order. The sections below cover mixed data types and larger data sets.

The ins and outs of DataFrame sorting

Large data handling

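For large numeric data sets, numpy.lexsort can be faster than sort_values. Keep two things in mind: it takes its keys in reverse priority order (the last key is the primary sort key), and it always sorts ascending, so a descending sort requires negating the key, which only works for numeric data.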
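As a rough sketch of the lexsort approach, the example below sorts a small, made-up DataFrame by f1 ascending and f2 descending, then builds the same ordering with sort_values for comparison. The sample values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'f1' and 'f2' follow the column names used in the TLDR.
df = pd.DataFrame({"f1": [2, 1, 2, 1], "f2": [10, 30, 20, 40]})

# np.lexsort treats the LAST key as the primary key, so keys are listed
# in reverse priority order. Negating f2 makes that key descending.
order = np.lexsort((-df["f2"].to_numpy(), df["f1"].to_numpy()))
lexsorted = df.iloc[order]

# The same ordering expressed with sort_values, for comparison.
expected = df.sort_values(by=["f1", "f2"], ascending=[True, False])

print(lexsorted)
```

Whether this is actually faster depends on the data and the pandas version, so it is worth timing both on your own data.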

Data uniformity

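Inconsistent data types can produce surprising sort results: numbers stored as strings sort character by character, and a column that mixes strings and numbers can raise a TypeError. Check df.dtypes and convert columns to a consistent type before sorting.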
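A minimal sketch of the problem, assuming a column whose numbers were loaded as text. The data is made up; pd.to_numeric is one way to make the type consistent.

```python
import pandas as pd

# Hypothetical column where numbers were loaded as strings.
df = pd.DataFrame({"f1": ["10", "9", "100"]})
print(df.dtypes)  # f1 has a text dtype here rather than a numeric one

# As strings, the values sort character by character.
as_text = df.sort_values(by="f1")["f1"].tolist()

# Convert first, then sort numerically.
df["f1"] = pd.to_numeric(df["f1"])
as_numbers = df.sort_values(by="f1")["f1"].tolist()

print(as_text)     # ['10', '100', '9']
print(as_numbers)  # [9, 10, 100]
```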

In-place sorting

Set the inplace parameter to True to sort the DataFrame itself instead of getting a sorted copy back. The call then returns None, so don't assign its result:

# Just like grandma's saying: "Neatness saves time, dirt wastes it."
df.sort_values(by=['f1', 'f2'], ascending=[True, False], inplace=True)
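To make the behaviour of inplace=True concrete, here is a small sketch on made-up data. It shows that the call returns None and that the original row labels move with their rows.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"f1": [2, 1, 2], "f2": [1, 5, 3]})

# With inplace=True the call returns None and df itself is reordered.
result = df.sort_values(by=["f1", "f2"], ascending=[True, False], inplace=True)

# The original row labels travel with the rows; reset them if a clean
# 0..n-1 index is wanted.
labels_after_sort = df.index.tolist()
df.reset_index(drop=True, inplace=True)

print(result)             # None
print(labels_after_sort)  # [1, 2, 0]
```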

Measuring performance

Keep performance in check with %timeit, the IPython/Jupyter magic command, to compare the execution time of different methods on your own data.
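Outside a notebook, the standard-library timeit module gives a comparable measurement. The sketch below times sort_values against numpy.lexsort on randomly generated data; the row count, value range, and function names are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 100,000 rows of random integers.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f1": rng.integers(0, 100, size=100_000),
    "f2": rng.integers(0, 100, size=100_000),
})

def with_sort_values():
    return df.sort_values(by=["f1", "f2"])

def with_lexsort():
    # Keys go in reverse priority order: f1 (primary) is listed last.
    order = np.lexsort((df["f2"].to_numpy(), df["f1"].to_numpy()))
    return df.iloc[order]

# Total seconds for 5 runs of each approach; results vary by machine.
t_sort_values = timeit.timeit(with_sort_values, number=5)
t_lexsort = timeit.timeit(with_lexsort, number=5)
print(f"sort_values: {t_sort_values:.4f}s  lexsort: {t_lexsort:.4f}s")
```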

Multi-indexed sorting

For data with a hierarchical (multi-level) index, build the index with set_index and then sort it with sort_index. sort_values also accepts index level names in its by argument.
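One possible shape for this, on made-up data: f1 and f2 become the two index levels, and sort_index orders the first level ascending and the second descending. The 'value' column is a placeholder added for the example.

```python
import pandas as pd

# Hypothetical sample data; 'value' is a placeholder payload column.
df = pd.DataFrame({
    "f1": ["b", "a", "b", "a"],
    "f2": [2, 1, 1, 2],
    "value": [10, 20, 30, 40],
})

# Move f1 and f2 into a two-level index, then sort on those levels.
indexed = df.set_index(["f1", "f2"])
by_index = indexed.sort_index(level=["f1", "f2"], ascending=[True, False])

print(by_index)
```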

Complex sorting

For criteria that a single sort_values call cannot express, split the DataFrame into parts, sort each part by its own criteria, and stack the sorted parts row-wise with concat.
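A small sketch of that pattern, assuming the rule is "sort the x rows ascending and the y rows descending". Both the data and the rule are invented for the example.

```python
import pandas as pd

# Hypothetical sample data with two groups in f1.
df = pd.DataFrame({
    "f1": ["x", "y", "x", "y"],
    "f2": [3, 1, 2, 4],
})

# Sort each subset by its own rule: 'x' rows ascending, 'y' rows descending.
x_part = df[df["f1"] == "x"].sort_values(by="f2", ascending=True)
y_part = df[df["f1"] == "y"].sort_values(by="f2", ascending=False)

# Stack the sorted parts row-wise and rebuild a clean index.
combined = pd.concat([x_part, y_part], ignore_index=True)

print(combined)
```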

Numeric sorting using numpy

numpy.lexsort suits numeric-heavy data. Pass the keys in reverse priority order (primary key last), and negate a numeric key to sort it in descending order, since lexsort itself only sorts ascending.

Advanced sorting techniques

Custom categories

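Use pd.Categorical to define an explicit order for the values in a column when the order you need is neither alphabetical nor numeric; sort_values then follows the category order.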
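As an illustration, the sketch below sorts by a made-up 'size' column in the order small, medium, large rather than alphabetically. The column name and the category list are assumptions for the example.

```python
import pandas as pd

# Hypothetical sample data with a text column that has a natural order.
df = pd.DataFrame({
    "size": ["large", "small", "medium", "small"],
    "f2": [1, 2, 3, 4],
})

# Hypothetical custom order; alphabetical sorting would give large, medium, small.
size_order = ["small", "medium", "large"]
df["size"] = pd.Categorical(df["size"], categories=size_order, ordered=True)

# 'size' follows the category order; ties are broken by f2, descending.
sorted_df = df.sort_values(by=["size", "f2"], ascending=[True, False])

print(sorted_df)
```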

Pre-sort filtering

Chain query and sort_values to filter rows before sorting. On large datasets this means fewer rows to sort.
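A minimal sketch of filter-then-sort, with an arbitrary threshold of f1 > 2 chosen purely for illustration:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "f1": [5, 1, 8, 3, 9],
    "f2": ["a", "b", "c", "d", "e"],
})

# Keep only rows with f1 > 2 (a hypothetical threshold), then sort the remainder.
filtered_sorted = df.query("f1 > 2").sort_values(by="f1", ascending=False)

print(filtered_sorted)
```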

Multiple column sorting

Pass several columns to sort_values() to break ties: rows with equal values in the primary sort column are ordered by the next column in the list.
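To see the tie-breaking at work, here is a short example on made-up data in which f1 contains duplicates and f2 decides the order within each group:

```python
import pandas as pd

# Hypothetical sample data: f1 has ties, f2 breaks them.
df = pd.DataFrame({
    "f1": [1, 2, 1, 2],
    "f2": [7, 5, 9, 6],
})

# f1 ascending; within equal f1 values, f2 descending.
sorted_df = df.sort_values(by=["f1", "f2"], ascending=[True, False])

print(sorted_df)
# Row labels come out in the order 2, 0, 3, 1.
```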