How to sort a DataFrame in Python pandas by two or more columns?
A pandas DataFrame can be sorted by multiple columns with sort_values. Pass the column names and the matching sort orders (ascending or descending) as lists:
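A minimal sketch of such a call, assuming a small DataFrame named df with columns f1 and f2 (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical sample data; only the names df, f1 and f2 come from the text.
df = pd.DataFrame({"f1": [2, 1, 2, 1], "f2": [10, 20, 30, 40]})

# Sort by f1 ascending, then by f2 descending within equal f1 values.
df = df.sort_values(["f1", "f2"], ascending=[True, False])
print(df)
```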
In this one-liner, df is sorted by f1 in ascending order and by f2 in descending order. Diverse data types and larger data sets call for a closer look.
The ins and outs of DataFrame sorting
Large data handling
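Larger numeric data sets can get a speed boost from numpy.lexsort. Keep in mind that lexsort treats the last key as the primary sort key, so keys are passed in reverse order of priority. It has no descending option: a descending sort is obtained by negating the key, which only works with numeric data.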
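As a rough illustration of the numpy.lexsort approach, the sketch below reproduces an f1-ascending, f2-descending sort. The sample data and the names order and result are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"f1": [2, 1, 2, 1], "f2": [10, 20, 30, 40]})

# np.lexsort uses the LAST key as the primary key, so keys are listed
# in reverse order of priority: f2 (secondary) first, f1 (primary) last.
# Negating f2 yields descending order for it; this needs numeric data.
order = np.lexsort((-df["f2"].to_numpy(), df["f1"].to_numpy()))

# Reorder the rows by the computed positions.
result = df.iloc[order]
print(result)
```

Whether this is actually faster than sort_values depends on the data, so it is worth timing both on your own data set.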
Data uniformity
Without consistent data types, sorting can produce surprising results; for example, numbers stored as strings are ordered as text rather than by value. Run df.dtypes and make the types consistent before sorting.
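One way this can go wrong, sketched with a hypothetical column of numbers that were read in as strings; pd.to_numeric is one possible fix, not the only one:

```python
import pandas as pd

# Hypothetical column where numbers were loaded as strings.
df = pd.DataFrame({"f1": ["10", "9", "100"]})
print(df.dtypes)  # f1 is a string/object column, not a numeric one

# Sorting strings compares them character by character.
as_text = df.sort_values("f1")["f1"].tolist()

# Convert to a numeric dtype, then sort by value.
df["f1"] = pd.to_numeric(df["f1"])
as_numbers = df.sort_values("f1")["f1"].tolist()

print(as_text)
print(as_numbers)
```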
In-place sorting
To reorder the DataFrame itself instead of getting back a sorted copy, set the inplace parameter to True:
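A minimal sketch with hypothetical data; note that with inplace=True the call returns None, so its result should not be assigned back to df:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"f1": [2, 1, 2, 1], "f2": [10, 20, 30, 40]})

# With inplace=True, df itself is reordered and the call returns None.
returned = df.sort_values(["f1", "f2"], ascending=[True, False], inplace=True)

print(returned)
print(df)
```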
Measuring performance
Keep performance in check with %timeit, an IPython/Jupyter magic command that measures execution time, so you can compare different methods.
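%timeit only works inside IPython or Jupyter. In a plain script, the standard-library timeit module can be used for a similar comparison. The sketch below uses randomly generated hypothetical data, and the helper names with_sort_values and with_lexsort are made up for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical random data, seeded for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f1": rng.integers(0, 100, 10_000),
    "f2": rng.integers(0, 100, 10_000),
})

def with_sort_values():
    # f1 ascending, f2 descending via pandas.
    return df.sort_values(["f1", "f2"], ascending=[True, False])

def with_lexsort():
    # Same ordering via numpy; the last key is the primary one.
    order = np.lexsort((-df["f2"].to_numpy(), df["f1"].to_numpy()))
    return df.iloc[order]

# Total time in seconds for 20 runs of each approach.
t_sort_values = timeit.timeit(with_sort_values, number=20)
t_lexsort = timeit.timeit(with_lexsort, number=20)
print(f"sort_values: {t_sort_values:.4f}s  lexsort: {t_lexsort:.4f}s")
```

The timings vary by machine, pandas version, and data shape, so treat them as a comparison on your own setup rather than a general result.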
Multi-indexed sorting
Take advantage of pandas' multi-level indexing by using set_index to build a MultiIndex and then sorting it with sort_index. sort_values also works here, since it accepts index level names as well as column names.
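A minimal sketch with hypothetical data: set_index moves f1 and f2 into a MultiIndex, after which the frame can be ordered either by index level with sort_index or by level name with sort_values.

```python
import pandas as pd

# Hypothetical sample data; the "value" column is made up for illustration.
df = pd.DataFrame({
    "f1": [2, 1, 2, 1],
    "f2": [10, 20, 30, 40],
    "value": ["a", "b", "c", "d"],
})

# Move f1 and f2 into a two-level index.
indexed = df.set_index(["f1", "f2"])

# Sort by index levels: f1 ascending, f2 descending.
by_index = indexed.sort_index(level=["f1", "f2"], ascending=[True, False])

# sort_values accepts index level names in `by` as well.
by_values = indexed.sort_values(["f1", "f2"], ascending=[True, False])

print(by_index)
```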
Complex sorting
Combine sort_values and concat to sort parts of a DataFrame by different criteria and then concatenate the sorted parts into one result.
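One possible reading of this, sketched with hypothetical data: split the frame into parts, sort each part its own way, and stack the results with pd.concat. The column name group is an assumption.

```python
import pandas as pd

# Hypothetical sample data with a made-up "group" column.
df = pd.DataFrame({"group": ["a", "b", "a", "b"], "f1": [3, 1, 2, 4]})

# Sort each part by a different criterion.
part_a = df[df["group"] == "a"].sort_values("f1", ascending=True)
part_b = df[df["group"] == "b"].sort_values("f1", ascending=False)

# Stack the sorted parts; ignore_index renumbers the rows 0..n-1.
combined = pd.concat([part_a, part_b], ignore_index=True)
print(combined)
```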
Numeric sorting using numpy
numpy.lexsort can be your friend with numeric-heavy data, but remember that it takes its keys in reverse order of priority (the last key is the primary one) and that a descending sort requires negating the key.
Advanced sorting techniques
Custom categories
Use pd.Categorical to define an explicit order for a column's values, so that sorting follows that custom order instead of the default alphabetical one.
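A minimal sketch, assuming a hypothetical size column whose values should sort as small, medium, large rather than alphabetically:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"size": ["large", "small", "medium", "small"]})

# Declare the desired order; ordered=True makes the categories comparable.
df["size"] = pd.Categorical(
    df["size"], categories=["small", "medium", "large"], ordered=True
)

# Sorting now follows the category order, not alphabetical order.
result = df.sort_values("size")
print(result)
```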
Pre-sort filtering
Combine query with sort_values to filter rows before sorting. On large datasets this leaves fewer rows to sort.
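A minimal sketch with hypothetical data and a made-up filter condition:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"f1": [5, 1, 8, 3], "f2": [10, 20, 30, 40]})

# Keep only rows with f1 > 2, then sort the remaining rows by f1 descending.
result = df.query("f1 > 2").sort_values("f1", ascending=False)
print(result)
```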
Multiple column sorting
Pass several columns to sort_values() so that each later column breaks ties among equal values in the columns before it.