Explain Codes LogoExplain Codes Logo

Postgresql DISTINCT ON with different ORDER BY

sql
distinct-on
postgresql
sql-performance
Anton ShumikhinbyAnton Shumikhin·Jan 16, 2025
TLDR

In PostgreSQL, you can leverage a subquery to retrieve distinct rows filtered by DISTINCT ON, and then apply a different sort order in the outer query.

Quick SQL Example:

SELECT * FROM ( SELECT DISTINCT ON (column1) column1, column2 FROM your_table ORDER BY column1, column2 DESC ) sub ORDER BY column2;

Key Concepts:

  • DISTINCT ON (column1) filters out unique column1 values.
  • The inner ORDER BY column1, column2 DESC determines which row gets selected among each group of column1.
  • The final sort order is determined by the outer ORDER BY column2.

Enough tea? Let's get to the biscuits. 🍪

Optimizing value selection with DISTINCT ON

DISTINCT ON can serve as a powerful tool when your goal is to retrieve the maximum or minimum values within each distinct group. Order the rows of interest within the DISTINCT ON clause, and your job is done!

Getting most recent orders from a list, anyone? 👀

SELECT * FROM ( SELECT DISTINCT ON (customer_id) customer_id, order_id, order_date FROM orders ORDER BY customer_id, order_date DESC ) sub ORDER BY order_date DESC;

Handling complex ordering needs with subqueries

When your order by columns aren't in the DISTINCT ON clause, subqueries are your rescuers. These help you achieve a different order than you used for filtering rows.

Watch this in action:

SELECT * FROM ( SELECT DISTINCT ON (address_id) address_id, product_id, purchase_date FROM purchases ORDER BY address_id, purchase_date DESC ) sub ORDER BY product_id, purchase_date DESC;

Window functions: a savory alternative to DISTINCT ON

DISTINCT ON can be a real star. But it's not always enough. That's when PostgreSQL window functions like ROW_NUMBER(), RANK(), or LAST_VALUE() can step in. These are helpful in defining a window of related rows.

Using window functions to get distinct values:

SELECT column1, column2 FROM ( SELECT column1, column2, ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column2 DESC) as rn FROM your_table ) sub WHERE rn = 1 ORDER BY column2;

We've ranked rows within partitions of column1 and picked top-ranked rows from each partition. It's all about being number 1, isn't it?

Subqueries and the performance game

While subqueries are flexible, don't forget they can impact performance. Don't you want optimized query execution? Match, set, go with indexed columns!

Let's look an example:

SELECT * FROM ( SELECT DISTINCT ON (indexed_column) indexed_column, other_column FROM your_table ORDER BY indexed_column, other_column DESC ) sub ORDER BY other_column;

Now with an index on indexed_column, we bet on faster query performance!

The aliasing hack for flexible ordering

When ordering parameters differ from those in the DISTINCT ON clause, aliasing columns can provide the needed flexibility. Your data will maintain its uniqueness but present in the desired order.

Food for thought:

SELECT alias.column1, alias.column2 FROM ( SELECT DISTINCT ON (column1) column1 as col1, column2 as col2 FROM your_table ORDER BY column1, column2 DESC ) alias ORDER BY alias.col2;

See? You can use different aliases in subquery and change the ordering in your outer query without hints of confusion! Smooth, isn't it?

Bossing null handling

SQL can be a bit tricky with NULLs in order operations. But don't worry, PostgreSQL has you covered with NULLS FIRST or NULLS LAST.

Defeating NULL-related unexpected Results:

SELECT DISTINCT ON (column1) column1, column2 FROM your_table ORDER BY column1, column2 DESC NULLS LAST;