Group BY and COUNT in PostgreSQL

sql

prompt-engineering

performance

best-practices

byAlex Kataev·Nov 10, 2024

Here's the basic syntax to perform a GROUP BY with COUNT in PostgreSQL:

SELECT column, COUNT(*) AS total
FROM table
GROUP BY column;

Here's an example where we count order statuses:

SELECT status, COUNT(*) AS total
FROM orders
GROUP BY status;

This will return the count of each unique status present in the orders table.

Distinct Counts: Uniqueness in Focus

When you're interested in counting distinct values, you'll employ COUNT(DISTINCT column):

SELECT status, COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
GROUP BY status;

Here each status will correlate to the count of unique customers. It's like turning customers into snowflakes, each one unique!

Performance Tuning: Speedy Gonzales with Indexing

You can reduce query times significantly by creating an index on the columns often used in filtering or joining:

CREATE INDEX idx_column ON table(column);

Consider this the Fast&Furious mode for your PostgreSQL queries.

Avoid Overdoing `GROUP BY`: Because Moderation is Essential

Avoid resorting to GROUP BY when you're simply counting distinct values. Use COUNT(DISTINCT) instead, like:

SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;

That way the GROUP BY clause takes a vacation, and we get cleaner, perhaps faster code!

Conditional Light Saber: Use the FILTER!

FILTER clause lets you count rows where certain conditions are met:

SELECT status,
       COUNT(*) FILTER (WHERE refunded) AS refunded_orders,
       COUNT(*) FILTER (WHERE NOT refunded) AS completed_orders
FROM orders
GROUP BY status;

Now we're wielding a light saber, nimbly slicing data to count refunded and completed orders for each status.

CTEs and Window Functions: Our Swiss Army Knife

CTEs help structure the query, while Window Functions allow COUNT over a range of rows. They’re PostgreSQL’s Swiss Army knife.

WITH OrderedStatuses AS (
    SELECT status,
        ROW_NUMBER() OVER(PARTITION BY status ORDER BY order_id) AS row_num
    FROM orders
)
SELECT status, COUNT(row_num) AS total
FROM OrderedStatuses
GROUP BY status;

Our SQL turned from a blunt blade to a poison-dipped tactical dagger, ranking orders within each status before counting.

EXISTS over Aggregate: The Quickening

Leverage EXISTS when you're merely checking for presence rather than counting. It can be swifter:

SELECT status
FROM orders o
WHERE EXISTS (
    SELECT 1
    FROM customers c
    WHERE c.id = o.customer_id
    AND c.region = 'Europe'
)
GROUP BY status;