Explain Codes LogoExplain Codes Logo

Group BY and COUNT in PostgreSQL

sql
prompt-engineering
performance
best-practices
Alex KataevbyAlex Kataev·Nov 10, 2024
TLDR

Here's the basic syntax to perform a GROUP BY with COUNT in PostgreSQL:

SELECT column, COUNT(*) AS total FROM table GROUP BY column;

Here's an example where we count order statuses:

SELECT status, COUNT(*) AS total FROM orders GROUP BY status;

This will return the count of each unique status present in the orders table.

Distinct Counts: Uniqueness in Focus

When you're interested in counting distinct values, you'll employ COUNT(DISTINCT column):

SELECT status, COUNT(DISTINCT customer_id) AS unique_customers FROM orders GROUP BY status;

Here each status will correlate to the count of unique customers. It's like turning customers into snowflakes, each one unique!

Performance Tuning: Speedy Gonzales with Indexing

You can reduce query times significantly by creating an index on the columns often used in filtering or joining:

CREATE INDEX idx_column ON table(column);

Consider this the Fast&Furious mode for your PostgreSQL queries.

Avoid Overdoing GROUP BY: Because Moderation is Essential

Avoid resorting to GROUP BY when you're simply counting distinct values. Use COUNT(DISTINCT) instead, like:

SELECT COUNT(DISTINCT customer_id) AS unique_customers FROM orders;

That way the GROUP BY clause takes a vacation, and we get cleaner, perhaps faster code!

Conditional Light Saber: Use the FILTER!

FILTER clause lets you count rows where certain conditions are met:

SELECT status, COUNT(*) FILTER (WHERE refunded) AS refunded_orders, COUNT(*) FILTER (WHERE NOT refunded) AS completed_orders FROM orders GROUP BY status;

Now we're wielding a light saber, nimbly slicing data to count refunded and completed orders for each status.

CTEs and Window Functions: Our Swiss Army Knife

CTEs help structure the query, while Window Functions allow COUNT over a range of rows. They’re PostgreSQL’s Swiss Army knife.

WITH OrderedStatuses AS ( SELECT status, ROW_NUMBER() OVER(PARTITION BY status ORDER BY order_id) AS row_num FROM orders ) SELECT status, COUNT(row_num) AS total FROM OrderedStatuses GROUP BY status;

Our SQL turned from a blunt blade to a poison-dipped tactical dagger, ranking orders within each status before counting.

EXISTS over Aggregate: The Quickening

Leverage EXISTS when you're merely checking for presence rather than counting. It can be swifter:

SELECT status FROM orders o WHERE EXISTS ( SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.region = 'Europe' ) GROUP BY status;

This forms groups of orders that have at least one European customer. It's like picking out ripe fruits without actually counting them.