Explain Codes LogoExplain Codes Logo

Count cumulative total in PostgreSQL

sql
window-functions
postgresql
data-visualization
Nikita BarsukovbyNikita Barsukov·Feb 3, 2025
TLDR
SELECT date, value, SUM(value) OVER (ORDER BY date) AS cumulative_total FROM your_table;

To get cumulative totals in PostgreSQL, you need to use SUM() OVER clause paired with ORDER BY to compute a running sum of value. Just switch your_table and relevant column names for quick results.

Refining and Optimizing

The above is an elegant starting point, but actual data often throws in complications. Let's tackle those next.

De-duplicating daily data

When you want to count unique daily items such as users, use the DISTINCT clause:

SELECT date, COUNT(DISTINCT email) OVER (ORDER BY date) AS unique_email_count FROM user_table;

This lets John Doe feel super special each day by being counted only once 😎.

Remove pesky overall duplicates

To ensure uniqueness throughout all your data, we can transform our query using DISTINCT ON to ensure each user is counted once throughout:

SELECT date, COUNT(email) OVER (ORDER BY date) AS cumulative_unique_emails FROM ( SELECT DISTINCT ON (email) email, date FROM user_table ORDER BY email, date ) as unique_emails;

This makes sure John Doe remains the star, even if he loves spamming your logs. 🌟

Don't forget, for larger datasets, an index on (email, date) will make your query run as fast as Usain Bolt. ⏱

Handling date gaps through date series generation

If you want to capture data for days without user logins, then generate a series and LEFT JOIN:

WITH date_series AS ( SELECT generate_series(MIN(date)::date, MAX(date)::date, '1 day'::interval)::date as date FROM user_table ) SELECT ds.date, COALESCE(COUNT(ut.email), 0) OVER (ORDER BY ds.date) AS cumulative_total FROM date_series ds LEFT JOIN user_table ut ON ds.date = ut.date GROUP BY ds.date ORDER BY ds.date;

Now, even on the quietest of days, your data would be cooler than a silent monk. 🧘‍♂️

Remember, always ensure your PostgreSQL version supports window functions and use appropriate data types to avoid any surprises.

Advanced Techniques

It's always good to have some extra tools in your PostgreSQL toolbox for more complex scenarios.

Using ROLLUP for subtotals

ROLLUP generates a result set that is similar to the one generated by UNION, but with total aggregates for hierarchical levels:

SELECT date, value, SUM(value) OVER (ORDER BY date) AS cumulative_total FROM your_table GROUP BY ROLLUP(date, value);

This works like an extra pair of glasses, helping you see total values by date and value. 🕶

Comparative cumulative counts with INNER JOIN

Let's say you're keen on comparing counts across different dates. Here's an approach using an INNER JOIN:

SELECT a.date, COUNT(b.email) AS cumulative_count FROM (SELECT DISTINCT date FROM user_table) a INNER JOIN user_table b ON a.date >= b.date GROUP BY a.date ORDER BY a.date;

This architectural marvel will help you see how your user logins stack up over time. 🏗

Including empty dates

It's crucial to maintain data consistency and coherence, and include counts for days without any new logins. Here's how:

WITH recursive cte AS ( SELECT MIN(date)::date as date FROM user_table UNION ALL SELECT date + 1 FROM cte WHERE date < CURRENT_DATE ) SELECT cte.date, COALESCE(SUM(value), 0) OVER (ORDER BY cte.date) AS cumulative_total FROM cte LEFT JOIN user_table ut ON cte.date = ut.date GROUP BY cte.date;

A recursive CTE(WITH RECURSIVE) generates a continuous range of dates, keeping a consistently accurate track of cumulative totals.