Explain Codes LogoExplain Codes Logo

Grouping into interval of 5 minutes within a time range

sql
interval
performance
postgresql
Anton ShumikhinbyAnton Shumikhin·Nov 13, 2024
TLDR

Slice and dice records into 5-minute intervals by playing around with SQL's FLOOR and UNIX_TIMESTAMP functions. These functions will snugly fit your timestamps into neat time slots:

SELECT COUNT(*), FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(time_column)/(5*60))*(5*60)) AS interval_start FROM table_name WHERE time_column BETWEEN 'range_start' AND 'range_end' GROUP BY interval_start;

In this practical magic, replace table_name, time_column, range_start, and range_end with your actual values. This Harry Potter spell will dish out a count of records for each 5-minute window in your specified range.

Dealing with interval gaps and large data

To make your query perfect down to the last detail, we'll focus on zero occurrences within intervals, performance optimization, PostgreSQL nuances, and how to present the results in a reader-friendly format.

Zero occurrences handling - the empty bin problem

In the real world, data prefers to play hide-and-seek. Some 5-minute intervals may be having a day off without transactions. You may want to include such lazy guys in results with a count of zero:

-- MySQL approach for generating a series of numbers to fill gaps SELECT TIME_FORMAT( FROM_UNIXTIME( series.series_number * 300 ), '%H:%i' ) AS time_group, COALESCE(SUM(t.record_count), 0) as records_count FROM (SELECT (a.a + (10 * b.a)) * 300 as series_number FROM -- Tony Stark approved this for generating a number series ... ) series LEFT JOIN ( -- Original query here ) t ON t.interval_start = FROM_UNIXTIME(series.series_number) GROUP BY time_group ORDER BY time_group;

Performance boost for large datasets

Like a Lambo, your query should also be fast and furious, especially with large datasets. The best part? You can step up your game without spending a dime by making use of proper indexing and a sharp-eyed WHERE clause:

-- Ensure that 'time_column' is indexed to speed up the WHERE filtering -- Run it once, and ride the speed wave forever! 🏄‍♂️ CREATE INDEX time_column_idx ON table_name(time_column);

PostgreSQL-powered precision

In PostgreSQL, feel like Sherlock by discovering unseen patterns using date_trunc. This function allows precise interval grouping and generate_series will include missing intervals:

-- PostgreSQL makes precision easy peasy lemon squeezy 🍋 SELECT to_char(date_trunc('minute', time_column) - ((EXTRACT(MINUTE FROM time_column)::integer % 5) * interval '1 minute'), 'HH24:MI') AS time_group, COUNT(*) as records_count FROM table_name, generate_series('range_start', 'range_end', interval '5 minutes') as series WHERE time_column BETWEEN series AND series + '5 minutes'::interval GROUP BY time_group ORDER BY time_group;

The different leaps time can make

Depending on the data, timezone, and the data types used, the leap of time can result in unexpected results and performance issues. So, let's dive in and solve a few more tricky scenarios:

Timezone adaptations

When your data sprints across timezones, it can add another layer of complexity. To handle this wild beast, you need to use timezone-aware functions like CONVERT_TZ in MySQL or AT TIME ZONE in PostgreSQL to mark your intervals to a common timezone:

-- MySQL brings the power to convert zones. How cool is that! 🌍 SELECT CONVERT_TZ(time_group, 'SYSTEM', 'UTC') as time_group_utc FROM ( -- Previous GROUP BY query here ) AS derived;

Don't forget the notorious yet overlooked daylight savings! They can potentially shake up your intervals.

Making round-offs your BFF

Rounding can sometimes miscount records at interval borders. But the good news is that these mismatches are preventable if you ensure your rounding strategy aligns well with your data intervals.

Using EXTRACT for precise resolution

For PostgreSQL users, the date_trunc function allows flexible interval resolutions, helping you to group data with precision. Make use of the EXTRACT function to take your data grouping to the next level.

Interval sums

To sum up each interval, aggregation functions are your savior:

-- When it comes to sums, SQL is a real mathematician 🧮 SELECT time_group, SUM(count_column) as sum_count FROM ( -- Subquery with interval grouping here ) AS subquery GROUP BY time_group;