Explain Codes LogoExplain Codes Logo

Simple way to calculate median with MySQL

sql
join
best-practices
performance
Alex KataevbyAlex Kataev·Sep 26, 2024
TLDR

Calculate the median in MySQL using a lean subquery and AVG function to find the middle values. Here, we order the data, find the halfway point and apply the average when needed. Result? A quick median.

SELECT AVG(val) as median FROM ( SELECT val FROM data ORDER BY val LIMIT 1 OFFSET (SELECT COUNT(*) FROM data) DIV 2 ) AS subquery;

Just swap data with your table name and val with the column in question. This approach covers both odd and even number of rows.

Unpacking the subquery: Median calculation explained

Breaking down the process

The previous fast answer section gave you an immediate solution. Now, let's break it down:

  1. Sort rows by using ORDER BY val.
  2. Count total rows with SELECT COUNT(*) FROM data.
  3. Skip rows to reach the middle with LIMIT 1 OFFSET.
  4. If you have an even number of rows, no worries! The AVG function averages the two middle values.

An easy way for MariaDB 10.3.3 users

If you're a MariaDB 10.3.3 or newer user, life is simpler! Use the MEDIAN function instead of the subquery:

SELECT MEDIAN(val) FROM data;

The function is smartly optimized. Leaves all the hard work to MariaDB and just gives you the median.

Deeper dive: Advanced techniques and optimizations

Large dataset? No problem.

Dealing with a huge dataset? Efficiency is your best friend. Indexing columns and using EXPLAIN will help craft a better query execution plan.

Smart SQL for smart results

Some SQL patterns can find the middle value without sorting. An example would be using SIGN and COUNT functions in the query. Sounds cool, huh?

SELECT d.val FROM data d WHERE 2 * (SELECT COUNT(*) FROM data WHERE val < d.val) < COUNT(*) AND 2 * (SELECT COUNT(*) FROM data WHERE val <= d.val) >= COUNT(*) LIMIT 1;

This technique compares counts on either side of the value to find the sweet spot in the middle.

A live median calculator

If your data keeps changing and you need an up-to-date median, consider calculating it on-the-fly.

SELECT d.val AS median FROM ( SELECT data.val, @rownum:=@rownum+1 AS 'rank' FROM data, (SELECT @rownum:=0) r ORDER BY data.val ) AS sorted_data WHERE sorted_data.rank IN (FLOOR((@rownum+1)/2), CEIL((@rownum+1)/2));

This on-the-move approach ensures your median isn't held hostage by outdated data.