Explain Codes LogoExplain Codes Logo

Using DISTINCT and COUNT together in a MySQL Query

sql
distinct
count
subqueries
Anton ShumikhinbyAnton Shumikhin·Aug 7, 2024
TLDR

The magic recipe to find unique entries for a particular column in a MySQL table is: use COUNT(DISTINCT column_name). For example, to identify the number of unique **user_id**s in the users table:

/* Show me how special each user is! */ SELECT COUNT(DISTINCT user_id) FROM users;

The SQL Server Trinity: Make your query, get hits, count the true believers (user_ids, in our case).

The anatomy of COUNT(DISTINCT)

Focused counts: Bringing in the WHERE clause

Grab your magic marker and underline this: to zoom into specific conditions, mix COUNT(DISTINCT) with a WHERE clause:

/* Just the active users. Laziness doesn't count in our domain! */ SELECT COUNT(DISTINCT user_id) FROM users WHERE active = 1;

Remember, less is more. Just add clauses that cater to real needs.

SQL Injection: The unnecessary vaccine

Be the good nurse. Don't inject any raw user input into your SQL query. Try prepared statements if you're a PHP fan:

/* Hi there, PHP. Together, we are invincible! */ $stmt = $pdo->prepare('SELECT COUNT(DISTINCT column_name) FROM table WHERE condition = :value'); $stmt->execute(['value' => $userInput]);

This trick outsmarts SQL injection and optimizes query execution. No defense is too much!

Database security: The keymaster's concern

If you're the keymaster, permissions are your wake-up call. Make sure everything tallies with database integrity and security.

Counting subsets: The power of GROUP BY

Large datasets can turn COUNT(DISTINCT) into a resource hog. For an alternative, try subqueries or GROUP BY for partial counts.

SELECT category, COUNT(DISTINCT product_id) FROM products GROUP BY category;

This gives you a unique count per category. It can be faster than a global distinct count, especially if your dataset is a size of an elephant.

The GROUP BY clause: DISTINCT in disguise

GROUP BY inherently ensures each group is unique, so DISTINCT generally likes to take a break during those times.

SELECT type, COUNT(product_id) FROM products GROUP BY type;

Mastering the act of juggling performance in SQL

Efficient substitution with subqueries

Subqueries can be a more efficient alternative for distinct counts, particularly if you need a count of specific subsets:

/* Let the subquery do the heavy lifting. */ SELECT COUNT(1) FROM (SELECT DISTINCT user_id FROM sessions) as unique_sessions;

Here, the sessions table is only referenced once, maximizing efficiency.

The quick trick: Switching to COUNT(1)

Did you hear about COUNT(1)? It is often considered faster than COUNT(*) – more like a hare than a tortoise, relying on lower overhead.

SELECT COUNT(1) FROM (SELECT DISTINCT user_id FROM sessions) as unique_sessions;