Select count of rows in another table in a Postgres SELECT statement

sql

join

performance

best-practices

byNikita Barsukov·Dec 4, 2024

SELECT
  id,
  (SELECT COUNT(*) FROM related_table WHERE related_table.foreign_key = main_table.id) AS related_count
FROM main_table;

This code snippet depicts how to utilize a correlated subquery and count function to swiftly fetch the count of related rows from another table within your Postgres SELECT statement.

Correlated subquery Vs. JOIN

A correlated subquery might first spring to mind when counting related rows from another table. However, an equally effective method is to use JOINs:

SELECT a.id, COUNT(b.id) as related_count
FROM main_table a
LEFT JOIN related_table b ON a.id = b.foreign_key
GROUP BY a.id;

-- "JOIN the club, it's more fun" - Tables to each other, probably 🎉💃

In this scenario, LEFT JOIN ensures zero counts for main_table records without any related records in related_table. Remember, the GROUP BY clause is crucial when working with aggregate functions alongside data retrieval.

Focusing on performance

Performance is a key factor. Subqueries work great with smaller data sets, but they start to slow things down as the size of data increases. This is where indexing steps in to ensure a speedy counting operation. Make sure the foreign_key in related_table indexed, and you're good to go!

Catering to read and write patterns

A heavily read-oriented database that requires frequent counts can greatly benefit from a trigger-cache solution. A trigger function updates a dedicated count column on main_table whenever changes occur in related_table.

CREATE OR REPLACE FUNCTION update_count()
RETURNS TRIGGER AS $$
BEGIN
  UPDATE main_table
  SET related_count = related_count + 1
  WHERE id = NEW.foreign_key;
  /* Speed? Yes, please! 🚀 */
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_related_count
AFTER INSERT ON related_table
FOR EACH ROW EXECUTE FUNCTION update_count();

-- Look mom, no hands! - PostgreSQL after setting up the trigger, probably 👐🚫

Navigating large datasets

As the dataset grows larger, certain considerations are important to optimize performance:

WHERE clause: Filter before count: Use the WHERE clause to filter out unnecessary records before the counting process.
Sorting and filtering aggregated results: You may decide to use ORDER BY clause for sorting and HAVING clause for filtering aggregate results, but remember these operations can add overhead.
Appreciating PostgreSQL version: Ensure that the features you're using are compatible with your PostgreSQL version.

ORM-based web frameworks

For web developers, web frameworks that have ORM (Object-Relational Mapping) like Django, offers built-in methods to annotate querysets with counts from related tables:

from django.db.models import Count
queryset = MainTable.objects.annotate(related_count=Count('related_table'))

-- There is nothing 'generic' about Django's power! - Django ORM, probably 💪

Query Profiling

Always use EXPLAIN to understand how PostgreSQL is executing your queries. Profiling can tell you if your query is reading more rows than necessary or if an index isn't being fully harnessed.

explain-codes / Sql / Select count of rows in another table in a Postgres SELECT statement

Linked

Select group of rows that match all items in a list



What's the difference between select_related and prefetch_related in Django ORM?



How to do a Postgresql subquery in select clause with join in from clause like SQL Server?



When or why would you use a right outer join instead of left?



Postgresql: using a calculated column in the same query

