Explain Codes LogoExplain Codes Logo

Select count of rows in another table in a Postgres SELECT statement

sql
join
performance
best-practices
Nikita BarsukovbyNikita Barsukov·Dec 4, 2024
TLDR
SELECT id, (SELECT COUNT(*) FROM related_table WHERE related_table.foreign_key = main_table.id) AS related_count FROM main_table;

This code snippet depicts how to utilize a correlated subquery and count function to swiftly fetch the count of related rows from another table within your Postgres SELECT statement.

Correlated subquery Vs. JOIN

A correlated subquery might first spring to mind when counting related rows from another table. However, an equally effective method is to use JOINs:

SELECT a.id, COUNT(b.id) as related_count FROM main_table a LEFT JOIN related_table b ON a.id = b.foreign_key GROUP BY a.id;

-- "JOIN the club, it's more fun" - Tables to each other, probably 🎉💃

In this scenario, LEFT JOIN ensures zero counts for main_table records without any related records in related_table. Remember, the GROUP BY clause is crucial when working with aggregate functions alongside data retrieval.

Focusing on performance

Performance is a key factor. Subqueries work great with smaller data sets, but they start to slow things down as the size of data increases. This is where indexing steps in to ensure a speedy counting operation. Make sure the foreign_key in related_table indexed, and you're good to go!

Catering to read and write patterns

A heavily read-oriented database that requires frequent counts can greatly benefit from a trigger-cache solution. A trigger function updates a dedicated count column on main_table whenever changes occur in related_table.

CREATE OR REPLACE FUNCTION update_count() RETURNS TRIGGER AS $$ BEGIN UPDATE main_table SET related_count = related_count + 1 WHERE id = NEW.foreign_key; /* Speed? Yes, please! 🚀 */ RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER update_related_count AFTER INSERT ON related_table FOR EACH ROW EXECUTE FUNCTION update_count();

-- Look mom, no hands! - PostgreSQL after setting up the trigger, probably 👐🚫

As the dataset grows larger, certain considerations are important to optimize performance:

  • WHERE clause: Filter before count: Use the WHERE clause to filter out unnecessary records before the counting process.
  • Sorting and filtering aggregated results: You may decide to use ORDER BY clause for sorting and HAVING clause for filtering aggregate results, but remember these operations can add overhead.
  • Appreciating PostgreSQL version: Ensure that the features you're using are compatible with your PostgreSQL version.

ORM-based web frameworks

For web developers, web frameworks that have ORM (Object-Relational Mapping) like Django, offers built-in methods to annotate querysets with counts from related tables:

from django.db.models import Count queryset = MainTable.objects.annotate(related_count=Count('related_table'))

-- There is nothing 'generic' about Django's power! - Django ORM, probably 💪

Query Profiling

Always use EXPLAIN to understand how PostgreSQL is executing your queries. Profiling can tell you if your query is reading more rows than necessary or if an index isn't being fully harnessed.