Explain Codes LogoExplain Codes Logo

In SQL, what's the difference between count(column) and count(*)?

sql
count
nulls
joins
Alex KataevbyAlex Kataev·Aug 7, 2024
TLDR

Within the scope of SQL, COUNT(column) notches up non-null entries of a specified column, whereas COUNT(*) tallies all rows, with negligent regard to duplicates and nulls.

SELECT COUNT(column_name) FROM table_name; -- Don't count the null! It's a trap. SELECT COUNT(*) FROM table_name; -- Count it like you mean it. Every row has its day.

COUNT(*) welcomes every record to the counting party, including duplicates and aimless null-containing rows, while COUNT(column) tactfully leaves out null values.

Precision vs Generality: Choose Wisely

Chasing precision with count(column)

COUNT(column) curates exclusive counts of non-null column values, avoiding the extra sprinkling of NULL values. It comes in handy when:

  • You are playing detective, identifying non-NULL duplicates: SELECT column_name FROM table WHERE COUNT(column_name) > 1
  • Counting unique non-NULL values only: SELECT COUNT(DISTINCT column_name) FROM table

Pursuing speed with count(*)

COUNT(*) leaps over individual row data, making use of table indexes for an impressive speed boost. It's the go-to when:

  • Determining the dataset size
  • Conducting data integrity checks (Is everything still intact after that data import?)

Wrestling with NULL

If you have a users table where the email column may contain NULL values:

SELECT COUNT(email), COUNT(*) FROM users;

COUNT(email) delivers the count of users who provided an email, while COUNT(*) sweeps up all users. No email? No problem for COUNT(*).

Counting without hiccups

Swapping COUNT(column_name) with COUNT(*) in a query designed to filter duplicates or specific criteria can play tricks on your query logic, nudging you towards misleading results.

Nifty Tricks & Treats in Count-Land

Vanishing act with empty datasets

Performing a count on an empty set? Watch the null vanish:

SELECT COUNT(column_name) FROM table_name WHERE 1 = 0; -- Returns 0, party cancelled due to lack of guests SELECT COUNT(*) FROM table_name WHERE 1 = 0; -- Also returns 0, because 0 equals 0 ;)

Both return 0, as the WHERE clause cleverly filters out all partygoers.

Counting amidst Change

Transactional databases could be writing (INSERT) or erasing (DELETE) records as you count. Here, COUNT(*) snaps a headcount at that moment, whereas COUNT(column) may miss the latecomers and early leavers.

Index-induced Speed

Lacking a proper index, COUNT(column_name) may get a tad sluggish with full table scans, while COUNT(*) can zip through using optimized shortcuts through indexes.

When Subtleties are Paramount

Group Therapy with count

The GROUP BY clause often teams with COUNT, crunching counts for each group:

  • With COUNT(column), groups sporting all nulls find themselves outside the count.
  • COUNT(*), on the other hand, accurately reflects the group sizes.

Paying heed to Constraints

When columns are shackled with NOT NULL constraints, COUNT(column) and COUNT(*) yield identical counts. From here, the choice hinges on your situation or coding style preferences.

Dealing with Joins

LEFT JOINs might insert NULL values into rows, skewing counts if COUNT(column) is beckoned from the nullable side:

SELECT COUNT(order_items.id), COUNT(*) FROM orders LEFT JOIN order_items ON orders.id = order_items.order_id; -- Navigating the seas of traces left behind by LEFT JOIN.

Here, COUNT(order_items.id) count only orders that have been fulfilled, while COUNT(*) embraces all orders, completed or otherwise.