
Counting the number of occurrences of a substring within a string in PostgreSQL

by Alex Kataev · Dec 25, 2024
TLDR

Isn't it delightful when SQL gets the substring counting job done in a jiffy? Witness the magic LENGTH and REPLACE weave. Here's the trick: tally the string length, zap out the substring, count length again, and strike the difference. This reveals the vanished substrings' total length. The ratio of this difference and the substring's length gives you the occurrence_count:

SELECT ((LENGTH(string) - LENGTH(REPLACE(string, 'substring', ''))) / LENGTH('substring')) AS occurrence_count FROM your_table;

Just plug your table, your string column and your substring of interest into the query, and voilà: occurrence_count is the number of times the substring appears in each row.
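To see the arithmetic without touching a real table, here is a minimal sketch on a literal string. The sample word 'banana' and the substring 'an' are made up for illustration.

```sql
-- 'banana' has 6 characters; removing both 'an' occurrences leaves 'ba' (2 characters).
-- (6 - 2) / LENGTH('an') = 4 / 2 = 2
SELECT (LENGTH('banana') - LENGTH(REPLACE('banana', 'an', ''))) / LENGTH('an') AS occurrence_count;
```

Keep in mind that REPLACE is case-sensitive and removes non-overlapping occurrences, so that is what this trick counts.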

Building on basics

How about we spice things up with some ARRAY_LENGTH and string_to_array action? Instead of doing length arithmetic, split the string on the substring and count the pieces:

SELECT (ARRAY_LENGTH(string_to_array(string, 'substring'), 1) - 1) AS occurrence_count FROM your_table;

The - 1 corrects the overcount introduced by splitting: a string containing n occurrences of the divider is cut into n + 1 pieces, so string_to_array always tallies one element more than the actual count. One caveat: for an empty string, string_to_array returns an empty array and ARRAY_LENGTH returns NULL, so the result is NULL rather than 0.
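The split-and-count idea can be tried on a few inline values. This is only a sketch: the sample strings are invented, and the COALESCE wrapper is an addition of this example (not part of the query above) that turns the NULL produced for an empty string into 0.

```sql
-- Inline sample rows; no real table needed.
SELECT
    s,
    -- n occurrences split the string into n + 1 pieces, hence the "- 1".
    -- COALESCE is an assumption of this sketch: array_length() yields NULL
    -- for the empty array that string_to_array('', ...) returns.
    COALESCE(array_length(string_to_array(s, 'an'), 1) - 1, 0) AS occurrence_count
FROM (VALUES ('banana'), ('cherry'), ('')) AS t(s);
-- banana -> 2, cherry -> 0, '' -> 0
```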

Now let's put the count to work and fill a result column from the contents of another column:

UPDATE your_table SET result = (LENGTH(name) - LENGTH(REPLACE(name, 'substring', ''))) / LENGTH('substring');

This tiny dynamite of a query loads result with the number of times 'substring' occurs in the name column, for every row of the table!
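For a dry run, the same UPDATE can be rehearsed on a throwaway table. Everything here is hypothetical: the temporary table demo_names, its rows, and the substring 'an' exist only for this sketch.

```sql
-- Hypothetical scratch table; it disappears at the end of the session.
CREATE TEMP TABLE demo_names (name text, result integer);
INSERT INTO demo_names (name) VALUES ('banana'), ('cherry'), ('bandana');

-- Same length-difference trick, written into the result column of each row.
UPDATE demo_names
SET result = (LENGTH(name) - LENGTH(REPLACE(name, 'an', ''))) / LENGTH('an');

SELECT name, result FROM demo_names ORDER BY name;
-- banana -> 2, bandana -> 2, cherry -> 0
```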

Dynamic update: A closer look

Time to put on your Sherlock hat. To nail the dynamic occurrence count, we cozy up with regexp_replace and its 'g' flag, which leaves no stone (instance within the string) unturned:

UPDATE your_table SET result = (LENGTH(name) - LENGTH(regexp_replace(name, 'substring', '', 'g'))) / LENGTH('substring');

This crime-solving query, with the name column as its clues, updates our result column. Without 'g', regexp_replace would remove only the first match. One catch: the search string is now treated as a regular expression rather than literal text, so dividing by LENGTH('substring') only gives the right answer when every match is exactly that long.
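What the 'g' flag changes is easiest to see side by side. A small sketch on an invented literal ('banana', searching for 'an'):

```sql
SELECT
    regexp_replace('banana', 'an', '')      AS first_only,   -- 'bana': only the first match is removed
    regexp_replace('banana', 'an', '', 'g') AS all_matches,  -- 'ba': every match is removed
    -- Count derived from the global replacement: (6 - 2) / 2 = 2
    (LENGTH('banana') - LENGTH(regexp_replace('banana', 'an', '', 'g'))) / LENGTH('an') AS occurrence_count;
```

On PostgreSQL 15 and later, regexp_count('banana', 'an') should give the same number directly, without the length arithmetic.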

Delving into the nitty-gritty

Special characters escapade

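A substring sprinkled with special characters that could be mistaken for regular expression syntax: sounds like a cliffhanger? Fret not. REPLACE matches its argument literally, so it needs no escaping; it is regexp_replace that wants metacharacters such as the dot escaped:

SELECT (LENGTH(string) - LENGTH(regexp_replace(string, E'\\.', '', 'g'))) / LENGTH('.') AS occurrence_count FROM your_table;

Replace E'\\.' with your own escaped pattern, and divide by the length of the literal text it matches, not the length of the pattern. And... boom! Mystery solved.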
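To see where escaping matters and where it does not, the variants can be run side by side. This is a sketch on the made-up value 'a.b.c', counting dots; since a dot is one character long, no division is needed.

```sql
SELECT
    LENGTH(s) - LENGTH(REPLACE(s, '.', ''))                 AS literal_replace,  -- 2: REPLACE takes '.' literally
    LENGTH(s) - LENGTH(regexp_replace(s, E'\\.', '', 'g'))  AS escaped_regex,    -- 2: '\.' matches only a real dot
    LENGTH(s) - LENGTH(regexp_replace(s, '.', '', 'g'))     AS unescaped_regex   -- 5: '.' matches every character
FROM (VALUES ('a.b.c')) AS t(s);
```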

Subquery integration: The how-to

Subqueries come in handy when weaving substring counts into larger queries or UPDATE operations. They are the Sherlock to your Watson, solving the string-count crimes in stride. Just make sure the subquery is correlated with the row being updated, so that it returns exactly one value per row:

UPDATE your_table AS t SET result = (SELECT (LENGTH(t.name) - LENGTH(REPLACE(t.name, 'substring', ''))) / LENGTH('substring')) WHERE ...;
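Another way to wire a subquery into an UPDATE is to compute the counts in a derived table and join it back by key. The sketch below assumes a hypothetical temporary table demo_items with an id primary key; neither the table nor the key comes from the queries above, and the filter on occurrence_count is just one example of a WHERE condition.

```sql
-- Hypothetical scratch table with a key to join on.
CREATE TEMP TABLE demo_items (id integer PRIMARY KEY, name text, result integer);
INSERT INTO demo_items (id, name) VALUES (1, 'banana'), (2, 'cherry'), (3, 'an ant');

-- The derived table computes one count per id; UPDATE ... FROM joins it back.
UPDATE demo_items AS d
SET result = c.occurrence_count
FROM (
    SELECT id,
           (LENGTH(name) - LENGTH(REPLACE(name, 'an', ''))) / LENGTH('an') AS occurrence_count
    FROM demo_items
) AS c
WHERE d.id = c.id
  AND c.occurrence_count > 0;  -- rows without a match keep result = NULL

SELECT id, name, result FROM demo_items ORDER BY id;
-- 1 banana -> 2, 2 cherry -> NULL, 3 'an ant' -> 2
```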

Potential pitfall down the road

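When using regexp_replace, keep a sharp eye on the search string: it is interpreted as a pattern, so an unescaped metacharacter can become the spanner in the works. string_to_array treats its delimiter as literal text, but it returns an empty array for an empty input string, which makes ARRAY_LENGTH return NULL instead of a count. Knowing which function reads your substring as a pattern and which reads it literally keeps you ahead of the game.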