How to count string occurrence in string?
Count how often the search string occurs in text with countOccurrences. Using a RegExp with the 'g' flag, the function runs a global search for the pattern. When nothing is found, it returns 0 thanks to the || operator; otherwise, the length of the array returned by .match() is the count.
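The original function body is not reproduced here, so the following is only a minimal sketch of the approach described. The escapeRegExp helper and the early return for an empty search are assumptions added for illustration, not part of the original.

```javascript
// Hypothetical helper: escape regex metacharacters so `search` is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sketch of countOccurrences: global regex match, falling back to []
// when .match() returns null so that .length is always readable.
function countOccurrences(text, search) {
  if (search === "") return 0; // assumption: an empty search counts as 0
  const pattern = new RegExp(escapeRegExp(search), "g");
  return (text.match(pattern) || []).length;
}

console.log(countOccurrences("the cat sat on the mat", "at")); // 3
console.log(countOccurrences("the cat sat on the mat", "dog")); // 0
```

Escaping matters because a search such as "." would otherwise be read as a regex wildcard and match every character.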
Breaking down the match method and regex flags
The JavaScript .match() method, combined with a regular expression, is a powerful tool for finding every occurrence of a pattern in text. The search is case-sensitive by default, which keeps the count accurate when letter case matters; add the 'i' flag if you need a case-insensitive count.
However, keep in mind that when .match() finds no matches, it returns null. Reading .length from null throws a TypeError, so we use the logical || operator to supply a fallback, and the end result is 0.
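To make the null case and the effect of the flags concrete, here is a short illustration. The sample string and variable names are made up for this example.

```javascript
const sample = "Apple pie, apple tart, APPLE juice";

// With no match, .match() returns null rather than an empty array.
const none = sample.match(/cherry/g);
console.log(none); // null

// The 'g' flag alone is case-sensitive: only the lowercase "apple" matches.
const caseSensitive = (sample.match(/apple/g) || []).length;
console.log(caseSensitive); // 1

// Adding the 'i' flag makes the same pattern case-insensitive.
const caseInsensitive = (sample.match(/apple/gi) || []).length;
console.log(caseInsensitive); // 3

// The || [] fallback turns the null case into a count of 0.
const safeCount = (none || []).length;
console.log(safeCount); // 0
```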
Alternative: counting using split
If you prefer to steer clear of .match(), consider the .split() method. Here, you break the original string into an array at each instance of the substring. Subtract one from the length of this array, and you get the occurrence count.
Please note that .split() may not behave as you expect when search is an empty string. It splits the text into its individual characters (UTF-16 code units), so subtracting one from the array length gives text.length - 1 rather than a meaningful count, and -1 when text is also empty.
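A minimal sketch of the split-based count might look like this. The function name countWithSplit and the guard for an empty search are assumptions made for this example.

```javascript
// Sketch: count non-overlapping occurrences by splitting on the search string.
function countWithSplit(text, search) {
  // Assumption: an empty search is reported as 0, since split("")
  // does not yield a meaningful occurrence count.
  if (search === "") return 0;
  return text.split(search).length - 1;
}

console.log(countWithSplit("a,b,,c", ",")); // 3
console.log(countWithSplit("a,b,,c", ";")); // 0
```

Because .split() treats a string separator literally, no regex escaping is needed here.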
Handling overlapping substrings
Traditional regex searching runs into trouble with overlapping substrings: a global match resumes after the end of the previous match, so no match is made at the overlapping point. To count those occurrences, we roll up our sleeves and iterate over the text manually.
The function shown here even handles zero-length search strings, dutifully avoiding infinite loops and ensuring correctness all the way.
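The original function is not reproduced here. As a rough sketch of the same idea, assuming a hypothetical countOverlapping helper with an allowOverlapping switch, and assuming an empty search should simply count as 0:

```javascript
// Sketch: count occurrences with .indexOf(), optionally allowing overlaps.
function countOverlapping(text, search, allowOverlapping = true) {
  // Assumption: an empty search counts as 0. Returning early also avoids
  // an endless loop, because indexOf("") never returns -1.
  if (search.length === 0) return 0;

  // Advance by one character to catch overlaps, or by the full match otherwise.
  const step = allowOverlapping ? 1 : search.length;
  let count = 0;
  let position = text.indexOf(search);

  while (position !== -1) {
    count += 1;
    position = text.indexOf(search, position + step);
  }
  return count;
}

console.log(countOverlapping("aaaa", "aa")); // 3
console.log(countOverlapping("aaaa", "aa", false)); // 2
console.log(("aaaa".match(/aa/g) || []).length); // 2
```

The last line shows the regex approach for comparison: it finds only the two non-overlapping matches.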
Keep an eye on performance
Saving the search string's length in a variable can help performance. Especially when dealing with very large amounts of text, holding onto this number avoids re-reading the length on every loop iteration.
Performance tests indicate that some methods outpace regex-based matching:
- Leveraging .indexOf() in a loop for non-regex patterns is a winner.
- Caching lengths for text and search can run circles around uncached versions.
- Use the split() method sparingly: overuse could lead to performance bumps on large strings.
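Timings depend heavily on the engine and the input, so it is worth measuring on your own data. The harness below is a rough sketch only: the function names, the sample text, and the run count are all made up for illustration, and Date.now() gives only coarse millisecond timing.

```javascript
// Three non-overlapping counters to compare; all names are hypothetical.
// byRegex assumes `search` contains no regex metacharacters.
const byRegex = (text, search) =>
  (text.match(new RegExp(search, "g")) || []).length;

const bySplit = (text, search) => text.split(search).length - 1;

function byIndexOf(text, search) {
  const searchLength = search.length; // read once, outside the loop
  if (searchLength === 0) return 0;
  let count = 0;
  let position = text.indexOf(search);
  while (position !== -1) {
    count += 1;
    position = text.indexOf(search, position + searchLength);
  }
  return count;
}

// Rough timing helper: runs fn repeatedly and returns elapsed milliseconds.
function timeIt(fn, text, search, runs = 50) {
  const start = Date.now();
  for (let i = 0; i < runs; i += 1) fn(text, search);
  return Date.now() - start;
}

const bigText = "lorem ipsum dolor sit amet ".repeat(20000);
const candidates = [["regex", byRegex], ["split", bySplit], ["indexOf", byIndexOf]];
for (const [name, fn] of candidates) {
  console.log(name, fn(bigText, "ipsum"), `${timeIt(fn, bigText, "ipsum")} ms`);
}
```

All three should report the same count; only the elapsed times are expected to differ.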