
Count the number of occurrences of a character in a string

by Nikita Barsukov · Aug 30, 2024
TLDR

Python's str.count() returns the number of times a character appears in a string:

    count = "banana".count("a")
    print(count)  # Output: 3

No surprises here: "a" appears 3 times in "banana".

Counting within a range

str.count() also accepts optional start and end indices, so it can count within a specific portion of the string:

    count = "banana".count("a", 1, -1)
    print(count)  # Output: 2
    # Two "a"s between index 1 and the last character (exclusive)... the final "a" stays out of range
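As a quick check of how the range arguments behave, the sketch below compares them with ordinary slicing. The variable names are illustrative only; the point is that start and end follow the same index rules as a slice.

```python
text = "banana"

# count(sub, start, end) follows the same index rules as slicing
in_range = text.count("a", 1, -1)
via_slice = text[1:-1].count("a")
print(in_range, via_slice)  # Output: 2 2

# Leaving out end counts from start through the end of the string
from_index_one = text.count("a", 1)
print(from_index_one)  # Output: 3
```

Passing the indices to count() avoids building a temporary substring, which can matter on large inputs.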

Let’s get comprehensive with collections.Counter

If you need a character census:

    from collections import Counter

    counter = Counter("banana")
    print(counter['a'])  # Output: 3
    # A-ha, there you are, third "a"!

Counter tallies every character in a single pass, so the same object can answer the question for any other character too.
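To see what a full tally looks like, here is a minimal sketch using the same "banana" string. It relies only on standard Counter behavior: the printed form lists entries from most to least common, most_common() returns the top entries, and a character that never appears reads as 0.

```python
from collections import Counter

counter = Counter("banana")

print(counter)                 # Output: Counter({'a': 3, 'n': 2, 'b': 1})
print(counter.most_common(1))  # Output: [('a', 3)]
print(counter["z"])            # Output: 0 (missing characters count as zero)
```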

Ignoring case like a gentleman

Want to count regardless of case?

    from collections import Counter

    counter = Counter("BanAna".lower())
    print(counter['a'])  # Output: 3
    # Capital A, lowercase a, no discrimination here!

Use .lower() to treat 'A' and 'a' as the same character.

Regular expressions for flexing (and matching)

And if you love flexibility and pattern matching:

    import re

    count = len(re.findall('a', "BanAna", re.IGNORECASE))
    print(count)  # Output: 3
    # Regex for the win! It's raining a's and A's here.

The re.IGNORECASE flag makes the search a case-blind investigator.

Going beyond the basics

Time is money, let’s get efficient

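When processing massive strings, str.count() is usually the fastest option: it is a built-in method that scans for one fixed substring, with no pattern to compile and no list of matches to build. Regular expressions are a bit more resource-hungry due to their fancy flexibility, and Counter tallies every character even when you only care about one.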
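If you want to check this on your own machine, a rough timing sketch with the standard timeit module might look like the following. The test string and the three helper functions are made up for illustration, and the timings will vary with Python version and hardware, so treat the numbers as a relative comparison only.

```python
import re
import timeit
from collections import Counter

# Hypothetical test input: 600,000 characters
text = "banana" * 100_000

def with_str_count():
    return text.count("a")

def with_regex():
    return len(re.findall("a", text))

def with_counter():
    return Counter(text)["a"]

# Time five runs of each approach and print the totals
for func in (with_str_count, with_regex, with_counter):
    seconds = timeit.timeit(func, number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

All three functions return the same count, so the only difference being measured is how long each takes to get there.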

Unicode? No problemo!

If you have Unicode characters in your string:

    uni_count = "naïve café".count("é")
    print(uni_count)  # Output: 1
    # How many é's? Uno, señor!

Python 3 strings are sequences of Unicode code points, so str.count() handles non-ASCII characters the same way it handles ASCII ones.
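One caveat worth knowing: the same visible character can be stored either as a single code point or as a base letter plus a combining mark, and str.count() compares code points, not what you see on screen. The sketch below is an illustration with hand-built strings (the variable names are made up) that uses unicodedata.normalize() from the standard library to bring both forms to a common shape before counting.

```python
import unicodedata

composed = "caf\u00e9"     # "é" stored as one code point (U+00E9)
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent (U+0301)

print(composed.count("\u00e9"))    # Output: 1
print(decomposed.count("\u00e9"))  # Output: 0

# NFC normalization composes "e" + accent into the single code point
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized.count("\u00e9"))  # Output: 1
```

If your text comes from mixed sources, normalizing both the string and the character you search for is a cheap way to avoid surprising zero counts.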

Additional considerations and edge cases

What’s so special about special characters?

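Keep in mind that regex metacharacters such as $ need escaping in re.findall():

    import re

    count = len(re.findall(r'\$', "Cost: $5.00"))
    print(count)  # Output: 1
    # One dollar sign coming right up!

The backslash \ signals the dollar sign $ to drop its special character status. The r prefix makes the pattern a raw string, so Python passes the backslash through to the regex engine untouched.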
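When the character to count is not known in advance, escaping by hand gets error-prone. A minimal sketch, assuming the search string comes from somewhere else (the sample text here is invented), is to let re.escape() do the escaping:

```python
import re

text = "Cost: $5.00 (was $7.50)"

# re.escape() backslash-escapes regex metacharacters in the search string
dollar_count = len(re.findall(re.escape("$"), text))
print(dollar_count)  # Output: 2

# Unescaped, "." matches any character, so every character in text is counted
dot_any = len(re.findall(".", text))
dot_literal = len(re.findall(re.escape("."), text))
print(dot_any == len(text))  # Output: True
print(dot_literal)           # Output: 2
```

For a plain literal character like this, str.count() gives the same answer without any escaping at all.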

What about overlapping occurrences?

str.count() does not consider overlapping occurrences:

    overlapping_count = "aaa".count("aa")
    print(overlapping_count)  # Output: 1
    # Even though "aa" is there twice, let's not count our chickens before they hatch.

After matching "aa" at index 0, the search resumes at index 2, so the overlapping "aa" that starts at index 1 is never counted.
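If you do want overlapping matches counted, one possible approach is to step through the string with str.find(), moving forward one character at a time. The helper name count_overlapping below is hypothetical, not a built-in; the regex variant after it uses a zero-width lookahead to get the same result.

```python
import re

def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing matches to overlap."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Advance by one character only, so the next match may overlap this one
        start = text.find(sub, start + 1)
    return count

print(count_overlapping("aaa", "aa"))  # Output: 2
print("aaa".count("aa"))               # Output: 1

# Regex variant: a lookahead matches at every position where "aa" begins
lookahead_count = len(re.findall(f"(?={re.escape('aa')})", "aaa"))
print(lookahead_count)  # Output: 2
```

For single characters the distinction does not matter, since one-character matches cannot overlap.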