Explain Codes LogoExplain Codes Logo

How can I tell if a string repeats itself in Python?

python
functions
performance
benchmarking
Anton ShumikhinbyAnton Shumikhin·Nov 14, 2024
TLDR

Here's a quick way to detect if a substring is repeated within a string by checking if the original resides within its doubled self (excluding first and last chars):

def is_repeated(s): return s in (s + s)[1:-1] # Usage: print(is_repeated("abcabc")) # True print(is_repeated("abcd")) # False

This clever trick is based on the observation that any repeated sequence will coincide with a part of its doubled version.

Strategies for different usage scenarios

The size and complexity of the strings in question can strongly influence which approach is best suited for detecting repetition. Let's expand our toolkit with more advanced techniques.

Large string sizes: The principal_period function

Here's a method for tackling this problem when dealing with larger strings. The principal_period function leverages Python's inbuilt string.find() method and can identify the shortest repeating substring effectively:

def principal_period(s): i = (s + s).find(s, 1, -1) return None if i == -1 else s[:i] print(principal_period("abcabc")) # 'abc' print(principal_period("abcd")) # None

Comments at Line#3: "Turning the problem on its side...literally!"

Special fan club for Regular Expressions

An alternative method that employs regular expressions may appeal to those who are more regex-inclined, specifically the pattern (.+?)\1+$ used with Python's re.fullmatch():

import re def is_repeated_regex(s): return re.fullmatch(r"(.+?)\1+$", s) is not None print(is_repeated_regex("abcabc")) # True print(is_repeated_regex("abcd")) # False

Comments at Line#3: "Is there an echo in here?"

Going beyond with Algorithmic Enhancements

With a combination of the divmod function and a divisors generator, we can minimize iterations, leading to greater efficiency:

def divisors(n): # Check if the length of string is multiple of any number up to its half i = 1 while i <= n // 2: if (n % i)==0: yield i i += 1 yield n def is_repeated_optimized(s): # Check if the string repeated by the number of times it's length is divided by divisor is equal to string return any(s[:i] * (len(s) // i) == s for i in divisors(len(s))) print(is_repeated_optimized("abcabc")) # True print(is_repeated_optimized("abcd")) # False

Comments at Line#2: "Looking for half of my DNA"

Benchmarking and performance factors

From the viewpoint of performance, functions like principal_period and is_repeated_optimized have shown to be superior. They exhibit at least a 5x speedup for large strings, and in unusual scenarios even a 50x speed difference has been observed. But in most cases, the use of string equality testing has proven to be the fastest:

# Measure time and plot times = [timeit.timeit(setup=code_setup, number=1000) for method in methods] sns.barplot(x=methods, y=times) plt.ylabel('Execution Time (s)') plt.show()

Comments before the plot command: "Dress up your plots, seaborn is in the house!"

Handling edge cases

String handling can be nuanced. Here are some considerations for better handling edge cases.

Empty Strings

Our function should return False for an empty string, as it technically equals an infinite repetition, and we don't want that kind of infinity in our code!

Unicode and Special Characters

Make sure to handle unicode or special character strings correctly, because characters like emojis can also repeat themselves! 😁😁

Memory usage with large strings

Though strings in Python are immutable, string slicing creates new strings, which may not be memory efficient for large strings.

Compatibility with Older Python Versions

If you are stuck with Python 2.x, use // for integer division to avoid the float division introduced in Python 3.