How to Replace Multiple Substrings of a String?

By Anton Shumikhin · Nov 17, 2024

TLDR

For a quick replacement of multiple substrings, loop over a dictionary of replacements and call the string replace method for each pair:

    replacements = {'apple': 'orange', 'banana': 'berry'}
    text = "apple and banana"
    for old, new in replacements.items():
        text = text.replace(old, new)  # Executing fruit surgery
    print(text)  # Output: orange and berry

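Each pass through the loop applies one key-value pair and rebinds text to the new string that replace returns, since Python strings are immutable. Keep in mind that the replacements run one after another, so a later key can also match text produced by an earlier replacement. It's as if the text undergoes a swift makeover, one step at a time!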
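Because every pass works on the output of the previous one, chained replacements can undo each other. The following is a minimal sketch of that pitfall; the swap mapping and the sequential_replace helper are hypothetical names chosen for illustration, not part of the example above:

```python
# Hypothetical mapping that swaps two words; chosen only to expose the pitfall.
swap = {'cat': 'dog', 'dog': 'cat'}

def sequential_replace(text, replacements):
    # Apply str.replace once per key, in dictionary insertion order.
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

result = sequential_replace("cat chases dog", swap)
print(result)  # Output: cat chases cat
```

The first pass turns "cat" into "dog", and the second pass then turns every "dog", including the new one, back into "cat".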

Regex to the rescue

When the replacements are numerous, or must all happen in a single pass, the re module and regex become essential, like a Swiss Army knife for strings:

    import re

    replacements = {'apple': 'orange', 'banana': 'berry'}
    pattern = re.compile("|".join(re.escape(key) for key in replacements.keys()))
    text = "apple and banana in the apple orchard"
    text = pattern.sub(lambda match: replacements[match.group(0)], text)
    print(text)  # Output: orange and berry in the orange orchard

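By building a single alternation pattern with re.compile, we match all the keys collectively, and re.escape ensures each key is treated as literal text rather than regex syntax. The pattern.sub method then calls a handy lambda function for each match to look up its replacement. The text is scanned once and replaced output is never rescanned, thereby making the string replacements more efficient than a diet plan.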
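Since the pattern never revisits text it has already replaced, a single pass can even swap two words, which a loop of chained replace calls cannot do. A small sketch, assuming a hypothetical swap mapping:

```python
import re

# Hypothetical mapping that swaps two words.
swap = {'cat': 'dog', 'dog': 'cat'}

pattern = re.compile("|".join(re.escape(key) for key in swap))
result = pattern.sub(lambda match: swap[match.group(0)], "cat chases dog")
print(result)  # Output: dog chases cat
```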

Multilines and ordered substitutions

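Keys that contain newlines need no special flag: escaped literals match line breaks as they are, and re.DOTALL only changes what the . metacharacter matches, which never appears unescaped in this pattern. The order of the alternatives does matter, though. Regex alternation tries them left to right, so when one key is a prefix of another, sort the keys by length, longest first, to prevent the shorter key from matching before the longer one gets a chance:

    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))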
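To see why the ordering matters, here is a sketch that compares an unsorted alternation with one sorted longest-first. The overlapping keys ('apple' and 'apple pie') and the helper names build_pattern and substitute are assumptions made up for this illustration:

```python
import re

# Hypothetical overlapping keys: 'apple' is a prefix of 'apple pie'.
replacements = {'apple': 'orange', 'apple pie': 'dessert'}

def build_pattern(keys):
    # Join escaped keys into one alternation, in the order given.
    return re.compile("|".join(re.escape(key) for key in keys))

def substitute(pattern, text):
    # Look up each matched key in the replacements mapping.
    return pattern.sub(lambda match: replacements[match.group(0)], text)

unsorted_pattern = build_pattern(replacements)
sorted_pattern = build_pattern(sorted(replacements, key=len, reverse=True))

text = "apple pie"
print(substitute(unsorted_pattern, text))  # Output: orange pie
print(substitute(sorted_pattern, text))    # Output: dessert
```

Note that the keys are sorted by their original length before escaping, so added backslashes do not influence the order.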

Optimized for large sets

For long strings or numerous replacements, compiling the pattern once and reusing the compiled result is your friend. It's almost as if we're investing ahead. Warren Buffett, here we come!

    import re

    def multiple_replace(text, replacements):
        # Longest keys first, so a shorter key cannot shadow a longer one.
        keys = sorted(replacements, key=len, reverse=True)
        # Built on each call; re caches recently compiled patterns,
        # and a penny saved is a penny earned!
        pattern = re.compile("|".join(re.escape(k) for k in keys))
        return pattern.sub(lambda m: replacements[m.group(0)], text)

    text = "apple and banana in the apple orchard"
    print(multiple_replace(text, {'apple': 'orange', 'banana': 'berry'}))
    # Output: orange and berry in the orange orchard
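If the same mapping is applied to many strings, one option is to build the pattern a single time and hand back a reusable function. The sketch below shows how that might look; make_replacer is a hypothetical helper name, not part of the re module, and the guard for an empty mapping is an added safeguard rather than something the original function handles:

```python
import re

def make_replacer(replacements):
    """Return a function that applies all replacements in one pass."""
    if not replacements:
        # An empty alternation would match the empty string everywhere.
        return lambda text: text
    # Longest keys first, so shorter keys cannot shadow longer ones.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))

    def replace(text):
        # The compiled pattern is captured by the closure and reused.
        return pattern.sub(lambda match: replacements[match.group(0)], text)

    return replace

fruit_swap = make_replacer({'apple': 'orange', 'banana': 'berry'})
lines = ["apple and banana", "banana split", "no fruit here"]
results = [fruit_swap(line) for line in lines]
print(results)
# Output: ['orange and berry', 'berry split', 'no fruit here']
```

The sorting, escaping, and compiling happen once in make_replacer, and each later call to fruit_swap only runs the substitution.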