Split string with multiple delimiters in Python

By Nikita Barsukov · Aug 4, 2024

TLDR

No need to beat around the bush: use re.split() from Python's re module to split a string on multiple delimiters. Define a regex pattern that matches your delimiters, and you're off to the races.

    import re

    text = "apple,banana;orange blueberry"
    split_list = re.split('[,; ]+', text)

    # A quick print keeps the doctor away (doctor not included for non-apple strings)
    print(split_list)

Outputs:

['apple', 'banana', 'orange', 'blueberry']

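What is '[,; ]+', you ask? It's a regular expression. The square brackets say "match any one of these characters" (comma, semicolon, or space), and the + says "one or more in a row", so a run of consecutive delimiters counts as a single split point instead of leaving empty strings behind.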
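To make the effect of the + concrete, here is a minimal sketch that runs the same split with and without it. The sample string is made up for illustration.

```python
import re

text = "apple, banana;;orange"

# Without +, every delimiter character is its own split point,
# so adjacent delimiters leave empty strings behind.
without_plus = re.split('[,; ]', text)

# With +, a run of delimiters counts as a single split point.
with_plus = re.split('[,; ]+', text)

print(without_plus)  # ['apple', '', 'banana', '', 'orange']
print(with_plus)     # ['apple', 'banana', 'orange']
```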

Splitting strategies with code examples

Regular expressions: The Swiss Army Knife

Regular expressions are like wild magic spells for strings. When delimiters start playing hard to get, regex is your trusty sidekick.

    import re

    complex_text = "apple,banana;orange*blueberry|pear\npeach"
    complex_split_list = re.split(r'[,;*|\n]+', complex_text)

    # Prints the fruits of our labor. Get it? Fruits? Never mind...
    print(complex_split_list)

Outputs:

['apple', 'banana', 'orange', 'blueberry', 'pear', 'peach']

To go one step further, pair the pattern with re.escape(). It escapes regex metacharacters, so special characters used as delimiters are matched literally instead of being interpreted as regex syntax.

Mimicking the Swiss Army Knife: Non-regex approaches

Let's face it, re can be intimidating. Thankfully, Python's str.replace() and str.split() methods offer a simpler way out of the delimiter conundrum: rewrite every delimiter to a single common one, then split on that.

    text = "apple;banana, orange.blueberry"

    # Normalize every delimiter to ', ' (note: ';' has no trailing space in the text)
    uniform_text = text.replace(';', ', ').replace('.', ', ')
    uniform_split_list = uniform_text.split(', ')
    print(uniform_split_list)

Outputs:

['apple', 'banana', 'orange', 'blueberry']
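The replace-then-split idea can be wrapped in a small helper so the delimiters are not hard-coded. This is only a sketch: split_without_regex is a hypothetical name, and it assumes you also want surrounding whitespace trimmed and empty pieces dropped.

```python
def split_without_regex(text, delimiters):
    """Hypothetical helper: split text on any of the given delimiter strings."""
    first, *rest = delimiters
    # Rewrite every other delimiter to the first one...
    for delimiter in rest:
        text = text.replace(delimiter, first)
    # ...then split once, trim whitespace, and drop empty pieces
    # left behind by adjacent delimiters.
    pieces = (piece.strip() for piece in text.split(first))
    return [piece for piece in pieces if piece]


print(split_without_regex("apple;banana, orange.blueberry", [",", ";", "."]))
# ['apple', 'banana', 'orange', 'blueberry']
```

If one delimiter is a substring of another, the order of the replacements starts to matter, which is usually the point where regex becomes the simpler tool.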

Next-level string splitting and performance boost

Speed up with compiled regex

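If you run the same regex operation many times, pre-compile the pattern with re.compile(). The re module already caches recently used patterns, so the gain is usually modest, but a compiled pattern skips the cache lookup and gives the pattern a reusable name.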

    import re

    # Pre-compile the regex pattern
    delimiter_pattern = re.compile(r'[|,;]')

    text_series = ["apple;banana", "orange,pear", "blueberry|peach"]

    # Split 'em all!
    for text in text_series:
        print(delimiter_pattern.split(text))  # Splitting never felt so good!
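Whether pre-compiling pays off depends on your workload, so it is worth measuring. The sketch below uses timeit from the standard library to compare both styles. The input size and repeat count are arbitrary assumptions, and the timings will differ from machine to machine.

```python
import re
import timeit

# Arbitrary workload for illustration: 3,000 short strings.
texts = ["apple;banana", "orange,pear", "blueberry|peach"] * 1000
compiled = re.compile(r'[|,;]')


def with_module_function():
    # re.split looks the pattern up in the module's cache on every call.
    return [re.split(r'[|,;]', t) for t in texts]


def with_compiled_pattern():
    # The compiled pattern object is reused directly.
    return [compiled.split(t) for t in texts]


# Both approaches must produce identical results.
assert with_module_function() == with_compiled_pattern()

print("re.split:      ", timeit.timeit(with_module_function, number=20))
print("compiled.split:", timeit.timeit(with_compiled_pattern, number=20))
```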

Regex patterns: Play it safe with re.escape()

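When your delimiters are themselves regex metacharacters (such as ., |, $, ^ or *), re.escape() is our Vincent van Gogh, painting a masterpiece one backslash at a time: it escapes each special character so the pattern matches it literally.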

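    import re

    special_delimiter = '.|$^*'

    # Creating regex pattern, now with more safety belts!
    safe_delimiter = re.escape(special_delimiter)
    split_pattern = re.compile(f'[{safe_delimiter}]+')

    print(split_pattern.split("apple.banana|orange$pear"))
    # ['apple', 'banana', 'orange', 'pear']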
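A character class only covers single-character delimiters. For delimiters longer than one character, one common approach is to escape each delimiter and join them with | (alternation). The delimiters below are hypothetical, picked only to illustrate the idea.

```python
import re

# Hypothetical delimiters chosen for illustration; each is two characters long.
delimiters = ["||", "->", "::"]

# Escape each delimiter, then join with | so the regex means "this OR that".
# Sorting longest-first keeps a short delimiter from shadowing a longer one.
ordered = sorted(delimiters, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(d) for d in ordered))

parts = pattern.split("apple||banana->orange::pear")
print(parts)  # ['apple', 'banana', 'orange', 'pear']
```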

Experiment to get your regex right

Getting a pattern right takes experimenting: try different regex patterns against representative texts, including edge cases such as leading, trailing, and repeated delimiters, before you rely on one.
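One lightweight way to experiment is to run several candidate patterns over the same awkward sample and compare the results side by side. The sample string and the candidate patterns here are assumptions for illustration; swap in your own.

```python
import re

# A deliberately awkward sample: leading delimiter, doubled delimiter, trailing space.
sample = ",apple, banana;;orange "

# Candidate patterns to compare.
candidates = [
    r'[,; ]',
    r'[,; ]+',
    r'\s*[,;]\s*',
]

results = {pattern: re.split(pattern, sample) for pattern in candidates}
for pattern, pieces in results.items():
    print(f"{pattern!r:>14} -> {pieces}")
```

Note that re.split() returns an empty string wherever the text starts or ends with a delimiter, so you may want to filter those out afterwards.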

Mastering the split nuances

Design a custom split function for reusability

If splitting strings has become your day job, create a custom function.

    import re

    def custom_split(delimiters, text, maxsplit=0):
        # Escape each delimiter character and build a character class from them
        pattern = f'[{"".join(map(re.escape, delimiters))}]+'
        return re.split(pattern, text, maxsplit=maxsplit)

    print(custom_split(";, ", "apple;banana, orange; blueberry", maxsplit=2))
    # Prints ['apple', 'banana', 'orange; blueberry']: after two splits, the rest stays whole

Choose wisely between methods

Some food for thought before you choose your weapon:

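  • Complexity of delimiters: regex brings power but also complexity; for one or two fixed delimiters, str.replace() plus str.split() may be all you need.
  • Performance needs versus readability, and the regex skill level of whoever maintains the code.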