
In Python, how do I split a string and keep the separators?

python
functions
list-comprehension
string-manipulation
By Anton Shumikhin · Feb 24, 2025
TLDR

Meet your solution: use re.split() with a capturing group, that is, parentheses around the separator in your pattern. re.split() then returns the delimiters alongside the pieces they separate, which makes life a lot easier!

Here's an example to keep commas:

import re
result = re.split(r'(,)', 'one,two,three')
# And voila! Along with the words, we captured the commas as well

Output: ['one', ',', 'two', ',', 'three']

In a nutshell, wrapping the delimiter in parentheses (a capturing group) tells re.split() to include every matched delimiter in the list it returns.
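To make the effect of the parentheses concrete, here is a small side-by-side sketch. The sample string and the variable names are made up for illustration; the only difference between the two calls is the capturing group.

```python
import re

text = 'one,two;three'  # illustrative input with two different separators

# Without a capturing group the separators are thrown away.
without_group = re.split(r'[,;]', text)

# With a capturing group each matched separator becomes its own list item.
with_group = re.split(r'([,;])', text)

print(without_group)  # ['one', 'two', 'three']
print(with_group)     # ['one', ',', 'two', ';', 'three']
```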

Splitting on non-alphanumeric characters

Ever wanted to keep those pesky non-alphanumeric separators? Worry no more:

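import re
result = re.split(r'([^a-zA-Z0-9])', 'Hello, World! 123')
# Python says, "Catch 'em all!", where "all" is any non-alphanumeric character

Output: ['Hello', ',', '', ' ', 'World', '!', '', ' ', '123']

The pattern ([^a-zA-Z0-9]) tells Python to keep every character that is neither an ASCII letter nor a digit as its own item in the returned list. Where two separators sit side by side, as with the comma and the space here, re.split() places an empty string between them.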
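When separators come in runs, such as a comma followed by a space, a single-character class matches each one on its own and re.split() reports an empty string between them. One possible way around that, shown here as a sketch rather than the only option, is to add a + quantifier inside the group so that a whole run is captured as one separator.

```python
import re

text = 'Hello, World! 123'

# The + makes the group match a whole run of non-alphanumeric characters,
# so ', ' and '! ' each come back as a single separator token.
tokens = re.split(r'([^a-zA-Z0-9]+)', text)

print(tokens)  # ['Hello', ', ', 'World', '! ', '123']
```

Whether a run should count as one separator or several depends on what you plan to do with the tokens afterwards.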

Ditching regex with a custom function

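You can forgo regex and still get the job done with a small split_and_keep function, which leaves each separator attached to the end of the piece that precedes it:

def split_and_keep(s, sep):
    # This is where the magic happens
    parts = s.split(sep)
    # Re-attach the separator to every piece except the last one
    return [part + sep for part in parts[:-1]] + [parts[-1]]

For example, split_and_keep('one,two,three', ',') returns ['one,', 'two,', 'three'].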
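If you would rather get the separators back as separate items, the way re.split() with a capturing group does, a regex-free variant can be built on str.partition(). The function name split_keep_separate is hypothetical, chosen only for this sketch, and it assumes a non-empty separator.

```python
def split_keep_separate(s, sep):
    """Split s on sep, returning the separators as their own list items."""
    tokens = []
    rest = s
    while True:
        # partition() returns (text before sep, sep itself, text after sep);
        # when sep is not found, the middle element is an empty string.
        head, found, rest = rest.partition(sep)
        tokens.append(head)
        if not found:
            break
        tokens.append(found)
    return tokens


print(split_keep_separate('one,two,three', ','))  # ['one', ',', 'two', ',', 'three']
print(split_keep_separate('a<>b', '<>'))          # ['a', '<>', 'b']
```

Because the separator is treated as a literal string, characters such as . or | need no escaping here.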

Cleaning up and dealing with edge cases

Some quirks while using re.split():

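  • Empty strings may gatecrash your output if a separator sits at the start or end of the string, or if two separators are adjacent. Show them the door with a list comprehension that keeps only non-empty items, or strip leading and trailing separators with strip() before splitting.
  • Write the pattern as a raw string (r'...') so that backslashes reach the regex engine intact, and pass any literal separator through re.escape() if it contains regex metacharacters such as . or |, which would otherwise be interpreted as pattern syntax.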
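Both points can be illustrated with a short sketch. The sample strings and variable names are made up for the example.

```python
import re

# A separator at the start and end produces empty strings in the result.
parts = re.split(r'(,)', ',one,two,')
print(parts)    # ['', ',', 'one', ',', 'two', ',', '']

# A list comprehension keeps only the non-empty items.
cleaned = [p for p in parts if p]
print(cleaned)  # [',', 'one', ',', 'two', ',']

# A literal '.' must be escaped, otherwise it matches any character.
sep = '.'
pattern = '(' + re.escape(sep) + ')'
dotted = re.split(pattern, 'a.b.c')
print(dotted)   # ['a', '.', 'b', '.', 'c']
```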

When only newline characters matter

Got a simple case where only the line breaks need to be preserved? Use the handy splitlines() method with the keepends parameter set to True:

text = "Line 1\nLine 2\nLine 3"
# Who said you can't have your \n and split it too?
lines = text.splitlines(True)

Output: ['Line 1\n', 'Line 2\n', 'Line 3']
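splitlines() also recognizes line endings other than \n, which is one reason to prefer it over split('\n') for this job. The sketch below uses a made-up string with mixed line endings to show the difference.

```python
# Mixed Windows (\r\n), old Mac (\r) and Unix (\n) line endings.
text = "Line 1\r\nLine 2\rLine 3\n"

# keepends=True leaves each line's own terminator attached to it.
lines = text.splitlines(keepends=True)
print(lines)  # ['Line 1\r\n', 'Line 2\r', 'Line 3\n']

# split('\n') only knows about \n, drops it, and leaves a trailing empty string.
naive = text.split('\n')
print(naive)  # ['Line 1\r', 'Line 2\rLine 3', '']
```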

Joining the pieces back together

Now, if you want to reassemble the string after the split, no extra machinery is needed. Because the separators are already items in the list, ''.join() puts tokens and separators back together in order:

segments = ['This', ' ', 'is', ' ', 'a', ' ', 'sentence', '.']
assembled = ''.join(segments)
# Yeah, we nailed it!

This returns the cherished string, 'This is a sentence.', with every separator back in its place.
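A useful consequence is that splitting with a capturing group around the whole separator pattern loses nothing, so joining the result should reproduce the input exactly. The sketch below checks that round trip on an illustrative string.

```python
import re

original = 'Hello, World! 123'

# The capturing group wraps the entire pattern, so every matched
# character ends up somewhere in the token list.
tokens = re.split(r'([^a-zA-Z0-9])', original)

# Joining the tokens with an empty string rebuilds the input.
rebuilt = ''.join(tokens)
print(rebuilt == original)  # True
```

This only holds when the group covers the entire match; anything matched outside the parentheses is dropped and cannot be recovered by joining.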