In Python, how do I split a string and keep the separators?

python

functions

list-comprehension

string-manipulation

by Anton Shumikhin · Feb 24, 2025

The short answer: call re.split() with the delimiter wrapped in a capturing group, written as ( ) in the pattern. re.split() then returns the matched delimiters alongside the text between them.

Here's an example to keep commas:

import re
result = re.split(r'(,)', 'one,two,three')
# And voila! Along with words, we captured commas as well

Output: ['one', ',', 'two', ',', 'three']

In a nutshell, wrapping the delimiter in parentheses (a capturing group) tells re.split() to include each matched delimiter in the list it returns.
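
As a minimal sketch of the difference the parentheses make (the variable names are only illustrative), here is the same split with and without a capturing group, plus a character class that keeps more than one kind of delimiter:

```python
import re

text = 'one,two,three'

# No capturing group: the commas are consumed and dropped.
without_group = re.split(r',', text)

# Capturing group: each matched comma comes back as its own list item.
with_group = re.split(r'(,)', text)

# A character class inside the group keeps several delimiters at once.
mixed = re.split(r'([,;])', 'a,b;c')

print(without_group)  # ['one', 'two', 'three']
print(with_group)     # ['one', ',', 'two', ',', 'three']
print(mixed)          # ['a', ',', 'b', ';', 'c']
```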

Splitting on non-alphanumeric characters

Want to keep every non-alphanumeric separator? A negated character class inside the capturing group does it:

result = re.split(r'([^a-zA-Z0-9])', 'Hello, World! 123')
# Python says, "Catch 'em all!", where "all" is any non-alphanumeric characters

Output: ['Hello', ',', '', ' ', 'World', '!', '', ' ', '123']

The pattern ([^a-zA-Z0-9]) tells re.split() to keep any character that is neither a letter nor a digit in the returned list. The empty strings appear wherever two separators sit side by side, as in ', ' and '! '.
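
When a run of separators should count as a single token, one option is to add a + quantifier inside the group. This is a small sketch of that variation rather than the only way to do it:

```python
import re

text = 'Hello, World! 123'

# One character per match: adjacent separators leave '' between them.
single = re.split(r'([^a-zA-Z0-9])', text)

# '+' matches a whole run of separators, so each run is one list item.
runs = re.split(r'([^a-zA-Z0-9]+)', text)

print(single)  # ['Hello', ',', '', ' ', 'World', '!', '', ' ', '123']
print(runs)    # ['Hello', ', ', 'World', '! ', '123']
```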

Ditching regex with a custom function

You can skip regex entirely and still get the job done with a small split_and_keep function:

def split_and_keep(s, sep):
    # Split on the literal separator, then put it back between neighboring parts
    parts = s.split(sep)
    result = [parts[0]]
    for part in parts[1:]:
        result += [sep, part]
    return result
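
A regex-free split can also be sketched with str.partition(), which returns the text before the separator, the separator itself, and the remainder. The helper name split_keep_partition is hypothetical, and the sketch assumes sep is a non-empty literal string:

```python
def split_keep_partition(s, sep):
    """Hypothetical helper: split s on a literal sep, keeping each sep as its own item."""
    result = []
    rest = s
    while True:
        head, found, rest = rest.partition(sep)
        result.append(head)
        if not found:  # partition returns '' for the separator when it is absent
            break
        result.append(found)
    return result

print(split_keep_partition('one,two,three', ','))  # ['one', ',', 'two', ',', 'three']
print(split_keep_partition('a--b', '--'))          # ['a', '--', 'b']
```

For a single literal separator, this should produce the same list as re.split() with that separator in a capturing group, including the empty string after a trailing separator.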

Cleaning up and dealing with outliers

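A few quirks to watch for when using re.split():

Empty strings show up in the output when the separator sits at the start or end of the string, or when two separators are adjacent. Filter them out with a list comprehension, or trim the separator from both ends of the input with strip() before splitting.
Patterns that contain backslashes or regex metacharacters should be written as raw strings (r'...'), and literal separators should go through re.escape(), to avoid unexpected matches. Character classes such as [^a-zA-Z0-9] are ASCII-only, so accented letters like é count as separators.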
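
Both quirks can be handled in a few lines. This is a minimal sketch with illustrative variable names; it drops empty strings with a list comprehension and uses re.escape() so that a literal '.' is not treated as "any character":

```python
import re

# Separators at both ends produce '' as the first and last items.
raw = re.split(r'(,)', ',one,two,')
print(raw)      # ['', ',', 'one', ',', 'two', ',', '']

# Keep only the non-empty tokens.
cleaned = [tok for tok in raw if tok]
print(cleaned)  # [',', 'one', ',', 'two', ',']

# re.escape() turns a literal separator into a safe pattern fragment.
sep = '.'
dotted = re.split(f'({re.escape(sep)})', 'a.b.c')
print(dotted)   # ['a', '.', 'b', '.', 'c']
```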

When only newline characters matter

Got a simple case where only newline characters need to be preserved? Use the handy splitlines() with the keepends parameter set to True:

text = "Line 1\nLine 2\nLine 3"
# Who said you can't have your \n and split it too?
lines = text.splitlines(True)

Output: ['Line 1\n', 'Line 2\n', 'Line 3']
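
One detail worth knowing: splitlines() recognizes more line boundaries than '\n' alone, including '\r\n', and it does not add an empty item after a trailing newline. A short sketch with made-up sample text:

```python
mixed = "Windows\r\nUnix\nlast"

# keepends=True keeps whichever line ending each line actually had.
kept = mixed.splitlines(keepends=True)
print(kept)      # ['Windows\r\n', 'Unix\n', 'last']

# A trailing newline does not create an extra empty item.
trailing = "a\nb\n".splitlines(keepends=True)
print(trailing)  # ['a\n', 'b\n']
```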

Joining the pieces back together

To reassemble the string after a split, pass the list of tokens and separators to ''.join():

segments = ['This', ' ', 'is', ' ', 'a', ' ', 'sentence', '.']
assembled = ''.join(segments)  # Yeah, we nailed it!

This returns the original string, 'This is a sentence.', with every separator in place.
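
Because a capturing split keeps every character of the input, splitting and then joining should give back the original string. A quick round-trip check, using whitespace as the separator for illustration:

```python
import re

original = 'This is a sentence.'

# Split on single whitespace characters, keeping each one as a token.
tokens = re.split(r'(\s)', original)
print(tokens)  # ['This', ' ', 'is', ' ', 'a', ' ', 'sentence.']

# Joining with an empty string restores the input exactly.
rebuilt = ''.join(tokens)
assert rebuilt == original
```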
