In Python, how do I split a string and keep the separators?
The solution: use re.split() with a capturing group, written with parentheses ( ), around the separator in your match pattern. Every delimiter matched by the group is then kept as its own item in the result list.
Here's an example to keep commas:
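A minimal sketch of such a call, assuming the input string 'one,two,three' (the variable names are illustrative):

```python
import re

text = "one,two,three"  # assumed sample input

# The parentheses form a capturing group, so each matched comma
# is returned as its own list item between the surrounding pieces.
parts = re.split(r"(,)", text)
print(parts)
```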
Output: ['one', ',', 'two', ',', 'three']
In short, wrapping the delimiter in parentheses (a capturing group) tells re.split() to include each matched delimiter in the resulting list.
Keeping non-alphanumeric separators
Want to split on punctuation and other non-alphanumeric characters while keeping them? Put a negated character class inside the capturing group:
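A minimal sketch, assuming the input 'Hello, World! 123'. Note that the character class here also excludes whitespace (\s), so spaces stay attached to the neighbouring text; with the plain class [^a-zA-Z0-9], every space would be split out as its own separator:

```python
import re

text = "Hello, World! 123"  # assumed sample input

# Split on any character that is not an ASCII letter, a digit, or whitespace,
# and keep each such character thanks to the capturing group.
parts = re.split(r"([^a-zA-Z0-9\s])", text)
print(parts)
```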
Output: ['Hello', ',', ' World', '!', ' 123']
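The pattern ([^a-zA-Z0-9]) matches any single character that is neither an ASCII letter nor a digit, and the parentheses keep each match in the returned list. Be aware that this class also matches whitespace; add \s to the negated class, as in ([^a-zA-Z0-9\s]), if spaces should stay attached to the surrounding text as in the output above.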
Ditching regex with a custom function
You can also skip regex entirely and get the same result with a small helper function, here called split_and_keep:
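One possible implementation is sketched below. The body is an assumption for illustration, not a canonical version: it treats every character in separators as a single-character delimiter and mirrors re.split() by keeping empty strings at the edges.

```python
def split_and_keep(text, separators):
    """Split text on any single character in separators, keeping each separator."""
    parts = []
    current = []  # characters of the piece being built
    for ch in text:
        if ch in separators:
            parts.append("".join(current))  # close the current piece
            parts.append(ch)                # keep the separator as its own item
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))  # trailing piece (may be empty)
    return parts


result = split_and_keep("one,two;three", ",;")
print(result)  # ['one', ',', 'two', ';', 'three']
```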
Cleaning up and handling edge cases
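Some quirks to watch for when using re.split():
- Empty strings appear in the output when a separator sits at the start or end of the string, or when two separators are adjacent. Filter them out with a list comprehension, or call strip() with the separator characters on the input first if the leading and trailing separators are not needed.
- Unicode characters and escape sequences in the pattern need care. Write patterns as raw strings (r"...") and pass literal separators through re.escape() so that characters such as . or | are not read as regex syntax.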
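Both points can be illustrated with a short sketch; the sample strings are made up for the example:

```python
import re

# A separator at the start and end produces empty strings at the edges.
raw = re.split(r"(,)", ",one,two,")
print(raw)      # ['', ',', 'one', ',', 'two', ',', '']

# Drop the empty strings with a list comprehension.
cleaned = [piece for piece in raw if piece]
print(cleaned)  # [',', 'one', ',', 'two', ',']

# re.escape() makes a literal separator safe to embed in a pattern:
# an unescaped "." would match any character.
sep = "."
dotted = re.split("(" + re.escape(sep) + ")", "a.b.c")
print(dotted)   # ['a', '.', 'b', '.', 'c']
```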
When only newline characters matter
Got a simple case where only newline characters need to be preserved? Use str.splitlines() with the keepends parameter set to True:
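A minimal sketch, assuming a three-line input string:

```python
text = "Line 1\nLine 2\nLine 3"  # assumed sample input

# keepends=True leaves each line terminator attached to its line.
lines = text.splitlines(keepends=True)
print(lines)
```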
Output: ['Line 1\n', 'Line 2\n', 'Line 3']
Joining the pieces back together
To reassemble the string after the split, join the tokens and separators with str.join(); a list comprehension is only needed if you want to filter or transform the pieces first:
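A minimal sketch, assuming the sentence was split on whitespace with a capturing group (the variable names are illustrative):

```python
import re

sentence = "This is a sentence."  # assumed sample input

# Splitting with a capturing group keeps every whitespace separator.
parts = re.split(r"(\s)", sentence)

# Because nothing was discarded, joining with an empty string
# restores the original text exactly.
rebuilt = "".join(parts)
print(rebuilt)
```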
This returns the original string, 'This is a sentence.', with the separators intact.