In Python, how do I split a string and keep the separators?
Use re.split() with a capturing group, written as (...), around the delimiter in your pattern. re.split() then returns each matched delimiter as its own item in the result, alongside the pieces between them.
Here's an example to keep commas:
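A minimal sketch of the idea; the input string and the variable names are assumptions chosen for illustration:

```python
import re

text = "one,two,three"

# The parentheses around the comma form a capturing group,
# so each matched comma is returned as its own list item.
parts = re.split(r"(,)", text)
print(parts)
```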
Output: ['one', ',', 'two', ',', 'three']
In a nutshell, wrapping the delimiter in parentheses (a capturing group) tells re.split() to include every match in the returned list instead of discarding it.
Dealing with non-alphanumeric separators
Want to split on punctuation and keep it in the result? A negated character class does the job:
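A sketch that should reproduce the output shown next. The input string is an assumption, and the character class also excludes whitespace (\s) so that spaces stay attached to the following word rather than being split out on their own:

```python
import re

text = "Hello, World! 123"

# Split on any character that is not a letter, digit, or whitespace,
# and keep it thanks to the single capturing group.
parts = re.split(r"([^a-zA-Z0-9\s])", text)
print(parts)
```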
Output: ['Hello', ',', ' World', '!', ' 123']
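The pattern ([^a-zA-Z0-9\s]) tells Python to split on, and keep, any character that is neither a letter, a digit, nor whitespace. Whitespace has to be excluded from the class to get the output above: with plain ([^a-zA-Z0-9]), each space would be split out as its own item. Use a single pair of parentheses, since every extra capturing group adds another entry to the result for each match.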
Ditching regex with a custom function
You can skip regex entirely and still get the job done with a small helper function, split_and_keep:
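The original listing for split_and_keep is not available, so the version below is only one possible sketch. Its signature (text, sep) and its use of str.partition() are assumptions, not the author's implementation:

```python
def split_and_keep(text, sep):
    """Split text on sep, keeping each occurrence of sep as its own item.

    Hypothetical helper: the name comes from the article, the body is a sketch.
    """
    if not sep:
        raise ValueError("empty separator")
    parts = []
    while True:
        # partition() returns (before, separator, after); the separator
        # element is an empty string when sep is not found.
        head, found, text = text.partition(sep)
        parts.append(head)
        if not found:
            break
        parts.append(found)
    return parts


result = split_and_keep("one,two,three", ",")
print(result)  # ['one', ',', 'two', ',', 'three']
```

Like re.split(), this sketch returns an empty string when the separator sits at the very start or end of the input.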
Cleaning up and dealing with edge cases
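Some quirks to watch for when using re.split():
- Empty strings show up in the output when a separator sits at the start or end of the string, or when two separators are adjacent. Remove them by stripping the separator from the input first with the strip() method, or by filtering the result with a list comprehension.
- Characters with special meaning in regex (such as ., | or *) must be escaped when used as separators, for example with re.escape(). Write patterns as raw strings (r'...') so Python does not interpret the backslashes first. For Unicode text, remember that an explicit class like [a-zA-Z0-9] treats accented letters as separators, whereas \w matches them by default in Python 3.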
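The points above can be sketched as follows; the sample strings and variable names are assumptions for illustration:

```python
import re

text = ",one,two,"

# A separator at either end produces empty strings in the result.
raw = re.split(r"(,)", text)
print(raw)       # ['', ',', 'one', ',', 'two', ',', '']

# Option 1: filter the empty strings out afterwards.
cleaned = [token for token in raw if token]
print(cleaned)   # [',', 'one', ',', 'two', ',']

# Option 2: strip the separator from the input before splitting.
stripped = re.split(r"(,)", text.strip(","))
print(stripped)  # ['one', ',', 'two']

# Escape separators that are regex metacharacters, such as a dot.
sep = "."
dotted = re.split(f"({re.escape(sep)})", "a.b.c")
print(dotted)    # ['a', '.', 'b', '.', 'c']
```

Note that the two clean-up options differ: filtering keeps the leading and trailing separators, while stripping discards them.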
When only newline characters matter
Got a simple case where only newline characters need to be preserved? Use the handy splitlines() with the keepends parameter set to True:
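A short sketch, with an input string assumed to match the output shown next:

```python
text = "Line 1\nLine 2\nLine 3"

# keepends=True leaves the line break attached to the end of each line.
lines = text.splitlines(keepends=True)
print(lines)
```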
Output: ['Line 1\n', 'Line 2\n', 'Line 3']
Joining the pieces back together
Because the separators are kept in the list, reassembling the string after the split only takes str.join() with an empty string; add a list comprehension if you need to filter or transform the tokens first:
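A sketch of the round trip, assuming a split on single whitespace characters; the pattern and variable names are illustrative:

```python
import re

original = "This is a sentence."

# Split on each whitespace character and keep it in the list.
tokens = re.split(r"(\s)", original)
print(tokens)  # ['This', ' ', 'is', ' ', 'a', ' ', 'sentence.']

# Joining with an empty string restores the text, separators included.
rebuilt = "".join(tokens)
print(rebuilt)
```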
This returns the original string, 'This is a sentence.', with its separators intact.
