
How to split a string, but also keep the delimiters?

by Nikita Barsukov · Sep 14, 2024
TLDR

Use Java's Pattern class with a regex containing positive lookahead (?=regex) and positive lookbehind (?<=regex) assertions. Use "(?<=delimiter)|(?=delimiter)" as your pattern. Check out this elegant code snippet:

String input = "Hello, world! How's it going?";
String[] result = Pattern.compile("(?<=,)|(?=,)")
        .splitAsStream(input)
        .filter(s -> !s.isEmpty())
        .toArray(String[]::new);
// result: ["Hello", ",", " world! How's it going?"]

This example splits on commas. Replace the "," in both the lookbehind and the lookahead with your own delimiter to get the same effect.
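The two assertions can also be used on their own, which changes where the delimiter ends up. The sketch below is only an illustration (the class and method names are made up for this example): lookbehind alone glues the comma to the piece before it, lookahead alone glues it to the piece after it, and both together make it a separate element.

```java
class DelimiterPlacement {
    // Lookbehind only: split after each comma, so the comma ends the left piece.
    static String[] keepWithPrevious(String input) {
        return input.split("(?<=,)");
    }

    // Lookahead only: split before each comma, so the comma starts the right piece.
    static String[] keepWithNext(String input) {
        return input.split("(?=,)");
    }

    // Both: split on either side, so each comma becomes its own element.
    static String[] keepSeparate(String input) {
        return input.split("(?<=,)|(?=,)");
    }

    public static void main(String[] args) {
        String input = "a,b,c";
        System.out.println(String.join(" | ", keepWithPrevious(input))); // a, | b, | c
        System.out.println(String.join(" | ", keepWithNext(input)));     // a | ,b | ,c
        System.out.println(String.join(" | ", keepSeparate(input)));     // a | , | b | , | c
    }
}
```

Which variant fits depends on whether the delimiter belongs to a token (a sentence's trailing period, say) or is a token in its own right.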

Handling dynamic delimiters: String.format in action!

Working with various delimiters in the mix? String.format comes to the rescue, enabling injection of specific delimiters into your regex patterns:

String delimiter = ","; // Your dynamic delimiter, as unpredictable as Schrödinger's cat.
String pattern = String.format("(?<=%1$s)|(?=%1$s)", Pattern.quote(delimiter));
String[] result = Pattern.compile(pattern)
        .splitAsStream(input)
        .toArray(String[]::new);

Calling Pattern.quote(delimiter) ensures that your delimiter is treated as a literal string, dodging conflicts with regex special characters.
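To see why the quoting matters, try a delimiter that is also a regex metacharacter. Here is a minimal sketch, assuming a hypothetical helper named splitKeeping. Without Pattern.quote, a "." delimiter would match any character and the input would be chopped into single characters.

```java
import java.util.regex.Pattern;

class QuotedDelimiterSplit {
    // Hypothetical helper: splits around a literal delimiter and keeps it as its own element.
    static String[] splitKeeping(String input, String delimiter) {
        String quoted = Pattern.quote(delimiter); // "." becomes \Q.\E, a literal dot
        String regex = String.format("(?<=%1$s)|(?=%1$s)", quoted);
        return Pattern.compile(regex)
                .splitAsStream(input)
                .filter(s -> !s.isEmpty()) // defensive: drop any empty pieces
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", splitKeeping("ab.cd", "."))); // ab | . | cd
        System.out.println(String.join(" | ", splitKeeping("x|y", "|")));   // x | | | y
        System.out.println(String.join(" | ", splitKeeping("a::b", "::"))); // a | :: | b
    }
}
```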

Enhancing readability in complex regex

Keeping your regex readable promotes maintainability. Consider this:

String complexDelimiter = ",|;|\\.|!";
String pattern = String.format("(?<=%1$s)|(?=%1$s)", complexDelimiter);
String[] result = Pattern.compile(pattern)
        .splitAsStream(input)
        .toArray(String[]::new);

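The complexDelimiter variable holds several delimiters separated by the pipe character |, and giving that alternation a name keeps the final pattern readable. Because it is inserted as raw regex rather than through Pattern.quote, metacharacters in it must be escaped by hand, as with \\. for the literal dot.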
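If the delimiters arrive as plain strings, one option is to quote each one and join them with | so nobody has to escape anything by hand. The following is a sketch under that assumption, not the only way to do it. The class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MultiDelimiterSplit {
    // Builds an alternation such as \Q,\E|\Q.\E|\Q!\E from literal delimiters.
    static String[] splitKeeping(String input, List<String> delimiters) {
        String alternation = delimiters.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        String regex = String.format("(?<=%1$s)|(?=%1$s)", alternation);
        return Pattern.compile(regex)
                .splitAsStream(input)
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] parts = splitKeeping("Hi, there. Ok!", Arrays.asList(",", ".", "!"));
        System.out.println(String.join("|", parts)); // Hi|,| there|.| Ok|!
    }
}
```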

Pattern and Matcher: Your allies in string manipulation

For those times when String.split() is just too mainstream, meet the Pattern and Matcher classes:

Pattern pattern = Pattern.compile("yourRegexHere"); // Regular expressions so regular they pay taxes.
Matcher matcher = pattern.matcher(input); // In the red corner, the reigning, defending, world champion - the input string!
List<String> matchList = new ArrayList<>();
while (matcher.find()) { // Each loop is a round in the boxing ring.
    matchList.add(matcher.group()); // Gotcha!
}
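What goes in place of "yourRegexHere" depends on your delimiters. One possible choice, shown here as an assumption rather than the canonical answer, is a regex that matches either a single delimiter or a run of non-delimiter characters, so every find() yields the next token, delimiters included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherTokenizer {
    // A token is either one delimiter (, or ;) or a run of characters that are not delimiters.
    private static final Pattern TOKEN = Pattern.compile("[,;]|[^,;]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", tokenize("a,b;;c"))); // a | , | b | ; | ; | c
    }
}
```

Since this approach matches tokens instead of splitting between them, it never produces empty strings.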

Considering edge cases

Beware of empty elements at the beginning or end of your result array. Filters to the rescue:

.filter(s -> !s.isEmpty()) // Only the strong survive in this world... of strings!
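A quick way to convince yourself is to feed the filtered split some awkward inputs: a leading delimiter, a trailing one, two in a row, and an empty string. The sketch below assumes commas as the delimiter and uses a made-up class name.

```java
import java.util.regex.Pattern;

class EdgeCaseSplit {
    private static final Pattern AROUND_COMMA = Pattern.compile("(?<=,)|(?=,)");

    // Splits around commas, keeps them, and drops any empty pieces.
    static String[] splitKeeping(String input) {
        return AROUND_COMMA.splitAsStream(input)
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Leading, doubled, and trailing commas all survive as separate elements.
        System.out.println(String.join(" | ", splitKeeping(",a,,b,"))); // , | a | , | , | b | ,
        // An empty input yields no elements at all once the filter has run.
        System.out.println(splitKeeping("").length); // 0
    }
}
```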

StringTokenizer: Keeping the tokens together

Say hello to the StringTokenizer class, sparing no delimiter:

StringTokenizer tokenizer = new StringTokenizer(input, delimiter, true); // True - to keep delimiters on track.
while (tokenizer.hasMoreTokens()) {
    matchList.add(tokenizer.nextToken()); // All aboard the delimiter express!
}

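Passing true as the third argument (returnDelims) instructs StringTokenizer to return delimiters as tokens. Note that every character of the delimiter string counts as a separate delimiter, and each one comes back as a token of length one.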
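The per-character behaviour is easy to miss, so here is a small self-contained sketch (the class and method names are just for illustration). The string ",;" means "comma or semicolon", and "::" does not mean "double colon".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // delimiterChars is a set of single-character delimiters, not one multi-character delimiter.
    static List<String> tokenize(String input, String delimiterChars) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiterChars, true);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", tokenize("a,b;c", ",;"))); // a | , | b | ; | c
        System.out.println(String.join(" | ", tokenize("a::b", "::")));  // a | : | : | b
    }
}
```

For a delimiter longer than one character, the regex-based approaches are the better fit.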

Customizability with Guava's Splitter

Sometimes, a little vanity doesn't hurt. Customize your splitting with Guava's Splitter:

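// Pass a compiled java.util.regex.Pattern here; Splitter.on(String) would treat the text as a literal separator.
Iterable<String> parts = Splitter.on(pattern)
        .trimResults()
        .omitEmptyStrings()
        .split(input);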

Simplifying complexity: Back to basics with replace

Sometimes keeping it simple is the best approach:

input = input.replace(delimiter, delimiter + "\0"); // + "\0" - There's always a room for one more.
String[] parts = input.split("\0"); // Splitville, here we come!

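Pump up the uniqueness - pick a placeholder that cannot occur in the input, such as the NUL character "\0" used here, so the split is safe and those precious delimiters stay intact. With this approach each delimiter stays attached to the end of the piece before it rather than becoming a separate element.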
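A slightly more defensive take on the same trick, offered as a sketch rather than a drop-in utility: the hypothetical helper below refuses inputs that already contain the placeholder, since those would be split in the wrong places.

```java
class PlaceholderSplit {
    private static final String PLACEHOLDER = "\0"; // NUL, assumed never to appear in real input

    // Splits after every occurrence of delimiter; the delimiter stays on the left piece.
    static String[] splitAfter(String input, String delimiter) {
        if (input.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("input already contains the placeholder character");
        }
        String marked = input.replace(delimiter, delimiter + PLACEHOLDER);
        return marked.split(PLACEHOLDER);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", splitAfter("a,b,c", ","))); // a, | b, | c
    }
}
```

Because replace works on literal text, the delimiter needs no regex escaping here.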