How to split a String by space
To split a Java String by spaces, use split("\\s+"):
The words array now contains each word individually. The pattern \\s+ matches one or more consecutive whitespace characters, so a run of several spaces counts as a single delimiter.
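A minimal sketch of that call, assuming a hypothetical sample sentence. The class name SplitBasics and the helper splitWords are illustrative only; the result is stored in words.

```java
import java.util.Arrays;

class SplitBasics {
    // Splits on runs of whitespace, so repeated spaces act as one delimiter.
    static String[] splitWords(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        String[] words = splitWords("The quick  brown   fox");
        System.out.println(Arrays.toString(words)); // [The, quick, brown, fox]
    }
}
```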
Deep dive into space splitting
Here, we'll dig deeper into nuances related to splitting, patterns, and pre-processing to tackle a wide array of input types.
Whitespaces of all sorts
The regex \s matches any whitespace character, including spaces, tabs, and line breaks. The + quantifier means one or more occurrences, so \s+ matches any run of one or more whitespace characters.
The output array reads:
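As a concrete illustration, the sketch below assumes a hypothetical input that mixes spaces, a tab, and a newline, and compares \s with \s+. The class and method names are made up for the example.

```java
import java.util.Arrays;

class MixedWhitespace {
    // One delimiter per whitespace character: adjacent whitespace yields empty strings.
    static String[] splitEach(String text) {
        return text.split("\\s");
    }

    // One delimiter per run of whitespace characters.
    static String[] splitRuns(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        String sample = "one \t two\nthree";
        System.out.println(Arrays.toString(splitEach(sample))); // [one, , , two, three]
        System.out.println(Arrays.toString(splitRuns(sample))); // [one, two, three]
    }
}
```

Without the + quantifier, every extra whitespace character produces an empty element, which is rarely what you want when extracting words.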
Beware of Unicode
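Unicode defines many space characters beyond the ASCII space, such as the non-breaking space (U+00A0). In Java, \s and \p{Blank} match only ASCII whitespace by default, so neither splits on a non-breaking space. To split on Unicode spaces, enable Unicode character classes with the embedded (?U) flag, as in (?U)\s+, or use a Unicode property such as \p{Z}.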
The output array reads:
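As a concrete illustration, here is a minimal sketch that assumes a hypothetical input containing a non-breaking space (U+00A0). It compares the default \s+ with (?U)\s+, where the embedded (?U) flag switches on Pattern.UNICODE_CHARACTER_CLASS so that \s also covers Unicode whitespace. The class and method names are illustrative only.

```java
import java.util.Arrays;

class UnicodeSpaces {
    // Default behaviour: \s covers ASCII whitespace only.
    static String[] splitAscii(String text) {
        return text.split("\\s+");
    }

    // (?U) enables Unicode character classes, so \s also matches U+00A0.
    static String[] splitUnicode(String text) {
        return text.split("(?U)\\s+");
    }

    public static void main(String[] args) {
        String sample = "alpha\u00A0beta gamma";
        System.out.println(splitAscii(sample).length);               // 2
        System.out.println(Arrays.toString(splitUnicode(sample)));   // [alpha, beta, gamma]
    }
}
```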
Trimmed and proper
Trim unwanted spaces
Call trim() before splitting to remove whitespace at the start and end of the string. Without it, leading whitespace produces an empty first element in the result.
The output array reads:
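As a concrete illustration, the sketch below assumes a hypothetical padded input and shows the result with and without trim(). The names are illustrative only.

```java
import java.util.Arrays;

class TrimBeforeSplit {
    static String[] splitRaw(String text) {
        return text.split("\\s+");
    }

    // trim() removes leading and trailing whitespace before the split.
    static String[] splitTrimmed(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String sample = "  hello world  ";
        System.out.println(Arrays.toString(splitRaw(sample)));     // [, hello, world]
        System.out.println(Arrays.toString(splitTrimmed(sample))); // [hello, world]
    }
}
```

Trailing empty strings are dropped by split() on its own, which is why only the leading one shows up in the untrimmed result.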
Invisible characters aren't really invisible
Invisible characters like control characters can mess with splitting:
Here, the string is not split at \u0007 (the bell character) because \s does not match it. To split on such characters, the pattern has to include them explicitly, for example through the \p{Cntrl} class.
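A minimal sketch of the problem and one possible fix, assuming a hypothetical input with an embedded bell character. Treating every ASCII control character as a delimiter via \p{Cntrl} is an assumption made for this example; your data may call for a narrower pattern.

```java
import java.util.Arrays;

class ControlCharSplit {
    static String[] splitOnWhitespace(String text) {
        return text.split("\\s+");
    }

    // Treats ASCII control characters as delimiters in addition to whitespace.
    static String[] splitOnWhitespaceOrControl(String text) {
        return text.split("[\\s\\p{Cntrl}]+");
    }

    public static void main(String[] args) {
        String sample = "ping\u0007pong done";
        // The bell (U+0007) is not whitespace, so it stays inside the first element.
        System.out.println(splitOnWhitespace(sample).length);                       // 2
        System.out.println(Arrays.toString(splitOnWhitespaceOrControl(sample)));   // [ping, pong, done]
    }
}
```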
Race for performance
When the same regex is applied many times, pre-compiling it with the Pattern class avoids repeated work:
Pattern.compile() builds the compiled pattern once. Reusing that Pattern object saves recompiling the regex each time split() is called.
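A minimal sketch of this reuse, assuming a few hypothetical input lines. The constant WHITESPACE and the class name are illustrative only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PrecompiledSplitter {
    // Compiled once when the class is initialized, then shared by every call.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String[] split(String line) {
        return WHITESPACE.split(line);
    }

    public static void main(String[] args) {
        String[] lines = {"alpha beta", "gamma   delta", "epsilon\tzeta"};
        for (String line : lines) {
            System.out.println(Arrays.toString(split(line)));
        }
    }
}
```

Pattern objects are immutable and safe to share between threads, so a static final field is a common place to keep one.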
Catching the edge cases
No space, no problem
A string with no spaces returns the original string in an array:
The output reads:
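As a concrete illustration, the sketch below assumes the hypothetical input "hello", which contains no whitespace at all. The names are illustrative only.

```java
class NoDelimiter {
    static String[] split(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        String[] parts = split("hello");
        System.out.println(parts.length); // 1
        System.out.println(parts[0]);     // hello
    }
}
```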
Null and empty strings
Calling split() on a null reference throws a NullPointerException. An empty string yields an array containing a single empty string, while a string made up only of whitespace yields an empty array.
The output reads:
Always check for null before calling split() to avoid a NullPointerException at runtime.
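One way to handle these cases is a small guard method. The sketch below is an assumption about what "handling" should mean here: it returns an empty array for null, empty, and whitespace-only input. The helper name safeSplit is hypothetical.

```java
import java.util.Arrays;

class SafeSplit {
    // Returns an empty array instead of throwing or returning [""].
    static String[] safeSplit(String text) {
        if (text == null) {
            return new String[0];
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println("".split("\\s+").length);              // 1 (a single empty string)
        System.out.println(safeSplit(null).length);               // 0
        System.out.println(safeSplit("").length);                 // 0
        System.out.println(Arrays.toString(safeSplit(" a b ")));  // [a, b]
    }
}
```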
Unleashing advanced splitting
Rumbling with Pattern and Matcher
When split() is not flexible enough, a Matcher created from a Pattern lets you extract the tokens directly instead of describing the delimiters.
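A minimal sketch of that approach, assuming the tokens of interest are runs of non-whitespace characters (\S+). The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherTokens {
    // Matches the words themselves rather than the gaps between them.
    private static final Pattern WORD = Pattern.compile("\\S+");

    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("  red  green\tblue ")); // [red, green, blue]
    }
}
```

Because only the matches are collected, leading whitespace never produces an empty first element, and matcher.start() and matcher.end() are available if you also need each token's position.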
The splitting limit
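The split method has an overload, split(regex, limit), whose second argument caps the number of resulting substrings. With a positive limit, the array has at most that many elements, and the last element holds the unsplit remainder of the string.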
The output array reads:
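As a concrete illustration, the sketch below assumes a hypothetical "key value" style line and a limit of 2. The names are illustrative only.

```java
import java.util.Arrays;

class SplitWithLimit {
    // A limit of 2 splits at the first run of whitespace only.
    static String[] keyAndRest(String line) {
        return line.split("\\s+", 2);
    }

    public static void main(String[] args) {
        String[] parts = keyAndRest("name value with spaces");
        System.out.println(Arrays.toString(parts)); // [name, value with spaces]
    }
}
```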
Escaping in regex
Remember to double the backslashes in Java regex patterns. In a string literal a single backslash starts an escape sequence, so the regex \s+ has to be written as "\\s+".
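A short sketch of what the doubled backslash amounts to at runtime. The constant name is hypothetical.

```java
class EscapedPattern {
    // Written with two backslashes in source; the regex engine receives \s+
    static final String WHITESPACE_RUN = "\\s+";

    public static void main(String[] args) {
        System.out.println(WHITESPACE_RUN);                       // \s+
        System.out.println(WHITESPACE_RUN.length());              // 3
        System.out.println("a  b".split(WHITESPACE_RUN).length);  // 2
    }
}
```

Writing "\s+" with a single backslash would not reach the regex engine as \s+, since the compiler interprets the backslash first.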
It's all about tactics!
Immaculate input formatting
Ensure no unwanted characters interfere with your input string:
- Use trim() to deal with leading and trailing spaces.
- Run replace() to remove any undesirable characters.
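The two steps above can be sketched as follows. The choice of characters to clean up (a non-breaking space turned into a regular space, a bell character removed) is an assumption for the example, as is the helper name cleanAndSplit.

```java
import java.util.Arrays;

class CleanInput {
    static String[] cleanAndSplit(String raw) {
        String cleaned = raw
                .replace("\u00A0", " ")  // non-breaking space becomes a plain space
                .replace("\u0007", "")   // bell character is dropped
                .trim();                 // leading and trailing whitespace removed
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        String raw = "  report\u00A0draft\u0007 v2  ";
        System.out.println(Arrays.toString(cleanAndSplit(raw))); // [report, draft, v2]
    }
}
```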
Testing is key
Vigorously test different, varied input strings:
- Play around with multiple and single spaces.
- Fiddle with no space cases.
- Pitch in some unicode and special characters.
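The checklist above could be turned into a small table of cases. This is a plain-Java sketch without a test framework; the inputs, the expected arrays, and the names splitWords and runChecks are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class SplitChecks {
    static String[] splitWords(String text) {
        return text.trim().split("\\s+");
    }

    // Runs every case and returns the number of mismatches.
    static int runChecks() {
        Map<String, String[]> cases = new LinkedHashMap<>();
        cases.put("a b", new String[] {"a", "b"});              // single space
        cases.put("a    b", new String[] {"a", "b"});           // multiple spaces
        cases.put("ab", new String[] {"ab"});                   // no space
        cases.put("a\tb\nc", new String[] {"a", "b", "c"});     // tab and newline
        cases.put("a\u00A0b", new String[] {"a\u00A0b"});       // NBSP is not matched by \s
        int failures = 0;
        for (Map.Entry<String, String[]> entry : cases.entrySet()) {
            String[] actual = splitWords(entry.getKey());
            if (!Arrays.equals(entry.getValue(), actual)) {
                failures++;
                System.out.println("Mismatch: " + Arrays.toString(actual));
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + runChecks()); // failures: 0
    }
}
```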
Pre-compiling: The performance-enhancer
For demanding, enterprise-grade operations:
- Pre-compile regex patterns using Pattern.compile().
- Reuse compiled patterns for speed and efficiency.