How to split a String by space
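To split a Java String on spaces, call split("\\s+") on it.
The resulting array contains each word as a separate element. The regex \s+ (written "\\s+" in a Java string literal) matches one or more consecutive whitespace characters, so a run of several spaces counts as a single delimiter.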
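As a minimal sketch of the basic call (the class name, method name, and sample sentence are made up for illustration), the split might look like this:

```java
import java.util.Arrays;

class BasicSplit {
    // Splits on runs of whitespace. Assumes the input has no leading whitespace.
    static String[] splitWords(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical sample input with several spaces between the first two words.
        String[] words = splitWords("Hello   World from Java");
        System.out.println(Arrays.toString(words)); // [Hello, World, from, Java]
    }
}
```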
Deep dive into space splitting
Here, we'll dig deeper into nuances related to splitting, patterns, and pre-processing to tackle a wide array of input types.
Whitespaces of all sorts
The regex \s represents any whitespace character. In Java's default mode that means the space, tab, newline, carriage return, form feed, and vertical tab. The + quantifier means one or more occurrences, so \s+ matches any run of one or more whitespace characters.
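To illustrate, here is a small sketch that splits a string mixing spaces, a tab, and line breaks. The input and the names used are assumptions chosen for the example, not taken from the original listing.

```java
import java.util.Arrays;

class MixedWhitespaceSplit {
    // Treats any run of ASCII whitespace (space, tab, line breaks) as one delimiter.
    static String[] splitOnWhitespace(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical input: tab, space + newline + space, and a Windows line break.
        String input = "one\ttwo \n three\r\nfour";
        System.out.println(Arrays.toString(splitOnWhitespace(input))); // [one, two, three, four]
    }
}
```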
Beware of Unicode
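Unicode presents a universe of spaces, such as the non-breaking space (U+00A0). By default, Java's \s and \p{Blank} match only ASCII whitespace, so neither splits on these characters. To cover Unicode spaces, add the separator category \p{Z} to the pattern, for example [\s\p{Z}]+, or turn on Unicode character classes with the embedded flag (?U) so that \s itself becomes Unicode-aware.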
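The difference can be sketched as follows. The class and method names are hypothetical, and the pattern [\s\p{Z}]+ is one possible choice rather than the only correct one.

```java
import java.util.Arrays;

class UnicodeSpaceSplit {
    // Default \s: ASCII whitespace only, so U+00A0 is NOT a delimiter here.
    static String[] splitAsciiOnly(String text) {
        return text.split("\\s+");
    }

    // Adds the Unicode separator category \p{Z}, which includes U+00A0.
    static String[] splitUnicodeAware(String text) {
        return text.split("[\\s\\p{Z}]+");
    }

    public static void main(String[] args) {
        // Hypothetical input: a non-breaking space between "Hello" and "World".
        String input = "Hello\u00A0World again";
        System.out.println(splitAsciiOnly(input).length);               // 2
        System.out.println(Arrays.toString(splitUnicodeAware(input))); // [Hello, World, again]
    }
}
```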
Trimmed and proper
Trim unwanted spaces
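Use trim() before splitting to remove whitespace at the string's beginning and end. Without it, leading whitespace produces an empty string as the first element of the result; trailing whitespace is harmless because split() drops trailing empty strings.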
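A short sketch of the effect, using a made-up padded input and hypothetical names:

```java
import java.util.Arrays;

class TrimBeforeSplit {
    // Removes leading and trailing whitespace first, then splits on whitespace runs.
    static String[] splitTrimmed(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String input = "  Hello World  ";
        // Without trim(): the leading spaces yield an empty first element.
        System.out.println(Arrays.toString(input.split("\\s+"))); // [, Hello, World]
        // With trim(): only the words remain.
        System.out.println(Arrays.toString(splitTrimmed(input))); // [Hello, World]
    }
}
```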
Invisible characters aren't really invisible
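Invisible characters such as control characters can interfere with splitting. For example, two words separated only by \u0007 (the bell character) are not split by \s+, because the bell character is not whitespace. Choose a pattern that covers the delimiters your input actually contains.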
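The sketch below shows the problem and one possible fix. Treating control characters as delimiters via \p{Cntrl} is an assumption about what the input needs; the names are hypothetical.

```java
import java.util.Arrays;

class ControlCharSplit {
    // Treats ASCII whitespace and ASCII control characters as delimiters.
    static String[] splitOnWhitespaceOrControl(String text) {
        return text.split("[\\s\\p{Cntrl}]+");
    }

    public static void main(String[] args) {
        // Hypothetical input: the bell character sits between the two words.
        String input = "Hello\u0007World";
        // \s+ finds no delimiter, so the whole string comes back as one element.
        System.out.println(input.split("\\s+").length); // 1
        System.out.println(Arrays.toString(splitOnWhitespaceOrControl(input))); // [Hello, World]
    }
}
```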
Race for performance
When the same regex is applied many times, pre-compiling it with the Pattern class earns you sweet brownie points. Pattern.compile() builds the pattern once, and the resulting Pattern object can be reused through its own split() method, which saves recompiling the regex on every call to String.split().
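A minimal sketch of this reuse, assuming the pattern is kept in a static constant (the constant name and sample lines are illustrative):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PrecompiledSplit {
    // Compiled once when the class is loaded, then shared by every call.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String[] splitWords(String text) {
        return WHITESPACE.split(text);
    }

    public static void main(String[] args) {
        String[] lines = {"alpha beta", "gamma   delta epsilon"};
        for (String line : lines) {
            System.out.println(Arrays.toString(splitWords(line)));
        }
        // [alpha, beta]
        // [gamma, delta, epsilon]
    }
}
```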
Catching the edge cases
No space, no problem
A string with no spaces comes back unchanged as the only element of a one-element array.
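For instance, in this small sketch (names and input are hypothetical):

```java
import java.util.Arrays;

class NoSpaceSplit {
    static String[] splitWords(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        String[] words = splitWords("HelloWorld");
        System.out.println(words.length);           // 1
        System.out.println(Arrays.toString(words)); // [HelloWorld]
    }
}
```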
Null and empty strings
Calling split() on a null reference throws a NullPointerException, and an empty string yields a one-element array containing the empty string, not an empty array.
Always check for null before splitting to avoid unhappy runtime exceptions.
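One way to guard against both cases is a small wrapper. This is a sketch under the assumption that null and blank input should both produce an empty array; safeSplit is a hypothetical helper, not a library method.

```java
import java.util.Arrays;

class SafeSplit {
    // Returns an empty array for null or blank input instead of throwing
    // or returning an array that holds a single empty string.
    static String[] safeSplit(String text) {
        if (text == null || text.trim().isEmpty()) {
            return new String[0];
        }
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Plain split() on an empty string: one element, the empty string itself.
        System.out.println("".split("\\s+").length); // 1
        System.out.println(safeSplit(null).length);  // 0
        System.out.println(safeSplit("").length);    // 0
        System.out.println(Arrays.toString(safeSplit(" a b "))); // [a, b]
    }
}
```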
Unleashing advanced splitting
Rumbling with Pattern and Matcher
For complex splitting, Matcher comes in handy together with Pattern.
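One common approach, sketched here as an assumption about what "complex splitting" might involve, is to match the tokens themselves with \S+ instead of matching the gaps between them. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherTokenizer {
    // \S+ matches a run of non-whitespace characters, i.e. one word.
    private static final Pattern WORD = Pattern.compile("\\S+");

    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Leading and trailing spaces produce no empty tokens with this approach.
        System.out.println(tokens("  Hello   World ")); // [Hello, World]
    }
}
```

Because the matcher collects words rather than cutting at delimiters, no trim() call is needed and an all-whitespace input simply yields an empty list.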
The splitting limit
The split method has an overload that takes a second argument, limit. A positive limit caps the number of resulting substrings, and the last element holds the unsplit remainder of the string.
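A brief sketch with a made-up input shows how a positive limit behaves (the wrapper name is hypothetical):

```java
import java.util.Arrays;

class LimitedSplit {
    static String[] splitWithLimit(String text, int limit) {
        return text.split("\\s+", limit);
    }

    public static void main(String[] args) {
        String input = "one two three four";
        // At most 2 elements: the second one keeps the rest of the string.
        System.out.println(Arrays.toString(splitWithLimit(input, 2))); // [one, two three four]
        // At most 3 elements.
        System.out.println(Arrays.toString(splitWithLimit(input, 3))); // [one, two, three four]
    }
}
```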
Escaping in regex
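Don't forget to double the backslashes in Java regex patterns: the regex \s+ is written as the string literal "\\s+". A single backslash starts an escape sequence in a Java string literal, so the compiler interprets it before the regex engine ever sees the pattern. Depending on the Java version, "\s" is either a compile error or (since Java 15) an escape for a plain space, and neither is the regex \s.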
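The following sketch makes the two layers visible by printing what the regex engine actually receives. The constant name is hypothetical.

```java
class RegexEscaping {
    // Source code has two backslashes; the resulting string contains one.
    static final String WHITESPACE_REGEX = "\\s+";

    public static void main(String[] args) {
        System.out.println(WHITESPACE_REGEX);          // \s+
        System.out.println(WHITESPACE_REGEX.length()); // 3
        System.out.println("a  b".split(WHITESPACE_REGEX).length); // 2
    }
}
```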
It's all about tactics!
Immaculate input formatting
Make sure no unwanted characters interfere with your input string:
- Use trim() to deal with leading and trailing spaces.
- Use replace() to remove characters you know are unwanted.
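Put together, the two steps could look like the sketch below. Which characters count as unwanted depends on your data; dropping the bell character and converting non-breaking spaces to plain spaces are assumptions made for this example, and cleanAndSplit is a hypothetical helper.

```java
import java.util.Arrays;

class InputCleaner {
    static String[] cleanAndSplit(String text) {
        String cleaned = text
                .replace("\u0007", "")     // drop bell characters (assumed unwanted)
                .replace('\u00A0', ' ')    // turn non-breaking spaces into plain spaces
                .trim();                   // remove leading and trailing whitespace
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical messy input: padding, a non-breaking space, a stray bell.
        String input = "  Hello\u00A0Wor\u0007ld  ";
        System.out.println(Arrays.toString(cleanAndSplit(input))); // [Hello, World]
    }
}
```

The replacements run before trim() because trim() does not remove the non-breaking space.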
Testing is key
Test your splitting code against a varied set of input strings:
- Try both single and multiple spaces.
- Include inputs with no spaces at all.
- Throw in some Unicode and special characters.
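A checklist like this can be turned into a tiny self-check. The sketch below uses plain Java rather than a test framework, and the check helper and the chosen inputs are illustrative assumptions.

```java
import java.util.Arrays;

class SplitChecks {
    static String[] splitWords(String text) {
        return text.trim().split("\\s+");
    }

    // Throws if the actual split differs from the expected words.
    static void check(String input, String... expected) {
        String[] actual = splitWords(input);
        if (!Arrays.equals(actual, expected)) {
            throw new AssertionError("Expected " + Arrays.toString(expected)
                    + " but got " + Arrays.toString(actual));
        }
    }

    public static void main(String[] args) {
        check("a b", "a", "b");             // single space
        check("  a    b ", "a", "b");       // multiple spaces and padding
        check("single", "single");          // no spaces
        check("a\tb\nc", "a", "b", "c");    // tab and newline
        check("a\u00A0b", "a\u00A0b");      // non-breaking space is not matched by \s
        System.out.println("All checks passed");
    }
}
```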
Pre-compiling: The performance-enhancer
For demanding, enterprise-grade operations:
- Pre-compile regex patterns using Pattern.compile().
- Reuse compiled patterns for speed and efficiency.