
How to split a String by space

java
regex
string-splitting
performance
by Anton Shumikhin · Sep 23, 2024
TLDR

To split a Java String by spaces, use split("\\s+"):

String[] words = "Split by spaces".split("\\s+");

The words array now contains each word as a separate element. The pattern \\s+ matches a run of one or more whitespace characters, so consecutive spaces count as a single delimiter.

Deep dive into space splitting

Here, we'll dig deeper into nuances related to splitting, patterns, and pre-processing to tackle a wide array of input types.

Whitespaces of all sorts

The regex \s matches any whitespace character: space, tab, newline, carriage return, form feed, and vertical tab. The + quantifier means one or more occurrences, so \s+ matches any run of one or more whitespace characters.

// We love spaces, don't we?
String[] words = "Split by spaces".split("\\s+");

The output array reads:

[ "Split", "by", "spaces" ]
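To make the point about mixed whitespace concrete, here is a minimal sketch in which tabs, newlines, and repeated spaces all act as one delimiter. The class and method names are made up for illustration.

```java
import java.util.Arrays;

class WhitespaceSplitDemo {
    // Splits on runs of whitespace: spaces, tabs, newlines, and so on.
    static String[] splitOnWhitespace(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        // A tab, then a space + newline + two spaces, separate the words.
        String mixed = "Split\tby \n  spaces";
        System.out.println(Arrays.toString(splitOnWhitespace(mixed)));
        // prints [Split, by, spaces]
    }
}
```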

Beware of Unicode

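Unicode presents a universe of spaces, such as the non-breaking space (U+00A0). By default, Java's \s matches only ASCII whitespace, and \p{Blank} matches only space and tab, so neither splits on a non-breaking space. To split on Unicode spaces too, add the Unicode separator category \p{Z} to the pattern.

// Trying to split by an alien non-breaking space? No worries, Java got you!
String[] words = "Split\u00A0by space".split("[\\s\\p{Z}]+");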

The output array reads:

[ "Split", "by", "space" ]
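Another option is the embedded (?U) flag, which turns on Pattern.UNICODE_CHARACTER_CLASS so that \s follows the Unicode White_Space property. The sketch below compares it with the default \s+. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class UnicodeSplitDemo {
    // (?U) enables Pattern.UNICODE_CHARACTER_CLASS, making \s Unicode-aware.
    static final Pattern UNICODE_WHITESPACE = Pattern.compile("(?U)\\s+");

    static String[] splitUnicode(String text) {
        return UNICODE_WHITESPACE.split(text);
    }

    public static void main(String[] args) {
        String text = "Split\u00A0by space";
        // Default \s ignores the non-breaking space, so only the ASCII space splits.
        System.out.println(text.split("\\s+").length);   // prints 2
        // With (?U), the non-breaking space is a delimiter as well.
        System.out.println(Arrays.toString(splitUnicode(text)));
        // prints [Split, by, space]
    }
}
```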

Trimmed and proper

Trim unwanted spaces

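Use trim() before splitting to remove whitespace at the string's beginning and end. Without it, leading whitespace produces an empty first element (split already drops trailing empty strings on its own).

// Listen, surrounding whitespace: you're not welcome here!
String[] words = " Surrounding spaces ".trim().split("\\s+");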

The output array reads:

[ "Surrounding", "spaces" ]
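To see why trimming matters, this small sketch splits the same input with and without trim(). The helper names are hypothetical.

```java
import java.util.Arrays;

class TrimSplitDemo {
    static String[] splitRaw(String text) {
        return text.split("\\s+");
    }

    static String[] splitTrimmed(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String text = " Surrounding spaces ";
        // The leading space matches the delimiter, leaving an empty first element.
        System.out.println(splitRaw(text).length);      // prints 3
        System.out.println(Arrays.toString(splitTrimmed(text)));
        // prints [Surrounding, spaces]
    }
}
```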

Invisible characters aren't really invisible

Invisible characters like control characters can mess with splitting:

// Invisible characters trying to split this party? Not on my watch!
String[] words = "Invisible\u0007characters".split("\\s+");

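Here, the string is not split at all: \u0007 (the bell character) is not whitespace, so the result is a single element containing the whole string. To split on control characters as well, include them in the pattern, for example [\s\p{Cntrl}]+.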
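One possible way to handle such input is to widen the delimiter so that control characters count as separators too. This is a sketch, not the only fix; \p{Cntrl} is the POSIX control-character class, and the class and method names are invented for the example.

```java
import java.util.Arrays;

class ControlCharSplitDemo {
    // Treats runs of whitespace and/or control characters as one delimiter.
    static String[] splitOnWhitespaceOrControl(String text) {
        return text.split("[\\s\\p{Cntrl}]+");
    }

    public static void main(String[] args) {
        String text = "Invisible\u0007characters";
        System.out.println(text.split("\\s+").length);  // prints 1 (no split)
        System.out.println(Arrays.toString(splitOnWhitespaceOrControl(text)));
        // prints [Invisible, characters]
    }
}
```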

Race for performance

When the same pattern is used repeatedly, for example inside a loop, compiling it once with the Pattern class fetches you sweet brownie points:

import java.util.regex.Pattern;

// Because every tick of the clock counts
Pattern whitespace = Pattern.compile("\\s+");
String[] words = whitespace.split("Optimize this");

Pattern.compile() returns a compiled Pattern object that can be reused, which avoids recompiling the regex each time a string is split.
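In practice the compiled pattern usually lives in a constant so that every call shares it. A minimal sketch, with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PrecompiledSplitter {
    // Compiled once when the class loads; Pattern instances are immutable
    // and safe to share between threads.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Splits every line with the same compiled pattern.
    static List<String[]> splitAll(List<String> lines) {
        List<String[]> result = new ArrayList<>();
        for (String line : lines) {
            result.add(WHITESPACE.split(line));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("Optimize this");
        lines.add("and this one too");
        for (String[] words : splitAll(lines)) {
            System.out.println(words.length + " words");
        }
    }
}
```

Whether this is measurably faster depends on the workload, so it is worth profiling before relying on it.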

Catching the edge cases

No space, no problem

A string with no spaces returns the original string in an array:

// I'm the one and only
String[] words = "NoSpace".split("\\s+");

The output reads:

[ "NoSpace" ]

Null and empty strings

An empty string yields an array containing one empty string, while calling split on a null reference throws a NullPointerException.

// Hey, I might be empty, but I still exist!
String[] words = "".split("\\s+");

The output reads:

[ "" ]

// Ouch, NullPointerException hurts!
String text = null;
String[] words = text.split("\\s+"); // throws NullPointerException

Always handle null to avoid unhappy runtime exceptions.
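One way to handle both cases is a small guard method. The sketch below assumes that null and blank input should both give an empty array; that is a design choice for this example, and the names are hypothetical.

```java
import java.util.Arrays;

class SafeSplit {
    // Returns the words in text, or an empty array for null or blank input.
    static String[] splitOrEmpty(String text) {
        if (text == null) {
            return new String[0];
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(splitOrEmpty(null).length);   // prints 0
        System.out.println(splitOrEmpty("   ").length);  // prints 0
        System.out.println(Arrays.toString(splitOrEmpty(" two words ")));
        // prints [two, words]
    }
}
```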

Unleashing advanced splitting

Rumbling with Pattern and Matcher

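When you need more control than split offers, Matcher comes in handy with Pattern. Note that find() returns what the pattern matches, so the pattern must describe the words themselves (\S+, a run of non-whitespace) rather than the delimiters:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("\\S+"); // one word = a run of non-whitespace
Matcher matcher = pattern.matcher("Use Pattern and Matcher");
List<String> wordsList = new ArrayList<>();
while (matcher.find()) {
    wordsList.add(matcher.group()); // text of the current match
}
String[] words = wordsList.toArray(new String[0]);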
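Something Matcher offers that split does not is the position of each match. The sketch below records every word with its start index. It matches \S+ (non-whitespace runs), and the "word@index" output format is just an illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenPositions {
    private static final Pattern WORD = Pattern.compile("\\S+");

    // Returns each word together with the index where it starts.
    static List<String> wordsWithOffsets(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group() + "@" + matcher.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordsWithOffsets("Use Pattern and Matcher"));
        // prints [Use@0, Pattern@4, and@12, Matcher@16]
    }
}
```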

The splitting limit

The split method has an overload that takes a limit as its second argument. A positive limit caps the number of elements in the result, and the last element holds the unsplit remainder of the string:

// Because sometimes, limit is necessary!
String[] words = "Split with limit".split("\\s+", 2);

The output array reads:

[ "Split", "with limit" ]
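The limit also accepts zero and negative values, which control whether trailing empty strings are kept. A short sketch comparing the three cases on an input with trailing spaces (the class and method names are made up):

```java
import java.util.Arrays;

class LimitDemo {
    static String[] splitWithLimit(String text, int limit) {
        return text.split("\\s+", limit);
    }

    public static void main(String[] args) {
        String text = "a b c  "; // note the two trailing spaces

        // Positive limit: at most 2 elements, the rest stays unsplit.
        System.out.println(splitWithLimit(text, 2).length);   // prints 2
        // Zero (the default): split fully, drop trailing empty strings.
        System.out.println(Arrays.toString(splitWithLimit(text, 0)));
        // prints [a, b, c]
        // Negative: split fully and keep the trailing empty string.
        System.out.println(splitWithLimit(text, -1).length);  // prints 4
    }
}
```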

Escaping in regex

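Don't forget to double the backslashes in Java regex patterns: the regex \s+ is written as "\\s+" in source code. In a Java string literal a single backslash starts an escape sequence, so "\s+" never reaches the regex engine as \s+. Before Java 15 it is a compile error, and from Java 15 on \s is an escape for a plain space.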
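A quick way to convince yourself is to print what the regex engine actually receives. The sketch below is illustrative only, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // The Java literal "\\s+" is the three-character regex \s+.
    static String regexSeenByEngine() {
        return Pattern.compile("\\s+").pattern();
    }

    public static void main(String[] args) {
        System.out.println(regexSeenByEngine());          // prints \s+
        System.out.println(regexSeenByEngine().length()); // prints 3
    }
}
```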

It's all about tactics!

Immaculate input formatting

Ensure no unwanted characters interfere with your input string:

  • Utilize trim() to deal with leading & trailing spaces.
  • Use replace() or replaceAll() to remove or normalize unwanted characters.
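The two steps above can be combined into a small clean-up helper. This sketch assumes that non-breaking spaces and control characters should be turned into ordinary spaces rather than deleted. That is a choice made for this example, and the names are hypothetical.

```java
import java.util.Arrays;

class InputCleaner {
    // Normalizes the input so that a plain \s+ split is enough afterwards.
    static String clean(String text) {
        return text
                .replace('\u00A0', ' ')        // non-breaking space -> space
                .replaceAll("\\p{Cntrl}", " ") // control characters -> space
                .trim();                       // drop leading/trailing spaces
    }

    static String[] cleanAndSplit(String text) {
        return clean(text).split("\\s+");
    }

    public static void main(String[] args) {
        String messy = " Tidy\u00A0this\u0007input ";
        System.out.println(Arrays.toString(cleanAndSplit(messy)));
        // prints [Tidy, this, input]
    }
}
```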

Testing is key

Test a wide variety of input strings:

  • Play around with multiple and single spaces.
  • Fiddle with no space cases.
  • Pitch in some Unicode and special characters.
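A checklist like this can be turned into a tiny table-driven check without any test framework. The cases and names below are illustrative. The last case documents that a plain \s+ does not split on a non-breaking space.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class SplitChecks {
    // Runs each input through split("\\s+") and counts mismatches.
    static int countFailures() {
        Map<String, String[]> cases = new LinkedHashMap<>();
        cases.put("one two", new String[] {"one", "two"});
        cases.put("one   two", new String[] {"one", "two"});
        cases.put("tab\tseparated", new String[] {"tab", "separated"});
        cases.put("NoSpace", new String[] {"NoSpace"});
        cases.put("", new String[] {""});
        cases.put("nbsp\u00A0stays", new String[] {"nbsp\u00A0stays"});

        int failures = 0;
        for (Map.Entry<String, String[]> c : cases.entrySet()) {
            String[] actual = c.getKey().split("\\s+");
            if (!Arrays.equals(actual, c.getValue())) {
                failures++;
                System.out.println("FAIL: got " + Arrays.toString(actual));
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(countFailures() + " failing cases");
    }
}
```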

Pre-compiling: The performance-enhancer

For demanding, enterprise-grade operations:

  • Pre-compile regex patterns using Pattern.compile().
  • Reuse compiled patterns for speed and efficiency.