
Removing whitespace from strings in Java

java
regex
string-manipulation
java-8
by Alex Kataev · Nov 8, 2024
TLDR

Easily remove all whitespace from a Java string:

String stripped = input.replaceAll("\\s+", "");

This call uses replaceAll() with the regex \\s+ to match every run of whitespace (spaces, tabs, or newlines) and replace it with the empty string, leaving a string devoid of whitespace.

Comprehensive guide and other possibilities

Java strings are immutable, so no method changes a string in place. replaceAll() shows this principle at work: it returns a brand-new string and leaves the original untouched, which is why the result has to be assigned to a variable.
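A minimal sketch of that immutability, assuming an illustrative class name (ImmutabilityDemo) and helper (stripWhitespace) that are not part of the original article:

```java
final class ImmutabilityDemo {
    // Returns a new string; the argument itself is never modified.
    static String stripWhitespace(String input) {
        return input.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String original = "a b\tc";

        // Calling the method without using its result changes nothing.
        stripWhitespace(original);
        System.out.println(original.equals("a b\tc")); // prints true

        // The cleaned text only exists in the returned string.
        String stripped = stripWhitespace(original);
        System.out.println(stripped); // prints abc
    }
}
```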

Delving into replaceAll

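In a Java string literal the backslash is itself an escape character, so the regex \s has to be written as "\\s". The two backslashes in the source become a single backslash by the time the regex engine sees the pattern: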

input = input.replaceAll("\\s+", ""); // Remember: "\\s+" is our little regex champion that finds and zaps one or more whitespace characters
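The escaping rule is easy to check, and the same pattern can be compiled once when it is applied many times, since String.replaceAll() compiles its regex on each call. The following is a sketch under assumptions: the class name PrecompiledWhitespace and the helper removeWhitespace are made up for illustration.

```java
import java.util.regex.Pattern;

final class PrecompiledWhitespace {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Same result as input.replaceAll("\\s+", ""), without recompiling the regex.
    static String removeWhitespace(String input) {
        return WHITESPACE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // The literal "\\s+" holds three characters: backslash, 's', '+'.
        System.out.println("\\s+".length()); // prints 3

        System.out.println(removeWhitespace("a \t b\nc")); // prints abc
    }
}
```

For a one-off call, plain replaceAll() is simpler; the precompiled form mostly pays off in loops or hot paths.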

Tackling single spaces

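Without the +, \\s matches a single whitespace character at a time. With an empty replacement the result is identical to \\s+; the quantifier only lets the regex engine remove a whole run in one match:

input = input.replaceAll("\\s", ""); // "\\s" on a mission to eliminate every whitespace character in sight, one at a time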
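To see where \\s and \\s+ agree and where they part ways, here is a small sketch. The class name SingleVsRun and its two helpers are hypothetical names chosen for this example.

```java
final class SingleVsRun {
    // Replaces each whitespace character separately.
    static String replaceEach(String input, String replacement) {
        return input.replaceAll("\\s", replacement);
    }

    // Replaces each run of consecutive whitespace as one unit.
    static String replaceRuns(String input, String replacement) {
        return input.replaceAll("\\s+", replacement);
    }

    public static void main(String[] args) {
        String input = "a  b"; // two spaces between the letters

        // With an empty replacement both patterns give the same string.
        System.out.println(replaceEach(input, "")); // prints ab
        System.out.println(replaceRuns(input, "")); // prints ab

        // With a visible replacement the difference shows up.
        System.out.println(replaceEach(input, "_")); // prints a__b
        System.out.println(replaceRuns(input, "_")); // prints a_b
    }
}
```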

Keep your "=" safe

Avoid \\W for this job: it matches every non-word character, so it removes = and other punctuation along with the whitespace. Stick with \\s+ for peace of mind, and, well, a whitespace-free string.
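A quick sketch of the damage \\W can do to a key-value pair. The class name NonWordPitfall, its helpers, and the sample input are illustrative assumptions.

```java
final class NonWordPitfall {
    // Removes every non-word character: whitespace AND punctuation such as '='.
    static String stripNonWord(String input) {
        return input.replaceAll("\\W", "");
    }

    // Removes whitespace only; punctuation survives.
    static String stripWhitespace(String input) {
        return input.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String input = "key = value";
        System.out.println(stripNonWord(input));    // prints keyvalue
        System.out.println(stripWhitespace(input)); // prints key=value
    }
}
```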

StringUtils to the rescue

Apache Commons Lang enthusiasts can use StringUtils.deleteWhitespace:

input = StringUtils.deleteWhitespace(input); // Whitespaces check in but they don't check out

Using trim

trim() takes care of leading and trailing whitespace only; whitespace inside the string stays where it is:

input = input.trim(); // trim() to the rescue, showing those pesky space invaders the exit door
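The contrast between trimming the ends and removing everything can be sketched as follows. The class name TrimVsRemove and its helpers are hypothetical. Note that trim() treats any character with a code point of U+0020 or lower as whitespace.

```java
final class TrimVsRemove {
    // Drops leading and trailing whitespace; inner whitespace is kept.
    static String trimEnds(String input) {
        return input.trim();
    }

    // Drops all whitespace, wherever it appears.
    static String removeAll(String input) {
        return input.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String input = "  hello world \n";
        System.out.println("[" + trimEnds(input) + "]");  // prints [hello world]
        System.out.println("[" + removeAll(input) + "]"); // prints [helloworld]
    }
}
```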

Java string operations: case studies

Situations vary, and so must our methods:

Complex patterns to the rescue

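If you need to target specific whitespace characters, spell them out in a character class. This pattern removes tabs, newlines, vertical tabs, form feeds, and carriage returns but keeps ordinary spaces: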

input = input.replaceAll("[\\t\\n\\x0B\\f\\r]+", "");
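Here is a small sketch of what that character class does to mixed input. The class name ControlWhitespace and the helper removeControlWhitespace are names invented for this example.

```java
final class ControlWhitespace {
    // Removes tab, newline, vertical tab, form feed, and carriage return.
    // The plain space character is NOT in the class, so it is kept.
    static String removeControlWhitespace(String input) {
        return input.replaceAll("[\\t\\n\\x0B\\f\\r]+", "");
    }

    public static void main(String[] args) {
        String input = "a b\tc\nd";
        System.out.println(removeControlWhitespace(input)); // prints a bcd
    }
}
```

This is handy for flattening multi-line text onto one line while leaving the word spacing alone.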

More than just spaces

To keep letters, digits, underscores, and whitespace while removing everything else (punctuation and symbols):

input = input.replaceAll("[^\\w\\s]", "");
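A short sketch of the [^\\w\\s] pattern in action, with illustrative names (PunctuationStripper, stripPunctuation) and sample strings that are not from the original:

```java
final class PunctuationStripper {
    // Removes every character that is neither a word character nor whitespace.
    static String stripPunctuation(String input) {
        return input.replaceAll("[^\\w\\s]", "");
    }

    public static void main(String[] args) {
        // Spaces are kept because \s is inside the negated class.
        System.out.println(stripPunctuation("Hello, World! #1")); // prints Hello World 1

        // Underscores count as word characters, so they survive too.
        System.out.println(stripPunctuation("snake_case!")); // prints snake_case
    }
}
```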

Memory constraints?

If memory usage or speed is a concern, consider a plain loop that bypasses regex: it avoids compiling a pattern and creating a matcher on every call.
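One possible regex-free approach is a single pass with a StringBuilder. This is a sketch, not a benchmarked recommendation. The class name LoopWhitespaceRemover is hypothetical, and the choice of Character.isWhitespace() is an assumption: its definition of whitespace is slightly broader than the default \\s (for example, it also accepts some Unicode space separators).

```java
final class LoopWhitespaceRemover {
    // Copies every non-whitespace character into a builder sized for the worst case.
    static String removeWhitespace(String input) {
        StringBuilder result = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!Character.isWhitespace(c)) {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeWhitespace(" a b\tc\n")); // prints abc
    }
}
```

Whether this actually beats the regex version depends on input size and call frequency, so measure before committing to it.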

Regex character classes

Understanding regex character classes is essential:

  • \\s matches any whitespace character (space, tab, newline, and so on).
  • \\S the villain, matching any non-whitespace character.
  • \\w the hero, matching word characters (letters, digits, and underscores).
  • \\W the anti-hero, matching non-word characters (the complement of \\w).
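The four classes can be probed one character at a time. The sketch below assumes a made-up class (CharClassProbe) and helper (classify) purely for illustration.

```java
final class CharClassProbe {
    // Reports which class a single character falls into.
    static String classify(char c) {
        String s = String.valueOf(c);
        if (s.matches("\\s")) {
            return "whitespace"; // matched by \s, and therefore also by \W
        }
        if (s.matches("\\w")) {
            return "word";       // matched by \w, and therefore also by \S
        }
        return "other";          // matched by both \S and \W, e.g. punctuation
    }

    public static void main(String[] args) {
        System.out.println(classify(' ')); // prints whitespace
        System.out.println(classify('_')); // prints word
        System.out.println(classify('=')); // prints other
    }
}
```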