
Replacing all non-alphanumeric characters with empty strings

java
regex
string-manipulation
java-8
by Nikita Barsukov · Oct 20, 2024
TLDR

To quickly strip non-alphanumeric characters in Java, the String.replaceAll method comes in handy. Simply use the regex pattern "[^\\p{Alnum}]":

// Remove all those pesky non-alphanumeric characters with one quick sweep
String result = inputString.replaceAll("[^\\p{Alnum}]", "");

Poof! Every character in inputString that isn't an ASCII letter or digit vanishes. Strings are immutable, so replaceAll returns a new string and leaves inputString unchanged.
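As a quick check of the one-liner above, here is a minimal runnable sketch. The class name StripNonAlnum and the helper strip are made up for illustration and are not part of any library:

```java
final class StripNonAlnum {
    // Hypothetical helper: removes every character that is not an ASCII letter or digit.
    static String strip(String input) {
        return input.replaceAll("[^\\p{Alnum}]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Hello, World! 123")); // prints HelloWorld123
    }
}
```

Note that the space and the underscore count as non-alphanumeric here, so they disappear too.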

Understanding regex for replaceAll

A solid grasp of regex (regular expressions) makes string manipulation in Java a breeze. It's all about specifying the right pattern.

Predefined character classes: the regex cheat code

The \p{Alnum} in your regex is like a cheat code. It's a predefined (POSIX) character class that matches any alphanumeric character. By default, Java's POSIX classes cover US-ASCII only.

So remember:

  • \p{Alnum}: matches any ASCII letter or digit (same as [A-Za-z0-9]).
  • \p{Alpha}: for the alphabet soup, matches any ASCII letter (same as [A-Za-z]).
  • \p{Digit}: for the number crunchers, matches any ASCII digit (same as [0-9]).
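To see the three classes side by side, the sketch below keeps only the characters matched by a given class. The helper keepOnly and the class name PosixClassDemo are hypothetical names chosen for this example:

```java
final class PosixClassDemo {
    // Hypothetical helper: keeps only characters matched by the named POSIX class
    // by deleting everything in the negated class [^\p{...}].
    static String keepOnly(String input, String className) {
        return input.replaceAll("[^\\p{" + className + "}]", "");
    }

    public static void main(String[] args) {
        String input = "abc123!?";
        System.out.println(keepOnly(input, "Alnum")); // prints abc123
        System.out.println(keepOnly(input, "Alpha")); // prints abc
        System.out.println(keepOnly(input, "Digit")); // prints 123
    }
}
```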

White spaces – Handle with care

Ever want to keep the spaces while showing the door to other non-alphanumeric characters? Add a space inside the character class:

// Why remove spaces when you can...give them a space?
String result = inputString.replaceAll("[^A-Za-z0-9 ]", "");
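A small sketch of the space-preserving pattern in action (KeepSpaces and its method are illustrative names only). One thing worth noticing: only the literal space character is kept, so tabs and newlines are still removed.

```java
final class KeepSpaces {
    // Hypothetical helper: removes everything except ASCII letters, digits, and the space character.
    static String stripKeepingSpaces(String input) {
        return input.replaceAll("[^A-Za-z0-9 ]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripKeepingSpaces("Hello, World! 123")); // prints Hello World 123
        System.out.println(stripKeepingSpaces("a\tb c"));            // prints ab c (the tab is removed)
    }
}
```

If other whitespace should survive as well, adding \\s to the class instead of a plain space would be one option.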

Avoid the quote-quote situation

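Escaping special characters in regex is easier said than done, right? In a Java string literal, a regex backslash must itself be escaped, so \p{Alnum} is written as "\\p{Alnum}".

And please, don't get quotes stuck inside the character class. An escaped quote there becomes a literal member of the class, so in a negated class the quotes survive instead of being removed: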

Incorrect:

// Quotes trapped inside the character class: NOT COOL! They are kept, not removed.
inputString.replaceAll("[^\\\"A-Za-z0-9\\\"]", "");

Correct:

// Look ma, no unnecessary literals!
inputString.replaceAll("[^A-Za-z0-9]", "");
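The difference between the two patterns is easiest to see on an input that actually contains quotes. This is a minimal sketch; the class and method names are invented for the example:

```java
final class QuoteTrap {
    // The pattern with quotes inside the negated class: quotes become "allowed" characters.
    static String withQuotesInClass(String input) {
        return input.replaceAll("[^\\\"A-Za-z0-9\\\"]", "");
    }

    // The plain pattern: quotes are removed like any other non-alphanumeric character.
    static String plain(String input) {
        return input.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        String input = "say \"hi\"!";
        System.out.println(withQuotesInClass(input)); // prints say"hi"
        System.out.println(plain(input));             // prints sayhi
    }
}
```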

Meeting advanced needs

For advanced or unique string manipulation needs, customized patterns can be your best buddy.

Removing only specific characters

Want to remove only specific characters, like punctuation, and keep everything else? Drop the negation and list exactly the characters to delete:

// A punctuation purge
String result = inputString.replaceAll("[,.!:;?]", "");
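Here is a short sketch of the punctuation purge on a sample sentence. It also shows \p{Punct}, Java's predefined class for ASCII punctuation, as a broader alternative that the snippet above does not use. The class and method names are hypothetical:

```java
final class PunctuationPurge {
    // Removes only the six punctuation marks listed in the character class.
    static String purgeListed(String input) {
        return input.replaceAll("[,.!:;?]", "");
    }

    // Alternative (not from the snippet above): removes all ASCII punctuation via \p{Punct}.
    static String purgeAllAsciiPunctuation(String input) {
        return input.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        System.out.println(purgeListed("Wait, what?! Really: yes; done.")); // prints Wait what Really yes done
        System.out.println(purgeAllAsciiPunctuation("a-b_c(d)"));           // prints abcd
    }
}
```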

The languages of the world

To accommodate non-English characters, remember, A-Za-z0-9 might not cut it. You'd want to go for Unicode properties:

// A language-friendly version
String result = inputString.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");
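To make the difference concrete, the sketch below runs the ASCII-only pattern and the Unicode-property pattern on the same accented input. Non-ASCII characters are written as Unicode escapes so the example does not depend on source-file encoding; the class and method names are illustrative:

```java
final class UnicodeFriendlyStrip {
    // ASCII-only: accented letters are treated as non-alphanumeric and removed.
    static String stripAscii(String input) {
        return input.replaceAll("[^\\p{Alnum}]", "");
    }

    // Unicode-aware: keeps anything with the Alphabetic or Digit property.
    static String stripUnicode(String input) {
        return input.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");
    }

    public static void main(String[] args) {
        String input = "Caf\u00e9 42!"; // "Cafe" with an acute accent on the e, then " 42!"
        System.out.println(stripAscii(input));   // the accented letter is dropped
        System.out.println(stripUnicode(input)); // the accented letter is kept
    }
}
```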

Is \p{Alnum} a friend or foe?

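Beware! By default, \p{Alnum} matches only ASCII letters and digits (A-Z, a-z, 0-9), so it strips accented letters such as é and every non-Latin script. That can wreak havoc in localized text.

In localized apps, either compile the pattern with Pattern.UNICODE_CHARACTER_CLASS, define your own character classes, or use Unicode scripts such as \p{IsLatin}.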
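Two ways of handling this can be sketched as follows: compiling \p{Alnum} with the Pattern.UNICODE_CHARACTER_CLASS flag (which makes the POSIX classes follow Unicode definitions), and restricting matches to a single script with \p{IsLatin}. The class name, constants, and sample input are assumptions made for the example:

```java
import java.util.regex.Pattern;

final class LocalizedAlnum {
    // With this flag, \p{Alnum} matches Unicode letters and digits, not just ASCII.
    private static final Pattern NON_ALNUM_UNICODE =
            Pattern.compile("[^\\p{Alnum}]", Pattern.UNICODE_CHARACTER_CLASS);

    // Script-based class: keeps Latin-script letters and decimal digits only.
    private static final Pattern NON_LATIN_OR_DIGIT =
            Pattern.compile("[^\\p{IsLatin}\\p{Nd}]");

    static String stripUnicodeAware(String input) {
        return NON_ALNUM_UNICODE.matcher(input).replaceAll("");
    }

    static String keepLatinAndDigits(String input) {
        return NON_LATIN_OR_DIGIT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // "Cafe" with an accented e, a Cyrillic word, and a digit.
        String input = "Caf\u00e9 \u041c\u0438\u0440 7!";
        System.out.println(stripUnicodeAware(input));  // keeps Latin and Cyrillic letters plus the digit
        System.out.println(keepLatinAndDigits(input)); // keeps only the Latin letters plus the digit
    }
}
```

The same flag can also be switched on inside the pattern itself with the embedded form (?U), which is handy when the regex is passed straight to String.replaceAll.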

The complex patterns: All superheroes combined

There will be times when you'd need a complex regex that uses several character classes. But hey, Rome wasn't built in a day:

// Step by step for the win
String complexPattern = "[^a-zA-Z0-9\\p{L}\\p{Nd}]";

This combo covers Unicode letters (\p{L}) and decimal digits (\p{Nd}) in one class. Since \p{L} already includes a-z and A-Z and \p{Nd} already includes 0-9, the explicit ASCII ranges are redundant but harmless.
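As a rough illustration, the sketch below applies that pattern next to a shorter variant without the ASCII ranges, using precompiled Pattern objects so the regex is not recompiled on every call. The class, constant, and method names are hypothetical:

```java
import java.util.regex.Pattern;

final class LettersAndDigits {
    // The combined pattern: ASCII ranges plus Unicode letters and decimal digits.
    static final Pattern COMBINED = Pattern.compile("[^a-zA-Z0-9\\p{L}\\p{Nd}]");

    // Shorter variant, assuming the ASCII ranges add nothing beyond \p{L} and \p{Nd}.
    static final Pattern SIMPLIFIED = Pattern.compile("[^\\p{L}\\p{Nd}]");

    static String strip(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "na\u00efve caf\u00e9 #42"; // accented i and e written as escapes
        // Both calls remove the spaces and the '#', and keep the accented letters.
        System.out.println(strip(COMBINED, input));
        System.out.println(strip(SIMPLIFIED, input));
    }
}
```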