
How can I strip all punctuation from a string in JavaScript using regex?

javascript
regex
string-manipulation
unicode
by Alex Kataev · Feb 11, 2025
TLDR

Strip unwanted punctuation from a string by calling .replace() with a well-chosen regex pattern:

const cleanString = "Example string, with punctuation!".replace(/[^\w\s]|_/g, "");

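This piece of magic matches every character that is neither a word character (\w) nor whitespace (\s), plus the underscore, and replaces each match with an empty string. What survives is letters, numbers, and humble whitespace. One catch: \w is ASCII-only in JavaScript, so accented and other non-ASCII letters are discarded along with the punctuation.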

Decoding the regex secret

To get the pattern right, it helps to know exactly what each character class covers:

  • \w matches any word character, which in JavaScript means ASCII letters, digits, and the underscore.
  • \s welcomes any whitespace character with arms wide open, including spaces, tabs, and line breaks.
  • The negated character set [^\w\s] matches every character that isn't invited to the party, which for ASCII text means the punctuation.
  • The |_ alternative politely asks the underscore to leave as well, since \w would otherwise let it stay. Drop that part if you count the underscore as one of your word character companions.
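To make the role of each piece concrete, here is a small sketch that runs the pattern with and without the |_ alternative. The sample string and variable names are made up for illustration.

```javascript
// Hypothetical sample input, chosen to exercise each part of the pattern.
const sample = "snake_case, tabs\tand 100% more!";

// [^\w\s] alone leaves the underscore in place, because _ counts as a word character.
const keepsUnderscore = sample.replace(/[^\w\s]/g, "");

// Adding |_ removes the underscore as well.
const dropsUnderscore = sample.replace(/[^\w\s]|_/g, "");

console.log(JSON.stringify(keepsUnderscore)); // "snake_case tabs\tand 100 more"
console.log(JSON.stringify(dropsUnderscore)); // "snakecase tabs\tand 100 more"
```

The tab survives in both results because \s covers it; only the comma, percent sign, and exclamation mark (and, in the second case, the underscore) are removed.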

Embracing Unicode punctuation

When your strings may contain characters outside ASCII, an adjusted pattern removes not just ASCII punctuation but the broader set of Unicode punctuation and symbols:

const cleanUnicodeString = "Example string—now with Unicode! 🎉".replace(/[\p{P}\p{S}]/gu, ""); // Sorry, party icons. You're out too!

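Here, \p{P} matches any variety of Unicode punctuation, while \p{S} catches symbols, a category that includes currency signs, math operators, and most emoji. The u flag switches the regex into Unicode mode and is required for \p{...} to work. The flag itself dates back to ES2015, but Unicode property escapes need ECMAScript 2018 or newer.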
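The difference from the ASCII-only pattern is easiest to see on text with accented letters. The following is a minimal sketch; the input string is an assumption, written with \u escapes so the exact code points are unambiguous.

```javascript
// Hypothetical input containing accented letters, an em dash (U+2014), and an emoji (U+1F389).
const input = "Caf\u00e9 \u2014 na\u00efve, d\u00e9j\u00e0 vu! \u{1F389}";

// ASCII-only pattern: \w does not cover accented letters, so they are stripped too.
const asciiOnly = input.replace(/[^\w\s]|_/g, "");

// Unicode-aware pattern: removes punctuation and symbols, keeps every letter.
const unicodeAware = input.replace(/[\p{P}\p{S}]/gu, "");

console.log(JSON.stringify(asciiOnly));
console.log(JSON.stringify(unicodeAware));
```

With this input the first result loses the accented letters while the second keeps them intact. Both results still contain leftover whitespace where punctuation used to be.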

Squashing whitespace

After removing punctuation, you might stumble upon runs of leftover whitespace. To collapse each run (spaces, tabs, and line breaks alike) into a single space:

const singleSpacedString = cleanString.replace(/\s+/g, " "); // Because no one likes being in a crowded space, right?
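Collapsing does not remove whitespace at the very start or end of the string, so a trim() is often added as a final step. A small sketch, with a made-up input:

```javascript
// Hypothetical input with leading, trailing, and repeated whitespace.
const raw = "  Hello ,   world !\tBye\n";

// Step 1: strip punctuation (leaves gaps where the punctuation was).
const stripped = raw.replace(/[^\w\s]|_/g, "");

// Step 2: collapse every whitespace run to one space, then trim both ends.
const tidy = stripped.replace(/\s+/g, " ").trim();

console.log(JSON.stringify(stripped)); // "  Hello    world \tBye\n"
console.log(JSON.stringify(tidy));     // "Hello world Bye"
```

If tabs or line breaks carry meaning in your data, replace only runs of plain spaces (/ {2,}/g) instead of \s+.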

Transporting beyond ASCII: dealing with Unicode

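The world isn't just ASCII, so you can also target Unicode blocks directly, such as General Punctuation (\u2000-\u206F) and Supplemental Punctuation (\u2E00-\u2E7F). Two caveats: these ranges don't touch ASCII punctuation like commas or exclamation marks, and General Punctuation also contains typographic spaces and invisible formatting characters, which get removed as well. In the snippet, text stands for your input string: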

const unicodePunctuationStrippedString = text.replace(/[\u2000-\u206F\u2E00-\u2E7F]/g, "");
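To see what the block ranges do and do not cover, this sketch applies them alone and then combined with the four ASCII punctuation ranges. The sample text and the combined pattern are illustrative assumptions, not part of the original recipe.

```javascript
// Hypothetical input: curly quotes (U+201C/U+201D), em dash (U+2014),
// ellipsis (U+2026), plus ordinary ASCII punctuation.
const text = "\u201CQuoted\u201D text \u2014 with an ellipsis\u2026 and ASCII, too!";

// Block ranges only: typographic punctuation goes, ASCII punctuation stays.
const blocksOnly = text.replace(/[\u2000-\u206F\u2E00-\u2E7F]/g, "");

// Block ranges plus the ASCII punctuation ranges (! to /, : to @, [ to `, { to ~).
const allPunct = text.replace(/[\u2000-\u206F\u2E00-\u2E7F!-\/:-@\[-`{-~]/g, "");

console.log(blocksOnly); // Quoted text  with an ellipsis and ASCII, too!
console.log(allPunct);   // Quoted text  with an ellipsis and ASCII too
```

This approach works without the u flag, which is its main appeal, but it misses punctuation that lives in other blocks.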

Preserving specific characters

When you want to keep certain symbols, like single quotes for contractions or possessives, add them to the negated character set so the regex spares them:

const stringWithApostrophes = "They're avoiding the punctuation purge!".replace(/[^\w\s']|_/g, "");
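If the set of characters to keep varies, the pattern can be built at runtime. The helper below, stripPunctuationExcept, is a hypothetical name introduced for this sketch; it escapes the few characters that are special inside a character class before building the regex.

```javascript
// Hypothetical helper: strips ASCII punctuation except the characters listed in `keep`.
function stripPunctuationExcept(text, keep = "") {
  // Inside a character class, only \ ] ^ and - need escaping.
  const escaped = keep.replace(/[\\\]^-]/g, "\\$&");
  // Note: the |_ alternative still removes underscores, even if `keep` contains one.
  const pattern = new RegExp(`[^\\w\\s${escaped}]|_`, "g");
  return text.replace(pattern, "");
}

console.log(stripPunctuationExcept("They're well-known, aren't they?", "'-"));
// They're well-known aren't they

console.log(stripPunctuationExcept("They're here!"));
// Theyre here
```

Keep in mind that this only spares the straight apostrophe ('); a typographic apostrophe (U+2019) would have to be listed separately.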

Precision: masterfully remove specified characters

Sometimes, you want to keep control in your hands and list out the specific punctuation characters you want gone:

const onlySpecificPunctuationRemoved = text.replace(/[.,\/#!$%^&*;:{}=\-_`~()]/g, "").replace(/\s{2,}/g, " "); // Yes, commas, I'm looking at you.
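An explicit list removes only what it names, so anything missing from it survives. The sketch below wraps the list in a hypothetical function, removeListed, and runs it on two made-up inputs to show both effects.

```javascript
// Hypothetical wrapper around the explicit list shown above.
function removeListed(text) {
  return text
    .replace(/[.,\/#!$%^&*;:{}=\-_`~()]/g, "") // remove only the listed characters
    .replace(/\s{2,}/g, " ");                  // collapse the gaps left behind
}

console.log(removeListed("Stop - right there; it's 50% off (today only)!"));
// Stop right there it's 50 off today only

// ?, [, ], and + are not in the list, so they are left untouched.
console.log(removeListed("Who? [me] + you"));
// Who? [me] + you
```

The apostrophe in "it's" is kept for the same reason: it is simply not on the list.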

Dancing with modern JS: ES2018 and beyond

With the evolution of JavaScript, the regex capabilities have matured significantly. Thanks to the advent of Unicode Property Escapes, it is now simpler to remove Unicode punctuation:

const modernStringSpa = "String — now fresh, Unicode-clean! 😄".replace(/[\p{P}\p{S}]/gu, ""); // Sorry, we don't serve emojis here.

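This technique requires ECMAScript 2018 (ES9) or newer, and the u flag is mandatory. Without it, \p{...} is not treated as a property escape, so the pattern silently matches the wrong characters instead of raising an error.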
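If the code may run on an engine older than ES2018, one option is to detect support at runtime and fall back to the ASCII pattern. This is a sketch under that assumption; makePunctuationPattern is a hypothetical name.

```javascript
// Hypothetical factory: prefers Unicode property escapes, falls back to ASCII.
function makePunctuationPattern() {
  try {
    // The RegExp constructor throws a SyntaxError in u-mode if \p{...} is unsupported.
    return new RegExp("[\\p{P}\\p{S}]", "gu");
  } catch (err) {
    // Fallback: ASCII-only pattern, which also strips non-ASCII letters.
    return /[^\w\s]|_/g;
  }
}

const pattern = makePunctuationPattern();
const cleaned = "String \u2014 now fresh! \u{1F604}".replace(pattern, "");

console.log(JSON.stringify(cleaned)); // "String  now fresh "
```

Building the pattern through the constructor matters here: a regex literal with \p{...} and the u flag would fail at parse time on an old engine, before any try/catch could run.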

Verifying results: a must-have step

Before launching your regex-aided string manipulation into your applications, it's vital to test its accuracy:

  1. Use online debuggers like regex101 to craft and refine your expressions, ensuring they match only required characters.
  2. Construct thorough unit tests to affirm your string manipulation functionality, particularly if it plays a major role in data processing.
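As a starting point for the second step, here is a dependency-free, table-driven check. The function name stripPunctuation and the test cases are assumptions made for this sketch; in a real project you would likely move the cases into your test framework of choice.

```javascript
// Hypothetical function under test: ASCII strip, collapse whitespace, trim.
function stripPunctuation(text) {
  return text.replace(/[^\w\s]|_/g, "").replace(/\s+/g, " ").trim();
}

// Each case pairs an input with the output we expect.
const cases = [
  ["Example string, with punctuation!", "Example string with punctuation"],
  ["snake_case", "snakecase"],
  ["  spaced   out  ", "spaced out"],
  ["", ""],
  ["!!!", ""],
];

for (const [input, expected] of cases) {
  const actual = stripPunctuation(input);
  if (actual !== expected) {
    throw new Error(`stripPunctuation(${JSON.stringify(input)}) returned ${JSON.stringify(actual)}, expected ${JSON.stringify(expected)}`);
  }
}

console.log(`${cases.length} cases passed`);
```

Edge cases such as the empty string and punctuation-only input are worth keeping in the table, since they are easy to break when the pattern changes.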