
Remove non-alphanumeric characters from a string

javascript
regex-engineering
string-manipulation
unicode-properties
by Nikita Barsukov · Mar 4, 2025
TLDR

Here's your quick fix:

const result = str.replace(/\W|_/g, '');

What does it do? The regex \W|_ matches every non-word character (anything outside A-Z, a-z, 0-9, and _) plus the underscore itself (because we're strict like that). Now result holds your pristine, cleaned string. It's like taking a shower for your data.
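To make that concrete, here is a minimal sketch that wraps the one-liner in a helper. The name stripNonAlphanumeric and the sample inputs are assumptions made up for illustration, not part of any library.

```javascript
// Hypothetical helper: keeps only ASCII letters and digits.
function stripNonAlphanumeric(str) {
  // \W matches anything outside [A-Za-z0-9_]; the extra |_ removes underscores too.
  return str.replace(/\W|_/g, '');
}

console.log(stripNonAlphanumeric('Hello, World_2025!'));   // "HelloWorld2025"
console.log(stripNonAlphanumeric('snake_case-and kebab')); // "snakecaseandkebab"
console.log(stripNonAlphanumeric('café'));                 // "caf"
```

Note the last line: without Unicode handling, accented letters count as non-word characters and get scrubbed away with the punctuation.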

You can handle poorly-formed strings with:

const safeStr = JSON.stringify(str).slice(1, -1);
const cleaned = safeStr.replace(/\W|_/g, '');

JSON.stringify wraps your string in quotes and escapes quotes, backslashes, and control characters. We slice off the surrounding quotes, then bathe what's left in cleanliness. It's like quarantine for your data. One catch: every escape leaves its letter behind, so a newline becomes \n and the cleanup keeps the n. If that matters, clean the raw string instead.
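Before leaning on this trick, it helps to see what the escaping step does to a control character. The sketch below uses a made-up sample input (rawInput) and compares cleaning the escaped string with cleaning the raw one.

```javascript
// Sample input containing a real newline character (an assumption for illustration).
const rawInput = 'a\nb';

// JSON.stringify turns the newline into the two characters "\" and "n".
const escapedInput = JSON.stringify(rawInput).slice(1, -1);
console.log(escapedInput.length); // 4

// The cleanup removes the backslash but keeps the letter "n".
const cleanedEscaped = escapedInput.replace(/\W|_/g, '');
console.log(cleanedEscaped); // "anb"

// Cleaning the raw string directly drops the newline entirely.
const cleanedRaw = rawInput.replace(/\W|_/g, '');
console.log(cleanedRaw); // "ab"
```

So the escaped route can smuggle stray letters into the result, which is worth knowing before choosing it over a plain replace.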

Reusable recipe for special characters and malformed strings

Next, we have strings with sketchy control characters (\n, \r, \b). We stare them down like this:

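const amIBeingControlled = /[\r\n\b]/g;
let result = str.replace(amIBeingControlled, '').replace(/\W|_/g, '');

The control characters have been shown the door. (Strictly speaking, \W already matches them, so the first pass only earns its keep if you loosen the second pattern, say, to keep spaces.)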
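As a quick sanity check, this sketch runs the two-pass recipe and a single \W|_ pass over the same made-up input (noisyInput) and compares the results. Inside a character class or a string literal, \b means backspace rather than a word boundary.

```javascript
// Sample input with a carriage return, newline, and backspace (illustrative only).
const noisyInput = 'line1\r\nline2\bend';

// Two passes: control characters first, then everything else.
const twoPass = noisyInput.replace(/[\r\n\b]/g, '').replace(/\W|_/g, '');

// One pass: \W already covers control characters.
const onePass = noisyInput.replace(/\W|_/g, '');

console.log(twoPass);             // "line1line2end"
console.log(twoPass === onePass); // true
```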

Handling multilingual strings? Here's a recipe that includes Unicode properties:

const universalTranslator = /[^\p{L}\d]/gu;
let result = str.replace(universalTranslator, '');

The u flag enables Unicode property escapes. Here, \p{L} represents any letter from any language and \d matches the ASCII digits 0-9; the leading ^ negates the class, so everything else gets removed. You're now an intergalactic linguist!
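Here is a small sketch of the Unicode-aware cleanup next to the ASCII-only one. The pattern is negated with ^ so that letters and digits are kept rather than removed; the variable names and the sample greeting are assumptions for illustration.

```javascript
// Keep letters from any script plus ASCII digits; remove everything else.
// The leading ^ negates the class, and the u flag is required for \p{L}.
const nonLetterOrDigit = /[^\p{L}\d]/gu;

const greeting = 'Привет, мир 123!';
const unicodeCleaned = greeting.replace(nonLetterOrDigit, '');
console.log(unicodeCleaned); // "Приветмир123"

// For comparison, the ASCII-only pattern discards the Cyrillic letters.
const asciiCleaned = greeting.replace(/\W|_/g, '');
console.log(asciiCleaned); // "123"
```

One thing to watch: \p{L} does not cover combining marks (\p{M}), so text stored in decomposed form may lose its accents. Calling str.normalize('NFC') first is one possible workaround.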

Regex toolset for advanced cleaning

Regex equips us with a toolbox for getting rid of unwanted characters:

Custom regex functions for bespoke cleanliness

For unique scenarios that require more than a replace call:

function bespokeCleaner(str, pattern = /\W|_/g) {
  // Here's where "one-size-fits-all" goes to die: pass in whatever regex your case needs
  return str.replace(pattern, '');
}

Global and Unicode flags

To apply the cleaning regime to all occurrences, make sure to add the g flag. For Unicode property escapes such as \p{L} and \p{N}, add u as well:

const result = str.replace(/[^\p{L}\p{N}]/gu, '');
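To see what each piece contributes, the sketch below drops the g flag, then swaps \p{N} for \d, on a made-up input (mixedDigits) that ends with a non-ASCII digit. The variable names are assumptions for illustration.

```javascript
// Sample input; \u0663 is ARABIC-INDIC DIGIT THREE, a digit outside ASCII.
const mixedDigits = 'a-b-c \u0663';

// Without g, only the first match (the first hyphen) is removed.
const firstOnly = mixedDigits.replace(/[^\p{L}\p{N}]/u, '');
console.log(firstOnly === 'ab-c \u0663'); // true

// With g, every non-letter, non-number character goes.
const allMatches = mixedDigits.replace(/[^\p{L}\p{N}]/gu, '');
console.log(allMatches === 'abc\u0663'); // true

// \d only covers 0-9, even with u, so it drops the Arabic-Indic digit.
const asciiDigitsOnly = mixedDigits.replace(/[^\p{L}\d]/gu, '');
console.log(asciiDigitsOnly === 'abc'); // true
```

In short: g decides how many matches are replaced, while the choice between \d and \p{N} decides which digits survive.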

Points to take home

Escaping backslashes in escape rooms

Does your string suddenly feel like it's in an escape room? It might contain literal escape sequences (a backslash followed by another character). Remove those first, then clean the rest:

const result = str.replace(/\\./g, '').replace(/\W|_/g, '');
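A short sketch of why the extra pass matters. String.raw keeps backslashes literal, so the made-up input below (literalEscapes) contains a backslash followed by a letter, not a real newline or tab.

```javascript
// String.raw keeps the backslash literal: this is "\" + "n", not a newline.
// The sample text is an assumption for illustration.
const literalEscapes = String.raw`line1\nline2\tend`;

// Without the first pass, the backslashes go but the "n" and "t" stay behind.
const withoutEscapePass = literalEscapes.replace(/\W|_/g, '');
console.log(withoutEscapePass); // "line1nline2tend"

// /\\./g removes each backslash together with the character after it.
const withEscapePass = literalEscapes.replace(/\\./g, '').replace(/\W|_/g, '');
console.log(withEscapePass); // "line1line2end"
```

The pattern only consumes one character after the backslash, so longer sequences such as \u0041 would still leave 0041 behind.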

Handling suspicious strangers

Always, always, always treat unexpected input (null, numbers, objects, anything that isn't a string) like a sneezing, coughing stranger on your subway train:

const result = (typeof str === 'string') ? str.replace(/\W|_/g, '') : '';
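Wrapped in a function, the guard might look like this. The name cleanIfString and the sample calls are hypothetical, and returning an empty string for non-strings is just the policy used in the line above; throwing an error would be an equally reasonable choice.

```javascript
// Hypothetical wrapper around the typeof guard.
function cleanIfString(value) {
  return typeof value === 'string' ? value.replace(/\W|_/g, '') : '';
}

console.log(cleanIfString('user_name!')); // "username"
console.log(cleanIfString(null));         // ""
console.log(cleanIfString(42));           // ""
console.log(cleanIfString(['a', 'b']));   // ""
```

Values created with new String('x') report typeof 'object', so this guard treats them as strangers too.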

Anti-regex squad tactics

Not on flirting terms with regex? Here are some alternatives:

Custom filtering functions

Turns out, you can use filter() to cleanse your string too:

function isAlphaNumeric(char) {
  // A string like "It's a trap!" is truthy and would keep every character, so return a real boolean
  return /[A-Za-z0-9]/.test(char);
}
const result = Array.from(str).filter(isAlphaNumeric).join('');

ASCII check, mate!

For folks who relish the old school, go ASCII:

const result = str.split('').filter((char) => {
  const code = char.charCodeAt(0);
  return (code > 47 && code < 58) ||  // 0-9
    (code > 64 && code < 91) ||       // A-Z
    (code > 96 && code < 123);        // a-z
}).join('');
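Off-by-one slips are easy with numeric ranges, so here is a sketch that pokes at the boundaries. The helper name isAsciiAlphanumeric and the test characters are assumptions for illustration; the ranges are the same ones used above.

```javascript
// Hypothetical helper using the same ranges: 48-57 digits, 65-90 A-Z, 97-122 a-z.
function isAsciiAlphanumeric(char) {
  const code = char.charCodeAt(0);
  return (code > 47 && code < 58) || (code > 64 && code < 91) || (code > 96 && code < 123);
}

// Characters sitting just outside and just inside each range boundary.
const justOutside = ['/', ':', '@', '[', '`', '{'];
const justInside = ['0', '9', 'A', 'Z', 'a', 'z'];

console.log(justOutside.some(isAsciiAlphanumeric)); // false
console.log(justInside.every(isAsciiAlphanumeric)); // true
console.log('R2-D2 & C_3PO'.split('').filter(isAsciiAlphanumeric).join('')); // "R2D2C3PO"
```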

Look, Ma, we cleaned the string with a nerf gun!