Remove non-alphanumeric characters from a string
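Here's your quick fix: run the string through replace() with the pattern \W|_ and the g flag, and keep the output in result.
What does it do? The regex \W|_ matches every non-word character. The |_ alternative adds the underscore, which \W counts as a word character and would otherwise let through (because we're strict like that). Keep in mind that \W only knows ASCII, so accented letters such as é get stripped too. Now result holds your pristine, cleaned string. It's like taking a shower for your data.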
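Here is a minimal sketch of the quick fix wrapped in a function. The name removeNonAlphanumeric is ours for illustration, not a built-in.

```javascript
// Quick fix: strip everything except ASCII letters and digits.
// \W matches any non-word character; |_ adds the underscore,
// which \W treats as a word character. The g flag removes every match.
function removeNonAlphanumeric(str) {
  return str.replace(/\W|_/g, '');
}

const result = removeNonAlphanumeric('Hello, World! 123_abc');
console.log(result); // "HelloWorld123abc"
```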
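You can handle poorly-formed strings by passing them through JSON.stringify first.
JSON.stringify wraps your string in quotes and escapes potentially destructive characters such as quotes, backslashes, and control characters. We slice off the surrounding quotes to get a plain string back, then bathe it in the same \W|_ cleanup. One catch: escape sequences are made of word characters, so a newline becomes \n and leaves a stray n behind after cleaning. It's like quarantine for your data.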
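A minimal sketch of that pipeline follows. cleanViaJson is a hypothetical name. Coercing the input with String() is our own assumption: without it, JSON.stringify(42) returns 42 with no quotes to slice off. The sample shows the caveat above, where an escaped tab leaves a t behind.

```javascript
// Escape first with JSON.stringify, drop the outer quotes, then clean.
function cleanViaJson(str) {
  // Assumption: non-string input is treated as text via String().
  const escaped = JSON.stringify(String(str)).slice(1, -1);
  return escaped.replace(/\W|_/g, '');
}

// The tab is escaped to the two characters \ and t; only the \ is removed.
console.log(cleanViaJson('Tab\there!')); // "Tabthere"
```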
Reusable recipe for special characters and malformed strings
Next, we have strings with sketchy control characters (\n, \r, \b). We stare them down by stripping them out before the usual cleanup.
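Here is a reusable sketch. It assumes "control characters" means the ASCII range U+0000 to U+001F plus DEL (U+007F). The function names are illustrative. We use character codes rather than \b because, in a regex, \b means backspace only inside a character class; outside one it is a word boundary. Strictly speaking, \W|_ would catch these characters anyway. The separate pass makes the intent explicit and can be used alone when you want to keep punctuation.

```javascript
// C0 control characters (including \b, \t, \n, \r) plus DEL.
const CONTROL_CHARS = /[\x00-\x1F\x7F]/g;
const NON_ALNUM = /\W|_/g;

// Remove only the control characters, keeping punctuation intact.
function stripControlChars(str) {
  return str.replace(CONTROL_CHARS, '');
}

// Remove control characters, then everything that is not a letter or digit.
function cleanString(str) {
  return stripControlChars(str).replace(NON_ALNUM, '');
}

console.log(stripControlChars('line1\r\nline2\b!')); // "line1line2!"
console.log(cleanString('line1\r\nline2\b!'));       // "line1line2"
```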
The control characters have been shown the door.
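Handling multilingual strings? Here's a recipe that uses Unicode property escapes.
The u flag switches the regex into Unicode mode, which property escapes such as \p{L} require. Here, \p{L} represents any letter from any language and \d matches digits. Note that \d covers only the ASCII digits 0-9, even with the u flag, so reach for \p{N} if you need digits from other scripts too. You're now an intergalactic linguist!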
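Here is a sketch of the Unicode recipe. The function names are illustrative. The second variant swaps \d for \p{N} so that digits from other scripts survive as well.

```javascript
// Keep letters from any script (\p{L}) and ASCII digits (\d).
// Property escapes like \p{L} only work with the u flag.
function cleanUnicode(str) {
  return str.replace(/[^\p{L}\d]/gu, '');
}

// Variant: keep numbers from any script as well (\p{N}).
function cleanUnicodeAnyDigit(str) {
  return str.replace(/[^\p{L}\p{N}]/gu, '');
}

console.log(cleanUnicode('Café, naïve Straße! 123 / привет_мир'));
// "CafénaïveStraße123приветмир"
```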
Regex toolset for advanced cleaning
Regex equips us with a toolbox for getting rid of unwanted characters, one pattern per cleaning job.
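Here is an illustrative toolbox. The pattern names are our own, and the selection is just a sample. Each pattern is a negated set, so it removes whatever is not among the characters you want to keep.

```javascript
// Hypothetical toolbox: each pattern deletes everything outside the kept set.
const cleaners = {
  alphanumeric: /[^a-zA-Z0-9]/g,           // letters and digits only
  alphanumericAndSpaces: /[^a-zA-Z0-9 ]/g, // also keep plain spaces
  lettersOnly: /[^a-zA-Z]/g,               // drop digits as well
  digitsOnly: /\D/g,                       // \D = any non-digit
};

const input = 'Order #A-17, shipped 2 days ago!';
for (const [name, pattern] of Object.entries(cleaners)) {
  console.log(name, '->', input.replace(pattern, ''));
}
```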
Custom regex functions for bespoke cleanliness
For unique scenarios that require more than a single replace call, wrap the pattern in a function of your own.
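Here is one possible sketch. createCleaner is a hypothetical helper, not a built-in. It builds a reusable cleaner that keeps ASCII letters, digits, and any extra characters you whitelist. It also escapes the characters that are special inside a character class (-, \, ^, ]) so that user-supplied input can't break the pattern.

```javascript
// Build a cleaner that keeps letters, digits, and the characters in `keep`.
function createCleaner(keep = '') {
  // Escape characters that are special inside a regex character class.
  const escaped = keep.replace(/[-\\^\]]/g, '\\$&');
  const pattern = new RegExp(`[^a-zA-Z0-9${escaped}]`, 'g');
  return (str) => str.replace(pattern, '');
}

const keepDashes = createCleaner('-_');
console.log(keepDashes('foo-bar_baz!?')); // "foo-bar_baz"
```

Building the RegExp once and returning a closure means the pattern is compiled a single time, however many strings you clean with it.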
Global and Unicode flags
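To apply the cleaning regime to all occurrences, make sure to add the g flag; without it, replace() stops after the first match. For expanded character sets in Unicode strings, add the u flag as well, since \p{...} escapes only work in Unicode mode.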
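Here is a small demonstration of what each flag changes. Without u, \p{L} is not a property escape at all. It is read as the literal characters p, {, L, and }, so a negated class built from it deletes real letters.

```javascript
const messy = 'a!b!c!';

// Without g, replace() stops after the first match.
console.log(messy.replace(/\W/, ''));  // "ab!c!"
// With g, every match is removed.
console.log(messy.replace(/\W/g, '')); // "abc"

// Without u, [^\p{L}] means "anything except p, {, L or }".
console.log('Olá!'.replace(/[^\p{L}]/g, ''));  // ""
// With u, \p{L} matches letters from any script.
console.log('Olá!'.replace(/[^\p{L}]/gu, '')); // "Olá"
```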
Points to take home
Escaping backslashes in escape rooms
Does your string suddenly feel like it's in an escape room? It might contain backslashes, and a backslash has to be escaped itself: '\\' in a string literal, /\\/ in a regex literal.
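Here is a short sketch using a Windows-style path as sample input. It shows the double escaping that the RegExp constructor needs: once for the string literal and once for the regex.

```javascript
// Each '\\' in the source is one backslash in the actual string.
const windowsPath = 'C:\\Users\\me\\file_01.txt';

// Regex literal: escape the backslash once.
console.log(windowsPath.replace(/\\/g, ''));                  // "C:Usersmefile_01.txt"
// RegExp constructor: escape it twice (string escape + regex escape).
console.log(windowsPath.replace(new RegExp('\\\\', 'g'), '')); // "C:Usersmefile_01.txt"
// Full alphanumeric cleanup removes the backslashes along with everything else.
console.log(windowsPath.replace(/\W|_/g, ''));                // "CUsersmefile01txt"
```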
Handling suspicious strangers
Always, always, always treat malformed input like a sneezing, coughing stranger on your subway train: check what it is before you touch it.
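Here is a defensive sketch. The policy is our assumption: null and undefined become an empty string, and anything else is coerced with String(). You might prefer to throw a TypeError instead. safeClean is a hypothetical name.

```javascript
// Clean any input without crashing on null, undefined, or non-strings.
function safeClean(input) {
  if (input == null) {
    // Assumption: treat null/undefined as "nothing to clean".
    return '';
  }
  // Assumption: coerce numbers, booleans, etc. to their string form.
  return String(input).replace(/\W|_/g, '');
}

console.log(safeClean(undefined)); // ""
console.log(safeClean(42.5));      // "425"
```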
Anti-regex squad tactics
Not in the mood to flirt with regex? Well, plain string methods and loops can do the job too.
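Here is one regex-free sketch, using a case-comparison trick of our own choosing: a character counts as a letter if its upper- and lowercase forms differ. That makes it work for accented Latin, Greek, or Cyrillic letters. Letters from caseless scripts such as Chinese or Arabic are dropped, so treat this as a heuristic.

```javascript
// Heuristic: letters have distinct case forms; digits compare as strings.
function isLetterOrDigit(ch) {
  return ch.toLowerCase() !== ch.toUpperCase() || (ch >= '0' && ch <= '9');
}

function cleanWithoutRegex(str) {
  let result = '';
  for (const ch of str) { // for...of walks the string by code point
    if (isLetterOrDigit(ch)) result += ch;
  }
  return result;
}

console.log(cleanWithoutRegex('Héllo, wörld #42!')); // "Héllowörld42"
```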
Custom filtering functions
Turns out, you can also split the string into characters and use filter() to cleanse it.
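Here is a minimal sketch. The whitelist of ASCII letters and digits is our assumption; swap in any character set you like. Spreading the string with [...str] splits it by code point, and join('') glues the survivors back together.

```javascript
// Characters we keep: ASCII letters and digits (new Set iterates the string).
const ALLOWED = new Set(
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
);

function cleanWithFilter(str) {
  return [...str].filter((ch) => ALLOWED.has(ch)).join('');
}

console.log(cleanWithFilter('R2-D2 & C-3PO!')); // "R2D2C3PO"
```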
ASCII check, mate!
For the folks relishing the old school: go ASCII and compare character codes directly.
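Here is a sketch of the character-code approach. The ranges are the standard ASCII codes for 0-9 (48-57), A-Z (65-90), and a-z (97-122). Anything outside them, including accented letters, is dropped.

```javascript
function cleanAscii(str) {
  let result = '';
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    const isDigit = code >= 48 && code <= 57;  // '0'-'9'
    const isUpper = code >= 65 && code <= 90;  // 'A'-'Z'
    const isLower = code >= 97 && code <= 122; // 'a'-'z'
    if (isDigit || isUpper || isLower) result += str[i];
  }
  return result;
}

console.log(cleanAscii('Ünïcode? Nope: ASCII_only-123')); // "ncodeNopeASCIIonly123"
```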
Look, Ma, we cleaned the string with a nerf gun!