Capitalize words in string

javascript
regex
string-manipulation
performance
by Nikita Barsukov · Sep 12, 2024
TLDR

To capitalize each word in a string, use .replace() with a regular expression:

const capitalizeWords = str => str.replace(/\b\w/g, c => c.toUpperCase()); console.log(capitalizeWords('hello world')); // Hello World

Here, \b\w matches the first word character after each word boundary, and the callback converts it with c.toUpperCase(), so every word in the result starts with an uppercase letter. Keep in mind that \w covers only ASCII letters, digits, and the underscore.
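
To see what \b\w actually matches, here is a small sketch that runs the one-liner over a few inputs. The sample strings are chosen for illustration and are not part of the original answer.

```javascript
// The TLDR one-liner, unchanged.
const capitalizeWords = str => str.replace(/\b\w/g, c => c.toUpperCase());

// Sample inputs chosen for illustration.
console.log(capitalizeWords('hello world'));      // Hello World
console.log(capitalizeWords('2nd place'));        // 2nd Place
console.log(capitalizeWords('snake_case name'));  // Snake_case Name
console.log(capitalizeWords("he's here"));        // He'S Here
console.log(capitalizeWords('\u00E9lan vital'));  // éLan Vital
```

The last two lines show the weak spots: an apostrophe creates a word boundary, and both \w and \b only know ASCII, so the é is skipped and the l after it is treated as the start of the word.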

Capitalization with edge cases

The TLDR answer handles the common cases, but let's toughen it up for edge cases involving punctuation, special characters, and international symbols.

Sailing through punctuation and special characters

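When tussling with punctuation, \b\w can lose its balance: in "he's", the apostrophe creates a word boundary, so the s gets capitalized too ("He'S"). To circumvent this, let's refine the regex with a negative lookbehind:

const capitalizeWords = str => str.replace(/(?<!\w')\b\w/g, c => c.toUpperCase()); console.log(capitalizeWords("he's a hero (or maybe not)")); // He's A Hero (Or Maybe Not)

The lookbehind (?<!\w') rejects any match that directly follows a word character plus an apostrophe, so contractions stay intact, while a letter after an opening parenthesis or quote is still capitalized. Letters that are already uppercase are left as they are.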
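
One way to keep contractions such as he's intact without relying on newer regex features is to capture the character in front of the letter and put it back. This is only a sketch: the name capitalizeWordsCompat is made up for illustration, and it treats any word character or straight apostrophe in front of a letter as "still inside a word".

```javascript
// Hypothetical helper: capitalize a word character only when it sits at the
// start of the string or follows something other than a word character or
// an apostrophe.
const capitalizeWordsCompat = str =>
  str.replace(/(^|[^\w'])(\w)/g, (match, before, letter) => before + letter.toUpperCase());

console.log(capitalizeWordsCompat("he's a hero (or maybe not)")); // He's A Hero (Or Maybe Not)
console.log(capitalizeWordsCompat("don't stop"));                  // Don't Stop
```

The trade-off is that a word opening with a straight single quote, as in 'hello', is not capitalized by this variant.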

Brushing up on national symbols

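Characters outside the basic ASCII set are left in the dust by typical regex patterns: \w does not match them, and \b is ASCII-only as well, so a letter such as ø is never seen as the start of a word. Let's offer a ride to the non-ASCII characters with Unicode property escapes and the u flag:

const capitalizeWords = str => str.normalize("NFC").replace(/(?<![\p{L}\p{M}\p{N}])\p{L}/gu, c => c.toUpperCase()); console.log(capitalizeWords('ßpinåch ømelet')); // SSpinåch Ømelet

\p{L} matches a letter from any script, and the lookbehind stands in for \b by requiring that no letter, combining mark, or digit comes directly before it. normalize("NFC") composes each base letter and its combining marks into a single code point where one exists, so accented letters are handled as one unit. Note that toUpperCase() turns ß into SS, which is why the output starts with two characters.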
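
The effect of normalize() is easiest to see on single characters. The sketch below uses only standard String methods; the sample characters are chosen for illustration.

```javascript
// "å" as one precomposed code point (U+00E5).
const composed = '\u00E5';
// NFD splits it into "a" plus a combining ring above (U+030A).
const decomposed = composed.normalize('NFD');

console.log(composed.length);   // 1
console.log(decomposed.length); // 2
console.log(decomposed.normalize('NFC') === composed); // true

// Some letters have no decomposition at all, e.g. "ø" (U+00F8).
console.log('\u00F8'.normalize('NFD').length); // 1

// Uppercasing can change the length: "ß" becomes "SS".
console.log('\u00DF'.toUpperCase()); // SS
```

A decomposed å therefore looks like a plain a followed by a non-letter to an ASCII-based pattern, which is how stray capitals can end up in the middle of a word.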

Improving performance and versatility

We aim for our solution to be capable of juggling strings of diverse lengths and robust enough to catch all types of edge cases.

Standardizing to lowercase

Before capitalizing, it is often useful to convert the whole string to lowercase first:

const capitalizeWords = str => str.toLowerCase().replace(/\b\w/g, c => c.toUpperCase()); console.log(capitalizeWords("QUICK BROWN FOX")); // Quick Brown Fox

toLowerCase() gives us a uniform base, so only the first letter of each word ends up uppercase. The trade-off is that acronyms and other intentional capitals are flattened as well.

A slice of Map and Join

If you prefer split(), map(), and join() and would rather give regex a hard pass, check this method out:

const capitalizeWords = str => str.toLowerCase().split(' ').map(word => word.charAt(0).toUpperCase() + word.slice(1)).join(' '); console.log(capitalizeWords("the quick brown fox")); // The Quick Brown Fox

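This method is high on code clarity. It does build an intermediate array, which may cost a little on very long strings, and it splits on single space characters only, so tabs and newlines are not treated as word separators.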
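
Because split(' ') only recognizes a single space as a separator, a possible variant splits on a capturing whitespace pattern instead. The sketch below uses the hypothetical name capitalizeWordsSplit and keeps tabs, newlines, and runs of spaces exactly as they were.

```javascript
// Hypothetical variant: split(/(\s+)/) keeps the whitespace runs in the
// resulting array (because of the capturing group), so join('') restores
// the original spacing exactly.
const capitalizeWordsSplit = str =>
  str
    .toLowerCase()
    .split(/(\s+)/)
    .map(part => part.charAt(0).toUpperCase() + part.slice(1))
    .join('');

console.log(capitalizeWordsSplit('the quick brown fox')); // The Quick Brown Fox
console.log(JSON.stringify(capitalizeWordsSplit('the\tquick\nbrown  fox')));
// "The\tQuick\nBrown  Fox"
```

Whitespace parts pass through the map() step unchanged, since uppercasing a space or tab has no effect.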

Building a robust capitalization function

Rather than a one-trick pony, we want a capitalization function that is versatile and a team player in an existing codebase. Here's one way to do that:

function capitalizeWords(str, preserveCapitals = false) { const base = preserveCapitals ? str : str.toLowerCase(); return base.replace(/\b\w/g, c => c.toUpperCase()); }

This function gives the caller the flexibility to preserve existing capitalization when required.
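
A quick usage sketch shows what the flag changes. The function is restated here so the snippet runs on its own; the sample sentence is made up.

```javascript
function capitalizeWords(str, preserveCapitals = false) {
  // Lowercase first unless the caller wants existing capitals kept.
  const base = preserveCapitals ? str : str.toLowerCase();
  return base.replace(/\b\w/g, c => c.toUpperCase());
}

console.log(capitalizeWords('hello NASA team'));       // Hello Nasa Team
console.log(capitalizeWords('hello NASA team', true)); // Hello NASA Team
```

Defaulting the flag to false keeps the behaviour of the lowercase-first version, so existing callers would not need to change.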

Going the extra mile

A few more issues might spring up when you least expect them. Let's tackle them.

Spaces, the final frontier

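Dealing with a pesky non-breaking space? The regex approach needs no change: \u00A0 is not a word character, so \b\w already treats it as a separator.

const capitalizeWords = str => str.replace(/\b\w/g, c => c.toUpperCase()); console.log(capitalizeWords('hello\u00A0world')); // Hello World (still joined by \u00A0)

It is the split(' ') approach that leaves the word after a non-breaking space behind. A whitespace-based pattern avoids that, because \s matches \u00A0 in JavaScript.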

Performance on steroids

For folks who geek out over performance: replace() with a simple pattern like \b\w scans the string once, and pre-processing with toLowerCase() adds a second pass plus an extra string allocation. On typical inputs the difference is negligible. On very long strings or in hot paths, measure before optimizing, and skip toLowerCase() when the input is known to be lowercase already.
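
Whether any of this matters is best settled by measuring. The sketch below times a regex version and a split/map/join version on a long generated string with performance.now(), which is available as a global in browsers and recent Node.js versions. The helper name timeIt, the input size, and the run count are arbitrary choices, and the numbers will vary by engine and machine.

```javascript
const viaRegex = str => str.replace(/\b\w/g, c => c.toUpperCase());
const viaSplit = str =>
  str.split(' ').map(w => w.charAt(0).toUpperCase() + w.slice(1)).join(' ');

// Hypothetical helper: run fn(input) `runs` times and return elapsed milliseconds.
function timeIt(fn, input, runs = 20) {
  const start = performance.now();
  for (let i = 0; i < runs; i++) fn(input);
  return performance.now() - start;
}

// 200,000 short words; the size is an arbitrary choice.
const longInput = 'lorem ipsum dolor sit amet '.repeat(40000);

console.log('regex:', timeIt(viaRegex, longInput).toFixed(1), 'ms');
console.log('split:', timeIt(viaSplit, longInput).toFixed(1), 'ms');
```

Treat the output as a rough comparison only; a single timing loop like this is sensitive to warm-up and garbage collection.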

Befriending language quirks

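Don't let language-specific characters like the German umlauts get lost in translation. Because \b is ASCII-only, a lookbehind has to mark the start of the word instead:

str.replace(/(?<![A-Za-zÄÖÜäöüß])[a-zäöü]/g, c => c.toUpperCase());

This regex pattern capitalizes a to z plus the umlauts ä, ö, and ü at the start of a word and counts ß as a letter inside words, as an example of how to adjust for language-specific characters. ß is left out of the matched set because toUpperCase() would expand it to SS.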

Additional tricks and thorny areas

Stone-cold strings

Don't forget that JavaScript strings are immutable. Any function claiming to modify your string is only handing you a shiny new string!

When leveraging modern regex features such as Unicode property escapes (\p{L}) or lookbehind assertions, both introduced in ES2018, make sure your runtime environment is not stuck in the past and supports them.
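
If older runtimes are a concern, support can be probed at runtime. This is a minimal sketch, assuming an ASCII-only fallback is acceptable when Unicode property escapes are unavailable; the names supportsUnicodeProps and capitalizeWordsSafe are made up for illustration.

```javascript
// Building the pattern via the RegExp constructor means an unsupported
// feature throws at runtime instead of failing when the file is parsed.
function supportsUnicodeProps() {
  try {
    new RegExp('\\p{L}', 'u');
    return true;
  } catch (err) {
    return false;
  }
}

// Unicode-aware pattern where available, ASCII-only word characters otherwise.
const wordStart = supportsUnicodeProps()
  ? new RegExp('(^|[^\\p{L}\\p{M}\\p{N}])(\\p{L})', 'gu')
  : /(^|\W)(\w)/g;

const capitalizeWordsSafe = str =>
  str.replace(wordStart, (match, before, letter) => before + letter.toUpperCase());

console.log(capitalizeWordsSafe('hello world')); // Hello World
```

Both patterns capture the character in front of the letter and put it back, so neither branch depends on lookbehind support.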

In language, we trust

Working on an application that caters to a specific language or region? Consider using locale-aware methods like toLocaleUpperCase(), which apply the casing rules of a given locale. In Turkish, for example, the uppercase of i is İ, not I.
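
As a sketch of what locale-aware capitalization can look like, the hypothetical helper below passes a locale tag through to toLocaleUpperCase(). It assumes a runtime with Unicode property escapes, lookbehind support, and locale-aware casing, which is the norm in current browsers and Node.js.

```javascript
// Hypothetical helper: capitalize the first letter of each word using
// the casing rules of the given locale (a BCP 47 tag such as 'tr-TR').
const capitalizeWordsLocale = (str, locale) =>
  str.replace(/(?<![\p{L}\p{M}\p{N}])\p{L}/gu, c => c.toLocaleUpperCase(locale));

console.log(capitalizeWordsLocale('istanbul ili', 'en-US')); // Istanbul Ili
console.log(capitalizeWordsLocale('istanbul ili', 'tr-TR')); // İstanbul İli
```

Taking the locale as a parameter, rather than relying on the environment default, keeps the result the same regardless of where the code runs.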