How can I get a character array from a string?

by Nikita Barsukov · Dec 8, 2024
TLDR

To convert a string into a character array, use either Array.from(str) or the spread syntax [...str]:

let str = "Hello!";
let chars = Array.from(str); // ['H', 'e', 'l', 'l', 'o', '!'] // bye-bye stringy!
// or
let spreadChars = [...str]; // ['H', 'e', 'l', 'l', 'o', '!'] // no more clingy stringy!

Beware! Unicode monsters

While working with Unicode characters in JavaScript, be vigilant. Strings are stored as UTF-16 code units, and .split('') cuts between every code unit, so any character stored as a surrogate pair (most emoji, for example) is torn into two halves.

Desired output with Array.from():

let unity = "🌍🕊";
let peace = Array.from(unity); // ['🌍', '🕊']

Terrifying output with .split(''):

let pieces = unity.split(''); // ['\uD83C', '\uDF0D', '\uD83D', '\uDD4A']

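See, .split('') knows nothing about surrogate pairs: each emoji is cut into a high and a low surrogate, and neither half is a valid character on its own. Array.from() and the spread syntax iterate by code point, so each pair stays whole.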
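To make the difference concrete, here is a small sketch that compares code units with code points for a single emoji. The emoji is written as an escape sequence so the code point is visible. The variable names are only illustrative.

```javascript
// "\u{1F30D}" is the globe emoji (U+1F30D), stored as one surrogate pair.
const globe = "\u{1F30D}";

console.log(globe.length);             // 2 (UTF-16 code units)
console.log(Array.from(globe).length); // 1 (code point)

// split('') hands back the two surrogate halves separately.
const halves = globe.split('');
const halvesHex = halves.map((half) => half.charCodeAt(0).toString(16));
console.log(halvesHex); // [ 'd83c', 'df0d' ]
```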

Taming Unicode with regex

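To keep surrogate pairs intact with split, pass a regular expression with the u (Unicode) flag. The pattern below splits in front of every letter, number, punctuation mark, or symbol. Anything else, such as whitespace or a combining mark, stays attached to the character before it: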

let unity = "🌍🕊";
let peace = unity.split(/(?=\p{L}|\p{N}|\p{P}|\p{S})/u); // ['🌍', '🕊']
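If you want a plain split by code point rather than by character category, two simpler patterns are worth knowing. The sketch below is one possible approach, not the only one: an empty pattern with the u flag, and a global match on "any code point". It also shows how the lookahead pattern treats whitespace. The sample string and variable names are made up for illustration.

```javascript
const sample = "a \u{1F30D}b"; // "a", space, globe emoji, "b"

// An empty pattern with the u flag splits between every code point.
const bySplit = sample.split(/(?:)/u);

// Matching any code point globally gives the same pieces.
// The s flag lets . match line terminators; ?? [] covers the empty string,
// for which match() returns null.
const byMatch = sample.match(/./gsu) ?? [];

// The category-based lookahead keeps the space glued to "a".
const byLookahead = sample.split(/(?=\p{L}|\p{N}|\p{P}|\p{S})/u);

console.log(bySplit.length);     // 4
console.log(byMatch.length);     // 4
console.log(byLookahead.length); // 3
```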

Iterating with 'for ... of ...'

To maintain the integrity of Unicode characters, consider using a for...of loop. The string iterator yields whole code points, so surrogate pairs are never split:

let unity = "🌍🕊";
let peace = [];
for (let globe of unity) {
  peace.push(globe); // Peace, not pieces.
}

So, peace now comfortably contains ['🌍', '🕊️'].
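Because each loop step receives a complete code point, the same loop can also inspect what it found. A minimal sketch, with an illustrative input string, that lists code points in the usual U+ notation:

```javascript
const word = "h\u{1F30D}"; // "h" followed by the globe emoji
const codePoints = [];

for (const ch of word) {
  // codePointAt(0) reads the full code point, even when ch is a surrogate pair.
  codePoints.push("U+" + ch.codePointAt(0).toString(16).toUpperCase());
}

console.log(codePoints); // [ 'U+68', 'U+1F30D' ]
```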

Check compatibility before dating ES6

Before you use cool ES6 features like the spread operator or Array.from(), ensure your JavaScript environment won't ghost you. Check the ECMAScript compatibility table or MDN for this.
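Besides reading the tables, you can check at runtime. The helper below is a hypothetical sketch (toChars is not a built-in): it uses Array.from when the environment has it and otherwise falls back to a charAt loop. The fallback splits surrogate pairs, so treat it as a last resort. Spread syntax cannot be detected this way, because unsupported syntax fails at parse time.

```javascript
// Hypothetical helper: prefers Array.from when the environment provides it.
function toChars(str) {
  if (typeof Array.from === 'function') {
    return Array.from(str); // code-point aware
  }
  // Fallback for pre-ES6 engines: works on UTF-16 code units only.
  var chars = [];
  for (var i = 0; i < str.length; i++) {
    chars.push(str.charAt(i));
  }
  return chars;
}

console.log(toChars("Y2K")); // [ 'Y', '2', 'K' ]
```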

Coping with complex characters using libraries

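Iterating by code point is still not the whole story. Emoji flags, family emojis, and similar characters are built from several code points (regional-indicator pairs, zero-width-joiner sequences, variation selectors) that render as one visible character, so Array.from() and for...of split them apart. The grapheme-splitter library serves as a good companion here, because it splits by grapheme cluster instead: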

const GraphemeSplitter = require('grapheme-splitter');
const splitter = new GraphemeSplitter();
let str = "👨‍👩‍👧‍👦";
let meme = splitter.splitGraphemes(str); // ['👨‍👩‍👧‍👦'] // Hello, family!
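If adding a dependency is not an option, recent browsers and Node.js versions ship Intl.Segmenter, which can segment by grapheme cluster as well. This is an alternative to the library above, not what the library does internally, and the helper name splitGraphemesBuiltIn is made up for this sketch. Check that your target environment supports Intl.Segmenter before relying on it.

```javascript
// Intl.Segmenter is a standard built-in; no package install is needed.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper name, chosen for this example.
function splitGraphemesBuiltIn(text) {
  // Each item yielded by segment() has a `segment` property holding one cluster.
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// Man + ZWJ + woman + ZWJ + girl + ZWJ + boy, written with escapes.
const familyEmoji = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

console.log(Array.from(familyEmoji).length);            // 7 code points
console.log(splitGraphemesBuiltIn(familyEmoji).length); // 1 grapheme cluster
```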

Performance tales with massive strings

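When you encounter massive strings, Array.from and spread syntax may betray you performance-wise in some engines. In such cases, a good old for...of loop might act as a knight in shining armor. Results vary with the engine and the string, so measure in your target environment before you commit to one approach.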

let heftyString = '...'; // A chunky string
let charsArray = [];
for (let char of heftyString) {
  charsArray.push(char);
}
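Rather than taking anyone's word for it, time the three approaches yourself. The sketch below is a rough, single-run measurement, not a rigorous benchmark. The timeIt helper and the test string are assumptions made for this example, and the numbers will differ between engines and runs.

```javascript
// Hypothetical helper: runs fn once and logs the elapsed milliseconds.
function timeIt(label, fn) {
  const start = Date.now();
  const result = fn();
  console.log(label + ": " + (Date.now() - start) + " ms");
  return result;
}

// 100,000 repetitions of "a", "b", and the globe emoji: 300,000 code points.
const bigString = "ab\u{1F30D}".repeat(100000);

const viaFrom = timeIt("Array.from", () => Array.from(bigString));
const viaSpread = timeIt("spread", () => [...bigString]);
const viaLoop = timeIt("for...of", () => {
  const out = [];
  for (const ch of bigString) out.push(ch);
  return out;
});
```

All three produce the same array, so the choice comes down to what the measurements show for your data.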

Jamming with tokens in older environments

To seamlessly work in non-ES6 environments, combine the character access method charAt with a basic loop:

var oldie = "Y2K";
var arr = [];
for (var i = 0; i < oldie.length; i++) {
  arr.push(oldie.charAt(i));
}

This promises wide support, but it works on UTF-16 code units, so it splits surrogate pairs exactly as split('') does.
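If you are stuck with ES5 but still need whole code points, the loop can check for surrogate pairs by hand. A minimal sketch, assuming well-formed input; the function name toCodePointsES5 is made up for this example. It handles surrogate pairs only, not grapheme clusters.

```javascript
// ES5-only syntax: no let/const, arrow functions, or iterators.
function toCodePointsES5(str) {
  var result = [];
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    var next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
    var isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    var isLow = next >= 0xDC00 && next <= 0xDFFF;
    if (isHigh && isLow) {
      // Keep the pair together as one array element.
      result.push(str.charAt(i) + str.charAt(i + 1));
      i++; // skip the low surrogate that was just consumed
    } else {
      result.push(str.charAt(i));
    }
  }
  return result;
}

console.log(toCodePointsES5("Y2K\uD83C\uDF0D").length); // 4
```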

The peril of split('') with multi-code-unit characters

split('') treats every UTF-16 code unit as a separate element. Characters outside the Basic Multilingual Plane, such as most emoji and some rarer CJK ideographs, take two code units each, so they come out as two broken halves instead of one character.
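A quick way to find out whether a given string is affected is to look for a high surrogate followed by a low surrogate. The helper below is a hypothetical sketch. It only detects surrogate pairs, so a false result does not guarantee that split('') keeps combining marks or other multi-code-point sequences together.

```javascript
// Hypothetical helper: true when the string contains at least one surrogate pair.
function hasSurrogatePair(str) {
  // No u flag on purpose: the pattern must look at individual code units.
  return /[\uD800-\uDBFF][\uDC00-\uDFFF]/.test(str);
}

console.log(hasSurrogatePair("Hello!"));       // false
console.log(hasSurrogatePair("Hi \u{1F30D}")); // true
```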