Unescape HTML entities in JavaScript?

by Nikita Barsukov · Nov 3, 2024

To decode HTML entities in the browser with DOM APIs, assign your string to the innerHTML of a new element, then read its textContent to get the unescaped text.

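const decodeHtml = str => {
  const el = document.createElement('div'); // scratch element used only for parsing
  el.innerHTML = str;                       // the browser parses the markup and decodes entities
  return el.textContent;                    // read back the plain, decoded text
};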

// Example: 'Cat &amp; Dog' -> 'Cat & Dog'
console.log(decodeHtml('Cat &amp; Dog'));

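Careful! Use this method only with trusted content. Even on an element that is never added to the page, markup such as <img src=x onerror=...> is loaded and its handler runs, which makes this a classic XSS vector.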

The aegis of DOMParser

The innerHTML method works fine with trusted data, but untrusted input changes the picture and security moves to the top of the priority list. Say hello to DOMParser: it parses a string into a separate, inert document, which gives you a safe workspace for unescaping HTML entities.

DOMParser: The XSS Shield

Decode HTML entities without lowering your shield against XSS attacks. Use DOMParser like this:

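const htmlDecode = input => {
  // Parse into a separate, inert document; scripts in it never run.
  const doc = new DOMParser().parseFromString(input, 'text/html');
  return doc.documentElement.textContent; // all decoded text in the parsed document
};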

// Example usage:
console.log(htmlDecode('Cat &amp; Dog')); // Outputs "Cat & Dog" - we've got feline and canine harmony

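Documents created by DOMParser are inert: their <script> elements are not executed and resources such as images are not fetched, so handlers like onerror never fire. That is what protects you from XSS while decoding.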

Browser compatibility checkpoint

DOMParser, including the 'text/html' mode used above, is supported by every current major browser; only very old browsers lack it. It is also not available in Node.js, which has no built-in DOM. How you handle this depends on your audience: consider polyfills, a library, or another decoding path for those environments.
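One way to cover both cases is feature detection: use DOMParser where it exists and a fallback decoder elsewhere. This is a minimal sketch under assumptions: createHtmlDecoder is a hypothetical helper name, and the caller supplies the fallback (he.decode is a natural choice in Node.js).

```javascript
// Sketch under assumptions: createHtmlDecoder is a hypothetical helper, and the
// fallback decoder (for example he.decode in Node.js) is supplied by the caller.
function createHtmlDecoder(fallbackDecode) {
  if (typeof DOMParser !== 'undefined') {
    // Browser path: parse into an inert document and read back the text.
    const parser = new DOMParser();
    return input => parser.parseFromString(input, 'text/html').documentElement.textContent;
  }
  // No DOMParser (Node.js or a very old browser): use the fallback if given.
  if (typeof fallbackDecode !== 'function') {
    throw new Error('DOMParser is unavailable and no fallback decoder was provided');
  }
  return input => fallbackDecode(input);
}
```

In Node.js, createHtmlDecoder(he.decode) returns a function that delegates to he; in a browser the same call uses DOMParser and ignores the argument. The two paths are not guaranteed to behave identically, so test both.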

Security alert

Take extra care whenever trusted and untrusted HTML mix: injecting HTML without proper sanitization risks running malicious code. DOMParser parses in an environment where scripts never execute, so reading text out of its result is safe; what you do with that text afterwards is a separate question.

More than meets the eye

Decoding HTML entities involves more than turning strings back into plain text; edge cases often add interesting twists.
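To see some of those twists concretely, here is a deliberately minimal, DOM-free decoder. It is a swapped-in illustration, not the DOMParser or he approach: it knows only five named entities, requires the trailing semicolon (browsers also accept a legacy set such as &amp without one), and maps invalid code points to U+FFFD while skipping the other numeric remappings the HTML spec defines. decodeBasicEntities is a hypothetical name.

```javascript
// Illustration only: a tiny decoder with just five named entities (an
// assumption for brevity). Use DOMParser or a full library such as he in real code.
const NAMED = new Map([['amp', '&'], ['lt', '<'], ['gt', '>'], ['quot', '"'], ['apos', "'"]]);

function decodeBasicEntities(str) {
  return str.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g, (match, body) => {
    if (body[0] === '#') {
      // Numeric reference: decimal (&#60;) or hexadecimal (&#x3C;).
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // NUL, out-of-range and surrogate code points become U+FFFD.
      if (code === 0 || code > 0x10ffff || (code >= 0xd800 && code <= 0xdfff)) return '\uFFFD';
      return String.fromCodePoint(code);
    }
    // Unknown named references are left untouched rather than guessed.
    return NAMED.has(body) ? NAMED.get(body) : match;
  });
}

console.log(decodeBasicEntities('&amp;lt;b&amp;gt;')); // → &lt;b&gt; (double-encoded input decodes one level only)
console.log(decodeBasicEntities('&#x1F600; &#60;'));   // → 😀 < (hex and decimal, including astral code points)
console.log(decodeBasicEntities('&copy; &bogus;'));    // → &copy; &bogus; (a browser or he would turn &copy; into ©)
```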

Null Checks and String Length Surveillance

Inputs can bring surprises, such as null values or very large strings. Make sure your function is ready for them:

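const htmlDecode = input => {
  if (input == null) return ''; // null or undefined: nothing to decode 🤷‍♀️
  if (input.length > 10000) throw new Error('Input too long'); // arbitrary cap; tune it for your app ☕

  // Decode as before: parse into an inert document and read back the text.
  const doc = new DOMParser().parseFromString(String(input), 'text/html');
  return doc.documentElement.textContent;
};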

Library Wisdom: he

For a robust option when decoding HTML entities, consider the popular library 'he' by Mathias Bynens. Its he.decode function turns HTML entities into their textual counterparts:

import he from 'he';

console.log(he.decode('Cat &amp; Dog')); // Outputs "Cat & Dog". Don't you love harmony?

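Because he implements the HTML specification's character-reference rules in plain JavaScript, it works in Node.js as well as in browsers, and it never parses markup, so there is nothing to execute. Note one difference: he only decodes entities and leaves tags in place, whereas the DOM-based methods above drop markup and return only the text content.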

Dealing with untrusted HTML: Sanitize!

When handling untrusted HTML, always sanitize after decoding. Decoding can turn harmless text such as &lt;script&gt; into live markup, which must never reach innerHTML unsanitized; when you only need to display the result, assign it with textContent instead.
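When decoded text has to go back into an HTML string rather than through textContent, one minimal option is to escape it again. This is escaping, not sanitizing: it neutralizes all markup, whereas a dedicated sanitizer such as DOMPurify is meant for cases where some markup must survive. escapeHtml is a hypothetical helper written for this sketch.

```javascript
// Hypothetical helper: re-escape decoded text so the browser shows it
// literally instead of parsing it as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first so the entities added below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Decoding '&lt;img src=x onerror=alert(1)&gt;' yields this live markup:
const decoded = '<img src=x onerror=alert(1)>';
const safeHtml = `<p>${escapeHtml(decoded)}</p>`;
console.log(safeHtml); // → <p>&lt;img src=x onerror=alert(1)&gt;</p>
```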

All Geared Up for Security

DOMParser is safer than the innerHTML approach, but remember that consistency is key: keep testing your decoding code, because browser or library updates can change how input is parsed, and regression tests catch such changes early.
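A small table of regression cases is one way to do that. The cases and the helper name checkDecoder are illustrative assumptions, not a standard test suite; the expected values follow the HTML specification's decoding rules.

```javascript
// Illustrative regression cases: [input, expected decoded text].
const DECODE_CASES = [
  ['Cat &amp; Dog', 'Cat & Dog'],
  ['&lt;script&gt;', '<script>'],
  ['&#39;quoted&#39;', "'quoted'"],
  ['&#x1F600;', '\u{1F600}'],
  ['&amp;lt;', '&lt;'],          // double-encoded input decodes one level only
  ['&copy; 2024', '\u00A9 2024'],
];

// Runs every case through the given decoder; an empty result means all passed.
function checkDecoder(decode) {
  const failures = [];
  for (const [input, expected] of DECODE_CASES) {
    const actual = decode(input);
    if (actual !== expected) failures.push({ input, expected, actual });
  }
  return failures;
}
```

In a browser you could run checkDecoder(htmlDecode); in Node.js, checkDecoder(he.decode). Rerunning it after upgrades surfaces behavior changes as a concrete list of failing inputs.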

Continuous Learning & Vigilance

Rely on MDN Web Docs and the WHATWG HTML Living Standard, which now specifies DOMParser, for up-to-date details. Staying current helps keep your code ahead of evolving security concerns.