Html-encoding lost when attribute read from input field

by Nikita Barsukov · Nov 7, 2024
TLDR

To get an input's value back in HTML-encoded form, set the .val() result as the text of a detached jQuery <div> and read back its .html():

var encodedValue = $('<div>').text($('#inputField').val()).html();

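This strategy works because the browser decodes entities such as &lt; when it parses the value attribute, so .val() hands you the decoded string. Passing that string through .text() and then .html() re-encodes &, < and >, so encodedValue contains &lt; again. Note that quote characters are left untouched.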

Encoding and Decoding Strategies

Keeping HTML encoding intact can be tricky. Here are a few techniques that cover most situations:

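Securely Decoding with the DOMParser API

The DOMParser API parses an HTML string into a separate, inert document. Scripts inside that document are not executed, which reduces XSS risk. Because the parser resolves entities, reading textContent back gives you the decoded text, not an encoded string. Here's how to use it:

function htmlUnescape(str) {
  // Security first, they say: parse into a detached document where scripts do not run.
  var parser = new DOMParser();
  var doc = parser.parseFromString(`<!doctype html><body>${str}`, 'text/html');
  // And voila! The decoded plain text: entities resolved, tags dropped.
  return doc.body.textContent;
}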

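And remember: some escaping guidelines (older OWASP recommendations, for example) also suggest encoding forward slashes (/) as &#x2F;. None of the DOM-based helpers here do that for you, so treat it like buckling up before driving: a cheap extra precaution you add yourself.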

Safely Encoding/Decoding with Textarea Elements

Creating a detached textarea can surprisingly serve as a safe haven for your text to HTML conversion:

function htmlEncode(str) {
  let textarea = document.createElement('textarea');
  // Assigned as plain text, so nothing in str is parsed as markup.
  textarea.textContent = str;
  // Reading innerHTML serializes the text with &, < and > escaped.
  return textarea.innerHTML;
}

function htmlDecode(str) {
  let textarea = document.createElement('textarea');
  // A textarea holds only text, so entities are resolved but no elements are created.
  textarea.innerHTML = str;
  // Abracadabra! And it's plain text again.
  return textarea.textContent;
}

Here the textContent and innerHTML properties of a detached textarea do the encoding and decoding, and the input is never parsed into live elements, so it cannot trigger XSS along the way. Neat tricks, huh?

Encoding/Decoding with jQuery

Even in familiar jQuery territory, keep user content out of the live DOM by working on detached elements. Here's how to do it "the jQuery way":

function jqueryHtmlEncode(value) {
  // jQuery magic begins: set as text, read back as HTML.
  return $('<div/>').text(value).html();
}

function jqueryHtmlDecode(value) {
  // Remember the good old spell: set as HTML inside a textarea, read back as text.
  return $('<textarea/>').html(value).text();
}

Comprehensive Encoding Challenges

On this adventure of HTML encoding, you'll come across various puzzles. Let's see how we can tackle these:

Preserving Whitespace

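Whitespace in attribute values needs care: some browsers (yes, Internet Explorer, I'm looking at you 😉) reportedly collapse runs of whitespace when text passes through a div. Using a textarea, or encoding whitespace characters explicitly, avoids the surprise.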
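
One possible way to protect whitespace, sketched here as an assumption and not a technique taken from the article, is to turn tabs, newlines and carriage returns into numeric character references before the string is placed in an attribute. The helper name encodeAttrWhitespace is hypothetical.

```javascript
// Hypothetical helper: replace whitespace control characters with numeric
// character references so the browser cannot silently normalize them.
function encodeAttrWhitespace(str) {
  return String(str).replace(/[\t\n\r]/g, function (ch) {
    return '&#' + ch.charCodeAt(0) + ';';
  });
}

var preserved = encodeAttrWhitespace('line one\r\nline\ttwo');
// preserved: line one&#13;&#10;line&#9;two
```

Run it after the &, < and > escaping step; otherwise the ampersands it introduces would be escaped a second time.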

Escaping Quotes for Integrity

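Escaping quote marks (', ") is vital when the encoded string ends up inside an attribute value, because an unescaped quote can close the attribute early and let the data inject markup of its own (sometimes data can have opinions of its own). The DOM-based encoders above escape only &, < and >, so quotes need an extra step.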
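
A minimal sketch of an attribute-safe encoder, assuming a plain string replacement is acceptable in your setup. The names ATTR_ENTITIES and escapeForAttribute are hypothetical, and the choice of &#39; for the single quote is one common convention, not the only one.

```javascript
// Hypothetical lookup table: every character that can break out of
// element content or a quoted attribute value.
var ATTR_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

// A single pass over the string, so an ampersand produced by one
// replacement is never escaped again by another.
function escapeForAttribute(str) {
  return String(str).replace(/[&<>"']/g, function (ch) {
    return ATTR_ENTITIES[ch];
  });
}

var encoded = escapeForAttribute('Tom & "Jerry" <b>it\'s</b>');
// encoded: Tom &amp; &quot;Jerry&quot; &lt;b&gt;it&#39;s&lt;/b&gt;
```

Because it never touches the DOM, this version also works outside the browser.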

Saying Goodbye to Old Methods

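Times change and so should our techniques. Retire the older, potentially vulnerable div method for decoding, $('<div/>').html(value).text(): passing untrusted input to .html() on a div can run scripts or event handlers embedded in it. Adopt the safer textarea and DOMParser alternatives discussed above.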

Benchmarking Performance

To check the speed of your chosen method, benchmark it, for example on a site like jsperf.com or with a quick performance.now() loop in your browser console. Yes, even encoding and decoding strategies need a weigh-in!
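
A rough sketch of such a local weigh-in, assuming performance.now() is available (modern browsers and recent Node.js versions provide it). The names timeEncoder and replaceEncode are hypothetical; replaceEncode is only a stand-in so the example runs anywhere.

```javascript
// Hypothetical micro-benchmark: time `iterations` calls of fn(input).
function timeEncoder(fn, input, iterations) {
  var start = performance.now();
  for (var i = 0; i < iterations; i++) {
    fn(input);
  }
  return performance.now() - start; // elapsed milliseconds
}

// Stand-in encoder; swap in htmlEncode, jqueryHtmlEncode, etc. in a browser.
function replaceEncode(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

var elapsedMs = timeEncoder(replaceEncode, '<a href="#">x & y</a>', 100000);
console.log('replaceEncode: ' + elapsedMs.toFixed(2) + ' ms');
```

Treat the numbers as a rough comparison only: results vary between runs, browsers and input sizes, so time each candidate several times with realistic input.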

Leveraging Tools and Libraries for Encoding

Dozens of frameworks and libraries have encoding utilities. Some of my favorites are:

  • Underscore.js provides _.escape() and _.unescape().
  • Django templates and AngularJS bindings have your back: both escape output automatically by default.
  • And for non-jQuery scenarios, turn to the classics - document.createElement and document.createTextNode.
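
The last bullet can be sketched as follows. The function name encodeWithTextNode and its optional doc parameter are assumptions added for illustration; in a page you would call it with just the string.

```javascript
// Sketch: encode by appending a text node to a detached div.
// `doc` falls back to the page's document when omitted.
function encodeWithTextNode(str, doc) {
  doc = doc || document;
  var div = doc.createElement('div');
  // createTextNode stores str as text; it is never parsed as HTML.
  div.appendChild(doc.createTextNode(str));
  // Reading innerHTML serializes the text with &, < and > escaped.
  return div.innerHTML;
}
```

In a browser, encodeWithTextNode('<b>bold</b>') should come back with the angle brackets encoded. As with the other DOM-based encoders, quotes are left as they are.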