
Parse an HTML string with JS

javascript
prompt-engineering
dom-manipulation
security
by Anton Shumikhin · Nov 3, 2024
TLDR

Parse HTML with JavaScript efficiently using DOMParser:

const parser = new DOMParser();
const doc = parser.parseFromString("<p>Hello</p>", "text/html");
console.log(doc.body.innerHTML); // "<p>Hello</p>", says hello with politeness

Here, an HTML string becomes a full Document, so you can use the usual DOM features on it, such as traversal, querying, and manipulation. Absolute magic! ⚡

Deeper Dive: Extract Specific Elements

Let's say you need to extract specific elements, such as links. Here's how you do it using querySelectorAll on the parsed document:

const links = doc.querySelectorAll('a');
links.forEach(link => console.log(link.href)); // console.log all the URLs you got. Who needs SEO?
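One detail worth knowing: link.href returns the URL resolved to an absolute address, while getAttribute('href') returns the value exactly as written in the markup. The sketch below wraps the extraction in a small helper; collectLinks is a hypothetical name, not part of any library, and the browser-only demo is guarded so the snippet also loads where DOMParser is unavailable.

```javascript
// Hypothetical helper: collect link text and the raw href attribute from an
// already parsed document (or any node that supports querySelectorAll).
function collectLinks(root) {
  // 'a[href]' skips anchors that have no href attribute at all.
  return Array.from(root.querySelectorAll('a[href]'), (a) => ({
    text: a.textContent.trim(),
    href: a.getAttribute('href'), // as written in the markup, not resolved
  }));
}

// Browser-only demo; skipped in environments without DOMParser.
if (typeof DOMParser !== 'undefined') {
  const html = '<a href="/docs">Docs</a> <a>no href</a>';
  const parsed = new DOMParser().parseFromString(html, 'text/html');
  console.log(collectLinks(parsed));
}
```

Returning plain objects instead of live elements makes the result easy to log, serialize, or filter later.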

Alternative parsing strategies

Dummy DOM method

There are times when one must resort to alternative strategies, like creating a dummy DOM:

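const el = document.createElement('div');
el.innerHTML = "<p>HTML Ninja Technique</p>";

This would make Mr. Miyagi proud 🥷! Just be aware that innerHTML is an XSS risk with untrusted input: script tags inserted this way do not run, but inline handlers such as onerror on an img can fire, even while the element is still detached from the page.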

jQuery parsing

For the jQuery enthusiasts, here's how you can parse HTML:

const el = $('<div>').html("<p>jQuery, Baby!</p>");
const paragraphs = $('p', el); // All p-tags surrender unconditionally

jQuery smooths out the bumps on the road with a compact API for building and querying elements. Note that .html() relies on the browser's innerHTML, so the same caution about untrusted input applies.

DOM Range manipulation

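const range = document.createRange();
const fragment = range.createContextualFragment("<p>HTML Fragment</p>");

This creates a DocumentFragment, a lightweight alternative to innerHTML that you can append in one go. Small is beautiful. Unlike innerHTML, script elements in such a fragment do run once the fragment is inserted into the document, so use it only with trusted markup.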

Special considerations

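Certain HTML elements like td, tr, and th play by their own rules: they are only valid inside a table. Parsed in the wrong context, for example through a div's innerHTML or a text/html DOMParser call, the tags are dropped and only their text survives. Make sure such markup is parsed in a matching context, such as a table wrapper or a template element.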
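One way to get a forgiving context is the template element, whose content accepts table parts that a div would discard. This is a minimal sketch, assuming a browser environment; parseInTemplate is a hypothetical helper name, and the doc parameter exists only so a different document can be passed in.

```javascript
// Hypothetical helper: parse markup inside a <template>, which keeps
// context-sensitive elements such as <tr> and <td> intact.
function parseInTemplate(html, doc = document) {
  const template = doc.createElement('template');
  template.innerHTML = html.trim();
  return template.content; // a DocumentFragment holding the parsed nodes
}

// Browser-only demo; skipped where no global document exists.
if (typeof document !== 'undefined') {
  const fragment = parseInTemplate('<tr><td>cell</td></tr>');
  console.log(fragment.querySelector('td').textContent);
}
```

Template content is inert until it is cloned or moved into the page, which also makes it a convenient staging area for inspecting markup.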

Performance boosters and security locks

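Parsing as XML

If your markup is well-formed XML, you can take a detour via the XML parser. Treat any performance boost as something to measure rather than assume:

// parser is the DOMParser instance created in the TLDR example
const xmlString = "<custom-tag>Zigzag route</custom-tag>";
const xmlDoc = parser.parseFromString(xmlString, "application/xml");

Remember, this is not an exclusive club for Chrome: DOMParser accepts application/xml in all modern browsers. The XML parser only accepts well-formed markup, so ordinary sloppy HTML will not get in.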
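When XML input is malformed, parseFromString does not throw. It returns a document that contains a parsererror element instead, so it is worth checking for one. The sketch below shows one way to do that; parseXmlStrict is a hypothetical helper name, and the demo is guarded because DOMParser is a browser API.

```javascript
// Hypothetical helper: parse XML and turn the parser's error document
// into a regular exception.
function parseXmlStrict(xml, parser = new DOMParser()) {
  const xmlDoc = parser.parseFromString(xml, 'application/xml');
  const error = xmlDoc.querySelector('parsererror');
  if (error) {
    throw new Error('Invalid XML: ' + error.textContent);
  }
  return xmlDoc;
}

// Browser-only demo; skipped in environments without DOMParser.
if (typeof DOMParser !== 'undefined') {
  const ok = parseXmlStrict('<custom-tag>Zigzag route</custom-tag>');
  console.log(ok.documentElement.nodeName);
  try {
    parseXmlStrict('<custom-tag>unclosed');
  } catch (e) {
    console.log(e.message); // wording of the message differs between browsers
  }
}
```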

Being security smart

When parsing content from untrusted sources, be sure to sanitize the HTML before it reaches the page to prevent cross-site scripting (XSS) attacks. Libraries like DOMPurify can help.
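The order of operations matters: sanitize first, parse second. The sketch below wires the two steps together. parseUntrusted is a hypothetical helper name, and the sanitize argument is assumed to be a function such as DOMPurify.sanitize, with the DOMPurify library loaded separately.

```javascript
// Hypothetical helper: run untrusted markup through a sanitizer before parsing.
// `sanitize` is assumed to be a string-to-string function such as
// DOMPurify.sanitize; the library itself is not bundled with this snippet.
function parseUntrusted(dirtyHtml, sanitize, parser = new DOMParser()) {
  const cleanHtml = sanitize(dirtyHtml);
  return parser.parseFromString(cleanHtml, 'text/html');
}

// Browser-only demo; runs only when both DOMParser and DOMPurify are available.
if (typeof DOMParser !== 'undefined' && typeof DOMPurify !== 'undefined') {
  const safeDoc = parseUntrusted(
    '<p>Hi</p><img src="x" onerror="alert(1)">',
    (html) => DOMPurify.sanitize(html)
  );
  console.log(safeDoc.body.innerHTML);
}
```

Passing the sanitizer in as an argument keeps the helper independent of any one library and makes it easy to swap in a stricter configuration.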

Use of third-party libraries

For untrusted markup, a specialized sanitizer like DOMPurify might be your secret weapon:

const clean = DOMPurify.sanitize("<div>My <script>evil_script</script></div>");
console.log(clean); // "<div>My </div>", the script got swept away!