How to get the pure text without HTML elements using JavaScript?
Fetch text stripped of HTML by reading textContent for the full content or innerText for the visible content.
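textContent works best for large-scale extraction because it reads the DOM directly without triggering layout. innerText is ideal when you care about visible, styled text, but reading it forces the browser to compute styles and layout.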
Explaining textContent and innerText
Both properties extract the text inside a node, but they behave differently:
- textContent is the blunt tool you reach for when you need all text content, visible and invisible. It ignores CSS entirely, so it returns text from hidden elements and even the contents of <script> and <style> elements.
- innerText, on the other hand, is the discerning butler. It is aware of CSS, skips elements that are not rendered (for example, display: none), and approximates the text a user would get by selecting the element's content and copying it.
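To see the difference in practice, here is a minimal sketch that reads both properties from the same element. The markup in the comment, the id demo, and the helper compareTextProperties are hypothetical, chosen only for illustration.

```javascript
// Assumed markup (hypothetical), somewhere in the page body:
//   <div id="demo">Hello <span style="display:none">hidden</span>world</div>

// Hypothetical helper: read both properties from the same element.
function compareTextProperties(element) {
  return {
    raw: element.textContent, // every descendant text node, hidden ones included
    visible: element.innerText, // only rendered text, shaped by CSS
  };
}

// Run only in a browser, where `document` exists.
if (typeof document !== "undefined") {
  const demo = document.getElementById("demo");
  if (demo) {
    const { raw, visible } = compareTextProperties(demo);
    console.log(raw); // includes "hidden"
    console.log(visible); // omits "hidden", because the span is not rendered
  }
}
```

Note that innerText only reflects CSS when the element is actually rendered in the document.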
Selection of target elements
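Correctly identifying your targets is key. Keep ids unique (when an id is duplicated, getElementById returns only the first match) and be precise in targeting:
- Use document.getElementById('yourElementId') to zero in on a specific element.
- Use document.querySelector('selector') for complex CSS selectors to triangulate your target. It returns the first match; document.querySelectorAll('selector') returns every match.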
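Both lookups can be wrapped in one small helper. This is a sketch: getTextFrom, the id yourElementId, and the selector article > p.intro are hypothetical names, and root is assumed to be a Document.

```javascript
// Hypothetical helper: find one element by id or by CSS selector and return
// its textContent, or null when nothing matches. `root` is assumed to be a Document.
function getTextFrom(root, { id, selector } = {}) {
  const element =
    id !== undefined
      ? root.getElementById(id) // direct lookup by unique id
      : root.querySelector(selector); // first element matching the selector
  return element ? element.textContent : null;
}

// Browser-only usage with hypothetical ids/selectors.
if (typeof document !== "undefined") {
  console.log(getTextFrom(document, { id: "yourElementId" }));
  console.log(getTextFrom(document, { selector: "article > p.intro" }));
}
```

Returning null for a missing element keeps the caller from crashing on a typo in the id or selector.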
HTML tags, begone!
When dealing with an innerHTML string that contains nested HTML tags, you can strip the tags by applying the replace() method with a regular expression.
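A minimal sketch of the regex approach follows. A regular expression is not an HTML parser, so the caveats are listed in the comments. The browser-only stripTagsWithParser variant is an alternative technique using the standard DOMParser API; both function names are hypothetical.

```javascript
// Quick-and-dirty tag stripper: deletes every "<...>" sequence.
// Caveats: entities such as &amp; stay encoded, and a literal ">" inside
// an attribute value or comment can end the match early.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Browser-only alternative (assumes DOMParser is available): parse the markup
// into a separate document, where scripts do not run, and read its text.
// Entities are decoded because textContent returns the parsed text.
function stripTagsWithParser(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  return doc.body.textContent;
}

const html = "<p>Hello <b>bold</b> <a href='#'>link</a></p>";
console.log(stripTags(html)); // "Hello bold link"
```

The regex version is fine for trusted, simple markup. For anything messier, the parser-based version is the safer choice.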
Implement an event listener for text extraction
Attach event listeners to elements (like buttons) so that text extraction runs on demand, for example when the user clicks, rather than once at page load.
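Here is a sketch of wiring a button to the extraction using the standard addEventListener. The ids extractBtn, content, and result and the helper wireExtractButton are hypothetical.

```javascript
// Hypothetical wiring: when `button` is clicked, copy the visible text of
// `source` into `output`.
function wireExtractButton(button, source, output) {
  button.addEventListener("click", () => {
    // Assigning textContent inserts plain text; no HTML is parsed.
    output.textContent = source.innerText;
  });
}

// Browser-only usage; the element ids are assumptions.
if (typeof document !== "undefined") {
  const button = document.getElementById("extractBtn");
  const source = document.getElementById("content");
  const output = document.getElementById("result");
  if (button && source && output) {
    wireExtractButton(button, source, output);
  }
}
```

Writing the result back with textContent rather than innerHTML also avoids accidentally re-injecting markup.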
Mastering child nodes
For cluttered DOM trees, recurse through Node.childNodes to gather text from nested elements.
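A recursive walk over childNodes might look like the sketch below. collectText and its skipTags parameter are hypothetical. Unlike textContent, this version trims each piece and skips <script>/<style> by default.

```javascript
// DOM nodeType values (equal to Node.ELEMENT_NODE / Node.TEXT_NODE in browsers).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Recursively walk childNodes and collect trimmed, non-empty text pieces.
// skipTags (hypothetical parameter) lists element names whose text is ignored;
// nodeName is uppercase for HTML elements in HTML documents.
function collectText(node, skipTags = ["SCRIPT", "STYLE"], parts = []) {
  for (const child of node.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      const text = child.nodeValue.trim();
      if (text) parts.push(text);
    } else if (child.nodeType === ELEMENT_NODE && !skipTags.includes(child.nodeName)) {
      collectText(child, skipTags, parts); // descend into nested elements
    }
  }
  return parts;
}
```

In a browser, collectText(document.body).join(" ") gives a space-separated string of the page's text.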
Simplification with jQuery
If jQuery is included in your project, bingo! Text extraction becomes a cakewalk with its .text() method.
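A minimal sketch, assuming jQuery is loaded on the page. jQuery's .text() returns the combined text of all matched elements and their descendants. The helper getTextWithJQuery and the selector #yourElementId are hypothetical. The jQuery function is passed in so the helper does not depend on a global.

```javascript
// Hypothetical helper: combined text of every element matched by `selector`,
// tags removed, using jQuery's .text().
function getTextWithJQuery($, selector) {
  return $(selector).text();
}

// Browser-only usage, and only when jQuery is actually present.
if (typeof window !== "undefined" && window.jQuery) {
  console.log(getTextWithJQuery(window.jQuery, "#yourElementId"));
}
```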
Storing your spoils
Store the extracted text in a variable for future use or manipulation.
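As an illustration, the sketch below stores the text once and reuses the variable. The word count and preview are arbitrary examples of manipulation; summarizeText, its previewLength parameter, and the id content are hypothetical.

```javascript
// Hypothetical post-processing on a stored string: word count plus a short preview.
function summarizeText(text, previewLength = 40) {
  const words = text.split(/\s+/).filter(Boolean); // drop empty pieces
  return {
    wordCount: words.length,
    preview: text.length > previewLength ? text.slice(0, previewLength) + "..." : text,
  };
}

// Browser-only usage with a hypothetical element id.
if (typeof document !== "undefined") {
  const el = document.getElementById("content");
  const extracted = el ? el.textContent : ""; // read the DOM once, reuse the string
  const summary = summarizeText(extracted);
  console.log(summary.wordCount, summary.preview);
}
```

Reading the DOM once and reusing the string avoids repeated lookups. This matters most with innerText, which triggers layout each time it is read.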
Cross-platform considerations
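Take a moment to verify browser compatibility before settling on a property. innerText started life as a non-standard Internet Explorer property and got the cold shoulder from Firefox until version 45. textContent, standardized in DOM Level 3, was the life of the party everywhere except Internet Explorer 8 and earlier. Every current browser supports both.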
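If you still need to support very old browsers, a simple feature check can fall back to textContent. This is a sketch, and readText is a hypothetical name.

```javascript
// Prefer innerText where the browser provides it; otherwise fall back to
// textContent (older Firefox versions had no innerText property at all).
function readText(element) {
  return typeof element.innerText === "string"
    ? element.innerText
    : element.textContent;
}
```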
Balancing innerText and textContent
When life gives you two choices, consider your requirements:
- Use innerText to keep readability intact; it emulates how a human would copy text from a webpage.
- Choose textContent when raw data is the focus and visual formatting doesn't matter.
Parameters for tailored space handling
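When extracting text, you might want to normalize whitespace or newline characters. This matters most with textContent, which keeps the indentation and line breaks from the HTML source as-is.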
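One way to expose that as a parameter is sketched below. normalizeWhitespace and its keepNewlines option are hypothetical. JavaScript's \s also matches the non-breaking space (\u00a0) that &nbsp; becomes in textContent.

```javascript
// Hypothetical whitespace normalizer for extracted text.
// keepNewlines: false -> collapse everything into single spaces on one line.
// keepNewlines: true  -> keep line structure, collapse spaces within each line,
//                        and drop lines that end up empty.
function normalizeWhitespace(text, { keepNewlines = false } = {}) {
  if (keepNewlines) {
    return text
      .split(/\r?\n/)
      .map((line) => line.replace(/\s+/g, " ").trim())
      .filter((line) => line.length > 0)
      .join("\n");
  }
  return text.replace(/\s+/g, " ").trim();
}

console.log(normalizeWhitespace("  Hello \n\n   world  ")); // "Hello world"
```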
Checking expected outcomes
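Better over-test than underwhelm. When using innerText, verify that the text looks as expected, since its result depends on CSS. If the element is not being rendered (for example, it is detached from the document or hidden from view), innerText returns the same value as textContent.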
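A small check like the sketch below can make that verification explicit. checkVisibleText is hypothetical and returns a report instead of throwing, so you can log it or assert on it in a test.

```javascript
// Hypothetical sanity check for innerText results.
function checkVisibleText(element, expected) {
  const visible = element.innerText.trim();
  return {
    matches: visible === expected,
    visible,
    // If innerText equals textContent on an element you know hides content,
    // the element is probably not rendered (detached or display: none).
    sameAsTextContent: element.innerText === element.textContent,
  };
}
```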