How to get the pure text without HTML elements using JavaScript?
To get text stripped of HTML, read an element's textContent for its full content or innerText for only the visible content.
textContent works best for large-scale extraction because it does not trigger a layout pass, whereas innerText is ideal when you care about the visible, styled text.
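A minimal sketch of the two options, assuming a hypothetical helper named getPureText and a page element with the id "content" (neither comes from a library):

```javascript
// Hypothetical helper: returns the text of an element without any markup.
// visibleOnly = true reads innerText (rendered text only);
// otherwise it reads textContent (all text, hidden or not).
function getPureText(element, visibleOnly = false) {
  if (!element) return '';
  const text = visibleOnly ? element.innerText : element.textContent;
  return text ?? '';
}

// Browser usage (assumes an element with id "content" exists):
// const everything = getPureText(document.getElementById('content'));
// const shownOnly = getPureText(document.getElementById('content'), true);
```

Passing the element in, rather than looking it up inside the helper, keeps the choice of property separate from the choice of target.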
Explaining textContent and innerText
Even though both properties can help extract the text within nodes, they operate differently:
- textContent is the blunt tool you reach for when you need all text, visible or not. It ignores styling entirely, so it also returns text from hidden elements and the contents of <script> and <style> tags.
- innerText, on the other hand, is the discerning butler, fetching text only from elements that are rendered on the page. innerText mimics how the text would look if a user manually copied it from the page.
Selection of target elements
Correctly identifying your targets is key. Avoid leaning on ids for everything, and make sure each lookup matches exactly the element you mean:
- Utilize document.getElementById('yourElementId') to zero in on a specific element.
- Use document.querySelector('selector') for complex CSS selectors to triangulate your target.
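The two lookups can be sketched as follows. The helper names textById and textBySelector are illustrative, and the document object is passed in as doc so the functions do not depend on a global:

```javascript
// Look up one element by its id and return its text, or null if it is missing.
function textById(doc, id) {
  const el = doc.getElementById(id);
  return el ? el.textContent : null;
}

// Look up the first element matching a CSS selector and return its text,
// or null if nothing matches.
function textBySelector(doc, selector) {
  const el = doc.querySelector(selector);
  return el ? el.textContent : null;
}

// Browser usage (ids and selectors are placeholders):
// textById(document, 'yourElementId');
// textBySelector(document, 'article .summary > p');
```

Both lookups return null when nothing matches, so checking before reading the text avoids a TypeError.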
HTML tags, begone!
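When all you have is an HTML string, such as an innerHTML value with nested HTML tags, you can strip the tags by calling the replace() method with a regular expression. This is fine for simple, trusted markup, but a regular expression is not an HTML parser, so reach for textContent whenever a real DOM is available.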
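A minimal sketch of the regex approach, assuming simple, well-formed markup. The function name stripTags is illustrative:

```javascript
// Remove anything that looks like an HTML tag: "<", then any run of
// characters that are not ">", then ">".
// Limitations: entities such as &amp; are NOT decoded, and a ">" inside
// an attribute value will confuse the pattern.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, '');
}

// Usage:
// stripTags('<p>Hello <b>world</b></p>'); // 'Hello world'
```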
Implement event listener for text extraction
Attach event listeners to elements (like buttons) to trigger your text extraction. This lets users pull the text on demand, which elevates the user experience.
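One way to wire this up is sketched below. The helper name wireExtraction, the element ids, and the onText callback are assumptions made for illustration:

```javascript
// When `button` is clicked, read the text of `source` and hand it to `onText`.
// The text is read inside the handler, so it reflects the page at click time.
function wireExtraction(button, source, onText) {
  button.addEventListener('click', () => {
    onText(source.textContent);
  });
}

// Browser usage (ids are placeholders):
// wireExtraction(
//   document.getElementById('extractBtn'),
//   document.getElementById('content'),
//   (text) => console.log(text)
// );
```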
Mastering child nodes
For cluttered DOM trees, learn to recurse through Node.childNodes to gather text from nested elements.
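A recursive walk could look like the sketch below. The function name collectText is illustrative, and it expects an element as its starting point. The numeric constants mirror Node.ELEMENT_NODE and Node.TEXT_NODE, so the code does not rely on a global Node:

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE

// Concatenate the text of every text node under `node`, depth first.
// Comments and other non-element, non-text nodes contribute nothing.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) return node.nodeValue;
  if (node.nodeType !== ELEMENT_NODE) return '';
  let text = '';
  for (const child of node.childNodes) {
    text += collectText(child);
  }
  return text;
}

// Browser usage (id is a placeholder):
// collectText(document.getElementById('content'));
```

Walking the tree by hand is mainly useful when some branches should be skipped or transformed. For a plain dump, textContent does the same work.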
Simplification with jQuery
If jQuery is included in your project, bingo! Text extraction becomes a cakewalk with its .text() method.
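A small sketch, assuming jQuery is already loaded on the page. The wrapper name textWithJQuery is hypothetical, and the jQuery function is passed in as $ rather than read from a global:

```javascript
// Return the combined text of all elements matched by `selector`,
// descendants included, using jQuery's .text() getter.
function textWithJQuery($, selector) {
  return $(selector).text();
}

// Browser usage (selector is a placeholder):
// const text = textWithJQuery(jQuery, '#content');
```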
Storing your spoils
Store the extracted text in a variable for future use or manipulation.
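For instance, once the text sits in a variable it can be reused without touching the DOM again. The helper name summarize and the word-count step are purely illustrative:

```javascript
// Read the text once, keep it in a variable, then derive values from it.
function summarize(element) {
  const text = element.textContent; // stored for reuse
  const words = text.trim().split(/\s+/).filter(Boolean);
  return { text, wordCount: words.length };
}

// Browser usage (id is a placeholder):
// const { text, wordCount } = summarize(document.getElementById('content'));
```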
Cross-platform considerations
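Take a moment to verify browser compatibility before settling on a property. innerText got the cold shoulder for years because it started out non-standard (Firefox only added it in version 45), whereas textContent is the life of the party, supported by every modern browser. The one gap is Internet Explorer 8 and earlier, which only offer innerText.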
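If very old browsers matter, a feature check is one hedge. The sketch below uses the illustrative name readTextCompat and prefers textContent, falling back to innerText only when textContent is unavailable:

```javascript
// Prefer the standard textContent; fall back to innerText on engines
// that lack it; return an empty string if neither is a string.
function readTextCompat(element) {
  if (typeof element.textContent === 'string') return element.textContent;
  if (typeof element.innerText === 'string') return element.innerText;
  return '';
}

// Browser usage (id is a placeholder):
// readTextCompat(document.getElementById('content'));
```

The two properties do not return identical strings, so the fallback changes the result slightly for hidden or styled content.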
Balancing innerText and textContent
When life gives you two choices, consider your requirements:
- Use innerText to keep the readability intact; it emulates how a human would copy text from a webpage.
- Choose textContent when raw data is the focus, obliterating any need for visual formatting.
Parameters for tailored space handling
When extracting text, you might want to handle whitespace or newline characters, for example by collapsing runs of spaces or by keeping line breaks while trimming each line.
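A sketch of one such cleanup step follows. The function name normalizeWhitespace and its keepNewlines option are assumptions chosen for illustration:

```javascript
// Tidy whitespace in extracted text.
// Default: collapse every run of whitespace (newlines included) to one space.
// keepNewlines: true -> keep line breaks, but trim each line, collapse
// spaces and tabs inside it, and drop lines that end up empty.
function normalizeWhitespace(text, { keepNewlines = false } = {}) {
  if (!keepNewlines) {
    return text.replace(/\s+/g, ' ').trim();
  }
  return text
    .split(/\r?\n/)
    .map((line) => line.replace(/[ \t]+/g, ' ').trim())
    .filter((line) => line.length > 0)
    .join('\n');
}

// Usage:
// normalizeWhitespace('  Hello \n\n   world  ');                         // 'Hello world'
// normalizeWhitespace('  Hello \n\n   world  ', { keepNewlines: true }); // 'Hello\nworld'
```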
Checking expected outcomes
Better over-test than underwhelm. When using innerText, verify that the text's appearance is as expected, given its dependence on CSS.