Get HTML source of WebElement in Selenium WebDriver using Python

by Anton Shumikhin · Dec 8, 2024

The simplest way to extract a WebElement's HTML content in Selenium WebDriver using Python is by calling the .get_attribute('outerHTML') method:

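from selenium.webdriver.common.by import By

# "ID the element, he said. Extract HTML, he said."
# Selenium 4 removed find_element_by_id; use find_element with a By locator
element = driver.find_element(By.ID, 'example')
html_source = element.get_attribute('outerHTML')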

This code snags the complete HTML of the located element - like serving the full dish, silver platter included.

Extracting code within tags

In situations where you only want the code lying betwixt the element's tags - just the meal, sans the platter - you'd utilize get_attribute('innerHTML'):

# "Why just admire the silver platter when the main course awaits?!"
inner_content = element.get_attribute('innerHTML')

This code nicely serves up all the HTML ensconced within your element's bounds.

Dynamically extracting HTML via JavaScript

Should the above flavors not tickle your tastebuds, or you're keen on a more programmatic approach, consider executing JavaScript with Selenium:

# "When all else fails, throw in a script!"
html_source = driver.execute_script("return arguments[0].outerHTML;", element)

This piece of code sparkles when traditional attribute methods fall short of expectations, especially in the face of tricky DOM structures or page-specific JavaScript monkey business.

How to handle time-sensitive content retrieval

Web elements that insist on playing hide-n-seek, popping up dynamically, can cause a hitch when you're attempting to grab their HTML. That's when you employ a wait:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# "Patience young Padawan, the HTML you seek will soon reveal itself."
wait = WebDriverWait(driver, 10)
element = wait.until(EC.presence_of_element_located((By.ID, 'example')))
html_source = element.get_attribute('outerHTML')

Working smart with WebElements

If you find yourself juggling many WebElements, befriend the Page Object Model (POM). Each page gets a class that owns its locators and exposes methods such as "give me this widget's HTML", which keeps your code orderly, reusable, and maintainable.
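A page object for the running example might look roughly like the sketch below. The class name `ExamplePage`, its method names, and the `example` locator are hypothetical, and the locator is written as a plain tuple so the sketch has no imports (`By.ID` is simply the string `"id"`).

```python
class ExamplePage:
    """Hypothetical page object: owns its locators and exposes HTML getters."""

    # ("id", "example") is the plain-tuple form of (By.ID, "example")
    EXAMPLE = ("id", "example")

    def __init__(self, driver):
        self.driver = driver

    def _find(self, locator):
        # Unpack the (strategy, value) pair into find_element(by, value)
        return self.driver.find_element(*locator)

    def example_outer_html(self):
        # The element's own tag plus everything inside it
        return self._find(self.EXAMPLE).get_attribute("outerHTML")

    def example_inner_html(self):
        # Only the markup between the element's tags
        return self._find(self.EXAMPLE).get_attribute("innerHTML")
```

With a live driver this would be used as `ExamplePage(driver).example_outer_html()`, so test code never touches locators directly.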

Saving HTML for future enlightenment

Documenting the HTML of elements for a future rendezvous is simple:

# "Once discovered, never forgotten - the story of the saved HTML."
with open('element.html', 'w', encoding='utf-8') as file:
    file.write(html_source)

Presto! You now can analyze or test this HTML at your leisure.

Unmasking hidden treasures: iframes or shadow DOM

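When an element coyly hides in the recesses of an iframe or a shadow DOM, a plain lookup from the top-level page won't find it. For an iframe you switch the driver's context into the frame; for a shadow DOM you go through the host element's shadow root:

# Dealing with iframes
# "Art of element extraction: frame edition."
driver.switch_to.frame("frameID")
element = driver.find_element(By.ID, 'example')
html_source = element.get_attribute('outerHTML')
driver.switch_to.default_content()  # step back out of the frame

# Dealing with shadow DOMs:
# "Even dark shadows hold HTML secrets!"
# 'element' is the shadow host; a closed shadow root yields None
html_source = driver.execute_script(
    "const root = arguments[0].shadowRoot; return root ? root.innerHTML : null;",
    element)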
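One easy mistake with frames is forgetting to switch back, which leaves every later lookup searching inside the frame. A minimal sketch of a helper that always returns to the top-level page is shown here. The function name `outer_html_in_frame` is hypothetical, and `locator` is assumed to be a `(strategy, value)` pair such as `(By.ID, 'example')`.

```python
def outer_html_in_frame(driver, frame_reference, locator):
    """Fetch an element's outerHTML from inside a frame, then leave the frame."""
    driver.switch_to.frame(frame_reference)
    try:
        return driver.find_element(*locator).get_attribute("outerHTML")
    finally:
        # Runs even if the lookup raises, so later code starts from the top-level page
        driver.switch_to.default_content()
```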

Tech radar: Exploring updates and issues

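Web technologies evolve faster than an avenging superhero chasing a lawbreaker. Selenium 4, for instance, dropped the find_element_by_* helpers in favor of find_element(By.ID, ...). Keep a weather eye on the official Selenium documentation and the community.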

Custom wait conditions for tailored scenarios

Sometimes, even patience needs a guide. When the standard expected conditions don't fit, forge your own: WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value once the condition is met.
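As a rough sketch, a custom condition can be a small callable class. The name `outer_html_contains` and the idea of waiting for a marker string in the element's HTML are illustrative assumptions, not part of Selenium's built-in conditions.

```python
class outer_html_contains:
    """Hypothetical custom condition: passes once the element's HTML contains `text`."""

    def __init__(self, locator, text):
        self.locator = locator  # e.g. (By.ID, "example")
        self.text = text

    def __call__(self, driver):
        # find_elements returns an empty list instead of raising when nothing matches
        matches = driver.find_elements(*self.locator)
        if not matches:
            return False
        html = matches[0].get_attribute("outerHTML") or ""
        # Return the element (truthy) on success so until() hands it back
        return matches[0] if self.text in html else False
```

It would be handed to the wait like any built-in condition, for example `wait.until(outer_html_contains((By.ID, 'example'), 'loaded'))`, where `'loaded'` is a placeholder marker.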

Performance considerations when handling large HTML elements

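Your HTML might turn out to be as big as The Mountain (Gregor Clegane, anyone?). Every get_attribute('outerHTML') call serializes the element's whole subtree in the browser and ships it back to your script as one string, so with large WebElements, target the narrowest element that holds what you need rather than grabbing a page-sized container.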
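One possible guard, sketched under assumptions, is to ask the browser for the size first and skip the transfer when the markup is too big. The helper name `outer_html_if_small` and the default threshold are arbitrary choices for illustration.

```python
def outer_html_if_small(driver, element, max_chars=200_000):
    """Return the element's outerHTML, or None when it exceeds max_chars."""
    # Ask the browser for the length only, so the full string is not transferred yet
    size = driver.execute_script("return arguments[0].outerHTML.length;", element)
    if size > max_chars:
        return None
    return element.get_attribute("outerHTML")
```

The browser still builds the string once to measure it, so this saves transfer and memory on the Python side rather than work in the page.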
