How can I retrieve the page title of a webpage using Python?

python

web-scraping

beautifulsoup

selenium

by Alex Kataev · Oct 17, 2024

Get the page title pronto using Python, requests and BeautifulSoup.

from requests import get
from bs4 import BeautifulSoup

response = get('http://example.com', timeout=10) # Setting sail on the web ocean (with a timeout, so we never drift forever)!
soup = BeautifulSoup(response.content, 'html.parser')
title = soup.title.string if soup.title else None # Bingo! Found the treasure (or None if the page has no <title>)
print(title) # The Python parrot squawks the title

Be sure you've got requests and beautifulsoup4 aboard your Python ship: pip install requests beautifulsoup4.

Navigating choppy waters and sirens' songs

While diving for titles, you may face some rogues. Here's how to outwit them:

Non-golden HTML: BeautifulSoup can handle most HTML flotsam, but if it gets murky, set sail with the lxml or html5lib parsers (each is a separate pip install).
Shipwrecked Page Titles: Before calling .string, check that soup.title isn't None; a page without a <title> element would otherwise leave you caught in an AttributeError.
Hidden Treasure: For pages that render their content with JavaScript, charter the Selenium ship with a webdriver. It drives a real browser, so scripts run just as they would for a human visitor. Here's a map:

from selenium import webdriver

# Captain's log: Embarking on a new journey!
browser = webdriver.Chrome() # Needs Chrome installed on this ship
try:
    browser.get('http://example.com')
    title = browser.title
    print(title) # The moment of truth!
finally:
    browser.quit() # Smooth sailing, even if the voyage hit a storm!

Pirate's Code: Respect the legal and ethical guidelines of web scraping. Nobody likes a pirate!
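The parser fallback and the title check from the list above might look something like this. It is only a sketch: the helper names make_soup and extract_title are made up for illustration, and it assumes lxml may or may not be installed, falling back to the built-in html.parser when it is missing.

```python
from typing import Optional

from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, preferred="lxml"):
    """Parse with the preferred parser, falling back to html.parser."""
    try:
        return BeautifulSoup(html, preferred)
    except FeatureNotFound:
        # The preferred parser is not installed on this ship.
        return BeautifulSoup(html, "html.parser")


def extract_title(html) -> Optional[str]:
    """Return the stripped page title, or None if there is none."""
    soup = make_soup(html)
    if soup.title is None:
        # No <title> element at all: nothing to call .string on.
        return None
    # get_text() also copes with a <title> that has nested markup,
    # where .string would come back as None.
    return soup.title.get_text(strip=True) or None


if __name__ == "__main__":
    page = "<html><head><title> Treasure Map </title></head></html>"
    print(extract_title(page))
```

Returning None for both a missing and an empty title keeps the calling code down to a single check.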

Other pearls and mermaids

Depending on the sea monster you face, different tactics can be employed:

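Flying Colors: Wave custom headers with requests (a User-Agent, for instance) to avoid the notorious Server's gaze.
Talking Parrot: Make sure the page's character encoding is handled correctly (UTF-8 is the common case) so even your parrot can read exotic letters. Passing response.content (bytes) to BeautifulSoup lets it detect the encoding itself.
Mechanize Chest: The mechanize library acts as a lightweight, scriptable browser (no JavaScript, mind you) and hands over the page title directly, making simple treasure hunts a breeze.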

from mechanize import Browser

br = Browser() # New ship!
br.open('http://example.com')
print(br.title()) # X marks the spot!

Weathering Storms: Employ try-except blocks to gracefully navigate network squalls, HTTP monsters, and timeout typhoons.
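Here is one way the custom headers, the byte-based encoding detection, and the try-except advice could fit together. Treat it as a rough chart rather than the one true course: the function name fetch_title and the header values are assumptions made up for this example.

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup

# Hypothetical header values, for illustration only.
HEADERS = {"User-Agent": "title-fetcher/0.1", "Accept-Language": "en"}


def fetch_title(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch a page and return its title, or None if anything goes wrong."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx answers into HTTPError
    except requests.Timeout:
        print(f"Timeout typhoon after {timeout}s: {url}")
        return None
    except requests.HTTPError as exc:
        print(f"HTTP monster for {url}: {exc}")
        return None
    except requests.RequestException as exc:
        print(f"Network squall for {url}: {exc}")
        return None

    # Passing bytes (response.content) lets BeautifulSoup work out the
    # character encoding itself.
    soup = BeautifulSoup(response.content, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None


if __name__ == "__main__":
    print(fetch_title("http://example.com"))
```

Catching requests.Timeout and requests.HTTPError before the general requests.RequestException matters, because both are subclasses of it.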

Every pirate needs an armory

Every seafaring Python adventurer on the internet needs an armory for more advanced treasure hunts:

Spyglass: Rotate your user agents when setting flags (headers) to appear as different kinds of ships (browsers) to the prying eyes of Server's cannons.
Fast Ships: Consider aiohttp or requests-html for asynchronous voyages if you're hunting multiple treasures at once.
Keen Eyesight: Use lxml.etree for speedy map-reading (parsing) if you're navigating hundreds of treasure maps (HTML documents).
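No Fool's Gold: Make sure there's only one <title> element to avoid counterfeit treasure (titles). soup.title returns the first <title> it finds, and inline SVG images can carry <title> elements of their own.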
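For hunting several treasures at once while rotating flags, a sketch along these lines may help. Note the swap: instead of aiohttp or requests-html it uses a thread pool from the standard library together with requests, which keeps it close to the earlier snippets; an aiohttp version would follow the same shape with async calls. The names fetch_title and fetch_titles and the user-agent strings are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from typing import Dict, Iterable, Optional

import requests
from bs4 import BeautifulSoup

# Hypothetical user-agent strings, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) title-fetcher/0.1",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) title-fetcher/0.1",
]


def fetch_title(url: str, user_agent: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch one page with the given User-Agent; None on any failure."""
    try:
        response = requests.get(
            url, headers={"User-Agent": user_agent}, timeout=timeout
        )
        response.raise_for_status()
    except requests.RequestException:
        return None
    soup = BeautifulSoup(response.content, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None


def fetch_titles(urls: Iterable[str], max_workers: int = 8) -> Dict[str, Optional[str]]:
    """Fetch many titles concurrently, rotating user agents per request."""
    urls = list(urls)
    agents = cycle(USER_AGENTS)
    # Pair each URL with the next user agent before any thread starts.
    jobs = [(url, next(agents)) for url in urls]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        titles = list(pool.map(lambda job: fetch_title(*job), jobs))
    return dict(zip(urls, titles))


if __name__ == "__main__":
    for url, title in fetch_titles(["http://example.com"]).items():
        print(url, "->", title)
```

Threads suit this job because the time goes into waiting on the network rather than into parsing; with hundreds of pages, an async client may scale better.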