
Beautifulsoup findAll() given multiple classes?

By Nikita Barsukov · Jan 30, 2025
TLDR

To find elements with multiple classes in BeautifulSoup, use findAll() (spelled find_all() in current releases; findAll() is the older alias) with a list of class names, or select() with chained class selectors:

soup.findAll("tag", class_=["class1", "class2"])  # Fetch "tag" with either "class1" or "class2"
soup.select(".class1.class2")  # Fetch tags with both "class1" and "class2"

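Note: in findAll(), replace "tag" with the tag name you're looking for and "class1", "class2" with the target classes. Passing a list to findAll() returns a union (elements with any of the classes), while the chained selector in select() returns an intersection (elements with all of them).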

import re
soup.findAll("tag", {"class": re.compile("class1|class2")})  # "tag" with either "class1" or "class2", regex way

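With regex, re.compile("class1|class2") matches any element whose class contains class1 or class2. The pattern is applied as a search, so it also matches longer names such as class10. Anchor it as ^(class1|class2)$ when you need exact class names.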
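To see the difference between a loose and an anchored pattern, here is a minimal sketch. The HTML snippet and the variable names are made up for illustration; only find_all() and re.compile() come from the text above.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration.
html = """
<p class="class1">exact</p>
<p class="class10">longer name</p>
<p class="other">unrelated</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Unanchored: matches any class name that contains "class1" or "class2".
loose = soup.find_all("p", class_=re.compile("class1|class2"))

# Anchored: matches only class names that are exactly "class1" or "class2".
strict = soup.find_all("p", class_=re.compile(r"^(class1|class2)$"))

loose_text = [p.get_text() for p in loose]
strict_text = [p.get_text() for p in strict]
```

With this sample, the loose pattern also picks up the class10 paragraph, while the anchored one keeps only the exact match.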

Making sense of findAll()

When web scraping, it's common to filter elements that have multiple classes. BeautifulSoup provides distinct ways to handle this, catering to both OR and AND logic between classes.

Search elements with any of the given classes (OR logic)

If you want elements that match at least one of several classes, pass those classes as a list to the findAll() method:

elements = soup.findAll("tag", class_=["class1", "class2"]) # "Tag" with "class1" or "class2", do your thing!
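A small runnable sketch of the OR behaviour, assuming a made-up HTML fragment (the markup and the variable names are illustrative, not from any real page):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration.
html = """
<div class="class1">a</div>
<div class="class2 extra">b</div>
<div class="class3">c</div>
<div class="class1 class2">d</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A list means "any of these": one matching class is enough.
either = soup.find_all("div", class_=["class1", "class2"])
either_text = [div.get_text() for div in either]
```

Here the divs a, b, and d are returned: each carries at least one of the listed classes, and extra classes on the element do not prevent a match.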

Search elements with all of the given classes (AND logic)

To match elements that carry all of the specified classes, use the select() method with the class selectors chained together (no spaces between them):

elements = soup.select(".class1.class2") # "Tag" with both "class1" and "class2", why not?
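The AND behaviour can be sketched as follows. The markup is invented for illustration, and the function filter has_both is a hypothetical helper shown as one possible way to get the same result without leaving find_all():

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration.
html = """
<div class="class1">a</div>
<div class="class1 class2">b</div>
<div class="class2 other class1">c</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Chained class selectors: every listed class must be present, in any order.
both = soup.select(".class1.class2")
both_text = [tag.get_text() for tag in both]

# Alternative that stays within find_all(): a function filter (hypothetical helper).
def has_both(tag):
    return {"class1", "class2"} <= set(tag.get("class", []))

both_via_find_all = [tag.get_text() for tag in soup.find_all(has_both)]
```

Both approaches return b and c here; the order of the classes in the attribute does not matter, and additional classes are allowed.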

Using regex for more complex searches

Regex can be used for more intricate criteria:

elements = soup.findAll("tag", {"class": re.compile("^class1.*class2$")}) # With love from regex!

The pattern ^class1.*class2$ matches a class value that starts with class1 and ends with class2, with any characters in between. That suits dynamically generated names, for example a hypothetical class1-42-class2.
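As a rough illustration of what this pattern accepts and rejects, here is a sketch against invented markup. The class names such as class1-42-class2 are hypothetical stand-ins for dynamically generated values:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup with made-up dynamic class names.
html = """
<span class="class1-42-class2">dynamic</span>
<span class="class1">plain</span>
<span class="xclass1-42-class2">prefixed</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Must start with "class1" and end with "class2".
pattern = re.compile(r"^class1.*class2$")
dynamic = soup.find_all("span", class_=pattern)
dynamic_text = [span.get_text() for span in dynamic]
```

Only the first span matches: the second lacks the class2 ending, and the third fails the ^class1 anchor.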

Preserve source order

BeautifulSoup returns matches in the order they appear in the source document, which matters for data integrity and for understanding each element's context.

Handling real-life cases

Preserving order with findAll()

findAll() with a list of classes returns the matched elements in the order they appear in the document, regardless of the order of the class names in the list. That is especially handy when you're dealing with tables and, like me, are picky about sequence.
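A short sketch of the ordering, using an invented table (the rows and class names are placeholders):

```python
from bs4 import BeautifulSoup

# Hypothetical table, made up for illustration.
html = """
<table>
  <tr class="class2"><td>row 1</td></tr>
  <tr class="class1"><td>row 2</td></tr>
  <tr class="skip"><td>row 3</td></tr>
  <tr class="class2"><td>row 4</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# The list order ("class1" first) does not reorder the results:
# rows come back in document order.
rows = soup.find_all("tr", class_=["class1", "class2"])
row_order = [row.td.get_text() for row in rows]
```

The class2 row at the top of the table still comes first, even though class1 is listed first in the filter.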

Using sessions with requests for stateful scraping

When dealing with session-based sites, you can create a requests.Session so that cookies and other session state persist across your requests:

import requests
from bs4 import BeautifulSoup

session = requests.Session()  # keeps cookies between requests
response = session.get("https://example.com")
soup = BeautifulSoup(response.content, "html.parser")
# soup.findAll() now works as usual on the parsed page
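One way to package this is a small helper that takes the session as an argument, which keeps the fetching step easy to reuse and to test offline. The function name fetch_soup and the 10-second timeout are assumptions for this sketch, not part of the original example:

```python
from bs4 import BeautifulSoup

def fetch_soup(session, url):
    """Fetch url through an existing session and parse the body (hypothetical helper)."""
    response = session.get(url, timeout=10)  # timeout value is an arbitrary choice
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return BeautifulSoup(response.content, "html.parser")

# Typical use (needs network access and the requests package):
#   import requests
#   with requests.Session() as session:
#       soup = fetch_soup(session, "https://example.com")
#       links = soup.find_all("a", class_=["class1", "class2"])
```

Because the same Session object is passed to every call, cookies set by an earlier response are sent with later requests.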

Complex cases with dynamic class names

Use Python's re module for expressing complex class-based searches:

import re
dynamic_elements = soup.findAll("tag", {"class": re.compile("^class1.*class2$")})  # Regex for the win!

Precise Data Extraction

To extract data from a specific nested class, use a loop to split the search into two steps: find the parent elements first, then search inside each one:

for parent in soup.find_all("div", class_="parent-class"):
    # All <span> tags with "child-class" inside this <div> with "parent-class"
    children = parent.find_all("span", class_="child-class")
    for child in children:
        print(child.get_text(strip=True))
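The nested search can be tried end to end with a sketch like this one. The markup is invented; the class names parent-class and child-class are the placeholders used above:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration.
html = """
<div class="parent-class">
  <span class="child-class">one</span>
  <span class="other">skip</span>
</div>
<div class="unrelated">
  <span class="child-class">outside</span>
</div>
<div class="parent-class">
  <span class="child-class">two</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Step 1: find the parents. Step 2: search only inside each parent.
nested_text = []
for parent in soup.find_all("div", class_="parent-class"):
    for child in parent.find_all("span", class_="child-class"):
        nested_text.append(child.get_text())

# The same scope expressed as a descendant CSS selector.
selected_text = [s.get_text() for s in soup.select("div.parent-class span.child-class")]
```

The span inside the unrelated div is skipped in both versions, because the search is scoped to parent-class containers.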

CSS selectors

When you need to get very specific, BeautifulSoup's select() method provides fine control through CSS selectors:

# All <p> elements with both the "story" and "story-highlight" classes
# that are direct children of a <div> with the "content" class
articles = soup.select("div.content > p.story.story-highlight")
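To check what the child combinator and the chained classes filter out, here is a sketch against invented markup (the sidebar and section elements are made up to provide non-matching cases):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration.
html = """
<div class="content">
  <p class="story story-highlight">direct child</p>
  <p class="story">only one class</p>
  <section><p class="story story-highlight">nested deeper</p></section>
</div>
<div class="sidebar">
  <p class="story story-highlight">wrong parent</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# ">" requires a direct parent-child relationship;
# ".story.story-highlight" requires both classes.
articles = soup.select("div.content > p.story.story-highlight")
article_text = [p.get_text() for p in articles]
```

Only the first paragraph survives: the second lacks a class, the third is a grandchild rather than a direct child, and the fourth sits under a different parent.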
