CSS (Cascading Style Sheets) selectors are patterns used to select and target specific HTML elements on a web page for styling and manipulation. CSS selectors allow us to apply styles selectively to specific elements based on criteria such as element type, class, ID, attributes, and relationships with other elements in the HTML structure.

Types of CSS selectors

There are several types of selectors in CSS. Here is a categorized list of CSS selectors for Python web scraping:

  • Basic selectors

  • Attribute selectors

  • Relationship selectors

  • Advanced selectors

  • Combination and grouping selectors

Figure: CSS selectors for Python web scraping

Basic selectors

Basic CSS selectors refer to fundamental patterns that target and extract specific HTML elements from a web page. These selectors are essential for locating and manipulating elements during web scraping tasks. Here are some basic CSS selectors:

| Name | Definition | Example | Explanation |
| --- | --- | --- | --- |
| Tag selectors | Select elements by their tag name. | elements = soup.select('p') | Selects all <p> elements |
| Class selectors | Select elements by their CSS class name. | elements = soup.select('.highlight') | Selects all elements with the class "highlight" |
| ID selectors | Select elements by their HTML id attribute. | element = soup.select_one('#header') | Selects the element with the ID "header" (select() would return a list instead) |
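The three basic selectors can be tried on a small document. The sketch below is a minimal illustration, not part of the lesson's own code: the HTML string is hypothetical markup invented for the example, and select_one() is used for the ID lookup because select() always returns a list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented only for this illustration.
html = """
<div id="header">Site title</div>
<p class="highlight">First paragraph</p>
<p>Second paragraph</p>
<span class="highlight">A highlighted span</span>
"""

soup = BeautifulSoup(html, "html.parser")

paragraphs = soup.select("p")            # tag selector: every <p>
highlighted = soup.select(".highlight")  # class selector: any tag with this class
header = soup.select_one("#header")      # ID selector: first match, or None

print(len(paragraphs))                   # 2
print([el.name for el in highlighted])   # ['p', 'span']
print(header.get_text())                 # Site title
```

Note that the class selector matches both the <p> and the <span>, since it does not restrict the tag name.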


Attribute selectors

Attribute CSS selectors are patterns used to select HTML elements based on their attributes and attribute values. These selectors are valuable for targeting elements with specific characteristics or properties. Here are some commonly used attribute selectors:

| Name | Definition | Example | Explanation |
| --- | --- | --- | --- |
| Attribute selector | Selects elements that have a given attribute. | elements = soup.select('[data-info]') | Selects all elements that have a data-info attribute |
| Attribute value selector | Selects elements with a specific attribute value. | input_elements = soup.select('input[type="text"]') | Selects all <input> elements with the attribute type="text" |
| Attribute ends-with selector | Selects elements whose attribute value ends with a specific string. | pdf_links = soup.select('[href$=".pdf"]') | Selects all elements with an href attribute ending in ".pdf" |
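As a quick check of the attribute selectors above, here is a small sketch run against hypothetical markup (the form and links are made up for the example):

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented only for this illustration.
html = """
<form>
  <input type="text" name="q">
  <input type="submit" value="Go">
</form>
<a href="/files/report.pdf" data-info="annual">Report</a>
<a href="/about.html">About</a>
"""

soup = BeautifulSoup(html, "html.parser")

with_data_info = soup.select("[data-info]")        # attribute is present
text_inputs = soup.select('input[type="text"]')    # attribute equals a value
pdf_links = soup.select('a[href$=".pdf"]')         # attribute ends with a value

print(with_data_info[0]["data-info"])   # annual
print(text_inputs[0]["name"])           # q
print([a["href"] for a in pdf_links])   # ['/files/report.pdf']
```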

Relationship selectors

Relationship CSS selectors are patterns that select HTML elements based on their position relative to other elements in the document's structure, such as being nested inside, directly under, or immediately after another element. Some commonly used relationship selectors are:

| Name | Definition | Example | Explanation |
| --- | --- | --- | --- |
| Descendant selectors | Select elements that are descendants of other elements, at any depth. | links_inside_div = soup.select('div a') | Selects all <a> elements anywhere within a <div> |
| Child selectors | Select elements that are direct children of other elements. | list_items = soup.select('ul > li') | Selects all <li> elements that are direct children of a <ul> |
| Adjacent sibling selectors | Select elements that share a parent with another element and come immediately after it. | paragraphs_after_h2 = soup.select('h2 + p') | Selects all <p> elements that come immediately after an <h2> |
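The difference between the descendant and child combinators is easiest to see on nested lists. The following sketch uses hypothetical markup with one list nested inside another; the counts in the comments follow from that markup only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented only for this illustration.
html = """
<div>
  <h2>Title</h2>
  <p>Intro</p>
  <p>Body</p>
  <ul>
    <li>One <a href="/one">link</a></li>
    <li>Two
      <ol><li>Nested</li></ol>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

links_inside_div = soup.select("div a")     # <a> at any depth inside a <div>
direct_items = soup.select("ul > li")       # only direct children of <ul>
all_items = soup.select("ul li")            # includes the <li> nested in <ol>
paragraphs_after_h2 = soup.select("h2 + p") # only the <p> right after <h2>

print([a["href"] for a in links_inside_div])          # ['/one']
print(len(direct_items), len(all_items))              # 2 3
print([p.get_text() for p in paragraphs_after_h2])    # ['Intro']
```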

Advanced selectors

These selectors go beyond basic and relationship selectors to provide fine-grained control over element selection. Advanced CSS selectors can target elements based on various criteria, including attributes, states, and patterns within the HTML structure. Some examples of advanced CSS selectors include:

| Name | Definition | Example | Explanation |
| --- | --- | --- | --- |
| Pseudo-class selectors | Select elements based on a state or position rather than on their name or attributes. | hovered_links = soup.select('a:hover') | Matches <a> elements while they are hovered over in a browser; a parsed document has no hover state, so Beautiful Soup returns an empty list |
| Not selector | Selects elements that do not match a given selector. | divs_without_ignore_class = soup.select('div:not(.ignore)') | Selects all <div> elements that do not have the class "ignore" |
| Nth-child selector | Selects elements based on their position within a parent element. | third_list_item = soup.select('ul li:nth-child(3)') | Selects every <li> inside a <ul> that is the third child of its parent (returned as a list) |
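The sketch below tries these three patterns on hypothetical markup. It assumes a recent Beautiful Soup release, where select() is backed by the Soup Sieve library; there, browser-state pseudo-classes such as :hover are accepted but match nothing, because a parsed document has no live state.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented only for this illustration.
html = """
<ul>
  <li>first</li>
  <li>second</li>
  <li>third</li>
</ul>
<div class="ignore">skip me</div>
<div>keep me</div>
<a href="/home">Home</a>
"""

soup = BeautifulSoup(html, "html.parser")

third_list_item = soup.select("ul li:nth-child(3)")   # position within parent
kept_divs = soup.select("div:not(.ignore)")           # negation
hovered_links = soup.select("a:hover")                # no hover state when scraping

print([li.get_text() for li in third_list_item])  # ['third']
print([d.get_text() for d in kept_divs])          # ['keep me']
print(len(hovered_links))                         # 0
```

For scraping, structural pseudo-classes such as :nth-child() and :not() are the useful ones; state-based ones like :hover only make sense in a browser.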

Combination and grouping selectors

Combination and grouping CSS selectors involve techniques to combine multiple selectors or group them together to target HTML elements more effectively. Here are two commonly used combination and grouping selectors:

| Name | Definition | Example | Explanation |
| --- | --- | --- | --- |
| Multiple selectors | Combine several compound selectors into one query. | headings = soup.select('div.article h2, div.sidebar h2') | Selects <h2> elements inside a <div> with the class "article" or inside a <div> with the class "sidebar" |
| Grouping selectors | Group selectors with commas to apply the same styles or actions to all of them. | headings = soup.select('h1, h2') | Selects all <h1> and <h2> elements |
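A comma in a selector acts as a logical "or". The following sketch, using hypothetical markup, shows a plain group and a group of scoped selectors side by side:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented only for this illustration.
html = """
<h1>Main</h1>
<div class="article"><h2>Article heading</h2></div>
<div class="sidebar"><h2>Sidebar heading</h2></div>
<div class="footer"><h2>Footer heading</h2></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <h1> or <h2>, returned in document order.
headings = soup.select("h1, h2")

# Only <h2> elements inside div.article or div.sidebar; the footer is excluded.
scoped = soup.select("div.article h2, div.sidebar h2")

print([h.get_text() for h in headings])
# ['Main', 'Article heading', 'Sidebar heading', 'Footer heading']
print([h.get_text() for h in scoped])
# ['Article heading', 'Sidebar heading']
```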


CSS selectors in developer tools

Learning CSS selector patterns is essential for our scraping objectives, but there is a quicker way to get a working pattern for a specific element: right-click the element in the browser's inspection tool and choose "Copy > Copy selector" (or "Copy > CSS Selector" in Firefox).

Figure: Getting the CSS selector pattern for an HTML element

For the element shown in the figure above, the copied selector is:

#default>div>div>div>div>section>div:nth-child(2)>ol>li:nth-child(1)>article>div.image_container>a>img

Note: Copied selectors are often lengthy, position-dependent paths through the DOM, so they tend to match only one element and can break when the page layout changes.
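To illustrate the trade-off, the sketch below compares a long positional path with a shorter class-based pattern. The markup is a hypothetical, simplified stand-in loosely modeled on the copied selector above, not the real page, and the file names are invented.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; not the actual page structure.
html = """
<section>
  <div>
    <ol>
      <li><article><div class="image_container">
        <a href="#"><img src="a.jpg" alt="A"></a></div></article></li>
      <li><article><div class="image_container">
        <a href="#"><img src="b.jpg" alt="B"></a></div></article></li>
    </ol>
  </div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# A copied-style path: every step is pinned, including the item position.
long_path = ("section > div > ol > li:nth-child(1) > article "
             "> div.image_container > a > img")

# A hand-written pattern: relies only on the class that identifies the images.
short_path = "div.image_container img"

print([img["src"] for img in soup.select(long_path)])   # ['a.jpg']
print([img["src"] for img in soup.select(short_path)])  # ['a.jpg', 'b.jpg']
```

The copied path is handy for locating one element, while the shorter pattern is usually what a scraper needs, since it collects all similar elements at once.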

Importance of CSS selectors

Becoming proficient in CSS selectors is crucial to our scraping goals, so from here on we will use them extensively to gain familiarity with a wide range of patterns. Much like regular expressions do for text, CSS selectors provide a one-line syntax that can replace several nested for loops, which simplifies the code and makes advanced queries over DOM elements possible.

| Feature | CSS selectors | Beautiful Soup navigation methods |
| --- | --- | --- |
| Simplicity and readability | CSS selectors offer a more straightforward and concise syntax for selecting HTML elements. | With Beautiful Soup's own methods such as find() and find_all(), we typically chain Python calls and loops to navigate the HTML structure, which can result in longer and more complex code. |
| Efficiency | CSS selectors can be faster, especially with large HTML documents or when scraping multiple pages, although the gain depends on the parser and the pattern used. | Chained navigation calls with Python loops tend to be slower for deeply nested queries. |

Let's explore CSS selectors using the following HTML document:

<div class='class1'>
    <h1> Product1 </h1>
    <div>
        <span> User1
            <p> review1 </p>
        </span>
        <span> User2
            <p> review2 </p>
        </span>
        <span> User3
            <p> review3 </p>
        </span>
    </div>
</div>
<div class='class1'>...</div>
<div class='class1'>...</div>
<div class='class1'>...</div>

If we want to scrape all the reviews and collect them in a list, we can do it with Beautiful Soup's navigation methods like this:

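reviews = []
# every product block
products = soup.find_all("div", {"class": "class1"})
for product in products:
    # every user <span> inside the product's inner <div>
    users = product.find("div").find_all("span")
    for user in users:
        # the <p> holding that user's review
        reviews.append(user.p)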

With a CSS selector, a single line produces the same list without any for loops. The pattern selects every <p> element whose parent is a <span>, whose grandparent is a <div>, and whose great-grandparent is a <div> with the class class1.

# select() is used to apply CSS selector pattern and get all the matches
soup.select("div.class1 > div > span > p")

Note: In addition to using .select(), we have the option to employ .select_one() to retrieve only the first match.
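To confirm that the two approaches agree, the sketch below runs both on a filled-in version of the sample document. The second product block and its review are assumptions added for the example, since the original elides them with "...".

```python
from bs4 import BeautifulSoup

# The sample structure from above; the second product is an invented filler.
html = """
<div class='class1'>
  <h1> Product1 </h1>
  <div>
    <span> User1 <p> review1 </p></span>
    <span> User2 <p> review2 </p></span>
  </div>
</div>
<div class='class1'>
  <h1> Product2 </h1>
  <div>
    <span> User3 <p> review3 </p></span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Loop-based traversal with find() / find_all().
loop_reviews = []
for product in soup.find_all("div", {"class": "class1"}):
    for user in product.find("div").find_all("span"):
        loop_reviews.append(user.p)

# The same query as a single CSS selector.
css_reviews = soup.select("div.class1 > div > span > p")

# select_one() returns only the first match (or None if nothing matches).
first_review = soup.select_one("div.class1 > div > span > p")

loop_texts = [p.get_text(strip=True) for p in loop_reviews]
css_texts = [p.get_text(strip=True) for p in css_reviews]

print(css_texts)                          # ['review1', 'review2', 'review3']
print(loop_texts == css_texts)            # True
print(first_review.get_text(strip=True))  # review1
```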

Data scraping using CSS selectors

Let's scrape the data from the Quotes to Scrape website using CSS Selectors:

import requests
from bs4 import BeautifulSoup

# maintain the main URL to use when joining page url
base_url = "https://quotes.toscrape.com"
all_quotes = []
all_authors = []
all_tags = set()

def scrape(url):
    """
    Request the URL, get the quotes, find the next page info, recurse.
    """
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    quotes = [x.string for x in soup.select("div.quote span.text")]
    authors = [x.string for x in soup.select("small.author")]
    tags = [x.string for x in soup.select("a.tag")]

    all_quotes.extend(quotes)
    all_authors.extend(authors)
    all_tags.update(tags)

    next_page = soup.select_one("ul.pager > li.next")
    # check if we reached the last page or not.
    if next_page:
        # join the main url with the page sub url
        # ex: "https://quotes.toscrape.com" + "/page/2/"
        next_page_url = requests.compat.urljoin(base_url, next_page.a['href'])
        scrape(next_page_url)
    return

scrape(base_url)
print("Total quotes scraped: ", len(all_quotes))
print("Total authors scraped: ", len(all_authors))
print("Total tags scraped: ", len(all_tags))

  • Lines 4–15: We initialize the base URL and the output collections, then request the page and parse the response with Beautiful Soup.

  • Line 16: We get the quotes by selecting all <span> elements with the text class that are inside a <div class="quote">, using the pattern div.quote span.text.

  • Line 17: We get all the authors' names by selecting all <small> elements with the author class, using the pattern small.author.

  • Line 18: We get all the tags from the <a> elements with the tag class, using the pattern a.tag.

  • Lines 24–30: We find the next page by selecting the <li> element with the next class whose parent is a <ul> with the pager class, using the pattern ul.pager > li.next. If it exists, we join its link with the base URL and call scrape() on the result, repeating the process until no next page remains.
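The pagination step can also be written as a loop instead of recursion. The sketch below is one possible variant, not the lesson's implementation: scrape_all() and fetch are hypothetical names, and the pages dictionary is an invented in-memory stand-in for the website so the example runs without network access.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base_url = "https://quotes.toscrape.com"

# Hypothetical stand-in for the network: two tiny pages keyed by URL.
pages = {
    "https://quotes.toscrape.com": """
        <div class="quote"><span class="text">Quote A</span></div>
        <ul class="pager"><li class="next"><a href="/page/2/">Next</a></li></ul>
    """,
    "https://quotes.toscrape.com/page/2/": """
        <div class="quote"><span class="text">Quote B</span></div>
        <ul class="pager"><li class="previous"><a href="/">Previous</a></li></ul>
    """,
}

def scrape_all(start_url, fetch):
    """Follow "next" links in a loop; fetch(url) must return the page HTML."""
    quotes = []
    url = start_url
    while url:
        soup = BeautifulSoup(fetch(url), "html.parser")
        quotes.extend(x.get_text() for x in soup.select("div.quote span.text"))
        # The last page has no li.next, so select_one() returns None there.
        next_link = soup.select_one("ul.pager > li.next > a")
        url = urljoin(base_url, next_link["href"]) if next_link else None
    return quotes

quotes = scrape_all(base_url, lambda url: pages[url])
print(quotes)  # ['Quote A', 'Quote B']
```

Against the live site, fetch could be something like lambda url: requests.get(url).text. A loop avoids growing the call stack on sites with many pages, at the cost of being slightly less compact than the recursive version.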

Try it yourself

Scrape the data from the Books to Scrape website using CSS selectors:

import requests
from requests.compat import urljoin
from bs4 import BeautifulSoup
base_url = "https://books.toscrape.com/"
response = requests.get(base_url)
soup = BeautifulSoup(response.content, 'html.parser')
# TODO: fill in the right CSS pattern for each variable below;
# the code will test the output using these patterns.
titles_pattern = ""
images_pattern = ""
rates_pattern = ""
prices_pattern = ""

Conclusion

This lesson covered CSS selectors and how they help with our scraping objectives. Until we learn other kinds of patterns, CSS selectors will be our standard method for searching the DOM.
