CSS Selectors
Learn about CSS selectors and examine a demonstration of their usage.
CSS (Cascading Style Sheets) selectors are patterns used to select and target specific HTML elements on a web page for styling and manipulation. CSS Selectors allow us to apply styles selectively to specific elements based on various criteria such as element type, class, ID, attributes, and their relationships with other elements in the HTML structure.
Types of CSS selectors
There are several types of selectors in CSS. Here is a categorized list of CSS selectors for Python web scraping:
Basic selectors
Attribute selectors
Relationship selectors
Advanced selectors
Combination and grouping selectors
Basic selectors
Basic CSS selectors refer to fundamental patterns that target and extract specific HTML elements from a web page. These selectors are essential for locating and manipulating elements during web scraping tasks. Here are some basic CSS selectors:
Name | Definition | Example | Explanation |
Tag Selectors | Tag selectors select elements by their HTML tag name. |
| Select all |
Class Selectors | Class selectors select elements by their CSS class name. |
| Select all elements with the class |
ID Selectors | ID selectors select the element with a given HTML ID attribute. |
| Select the element with the ID |
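The three basic selectors can be tried with Beautiful Soup's `select()` and `select_one()`. This is a minimal sketch: the HTML, the class name `price`, and the ID `main` are made up for illustration and do not come from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = """
<div id="main">
  <h1 class="title">Catalog</h1>
  <p class="price">10</p>
  <p class="price">20</p>
  <p>No class here</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Tag selector: every <p> element.
paragraphs = soup.select("p")

# Class selector: every element whose class list contains "price".
prices = soup.select(".price")

# ID selector: the element whose id is "main".
main = soup.select_one("#main")

print(len(paragraphs))                 # 3
print([p.get_text() for p in prices])  # ['10', '20']
print(main.name)                       # div
```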
Attribute selectors
Attribute CSS selectors are patterns used to select HTML elements based on their attributes and attribute values. These selectors are valuable for targeting elements with specific characteristics or properties. Here is the list of some commonly used attribute selectors:
Name | Definition | Example | Explanation |
Attribute Selector | Selects elements based on their attributes. |
| Select all elements with a data attribute named |
Attribute Value Selectors | Selects elements with specific attribute values. |
| Select all |
Attribute Ends With Selector | Selects elements with attributes that end with a specific value. |
| Select all elements with an href attribute ending in |
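As a rough illustration of the three attribute selectors, the sketch below runs them against a small made-up snippet. The attribute name `data-id`, the file names, and the input fields are assumptions chosen for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = """
<a href="/files/report.pdf" data-id="1">Report</a>
<a href="/files/photo.png">Photo</a>
<input type="text" name="q">
<input type="submit" value="Go">
"""
soup = BeautifulSoup(html, "html.parser")

# Attribute selector: any element that has a data-id attribute.
with_data_id = soup.select("[data-id]")

# Attribute value selector: <input> elements whose type is exactly "text".
text_inputs = soup.select('input[type="text"]')

# Attribute ends-with selector: links whose href ends in ".pdf".
pdf_links = soup.select('a[href$=".pdf"]')

print([a.get_text() for a in with_data_id])  # ['Report']
print([i["name"] for i in text_inputs])      # ['q']
print([a["href"] for a in pdf_links])        # ['/files/report.pdf']
```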
Relationship selectors
Relationship CSS selectors are patterns used to select HTML elements based on their relationships or positions relative to other elements in the HTML document's structure. These selectors allow us to target elements that have a specific connection or position in relation to other elements. A few commonly used relationship selectors are:
Name | Definition | Example | Explanation |
Descendant Selectors | Selects elements that are descendants of other elements. |
| Select all |
Child Selectors | Selects elements that are direct children of other elements. |
| Select all |
Adjacent Sibling Selectors | Selects elements that are siblings and come immediately after other elements. |
| Select all |
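The difference between the descendant (space), child (`>`), and adjacent sibling (`+`) combinators is easiest to see side by side. The snippet below is a sketch on made-up HTML; the class name `box` is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = """
<div class="box">
  <p>direct child</p>
  <section>
    <p>nested</p>
  </section>
  <h2>Heading</h2>
  <p>right after heading</p>
  <p>later sibling</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant selector: <p> elements anywhere inside div.box.
descendants = [p.get_text() for p in soup.select("div.box p")]

# Child selector: only <p> elements that are direct children of div.box.
children = [p.get_text() for p in soup.select("div.box > p")]

# Adjacent sibling selector: the <p> that immediately follows an <h2>.
adjacent = [p.get_text() for p in soup.select("h2 + p")]

print(descendants)  # ['direct child', 'nested', 'right after heading', 'later sibling']
print(children)     # ['direct child', 'right after heading', 'later sibling']
print(adjacent)     # ['right after heading']
```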
Advanced selectors
These selectors go beyond basic and relationship selectors to provide fine-grained control over element selection. Advanced CSS selectors can target elements based on various criteria, including attributes, states, and patterns within the HTML structure. Some examples of advanced CSS selectors include:
Name | Definition | Example | Explanation |
Pseudo-Class Selectors | Selects elements based on a state or position, such as being the first child of their parent. |
| Select all |
Not Selector | Selects elements that do not match a given selector. |
| Select all |
Nth-child Selector | Selects elements based on their position within a parent element. |
| Select the third |
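The following sketch shows a pseudo-class, `:not()`, and `:nth-child()` working on a short list. The HTML and the class name `done` are invented for the example; the selectors themselves are standard CSS that Beautiful Soup's `select()` accepts.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = """
<ul>
  <li class="done">one</li>
  <li>two</li>
  <li>three</li>
  <li class="done">four</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Pseudo-class selector: the <li> that is the first child of its parent.
first = [li.get_text() for li in soup.select("li:first-child")]

# Not selector: <li> elements that do not have the class "done".
not_done = [li.get_text() for li in soup.select("li:not(.done)")]

# Nth-child selector: the <li> that is the third child of its parent.
third = soup.select_one("li:nth-child(3)").get_text()

print(first)     # ['one']
print(not_done)  # ['two', 'three']
print(third)     # three
```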
Combination and grouping selectors
Name | Definition | Example | Explanation |
Multiple Selectors | Combines multiple selectors to make complex queries. |
| Select |
Grouping Selectors | Groups selectors together to apply the same styles or actions to multiple elements. |
| Select all |
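Combining and grouping look similar but behave differently: a combined selector narrows the match, while a grouped selector widens it. A minimal sketch on made-up HTML, with hypothetical class names `note` and `urgent`:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = """
<h1>Title</h1>
<h2>Subtitle</h2>
<p class="note">plain note</p>
<p class="note urgent">urgent note</p>
<span class="urgent">urgent span</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Multiple (combined) selectors: a <p> that has BOTH classes.
combined = [el.get_text() for el in soup.select("p.note.urgent")]

# Grouping selectors: every element that matches EITHER selector.
grouped = [el.get_text() for el in soup.select("h1, h2")]

print(combined)  # ['urgent note']
print(grouped)   # ['Title', 'Subtitle']
```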
CSS selectors in developer tools
While learning CSS selector patterns for our scraping objectives is essential, there is an easier way to get the correct pattern for an element. Right-click any element in the browser's inspection tool and choose "Copy > Copy selector" (or "CSS Selector" in Firefox).
For the element inspected here, the copied selector is:
#default>div>div>div>div>section>div:nth-child(2)>ol>li:nth-child(1)>article>div.image_container>a>img
Note: Copied selectors are often lengthy paths through the DOM, as in the result above.
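A copied selector can be passed to Beautiful Soup unchanged, but a shorter hand-written pattern is usually easier to read and reuse. The sketch below assumes a hypothetical page fragment built only to mirror the nesting in the copied selector; the image file names are made up.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mirrors the nesting of the copied selector.
html = """
<body id="default">
<div><div><div><div>
  <section>
    <div>header</div>
    <div>
      <ol>
        <li><article><div class="image_container"><a href="#"><img src="cover1.jpg"></a></div></article></li>
        <li><article><div class="image_container"><a href="#"><img src="cover2.jpg"></a></div></article></li>
      </ol>
    </div>
  </section>
</div></div></div></div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# The selector exactly as copied from the developer tools.
copied_selector = (
    "#default>div>div>div>div>section>div:nth-child(2)>ol>"
    "li:nth-child(1)>article>div.image_container>a>img"
)
copied_match = soup.select_one(copied_selector)

# A shorter pattern that targets every cover image instead of one fixed path.
all_covers = soup.select("div.image_container img")

print(copied_match["src"])               # cover1.jpg
print([img["src"] for img in all_covers])  # ['cover1.jpg', 'cover2.jpg']
```

The copied path pins one specific element, so it breaks easily when the page layout changes; the shorter pattern depends only on the `image_container` class.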
Importance of CSS selectors
To achieve our scraping goals, becoming proficient in CSS selectors is crucial, so we will use them extensively from here on to gain familiarity with a wide range of patterns. CSS selectors are like Python regex expressions: they provide a one-line syntax that does the job of multiple for loops, which simplifies the code and lets us write advanced queries against DOM elements.
Feature | CSS Selectors | Beautiful Soup |
Simplicity and Readability | CSS selectors offer a more straightforward and concise syntax for selecting HTML elements. | With Beautiful Soup, we typically need to use Python functions and methods to navigate the HTML structure, which can sometimes result in longer and more complex code. |
Efficiency | A single CSS selector replaces several nested traversal calls, which keeps queries compact when dealing with large HTML documents or scraping multiple pages. | With Beautiful Soup's find methods, the same query takes several calls and loops. Actual speed depends on the parser and the document, so it is worth measuring rather than assuming. |
Let's explore the CSS Selectors by considering the following HTML document:
<div class='class1'>
  <h1> Product1 </h1>
  <div>
    <span> User1<p> review1 </p></span>
    <span> User2<p> review2 </p></span>
    <span> User3<p> review3 </p></span>
  </div>
</div>
<div class='class1'>...</div>
<div class='class1'>...</div>
<div class='class1'>...</div>
If we want to scrape all the reviews and collect them in a list, with Beautiful Soup we can do it like this:
# soup is the BeautifulSoup object parsed from the HTML above
reviews = []
products = soup.find_all("div", {"class": "class1"})
for product in products:
    users = product.find("div").find_all("span")
    for user in users:
        reviews.append(user.p)
With a CSS selector, we need only one line to get the same list, and all the for loops disappear. The pattern selects all the <p> elements that have a <span> as a parent, a <div> as a grandparent, and a <div> with class class1 as a great-grandparent.
# select() applies a CSS selector pattern and returns all the matches
soup.select("div.class1 > div > span > p")
Note: In addition to using .select(), we have the option to employ .select_one() to retrieve only the first match.
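To check that the loop version and the one-line selector really return the same reviews, both can be run on the same markup. This is a sketch under assumptions: the HTML is a shortened stand-in for the document above, with a second, made-up product filled in instead of the elided ones.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the lesson's HTML; Product2 is hypothetical.
html = """
<div class='class1'>
  <h1> Product1 </h1>
  <div>
    <span> User1<p> review1 </p></span>
    <span> User2<p> review2 </p></span>
  </div>
</div>
<div class='class1'>
  <h1> Product2 </h1>
  <div>
    <span> User3<p> review3 </p></span>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Loop version: walk products, then users, then take each <p>.
loop_reviews = []
for product in soup.find_all("div", {"class": "class1"}):
    for user in product.find("div").find_all("span"):
        loop_reviews.append(user.p.get_text(strip=True))

# Selector version: one pattern describes the whole path.
selected_reviews = [
    p.get_text(strip=True)
    for p in soup.select("div.class1 > div > span > p")
]

# select_one() returns only the first match (or None if nothing matches).
first_review = soup.select_one("div.class1 > div > span > p").get_text(strip=True)

print(loop_reviews)      # ['review1', 'review2', 'review3']
print(selected_reviews)  # ['review1', 'review2', 'review3']
print(first_review)      # review1
```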
Data scraping using CSS selectors
Let's scrape the data from the Quotes to Scrape website using CSS Selectors:
import requests
from bs4 import BeautifulSoup

# maintain the main URL to use when joining page url
base_url = "https://quotes.toscrape.com"
all_quotes = []
all_authors = []
all_tags = set()

def scrape(url):
    """request on the URL, get the quotes, find the next page info, recurse."""

    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    quotes = [x.string for x in soup.select("div.quote span.text")]
    authors = [x.string for x in soup.select("small.author")]
    tags = [x.string for x in soup.select("a.tag")]

    all_quotes.extend(quotes)
    all_authors.extend(authors)
    all_tags.update(tags)

    next_page = soup.select_one("ul.pager > li.next")
    # check if we reached the last page or not.
    if next_page:
        # join the main url with the page sub url
        # ex: "https://quotes.toscrape.com" + "/page/2/"
        next_page_url = requests.compat.urljoin(base_url, next_page.a['href'])
        scrape(next_page_url)
    return

scrape(base_url)
print("Total quotes scraped: ", len(all_quotes))
print("Total authors scraped: ", len(all_authors))
print("Total tags scraped: ", len(all_tags))
Lines 4–14: We initialize the URL and the output lists, then parse the URL response using Beautiful Soup.
Line 16: We get the quotes by finding all the <span> elements with the text class that are inside a <div class="quote"> element, using the pattern div.quote span.text.
Line 17: We find all the authors' names by finding all the <small> elements with the author class, using the pattern small.author.
Line 18: We get all the tags from the <a> elements with the tag class, using the pattern a.tag.
Lines 24–30: We get the next page URL by finding the <li> item with the next class whose parent is a <ul> element with the pager class, using the pattern ul.pager > li.next. Lastly, we request that URL and repeat the process until there is no next page.
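The pagination step can be tried offline. The sketch below assumes two hypothetical pager fragments modeled on the ul.pager > li.next pattern, and a helper named `next_page_url` that is introduced here for illustration and is not part of the lesson's code. It uses the standard library's `urljoin`, which behaves like the `requests.compat.urljoin` call above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "https://quotes.toscrape.com"

# Hypothetical pager markup, modeled on the selector used in the lesson.
page_with_next = (
    '<ul class="pager"><li class="next"><a href="/page/2/">Next</a></li></ul>'
)
last_page = (
    '<ul class="pager"><li class="previous"><a href="/page/9/">Previous</a></li></ul>'
)

def next_page_url(html):
    """Return the absolute URL of the next page, or None on the last page."""
    soup = BeautifulSoup(html, "html.parser")
    next_item = soup.select_one("ul.pager > li.next")
    if next_item is None:
        # No li.next inside ul.pager means there is nothing left to follow.
        return None
    return urljoin(base_url, next_item.a["href"])

print(next_page_url(page_with_next))  # https://quotes.toscrape.com/page/2/
print(next_page_url(last_page))       # None
```

Because `select_one()` returns None when nothing matches, the same check that finds the link also tells the scraper when to stop.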
Try it yourself
Scrape the data from the Books to Scrape website using CSS selectors:
import requests
from requests.compat import urljoin
from bs4 import BeautifulSoup

base_url = "https://books.toscrape.com/"
response = requests.get(base_url)
soup = BeautifulSoup(response.content, 'html.parser')

# TODO
# Fill in the right CSS patterns; the code will test the output using these patterns.
titles_pattern = ""
images_pattern = ""
rates_pattern = ""
prices_pattern = ""
Conclusion
This lesson delved into CSS selectors and how they help us reach our scraping objectives. Until we learn about other patterns, CSS selectors will be our standard method for searching the DOM.