Scrapy with Selenium

Explore the integration between Selenium and Scrapy.

Now that we have added middleware to our stack, it is time to learn how to utilize it with Selenium.

Scrapy with dynamic sites

While Scrapy provides excellent modules for optimizing web scraping operations, it lacks built-in functionality to handle dynamic websites. To tackle this challenge, we need to integrate Selenium or another library alongside it. As we have already covered Selenium in previous lessons, we will use it in this module.

To efficiently scrape JavaScript-based websites, we will follow a three-step process:

  1. We will use Scrapy to make the initial request.

  2. We will pass this request to Selenium to load the DOM on our behalf.

  3. Finally, we will use selectors to extract the data from the fully loaded DOM.

We learned that downloader middlewares are used to manipulate requests. Consequently, they are the ideal place to implement the sequence outlined above, as the sketch below illustrates.
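Here is a minimal sketch of what such a downloader middleware could look like. The class name `SeleniumMiddleware` and the use of headless Chrome are illustrative assumptions, not part of the lesson's project code; the key idea is that `process_request` lets Selenium render the page and then returns the fully loaded DOM to Scrapy as an `HtmlResponse`, so the spider's selectors work on it as usual.

```python
# middlewares.py -- illustrative sketch, names are assumptions
from scrapy.http import HtmlResponse
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


class SeleniumMiddleware:
    def __init__(self):
        # Run Chrome headlessly so pages render without opening a window.
        options = Options()
        options.add_argument("--headless")
        self.driver = webdriver.Chrome(options=options)

    def process_request(self, request, spider):
        # Step 2: let Selenium fetch the URL and execute its JavaScript.
        self.driver.get(request.url)

        # Step 3: hand the rendered DOM back to Scrapy as a response.
        # Returning a Response here stops Scrapy's default downloader
        # from fetching the page again.
        return HtmlResponse(
            url=self.driver.current_url,
            body=self.driver.page_source,
            encoding="utf-8",
            request=request,
        )
```

To activate a middleware like this, it would be registered in the project's `settings.py` under `DOWNLOADER_MIDDLEWARES` (the module path below is hypothetical):

```python
# settings.py -- hypothetical project path
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.SeleniumMiddleware": 543,
}
```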
