Solution Review: Scrape Top Indices Data from Yahoo Finance
Review the solution for extracting market data.
Solution approach
Get the top three indices from the main URL.
Extract the URL of each index.
Loop through the URLs and visit each index's history page.
Find the table rows and scroll until the required number of rows has loaded.
Get the requested data (date, open, and close) from each row.
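Step 4 is the least obvious part, because the history table only renders more rows as the page is scrolled. The loop can be sketched without a browser by passing the two page operations in as functions. This is a minimal sketch under assumptions: `collect_rows`, `get_rows`, and `scroll` are hypothetical names, and the time cap is an added safeguard (not part of the original approach) so the loop cannot spin forever on a page that never reaches the row limit.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def collect_rows(get_rows: Callable[[], List[T]],
                 scroll: Callable[[], None],
                 limit: int = 50,
                 timeout: float = 30.0) -> List[T]:
    """Scroll until at least `limit` rows are rendered or `timeout` seconds pass.

    `get_rows` returns the rows currently rendered and `scroll` asks the page
    to load more; both are hypothetical hooks standing in for browser calls.
    """
    deadline = time.monotonic() + timeout
    rows = get_rows()
    # Keep scrolling while we have too few rows and time remains.
    while len(rows) < limit and time.monotonic() < deadline:
        scroll()
        rows = get_rows()
    # Return at most `limit` rows, mirroring rows[:50] in the solution.
    return rows[:limit]


# A fake page that renders 20 more rows on every scroll (stands in for a browser).
page = {"rendered": 20}


def fake_rows() -> List[int]:
    return list(range(page["rendered"]))


def fake_scroll() -> None:
    page["rendered"] += 20


rows = collect_rows(fake_rows, fake_scroll)
print(len(rows))  # 50
```

With Selenium, the hooks could be `lambda: driver.find_elements(By.CSS_SELECTOR, "table > tbody > tr")` and `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")`; using a JavaScript `scrollTo` call to trigger loading is itself an assumption about how the page responds to scrolling.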
Solution code
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

def scrape():
    options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get("https://finance.yahoo.com/world-indices/")
    data = []
    top_3 = driver.find_elements(By.CSS_SELECTOR, "tr > td:nth-child(1) > span > div > a")
    top_3_links = [x.get_attribute("href") for x in top_3[:3]]
    for link in top_3_links:
        driver.get(link + "/history")  # each index's history page lives at <index URL>/history
        rows, deadline = [], time.monotonic() + 30
        while len(rows) < 50 and time.monotonic() < deadline:  # scroll until 50 rows render (30 s cap)
            rows = driver.find_elements(By.CSS_SELECTOR, "table > tbody > tr")
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        for row in rows[:50]:  # keep only the first 50 rows
            data.append({"date": row.find_element(By.CSS_SELECTOR, "td:nth-child(1)").text,
                         "open": row.find_element(By.CSS_SELECTOR, "td:nth-child(2)").text,
                         "close": row.find_element(By.CSS_SELECTOR, "td:nth-child(5)").text})
    driver.quit()  # end the browser session
    return data

output = scrape()
print("len of scraped items: ", len(output))
print("Output sample: ", output[0])
Code explanation
Lines 8–10: The driver is initialized and loads the main URL.
Lines 11–13: We find the top three indices and get their URLs.
Line 15: Since each index's history page lives at its URL followed by /history, we append "/history" to each index URL directly.
Lines 17–19: Within the while loop, we first get the rendered rows and then scroll to load more data.
Lines 20–23: Finally, we get the date, open, and close values from each row, limiting the results to the first 50 rows.
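Line 15 builds each history URL by plain string concatenation, which works when the index URL ends with a bare path. If an index URL instead ended with a slash or carried a query string (an assumption; the links on the page may not look like this), concatenation would produce a double slash or put "/history" inside the query. A minimal sketch of a safer alternative appends the segment to the path component only, using Python's `urllib.parse`; `history_url` is a hypothetical helper and the sample URLs are illustrative inputs.

```python
from urllib.parse import urlsplit, urlunsplit


def history_url(index_url: str) -> str:
    """Return the history-page URL for an index URL (hypothetical helper).

    "/history" is appended to the path only, so a trailing slash or a
    query string such as "?p=..." stays where it belongs.
    """
    parts = urlsplit(index_url)
    path = parts.path.rstrip("/") + "/history"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(history_url("https://finance.yahoo.com/quote/%5EGSPC/"))
# https://finance.yahoo.com/quote/%5EGSPC/history
print(history_url("https://finance.yahoo.com/quote/%5EGSPC?p=%5EGSPC"))
# https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC
```

In the solution, `driver.get(history_url(link))` would replace `driver.get(link + "/history")`; for a bare path like the one the lesson assumes, both produce the same URL.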