Solution Review: Scrape Top Indices Data from Yahoo Finance

Review the solution for extracting market data.

Solution approach

  • Get the top 3 indices from the main URL.

  • Extract the URL for the indices.

  • Loop through them and visit each URL.

  • Find the row elements on each history page, scrolling if necessary to load enough of them.

  • Get the requested data from each row.
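
The scrolling step in the list above can be sketched as a small helper. This is only an illustration under stated assumptions: `load_rows` and its parameters are hypothetical names, the page is assumed to render more rows when the window scrolls to the bottom, and a fixed pause stands in for a proper explicit wait. The helper only needs an object with Selenium's `find_elements` and `execute_script` methods.

```python
import time

CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def load_rows(driver, min_rows=50, max_scrolls=10, pause=1.0,
              selector="table > tbody > tr"):
    """Scroll until at least `min_rows` rows are rendered or scrolling stops helping."""
    rows = driver.find_elements(CSS, selector)
    for _ in range(max_scrolls):
        if len(rows) >= min_rows:
            break
        # Jump to the bottom of the page so lazily loaded rows get requested.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait; an explicit wait would be more robust
        new_rows = driver.find_elements(CSS, selector)
        if len(new_rows) == len(rows):
            break  # nothing new appeared, so stop early
        rows = new_rows
    return rows[:min_rows]
```

Inside a scraping loop, this could be called right after `driver.get(...)`, for example `rows = load_rows(driver, min_rows=50)`. The early exit keeps the loop from spinning when the table has fewer rows than requested.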

Inspecting the DOM structure of an index table
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()  # assumed: default Chrome options
def scrape():
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get("https://finance.yahoo.com/world-indices/")
    data = []
    top_3 = driver.find_elements(By.CSS_SELECTOR,
                                "tr > td:nth-child(1) > span > div > a")
    top_3_links = [x.get_attribute("href") for x in top_3[:3]]
    for link in top_3_links:
        driver.get(link + "/history")
        # Collect the rows currently rendered in the history table.
        rows = driver.find_elements(By.CSS_SELECTOR, "table > tbody > tr")
        for row in rows[:50]:
            d = {"date": row.find_element(By.CSS_SELECTOR,'td:nth-child(1)').text,
                 "open": row.find_element(By.CSS_SELECTOR,'td:nth-child(2)').text,
                 "close": row.find_element(By.CSS_SELECTOR,'td:nth-child(5)').text}
            data.append(d)
    driver.quit()  # end the browser session and release the driver process
    return data

output = scrape()

print("Number of scraped items:", len(output))
print("Output sample:", output[0])

Solution code
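
The values returned by the solution are strings exactly as they appear on the page. A minimal sketch of turning one row into typed values follows. It assumes dates look like "Mar 15, 2024" and numbers use commas as thousands separators, which is an assumption about how the page renders them; `parse_row` and the sample row are hypothetical.

```python
from datetime import datetime


def parse_row(row, date_format="%b %d, %Y"):
    """Convert one scraped row of strings into typed values."""
    def to_float(text):
        # Drop thousands separators such as the comma in "5,117.09".
        return float(text.replace(",", ""))

    return {
        "date": datetime.strptime(row["date"], date_format).date(),
        "open": to_float(row["open"]),
        "close": to_float(row["close"]),
    }


# Illustrative sample row, not real scraped output.
sample = {"date": "Mar 15, 2024", "open": "5,123.31", "close": "5,117.09"}
print(parse_row(sample))
```

A row whose cells are not numeric (for example a placeholder such as "-") raises `ValueError` here, so a caller may want to catch that and skip the row.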

Code explanation

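  • Lines 8–10: We initialize the Chrome driver, open the world indices page, and create an empty list for the results.

  • Lines 11–13: We find the index links in the first column of the table and keep the URLs of the first three.

  • Lines 14–15: We loop over those URLs. Each index's history page is its URL followed by /history, so we append that suffix and visit the page directly.

  • Lines 16–17: We collect the table rows that are rendered on the page. No scrolling happens here, so the code relies on the rows it needs being present after the page loads.

  • Lines 18–22: For each of the first 50 rows, we read the date, open, and close values from columns 1, 2, and 5 and append them to the results.

  • Lines 23–24: We close the browser and return the collected data.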
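
Appending "/history" with plain string concatenation works when the link has no trailing slash or query string. As a hedged sketch, the helper below builds the same URL more defensively with the standard library. `history_url` is a hypothetical name, and the sample URL is only an illustration of what an index link might look like.

```python
from urllib.parse import urlsplit, urlunsplit


def history_url(index_url):
    """Return the history-page URL for an index URL, dropping query and fragment."""
    parts = urlsplit(index_url)
    # Remove any trailing slash so the result never contains "//history".
    path = parts.path.rstrip("/") + "/history"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Illustrative link with a trailing slash and a query string.
print(history_url("https://finance.yahoo.com/quote/%5EGSPC/?p=%5EGSPC"))
```

In the solution, this would replace `link + "/history"` with `history_url(link)`.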
