Our Use Case

Learn about the dataset and libraries that we will be using throughout the rest of the course.

Climatological data

For the rest of the course, we’ll use one specific dataset to illustrate the concepts we cover. We’ll still see applications to other domains and datasets, but the narrative makes more sense with a particular use case in mind. All code snippets in the course can be run within the platform.

We’ll use the dataset from the National Centers for Environmental Information (NCEI), a division of the National Oceanic and Atmospheric Administration of the United States government. The mission of the NCEI is to manage and provide access to environmental data for researchers, government authorities, and private sector organizations. The NCEI has one of the largest archives of atmospheric, coastal, geophysical, and oceanic data in the world, which we can explore on their website.

Our dataset consists of local climatological data from San Francisco County, covering January 1, 2021, through December 31, 2021. The instructions for downloading it are in the “Download the Temperatures Data” lesson in the “Appendix” chapter.

Transform

Take some time to explore the dataset described above. It is relatively complex, mainly because it aggregates information from several sources. The REPORT_TYPE column shows that there are three report types: SOD, SOM, and FM-15. Most columns hold values for only one report type, which makes the dataset very sparse. The column names indicate that the data covers average temperature, precipitation, humidity, wind speed, and other local weather variables at different levels of consolidation, such as monthly and daily. Some columns also represent variations across different time windows.
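One way to see this sparsity is to measure, for each report type, what fraction of each column is actually filled in. The sketch below assumes pandas and uses a tiny made-up frame. Only REPORT_TYPE and its three values come from the dataset description above. The other column names and all the numbers are illustrative placeholders, and the helper name non_null_share_by_report is hypothetical.

```python
import pandas as pd


def non_null_share_by_report(df, report_col="REPORT_TYPE"):
    """Return the fraction of non-missing values per column, for each report type."""
    # notna() yields a boolean frame; averaging booleans within each group
    # gives the share of rows that hold a value.
    filled = df.drop(columns=[report_col]).notna()
    return filled.groupby(df[report_col]).mean()


# Made-up toy data: each measurement column is filled for one report type only.
toy = pd.DataFrame(
    {
        "REPORT_TYPE": ["FM-15", "FM-15", "SOD", "SOM"],
        "HourlyDryBulbTemperature": [52.0, 54.0, None, None],
        "DailyAverageDryBulbTemperature": [None, None, 53.0, None],
        "MonthlyMeanTemperature": [None, None, None, 51.5],
    }
)

share = non_null_share_by_report(toy)
print(share)
```

In the resulting table, a value close to 1 means the column is populated for that report type, and a value close to 0 means it is almost entirely empty there. Running the same helper on the real data should make it easier to tell which columns belong to which report.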

The code below shows how to read the data:
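A minimal sketch of this step, assuming the data is stored as a CSV file and that pandas is available. The helper name load_climate_data, the inline sample, and every column name other than REPORT_TYPE are assumptions made for illustration. With the real download, pass the path of the saved file instead of the in-memory sample.

```python
import io

import pandas as pd

# Small inline stand-in for the downloaded file; all values are made up.
SAMPLE_CSV = """\
STATION,DATE,REPORT_TYPE,HourlyDryBulbTemperature,DailyAverageDryBulbTemperature
STATION_X,2021-01-01T00:56:00,FM-15,52,
STATION_X,2021-01-01T01:56:00,FM-15,51,
STATION_X,2021-01-01T23:59:00,SOD,,53
"""


def load_climate_data(source):
    """Read the CSV from a path or file-like object and parse the DATE column."""
    # low_memory=False reads the file in one pass, which avoids mixed-type
    # guesses in sparse columns.
    return pd.read_csv(source, parse_dates=["DATE"], low_memory=False)


df = load_climate_data(io.StringIO(SAMPLE_CSV))

print(df.dtypes)                         # column types after parsing
print(df["REPORT_TYPE"].value_counts())  # rows per report type
print(df.isna().sum())                   # missing values per column
```

Checking the row count per report type and the number of missing values per column right after loading gives a quick first view of how the reports are mixed in one table.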
