Getting Started

Get an overview of the course’s content, along with its intended audience.

Overview

These are exciting times in statistics and data science education. However, it can also feel a bit overwhelming to keep up with all the new statistical and technological innovations. This course offers a valuable introduction to statistics because:

  • It gives learners experience with the entire data analysis pipeline.

  • It incorporates contemporary, user-friendly R packages directly into the text.

  • It emphasizes models that prepare learners for our multivariate world.

We might think of statistics as just a bunch of numbers, and we commonly hear the word “statistician” in broadcasts of sporting events. However, statistics, and data analysis in particular, does more than describe numbers like baseball batting averages: it plays a vital role in all of the sciences.

We’ll commonly hear the phrase “statistically significant” in the media, and we’ll see articles claiming that “Science now shows that chocolate is good for you.” Data analysis underpins these claims. By the end of this course, we’ll be better able to judge whether such claims should be trusted or whether we should be wary of them.

There are many subfields within data analysis that we’ll discuss in this course. For example:

  • Data collection

  • Data wrangling

  • Data visualization

  • Data modeling

  • Inference

  • Correlation and regression

  • Interpretation of results

  • Data communication/storytelling

These subfields are summarized in what Grolemund and Wickham have previously termed the “data/science pipeline”:

Data/science pipeline

Prerequisites

This course has no prerequisites: no prior experience with algebra, calculus, or programming is required. It is intended as a basic introduction to analyzing data and answering questions with data the way data scientists, statisticians, data journalists, and other researchers do.

Course structure

The course begins with data visualization, getting learners started on building ggplot2 graphs early so that important concepts are reinforced graphically. After data wrangling and data importing, modeling takes a prominent role, first with building regression models and later with performing inference for regression. Lastly, statistical inference is presented through a computational lens, with calculations done using the infer package.

Flowchart of the course

Intended audience

This course is intended for individuals who want to start developing their data science toolbox while also learning about the inferential and modeling tools used in modern-day research.

We hope that by the end of this course, you’ll have learned how to:

  • Use R and the tidyverse suite of R packages for data science.

  • Fit your first models to data using a method known as linear regression.

  • Perform statistical inference using sampling, confidence intervals, and hypothesis tests.

  • Tell your story with data using these tools.