Getting Started
Get an overview of the course’s content, along with its intended audience.
Overview
These are exciting times in statistics and data science education, but staying on top of all the new statistical and technological innovations can feel overwhelming. This course is a valuable introduction to statistics because:
It gives learners experience of the whole data analysis pipeline.
It incorporates contemporary, user-friendly R packages directly into the text.
It emphasizes models that prepare learners for our multivariate world.
It is easy to think of statistics as just a bunch of numbers, and we commonly hear the word “statistician” during broadcasts of sporting events. But statistics, and data analysis in particular, does more than describe numbers such as baseball batting averages: it plays a vital role in all of the sciences.
We often hear the phrase “statistically significant” in the media, and we see articles that say, “Science now shows that chocolate is good for you.” Underpinning these claims is data analysis. By the end of this course, we’ll be better able to judge whether such claims should be trusted or treated with caution.
Data analysis has many subfields that we’ll discuss in this course, for example:
Data collection
Data wrangling
Data visualization
Data modeling
Inference
Correlation and regression
Interpretation of results
Data communication/storytelling
These subfields are summarized in what Grolemund and Wickham have previously termed the “data/science pipeline.”
Prerequisites
This course has no prerequisites: no prior experience with algebra, calculus, or programming/coding is required. It is intended as a basic introduction to the practice of analyzing data and answering questions with data, the way data scientists, statisticians, data journalists, and other researchers would.
Course structure
Beginning with data visualization, this course gets learners building ggplot2 graphs early on, with the aim of reinforcing important concepts graphically. After data wrangling and data importing, modeling plays a prominent role, with the goal of building regression models and, later, performing inference for regression. Lastly, statistical inference is presented through a computational lens, with calculations done via the infer package.
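To give a feel for that progression, here is a minimal sketch in R that visualizes a relationship, fits a regression model, and then computes a bootstrap confidence interval. It is an illustration under stated assumptions, not an excerpt from the course: it assumes the ggplot2 and infer packages are installed, and it uses mtcars, a dataset that ships with R, purely as a stand-in. The variable names (weight_plot, mpg_model, slope_ci) and the seed are arbitrary choices.

```r
# Assumes both packages are installed: install.packages(c("ggplot2", "infer"))
library(ggplot2)
library(infer)

# 1. Visualization: scatterplot of fuel economy against weight, with a fitted line.
#    mtcars ships with R and is used here only for illustration.
weight_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
print(weight_plot)

# 2. Modeling: simple linear regression of mpg on wt.
mpg_model <- lm(mpg ~ wt, data = mtcars)
coef(mpg_model)

# 3. Inference: bootstrap percentile confidence interval for the slope.
set.seed(76)  # arbitrary seed so the resampling is reproducible
slope_ci <- mtcars %>%
  specify(mpg ~ wt) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "slope") %>%
  get_confidence_interval(level = 0.95, type = "percentile")
slope_ci
```

Each numbered step corresponds to one stage of the course: the plot comes first, the model puts a number on the pattern seen in the plot, and the resampling step gauges how much that number could vary from sample to sample.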
Intended audience
This course is intended for individuals who want to start developing their data science toolbox while learning about the inferential and modeling tools used in modern-day research.
We hope that by the end of this course, you’ll have learned how to:
Use R and the tidyverse suite of R packages for data science.
Fit your first models to data using a method known as linear regression.
Perform statistical inference using sampling, confidence intervals, and hypothesis tests.
Tell your story with data using these tools.