Getting Started

Get an overview of the course’s content, along with its intended audience.

We'll cover the following

Overview
Prerequisites
Course structure
Intended audience

Overview

These are exciting times in statistics and data science education. However, staying on top of all the new statistical and technological innovations can feel overwhelming. This course is a valuable introduction to statistics because:

It gives learners experience with the whole data analysis pipeline.
It incorporates contemporary, user-friendly R packages directly into the text.
It emphasizes models that prepare learners for our multivariate world.

We might think of statistics as just a bunch of numbers, and we commonly hear the word “statistician” during broadcasts of sporting events. But beyond describing numbers like baseball batting averages, statistics, and data analysis in particular, plays a vital role in all of the sciences.

We often hear the phrase “statistically significant” in the media, and we see articles that say, “Science now shows that chocolate is good for you.” Data analysis underpins these claims. By the end of this course, we’ll be better able to judge whether such claims should be trusted or treated with caution.

We’ll discuss many subfields of data analysis in this course, including:

Data collection
Data wrangling
Data visualization
Data modeling
Inference
Correlation and regression
Interpretation of results
Data communication/storytelling

These subfields are summarized in what Grolemund and Wickham have termed the “data/science pipeline.”

Prerequisites

This course has no prerequisites: no prior experience with algebra, calculus, or programming is required. It is intended as a basic introduction to analyzing data and answering questions with data, the way data scientists, statisticians, data journalists, and other researchers do.

Course structure

The course begins with data visualization, so learners start building ggplot2 graphs early on and see important concepts reinforced graphically. It then moves through data wrangling and data importing. After that, modeling plays a prominent role, with the goal of building regression models and, later, performing inference for regression. Lastly, statistical inference is presented through a computational lens, with calculations done via the infer package.

Intended audience

This course is intended for individuals who want to start developing their data science toolbox and, at the same time, learn about the inferential and modeling tools used in modern-day research.

We hope that by the end of this course, you’ll have learned how to:

Use R and the tidyverse suite of R packages for data science.
Fit your first models to data using a method known as linear regression.
Perform statistical inference using sampling, confidence intervals, and hypothesis tests.
Tell your story with data using these tools.
