Introduction and Exploratory Data Analysis

Get introduced to multiple regression and learn how to get started with exploratory data analysis.

We previously introduced ideas related to modeling for explanation, in particular that the goal of modeling is to make explicit the relationship between some outcome variable y and some explanatory variable x. While there are many approaches to modeling, we focused on one particular technique: linear regression. Linear regression is one of the most commonly used and easy-to-understand approaches to modeling. Furthermore, to keep things simple, we only considered models with one explanatory variable x that was either numerical or categorical.

In multiple regression, we’ll start considering models that include more than one explanatory variable x. When trying to model a particular outcome variable, like teaching evaluation scores or life expectancy, we can imagine that it will be useful to include more than just one explanatory variable’s worth of information.

Because our regression models will now include more than one explanatory variable, the associated effect of any one explanatory variable must be interpreted in conjunction with the other explanatory variables included in the model.
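To make this point about interpretation concrete, here is a minimal sketch that fits one model with a single explanatory variable and one with two. It assumes the built-in mtcars data frame as a stand-in, since it ships with base R and is not the data used in this course. The names simple_model and multiple_model are hypothetical and chosen only for illustration.

```r
# Illustrative only: mtcars is a built-in R dataset, not the course's data.
# Outcome variable y: mpg (miles per gallon).
# Explanatory variables x: wt (weight) and hp (horsepower).

# A model with one explanatory variable.
simple_model <- lm(mpg ~ wt, data = mtcars)

# A model with two explanatory variables.
multiple_model <- lm(mpg ~ wt + hp, data = mtcars)

# Compare the fitted coefficients of the two models.
coef(simple_model)
coef(multiple_model)
```

The slope for wt is not the same in the two models. In the second model, it describes the association between wt and mpg while taking hp into account, which is why each coefficient has to be read alongside the other variables in the model.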

Needed packages

Recall that loading the tidyverse package by running library(tidyverse) loads the following commonly used data science packages all at once:

  • The ggplot2 package for data visualization

  • The dplyr package for data wrangling

  • The tidyr package for converting data to tidy format

  • The readr package for importing spreadsheet data into R

  • The purrr, tibble, stringr, and forcats packages, which are used for more advanced tasks
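As a quick check that the packages listed above are available after one call, the following sketch loads the tidyverse and uses dplyr, tibble, and ggplot2 together. It assumes the tidyverse is already installed (for example, via install.packages("tidyverse")). The mtcars data frame is again a stand-in rather than the course's data, and the names cars_small and scatter are hypothetical.

```r
# Assumes the tidyverse has been installed beforehand.
library(tidyverse)

# dplyr and tibble: keep three columns and convert to a tibble.
cars_small <- mtcars %>%
  select(mpg, wt, hp) %>%
  as_tibble()

# dplyr: take a compact look at the variables and their types.
glimpse(cars_small)

# ggplot2: build a scatterplot of the outcome against one explanatory variable.
scatter <- ggplot(cars_small, aes(x = wt, y = mpg)) +
  geom_point()
```

Running print(scatter) displays the plot. Storing it in a variable first keeps the example usable in non-interactive sessions.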
