Introduction and Needed Packages

Learn about some basic regression techniques and how to get started with regression in R.

Now that we’re equipped with data visualization and data wrangling skills, and understand how to import data and what a tidy data format is, let’s proceed with data modeling. The fundamental premise of data modeling is to make explicit the relationship between:

  • An outcome variable 𝑦, also called a dependent variable or response variable.

  • An explanatory/predictor variable 𝑥, also called an independent variable or covariate.

Another way to state this is with mathematical terminology: we’ll model the outcome variable 𝑦 as a function of the explanatory/predictor variable 𝑥. When we say “function” here, we aren’t referring to functions in R like the ggplot() function, but rather to a mathematical function. Why, then, do we have two different labels, explanatory and predictor, for the variable 𝑥? Even though the two terms are often used interchangeably, roughly speaking, data modeling serves one of two purposes:

  1. Modeling for explanation: We want to explicitly describe and quantify the relationship between the outcome variable 𝑦 and a set of explanatory variables 𝑥, determine the significance of any relationships, obtain measures that summarize these relationships, and possibly identify any causal relationships between the variables.

  2. Modeling for prediction: We want to predict an outcome variable 𝑦 based on the information contained in a set of predictor variables 𝑥. Unlike modeling for explanation, we don’t care so much about understanding how all the variables relate and interact with one another. Our focus is on whether we can make good predictions about 𝑦 using the information in 𝑥.
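
To make the two purposes concrete, here is a minimal sketch in R. It is only an illustration under stated assumptions: the mtcars data frame that ships with base R stands in for real data, mpg plays the role of the outcome 𝑦, and wt plays the role of the explanatory/predictor variable 𝑥. None of these choices come from this lesson, and the object names model and new_cars are hypothetical.

```r
# Illustrative assumption: mtcars (built into R) is used as stand-in data.
# Outcome y: mpg (miles per gallon). Explanatory/predictor x: wt (weight).
model <- lm(mpg ~ wt, data = mtcars)

# Modeling for explanation: describe and quantify the relationship.
coef(model)                  # estimated intercept and slope
summary(model)$coefficients  # estimates, standard errors, t values, p-values

# Modeling for prediction: use x to predict y for new observations.
new_cars <- data.frame(wt = c(2.5, 3.5))  # hypothetical new weights
predict(model, newdata = new_cars)
```

The same fitted model serves both purposes; what differs is the question we ask of it. For explanation we read the coefficient table, keeping in mind that an estimated slope on its own does not establish a causal relationship. For prediction we only look at how good the values returned by predict() are.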
