One Categorical Explanatory Variable

Learn about categorical explanatory variables and how they can be used in regression.

It’s an unfortunate truth that life expectancy isn’t the same across all countries in the world. International development agencies are interested in studying these differences in life expectancy in the hopes of identifying where governments should allocate resources to address this problem. In this lesson, we’ll explore differences in life expectancy in two ways:

  • Differences between continents: Are there significant differences in average life expectancy between the five populated continents of the world—Africa, the Americas, Asia, Europe, and Oceania?

  • Differences within continents: How does life expectancy vary within the world’s five continents? For example, is the spread of life expectancy among the countries of Africa larger than the spread of life expectancy among the countries of Asia?
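Both questions come down to per-group summaries: a mean for each continent (between-continent differences) and a measure of spread for each continent (within-continent differences). The following is a minimal sketch of that idea, not the lesson's own analysis. The `records` values are hypothetical toy numbers rather than real gapminder figures, and the helper name `summarize_by_group` is an assumption made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical toy values for illustration only; NOT real gapminder figures.
# Each row is (continent, life expectancy in years).
records = [
    ("Africa", 50.0), ("Africa", 60.0), ("Africa", 70.0),
    ("Asia", 65.0), ("Asia", 70.0), ("Asia", 75.0),
    ("Europe", 76.0), ("Europe", 78.0), ("Europe", 80.0),
]


def summarize_by_group(rows):
    """Return {group: (mean, sample standard deviation)} for (group, value) rows."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    # stdev needs at least two values per group, which the toy data satisfies.
    return {g: (mean(v), stdev(v)) for g, v in groups.items()}


summary = summarize_by_group(records)
for continent, (avg, spread) in sorted(summary.items()):
    # Compare means across continents and standard deviations within them.
    print(f"{continent}: mean = {avg:.1f}, sd = {spread:.1f}")
```

In this toy data, the means differ between groups while the standard deviations describe how much the countries vary inside each group, which mirrors the two questions above.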

To answer such questions, we’ll use the gapminder data frame included in the gapminder package. This dataset contains international development statistics, such as life expectancy, GDP per capita, and population, for 142 countries at 5-year intervals between 1952 and 2007. Recall that we visualized some of this data when we covered the grammar of graphics.

We’ll use this data for basic regression again, but now with an explanatory variable x that’s categorical rather than numerical, as it was in the earlier model. So essentially we’ll have:

  • A numerical outcome variable y (a country’s life expectancy)

  • A single categorical explanatory variable x (the continent that the country is a part of)
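To make this setup concrete, here is a minimal sketch of how a numerical y can be related to a categorical x. When the only explanatory variable is categorical, the least-squares fitted value for each category is that category's mean, so the model can be written as a baseline mean plus one offset per remaining category. The data values are hypothetical toy numbers (not real gapminder figures), the choice of "Africa" as the baseline is arbitrary, and the names `baseline_and_offsets` and `fitted_value` are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical toy values for illustration only; NOT real gapminder figures.
continent = ["Africa", "Africa", "Asia", "Asia", "Europe", "Europe"]  # categorical x
life_exp = [55.0, 65.0, 68.0, 72.0, 77.0, 79.0]                       # numerical y


def baseline_and_offsets(x, y, baseline):
    """Return (baseline group mean, {other level: mean difference from baseline})."""
    group_means = {
        level: mean(yi for xi, yi in zip(x, y) if xi == level)
        for level in sorted(set(x))
    }
    intercept = group_means[baseline]
    offsets = {
        level: m - intercept
        for level, m in group_means.items()
        if level != baseline
    }
    return intercept, offsets


def fitted_value(level, intercept, offsets):
    """Fitted y for a category: the baseline mean plus that category's offset."""
    return intercept + offsets.get(level, 0.0)


# Assumption: "Africa" is used as the baseline category.
intercept, offsets = baseline_and_offsets(continent, life_exp, baseline="Africa")
print("baseline mean:", intercept)
for level, diff in offsets.items():
    print(f"{level}: offset = {diff:+.1f}, fitted = {fitted_value(level, intercept, offsets):.1f}")
```

Each offset reads as "how much higher or lower this continent's average is compared with the baseline continent," which is the kind of comparison the between-continent question asks for.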
