Case Study: Seattle House Prices-II

Continue the ModernDive case study on Seattle house prices with multivariate exploratory data analysis (EDA).

Exploratory data analysis

Let’s now continue our EDA by creating multivariate visualizations. Unlike univariate histograms and barplots, multivariate visualizations show relationships between two or more variables. This is an important step in any EDA because the goal of modeling is to explore relationships between variables.

Our model involves a numerical outcome variable, a numerical explanatory variable, and a categorical explanatory variable, so we’re in a regression modeling situation.

We therefore have two models to choose from. The first is an interaction model, where the regression line for each level of condition has its own slope and its own intercept. The second is a parallel slopes model, where the regression lines for all levels of condition share the same slope but have different intercepts.
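As a minimal sketch of how the two candidate models differ in R's formula syntax, the block below fits both with `lm()` on simulated stand-in data. The data frame `houses`, the column names `log10_price` and `log10_size`, and the numbers used to generate the data are all hypothetical and chosen only for illustration; they are not the case study's actual data or results.

```r
# Simulated stand-in data (hypothetical names and values, for illustration only)
set.seed(76)
n <- 300
houses <- data.frame(
  log10_size = runif(n, min = 2.5, max = 4),
  condition  = factor(sample(1:5, n, replace = TRUE))
)
houses$log10_price <- 3 + 0.8 * houses$log10_size +
  0.05 * as.integer(houses$condition) + rnorm(n, sd = 0.15)

# Interaction model: `*` expands to both main effects plus their interaction,
# so every condition level gets its own intercept and its own slope
interaction_model <- lm(log10_price ~ log10_size * condition, data = houses)

# Parallel slopes model: `+` gives every condition level its own intercept
# but a single slope shared by all levels
parallel_model <- lm(log10_price ~ log10_size + condition, data = houses)

# Compare how many coefficients each model estimates
n_interaction <- length(coef(interaction_model))
n_parallel    <- length(coef(parallel_model))
```

With five condition levels, the interaction model estimates more coefficients than the parallel slopes model, because each level beyond the baseline gets its own slope offset in addition to its own intercept offset.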

The geom_parallel_slopes() function is a special-purpose function that Evgeni Chasnovski created and included in the moderndive package. It exists because geom_smooth() in the ggplot2 package doesn’t have a convenient way to plot parallel slopes models. We plot both resulting models, the interaction model and the parallel slopes model, in the figure below.
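The two plots described above could be built roughly as follows. This is a sketch under stated assumptions: it assumes the `house_prices` data frame shipped with the moderndive package, with columns `price`, `sqft_living`, and `condition`, and it assumes log10 transformations of price and size, which this section does not state explicitly. The object names (`houses`, `base_plot`, and so on) are made up for the example.

```r
library(ggplot2)
library(moderndive)

# Assumption: log10-transformed price and living area are the variables of interest
houses <- house_prices
houses$log10_price <- log10(houses$price)
houses$log10_size  <- log10(houses$sqft_living)

# Shared scatterplot layer, colored by the categorical variable
base_plot <- ggplot(houses, aes(x = log10_size, y = log10_price, color = condition)) +
  geom_point(alpha = 0.05) +
  labs(x = "log10 size", y = "log10 price")

# Interaction model: geom_smooth() fits a separate line per color group,
# so slopes and intercepts both differ
interaction_plot <- base_plot +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Interaction model")

# Parallel slopes model: geom_parallel_slopes() forces a common slope
parallel_plot <- base_plot +
  geom_parallel_slopes(se = FALSE) +
  labs(title = "Parallel slopes model")
```

Printing `interaction_plot` or `parallel_plot` at the console renders the corresponding figure.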
