Simulation-Based Inference for Regression
Learn about the confidence interval for a slope in simulation-based inference for regression.
When we interpreted the third through seventh columns of a regression table, we stated that R doesn’t do simulations to compute these values. Rather, R uses theory-based methods that involve mathematical formulas.
We’ll use simulation-based methods to recreate the values in the regression table. In particular, we’ll use the infer package workflow to:
Construct a 95% confidence interval for the population slope $\beta_1$ using bootstrap resampling with replacement. We did this with the pennies data and with the mythbusters_yawn data.
Conduct a hypothesis test of $H_0: \beta_1 = 0$ vs. $H_A: \beta_1 \neq 0$ using a permutation test.
Confidence interval for a slope
We’ll construct a 95% confidence interval for $\beta_1$ using the infer workflow. Specifically, we’ll first construct the bootstrap distribution for the fitted slope $b_1$:
We specify() the variables of interest in evals_ch5 with the formula score ~ bty_avg.
We generate() replicates by using bootstrap resampling with replacement from the original sample of 463 courses. We generate reps = 1000 replicates using type = "bootstrap".
We calculate() the summary statistic of interest, which is the fitted slope.
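The three steps above can be sketched as a single infer pipeline. This is a minimal sketch under a few assumptions that the text doesn't state: that evals_ch5 is built from the evals data frame in the moderndive package by selecting a handful of columns, that the result is stored under the illustrative name bootstrap_distn_slope, and that an arbitrary seed is set only to make the resampling reproducible.

```r
library(dplyr)
library(infer)
library(moderndive)

# Assumption: evals_ch5 is a subset of moderndive::evals with these columns.
evals_ch5 <- evals %>%
  select(ID, score, bty_avg, age)

# Arbitrary seed so the random resampling can be reproduced.
set.seed(76)

# Illustrative name for the bootstrap distribution of the fitted slope.
bootstrap_distn_slope <- evals_ch5 %>%
  # Step 1: declare the response (score) and explanatory variable (bty_avg).
  specify(formula = score ~ bty_avg) %>%
  # Step 2: draw 1000 bootstrap resamples of rows, with replacement.
  generate(reps = 1000, type = "bootstrap") %>%
  # Step 3: compute the fitted slope within each resample.
  calculate(stat = "slope")

bootstrap_distn_slope
```

The result should have one row per replicate, with the fitted slope for that resample in the stat column. Calling visualize(bootstrap_distn_slope) plots the distribution as a histogram.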
Using this bootstrap distribution, we’ll construct the 95% confidence interval using the percentile method and (if appropriate) the standard error method as well. It’s important to note in this case that the bootstrapping with replacement is done row-by-row. Therefore, the original pairs of score and bty_avg values are always kept together, but different pairs of score and bty_avg values might be resampled multiple times. The resulting confidence interval will denote a range of plausible values for the unknown population slope $\beta_1$.
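Both interval methods can be sketched with infer's get_confidence_interval(). As before, this is an illustrative sketch rather than the lesson's exact code: the construction of evals_ch5 from moderndive::evals, the seed, and the object names (bootstrap_distn_slope, observed_slope, percentile_ci, se_ci) are assumptions made for the example.

```r
library(dplyr)
library(infer)
library(moderndive)

# Assumption: evals_ch5 is a subset of moderndive::evals with these columns.
evals_ch5 <- evals %>%
  select(ID, score, bty_avg, age)

set.seed(76)  # arbitrary seed for reproducibility

# Bootstrap distribution of the fitted slope. Whole rows are resampled,
# so each (score, bty_avg) pair stays together.
bootstrap_distn_slope <- evals_ch5 %>%
  specify(formula = score ~ bty_avg) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "slope")

# Percentile method: the middle 95% of the bootstrap slopes.
percentile_ci <- bootstrap_distn_slope %>%
  get_confidence_interval(type = "percentile", level = 0.95)

# The standard error method needs a point estimate to center the interval:
# the fitted slope from the original sample.
observed_slope <- evals_ch5 %>%
  specify(formula = score ~ bty_avg) %>%
  calculate(stat = "slope")

# Standard error method: point estimate plus or minus a multiple of the
# bootstrap standard error.
se_ci <- bootstrap_distn_slope %>%
  get_confidence_interval(type = "se", level = 0.95,
                          point_estimate = observed_slope)

percentile_ci
se_ci
```

The standard error method relies on the bootstrap distribution being roughly bell-shaped, so it's worth plotting the distribution before trusting that interval.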