Recap: Bootstrapping and Confidence Intervals

Revise bootstrapping and confidence intervals.

We'll cover the following

Comparing bootstrap and sampling distributions
- Comparison
Theory-based confidence intervals

Comparing bootstrap and sampling distributions

Let’s talk more about the relationship between sampling distributions and bootstrap distributions.

Recall that earlier we took 1,000 virtual samples from the bowl using a virtual shovel, computed the sample proportion of red balls for each of them, and visualized the resulting 1,000 values in a histogram. This distribution is called the sampling distribution of $\hat{p}$. Its standard deviation has a special name: the standard error.
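
As a rough illustration of that activity, here is a minimal sketch in Python (not the course's R code). It assumes a bowl of 2,400 balls with 900 red, which is one composition that gives $p = 0.375$, and a shovel of size 50. The names `bowl`, `SHOVEL_SIZE`, and `sample_proportion_red` are hypothetical.

```python
import random
import statistics

# Assumption: a bowl of 2,400 balls with 900 red, so p = 900 / 2400 = 0.375.
bowl = ["red"] * 900 + ["white"] * 1500
SHOVEL_SIZE = 50  # assumed number of balls per shovel
REPS = 1000       # number of virtual samples

rng = random.Random(76)  # fixed seed so the run is reproducible


def sample_proportion_red(rng):
    """Draw one shovel without replacement and return its proportion of red balls."""
    shovel = rng.sample(bowl, SHOVEL_SIZE)
    return shovel.count("red") / SHOVEL_SIZE


# The 1,000 values of p-hat form the (simulated) sampling distribution.
p_hats = [sample_proportion_red(rng) for _ in range(REPS)]

# The standard error is the standard deviation of the sampling distribution.
standard_error = statistics.stdev(p_hats)

print("mean of p-hat:", round(statistics.mean(p_hats), 3))
print("standard error:", round(standard_error, 3))
```

The mean of the simulated values should land close to 0.375, and the printed standard error summarizes how much $\hat{p}$ varies from shovel to shovel.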

We also mentioned that this sampling activity doesn’t reflect how sampling is done in real life. Rather, it was an idealized version of sampling that let us study the effects of sampling variation on estimates, like the proportion of the shovel’s balls that are red. In real life, one would take a single sample that’s as large as possible, much like in the Obama poll we saw previously. But how can we get a sense of the effect of sampling variation on estimates if we only have one sample and therefore only one estimate? Don’t we need many samples and therefore many estimates?

The workaround for having only a single sample was to perform bootstrap resampling with replacement from that sample. We did this in the resampling activity, where we focused on the mean minting year of pennies. We used pieces of paper representing the original sample of 50 pennies from the bank and resampled them with replacement from a hat. We had 35 of our friends perform this activity and visualized the resulting 35 sample means $\bar{x}$ in a histogram.

This distribution was called the bootstrap distribution of $\bar{x}$. We stated at the time that the bootstrap distribution is an approximation to the sampling distribution of $\bar{x}$ in the sense that both distributions will have a similar shape and similar spread. Therefore, the standard error of the bootstrap distribution can be used as an approximation to the standard error of the sampling distribution.
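
The resampling activity can be sketched in a few lines of Python (again, not the course's R code). The real pennies data isn't reproduced here, so `original_sample` below holds 50 randomly generated stand-in years. It is a placeholder, not the actual sample. The helper `bootstrap_means` is a hypothetical name.

```python
import random
import statistics

rng = random.Random(2019)  # fixed seed so the run is reproducible

# Stand-in data (NOT the real pennies sample): 50 made-up minting years.
original_sample = [rng.randint(1960, 2019) for _ in range(50)]


def bootstrap_means(sample, reps, rng):
    """Resample with replacement (same size as the sample) and return each resample's mean."""
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(reps)]


# 35 resamples, mirroring the 35 friends in the activity.
boot_means = bootstrap_means(original_sample, reps=35, rng=rng)

# The spread of the bootstrap distribution approximates the standard error of x-bar.
boot_se = statistics.stdev(boot_means)

print("bootstrap standard error:", round(boot_se, 2))
```

With only 35 resamples the estimate is noisy. Increasing `reps` to 1,000 or more gives a smoother bootstrap distribution and a more stable standard error.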

Comparison

Now that we have computed both the sampling distribution and the bootstrap distribution, let’s compare them side by side in the figure below. We’ll give both histograms matching scales on the x- and y-axes to make them easier to compare. We’ll also add:

- To the sampling distribution on the top, a solid line denoting the proportion of the bowl’s balls that are red, $p = 0.375$
- To the bootstrap distribution on the bottom, a dashed line at the sample proportion $\hat{p} = 21/50 = 0.42$ (42%) that Ilyas and Yohan observed
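
To see numerically what such a side-by-side comparison shows, here is a minimal Python sketch (not the course's R code). It assumes a bowl of 2,400 balls with 900 red (so $p = 0.375$), a sample size of 50, and 1,000 repetitions for each distribution. The bootstrap part uses only Ilyas and Yohan's single sample of 21 red balls out of 50. All variable names are hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible
N = 50       # sample size (matches the 21/50 sample)
REPS = 1000  # repetitions per distribution (assumed)

# Sampling distribution: many fresh samples from the bowl, without replacement.
# Assumption: 2,400 balls with 900 red (1 = red, 0 = not red), so p = 0.375.
bowl = [1] * 900 + [0] * 1500
sampling_p_hats = [sum(rng.sample(bowl, N)) / N for _ in range(REPS)]

# Bootstrap distribution: many resamples WITH replacement from one observed sample.
observed_sample = [1] * 21 + [0] * 29  # 21 red out of 50, so p-hat = 0.42
bootstrap_p_hats = [sum(rng.choices(observed_sample, k=N)) / N for _ in range(REPS)]

for name, values in [("sampling", sampling_p_hats), ("bootstrap", bootstrap_p_hats)]:
    center = statistics.mean(values)
    spread = statistics.stdev(values)
    print(f"{name:9s} center = {center:.3f}  spread = {spread:.3f}")
```

The two centers differ: the sampling distribution sits near $p = 0.375$, while the bootstrap distribution sits near $\hat{p} = 0.42$, the value it was resampled from. The two spreads, however, come out similar, which is why the bootstrap standard error can stand in for the standard error of the sampling distribution.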
