Boxplots
Explore boxplots and how to plot them in R.
Faceted histograms are one way to compare the distribution of a numerical variable across the values of another variable. A side-by-side boxplot is another visualization that achieves the same goal. A boxplot is constructed from the five-number summary of a numerical variable.
Five-number summary
The five-number summary consists of five summary statistics: the minimum, the first quartile (25th percentile), the second quartile (median or 50th percentile), the third quartile (75th percentile), and the maximum.
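As a minimal sketch, the five statistics can be computed with base R's fivenum(). The vector x below is made up purely for illustration and is not the lesson's data. Note that fivenum() returns Tukey's hinges as its quartiles; for an even number of values these match the median-of-each-half rule, while for an odd number of values the overall median is counted in both halves, so other definitions can give slightly different results.

```r
# Small illustrative vector (made-up values, not the lesson's temperature data).
x <- c(1, 3, 5, 7, 9, 11, 13, 15)

# fivenum() returns: minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(x)
names(five) <- c("min", "Q1", "median", "Q3", "max")

print(five)
# min = 1, Q1 = 4, median = 8, Q3 = 12, max = 15
```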
The quartiles are calculated as:
The first quartile ($Q_1$): The median of the first half of the sorted data
The third quartile ($Q_3$): The median of the second half of the sorted data
The interquartile range (IQR) is defined as the difference between the third and first quartiles: $\text{IQR} = Q_3 - Q_1$.
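The median-of-each-half rule and the IQR can be sketched in a few lines of R. The helper quartiles_by_halves() is a hypothetical name introduced here for illustration, and it assumes that the middle value is left out of both halves when the number of observations is odd (the rule above doesn't say which convention applies). The values in temps are made up.

```r
# Quartiles via the "median of each half" rule.
# Assumption: with an odd number of values, the middle value is dropped
# from both halves.
quartiles_by_halves <- function(x) {
  stopifnot(length(x) >= 2)
  x <- sort(x)
  n <- length(x)
  half <- n %/% 2                     # number of values in each half
  lower <- x[seq_len(half)]           # first half of the sorted data
  upper <- x[(n - half + 1):n]        # second half of the sorted data
  c(Q1 = median(lower), Q3 = median(upper))
}

temps <- c(1, 3, 5, 7, 9, 11, 13, 15)   # made-up values
q <- quartiles_by_halves(temps)          # Q1 = 4, Q3 = 12
iqr <- unname(q["Q3"] - q["Q1"])         # 12 - 4 = 8

# Base R's IQR() relies on quantile(), which interpolates between values
# by default, so it can differ from the rule above: here it returns 7.
IQR(temps)
```

The gap between 8 and 7 isn't an error. Several quartile conventions are in common use, and they tend to agree more closely as the sample grows.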
Unlike the mean and standard deviation, the median and the IQR are barely affected by the presence of outliers. We say that the median and IQR are more robust to outliers, and they are, therefore, recommended for summarizing skewed datasets.
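A quick way to see this robustness is to add a single extreme value to a small sample and compare the summaries before and after. The numbers below are made up for illustration, and summarise_spread() is a hypothetical helper, not a function from any package.

```r
# Made-up sample, then the same sample with one extreme value appended.
clean <- c(10, 12, 13, 14, 15, 16, 18)
with_outlier <- c(clean, 200)

# Hypothetical helper: centre and spread measured both ways.
summarise_spread <- function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x), IQR = IQR(x))
}

comparison <- rbind(
  clean        = summarise_spread(clean),
  with_outlier = summarise_spread(with_outlier)
)
print(round(comparison, 2))
```

The single outlier moves the mean from 14 to 37.25, while the median only moves from 14 to 14.5. The standard deviation grows many times over, whereas the IQR changes by less than one unit.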
To keep things simple for now, let’s only consider the 2,141 hourly temperature recordings for the month of November, each represented as a jittered point in the following figure.
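For readers who want to experiment, a plot of this kind can be sketched in base R. The block below does not use the lesson's data: november_temp is simulated with an arbitrary mean and standard deviation, chosen only so that there are 2,141 values to draw, and plot_jittered_box() is a hypothetical helper. It draws each value as a jittered point and overlays a boxplot of the same values.

```r
# Simulated stand-in for the November temperatures. NOT the lesson's data:
# the mean and sd below are arbitrary assumptions.
set.seed(42)
november_temp <- rnorm(2141, mean = 45, sd = 8)

# Hypothetical helper: jittered points with a boxplot drawn on top.
plot_jittered_box <- function(temp) {
  stripchart(temp, method = "jitter", jitter = 0.2, vertical = TRUE,
             pch = 16, col = rgb(0, 0, 0, 0.2),
             ylab = "Temperature (simulated)")
  # add = TRUE overlays the box on the existing plot; outline = FALSE hides
  # the outlier markers because every value is already shown as a point.
  boxplot(temp, add = TRUE, at = 1, col = NA, outline = FALSE, axes = FALSE)
  invisible(NULL)
}

plot_jittered_box(november_temp)
```

Horizontal jitter only spreads the points out so that they overlap less; it doesn't change any temperature value.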