Interpreting Hypothesis Tests

Learn how to interpret hypothesis tests.

We'll cover the following

Two possible outcomes
Types of errors
How do we choose alpha?

Interpreting the results of hypothesis tests is one of the more challenging aspects of this method for statistical inference. Let’s understand the process and address some common misconceptions.

Two possible outcomes

Given a prespecified significance level $\alpha$, there are two possible outcomes of a hypothesis test:

If the $p$-value is less than $\alpha$, then we reject the null hypothesis $H_0$ in favor of $H_A$.
If the $p$-value is greater than or equal to $\alpha$, we fail to reject the null hypothesis $H_0$.

Unfortunately, the latter result is often misinterpreted as accepting the null hypothesis $H_0$. While at first glance it might seem that the statements “failing to reject $H_0$” and “accepting $H_0$” are equivalent, there is a subtle difference. Saying that we accept the null hypothesis $H_0$ is equivalent to stating that we think $H_0$ is true. Saying that we fail to reject $H_0$ says something else: while $H_0$ might still be false, we don’t have enough evidence to say so. In other words, we lack sufficient proof, and the absence of proof isn’t proof of absence.

To shed further light on this distinction, let’s use the United States criminal justice system as an analogy. A criminal trial in the United States resembles a hypothesis test in that a choice must be made between two contradictory claims about a defendant who’s on trial:

The defendant is truly either innocent or guilty.
The defendant is presumed innocent until proven guilty.
The defendant is found guilty only if there’s strong evidence that the defendant is guilty. The phrase “beyond a reasonable doubt” is often used as a guideline for determining a cutoff for when enough evidence exists to find the defendant guilty.
The defendant is found to be either not guilty or guilty in the ultimate verdict.

In other words, a not guilty verdict doesn’t suggest that the defendant is innocent, but instead that “while the defendant may still actually be guilty, there wasn’t enough evidence to prove this fact.” Now let’s make the connection with hypothesis tests:

Either the null hypothesis $H_0$ or the alternative hypothesis $H_A$ is true.
Hypothesis tests are conducted assuming the null hypothesis $H_0$ is true.
We reject the null hypothesis $H_0$ in favor of $H_A$ only if the evidence found in the sample suggests that $H_A$ is true. The significance level $\alpha$ is used as a guideline to set the threshold for the strength of evidence we require.
We ultimately decide to either fail to reject $H_0$ or reject $H_0$.

So while gut instinct may suggest that failing to reject $H_0$ and accepting $H_0$ are equivalent statements, they aren’t. Accepting $H_0$ is equivalent to finding a defendant innocent. However, courts don’t find defendants innocent; they find them not guilty. Put differently, defense attorneys don’t need to prove that their clients are innocent. They only need to show that their clients haven’t been proven guilty beyond a reasonable doubt.

Going back to our résumés activity, recall our hypothesis test $H_0: p_m - p_f = 0$ vs. $H_A: p_m - p_f > 0$ and that we used a prespecified significance level of $\alpha = 0.05$. We found a $p$-value of 0.027. Because the $p$-value was smaller than $\alpha = 0.05$, we rejected $H_0$. In other words, this particular sample provided the level of evidence required to reject $H_0$ at the $\alpha = 0.05$ significance level. Stated in non-statistical language, we found enough evidence in this data to suggest that there was gender discrimination at play.
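As a minimal sketch, the decision rule applied above can be written as a small Python function. The function name `decide` and its return strings are illustrative choices, not part of any library; the values 0.027 and 0.05 are the ones from the résumés activity.

```python
def decide(p_value, alpha=0.05):
    """Return the outcome of a hypothesis test at significance level alpha.

    Hypothetical helper for illustration only.
    """
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    # Note the wording: we never return "accept H0".
    return "fail to reject H0"


# Résumés activity: p-value of 0.027 at the alpha = 0.05 level.
print(decide(0.027, alpha=0.05))

# A p-value exactly equal to alpha does not lead to rejection.
print(decide(0.05, alpha=0.05))
```

The second call reflects the “greater than or equal to” wording of the rule: only a $p$-value strictly below $\alpha$ leads to rejecting $H_0$.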

Types of errors

Unfortunately, there is some chance that a jury or a judge makes an incorrect decision in a criminal trial by reaching the wrong verdict: for example, finding a truly innocent defendant guilty or, on the other hand, finding a truly guilty defendant not guilty. This can often stem from the fact that prosecutors don’t have access to all the relevant evidence but instead are limited to whatever evidence the police can find.

The same holds for hypothesis tests where we can make incorrect decisions about a population parameter because we only have a sample of data from the population. Thus, sampling variation can lead us to incorrect conclusions.

There are two possible erroneous conclusions in a criminal trial: either a truly innocent person is found guilty, or a truly guilty person is found not guilty. Similarly, there are two possible errors in a hypothesis test. The first is rejecting $H_0$ when in fact $H_0$ is true. This is called a Type I error. The second is failing to reject $H_0$ when in fact $H_0$ is false. This is called a Type II error. A Type I error is also known as a false positive, and a Type II error as a false negative.

This risk of error is the price researchers pay for basing inference on a sample instead of performing a census on the entire population. However, as we’ve seen in our numerous examples and activities so far, censuses are often very expensive and other times impossible. Therefore, researchers have no choice but to use a sample. In any hypothesis test based on a sample, we have no choice but to tolerate some chance that a Type I error will be made and some chance that a Type II error will occur.
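To see how sampling variation alone produces both kinds of error, here is a hedged simulation sketch in Python. Everything in it is an assumption made for illustration: the true proportions (0.6 and 0.7), the sample size of 200 per group, the helper names, and the use of a pooled two-proportion z-test as a simple stand-in for whatever test is actually applied to the data. It repeatedly draws samples, runs the one-sided test of $H_0: p_m - p_f = 0$ vs. $H_A: p_m - p_f > 0$, and counts how often each error occurs.

```python
import math
import random
from statistics import NormalDist


def one_sided_p_value(successes_m, successes_f, n):
    """p-value for H0: p_m - p_f = 0 vs. HA: p_m - p_f > 0.

    Uses a pooled two-proportion z-test with n observations per group
    (an assumed stand-in test, chosen for simplicity).
    """
    pooled = (successes_m + successes_f) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0  # no variation in the sample, so no evidence against H0
    z = (successes_m / n - successes_f / n) / se
    return 1 - NormalDist().cdf(z)


def rejection_rate(p_m, p_f, n=200, alpha=0.05, reps=2000, seed=1):
    """Fraction of simulated samples in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x_m = sum(rng.random() < p_m for _ in range(n))
        x_f = sum(rng.random() < p_f for _ in range(n))
        if one_sided_p_value(x_m, x_f, n) < alpha:
            rejections += 1
    return rejections / reps


# H0 is true (equal proportions): every rejection is a Type I error.
type_1_rate = rejection_rate(p_m=0.6, p_f=0.6)

# H0 is false (p_m > p_f): every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(p_m=0.7, p_f=0.6)

print(f"Estimated Type I error rate:  {type_1_rate:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

Under these assumptions, the estimated Type I error rate should land close to the chosen $\alpha$, while the Type II error rate depends on how far the true proportions are from each other and on the sample size.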

To help understand the concepts of Type I errors and Type II errors, we apply these terms to our criminal justice analogy in the figure below:
