Interpreting Hypothesis Tests

Learn how to interpret hypothesis tests.

Interpreting the results of hypothesis tests is one of the more challenging aspects of this method for statistical inference. Let’s understand the process and address some common misconceptions.

Two possible outcomes

Given a prespecified significance level $\alpha$, there are two possible outcomes of a hypothesis test:

  • If the $p$-value is less than $\alpha$, we reject the null hypothesis $H_0$ in favor of $H_A$.

  • If the $p$-value is greater than or equal to $\alpha$, we fail to reject the null hypothesis $H_0$.
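The two outcomes reduce to a single comparison between the $p$-value and $\alpha$. The sketch below is a minimal Python illustration; the helper name `decide` is hypothetical, and it assumes the $p$-value has already been computed by whatever test is in use. Note that the second branch returns "fail to reject H0", never "accept H0".

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule of a hypothesis test at a prespecified alpha.

    `decide` is a hypothetical helper for illustration only; it assumes the
    p-value was already computed by some test.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Reject H0 in favor of HA only when the p-value is strictly below alpha.
    if p_value < alpha:
        return "reject H0"
    # p-value >= alpha: not enough evidence against H0. This is NOT "accept H0".
    return "fail to reject H0"


print(decide(0.027, alpha=0.05))  # reject H0
print(decide(0.05, alpha=0.05))   # fail to reject H0 (equality does not reject)
```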

Unfortunately, the latter result is often misinterpreted as accepting the null hypothesis $H_0$. At first glance, the statements “failing to reject $H_0$” and “accepting $H_0$” might seem equivalent, but there is a subtle difference. Saying that we accept $H_0$ is equivalent to stating that we think $H_0$ is true. Saying that we fail to reject $H_0$ says something else: $H_0$ might still be false, but we don’t have enough evidence to say so. In other words, there’s an absence of sufficient proof, and the absence of proof isn’t proof of absence.

To shed further light on this distinction, let’s use the United States criminal justice system as an analogy. Like a hypothesis test, a criminal trial in the United States requires a choice between two contradictory claims about the defendant on trial:

  • The defendant is truly either innocent or guilty.

  • The defendant is presumed innocent until proven guilty.

  • The defendant is found guilty only if there’s strong evidence that the defendant is guilty. The phrase “beyond a reasonable doubt” is often used as a guideline for determining a cutoff for when enough evidence exists to find the defendant guilty.

  • The final verdict is either not guilty or guilty.

In other words, a not-guilty verdict doesn’t mean the defendant is innocent; it means that “while the defendant may still actually be guilty, there wasn’t enough evidence to prove this fact.” Now let’s make the connection with hypothesis tests:

  • Either the null hypothesis $H_0$ or the alternative hypothesis $H_A$ is true.

  • Hypothesis tests are conducted assuming the null hypothesis $H_0$ is true.

  • We reject $H_0$ in favor of $H_A$ only if the evidence in the sample is strong enough to suggest that $H_A$ is true. The significance level $\alpha$ sets the threshold for how strong that evidence must be.

  • We ultimately decide to either fail to reject $H_0$ or reject $H_0$.

So while gut instinct may suggest that failing to reject $H_0$ and accepting $H_0$ are equivalent statements, they aren’t. Accepting $H_0$ is equivalent to finding a defendant innocent. However, courts don’t find defendants innocent; they find them not guilty. Put differently, defense attorneys don’t need to prove that their clients are innocent; they only need to show that the prosecution hasn’t proven their clients guilty beyond a reasonable doubt.

Going back to our résumés activity, recall that we tested $H_0: p_m - p_f = 0$ vs. $H_A: p_m - p_f > 0$ using a prespecified significance level of $\alpha = 0.05$. We found a $p$-value of 0.027. Because the $p$-value was smaller than $\alpha = 0.05$, we rejected $H_0$. In other words, we found the required level of evidence in this particular sample to say that $H_0$ is false at the $\alpha = 0.05$ significance level. We also stated this conclusion in non-statistical language: we found enough evidence in these data to suggest that there was gender discrimination at play.
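The qualifier “at the $\alpha = 0.05$ significance level” matters because the decision depends on the $\alpha$ chosen in advance. As a small sketch, the snippet below compares the reported $p$-value of 0.027 against $\alpha = 0.05$ and, purely for contrast, against a stricter $\alpha = 0.01$ that was not part of the original activity.

```python
p_value = 0.027  # p-value reported for the résumés activity

# Reject H0 only when the p-value is strictly below alpha.
# alpha = 0.01 is included only as an illustrative stricter threshold.
decisions = {
    alpha: "reject H0" if p_value < alpha else "fail to reject H0"
    for alpha in (0.05, 0.01)
}

for alpha, decision in decisions.items():
    print(f"alpha = {alpha:.2f}: {decision}")
# alpha = 0.05: reject H0
# alpha = 0.01: fail to reject H0
```

The same sample therefore counts as sufficient evidence under one prespecified threshold and insufficient under another, which is why the significance level should be stated alongside the conclusion.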

Types of errors

Unfortunately, there is some chance that a jury or a judge makes an incorrect decision in a criminal trial by reaching the wrong verdict: a truly innocent defendant may be found guilty, or a truly guilty defendant may be found not guilty. This often stems from the fact that prosecutors don’t have access to all the relevant evidence but are limited to whatever evidence the police can find.

The same holds for hypothesis tests, where we can make incorrect decisions about a population parameter because we only have a sample of data from the population. Sampling variation can therefore lead us to incorrect conclusions.

There are two possible erroneous conclusions in a criminal trial: a truly innocent person is found guilty, or a truly guilty person is found not guilty. Similarly, there are two possible errors in a hypothesis test. The first is rejecting $H_0$ when $H_0$ is in fact true; this is called a Type I error. The second is failing to reject $H_0$ when $H_0$ is in fact false; this is called a Type II error. Another term for a Type I error is a false positive, and another term for a Type II error is a false negative.
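Because a Type I error can only happen when $H_0$ is true and a Type II error only when $H_0$ is false, both error rates can be estimated by simulating many samples under each scenario. The sketch below is a minimal Python illustration under assumptions that are not taken from the text: a two-sided one-sample z-test of $H_0: \mu = 0$ with a known standard deviation $\sigma = 1$, samples of size 30, $\alpha = 0.05$, and an arbitrary true mean of 0.3 for the “$H_0$ is false” scenario. The helper names `z_test_p_value` and `rejection_rate` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def z_test_p_value(sample, mu0, sigma):
    """Two-sided one-sample z-test p-value, assuming sigma is known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=30, alpha=0.05,
                   trials=20_000, seed=1):
    """Fraction of simulated samples in which H0: mu = mu0 is rejected."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    rejections = 0
    for _ in range(trials):
        # Draw one sample from a normal population whose true mean is true_mu.
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / trials


# H0 true (true mean equals mu0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mu=0.0)
# H0 false (true mean is 0.3): every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_mu=0.3)

print(f"Estimated Type I error rate:  {type_1_rate:.3f}")  # close to alpha = 0.05
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

When $H_0$ is true, the long-run fraction of wrong rejections lands close to $\alpha$, which is exactly what the significance level controls. With these particular settings, the test also fails to detect the real difference in most of the simulated samples, a reminder that failing to reject $H_0$ is not evidence that $H_0$ is true.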

This risk of error is the price researchers pay for basing inference on a sample instead of performing a census on the entire population. However, as we’ve seen in our numerous examples and activities so far, censuses are often very expensive and sometimes impossible, so researchers have no choice but to use a sample. In any hypothesis test based on a sample, we must therefore tolerate some chance of making a Type I error and some chance of making a Type II error.

To help understand the concepts of Type I errors and Type II errors, we apply these terms to our criminal justice analogy in the figure below:
