Finer Points of the F-test
Learn how the F-test is a generalization of the t-test.
We'll cover the following
Equivalence to the t-test for two classes and cautions
When we use an F-test to look at the difference in means between just two groups, as we’ve done for the binary classification problem of the case study, the test we are performing actually reduces to what’s called a t-test. An F-test is extensible to three or more groups and so is useful for multiclass classification. A t-test just compares the means between two groups of samples, to see whether the difference in those means is statistically significant.
While the F-test served our purposes here of univariate feature selection, there are a few cautions to keep in mind. Going back to the concept of formal statistical assumptions, for the F-test these include that the data is normally distributed. We have not checked this. Also, in comparing the same response variable, y
, to many potential features from the matrix, X
, we have performed what is known in statistics as multiple comparisons. In short, this means that by examining multiple features in comparison to the same response over and over, the odds increase that we’ll find what we think is a “good feature” just by random chance. However, such features may not generalize to new data. There are statistical corrections for multiple comparisons that amount to adjusting the p-values to account for this.
Get hands-on with 1200+ tech skills courses.