Classification Accuracy
Learn how to assess the quality of a binary classification model's predictions.
Binary classification metrics with logistic regression and near-default options
Now we proceed to fit an example model to illustrate binary classification metrics. We will continue to use logistic regression with near-default options. The following code loads the model class and creates a model object.
from sklearn.linear_model import LogisticRegression
example_lr = LogisticRegression(C=0.1, class_weight=None,
                                dual=False, fit_intercept=True,
                                intercept_scaling=1, max_iter=100,
                                multi_class='auto', n_jobs=None,
                                penalty='l2', random_state=None,
                                solver='liblinear', tol=0.0001,
                                verbose=0, warm_start=False)
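To see why these options count as "near-default", the following minimal sketch compares a model's parameters against a freshly constructed `LogisticRegression()`. The names `model`, `defaults`, and `changed` are illustrative and not part of the lesson's code. The sketch passes only `C` and `solver`, on the assumption that the remaining arguments listed above match the library defaults in your installed scikit-learn version.

```python
from sklearn.linear_model import LogisticRegression

# Illustrative model: only C and solver are set explicitly (assumption:
# the other arguments shown in the lesson equal the library defaults).
model = LogisticRegression(C=0.1, solver='liblinear')

# Parameters of an all-default model, for comparison.
defaults = LogisticRegression().get_params()

# Keep only the parameters whose values differ from the defaults.
changed = {name: value
           for name, value in model.get_params().items()
           if defaults[name] != value}
print(changed)
```

Only the parameters that differ from the defaults are collected, which is also why scikit-learn's printed representation of a model typically shows just those arguments.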
Next, we train the model using the labeled data from our training set, and then immediately use the trained model to make predictions on the features of the samples from the held-out test set:
example_lr.fit(X_train, y_train)
# LogisticRegression(C=0.1, solver='liblinear')
y_pred = example_lr.predict(X_test)
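If you want to run the fit-and-predict step on its own, here is a minimal, self-contained sketch. It uses synthetic data from `make_classification` as a stand-in, which is an assumption made purely for illustration and not the dataset used in this lesson. The names `demo_lr` and `demo_pred`, the sample counts, and the random seeds are likewise hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: NOT the lesson's dataset).
X, y = make_classification(n_samples=1000, n_features=5, random_state=24)

# Hold out 20% of the samples as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=24)

# Train on the labeled training data, then predict on test features only.
demo_lr = LogisticRegression(C=0.1, solver='liblinear')
demo_lr.fit(X_train, y_train)
demo_pred = demo_lr.predict(X_test)

print(demo_pred.shape)  # one predicted label per test sample
```

Note that `predict` receives only the test features. The test labels are kept aside for evaluating the predictions afterwards.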
Understanding the limitations of accuracy
We’ve stored the model-predicted labels of the test set in a variable called y_pred. How should we now assess the quality of these predictions? We have the true labels in the y_test variable. First, we will compute what is probably the simplest of all binary classification metrics: accuracy. Accuracy is defined as the proportion of samples that were correctly classified.
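As a quick worked example of this definition, the sketch below counts correct predictions on five hand-made labels. The values in `true_labels` and `predicted_labels` are invented for illustration only and have nothing to do with the lesson's data.

```python
# Hand-made toy labels (hypothetical values, for illustration only).
true_labels = [0, 1, 1, 0, 1]
predicted_labels = [0, 1, 0, 0, 1]

# Count the positions where the prediction matches the true label.
n_correct = sum(t == p for t, p in zip(true_labels, predicted_labels))

# Accuracy = number of correct predictions / total number of samples.
toy_accuracy = n_correct / len(true_labels)
print(n_correct, toy_accuracy)
```

Four of the five predictions match the true labels, so the accuracy here is 4/5 = 0.8.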
One way to calculate accuracy is to create a logical mask that is True whenever the predicted label is equal to the actual label, and False otherwise. We can then take the average of this mask, which will interpret True as 1 and False as 0, giving us the proportion of correct classifications:
import numpy as np
is_correct = y_pred == y_test
np.mean(is_correct)
# 0.7834239639977498
This indicates that the model is correct 78% of the time. While this is a straightforward calculation, scikit-learn offers more convenient ways to compute accuracy. One way is to use the trained model’s .score method, passing the features of the test data to make predictions on, as well as the test labels. This method makes the predictions and then does the same calculation we performed previously, all in one step. Alternatively, we could import scikit-learn’s metrics module, which includes many model performance metrics, such as accuracy_score. For this, we pass the true labels and the predicted labels:
example_lr.score(X_test, y_test)
# 0.7834239639977498
from sklearn import metrics
metrics.accuracy_score(y_test, y_pred)
# 0.7834239639977498
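The three approaches above should always agree, since they perform the same calculation. The following self-contained sketch checks this on synthetic data, which is an assumption for illustration and not the lesson's dataset. All variable names and seeds here are hypothetical, and the accuracy value will differ from the 0.78 reported above.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: NOT the lesson's dataset).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(C=0.1, solver='liblinear').fit(X_tr, y_tr)
pred = clf.predict(X_te)

# 1. Mean of a logical mask (True counts as 1, False as 0).
acc_mask = np.mean(pred == y_te)
# 2. The model's .score method: predicts and scores in one step.
acc_score = clf.score(X_te, y_te)
# 3. accuracy_score takes the true labels first, then the predictions.
acc_metric = metrics.accuracy_score(y_te, pred)

print(acc_mask, acc_score, acc_metric)
```

Which one to use is mostly a matter of convenience: `.score` is handy when you have not yet computed predictions, while `accuracy_score` fits naturally when `y_pred` is already available and will be reused for other metrics.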