Overview

From confusion matrix

A confusion matrix is a table that summarizes how accurate a classifier is at predicting the labels for a categorical variable, by tallying true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).


Metric: measure, formula

  • accuracy: proportion of correct predictions, (TP + TN) / (TP + TN + FP + FN)
  • precision: proportion of true positives out of all positive predictions, TP / (TP + FP)
  • sensitivity (recall): proportion of true positives out of all actual positives, TP / (TP + FN)
  • specificity: proportion of true negatives out of all actual negatives, TN / (TN + FP)
  • F-beta: weighted harmonic mean of precision and recall, (1 + β²) · precision · recall / (β² · precision + recall)
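
To make the formulas concrete, here is a minimal sketch that recovers all five metrics from a scikit-learn confusion matrix; y_true and y_pred are made-up stand-ins for real labels and predictions.

import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up true labels and hard predictions, for illustration only
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary labels (0, 1), confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)   # sensitivity
specificity = tn / (tn + fp)

beta = 1.0  # beta = 1 weights precision and recall equally (the F1 score)
f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)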

Beyond these single-number metrics, we also have the ROC curve and the area under it (AUC).

ROC curve

(MLU-Explain) (Udacity)

The ROC (receiver operating characteristic) curve visualizes a classifier's performance across all possible decision thresholds.

The classification algorithm turns each predicted score into a label by comparing it to a decision threshold. As the threshold is swept, the ROC curve plots the true positive rate against the false positive rate (see the sketch after this list):

  1. True Positive Rate (TPR): equivalent to sensitivity, TP / (TP + FN).
  2. False Positive Rate (FPR): the ratio between the false positives and the total count of actual negatives, FP / (FP + TN). This is equivalent to 1 − specificity.
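
A minimal sketch of that threshold sweep with scikit-learn's roc_curve; the labels and scores below are invented, and in practice the scores would come from something like clf.predict_proba(X_test)[:, 1].

import numpy as np
from sklearn.metrics import roc_curve

# Invented labels and positive-class scores
y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.5])

# roc_curve sweeps the decision threshold and returns one
# (FPR, TPR) point per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t, thr in zip(fpr, tpr, thresholds):
    print(f"threshold={thr:.2f}  TPR={t:.2f}  FPR={f:.2f}")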

Usage

Curves that fall above the ROC curve of a random classifier (the diagonal line) are better than chance. The higher up they are (i.e. the closer they are to the curve of the elusive perfect classifier, which hugs the top-left corner), the better.

Code

import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

# ROC curve: y_test holds the true labels; pass continuous scores
# (e.g. clf.predict_proba(X_test)[:, 1]) rather than hard 0/1
# predictions, so that the decision threshold can actually be swept
RocCurveDisplay.from_predictions(y_test, y_score)

plt.show()
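
One way to draw the random-classifier diagonal from the Usage section next to an actual curve; this variant is self-contained, with simulated labels and scores standing in for real model output.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import RocCurveDisplay

# Simulated labels and scores from a moderately informative "model"
rng = np.random.default_rng(0)
y_test = rng.integers(0, 2, size=200)
y_score = y_test * 0.4 + rng.normal(0.3, 0.2, size=200)

disp = RocCurveDisplay.from_predictions(y_test, y_score)

# diagonal = ROC curve of a random classifier
disp.ax_.plot([0, 1], [0, 1], linestyle="--", color="grey", label="random classifier")
disp.ax_.legend()
plt.show()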

AUC


(MLU-Explain)

Area under ROC curve (or AUC) provides an aggregate measure of performance across all possible classification thresholds.

It is the probability that the model will rank a randomly chosen positive example more highly than a randomly chosen negative example.
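
This ranking interpretation can be checked directly. A sketch with invented scores, comparing a brute-force count over positive/negative pairs against sklearn's roc_auc_score:

import numpy as np
from sklearn.metrics import roc_auc_score

# Invented labels and scores
y_true  = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.2, 0.6, 0.4, 0.9, 0.1, 0.7])

# P(random positive scores higher than random negative),
# counting ties as half a win
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
pairwise_auc = wins / (len(pos) * len(neg))

print(pairwise_auc)                    # 0.888...
print(roc_auc_score(y_true, y_score))  # same value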

AUC ranges in value from 0.0 to 1.0.

  • An AUC of 0.0 means the model's predictions are 100% wrong
  • An AUC of 1.0 means the model's predictions are 100% correct

An AUC smaller than 0.5 indicates that the model performs worse than a random classifier (i.e. a classifier that randomly assigns each example to True or False), and an AUC larger than 0.5 indicates that the model performs better than a random classifier.
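
A quick numeric check of these reference points, using made-up labels:

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100_000)

print(roc_auc_score(y_true, rng.random(100_000)))   # random scores: ~0.5
print(roc_auc_score(y_true, y_true.astype(float)))  # perfect ranking: 1.0
print(roc_auc_score(y_true, 1.0 - y_true))          # 100% wrong: 0.0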
