The AUROC is one of the most commonly used metrics to evaluate a classifier's performance. This section explains how to compute it.
AUC (Area Under the Curve) is most often used to mean AUROC, which is bad practice: AUC is ambiguous (it could refer to any curve), while AUROC is not.
Abbreviation | Meaning |
---|---|
AUROC | Area Under the Curve of the Receiver Operating Characteristic |
AUC | Area Under the Curve |
ROC | Receiver Operating Characteristic |
TP | True Positives |
TN | True Negatives |
FP | False Positives |
FN | False Negatives |
TPR | True Positive Rate |
FPR | False Positive Rate |
The AUROC has several equivalent interpretations:
Assume we have a probabilistic, binary classifier such as logistic regression.
Before presenting the ROC curve (= Receiver Operating Characteristic curve), the concept of a confusion matrix must be understood. When we make a binary prediction, there can be 4 types of outcomes:
To get the confusion matrix, we go over all the predictions made by the model and count how many times each of those 4 types of outcomes occurs:
In this example of a confusion matrix, among the 50 data points that are classified, 45 are correctly classified and 5 are misclassified.
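The counting step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `confusion_counts`, the 0/1 label encoding, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count the four outcome types, assuming labels are 1 (positive) or 0 (negative)."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["TP"] += 1  # predicted positive, actually positive
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1  # predicted positive, actually negative
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1  # predicted negative, actually negative
        else:
            counts["FN"] += 1  # predicted negative, actually positive
    # Counter returns 0 for outcome types that never occurred.
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}


# Hypothetical toy data, chosen only to exercise all four outcomes.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```

The correctly classified points are TP + TN, and the misclassified ones are FP + FN.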
Since it is often more convenient to compare two models using a single metric rather than several, we compute two metrics from the confusion matrix, which we will later combine into one:
To combine the FPR and the TPR into a single metric, we first compute both of them at many different thresholds (for example ) for the logistic regression, then plot them on a single graph, with the FPR values on the abscissa and the TPR values on the ordinate. The resulting curve is called the ROC curve, and the metric we consider is the area under this curve, which we call the AUROC.
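The threshold sweep described above can be sketched as follows. The function names, the evenly spaced threshold grid, and the four-point toy data set are assumptions made for illustration. The area is approximated with the trapezoidal rule, which is one common choice rather than the only one.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (FPR, TPR) when every score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    return fp / n_neg, tp / n_pos


def auroc_from_thresholds(y_true, scores, thresholds):
    """Approximate the AUROC from (FPR, TPR) points using the trapezoidal rule."""
    # Sorting by FPR (then TPR) orders the points from left to right on the plot.
    points = sorted(rates_at_threshold(y_true, scores, t) for t in thresholds)
    # Anchor the curve at its two end points.
    points = [(0.0, 0.0)] + points + [(1.0, 1.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: two negatives and two positives with predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
thresholds = [i / 10 for i in range(11)]  # assumed grid: 0.0, 0.1, ..., 1.0
auroc = auroc_from_thresholds(y_true, scores, thresholds)
print(auroc)
```

On this toy data the sketch gives an area of 0.75. A grid that is too coarse can miss changes in the curve, so in practice the thresholds are usually taken from the predicted scores themselves.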
The following figure shows the AUROC graphically:
In this figure, the blue area corresponds to the Area Under the curve of the Receiver Operating Characteristic (AUROC). The dashed diagonal line represents the ROC curve of a random predictor: it has an AUROC of 0.5. The random predictor is commonly used as a baseline to see whether the model is useful.
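The 0.5 baseline can be checked empirically. The sketch below relies on a standard equivalent formulation of the AUROC: the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `pairwise_auroc`, the sample sizes, and the seed are assumptions made for the example.

```python
import random


def pairwise_auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# A "random predictor": scores are drawn independently of the true class.
rng = random.Random(0)  # fixed seed so the run is reproducible
pos_scores = [rng.random() for _ in range(500)]
neg_scores = [rng.random() for _ in range(500)]
random_auroc = pairwise_auroc(pos_scores, neg_scores)
print(f"AUROC of random scores: {random_auroc:.3f}")
```

With scores that carry no information about the class, the result should land close to 0.5, while a predictor that ranks every positive above every negative scores 1.0.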
A confusion matrix can be used to evaluate a classifier, based on a set of test data for which the true values are known. It is a simple tool that gives a good visual overview of the performance of the algorithm being used.
A confusion matrix is represented as a table. In this example we will look at a confusion matrix for a binary classifier.
On the left side, one can see the Actual class (being labeled as YES or NO), while the top indicates the class being predicted and outputted (again YES or NO).
This means that 50 test instances that are actually NO instances were correctly labeled by the classifier as NO. These are called the True Negatives (TN). In contrast, 100 actual YES instances were correctly labeled by the classifier as YES instances. These are called the True Positives (TP).
5 actual YES instances were mislabeled by the classifier. These are called the False Negatives (FN). Furthermore, 10 NO instances were considered YES instances by the classifier; hence these are False Positives (FP).
Based on these FP, TP, FN, and TN counts, we can draw further conclusions.
True Positive Rate: TPR = TP / (TP + FN)
False Positive Rate: FPR = FP / (FP + TN)
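As a worked example, the two rates can be computed from the counts given above (TP = 100, TN = 50, FP = 10, FN = 5) using the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The function names are hypothetical and chosen only for this sketch.

```python
def true_positive_rate(tp, fn):
    """Share of actual positives that the classifier labeled as positive."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Share of actual negatives that the classifier wrongly labeled as positive."""
    return fp / (fp + tn)


# Counts taken from the confusion matrix discussed above.
tp, tn, fp, fn = 100, 50, 10, 5

tpr = true_positive_rate(tp, fn)   # 100 / 105
fpr = false_positive_rate(fp, tn)  # 10 / 60
print(f"TPR = {tpr:.3f}")  # TPR = 0.952
print(f"FPR = {fpr:.3f}")  # FPR = 0.167
```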
A Receiver Operating Characteristic (ROC) curve plots the TP-rate vs. the FP-rate as the threshold on the confidence of an instance being positive is varied.
Algorithm for creating an ROC curve:
1. Sort the test-set predictions according to the confidence that each instance is positive.
2. Step through the sorted list from high to low confidence:
   i. Locate a threshold between instances with opposite classes (keeping instances with the same confidence value on the same side of the threshold).
   ii. Compute the TPR and FPR for the instances above the threshold.
   iii. Output the (FPR, TPR) coordinate.
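The algorithm above can be sketched in Python as follows. This is a simplified variant under stated assumptions: it outputs a point after every distinct confidence value rather than only where the class changes, which adds some collinear points but traces the same curve. The names `roc_points` and `trapezoid_area`, the 0/1 label encoding, and the toy data are hypothetical.

```python
def roc_points(y_true, scores):
    """Return the list of (FPR, TPR) points, from the highest threshold to the lowest."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    # Step 1: sort by confidence of being positive, highest first.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    tp = fp = 0
    i = 0
    # Step 2: walk down the sorted list.
    while i < len(ranked):
        current = ranked[i][0]
        # Keep instances with the same confidence on the same side of the threshold.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # Rates for the instances above the threshold, emitted as one coordinate.
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(y_true, scores)
print(curve)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_area(curve))  # 0.75
```

Grouping tied scores before emitting a point matters: without it, the shape of the curve would depend on the arbitrary order of instances that share the same confidence.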