The Best Metric to Measure Accuracy of Classification Models
Measuring the accuracy of a classification model (categorical output) is more complex and time consuming than for regression problems (continuous output). Let's walk through the key testing metrics for a classification problem with an example.
iv) F1 score
F1 score incorporates both Recall and Precision and is calculated as,
F1 score = 2 * (Precision * Recall) / (Precision + Recall)
The F1 score offers a more balanced view than the three metrics above, but it can give a biased result in the scenario discussed later, since its formula ignores TN.
F1 score = 2 * (0.90 * 0.90) / (0.90 + 0.90) = 0.90
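The calculation above is easy to verify; here is a minimal sketch (the function name is my own):

```python
# F1 score is the harmonic mean of Precision and Recall.
def f1_score(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.90, 0.90), 2))  # -> 0.9
```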
v) Matthews Correlation Coefficient (MCC)
Unlike the other metrics discussed above, MCC takes all the cells of the Confusion Matrix into consideration in its formula.
MCC = (TP * TN – FP * FN) / √((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
Like the correlation coefficient, MCC ranges from –1 to +1. A model with a score of +1 is a perfect model, and one with –1 is a completely wrong model. This property is one of the key strengths of MCC, as it makes the metric easy to interpret.
MCC = (90 * 999,890 – 10 * 10) / √((90 + 10) * (90 + 10) * (999,890 + 10) * (999,890 + 10))
MCC = 0.90
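The worked example above can be checked with a few lines of Python (a minimal sketch; the function name is my own):

```python
import math

# Matthews Correlation Coefficient from the four Confusion Matrix cells.
def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

print(round(mcc(tp=90, tn=999_890, fp=10, fn=10), 2))  # -> 0.9
```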
Metric Comparison
We will test and compare the results of the classification model at a few probability cutoff values using the above-mentioned testing metrics.
Scenario A: Confusion Matrix at cutoff value of 0.5
We shall take this scenario (cutoff value of 0.5) as the base case and compare the result of the base case with different cutoff values.
Confusion Matrix
TP = 90  FN = 10 
FP = 10  TN = 999,890 
Testing Metrics
Sensitivity  Specificity  Precision  F1 Score  MCC
0.90  1.00  0.90  0.90  0.90
Scenario B: Confusion Matrix at cutoff value of 0.4
Confusion Matrix
TP = 90  FN = 10 
FP = 1910  TN = 997,990 
It can be clearly observed that Scenario B has a substantial increase in FP compared to Scenario A. Hence, the metrics that account for FP should deteriorate.
Testing Metrics
Sensitivity  Specificity  Precision  F1 Score  MCC
0.90  1.00  0.05  0.09  0.20
Sensitivity and Specificity remain unchanged, while Precision, F1 score, and MCC register the deterioration.
Scenario C: Confusion Matrix at cutoff value of 0.6
Confusion Matrix
TP = 90  FN = 1910 
FP = 10  TN = 997,990 
There is a substantial increase in FN compared to Scenario A. Hence, there should be deterioration in the metrics compared to A.
Testing Metrics
Sensitivity  Specificity  Precision  F1 Score  MCC
0.05  1.00  0.90  0.09  0.20
Here, Specificity and Precision are unchanged, while the other metrics decline.
Based on these findings, we can say that F1 score and MCC make more sense than Sensitivity and Specificity.
In the example, we built a model to predict Fraud. We can use the same model to predict Non-Fraud. In that case, the Confusion Matrix will be as given below:
Scenario D: Confusion Matrix at cutoff value of 0.5
Confusion Matrix
TP = 999,890  FN = 10 
FP = 10  TN = 90 
The above confusion matrix is obtained from Scenario A by swapping the positive and negative classes, since the model is now predicting Non-Frauds instead of Frauds. The True Negatives in Scenario A become the True Positives in Scenario D, and likewise for the other cells. Ideally, the testing metrics should be the same for Scenarios A and D.
Testing Metrics
Sensitivity  Specificity  Precision  F1 Score  MCC
1.00  0.90  1.00  1.00  0.90
Except for MCC, all the other testing metrics have changed.
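MCC's stability here is no accident: swapping the positive and negative classes maps TP↔TN and FP↔FN, and the MCC formula is symmetric under that swap. A quick sketch (function name is my own) comparing Scenarios A and D:

```python
import math

def mcc(tp, tn, fp, fn):
    return (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Scenario A (predicting Fraud) vs Scenario D (predicting Non-Fraud):
# the class swap exchanges TP<->TN and FP<->FN.
a = mcc(tp=90, tn=999_890, fp=10, fn=10)
d = mcc(tp=999_890, tn=90, fp=10, fn=10)
print(round(a, 2), round(d, 2))  # -> 0.9 0.9
```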
Summary of Testing Metrics for all the scenarios:
Scenario  Sensitivity  Specificity  Precision  F1 Score  MCC
A  0.90  1.00  0.90  0.90  0.90
B  0.90  1.00  0.05  0.09  0.20
C  0.05  1.00  0.90  0.09  0.20
D  1.00  0.90  1.00  1.00  0.90
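The whole summary table can be reproduced with a short standard-library script (a sketch; the `metrics` helper and scenario dictionary are my own, and the printed values match the table up to rounding):

```python
import math

# All five testing metrics from the four Confusion Matrix cells.
def metrics(tp, fn, fp, tn):
    sens = tp / (tp + fn)                 # Sensitivity (Recall)
    spec = tn / (tn + fp)                 # Specificity
    prec = tp / (tp + fp)                 # Precision
    f1 = 2 * prec * sens / (prec + sens)  # F1 score
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sens, spec, prec, f1, mcc

scenarios = {
    "A": dict(tp=90, fn=10, fp=10, tn=999_890),
    "B": dict(tp=90, fn=10, fp=1_910, tn=997_990),
    "C": dict(tp=90, fn=1_910, fp=10, tn=997_990),
    "D": dict(tp=999_890, fn=10, fp=10, tn=90),
}
for name, cm in scenarios.items():
    print(name, [round(m, 2) for m in metrics(**cm)])
```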
Conclusion
As an analyst, if you are looking for a metric to measure and maximize the overall accuracy of a classification model, MCC seems to be the best bet, since it is not only easily interpretable but also robust to changes in the prediction goal.
Original post. Reposted with permission.
Bio: Jacob Joseph works with CleverTap, a digital analytics, user engagement and personalization platform, where he leads their data science team. His role encompasses deriving key actionable business insights and applying machine learning algorithms to augment CleverTap's effort to deliver world-class real-time analytics to its customers.