# The Best Metric to Measure Accuracy of Classification Models

Measuring the accuracy of a model for a classification problem (categorical output) is more complex and time-consuming than for a regression problem (continuous output). Let's walk through the key testing metrics for a classification problem with an example.

**By Jacob Joseph, CleverTap.**

Unlike evaluating the accuracy of models that predict a continuous or discrete dependent variable, like Linear Regression models, evaluating the accuracy of a classification model can be more complex and time-consuming. Before measuring the accuracy of a classification model, an analyst would first measure its robustness with the help of metrics such as AIC-BIC, AUC-ROC, AUC-PR, the Kolmogorov-Smirnov chart, etc. The next logical step is to measure its accuracy. To understand the complexity behind measuring the accuracy, we need to know a few basic concepts.

**Model Output**

Most classification models output a probability number for each observation in the dataset.

E.g., a classification model like Logistic Regression will output a probability number between 0 and 1 instead of the desired output of the actual target variable, like Yes/No.
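As a minimal sketch of what this looks like in practice (assuming Python with scikit-learn and a tiny made-up dataset, not the data discussed in this article), a Logistic Regression model exposes these probabilities via `predict_proba`:

```python
# Minimal sketch: a Logistic Regression model outputs probabilities, not labels.
# The tiny two-feature dataset below is made up purely for illustration.
from sklearn.linear_model import LogisticRegression

X = [[0.2, 1.0], [0.4, 0.9], [3.1, 0.1], [0.3, 1.2], [2.8, 0.2]]  # toy features
y = [0, 0, 1, 0, 1]                                               # 1 = Fraud, 0 = Non-Fraud

model = LogisticRegression().fit(X, y)

# predict_proba returns [P(Non-Fraud), P(Fraud)] for each observation;
# the second column is the probability number discussed above.
print(model.predict_proba(X)[:, 1])
```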

The next logical step is to translate this probability number into the target/dependent variable of the model and test the accuracy of the model. To understand the implication of translating the probability number, let's review a few basic concepts relating to evaluating a classification model with the help of the example given below.

**Goal:** Create a classification model that predicts fraud transactions

**Output:** Transactions that are predicted to be Fraud and Non-Fraud

**Testing:** Comparing the predicted result with the actual results

**Dataset:** Number of Observations: 1 million; Fraud: 100; Non-Fraud: 999,900

The fraud observations constitute just **0.01%** of the entire dataset, representing a typical case of **Imbalanced Classes**. Imbalanced Classes arise in classification problems where the classes are not represented equally. Suppose you created a model that predicted 95% of the transactions as Non-Fraud, and all the predictions for Non-Frauds turned out to be accurate. That high accuracy for Non-Frauds shouldn't get you excited, since actual Frauds are just 0.01% of the dataset whereas the Predicted Frauds constitute 5% of the observations.
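To see why plain accuracy is misleading here, consider this small sketch (the counts mirror the dataset above; the "model" is an even cruder hypothetical one that labels every transaction Non-Fraud):

```python
# Sketch: raw accuracy of a do-nothing "model" that calls everything Non-Fraud.
total = 1_000_000
actual_fraud = 100
actual_non_fraud = total - actual_fraud   # 999,900

correct = actual_non_fraud                # every Non-Fraud call is right
accuracy = correct / total
print(f"Accuracy: {accuracy:.2%}")        # 99.99%, yet zero frauds are caught
```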

Assuming you were able to translate the output of your model into Fraud/Non-Fraud, the predicted results could be compared with the actual results and summarized as follows:

**a) True Positives:** Observations where the actual and predicted transactions were fraud

**b) True Negatives:** Observations where the actual and predicted transactions weren't fraud

**c) False Positives:** Observations where the actual transactions weren't fraud but were predicted to be fraud

**d) False Negatives:** Observations where the actual transactions were fraud but weren't predicted to be fraud

**Confusion Matrix** is a popular way to represent the summarized findings.

|  | Predicted: Fraud | Predicted: Non-Fraud |
|---|---|---|
| **Actual: Fraud** | True Positives (TP) | False Negatives (FN) |
| **Actual: Non-Fraud** | False Positives (FP) | True Negatives (TN) |
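As a sketch of how these four cells could be tallied in code (assuming Python with scikit-learn; the two short label lists below are illustrative, not the article's dataset):

```python
# Sketch: counting TP, FN, FP, TN from actual vs. predicted labels.
from sklearn.metrics import confusion_matrix

actual    = ["Fraud", "Non-Fraud", "Non-Fraud", "Fraud", "Non-Fraud"]
predicted = ["Fraud", "Non-Fraud", "Fraud",     "Fraud", "Non-Fraud"]

# With labels ordered [Fraud, Non-Fraud], rows are actual classes and
# columns are predicted classes, i.e. [[TP, FN], [FP, TN]].
cm = confusion_matrix(actual, predicted, labels=["Fraud", "Non-Fraud"])
tp, fn, fp, tn = cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]
print(tp, fn, fp, tn)   # 2 0 1 2
```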

Typically, a classification model outputs the result in the form of probabilities as shown below:

First 5 rows of the dataset:

| Observation | Actual | Predicted |
|---|---|---|
| 1 | Non-Fraud | 0.45 |
| 2 | Non-Fraud | 0.10 |
| 3 | Fraud | 0.67 |
| 4 | Non-Fraud | 0.60 |
| 5 | Non-Fraud | 0.11 |

Suppose we assume 0.5 as the cut-off probability, i.e., observations with a probability value of 0.5 and above are marked as Fraud and those below 0.5 are marked as Non-Fraud.

Accordingly, the first 5 rows above will be as below:

| Observation | Actual | Predicted |
|---|---|---|
| 1 | Non-Fraud | Non-Fraud |
| 2 | Non-Fraud | Non-Fraud |
| 3 | Fraud | Fraud |
| 4 | Non-Fraud | Fraud |
| 5 | Non-Fraud | Non-Fraud |
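A small sketch of this translation step (assuming Python; the five probabilities are the sample rows shown above):

```python
# Sketch: converting predicted probabilities into labels with a 0.5 cut-off.
cutoff = 0.5
probabilities = [0.45, 0.10, 0.67, 0.60, 0.11]   # sample rows above

labels = ["Fraud" if p >= cutoff else "Non-Fraud" for p in probabilities]
print(labels)   # ['Non-Fraud', 'Non-Fraud', 'Fraud', 'Fraud', 'Non-Fraud']
```

A different cut-off would shift observations between the predicted Fraud and Non-Fraud columns of the confusion matrix.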

Let's summarize the results from the model on the entire dataset with the help of the confusion matrix:

|  | Predicted: Fraud | Predicted: Non-Fraud |
|---|---|---|
| **Actual: Fraud** | TP = 90 | FN = 10 |
| **Actual: Non-Fraud** | FP = 10 | TN = 999,890 |

All the cells in the above matrix are non-zero. So, is this result ideal?

Wouldn't we love a scenario wherein the model accurately identifies the Frauds and the Non-Frauds, i.e. zero entries in the FP and FN cells?

A BIG YES.

Consider a scenario wherein, as a marketing analyst, you would like to identify users who were likely to buy but haven't bought yet. This particular class of users would be the ones who share the characteristics of the users who bought. Such a class would belong to False Positives: users who were predicted to transact but didn't transact in reality. Hence, in addition to non-zero entries in TP and TN, you would prefer a non-zero entry in FP too. Thus, the model accuracy depends on the goal of the prediction exercise.

**Key Testing Metrics**

Since we are now comfortable with the interpretation of the Confusion Matrix, let's look at some popular metrics used for testing classification models:

**i) Sensitivity/Recall**

Sensitivity, also known as the True Positive Rate or Recall, is calculated as:

Sensitivity = No. of True Positives / (No. of True Positives + No. of False Negatives)

Sensitivity = TP / (TP + FN)

Since the formula doesn't contain FP and TN, Sensitivity may give you a biased result, especially for imbalanced classes.

In the example of Fraud detection, it gives you the percentage of Correctly Predicted Frauds from the pool of Actual Frauds.

Sensitivity = 90 / (90 + 10) = 0.90

**ii) Specificity**

Specificity, also known as the True Negative Rate, is calculated as:

Specificity = No. of True Negatives / (No. of True Negatives + No. of False Positives)

Specificity = TN / (TN + FP)

Since the formula does not contain FN and TP, Specificity may give you a biased result, especially for imbalanced classes.

In the example of Fraud detection, it gives you the percentage of Correctly Predicted Non-Frauds from the pool of Actual Non-Frauds.

Specificity = 999,890 / (999,890 + 10) ≈ 1

**iii) Precision**

Precision, also known as Positive Predictive Value, is calculated as:

Precision = No. of True Positives / (No. of True Positives + No. of False Positives)

Precision = TP / (TP + FP)

Since the formula does not contain FN and TN, Precision may give you a biased result, especially for imbalanced classes.

In the example of Fraud detection, it gives you the percentage of Correctly Predicted Frauds from the pool of Total Predicted Frauds.

Precision = 90 / (90 + 10) = 0.90
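Putting the three metrics together as a quick sketch (plain Python, using the confusion-matrix counts from the example above):

```python
# Sketch: sensitivity, specificity, and precision from the example's counts.
tp, fn, fp, tn = 90, 10, 10, 999_890

sensitivity = tp / (tp + fn)   # 0.90  - share of actual frauds caught
specificity = tn / (tn + fp)   # ~1.00 - share of actual non-frauds cleared
precision   = tp / (tp + fp)   # 0.90  - share of predicted frauds that are real

print(f"Sensitivity: {sensitivity:.2f}")
print(f"Specificity: {specificity:.5f}")
print(f"Precision:   {precision:.2f}")
```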