By Tim Graettinger, Ph.D., Feb 2012
Yes or no. Buy or sell. Renew or cancel. Many customer behaviors have this flavor of a choice between two alternatives. Suppose software called a "classifier" (explained more below) is available to predict customer choices in advance. Would you use it? Why or why not? Perhaps you'd like to test it to see how well it performs before you commit. In this installment of my ongoing series on the nuts and bolts of data mining, I discuss the use of classifiers and the question of performance. Regarding performance, I focus on hits, misses, false alarms, and the ROC curve that pulls them all together.
The trade-off between hit rate and false alarm rate depends on both the classifier and the choice of threshold. Since we are testing a particular classifier, we make no changes there. That leaves the threshold as the sole adjustment knob. Stop reading for a moment and imagine what happens to the hit rate and the false alarm rate as you raise the threshold. What happens when you lower it?
Here's the answer: when you lower the threshold, more claims fall into the disallowed bucket, and that larger bucket will generally include more hits. As a result, the hit rate goes up -- but so does the false alarm rate, because more legitimate claims land in the bucket too. Conversely, when you raise the threshold, you get higher-quality hits, but fewer of them. On the plus side, you produce fewer false alarms.
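To make the trade-off concrete, here is a minimal sketch in Python. It rests on several assumptions that are mine, not part of the original analysis: the classifier emits a score between 0 and 1, a claim is disallowed when its score is at or above the threshold, the hit rate is the fraction of truly bad claims that get flagged, and the false alarm rate is the fraction of legitimate claims that get flagged. The scores, labels, and the function name `rates_at_threshold` are invented purely for illustration.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (hit_rate, false_alarm_rate) for one threshold.

    Assumption: a case is flagged when its score >= threshold.
    labels: 1 = the case truly belongs in the flagged bucket, 0 = it does not.
    """
    flagged = [(s >= threshold, y) for s, y in zip(scores, labels)]
    hits = sum(1 for is_flagged, y in flagged if is_flagged and y == 1)
    false_alarms = sum(1 for is_flagged, y in flagged if is_flagged and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return hits / positives, false_alarms / negatives


# Invented example data: eight claims, four of which are truly bad (label 1).
scores = [0.95, 0.85, 0.72, 0.61, 0.45, 0.33, 0.24, 0.12]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

low = rates_at_threshold(scores, labels, 0.25)   # low threshold
high = rates_at_threshold(scores, labels, 0.75)  # high threshold

print("threshold 0.25 -> hit rate %.2f, false alarm rate %.2f" % low)
print("threshold 0.75 -> hit rate %.2f, false alarm rate %.2f" % high)
```

With this toy data, the low threshold catches every bad claim but also flags half of the legitimate ones, while the high threshold flags no legitimate claims but catches only half of the bad ones.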
The ROC Curve
Wouldn't you like to try a few different thresholds to see how the hit rates and false alarm rates change? Why not try ALL threshold values?! That, in fact, is the purpose of the ROC curve shown in Figure 1. The horizontal axis is the false alarm rate, and the vertical axis is the hit rate. One choice of threshold produces one point on the curve, a combination of false alarm rate and hit rate. Another choice produces another point. By varying the threshold across a range of values, say from 0 to 1, you generate the entire curve (in blue) for the classifier. Small values for the threshold produce points in the northeast corner of the graph, corresponding to high hit rates and high false alarm rates. High threshold values, conversely, generate points in the southwest corner, where the hit rates and false alarm rates are both low.
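The threshold sweep described above can be sketched in a few lines of Python. As before, this is an illustration under my own assumptions rather than the procedure used to produce Figure 1: scores lie between 0 and 1, a case is flagged when its score is at or above the threshold, and the data and the function name `roc_points` are made up. Each threshold yields one (false alarm rate, hit rate) pair, and the collection of pairs traces the curve.

```python
def roc_points(scores, labels, thresholds):
    """Return one (false_alarm_rate, hit_rate) point per threshold.

    Assumption: a case is flagged when its score >= threshold.
    labels: 1 = truly positive case, 0 = truly negative case.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        hits = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        false_alarms = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((false_alarms / negatives, hits / positives))
    return points


# Invented example data, for illustration only.
scores = [0.95, 0.85, 0.72, 0.61, 0.45, 0.33, 0.24, 0.12]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# Sweep the threshold from 0 to 1 in steps of 0.1.
thresholds = [i / 10 for i in range(11)]
curve = roc_points(scores, labels, thresholds)

for t, (far, hr) in zip(thresholds, curve):
    print("threshold %.1f -> false alarm rate %.2f, hit rate %.2f" % (t, far, hr))
```

The first point (threshold 0) sits in the northeast corner, where everything is flagged, and the last point (threshold 1) sits in the southwest corner, where nothing is. A finer grid of thresholds, or one threshold per distinct score, gives a smoother curve.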