KDnuggets : News : 2009 : n15 : item17

Publications

From: Bruce Ratner
Date: Mon, 03 Aug 2009
Subject: Confusion Matrix: Perhaps Confusing, but Definitely Biased

The traditional statistical paradigm for building a binary classification model is as follows: the data analyst fits a logistic regression model to the data. The model's equation (on the logit scale) is a sum of weighted predictor variables, each of which has been declared statistically significant.
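As a minimal sketch of what "a sum of weighted predictor variables" means in practice, the snippet below turns such a sum into a predicted probability of Response = 1. The function name, the intercept, the weights, and the predictor values are all hypothetical and chosen only for illustration; they do not come from any fitted model in the article.

```python
import math

def response_probability(intercept, weights, predictors):
    """Return the predicted probability that Response = 1.

    The logit is the intercept plus the sum of weighted predictors;
    the logistic function maps that logit to a probability in (0, 1).
    """
    logit = intercept + sum(w * x for w, x in zip(weights, predictors))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical regression coefficients and one hypothetical individual.
p = response_probability(-2.0, [0.8, 0.5], [1.0, 2.0])
print(round(p, 3))
```

A logit of zero corresponds to a probability of 0.5, which is one common (but not mandatory) cutoff for turning the probability into a predicted 0 or 1.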

The weights (better known as regression coefficients) are the main appeal of the statistical paradigm, as they provide the key to interpreting what the equation means. The information needed to assess the goodness of a classification model exists within the confusion matrix, whose construction is part of the traditional three-step approach:

1) Construction of the 2x2 table of actual versus predicted outcomes - the confusion matrix itself;

2) Calculation of the six standard terms based on the confusion matrix; and,

3) Rote understanding and insightful interpretation of the six terms.

This last step is what earns the matrix the modifier "confusion." This article focuses on the database-marketing logistic model; accordingly, I use the binary (dichotomous) target variable Response, which takes the values 0 and 1. (The treatment of this topic can easily be extended to a polychotomous (multinomial) target variable.)
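The first two steps can be sketched in a few lines of Python. The abstract does not name the six standard terms, so the six computed here (accuracy, true positive rate, false positive rate, true negative rate, false negative rate, and precision) are an assumption based on common usage, and the function names and sample data are hypothetical.

```python
def confusion_matrix(actual, predicted):
    """Count the four cells of the 2x2 table for a 0/1 target.

    Assumes every value in both sequences is exactly 0 or 1.
    Returns (tp, fp, tn, fn).
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # responder predicted as responder
        elif a == 0 and p == 1:
            fp += 1  # non-responder predicted as responder
        elif a == 0 and p == 0:
            tn += 1  # non-responder predicted as non-responder
        else:
            fn += 1  # responder predicted as non-responder
    return tp, fp, tn, fn

def six_terms(tp, fp, tn, fn):
    """Compute six commonly used terms (an assumed set, see the text above).

    Assumes both classes occur and at least one case is predicted as 1;
    otherwise a denominator is zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "true_positive_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "true_negative_rate": tn / (fp + tn),
        "false_negative_rate": fn / (tp + fn),
        "precision": tp / (tp + fp),
    }

# Hypothetical actual and predicted Response values for ten individuals.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_matrix(actual, predicted)
terms = six_terms(tp, fp, tn, fn)
for name, value in terms.items():
    print(f"{name}: {value:.3f}")
```

With few responders, as is typical in database marketing, accuracy can look high even when the true positive rate is modest, so the terms are best read together rather than singly.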



Copyright © 2009 KDnuggets.