KDnuggets : News : 2009 : n07 : item37

Publications

From: Bruce Ratner
Date: Mon, 06 Apr 2009
Subject: Variable Selection Methods in Regression: Do they produce bad models?

Variable Selection Methods in Regression: Many Statisticians Know Them, But Few Know They Produce Poorly Performing Models

Variable selection in regression - identifying the best subset among many candidate variables to include in a model - is arguably the hardest part of model building. Many variable selection methods exist. Many statisticians know them, but few know they produce poorly performing models. These deficient methods are a miscarriage of statistics because they were developed by debasing sound statistical theory into a misguided pseudo-theoretical foundation.

The purpose of this article is three-fold:

1) to review five widely used variable selection methods, itemize some of their weaknesses, and explain why they are used;

2) to present a well-defined enhanced variable selection method, which is a prominent by-product of the GenIQ Model, a machine-learning regression technique; and

3) to introduce the GenIQ Model for building database marketing regression models, because a free-form GenIQ model is built concurrently during the enhanced variable selection. These models seek to maximize cum lift, a measure of how well a model identifies the upper-performing individuals.
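
Cum lift is not defined in this announcement. As a rough illustration only, the sketch below uses one common database marketing convention, which is an assumption here and not necessarily the article's exact definition: rank individuals by model score, take the top fraction of the file, and divide the response rate in that top group by the overall response rate, times 100. The function name cum_lift and the parameter depth are hypothetical.

```python
def cum_lift(scores, responses, depth=0.1):
    """Cum lift at a given file depth (assumed convention, see lead-in).

    scores    -- model scores, higher means more likely to respond
    responses -- observed outcomes, 1 for a responder and 0 otherwise
    depth     -- fraction of the file to keep, counted from the top score
    """
    if not scores or len(scores) != len(responses):
        raise ValueError("scores and responses must be non-empty and equal in length")
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")

    overall_rate = sum(responses) / len(responses)
    if overall_rate == 0:
        raise ValueError("cum lift is undefined when there are no responders")

    # Rank individuals from highest to lowest score.
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)

    # Keep at least one individual in the top group.
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(resp for _, resp in ranked[:n_top]) / n_top

    # 100 means no better than random selection at this depth.
    return 100.0 * top_rate / overall_rate


# Made-up toy data, for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_responses = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
top_20_lift = cum_lift(toy_scores, toy_responses, depth=0.2)
full_file_lift = cum_lift(toy_scores, toy_responses, depth=1.0)
```

Under this convention a model that ranks no better than chance has a cum lift near 100 at every depth, and the full file always has a cum lift of exactly 100.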

http://www.geniq.net/res/variable-selection-methods-produce-bad-models.html



Copyright © 2009 KDnuggets.