| KDnuggets : News : 2008 : n17 : item24 | |
Publications
From: Bruce Ratner
Date: 05 Sep 2008
Subject: Genetic Data Mining: The Correlation Coefficient

Assessing the relationship between a predictor variable and a target variable is an essential task in building a statistical linear regression model. If the relationship is straight-line (linear), then no extra work of straightening the relationship is needed: simply test the predictor variable's statistical importance to stay in the model. If the relationship is not linear, then one of the two variables is re-expressed (although sometimes both are) so that the re-expressed relationship is as linear as the data permit. The re-expressed variable is then tested for inclusion in the model.

Most methods of assessing relationships among variables are based on the well-known correlation coefficient, which is often misused because its linearity assumption (i.e., that the true underlying relationship is straight-line) is not checked with a scatterplot.

The purpose of this article is to illustrate a genetic data mining method, the GenIQ Model, that is one of the better "data-straightener" methods available. I use a small dataset to make the "GenIQ data-straightener" method tractable and attractive enough for the everyday model builder to add it to the modeler's toolkit. I present a succinct discussion of the genetic-based method, along with a basic statement of the GenIQ Model and what is "good to know" about the GenIQ Model output, to ease the understanding of genetic data mining for the correlation coefficient.
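
To make the idea of "straightening" concrete, here is a minimal Python sketch of how the correlation coefficient changes when a curved relationship is re-expressed. It is not the GenIQ Model: the data are made up for illustration (a target that grows exponentially with the predictor), the helper `pearson_r` is a hypothetical name, and the log re-expression is chosen by hand rather than found by a genetic search.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data (an assumption, not from the article):
# the target y grows exponentially with the predictor x.
x = list(range(1, 11))
y = [math.exp(0.5 * xi) for xi in x]

# Correlation on the raw, curved relationship.
r_raw = pearson_r(x, y)

# Correlation after re-expressing the target as log(y),
# which makes this particular relationship exactly straight-line.
r_log = pearson_r(x, [math.log(yi) for yi in y])

print(f"r(x, y)     = {r_raw:.3f}")
print(f"r(x, log y) = {r_log:.3f}")
```

On these data the raw coefficient understates a relationship that is in fact perfectly predictable, while the re-expressed pair has a coefficient of essentially 1. A scatterplot of x against y would show the curvature that the raw coefficient alone hides.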
Copyright © 2008 KDnuggets.