Parsed RELR Substantially Reduces Variables in Reduced Error Logistic Regression (KDnuggets News 08:17, item 8, Software)

KDnuggets : News : 2008 : n17 : item8

Software

From: Dan Rice
Date: 27 Aug 2008
Subject: Parsed RELR Substantially Reduces Variables in Reduced Error Logistic Regression

September 2, 2008. St. Louis, MO (USA) - Rice Analytics, a SAS Alliance Partner and the exclusive provider of Reduced Error Logistic Regression (RELR) software, announced today that it has invented a process to substantially reduce the number of variables in RELR models. This process is called Parsed Reduced Error Logistic Regression (Parsed RELR or PRELR). Parsed RELR is based upon a backward selection process with innovations to maximize the out-of-sample reliability of the selected variables. Like Full RELR, Parsed RELR appears to give very accurate validation sample classification without significant overfitting. Like Full RELR, Parsed RELR is suited for high-dimensional datasets with thousands of variables and interactions as inputs. However, Parsed RELR gives extremely parsimonious solutions that often select fewer than ten variables for high-dimensional, multicollinear datasets that would require hundreds of variables in Full RELR solutions. A free Tutorial on Reduced Error Logistic Regression that covers Parsed RELR, with concrete and easy-to-understand examples, can be downloaded at www.RiceAnalytics.com.

While other predictive modeling methods can select models with few variables, the reliability of those selections is a major cause for concern unless the sample size is extremely large or the variables are not at all collinear. For example, relatively few of the genes selected by standard predictive modeling methods or Bayesian methods to predict human diseases are replicated with different samples of observations, even when the different sample is a correlated bootstrapped sample rather than an independent sample (please see the review reference in the Tutorial on Reduced Error Logistic Regression). If the selected variables and their regression coefficients are simply an artifact of the sample of observations, then these models cannot be explanatory models that help uncover causal variables and/or their correlates.
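The replication concern described above can be made concrete with a small simulation. The following is only a minimal sketch under stated assumptions, not RELR or Parsed RELR: it stands in a deliberately simple selector (keep the k features with the largest absolute correlation with the outcome), draws bootstrap resamples of a small, highly collinear synthetic dataset, and reports the average Jaccard overlap between the feature sets selected on different resamples. All function names, the data generator, and the parameter values are hypothetical choices made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_top_k(rows, labels, k):
    """Stand-in selector (an assumption, not RELR): top-k features by |correlation|."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(reverse=True)
    return frozenset(j for _, j in scores[:k])


def jaccard(a, b):
    """Overlap of two non-empty sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)


def bootstrap_stability(rows, labels, k, n_boot, rng):
    """Mean pairwise Jaccard overlap of the feature sets selected on bootstrap resamples."""
    n = len(rows)
    selections = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        selections.append(
            select_top_k([rows[i] for i in idx], [labels[i] for i in idx], k)
        )
    pairs = [(a, b) for i, a in enumerate(selections) for b in selections[i + 1:]]
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def make_collinear_data(n, n_features, rng):
    """Synthetic data: one latent factor drives every feature, so all are collinear."""
    rows, labels = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        rows.append([z + rng.gauss(0, 1) for _ in range(n_features)])
        labels.append(1 if z + rng.gauss(0, 1) > 0 else 0)
    return rows, labels


rng = random.Random(0)  # fixed seed so the run is repeatable
rows, labels = make_collinear_data(n=60, n_features=30, rng=rng)
stability = bootstrap_stability(rows, labels, k=5, n_boot=20, rng=rng)
print(f"mean pairwise Jaccard overlap of selected features: {stability:.2f}")
```

An overlap near 1 would mean the same variables are chosen on nearly every resample, while an overlap near 0 would mean the selection is largely an artifact of the particular sample. The same stability measure could in principle be wrapped around any variable selection procedure to compare how reliably each one picks the same variables.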

In contrast, Parsed RELR models appear to have very good reliability of selected variables and their regression coefficients at much smaller sample sizes than required by other methods. This reliability directly results from the error reduction features of RELR that give extremely tight confidence intervals around regression coefficients compared to other shrinkage methods such as Penalized Logistic Regression. Because RELR models do not require inordinate sample sizes for reliable variable selections, Parsed RELR models can show extremely good face validity that makes them potential explanatory models even with highly multicollinear datasets. Parsed RELR is a set of SAS macros and can be implemented as stand-alone macros or as an extension node to SAS Enterprise Miner. Please visit www.RiceAnalytics.com for ordering and installation information.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc.
