From: Clemens van Brunschot <c.vanbrunschot@chello.nl>
Date: Fri, 27 Apr 2001 05:39:01 +0200
Subject: Optimal Binning Macro in SAS Base

Clemens van Brunschot (Netherlands) has written a SAS macro, Optibin, that prepares nominal, ordinal or continuous variables for predictive statistical modelling with a binary target variable. The file (including documentation) is available at:
http://members.brabant.chello.nl/~c.vanbrunschot/macros.html

General description: a SAS macro for the dummification or linearisation of nominal, ordinal or continuous predictor variables.

In linearisation, the dataset is augmented with a new variable. In this variable, each (range of) original predictor values, called a bin, is replaced by the mean value of the target variable within that bin. Every observation in a bin receives that bin's target mean, so the mean of the target at each value of the new variable equals that value. The new predictor therefore has a linear relationship with the target. In this macro, the bins are formed beforehand by merging (bins of) original values with a chi-square test. This merging aims to optimise the bivariate relationship between the predictor and the target variable. In dummification, a 0/1 dummy variable is created for each of the resulting bins.

Optionally, the original predictor values are first ranked into quantiles. A weight variable is allowed, although it is used neither for the ranking nor for the chi-square test. A set of missing values for the predictor may also be defined.

The process is controlled by parameters declared in the macro invocation. Each parameter is either obligatory or has a default value. The process may be applied to a subset of the dataset, and there is an option for printing iteration information. A graph is produced for visual inspection of the created variable. The process ends with a printed report showing the relationship between the original and the created predictor. This report should also be used to write out the model built with the linearised variables, because it records how original values map to linearised values.
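The announcement describes the procedure but not its code. The following is a minimal Python sketch of the same ideas. It is not the Optibin macro, which is written in SAS Base, and it rests on several assumptions:

- Bins of an ordered predictor are merged ChiMerge-style. The adjacent pair with the smallest 2x2 Pearson chi-square is merged repeatedly until every pair reaches a critical value. The value 3.841 (1 df, alpha = 0.05) is chosen here only for illustration.
- Optional quantile ranking cuts mean ranks into k groups with floor(rank * k / (n + 1)).
- Declared missing values form their own bin.
- The weight enters only the bin means, in line with the note that it is not used for ranking or for the chi-square test.
- The first bin is the left-out dummy.

All function names (optibin_sketch, chi_merge, chi2_2x2, quantile_groups) are hypothetical. A nominal predictor has no natural adjacency, so it would need a different merge order.

```python
import bisect
import math
from collections import Counter, defaultdict

# Assumed merge threshold: chi-square critical value for 1 df at alpha = 0.05.
CHI2_CRIT = 3.841


def chi2_2x2(pos1, neg1, pos2, neg2):
    """Pearson chi-square for a 2x2 table: two bins by a binary target."""
    n = pos1 + neg1 + pos2 + neg2
    denom = (pos1 + neg1) * (pos2 + neg2) * (pos1 + pos2) * (neg1 + neg2)
    if denom == 0:
        return 0.0  # an empty margin gives no evidence that the bins differ
    return n * (pos1 * neg2 - neg1 * pos2) ** 2 / denom


def quantile_groups(x, k):
    """Rank x (tied values share their mean rank) and cut the ranks into k groups."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # 1-based mean rank of the tied run
        i = j + 1
    return [math.floor(r * k / (len(x) + 1)) for r in ranks]


def chi_merge(x, y, crit=CHI2_CRIT):
    """Start with one bin per distinct ordered value, then repeatedly merge the
    adjacent pair with the smallest chi-square until every pair reaches crit."""
    counts = Counter(zip(x, y))
    # Each bin is [lowest value, count of y == 1, count of y == 0].
    bins = [[v, counts[(v, 1)], counts[(v, 0)]] for v in sorted(set(x))]
    while len(bins) > 1:
        stats = [chi2_2x2(a[1], a[2], b[1], b[2]) for a, b in zip(bins, bins[1:])]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= crit:
            break
        bins[i] = [bins[i][0], bins[i][1] + bins[i + 1][1], bins[i][2] + bins[i + 1][2]]
        del bins[i + 1]
    return [b[0] for b in bins]  # lower bound of each merged bin


def optibin_sketch(x, y, w=None, n_groups=None, missing=()):
    """Hypothetical helper: return (bin per row, linearised value per row, dummy rows)."""
    w = [1.0] * len(x) if w is None else w
    obs = [i for i, v in enumerate(x) if v not in missing]
    xs = [x[i] for i in obs]
    if n_groups:
        xs = quantile_groups(xs, n_groups)  # optional ranking, unweighted
    bounds = chi_merge(xs, [y[i] for i in obs])  # unweighted chi-square merging
    bin_of = [None] * len(x)  # None marks a declared missing value (its own bin)
    for i, v in zip(obs, xs):
        bin_of[i] = bisect.bisect_right(bounds, v) - 1
    # Linearisation: replace each bin by its weighted mean of the target.
    sw, swy = defaultdict(float), defaultdict(float)
    for b, yi, wi in zip(bin_of, y, w):
        sw[b] += wi
        swy[b] += wi * yi
    linear = [swy[b] / sw[b] for b in bin_of]
    # Dummification: one 0/1 column per bin, with bin 0 left out as the reference.
    cats = list(range(len(bounds))) + ([None] if None in bin_of else [])
    dummies = [[int(b == c) for c in cats[1:]] for b in bin_of]
    return bin_of, linear, dummies
```

As a usage example, take x = [1, 1, 2, 2, 3, 3, 4, 4] and y = [0, 0, 0, 0, 1, 1, 1, 1]. The sketch merges values 1 and 2 into bin 0 and values 3 and 4 into bin 1, because the chi-square between those two bins (8.0) exceeds the threshold. The linearised values are 0.0 for the first four rows and 1.0 for the last four. The single remaining dummy column is 0 for bin 0 and 1 for bin 1.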
If desired, a set of dummy variables is produced instead of one linearised variable. One dummy variable from each set must be left out of the predictive model. The full set sums to 1 for every observation and would therefore be collinear with the intercept. More than one predictor can easily be handled at a time, as long as all of them require the same parameter settings. However, the bins of each predictor are still merged bivariately, i.e. against the target variable only.

e-mail: vanBrunschot@bigfoot.com
Copyright © 2001 KDnuggets.