KDnuggets News 01:09, item 7, Software

KDnuggets : News : 2001 : n09 : item7

Software

From: Clemens van Brunschot c.vanbrunschot@chello.nl
Date: Fri, 27 Apr 2001 05:39:01 +0200
Subject: Optimal Binning Macro in SAS Base

Clemens van Brunschot (Netherlands) has written a SAS macro
(Optibin) for preparing nominal, ordinal or continuous variables for
predictive statistical modelling with a binary target variable.
The location of the file (including documentation) is:
http://members.brabant.chello.nl/~c.vanbrunschot/macros.html

General description: a SAS macro for dummification or linearisation of
nominal, ordinal or continuous predictor variables. In linearisation,
the dataset is augmented with a new variable in which each original
value (or range of values, i.e. bin) of a predictor variable is
replaced by the mean of the target variable within that bin. The
result is a linear relationship between the new predictor variable and
the target variable. The macro does this after merging (bins of)
original values using a chi-square test. The merging aims to optimise
the bivariate relation between the predictor and the target variable.
In dummification, a dummy variable (0,1) is created for each of the
resulting bins.
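
As an illustration only, the merge-then-linearise idea can be sketched
in Python. This is not the Optibin code: the function names, the rule
of merging only adjacent bins (natural for ordinal or continuous
predictors), and the stopping threshold (the 5% critical value of
chi-square with 1 degree of freedom) are all assumptions made for the
sketch.

```python
# Hypothetical sketch, NOT the Optibin macro: merge adjacent bins whose
# target rates do not differ significantly, then linearise.

CHI2_CRIT_1DF_05 = 3.841  # assumed threshold: chi-square, 1 df, alpha = 0.05


def chi2_2x2(a, b):
    """Pearson chi-square for two bins, each given as (events, non_events)."""
    n = sum(a) + sum(b)
    col_totals = (a[0] + b[0], a[1] + b[1])
    stat = 0.0
    for row in (a, b):
        for j in (0, 1):
            expected = sum(row) * col_totals[j] / n
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return stat


def merge_bins(bins, crit=CHI2_CRIT_1DF_05):
    """bins: ordered list of (label, events, non_events).

    Repeatedly merges the most similar adjacent pair until every
    adjacent pair differs significantly (statistic >= crit).
    """
    bins = [([label], e, ne) for label, e, ne in bins]
    while len(bins) > 1:
        stats = [chi2_2x2(bins[i][1:], bins[i + 1][1:])
                 for i in range(len(bins) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= crit:
            break
        left, right = bins[i], bins[i + 1]
        bins[i:i + 2] = [(left[0] + right[0],
                          left[1] + right[1],
                          left[2] + right[2])]
    return bins


def linearise(merged):
    """Map each original label to the mean target value of its merged bin."""
    return {label: e / (e + ne)
            for labels, e, ne in merged
            for label in labels}


# Made-up counts for four ordered bins: (label, events, non_events).
example = [("A", 10, 90), ("B", 12, 88), ("C", 40, 60), ("D", 45, 55)]
merged = merge_bins(example)
print([labels for labels, _, _ in merged])  # [['A', 'B'], ['C', 'D']]
print(linearise(merged))
```

In this made-up example A and B merge, as do C and D, and every
original value is then replaced by the event rate of its merged bin
(0.11 and 0.425 respectively).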

The original predictor values are optionally ranked into quantiles at
the start of the process. A weight variable is allowed, but it is used
neither for ranking nor for the chi-square test. A set of predictor
values may also be declared as missing.
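
The optional quantile step could look roughly like the following
Python sketch. It is an assumption-laden illustration, not the macro's
logic: the function name, the treatment of ties (tied values share a
bin) and the use of None for values declared missing are all
hypothetical. In line with the description above, no weights enter
the ranking.

```python
# Hypothetical sketch, NOT the Optibin macro: unweighted quantile ranking
# with user-declared missing values kept out of the ranking.

def quantile_bins(values, n_bins, missing=()):
    """Return a bin index (0 .. n_bins-1) per value, or None if missing."""
    present = sorted(v for v in values if v not in missing)
    n = len(present)
    # Use the rank of the first occurrence so that tied values share a bin.
    first_rank = {}
    for rank, v in enumerate(present):
        first_rank.setdefault(v, rank)
    return [None if v in missing else first_rank[v] * n_bins // n
            for v in values]


# Made-up data; -1 is declared as a missing-value code.
print(quantile_bins([5, 1, 9, -1, 3, 7, 1, 11], n_bins=4, missing={-1}))
# [1, 0, 2, None, 1, 2, 0, 3]
```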

The process is controlled by a number of parameters declared in the
macro invocation. Each parameter is either obligatory or has a
default. The process may be applied to a selection of the
dataset. There is an option for printing iteration information.
A graph is produced for visual inspection of the created variable.
The process ends with a printout that shows the relationship
between the original and the created predictor. This printout should
also be used to write out the model built with the linearised
variables. If desired, a set of dummy variables is produced instead of
one linearised variable. (One of the dummy variables in the set will
have to be left out of predictive modelling.)
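
The reason for leaving one dummy out is that, in a model with an
intercept, a full set of dummies always sums to 1 and is therefore
perfectly collinear with the intercept. A minimal Python sketch of
building the reduced set follows; the function name and the choice of
the lowest bin as the omitted reference category are assumptions for
illustration, not the macro's behaviour.

```python
# Hypothetical sketch, NOT the Optibin macro: 0/1 dummies for binned
# values, with one reference bin left out.

def dummify(bin_ids, reference=None):
    """Return (kept_levels, rows); rows hold one 0/1 flag per kept level."""
    levels = sorted(set(bin_ids))
    if reference is None:
        reference = levels[0]  # assumed default: omit the lowest bin
    kept = [lv for lv in levels if lv != reference]
    rows = [[1 if b == lv else 0 for lv in kept] for b in bin_ids]
    return kept, rows


kept, rows = dummify([0, 2, 1, 2])
print(kept)  # [1, 2]
print(rows)  # [[0, 0], [0, 1], [1, 0], [0, 1]]
```

An observation in the omitted bin has all flags equal to 0, so its
effect is absorbed by the model intercept.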

More than one predictor at a time can easily be handled, as long as
they require the same parameter settings. Even then, the bins of each
predictor are merged bivariately, that is, one predictor at a time
against the target variable.

e-mail: vanBrunschot@bigfoot.com
