
Features


From: Gregory Piatetsky-Shapiro

Date: June 17, 2002

Subject: Jerry Friedman wins SIGKDD 2002 Innovation award

The ACM SIGKDD Innovation Award is the premier technical award in the field of Data Mining and Knowledge Discovery, a "Nobel Prize" of data mining. It is given to one individual or one group who has made significant technical innovations in the field that have been transferred to practice in significant ways, or that have significantly influenced the direction of research and development in the field.

The winner of the SIGKDD 2002 Innovation Award is Jerome H. Friedman, Professor, Department of Statistics, Stanford University, and Leader, Computation Research Group, Stanford Linear Accelerator Center.

Jerry Friedman has contributed a remarkable array of topics and methodologies to data mining and machine learning over the last 25 years. In 1977, as leader of the numerical methods group at the Stanford Linear Accelerator Center (SLAC), he coauthored several algorithms for speeding up nearest-neighbor classifiers. In the following seven years, he collaborated with Leo Breiman, Richard Olshen, and Charles Stone to produce a landmark work in decision tree methodology, "Classification and Regression Trees" (1984), and released the commercial product CART(R). This work introduced the Gini, twoing, and ordered twoing splitting rules, cost-complexity pruning, oblique splitters, the use of a misclassification cost matrix to influence the growing of trees, and the application of cross-validation to decision trees. Part of this work was prefigured in his 1977 paper on decision tree induction.
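
To make those ideas concrete, here is a minimal sketch using scikit-learn's CART-style decision trees, an open-source analogue of (not a substitute for) the commercial CART(R) product; the dataset and parameter values are arbitrary illustrations.

    # CART-style recipe: Gini impurity as the splitting rule, minimal
    # cost-complexity pruning via ccp_alpha, and cross-validated assessment.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.01, random_state=0)

    # Cross-validation, which the CART work helped popularize for tree models,
    # estimates how well the pruned tree generalizes.
    scores = cross_val_score(tree, X, y, cv=5)
    print("5-fold CV accuracy: %.3f" % scores.mean())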

During this time, he also introduced Projection Pursuit Regression (PPR) for predictive modeling and interactive data visualization. Although PPR has had only a modest following, it was arguably the first instance of a feed-forward, single-hidden-layer, back-propagation neural network, with a remarkable twist: the activation function is itself estimated as part of the learning process, and the number of hidden units is determined dynamically in a stagewise process (1974, 1981, 1987).
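
A toy sketch of that stagewise idea follows; it is purely illustrative (random search over unit directions, low-degree polynomial ridge functions, synthetic data) and should not be read as Friedman's actual PPR algorithm.

    # Illustrative stagewise projection pursuit: at each stage, find a direction w,
    # fit a smooth 1-D "ridge" function of the projection X @ w to the current
    # residual, and subtract the fit. (Toy assumptions: polynomial ridge functions,
    # random-restart direction search.)
    import numpy as np

    rng = np.random.default_rng(0)

    def fit_stage(X, r, n_tries=200, degree=3):
        """Try random unit directions; keep the one whose polynomial ridge fit
        explains the most of the residual r."""
        best = None
        for _ in range(n_tries):
            w = rng.normal(size=X.shape[1])
            w /= np.linalg.norm(w)
            z = X @ w
            coefs = np.polyfit(z, r, degree)
            sse = np.sum((r - np.polyval(coefs, z)) ** 2)
            if best is None or sse < best[0]:
                best = (sse, w, coefs)
        return best[1], best[2]

    # Toy data: the response depends on two different projections of x.
    X = rng.normal(size=(500, 5))
    y = np.sin(X[:, 0] + X[:, 1]) + 0.5 * (X[:, 2] - X[:, 3]) ** 2

    residual = y.copy()
    for m in range(3):  # each pass adds one "hidden unit", stagewise
        w, coefs = fit_stage(X, residual)
        residual = residual - np.polyval(coefs, X @ w)
        print("stage %d, residual SSE = %.1f" % (m + 1, np.sum(residual ** 2)))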

In 1991 Jerry extended recursive partitioning ideas to regression with his Multivariate Adaptive Regression Splines (MARS(tm)). In MARS, linear and logistic regressions are built up by searching for breakpoints in the predictor space. Variable selection, missing-value handling, and variable transformation are all automated. MARS can be described as the first truly successful stepwise regression methodology. Richard De Veaux, in a comparative study of MARS and neural networks (1993), found that MARS frequently outperformed neural networks in engineering applications and trained hundreds of times faster; similar findings have been reported by others more recently. In 1994, Jerry extended the MARS methodology to permit a dynamic spline version of discriminant analysis.
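
As an illustration of that breakpoint search (and only that; the real MARS forward/backward algorithm also handles interactions, missing values, and pruning via generalized cross-validation), here is a one-dimensional toy on synthetic data:

    # Illustrative sketch of the MARS idea in one dimension: grow a model out of
    # hinge basis functions max(0, x - t) and max(0, t - x), greedily adding the
    # knot t that most reduces the residual sum of squares.
    import numpy as np

    rng = np.random.default_rng(1)
    x = np.sort(rng.uniform(-3, 3, size=300))
    y = np.where(x < 0, 0.2 * x, 1.5 * x) + rng.normal(scale=0.2, size=x.size)

    def fit_basis(B, y):
        """Least-squares fit of y on the columns of B; return coefficients and SSE."""
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        return coef, np.sum((y - B @ coef) ** 2)

    B = np.ones((x.size, 1))              # start with just the intercept column
    for _ in range(3):                    # forward pass: add a few hinge pairs
        best = None
        for t in np.quantile(x, np.linspace(0.05, 0.95, 40)):   # candidate knots
            cand = np.column_stack([B, np.maximum(0, x - t), np.maximum(0, t - x)])
            coef, sse = fit_basis(cand, y)
            if best is None or sse < best[0]:
                best = (sse, t, cand)
        sse, knot, B = best
        print("added knot at %.2f, SSE = %.2f" % (knot, sse))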

In the late 1990s Jerry focused on interactive data mining methods, introducing the Patient Rule Induction Method (PRIM, 1997), which he described as "Bump Hunting in High-Dimensional Data." PRIM searches for data regions containing unusually high concentrations (or values) of a target variable and allows the analyst to interactively modify its rules and stretch or shrink the "boxes" defining the regions in question. PRIM has become one of the analytical methods of choice at Australia's CSIRO, a government-funded R&D and consulting lab with extensive data mining activity.
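
The peeling step at the heart of PRIM can be sketched in a few lines; the version below is a non-interactive toy on synthetic data (top-down peeling only, with arbitrary peel fraction and stopping support), not the full method with pasting and analyst control.

    # Illustrative PRIM-style peeling: start from a box containing all the data and
    # repeatedly peel a small fraction (alpha) of points off whichever face raises
    # the mean of the target inside the box the most, until the box gets too small.
    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.uniform(0, 1, size=(2000, 2))
    # The target is unusually high inside the region x0 > 0.7 and x1 > 0.6.
    y = ((X[:, 0] > 0.7) & (X[:, 1] > 0.6)).astype(float) + rng.normal(scale=0.1, size=2000)

    def in_box(X, lo, hi):
        return np.all((X >= lo) & (X <= hi), axis=1)

    lo, hi = X.min(axis=0).copy(), X.max(axis=0).copy()   # current box bounds
    alpha, min_support = 0.05, 0.05                       # peel fraction, stopping size
    mask = in_box(X, lo, hi)

    while mask.mean() > min_support:
        best = None
        for j in range(X.shape[1]):
            # Candidate peel from the lower face of dimension j ...
            cut = np.quantile(X[mask, j], alpha)
            m = mask & (X[:, j] >= cut)
            if m.any() and (best is None or y[m].mean() > best[0]):
                best = (y[m].mean(), j, "lo", cut)
            # ... and from the upper face.
            cut = np.quantile(X[mask, j], 1 - alpha)
            m = mask & (X[:, j] <= cut)
            if m.any() and (best is None or y[m].mean() > best[0]):
                best = (y[m].mean(), j, "hi", cut)
        _, j, side, cut = best
        if side == "lo":
            lo[j] = cut
        else:
            hi[j] = cut
        mask = in_box(X, lo, hi)

    print("final box:", list(zip(lo.round(2), hi.round(2))))
    print("mean of target inside box: %.2f" % y[mask].mean())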

More recently, Jerry has focused on the study of boosting, both to understand why it is so successful and to develop improved boosting methodology. In a key article co-authored with Stanford statisticians Trevor Hastie and Rob Tibshirani, Jerry showed that boosting is a form of additive logistic regression and identified the objective function that boosting seeks to optimize. He followed up with Stochastic Gradient Boosting, which generalizes boosting to a very large class of problems and eliminates the tendency of classical boosting to seriously mistrack when presented with mislabeled target data. In stochastic gradient boosting, small trees, very slow learning rates, random sampling from the training data, and redefinition of the target variable are all combined to produce a remarkably fast and robust learner, capable of handling both regression and classification even under fairly adverse circumstances of dirty data. The methodology, called "MART" for Multiple Additive Regression Trees, includes visualization to convey the relationships between the target and predictors; it has been released commercially as TreeNet(tm).
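
A minimal sketch of that recipe follows, using scikit-learn's gradient boosting, an open-source descendant of Friedman's MART rather than TreeNet(tm) itself, and, fittingly, the synthetic "Friedman #1" regression benchmark; all parameter settings are illustrative.

    # Stochastic gradient boosting in the sense described above: many small trees,
    # a slow learning rate, and random subsampling of the training data each round.
    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    X, y = make_friedman1(n_samples=2000, noise=1.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = GradientBoostingRegressor(
        n_estimators=500,      # many boosting rounds ...
        learning_rate=0.05,    # ... each taking only a small step
        max_depth=3,           # small trees as base learners
        subsample=0.5,         # use half the training data per round ("stochastic")
        random_state=0,
    ).fit(X_train, y_train)

    print("R^2 on held-out data: %.3f" % model.score(X_test, y_test))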

Finally, Jerry has written a series of expository articles and a substantial book seeking to explain data mining to experienced data analysts and to relate machine learning to its statistical foundations. Taken together, this body of new methodology, including CART, MARS, PRIM, PPR, and Gradient Boosting, constitutes one of the broadest ranges of contributions by any one person in the field.

On behalf of the SIGKDD 2002 Awards Committee,

Gregory Piatetsky-Shapiro (KDnuggets), Chair
Rakesh Agrawal (IBM)
Daryl Pregibon (AT&T)
Jiawei Han (U. Illinois Urbana Champaign)
Foster Provost (New York University)
Thomas G. Dietterich (Oregon State University)

References.

  • Friedman, J. H. (1977). "A Recursive Partitioning Decision Rule for Nonparametric Classification." IEEE Transactions on Computers, April, 404-408.
  • De Veaux, R. D., Psichogios, D. C., and Ungar, L. H. (1993). "A Comparison of Two Nonparametric Estimation Schemes: MARS and Neural Networks." Computers & Chemical Engineering, Vol. 17, No. 8.
  • Friedman, J. H. (1994). "An Overview of Computational Learning and Function Approximation." In: From Statistics to Neural Networks: Theory and Pattern Recognition Applications (Cherkassky, Friedman, Wechsler, eds.). Springer-Verlag, 1.
  • Friedman, J. H. (1991). "Multivariate Adaptive Regression Splines" (with discussion). Annals of Statistics 19, 1.
  • Friedman, J. H. (1989). "Regularized Discriminant Analysis." J. Amer. Statist. Assoc. 84, 165.
  • Friedman, J. H. (1987). "Exploratory Projection Pursuit." J. Amer. Statist. Assoc. 82, 249.
  • Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). "Classification and Regression Trees." Wadsworth.
  • Friedman, J. H. and Stuetzle, W. (1981). "Projection Pursuit Regression." J. Amer. Statist. Assoc. 76, 817.
  • Friedman, J. H., Bentley, J. and Finkel, R. A. (1977). "An Algorithm for Finding Best Matches in Logarithmic Expected Time." ACM Trans. Math. Software 3, 209.
  • Fisherkeller, M. A., Friedman, J. H. and Tukey J. W. (1975). "PRIM-9: An Interactive Multidimensional Data Display and Analysis System." Proceedings: ACM Pacific 75.
  • Friedman, J. H. and Tukey, J.W. (1974). "A Projection Pursuit Algorithm for Exploratory Data Analysis." IEEE Trans. Computers. C-23, 881.
  • Friedman, J. H., Hastie, T. and Tibshirani, R. "Additive Logistic Regression: a Statistical View of Boosting." (Aug. 1998).
  • Friedman, J. H. "Data Mining and Statistics: What's the Connection?" (Nov. 1997b).
  • Friedman, J. H. and Fisher, N. I. "Bump Hunting in High-Dimensional Data." (Oct. 1997a).
  • Friedman, J. H. "Stochastic Gradient Boosting." (March 1999b).
  • Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine." (Feb. 1999a).
  • Friedman, J. H. "Flexible Metric Nearest Neighbor Classification." (Nov. 1994).

