Short course: Statistical Learning and Data Mining II: tools for tall and wide data (KDnuggets News 08:02, item 12, Courses)

KDnuggets : News : 2008 : n02 : item12

Courses

From: Robert Tibshirani
Date: 10 Jan 2008 08:53:55 -0800
Subject: Short course: Statistical Learning and Data Mining II: tools for tall and wide data

Trevor Hastie and Robert Tibshirani, Stanford University

Sheraton Hotel
Palo Alto, CA
March 6-7, 2008
www-stat.stanford.edu/~hastie/sldm.html

This two-day course gives a detailed overview of statistical models for data mining, inference and prediction. With rapid developments in internet technology, genomics, financial risk modeling, and other high-tech industries, we increasingly rely on data analysis and statistical models to exploit the vast amounts of data at our fingertips.

This course is the third in a series, and follows our popular past offerings "Modern Regression and Classification", and "Statistical Learning and Data Mining".

The two earlier courses are not prerequisites for this new course.

In this course we emphasize the tools useful for tackling modern-day data analysis problems. These include gradient boosting, SVMs and kernel methods, random forests, lasso and LARS, ridge regression and GAMs, supervised principal components, and cross-validation. We also present some interesting case studies in a variety of application areas.

This course focuses on both "tall" data (N > p, where N = number of cases and p = number of features) and "wide" data (p > N). Typical examples of tall data are credit risk and churn prediction, and email spam filtering. Topics include linear and ridge regression, lasso, and LARS, support vector machines, random forests and boosting. We give an in-depth discussion of validation, cross-validation and test set issues.

For wide data, typical examples are gene expression and protein mass spectrometry data, and image analysis. Topics include clustering and data visualization, false discovery rates and SAM, regularized logistic regression and discriminant analysis, supervised and unsupervised principal components, support vector machines and the kernel trick, and careful model selection strategies.

The material is based on recent papers by the authors and other researchers, as well as the best-selling book:

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

Hastie, Tibshirani & Friedman, Springer-Verlag, 2001

http://www-stat.stanford.edu/ElemStatLearn/

Prof. Robert Tibshirani
Depts of Health Research and Policy, and Statistics
Stanford Univ
Stanford CA 94305
http://www-stat.stanford.edu/~tibs
