State-of-the-Art Statistical Methods for Data Analysis:
Ten Hot Ideas for Learning from Data
Trevor Hastie and Robert Tibshirani, Stanford University
Harvard Conference Center, Boston
September 22 and 23, 2011
This two-day course gives a detailed overview of statistical models
for data mining, inference and prediction. With the rapid
developments in internet technology, genomics, financial risk
modeling, and other high-tech industries, we increasingly rely on
data analysis and statistical models to exploit the vast amounts of
data at our fingertips.
This course is the third in a series, and follows our popular past
offerings "Modern Regression and Classification", and "Statistical Learning and Data Mining".
Neither of the two earlier courses is a prerequisite for this new course.
In this course we emphasize the tools useful for tackling modern-day
data analysis problems. These include gradient boosting, SVMs and
kernel methods, random forests, lasso and LARS, ridge regression and
GAMs, supervised principal components, and cross-validation. We also
present some interesting case studies in a variety of application areas.
This course focuses on both "tall" data (N > p, where N = number of cases
and p = number of features) and "wide" data (p > N). Typical examples of
tall data are credit risk prediction, churn prediction, and email spam
filtering. Topics include linear and ridge regression, lasso and LARS,
support vector machines, random forests, and boosting. We give an in-depth
discussion of validation, cross-validation, and test-set issues.
For wide data, typical examples are gene expression and protein mass
spectrometry data, and data from signals and images. Topics include
clustering and data visualization, false discovery rates and SAM,
regularized logistic regression and discriminant analysis, supervised
and unsupervised principal components, support vector machines and the
kernel trick, and the careful use of model selection strategies.
The material is based on recent papers by the authors and other
researchers, as well as the best-selling book:
The Elements of Statistical Learning: Data Mining, Inference, and Prediction,
Hastie, Tibshirani & Friedman, Springer-Verlag, 2008 (2nd edition)
A copy of this book will be given to all attendees.
The lectures will consist of video-projected presentations and discussion.
Go to the site
for more information and registration details.