Short course: Statistical Learning and Data Mining IV, Washington, DC, Oct 19-20

This new two-day course gives a detailed and modern overview of statistical models used by data scientists for prediction and inference, including sparse models and deep learning.



by Trevor Hastie and Robert Tibshirani

Stanford University

State-of-the-Art Statistical Methods for Data Science,
including sparse models and deep learning

Georgetown Conference Center, Washington, DC, Oct 19-20, 2016

This new two-day course gives a detailed and modern overview of statistical models used by data scientists for prediction and inference. With the rapid developments in internet technology, genomics, financial risk modeling, and other high-tech industries, we rely increasingly on data analysis and statistical models to exploit the vast amounts of data at our fingertips.

In this course we emphasize the tools useful for tackling modern-day data analysis problems. Many of these are essential building blocks, but we also include cutting-edge techniques for handling big-data problems. From the vast array of tools available, we have selected those we consider the most relevant and exciting.

Our topics include:

Linear methods: regression, logistic regression (binary and multiclass), Cox model.

Bootstrap, cross-validation, and permutation methods.

Regularized linear models: ridge, lasso, elastic net. Post-selection inference. The glmnet package in R, and other software.

Trees, random forests, and boosting.

Unsupervised methods: clustering (prototype, hierarchical, spectral, ...), principal components and other low-rank methods, sparse decompositions.

Support-vector machines and kernel methods.

Deep learning and neural networks.

Our earlier courses are not a prerequisite for this new course. Although there is overlap with past courses, our new course contains topics not covered by us before. We illustrate many of the methods using examples developed in R.

The material is based on recent papers by the authors and other researchers, as well as our best-selling book:

The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd Edition) (with J. Friedman, Springer-Verlag, 2009).

The lectures will consist of high-quality projected presentations and discussion. All attendees will receive a copy of The Elements of Statistical Learning, as well as a color booklet containing the course slides in a convenient two-up, double-sided format.

The authors have two other popular books that are also relevant to this course:
  • An Introduction to Statistical Learning, with Applications in R (with Gareth James and Daniela Witten, Springer-Verlag, 2013).
  • Statistical Learning with Sparsity: The Lasso and Generalizations (with Martin Wainwright, Chapman and Hall, 2015).
All three books are available for free in PDF form from our websites.

Go to

www-stat.stanford.edu/~hastie/sldm.html

for more information and online registration.