An Introduction to Statistical Learning: The Free eBook

This week's free eBook is a classic of data science, An Introduction to Statistical Learning, with Applications in R. If you are interested in picking up elementary statistical learning concepts and learning how to implement them in R, this book is for you.

After taking a week off, here's another free eBook offering to add to your collection.

This time, let's check out another classic of the genre, An Introduction to Statistical Learning, with Applications in R, written by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. The book, a staple of statistical learning texts, is accessible to readers of all levels, and can be read without much existing foundational knowledge in the area.

From the book's website:

This book provides an introduction to statistical learning methods. It is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.

An Introduction to Statistical Learning, with Applications in R (ISLR) can be considered a less advanced treatment of the topics found in another classic of the genre written by some of the same authors, The Elements of Statistical Learning (ESL). Beyond the depth of the material covered, the other major difference between the two titles is that ISLR introduces these topics alongside practical implementations in a programming language, in this case R.

The book's preface explicitly addresses the relationship between these two texts, as well as potential readership:

We consider ESL to be an important companion for professionals (with graduate degrees in statistics, machine learning, or related fields) who need to understand the technical details behind statistical learning approaches. However, the community of users of statistical learning techniques has expanded to include individuals with a wider range of interests and backgrounds. Therefore, we believe that there is now a place for a less technical and more accessible version of ESL.

The book's table of contents is as follows:

  1. Introduction
  2. Statistical Learning
  3. Linear Regression
  4. Classification
  5. Resampling Methods
  6. Linear Model Selection and Regularization
  7. Moving Beyond Linearity
  8. Tree-Based Methods
  9. Support Vector Machines
  10. Unsupervised Learning

There are lots of books available, including free ones, on the ample theory involved in data science and machine (and statistical) learning. It should be apparent from the website and preface excerpts and the table of contents above (and perhaps even the title) that this book focuses on the practical.

If you have some idea of the theoretical concepts related to the topics in the table of contents, ISLR is especially helpful. Already have a good understanding of classification concepts, but want to implement them using R? This book's for you. Want to learn about implementing linear models in R? This book's for you. Interested in effectively implementing support vector machines using R? Again, this book's for you.

But don't take my word for it! Some reviews of and reactions to this book from influential readers:

"ISL makes modern methods accessible to a wide audience without requiring a background in Statistics or Computer Science. The authors give precise, practical explanations of what methods are available, and when to use them, including explicit R code. Anyone who wants to intelligently analyze complex data should own this book."

—Larry Wasserman, Professor, Department of Statistics and Department of Machine Learning, CMU.

"It’s thorough, lively, written at level appropriate for undergraduates and usable by nonexperts. It’s chock full of interesting examples of how modern predictive machine learning algorithms work (and don’t work) in a variety of settings."

—Matthew Richey, The American Mathematical Monthly, Vol. 123, No. 7 (August-September 2016).

Also, note that, while the book's exercises are in R, Giannis Tolios has pointed out the following on Facebook:

This book is a great introduction to the theoretical aspect of machine learning. In case you are a Python developer, and are deterred by the use of R, you should reconsider, as R is only used for the practical examples at the end of each chapter. Furthermore, there are Python versions of those examples in the following Github repository:

You can access a PDF here. Code for the labs in the book is available here.

Best of luck with the latest free eBook in our growing collection. Next week will bring another.