15 Free Data Science, Machine Learning & Statistics eBooks for 2021

We present a curated list of 15 free eBooks compiled in a single location to close out the year.

At KDnuggets, we have brought a number of free eBooks to our readers this past year. Among other articles highlighting such materials, I have written a series of posts since the pandemic erupted, on the chance that people spending more time at home might also have more time for reading. A packed reading schedule is not what the past nine months have had in store for everyone, of course, but for those who could spare a moment here or there, we hope that some of the eBooks we have shared during this tough time have been useful.


Going back through the books I reviewed in 2020, I have decided to close out the year by compiling a list of 15 in a single place. If you missed a few of them, or all of them, this is your chance to catch up on some reading.

With that, here are 15 top-notch free eBooks, presented in no particular order, to start 2021 with, each alongside a selection from my original review.

Data Science and Machine Learning: Mathematical and Statistical Methods, by D.P. Kroese, Z.I. Botev, T. Taimre & R. Vaisman

Data Science and Machine Learning: Mathematical and Statistical Methods is a practically oriented text, with a focus on doing data science and implementing machine learning models using Python. It does a good job of explaining relevant theory and introducing the necessary math as needed, which results in very nice pacing for a practical book.

Text Mining with R: A Tidy Approach, by Julia Silge and David Robinson

Text Mining with R: A Tidy Approach is code-heavy and explains concepts well. The focus is on practical implementation, which should come as no surprise given the book's title, and for an R novice it seems to do a very good job. I have not worked through the entire book, but I did read the first two chapters and feel that I got out of them what was intended.

Causal Inference: What If, by Miguel Hernán and James M. Robins

Causal inference is a complex, encompassing topic, but the authors of this book have done their best to condense what they see as the most important fundamental aspects into ~300 pages of text. With few accessible books dedicated to the subject, this one may be your go-to choice if you are interested in building your own conceptual foundation.

Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence, by Yoni Nazarathy and Hayden Klok

After introducing the Julia language itself, the book gets into statistical concepts, and from that point onward the concepts build upon one another, leading up to more advanced topics such as statistical inference, confidence intervals, hypothesis testing, linear regression, machine learning, and more.

This is the resource I've been waiting for to effectively learn Julia for data science the way I have been wanting to learn it. I hope you're as excited as I am to get moving on your journey.

Foundations of Data Science, by Avrim Blum, John Hopcroft, and Ravindran Kannan

In many contemporary books, data science has been reduced to a series of programming tools which, if mastered, promise to do the data science for you. There seems to be less emphasis on the underlying concepts and theory divorced from code. This book is a good counterexample to that trend, one which will undoubtedly arm you with the theoretical knowledge necessary to approach a career in data science with a strong set of foundations.

Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David

Once the possible shock of math-heavy theory wears off, you will find thorough treatments of topics from bias-variance trade-off to linear regression to model validation strategies to model boosting to kernel methods to prediction problems and beyond. And the benefit of such a thorough treatment is that your understanding will go deeper than just grasping the abstract intuition.

Natural Language Processing with Python, by Steven Bird, Ewan Klein and Edward Loper

The book starts off slow, describing NLP, how Python can be used to perform some NLP programming tasks, and how to access natural language content for processing. It then moves on to bigger concepts, both conceptual (NLP) and programmatic (Python). Soon it gets to categorization, text classification, information extraction, and other topics more often thought of as classic NLP. After getting the basics of NLP from this book, you can move on to more modern, cutting-edge techniques, perhaps through materials such as some of Stanford's free courses.

Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD, by Jeremy Howard and Sylvain Gugger

The book is unusual in that it's taught "top down". We teach almost everything through real examples. As we build out those examples, we go deeper and deeper, and we'll show you how to make your projects better and better. This means that you'll be gradually learning all the theoretical foundations you need, in context, in such a way that you'll see why it matters and how it works. We've spent years building tools and teaching methods that make previously complex topics very simple.
—Jeremy Howard

Python For Everybody, by Charles R. Severance

448 ratings of the book with an average of 4.6 out of 5 should tell you that many others have also found Python for Everybody useful. The consensus seems to be that the book quickly covers concepts, does so in an easily understandable manner, and jumps right into the corresponding code.

Automated Machine Learning: Methods, Systems, Challenges, edited by Frank Hutter, Lars Kotthoff and Joaquin Vanschoren

If you have little to no understanding of what automated machine learning is in practice, don't worry. The book starts off with a solid introduction to the topic, and lays out explicitly what you can expect chapter by chapter, which is important in a book composed of independent chapters. After this, in the first section of the book, you get right into reading about the important topics of contemporary AutoML, and you can be confident of their currency, since the book was put together in 2019. The next section is a walkthrough of a half-dozen tools for implementing these AutoML concepts. The last section is an analysis of the AutoML Challenge Series, which ran from 2015 to 2018, the period when interest in automated approaches to machine learning seemed to explode.

Deep Learning, by Ian Goodfellow, Yoshua Bengio and Aaron Courville

This is a bottom-up, theory-heavy treatise on deep learning. This is not a book full of code and corresponding comments, or a surface-level, hand-wavy overview of neural networks. This is an in-depth, mathematics-based explanation of the field.

Dive Into Deep Learning, by Aston Zhang, Zachary C. Lipton, Mu Li and Alexander J. Smola

What makes Dive into Deep Learning (D2L) unique is that we went so far with the idea of *learning by doing* that the entire book itself consists of runnable code. We tried to combine the best aspects of a textbook (clarity and math) with the best aspects of hands-on tutorials (practical skills, reference code, implementation tricks, and intuition). Each chapter section teaches a single key idea through multiple modalities, interweaving prose, math, and a self-contained implementation that can easily be grabbed and modified to give your projects a running start. We think this approach is essential for teaching deep learning because so much of the core knowledge in deep learning is derived from experimentation (vs. first principles).
—Zachary Lipton

Mathematics for Machine Learning, by Marc Peter Deisenroth, A Aldo Faisal and Cheng Soon Ong

The first part of the book covers pure mathematical concepts, without getting into machine learning at all. The second part turns its attention to applying these newfound math skills to machine learning problems. Depending on your preference, you could take either a top-down or bottom-up approach to learning both machine learning and its underlying math, or pick one part or the other on which to focus.

The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

All this is to say that the authors, who are also researchers and instructors, have a deliberate approach to conveying their expertise. Their method seems to follow a logically ordered progression of what, and when, readers should be learning. However, individual chapters stand on their own as well, so picking up the book and heading straight to the chapter on model inference, for example, will work perfectly well, so long as you already have an understanding of what comes before it in the book.

An Introduction to Statistical Learning, with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

An Introduction to Statistical Learning, with Applications in R (ISLR) can be considered a less advanced treatment of the topics found in another classic of the genre written by some of the same authors, The Elements of Statistical Learning. Another major difference between these two titles, beyond the depth of the material covered, is that ISLR introduces these topics alongside practical implementations in a programming language, in this case R.