The Best Free Data Science eBooks: 2020 Update

The author has updated their list of best free data science books for 2020. Read on to see what books you should grab.

By Brenda Hali, Marketing Data Specialist


Image by the Author (Brenda Hali)


We are in an ever-advancing industry, and learning resources are unlimited.

Last year I put together a compilation of ebooks that have helped me on my data science learning path and that mentors and professors have recommended for tackling specific projects or deepening my understanding of concepts.

As I spent time deepening my learning, I discovered new books that I hadn’t recommended before and found updates to books I had already recommended. All the eBooks are legally free or offered on a ‘Pay What You Want’ basis with a $0 minimum.

If you enjoyed a book and you have the resources to do so, I suggest that you look for a way to support the author by buying the printed version, supporting them on Patreon, or Buying them a Coffee.

Let’s keep quality education content available for the masses.

Disclaimer: Python and sometimes R are my go-to programming languages, which is why most of the books are based on them. If you have recommendations for books in other languages, please share them in the comments or send me a tweet and I will add them.


Probability and Statistics

OpenIntro Statistics (2019) by David Diez, Mine Cetinkaya-Rundel, Christopher Barr, and OpenIntro

Description: A complete foundation for Statistics, also serving as a foundation for Data Science. OpenIntro Statistics offers a traditional introduction to statistics at the college level. This textbook is widely used at the college level and offers an exceptional and accessible introduction for students from community colleges to the Ivy League.

Introduction to Probability (2019), the official book for Harvard’s Stats 110, by Joseph K. Blitzstein and Jessica Hwang

Description: This book provides essential language and tools for understanding statistics, randomness, and uncertainty. The book explores a wide variety of applications and examples, ranging from coincidences and paradoxes to Google PageRank and Markov chain Monte Carlo (MCMC). Additional application areas explored include genetics, medicine, computer science, and information theory.

The authors present the material in an accessible style and motivate concepts using real-world examples. Be prepared: it is a big book!

Also, check out their great probability cheat sheet here.

Probabilistic Programming & Bayesian Methods for Hackers (2020) by Cam Davidson-Pilon

Description: Bayesian Methods for Hackers is designed as an introduction to Bayesian inference from a computational/understanding-first, and mathematics-second, point of view. Of course, as an introductory book, we can only leave it at that: an introductory book. For the mathematically trained, they may cure the curiosity this text generates with other texts designed with mathematical analysis in mind. For the enthusiast with a less mathematical background or one who is not interested in mathematics but simply the practice of Bayesian methods, this text should be sufficient and entertaining.

Check out their amazing GitHub repo, which uses TensorFlow, here.

Practical Statistics for Data Scientists (2017) by Peter Bruce and Andrew Bruce

Description: This book is aimed at the data scientist with some familiarity with the R programming language and with some prior (perhaps spotty or ephemeral) exposure to statistics. Both of us came to the world of data science from the world of statistics, so we have some appreciation of the contribution that statistics can make to the art of data science. At the same time, we are well aware of the limitations of traditional statistics instruction: statistics as a discipline is a century and a half old, and most statistics textbooks and courses are laden with the momentum and inertia of an ocean liner.



R Programming for Data Science by Roger D. Peng

Description: This book brings the fundamentals of R programming to you, using the same material developed as part of the industry-leading Johns Hopkins Data Science Specialization. The skills taught in this book will lay the foundation for you to begin your journey learning data science.

Exploratory Data Analysis with R by Roger D. Peng

Description: This book teaches you to use R to effectively visualize and explore complex datasets. Exploratory data analysis is a key part of the data science process because it allows you to sharpen your question and refine your modeling strategies. This book is based on the industry-leading Johns Hopkins Data Science Specialization.

Data Science at the Command Line (2020) by Jeroen Janssens

Description: This book shows you how to:

  • Obtain data from websites, APIs, databases, and spreadsheets
  • Perform scrub operations on text, CSV, HTML/XML, and JSON
  • Explore data, compute descriptive statistics, and create visualizations
  • Manage your data science workflow
  • Create reusable command-line tools from one-liners and existing Python or R code
  • Parallelize and distribute data-intensive pipelines
  • Model data with dimensionality reduction, clustering, regression, and classification algorithms

Python 101 (updated 2019) by Michael Driscoll

Description: Learn how to program with Python 3 from beginning to end. Python 101 starts off with the fundamentals of Python and then builds onto what you’ve learned from there. The audience of this book is primarily people who have programmed in the past but want to learn Python. This book covers a fair amount of intermediate level material in addition to the beginner material.

Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper

Description: This book is a practical introduction to NLP. You will learn by example, write real programs, and grasp the value of being able to test an idea through implementation. If you haven’t learned already, this book will teach you programming. Unlike other programming books, we provide extensive illustrations and exercises from NLP. The approach we have taken is also principled, in that we cover the theoretical underpinnings and don’t shy away from careful linguistic and computational analysis. We have tried to be pragmatic in striking a balance between theory and application, identifying the connections and the tensions. Finally, we recognize that you won’t get through this unless it is also pleasurable, so we have tried to include many applications and examples that are interesting and entertaining, sometimes whimsical.

Mining of Massive Datasets (2019) by Jure Leskovec (Stanford University), Anand Rajaraman (Rocketship Ventures), and Jeffrey D. Ullman (Stanford University)

Description: This book focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of their examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort.

Machine Learning Yearning (2016) by Andrew Ng

Description: AI is transforming numerous industries. Machine Learning Yearning teaches you how to structure machine learning projects.
This book is focused not on teaching you ML algorithms, but on how to make ML algorithms work. After reading Machine Learning Yearning, you will be able to:

  • Prioritize the most promising directions for an AI project
  • Diagnose errors in a machine learning system
  • Build ML in complex settings, such as mismatched training/test sets
  • Set up an ML project to compare to and/or surpass human-level performance
  • Know when and how to apply end-to-end learning, transfer learning, and multi-task learning


Leading a Data Science Team

Executive Data Science (2018) by Brian Caffo, Roger D. Peng, and Jeffrey Leek

Description: This book teaches you how to assemble and lead a data science enterprise so that your organization can move towards extracting information from big data.

Is there another ebook that MUST be on this list? Share it with me in the comments or send me a tweet.

Bio: Brenda Hali (LinkedIn) is a Marketing Data Specialist based in Washington, D.C. She is passionate about women's inclusion in technology and data.

Original. Reposted with permission.