The Best Free Data Science eBooks: 2020 Update
The author has updated their list of best free data science books for 2020. Read on to see what books you should grab.
By Brenda Hali, Marketing Data Specialist
We are in an ever-advancing industry, and learning resources are unlimited.
Last year I put together a compilation of ebooks that have helped me on my data science learning path and that mentors and professors have recommended for tackling specific projects or deepening my understanding of concepts.
As I spent time deepening my learning, I discovered new books that I hadn’t recommended before and found updated editions of books I had already recommended. All the eBooks are legally free or offered on a ‘Pay What You Want’ basis with a $0 minimum.
If you enjoyed a book and you have the resources to do so, I suggest that you look for a way to support the author by buying the printed version, supporting them on Patreon, or buying them a coffee.
Let’s keep quality education content available for the masses.
Disclaimer: Python and sometimes R are my go-to programming languages, which is why most of the books are based on them. If you have recommendations for books in other languages, please share them in the comments or send me a tweet and I will add them.
Probability and Statistics
OpenIntro Statistics (2019) by David Diez, Mine Çetinkaya-Rundel, Christopher Barr, and OpenIntro
Description: A complete foundation for Statistics, also serving as a foundation for Data Science. OpenIntro Statistics offers a traditional introduction to statistics at the college level. This textbook is widely used at the college level and offers an exceptional and accessible introduction for students from community colleges to the Ivy League.
Introduction to Probability (2019), the official book for Harvard’s Stats 110, by Joseph K. Blitzstein and Jessica Hwang
Description: This book provides essential language and tools for understanding statistics, randomness, and uncertainty. The book explores a wide variety of applications and examples, ranging from coincidences and paradoxes to Google PageRank and Markov chain Monte Carlo (MCMC). Additional application areas explored include genetics, medicine, computer science, and information theory.
The authors present the material in an accessible style and motivate concepts using real-world examples. Be prepared, it is a big book!
Also, check out their great probability cheat sheet here.
Probabilistic Programming & Bayesian Methods for Hackers (2020) by Cam Davidson-Pilon
Description: Bayesian Methods for Hackers is designed as an introduction to Bayesian inference from a computational/understanding-first, and mathematics-second, point of view. Of course, as an introductory book, we can only leave it at that: an introductory book. For the mathematically trained, they may cure the curiosity this text generates with other texts designed with mathematical analysis in mind. For the enthusiast with a less mathematical background or one who is not interested in mathematics but simply the practice of Bayesian methods, this text should be sufficient and entertaining.
Check out their amazing GitHub repo, which uses TensorFlow, here.
Practical Statistics for Data Scientists (2017) by Peter Bruce and Andrew Bruce
Description: This book is aimed at the data scientist with some familiarity with the R programming language and with some prior (perhaps spotty or ephemeral) exposure to statistics. Both of us came to the world of data science from the world of statistics, so we have some appreciation of the contribution that statistics can make to the art of data science. At the same time, we are well aware of the limitations of traditional statistics instruction: statistics as a discipline is a century and a half old, and most statistics textbooks and courses are laden with the momentum and inertia of an ocean liner.
Programming
R Programming for Data Science by Roger D. Peng
Description: This book brings the fundamentals of R programming to you, using the same material developed as part of the industry-leading Johns Hopkins Data Science Specialization. The skills taught in this book will lay the foundation for you to begin your journey learning data science.
Exploratory Data Analysis with R by Roger D. Peng
Description: This book teaches you to use R to effectively visualize and explore complex datasets. Exploratory data analysis is a key part of the data science process because it allows you to sharpen your question and refine your modeling strategies. This book is based on the industry-leading Johns Hopkins Data Science Specialization.
Data Science at the Command Line (2020) by Jeroen Janssens
Description: This book shows you how to:
 Obtain data from websites, APIs, databases, and spreadsheets
 Perform scrub operations on text, CSV, HTML/XML, and JSON
 Explore data, compute descriptive statistics, and create visualizations
 Manage your data science workflow
 Create reusable command-line tools from one-liners and existing Python or R code
 Parallelize and distribute data-intensive pipelines
 Model data with dimensionality reduction, clustering, regression, and classification algorithms
Python 3 101 (updated 2019) by Michael Driscoll
Description: Learn how to program with Python 3 from beginning to end. Python 101 starts off with the fundamentals of Python and then builds onto what you’ve learned from there. The audience of this book is primarily people who have programmed in the past but want to learn Python. This book covers a fair amount of intermediate level material in addition to the beginner material.
Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper
Description: This book is a practical introduction to NLP. You will learn by example, write real programs, and grasp the value of being able to test an idea through implementation. If you haven’t learned already, this book will teach you programming. Unlike other programming books, we provide extensive illustrations and exercises from NLP. The approach we have taken is also principled, in that we cover the theoretical underpinnings and don’t shy away from careful linguistic and computational analysis. We have tried to be pragmatic in striking a balance between theory and application, identifying the connections and the tensions. Finally, we recognize that you won’t get through this unless it is also pleasurable, so we have tried to include many applications and examples that are interesting and entertaining, sometimes whimsical.
Mining of Massive Datasets (2019) by Jure Leskovec (Stanford University), Anand Rajaraman (Rocketship Ventures), and Jeffrey D. Ullman (Stanford University)
Description: This book focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of their examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort.
Machine Learning Yearning (2016) by Andrew Ng
Description: AI is transforming numerous industries. Machine Learning Yearning teaches you how to structure Machine Learning projects.
This book is focused not on teaching you ML algorithms, but on how to make ML algorithms work. After reading Machine Learning Yearning, you will be able to:
 Prioritize the most promising directions for an AI project
 Diagnose errors in a machine learning system
 Build ML in complex settings, such as mismatched training/test sets
 Set up an ML project to compare to and/or surpass human-level performance
 Know when and how to apply end-to-end learning, transfer learning, and multi-task learning
Leading a Data Science Team
Executive Data Science (2018) by Brian Caffo, Roger D. Peng, and Jeffrey Leek
Description: This book teaches you how to assemble and lead a data science enterprise so that your organization can move towards extracting information from big data.
Is there another ebook that MUST be on this list? Share it with me in the comments or send me a tweet at https://twitter.com/brendahali
Bio: Brenda Hali (LinkedIn) is a Marketing Data Specialist based in Washington, D.C. She is passionate about women's inclusion in technology and data.
Original. Reposted with permission.

