More Free Data Mining, Data Science Books and Resources

More free resources and online books by leading authors about data mining, data science, machine learning, predictive analytics and statistics.



The list below is based on the list compiled by Pedro Martins, but we added the book authors and year, sorted the entries alphabetically by title, fixed spelling, and removed the links that did not work. The descriptions are by Pedro.

  1. An Introduction to Data Science by Jeffrey Stanton, Robert De Graaf, 2013.
    An introductory-level resource developed at Syracuse University.
  2. An Introduction to Statistical Learning: with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, 2013.
    An overview of statistical learning applied to large datasets. Techniques for exploring the data are discussed using the R programming language.
  3. A Programmer’s Guide to Data Mining by Ron Zacharski, 2012.
    A guide to data mining concepts from a programmer's point of view. It provides several hands-on problems for practicing and testing the topics covered in this online book.
  4. Bayesian Reasoning and Machine Learning by David Barber, 2012.
    A book on Bayesian reasoning, focusing on applying it to machine learning algorithms and processes. It is a hands-on resource that helps the reader absorb the material in the book.
  5. Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners by Jared Dean, 2014.
    This resource explores the reality of big data and its benefits from a marketing point of view. It also explains how to store this kind of data and which algorithms, based on data mining and machine learning, can be used to process it.
  6. Data Mining and Analysis: Fundamental Concepts and Algorithms by Mohammed J. Zaki and Wagner Meira, Jr., Cambridge University Press, May 2014.
    Thorough coverage of exploratory data mining algorithms and machine learning processes. These explanations are complemented by some statistical analysis.
  7. Data Mining and Business Analytics with R by Johannes Ledolter, 2013.
    Another R-based book, describing the processes and implementations used to explore, transform, and store information. It also focuses on the concept of business analytics.
  8. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management by Michael J.A. Berry, Gordon S. Linoff, 2004.
    A data mining book oriented specifically toward marketing and business management, with great case studies that show how to apply these techniques in the real world.
  9. Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery by Graham Williams, 2011.
    The objective of this book is to provide plenty of information on data manipulation. It focuses on the Rattle toolkit and the R language to demonstrate how these techniques are implemented.
  10. Gaussian Processes for Machine Learning by Carl Edward Rasmussen and Christopher K. I. Williams, 2006.
    A theoretical book on learning algorithms based on probabilistic Gaussian processes. It addresses supervised learning problems, describing the related machine learning models and solutions.
  11. Inductive Logic Programming: Techniques and Applications by Nada Lavrac and Saso Dzeroski, 1994.
    An older book on inductive logic programming with valuable theoretical and practical information, referencing some important tools.
  12. Information Theory, Inference, and Learning Algorithms by David MacKay, 2009.
    An interesting approach that combines information theory with inference and learning. The book teaches many data mining techniques and relates them to information theory.
  13. Introduction to Machine Learning by Amnon Shashua, 2008.
    A simple yet important book that introduces readers to the subject of machine learning.
  14. Machine Learning by Abdelhamid Mellouk and Abdennacer Chebira, 2009.
    A comprehensive book on machine learning that covers several specific and very useful techniques.
  15. Machine Learning, Neural and Statistical Classification by Donald Michie, David Spiegelhalter, Charles Taylor, 1994.
    A classic book on statistical methodology, learning techniques, and other important issues related to machine learning.
  16. Machine Learning - Wikipedia Guide
    A great resource from Wikipedia that assembles a wide range of machine learning topics into a simple yet useful and complete guide.
  17. Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman, 2014.
    This book focuses on providing the tools and knowledge needed to manage, manipulate, and consume large amounts of information in databases.
  18. Modeling With Data by Ben Klemens, 2008.
    This book focuses on processes for solving analytical problems with data. In particular, it explains the theory behind building tools for exploring large datasets.
  19. Pattern Recognition and Machine Learning (Information Science and Statistics) by Christopher Bishop, 2006.
    This book presents a broad treatment of pattern recognition from a Bayesian perspective. Many machine learning concepts are covered and illustrated with examples.
  20. Probabilistic Programming & Bayesian Methods for Hackers by Cam Davidson-Pilon, 2013.
    A book about Bayesian methods, which provide the capability to solve very complex problems. It also discusses implementations in the Python language.
  21. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition by Trevor Hastie, Robert Tibshirani and Jerome Friedman, 2009.
    A conceptual book on data mining and prediction from a statistical point of view. It also covers many machine learning topics.
  22. Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto, 2015.
    A solid introduction to reinforcement learning that presents solution methods. It also describes some very important case studies.
  23. Think Bayes: Bayesian Statistics Made Simple by Allen B. Downey, 2013.
    A Python-based approach to Bayesian statistical methods, in which these techniques are applied to real-world problems and simulations.

Bio: Pedro Martins is one half of the Data On Focus team and is fascinated by all issues related to IT. He is in his mid-twenties, from Portugal, and has an informatics engineering background and a passion for data mining and data science.
