Top 10 Amazon Books in Data Mining, 2016 Edition
Given the ongoing explosion in interest for all things Data Mining, Data Science, Analytics, Big Data, etc., we have updated our Amazon top books lists from last year. Here are the 10 most popular titles in the Data Mining category.
The recent explosion of interest in data science, data mining, and related disciplines has been mirrored by a surge in book titles on these same topics. One of the best ways to decide which books could be useful for your career is to look at which books others are reading. This post details the 10 most popular titles in Amazon's Data Mining Books category as of Nov 10, 2016, skipping over repeated titles as well as titles that have been obviously miscategorized and are of no use to our readers.
Note: KDnuggets gets absolutely no royalties from Amazon - this list is presented only to help our readers evaluate interesting books.
1. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition
Trevor Hastie, Robert Tibshirani, Jerome Friedman
4.1 out of 5 stars (78 reviews)
This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting (the first comprehensive treatment of this topic in any book).
2. Data Science from Scratch: First Principles with Python 1st Edition
Joel Grus
4.2 out of 5 stars (65 reviews)
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.
3. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking 1st Edition
Foster Provost, Tom Fawcett
4.6 out of 5 stars (152 reviews)
Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.
4. Data Analytics Made Accessible
Anil Maheshwari
This book fills the need for a concise and conversational book on the growing field of Data Analytics and Big Data. Easy to read and informative, this lucid book covers everything important, with concrete examples, and invites the reader to join this field. The chapters in the book are organized for a typical one-semester course. The book contains case-lets from real-world stories at the beginning of every chapter.
5. Python Machine Learning
Sebastian Raschka
Python Machine Learning gives you access to the world of predictive analytics and demonstrates why Python is one of the world’s leading data science languages. If you want to ask better questions of data, or need to improve and extend the capabilities of your machine learning systems, this practical data science book is invaluable. Covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras, and featuring guidance and tips on everything from sentiment analysis to neural networks, you’ll soon be able to answer some of the most important questions facing you and your organization.
6. Data Smart: Using Data Science to Transform Information into Insight 1st Edition
John W. Foreman
4.7 out of 5 stars (105 reviews)
Data science is little more than using straightforward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that's done within the familiar environment of a spreadsheet.
7. Hadoop: The Definitive Guide
Tom White
Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You’ll learn about recent changes to Hadoop, and explore new case studies on Hadoop’s role in healthcare systems and genomics data processing.
8. R in Action: Data Analysis and Graphics with R 2nd Edition
Robert Kabacoff
4.8 out of 5 stars (33 reviews)
R in Action, Second Edition teaches you how to use the R language by presenting examples relevant to scientific, technical, and business developers. Focusing on practical solutions, the book offers a crash course in statistics, including elegant methods for dealing with messy and incomplete data. You'll also master R's extensive graphical capabilities for exploring and presenting data visually. And this expanded second edition includes new chapters on forecasting, data mining, and dynamic report writing.
9. Doing Data Science: Straight Talk from the Frontline 1st Edition
Cathy O'Neil, Rachel Schutt
4.0 out of 5 stars (50 reviews)
In many of the book's chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.
10. Big Data: Principles and best practices of scalable realtime data systems 1st Edition
Nathan Marz, James Warren
4.4 out of 5 stars (33 reviews)
Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built.