KDnuggets Big Data Science Summer Reading List
Here is our summer book list, covering Big Data, Data Science, the power of Predictive Analytics, Learning and Optimization, and fun science fiction that is appreciated even more by those who understand NP-completeness and emacs.
By Gregory Piatetsky, Jul 10, 2013.
I was inspired by the ZDNet big data summer reading list and came up with my own summer reading list for Analytics, Big Data, and Data Science, plus some nerd science fiction.
Here are my recommendations.
What are you going to read this summer? Please comment below.
Data Science and its Relationship to Big Data and Data-Driven Decision Making, a paper by F. Provost and T. Fawcett from the inaugural issue of the Big Data journal (which is freely available).
... there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz. In this article, we argue that there are good reasons why it has been hard to pin down exactly what is data science.
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, by Eric Siegel.
It has been called "The Freakonomics of big data," and "the definitive book of this industry" that is "an operating manual for 21st century life."
Here is my capsule summary:
"Written in lively language, full of great quotes, real-world examples, and case studies, it is a pleasure to read. The more technical audience will enjoy the chapters on The Ensemble Effect and uplift modeling, both very hot trends. I highly recommend this book!"
Big Data: A Revolution that Will Transform How We Live, Work and Think, by Viktor Mayer-Schonberger and Kenneth Cukier.
It is a comprehensive and very readable overview of the benefits and risks associated with big data, mainly addressed to non-technical people. This book has received many reviews, and some reviewers complained that Mayer-Schonberger and Cukier were too optimistic about the power of Big Data.
Gil Press asked the authors if they are cheerleaders for big data, and Ken Cukier replied:
"We are messengers of big data, not its evangelists"
LIONbook: Learning and Intelligent Optimization, by Roberto Battiti and Mauro Brunato, freely available on the web, chapter by chapter.
The LION: "Learning and Intelligent OptimizatioN" approach is the combination of learning from data and optimization applied to solve complex and dynamic problems.
This book was written by the developers of LionSolver software.
Numbers Rule Your World: The Hidden Influence of Probabilities and Statistics on Everything You Do, by Kaiser Fung, a professional statistician with expertise in marketing and advertising analytics. From the author's blog:
I analyze claims made in the media that are supported by analyses of data. I show you how I dissect these claims to decide whether they are credible, or they are bogus.
The ability to analyze and interpret data analyses will be a critical skill in the world of Big Data. So far, the conversation around Big Data is focused around the collection and processing of mountains of data. The real challenge of Big Data is the proliferation of data analyses: it will be a confusing world of claims and counterclaims.
Mining of Massive Datasets, by A. Rajaraman and J. Ullman.
For those who are serious about learning data science, I recommend this excellent book by top Stanford researchers which covers Data Mining, Map-Reduce, Finding similar items, Mining Data Streams, and much more.
By agreement with the publisher, you can still download a draft version free from
The Laundry Files series, by Charles Stross.
Bob Howard is a computer-hacker desk jockey, who has more than enough trouble keeping up with the endless paperwork he has to do on a daily basis. He should never be called on to do anything remotely heroic. But for some reason, he is.
Great lighter reading for my fellow nerds!
Start with the first book, The Atrocity Archives.