My Brief Guide to Big Data and Predictive Analytics for non-experts

My brief guide to Big Data and Predictive Analytics for non-experts suggests key books, films, and websites to learn more.

By Gregory Piatetsky, @kdnuggets.

Last October, The Guardian newspaper asked me to contribute to a feature called "The Experts' Guide to the 21st Century", where each expert would direct readers to 4-5 key books, films, or websites they should investigate in their field. My topic was Big Data and Predictive Analytics.

I submitted my writeup in October and their editor told me she liked it very much. However, perhaps other experts did not respond as quickly. The feature was supposed to run in November, then December, then January, and a few days ago I learned it was canceled. Surprisingly, The Guardian even promised to send some payment for the "spiked" contribution.

Since they will not publish it, I am publishing my contribution here. Let me know what you think in the comments below.

My Brief Guide to Big Data and Predictive Analytics

Big Data is both an over-hyped buzzword and a real trend, reflecting the rapidly growing digitization of our world and its amazing, and sometimes scary, implications. However, Big Data by itself is just numbers - what makes it so powerful is Predictive Analytics (also called Data Mining or Data Science) - the ability to model our world, predict events, and make data-driven decisions, with accuracy approaching and sometimes even exceeding our human abilities.

Here are a few books, movies, and websites related to Big Data, Data Science, and Data Mining, from more popular to more technical.

1. Big Data: A Revolution That Will Transform How We Live, Work, and Think, by Viktor Mayer-Schönberger and Kenneth Cukier.

This is a very good, high-level, although sometimes overly enthusiastic, explanation of the impact of Big Data on our world.

2. The Signal and the Noise: Why So Many Predictions Fail - but Some Don't, by Nate Silver.

This book gives a great explanation of how and where predictive analytics works well and why it is so easy to make bad predictions. Nate Silver became famous after his near-perfect predictions of US Presidential election results in 2008 and 2012, but not everything is so predictable - Nate had rather mediocre results in Oscar predictions.

3. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, by Eric Siegel.

Eric gives a more advanced look into the world of predictive analytics in action, including how one's location can be predicted, the infamous Target case of predicting a teenager's pregnancy before her father knew, Netflix's prediction of movie ratings by viewers, and more. More advanced readers will benefit from a chapter on the important topic of Uplift or Persuasion modeling, used very effectively by the Obama campaign in 2012.

4. Deep Learning is a revolutionary machine learning method that has been achieving superhuman levels of performance on some tasks, especially in image recognition.

Here are some useful websites/links for Deep Learning for more advanced readers


5. The flip side of Big Data is the erosion of privacy. We leave so many digital trails that it is hard to remain private and anonymous. Many of those issues were well raised in an excellent 2002 movie by Steven Spielberg: Minority Report.

Its uncanny vision of a surveillance society, with "precogs" who predict crime ahead of time and department stores that recognize people and push ads to them, is already becoming reality today.

6. Finally, just for fun, you can read Charles Stross's sci-fi novel The Rhesus Chart, where data mining plays a key role in uncovering vampires in London. Here is Chapter One (free).