Gregory PS: HBR edited and shortened my original article to fit its space limitations. Here is "Big Data Hype (and Reality)," Harvard Business Review, by Gregory Piatetsky-Shapiro, October 18, 2012, and below is the expanded version.
Big Data Hype and Reality - Mars Rover vs Mars Candy
Big Data and Data Science are receiving tremendous attention this year. Just recently, "Data Scientist" was proclaimed the sexiest job of the 21st century. Big Data is the subject of countless publications (including HBR's Big Data section) and perhaps too many meetings and conferences, and both Google searches for "Big Data" and job ads for "Big Data" on the aggregator site Indeed.com are growing exponentially.
No doubt, the Big Data phenomenon is real, and it could potentially create a second Industrial Revolution. But what can we actually expect from Big Data?
A major part of the Big Data promise is the ability to do current activities better through improved understanding and prediction - improving upon the previous generation of analytics and data mining methods, which did not have access to Big Data.
Can Big Data improve prediction?
Some activities governed by physical laws, like gravity, can be predicted to an amazing degree. The Mars rover Curiosity performed a fantastically complex landing and ended up only 1.5 miles from its target after a 350-million-mile journey - a testament to human ingenuity and our ability to predict the behavior of space objects very precisely.
But many, perhaps most, Big Data efforts are directed toward predicting consumer behavior. Can Big Data predict who will buy "Mars" candy?
Let's examine prediction results in three of the most common consumer applications: movie recommendations, telephone churn prediction, and web ads.
Movie recommendations. As a company that thrives when people consume more content, Netflix routinely serves up personalized recommendations to customers based on their feedback on films they've already viewed. This is a prediction challenge; Netflix must venture an informed guess that, if someone gave a certain rating to movie a, they will rate movie b similarly.
In 2007, Netflix launched a competition to improve on Cinematch, the algorithm it had developed over many years. It released a record-large (for 2007) dataset with about 480,000 anonymized users, 17,770 movies, and user/movie ratings ranging from 1 to 5 stars. Before the competition, the error of Netflix's own algorithm was about 0.95 (using the root-mean-square error, or RMSE, measure), meaning that its predictions tended to be off by almost a full "star." The Netflix Prize of $1 million would go to the first algorithm to reduce that error by just 10%, to about 0.86.
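For readers unfamiliar with the metric, here is a minimal sketch of how RMSE is computed for star-rating predictions. The ratings below are made-up illustrative numbers, not Netflix data:

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

# Hypothetical example: five ratings on the 1-5 star scale,
# with predictions that are off by roughly a star on average.
actual    = [5, 3, 4, 1, 2]
predicted = [4.0, 3.9, 3.1, 2.0, 2.1]

print(round(rmse(predicted, actual), 3))  # roughly 0.85 - an error of
# almost a full star, comparable to Cinematch's pre-competition 0.95
```

An RMSE around 0.95 thus means a typical prediction misses the viewer's actual rating by close to one star, which is why shaving off even 0.1 star took so much effort.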
In just two weeks, several teams had beaten the Netflix algorithm, though by very small amounts; after that, progress was surprisingly slow.
Netflix Prize Competition Progress
It took about three years before the BellKor's Pragmatic Chaos team won the prize with a score of 0.8567 RMSE. The winning algorithm was a very complex ensemble of many different approaches - so complex that Netflix never implemented it. After three years of effort by some of the world's best data mining scientists, the average prediction of how a viewer would rate a film improved by less than 0.1 star.
Customer attrition. Now consider the bane of wireless service providers: the churn in their customer bases. If predictive analytics drawing on big data could accurately point to who in particular was about to jump ship, direct marketing dollars could be efficiently deployed to intervene, perhaps by offering those wavering customers new benefits or discounts. Analysts measure how accurate the list of potential churners is by using a measure called "lift." Let's say, for example, that a wireless provider has a churn rate of 2% per month. If an algorithm can learn indicators of customer defection and generate a list of the subscribers most likely to leave, and 8% of those subsequently do leave, then the list has a lift of 4 (because the method produced a list with four times more defectors than a random sampling would have). Such a list would be very valuable, given the marketing costs and inducements it would save. But it is still 92% wrong. With the benefit of big data, will marketers get much better prediction accuracy?
A study [pdf] that Brij Masand and I conducted suggests the answer is no. We looked at some 30 different churn-modeling efforts in banking and telecom, and surprisingly, although the efforts used different data and different modeling algorithms, they had very similar lift curves. Lists of the top 1% most likely defectors had a typical lift of around 9-11; lists of the top 10% had a lift of about 3-4. Very similar lift curves have been reported in other work. (See here and here.) All this suggests a limiting factor on prediction accuracy for consumer behavior such as churn.