Gregory Piatetsky-Shapiro, Mar 8, 2012.
Amanda Brandon: You recently said on Twitter that detective Sherlock Holmes would have been "a good data scientist." What skills do you think he possessed that anyone can use to discover the unknowns in data?
Gregory Piatetsky-Shapiro: I read most of Arthur Conan Doyle's books on Sherlock Holmes as a kid, and loved his powers of deduction. Holmes was a keen observer of facts and reasoned logically. He was also on the cutting edge of the science of his time, so today's Sherlock Holmes would be analyzing the social networks of suspects (and perhaps hacking them) in addition to looking for fingerprints. Finally, he had great intuition and knew how to reject wrong hypotheses: no matter how appealing they initially appeared, if they were not supported by facts, he rejected them.
AB: Holmes was known for seeing "hidden" information and identifying opportunities and threats. In the most recent film, the viewer was given a glimpse of his method of calculating the next move before it happened, in a sort of flash-forward scene. Do you think today's predictive analytics technologies and data visualizations give us this "extra sense" that Holmes had? What does this do for companies in competitive situations?
GPS: Of course, analytics is much easier when we predict the behavior of inanimate objects - asteroids, viruses, etc. Predicting human behavior is much more difficult. Predictive analytics today gives us the ability to predict future behavior, but non-trivial predictions are accurate only in aggregate, not for individuals.
For example, say Verizon's monthly customer churn rate is 2%. That means that every month 2 out of 100 customers switch. With analytics, we can select a group of customers where churning is 5-7 times more frequent. The difference is that the expected churn rate in the analytics-selected list can be 10%-14% versus 2% in a random list. Of the 100 selected customers we expect only 10-14 to churn.
For more details, see my presentation "Estimating Campaign Benefits and Modeling Lift."
This has enormous business value because these customers can be contacted much more efficiently and personalized offers can be made to retain them. However, this example shows the limits of analytics in predicting human behavior "in aggregate" - it is far from perfect. But it does NOT need to be perfect to be useful.
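The churn arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above (a 2% base rate, a lift of 5 to 7, a list of 100 customers); the function name `lift_summary` and its parameters are hypothetical and do not come from the interview or the presentation.

```python
def lift_summary(base_rate, lift, group_size):
    """Compare expected churners in a random list and a model-selected list.

    base_rate  -- monthly churn rate in the whole customer base (e.g. 0.02)
    lift       -- how many times more frequent churn is in the selected list
    group_size -- number of customers in each list
    """
    selected_rate = base_rate * lift
    return {
        "random_churners": base_rate * group_size,
        "selected_rate": selected_rate,
        "selected_churners": selected_rate * group_size,
    }


# Figures taken from the example: 2% churn, lift of 5-7, lists of 100 customers.
for lift in (5, 7):
    s = lift_summary(base_rate=0.02, lift=lift, group_size=100)
    print(
        f"lift {lift}: selected churn rate {s['selected_rate']:.0%}, "
        f"about {s['selected_churners']:.0f} of 100 expected to churn "
        f"(vs. about {s['random_churners']:.0f} in a random list)"
    )
```

Even at the high end of the lift range, most customers on the selected list are not expected to churn, which is the sense in which the prediction holds in aggregate rather than for any one individual.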
Analytics can be used when there are many thousands of similar customers. When we have a single person, statistical methods are not relevant and need to be augmented with knowledge and rules. Ultimately, a combination of artificial intelligence and statistical methods can be a powerful predictor.