Interview: Michael Brodie – We Can’t Rely on Machines
Michael Brodie, a leading database researcher, is convinced that Big Data has more potential than the hype suggests, but also more risks.
By Patricia Faller (Editor in Chief ZHAW Datalab).
Michael L. Brodie is a Research Scientist in the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology (MIT). With over 40 years of experience he advises startups and is a member of Advisory Boards of national and international research organizations. He is an adjunct professor at the National University of Ireland. For more than 20 years he was Chief Scientist of IT at the US telecom company Verizon, a Fortune 20 company. There he was responsible for advanced technologies, architectures, and methodologies for Information Technology strategies and for guiding industrial scale deployments of emergent technologies.
He is concerned with the Big Picture aspects of information ecosystems including business, economic, social and technical applications. His current research and applied interests include Big Data and Data Science. Brodie holds a PhD in Databases from the University of Toronto and a Doctor of Science from the National University of Ireland. Recently Michael L. Brodie was invited as keynote speaker at the 2nd «Swiss Conference on Data Science» organized by the ZHAW Datalab. The topic of his talk: “The Emerging Discipline of Data Science: Principles and Techniques for Data-Intensive Analysis”.
Everybody is talking about Big Data. Is it more than hype?
Michael L. Brodie: It is a little bit like the beginning of the internet. The hype was largely driven by people who tried to make money selling their products. But the internet has changed our world in ways the hype couldn’t conceive. At the moment the hype for Big Data comes from IBM, SAS and SAP, the large vendors of these solutions. Forecasts of the Big Data market show huge growth, from 7.6 billion dollars in 2011 to 84.6 billion dollars in 2026.
So yes, there is a lot of hype?
But I actually think it is far more profound and powerful than most people realize at the moment. It has already changed a very large number of operating processes in health care, manufacturing, marketing and stock markets. However, it is not as widely used as one might think. Big Data and Big Data Analytics are in their infancy with respect to operational deployment and our understanding of them.
So Big Data isn’t promoted beyond its value?
No, these methods are actually seen as the fourth scientific paradigm, meaning that they have the potential to offer a completely different and faster way of solving many challenging problems of humanity, such as health care and poverty. Gartner, one of the world’s leading information technology research companies, estimates that 80 percent of all business processes worldwide will change within the next five to ten years, all based on Big Data Analytics. Gartner also predicts that 85 percent of the Fortune 500 will be unable to exploit Big Data in 2015. The much bigger impact will be over the next decades.
Sometimes it seems that there is a blind faith in Big Data.
Right. My experience shows that people who are very excited about Big Data may not be very familiar with forecasts or statistically based prognoses. In order to use statistical techniques to analyse data you have to understand the power and the limits of statistics. And almost every statistical outcome is probabilistic. That means it holds only within the predicted error bounds and at the stated confidence level. When you get an answer you still can’t say what SAP’s stock price will be next Wednesday. You say that with a probability of 0.75 the stock price will be this or that amount of money. So it is always qualified by some sort of error bars. Error bars give a general idea of how precise a measurement is. But 80 percent of the people I interact with and who want to consume Big Data results don’t know what an error bar or a probabilistic answer is.
What does this mean from a customer’s point of view?
Most customers of data analysis want to know something like: “Will I sell more strawberry Pepsi than vanilla? Because I have to tell my manufacturing line how to switch.” And you say to them: “Well, with this or that probability you are likely to sell more strawberry in New Jersey over these weeks, and in California you are more likely to sell vanilla over the same period.” So they have to understand that the answers can only be probabilistic. They never get a completely certain answer. Almost every measurement one makes in the world is probabilistic. That means every computational answer based on data is also probabilistic. So obviously education is going to be critical, not only in understanding it from a customer’s point of view but also in expressing it from the point of view of the Data Scientist who produces these answers.
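As an illustration of what such a probabilistic answer with error bars can look like, here is a minimal Python sketch. It is not taken from the interview: the weekly counts are hypothetical, the function name `proportion_with_error_bar` is made up for this example, and the interval is the simple normal-approximation (Wald) interval at roughly 95 percent confidence, which is only one of several possible choices.

```python
import math


def proportion_with_error_bar(successes, trials, z=1.96):
    """Estimate a proportion and a normal-approximation (Wald) interval.

    z=1.96 corresponds to roughly 95 percent confidence.
    Returns (estimate, lower_bound, upper_bound), clipped to [0, 1].
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical data: number of weeks (out of 40 observed) in which
# strawberry outsold vanilla in each region. These numbers are invented.
weeks_observed = 40
weeks_strawberry_won = {"New Jersey": 30, "California": 14}

for region, wins in weeks_strawberry_won.items():
    p, low, high = proportion_with_error_bar(wins, weeks_observed)
    print(f"{region}: strawberry outsells vanilla with probability "
          f"about {p:.2f} (interval {low:.2f} to {high:.2f})")
```

The point of the sketch is that the answer is a range rather than a single number: with only 40 weeks of data the interval is wide, and it narrows only as more observations are collected.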