Big Data Winter ahead – unless we change course, warns Michael Jordan

We have to have error bars around all our predictions, says machine learning expert Michael Jordan. Otherwise it's gambling, and too many failed predictions can lead to big disappointment with Big Data - a Big Data Winter.

By Gregory Piatetsky, @kdnuggets, Oct 30, 2014.

Michael Jordan, a leading expert on machine learning and a professor at UC Berkeley, recently gave an extensive interview in IEEE Spectrum, where he touched on many topics.

I am not much concerned about his critique of people using "brain metaphors" when talking about computing. Of course, current artificial neural networks are very different from the networks in a human brain. However, the brain provided an inspiration for the neural approach, just as bird flight provided an inspiration and proof of possibility for building heavier-than-air flying machines. Airplanes don't flap their wings, but they fly faster than birds!

However, I share Michael Jordan's concern about a Big Data winter, brought on by simple-minded and statistically unsound approaches that will produce too many false positives.

Michael Jordan: And for any particular database, I will find some combination of columns that will predict perfectly any outcome, just by chance alone. If I just look at all the people who have a heart attack and compare them to all the people that don't have a heart attack, and I'm looking for combinations of the columns that predict heart attacks, I will find all kinds of spurious combinations of columns, because there are huge numbers of them.

So it's like having billions of monkeys typing. One of them will write Shakespeare.
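The effect Jordan describes is easy to reproduce. The following is a minimal sketch of my own, not something from the interview: it fills a table with pure coin flips, so no column carries any real signal, and then reports how well the single "best" column appears to predict an equally random outcome. The function name `best_spurious_accuracy` and the table sizes are illustrative assumptions.

```python
import random


def best_spurious_accuracy(n_rows=50, n_cols=2000, seed=0):
    """Best in-sample accuracy achieved by any one of n_cols random columns."""
    rng = random.Random(seed)
    # The outcome is a coin flip: by construction nothing can predict it.
    outcome = [rng.randint(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_cols):
        # Each candidate "predictor" is also a column of coin flips.
        column = [rng.randint(0, 1) for _ in range(n_rows)]
        matches = sum(c == y for c, y in zip(column, outcome))
        # A column that is mostly wrong looks just as predictive once inverted.
        accuracy = max(matches, n_rows - matches) / n_rows
        best = max(best, accuracy)
    return best


if __name__ == "__main__":
    # The more columns we search, the better the best spurious one looks.
    for n_cols in (1, 10, 100, 1000, 10000):
        print(n_cols, best_spurious_accuracy(n_cols=n_cols))
```

With only 50 rows, searching a few thousand random columns typically turns up one that looks far better than the 50% expected from guessing, even though every column is noise. Searching combinations of columns, as in Jordan's heart attack example, multiplies the number of candidates and makes the problem worse.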

We have to have error bars around all our predictions.

That is something that's missing in much of the current machine learning literature.

... if you list all the hypotheses that come out of some analysis of data, some fraction of them will be useful. You just won't know which fraction. ... unless you're actually doing the full-scale engineering statistical analysis to provide some error bars and quantify the errors, it's gambling.
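One simple way to put an error bar on a prediction result is a percentile bootstrap over held-out examples. The sketch below is my own illustration, not a method described in the interview. It assumes you already have a list of 0/1 flags saying whether each held-out prediction was correct; the function name `bootstrap_accuracy_interval` and its parameters are hypothetical.

```python
import random


def bootstrap_accuracy_interval(correct, n_resamples=2000, level=0.95, seed=0):
    """Return (accuracy, lower, upper) using a percentile bootstrap.

    `correct` is a list of 0/1 flags, one per held-out prediction.
    """
    rng = random.Random(seed)
    n = len(correct)
    resampled = []
    for _ in range(n_resamples):
        # Resample the held-out set with replacement and recompute accuracy.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        resampled.append(sum(sample) / n)
    resampled.sort()
    alpha = (1.0 - level) / 2.0
    lower = resampled[int(alpha * n_resamples)]
    upper = resampled[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return sum(correct) / n, lower, upper


if __name__ == "__main__":
    # Hypothetical result: 32 of 50 held-out predictions were correct.
    flags = [1] * 32 + [0] * 18
    print(bootstrap_accuracy_interval(flags))
```

On a small held-out set the interval is wide, which is the point: a bare "64% accuracy" says much less than the same number reported with its uncertainty. This only quantifies sampling error on the held-out data; it does not correct for having searched many hypotheses before picking one.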

When asked what adverse consequences might follow if Big Data remains on the trajectory he described, he added:
The main one will be a "big-data winter." After a bubble, when people invested and a lot of companies overpromised without providing serious analysis, it will bust. And soon, in a two- to five-year span, people will say, "The whole big-data thing came and went. It died. It was wrong." I am predicting that.

It's what happens in these cycles when there is too much hype, i.e., assertions not based on an understanding of what the real problems are or on an understanding that solving the problems will take decades, that we will make steady progress but that we haven't had a major leap in technical progress.

And then there will be a period during which it will be very hard to get resources to do data analysis. The field will continue to go forward, because it's real, and it's needed. But the backlash will hurt a large number of important projects.

This is hardly a novel warning; many experts have been warning of the dangers of overfitting (see related posts below). It is nonetheless a very serious one.

What do you think? How likely is the Big Data Winter?