Research Leaders on Data Mining, Data Science and Big Data key advances, top trends

Research Leaders in Data Science and Big Data reflect on the most important research advances in 2015 and the key trends expected to dominate throughout 2016.

By Gregory Piatetsky @kdnuggets, and Anmol Rajpurohit @hey_anmol

Continuing our practice of a yearly review of the Data Science landscape through feedback from research leaders, we reached out last month to a number of Data Mining, Data Science and KDD research leaders with the following two questions:

1. What were the most important research advances in Data Science / Data Mining / Machine Learning in 2015?

2. Which Data Science / Data Mining / Machine Learning trends do you expect to dominate in 2016?

The most popular research advances in 2015 were: Deep Learning (particularly in speech and vision), distributed machine learning, and commoditization of analytics.

The most popular trends for 2016 were: Deep Learning, self-learning systems, Big Data, and Internet-of-Things (IoT).

Here are the individual answers:

Qiang Yang, Professor, HKUST and the head of Noah's Ark Lab: 

1. I think that there are two. 

First, an important development in our field in 2015 was the realization that Deep Learning can be used to scale up reinforcement learning and beat human performance in Atari games [DeepMind's Nature article, 2015]. This work cleverly bypasses the problem of an explosive state space and demonstrates the ability to solve a complex real-time planning problem.

A second development is the realization that, from a single example, a learning system can acquire the ability to generate intelligent behavior that mimics that of a human. This one-shot learning style brings us closer to human-like learning and potentially expands the scope of applications of data mining and machine learning.

Human-level concept learning through probabilistic program induction,
Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum
Science, 11 Dec 2015, VOL 350 ISSUE 6266

2. I think more cognitively motivated learning algorithms and their applications will emerge as we march into 2016. These abilities include computer-human dialog systems with long- and short-term memory, the ability to understand text and image documents much more deeply, and the ability to carry out sophisticated planning and reasoning tasks.

Padhraic Smyth, Professor of Computer Science, UCI and Director of UCI Data Science Initiative:

1. Significant advances in the application of deep learning for image recognition/computer vision problems.

2. Deep learning in general, with more research and applications using recurrent neural network architectures.

Pedro Domingos, Professor, U. of Washington:

1. 2015 was once again dominated by progress in deep learning (e.g., further substantial improvements in object recognition and detection, etc.).

2. The main event of 2016 is that progress in deep learning will start to flatten out, barring the emergence of major new directions (i.e., the techniques that have propelled progress in recent years - convnets, backprop, LSTMs, etc. - will start to run out of steam).

Ingo Mierswa, Founder & CTO, RapidMiner:

1. Distributed machine learning, most notably the research in deep learning. I have to admit that the theory is less fascinating than many other research results of the past decades. But it is still impressive to see the complexity of problems that can now be solved thanks to massive computing horsepower. I am sure we will look back on deep learning and compare it to how Deep Blue won against Kasparov: Deep Blue was not actually smarter, but the additional compute power was enough to outperform this chess legend.

2. "Hardcore" machine learning meets democratization. Those are the two mega trends in data science, and I expect that both trends will now actually meet each other. I expect that many of the new research results will bubble up as open source libraries and will find new applications created by developer data scientists. This trend has been massively accelerated by the innovative open source nature of Hadoop. Distributed computing will become the standard for machine learning (see above). Google's TensorFlow, which got a bit of attention, is only one of the many deep learning toolkits out there, and some others are even more advanced. But at the same time, the ubiquitous need for machine learning across all functions and industries will further increase the demand for "data science for anybody". It will be exciting to see how well-designed software can bridge the skills gap so that analysts without a deep computer science or math background can still create value with those advanced machine learning techniques.

Philip S. Yu, UIC Distinguished Professor and Wexler Chair in Information Technology:

1. I don't see any major research advances. Deep learning certainly has been widely popular.

2. I expect big data related topics and applications will continue to dominate.

Eamonn Keogh, Professor at UCR, a world leader in time series analysis:

1. Deep Learning: I for one miss the simplicity and elegance of the nearest neighbor algorithm, decision trees, etc. But it is hard to argue with the recent amazing results of deep learning.

2. It is clear deep learning will continue to have a huge impact. I hope to see more work explaining why it works, and by implication, when it (currently) does not work.
