Research Leaders on Data Mining, Data Science and Big Data: Key Advances, Top Trends

Research Leaders in Data Science and Big Data reflect on the most important research advances in 2015 and the key trends expected to dominate throughout 2016.

Bing Liu, Professor, UIC; Chair, SIGKDD:

1. In my area of text mining research, deep learning and various embedding methods have been quite popular and have made important advances in the past year. Lifelong machine learning is also emerging as an important direction. As we go forward, traditional memory-less learning, which typically runs a learning algorithm on a given dataset, is no longer sufficient. A learning system should gradually acquire the capability to retain learned knowledge, make inferences based on that knowledge, and use it to help future learning and problem solving. That is, it should be able to learn as humans do. In terms of text-based systems, I am particularly impressed by the advances made in intelligent personal assistants and chatterbots such as Siri, Cortana, Google Now, and XiaoIce.
2. No doubt deep learning and other popular methods will continue to make major advances in the coming year. I also believe that self-learning and self-improvement of intelligent personal assistants will be a major research direction in the next year and for some years to come. These systems will expand their services and become “truly intelligent” in the next few years, potentially changing our lives profoundly. I further believe that lifelong machine learning will make major advances in text mining, in both research and applications. Text data is particularly suited for lifelong learning because concepts and relationships are shared extensively across tasks and domains through a common vocabulary, syntax, and semantics. This sharing enables knowledge to be accumulated and transferred from task to task and from domain to domain in learning and problem solving.
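The knowledge-transfer idea above can be illustrated with a minimal sketch (toy data and class names are hypothetical, not from any real system): word-level sentiment knowledge accumulated in one domain carries over to a new domain through the shared vocabulary.

```python
from collections import defaultdict

class LifelongSentimentLearner:
    """Accumulates word polarity counts across tasks (the retained knowledge)."""

    def __init__(self):
        self.pos = defaultdict(int)  # word -> count in positive docs
        self.neg = defaultdict(int)  # word -> count in negative docs

    def learn_task(self, labeled_docs):
        # Each new task adds to the knowledge base rather than starting fresh.
        for text, label in labeled_docs:
            for word in text.lower().split():
                (self.pos if label == 1 else self.neg)[word] += 1

    def predict(self, text):
        # Score using all knowledge accumulated so far (Laplace-smoothed).
        score = 0.0
        for word in text.lower().split():
            p, n = self.pos[word] + 1, self.neg[word] + 1
            score += (p - n) / (p + n)
        return 1 if score > 0 else 0

# Task 1: book reviews (hypothetical toy examples)
learner = LifelongSentimentLearner()
learner.learn_task([("a great wonderful story", 1),
                    ("a boring terrible plot", 0)])

# Task 2: camera reviews. Shared words like "great" transfer, so the
# learner makes a sensible call in the new domain before seeing its data.
print(learner.predict("great camera"))  # -> 1
```

Real lifelong learning systems retain far richer knowledge (concepts, relations, embeddings), but the mechanism sketched here is the same: because both domains speak the same vocabulary, counts learned on books remain usable on cameras.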

Prof. Dr. Michael Berthold, Founder and President of KNIME.com AG, makers of the popular KNIME open-source data mining and processing platform:

- 2015 was the year of the rebirth of Neural Networks. More layers, more neurons, and more computing power yield a few more percentage points.

- 2016 will, unfortunately, also be dominated by (Deep) Neural Networks. Many promises will be made; modest progress will be achieved.

- 2017 (or maybe 2018) will be the year of disappointment when, once again, we realize that neural networks (no matter how big) won't solve all problems...

Too pessimistic? Then here is a neural-network-free version:

- 2015 was the year when advanced analytics became mainstream. People now realize that decisions made without consulting the data are just wild guesses.

- 2016 will be the year when we focus much more on data-and-tool blending, because analyzing heterogeneous data sources requires heterogeneous, powerful tools.

Charu Aggarwal, Distinguished Research Staff Member at the IBM T. J. Watson Research Center:

1. It is easier to discuss research advances in terms of broad topics than specific research papers. The year 2015 was largely a continuation of 2014, with similar topics being emphasized.

In particular, deep learning, healthcare, and the Internet of Things continued to dominate in 2015. One could see a clear increase in the number of deep learning papers at various conferences, partly because deep learning has proven its mettle by winning several computer vision competitions over the previous three years.

2. I expect advances in deep learning, healthcare, and the Internet of Things to continue to dominate in 2016. Some of these trends are interconnected, because deep learning methods are used in many applications across the spectrum.

Dimitrios Gunopulos, Professor, University of Athens:

1. Three papers that I found interesting in 2015:

Flavio Chierichetti, Alessandro Epasto, Ravi Kumar, Silvio Lattanzi, Vahab S. Mirrokni: Efficient Algorithms for Public-Private Social Networks. ACM KDD 2015

Liang Zhao, Qian Sun, Jieping Ye, Feng Chen, Chang-Tien Lu, Naren Ramakrishnan: Multi-Task Learning for Spatio-Temporal Event Forecasting. ACM KDD 2015

Dimitrios Kotzias, Misha Denil, Nando de Freitas, Padhraic Smyth: From Group to Individual Labels Using Deep Features. ACM KDD 2015

2. Urban data and mobility data analysis; social networks and search.

Mohammed J. Zaki, Professor of Computer Science at RPI:

1. Reflecting my interest in large-scale data mining, I believe there have been important advances in frameworks and algorithms for distributed data mining and machine learning. These reflect the trend toward commoditization of data analytics. While there is a long way to go, these systems make it possible to mine big(ger) data. Examples include Spark MLlib and Petuum for general mining/learning tasks, GraphX for graph-parallel computations, Arabesque for graph pattern mining, various systems that extend R for distributed computing (e.g., HP's Distributed R and SparkR), and so on.

2. I expect to see more focus on mining rich data types and combining heterogeneous, interlinked data inputs (e.g., applications in smart health, smart cities, the connectome, etc.).