Predictions for Data Science in 2017

Our predictions include: 2017 will be the year of Deep Learning (DL) technology, Artificial General Intelligence is still far away, Software and Hardware Progress will accelerate, and AI will have unexpected socio-political implications.

By Francesco Gadaleta

What should we expect from data science and predictive analytics in the near future?


We strongly believe 2017 will be the year of Deep Learning (DL) technology. Nowadays, DL is so bound to Artificial Intelligence (AI) that people mistakenly use the two terms interchangeably, as if they referred to the same thing.

As a matter of fact, DL has brought impressive progress to many fields, from computer vision to speech recognition and Natural Language Processing. Many other fields are ready to enjoy the powerful features of DL and intelligent software too.

To summarize the major impact of DL on the data science community, we can clearly state that:

  1. Deep learning has improved the state of the art in several domains, notably computer vision and speech recognition; NLP and text analytics are following as more data become available
  2. Coding deep learning is becoming easier and easier due to massive improvements in libraries and APIs (e.g. TensorFlow by Google, the most impressive and easiest of all)
  3. Practitioners are starting to realize that deep learning is not a technology useful only for self-driving vehicles, but can be applied across domains with knowledge-transfer methods that allow one to train a network in one setting and predict in another

What follows is our list of predictions in data science for 2017, and the reasons behind them. We are glad to share our vision with you, and to hear your opinions and expectations regarding the world of analytics and artificial intelligence.

Prediction #1: Artificial General Intelligence is not yet a reality

General Artificial Intelligence is not going to happen any time soon, and definitely not in 2017. The most realistic way to give AI a chance in the real world is to enhance traditional apps and services with intelligence sprinkled here and there. This will lead to applications and services that solve the same tasks as before, just with a more intelligent approach; few might notice the difference, however.

Another alternative is to train specialized networks that solve very specific tasks, and then find a way to connect these independent units together in order to solve what appears to be a more complex and general task. We see many specialized machines in the near future, rather than a super intelligence that is good at everything. Many biological systems work this way; even the Romans solved problems like this, summarizing the strategy under the term divide et impera, divide and conquer. Engineers who break large problems into tasks that can be managed by intelligent software, and then combine them together, will have a significant advantage over those who try to apply AI monolithically from the start.

Prediction #2: Software and Hardware Progress

History tells us that progress in software has long been paced and dominated by progress in hardware technology. Today the scenario is rather different: hardware progress is increasingly governed by software. NVIDIA, Intel and Amazon, just to name a few, are switching their product lines towards hardware that is Deep Learning friendly. This trend will continue.

In contrast, software progress will focus more on the core algorithms behind deep learning, such as Stochastic Gradient Descent (SGD) and backpropagation.

We expect major improvements in those areas, as these algorithms are considered the bottleneck of the entire system. Researchers should consider investigating new optimization methods that are more efficient in general, not only in terms of computation but also of energy consumption.
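To make the bottleneck concrete, here is a minimal sketch, in plain NumPy, of what SGD actually does: for a toy linear model it repeatedly nudges the parameters against the gradient of the error, one sample at a time. The data, model, and learning rate are illustrative assumptions, not part of any production pipeline:

```python
import numpy as np

# Minimal illustration of Stochastic Gradient Descent (SGD):
# fit y = w*x + b on synthetic data, updating on one sample at a time.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = 3.0 * X + 0.5 + rng.normal(scale=0.01, size=200)  # true w = 3.0, b = 0.5

w, b = 0.0, 0.0
lr = 0.1  # learning rate (step size)
for epoch in range(50):
    for i in rng.permutation(len(X)):      # reshuffle samples each epoch
        err = (w * X[i] + b) - y[i]        # prediction error on one sample
        w -= lr * err * X[i]               # gradient of 0.5*err^2 w.r.t. w
        b -= lr * err                      # gradient of 0.5*err^2 w.r.t. b

print(round(w, 2), round(b, 2))            # recovers w ≈ 3.0, b ≈ 0.5
```

Every per-sample update is cheap, but billions of them are needed at scale, which is exactly why this inner loop is where computational and energy efficiency gains would pay off most.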

Prediction #3: AI will have unexpected socio-political implications

Our main thoughts are about the socio-political implications of Artificial Intelligence. In one episode of Data Science at Home we mentioned that data scientists will soon disappear because their work will be completely automated. We still stand by that statement, but with a fundamental caveat: we do not think this is going to happen tomorrow, because human beings are not ready to delegate some of their critical tasks to a machine. If there is bureaucracy involved in letting an algorithm drive a car to the airport, there will be even more in letting the same algorithm deliver the prognosis for a cancer patient. Humans are not ready for that, not because of a lack of technology, but because of the need to assign responsibility for mistakes to a physical institution or person.

In order to mitigate this issue, we have reason to believe that AI will be run as a utility, a public service for everyone, leaving some degree of decision power to humans in order to help the system manage the rare exceptions.

As a conclusion, loss of jobs, as stated by many, will not yet be the main socio-political consequence of AI: humans will prefer to keep control of their tasks until a trained algorithm can perform them with near-zero error and at a definitely lower cost. This can happen soon, but not within 365 days. Moreover, it will not necessarily mean that people lose their jobs. Many ignore the fact that, in such a scenario, humans can and will focus on other aspects of analytics, or simply on other tasks.

AI can be fooled

Adversarial examples (Papernot et al., 2016; Kurakin et al., 2016) illustrate the imperfect accuracy of AI-driven tasks by showing how easy it is to fool intelligent software. The authors forged a set of images that have no visually relevant difference from their original versions, yet still fool a neural classifier. A similar scenario in healthcare, finance, or logistics would have dramatic consequences.
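The mechanism is easy to demonstrate. The sketch below is not the cited authors' setup, but a toy analogue of the same idea: for a hand-picked linear classifier (weights and input chosen purely for illustration), a small perturbation bounded per feature and aimed along the sign of the gradient is enough to flip the prediction:

```python
import numpy as np

# Fast-gradient-sign-style attack on a toy linear classifier:
# a bounded, worst-case perturbation flips the predicted class.
w = np.array([2.0, -3.0, 1.0, 0.5])   # assumed "trained" weights
x = np.array([0.6, 0.2, 0.1, 0.4])    # input near the decision boundary

def predict(v):
    return int(w @ v > 0)             # class 1 if score positive, else 0

eps = 0.2                             # max per-feature perturbation
# For a linear score the gradient w.r.t. x is just w;
# step against the currently predicted class.
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv))     # prints: 1 0
```

Deep networks are not linear, but the cited attacks exploit essentially this local linearity, which is why visually negligible perturbations can change a classifier's output.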

The smartphone effect

Producing deep-learning-based solutions is becoming so easy that building a neural pipeline will be accessible to many. This might lead to a smartphone effect: just as that device entered our daily lives to the point that we now depend on it, so might deep learning.

Prediction #4: Hot hiring market diversifying educational backgrounds: Hadoop is out

According to several external studies, the list of top skills required in 2017 should include Hadoop technology. We do not believe Hadoop is going to be such an essential skill. Google abandoned the MapReduce paradigm more than 6 years ago for a reason. In-memory computing (provided by frameworks like Apache Spark) is definitely killing Hadoop, as those solutions allow one to execute traditional database queries (pre-existing SQL code) on distributed data.

Prediction #5: More and more big data: the end of unsupervised learning

Data is becoming more and more available, as it is easier to collect and cheaper to store. Big data solutions that so far seemed premature will become a necessity for many. This will bring new products and new hardware solutions for small enterprises, and for home computing too. If ever-growing data is a correct assumption, it will literally kill unsupervised learning, which is already quite dead at the moment. We do not expect unsupervised learning to come back in the near future. As data becomes more and more available, the methods designed to deal with few observations, such as unsupervised learning and neural network pre-training, will cease to make sense. There are already much better techniques for coping with small datasets, one of which is knowledge transfer.
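To illustrate the knowledge-transfer idea, the hedged sketch below stands in a frozen random projection for features "trained somewhere else" and fits only a lightweight linear head on a tiny target dataset. Every name and number here is an assumption chosen for demonstration, not a recipe:

```python
import numpy as np

# Sketch of knowledge transfer with scarce target data:
# a feature extractor "pretrained" elsewhere stays frozen, and only a
# lightweight linear head is fit on the few target observations.
rng = np.random.default_rng(1)

W_pretrained = rng.normal(size=(10, 32))   # frozen stand-in for real pretrained weights

def features(X):
    return np.tanh(X @ W_pretrained)       # frozen nonlinear feature extractor

# Tiny target dataset: 20 observations with 10 raw inputs each.
X_small = rng.normal(size=(20, 10))
y_small = (X_small[:, 0] > 0).astype(float)

# Fit only the head by least squares; the extractor is never touched.
F = features(X_small)
head, *_ = np.linalg.lstsq(F, y_small, rcond=None)

preds = (features(X_small) @ head > 0.5).astype(float)
print((preds == y_small).mean())           # training accuracy on the small set
```

The point of the sketch is the division of labor: with good transferred features, the part that must be learned from scarce data is small enough to be fit reliably.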

Prediction #6: A healthcare industry led by data science

Healthcare is not yet fully enjoying the potential benefits of AI and DL; despite some interesting cases, the technology is moving at a slower pace there than in other domains. Healthcare startups are already using data science to move towards personalized medicine, and using artificial intelligence to examine images like X-rays and MRIs in order to diagnose problems quickly and accurately.

There are major examples of data science improving the handling of epidemics and predicting patient behaviour. In 2015 alone, data scientists helped predict further West Nile virus outbreaks in the United States with 85% accuracy. Earlier this year, a team of scientists developed a model that can predict the likelihood of bats carrying Ebola.

We expect data science usage in the healthcare industry to grow further in 2017, as healthcare professionals look for ways to meet day-to-day needs and save lives without delays. As a matter of fact, the ultimate goal is to treat data science and Artificial Intelligence as medical devices.

But in order to bring in the technology that has proved effective in domains like social media and finance, we really believe that professionals working at the border between healthcare and the technology itself will make the difference. These figures will be essential to bridge the gap between two worlds that seem to speak very different languages. The major reason why AI has not taken over healthcare yet is that many critical tasks are surrounded by bureaucracy and by responsibilities that are difficult to transfer. This is similar to the self-driving car scenario where, as previously mentioned, the gap is not really technological, but bureaucratic.

Who is going to pay for the eventual damage inflicted by an AI that makes a mistake and runs over a pedestrian or hits another car? In healthcare, it would be difficult to deal with wrong prognoses. An artificial intelligence that is known to be wrong 5% of the time, scaled to millions of cases, would deliver tens of thousands of wrong prognoses and misleading predictions. Are we ready to abandon the idea that there is always someone to blame when things do not work?

Prediction #7: The year of awareness: technology consolidation

We strongly believe that 2017 will be the year of deep learning, but on one condition: practitioners and business people must first become aware of the technology. Hence, researchers must educate their business counterparts about the potential benefits of DL and convince them that it is indeed a game changer.

This is partially happening through the benchmarks usually provided in the literature, but that will not be sufficient. We do not believe that DL will be massively adopted within a well-established domain like healthcare just because a neural model has been benchmarked on ImageNet, digit recognition, or any other standard benchmark used in published literature.

In order to establish DL in industry (especially in very well consolidated domains like finance and healthcare), it is essential to show how the technology performs in those domains. And how can researchers accomplish this if they never get the chance to put DL to the test? As always, this is a chicken-and-egg problem that must be overcome.

Time will tell.

Stay tuned!

Bio: Francesco Gadaleta, Ph.D. is a Data Scientist and Machine Learning expert. He worked as a clinical data scientist at Gasthuisberg Research Hospital (Belgium) and is currently a data scientist in the Advanced Analytics Team of Johnson & Johnson.