Big Data: Main Developments in 2016 and Key Trends in 2017

As 2016 comes to a close and we prepare for a new year, KDnuggets has solicited opinions from numerous Big Data experts as to the most important developments of 2016 and their 2017 key trend predictions.

At KDnuggets, we try to keep our finger on the pulse of main events and developments in industry, academia, and technology. We also do our best to look forward to key trends on the horizon.

We recently asked some of the leading experts in Big Data, Data Science, Artificial Intelligence, and Machine Learning for their opinion on the most important developments of 2016 and the key trends they see in 2017.

Big Data experts

In the first of 3 such related posts, we bring you the collected responses to the question:

"What were the main Big Data related events in 2016 and what key trends do you see in 2017?"

(Note: posts with top experts' opinions on Data Science/Predictive Analytics and AI/Machine Learning will be published later in December.)

We generally asked participants to keep their responses to within 100 words or so, but were amenable to longer answers if the situation warranted. Without further delay, here is what we found.

Craig Brown PhD, Social Influencer; Big Data; Data Science; Database Technology; Technology Mentor; Author; Youth Mentor

In 2016, there was a huge increase in data volume, as represented by the surge in Data Cloud Services like Amazon Web Services, Rackspace and Azure, among other providers. This surge in data volume will continue in 2017. Also in 2017, I believe there will be a surge of projects that include Machine Learning, Cognitive Computing and predictive analytics; however, data privacy challenges will persist. Data Scientist and Chief Data Officer/Architect positions will become more widely used and more clearly defined in 2017. Real-time data streaming and more sophisticated data pipelines will help redefine big data into more categories, like actionable data and fast data. Overall, Big Data is still a growing arena. 2017 will certainly provide more than what was experienced in 2016. The data volume will be the driver and the tools that are provided will be the passenger.

James Kobielus, Big Data Evangelist, IBM Software

Hadoop receded from the big-data landscape more rapidly in 2016 than I expected. MapReduce, HBase, and even HDFS are less relevant to data scientists than ever.

The dominant 2017 trend will be programmers' rush to gain data science skills in order to grow their careers. The hottest projects in data science in 2017 will focus on streaming media analytics, embedded deep learning, cognitive IoT, cognitive chatbots, embodied robotic cognition, autonomous vehicles, computer vision, and autocaptioning. Also, we’re going to see mass deployment of a new generation of optimized neural chipsets, GPUs, and other high-performance cognitive computing architectures in 2017.

Douglas Laney, VP and Distinguished Analyst, Gartner

The biggest big data event of 2016 was people ceasing to talk about big data. Big data now 'just is'. The focus has become more business-oriented, with Gartner client discussions being about managing, measuring and monetizing 'information assets.'

2017 will be the year of trying to sort out information rights, privileges, responsibilities, ownership and sovereignty--especially for IoT-generated data. The accounting profession, the courts and the insurance industry are thoroughly confused. They continue to stumble toward recognizing information as an asset on par with traditional balance sheet assets. But things are changing as institutional investors and equity analysts are starting to recognize and reward companies that are more infosavvy. It will also be the year of corporate land grabs for information properties, i.e. data brokers and other information aggregators.

Yves Mulkers, Blogging on 'All Things Data' @ 7wData, Maintaining The Data Landscape

In 2016, it felt like Big Data was losing its buzz compared to a few years ago. As Big Data infrastructure, software and methodology grew towards each other, it showed that the analytics solutions built in a Big Data world are maturing and becoming more available, and are no longer reserved for the few front runners. Building on these integrations and moving up the maturity curve, we see self-service and automation gaining a lot of attention. Where analytics and Big Data solutions once required the skill set of a communicative, technical whiz-kid PhD, ease of use now means it's time to move to the next phase of solutions. We also saw more interest in machine learning, artificial intelligence, virtual reality, augmented reality, IoT and containerized solutions. The boundaries of Moore's law keep being pushed.

Big Data Predictions Wordcloud

Mark van Rijmenam, Founder of Datafloq and author of Think Bigger

2016 was an exciting year for big data, as big data is finally no longer hype or a buzzword. This means that organisations are actually developing real-world solutions and applications with big data analytics that have a big impact on their bottom line.

2017 will see a continuation of this trend and, with technology becoming increasingly smarter, we will see new applications being developed. Deep learning and artificial intelligence will become smarter and will be applied more often by organisations, since the required computing power and available data are no longer obstacles to developing intelligent applications. Therefore, 2017 will be an exciting year in terms of big data, with smarter applications, more intelligent connected products and, unfortunately, as a result, more data security breaches.

See here for more on Mark's 2017 predictions.

Ronald van Loon (LinkedIn), Director Adversitement

This year has seen a change in the world of Big Data, from departmental to multi-disciplinary, customer-centric, data-driven organizational structures. We have experienced the start of widespread adoption of IoT data applications in key sectors. Additionally, there has been substantial growth in machine learning applications, as they are supported by large cloud platforms.

Next year, we will see growing adoption of artificial intelligence, explosive growth of IoT applications, and wide application of machine learning. The technology is ready, customer appetite for experience improvement is strong, and the number of connected devices will grow from 10 billion to 34 billion by 2020.

Jeff Ullman is the Stanford W. Ascherman Professor of Computer Science (Emeritus). His interests include database theory, database integration, data mining, and education using the information infrastructure.

The EU has established new privacy laws regarding how data is used and how models are built, to take effect in January 2018. The effect is still unclear, but may rule out "unexplainable" models such as those that come from deep learning. Companies are struggling with both what data they will be permitted to use, and what methods for analysis can be used. For example, will Google be able to explain why it classified an email as spam with the usual "it looked like other messages that people reported as spam," or will it have to say something like "we think it is spam because the sender claims to be a Nigerian prince"?

Matei Zaharia, Chief Technologist at Databricks, creator of Apache Spark

Here are some trends I've seen from Apache Spark and Databricks:

1) Public cloud is starting to be the dominant way to deploy big data. In the Apache Spark user survey that Databricks ran this summer, the percentage of users running Spark on the public cloud (61%) was higher than the percentage using Hadoop YARN (36%). Furthermore, the share of cloud users grew from 2015 (51% to 61%) while the share of YARN users decreased (40% to 36%). One reason is that cloud storage such as Amazon S3 is generally more cost-effective, more reliable and easier to manage than HDFS.

2) Apache Spark 2.0 was released in July, with significant performance improvements in Spark SQL and DataFrames that take advantage of modern hardware. Anecdotally, we are already seeing fast uptake of 2.0, with around 40% of the clusters on Databricks using it.