Big Data: Main Developments in 2017 and Key Trends in 2018

As we bid farewell to one year and look to ring in another, KDnuggets has solicited opinions from numerous Big Data experts as to the most important developments of 2017 and their 2018 key trend predictions.

At KDnuggets, we try to keep our finger on the pulse of main events and developments in industry, academia, and technology. We also do our best to look forward to key trends on the horizon.

To close out 2017, we recently asked some of the leading experts in Big Data, Data Science, Artificial Intelligence, and Machine Learning for their opinion on the most important developments of 2017 and the key trends they see in 2018. This post, the first in this series of year-end wrap-ups, considers what happened in Big Data this year, and what may be on the horizon for 2018.

Big Data wordcloud

Specifically, we asked experts in this area:

"What were the main Big Data related developments in 2017, and what key trends do you see in 2018?"

We solicited responses from numerous individuals, and asked them to keep their answers to under approximately 200 words, though we were not overly strict and allowed interesting responses to go longer.

As a quick review, last year's trends and predictions centered on the major themes of:

  • Security and privacy
  • The cloud's increasing role in data management
  • A tempering of expectations and hype
  • The Internet of Things' role in generating data

There's no denying that the term Big Data is no longer what it used to be; note that the term disappeared from the Gartner curve in 2015. We now all just assume and understand that our everyday data is huge. There is, however, still value in treating Big Data as an entity or a concept which needs to be properly managed, an entity which is distinct from much smaller repositories of data in all sorts of ways.

To see what developments are recognized as the year's most important, and find out where our experts think Big Data is headed in 2018, read the contributions below.

A final note: We would not be able to present these posts without the gracious participation of the experts upon whom we have called. While not everyone we requested was able to participate, we are grateful for those who were. Enjoy their insights.

Marcus Borba is a Data & Analytics Expert, and Founder and Primary Consultant of Borba Consulting.

In 2017, Big Data is definitely no longer a buzzword, and we are increasingly less likely to use the term big data; we just call it data. Companies are shifting from the departmental big data approach to a business-driven data approach, focusing on agility in the use of big data analytics capabilities and using them to drive both initial and long-term business value.

Looking forward, 2018 will be a year of growing convergence between big data and other technologies. With machine learning being increasingly applied, artificial intelligence becoming smarter, rapid growth in IoT usage, and increased adoption of a cloud-first strategy for big data analytics, more and more smart technological capabilities will be incorporated into the most varied types of enterprise solutions, products and services, enabling people and companies to take advantage of them.

Craig Brown, PhD is a Social Influencer and expert in Big Data, Data Science, and Database Technology. He is a Technology Mentor, Author, Youth Mentor, and Technical Coach.

In 2017, there were a few areas that received more limelight than I would have anticipated. The IoT era got off to an early start. Artificial Intelligence and Machine Learning both also received quite a bit of traction, and Data Science started to gain more momentum. Based on new data platform releases in 2017, there has been a new trend towards hybrid data management and hybrid cloud. 2017 has turned out to be a year of strategy and reassessment. Big Data is still very much a part of the strategy, but with less emphasis on Hadoop and more emphasis on data management, data visualization and hybrid cloud.

I see a few areas becoming the focus in 2018. Big Data projects will resume with more realistic use cases and fewer failures. Data streaming will take off as the data platforms continue to mature. There will be more emphasis on NoSQL in 2018, and it will finally be established as a realistic solution for both structured and unstructured data, including transaction data. Software development will gain the spotlight as these data trends start advancing the need for new functionality with regard to data pipelines. 2018 will be the beginning of Hadoop and NoSQL (Data Platform) adoption.

Meta S. Brown is the author of Data Mining for Dummies, and President of A4A Brown Inc.

The stand-out Big Data stories of 2017 include:

  • Behavioral targeting behind the Trump 2016 presidential campaign. But the same technology failed two other candidates, and Clinton won the popular vote. Oh, and Trump’s staff says its analytics firm lied.
  • Paradise Papers. A consortium of journalists took on the biggest-ever data leak with the help of Big Data technology, revealing tax avoidance strategies of multinational corporations and the super-rich. The impact was worldwide publicity, public backlash and the resignation of one influential businessman, likely the first of many to come.
  • Murder Accountability Project. Journalist Thomas Hargrove wants to save lives. He wondered if he could teach a computer to recognize murders that were part of a series. Turns out, he could.

These stories lead to these Big Data analytics lessons for 2018:

  • Neither massive data nor fancy calculations have magical powers.
  • Analytics magic, in the form of actionable new information, evolves from cooperation among data owners, technologists and subject matter experts. It’s disciplined process and commitment to collaboration that assure useful results.

Vasant Dhar is a Professor at NYU, the Chief Editor of "Big Data," and Founder at SCT Capital Management.

2017 was a big year for big data. There are several broad new themes that emerged regarding the impacts of big data on society. One big development was the recognition of fake news and the misuse of social media platforms, which is the subject of the December 2017 special issue of Big Data on “Computational Propaganda.” Unregulated digital platforms pose a grave risk to liberal democratic societies, so there is a pressing need for research aimed at understanding their use for propaganda and how to mitigate the risks that they pose. The December 2017 issue of Big Data provides the first set of academic articles on this important subject. I would expect to see a lot more research in this area going forward.

There is also considerable interest in fairness, bias, and ethical considerations in conducting “discrimination aware” data science, which was the subject of the June 2017 Big Data issue on “Social and Technical Tradeoffs.” I expect to see a lot more research in this arena going forward that examines the tradeoffs between objectives like accuracy and social objectives like “fairness.” Handling such a tradeoff requires a nuanced formulation of the overall objective function and a critical examination of the data and the assumptions or biases that might be embedded in it.

On the technical side, we continue to see the emergence of novel algorithms, especially in the deep learning space, fueled by the massive datasets available in areas of perception. As systems become better at seeing, reading, and handling unstructured data in general, and as recognition accuracy increases, we should expect to see rapid progress in autonomous vehicles, language processing, and the interpretation of unstructured data in general. I would not be surprised to see the emergence of intelligent systems that interpret and translate video images for learning algorithms across a range of applications. Sports, for example, are already becoming more data driven and we are likely to see the emergence of tools and systems for coaching at the team and individual levels. Big Data is planning a special issue in sports next year. Virtual reality is another area where handling full motion video data will lead to novel applications in a number of areas.

Finally, the natural and health sciences are seeing a huge interest in big data and machine learning. Many physicists have embraced the new methods and are even contributing to the development of methods for interpreting the gobs of data from astronomy or the particle level in a new way. Similarly, there’s a lot of activity in medical imaging linking such data to medical conditions, where such relationships were not previously imagined. I came across some research, for example, that posits a relationship between the condition of the eyes and health. This is an arena in which there are probably large numbers of interesting relationships to be discovered between data and health.

In general I see acceleration in the use of big data in 2018 across the sciences, business, and government. In the sciences, we are seeing the use of data for theory development. In business we are seeing new efficiencies created by the cloud and predictive analytics being deployed to improve decision making and efficiency. Government and regulations are focused on security, stability of critical systems and better governance. 2018 will be a year of considerable progress across these areas.

Tamara Dull is Emerging Tech Evangelist at SAS, and Director of Emerging Technologies, SAS Best Practices.

In recent years, we've cracked the nut of what big data is and how to store and process it faster and cheaper than ever before. In 2017, however, we watched the big data story get even bigger - even though the term itself started to fade. We began to get a solid grasp on what the Internet of Things (IoT) is all about and understand how (big) data is its lifeblood. We began to get more serious about how to engage the machines - via machine learning (ML) and artificial intelligence (AI) - to help us make sense of all this data. And we began to realize that the sci-fi of yesterday - i.e., IoT, ML, and AI - is now in our kitchens and back pockets. We call her Alexa and Siri.

Where are we headed in 2018?
We'll stay plenty busy on our current data-fueled IoT/ML/AI path, but we'll also work on cracking the next big nut: the "where" nut. Generating data is easy. The hard part is figuring out where to process, analyze, and store it - both short-term and long-term. We have a lot of options these days: on the device, on-premises, in the cloud, on the edge, or some combination thereof. And then once the data is stored, how are we going to protect and secure it? We still have a long way to go. It's going to be another busy year.

Bill Inmon is a Best-Selling Author, and Founder, Chairman, and CEO of Forest Rim Technology, LLC.

From the beginning the amount of hype that accompanied Big Data was over the top. The expectations were set at unprecedented levels. Outlandish promises were made to top management. Organizations spent huge amounts of money on the proposition that there was gold just waiting to be mined using Big Data.

That was yesterday.

And predictably Big Data travelled up the Gartner Group hype curve to dizzying heights. And equally predictably, as Big Data was employed, it headed for the Trough of Disillusionment that all technologies go through on the Gartner hype curve.

In 2017 Big Data began to emerge from the Trough of Disillusionment. Developers began to think about new development with Big Data. Once upon a time the question was – what can we build with Big Data?

In 2017 the question turned to – what business value can we deliver with Big Data? In doing so, Big Data – quite predictably – started to emerge from Gartner’s ubiquitous Trough of Disillusionment.

In 2017 developers started taking a much more rational, much more mature attitude toward the deployment of Big Data.

James Kobielus is Lead Analyst for Data Science, Deep Learning, and Application Development at SiliconANGLE Media, Inc.

What were the main Big Data related developments in 2017?

In 2017, Hadoop began to show its age, being rapidly marginalized in more organizations' data-lake architectures as users moved away from HDFS toward alternative data stores (in-memory, object, file, graph, multi-model, key-value, etc.) that are better suited to specific analytic workloads. Also, the entire big-data market began to shift toward streaming/edge cloud architectures in which more data storage, analytic workloads, and real-time inferencing will be done on IoT devices or gateways, with the centralized big-data clusters increasingly specializing in massively parallel training of the AI models that are pushed down to the edges.

What key trends do you see in 2018?

In 2018, we'll see the data-science/app-development community converge on an open AI development framework that wraps a common abstraction layer around modeling tools such as TensorFlow, MXNet, Caffe2, Spark, etc. We'll also see automation of the ML/DL/AI development pipeline pick up pace as diverse innovations coalesce around a new DevOps-focused "model factory" paradigm. Furthermore, we'll see significant growth in startups that specialize in delivering synthetic labeled training data to satisfy the growing demand from data scientists/application developers who don't have the resources to provision such data internally for their model training needs. We'll see an explosion in 2018 in availability of low-cost, low-power, high-performance AI chipsets for edge-based inferencing, which will drive the development of a new generation of ML/DL compilers, integrated with data science dev tools, that can optimize models for deployment to these disparate hardware substrates. We'll see a mania for generative adversarial networks to drive materials fabrication, style transfer, and many other practical applications of AI in the real/physical world. And mainstream data scientists will rapidly gain expertise in reinforcement learning to apply to the growing range of intelligent robotics projects coming into the business world.

Doug Laney is VP & Distinguished Analyst, Chief Data Officer Research, Gartner, and Author of "Infonomics" (Sept '17).

By far, the biggest data and analytics story of the past year was the resurgence of AI — this go round as a primary tool of data scientists and a means to derive value from a burgeoning array of available information assets. Increasingly, organizations are dispensing with analytics projects to produce pretty pie charts and dashing dashboards in favor of automated and optimized discovery, decision-making and processes. Gartner analysts handled nearly 10,000 client inquiries the past year on this topic alone.

As for the coming year, key trends I see emerging are more strategic than technical:

  • The application of traditional asset management principles and practices (e.g. supply chain, ITAM, records management) toward managing information as an actual asset. This will include inventorying information and measuring information management maturity, and will eventually lead to extended information ecosystems among partners.
  • The formal, proactive curation of external information assets from partners, suppliers and other 3rd parties to provide improved leading indicators.
  • Broadened monetization of information assets, internally and externally, directly and indirectly, and via an emerging crop of data marketplaces.
  • Data governance efforts dropping the unhelpful notion of data “owners” in favor of “trustees”, and expanding the data steward role to include information advocacy.
  • A primary focus on, and even automation of, data-related compliance.
  • Quantifying the value of one’s information assets. (Boards and CEOs are starting to ask, even as the accounting profession is still oblivious to the Information Age.)
  • The recognition and exploitation of information’s unique economic properties (“infonomics”).
  • The mainstreaming of the CDO role, and bifurcation of IT orgs into separate “I” and “T” groups.

Ying Li is Chief Technology Officer at DataSpark, an inventor with over 70 patents in data mining, text mining, machine learning, and software optimization, and Chair/Co-Chair of more than 15 international conferences/workshops, including KDD-2008.

In 2017, we saw many more real-time capabilities, and big data analytics for IoT being productized into IoT product offerings.

In 2018 I expect that deep learning will be added into big data production systems that operate on multiple diverse data sources to bring impact to traditional businesses.

Yves Mulkers is a Business Intelligence & Data Architect, Social media Influencer, and Founder of 7wData.

In the first months of 2017 we were still talking big data, but after the first quarter it was all IoT, and halfway through the year we had to make room for nothing but Artificial Intelligence and deep learning.

For the techies among us, Spark got the needed traction and attention, with major feature improvements.

Offering faster time to market and a great USP for the vendors, self-service data science and BI and ETL automation gained more visibility and interest.

Along the same lines, new vendors brought more 'no-ETL' solutions, which avoid the need to model and physically store the data before you can even start your analysis.

Platforms for Data science preparation got more SQL-ified, because this is the language most of us data professionals still master the best.

IoT is creating great opportunities for analytics, or maybe it's the other way round. Either way, analytics is showing the value of IoT data. This brings new ways of computing, and with the huge volumes introduced by IoT, computation is moving to where the data resides: the edge.

As May 2018 is approaching, we give a big hooray to the European GDPR regulation (well in a certain way). This initiative makes companies finally revisit data flows and data architecture. It will finally help companies understand their data is really an asset, and needs to be treated that way.

Several severe data breaches drew attention to cyber security, a good wake-up call to safeguard your data hubs.

What will 2018 bring?

Even more data. AI and deep learning will continue to mature, but will need more years to prove their success and use cases. AI is also starting to play a bigger role in data science, helping (not replacing) the data scientist.

Data science platforms are moving more and more to the cloud, but hybrid cloud is lurking around and offering the cloud on-premises, a kind of best of both worlds.

GPU databases had already surfaced at the end of 2016, but are now getting the right attention: in combination with powerful in-memory architectures and new chipsets from major vendors, they increase performance 100- to 1000-fold.

And, not to forget, blockchain is around the corner, providing more security and a new way of storing your data in a distributed, safe way.

Data is here to stay.

William Schmarzo is CTO, Dell EMC Services Big Data, and known for good reason as the "Dean of Big Data."

Main 2017 Big Data Developments

  • Emergence of Advanced Analytics (Deep Learning, Machine Learning, Artificial Intelligence) as business differentiators driving Big Data maturation. Advanced analytics potential spurring business executives to increase big data investments.
  • Democratization of Advanced Analytics. Open source frameworks and the cloud are enabling organizations of all sizes to exploit the business potential of Advanced Analytics.
  • Maturation of Chief Data Officer (CDO). Today CDO role looks like Chief Information Officer “mini-me”. But increasing awareness that CDO should really be “Chief Data Monetization Officer” who focuses on driving data monetization.

Key 2018 Trends

  • Continued transformation of Big Data from an IT “task” to a Business mandate. Business leadership and ownership mandatory for organizations trying to “monetize their data.”
    Organizations that master Big Data are better positioned to exploit Advanced Analytics. More accurate, more complete, more enriched Big Data is the nitrous oxide for Advanced Analytics.

  • Continued business adoption of Economic Value of Data. Continued research (with University of San Francisco) will highlight economic potential of data and analytic assets; digital assets that never deplete, never wear out and can be used across infinite use cases at zero marginal cost. Data isn’t the new oil; Data is the new sun!

Ronald van Loon is Director of Adversitement, where he helps data-driven companies generate success. He is a Top 10 Big Data, Data Science, IoT, and AI Influencer.

2017 brought a speed of adoption of, and focus on, Machine Learning across the technology industry that exceeded expectations. A majority of technology vendors integrated Machine Learning capabilities into their offerings, building a real-time data and analytics foundation that augments human capabilities and increases efficiency levels. And mainstream Machine Learning applications are going to increasingly mature in 2018 as organizations start reshaping their infrastructures to automate repetitive tasks and improve not only their productivity levels, but also their ability to cater to their customers.

In 2018 companies will need to effectively handle the growth of different data streams which necessitates fully integrated data management platforms. This gives organizations the capability to transform all of this data into actionable insights that can be applied towards accelerating competitive strategies, such as creating a personalized and responsive Customer Experience.

Deep Learning and Artificial Intelligence applications in voice recognition and video analytics will also be on an escalated trajectory in 2018, responding to the fast growth of video usage from connected devices such as tablets, security cameras, and smartphones. As video camera use increases, video analytics will continue to be fueled by digital technology trends like Deep Learning and Artificial Intelligence, which provide companies with the means to analyze massive volumes of streaming video data to identify objects, people, and environments in real time. This will facilitate swift communication across systems for fast predictions, decision-making capabilities, and a proactive approach to the development of intelligent systems.

Edge Analytics is another 2018 trend that arises from the substantial increase in connected devices, the number of which is predicted to reach 30 billion by 2020. Edge Analytics applications give organizations the capability to improve their responsiveness and perform analyses in real time at whatever point data is generated, efficiently overcoming data management challenges related to overtaxed central systems, slow network availability, and large volumes of streaming data from so many connected devices.

Mark van Rijmenam is founder of Datafloq, an internationally renowned Big Data and blockchain strategist, and a keynote speaker.

2017 was a great year for big data. In my trends prediction for 2017, I called it the Year of Intelligence. We saw artificial intelligence taking a leap forward with deep learning, developing AIs that go about their own way. In addition, we saw conversational AI, a.k.a. chatbots, really taking off, with many organisations developing chatbots to improve their customer service. In the field of Big Data, predictive analytics has now become a prerequisite for organisations to retain competitive advantage, making every organisation a data organisation.

2018 promises to be another exciting year in terms of technology. I called it the Year of Transition as it is safe to say that we have left behind the Information Revolution of the 1970s and are on our way to the 4th Industrial Revolution. However, we are not there yet and many technologies will require a few more years of development before they will truly cause a paradigm shift in our societies and organisations.

Nevertheless, 2018 will see artificial intelligence becoming more intelligent, this time without being trained on human data, removing the possibility of bias in the outcome. Blockchain, and especially ICOs, will become more regulated and we will see a true arms race in the field of quantum computing. All these extra processing capabilities will help organisations move to prescriptive analytics and help them benefit from the final stage in big data analytics. 2018 will be an exciting year and if you wish to learn more about the top 7 technology trends of 2018 read more on:

Matei Zaharia is Chief Technologist at Databricks, also known for starting up the Apache Spark project as a graduate student.

2017 saw continued growth and rapid evolution of big data tools in the cloud. Through offerings from many vendors, we see that big data in the cloud is not just a matter of “forklift” deployment of on-premises systems, but instead means new systems that take advantage of the cloud's scale, elasticity and management capabilities. For example, per-second billing and serverless computing enable truly elastic computation, while services like S3 Select enable fundamentally new ways of querying data. Neither of these has an on-premises equivalent. I expect to see cloud data management systems continue to evolve in 2018 and ultimately enable new data architectures beyond today's data lakes and warehouses. At Databricks, we have already started down this path by announcing Databricks Delta, a system that combines the cost-efficiency of S3 with the high performance of a data warehouse.