Industry AI, Analytics, Machine Learning, Data Science Predictions for 2020

Predictions for 2020 from thirteen innovative companies in the AI, Analytics, Machine Learning, Data Science, and Data industry.

Here is the third part of our 2020 prediction series, with a roundup of predictions from some of the most innovative companies in the AI/Analytics/DS/ML industry.

Some of the common themes were: Data, Business, democratization of Data Science, AutoML, NLP, Cloud, and DataOps.

Here are the answers from @Alluxio, @Alteryx, @AppenGlobal, @CasertaData, @Circonus, @dotDataUS, @Infoworksio, @Izenda, @Lexalytics, @Mathworks, @Percona, @Sisudata, and @StreamSets.

Predictions 2020 Industry Word Cloud

2020 AI Predictions by Haoyuan Li, founder and CTO of Alluxio

One Machine Learning framework to rule them all
Machine learning model training has reached a turning point, with companies of all sizes and at all stages moving to operationalize their model training efforts. While there are several popular frameworks for model training, a leading technology hasn't yet emerged. Just as Apache Spark is considered the leader for data transformation jobs and Presto is emerging as the leading technology for interactive querying, 2020 will be the year a frontrunner dominates the broader model training space, with PyTorch and TensorFlow as the leading contenders.

"Kubernetifying" the analytics stack
While containers and Kubernetes work exceptionally well for stateless applications like web servers and self-contained databases, we haven't seen much container usage for advanced analytics and AI. In 2020, we'll see a shift as AI and analytics workloads become mainstream in the Kubernetes world. "Kubernetifying" the analytics stack will mean solving for data sharing and elasticity by moving data from remote silos into K8s clusters for tighter data locality.

AI & analytics teams will merge into one as the new foundation of the data organization
Yesterday's Hadoop platform teams are today's AI/analytics teams. Over time, a multitude of ways to get insights from data have emerged. AI is the next step beyond structured data analytics: what used to be statistical modeling has converged with computer science to become AI and ML. So data, analytics, and AI teams need to collaborate to derive value from the same data they all use. And this will be done by building the right data stack: storage and compute, deployed on-prem, in the cloud, or in both, will be the norm. In 2020 we'll see more organizations building dedicated teams around this data stack.

Alan Jacobson, Chief Data and Analytics Officer, Alteryx.

Democratization of data takes the fore

2020 will be marked as the year that data finally became democratized. The movement of analytics away from data science teams and towards full saturation throughout the business will finally boil over after simmering for the past few years. This self-service revolution will change how organizations interact with their data, bridging the gap between people with business knowledge and people with data knowledge.

Enabled by easy-to-use APIs and the unification of a wide range of data sources, self-service analytics will allow for one of the most important stages of digital transformation - data integration. The typical data worker is beginning to move away from the IT domain and into the domain of business, resulting in a larger volume of workers conducting data tasks. The result will be more data being processed, a greater number of analyses, and ultimately a larger, more positive impact on the business.

Wilson Pang, CTO of Appen.
  • Natural language processing advances enable broad adoption of chatbots, online Q&A for customer service, and more:
    We've seen some NLP breakthroughs this year and last. BERT, for example, has expanded what is now possible with NLP models. We will see more AI applications like customer-service chatbots, online question answering, and sentiment analysis being adopted by more and more companies in 2020.
  • ML tools & AIOps gain more traction in the enterprise:
    Over the last few years, we have witnessed the maturation of the whole ecosystem of machine learning and AI tools. Tools across the entire tech stack - data annotation, model training, debugging, model serving, deployment, and production monitoring - will grow massively next year. To help manage all of these tools, more companies will turn in 2020 to the practice of AIOps. Large cloud platforms like AWS, GCP, and Microsoft Azure already have good tools to support AIOps, but many Fortune 500 companies are still wary of deploying to the cloud, where those platforms reside.
  • Security and ethics best practices drive more on-premise AI deployments:
    As more organizations experiment with more data for their AI initiatives, security and ethical use of AI will become more and more important. Chief among the concerns in this arena are data leaks, especially with personally identifiable information (PII), and new product ideas and proprietary information. These concerns should lead to more on-premises solutions for enabling AI creation, including solutions for data annotation and leveraging a diversified crowd securely. Ensuring secure data practices will be just part of a growing approach to more ethical AI use. This approach will also include caring about the wellness of the crowd and more carefully considering how AI applications will impact people who use them or the lives of the people AI was meant to improve.
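As a toy illustration of the sentiment-analysis task mentioned in the first bullet above, here is a minimal lexicon-based scorer. This is only a sketch of the task itself - the word list and polarity scores are invented for the example, and production systems of the kind Appen describes would fine-tune a transformer model such as BERT rather than use a hand-built lexicon:

```python
# Toy sentiment scorer: averages polarity scores of lexicon hits.
# Hypothetical mini-lexicon for illustration only; real deployments
# would use a fine-tuned transformer model (e.g., BERT) instead.
POLARITY = {
    "great": 1.0, "love": 1.0, "helpful": 0.5,
    "slow": -0.5, "terrible": -1.0, "broken": -1.0,
}

def sentiment(text: str) -> str:
    tokens = text.lower().split()
    hits = [POLARITY[t] for t in tokens if t in POLARITY]
    if not hits:
        return "neutral"
    score = sum(hits) / len(hits)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("the support team was great and helpful"))  # positive
print(sentiment("the app is slow and constantly broken"))   # negative
```

The gap between this sketch and a BERT-based classifier - handling negation, context, and words outside the lexicon - is exactly why pretrained language models expanded what is possible with NLP.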

From Joe Caserta, the founding President of Caserta.

2019 saw business leaders come to understand that using the greatest analytics platforms just to create reports was insufficient. 2020 will see the realization of analytics maturity from a people, process, and technology perspective. Organizations will begin to innovate in how they do data discovery and business intelligence, and will start to use data spiders, bots, artificial intelligence, and NLP to query data and get to insights faster. We are in store for another data revolution that will drastically change the current landscape and turn modern data engineering on its head.

Bob Moul, CEO of machine data intelligence platform Circonus.
  • Value of IoT Data Comes to Fruition - Decisions that result from analyzing IoT data at scale will deliver a gold mine of business opportunities, helping to lower costs, mitigate downtime and prevent problems before they happen.
  • Container observability - Over the past few years, many folks were dipping their toes in Kubernetes, learning and doing proofs of concept. In 2020, we're going to see a huge number of those deployments go online, tightly aligned with the DevOps function within enterprises. The caveat is that container environments emit an enormous volume of metrics, and many legacy monitoring products won't be able to handle the high-cardinality requirements.
  • Growth of IoT will necessitate an innovative storage solution - Gartner predicts there will be approximately 20 billion IoT-connected devices by 2020. As IoT networks swell and become more advanced, the resources and tools that manage them must do the same. Companies will need to adopt scalable storage solutions to accommodate the explosion of data that promises to outpace current technology's ability to contain, process, and provide valuable insights.
  • Increased complexity in monitoring infrastructure - We're seeing a large rise in the volume of metrics, driven by DevOps practices such as blue-green deployment. When you combine those practices with rapid CI/CD, you see some agile organizations doing upwards of a dozen releases a day. There will be a need for significant changes in tooling to help support these use cases.

Ryohei Fujimaki, Ph.D., CEO, and founder of dotData.

In 2019, AutoML gained increased traction as organizations realized the power of, and need for, automating as much of the data science process as possible. Traditional AutoML, however, has also shown itself to be limited, hampered by the highly manual and time-consuming process of designing the features necessary for AutoML to succeed. 2019 was also the year that saw the rise of AutoML 2.0 - a new iteration of the AutoML experience that uses AI to leverage raw business data in relational data sets to automatically create, evaluate, and score features, which are then evaluated against machine learning algorithms.

In 2020 we expect this trend towards full-cycle automation of data science to accelerate as more vendors jump on the AutoML 2.0 train. Another big trend in 2020 will be the operationalization and productization of ML pipelines. It will become increasingly important to automate as much of this as feasible with early MLOps trials already in place.
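The core idea behind the automated feature generation described above - deriving candidate features from raw relational tables rather than hand-designing them - can be sketched in a few lines. This is a toy illustration of the general technique, not dotData's implementation; the table and column names are invented for the example:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical relational data: one parent row per customer,
# many child rows per customer in an orders table.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]

def auto_features(parents, children, key, child_key, numeric_col):
    """Auto-generate count/mean/max aggregates of a child table per parent row."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[child_key]].append(row[numeric_col])
    feats = []
    for p in parents:
        vals = grouped.get(p[key], [])
        feats.append({
            **p,
            f"{numeric_col}_count": len(vals),
            f"{numeric_col}_mean": mean(vals) if vals else 0.0,
            f"{numeric_col}_max": max(vals) if vals else 0.0,
        })
    return feats

features = auto_features(customers, orders, "id", "customer_id", "amount")
print(features[0])  # {'id': 1, 'amount_count': 2, 'amount_mean': 30.0, 'amount_max': 40.0}
```

An AutoML 2.0 system would generate many such aggregates across joins and time windows, then score each candidate feature against the target before model training.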

Infoworks CEO, Buno Pati.

  • The Ability to Harness the Power of Data will Accelerate Disruption Across the Economy and Create Winners and Losers More Quickly than in the Past
    New challengers will rise faster than ever before in the next decade, and incumbent leaders will fall just as fast. Research from BCG shows that for large companies, there is now less correlation between past and future financial and competitive performance over multiple years. Data scientists across all industries currently spend about 80% of their time on lower-value activities such as ingesting data, incrementally updating data, organizing and managing data, optimizing pipelines, and delivering data to applications. The cost: only 20% of data scientists' time is spent on developing applications that further growth and competitive advantage for the business. Those who truly harness the power of data via new, automated approaches to data operations and orchestration will thrive, as this will enable them to focus their data science talent on creating business value. The impact of digital transformation will be felt across all segments of the economy - in expected places (technology, financial services, retail/etail, etc.) and unexpected ones (agriculture, home improvement, public sector, etc.).
  • We Will See a Dramatic Increase in Consumer Control Over "Personal" Data as Privacy Laws Evolve Over the Next Decade
    GDPR and CCPA (California Consumer Privacy Act) are just the tip of the iceberg with regards to the protection and consumer control of consumer data. Over the course of the next decade, consumer control of personal data can be expected to increase dramatically as governments and regulators drive new privacy legislation. In time, these regulatory actions will likely lead to complete consumer control of personal data and opportunities for consumers to directly monetize their data or directly exchange data for goods and services.
  • The Clean-Power Movement Will Create a Deluge of Data and New Analytics Use Cases Over the Next Decade
    The fastest growing industries in America today are solar and wind, and jobs in these industries are expected to grow twice as fast as any other occupation over the next decade. (source: U.S. Representative from California's 17th congressional district, Ro Khanna) Technological advancements in these industries have driven costs down and sparked a clean-power movement that quadrupled global renewable energy capacity within the past nine years (source: UNEP). This capacity, which is more than every power plant in the U.S. combined, will create a deluge of data and new analytics use cases aimed at maximizing the benefits, and optimizing the use of these technological developments over the next decade. Managing and utilizing this tsunami of data will require sophisticated systems for data operations and orchestration, which transcend the manually-intensive methodologies of the past and enable data scientists to focus on the best and highest use of their talents - driving continued growth in the industry through data-driven processes and insights.

From Izenda

People: If 2019 was all about machines, 2020 will be all about people. This year, we saw AI and machine learning in data analysis being used in earnest - resulting in quicker (and more valuable) insights than ever before. The next step is to democratize that process - removing the burden of data projects from highly-skilled workers and empowering the non-technical end-user to discover those same kinds of insights. No need to hire additional analysts. No need to train users on query language. Users will be able to explore their data with the same ease that they use Google.

Democratization of Data Science
Natural-language processing via text or voice will help foster the boom of "citizen data scientists." And while a few BI tools have already added NLP functionality in their platform, there's still one thing that keeps them from being accessible: pricing. In 2020, we'll begin to see affordable SaaS BI tools with similar power and functionality as tools that cost tens of thousands of dollars. That combination of Machine Learning capabilities and self-service functionality all in an affordable platform will give businesses of all sizes the power to find actionable insights in their data.
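The "citizen data scientist" experience described above boils down to translating a plain-English question into a query. Here is a deliberately tiny sketch of that translation step using a single regular-expression pattern - the data and the supported phrasing are invented for the example, and real BI tools use far richer NLP than this:

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical sales records a business user might want to explore.
rows = [
    {"region": "east", "sales": 100.0},
    {"region": "east", "sales": 140.0},
    {"region": "west", "sales": 90.0},
]

def answer(question: str, data):
    """Map a very restricted English pattern to a group-by aggregation."""
    m = re.match(r"(average|total) (\w+) by (\w+)", question.lower())
    if not m:
        raise ValueError("unsupported question")
    agg, metric, dim = m.groups()
    groups = defaultdict(list)
    for row in data:
        groups[row[dim]].append(row[metric])
    fn = mean if agg == "average" else sum
    return {k: fn(v) for k, v in groups.items()}

print(answer("average sales by region", rows))  # {'east': 120.0, 'west': 90.0}
```

Scaling this from one hard-coded pattern to arbitrary phrasing is precisely the NLP problem that BI vendors are racing to solve, and affordability of that capability is the gap the prediction points at.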

Jeff Catlin, CEO of Lexalytics

As someone running a text-focused AI/ML business, there were two trends that jumped out in 2019: The permeation of models like BERT and XLNet, and a noticeable 2nd half-of-the-year pivot of data scientists from writing everything themselves to solving problems using AI tools and platforms. The two are related: While BERT's a game-changer in providing great results using a fraction of the training data, it's a heavy technical lift to become proficient, hence the pivot to platforms that include all the plumbing built-in.

For 2020, AI will solidify its position as the defining technology of the next decade. Providers will pull back on the "magical" angle, pushing the correct message that AI can aid humans, making them faster and better at their jobs. Also, NLP will become a bigger part of RPA, an area where vendors are sorely lagging. As companies automate larger processes, NLP vendors offering on-premise and hybrid cloud options, easy-to-integrate APIs, customizability, and quick ROI will address the need.

By Bruce Tannenbaum, Senior Manager of Product Management, MathWorks
  • AI becomes more accessible across the workplace
    As AI-related industrial growth continues, the technology will expand past the realm of data science, impacting applications such as medical devices, automotive design, and industrial workplace safety.
  • AI will deploy to low power, low cost embedded devices
    Over the next year, we will witness the deployment of AI on low power, low cost devices. AI has typically used floating-point math for higher accuracy and easier training of models, but this ruled out low cost, low power devices that use fixed-point math. Recent advances in software tools now support AI inference models with different levels of fixed-point math.
  • Reinforcement Learning moves from gaming to real-world industrial applications
    In 2020, reinforcement learning (RL) will transform from playing games to enabling real-world industrial applications, particularly for automated driving, autonomous systems, control design, and robotics. We'll see successes where RL is used as a component to improve a larger system, such as improving driver performance in an autonomous driving system.
  • Simulation lowers a primary barrier to successful AI adoption - lack of quality data
    Data quality is a top barrier to successful adoption of AI, per analyst surveys. Normal, everyday system operation generates large amounts of usable data. However, the hard-to-find data from anomalies or critical failure conditions is often more valuable. Training accurate AI models requires lots of this data, and simulation will help get data AI-ready and lower this barrier in 2020.
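The floating-point-to-fixed-point trade-off behind the embedded-AI bullet above can be seen in a few lines. This is a generic sketch of symmetric 8-bit quantization (the common technique for running inference on fixed-point hardware), not MathWorks' tooling; the weight values are made up for the example:

```python
# Symmetric 8-bit fixed-point quantization of a weight vector, then
# dequantization, showing the small rounding error that low-power
# fixed-point inference trades for cost and energy savings.

def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127.0  # one scale per tensor
    codes = [round(v / scale) for v in values]   # int8 codes in [-127, 127]
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

weights = [0.82, -0.41, 0.05, -0.99, 0.33]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)    # [105, -53, 6, -127, 42]
print(max_err)  # worst-case rounding error is at most scale / 2
```

Because each weight now fits in a single signed byte and inference becomes integer arithmetic, memory and power drop sharply, while rounding-to-nearest bounds the per-weight error at half a quantization step.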

Matt Yonkovit, Chief Experience Officer at Percona.

Databases will get more autonomous
There is a skills shortage in the area of database implementation, particularly around the cloud. More companies want to take advantage of their data, but they are finding it difficult to run operations successfully at the speed they want to achieve. Developers picking databases to run with their applications just want them to work, without taking on administrative duties or having to become DBAs to make that happen.

The database vendors have responded in the past by launching more managed services - however, this can move the problem elsewhere. This year, companies have started talking through how to automate database management and make these instances autonomous and self-healing. It was a big theme at Oracle's customer conference, and we at Percona have launched our own initiatives to make databases in the cloud more autonomous.

Next year, more autonomous database services will become available to meet the need for speed. However, the important thing to be aware of here is how this autonomous service is designed and delivered. What is great for the majority may not be suitable for everyone.

From Peter Bailis, CEO at Sisu

From our work with customers seeking this promised golden age of data, we see four major shifts gaining momentum in 2020. Starting with the rise of a new analytics stack, we'll also see a shift in focus away from dashboards to a more diagnostic approach to analysis, a demand for more useful facts, and the emergence of a new role - the Operational Analyst.

1. The rise of a new, more flexible analytics stack. Starting with an investment in cloud data warehouses like Redshift, Snowflake, and BigQuery, companies are also adopting modern data pipeline and ETL tools like Fivetran and Stitch to funnel more data into these structured storage solutions. What's next? Companies will rebuild their diagnostic tools to cope with the influx of richer data.

To handle the dozens of data sources and near-real time data volumes in a typical organization, IT and data teams will rebuild their analytics infrastructure around four key layers:
  • A cloud data warehouse, like Snowflake, BigQuery, Redshift, or Azure
  • Data pipeline tools like Fivetran and Stitch
  • Flexible dashboarding and reporting tools like Looker
  • Diagnostic analytics tools to augment the abilities of analysts and BI teams
Beyond 2020, governance comes back to the forefront. As platforms for analysis and diagnosis expand, derived facts will be shared more seamlessly within a business, and data governance tools that ensure the confidentiality, proper use, and integrity of data will improve to the point that they fade into the background again. In 2020, we'll see a shift in how companies use and perceive analytics.

2. Diagnosis over dashboarding. Alongside this infrastructure change, we're seeing boardrooms asking why metrics are changing and what those changes mean for day-to-day business operations. Competitive moats are being built (and crossed) based on the effective use of data, and successful companies will need to stop thinking of their data as a passive archive and start treating it as a competitive asset.

3. Rise of the Operational Analyst. Data is no longer the sole domain of the data scientist. Everyone in an organization will start acting more like a data analyst on a daily basis, and we'll see new skills and tools focused on specific use cases emerge. Analyzing trends and changes and using data to make impactful decisions will become the norm for all employees, no longer limited to the business analyst or the marketing analytics team.

Kirit Basu, VP, Products, StreamSets

DataOps will gain recognition in 2020
As organizations begin to scale in 2020 and beyond - and as their analytic ambitions grow - DataOps will be recognized as a concrete practice for overcoming the speed, fragmentation and pace of change associated with analyzing modern data. Already, the number of searches on Gartner for "DataOps" has tripled in 2019. In addition, StreamSets has recognized a critical mass of its users embracing DataOps practices. Vendors are entering the space with DataOps offerings, and a number of vendors are acquiring smaller companies to build out a discipline around data management. Finally, we're seeing a number of DataOps job postings starting to pop up. All point to an emerging understanding of "DataOps" and recognition of its nomenclature, leading to the practice becoming something that data-driven organizations refer to by name.

Arvind Prabhakar, co-founder and CTO, StreamSets

Businesses will need to fill the Apache Spark skills gap
In 2020, we will see more technologies come to life that enable companies to solve core business problems and extract insights from data without requiring a deep technical understanding of Apache Spark. Businesses will need to take advantage of tools like Apache Spark without having a set of specialized skills. This will enable organizations to achieve continuous data delivery and monitoring, and to see just how every operation and application is performing for the business.
