Industry 2021 Predictions for AI, Analytics, Data Science, Machine Learning

We bring you industry predictions from 12 innovative companies: what key trends do they expect in 2021 in AI, Analytics, Data Science, and Machine Learning?

This is the last part of our 2021 Predictions series: the predictions from industry. We received many submissions, and to keep this article a manageable size, we limited it to 12 companies: Alluxio, Alteryx, Diamanti, Dremio, Indicative, Lexalytics, Luminoso, MathWorks, MobiDev, Qlik, SAS, and Splice Machine.


Haoyuan Li, Founder and CEO, Alluxio

"Containers" everywhere for analytics and AI
  • Containerized application deployments and Kubernetes have started to gain traction as enterprises increasingly move away from traditional Hadoop-based data lakes. While moving away, enterprises are realizing the benefit of abstracting the physical infrastructure while also adopting public clouds for agility. Vendor lock-in is a concern, but at the same time a uniform toolset across environments is a must to reduce spending on the expertise required to operate across environments such as hybrid and multi-cloud. Container-based deployments for compute abstraction, alongside new abstraction services for storage anywhere, will be the solution of choice for enterprises moving off Hadoop.
Convergence of Machine Learning frameworks
  • Companies of all sizes and at all stages are moving aggressively towards operationalizing machine learning efforts. Several popular frameworks exist for model training, with TensorFlow and PyTorch leading the game. Just as Apache Spark is considered a leader for data transformation jobs and Presto is emerging as the leading tech for interactive querying, 2021 will be the year we see a front-runner dominate the broader model training space, with PyTorch and TensorFlow as the leading contenders.
AI & Analytics provided by the same platform (team)
  • AI and analytics capabilities were provided by different platforms and teams in the past. Over the years, we have seen these platforms converge: the AI team focuses more on the algorithmic side, while AI and analytics platform teams have merged to provide the software infrastructure for both analytics and AI use cases.

Alan Jacobson, Chief Data and Analytics Officer, Alteryx

Upskilling will play a bigger role in corporate boardrooms as well as in employees' lives.

While it is always important for companies to offer training to employees, the fields of data science and digital transformation are challenging companies to break the mold and deliver new and constantly evolving ways to upskill and deliver ROI. More and more, we're going to see upskilling programs that help people learn and apply skills in real time. Hackathons are one example of how this is happening currently in many companies. We're going to see an expansion of these and other on-the-job experiences that use real data and real problems with a goal of creating real value. Data science has evolved to the point where people don't need to go back to college to learn; they'll learn on the job or at home by encountering new tools and technologies. And with a huge shortage of people with analytic skills, many will start new jobs and careers based on these new skills.

The "analytic divide" is going to get worse.

Like the much-publicized "digital divide," we're also seeing the emergence of an "analytic divide." Many companies were driven to invest in analytics due to the pandemic, while others have been forced to cut anything they didn't view as critical to keep the lights on, and for these organizations, a proper investment in analytics was on the chopping block. This means that the analytic divide will further widen in 2021, and this trend will continue for many years to come. Without a doubt, winners and losers in every industry will continue to be defined by those that are leveraging analytics and those that are not.

Analytics platforms and processes will increasingly outperform ad-hoc, siloed solving.

Businesses are already starting to democratize data across the organization, arming more employees with real-time insights. I see this accelerating with both a cultural shift and a technology shift. This trend will result in data gurus and citizen data scientists with deep domain knowledge increasingly joining forces as part of a more holistic and effective problem-solving process.

Boris Kurktchiev, Field CTO, Diamanti
  • Everyone thought that AI/ML was going to be the next big thing, but there is confusion around what AI/ML actually means, and end users are often unclear about how and why AI matters to them. We'll see more advances in augmented technology to determine what the application of AI and ML means and how to use it to make technology better for the end user.
  • Everyone wants hybrid cloud, and hybrid cloud relies on one thing: federated Kubernetes. This idea has been the twinkle in the developer community's eye since 2015. 2021 will be the year we see a proper implementation of it, to the point where organizations can truly have a hybrid cloud. Without federated Kubernetes, organizations must contend with disparate components living in different clouds but unable to truly integrate with one another.

Tomer Shiran, co-founder of Dremio

Separation of Compute and Data Becomes the Default Choice

The rise of cloud data lake storage (e.g., Amazon S3 and Azure Data Lake Storage) as the default bit bucket in the cloud, combined with the infinite supply and elasticity of cloud compute resources, has ushered in a new era in data analytics architectures. Just as applications have moved to microservice architectures, data itself is now able to fully exploit cloud capabilities. Data can be stored and managed in open source file and table formats such as Apache Parquet and Apache Iceberg, and accessed by decoupled and elastic compute engines such as Apache Spark (batch), Dremio (SQL) and Apache Kafka (streaming). With these advances, data will, in essence, become its own tier, enabling us to rethink data architectures and leverage application design benefits for big data analytics.

The Shine of the Cloud Data Warehouse Wears Off

The cloud data warehouse vendors have leveraged the separation of storage from compute to deliver offerings with a lower cost of entry than traditional data warehouses, as well as improved scalability. However, the data itself isn't separated from compute: it must first be loaded into the data warehouse, and can only be accessed through the data warehouse. This includes paying the data warehouse vendor to get the data into AND out of their system. So, while upfront expenses for a cloud data warehouse may be less, the costs at the end of the year are likely to be significantly higher than expected. By leveraging modern cloud data lake engines and open source table formats like Apache Iceberg, however, companies can now query data in the data lake directly without any degradation of performance, resulting in an extreme reduction in complex and costly data copies and movement.

Data Privacy and Governance Kicks Into Another Gear in the United States

Users are increasingly concerned about their online privacy, making it much more likely that the United States will adopt federal regulations similar to Europe's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). This will require companies to double down on privacy and data governance in their data analytics infrastructure. Furthermore, companies will realize that data privacy and governance cannot be achieved with separate standalone tools, and instead must be implemented as an integral part of the analytics infrastructure. Because of this, data version control will become standard in cloud data lakes, and open source technologies such as Project Nessie will enable companies to securely manage and govern data in an enterprise-wide platform.
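Project Nessie applies Git-like commits, branches and merges to data lake tables. To make that idea concrete, here is a toy, in-memory sketch under our own assumptions: the `Catalog` and `Commit` classes, their methods, and the `s3://bucket/...` paths are hypothetical and are not Nessie's API. Each commit records which data file backs each table, so a branch can be changed in isolation and then fast-forward merged.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    """Immutable snapshot mapping table names to data-file locations."""
    tables: Dict[str, str]
    message: str
    parent: Optional["Commit"] = None


@dataclass
class Catalog:
    """Hypothetical Git-like catalog: each branch is a named pointer to a commit."""
    branches: Dict[str, Commit] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.branches.setdefault("main", Commit(tables={}, message="init"))

    def create_branch(self, name: str, source: str = "main") -> None:
        # Branching only copies a pointer; no table data is duplicated.
        self.branches[name] = self.branches[source]

    def commit(self, branch: str, table: str, location: str, message: str) -> None:
        head = self.branches[branch]
        tables = {**head.tables, table: location}
        self.branches[branch] = Commit(tables=tables, message=message, parent=head)

    def merge(self, source: str, target: str) -> None:
        """Fast-forward target to source; refuse if target has diverged."""
        node: Optional[Commit] = self.branches[source]
        target_head = self.branches[target]
        # Walk back from the source head looking for the target head.
        while node is not None and node is not target_head:
            node = node.parent
        if node is None:
            raise ValueError(f"{target} has diverged from {source}")
        self.branches[target] = self.branches[source]

    def history(self, branch: str) -> List[str]:
        """Commit messages on a branch, newest first."""
        messages: List[str] = []
        node: Optional[Commit] = self.branches[branch]
        while node is not None:
            messages.append(node.message)
            node = node.parent
        return messages


# Hypothetical table and storage paths, for illustration only.
cat = Catalog()
cat.commit("main", "sales", "s3://bucket/sales/v1.parquet", "load v1")
cat.create_branch("etl")
cat.commit("etl", "sales", "s3://bucket/sales/v2.parquet", "rewrite sales")
before = cat.branches["main"].tables["sales"]  # main is unaffected by work on etl
cat.merge("etl", "main")
print(before, cat.history("main"))
```

Because every commit is immutable and keeps a parent pointer, the full history of which files backed a table stays auditable, which is the governance property the prediction points to.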

Jeremy Levy, CEO of Indicative.
  • As data professionals, we have a responsibility to the broader public. I think that within the next year we will see progress toward a code of ethics within the data analytics space, led by conscious companies who recognize the seriousness of potential abuses. Perhaps the US government will intervene and pass some version of its own GDPR, but I believe that technology companies will lead this charge. What Facebook has done with engagement data is not illegal, but we've seen that it can have deleterious effects on child development and on our personal habits. In the coming years, we will look back on the way companies used personal data in the 2010s and cringe in the way we do when we see people smoking on a plane in films from the 1960s.

Lexalytics CEO Jeff Catlin and Chief Scientist Paul Barba:
  • Data Annotation will become the next big "side hustle" in 2021. It's already a common way to make an extra buck or two, but there's been a race to the bottom in pricing, where annotations are largely sourced well below minimum wage in industrialized nations. However, as AI sees successes in industries requiring expertise, like health care or law, the demand for specialist knowledge will see the development of infrastructure for matching more lucrative annotation contracts to professionals.
  • There will be more consolidation in the ML platform space. As AI became the "it" technology over the last few years, a bunch of AI infrastructure companies popped up and began peddling AI platforms to ease the task of building models for companies looking to leverage AI. While it sounds good on the surface, no identified business task is being solved here; it's simply a more efficient use of technology, and that's hard to sell. It's likely that the VCs who backed these plays will begin severing the cash lifelines in 2021.
  • The improvements in deep learning models over the last 18 months mean that NLP features that have been desired but unfulfilled will start showing results. These include better entity recognition, which drives better normalization, which in turn drives generic relationship extraction. The advances in deep learning models make all of these possible.
  • AI platforms will consolidate, but AI services will pick up the slack here. Companies are becoming more accepting of 3rd party expertise in machine learning, and this is driving an increase in consulting services for ML. This trend will continue and accelerate in 2021.
  • Fake news detection will start showing dividends. Fake news detection is an incredibly hard problem, but a lot of very smart people are spending a lot of time working on it. The spread of misinformation will be notably lower by late 2021.

Robyn Speer, Chief Science Officer at Luminoso

Doing more to fight bias in AI
  • In 2021, I really hope businesses will do more to fight AI bias in all its forms. If only it could be as simple as "not training on biased data." But where is unbiased data going to come from? Any data that you collect in quantity reflects the biases of the world we live in. I recently discussed this in this Twitter thread.
  • I see four steps to fighting AI bias that happen at different stages of machine learning: Knowing the biases of our source data and how to account for them; applying de-biasing techniques, when appropriate, to counteract the ways that biases get baked into intermediate representations; ensuring that the results of machine learning are used in ways that are fair and transparent; and being responsive and accountable in cases where the system turns out to have flaws or unintended consequences.
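The first and third steps above (knowing the biases in the data, and checking that results are used fairly) can start with a simple audit. The sketch below is our own illustration, not a method Luminoso prescribes: it compares positive-prediction rates across groups (demographic parity), which is only one of several fairness metrics, and the predictions and group labels are made up.

```python
from collections import defaultdict
from typing import Dict, Sequence


def positive_rates(preds: Sequence[int], groups: Sequence[str]) -> Dict[str, float]:
    """Share of positive (1) predictions within each group."""
    if len(preds) != len(groups):
        raise ValueError("preds and groups must have the same length")
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for p, g in zip(preds, groups):
        totals[g] += 1
        positives[g] += int(p == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(preds: Sequence[int], groups: Sequence[str]) -> float:
    """Largest gap in positive rates between any two groups (0.0 means parity)."""
    rates = positive_rates(preds, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs for applicants from two groups, "a" and "b".
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(positive_rates(preds, groups))  # {'a': 0.75, 'b': 0.25}
print(demographic_parity_difference(preds, groups))  # 0.5
```

A large gap does not prove the model is unfair on its own, but it flags where the data and the downstream use of the predictions deserve a closer look.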

Johanna Pingel and David Willingham, Deep Learning Project Managers at MathWorks

A note on COVID-19: Investment in AI has not decreased

We'd be remiss if we didn't mention COVID-19, an unforeseen trend of 2020, which is expected to continue with us into 2021. Overall investment in AI-related projects has not decreased. While some heavily impacted industries have cut back in the near term, analysts report that these cuts have been offset by those who increased their investment above what they had forecasted. Many are using this time to invest in upskilling through remote learning, with AI-themed courses among the most sought after by the engineering and scientific community, leaving them primed and ready to take on more AI projects in 2021.

AI aligns engineering, computer science, data science and IT direction
  • Engineers will continue to work with data scientists, using AI models to enhance existing applications or discover new, innovative solutions for the projects they're working on. However, creating a successful AI-based system is more than just developing a model. It requires model lifecycle management, which includes training, deploying, monitoring and updating the model for the system in which it resides. To do this efficiently, these processes need to be automated, robust and well maintained, and in 2021 engineers will augment their workflows to include them.
Model explainability will reduce the aversion to AI within safety-critical systems
  • AI has long been considered a black-box approach to modelling systems, bringing with it a fear that how it operates is largely unknown. As research produces more explainability methods and more software vendor tools offer them, industry practitioners will more readily adopt AI innovations within their workflows.
Engineers and scientists are beginning to understand why a model makes certain decisions and the limits within which it can operate safely. They are running experiments to explain how a model operates in a variety of scenarios and using visualizations to understand the inner workings of a model when it doesn't behave as it should. This is driving innovation in the verification and validation of AI within safety-critical systems, with standards bodies and regulators in the automotive, aerospace and medical fields, such as EUROCAE and the FDA, working on the levels needed for certification.
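One common, model-agnostic experiment of this kind is permutation feature importance: shuffle one input feature and measure how much the model's error grows. This is our choice of example, not a method the MathWorks authors name. The toy `model` below is hypothetical and deliberately ignores its second feature, so that feature's importance should come out as exactly zero.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def mse(model: Callable[[Row], float], X: List[Row], y: Sequence[float]) -> float:
    """Mean squared error of the model on (X, y)."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean increase in MSE when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by its shuffled value.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical model that uses only feature 0.
def model(x: Row) -> float:
    return 2.0 * x[0]


X = [[float(i), float(i % 3)] for i in range(10)]
y = [model(x) for x in X]
print(permutation_importance(model, X, y))
```

Because it only needs predictions, the same check works on a neural network or any other black-box model, which is why methods like this help explain where a model's decisions come from.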

Maksym Tatariants, Data Science Engineer at MobiDev

Growing adoption of Machine Learning in mainstream software, including mobile apps, defined 2020. Together with hardware support such as Apple's M1 chip or Nvidia's Ampere GPU architecture, "intelligent process automation" will continue its growth in 2021.

Edge AI has also made a strong contribution to hardware evolution, with a clear focus on optimizing the latest neural networks for smartphones and IoT devices. Acceleration techniques such as Automatic Mixed Precision and TensorRT are becoming available and easy to use, which will make edge computing more efficient. Because more data can then be processed on the device rather than sent elsewhere, this will also improve the privacy and security of users' data.

3D research is another key trend: 3D reconstruction, pose estimation, and scene understanding. Although information is still somewhat lacking, plenty of promising model architectures are appearing. The 2021 challenge is to learn how to work with 3D data in real time on consumer-grade hardware.

Dan Sommer, Senior Director, Global Market Intelligence Lead at Qlik

According to Gartner, by the end of 2024, 75% of enterprises will shift from piloting to operationalizing AI, driving a 5x increase in streaming data and analytics infrastructures. Having up-to-date, business-ready data is more important than ever.

Since the pandemic arrived, we've seen a surge in the need for real-time, up-to-date data. Data that is usually fairly stale, such as quarterly business forecasts, is now fleeting and mutable. Alerts, data refreshes and forecasts will need to occur more often, with the freshest variables. On a macro level, we've seen disruptions to supply chains, with hospitals scrambling to procure PPE and consumers stockpiling toilet paper. In the case of PPE, we reacted to an actual shortage too slowly; with toilet paper, consumers broke the supply chain by assuming a shortage where none existed. Surges like these are accentuated in a crisis, and we have to build preparedness for them.

Kimberly Nevala, AI Strategic Advisor, SAS:
  • The Analytics "Core" Gets Reinforced. The pandemic upended expected business trajectories and exposed the weaknesses in machine learning systems dependent on large amounts of representative historical data, including well-bounded and reasonably predictable patterns. As a result, organizations will bolster investments in traditional analytics teams and techniques better suited to rapid data discovery and hypothesizing.
  • Ethical AI Principles Cede to Responsible AI Practices. Organizations will move beyond ethics in principle to practical procedures to guide AI decision-making. This will include right-sizing AI governance and oversight to their specific industry, problem domain and level of maturity. Accountability for AI-enabled products and services will be placed with the product owner. Bolstered by increased consumer awareness and agency, leading adopters will promote "responsible AI" practices as a value-added differentiator.

Sarah Gates, Analytics Strategist, SAS:
  • The Year of ModelOps. Pressures created by COVID-19 have raised organizational awareness of, and need for, ModelOps: the holistic approach used to rapidly move mathematical models through the analytics lifecycle, delivering value and insights faster. For organizations wanting to accelerate their digital transformation and rev up agility and competitiveness, ModelOps is the magic fairy dust that will make it possible.

Monte Zweben, CEO of Splice Machine.
  • Feature Stores will be implemented as the #1 ML product in 2021 to operationalize Machine Learning
  • Every commercial database will have ML features
  • Cloud migrations accelerate 10x, causing a major land grab by AWS, Azure, and GCP
  • Vendor lock-in becomes the #1 concern of cloud migrations as companies fear lock-in by the cloud provider equivalent to the control Oracle and IBM once had
  • Everyone will be talking about the democratization of machine learning and data science as companies break out of the model where a centralized data science silo holds everybody up, much like the "web" group did in the 2000s. Now every development team has web skills, and the same will happen with ML
  • Data lakes finally die, and the curated SQL data warehouse re-emerges for structured data, with associated cloud storage for unstructured data, finally making the dream of Big Data real
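A core job of a feature store is serving the feature value that was valid at a given moment, so that training examples never "see" data from their future. The sketch below is a hypothetical, in-memory illustration of such a point-in-time lookup; `FeatureStore`, its methods, and the `user_42` data are our own names, not any product's API.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class FeatureStore:
    """Hypothetical in-memory store of timestamped feature values per entity."""

    def __init__(self) -> None:
        # (entity_id, feature_name) -> list of (timestamp, value), kept sorted
        self._data: Dict[Tuple[str, str], List[Tuple[int, float]]] = defaultdict(list)

    def write(self, entity_id: str, feature: str, ts: int, value: float) -> None:
        bisect.insort(self._data[(entity_id, feature)], (ts, value))

    def get_as_of(self, entity_id: str, feature: str, ts: int) -> Optional[float]:
        """Latest value written at or before ts, or None if none exists yet."""
        history = self._data.get((entity_id, feature), [])
        # (ts, inf) sorts after every entry whose timestamp is <= ts.
        idx = bisect.bisect_right(history, (ts, float("inf")))
        return history[idx - 1][1] if idx else None


store = FeatureStore()
store.write("user_42", "avg_order_value", ts=100, value=20.0)
store.write("user_42", "avg_order_value", ts=200, value=35.0)
print(store.get_as_of("user_42", "avg_order_value", 150))  # 20.0
print(store.get_as_of("user_42", "avg_order_value", 250))  # 35.0
print(store.get_as_of("user_42", "avg_order_value", 50))   # None
```

Production feature stores add persistence, online/offline serving and batch point-in-time joins, but the as-of lookup above is the property that keeps training data and serving data consistent.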