Data Science & Analytics Industry Main Developments in 2021 and Key Trends for 2022
We have solicited insights from experts at industry-leading companies, asking: "What were the main AI, Data Science, Machine Learning Developments in 2021 and what key trends do you expect in 2022?" Read their opinions here.
As we wrap up 2021, we also wrap up our expert roundup series of posts, bringing our readers opinions from different viewpoints on what the major developments of the year were, and what the big stories next year may be.
To this end, we have solicited insights from experts at industry-leading companies, asking:
What were the main AI, Data Science, Machine Learning Developments in 2021 and what key trends do you expect in 2022?
This article approaches the question from an industry standpoint, and over the past week we have published similar articles focusing on the same question from both research and technology standpoints:
- AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments in 2021 and Key Trends for 2022
- Main 2021 Developments and Key 2022 Trends in AI, Data Science, Machine Learning Technology
I would like to thank each of the participants in this round of opinions for taking time out of their busy schedules at such a hectic time of year to provide their insights and opinions: Yashar Behzadi (Synthesis AI), Dipti Borkar (Ahana), Matthew Carroll (Immuta), Kendall Clark (Stardog), Brian Gilmore (InfluxData), Raj Gossain (Alation), Alan Jacobson (Alteryx), Ashley Kramer (Sisense), Haoyuan Li (Alluxio), Buno Pati (Infoworks), Jared Peterson (SAS), John Purcell (DoiT International), Ravi Shankar (Denodo), Dan Sommer (Qlik), Muddu Sudhakar (Aisera), Marco Varone (expert.ai), Ryan Welsh (Kyndi), Brett Wujek (SAS).
And now, without further delay, let's have a look at the AI, Analytics, Machine Learning, Data Science, Deep Learning Industry Main Developments in 2021 and Key Trends for 2022.
In 2021 we saw an accelerated adoption of the cloud for AI and ML apps, driven by the rise in popularity of the cloud data warehouse/data lake/data lakehouse. With more data moving to the cloud, companies faced an architectural decision: store curated, structured data in an expensive data warehouse where you can run high-speed analytics with very good price/performance, or use a data lake to store all data, structured and unstructured, at much lower cost but with no built-in query or analytics capabilities.
In 2022, we’ll see more AI and ML workloads migrating to the data lake/data lakehouse because of the emergence of the Open Data Lake Analytics stack: a stack purpose-built for analytics workloads on the cloud data lake that includes an open source, high-performance query engine (Presto) for SQL analytics, open formats, and open cloud. The next phase of growth in the cloud will include the open data lake augmenting the cloud data warehouse, more open source behind analytics and AI, and out-of-the-box cloud solutions to drive innovation, meaning data platform teams will spend less time managing complex, distributed systems and more time delivering business-driven innovation.
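The core of the pattern above is separating cheap open-format storage from an independent SQL engine. As a minimal, purely illustrative sketch (using in-memory SQLite and CSV as stand-ins for a distributed engine like Presto and formats like Parquet):

```python
import csv
import io
import sqlite3

# Toy stand-in for the lake pattern: raw files in an open format sit in
# storage, and a separate SQL engine (Presto in the article; in-memory
# SQLite here, purely for illustration) queries them on demand.
LAKE_FILE = io.StringIO(
    "order_id,region,amount\n"
    "1,emea,120.0\n"
    "2,amer,75.5\n"
    "3,emea,42.0\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, region TEXT, amount REAL)")
rows = [(int(r["order_id"]), r["region"], float(r["amount"]))
        for r in csv.DictReader(LAKE_FILE)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# SQL analytics directly over the "lake" data: revenue per region.
result = dict(conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"))
print(result)
```

The table name and columns are invented for the example; the point is only that the query layer and the storage layer are decoupled, which is what lets an open engine sit on top of an open lake.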
The Meteoric Rise of Conversational AI and Better Language Modeling — In 2022, artificial intelligence will continue to evolve, becoming more transformative and intuitive than it has ever been. From human resources to marketing, Conversational AI technologies are engineered to make life easier. Conversational AI can and will take over mundane, day-to-day internal and customer service tasks, freeing up live agents to deal with more pressing matters.
Conversational AI, like that created by Aisera, is designed to operate in lockstep with employees, establishing an integrated, more efficient, faster customer service experience.
Natural Language Processing will continue to evolve, understanding speech rhythms, along with all of our human, idiosyncratic speech patterns, uhms, ahs and words with mixed meanings. It will continue to learn which ones apply, making it much more reflective of human speech and much more able to direct queries and resolve concerns.
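One small, concrete piece of the disfluency problem described above is filtering filler sounds out of a spoken query before routing it. A toy sketch (real conversational AI uses learned models, not a hand-written word list; the intents and filler set here are invented for illustration):

```python
import re

# Hypothetical sketch: strip common disfluencies ("um", "uh", "ah")
# from a spoken query before intent matching.
FILLERS = {"um", "uhm", "uh", "ah", "er"}

def normalize(utterance: str) -> str:
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return " ".join(t for t in tokens if t not in FILLERS)

def route(utterance: str) -> str:
    text = normalize(utterance)
    if "password" in text:
        return "reset_password"   # hand off to an automated flow
    return "live_agent"           # escalate to a human

print(route("Um, I, uh, forgot my password?"))  # reset_password
```

The hard part the article points to, words with mixed meanings, is exactly what a word list like this cannot do and learned models increasingly can.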
Companies will unlock essential business value by utilizing public and private data marketplaces — Today, companies are already buying data sets to innovate or get insights where data is lacking. In 2022, we will see an increase in organizations turning to public data marketplaces, using two approaches. First, companies that use data catalogs to access, use, and understand the rich data within their organization will recognize that joining enterprise data with third-party data sets unlocks even more value and productivity than ever before. Second, traditional companies will realize that proprietary internal data sets can be monetized and packaged for consumption by other companies, creating new revenue streams and making this data easier for other enterprises to discover and use.
Mainstream AI and Deep Learning — As the toolset for AI applications continues to evolve, machine learning and deep learning platforms have entered the mainstream and will attain the same level of maturity as specialized data analytics. Just like we currently see a plethora of fully integrated managed services based on Apache Spark and Presto, in 2022 we will see vertical integrations emerging based on the likes of PyTorch and TensorFlow. MLOps for pipeline automation and management will become essential, further lowering the barriers and accelerating the adoption of AI and ML.
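The MLOps idea mentioned above reduces, at its simplest, to chaining pipeline stages and putting an automated quality gate in front of deployment. A bare-bones sketch (the stage names, toy "model", and accuracy threshold are all invented for illustration, not any real MLOps framework):

```python
# Minimal sketch of pipeline automation: ordered steps, each output
# feeding the next, with an automated quality gate before "deployment".
def ingest():
    return [(x, 2 * x) for x in range(1, 10)]       # toy (feature, label) pairs

def train(data):
    # "Model" = a learned multiplier: the mean of label/feature ratios.
    ratios = [y / x for x, y in data]
    return sum(ratios) / len(ratios)

def evaluate(model, data):
    errors = [abs(model * x - y) for x, y in data]
    return 1.0 - sum(errors) / len(data)            # crude accuracy score

def run_pipeline():
    data = ingest()
    model = train(data)
    score = evaluate(model, data)
    return "deployed" if score >= 0.95 else "blocked"

print(run_pipeline())  # deployed
```

Real MLOps platforms add scheduling, artifact tracking, and monitoring around exactly this skeleton, which is what lowers the barrier for teams that don't want to build it themselves.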
Digital Transformation 2.0 will usher in a culture of analytics across business units as more large enterprises provide the self-service technologies and training to ensure the average knowledge worker is set up for success and able to directly perform analytics.
With the continued democratization of analytics, data scientists need to evolve from ‘problem solvers’ to ‘teachers.’ Organizations are now looking to fill these roles with someone who can articulate and explain – not just code – to encourage people to be creative and think critically. However, there is an existing skills gap between data scientists as practitioners and those as teachers.
Fragmentation in the data and analytics space will level-off. In recent years, the AI/ML space has been complex, with more companies entering the space than the year prior. However, we will begin to see this trend curve and plateau as we enter a more mature space with increased consolidation in 2022.
Data mesh architectures become more enticing. As organizations grow in size and complexity, central data teams are forced to deal with a wide array of functional units and associated data consumers. This makes it difficult to understand the data requirements for all cross-functional teams and offer the right set of data products to their consumers. Data mesh is a new decentralized data architecture approach for data analytics that aims to remove bottlenecks and move data decisions closer to those who understand the data.
In 2022 and beyond, larger organizations with distributed data environments will implement a data mesh architecture to minimize data silos, avoid duplication of effort, and ensure consistency. Data mesh will create a unified infrastructure enabling domains to create and share data products while enforcing standards for interoperability, quality, governance, and security.
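The two halves of the data mesh described above, domain-owned data products plus shared standards, can be sketched in a few lines. Everything here (the required metadata fields, the catalog, the domain names) is a hypothetical stand-in, not a real mesh framework:

```python
# Sketch: each domain publishes its own data product; a thin shared
# layer enforces common standards (here, just required metadata fields)
# so products stay interoperable and governable.
REQUIRED_FIELDS = {"owner", "schema", "update_frequency"}

catalog = {}

def publish(domain: str, product: str, metadata: dict):
    missing = REQUIRED_FIELDS - metadata.keys()
    if missing:
        raise ValueError(f"{domain}/{product} missing: {sorted(missing)}")
    catalog[f"{domain}/{product}"] = metadata  # discoverable by all domains

publish("sales", "orders", {
    "owner": "sales-data-team",
    "schema": {"order_id": "int", "amount": "float"},
    "update_frequency": "hourly",
})
print(sorted(catalog))  # ['sales/orders']
```

The design point is that ownership is decentralized (each domain calls `publish` for its own data) while the standards check is centralized, which is how a mesh avoids both silos and duplicated governance effort.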
In the season of predictions, it's likely that many will once again prophesy advancements with AI or ML. Companies and businesses are certainly finding innovative ways to leverage these technologies and techniques, but they have yet to really hit their stride in terms of adoption. The main challenges will remain the same: asking the right questions of our data, fusing both human and machine intelligence to answer them, and overcoming complexity. The hyperscalers will continue to introduce new services, and companies will spend to determine how they can truly help their business.
An important advancement in natural language understanding in the past few years has been the combination of different techniques to improve overall results and better tackle complex problems. This hybrid (or composite) approach mixes symbolic and machine learning to give us much more power and flexibility in addressing real world language problems.
We'll see greater adoption of this approach in 2022 because it can save a huge amount of time and money, while increasing accuracy, efficiency and speed. It also adds explainability to the mix (something that is very hard with ML only) and makes it far simpler to reuse knowledge from previous implementations thanks to the knowledge graph.
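The hybrid pattern above can be illustrated with a toy classifier: a symbolic layer of explicit rules answers what it can (and can say exactly why, which is the explainability benefit), and a statistical layer handles the rest. The rules and the keyword scorer below are invented stand-ins for a knowledge graph and a trained ML model:

```python
# Toy sketch of the hybrid (composite) symbolic + ML approach.
RULES = {
    "refund": "billing",     # symbolic: exact, explainable mappings
    "invoice": "billing",
    "crash": "support",
}

ML_HINTS = {"slow": "support", "price": "sales", "demo": "sales"}

def classify(text: str):
    words = text.lower().split()
    for w in words:                       # 1) symbolic pass
        if w in RULES:
            return RULES[w], f"rule: '{w}' -> {RULES[w]}"
    scores = {}                           # 2) statistical fallback
    for w in words:
        if w in ML_HINTS:
            scores[ML_HINTS[w]] = scores.get(ML_HINTS[w], 0) + 1
    if scores:
        return max(scores, key=scores.get), "ml: keyword evidence"
    return "unknown", "no evidence"

print(classify("I want a refund"))  # ('billing', "rule: 'refund' -> billing")
```

Note that every answer comes with a reason string: when a rule fires, the explanation is exact; when the fallback fires, the system can at least say the decision was statistical, which is the reuse-and-audit property the paragraph describes.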
Cloud computing will make or break remote work — Cloud computing is now a must-have for businesses today and was critical during the COVID-19 pandemic. Data-driven organizations around the world are looking for solutions to speed time to data, safely share more data with more users, and mitigate the risk of data leaks and breaches. In an era characterized by remote workforces, cloud computing will continue to be essential for organizations looking for business continuity, increased scalability, and cost-efficiency.
According to the Immuta State of Data Engineering Survey, organizations are increasingly adopting multiple cloud technologies to keep up with the scale, speed, and use cases required by modern data teams. Nearly two-thirds (65%) of respondents characterized their company as either 100% cloud-based or primarily cloud-based, indicating a large market need for automated cloud data access control.
Smart city technology becomes ubiquitous: We’ll no longer use the term “Smart City” - not because the technology failed, but because the concept of “city” has eroded with population growth and ubiquitous connectivity. With that will come large increases in the adoption of individual “smart-city” technologies as a larger percentage of the population gains access or opts in to highly mature and accessible connectivity and services.
AI/ML drive the citizen experience: Smart Government applications will look more like consumer apps and less like corporate intranets. The smartest cities will have integrated ML and AI in recommendation engines, support natural language interactions, deliver everything digitally and consider citizen experience the top requirement.
Uptick in Data Fabric — 2022 will see significant growth and interest in data fabric solutions as companies seek to leverage a common management layer to accelerate analytics migration to the cloud, ensure security and governance, quickly deliver business value by supporting real-time, trusted data across hybrid-multi-cloud – all in driving digital transformation. We believe this technology will be broadly adopted over the next five years.
Businesses will expect vendors to deliver comprehensive AI-enabled solutions for line of business teams instead of focusing on developer tools and technologies for IT — Much, if not most, of the AI industry has focused on developing robust tools for internal IT teams or consulting organizations to apply the technology for a specific use case in an enterprise application. In 2022, organizations will demand AI vendors begin developing specific AI-enabled solutions that can be implemented immediately without coding. By focusing on providing human-centered solutions to business users, vendors will enable individuals to immediately generate insights that drive decision making. Consequently, organizations will shift their investments in AI, moving away from highly customized solutions in favor of configurable (off-the-shelf) options.
Collaboration and BI have been inseparable since the start of the pandemic. At a time when the world tried to return to a degree of normalcy, the need to work together and collaborate sooner – and do so without data silos standing in the way – became even clearer. In striving to improve the way we come together around data, networks, and processes, we’ll see the advent of “collaboration mining,” enabling decisions to be tracked. Businesses have also learned that if they want to become truly data-driven, they’ll have to figure out how to run the right queries in the right place. Lastly, the API economy has opened up entirely new ways for businesses to unite for joint initiatives while reducing the relevancy of buy-versus-build. Automation is a strongly emerging area that removes the need to code these integrations, and I expect the technology to have a lasting impact in 2022.
AI moves to the real world, but slowly. While many advancements in machine learning and AI have demonstrated amazing accuracy on common tasks or online competitions, it still takes time for those advancements to make their way into industry to ultimately solve real problems for customers. Some of that is due to the need for things like domain-specific annotated data, or the required computing power to run these systems/models. Because of these constraints, we'll see a slow, but steady, stream of advancements moving from research to reality.
AI delivers real-world results. In the past, organizations would have little to show for their AI investments because of the hyper-focus on model building and model performance. AI will not only be used for unique breakthrough projects, but instead organizations will find value in applying AI techniques to established projects to achieve best-in-class results. For an AI product or service to be successful, it will incorporate elements that will help make an outcome better, or a process faster or cheaper. The value of AI will be determined not by how well it models the real world, but by how it helps improve it.
For years, we’ve heard that the future of analytics will go beyond descriptive analytics (what happened) and predictive analytics (what will happen) to prescriptive guidance (what to do about it). Advancements in AI combined with automation are finally making this possible by dynamically combining relevant data and alerting knowledge workers to take action, in advance, before an event occurs. In 2022, prescriptive analytics will evolve from telling us just where the numbers are going, to helping us make smarter, proactive decisions.
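The three tiers named above (descriptive, predictive, prescriptive) can be made concrete on a toy weekly-sales series. The linear forecast and the alert threshold below are illustrative assumptions, not any particular product's method:

```python
# Sketch of the three analytics tiers on a toy weekly-sales series.
sales = [100, 96, 91, 85, 78]            # descriptive input: what happened

def describe(series):
    return series[-1] - series[0]        # net change over the window

def predict(series):
    step = (series[-1] - series[0]) / (len(series) - 1)  # avg weekly change
    return series[-1] + step             # what will happen next week

def prescribe(series, floor=75):
    forecast = predict(series)
    if forecast < floor:                 # act *before* the event occurs
        return "alert: launch promotion now"
    return "no action needed"

print(describe(sales))    # -22
print(predict(sales))     # 72.5
print(prescribe(sales))   # alert: launch promotion now
```

The prescriptive step is the one the paragraph says 2022 will bring into focus: the alert fires on the forecast (72.5, below the 75 floor), not on the already-observed numbers, so the knowledge worker can act in advance.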
Organizations have also started to realize that not everyone has the time or interest to become a data analyst or data literate. In 2022, many will redefine what it means to build a "culture of analytics" by bringing insights to workers in a more digestible way - turning to methods like embedded analytics that won’t require new skills or additional time investment.
The Era of Big Data Centralization and Consolidation is Over — The importance of centralized or consolidated data storage will also come to the forefront in 2022. To be clear, this trend isn’t the end of data storage, but it is the end of centrally consolidated approaches to data storage, particularly for analytics and app dev.
In 2022, we will see the continuation of the big fight that’s brewing in the data analytics space as old ways of managing enterprise data, focusing on patterns of consolidation and centralization, reach a peak and then start to trend downward. Part of what we’re about to see unfold in the big fight between Snowflake and Databricks in 2022 and beyond is a function of their differing approaches to centralized consolidation.
But it’s not just technical pressures. The economics of unavoidable data movement in a hybrid multicloud world are not good and don’t look to be improving. Customers and investors are pushing back against the kind of lock-in that accompanies centralization approaches, so anticipate the pendulum swinging in the direction of decentralization and disintermediation of the data analytics stack in the coming year.
The Conversation Around Data for AI Will Be Prioritized — The discussions around data for AI have started, but they haven’t nearly received enough attention. Data is the most critical aspect for building AI systems, and we are just now starting to talk and think about the systems to acquire, prepare, and monitor data to ensure performance and lack of bias. Organizations will have to prioritize a data-first approach within an enterprise architecture in 2022 to enable AI and analytics to solve problems and facilitate new revenue streams.
Synthetic Data Will Be a Requirement to Build the Metaverse — The metaverse cannot be built without the use of synthetic data. To recreate reality as a digital twin, it’s necessary to deeply understand humans, objects, 3D environments, and their interactions with one another. Creating these AI capabilities requires tremendous amounts of high-quality labeled 3D data, data that is impossible for humans to label. We are incapable of labeling distance in 3D space, inferring material properties, or labeling light sources needed to recreate spaces in high fidelity. Synthetic data built using a combination of generative AI models and visual effects (VFX) technologies will be a key enabler of the AI models required to power new metaverse applications.
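The point about labels humans cannot produce can be seen in miniature: when the scene is generated, ground truth such as exact distance from the camera falls out of the generation process for free. The scene setup below is a bare-bones illustration, not a real rendering pipeline:

```python
import math
import random

# Sketch: synthetic 3D samples with exact, free depth labels.
random.seed(0)

def synth_sample(camera=(0.0, 0.0, 0.0)):
    point = tuple(random.uniform(-5, 5) for _ in range(3))  # object position
    depth = math.dist(camera, point)   # exact per-point depth label
    return {"position": point, "depth_label": depth}

dataset = [synth_sample() for _ in range(1000)]
print(len(dataset), "labeled samples, zero human annotation")
```

A human annotator could not estimate these distances to any useful precision from an image; the generator knows them exactly, which is the core argument for synthetic training data in 3D understanding.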