Scale and Govern AI Initiatives with ModelOps
Automating and orchestrating the AI/ML model life cycle ensures reliable model operations and governance at scale. The path to production can vary for each enterprise model, as can its monitoring, continuous improvement, and retirement needs. Organizations must now consider ModelOps a fundamental capability for operational excellence and immediate ROI.
By Giuliano Liguori, a technologist and influencer in the Digital Transformation space and AI.
What is ModelOps?
Managing models in production is challenging. To optimize the value of Artificial Intelligence, AI models must improve efficiency in business applications or support efforts to make better decisions as they run in production. ModelOps is the key capability for scaling and governing enterprise AI initiatives across the organization and ensuring that the maximum value is obtained from such enterprise AI initiatives.
This article will talk about the requirements for systems that should be put in place to support this ModelOps capability. We will be drawing examples from real cases that use advanced production enterprise systems to orchestrate and automate the operationalization of models throughout their life cycle for scalable ModelOps.
In this short article, it will not be possible to cover all the challenges and details of a ModelOps capability. However, the references at the end of the article should help you learn more about ModelOps.
Several organizations talk about ModelOps, how they have implemented it, and what ModelOps should be for big business. A good example comes from Gartner, which declared in a paper published in August 2020, Innovation Insights for ModelOps, that “ModelOps lies at the center of any organizations’ enterprise AI strategy.”
There are also many other publications from Gartner and Forrester, to name just a few, as well as a growing community around ModelOps available to those who wish to learn more. However, the main focus of this article is to highlight the fundamentals from which to begin your journey to ModelOps.
So, let’s start with the main question: why ModelOps? What is the value, and what is the pain that drives large organizations to start investing seriously in a ModelOps capability?
Fig. 1 — Why ModelOps? — Image by source, edited with permission by the author.
Figure 1 lists a few of these drivers on its right-hand side. In our view, they rank in the following order:
- Control Risk: more and more business initiatives and central decisions are guided by AI and ML models, which has a real impact on existing governance and risk structures.
- Shorten Models Time to Business: the gradual transition of newer modelling techniques into existing structures pushes large organizations to understand and better leverage ModelOps capabilities. At the same time, there is a relentless need to get those models into the business as quickly as possible, because developing them takes considerable time and energy and their shelf life (for example, their applicability to a particular business regime) is notoriously unpredictable. Introducing models quickly into the business is therefore a driver for the core functionality of ModelOps.
- Increase Transparency: increase transparency and accountability in order to be able to know where your models are at any time.
- Unlock Value of AI Investments: obviously, all of this leads to unlocking the value of corporate AI initiatives not just now but over time as the number of such initiatives will inevitably increase.
If we look at the past 30 months, we see a huge increase in AI initiatives. Driven by the exponential speed of change, organizations and startups are working towards adopting ModelOps with the goal to defend and expand their market opportunities.
Traditional Models vs. AI/ML Models
Within that context, most of today’s organizations are moving aggressively to adopt more agile, efficient AI and Machine Learning application delivery and, of course, IT management practices to meet customers’ evolving expectations.
As shown on the left of Figure 2, the grey hexagons represent typical enterprise-wide assets, such as tooling, hardware, facilities, software, and processes, along with a class of traditional models that affect corporate regulatory and governance structures. In financial services, these are often models registered in a model risk management (MRM) system. Traditional models, typically statistical models derived from business and domain expertise, tend to live in the context of a business unit rather than the enterprise, and focus on driving the performance of that specific unit.
Fig. 2 — Traditional Models vs. AI/ML Models — Image by source, edited with permission by the author.
What’s different between traditional models and AI/ML models? Like traditional models, AI/ML models are decision-making models. They are in demand from the outset, as they are derived from proprietary data and usually tied to a business problem. They are very complex models:
- in the algorithmic approach they use in the context of AI/ML;
- in their relationship with the company’s governance structures and approval processes.
Furthermore, in addition to fitting into a decision-making framework, they also have a technical impact, as they tend to involve the company’s technical structure. Due to their technical complexity, they may need to go through processes such as DevOps, enterprise security, and IT operations, and they may reside on cloud services or, for other reasons, on on-premise infrastructure.
In the world of artificial intelligence and machine learning, these models are therefore an enterprise asset much more than traditional models are. They feed the business units and, as we have seen, in addition to being very complex, they have more rapid life cycles and can have a very unpredictable shelf life. For this reason, these models need to be refreshed very quickly, bearing in mind that each may require a different refresh frequency.
All this makes enterprise AI and ML models uniquely challenging to manage. If a company or organization wishes to internalize the governance, monitoring, and orchestration of these models, it must take into account that they represent a new class of enterprise-wide asset. Recognizing this provides a foundation for going forward and ultimately understanding the why and how of ModelOps as a capability.
All these models have their own life cycles and are temporally related to various existing business processes and various technical processes existing at the company level.
Figure 3 gives a rough idea of a life cycle. The purple area shows the model factories, which cover everything that happens from a data science perspective for models to be created. Because of their uniqueness, models do not relate to enterprise patterns the way traditional software does, but model life cycles certainly lend themselves to enterprise-wide patterns.
Fig. 3 — Models Life Cycle — Image by source, edited with permission by the author.
As you can see from this example, packaging, deployment, and execution are often provided by the model factories themselves. It is also possible to wrap the model in a Docker container and publish it, or a ModelOps platform can provide that capability on its own. The exact details of the packaging, deployment, and execution part of the life cycle are therefore determined on a per-model basis.
ModelOps is about understanding the overall life cycle of models and their integration with business and technology in order to deliver value over time in the enterprise.
The green part of this illustration shows a typical operational cycle. It covers inference monitoring, concept drift monitoring, statistical performance and data drift, the ongoing monitoring of models, and, in the lower right, the governance structure: regulatory compliance and audits. Any change to the model, for example through retraining or champion/challenger, which can involve very complicated automated retraining processes, could kick off a new regulatory, compliance, or accountability process.
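The champion/challenger step mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular ModelOps platform works: the names `evaluate_challenger` and `ChallengeResult`, the use of accuracy as the metric, and the `min_gain` threshold are all hypothetical. The sketch scores both models on the same labelled holdout data and, when the challenger wins, flags that a compliance review is needed before promotion.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical type for illustration: a model is any callable row -> label.
Model = Callable[[Sequence[float]], int]


@dataclass
class ChallengeResult:
    champion_accuracy: float
    challenger_accuracy: float
    promote_challenger: bool
    compliance_review_required: bool


def accuracy(model: Model, rows, labels) -> float:
    """Fraction of rows for which the model predicts the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(labels)


def evaluate_challenger(champion: Model, challenger: Model, rows, labels,
                        min_gain: float = 0.02) -> ChallengeResult:
    """Compare both models on the same holdout set.

    min_gain is an assumed threshold: the challenger must beat the
    champion by at least this much accuracy to be promoted.
    """
    champ = accuracy(champion, rows, labels)
    chall = accuracy(challenger, rows, labels)
    promote = (chall - champ) >= min_gain
    # Any change to the production model triggers a compliance review.
    return ChallengeResult(champ, chall, promote, promote)


# Toy holdout data and two toy threshold models.
rows = [[0.1], [0.4], [0.6], [0.9]]
labels = [0, 0, 1, 1]
champion = lambda row: int(row[0] > 0.3)    # misclassifies the 0.4 row
challenger = lambda row: int(row[0] > 0.5)  # classifies every row correctly

result = evaluate_challenger(champion, challenger, rows, labels)
print(result)
```

In practice the metric, the holdout data, and the promotion rule would come from the model's own life cycle definition rather than being hard-coded.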
Imagine an enterprise that has hundreds or thousands of models. Knowing exactly where each model is in its life cycle is tremendously complex and becomes a real challenge to scale. If it is not done well, it impacts the ability to govern models and to provide accountability and auditability, for example for ethical fairness and bias, on a per-model basis.
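To make the idea of knowing where each model is more concrete, here is a minimal sketch of a model inventory that tracks life cycle stages. The stage names, the allowed transitions, and the `Inventory` and `ModelRecord` classes are assumptions made for illustration; a real enterprise would define its own stages and rules.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Assumed stage names, chosen for this example only.
    REGISTERED = "registered"
    DEPLOYED = "deployed"
    MONITORED = "monitored"
    RETRAINING = "retraining"
    RETIRED = "retired"


# Assumed transition rules: which stages may follow each stage.
ALLOWED = {
    Stage.REGISTERED: {Stage.DEPLOYED, Stage.RETIRED},
    Stage.DEPLOYED: {Stage.MONITORED, Stage.RETIRED},
    Stage.MONITORED: {Stage.RETRAINING, Stage.RETIRED},
    Stage.RETRAINING: {Stage.DEPLOYED, Stage.RETIRED},
    Stage.RETIRED: set(),
}


@dataclass
class ModelRecord:
    name: str
    owner: str
    stage: Stage = Stage.REGISTERED
    history: list = field(default_factory=list)  # (from_stage, to_stage) pairs

    def move_to(self, new_stage: Stage) -> None:
        """Advance the model, rejecting transitions the rules do not allow."""
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(
                f"{self.name}: {self.stage.value} -> {new_stage.value} not allowed"
            )
        self.history.append((self.stage, new_stage))
        self.stage = new_stage


class Inventory:
    """Central register answering 'where is each model right now?'."""

    def __init__(self):
        self._models = {}

    def register(self, name: str, owner: str) -> ModelRecord:
        if name in self._models:
            raise ValueError(f"{name} is already registered")
        record = ModelRecord(name, owner)
        self._models[name] = record
        return record

    def where_is(self, name: str) -> Stage:
        return self._models[name].stage

    def count_by_stage(self) -> dict:
        counts = {stage: 0 for stage in Stage}
        for record in self._models.values():
            counts[record.stage] += 1
        return counts


inventory = Inventory()
fraud = inventory.register("fraud-detection", owner="risk-team")
aml = inventory.register("anti-money-laundering", owner="compliance-team")
fraud.move_to(Stage.DEPLOYED)
fraud.move_to(Stage.MONITORED)
print(inventory.where_is("fraud-detection").value)
```

Keeping the transition rules as data rather than code makes it possible to give different model classes different life cycles without changing the inventory itself.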
Models are assets, and they are complex assets unlike traditional ones. They need life cycles, and these life cycles are also complex: they must be automated to scale, and they have complex relationships with existing business and technical processes.
Fig. 4 — MLC across the enterprise — Image by source, edited with permission by the author.
Beyond that, it is important to note that, due to their uniqueness, models are not like software: they do not commoditize the way software does. Businesses don’t know in advance whether they will have many small data models or many large data models. Model lifecycles, like any enterprise architecture consideration, tend to lend themselves to enterprise-wide patterns. We believe lifecycles are an enterprise-wide concern, and this defines the role of the enterprise AI architect. Figure 4 shows some example lifecycles, such as the fraud detection lifecycle, with other use cases including anti-money laundering and other financial services. All these life cycles need to be designed and managed, and they are truly architectural assets.
AI Orchestration Platform
Businesses need to be able to continuously govern, automate, and monitor these assets throughout their lifecycles, and that’s exactly what a modern production ModelOps system like ModelOp Center does.
This kind of system offers enterprises a way to centrally, consistently, and efficiently manage all their AI/ML models. The solution enables teams to optimize the entire model operations lifecycle, from initial deployment through to retirement.
For the scope of this blog, we tested the features of ModelOp Center. The platform enables the teams to automate and orchestrate model monitoring and governance. The solution delivers all the key capabilities teams need to establish reliable, compliant, and scalable AI initiatives. With these capabilities, teams can maximize the value of their models, boost operational and cost efficiency, and control risk.
Fig. 5 — Key Capabilities of a modern production ModelOps system — Image by source, edited with permission by the author.
- The solution enables teams to:
  - Define, refine, standardize, and automate every step involved in model operations.
  - Use pre-defined processes for registration, operationalization, risk management, and monitoring.
  - Design consistent, optimized, end-to-end operational processes for all models while enabling flexible customization based on the metadata collected.
- Teams can establish continuous verification of model compliance. As a result, they can consistently and authoritatively ensure ongoing enforcement of regulatory mandates, business policies, and risk controls.
- Monitors cover a range of areas, including data drift, concept drift, ethical fairness, interpretability, population scoring, characteristic stability, and more.
- Prepackaged integrations include:
  - AI model factories.
  - Model frameworks.
  - Model workbenches.
  - Shared IT systems.
  - Cloud-based ML services.
  - BI visualization tools.
- The solution provides a unified way to track and manage all models, and it offers the sophisticated visibility that streamlines tracking, governance, and reporting efforts.
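As an illustration of what a data drift or characteristic stability monitor might compute, the sketch below implements the Population Stability Index (PSI), one commonly used measure. It is an assumption that a monitor would use PSI at all; the article does not say how ModelOp Center calculates drift. The function names, the equal-width binning, and the 0.2 alert threshold (a widely quoted rule of thumb) are likewise choices made for this example.

```python
import math


def _bin_fractions(values, edges, eps=1e-6):
    """Share of values in each bin; out-of-range values fall into the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        # Move right while the value is at or beyond the bin's upper edge.
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    total = len(values)
    # eps keeps empty bins from producing log(0) or division by zero.
    return [max(c / total, eps) for c in counts]


def population_stability_index(baseline, current, bins=10):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    lo, hi = min(baseline), max(baseline)
    if lo == hi:
        raise ValueError("baseline must contain more than one distinct value")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    expected = _bin_fractions(baseline, edges)
    actual = _bin_fractions(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


baseline = [i / 100 for i in range(100)]        # feature values at training time
shifted = [0.5 + i / 200 for i in range(100)]   # production values, shifted upward

DRIFT_THRESHOLD = 0.2  # rule-of-thumb assumption; tune per model
psi = population_stability_index(baseline, shifted)
if psi > DRIFT_THRESHOLD:
    print(f"data drift alert: PSI={psi:.2f}")
```

The same function applied to one input feature at a time gives a per-characteristic stability check, while applying it to model scores gives a population-level check.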
Regardless of the type of business we are supporting, adopting the right modern production ModelOps system lets us provide automation and governance at the enterprise level and ensures that questions about every enterprise AI initiative can be easily answered at any moment. This matters because corporate AI is becoming a strategic direction for most decisions in large organizations.
ModelOp Center is an example of a system that addresses those fundamental challenges in managing AI and ML models. It can automate, at scale, very complex life cycles that span business and technical concerns, and it gives the organization the capability to know where all its models are.
Fig. 6 — Questions ModelOp Center Answers — Image by source, edited with permission by the author.
4 Steps to Successful Model Operations
Organizations have been using models to help with business decisions for decades. However, AI and machine learning models introduce new risks into model operationalization (post-development). Many model operations processes are manual or managed using home-grown solutions that constantly need to be updated as new technologies, tools, and governance requirements are introduced.
As a result, over half of the models developed do not get deployed, and those that do take months to operationalize, often leading to suboptimal outcomes and delayed or diminished value.
Here are 4 steps that any organization can take to successfully operationalize AI/ML or any other type of model.
- Define the end-to-end model operation process (referred to as the model life cycle)
  - The first step of preparing a model for production use is establishing the end-to-end model operations process, referred to as the model life cycle (MLC).
  - An Enterprise AI Architect typically has the responsibility for designing model life cycles.
- Deploy models
  - Deployment is the method by which you integrate a model into an existing production environment to make practical business decisions based on data.
  - Typically, the data scientist is responsible for deploying models.
- Monitor the models in production
  - Monitoring begins when a model is first implemented in production systems for actual business use and continues until the model is retired, and sometimes even beyond as a historical archive.
  - A Model Operator typically has the responsibility for monitoring the health of models in production.
- Govern model operations
  - Models are a form of intellectual capital that should be governed as a corporate asset. They should be inventoried and assessed using tools and techniques that make auditing and reporting as efficient as possible.
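The four steps above can be sketched as a life cycle expressed as data, with an audit trail recorded as each step runs. This is a hypothetical illustration: the `Step` class, the `run_life_cycle` function, and the placeholder actions are invented for the example, and the role attached to the governance step is an assumption because the article does not name one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Step:
    name: str
    responsible: str                  # role accountable for the step
    action: Callable[[Dict], bool]    # returns True when the step succeeds


def run_life_cycle(model: Dict, steps: List[Step]) -> List[Dict]:
    """Run the steps in order, stop at the first failure, return an audit trail."""
    audit_trail = []
    for step in steps:
        passed = bool(step.action(model))
        audit_trail.append({
            "model": model["name"],
            "step": step.name,
            "responsible": step.responsible,
            "passed": passed,
        })
        if not passed:
            break
    return audit_trail


# Placeholder actions: real ones would call deployment, monitoring,
# and governance systems instead of returning constants.
steps = [
    Step("define_life_cycle", "Enterprise AI Architect", lambda m: True),
    Step("deploy", "Data Scientist", lambda m: m["tests_passed"]),
    Step("monitor", "Model Operator", lambda m: True),
    Step("govern", "Model Governance (assumed role)", lambda m: True),
]

trail = run_life_cycle({"name": "fraud-detection", "tests_passed": True}, steps)
for entry in trail:
    print(entry)
```

Because every step writes to the same trail, the record of who was responsible for what is produced as a side effect of running the life cycle rather than reconstructed afterwards.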
Automating and orchestrating all aspects of the model life cycle ensures reliable model operations and governance at scale. Each model in the enterprise can take a wide variety of paths to production, have different patterns for monitoring, and various requirements for continuous improvement or retirement.
Companies need to start thinking of ModelOps as a fundamental capability that can truly lead to a level of excellence in the pursuit of the business and ensure that investments have a guaranteed and immediate ROI. It is not something that can be put off any longer.
- ModelOps: Govern and Scale AI initiatives
- ModelOp Center Scale and Govern AI Model Operations
- 4 Steps to Successful Model Operations
- Gartner “Innovation Insights for ModelOps” Report
- Operationalizing AI
Original. Reposted with permission.
Bio: Giuliano Liguori (@ingliguori) is a technologist, an influencer in the Digital Transformation space and in AI, and his expertise and career position him to represent the CIO and CTO’s perspective.