Overview of MLOps
Building a machine learning model is great, but to provide real business value, it must be made useful and maintained to remain useful over time. Machine Learning Operations (MLOps), overviewed here, is a rapidly growing space that encompasses everything required to deploy a machine learning model into production, and is a crucial aspect to delivering this sought after value.
By Steve Shwartz, AI Author, Investor, and Serial Entrepreneur.
Photo: iStockPhoto / NanoStockk
Considerable data science expertise is usually required to create a dataset and build a model for a particular application. But building a good model is usually not enough. In fact, it is not nearly enough. As illustrated below, developing and testing a model is just the first step.
Machine Learning Model Lifecycle.
Machine Learning Operations (MLOps) is everything else required to make that model useful, including capabilities for an automated development and deployment pipeline, monitoring, lifecycle management, and governance, as illustrated above. Let’s look at each of these.
Creating a production ML system requires multiple steps: First, the data must undergo a series of transformations. Then, the model is trained. Usually, this requires experimentation with different network architectures and hyperparameters. Often, it is necessary to go back to the data and try different features. Next, the model must be validated with unit tests and integration tests. It needs to pass tests for data and model bias and explainability. Finally, it is deployed into a public cloud, an on-premise environment, or a hybrid environment. Additionally, some steps in the process might require an approval workflow.
If each of these steps is performed manually, the development process tends to be slow and brittle. Fortunately, many MLOps tools exist to automate these steps from data transformation to deployment end-to-end. When retraining is necessary, it is an automated, reliable, and reproducible process.
ML models tend to work well when first deployed and then work less well over time. As Forrester analyst, Dr. Kjell Carlsson said: “AI models are like six-year-olds during quarantine: They need constant attention . . . otherwise, something will break.”
It is critical for deployments to include various types of monitoring so that ML teams can be alerted when this starts to happen. Performance can degrade due to infrastructure issues such as inadequate CPU or memory. Performance can also degrade when the real-world data that constitute the independent variables that are input to the model start to take on different characteristics than the training data, a phenomenon known as data drift.
Similarly, the model may become less applicable because real-world conditions change, a phenomenon known as concept drift. For example, many predictive models of customer and supplier behavior were sent into a tailspin by COVID-19.
Some companies also monitor alternative models (e.g., different network architectures or different hyperparameters) to see if any of these “challenger” models starts performing better than the production model.
Often, it makes sense to put guardrails around decisions made by the model. These guardrails are simple rules that either trigger an alert, prevent the decision, or put the decision into a workflow for human approval.
When model performance starts to degrade due to data or model drift, model retraining and possibly model re-architecture are required. However, the data science team shouldn’t have to start from scratch. In developing the original model, and perhaps in prior re-architectures, they probably tested many architectures, hyperparameters, and features. It’s critical that all these prior experiments (and results) are recorded so that the data science team doesn’t have to go back to square one. It’s also critical for communication and collaboration between data science team members.
Machine learning models are being used for many applications that impact people like bank loan decision, medical diagnosis, and hiring/firing decisions. The use of ML models in decision-making has been criticized for two reasons: First, these models are subject to bias, especially if the training data results in models that discriminate based on race, color, ethnicity, national origin, religion, gender, sexual orientation, or other protected classes. Second, these models are often black boxes that don’t explain their decision-making.
As a result, organizations that used ML-based decision-making are under pressure to ensure their models don’t discriminate and are capable of explaining their decisions. Many MLOps vendors are incorporating tools based on academic research (e.g., SHAP and Grad-CAM) that help explain the model decisions and are using a variety of techniques to ensure that the data and models are not biased. Additionally, they are incorporating bias and explainability tests in their monitoring protocols because models can become biased or lose explanatory capability over time.
Organizations also need to build trust and are starting to ensure that on-going performance, lack of bias, and explainability are auditable. This requires model catalogs that not only document all the data, parameter, and architecture decisions but also log each decision and provide traceability so that it can be determined what data, model, and parameters were used for each decision, when the model was retrained or otherwise modified, and who made each change. It is also important for auditors to be able to repeat historical transactions and to test the boundaries of model decision-making with what-if scenarios.
Security and data privacy are also key concerns for organizations using ML. Care must be taken to ensure the personal information is protected and role-based data access capabilities are essential, especially for regulated industries.
Governments around the world are also moving quickly to regulate ML-based decision-making that affects people. The European Union has led the way with its GDPR and CRD IV regulations. In the US, several regulatory agencies, including the US Federal Reserve Bank and the FDA, have created regulations around ML-based decision-making for financial and medical decisions. A more comprehensive law, the recently proposed Data Accountability and Transparency Act of 2020, is slated for Congressional consideration in 2021. Regulations will likely evolve to the point where CEO’s need to sign off on the explainability of and the lack of bias in their ML models.
The MLOps Landscape
As we continue in 2021, the market for MLOps is exploding. According to analyst firm Cognilytica, it is expected to be a $4 billion market by 2025.
There are big players and small players in the MLOps space. Major ML platform vendors like Amazon, Google, Microsoft, IBM, Cloudera, Domino, DataRobot, and H2O are incorporating MLOps capabilities into their platforms. According to Crunchbase, there are 35 private companies in the MLOps space who have raised between $1.8M and $1B in financing and who have between 3 and 2800 employees on LinkedIn:
|Financing ($millions)||Number of Employees||
|Cloudera||1000||2803||Cloudera delivers an Enterprise Data Cloud for any data, anywhere, from the Edge to AI.|
|Databricks||897||1757||Databricks is a software platform that helps its customers unify their analytics across business, data science, and data engineering.|
|DataRobot||750||1105||DataRobot brings AI technology and ROI enablement services to global enterprises.|
|Dataiku||246||556||Dataiku operates as an enterprise artificial intelligence and machine-learning platform.|
|Alteryx||163||1623||Alteryx accelerates digital transformation by unifying analytics, data science and automated processes.|
|H2O||151||257||H2O.ai is the open source leader in AI and automatic machine learning with a mission to democratize AI for everyone.|
|Domino||124||232||Domino is the world's leading Enterprise Data Science Platform, powering data science at over 20% of the Fortune 100.|
|Iguazio||72||83||The Iguazio Data Science Platform enables you to develop, deploy and manage AI applications at scale and in real-time|
|Explorium.ai||50||96||Explorium offers a data science platform powered by augmented data discovery and feature engineering|
|Algorithmia||38||63||Algorithmia is a machine learning model deployment and management solution that automates the MLOps for an organization|
|Paperspace||23||37||Paperspace powers next-generation applications built on GPUs.|
|Pachyderm||21||32||Pachyderm is an enterprise-grade data science platform that makes explainable, repeatable, and scalable AI/ML a reality.|
|Weights and Biases||20||58||Tools for experiment tracking, improved model performance, and results collaboration|
|OctoML||19||37||OctoML is changing how developers optimize and deploy machine learning models for their AI needs.|
|Arthur AI||18||28||Arthur AI is a platform that monitors the productivity of machine learning models.|
|Truera||17||26||Truera provides a Model Intelligence platform for enterprises to analyze machine learning.|
|Snorkel AI||15||39||Snorkel AI is focused on making AI practical through Snorkel Flow: the data-first platform for enterprise AI|
|Seldon.io||14||48||Machine Learning Deployment Platform|
|Fiddler Labs||13||46||Fiddler enables users to create AI solutions that are transparent, explainable, and understandable.|
|run.ai||13||26||Run:AI develops an automated distributed training technology that virtualizes and accelerates deep learning.|
|ClearML (Allegro)||11||29||ML / DL Experiment Manager and ML-Ops Open-Source Solution End-to-End Product Life-cycle Management Enterprise Solution|
|Verta||10||15||Verta builds software infrastructure to help enterprise data science and machine learning (ML) teams develop and deploy ML models.|
|cnvrg.io||8||38||cnvrg.io is a full stack data science platform that helps teams manage models, and build auto-adaptive machine learning pipelines|
|Datatron||8||19||Datatron provides a single model governance (management) platform for all of your ML, AI, and Data Science models in production|
|Comet||7||19||Comet.ml is a machine learning platform designed to help AI practitioners and teams build reliable machine learning models.|
|ModelOp||6||39||Govern, Monitor and Manage all models across the enterprise|
|WhyLabs||4||15||WhyLabs is the AI observability and monitoring company.|
|Arize AI||4||14||Arize AI offers a platform that explains and troubleshoots production AI.|
|DarwinAI||4||31||DarwinAI’s Generative Synthesis 'AI building AI' technology enables optimized and explainable deep learning.|
|Mona||4||11||Mona is a SaaS monitoring platform for Data and AI driven systems|
|Valohai||2||13||Your Managed Machine Learning Platform that lets data scientists build, deploy and track machine learning models.|
|Modzy||0||31||The secure ModelOps platform to discover, deploy, manage, and govern machine learning at scale—getting to value faster.|
|Algomox||0||17||Catalyze Your AI Transformation|
|Monitaur||0||8||Monitaur is a software company that provides auditability, transparency, and governance for companies using machine learning software.|
|Hydrosphere.io||0||3||Hydrosphere.io is a platform for AI/ML operations automation|
Many of these companies focus on just one segment of MLOps, such as automation pipeline, monitoring, lifecycle management, or governance. Some argue that using multiple, best-of-breed MLOps products are better for data science projects than monolithic platforms. And some companies are building MLOps products for specific verticals. For example, Monitaur positions itself as a best-of-breed governance solution that can work with any platform. Monitaur is also building industry-specific MLOps governance capabilities for regulated industries, starting with insurance. (Full disclosure: I am an investor in Monitaur).
There are also a number of open-source MLOps projects, including:
- MLFlow manages the ML lifecycle, including experimentation, reproducibility, and deployment, and includes a model registry
- DVC manages version control for ML projects to make them shareable and reproducible
- Polyaxon has capabilities for experimentation, lifecycle automation, collaboration, and deployment, and includes a model registry
- Metaflow is a former Netflix project for managing the automation pipeline and deployment
- Kubeflow has capabilities for workflow automation and deployment in Kubernetes containers
2021 promises to be an interesting year for MLOps. We’ll likely see rapid growth, tremendous competition, and most likely, some consolidation.
Bio: Steve Shwartz (@sshwartz) started his AI career as a postdoc at Yale University many years ago, is a successful serial entrepreneur and investor, and is the author of “Evil Robots, Killer Computers, and Other Myths: The Truth About AI and the Future of Humanity”.
- A Machine Learning Model Monitoring Checklist: 7 Things to Track
- How to Use MLOps for an Effective AI Strategy
- MLOps: Model Monitoring 101
Top Stories Past 30 Days