MLOps is an Engineering Discipline: A Beginner’s Overview

MLOps = ML + DEV + OPS. MLOps is the idea of combining the long-established practice of DevOps with the emerging field of Machine Learning.



By Angad Gupta, Data Science Student

Introduction

 
MLOps combines machine learning (ML), development (Dev), and operations (Ops). By increasing automation, it helps scale production ML systems and improve the quality of production models.

In practice, this means creating an automated environment that brings model development, model retraining, drift monitoring, pipeline automation, quality control, and model governance together on a single platform.

Figure
Image source: techinnocens

 

An MLOps team includes data scientists, who curate datasets and design AI models, and ML engineers, who run those models and datasets in automated ways.

 

Why MLOps is important

 
An MLOps team helps you address the following issues:

Deployment issues:

  1. Machine learning builds that involve multiple languages
  2. Model deployment on development & production environments
  3. Troubleshooting issues raised during model deployments
  4. Preparing deployment packages that span different languages

Monitoring Issues:

  1. Model performance monitoring
  2. Consistent way to monitor the models deployed across the organization

Model lifecycle management issues:

  1. Reliance on data scientists to update production models and carry out maintenance activities
  2. Keeping track of model decay after initial deployments

Model Governance:

  1. Production access control
  2. Traceable model results
  3. Model audit trails
  4. Model upgrade approval workflows

 

Goals of MLOps

 
The goals of MLOps include:

  • Deployment and automation
  • Model training and upgrading
  • Operation diagnostics & fixes
  • Data governance and business regulatory compliance
  • Production scalability
  • Team collaboration
  • Monitoring and management

 

Major Benefits

 
Creation of reproducible workflow pipelines and ML models: Pipelines are the backbone of machine learning workflow infrastructure. They pull data from the source systems, then process and validate it. They also keep track of activities such as the model version and the dataset used to train each model.

  • Create machine learning pipelines to design, deploy and reproduce model deployment
  • Provide a mechanism to trace the code version, data, and various metrics, as well as execution logs

Easy model deployment in any production environment: Machine learning models are complex, and each deployment needs adequate resources to run the model efficiently. Deploying machine learning models therefore requires an automated system that provisions and manages those resources and ensures the model executes properly.

  • Quick and reliable deployment of machine learning models
  • Automated control of the usage of cloud resources
  • Running model validation and various tests before deployment
  • Predefined, dedicated system to migrate models from development to production systems

Management of the machine learning life cycle: A final machine learning model can have many associated microservices and ancillary services embedded within it. All the resources associated with a model need to be tracked for further enhancement and verification purposes.

  • Use effective integration tools to track the model development and its components and integrate all the components via dedicated tools
  • Run advanced bias analysis on the data to cross-verify model performance over time

Machine learning resource control and management: Machine learning models need to be retrained continually on different datasets, so it is essential to keep track of the model version, code version, dataset version, and the associated resources each run requires.

  • Keep track of model version history for audit purposes
  • Evaluate the importance of features and create more advanced models with minimal bias using uniform distribution metrics
  • Set a resource quota and establish proper policies for increasing or decreasing these resources as required to run the model efficiently
  • Create audit trails to meet regulatory requirements by tagging machine learning resources and automatically tracking experiments

 

Best Practices

 
ML pipelines: Set up ML pipelines, such as a data pipeline, that define the dependencies between steps and their execution order, and that produce metrics for monitoring each pipeline's resources
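
As a rough illustration of defining dependencies and execution order, the sketch below runs a small pipeline in dependency order and records how long each step takes. The step names and the run_pipeline helper are hypothetical, chosen only for this example; production pipelines are usually built with a dedicated orchestration tool rather than hand-rolled code like this.

```python
import time
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(steps, dependencies):
    """Run steps in dependency order and collect per-step timing metrics.

    steps: dict mapping step name -> zero-argument callable
    dependencies: dict mapping step name -> set of step names it depends on
    """
    # static_order() yields every step after all of its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    metrics = {}
    for name in order:
        start = time.perf_counter()
        steps[name]()
        metrics[name] = {"seconds": time.perf_counter() - start}
    return order, metrics


# Hypothetical data pipeline: extract -> validate -> transform -> train.
log = []
steps = {
    "extract": lambda: log.append("extract"),
    "validate": lambda: log.append("validate"),
    "transform": lambda: log.append("transform"),
    "train": lambda: log.append("train"),
}
dependencies = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "train": {"transform"},
}

order, metrics = run_pipeline(steps, dependencies)
print(order)
print(metrics)
```

Declaring the dependencies as data, rather than hard-coding the call order, lets the same runner detect cycles (TopologicalSorter raises CycleError) and makes it easy to attach further metrics to each step.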

Hybrid teams: MLOps includes the work of a data scientist, machine learning engineer, DevOps engineer and data engineer; such a hybrid team will hopefully, by design, handle issues quickly and efficiently

Model and data versioning: In addition to maintaining the code version, we also need to version the machine learning model, the data used to train it, its hyperparameters, and its metadata; there is more to model versioning than just the resultant model itself
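
One simple way to picture this is a manifest that records everything needed to reproduce a training run. The sketch below is illustrative only: the build_manifest helper, the field names, and the example values (including the commit identifier) are assumptions made for this example, not the format of any particular tool.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies these exact bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(code_version, dataset_bytes, model_bytes, hyperparameters, metadata):
    """Collect the versions of everything that went into one training run."""
    return {
        "code_version": code_version,
        "dataset_sha256": fingerprint(dataset_bytes),
        "model_sha256": fingerprint(model_bytes),
        "hyperparameters": hyperparameters,
        "metadata": metadata,
    }


manifest = build_manifest(
    code_version="abc1234",  # hypothetical commit identifier
    dataset_bytes=b"feature,label\n0.1,0\n0.9,1\n",  # stand-in for a real dataset file
    model_bytes=b"serialized-model-placeholder",  # stand-in for a real model artifact
    hyperparameters={"learning_rate": 0.01, "max_depth": 4},
    metadata={"trained_by": "example-user"},
)
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Because the dataset and model are identified by content hashes, any change to either one produces a different manifest, which is what makes a past result traceable to its exact inputs.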

Model validation: Statistical tests need to be set up for model validation, because validation can’t be reduced to pass/fail or true/false; it is much more nuanced, and there are lessons to be learned from detailed statistical tests
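
To make the idea of a statistical test concrete, here is a minimal sketch of one possible check: an exact McNemar test that compares a candidate model with the production model on the same labelled examples. The choice of test, the function name, and the toy predictions are assumptions for illustration; the right test depends on the model and the metric being validated.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (a_only, b_only, p_value), where a_only counts the examples
    that only model A classified correctly and b_only the examples that
    only model B classified correctly.
    """
    a_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    b_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n = a_only + b_only
    if n == 0:
        return a_only, b_only, 1.0  # the models never disagree
    # If the models were equally good, each disagreement would favour
    # either model with probability 0.5 (a binomial distribution).
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2 * tail)


# Toy data: 20 examples, all with true label 1.
y_true = [1] * 20
production = [0] * 9 + [1] * 11  # wrong on the first 9 examples
candidate = [1] * 19 + [0]       # wrong only on the last example

prod_only, cand_only, p_value = mcnemar_exact(y_true, production, candidate)
print(prod_only, cand_only, round(p_value, 4))
```

Rather than a bare "candidate is better: true", the result shows how many examples each model alone gets right and how likely such a split would be if the two models were equally good.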

Data validation: Before a model is trained on the provided data, the input data has to be validated to avoid introducing uncertainty and bias into the model
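
A minimal sketch of such a check is shown below, assuming the input arrives as a list of dictionaries and that each column has an expected type and value range. The schema layout, the column names, and the validate_rows helper are hypothetical and exist only for this example.

```python
def validate_rows(rows, schema):
    """Check each row against a schema and return a list of problems.

    schema maps column name -> (expected type, minimum value, maximum value).
    Each problem is a tuple of (row index, column name, description).
    """
    problems = []
    for i, row in enumerate(rows):
        for column, (expected_type, low, high) in schema.items():
            value = row.get(column)
            if value is None:
                problems.append((i, column, "missing"))
            elif not isinstance(value, expected_type):
                problems.append((i, column, "wrong type"))
            elif not low <= value <= high:
                problems.append((i, column, "out of range"))
    return problems


# Hypothetical schema and input rows.
schema = {"age": (int, 0, 120), "income": (float, 0.0, 1e7)}
rows = [
    {"age": 34, "income": 52000.0},    # valid
    {"age": -5, "income": 48000.0},    # age out of range
    {"age": 41},                       # income missing
    {"age": "29", "income": 61000.0},  # age has the wrong type
]

problems = validate_rows(rows, schema)
for problem in problems:
    print(problem)
```

Returning the full list of problems, instead of stopping at the first one, lets a pipeline decide whether to reject the whole batch or only quarantine the offending rows.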

Monitoring: As training and deploying models takes up more and more resources, it becomes more important to monitor model performance in the environment by visualizing metrics on the resources the model uses
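
Monitoring often also covers the drift monitoring mentioned in the introduction: checking whether live input data still resembles the training data. The sketch below computes a Population Stability Index (PSI) for a single numeric feature. The equal-width binning, the smoothing constant, and the 0.25 alert threshold (a commonly cited rule of thumb) are assumptions for this example, not fixed standards.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """Compare two samples of one numeric feature; 0.0 means identical bins."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the training range land in the edge bins.
            idx = max(0, min(bins - 1, int((v - lo) / width)))
            counts[idx] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Toy data: the training feature is spread over [0, 1); live data has shifted upward.
training = [i / 100 for i in range(100)]
live_shifted = [0.5 + i / 200 for i in range(100)]

print(population_stability_index(training, training))
print(population_stability_index(training, live_shifted))

# Hypothetical alert rule based on the rule-of-thumb threshold.
drift_detected = population_stability_index(training, live_shifted) > 0.25
print(drift_detected)
```

A score like this can be computed on a schedule for each input feature and plotted alongside resource metrics, so that drift shows up on the same dashboard as performance.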

 

Platforms and tools to assist with MLOps

 
As alluded to above, the following types of platforms and tools can assist with MLOps:

  • Tools for model tracking, model history, and model registry information
  • Tools for model versioning, including versioning of a model's individual components (code, datasets, etc.)
  • Cloud service platforms to execute the model experiments as well as the deployment of models and ML pipelines

 

Conclusion

 
MLOps is a new engineering discipline. It brings together machine learning engineers, DevOps engineers, and data scientists in a hybrid team that retrieves and validates data, trains machine learning models on the proper datasets, and deploys them. MLOps also helps monitor model output so that models can be optimized and continue to run and produce the desired output seamlessly. It is particularly helpful for deploying and training models and for keeping track of models and their associated datasets.

 
Bio: Angad Gupta works as a customer delivery engineer at AutoGrid India Pvt Ltd and is pursuing an M.Tech in data science at BITS Pilani. You may follow him on LinkedIn.
