Operationalizing Machine Learning from PoC to Production
Most companies haven’t seen ROI from machine learning since the benefit is only realized when the models are in production. Here’s how to make sure your ML project works.
Robot cartoon vector created by vectorjuice - www.freepik.com
Many companies use machine learning to differentiate themselves and grow their business. However, making machine learning work is not easy, because it requires a balance between research and engineering. A team can come up with a good, innovative solution based on current research and still never take it live because of engineering inefficiencies, cost and complexity. Most companies haven’t seen much ROI from machine learning, since the benefit is realized only when the models are in production. Let’s dive into the challenges, and the best practices one can follow to make machine learning work.
First, let’s frame the main phases for a typical AI project:
- Proof of Concept (PoC) – The purpose of this phase is to validate whether machine learning can really solve the problem at hand, and to estimate the cost and time required to solve it in production.
- Engineering – As the name suggests, this stage focuses on engineering the model for scale and reliability, and on setting up the validation framework and data pipelines.
- Maintenance or AI Operations – This is an important step because data is not static: it keeps changing, which means the models need to be retrained on new data and redeployed to production frequently. This cannot be done effectively without automating model training, deployment, validation and monitoring.
Next, it’s important to understand the key challenges that data scientists need to address while designing machine learning solutions to ensure an ROI for the business. Each challenge below comes with recommendations for overcoming it.
- Data – Quality and Quantity
- Choice of algorithms – Accuracy vis-à-vis Cost
- Model Performance in Production
- Deployment Automation and Monitoring (MLOps)
- Handling Scale
Data – Quality and Quantity
The most important part of any machine learning model is data, and to train a model, quality and quantity are both required. Let’s start with the quantity first.
Challenge 1: Not Enough Data
Often, a company or team has either too little data or too much. In the PoC phase, a small dataset can be enough to validate the approach, but the resulting solution may not work for all production scenarios. This is because the dataset used for training and validation may not represent the full range of data, and in some cases more data may be impossible to obtain, for example for rare medical diseases.
Let me illustrate this with an example. One of our customers in the legal domain wanted to classify documents based on regulations and extract information for each regulation. The initial training dataset had just 20 documents, and the goal was to classify a million documents. The challenge with such a small training dataset is to make sure the model does not overfit. An overfit model shows good accuracy in the PoC stage (where the model is not validated on unseen data) but performs poorly once it is rolled out to production. We chose a simple one-class classification model that could learn patterns from the small dataset without overfitting, rolled it out to production, and improved it once we had more data.
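The article does not say which one-class model was used. As a hedged illustration only, here is a minimal one-class classifier in plain Python, assuming bag-of-words features and a nearest-centroid rule: it learns from a handful of positive documents and accepts only new documents that look like them. `CentroidOneClass`, its `quantile` parameter and the sample sentences are hypothetical; a production system would more likely use a library model such as scikit-learn's `OneClassSVM` on richer features.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased token counts: a deliberately simple feature extractor."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class CentroidOneClass:
    """Hypothetical one-class classifier: accepts documents similar to the
    centroid of the (few) positive training documents, rejects the rest."""

    def __init__(self, quantile=0.1):
        # Fraction of training documents allowed to fall below the threshold.
        self.quantile = quantile
        self.centroid = Counter()
        self.threshold = 0.0

    def fit(self, docs):
        vectors = [bag_of_words(d) for d in docs]
        for v in vectors:
            self.centroid.update(v)
        # Derive the threshold from the training similarities, so the model
        # only claims documents that resemble patterns it has already seen.
        sims = sorted(cosine(v, self.centroid) for v in vectors)
        self.threshold = sims[int(self.quantile * (len(sims) - 1))]
        return self

    def predict(self, doc):
        return cosine(bag_of_words(doc), self.centroid) >= self.threshold


train = [
    "data retention policy under regulation x",
    "regulation x requires data retention records",
    "retention of records per regulation x",
]
model = CentroidOneClass().fit(train)
print(model.predict("regulation x data retention records"))  # True
print(model.predict("quarterly marketing budget review"))    # False
```

Because the threshold comes from the training documents themselves, the model scopes itself to the patterns it has seen, which matches the advice to roll out a narrow feature first and widen it as data arrives.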
Choose algorithms that can work with small data, roll out to production with the feature scoped to the patterns already seen, and collect data for the unseen patterns. Then improve the model iteratively to cover the other scenarios. Alternatively, use semi-supervised learning, which can learn from unlabeled data when labeled data is scarce.
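Self-training is one common semi-supervised technique. The sketch below is a hedged, minimal version, assuming numeric feature vectors and a nearest-centroid base model: the model pseudo-labels only the unlabeled points it is confident about, adds them to the training set and refits. The function names, the `min_margin` confidence rule and the toy points are all hypothetical.

```python
import math


def fit_centroids(points, labels):
    """Mean feature vector for each class label."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_margin(centroids, p):
    """Nearest-centroid label, plus the gap between the two closest centroids
    as a crude confidence score (assumes at least two classes)."""
    ranked = sorted((math.dist(p, c), y) for y, c in centroids.items())
    return ranked[0][1], ranked[1][0] - ranked[0][0]


def self_train(labeled, labels, unlabeled, min_margin=1.0, max_rounds=5):
    """Self-training: pseudo-label the unlabeled points the current model is
    confident about, add them to the training set and refit."""
    X, y, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(X, y)
        confident, remaining = [], []
        for p in pool:
            label, margin = predict_with_margin(centroids, p)
            if margin >= min_margin:
                confident.append((p, label))
            else:
                remaining.append(p)
        if not confident:
            break  # nothing confident left to learn from
        X += [p for p, _ in confident]
        y += [label for _, label in confident]
        pool = remaining
    return fit_centroids(X, y)


# Two labeled points and four unlabeled ones (all values are made up).
model = self_train(
    labeled=[(0.0, 0.0), (10.0, 10.0)],
    labels=["a", "b"],
    unlabeled=[(1.0, 1.0), (2.0, 1.0), (9.0, 9.0), (8.0, 9.0)],
)
print(predict_with_margin(model, (1.5, 0.5))[0])  # a
```

The confidence margin is the important design choice: pseudo-labeling uncertain points can reinforce the model's own mistakes, so only clear-cut cases are added.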
Challenge 2: Too Much Data
Large data is also a problem: training the machine learning model can take a very long time and require a lot of computing power. In that case we need to create a subset of the data that is representative of the complete dataset. The challenge is not only to pick the right sampling technique, but also to automate sample collection so that the training data does not become outdated.
We faced this while working on a bid price prediction problem for an advertising company in the real-time bidding space that was receiving almost 70 billion requests per day. We couldn’t train models on the whole dataset, and even capturing the data needed for the PoC required some engineering to ensure that the sampled data kept the same distribution as the original data.
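The article does not describe how its sampling module works. One standard way to draw a uniform random sample from a high-volume stream in a single pass is reservoir sampling (Algorithm R), sketched below under that assumption; the request-log generator is a hypothetical stand-in for the real pipeline input.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Algorithm R: a uniform random sample of k items from a stream of
    unknown length, in one pass and O(k) memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Keep item i (0-based) with probability k / (i + 1),
            # evicting a uniformly chosen current member.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Stand-in for a high-volume request log; any iterable works.
request_log = (f"request-{n}" for n in range(100_000))
sample = reservoir_sample(request_log, k=1_000, seed=7)
print(len(sample))  # 1000
```

Because it never needs the full dataset in memory, a step like this can sit inside the data processing pipeline and refresh the training sample continuously.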
Automate training dataset creation by integrating the data sampling module into the data processing pipeline. For the opposite problem of too little data, particularly for under-represented classes, use oversampling techniques such as SMOTE (Synthetic Minority Oversampling Technique), which generate synthetic samples by interpolating between existing minority-class samples.
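SMOTE creates each synthetic point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The plain-Python sketch below shows that idea only, assuming numeric feature tuples; in practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn library would be the safer choice.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Minimal SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_synthetic=20, k=2, seed=0)
print(len(new_points))  # 20
```

Since every synthetic point is an interpolation, it stays inside the region already covered by real minority samples: SMOTE fills gaps, it does not invent new patterns.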
Challenge 3: Quality of Data
Data quality also matters, because the quality of the data fed to a model dictates the quality of its output. The distribution of the training data in the PoC phase may differ from the distribution of data in production, resulting in higher errors in production. This happens primarily because of incorrect sampling techniques.
When working with big data, we therefore have to use good sampling techniques to create the best training data for the problem statement. In the real-time bidding project we used random sampling because the data was uniformly distributed.
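Whatever sampling technique is used, it helps to check that the sample actually matches the full data. A hedged sketch, assuming a single numeric feature: the two-sample Kolmogorov-Smirnov statistic measures the largest gap between the two empirical distributions. SciPy's `scipy.stats.ks_2samp` computes the same statistic along with a p-value; the exponential data here is synthetic.

```python
import bisect
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the maximum gap is always reached at an observed value
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


rng = random.Random(1)
full_data = [rng.expovariate(1.0) for _ in range(20_000)]
random_sample = rng.sample(full_data, 2_000)
biased_sample = sorted(full_data)[:2_000]  # e.g. only the smallest values
print(ks_statistic(full_data, random_sample))  # small
print(ks_statistic(full_data, biased_sample))  # about 0.9
```

A random sample stays close to the full data, while a biased one (here, only the smallest values) is flagged immediately.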
Choice of Algorithms
In the PoC phase of the project, data scientists mainly focus on solutions and results. However, any organization will approve the project only if the solution is cost-effective (compared to the current solution) and stands out in performance (response time, etc.). It’s important to look at the running cost of the machine learning solution while choosing the algorithm.
The cost of a machine learning solution depends on the algorithm or technique we choose. For example, training a deep learning model from scratch on domain-specific data might give good accuracy but need multiple GPUs to train, while a classical machine learning solution might not require a GPU at all. Computation costs can grow non-linearly depending on the approach and the size of the training data.
I have faced this trade-off myself. While working on a voice cloning problem, we could either train a Convolutional Neural Network (CNN) based deep learning model from scratch or use transfer learning. We chose the transfer learning based solution because it cost 60% less and its cloning accuracy was good enough that the difference was not noticeable in production.
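A decision like this can be made explicit by scoring candidates on cost as well as accuracy. The sketch below is hypothetical: it picks the cheapest model whose accuracy is within a chosen `tolerance` of the best one. The candidate names, accuracies, hours and hourly rates are illustrative numbers, not figures from the voice cloning project.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float         # validation metric, higher is better
    compute_hours: float    # estimated training time on the required hardware
    hourly_rate: float      # price per hour of that hardware (GPU or CPU)

    @property
    def training_cost(self):
        return self.compute_hours * self.hourly_rate


def cheapest_good_enough(candidates, tolerance):
    """Cheapest candidate whose accuracy is within `tolerance` of the best."""
    best = max(c.accuracy for c in candidates)
    acceptable = [c for c in candidates if best - c.accuracy <= tolerance]
    return min(acceptable, key=lambda c: c.training_cost)


# Illustrative numbers only.
candidates = [
    Candidate("cnn_from_scratch", accuracy=0.93, compute_hours=400, hourly_rate=3.0),
    Candidate("transfer_learning", accuracy=0.92, compute_hours=120, hourly_rate=3.0),
    Candidate("classical_on_cpu", accuracy=0.85, compute_hours=50, hourly_rate=0.5),
]
print(cheapest_good_enough(candidates, tolerance=0.02).name)  # transfer_learning
```

The `tolerance` encodes "good enough": it is a business decision, and widening it would favour the CPU-only model.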
While designing machine learning solutions, we should also keep infrastructure costs in check and compare them across candidate models alongside the accuracy metric. A few ways to control infrastructure costs are:
- Avoid using GPUs where the task can be completed on CPUs. For example, classical machine learning models can usually be trained on CPUs.
- Use transfer learning wherever possible.
- Train the model only when required, for example when the data distribution changes or new categories are added. Retraining periodically when the data has not changed much only adds infrastructure overhead.
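A retraining trigger along these lines can be sketched as follows, assuming a categorical feature: retrain if a category appears that the training data never contained, or if the Population Stability Index (PSI) between training and recent data exceeds a threshold. The 0.2 threshold is a commonly quoted rule of thumb, used here as an assumption; numeric features would first need to be binned.

```python
import math
from collections import Counter


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two samples of a categorical
    feature; 0 means identical proportions, larger means more drift."""
    e_counts, a_counts = Counter(expected), Counter(actual)
    total = 0.0
    for category in set(e_counts) | set(a_counts):
        # eps avoids log(0) when a category is missing from one sample
        e = max(e_counts[category] / len(expected), eps)
        a = max(a_counts[category] / len(actual), eps)
        total += (a - e) * math.log(a / e)
    return total


def should_retrain(train_values, recent_values, psi_threshold=0.2):
    """Retrain only if new categories appeared or the distribution drifted."""
    new_categories = set(recent_values) - set(train_values)
    return bool(new_categories) or psi(train_values, recent_values) > psi_threshold


train = ["a"] * 50 + ["b"] * 50
print(should_retrain(train, ["a"] * 48 + ["b"] * 52))  # False: minor change
print(should_retrain(train, ["a"] * 90 + ["b"] * 10))  # True: drift
print(should_retrain(train, ["a", "b", "c"] * 30))     # True: new category
```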
Model Performance in Production
Standard metrics such as MAPE, F1 score and IoU are available to measure model performance. These metrics work well for testing the model in the PoC phase but may not reflect the ROI for the business. A machine learning model can even affect the business negatively if it is implemented incorrectly.
An example of this is a ratings prediction model that we built for an advertising company. The model's accuracy was good by machine learning metrics, but the business measured success by revenue. We had to develop an ensemble approach to maximize revenue.
To minimize the risk, roll out the solution incrementally using A/B testing. The key is to automate deployment so that the ML solution can be rolled back when needed.
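One common way to implement an incremental A/B rollout is deterministic, hash-based traffic splitting. The hedged sketch below routes a fixed percentage of users to the new model; the `experiment` salt and user IDs are hypothetical, and a real system would usually read `rollout_percent` from configuration so that it can be changed without a redeploy.

```python
import hashlib


def assign_variant(user_id, rollout_percent, experiment="model-v2"):
    """Deterministically route a user to the new model ("treatment") or the
    current one ("control"). A user keeps the same variant across requests,
    and setting rollout_percent to 0 routes everyone back to control."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # 0..9999, i.e. 0.01% steps
    return "treatment" if bucket < rollout_percent * 100 else "control"


users = [f"user-{n}" for n in range(10_000)]
share = sum(assign_variant(u, 5) == "treatment" for u in users) / len(users)
print(share)  # close to 0.05
```

Raising the percentage only adds users to the treatment group (anyone in the 5% bucket range is also in the 10% range), so each user's experience stays consistent as the rollout widens. Setting it back to 0 acts as the rollback.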
Deployment Automation & Monitoring
This is the most important part of the entire project, but it is often neglected because engineering and DevOps teams lack an understanding of machine learning. It’s important to monitor model performance continuously, since the model's results can change when the incoming data changes. During this phase, you need to overcome the following issues:
- Most data engineers do not understand the sampling, data preparation and model validation techniques and hence are unable to set up data pipelines that enable collection of training data and validation of models.
- DevOps teams commonly lack the machine learning experience needed to manage model versioning, rollbacks, etc.
- Lack of tools to help monitor the machine learning models in production.
For the bid price prediction problem mentioned earlier, we modified the data processing pipeline to generate training data by sampling and to store it in a common location, from which the training pipeline could be triggered if the data distribution changed. The DevOps team automated model deployment by rolling the model out incrementally, one cluster at a time. The DevOps team also set up monitoring of model accuracy to generate alerts when the accuracy dropped below a particular threshold.
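A threshold alert like this can be sketched as a rolling-window monitor. This is a minimal illustration, not the team's implementation: what counts as a "correct" prediction for a regression target such as a bid price (for example, within some tolerance) is an assumption, as are the window size and threshold. In production the alert would feed a metrics or paging system rather than a return value.

```python
from collections import deque


class AccuracyMonitor:
    """Rolling accuracy over the last `window` labelled predictions, with an
    alert when it drops below `threshold`."""

    def __init__(self, window=1_000, threshold=0.9, min_samples=100):
        self.outcomes = deque(maxlen=window)  # True = prediction was correct
        self.threshold = threshold
        self.min_samples = max(1, min_samples)  # don't alert on tiny samples

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes)

    def record(self, correct):
        """Record one outcome; return True if an alert should fire."""
        self.outcomes.append(bool(correct))
        if len(self.outcomes) < self.min_samples:
            return False
        return self.accuracy() < self.threshold


monitor = AccuracyMonitor(window=10, threshold=0.8, min_samples=5)
for _ in range(10):
    monitor.record(True)
print([monitor.record(False) for _ in range(3)])  # [False, False, True]
```

The bounded `deque` keeps the measurement focused on recent traffic, so a drift in incoming data shows up quickly instead of being diluted by months of history.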
A data scientist should define the metrics and indicators for model performance in the production environment before developing the solution. Once they are defined, the data scientist needs to work closely with the engineering and DevOps teams to set up the data processing pipelines and monitoring.
Handling Scale
There are two important factors that define the scale of an ML model:
- Number of requests the model can handle
- Execution time (Latency)
We can increase the number of requests a model can handle by scaling out the model deployment, which scales roughly linearly. Execution time is a critical factor for scale, because it adds to the processing latency and also limits how many requests each deployed instance can serve.
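Little's law makes the link between latency and scale concrete: the number of requests in flight equals the arrival rate multiplied by the time each request takes. The sketch below converts the 70 billion requests per day from the bidding example into about 810,000 requests per second. The 10 ms and 20 ms latencies and the 8 concurrent requests per replica are hypothetical, and the model ignores queueing and load spikes; the `headroom` factor exists to cover those.

```python
import math


def replicas_needed(requests_per_second, latency_seconds,
                    concurrency_per_replica=1, headroom=1.0):
    """Little's law: requests in flight = arrival rate x time per request.
    Dividing by how many requests one replica can serve at once gives the
    replica count, which grows linearly with both traffic and latency."""
    in_flight = requests_per_second * latency_seconds
    return math.ceil(in_flight * headroom / concurrency_per_replica)


rps = 70e9 / 86_400  # ~810,000 requests per second
print(replicas_needed(rps, latency_seconds=0.010, concurrency_per_replica=8))  # 1013
print(replicas_needed(rps, latency_seconds=0.020, concurrency_per_replica=8))  # 2026
```

Doubling the model's execution time roughly doubles the number of replicas needed for the same traffic, which is why latency matters as much as raw throughput.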
A machine learning PoC is like a research project, with different types of experiments to find the right algorithm or approach for the problem at hand. The solution with the best accuracy may have high time complexity, resulting in high latency. It’s important to consider the complexity of the solution while building it, since it may be best to give up some accuracy if that simplifies the solution.
Using the bidding problem again as an example, we used reinforcement learning to predict the bid price in real time. The overall prediction time had to be under 60-70 ms, which meant that the machine learning model's execution time had to be much lower to avoid timeouts. To reduce the overall execution time, we simplified the environment of the reinforcement learning model, implemented the model in Java instead of Python and deployed it in-process.
Execution time is an important metric while choosing an algorithm. At times it is best to compromise on accuracy if the overall latency can be reduced with simpler algorithms.
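Treating execution time as a selection metric can be sketched as follows: measure per-call latency, take a high percentile rather than the average (timeouts are driven by the slow tail), and keep the most accurate candidate that fits the budget. The model functions, accuracies and the 20 ms budget are hypothetical. In a setup like the bidding system, the budget would be whatever remains of the 60-70 ms end-to-end limit after the rest of the request path.

```python
import math
import time


def measure_latencies_ms(predict, inputs):
    """Wall-clock latency of each predict(x) call, in milliseconds."""
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append((time.perf_counter() - start) * 1_000)
    return latencies


def percentile(values, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def most_accurate_within_budget(candidates, budget_ms, q=99):
    """candidates: (name, accuracy, latencies_ms) tuples. Returns the name of
    the most accurate candidate whose q-th percentile latency fits the budget,
    or None if none does."""
    feasible = [(acc, name) for name, acc, lat in candidates
                if percentile(lat, q) <= budget_ms]
    return max(feasible)[1] if feasible else None


# Hypothetical models: a slower, more accurate one and a faster, simpler one.
def complex_model(x):
    time.sleep(0.030)  # pretend inference takes about 30 ms
    return x


def simple_model(x):
    return x


candidates = [
    ("complex", 0.95, measure_latencies_ms(complex_model, range(20))),
    ("simple", 0.91, measure_latencies_ms(simple_model, range(20))),
]
print(most_accurate_within_budget(candidates, budget_ms=20))  # simple
```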
To successfully deploy machine learning models in production:
- Define metrics and indicators for validating the model performance in production during the PoC phase.
- Choose a machine learning approach based on the infrastructure budget and the latency requirements.
- Automate the data processing pipeline for training and execution.
Alakh Sharma is a Data Scientist at Talentica Software, a global product development company that helps startups build their products. Alakh is an Indian Institute of Science, Bangalore alumnus. He helps businesses gain a competitive edge with the adoption of reinforcement learning, machine learning, and natural language processing.