Monitor Model Performance in the MLOps Pipeline with Python

Python Tutorial on maintaining your model quality in production by monitoring the performance.

Monitor Model Performance in the MLOps Pipeline with Python
Image by on Freepik


The machine learning model is only helpful if used in production to solve business problems. However, the business problem and the machine learning model are constantly evolving. That is why we need to maintain the machine learning so the performance keeps up with the business KPI. This is where the MLOps concept came from.

MLOps, or machine learning operations, is a collection of techniques and tools for machine learning in production. From the machine learning automation, versioning, delivery, and monitoring is something that MLOps handle. This article will focus on monitoring and how we use Python packages to set up monitoring model performance in production. Let’s get into it.


Monitor Model Performance


When we talk about monitoring in the MLOps, it could refer to many things, as one of the MLOps principles is monitoring. For example:

- Monitor the data distribution change over time

- Monitor the features used in the development vs. production

- Monitor model decay

- Monitor model performance 

- Monitor the system staleness

There are still a lot of elements to monitor in the MLOps, but in this article, we will focus on monitoring model performance. Model performance, in our case, refers to the capability of the model to make reliable predictions from unseen data, measured with specific metrics such as accuracy, precision, recall, etc. 

Why do we need to monitor the model performance? It’s to maintain the model prediction reliability to solve the business problem. Before production, we often calculate the model performance and its effect on the KPI; for example, the baseline is 70% Accuracy if we want our model still follow the business needs, but below that is unacceptable. That is why monitoring the performance would allow the model always to meet the business requirements.

Using Python, we would learn how model monitoring is done. Let’s start by installing the package. There are many choices for model monitoring, but for this example, we would use the open-source package for monitoring called evidently.


Setup the Model Monitoring with Python


First, we need to install the evidently package with the following code.

pip install evidently


After installing the package, we would download the data example, the insurance claim data from Kaggle. Also, we would clean the data before we use them further.

import pandas as pd

df = pd.read_csv("insurance_claims.csv")

# Sort the data based on the Incident Data
df = df.sort_values(by="incident_date").reset_index(drop=True)

# Variable Selection
df = df[

# Data Cleaning and One-Hot Encoding
df = pd.get_dummies(

df["fraud_reported"] = df["fraud_reported"].apply(lambda x: 1 if x == "Y" else 0)

df = df.rename(columns={"incident_date": "timestamp", "fraud_reported": "target"})

for i in df.select_dtypes("number").columns:
    df[i] = df[i].apply(float)

data = df[df["timestamp"] < "2015-02-20"].copy()
val = df[df["timestamp"] >= "2015-02-20"].copy()


In the code above, we select some columns for model training purposes, transform them into numerical representation, and split the data for reference (data) and current data (val).

We need reference or baseline data in the MLOps pipeline to monitor model performance. It’s usually the data separated from the training data (for example, test data). Also, we need the current data or the data unseen by the model (incoming data).

Let’s use evidently to monitor the data and the model performance. Because data drift would affect the model performance, it’s also something considered to monitor. 

from import Report
from evidently.metric_preset import DataDriftPreset

data_drift_report = Report(metrics=[
]), reference_data=data, column_mapping=None)'inline')


Monitor Model Performance in the MLOps Pipeline with Python


The evidently package would automatically show a report on what happened to the dataset. The information includes the dataset drift and the column drift. For the example above, we don’t have any dataset drift occurrence, but two columns drifted.


Monitor Model Performance in the MLOps Pipeline with Python


The report shows that the column ‘property_claim’ and ‘timestamp’ indeed have drift detected. This information can be used in the MLOps pipeline to retrain the model, or we still need a further data exploration.

If required, we can also acquire the data report above in the log dictionary object.



Next, let’s try to train a classifier model from the data and try to use evidently to monitor the model's performance.

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier()['target', 'timestamp'], axis = 1), data['target'])


Evidently would need both the target and prediction columns in the reference and the current dataset. Let’s add the model prediction to the dataset and use evidently to monitor the performance.

data['prediction'] = rf.predict(data.drop(['target', 'timestamp'], axis = 1))
val['prediction'] = rf.predict(val.drop(['target', 'timestamp'], axis = 1))


As a note, it’s better to have the reference data that is not the training data for the actual cases to monitor the model performance. Let’s set up the model performance monitoring with the following code.

from evidently.metric_preset import ClassificationPreset

classification_performance_report = Report(metrics=[
]), current_data=val)'inline')


Monitor Model Performance in the MLOps Pipeline with Python

In the result, we get the current model quality metrics are lower than the reference (expected as we use training data for the reference). Depending on the business requirement, the metrics above could become indicators of the next step we need to take. Let’s see the other information we get from the evidently report.


Monitor Model Performance in the MLOps Pipeline with Python

The class Representation report shows the actual class distribution.


Monitor Model Performance in the MLOps Pipeline with Python


The confusion matrix shows how the prediction values were against the actual data in both reference and current datasets.


Monitor Model Performance in the MLOps Pipeline with Python


Quality Metrics by Class show how the performance of each class. 

Like before, we can transform the classification performance report into a dictionary log with the following code.



That is all for now. You can set up the model performance monitor with evidently in any MLOps pipeline you currently have, and it would still work wonderfully.




Model performance monitoring is an essential task in the MLOps pipeline as it’s the one that would help maintain how our model keeps up with the business requirement. With a Python package called evidently, we can easily set up the model performance monitor, which can be integrated into any existing MLOps pipeline.
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and Data tips via social media and writing media.