Explainable and Reproducible Machine Learning Model Development with DALEX and Neptune

With ML models serving real people, misclassified cases (which are a natural consequence of using ML) affect people’s lives and sometimes treat them very unfairly. That makes the ability to explain your models’ predictions a requirement rather than just a nice-to-have.


By Jakub Czakon, Sr Data Scientist at neptune.ai, Przemysław Biecek, Founder of MI2DataLab & Adam Rydelek, Research Engineer at MI2DataLab

Machine learning model development is hard, especially in the real world.

Typically, you need to:

understand the business problem,
gather the data,
explore it,
set up a proper validation scheme,
implement models and tune parameters,
deploy them in a way that makes sense for the business,
inspect model results only to find out new problems that you have to deal with.

And that is not all.

You should have the experiments you run and the models you train versioned in case you or anyone else needs to inspect them or reproduce the results in the future. In my experience, this moment comes when you least expect it, and the feeling of “I wish I had thought about it before” is so very real (and painful).

But there is even more.

With ML models serving real people, misclassified cases (which are a natural consequence of using ML) affect people’s lives and sometimes treat them very unfairly. That makes the ability to explain your models’ predictions a requirement rather than just a nice-to-have.

So what can you do about it?

Fortunately, today there are tools that make dealing with both of those problems possible.

The best part is you can combine them to have your models versioned, reproducible, and explainable.

Read on to learn how to:

explain machine learning models with DALEX explainers
make your models versioned and experiments reproducible with Neptune
automatically save model explainers and interactive explanation charts for every training run with Neptune + DALEX integration
compare, debug, and audit every model you build with versioned explainers

Let’s dive in.

Explainable Machine Learning with DALEX

Nowadays, a model that scores high on the test set is often not enough. That’s why there is a growing interest in eXplainable Artificial Intelligence (XAI), a set of methods and techniques that help you understand the model’s behavior.

There are many XAI methods available in multiple programming languages. Some of the most commonly used in machine learning are LIME, SHAP, and PDP, but there are many more.

It is easy to get lost in the vast number of techniques, and that is where the eXplainable Artificial Intelligence pyramid comes in handy. It gathers the needs related to the exploration of models into an extensible drill-down map. The left side is about needs related to a single instance, the right side about the model as a whole. Consecutive layers dig into more and more detailed questions about the model behavior (local or global).

XAI pyramid | Find more in the Explanatory Model Analysis ebook

DALEX (available in R and Python) is a tool that helps you to understand how complex models are working. It currently works for tabular data only (but text and vision will come in the future).

It is integrated with the most popular frameworks used for building machine learning models, like keras, sklearn, xgboost, lightgbm, H2O and many more!

The core object in DALEX is an explainer. It connects training or evaluation data and a trained model and extracts all the information that you need to explain it.

Once you have it you can create visualizations, show model parameters, and dive into other model-related information. You can share it with your team or save it for later.

Creating an explainer for any model is really easy, as you can see in this example using sklearn!

import dalex as dx
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Load the Titanic dataset bundled with DALEX
data = dx.datasets.load_titanic()

# Encode the categorical features as integers
le = LabelEncoder()
for feature in ['gender', 'class', 'embarked']:
    data[feature] = le.fit_transform(data[feature])

X = data.drop(columns='survived')
y = data.survived

classifier = RandomForestClassifier()
classifier.fit(X, y)

# Wrap the fitted model and its data in an explainer
exp = dx.Explainer(classifier, X, y, label="Titanic Random Forest")

Model explanation for observations (local explanations)

When you want to understand why your model made a particular prediction, local explanations are your best friend.

It all starts with a prediction, and by moving down the left half of the pyramid above you can explore and understand what happened.

DALEX gives you a bunch of methods that show the influence of each variable locally:

SHAP: calculates contributions of features to the model prediction using classic Shapley values
Break Down: decomposes predictions into parts that can be attributed to each variable with so-called “greedy explanations”
Break Down with interactions: extends “greedy explanations” to account for feature interactions
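To make the difference between Break Down and SHAP more tangible, here is a minimal pure-Python sketch. It is not how DALEX implements these methods: the toy model, the four-row background data, and all function names are made up for illustration. Break Down fixes features one at a time in a single order (DALEX picks that order greedily, here it is passed in by hand), while the Shapley-style attribution averages those contributions over all possible orders (DALEX samples B random orders instead).

```python
from itertools import permutations

def expected_prediction(predict, background, observation, fixed):
    """Mean prediction over `background` with the `fixed` features set to the observation's values."""
    total = 0.0
    for row in background:
        patched = dict(row)
        for name in fixed:
            patched[name] = observation[name]
        total += predict(patched)
    return total / len(background)

def break_down(predict, background, observation, order):
    """Attribute the prediction to features by fixing them one at a time in `order`."""
    contributions = {}
    fixed = []
    previous = expected_prediction(predict, background, observation, fixed)
    for name in order:
        fixed.append(name)
        current = expected_prediction(predict, background, observation, fixed)
        contributions[name] = current - previous
        previous = current
    return contributions

def shapley_values(predict, background, observation):
    """Average the break-down contributions over every possible feature ordering."""
    names = list(observation)
    orders = list(permutations(names))
    totals = {name: 0.0 for name in names}
    for order in orders:
        for name, value in break_down(predict, background, observation, order).items():
            totals[name] += value
    return {name: total / len(orders) for name, total in totals.items()}

# Hypothetical toy model: being female and travelling 1st class both help,
# and the combination helps more than the sum of the two (an interaction).
def toy_predict(x):
    score = 0.0
    if x["gender"] == "female":
        score += 0.5
    if x["class"] == "1st":
        score += 0.25
    if x["gender"] == "female" and x["class"] == "1st":
        score += 0.25
    return score

background = [
    {"gender": "male", "class": "3rd"},
    {"gender": "male", "class": "1st"},
    {"gender": "female", "class": "3rd"},
    {"gender": "female", "class": "1st"},
]
passenger = {"gender": "female", "class": "1st"}

gender_first = break_down(toy_predict, background, passenger, ["gender", "class"])
class_first = break_down(toy_predict, background, passenger, ["class", "gender"])
shap = shapley_values(toy_predict, background, passenger)

print(gender_first)  # {'gender': 0.3125, 'class': 0.25}
print(class_first)   # {'class': 0.1875, 'gender': 0.375}
print(shap)          # {'gender': 0.34375, 'class': 0.21875}
```

Because of the interaction, the two orderings split the credit differently, which is exactly why Break Down with interactions and SHAP exist. In every case the contributions add up to the prediction minus the mean prediction over the background data.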

Moving down the pyramid, the next crucial part of local explanations is understanding the sensitivity of the model to changes in feature values.

There is an easy way to plot such information in DALEX:

Ceteris Paribus: shows changes in model prediction allowing for differences only in a single variable while keeping all others constant
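The idea behind a Ceteris Paribus profile fits in a few lines of plain Python. The sketch below is only an illustration with a made-up toy model and a hypothetical helper name, not DALEX code: it copies the observation, changes one variable along a grid of values, and records the prediction each time.

```python
def ceteris_paribus(predict, observation, variable, grid):
    """Return (value, prediction) pairs obtained by changing only `variable`."""
    profile = []
    for value in grid:
        modified = {**observation, variable: value}  # all other features stay as they were
        profile.append((value, predict(modified)))
    return profile

# Hypothetical toy model: 1st class and being a child increase the score.
def toy_predict(x):
    score = 0.75 if x["class"] == "1st" else 0.25
    if x["age"] < 16:
        score += 0.125
    return score

passenger = {"age": 25, "class": "1st"}

age_profile = ceteris_paribus(toy_predict, passenger, "age", [5, 25, 60])
class_profile = ceteris_paribus(toy_predict, passenger, "class", ["1st", "3rd"])

print(age_profile)    # [(5, 0.875), (25, 0.75), (60, 0.75)]
print(class_profile)  # [('1st', 0.75), ('3rd', 0.25)]
```

Plotting these pairs gives the "what if" curve for numerical variables and the bar-style comparison for categorical ones.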

Following up on our example Random Forest model created on the Titanic dataset, we can easily create the plots mentioned above.

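observation = pd.DataFrame({'gender': ['male'],
                            'age': [25],
                            'class': ['1st'],
                            'embarked': ['Southampton'],
                            'fare': [72],
                            'sibsp': [0],
                            'parch': [0]},
                           index=['John'])

# The model was trained on label-encoded features, so encode the observation
# the same way (LabelEncoder assigns codes in sorted order of the raw values)
raw = dx.datasets.load_titanic()
for feature in ['gender', 'class', 'embarked']:
    observation[feature] = LabelEncoder().fit(raw[feature]).transform(observation[feature])

# Variable influence plots - Break Down & SHAP
bd = exp.predict_parts(observation, type='break_down')
bd_inter = exp.predict_parts(observation, type='break_down_interactions')
bd.plot(bd_inter)

shap = exp.predict_parts(observation, type='shap', B=10)
shap.plot(max_vars=5)

# Ceteris Paribus plots
cp = exp.predict_profile(observation)
cp.plot(variable_type="numerical")
cp.plot(variable_type="categorical")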

Model understanding (global explanations)

When you want to understand which features are generally important for your model when it makes decisions, you should look into global explanations.

To understand the model on the global level, DALEX provides you with variable importance plots. Variable importance plots, specifically permutation feature importance, let you see each variable’s influence on the model as a whole and pick out the most important ones.

Such visualizations can be seen as a global equivalent of SHAP and Break Down plots which depict similar information for a single observation.
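If permutation feature importance sounds abstract, the following sketch shows the mechanics in plain Python. It is a simplified illustration under my own assumptions (toy model, toy data, hypothetical function names), and it reports the average increase in loss, whereas DALEX plots the loss after permutation next to the loss of the full model.

```python
import random

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, y, loss, n_repeats=10, seed=0):
    """Average increase in loss after shuffling one feature at a time."""
    rng = random.Random(seed)
    baseline = loss(y, [predict(row) for row in rows])
    importance = {}
    for name in rows[0]:
        increases = []
        for _ in range(n_repeats):
            column = [row[name] for row in rows]
            rng.shuffle(column)  # break the link between this feature and the target
            shuffled = [{**row, name: value} for row, value in zip(rows, column)]
            increases.append(loss(y, [predict(row) for row in shuffled]) - baseline)
        importance[name] = sum(increases) / n_repeats
    return importance

# Hypothetical toy model that looks only at gender and ignores fare.
def toy_predict(x):
    return 1.0 if x["gender"] == "female" else 0.0

rows = [
    {"gender": "female", "fare": 70}, {"gender": "male", "fare": 8},
    {"gender": "female", "fare": 15}, {"gender": "male", "fare": 52},
    {"gender": "female", "fare": 30}, {"gender": "male", "fare": 7},
    {"gender": "female", "fare": 26}, {"gender": "male", "fare": 13},
]
y = [toy_predict(row) for row in rows]  # the toy model fits this data perfectly

importance = permutation_importance(toy_predict, rows, y, mean_squared_error)
print(importance)
```

Shuffling fare changes nothing because the model never uses it, so its importance is zero, while shuffling gender makes the predictions worse.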

Moving down the pyramid, on a dataset level, there are techniques such as Partial Dependence Profiles and Accumulated Local Dependence that let you visualize the way the model reacts as a function of selected variables.
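A Partial Dependence Profile can be thought of as the average of the Ceteris Paribus profiles of all observations in a dataset. Here is a minimal sketch of that idea, again with a made-up toy model and hypothetical names rather than DALEX internals (accumulated profiles need more machinery and are not shown).

```python
def partial_dependence(predict, rows, variable, grid):
    """For each grid value, set `variable` to it in every row and average the predictions."""
    profile = []
    for value in grid:
        predictions = [predict({**row, variable: value}) for row in rows]
        profile.append((value, sum(predictions) / len(predictions)))
    return profile

# Hypothetical toy model: 1st class and being a child increase the score.
def toy_predict(x):
    score = 0.5 if x["class"] == "1st" else 0.25
    if x["age"] < 16:
        score += 0.25
    return score

rows = [
    {"age": 8, "class": "1st"},
    {"age": 30, "class": "1st"},
    {"age": 12, "class": "3rd"},
    {"age": 45, "class": "3rd"},
]

age_pdp = partial_dependence(toy_predict, rows, "age", [10, 40])
class_pdp = partial_dependence(toy_predict, rows, "class", ["1st", "3rd"])

print(age_pdp)    # [(10, 0.625), (40, 0.375)]
print(class_pdp)  # [('1st', 0.625), ('3rd', 0.375)]
```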

Now let’s create some global explanations for our example.

# Variable importance

vi = exp.model_parts()
vi.plot(max_vars=5)

# Partial and Accumulated Dependence Profiles

pdp_num = exp.model_profile(type = 'partial')
ale_num = exp.model_profile(type = 'accumulated')

pdp_num.plot(ale_num)

pdp_cat = exp.model_profile(type='partial',
                            variable_type='categorical',
                            variables=["gender", "class"])
ale_cat = exp.model_profile(type='accumulated',
                            variable_type='categorical',
                            variables=["gender", "class"])

ale_cat.plot(pdp_cat)

Reusable and organized explanation objects

A clean, structured, and easy to use collection of XAI visualizations is great but there is more to DALEX than that.

Packaging your models in DALEX explainers gives you a reusable and organized way of storing and versioning any work you do with machine learning models.

The explainer object created using DALEX contains:

a model to be explained,
model name and class,
task type,
data which will be used to calculate the explanations,
model predictions for such data,
predict function,
model residuals,
sampling weights for observations,
additional model information (package, version, etc.)
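To picture what such an object bundles together, here is a deliberately tiny stand-in written as a dataclass. MiniExplainer is hypothetical and much simpler than the real DALEX Explainer. It only shows the idea of keeping the model, the data, the predict function, the predictions, and the residuals in one place.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class MiniExplainer:
    """Hypothetical, simplified stand-in for an explainer object."""
    model: Any
    data: List[Dict[str, Any]]
    y: List[float]
    predict_function: Callable[[Any, Dict[str, Any]], float]
    label: str = "model"
    model_type: str = "classification"
    y_hat: List[float] = field(init=False)
    residuals: List[float] = field(init=False)

    def __post_init__(self):
        # Predictions and residuals are computed once and stored with the model
        self.y_hat = [self.predict_function(self.model, row) for row in self.data]
        self.residuals = [true - pred for true, pred in zip(self.y, self.y_hat)]

    def predict(self, row):
        return self.predict_function(self.model, row)

# A lookup table plays the role of a trained model here
lookup_model = {"female": 0.75, "male": 0.25}
mini = MiniExplainer(
    model=lookup_model,
    data=[{"gender": "female"}, {"gender": "male"}],
    y=[1.0, 0.0],
    predict_function=lambda model, row: model[row["gender"]],
    label="Toy lookup model",
)

print(mini.y_hat)      # [0.75, 0.25]
print(mini.residuals)  # [0.25, -0.25]
```

Anything that receives this one object can compute new explanations without access to the original training script.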

Having all this information stored in a single object makes creating local and global explanations easy (as we saw before).

It also makes reviewing, sharing, and comparing models and explanations at every stage of model development possible.

Experiment and model versioning with Neptune

In a perfect world, all your machine learning models and experiments are versioned in the same way as you version your software projects.

Unfortunately, to keep track of your ML projects you need way more than just committing your code to GitHub.

In a nutshell, to version machine learning models properly you should keep track of:

code, notebooks, and configuration files
environment
parameters
datasets
model files
results like evaluation metrics, performance charts or predictions
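To see why plain Git is not enough, it helps to write down what a single training run would have to record. The sketch below uses only the Python standard library and makes no Neptune calls. The function names, the record layout, and the tiny in-memory "datasets" are assumptions for illustration. The dataset is versioned by hashing its bytes, so any change to the data produces a different version string.

```python
import hashlib
import json
import platform

def data_version(raw_bytes):
    """Fingerprint a dataset so that any change in its bytes changes the version."""
    return hashlib.sha256(raw_bytes).hexdigest()

def build_run_record(params, dataset_bytes, metrics, source_files):
    """Collect the pieces of information needed to reproduce a training run."""
    return {
        "params": params,
        "source_files": sorted(source_files),
        "environment": {"python": platform.python_version()},
        "data_version": data_version(dataset_bytes),
        "metrics": metrics,
    }

# Two hypothetical snapshots of the same dataset
dataset_v1 = b"gender,age,survived\nmale,25,0\n"
dataset_v2 = b"gender,age,survived\nmale,25,0\nfemale,31,1\n"

record = build_run_record(
    params={"lr": 0.01, "depth": 30},
    dataset_bytes=dataset_v1,
    metrics={"test_auc": 0.82},
    source_files=["train.py"],
)
print(json.dumps(record, indent=2))
```

Model files, charts, and predictions are binary or large artifacts, which is where a dedicated tracking tool becomes more convenient than a hand-rolled JSON file.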

Some of those things work nicely with Git (code, environment configs) but others not so much.

Neptune makes it easy to keep track of all that by letting you log everything and anything you feel is important.

You just add a few lines to your scripts:

import neptune
from neptunecontrib.api import log_chart, log_pickle
from neptunecontrib.versioning.data import log_data_version

neptune.init('YOU/YOUR_PROJECT')

neptune.create_experiment(
    params={'lr': 0.01, 'depth': 30, 'epoch_nr': 10},  # parameters
    upload_source_files=['**/*.py',  # scripts
                         'requirements.yaml'])  # environment
log_data_version('/path/to/dataset')  # data version
#
# your training logic (produces the chart `fig` and the model `clf`)
#
neptune.log_metric('test_auc', 0.82)  # metrics
log_chart('ROC curve', fig)  # performance charts
log_pickle('model.pkl', clf)  # model file

And every experiment or model training you run is versioned and waiting for you in the Neptune app (and database).

See it in Neptune

Your team can access all of the experiments and models, compare results, and find the information quickly.

You may be thinking: “Ok great, so I have my models versioned, but:”

what if I want to debug models weeks or months after they were trained?
what if I want to see the prediction explanations or variable importance for every experiment run?
what if somebody asks me to check if this model is unfairly biased and I don’t have the code or data it was trained on?

I hear you, and that’s where DALEX integration comes in!

DALEX + Neptune = versioned and explainable models

Why not have your DALEX explainers logged and versioned for every experiment, with interactive explanation charts rendered in a nice UI and easy to share with anyone you want?

Exactly, why not!

With the Neptune-DALEX integration, you can get all that at the cost of three additional lines.

Also, there are some very real benefits that come with this:

You can review models that others created and share yours easily
You can compare the behavior of any of the created models
You can trace and audit every model for unwanted bias and other problems
You can debug and compare models for which the training data, code or parameters are missing

Ok, it sounds cool, but how does it actually work?

Let’s get into this now.

Version local explanations

To log local model explanations you just need to:

Create an observation vector
Create your DALEX explainer object
Pass them to the log_local_explanations function from neptunecontrib

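from neptunecontrib.api import log_local_explanations

observation = pd.DataFrame({'gender': ['male'],
                            'age': [25],
                            'class': ['1st'],
                            'embarked': ['Southampton'],
                            'fare': [72],
                            'sibsp': [0],
                            'parch': [0]},
                           index=['John'])

# Encode categorical values the same way as the training data
raw = dx.datasets.load_titanic()
for feature in ['gender', 'class', 'embarked']:
    observation[feature] = LabelEncoder().fit(raw[feature]).transform(observation[feature])

# `exp` is the DALEX explainer created earlier
log_local_explanations(exp, observation)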

Interactive explanation charts will be waiting for you in the “Artifacts” section of the Neptune app:

See it in Neptune

The following plots are created:

break down,
break down with interactions,
shap,
ceteris paribus for numeric variables,
ceteris paribus for categorical variables

Version global explanations

With global model explanations it’s even simpler:

Create your DALEX explainer object
Pass it to the log_global_explanations function from neptunecontrib
(optional) specify the categorical features you would like to plot

from neptunecontrib.api import log_global_explanations

log_global_explanations(exp, categorical_features=["gender", "class"])

That’s it. Now you can go to the “Artifacts” section and find your global explanation charts:

See it in Neptune

The following plots are created:

variable importance
partial dependence (if numerical features are specified)
accumulated dependence (if categorical features are specified)

Version explainer objects

But if you really want to version your explanations you should version the explainer object itself.

The benefits of saving it:

You can always create a visual representation of it later
You can dive into details in the tabular format
You can use it however you like (even if you don’t know how at the moment)

And it’s super simple:

from neptunecontrib.api import log_explainer

log_explainer('explainer.pkl', exp)

You may be thinking: “How else am I going to use the explainer objects?”

Let me show you in the next sections.

Fetch and analyze explanations of trained models

First of all, if you logged your explainer to Neptune you can fetch it directly into your script or notebook:

import neptune
from neptunecontrib.api import get_pickle

project = neptune.init(api_token='ANONYMOUS',
                       project_qualified_name='shared/dalex-integration')
experiment = project.get_experiments(id='DAL-68')[0]
explainer = get_pickle(filename='explainer.pkl', experiment=experiment)

Now that you have the model explanation you can debug your model.

One possible scenario is that you have an observation for which your model fails miserably.

You want to figure out why.

If you have your DALEX explainer object saved you can:

create local explanations and see what happened.
check how changing features affects the results.

See it in Neptune

Of course, you can do way more, especially if you want to compare models and explanations.

Let’s dive into that now!

Compare models and explanations

What if you want to:

compare the current model idea with the models that are running in production?
see whether experimental ideas from last year would work better on freshly collected data?

Having a clean structure of experiments and models and a single place where you store them makes it really easy to do.

You can compare experiments based on parameters, data version, or metrics in the Neptune UI:

See it in Neptune

You see the diffs in two clicks and can drill down to whatever info you need with one or two more.

Ok, it is really useful when it comes to comparing hyperparameters and metrics but what about the explainers?

You can go into each experiment and look at the interactive explanation charts to see if there is something fishy going on with your model.

What’s better, Neptune lets you access all the information you logged programmatically, including model explainers.
You can fetch explainer objects for each experiment and compare them. Just use the get_pickle function from neptunecontrib and then visualize multiple explanations with the DALEX .plot method:

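experiments = project.get_experiments(id=['DAL-68', 'DAL-69', 'DAL-70', 'DAL-71'])

# new_observation: a one-row DataFrame prepared like `observation` above
shaps = []
for experiment in experiments:
    auc_score = experiment.get_numeric_channels_values('auc')['auc'].tolist()[0]
    label = f'{experiment.id} | AUC: {auc_score:.3f}'

    # Fetch the explainer that was logged for this experiment
    explainer_ = get_pickle(filename='explainer.pkl', experiment=experiment)

    sh = explainer_.predict_parts(new_observation, type='shap', B=10)
    sh.result.label = label
    shaps.append(sh)

# Plot the first explanation together with all the others
shaps[0].plot(shaps[1:])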

See it in Neptune

That is the beauty of DALEX plots. You can pass multiple explanation objects and they will do the magic.

Of course, you can compare previously trained models with the one that you are currently working on to see if you are going in the right direction. Just append its explanation to the list and pass it to the .plot method.
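If you prefer numbers over charts, the same comparison can be done on the tabular results. The sketch below is a plain-Python illustration with invented contribution values and hypothetical model labels. It does not call DALEX or Neptune and only shows one way to line up per-variable contributions from several models and sort them by how much the models disagree.

```python
def compare_contributions(explanations):
    """explanations: {label: {variable: contribution}} -> rows sorted by disagreement."""
    variables = sorted({v for contributions in explanations.values() for v in contributions})
    rows = []
    for variable in variables:
        values = {label: contributions.get(variable, 0.0)
                  for label, contributions in explanations.items()}
        spread = max(values.values()) - min(values.values())
        rows.append((variable, values, spread))
    rows.sort(key=lambda row: row[2], reverse=True)  # biggest disagreement first
    return rows

# Invented contributions for one observation from two hypothetical models
explanations = {
    "model_a": {"gender": -0.25, "age": 0.125, "class": 0.25},
    "model_b": {"gender": -0.25, "age": -0.125, "class": 0.125},
}

table = compare_contributions(explanations)
for variable, values, spread in table:
    print(f"{variable:<8} {values}  spread={spread}")
```

Here the two models agree on gender but pull in opposite directions on age, which is the kind of difference worth investigating before promoting a new model.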

Final thoughts

Ok, to sum up.

In this article, you’ve learned about:

Various model explanation techniques and how to package those explanations with DALEX explainers
How you can version machine learning models and experiments with Neptune
How to version model explainers and interactive explanation charts for every training you run with Neptune + DALEX integration
How to compare and debug models you train with explainers

With all that information, I hope your model development process will now be more organized, reproducible, and explainable.

Happy training!

Jakub Czakon is Senior Data Scientist at neptune.ai.

Przemysław Biecek is Founder of MI2DataLab, Principal Data Scientist at Samsung R&D Institute Poland.

Adam Rydelek is a Research Engineer at MI2DataLab, Student in Data Science at Warsaw University of Technology.

Original. Reposted with permission.
