# SHAP: Explain Any Machine Learning Model in Python

A Comprehensive Guide to SHAP and Shapley Values

# Motivation

Imagine you are trying to train a machine learning model to predict whether an ad is clicked by a particular person. After receiving some information about a person, the model predicts that the person will not click on the ad.

Image by Author

But why does the model predict that? How much does each feature contribute to the prediction? Wouldn’t it be nice if you could see a plot indicating how much each feature contributes to the prediction, like the one below?

Image by Author

That is where the Shapley value comes in handy.

# What is Shapley Value?

The Shapley value is a method used in game theory that involves fairly distributing both gains and costs to actors working in a coalition.

Since each actor contributes differently to the coalition, the Shapley value makes sure that each actor gets a fair share depending on how much they contribute.

Image by Author

## A Simple Example

The Shapley value is used for a wide range of problems that ask how much each worker or feature in a group contributes. To understand how the Shapley value works, let’s imagine that your company has just run an A/B test of different combinations of advertisement strategies.

The revenues for each strategy in a specific month are:

• social media: \$300
• email marketing: \$350
• social media and email marketing: \$400

Image by Author

The difference in revenues between using all three ads and using no ads is \$300. How much does each ad contribute to this difference?

Image by Author

We can figure that out by calculating the Shapley value for each type of advertisement. Mazzanti's article (see Reference) provides an excellent way to calculate the Shapley value. I will summarize it here.

Image by Author

## Find Weights

To find the weights, we will organize the combinations of different advertisement strategies into multiple levels like below. Each level corresponds to the number of advertisement strategies in each combination.

Then we will assign weights based on the number of edges in each level. We can see that:

• The first level contains 3 edges so the weight of each edge will be 1/3
• The second level contains 6 edges so the weight of each edge will be 1/6
• The third level contains 3 edges so the weight of each edge will be 1/3
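
The edge counts above follow a simple pattern that a short sketch can reproduce. Assuming n strategies in total, the level whose combinations contain k strategies has k · C(n, k) incoming edges, so each edge gets the weight 1 / (k · C(n, k)). The function name `edge_weight` is hypothetical and used only for illustration.

```python
from fractions import Fraction
from math import comb


def edge_weight(n_strategies: int, level: int) -> Fraction:
    """Weight of one edge leading into `level` (combinations of size `level`)."""
    # Each of the C(n, level) combinations is reached from `level` smaller ones.
    n_edges = level * comb(n_strategies, level)
    return Fraction(1, n_edges)


for level in (1, 2, 3):
    print(f"level {level}: weight {edge_weight(3, level)}")
```

Exact fractions are used here so the weights print as 1/3, 1/6, and 1/3 rather than rounded decimals.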

Image by Author

Repeating the steps above for the two other advertisement strategies, we find that:

• Email marketing contributes \$151.67
• Social media contributes \$111.67
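
The whole calculation can be sketched in a few lines of Python. Only three of the revenue figures below come from the list earlier in this section. The rest, including the placeholder name `third_ad` for the third strategy, are hypothetical values chosen so that the output agrees with the two contributions listed above. They should not be read as the actual A/B test results.

```python
from itertools import combinations
from math import comb

STRATEGIES = ("social_media", "email", "third_ad")  # "third_ad" is a placeholder name

# Monthly revenue per combination. Entries marked "from the text" are quoted in
# the article; every other figure is a hypothetical fill-in for illustration.
REVENUE = {
    frozenset(): 150,                              # hypothetical
    frozenset({"third_ad"}): 200,                  # hypothetical
    frozenset({"social_media"}): 300,              # from the text
    frozenset({"email"}): 350,                     # from the text
    frozenset({"social_media", "third_ad"}): 320,  # hypothetical
    frozenset({"email", "third_ad"}): 350,         # hypothetical
    frozenset({"social_media", "email"}): 400,     # from the text
    frozenset(STRATEGIES): 450,                    # hypothetical: no-ad revenue + 300
}


def shapley_value(player, players, value):
    """Weighted sum of the player's marginal contributions over all coalitions."""
    others = [p for p in players if p != player]
    n = len(players)
    total = 0.0
    for size in range(len(others) + 1):
        # Joining a coalition of `size` members leads into level `size + 1`.
        weight = 1 / ((size + 1) * comb(n, size + 1))
        for coalition in combinations(others, size):
            without = frozenset(coalition)
            total += weight * (value[without | {player}] - value[without])
    return total


contributions = {s: shapley_value(s, STRATEGIES, REVENUE) for s in STRATEGIES}
for name, amount in contributions.items():
    print(f"{name}: {amount:.2f}")
```

Under these assumed figures the three contributions add up to 300, the gap between running all three ads and running none.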

Image by Author

Now that we understand the Shapley value, let’s see how we can use it to interpret a machine learning model.

# SHAP: Explain Any Machine Learning Model in Python

SHAP is a Python library that uses Shapley values to explain the output of any machine learning model.

To install SHAP, type:

``pip install shap``

## Train a Model

To understand how SHAP works, we will experiment with an advertising dataset.

We will build a machine learning model to predict whether a user clicked on an ad based on some information about that person.

We will use Patsy to turn the DataFrame into an array of features and an array of target values, then split the data into train and test sets.

Next, we will use XGBoost to build a model and make predictions:

To see how well the model performs, we will use the F1 score:

``0.9619047619047619``

Pretty good!
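
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it by hand on a handful of hypothetical labels, which are not the article's data. The helper name `f1` is also hypothetical.

```python
def f1(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [1, 1, 1, 0, 0, 1, 0, 1]  # hypothetical ground truth
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]  # hypothetical predictions
print(f1(y_true, y_pred))
```

A score close to 1 means the model has both few false positives and few false negatives on the positive class.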

## Interpret the Model

The model did a good job of predicting whether a user clicked an ad. But how did it come up with such predictions? How much did each feature contribute to the difference between the final prediction and the average prediction?

Note that this problem is very similar to the problem we addressed at the beginning of the article.

That is why finding the Shapley value of each feature can help us determine their contribution. The steps to get the importance of feature i, where i is the index of the feature, are similar to before:

• Get all subsets that don’t contain feature i
• Find the marginal contribution of feature i to each of these subsets
• Aggregate all marginal contributions to compute the contribution of feature i

To find the Shapley values using SHAP, simply pass your trained model to `shap.Explainer`:

# SHAP Waterfall Plot

Visualize the first prediction’s explanation:

Image by Author

Aha! Now we know the contribution of each feature to the first prediction. Explanations for the graph above:

Image by Author

• The blue bar shows how much a particular feature decreases the value of the prediction.
• The red bar shows how much a particular feature increases the value of the prediction.
• A negative prediction value implies a probability of less than 0.5 that the person clicked the ad, because the model output here is expressed in log-odds.
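
The last bullet relies on the fact that, for a classifier such as XGBoost, SHAP values are typically expressed in log-odds rather than probabilities. A small sketch, with made-up scores, of how a log-odds value maps to a probability through the logistic (sigmoid) function:

```python
import math


def log_odds_to_probability(log_odds: float) -> float:
    """Logistic (sigmoid) function: maps a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


for score in (-2.0, 0.0, 2.0):  # hypothetical model outputs in log-odds
    print(score, round(log_odds_to_probability(score), 3))
```

A score of exactly 0 corresponds to a probability of 0.5, so any negative score lands below 0.5 and any positive score above it.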

Conceptually, for each of the subsets described earlier, SHAP does not remove a feature and retrain the model. Instead, it replaces that feature with its average value and then generates the prediction.

We should expect the total contribution to be equal to the difference between the prediction and the mean prediction. Let’s check that:

Image by Author

Cool! They’re equal.
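
Both ideas, replacing "absent" features with their average and the contributions summing to the gap between a prediction and a baseline, can be reproduced by hand on a toy model. The sketch below is a simplification: `toy_model`, the feature averages, and the instance are hypothetical, and the brute-force enumeration is not how the SHAP library computes its values internally. The baseline here is the model output at the feature averages, which matches the mean prediction only for linear models.

```python
from itertools import combinations
from math import comb

FEATURES = ("age", "daily_internet_usage", "male")
AVERAGES = {"age": 36.0, "daily_internet_usage": 180.0, "male": 0.5}  # hypothetical
instance = {"age": 50.0, "daily_internet_usage": 120.0, "male": 1.0}  # hypothetical


def toy_model(x):
    """Hypothetical scoring function with one interaction term."""
    return (
        0.05 * x["age"]
        - 0.02 * x["daily_internet_usage"]
        - 0.3 * x["male"]
        + 0.001 * x["age"] * x["male"]
    )


def value(subset):
    """Model output when only `subset` keeps its real value; the rest use averages."""
    mixed = {f: (instance[f] if f in subset else AVERAGES[f]) for f in FEATURES}
    return toy_model(mixed)


def shapley_value(feature):
    """Weighted marginal contributions of `feature` over subsets that exclude it."""
    others = [f for f in FEATURES if f != feature]
    n = len(FEATURES)
    total = 0.0
    for size in range(len(others) + 1):
        weight = 1 / ((size + 1) * comb(n, size + 1))
        for subset in combinations(others, size):
            base = set(subset)
            total += weight * (value(base | {feature}) - value(base))
    return total


contributions = {f: shapley_value(f) for f in FEATURES}
baseline = value(set())            # model output at the feature averages
prediction = value(set(FEATURES))  # model output for the actual instance

print(contributions)
print(sum(contributions.values()), prediction - baseline)
```

The two numbers on the last line agree up to floating-point error, which is the same additivity property checked above.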

Visualize the second prediction’s explanation:

Image by Author

# SHAP Summary Plot

Instead of looking at each individual instance, we can visualize the overall impact of these features across multiple instances using the SHAP summary plot:

Image by Author

The SHAP summary plot tells us the most important features and their range of effects over the dataset.

From the plot above, we can gain some interesting insights into the model’s predictions:

• The daily internet usage of a user has the strongest effect on whether that user clicked on an ad.
• As the daily internet usage increases, a user is less likely to click on an ad.
• As the daily time spent on the site increases, a user is less likely to click on an ad.
• As the area income increases, a user is less likely to click on an ad.
• As the age increases, a user is more likely to click on an ad.
• If a user is male, that user is less likely to click on an ad.

# SHAP Bar Plot

We can also get a global feature importance plot using the SHAP bar plot.
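
By default, the bar plot ranks features by their mean absolute SHAP value across instances. A sketch of that aggregation on a small, made-up SHAP matrix (the feature names and values are hypothetical):

```python
feature_names = ["daily_internet_usage", "age", "male"]  # hypothetical

# Rows are instances, columns are features; the values are made up.
shap_matrix = [
    [-1.2, 0.4, -0.1],
    [0.9, -0.3, 0.0],
    [-1.5, 0.6, -0.2],
    [1.1, -0.5, 0.1],
]

n_rows = len(shap_matrix)
# Global importance of a feature = average magnitude of its SHAP values.
importance = {
    name: sum(abs(row[j]) for row in shap_matrix) / n_rows
    for j, name in enumerate(feature_names)
}

ranking = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values matters: a feature that pushes some predictions up and others down would otherwise average out to roughly zero and look unimportant.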

Image by Author

# SHAP Dependence Scatter Plot

We can observe the effect of a single feature on all predictions made by the model using the SHAP dependence scatter plot.

## Daily Internet Usage

Scatter plot of the daily internet usage feature:

Image by Author

From the plot above, we can see that as the daily internet usage increases, the SHAP value for the daily internet usage decreases. This confirms what we saw in the earlier plot.

We can also observe the interaction between the daily internet usage feature and other features in the same plot by adding `color=shap_values`.

The scatter plot will attempt to pick out the feature column with the strongest interaction with daily internet usage, which is daily time spent on site.

Image by Author

Cool! From the plot above, we can see that a person who uses the internet for 150 minutes per day and spends a small amount of time on the website per day is more likely to click the ad.

Let’s look at the scatter plots of some other features:

Image by Author

# SHAP Interaction Plot

You can also observe the matrix of interactions between features with the SHAP interaction value summary plot. In this plot, the main effects are on the diagonal and the interaction effects are off the diagonal.

Image by Author

Pretty cool!

# Conclusion

Congratulations! You have just learned about the Shapley value and how to use it to interpret a machine learning model. I hope this article gives you the essential knowledge to interpret your own machine learning model using Python.

Feel free to play with the source code in this interactive notebook or fork my repository.

## Reference

Mazzanti, S. (2021, April 21). SHAP explained the way I wish someone explained it to me. Medium. Retrieved September 23, 2021, from https://towardsdatascience.com/shap-explained-the-way-i-wish-someone-explained-it-to-me-ab81cc69ef30.

Khuyen Tran is a prolific data science writer who has written an impressive collection of articles on useful data science topics, along with code. Khuyen is currently looking for a machine learning engineer role, a data scientist role, or a developer advocate role in the Bay Area after May 2022, so please reach out if you are looking for someone with her set of skills.

Original. Reposted with permission.