
SHAP : A Comprehensive Guide to SHapley Additive exPlanations

SHAP is a unified framework for interpreting machine learning models. It provides a way to understand the contribution of each input feature to a model's predictions, making it easier to see how the model arrives at its outputs. In this article, we explore what SHAP is and how to create the most important SHAP plots.

What is SHAP?

SHAP is a framework used to interpret the output of machine learning models. The key idea behind SHAP values is rooted in cooperative game theory and the concept of Shapley values.



Unlike many other attribution methods, SHAP gives a detailed, per-prediction account of how each feature contributes to the output. Because Shapley values distribute the prediction fairly among the features, the resulting explanations are both consistent and easy to communicate.

SHAP is useful because it quantifies the importance of each feature for every prediction. By providing Shapley values, it helps us understand complex models and how the input features drive their outputs.
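To make the game-theoretic idea concrete, here is a minimal, self-contained sketch (it does not use the shap library; the feature names and the additive payoff function are purely illustrative) that computes exact Shapley values for a toy three-feature model by averaging each feature's marginal contribution over all possible orderings:

# A brute-force illustration of Shapley values (toy example, not the shap library)
from itertools import permutations
from math import factorial

# Hypothetical "game": the payoff of a coalition is the sum of the present feature values
feature_values = {"Length": 2.0, "Diameter": 1.0, "ShellWeight": 3.0}
features = list(feature_values)

def payoff(coalition):
    return sum(feature_values[f] for f in coalition)

n_orderings = factorial(len(features))
shapley = {f: 0.0 for f in features}

# A feature's Shapley value is its marginal contribution to the payoff,
# averaged over every possible order in which features can be added
for order in permutations(features):
    present = []
    for f in order:
        marginal = payoff(present + [f]) - payoff(present)
        shapley[f] += marginal / n_orderings
        present.append(f)

print(shapley)                # each feature's contribution
print(sum(shapley.values()))  # equals payoff(features): the contributions are additive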



Creating a Simple XGBRegressor Model for SHAP Interpretation:

Install necessary packages:

!pip install xgboost shap pandas scikit-learn ipywidgets matplotlib

Creating a model:

In the following code snippet, XGBoost is used to train a regression model on the abalone dataset, and SHAP (SHapley Additive exPlanations) is then used to explain the model's predictions.

We first import the necessary packages (xgboost, shap, pandas, and scikit-learn) and load the abalone dataset from the UCI Machine Learning Repository.

Data preprocessing and Feature Engineering:

After the preprocessing step, we create an XGBRegressor model and train it on the training set.

The SHAP Explainer is created using the loaded XGBoost model and the SHAP values are calculated for the test set.

Finally, we initialize SHAP's JavaScript visualization library, which is required for the interactive plots.




# Importing necessary packages
import xgboost as xgb
import shap
import pandas as pd
from sklearn.model_selection import train_test_split
 
# Loading the abalone dataset from the UCI Machine Learning Repository
url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "abalone/abalone.data")
columns = ["Sex", "Length", "Diameter", "Height", "WholeWeight",
           "ShuckedWeight", "VisceraWeight", "ShellWeight", "Rings"]
abalone_data = pd.read_csv(url, header=None, names=columns)
 
# Data preprocessing and feature engineering
# Assuming you want to predict the number of rings, which is a continuous target variable
X = abalone_data.drop("Rings", axis=1)
y = abalone_data["Rings"]
 
# Convert categorical feature 'Sex' to numerical using one-hot encoding
X = pd.get_dummies(X, columns=["Sex"], drop_first=True)
 
# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
 
# Creating an XGBRegressor model
model = xgb.XGBRegressor()
model.fit(X_train, y_train)
 
# Save the XGBoost model in binary format
model.save_model('model.json')
 
# Load the model from the saved binary file
loaded_model = xgb.XGBRegressor()
loaded_model.load_model('model.json')
 
# SHAP Explainer
explainer = shap.Explainer(loaded_model)
shap_values = explainer(X_test)
 
# Initialize the SHAP JavaScript library
shap.initjs()
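As a quick sanity check on the objects created above, the SHAP values of an observation plus the explainer's base value should reproduce the model's raw prediction (SHAP's additivity property). A minimal sketch, assuming the code above has been run:

import numpy as np

# Additivity: base value + sum of SHAP values ≈ model prediction for that observation
first = shap_values[0]
reconstructed = first.base_values + first.values.sum()
prediction = loaded_model.predict(X_test.iloc[[0]])[0]
print(np.isclose(reconstructed, prediction, atol=1e-4))  # expected: True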

Waterfall Plot:

The waterfall plot explains a single prediction: starting from the model's expected value, each feature's SHAP value pushes the output up or down until the final prediction is reached.




# Load the model from the saved binary file
loaded_model = xgb.XGBRegressor()
loaded_model.load_model('model.json')
 
# SHAP Explainer
explainer = shap.Explainer(loaded_model)
shap_values = explainer(X_test)
 
# Waterfall plot for the first observation
shap.waterfall_plot(shap_values[0])

Output:

Waterfall plot

Force Plot:

The force plot shows the same per-observation attributions as opposing forces: features pushing the prediction above the base value appear in red, and those pushing it below appear in blue.




# Create a SHAP explainer for the model
explainer = shap.Explainer(model)
 
# Compute SHAP values for the test set
shap_values = explainer(X_test)
 
# If SHAP values are an Explanation object, extract the values
if isinstance(shap_values, shap.Explanation):
    shap_values = shap_values.values
 
# Force plot for the first observation with matplotlib
# The expected_value is the model's expected output for the dataset
# The shap_values[0] represents the SHAP values for the first observation
# X_test.iloc[0, :] is the corresponding feature values for the first observation
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0, :], matplotlib=True)

Output:

Force plot

Stacked Force Plot:

Stacking the force plots of many observations side by side gives an interactive overview of how the model behaves across the dataset.




# If shap_values is an Explanation object, extract the values
if isinstance(shap_values, shap.Explanation):
    shap_values = shap_values.values
 
# Initialize the SHAP JavaScript library
shap.initjs()
 
# Stacked (interactive) force plot for the first 100 observations
shap.force_plot(explainer.expected_value, shap_values[:100], X_test.iloc[:100, :])

Output:

Stacked force plot

Mean SHAP Plot:

The mean SHAP plot is a bar chart of the mean absolute SHAP value of each feature, providing a simple global ranking of feature importance.




shap.summary_plot(shap_values, X_test, plot_type="bar")

Output:

Mean SHAP plot

Beeswarm Plot:

The beeswarm-style summary plot shows the full distribution of SHAP values for every feature, with each point corresponding to one observation and coloured by the feature's value.




shap.summary_plot(shap_values, X_test)

Output:

Beeswarm Plot

Dependence Plots:

A dependence plot shows how the SHAP value of one feature (here ShellWeight) changes with that feature's value, revealing non-linear effects; colouring the points by a second feature can expose interactions.




shap.dependence_plot("ShellWeight", shap_values, X_test)

Output:

Dependence Plots
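shap.dependence_plot also accepts an interaction_index argument, which colours the points by a second feature so that interactions become visible. A small sketch reusing the SHAP values computed above (choosing "Diameter" as the interaction feature is just an example):

# Colour the ShellWeight dependence plot by a second feature to highlight interactions
shap.dependence_plot("ShellWeight", shap_values, X_test,
                     interaction_index="Diameter")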

Feature Importance with SHAP:

SHAP (SHapley Additive exPlanations) provides a comprehensive framework for quantifying the contribution of each input feature to a model's predictions. Its key properties include:

Shapley Values: each feature's contribution is computed as its average marginal contribution over all possible feature coalitions, a concept borrowed from cooperative game theory.

Individual Feature Contributions: every single prediction receives its own attribution, not just a global ranking of features.

Quantifying Impact: SHAP values capture both the magnitude and the direction (positive or negative) of a feature's effect, and can be aggregated into global importances (see the sketch after this list).

Interpretability Across Models: the same framework applies to tree ensembles, linear models, and deep networks alike.

Consistency in Summation: the SHAP values of an observation sum to the difference between the model's prediction and the base (expected) value.

Visual Representation: waterfall, force, beeswarm, and dependence plots turn the raw values into readable explanations.
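The bar-style importance shown earlier can also be computed directly from the SHAP value matrix: a feature's global importance is simply its mean absolute SHAP value across all observations. A minimal sketch, assuming the shap_values array and X_test from the abalone example are still in scope:

import numpy as np
import pandas as pd

# Global feature importance = mean absolute SHAP value per feature
mean_abs_shap = np.abs(shap_values).mean(axis=0)
importance = pd.Series(mean_abs_shap, index=X_test.columns).sort_values(ascending=False)
print(importance)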

Interpreting Black Box Models with SHAP:

SHAP can also be applied to models we treat as black boxes. In the example below, a decision tree classifier is trained on the Iris dataset and explained with SHAP.




# Importing necessary packages
import xgboost as xgb
import shap
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
 
# Loading the Iris dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name="Target")
 
# Data preprocessing and feature engineering
# Assuming no specific preprocessing is needed for this example
 
# Creating an XGBRegressor model for demonstration purposes
model = xgb.XGBRegressor()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
 
# Creating a black box model (e.g., a simple decision tree)
black_box_model = DecisionTreeClassifier(random_state=42)
black_box_model.fit(X_train, y_train)
 
# SHAP values for the black box model
explainer = shap.Explainer(black_box_model)
shap_values = explainer.shap_values(X_test)
 
# You can now use shap_values for interpretation and visualization
# Example: Plotting a summary plot
shap.summary_plot(shap_values, X_test)

Output:

Summary plot of BlackBox model
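For the decision tree above, shap.Explainer dispatches to a fast tree-specific explainer. When the model's internals cannot be exploited, the model-agnostic KernelExplainer can be used instead. The sketch below is illustrative only (the SVC model, the background sample size, and the 20-row subset are arbitrary choices), and it is much slower than the tree explainers:

from sklearn.svm import SVC

# A model treated as a pure black box (no specialized SHAP explainer is used here)
svm_model = SVC(probability=True, random_state=42)
svm_model.fit(X_train, y_train)

# KernelExplainer only needs a prediction function and a background dataset.
# We explain the predicted probability of class 0; a small background sample
# keeps the (expensive) computation manageable.
def predict_class0(data):
    return svm_model.predict_proba(data)[:, 0]

background = shap.sample(X_train, 50)
kernel_explainer = shap.KernelExplainer(predict_class0, background)
kernel_shap_values = kernel_explainer.shap_values(X_test.iloc[:20])

# Summary plot for the explained subset
shap.summary_plot(kernel_shap_values, X_test.iloc[:20])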

Applications of SHAP:

SHAP is used for debugging models, selecting and engineering features, detecting data leakage, auditing models for fairness, and explaining individual predictions in regulated domains such as finance and healthcare.

Challenges of SHAP:

Exact Shapley values are expensive to compute, so practical explainers rely on approximations that can still be slow for large datasets; strongly correlated features can make the attributions harder to interpret; and SHAP explains the model's behaviour, not the underlying causal relationships in the data.

In summary, SHAP is a powerful tool that shows which parts of our data matter most to a model's predictions. It works with many kinds of models and produces clear visualizations that make complex behaviour easier to understand, helping practitioners trust and debug their models.

