The most common machine learning task is to train a model which is able to predict an unknown outcome (response variable) based on a set of known input variables/features. When using such models for real-life applications, it is often crucial to understand why a certain set of features leads to exactly that prediction. However, explaining predictions from complex, or seemingly simple, machine learning models is a practical and ethical question, as well as a legal issue. Can I trust the model? Is it biased? Can I explain it to others? We want to explain individual predictions from a complex machine learning model by learning simple, interpretable explanations.
Shapley values constitute the only prediction explanation framework with a solid theoretical foundation (Lundberg and Lee (2017)). Unless the true distribution of the features is known and there are fewer than, say, 10-15 features, these Shapley values need to be estimated/approximated. Popular methods like Shapley Sampling Values (Štrumbelj and Kononenko (2014)), SHAP/Kernel SHAP (Lundberg and Lee (2017)), and, to some extent, TreeSHAP (Lundberg, Erion, and Lee (2018)) assume that the features are independent when approximating the Shapley values for prediction explanation. This may lead to very inaccurate Shapley values, and consequently wrong interpretations of the predictions. Aas, Jullum, and Løland (2019) extend and improve the Kernel SHAP method of Lundberg and Lee (2017) to account for the dependence between the features, resulting in significantly more accurate approximations to the Shapley values. See the paper for details.
This package implements the methodology of Aas, Jullum, and Løland (2019).
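For reference (our summary of the standard definition, not text from the original README), the quantity being estimated for each feature $j \in \mathcal{M} = \{1, \ldots, M\}$ is its Shapley value

$$
\phi_j \;=\; \sum_{S \subseteq \mathcal{M} \setminus \{j\}} \frac{|S|!\,(M - |S| - 1)!}{M!}\,\bigl(v(S \cup \{j\}) - v(S)\bigr),
\qquad
v(S) \;=\; \mathrm{E}\bigl[f(x) \mid x_S = x_S^*\bigr],
$$

where $f$ is the model, $x^*$ is the observation being explained, and $x_S$ denotes the features in the subset $S$. The independence assumption criticized above enters through $v(S)$: replacing the conditional expectation by an unconditional expectation over the remaining features is only valid when the features are independent, which is the issue Aas, Jullum, and Løland (2019) address.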
The following methodology/features are currently implemented:

- Native support for explaining predictions from models fitted with `stats::glm`, `stats::lm`, `ranger::ranger`, `xgboost::xgboost`/`xgboost::xgb.train`, and `mgcv::gam`.

Additional model classes and methodology will be supported in future releases.
Note that both the features and the prediction must be numeric. The approach is designed for continuous features, but discrete features may also work just fine with the empirical (conditional) distribution approach. Unlike SHAP and TreeSHAP, we decompose probability predictions directly, i.e. not via log-odds transformations, to ease interpretability. The application programming interface (API) of `shapr` is inspired by Pedersen and Benesty (2019).
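To make the probability-scale point concrete (our gloss, not wording from the package documentation): for a classification model with predicted probability $p(x^*)$, the Shapley decomposition satisfies

$$
p(x^*) \;=\; \phi_0 + \sum_{j=1}^{M} \phi_j,
$$

so each $\phi_j$ can be read directly as the number of probability points feature $j$ adds to or subtracts from the baseline $\phi_0$, rather than as a change on the log-odds scale.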
To install the current stable release from CRAN, use

``` r
install.packages("shapr")
```

To install the current development version, use

``` r
remotes::install_github("NorskRegnesentral/shapr")
```

If you would like to install all packages of the models we currently support, use

``` r
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE)
```

If you would also like to build and view the vignette locally, use

``` r
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE, build_vignettes = TRUE)
vignette("understanding_shapr", "shapr")
```
You can always check out the latest version of the vignette here.
`shapr` supports computation of Shapley values with any predictive model which takes a set of numeric features and produces a numeric outcome. The following example shows how a simple `xgboost` model is trained using the Boston Housing Data, and how `shapr` explains the individual predictions.
``` r
library(xgboost)
library(shapr)

data("Boston", package = "MASS")

x_var <- c("lstat", "rm", "dis", "indus")
y_var <- "medv"

ind_x_test <- 1:6
x_train <- as.matrix(Boston[-ind_x_test, x_var])
y_train <- Boston[-ind_x_test, y_var]
x_test <- as.matrix(Boston[ind_x_test, x_var])

# Looking at the dependence between the features
cor(x_train)
#>            lstat         rm        dis      indus
#> lstat  1.0000000 -0.6108040 -0.4928126  0.5986263
#> rm    -0.6108040  1.0000000  0.1999130 -0.3870571
#> dis   -0.4928126  0.1999130  1.0000000 -0.7060903
#> indus  0.5986263 -0.3870571 -0.7060903  1.0000000

# Fitting a basic xgboost model to the training data
model <- xgboost(
  data = x_train,
  label = y_train,
  nround = 20,
  verbose = FALSE
)

# Prepare the data for explanation
explainer <- shapr(x_train, model)
#> The specified model provides feature classes that are NA. The classes of data are taken as the truth.

# Specifying the phi_0, i.e. the expected prediction without any features
p <- mean(y_train)

# Computing the actual Shapley values with kernelSHAP accounting for feature dependence using
# the empirical (conditional) distribution approach with bandwidth parameter sigma = 0.1 (default)
explanation <- explain(
  x_test,
  approach = "empirical",
  explainer = explainer,
  prediction_zero = p
)

# Printing the Shapley values for the test data.
# For more information about the interpretation of the values in the table, see ?shapr::explain.
print(explanation$dt)
#>      none     lstat         rm       dis      indus
#>     <num>     <num>      <num>     <num>      <num>
#> 1: 22.446 5.2632030 -1.2526613 0.2920444  4.5528644
#> 2: 22.446 0.1671901 -0.7088401 0.9689005  0.3786871
#> 3: 22.446 5.9888022  5.5450858 0.5660134 -1.4304351
#> 4: 22.446 8.2142204  0.7507572 0.1893366  1.8298304
#> 5: 22.446 0.5059898  5.6875103 0.8432238  2.2471150
#> 6: 22.446 1.9929673 -3.6001958 0.8601984  3.1510531

# Finally we plot the resulting explanations
plot(explanation)
```
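The empirical approach used above is only one of the dependence-aware estimation methods from Aas, Jullum, and Løland (2019). As a minimal sketch (assuming that `explain()` also accepts `approach = "gaussian"`, as described in the paper and vignette), the same predictions can be explained under a multivariate Gaussian model of the features:

``` r
# Sketch, not part of the original example: re-estimate the Shapley values assuming
# the features follow a multivariate Gaussian distribution fitted to the training data.
explanation_gaussian <- explain(
  x_test,
  approach = "gaussian",
  explainer = explainer,
  prediction_zero = p
)

# Each row (together with the "none" baseline) still sums to the corresponding prediction.
print(explanation_gaussian$dt)
```

Comparing the two tables gives a quick impression of how sensitive the explanations are to the choice of conditional distribution.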
All feedback and suggestions are very welcome. Details on how to contribute can be found here. If you have any questions or comments, feel free to open an issue here.
Please note that the ‘shapr’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.