shapr

Explaining the output of machine learning models with more accurately estimated Shapley values


See the pkgdown site at
norskregnesentral.github.io/shapr/
for a complete introduction with examples and documentation of the
package.

For an overview of the methodology and capabilities of the package (as
of shapr v1.0.4), see the software paper Jullum et al.
(2025), available as a preprint
here.

NEWS

With shapr version 1.0.0 (GitHub only, Nov 2024) and version 1.0.1
(CRAN, Jan 2025), the package was subject to a major update, providing a
full restructuring of the code base and a full suite of new
functionality, including:

  • A long list of approaches for estimating the contribution/value
    function $v(S)$, including Variational Autoencoders and
    regression-based methods
  • Iterative Shapley value estimation with convergence detection
  • Parallelized computations with progress updates
  • Reweighted Kernel SHAP for faster convergence
  • New function explain_forecast() for explaining forecasts (see the
    sketch below)
  • Asymmetric and causal Shapley values
  • Several other methodological, computational and user-experience
    improvements
  • Python wrapper shaprpy making the core functionality of shapr
    available in Python

See the
NEWS for a
complete list.
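
As a taster, here is a minimal sketch of the new explain_forecast()
function, modeled on the usage shown in the package’s forecast vignette.
The argument names below are taken from that vignette, so consult
?shapr::explain_forecast for the exact interface before relying on them.

library(shapr)

# Fit a simple AR(2) model to the (complete) temperature series
data("airquality")
model_ar <- ar(airquality$Temp, order.max = 2)

# Explain a 3-step-ahead forecast made at time point 153, using the
# two most recent lags of the response as features
explanation_forecast <- explain_forecast(
  model = model_ar,
  y = airquality[, "Temp", drop = FALSE],
  train_idx = 2:152,
  explain_idx = 153,
  explain_y_lags = 2,
  horizon = 3,
  approach = "empirical",
  phi0 = mean(airquality$Temp)
)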

Coming from shapr < 1.0.0?

shapr version >= 1.0.0 comes with a number of breaking changes. Most
notably, we moved from using two functions (shapr() and explain()) to
a single function (explain()). In addition, custom models are now
explained by passing the prediction function directly to explain(),
quite a few input arguments got new names, and a few functions for edge
cases were removed to simplify the code base.
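
For orientation, here is a minimal before/after sketch of the syntax
change. The old call is reconstructed from the v0.2.2 README linked
below, so treat both snippets as illustrations rather than exact
templates.

# Old syntax (shapr < 1.0.0): two steps
explainer <- shapr(x_train, model)
explanation <- explain(
  x_explain,
  explainer = explainer,
  approach = "empirical",
  prediction_zero = p0
)

# New syntax (shapr >= 1.0.0): a single call to explain()
explanation <- explain(
  model = model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "empirical",
  phi0 = p0
)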

Click
here
to view a version of this README with the old syntax (v0.2.2).

Python wrapper

We provide a Python wrapper (shaprpy) which allows explaining Python
models with the methodology implemented in shapr, directly from
Python. The wrapper calls R internally, and therefore requires an
installation of R. See
here for
installation instructions and examples.

The package

The shapr R package implements an enhanced version of the Kernel SHAP
method for approximating Shapley values, with a strong focus on
conditional Shapley values. The core idea is to remain completely
model-agnostic while offering a variety of methods for estimating
contribution functions, enabling accurate computation of conditional
Shapley values across different feature types, dependencies, and
distributions. The package also includes evaluation metrics to compare
various approaches. With features like parallelized computations,
convergence detection, progress updates, and extensive plotting options,
shapr is a highly efficient and user-friendly tool that delivers
precise estimates of conditional Shapley values, which are critical for
understanding how features truly contribute to predictions.

A basic example is provided below. Otherwise we refer to the pkgdown
website
and the different
vignettes there for details and further examples.

Installation

shapr is available on CRAN
and can be installed in R as:

install.packages("shapr")

To install the development version of shapr, available on GitHub, use

remotes::install_github("NorskRegnesentral/shapr")

To also install all dependencies, use

remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE)

Example

shapr supports computation of Shapley values for any predictive model
that takes a set of numeric features and produces a numeric outcome.
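
For model classes without built-in support, you supply the prediction
function yourself through explain()’s predict_model argument. Below is a
minimal sketch, assuming the two-argument (model object, new data)
signature described in ?shapr::explain; here an lm() model simply stands
in for an arbitrary model class.

library(shapr)

# Data setup mirroring the xgboost example below
data("airquality")
df <- airquality[complete.cases(airquality), ]
x_var <- c("Solar.R", "Wind", "Temp", "Month")
x_train <- df[-(1:6), x_var]
x_explain <- df[1:6, x_var]
y_train <- df[-(1:6), "Ozone"]

# An lm() model standing in for an unsupported model class
custom_model <- lm(Ozone ~ Solar.R + Wind + Temp + Month, data = df[-(1:6), ])

explanation_custom <- explain(
  model = custom_model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "gaussian",
  phi0 = mean(y_train),
  # The supplied function must return a numeric vector of predictions
  predict_model = function(x, newdata) predict(x, as.data.frame(newdata))
)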

The following example shows how a simple xgboost model is trained
using the airquality dataset, and how shapr explains the individual
predictions.

We first enable parallel computation and progress updates with the
following code chunk. These are optional, but recommended for improved
performance and user friendliness, particularly for problems with many
features.

# Enable parallel computation
# Requires the future and future.apply packages
future::plan("multisession", workers = 2) # Increase the number of workers for increased performance with many features

# Enable progress updates of the v(S)-computations
# Requires the progressr package
progressr::handlers(global = TRUE)
progressr::handlers("cli") # Using the cli package as backend (recommended for the estimates of the remaining time)

Now for the actual example:

library(xgboost)
library(shapr)

data("airquality")
data <- data.table::as.data.table(airquality)
data <- data[complete.cases(data), ]

x_var <- c("Solar.R", "Wind", "Temp", "Month")
y_var <- "Ozone"

ind_x_explain <- 1:6
x_train <- data[-ind_x_explain, ..x_var]
y_train <- data[-ind_x_explain, get(y_var)]
x_explain <- data[ind_x_explain, ..x_var]

# Looking at the dependence between the features
cor(x_train)
#>            Solar.R       Wind       Temp      Month
#> Solar.R  1.0000000 -0.1243826  0.3333554 -0.0710397
#> Wind    -0.1243826  1.0000000 -0.5152133 -0.2013740
#> Temp     0.3333554 -0.5152133  1.0000000  0.3400084
#> Month   -0.0710397 -0.2013740  0.3400084  1.0000000

# Fitting a basic xgboost model to the training data
model <- xgboost(
  data = as.matrix(x_train),
  label = y_train,
  nrounds = 20,
  verbose = FALSE
)

# Specifying phi_0, i.e. the expected prediction without any features
p0 <- mean(y_train)

# Computing the Shapley values with kernelSHAP accounting for feature dependence using
# the empirical (conditional) distribution approach with bandwidth parameter sigma = 0.1 (default)
explanation <- explain(
  model = model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "empirical",
  phi0 = p0,
  seed = 1
)
#> 
#> ── Starting `shapr::explain()` at 2025-05-16 15:59:46 ──────────────────────────
#> ℹ Feature classes extracted from the model contains `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than or `2^n_features = 16`, and is
#>   therefore set to `2^n_features = 16`.
#> 
#> ── Explanation overview ──
#> 
#> • Model class: <xgb.Booster>
#> • Approach: empirical
#> • Iterative estimation: FALSE
#> • Number of feature-wise Shapley values: 4
#> • Number of observations to explain: 6
#> • Computations (temporary) saved at:
#>   '/tmp/RtmprK6ied/shapr_obj_367086e7deb18.rds'
#> 
#> ── Main computation started ──
#> 
#> ℹ Using 16 of 16 coalitions.

# Printing the Shapley values for the data to explain.
# For more information about the interpretation of the values in the table, see ?shapr::explain.
print(explanation$shapley_values_est)
#>    explain_id     none    Solar.R      Wind      Temp      Month
#>         <int>    <num>      <num>     <num>     <num>      <num>
#> 1:          1 43.08571 13.2117337  4.785645 -25.57222  -5.599230
#> 2:          2 43.08571 -9.9727747  5.830694 -11.03873  -7.829954
#> 3:          3 43.08571 -2.2916185 -7.053393 -10.15035  -4.452481
#> 4:          4 43.08571  3.3254595 -3.240879 -10.22492  -6.663488
#> 5:          5 43.08571  4.3039571 -2.627764 -14.15166 -12.266855
#> 6:          6 43.08571  0.4786417 -5.248686 -12.55344  -6.645738
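
The returned object also carries the package’s MSEv evaluation
criterion, which can be used to compare different approaches on the same
data. A minimal sketch of accessing it, assuming the output structure
described in the package vignettes:

# MSEv evaluation criterion for the chosen approach (lower is better
# when comparing approaches on the same explanations)
explanation$MSEv$MSEv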

# Finally we plot the resulting explanations
plot(explanation)
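
The plot() method supports several plot types beyond the default bar
plot. A brief sketch, assuming the plot_type argument documented in
?shapr::plot.shapr (the beeswarm type additionally requires the
ggbeeswarm package):

# Alternative visualizations of the same explanation object
plot(explanation, plot_type = "waterfall")
plot(explanation, plot_type = "beeswarm")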

See Jullum et al. (2025) (preprint available
here) for a software paper with an
overview of the methodology and capabilities of the package (as of
v1.0.4). See the general usage
vignette
for further basic usage examples and brief introductions to the
methodology. For more thorough information about the underlying
methodology, see the methodological papers Aas, Jullum, and Løland
(2021), Redelmeier, Jullum, and Aas
(2020), Jullum, Redelmeier, and Aas
(2021), Olsen et al.
(2022), and Olsen et al.
(2024). See also Sellereite and Jullum
(2019) for a very brief paper about a
previous version (v0.1.1) of the package (with a different structure,
syntax, and significantly less functionality).

Contribution

All feedback and suggestions are very welcome. Details on how to
contribute can be found
here. If
you have any questions or comments, feel free to open an issue
here.

Please note that the ‘shapr’ project is released with a Contributor
Code of Conduct. By contributing to this project, you agree to abide by
its terms.

References

Aas, Kjersti, Martin Jullum, and Anders Løland. 2021. “Explaining
Individual Predictions When Features Are Dependent: More Accurate
Approximations to Shapley Values.” Artificial Intelligence 298.
https://doi.org/10.1016/j.artint.2021.103502.

Jullum, Martin, Lars Henry Berge Olsen, Jon Lachmann, and Annabelle
Redelmeier. 2025. “Shapr: Explaining Machine Learning Models with
Conditional Shapley Values in R and Python.” arXiv Preprint
arXiv:2504.01842. https://arxiv.org/abs/2504.01842.

Jullum, Martin, Annabelle Redelmeier, and Kjersti Aas. 2021. “Efficient
and Simple Prediction Explanations with groupShapley: A Practical
Perspective.” In Proceedings of the 2nd Italian Workshop on Explainable
Artificial Intelligence, 28–43. CEUR Workshop Proceedings.

Olsen, Lars Henry Berge, Ingrid Kristine Glad, Martin Jullum, and
Kjersti Aas. 2022. “Using Shapley Values and Variational Autoencoders to
Explain Predictive Models with Dependent Mixed Features.” Journal of
Machine Learning Research 23 (213): 1–51.

———. 2024. “A Comparative Study of Methods for Estimating Model-Agnostic
Shapley Value Explanations.” Data Mining and Knowledge Discovery,
1–48.

Redelmeier, Annabelle, Martin Jullum, and Kjersti Aas. 2020. “Explaining
Predictive Models with Mixed Features Using Shapley Values and
Conditional Inference Trees.” In International Cross-Domain Conference
for Machine Learning and Knowledge Extraction, 117–37. Springer.

Sellereite, N., and M. Jullum. 2019. “Shapr: An R-Package for Explaining
Machine Learning Models with Dependence-Aware Shapley Values.” Journal
of Open Source Software 5 (46): 2027.
https://doi.org/10.21105/joss.02027.