tidyposterior

Bayesian comparisons of models using resampled statistics

This package can be used to conduct post hoc analyses of resampling results generated during model evaluation.

For example, if two models are evaluated with the root mean squared error (RMSE) using 10-fold cross-validation, there are 10 paired statistics. These can be used to make comparisons between models without involving a test set.
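As a minimal sketch of that paired layout (the fold labels follow rsample's naming convention, but the RMSE values are random placeholders, not real results):

library(tibble)

# Hypothetical paired RMSE statistics: one row per fold, one column per model.
# The numbers are random placeholders for illustration only.
set.seed(1)
paired_rmse <- tibble(
  id      = sprintf("Fold%02d", 1:10),
  model_a = runif(10, min = 3.0, max = 3.5),
  model_b = runif(10, min = 3.1, max = 3.6)
)

Within each fold, both models are evaluated on the same held-out data, so the two statistics in each row are paired.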

There is a rich literature on the analysis of model resampling results, such as McLachlan's Discriminant Analysis and Statistical Pattern Recognition and the references therein. This package follows the spirit of Benavoli et al. (2017).

tidyposterior uses Bayesian generalized linear models for this purpose and can be considered an upgraded version of the caret::resamples() function. The package works natively with rsample objects, but any results stored in a data frame can be used.

See Chapter 11 of Tidy Modeling with R for examples and more details.

Installation

You can install the released version of tidyposterior from CRAN with:

install.packages("tidyposterior")

And the development version from GitHub with:

# install.packages("pak")
pak::pak("tidymodels/tidyposterior")

Example

To illustrate, here are some example objects using 10-fold cross-validation for a simple two-class problem:

library(tidymodels)
library(tidyposterior)

data(two_class_dat, package = "modeldata")

set.seed(100)
folds <- vfold_cv(two_class_dat)

We can define two different models (for simplicity, with no tuning parameters).

logistic_reg_glm_spec <-
  logistic_reg() %>%
  set_engine('glm')

mars_earth_spec <-
  mars(prod_degree = 1) %>%
  set_engine('earth') %>%
  set_mode('classification')

For tidymodels, the tune::fit_resamples() function can be used to estimate performance for each model/resample:

rs_ctrl <- control_resamples(save_workflow = TRUE)

logistic_reg_glm_res <- 
  logistic_reg_glm_spec %>% 
  fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl)

mars_earth_res <- 
  mars_earth_spec %>% 
  fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl)

From these, there are several ways to pass the results to the perf_mod() function. The most general approach is to have a data frame with the resampling labels (i.e., one or more id columns) as well as columns for each model that you would like to compare.

For the model results above, tune::collect_metrics() can be used along with some basic data manipulation steps:

logistic_roc <- 
  collect_metrics(logistic_reg_glm_res, summarize = FALSE) %>% 
  dplyr::filter(.metric == "roc_auc") %>% 
  dplyr::select(id, logistic = .estimate)

mars_roc <- 
  collect_metrics(mars_earth_res, summarize = FALSE) %>% 
  dplyr::filter(.metric == "roc_auc") %>% 
  dplyr::select(id, mars = .estimate)

resamples_df <- full_join(logistic_roc, mars_roc, by = "id")
resamples_df
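The result is a 10-row data frame with the resample labels in the id column and one numeric column of ROC AUC values per model.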

We can then give this directly to perf_mod():

set.seed(101)
roc_model_via_df <- perf_mod(resamples_df, iter = 2000)
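Options such as iter are passed through to the underlying rstanarm fit; here it controls the number of iterations per MCMC chain.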

From this, the posterior distributions for each model can be obtained from the tidy() method:

roc_model_via_df %>%
  tidy() %>%
  ggplot(aes(x = posterior)) +
  geom_histogram(bins = 40, col = "blue", fill = "blue", alpha = .4) +
  facet_wrap(~ model, ncol = 1) +
  xlab("Area Under the ROC Curve")

Both posterior distributions look fairly normal; in this fit, the means are about 0.89 for the logistic model and 0.88 for MARS, with a full range of roughly 0.84 to 0.93.

See contrast_models() for how to analyze the differences between these distributions.
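A brief, hedged sketch of that workflow (the practical-equivalence size of 0.02 is only an illustrative choice):

# Posterior of the difference in ROC AUC between the two models
roc_diff <- contrast_models(
  roc_model_via_df,
  list_1 = "logistic",
  list_2 = "mars",
  seed = 102
)

# Credible intervals for the difference and, via `size`, the posterior
# probability that the models are practically equivalent within +/- 0.02
summary(roc_diff, size = 0.02)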

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.