posterior

The posterior R package

167
24
R

output: github_document

stopifnot(require(knitr))
options(width = 90)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  dev = "png",
  dpi = 150,
  fig.asp = 0.8,
  fig.width = 5,
  out.width = "60%",
  fig.align = "center"
)

posterior

CRAN
status
R-CMD-check
Coverage
Status

The posterior R package is intended to provide useful tools for both users
and developers of packages for fitting Bayesian models or working with output
from Bayesian models. The primary goals of the package are to:

  • Efficiently convert between many different useful formats of
    draws (samples) from posterior or prior distributions.
  • Provide consistent methods for operations commonly performed on draws,
    for example, subsetting, binding, or mutating draws.
  • Provide various summaries of draws in convenient formats.
  • Provide lightweight implementations of state of the art posterior inference
    diagnostics.

If you are new to posterior we recommend starting with these vignettes:

Installation

You can install the latest official release version via

install.packages("posterior")

or build the developmental version directly from GitHub via

# install.packages("remotes")
remotes::install_github("stan-dev/posterior")

Examples

Here we offer a few examples of using the package. For a more detailed overview
see the vignette The posterior R package.

library("posterior")

To demonstrate how to work with the posterior package, we will use example
posterior draws obtained from the eight schools hierarchical meta-analysis model
described in Gelman et al. (2013). Essentially, we have an estimate per school
(theta[1] through theta[8]) as well as an overall mean (mu) and standard
deviation across schools (tau).

Draws formats

eight_schools_array <- example_draws("eight_schools")
print(eight_schools_array, max_variables = 3)

The draws for this example come as a draws_array object, that is, an array
with dimensions iterations x chains x variables. We can easily transform it to
another format, for instance, a data frame with additional meta information.

eight_schools_df <- as_draws_df(eight_schools_array)
print(eight_schools_df)

Different formats are preferable in different situations and hence posterior
supports multiple formats and easy conversion between them. For more details on
the available formats see help("draws"). All of the formats are essentially
base R object classes and can be used as such. For example, a draws_matrix
object is just a matrix with a little more consistency and additional methods.

Summarizing draws

Computing summaries of posterior or prior draws and convergence diagnostics for
posterior draws is one of the most common tasks when working with Bayesian
models fit using Markov Chain Monte Carlo (MCMC) methods. The posterior
package provides a flexible interface for this purpose via summarise_draws():

# summarise_draws or summarize_draws
summarise_draws(eight_schools_df)

Basically, we get a data frame with one row per variable and one column per
summary statistic or convergence diagnostic. The summaries rhat, ess_bulk,
and ess_tail are described in Vehtari et al. (2020). We can choose which
summaries to compute by passing additional arguments, either functions or names
of functions. For instance, if we only wanted the mean and its corresponding
Monte Carlo Standard Error (MCSE) we would use:

summarise_draws(eight_schools_df, "mean", "mcse_mean")

For a function to work with summarise_draws, it needs to take a vector or
matrix of numeric values and returns a single numeric value or a named vector of
numeric values.

Subsetting draws

Another common task when working with posterior (or prior) draws, is subsetting
according to various aspects of the draws (iterations, chains, or variables).
posterior provides a convenient interface for this purpose via the
subset_draws() method. For example, here is the code to extract the first five
iterations of the first two chains of the variable mu:

subset_draws(eight_schools_df, variable = "mu", chain = 1:2, iteration = 1:5)

The same call to subset_draws() can be used regardless of whether the object
is a draws_df, draws_array, draws_list, etc.

Mutating and renaming draws

The magic of having obtained draws from the joint posterior (or prior)
distribution of a set of variables is that these draws can also be used
to obtain draws from any other variable that is a function of the original variables.
That is, if are interested in the posterior distribution of, say,
phi = (mu + tau)^2 all we have to do is to perform the transformation for each
of the individual draws to obtain draws from the posterior distribution of the
transformed variable. This procedure is automated in the mutate_variables method:

x <- mutate_variables(eight_schools_df, phi = (mu + tau)^2)
x <- subset_draws(x, c("mu", "tau", "phi"))
print(x)

When we do the math ourselves, we see that indeed for each draw,
phi is equal to (mu + tau)^2 (up to rounding two 2 digits
for the purpose of printing).

We may also easily rename variables, or even entire vectors of variables via
rename_variables, for example:

x <- rename_variables(eight_schools_df, mean = mu, alpha = theta)
variables(x)

As with all posterior methods, mutate_variables and rename_variables
can be used with all draws formats.

Binding draws together

Suppose we have multiple draws objects that we want to bind together:

x1 <- draws_matrix(alpha = rnorm(5), beta = 1)
x2 <- draws_matrix(alpha = rnorm(5), beta = 2)
x3 <- draws_matrix(theta = rexp(5))

Then, we can use the bind_draws method to bind them along different dimensions.
For example, we can bind x1 and x3 together along the 'variable' dimension:

x4 <- bind_draws(x1, x3, along = "variable")
print(x4)

Or, we can bind x1 and x2 together along the 'draw' dimension:

x5 <- bind_draws(x1, x2, along = "draw")
print(x5)

As with all posterior methods, bind_draws can be used with all draws
formats.

Converting from regular R objects to draws formats

The eight_schools example already comes in a format natively supported by
posterior but we could of course also import the draws from other sources,
for example, from common base R objects:

x <- matrix(rnorm(50), nrow = 10, ncol = 5)
colnames(x) <- paste0("V", 1:5)
x <- as_draws_matrix(x)
print(x)

summarise_draws(x, "mean", "sd", "median", "mad")

Instead of as_draws_matrix() we also could have just used as_draws(), which
attempts to find the closest available format to the input object. In this case
this would result in a draws_matrix object either way.

The above matrix example contained only one chain. Multi-chain draws could be
stored in base R 3-D array object, which can also be converted to a draws object:

x <- array(data=rnorm(200), dim=c(10, 2, 5))
x <- as_draws_matrix(x)
variables(x) <-  paste0("V", 1:5)
print(x)

Converting from mcmc objects to draws formats

The coda and rjags packages use mcmc and mcmc.list objects which
can also be converted to draws objects:

data(line, package = "coda")
line <- as_draws_df(line)
print(line)

Contributing to posterior

We welcome contributions! The posterior package is under active development.
If you find bugs or have ideas for new features (for us or yourself to
implement) please open an issue on GitHub
(https://github.com/stan-dev/posterior/issues).

Citing posterior

Developing and maintaining open source software is an important yet often
underappreciated contribution to scientific progress. Thus, whenever you are
using open source software (or software in general), please make sure to cite it
appropriately so that developers get credit for their work.

When using posterior, please cite it as follows:

  • Bürkner P. C., Gabry J., Kay M., & Vehtari A. (2020). “posterior: Tools for
    Working with Posterior Distributions.” R package version XXX, <URL:
    https://mc-stan.org/posterior/>.

When using the MCMC convergence diagnostics rhat, ess_bulk, ess_tail,
ess_median, ess_quantile, mcse_median, or mcse_quantile
please also cite

  • Vehtari A., Gelman A., Simpson D., Carpenter B., & Bürkner P. C. (2021).
    Rank-normalization, folding, and localization: An improved Rhat for assessing
    convergence of MCMC (with discussion). Bayesian Analysis. 16(2), 667–718.
    doi.org/10.1214/20-BA1221

When using the MCMC convergence diagnostic rhat_nested
please also cite

  • Margossian, C. C., Hoffman, M. D., Sountsov, P., Riou-Durand, L.,
    Vehtari, A., and Gelman, A. (2024).
    Nested $\widehat{R}$: Assessing the convergence of Markov chain
    Monte Carlo when running many short chains. Bayesian Analysis,
    doi:10.1214/24-BA1453.

When using the MCMC convergence diagnostic rstar
please also cite

  • Lambert, B. and Vehtari, A. (2022). $R^*$: A robust MCMC convergence
    diagnostic with uncertainty using decision tree classifiers.
    Bayesian Analysis, 17(2):353-379.
    doi:10.1214/20-BA1252

When using the Pareto-k diagnostics pareto_khat, pareto_min_ss,
pareto_convergence_rate, khat_threshold or pareto_diags, or
Pareto smoothing pareto_smooth please also cite

  • Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).
    Pareto smoothed importance sampling.
    Journal of Machine Learning Research, 25(72):1-58.

The same information can be obtained by running citation("posterior").

References

Gelman A., Carlin J. B., Stern H. S., David B. Dunson D. B., Aki Vehtari A.,
& Rubin D. B. (2013). Bayesian Data Analysis, Third Edition. Chapman and
Hall/CRC.

Lambert, B. and Vehtari, A. (2022). $R^*$: A robust MCMC convergence
diagnostic with uncertainty using decision tree classifiers.
Bayesian Analysis, 17(2):353-379.
doi:10.1214/20-BA1252

Margossian, C. C., Hoffman, M. D., Sountsov, P., Riou-Durand, L.,
Vehtari, A., and Gelman, A. (2024).
Nested $\widehat{R}$: Assessing the convergence of Markov chain
Monte Carlo when running many short chains. Bayesian Analysis,
doi:10.1214/24-BA1453.

Vehtari A., Gelman A., Simpson D., Carpenter B., & Bürkner P. C. (2021).
Rank-normalization, folding, and localization: An improved Rhat for assessing
convergence of MCMC (with discussion). Bayesian Analysis. 16(2), 667–718.
doi.org/10.1214/20-BA1221

Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).
Pareto smoothed importance sampling.
Journal of Machine Learning Research, 25(72):1-58.

Licensing

The posterior package is licensed under the following licenses: