Similar to rstantools for rstan, the instantiate package builds
pre-compiled CmdStan models into CRAN-ready statistical modeling R
packages. The models compile once during installation, the executables
live inside the file systems of their respective packages, and users
have the full power and convenience of CmdStanR without any additional
compilation after package installation. This approach saves time and
helps R package developers migrate from rstan to the more modern
CmdStanR.
The website at https://wlandau.github.io/instantiate/ includes a function reference and other documentation.
The instantiate package depends on the R package CmdStanR and the
command line tool CmdStan, so it is important to follow these stages in
order:
1. Install the R package CmdStanR. CmdStanR is not on CRAN, so the recommended way to install it is install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos"))).
2. Optional: set environment variables CMDSTAN_INSTALL and/or CMDSTAN to manage the CmdStan installation. See the “Administering CmdStan” section below for details.
3. Install instantiate using one of the R commands below.
Type | Source | Command |
---|---|---|
Release | CRAN | install.packages("instantiate") |
Development | GitHub | remotes::install_github("wlandau/instantiate") |
Development | R-universe | install.packages("instantiate", repos = "https://wlandau.r-universe.dev") |
Packages that use instantiate may be published on CRAN. CRAN does not
have CmdStan, so the models are not pre-compiled in the macOS and
Windows binaries. If you install such a package from CRAN, please
install it from source. For example:
install.packages("hdbayes", type = "source")
The instantiate package uses environment variables to manage the
installation of CmdStan. An environment variable is an operating system
setting with a name and a value (both text strings). In R, there are two
ways to set environment variables:
- Sys.setenv(), which sets environment variables temporarily for the current R session.
- The .Renviron text file in your home directory, which passes environment variables to all new R sessions. The edit_r_environ() function from the usethis package helps.
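As a minimal sketch of the first option, the base R functions Sys.setenv() and Sys.getenv() set and read a variable for the current session only. The value "implicit" is one of the CMDSTAN_INSTALL settings discussed in this document; the second variable name is made up purely to show what an unset variable looks like.

```r
# Set an environment variable for the current R session only.
Sys.setenv(CMDSTAN_INSTALL = "implicit")

# Read it back. Sys.getenv() returns a character string.
Sys.getenv("CMDSTAN_INSTALL")

# An unset variable comes back as the value of `unset`.
# NOT_A_REAL_VARIABLE_123 is a made-up name for illustration.
Sys.getenv("NOT_A_REAL_VARIABLE_123", unset = "")
```

The .Renviron equivalent would be a plain NAME=value line, which takes effect in R sessions started after the file is saved.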
By default, instantiate looks for the copy of CmdStan located at
cmdstanr::cmdstan_path(). If you upgrade CmdStan, then the path
returned by cmdstanr::cmdstan_path() will change, which may not be
desirable in some cases. To permanently lock the path that instantiate
uses, follow these steps:
1. Set the CMDSTAN environment variable to the desired path to CmdStan.
2. Set the CMDSTAN_INSTALL environment variable to "fixed".
3. Install instantiate.
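The three steps above could look roughly like this in an R session. This is only a sketch: "/path/to/cmdstan" is a placeholder rather than a real location, and it assumes that the installation is launched from the same session so that it inherits both variables. The install call is commented out so the snippet has no side effects beyond the session environment.

```r
# Step 1: point CMDSTAN at the copy of CmdStan to lock in.
# "/path/to/cmdstan" is a placeholder, not a real location.
Sys.setenv(CMDSTAN = "/path/to/cmdstan")

# Step 2: request the fixed-path behavior at installation time.
Sys.setenv(CMDSTAN_INSTALL = "fixed")

# Step 3: install instantiate while both variables are set.
# install.packages("instantiate")

# Confirm what the session currently holds.
Sys.getenv(c("CMDSTAN", "CMDSTAN_INSTALL"))
```

Putting the same two variables in .Renviron instead would make them available to every new R session, not just the current one.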
Henceforth, instantiate will automatically use the CmdStan path from
(1), regardless of the value of CMDSTAN after (3). To prefer
cmdstanr::cmdstan_path() instead, you could do one of the following:
- Reinstall instantiate with CMDSTAN_INSTALL not equal to "fixed", or
- Set CMDSTAN_INSTALL to "implicit" at runtime, or
- Set the cmdstan_install argument to "implicit" for the current instantiate package function you are using.
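The second and third options might be sketched as follows. stan_cmdstan_exists() is used here only as a convenient instantiate function to demonstrate the cmdstan_install argument; the call is guarded so the snippet still runs on a machine where instantiate is not installed.

```r
# Second option: prefer the cmdstanr-managed copy of CmdStan
# for the rest of the current session.
Sys.setenv(CMDSTAN_INSTALL = "implicit")

# Third option: override the setting for a single function call.
# The guard keeps this sketch runnable without instantiate installed.
if (requireNamespace("instantiate", quietly = TRUE)) {
  instantiate::stan_cmdstan_exists(cmdstan_install = "implicit")
}
```

The per-call argument is the narrowest override, since it leaves the environment of the session untouched.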
The following section explains how to create an R package with
pre-compiled Stan models. This stage of the development workflow is
considered “runtime” for the purposes of administering CmdStan as
described previously.
Begin with an R package with one or more Stan model files inside the
src/stan/ directory. stan_package_create() is a convenient way to
start.
stan_package_create(path = "package_folder")
#> Example package named "example" created at "package_folder". Run stan_package_configure(path = "package_folder") so that the built-in Stan model will compile when the package installs.
At minimum the package file structure should look something like this:
fs::dir_tree("package_folder")
#> package_folder
#> ├── DESCRIPTION
#> └── src
#>     └── stan
#>         └── bernoulli.stan
Configure the package so the Stan models compile during installation.
stan_package_configure() writes the required scripts cleanup,
cleanup.win, src/Makevars, src/Makevars.win, and src/install.libs.R.
Inside src/install.libs.R is a call to
instantiate::stan_package_compile(), which you can manually edit to
control how your models are compiled. For example, different calls to
stan_package_compile() can compile different groups of models using
different C++ compiler flags.
fs::dir_tree("package_folder")
#> package_folder
#> ├── DESCRIPTION
#> ├── cleanup
#> ├── cleanup.win
#> └── src
#>     ├── Makevars
#>     ├── Makevars.win
#>     ├── install.libs.R
#>     └── stan
#>         └── bernoulli.stan
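To illustrate the idea of compiling different groups of models with different flags, here is a hedged sketch of a helper that could replace the single stan_package_compile() call in src/install.libs.R. It rests on assumptions not confirmed by this document: that stan_package_compile() accepts a models argument (paths to Stan files) and a cpp_options argument (a list of makefile options, as in the compile() method of CmdStanR). The helper name compile_in_groups(), its stan_dir argument, and the "threaded_" file prefix are all hypothetical.

```r
# Hypothetical helper: compile Stan files in two groups.
# Assumes stan_package_compile() takes `models` and `cpp_options`.
compile_in_groups <- function(stan_dir) {
  files <- list.files(stan_dir, pattern = "\\.stan$", full.names = TRUE)
  # Made-up naming rule: files starting with "threaded_" get threading.
  threaded <- grepl("^threaded_", basename(files))
  if (any(!threaded)) {
    # Default flags for ordinary models.
    instantiate::stan_package_compile(models = files[!threaded])
  }
  if (any(threaded)) {
    # Extra makefile option for the threaded group.
    instantiate::stan_package_compile(
      models = files[threaded],
      cpp_options = list(stan_threads = TRUE)
    )
  }
  invisible(files)
}
```

The directory to pass in would be whichever one the generated script already hands to stan_package_compile(); check the generated src/install.libs.R and the function reference before relying on this pattern.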
Install the package just like you would any other R package. To install
it from your local copy of package_folder, open R and run:
install.packages(pkgs = "package_folder", type = "source", repos = NULL)
A user can now run a model from the package without any additional
compilation. See the documentation of CmdStanR to learn how to use
CmdStanR model objects.
library(example)
model <- stan_package_model(name = "bernoulli", package = "example")
print(model) # CmdStanR model object
#> data {
#>   int<lower=0> N;
#>   array[N] int<lower=0,upper=1> y;
#> }
#> parameters {
#>   real<lower=0,upper=1> theta;
#> }
#> model {
#>   theta ~ beta(1,1); // uniform prior on interval 0,1
#>   y ~ bernoulli(theta);
#> }
fit <- model$sample(
  data = list(N = 10, y = c(1, 0, 1, 0, 1, 0, 0, 0, 0, 0)),
  refresh = 0,
  iter_warmup = 2000,
  iter_sampling = 4000
)
#> Running MCMC with 4 sequential chains...
#>
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#>
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.6 seconds.
fit$summary()
#> # A tibble: 2 × 10
#>   variable   mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>     <num>  <num> <num> <num>  <num>  <num> <num>    <num>    <num>
#> 1 lp__     -8.15  -7.87  0.725 0.317 -9.60  -7.64   1.00    7365.    8498.
#> 2 theta     0.333  0.324 0.130 0.134  0.137  0.563  1.00    6229.    7560.
You can write an exported user-side function in your R package to access
the model. For example, you might store this code in an R/model.R file
in the package:
#' @title Fit the Bernoulli model.
#' @export
#' @family models
#' @description Fit the Bernoulli Stan model and return posterior summaries.
#' @return A data frame of posterior summaries.
#' @param y Numeric vector of Bernoulli observations (zeroes and ones).
#' @param `...` Named arguments to the `sample()` method of CmdStan model
#' objects: <https://mc-stan.org/cmdstanr/reference/model-method-sample.html>
#' @examples
#' if (instantiate::stan_cmdstan_exists()) {
#'   run_bernoulli_model(y = c(1, 0, 1, 0, 1, 0, 0, 0, 0, 0))
#' }
run_bernoulli_model <- function(y, ...) {
  stopifnot(is.numeric(y) && all(y >= 0 & y <= 1))
  model <- stan_package_model(name = "bernoulli", package = "mypackage")
  fit <- model$sample(data = list(N = length(y), y = y), ...)
  fit$summary()
}
- In your package DESCRIPTION file, list https://mc-stan.org/r-packages/ in the Additional_repositories: field (example in brms). This step is only necessary while cmdstanr is not yet on CRAN.
Additional_repositories:
https://mc-stan.org/r-packages/
- In your package DESCRIPTION and NAMESPACE files, import the instantiate package and function stan_package_model().
- Write user-side statistical modeling functions which call the models in your package as mentioned above.
- CmdStan is too big for CRAN, so instantiate will not be able to access it there. If you plan to submit your package to CRAN, please skip the appropriate code in your examples, vignettes, and tests when instantiate::stan_cmdstan_exists() is FALSE. Explicit if() statements like the one in the roxygen2 @examples above work for examples and vignettes. For tests, it is convenient to use testthat::skip_if_not(), e.g. skip_if_not(stan_cmdstan_exists()).
- pkgload::load_all() might not compile your models. If you use pkgload or devtools to load and develop your package, you may need to call instantiate::stan_package_compile() from the root directory of your package to compile your models manually.
- For version control, it is best practice to commit only source code files and documentation. Please do not commit any compiled executable Stan model files to your repository. If you do commit them, then other users with different machines will have trouble installing your package, and your commit history will consume too much storage. For Git, you may add the following lines to the .gitignore file at the root of your package:
src/stan/**
!src/stan/**/*.*
src/stan/**/*.exe
src/stan/**/*.EXE
- For continuous integration (e.g. on GitHub Actions), please use cmdstanr-based installation as explained above, and tweak your workflow YAML files as explained in that section.
- For general information on R package development, please consult the free online book R Packages (2e) by Hadley Wickham and Jennifer Bryan, as well as the official manual on Writing R Extensions by the R Core Team.
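For the testing advice in the list above, a test file might look like the following sketch. It assumes the package exports run_bernoulli_model() as defined earlier and that the file lives in the tests/testthat/ directory of the package; the file name is hypothetical. The test is skipped, not failed, on machines where CmdStan is unavailable.

```r
# tests/testthat/test-bernoulli.R (hypothetical file name)
library(testthat)

test_that("run_bernoulli_model() returns posterior summaries", {
  # Skip on machines without CmdStan, such as CRAN check servers.
  skip_if_not(instantiate::stan_cmdstan_exists())
  out <- run_bernoulli_model(y = c(1, 0, 1, 0, 1, 0, 0, 0, 0, 0))
  # The summary is expected to be a data frame with a row for theta.
  expect_true(is.data.frame(out))
  expect_true("theta" %in% out$variable)
})
```

Placing the skip on the first line of the test keeps every later expectation from running when the model executable cannot exist.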
Please note that the instantiate project is released with a Contributor
Code of Conduct. By contributing to this project, you agree to abide by
its terms.
To cite package ‘instantiate’ in publications use:
Landau WM (2023). _instantiate: A Minimal CmdStan Client for R Packages_.
https://wlandau.github.io/instantiate/, https://github.com/wlandau/instantiate.
A BibTeX entry for LaTeX users is
@Manual{,
title = {instantiate: A Minimal CmdStan Client for R Packages},
author = {William Michael Landau},
year = {2023},
note = {https://wlandau.github.io/instantiate/,
https://github.com/wlandau/instantiate},
}