Fit Bayesian generalized (non)linear multivariate multilevel models using Stan for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit  among others  linear, robust linear, count data, survival, response times, ordinal, zeroinflated, hurdle, and even selfdefined mixture models all in a multilevel context. Further modeling options include nonlinear and smooth terms, autocorrelation structures, censored data, metaanalytic standard errors, and quite a few more. In addition, all parameters of the response distributions can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. In addition, model fit can easily be assessed and compared with posterior predictive checks and leaveoneout crossvalidation.
brm( formula, data, family = gaussian(), prior = NULL, autocor = NULL, data2 = NULL, cov_ranef = NULL, sample_prior = "no", sparse = NULL, knots = NULL, stanvars = NULL, stan_funs = NULL, fit = NA, save_pars = NULL, save_ranef = NULL, save_mevars = NULL, save_all_pars = NULL, inits = "random", chains = 4, iter = 2000, warmup = floor(iter/2), thin = 1, cores = getOption("mc.cores", 1), threads = NULL, opencl = NULL, normalize = getOption("brms.normalize", TRUE), control = NULL, algorithm = getOption("brms.algorithm", "sampling"), backend = getOption("brms.backend", "rstan"), future = getOption("future", FALSE), silent = 1, seed = NA, save_model = NULL, stan_model_args = list(), file = NULL, file_refit = getOption("brms.file_refit", "never"), empty = FALSE, rename = TRUE, ... )
formula  An object of class 

data  An object of class 
family  A description of the response distribution and link function to
be used in the model. This can be a family function, a call to a family
function or a character string naming the family. Every family function has
a 
prior  One or more 
autocor  (Deprecated) An optional 
data2  A named 
cov_ranef  (Deprecated) A list of matrices that are proportional to the
(within) covariance structure of the grouplevel effects. The names of the
matrices should correspond to columns in 
sample_prior  Indicate if draws from priors should be drawn
additionally to the posterior draws. Options are 
sparse  (Deprecated) Logical; indicates whether the populationlevel
design matrices should be treated as sparse (defaults to 
knots  Optional list containing user specified knot values to be used
for basis construction of smoothing terms. See

stanvars  An optional 
stan_funs  (Deprecated) An optional character string containing
selfdefined Stan functions, which will be included in the functions
block of the generated Stan code. It is now recommended to use the

fit  An instance of S3 class 
save_pars  An object generated by 
save_ranef  (Deprecated) A flag to indicate if grouplevel effects for
each level of the grouping factor(s) should be saved (default is

save_mevars  (Deprecated) A flag to indicate if draws of latent
noisefree variables obtained by using 
save_all_pars  (Deprecated) A flag to indicate if draws from all
variables defined in Stan's 
inits  Either 
chains  Number of Markov chains (defaults to 4). 
iter  Number of total iterations per chain (including warmup; defaults to 2000). 
warmup  A positive integer specifying number of warmup (aka burnin)
iterations. This also specifies the number of iterations used for stepsize
adaptation, so warmup draws should not be used for inference. The number
of warmup should not be larger than 
thin  Thinning rate. Must be a positive integer. Set 
cores  Number of cores to use when executing the chains in parallel,
which defaults to 1 but we recommend setting the 
threads  Number of threads to use in withinchain parallelization. For
more control over the threading process, 
opencl  The platform and device IDs of the OpenCL device to use for
fitting using GPU support. If you don't know the IDs of your OpenCL
device, 
normalize  Logical. Indicates whether normalization constants should
be included in the Stan code (defaults to 
control  A named 
algorithm  Character string naming the estimation approach to use.
Options are 
backend  Character string naming the package to use as the backend for
fitting the Stan model. Options are 
future  Logical; If 
silent  Verbosity level between 
seed  The seed for random number generation to make results
reproducible. If 
save_model  Either 
stan_model_args  A 
file  Either 
file_refit  Modifies when the fit stored via the 
empty  Logical. If 
rename  For internal use only. 
...  Further arguments passed to Stan.
For 
An object of class brmsfit
, which contains the posterior
draws along with many other useful information about the model. Use
methods(class = "brmsfit")
for an overview on available methods.
Fit a generalized (non)linear multivariate multilevel model via
full Bayesian inference using Stan. A general overview is provided in the
vignettes vignette("brms_overview")
and
vignette("brms_multilevel")
. For a full list of available vignettes
see vignette(package = "brms")
.
Formula syntax of brms models
Details of the formula syntax applied in brms can be found in
brmsformula
.
Families and link functions
Details of families supported by brms can be found in
brmsfamily
.
Prior distributions
Priors should be specified using the
set_prior
function. Its documentation
contains detailed information on how to correctly specify priors. To find
out on which parameters or parameter classes priors can be defined, use
get_prior
. Default priors are chosen to be
non or very weakly informative so that their influence on the results will
be negligible and you usually don't have to worry about them. However,
after getting more familiar with Bayesian statistics, I recommend you to
start thinking about reasonable informative priors for your model
parameters: Nearly always, there is at least some prior information
available that can be used to improve your inference.
Adjusting the sampling behavior of Stan
In addition to choosing the number of iterations, warmup draws, and
chains, users can control the behavior of the NUTS sampler, by using the
control
argument. The most important reason to use control
is
to decrease (or eliminate at best) the number of divergent transitions that
cause a bias in the obtained posterior draws. Whenever you see the
warning "There were x divergent transitions after warmup." you should
really think about increasing adapt_delta
. To do this, write
control = list(adapt_delta = <x>)
, where <x>
should usually
be value between 0.8
(current default) and 1
. Increasing
adapt_delta
will slow down the sampler but will decrease the number
of divergent transitions threatening the validity of your posterior
draws.
Another problem arises when the depth of the tree being evaluated in each
iteration is exceeded. This is less common than having divergent
transitions, but may also bias the posterior draws. When it happens,
Stan will throw out a warning suggesting to increase
max_treedepth
, which can be accomplished by writing control =
list(max_treedepth = <x>)
with a positive integer <x>
that should
usually be larger than the current default of 10
. For more details
on the control
argument see stan
.
PaulChristian Buerkner (2017). brms: An R Package for Bayesian Multilevel
Models Using Stan. Journal of Statistical Software, 80(1), 128.
doi:10.18637/jss.v080.i01
PaulChristian Buerkner (2018). Advanced Bayesian Multilevel Modeling
with the R Package brms. The R Journal. 10(1), 395–411.
doi:10.32614/RJ2018017
PaulChristian Buerkner paul.buerkner@gmail.com
if (FALSE) { # Poisson regression for the number of seizures in epileptic patients # using normal priors for populationlevel effects # and halfcauchy priors for standard deviations of grouplevel effects prior1 < prior(normal(0,10), class = b) + prior(cauchy(0,2), class = sd) fit1 < brm(count ~ zAge + zBase * Trt + (1patient), data = epilepsy, family = poisson(), prior = prior1) # generate a summary of the results summary(fit1) # plot the MCMC chains as well as the posterior distributions plot(fit1, ask = FALSE) # predict responses based on the fitted model head(predict(fit1)) # plot conditional effects for each predictor plot(conditional_effects(fit1), ask = FALSE) # investigate model fit loo(fit1) pp_check(fit1) # Ordinal regression modeling patient's rating of inhaler instructions # category specific effects are estimated for variable 'treat' fit2 < brm(rating ~ period + carry + cs(treat), data = inhaler, family = sratio("logit"), prior = set_prior("normal(0,5)"), chains = 2) summary(fit2) plot(fit2, ask = FALSE) WAIC(fit2) # Survival regression modeling the time between the first # and second recurrence of an infection in kidney patients. fit3 < brm(time  cens(censored) ~ age * sex + disease + (1patient), data = kidney, family = lognormal()) summary(fit3) plot(fit3, ask = FALSE) plot(conditional_effects(fit3), ask = FALSE) # Probit regression using the binomial family ntrials < sample(1:10, 100, TRUE) success < rbinom(100, size = ntrials, prob = 0.4) x < rnorm(100) data4 < data.frame(ntrials, success, x) fit4 < brm(success  trials(ntrials) ~ x, data = data4, family = binomial("probit")) summary(fit4) # Nonlinear Gaussian model fit5 < brm( bf(cum ~ ult * (1  exp((dev/theta)^omega)), ult ~ 1 + (1AY), omega ~ 1, theta ~ 1, nl = TRUE), data = loss, family = gaussian(), prior = c( prior(normal(5000, 1000), nlpar = "ult"), prior(normal(1, 2), nlpar = "omega"), prior(normal(45, 10), nlpar = "theta") ), control = list(adapt_delta = 0.9) ) summary(fit5) conditional_effects(fit5) # Normal model with heterogeneous variances data_het < data.frame( y = c(rnorm(50), rnorm(50, 1, 2)), x = factor(rep(c("a", "b"), each = 50)) ) fit6 < brm(bf(y ~ x, sigma ~ 0 + x), data = data_het) summary(fit6) plot(fit6) conditional_effects(fit6) # extract estimated residual SDs of both groups sigmas < exp(as.data.frame(fit6, variable = "^b_sigma_", regex = TRUE)) ggplot(stack(sigmas), aes(values)) + geom_density(aes(fill = ind)) # Quantile regression predicting the 25%quantile fit7 < brm(bf(y ~ x, quantile = 0.25), data = data_het, family = asym_laplace()) summary(fit7) conditional_effects(fit7) # use the future package for more flexible parallelization library(future) plan(multiprocess) fit7 < update(fit7, future = TRUE) # fit a model manually via rstan scode < make_stancode(count ~ Trt, data = epilepsy) sdata < make_standata(count ~ Trt, data = epilepsy) stanfit < rstan::stan(model_code = scode, data = sdata) # feed the Stan model back into brms fit8 < brm(count ~ Trt, data = epilepsy, empty = TRUE) fit8$fit < stanfit fit8 < rename_pars(fit8) summary(fit8) }