Fit Bayesian generalized (non-)linear multivariate multilevel models using Stan for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit -- among others -- linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include non-linear and smooth terms, auto-correlation structures, censored data, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distributions can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. In addition, model fit can easily be assessed and compared with posterior predictive checks and leave-one-out cross-validation.

```
brm(
formula,
data,
family = gaussian(),
prior = NULL,
autocor = NULL,
data2 = NULL,
cov_ranef = NULL,
sample_prior = "no",
sparse = NULL,
knots = NULL,
drop_unused_levels = TRUE,
stanvars = NULL,
stan_funs = NULL,
fit = NA,
save_pars = NULL,
save_ranef = NULL,
save_mevars = NULL,
save_all_pars = NULL,
init = NULL,
inits = NULL,
chains = 4,
iter = 2000,
warmup = floor(iter/2),
thin = 1,
cores = getOption("mc.cores", 1),
threads = getOption("brms.threads", NULL),
opencl = getOption("brms.opencl", NULL),
normalize = getOption("brms.normalize", TRUE),
control = NULL,
algorithm = getOption("brms.algorithm", "sampling"),
backend = getOption("brms.backend", "rstan"),
future = getOption("future", FALSE),
silent = 1,
seed = NA,
save_model = NULL,
stan_model_args = list(),
file = NULL,
file_compress = TRUE,
file_refit = getOption("brms.file_refit", "never"),
empty = FALSE,
rename = TRUE,
...
)
```

- formula
An object of class

`formula`

,`brmsformula`

, or`mvbrmsformula`

(or one that can be coerced to that classes): A symbolic description of the model to be fitted. The details of model specification are explained in`brmsformula`

.- data
An object of class

`data.frame`

(or one that can be coerced to that class) containing data of all variables used in the model.- family
A description of the response distribution and link function to be used in the model. This can be a family function, a call to a family function or a character string naming the family. Every family function has a

`link`

argument allowing to specify the link function to be applied on the response variable. If not specified, default links are used. For details of supported families see`brmsfamily`

. By default, a linear`gaussian`

model is applied. In multivariate models,`family`

might also be a list of families.- prior
One or more

`brmsprior`

objects created by`set_prior`

or related functions and combined using the`c`

method or the`+`

operator. See also`get_prior`

for more help.- autocor
(Deprecated) An optional

`cor_brms`

object describing the correlation structure within the response variable (i.e., the 'autocorrelation'). See the documentation of`cor_brms`

for a description of the available correlation structures. Defaults to`NULL`

, corresponding to no correlations. In multivariate models,`autocor`

might also be a list of autocorrelation structures. It is now recommend to specify autocorrelation terms directly within`formula`

. See`brmsformula`

for more details.- data2
A named

`list`

of objects containing data, which cannot be passed via argument`data`

. Required for some objects used in autocorrelation structures to specify dependency structures as well as for within-group covariance matrices.- cov_ranef
(Deprecated) A list of matrices that are proportional to the (within) covariance structure of the group-level effects. The names of the matrices should correspond to columns in

`data`

that are used as grouping factors. All levels of the grouping factor should appear as rownames of the corresponding matrix. This argument can be used, among others to model pedigrees and phylogenetic effects. It is now recommended to specify those matrices in the formula interface using the`gr`

and related functions. See`vignette("brms_phylogenetics")`

for more details.- sample_prior
Indicate if draws from priors should be drawn additionally to the posterior draws. Options are

`"no"`

(the default),`"yes"`

, and`"only"`

. Among others, these draws can be used to calculate Bayes factors for point hypotheses via`hypothesis`

. Please note that improper priors are not sampled, including the default improper priors used by`brm`

. See`set_prior`

on how to set (proper) priors. Please also note that prior draws for the overall intercept are not obtained by default for technical reasons. See`brmsformula`

how to obtain prior draws for the intercept. If`sample_prior`

is set to`"only"`

, draws are drawn solely from the priors ignoring the likelihood, which allows among others to generate draws from the prior predictive distribution. In this case, all parameters must have proper priors.- sparse
(Deprecated) Logical; indicates whether the population-level design matrices should be treated as sparse (defaults to

`FALSE`

). For design matrices with many zeros, this can considerably reduce required memory. Sampling speed is currently not improved or even slightly decreased. It is now recommended to use the`sparse`

argument of`brmsformula`

and related functions.- knots
Optional list containing user specified knot values to be used for basis construction of smoothing terms. See

`gamm`

for more details.- drop_unused_levels
Should unused factors levels in the data be dropped? Defaults to

`TRUE`

.- stanvars
An optional

`stanvars`

object generated by function`stanvar`

to define additional variables for use in Stan's program blocks.- stan_funs
(Deprecated) An optional character string containing self-defined Stan functions, which will be included in the functions block of the generated Stan code. It is now recommended to use the

`stanvars`

argument for this purpose instead.- fit
An instance of S3 class

`brmsfit`

derived from a previous fit; defaults to`NA`

. If`fit`

is of class`brmsfit`

, the compiled model associated with the fitted result is re-used and all arguments modifying the model code or data are ignored. It is not recommended to use this argument directly, but to call the`update`

method, instead.- save_pars
An object generated by

`save_pars`

controlling which parameters should be saved in the model. The argument has no impact on the model fitting itself.- save_ranef
(Deprecated) A flag to indicate if group-level effects for each level of the grouping factor(s) should be saved (default is

`TRUE`

). Set to`FALSE`

to save memory. The argument has no impact on the model fitting itself.- save_mevars
(Deprecated) A flag to indicate if draws of latent noise-free variables obtained by using

`me`

and`mi`

terms should be saved (default is`FALSE`

). Saving these draws allows to better use methods such as`predict`

with the latent variables but leads to very large R objects even for models of moderate size and complexity.- save_all_pars
(Deprecated) A flag to indicate if draws from all variables defined in Stan's

`parameters`

block should be saved (default is`FALSE`

). Saving these draws is required in order to apply the methods`bridge_sampler`

,`bayes_factor`

, and`post_prob`

.- init
Initial values for the sampler. If

`NULL`

(the default) or`"random"`

, Stan will randomly generate initial values for parameters in a reasonable range. If`0`

, all parameters are initialized to zero on the unconstrained space. This option is sometimes useful for certain families, as it happens that default random initial values cause draws to be essentially constant. Generally, setting`init = 0`

is worth a try, if chains do not initialize or behave well. Alternatively,`init`

can be a list of lists containing the initial values, or a function (or function name) generating initial values. The latter options are mainly implemented for internal testing but are available to users if necessary. If specifying initial values using a list or a function then currently the parameter names must correspond to the names used in the generated Stan code (not the names used in R). For more details on specifying initial values you can consult the documentation of the selected`backend`

.- inits
(Deprecated) Alias of

`init`

.- chains
Number of Markov chains (defaults to 4).

- iter
Number of total iterations per chain (including warmup; defaults to 2000).

- warmup
A positive integer specifying number of warmup (aka burnin) iterations. This also specifies the number of iterations used for stepsize adaptation, so warmup draws should not be used for inference. The number of warmup should not be larger than

`iter`

and the default is`iter/2`

.- thin
Thinning rate. Must be a positive integer. Set

`thin > 1`

to save memory and computation time if`iter`

is large.- cores
Number of cores to use when executing the chains in parallel, which defaults to 1 but we recommend setting the

`mc.cores`

option to be as many processors as the hardware and RAM allow (up to the number of chains). For non-Windows OS in non-interactive R sessions, forking is used instead of PSOCK clusters.- threads
Number of threads to use in within-chain parallelization. For more control over the threading process,

`threads`

may also be a`brmsthreads`

object created by`threading`

. Within-chain parallelization is experimental! We recommend its use only if you are experienced with Stan's`reduce_sum`

function and have a slow running model that cannot be sped up by any other means. Can be set globally for the current R session via the`"brms.threads"`

option (see`options`

).- opencl
The platform and device IDs of the OpenCL device to use for fitting using GPU support. If you don't know the IDs of your OpenCL device,

`c(0,0)`

is most likely what you need. For more details, see`opencl`

. Can be set globally for the current R session via the`"brms.opencl"`

option- normalize
Logical. Indicates whether normalization constants should be included in the Stan code (defaults to

`TRUE`

). Setting it to`FALSE`

requires Stan version >= 2.25 to work. If`FALSE`

, sampling efficiency may be increased but some post processing functions such as`bridge_sampler`

will not be available. Can be controlled globally for the current R session via the `brms.normalize` option.- control
A named

`list`

of parameters to control the sampler's behavior. It defaults to`NULL`

so all the default values are used. The most important control parameters are discussed in the 'Details' section below. For a comprehensive overview see`stan`

.- algorithm
Character string naming the estimation approach to use. Options are

`"sampling"`

for MCMC (the default),`"meanfield"`

for variational inference with independent normal distributions,`"fullrank"`

for variational inference with a multivariate normal distribution, or`"fixed_param"`

for sampling from fixed parameter values. Can be set globally for the current R session via the`"brms.algorithm"`

option (see`options`

).- backend
Character string naming the package to use as the backend for fitting the Stan model. Options are

`"rstan"`

(the default) or`"cmdstanr"`

. Can be set globally for the current R session via the`"brms.backend"`

option (see`options`

). Details on the rstan and cmdstanr packages are available at https://mc-stan.org/rstan/ and https://mc-stan.org/cmdstanr/, respectively. Additionally a`"mock"`

backend is available to make testing brms and packages that depend on it easier. The`"mock"`

backend does not actually do any fitting, it only checks the generated Stan code for correctness and then returns whatever is passed in an additional`mock_fit`

argument as the result of the fit.- future
Logical; If

`TRUE`

, the future package is used for parallel execution of the chains and argument`cores`

will be ignored. Can be set globally for the current R session via the`"future"`

option. The execution type is controlled via`plan`

(see the examples section below).- silent
Verbosity level between

`0`

and`2`

. If`1`

(the default), most of the informational messages of compiler and sampler are suppressed. If`2`

, even more messages are suppressed. The actual sampling progress is still printed. Set`refresh = 0`

to turn this off as well. If using`backend = "rstan"`

you can also set`open_progress = FALSE`

to prevent opening additional progress bars.- seed
The seed for random number generation to make results reproducible. If

`NA`

(the default), Stan will set the seed randomly.- save_model
Either

`NULL`

or a character string. In the latter case, the model's Stan code is saved via`cat`

in a text file named after the string supplied in`save_model`

.- stan_model_args
A

`list`

of further arguments passed to`rstan::stan_model`

for`backend = "rstan"`

or to`cmdstanr::cmdstan_model`

for`backend = "cmdstanr"`

, which allows to change how models are compiled.- file
Either

`NULL`

or a character string. In the latter case, the fitted model object is saved via`saveRDS`

in a file named after the string supplied in`file`

. The`.rds`

extension is added automatically. If the file already exists,`brm`

will load and return the saved model object instead of refitting the model. Unless you specify the`file_refit`

argument as well, the existing files won't be overwritten, you have to manually remove the file in order to refit and save the model under an existing file name. The file name is stored in the`brmsfit`

object for later usage.- file_compress
Logical or a character string, specifying one of the compression algorithms supported by

`saveRDS`

. If the`file`

argument is provided, this compression will be used when saving the fitted model object.- file_refit
Modifies when the fit stored via the

`file`

argument is re-used. Can be set globally for the current R session via the`"brms.file_refit"`

option (see`options`

). For`"never"`

(default) the fit is always loaded if it exists and fitting is skipped. For`"always"`

the model is always refitted. If set to`"on_change"`

, brms will refit the model if model, data or algorithm as passed to Stan differ from what is stored in the file. This also covers changes in priors,`sample_prior`

,`stanvars`

, covariance structure, etc. If you believe there was a false positive, you can use`brmsfit_needs_refit`

to see why refit is deemed necessary. Refit will not be triggered for changes in additional parameters of the fit (e.g., initial values, number of iterations, control arguments, ...). A known limitation is that a refit will be triggered if within-chain parallelization is switched on/off.- empty
Logical. If

`TRUE`

, the Stan model is not created and compiled and the corresponding`'fit'`

slot of the`brmsfit`

object will be empty. This is useful if you have estimated a brms-created Stan model outside of brms and want to feed it back into the package.- rename
For internal use only.

- ...
Further arguments passed to Stan. For

`backend = "rstan"`

the arguments are passed to`sampling`

or`vb`

. For`backend = "cmdstanr"`

the arguments are passed to the`cmdstanr::sample`

or`cmdstanr::variational`

method.

An object of class `brmsfit`

, which contains the posterior
draws along with many other useful information about the model. Use

`methods(class = "brmsfit")`

for an overview on available methods.

Fit a generalized (non-)linear multivariate multilevel model via
full Bayesian inference using Stan. A general overview is provided in the
vignettes `vignette("brms_overview")`

and
`vignette("brms_multilevel")`

. For a full list of available vignettes
see `vignette(package = "brms")`

.

**Formula syntax of brms models**

Details of the formula syntax applied in brms can be found in
`brmsformula`

.

**Families and link functions**

Details of families supported by brms can be found in
`brmsfamily`

.

**Prior distributions**

Priors should be specified using the
`set_prior`

function. Its documentation
contains detailed information on how to correctly specify priors. To find
out on which parameters or parameter classes priors can be defined, use
`get_prior`

. Default priors are chosen to be
non or very weakly informative so that their influence on the results will
be negligible and you usually don't have to worry about them. However,
after getting more familiar with Bayesian statistics, I recommend you to
start thinking about reasonable informative priors for your model
parameters: Nearly always, there is at least some prior information
available that can be used to improve your inference.

**Adjusting the sampling behavior of Stan**

In addition to choosing the number of iterations, warmup draws, and
chains, users can control the behavior of the NUTS sampler, by using the
`control`

argument. The most important reason to use `control`

is
to decrease (or eliminate at best) the number of divergent transitions that
cause a bias in the obtained posterior draws. Whenever you see the
warning "There were x divergent transitions after warmup." you should
really think about increasing `adapt_delta`

. To do this, write
`control = list(adapt_delta = <x>)`

, where `<x>`

should usually
be value between `0.8`

(current default) and `1`

. Increasing
`adapt_delta`

will slow down the sampler but will decrease the number
of divergent transitions threatening the validity of your posterior
draws.

Another problem arises when the depth of the tree being evaluated in each
iteration is exceeded. This is less common than having divergent
transitions, but may also bias the posterior draws. When it happens,
Stan will throw out a warning suggesting to increase
`max_treedepth`

, which can be accomplished by writing ```
control =
list(max_treedepth = <x>)
```

with a positive integer `<x>`

that should
usually be larger than the current default of `10`

. For more details
on the `control`

argument see `stan`

.

Paul-Christian Buerkner (2017). brms: An R Package for Bayesian Multilevel
Models Using Stan. *Journal of Statistical Software*, 80(1), 1-28.
`doi:10.18637/jss.v080.i01`

Paul-Christian Buerkner (2018). Advanced Bayesian Multilevel Modeling
with the R Package brms. *The R Journal*. 10(1), 395–411.
`doi:10.32614/RJ-2018-017`

```
if (FALSE) {
# Poisson regression for the number of seizures in epileptic patients
# using normal priors for population-level effects
# and half-cauchy priors for standard deviations of group-level effects
prior1 <- prior(normal(0,10), class = b) +
prior(cauchy(0,2), class = sd)
fit1 <- brm(count ~ zAge + zBase * Trt + (1|patient),
data = epilepsy, family = poisson(), prior = prior1)
# generate a summary of the results
summary(fit1)
# plot the MCMC chains as well as the posterior distributions
plot(fit1, ask = FALSE)
# predict responses based on the fitted model
head(predict(fit1))
# plot conditional effects for each predictor
plot(conditional_effects(fit1), ask = FALSE)
# investigate model fit
loo(fit1)
pp_check(fit1)
# Ordinal regression modeling patient's rating of inhaler instructions
# category specific effects are estimated for variable 'treat'
fit2 <- brm(rating ~ period + carry + cs(treat),
data = inhaler, family = sratio("logit"),
prior = set_prior("normal(0,5)"), chains = 2)
summary(fit2)
plot(fit2, ask = FALSE)
WAIC(fit2)
# Survival regression modeling the time between the first
# and second recurrence of an infection in kidney patients.
fit3 <- brm(time | cens(censored) ~ age * sex + disease + (1|patient),
data = kidney, family = lognormal())
summary(fit3)
plot(fit3, ask = FALSE)
plot(conditional_effects(fit3), ask = FALSE)
# Probit regression using the binomial family
ntrials <- sample(1:10, 100, TRUE)
success <- rbinom(100, size = ntrials, prob = 0.4)
x <- rnorm(100)
data4 <- data.frame(ntrials, success, x)
fit4 <- brm(success | trials(ntrials) ~ x, data = data4,
family = binomial("probit"))
summary(fit4)
# Non-linear Gaussian model
fit5 <- brm(
bf(cum ~ ult * (1 - exp(-(dev/theta)^omega)),
ult ~ 1 + (1|AY), omega ~ 1, theta ~ 1,
nl = TRUE),
data = loss, family = gaussian(),
prior = c(
prior(normal(5000, 1000), nlpar = "ult"),
prior(normal(1, 2), nlpar = "omega"),
prior(normal(45, 10), nlpar = "theta")
),
control = list(adapt_delta = 0.9)
)
summary(fit5)
conditional_effects(fit5)
# Normal model with heterogeneous variances
data_het <- data.frame(
y = c(rnorm(50), rnorm(50, 1, 2)),
x = factor(rep(c("a", "b"), each = 50))
)
fit6 <- brm(bf(y ~ x, sigma ~ 0 + x), data = data_het)
summary(fit6)
plot(fit6)
conditional_effects(fit6)
# extract estimated residual SDs of both groups
sigmas <- exp(as.data.frame(fit6, variable = "^b_sigma_", regex = TRUE))
ggplot(stack(sigmas), aes(values)) +
geom_density(aes(fill = ind))
# Quantile regression predicting the 25%-quantile
fit7 <- brm(bf(y ~ x, quantile = 0.25), data = data_het,
family = asym_laplace())
summary(fit7)
conditional_effects(fit7)
# use the future package for more flexible parallelization
library(future)
plan(multiprocess)
fit7 <- update(fit7, future = TRUE)
# fit a model manually via rstan
scode <- make_stancode(count ~ Trt, data = epilepsy)
sdata <- make_standata(count ~ Trt, data = epilepsy)
stanfit <- rstan::stan(model_code = scode, data = sdata)
# feed the Stan model back into brms
fit8 <- brm(count ~ Trt, data = epilepsy, empty = TRUE)
fit8$fit <- stanfit
fit8 <- rename_pars(fit8)
summary(fit8)
}
```