Use threads for within-chain parallelization in Stan via the brms interface. Within-chain parallelization is experimental! We recommend using it only if you are experienced with Stan's reduce_sum function and have a slow-running model that cannot be sped up by any other means.

threading(threads = NULL, grainsize = NULL, static = FALSE)



threads: Number of threads to use in within-chain parallelization.

grainsize: Number of observations evaluated together in one chunk on one of the CPUs used for threading. If NULL (the default), grainsize is currently chosen as max(100, N / (2 * threads)), where N is the number of observations in the data. This default is experimental and may change in the future without prior notice.

static: Logical. Apply the static (non-adaptive) version of reduce_sum? Defaults to FALSE. Setting it to TRUE is required to achieve exact reproducibility of the model results (if the random seed is set as well).
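
The default grainsize rule stated above can be checked by hand. The following is a minimal sketch of that arithmetic only; default_grainsize is a hypothetical helper written for illustration, not a brms function, and brms may round or otherwise adjust the value internally.

```r
# Hypothetical helper mirroring the documented default rule:
# max(100, N / (2 * threads)). Not part of brms.
default_grainsize <- function(n_obs, threads) {
  max(100, n_obs / (2 * threads))
}

# 500 observations on 2 threads: 500 / 4 = 125, which exceeds the floor of 100
default_grainsize(n_obs = 500, threads = 2)

# 200 observations on 4 threads: 200 / 8 = 25, so the floor of 100 applies
default_grainsize(n_obs = 200, threads = 4)
```

For small data sets the floor of 100 dominates, so each chunk may hold a large share of the observations and leave little work to distribute across threads.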


A brmsthreads object which can be passed to the threads argument of brm and related functions.


The adaptive scheduling procedure used by reduce_sum prevents the results from being exactly reproducible, even if you set the random seed. If you need exact reproducibility, you have to set the argument static = TRUE, which may reduce efficiency a bit.
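
A request for reproducible threading might look like the sketch below. It assumes the epilepsy data shipped with brms and a working cmdstanr backend; the seed value and the object names th and fit_static are arbitrary choices made for illustration.

```r
library(brms)

# Static (non-adaptive) scheduling, as required for exact reproducibility
th <- threading(threads = 2, grainsize = 100, static = TRUE)

if (FALSE) {
  # Not run: requires the cmdstanr backend and a CmdStan installation.
  # The seed must be set as well; 1234 is an arbitrary value.
  fit_static <- brm(count ~ zAge + zBase * Trt + (1 | patient),
                    data = epilepsy, family = negbinomial(),
                    chains = 1, threads = th, seed = 1234,
                    backend = "cmdstanr")
}
```

Refitting with the same seed, data, and threading settings should then give identical draws, at the possible cost of some efficiency.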

To ensure that chunks (whose size is defined by grainsize) require roughly the same amount of computing time, we recommend storing observations in random order in the data. At the very least, please avoid sorting observations by the response values. Such sorting often causes variation in the computing time of the pointwise log-likelihood, which makes up a big part of the parallelized code.
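
Putting observations in random order takes one line of base R. The sketch below uses a small made-up data frame (dat, with hypothetical columns y and x) that starts out sorted by the response; the same row permutation could be applied to real data before passing it to brm.

```r
set.seed(1234)

# Toy data, deliberately sorted by the response y (the ordering to avoid)
dat <- data.frame(y = sort(rpois(20, lambda = 5)), x = rnorm(20))

# Permute the rows so that each chunk holds a mix of response values
shuffled <- dat[sample(nrow(dat)), , drop = FALSE]
rownames(shuffled) <- NULL
```

Only the row order changes. Each row keeps its own y and x values, so the fitted model is unaffected apart from how the work is split across chunks.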


if (FALSE) {
# this model just serves as an illustration
# threading may not actually speed things up here
fit <- brm(count ~ zAge + zBase * Trt + (1|patient),
           data = epilepsy, family = negbinomial(),
           chains = 1, threads = threading(2, grainsize = 100),
           backend = "cmdstanr")