Perform exact \(K\)-fold cross-validation by refitting the model \(K\) times, each time leaving out one \(K\)-th of the original data. Folds can be run in parallel using the future package.

```
# S3 method for brmsfit
kfold(
  x,
  ...,
  K = 10,
  Ksub = NULL,
  folds = NULL,
  group = NULL,
  joint = FALSE,
  compare = TRUE,
  resp = NULL,
  model_names = NULL,
  save_fits = FALSE,
  recompile = NULL,
  future_args = list()
)
```

- x
A `brmsfit` object.

- ...
Further arguments passed to `brm`.

- K
The number of subsets of equal (if possible) size into which the data will be partitioned for performing \(K\)-fold cross-validation. The model is refit `K` times, each time leaving out one of the `K` subsets. If `K` is equal to the total number of observations in the data, then \(K\)-fold cross-validation is equivalent to exact leave-one-out cross-validation.

- Ksub
Optional number of subsets (of those subsets defined by `K`) to be evaluated. If `NULL` (the default), \(K\)-fold cross-validation will be performed on all subsets. If `Ksub` is a single integer, `Ksub` subsets (out of all `K` subsets) will be randomly chosen. If `Ksub` consists of multiple integers or a one-dimensional array (created via `as.array`), potentially of length one, the corresponding subsets will be used. This argument is primarily useful if evaluation of all subsets is infeasible for some reason.

- folds
Determines how the subsets are constructed. Possible values are `NULL` (the default), `"stratified"`, `"grouped"`, or `"loo"`. May also be a vector of length equal to the number of observations in the data. Alters the way `group` is handled. More information is provided in the 'Details' section.

- group
Optional name of a grouping variable or factor in the model. What exactly is done with this variable depends on argument `folds`. More information is provided in the 'Details' section.

- joint
Indicates which observations' log likelihoods shall be considered jointly in the ELPD computation. If `"obs"` or `FALSE` (the default), each observation is considered separately. This enables comparability of `kfold` with `loo`. If `"fold"`, the joint log likelihoods per fold are used. If `"group"`, the joint log likelihoods per group within folds are used (only available if argument `group` is specified).

- compare
A flag indicating if the information criteria of the models should be compared to each other via `loo_compare`.

- resp
Optional names of response variables. If specified, predictions are performed only for the specified response variables.

- model_names
If `NULL` (the default), model names are derived from deparsing the call. Otherwise, the passed values are used as model names.

- save_fits
If `TRUE`, a component `fits` is added to the returned object to store the cross-validated `brmsfit` objects and the indices of the omitted observations for each fold. Defaults to `FALSE`.

- recompile
Logical, indicating whether the Stan model should be recompiled. This may be necessary if you are running `kfold` on a machine other than the one used to fit the model.

- future_args
A list of further arguments passed to `future` for additional control over parallel execution if activated.

`kfold` returns an object with a structure similar to that of the objects returned by the `loo` and `waic` methods. It can be used with the same post-processing functions.

The `kfold` function performs exact \(K\)-fold cross-validation. First, the data are partitioned into \(K\) folds (i.e. subsets) of equal (or as close to equal as possible) size by default. Then the model is refit \(K\) times, each time leaving out one of the `K` subsets. If \(K\) is equal to the total number of observations in the data, then \(K\)-fold cross-validation is equivalent to exact leave-one-out cross-validation (to which `loo` is an efficient approximation). The `compare_ic` function is also compatible with the objects returned by `kfold`.
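To make the refitting procedure concrete, here is a minimal base-R sketch of the same idea, using `lm` on the built-in `mtcars` data as a stand-in for a Bayesian model. It is not how `kfold` is implemented: the names `fold_id`, `lpd`, and `elpd_manual` are made up for illustration, and the plug-in normal density used for scoring is only a rough analogue of a posterior predictive density.

```r
# Illustration only: manual K-fold CV with lm() standing in for brm().
set.seed(1)
K <- 5
N <- nrow(mtcars)

# Randomly assign each observation to one of K folds of (nearly) equal size.
fold_id <- sample(rep(seq_len(K), length.out = N))

# Pointwise log predictive densities, filled in fold by fold.
lpd <- numeric(N)
for (k in seq_len(K)) {
  held_out <- fold_id == k
  # Refit the model without the k-th fold.
  fit_k <- lm(mpg ~ wt, data = mtcars[!held_out, ])
  # Score the held-out observations with a plug-in normal density
  # (assumption: residual standard deviation taken from the refit).
  mu <- predict(fit_k, newdata = mtcars[held_out, ])
  lpd[held_out] <- dnorm(mtcars$mpg[held_out], mean = mu,
                         sd = sigma(fit_k), log = TRUE)
}

# Sum of the pointwise contributions, analogous to an ELPD estimate.
elpd_manual <- sum(lpd)
```

Each observation is predicted exactly once, by a model that never saw it, which is the property that distinguishes this from in-sample fit measures.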

The subsets can be constructed in multiple different ways:

- If both `folds` and `group` are `NULL`, the subsets are randomly chosen so that they have equal (or as close to equal as possible) size.

- If `folds` is `NULL` but `group` is specified, the data is split up into subsets, each time omitting all observations of one of the factor levels, while ignoring argument `K`.

- If `folds = "stratified"`, the subsets are stratified by `group` using `loo::kfold_split_stratified`.

- If `folds = "grouped"`, the subsets are split by `group` using `loo::kfold_split_grouped`.

- If `folds = "loo"`, exact leave-one-out cross-validation will be performed and `K` will be ignored. Further, if `group` is specified, all observations corresponding to the factor level of the currently predicted single value are omitted. Thus, in this case, the predicted values are only a subset of the omitted ones.

- If `folds` is a numeric vector, it must contain one element per observation in the data. Each element of the vector is an integer in `1:K` indicating to which of the `K` folds the corresponding observation belongs. There are some convenience functions available in the loo package that create integer vectors to use for this purpose (see the Examples section below and also the kfold-helpers page).

When running `kfold` on a `brmsfit` created with the cmdstanr backend in a different R session, several recompilations will be triggered because, by default, cmdstanr writes the model executable to a temporary directory. To avoid that, set option `"cmdstanr_write_stan_file_dir"` to a nontemporary path of your choice before creating the original `brmsfit` (see section 'Examples' below).

```
if (FALSE) {
fit1 <- brm(count ~ zAge + zBase * Trt + (1|patient) + (1|obs),
            data = epilepsy, family = poisson())

# throws warning about some pareto k estimates being too high
(loo1 <- loo(fit1))

# perform 10-fold cross-validation
(kfold1 <- kfold(fit1, chains = 1))

# use joint likelihoods per fold for ELPD evaluation
kfold(fit1, chains = 1, joint = "fold")

# use the future package for parallelization of models,
# that is, to fit models belonging to different folds in parallel
library(future)
plan(multisession, workers = 4)
kfold(fit1, chains = 1)
plan(sequential)

## to avoid recompilations when running kfold() on a 'cmdstanr'-backend fit
## in a fresh R session, set option 'cmdstanr_write_stan_file_dir' before
## creating the initial 'brmsfit'
## CAUTION: the following code creates some files in the current working
## directory: two 'model_<hash>.stan' files, one 'model_<hash>(.exe)'
## executable, and one 'fit_cmdstanr_<some_number>.rds' file
set.seed(7)
fname <- paste0("fit_cmdstanr_", sample.int(.Machine$integer.max, 1))
options(cmdstanr_write_stan_file_dir = getwd())
fit_cmdstanr <- brm(rate ~ conc + state, data = Puromycin,
                    backend = "cmdstanr", file = fname)

# now restart the R session and run the following (after attaching 'brms')
set.seed(7)
fname <- paste0("fit_cmdstanr_", sample.int(.Machine$integer.max, 1))
fit_cmdstanr <- brm(rate ~ conc + state,
                    data = Puromycin,
                    backend = "cmdstanr",
                    file = fname)
kfold_cmdstanr <- kfold(fit_cmdstanr, K = 2)
}
```