Summary statistics

We can easily customise the summary statistics reported by $summary() and $print().

fit <- cmdstanr::cmdstanr_example("schools", method = "sample")
fit$summary()
Warning: 302 of 4000 (8.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.
Warning: 1 of 4 chains had an E-BFMI less than 0.2.
See https://mc-stan.org/misc/warnings for details.
   variable  mean median  sd mad      q5 q95 rhat ess_bulk ess_tail
1      lp__ -56.7  -57.1 6.1 6.4 -66.008 -46  1.1       33       19
2        mu   6.6    6.7 4.2 4.5   0.025  13  1.0      124      879
3       tau   4.7    3.9 3.6 3.4   0.790  12  1.1       33       22
4  theta[1]   9.1    8.5 6.8 6.0  -0.082  21  1.0      163      405
5  theta[2]   7.0    6.9 5.5 5.8  -1.456  16  1.0      236     1810
6  theta[3]   5.7    6.1 6.4 6.2  -4.835  16  1.0      313     1466
7  theta[4]   6.8    6.9 5.9 5.8  -2.355  16  1.0      246     1356
8  theta[5]   4.9    5.2 5.8 5.5  -5.038  13  1.0      219      932
9  theta[6]   5.7    5.9 5.8 5.8  -4.073  15  1.0      278     1208
10 theta[7]   8.9    8.6 6.0 5.7   0.094  19  1.0      176      352
11 theta[8]   7.0    7.1 6.5 5.9  -3.088  18  1.0      317     1709

By default, all variables are summarised with the following functions:

posterior::default_summary_measures()
[1] "mean"      "median"    "sd"        "mad"       "quantile2"

To change the variables summarised, we use the variables argument.

fit$summary(variables = c("mu", "tau"))
  variable mean median  sd mad    q5 q95 rhat ess_bulk ess_tail
1       mu  6.6    6.7 4.2 4.5 0.025  13  1.0      124      879
2      tau  4.7    3.9 3.6 3.4 0.790  12  1.1       33       22

We can additionally change which functions are used.

fit$summary(variables = c("mu", "tau"), mean, sd)
  variable mean  sd
1       mu  6.6 4.2
2      tau  4.7 3.6

To summarise all variables with non-default functions, it is necessary to explicitly set the variables argument, either to NULL or to the full vector of variable names.

fit$metadata()$model_params
 [1] "lp__"     "mu"       "tau"      "theta[1]" "theta[2]" "theta[3]"
 [7] "theta[4]" "theta[5]" "theta[6]" "theta[7]" "theta[8]"
fit$summary(variables = NULL, "mean", "median")
   variable  mean median
1      lp__ -56.7  -57.1
2        mu   6.6    6.7
3       tau   4.7    3.9
4  theta[1]   9.1    8.5
5  theta[2]   7.0    6.9
6  theta[3]   5.7    6.1
7  theta[4]   6.8    6.9
8  theta[5]   4.9    5.2
9  theta[6]   5.7    5.9
10 theta[7]   8.9    8.6
11 theta[8]   7.0    7.1

Summary functions can be specified by character string, function, or formula (or anything else supported by rlang::as_function()). If these arguments are named, those names will be used as column names in the tibble output. If a summary function returns a named result, the result's names take precedence.

my_sd <- function(x) c(My_SD = sd(x))
fit$summary(
  c("mu", "tau"), 
  MEAN = mean, 
  "median",
  my_sd,
  ~quantile(.x, probs = c(0.1, 0.9)),
  Minimum = function(x) min(x)
)        
  variable MEAN median My_SD 10%  90% Minimum
1       mu  6.6    6.7   4.2 1.3 11.7  -11.23
2      tau  4.7    3.9   3.6 1.1  9.6    0.53
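
To make the three accepted forms more concrete, the following minimal sketch calls rlang::as_function() directly on a formula, a character string, and a plain function. It assumes the rlang package is installed; the names range_fn, mean_fn, min_fn and the toy vector x are hypothetical and used only for illustration.

```r
library(rlang)

x <- c(1, 5, 3)  # toy input, not posterior draws

# A one-sided formula becomes a function whose argument is available as .x
range_fn <- as_function(~ max(.x) - min(.x))
range_fn(x)

# A character string is looked up as a function name
mean_fn <- as_function("mean")
mean_fn(x)

# A function is passed through and can be called as usual
min_fn <- as_function(function(v) min(v))
min_fn(x)
```

Inside $summary(), the input to each function is the draws of a single variable rather than a toy vector.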

Arguments to all summary functions can also be specified with .args.

fit$summary(c("mu", "tau"), quantile, .args = list(probs = c(0.025, .05, .95, .975)))
  variable  2.5%    5% 95% 97.5%
1       mu -1.17 0.025  13    15
2      tau  0.59 0.790  12    13

The summary functions are applied to the array of sample values for each variable, with dimensions iter_sampling x chains.

fit$summary(variables = NULL, dim, colMeans)
   variable dim.1 dim.2     1     2     3     4
1      lp__  1000     4 -58.0 -55.7 -55.7 -57.3
2        mu  1000     4   6.9   7.5   5.4   6.8
3       tau  1000     4   5.2   4.3   4.4   4.9
4  theta[1]  1000     4  10.0   9.7   7.6   9.1
5  theta[2]  1000     4   7.1   8.0   5.8   7.2
6  theta[3]  1000     4   5.7   6.6   4.5   5.9
7  theta[4]  1000     4   7.2   7.7   5.6   6.7
8  theta[5]  1000     4   4.9   6.0   4.0   4.9
9  theta[6]  1000     4   5.7   6.7   4.8   5.7
10 theta[7]  1000     4   9.3   9.5   7.5   9.2
11 theta[8]  1000     4   7.0   8.0   5.9   7.0

For this reason, users may get unexpected results if they use stats::var() directly, as it will return a covariance matrix between chains rather than a single variance. An alternative is the distributional::variance() function, which can also be accessed via posterior::variance().

fit$summary(c("mu", "tau"), posterior::variance, ~var(as.vector(.x)))
  variable posterior::variance ~var(as.vector(.x))
1       mu                  18                  18
2      tau                  13                  13
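
The difference can be reproduced with base R alone. The sketch below uses a simulated matrix as a stand-in for one variable's draws (1000 iterations x 4 chains); the names draws_mat, cov_mat and pooled are hypothetical, and the simulated values are not related to the schools model.

```r
set.seed(1)

# Simulated stand-in for the draws of one variable: iterations x chains
draws_mat <- matrix(rnorm(4000), nrow = 1000, ncol = 4)

# var() on a matrix treats columns as variables and returns a covariance matrix
cov_mat <- var(draws_mat)
dim(cov_mat)

# Flattening to a vector first gives a single pooled variance
pooled <- var(as.vector(draws_mat))
length(pooled)
```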

Summary functions need not return numeric values, but non-numeric summaries won’t work with $print().

strict_pos <- function(x) if (all(x > 0)) "yes" else "no"
fit$summary(variables = NULL, "Strictly Positive" = strict_pos)
# fit$print(variables = NULL, "Strictly Positive" = strict_pos)
   variable Strictly Positive
1      lp__                no
2        mu                no
3       tau               yes
4  theta[1]                no
5  theta[2]                no
6  theta[3]                no
7  theta[4]                no
8  theta[5]                no
9  theta[6]                no
10 theta[7]                no
11 theta[8]                no

For more information, see posterior::summarise_draws(), which is called by $summary().
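
As a rough illustration, posterior::summarise_draws() can also be called directly on a draws object. The sketch below assumes the posterior package is installed and uses its built-in example_draws("eight_schools") object in place of fit, so that it runs without CmdStan; the names x, mu_tau and s are hypothetical.

```r
library(posterior)

# Example draws shipped with posterior (stand-in for fit$draws())
x <- example_draws("eight_schools")

# Keep only the two variables of interest
mu_tau <- subset_draws(x, variable = c("mu", "tau"))

# .args is passed to every summary function, so each one must accept na.rm
s <- summarise_draws(mu_tau, "mean", "sd", .args = list(na.rm = TRUE))
print(s)
```

Because .args is shared by all summary functions, an argument that one of them does not accept will cause an error; a formula or a small wrapper function is the safer choice in that case.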

Extracting posterior draws/samples

The $draws() method can be used to extract the posterior draws in formats provided by the posterior package. Here we demonstrate only the draws_array and draws_df formats, but the posterior package supports other useful formats as well.

# default is a 3-D draws_array object from the posterior package
# iterations x chains x variables
draws_arr <- fit$draws() # or format="array"
str(draws_arr)
 'draws_array' num [1:1000, 1:4, 1:11] -66.1 -68.2 -67.1 -62.4 -65.6 ...
 - attr(*, "dimnames")=List of 3
  ..$ iteration: chr [1:1000] "1" "2" "3" "4" ...
  ..$ chain    : chr [1:4] "1" "2" "3" "4"
  ..$ variable : chr [1:11] "lp__" "mu" "tau" "theta[1]" ...
# draws x variables data frame
draws_df <- fit$draws(format = "df")
str(draws_df)
draws_df [4,000 × 14] (S3: draws_df/draws/tbl_df/tbl/data.frame)
 $ lp__      : num [1:4000] -66.1 -68.2 -67.1 -62.4 -65.6 ...
 $ mu        : num [1:4000] -2.42 9.44 2.99 2.91 6.73 ...
 $ tau       : num [1:4000] 12.21 6.46 17.66 8.04 8.8 ...
 $ theta[1]  : num [1:4000] 5.57 11.03 -2.77 1.5 8.91 ...
 $ theta[2]  : num [1:4000] 6.97 3.31 6.77 12.84 5.79 ...
 $ theta[3]  : num [1:4000] 8.21 15.21 -8.08 -5.34 -19.54 ...
 $ theta[4]  : num [1:4000] 19.75 19.47 -7.42 -5.76 7.54 ...
 $ theta[5]  : num [1:4000] -4.12 -5.77 6.01 5.63 -3.23 ...
 $ theta[6]  : num [1:4000] -4.03 2.55 2.99 2.86 15.21 ...
 $ theta[7]  : num [1:4000] -0.186 -2.004 10.11 7.803 14.427 ...
 $ theta[8]  : num [1:4000] 0.0702 -3.005 11.0116 14.5279 14.1928 ...
 $ .chain    : int [1:4000] 1 1 1 1 1 1 1 1 1 1 ...
 $ .iteration: int [1:4000] 1 2 3 4 5 6 7 8 9 10 ...
 $ .draw     : int [1:4000] 1 2 3 4 5 6 7 8 9 10 ...
print(draws_df)
# A draws_df: 1000 iterations, 4 chains, and 11 variables
   lp__   mu  tau theta[1] theta[2] theta[3] theta[4] theta[5]
1   -66 -2.4 12.2      5.6     6.97      8.2    19.75    -4.12
2   -68  9.4  6.5     11.0     3.31     15.2    19.47    -5.77
3   -67  3.0 17.7     -2.8     6.77     -8.1    -7.42     6.01
4   -62  2.9  8.0      1.5    12.84     -5.3    -5.76     5.63
5   -66  6.7  8.8      8.9     5.79    -19.5     7.54    -3.23
6   -64  5.3 11.4     18.5    13.37     -1.4    15.97    -0.61
7   -60  7.3  9.1      8.6     7.82      3.5     0.34    -2.00
8   -60  6.3  8.5      7.7     0.51      7.5    -0.99     1.51
9   -59  1.9  6.9      3.0     8.63      1.4     3.70     4.72
10  -63  9.1  9.3     16.0     5.77      3.9     4.14   -10.34
# ... with 3990 more draws, and 3 more variables
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}

To convert an existing draws object to a different format use the posterior::as_draws_*() functions.
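
A minimal sketch of such conversions, assuming the posterior package is installed and again using its built-in example draws instead of fit$draws() (the object names are hypothetical):

```r
library(posterior)

# Built-in example draws in draws_array format
x_arr <- example_draws("eight_schools")

# Convert to a data-frame-like format and to a draws x variables matrix
x_df  <- as_draws_df(x_arr)
x_mat <- as_draws_matrix(x_arr)

# Converting back to an array recovers the iterations x chains x variables layout
x_back <- as_draws_array(x_df)

dim(x_arr)
dim(x_mat)
dim(x_back)
```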

To manipulate the draws objects use the various methods described in the posterior package vignettes and documentation.
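
For instance, subsetting and adding a derived variable might look like the sketch below. It assumes the posterior package is installed and uses its built-in example draws; log_tau and the object names are hypothetical and chosen only for illustration.

```r
library(posterior)

x <- example_draws("eight_schools")

# Keep two variables and the first two chains
sub <- subset_draws(x, variable = c("mu", "tau"), chain = 1:2)

# Add a derived variable computed draw by draw
sub <- mutate_variables(sub, log_tau = log(tau))

variables(sub)
nchains(sub)
```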