Evaluating Fitted Models

Summary

Section 6 of Gelman et al. (2020) covers how to evaluate a fitted Bayesian model through posterior predictive checks, cross validation, sensitivity analysis of priors, and graphical exploration. The goal is not just to assess fit but to understand what the model captures and misses, guiding the next iteration of Iterative Model Improvement.

Posterior Predictive Checking

Posterior predictive checking (Box, 1980; Gelman et al., 1996) generates replicated datasets $y^{rep}$ from the posterior predictive distribution:

$$y^{rep} \sim p(y^{rep} \mid y) = \int p(y^{rep} \mid \theta)\, p(\theta \mid y)\, d\theta$$

Compare $y^{rep}$ to observed data $y$ using summary statistics or visual checks. If the observed data look unrepresentative of the posterior predictive distribution, the model fails to capture some aspect of the data. See Model Checking for foundational discussion.
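As a minimal sketch of the simulation step, assume a normal model with known standard deviation `sigma` and a conjugate normal prior on the mean, so the posterior is available in closed form. The model, the data values, and the function name `posterior_predictive_draws` are illustrative assumptions, not taken from Gelman et al. (2020). In practice the draws of $θ$ would come from an MCMC fit.

```python
import math
import random

def posterior_predictive_draws(y, sigma, prior_mean, prior_sd, n_rep, rng):
    """Draw n_rep replicated datasets for y_i ~ normal(theta, sigma),
    theta ~ normal(prior_mean, prior_sd), with sigma treated as known."""
    n = len(y)
    # Conjugate update: precisions add, and the mean is precision-weighted.
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sigma**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * (sum(y) / n))
    post_sd = math.sqrt(post_var)

    reps = []
    for _ in range(n_rep):
        theta = rng.gauss(post_mean, post_sd)                     # theta ~ p(theta | y)
        reps.append([rng.gauss(theta, sigma) for _ in range(n)])  # y_rep ~ p(y_rep | theta)
    return reps

rng = random.Random(0)
y = [1.2, 0.4, 2.1, 1.7, 0.9, 1.5]  # made-up observations
y_rep = posterior_predictive_draws(
    y, sigma=1.0, prior_mean=0.0, prior_sd=2.0, n_rep=500, rng=rng
)
```

Each element of `y_rep` has the same length as `y`, so any summary computed on the observed data can be computed on every replicate and compared.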

Choosing What to Check

There is no general rule for which checks to perform. Focus on severe tests (Mayo, 2018) — checks that are likely to fail if the model would give misleading answers to the questions you care most about.

Types of checks implemented in bayesplot:

Density overlays — compare distribution of $y$ vs. $y^{rep}$
Statistic checks — compare test statistics (e.g., sd, max) between $y$ and $y^{rep}$
Grouped checks — compare $y$ vs. $y^{rep}$ by subgroups not in the model
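A statistic check of the kind listed above can be sketched as a tail proportion: the fraction of replicated datasets whose statistic is at least as large as the observed one. The helper `ppc_p_value`, the data, and the plug-in normal replicates below are assumptions for illustration; the replicates stand in for real posterior predictive draws and ignore parameter uncertainty.

```python
import random
import statistics

def ppc_p_value(y, y_rep, stat):
    """Fraction of replicated datasets with stat(y_rep) >= stat(y)."""
    t_obs = stat(y)
    return sum(stat(rep) >= t_obs for rep in y_rep) / len(y_rep)

rng = random.Random(1)
# Made-up data with one extreme value that a normal model should struggle to reproduce.
y = [0.1, -0.4, 0.3, 0.0, -0.2, 0.5, -0.1, 6.0]
n = len(y)
mu_hat, sd_hat = statistics.mean(y), statistics.stdev(y)

# Stand-in for posterior predictive draws: replicates from a normal with plug-in estimates.
y_rep = [[rng.gauss(mu_hat, sd_hat) for _ in range(n)] for _ in range(1000)]

p_sd = ppc_p_value(y, y_rep, statistics.stdev)  # sd is fitted directly, so this check is weak
p_max = ppc_p_value(y, y_rep, max)              # max probes the tail, a more severe test here
```

Because the normal model matches the sample standard deviation by construction, `p_sd` is not expected to be extreme, while `p_max` tends to fall near the tail. This illustrates why the choice of statistic matters more than the number of checks.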

Cross Validation and Influence of Data Points

Posterior predictive checking uses the same data for fitting and evaluation, which can be overly optimistic. Leave-one-out cross validation (LOO-CV) addresses this by evaluating predictive performance on held-out observations.

Three diagnostic uses of LOO-CV:

Calibration checks using the LOO predictive distribution (LOO-PIT values should be uniform under good calibration)
Identifying hard-to-predict observations — which data points have poor LOO scores?
Assessing observation influence — how much do individual points affect inferences?
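For a model simple enough to refit exactly, the first two uses can be sketched without any approximation. The example assumes $y_i \sim \text{normal}(\mu, \sigma)$ with known $\sigma$ and a flat prior on $\mu$, so the LOO predictive distribution for $y_i$ is normal with mean equal to the average of the other points. The function name and data are hypothetical.

```python
import math
from statistics import NormalDist

def loo_pit_known_sigma(y, sigma):
    """Exact LOO-PIT values for y_i ~ normal(mu, sigma) with a flat prior on mu."""
    n = len(y)
    total = sum(y)
    pits = []
    for yi in y:
        loo_mean = (total - yi) / (n - 1)              # posterior mean of mu without y_i
        pred_sd = sigma * math.sqrt(1 + 1 / (n - 1))   # noise plus remaining uncertainty in mu
        pits.append(NormalDist(loo_mean, pred_sd).cdf(yi))
    return pits

y = [0.3, -1.1, 0.8, 0.2, -0.5, 1.4, -0.2, 4.5]  # made-up data; the last point is an outlier
pits = loo_pit_known_sigma(y, sigma=1.0)

# PIT values near 0 or 1 flag observations the rest of the data predict poorly.
hardest = max(range(len(y)), key=lambda i: abs(pits[i] - 0.5))
```

With many observations, a histogram of `pits` that departs clearly from uniform would suggest miscalibration; with only eight points, the values are mainly useful for spotting individual observations.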

Efficient LOO-CV via Pareto-smoothed importance sampling (PSIS; Vehtari et al., 2017) avoids refitting the model for each held-out point. See Model Comparison for use in model selection.
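The idea of reusing a single fit can be sketched with raw importance sampling, where the importance ratio for leaving out $y_i$ is $1 / p(y_i ∣ θ_s)$. This is not PSIS: the Pareto smoothing and the accompanying diagnostic are omitted, so the estimate can be unstable for influential observations. The model (normal with known `sigma`, flat prior on the mean), the data, and the function names are assumptions for illustration.

```python
import math
import random

def normal_logpdf(x, mean, sd):
    """Log density of normal(mean, sd) at x."""
    return -0.5 * math.log(2 * math.pi * sd**2) - 0.5 * ((x - mean) / sd) ** 2

def is_loo_elpd(y, theta_draws, sigma):
    """Raw importance-sampling estimate of pointwise LOO log predictive densities.

    Uses p(y_i | y_-i) ~= 1 / mean_s(1 / p(y_i | theta_s)), with theta_s drawn
    from the full-data posterior. No Pareto smoothing is applied.
    """
    n_draws = len(theta_draws)
    elpd = []
    for yi in y:
        log_ratios = [-normal_logpdf(yi, t, sigma) for t in theta_draws]
        m = max(log_ratios)  # log-sum-exp shift for numerical stability
        log_mean_ratio = m + math.log(sum(math.exp(v - m) for v in log_ratios) / n_draws)
        elpd.append(-log_mean_ratio)
    return elpd

rng = random.Random(2)
y = [0.3, -1.1, 0.8, 0.2, -0.5, 1.4, -0.2, 1.0]  # made-up data
sigma = 1.0
n = len(y)
ybar = sum(y) / n
# Full-data posterior for the mean under a flat prior: normal(ybar, sigma / sqrt(n)).
theta_draws = [rng.gauss(ybar, sigma / math.sqrt(n)) for _ in range(4000)]

elpd_i = is_loo_elpd(y, theta_draws, sigma)
elpd_loo = sum(elpd_i)
```

For this conjugate model the exact LOO predictive density is available in closed form, which makes it a convenient way to sanity-check the estimate before relying on the same approach for a model where refitting is expensive.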

Influence of Prior Information

Understanding how priors affect the posterior is essential:

Sensitivity analysis: refit with alternative priors (e.g., $\text{normal}(0, 0.5)$ vs. $\text{normal}(0, 2)$) or use importance sampling to approximate the effect
Prior-to-posterior shrinkage: compare prior and posterior standard deviations; if the data are informative about a parameter relative to the prior, its posterior standard deviation should be clearly smaller than its prior standard deviation
Static sensitivity analysis: plot posterior simulations of a quantity of interest against individual parameters to visualize dependence without refitting (Gelman, Bois, and Jiang, 1996)
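The first two items can be illustrated with a conjugate model where refitting is a one-line formula. The sketch assumes a normal likelihood with known `sigma` and compares the two example priors on the mean; the data and the function name `normal_posterior` are made up for illustration.

```python
import math

def normal_posterior(y, sigma, prior_mean, prior_sd):
    """Posterior mean and sd of theta for y_i ~ normal(theta, sigma), theta ~ normal(prior_mean, prior_sd)."""
    n = len(y)
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sigma**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * (sum(y) / n))
    return post_mean, math.sqrt(post_var)

y = [1.2, 0.4, 2.1, 1.7, 0.9, 1.5]  # made-up observations

results = {}
for prior_sd in (0.5, 2.0):
    post_mean, post_sd = normal_posterior(y, sigma=1.0, prior_mean=0.0, prior_sd=prior_sd)
    # Fraction by which the standard deviation contracts from prior to posterior.
    contraction = 1.0 - post_sd / prior_sd
    results[prior_sd] = (post_mean, post_sd, contraction)
    print(f"prior sd {prior_sd}: posterior mean {post_mean:.3f}, sd {post_sd:.3f}, contraction {contraction:.2f}")
```

In this example the tighter prior pulls the posterior mean toward zero and shows less contraction, which is the kind of difference a sensitivity analysis is meant to surface. For models without a closed-form posterior, the same comparison would need a refit or an importance-sampling approximation.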

Summarizing Inference and Propagating Uncertainty

Bayesian inference naturally handles uncertainty propagation through Hierarchical Models and latent variables. However, standard summaries (point estimates, intervals) often fail to capture the multiple levels of variation in complex models. Graphical exploration — plotting data alongside model-based estimates — is essential for understanding model behavior.

Gabry et al. (2019) advocate for graphics in Bayesian workflow, implemented in tools like bayesplot and ArviZ.
