Model Comparison
Summary
Chapter 7 of BDA3 covers methods for evaluating, comparing, and expanding models based on predictive accuracy. The key measure is expected log predictive density (ELPD), estimated via cross-validation or information criteria.
Measures of Predictive Accuracy
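The gold standard is the expected log pointwise predictive density (ELPD) for new data:

$$
\mathrm{elpd} = \sum_{i=1}^{n} \mathrm{E}_f\left[\log p(\tilde{y}_i \mid y)\right] = \sum_{i=1}^{n} \int \log p(\tilde{y}_i \mid y)\, f(\tilde{y}_i)\, d\tilde{y}_i
$$

where $f$ is the true data-generating distribution. This must be estimated since $f$ is unknown.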
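To see why an estimate is needed, it can help to work in a simulation where the true distribution is known. The sketch below is an illustration only, under assumptions that are not from the text: the data come from a standard normal, and the model is a normal with known unit variance and a flat prior on the mean, so the posterior predictive is normal with mean `ybar` and variance `1 + 1/n`. The function names and the three data values are hypothetical.

```python
import math

def normal_logpdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def within_sample_lpd(y):
    """Log predictive density evaluated on the same data used to fit the model."""
    n = len(y)
    ybar = sum(y) / n
    pred_var = 1 + 1 / n  # posterior predictive variance (assumed model)
    return sum(normal_logpdf(yi, ybar, pred_var) for yi in y)

def true_elpd(y):
    """Exact elpd when the true distribution is N(0, 1) (simulation assumption)."""
    n = len(y)
    ybar = sum(y) / n
    pred_var = 1 + 1 / n
    # For y_new ~ N(0, 1): E[(y_new - ybar)^2] = 1 + ybar^2
    per_point = -0.5 * math.log(2 * math.pi * pred_var) - (1 + ybar ** 2) / (2 * pred_var)
    return n * per_point

y = [-1.0, 0.0, 1.0]  # made-up data for illustration
lpd = within_sample_lpd(y)
elpd = true_elpd(y)
optimism = lpd - elpd
print(f"within-sample lpd = {lpd:.3f}, true elpd = {elpd:.3f}, gap = {optimism:.3f}")
```

For this toy data set the within-sample value is higher than the true ELPD, which is the optimism that cross-validation and information criteria try to correct. With real data the second function is unavailable, because the true distribution is not known.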
Information Criteria
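- AIC: $\widehat{\mathrm{elpd}}_{\mathrm{AIC}} = \log p(y \mid \hat{\theta}_{\mathrm{mle}}) - k$, which penalizes the maximized log-likelihood by the number of parameters $k$
- DIC: replaces $k$ with an effective number of parameters $p_{\mathrm{DIC}}$
- WAIC (Widely Applicable IC): fully Bayesian, computed from the posterior:

$$
\widehat{\mathrm{elpd}}_{\mathrm{WAIC}} = \mathrm{lppd} - p_{\mathrm{WAIC}}, \qquad
\mathrm{lppd} = \sum_{i=1}^{n} \log\left(\frac{1}{S}\sum_{s=1}^{S} p(y_i \mid \theta^s)\right), \qquad
p_{\mathrm{WAIC}} = \sum_{i=1}^{n} V_{s=1}^{S}\left(\log p(y_i \mid \theta^s)\right)
$$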
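As a rough sketch of how WAIC can be computed from posterior draws: given a matrix of pointwise log-likelihoods `log_lik[s][i]` (draw `s`, observation `i`), the log pointwise predictive density (lppd) is the sum over observations of the log of the average likelihood across draws, and the penalty is the sum over observations of the sample variance of the log-likelihood across draws. The function names and the toy numbers are hypothetical, and the variance-based penalty is one of the forms of the effective number of parameters described in BDA3.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))

def sample_variance(values):
    """Unbiased sample variance (divides by S - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def waic(log_lik):
    """WAIC from log_lik[s][i] = log p(y_i | theta^s)."""
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        lppd += log_mean_exp(column)        # log of average likelihood
        p_waic += sample_variance(column)   # variance-based penalty
    elpd_waic = lppd - p_waic
    return {"lppd": lppd, "p_waic": p_waic,
            "elpd_waic": elpd_waic, "waic": -2.0 * elpd_waic}

# Made-up log-likelihood values: 3 posterior draws, 3 observations.
toy_log_lik = [
    [-1.2, -0.8, -2.5],
    [-1.0, -0.9, -3.1],
    [-1.4, -0.7, -2.0],
]
waic_result = waic(toy_log_lik)
print(waic_result)
```

In practice the matrix would come from a fitted model with many more draws; three draws are used here only to keep the example readable.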
Cross-Validation
- Leave-one-out CV (LOO-CV): gold standard but expensive, since exact computation requires refitting the model once per held-out observation
- Pareto-smoothed importance sampling (PSIS-LOO): efficient approximation using importance weights from the full posterior, implemented in the `loo` R package
  - Preferred over WAIC in practice due to better diagnostics (Pareto $\hat{k}$ diagnostic)
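The importance-sampling idea behind PSIS-LOO can be sketched without refitting the model. The example below uses plain, unsmoothed importance sampling, which is a simplification: it omits the Pareto smoothing of the weights and the $\hat{k}$ diagnostic, so it is not what the `loo` package computes. The raw importance ratio for observation `i` and draw `s` is `1 / p(y_i | theta^s)`, which gives a leave-one-out predictive density equal to the harmonic mean of the likelihood across draws. Function names and toy values are hypothetical.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))

def is_loo_pointwise(log_lik):
    """Unsmoothed importance-sampling LOO from log_lik[s][i] = log p(y_i | theta^s).

    Returns one estimated log p(y_i | y_{-i}) per observation.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    pointwise = []
    for i in range(n_obs):
        # log of the raw importance ratios 1 / p(y_i | theta^s)
        log_ratios = [-log_lik[s][i] for s in range(n_draws)]
        # harmonic mean of the likelihood, on the log scale
        pointwise.append(-log_mean_exp(log_ratios))
    return pointwise

# Made-up log-likelihood values: 3 posterior draws, 3 observations.
toy_log_lik = [
    [-1.2, -0.8, -2.5],
    [-1.0, -0.9, -3.1],
    [-1.4, -0.7, -2.0],
]
loo_pointwise = is_loo_pointwise(toy_log_lik)
elpd_loo = sum(loo_pointwise)
print(f"elpd_loo (unsmoothed IS) = {elpd_loo:.3f}")
```

Raw importance ratios can have very heavy tails when an observation is influential, which is the problem the Pareto smoothing step is meant to stabilize and the $\hat{k}$ diagnostic is meant to flag.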
Bayes Factors
A Bayes factor compares two models $H_1$ and $H_2$ by the ratio of their marginal likelihoods:

$$
\mathrm{BF}(H_2; H_1) = \frac{p(y \mid H_2)}{p(y \mid H_1)} = \frac{\int p(\theta_2 \mid H_2)\, p(y \mid \theta_2, H_2)\, d\theta_2}{\int p(\theta_1 \mid H_1)\, p(y \mid \theta_1, H_1)\, d\theta_1}
$$
Warning
Bayes factors are sensitive to the prior, especially for vague priors. BDA3 generally recommends predictive approaches (LOO, WAIC) over Bayes factors for model comparison.
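The prior sensitivity can be illustrated with a small worked example. The setup is an assumption chosen for convenience, not taken from BDA3: `n` observations from a normal with known standard deviation `sigma`, a point-null model with mean zero, and an alternative model with a normal prior of mean zero and standard deviation `tau` on the mean. Because the sample mean is sufficient, the Bayes factor reduces to a ratio of two normal densities for `ybar`. All names and numbers below are hypothetical.

```python
import math

def normal_logpdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def log_bf_null_vs_alt(ybar, n, tau, sigma=1.0):
    """Log Bayes factor in favor of the point null (mean = 0) over
    the alternative with prior mean ~ N(0, tau^2)."""
    se2 = sigma ** 2 / n
    log_marginal_null = normal_logpdf(ybar, 0.0, se2)             # ybar | null
    log_marginal_alt = normal_logpdf(ybar, 0.0, se2 + tau ** 2)   # ybar | alternative
    return log_marginal_null - log_marginal_alt

ybar, n = 0.3, 50  # made-up summary of the data
log_bfs = {tau: log_bf_null_vs_alt(ybar, n, tau) for tau in (1.0, 10.0, 100.0)}
for tau, log_bf in log_bfs.items():
    print(f"tau = {tau:6.1f}  log BF (null vs alt) = {log_bf:.3f}")
```

The data are identical in every row, yet widening the prior under the alternative keeps shifting the Bayes factor toward the null. The posterior for the mean under the alternative model would barely change over the same range of `tau`, which is one reason predictive criteria are often easier to interpret.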
See Also
- Model Checking — qualitative model assessment via posterior predictive checks
- Overfitting and Information Criteria — the bias-variance tradeoff that information criteria are designed to manage
- Iterative Model Improvement — multiverse analysis and stacking as workflow tools (Section 8 of Bayesian Workflow)
- Evaluating Fitted Models — workflow perspective on diagnosing model fit
- Decision Analysis — using predictive distributions to make decisions, not just compare models
- Nonparametric Models Overview — flexible model classes where comparison tools are especially important