Missing Data Models
Summary
Chapter 18 of BDA3 presents the Bayesian framework for handling missing data. Multiple imputation — drawing multiple plausible completions of the data from the posterior predictive distribution — propagates missing-data uncertainty into final inferences.
Missing Data Mechanisms
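- MCAR (Missing Completely At Random): the probability of missingness is independent of both the observed and the missing values
- MAR (Missing At Random): missingness may depend on observed values but not on the missing values themselves; together with distinct parameters for the data model and the missingness model, this makes the mechanism ignorable
- MNAR (Missing Not At Random): missingness depends on the missing values themselves, even after conditioning on the observed values; the mechanism is nonignorable and must be modeled explicitly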
Multiple Imputation
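- Draw $M$ completed datasets by sampling $y_{\text{mis}}^{(m)} \sim p(y_{\text{mis}} \mid y_{\text{obs}})$, $m = 1, \dots, M$
- Analyze each completed dataset separately, giving an estimate $\hat\theta_m$ with variance $W_m$
- Combine results using Rubin’s rules:
- Point estimate: $\bar\theta = \frac{1}{M}\sum_{m=1}^{M} \hat\theta_m$
- Variance: $T = W + \left(1 + \frac{1}{M}\right) B$, where $W = \frac{1}{M}\sum_{m=1}^{M} W_m$ is the within-imputation variance and $B = \frac{1}{M-1}\sum_{m=1}^{M} (\hat\theta_m - \bar\theta)^2$ is the between-imputation variance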
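The combining step fits in a few lines. This is a minimal sketch that assumes each completed-data analysis returns a scalar estimate $\hat\theta_m$ and its squared standard error $W_m$. The function name `rubin_combine` is hypothetical.

```python
import statistics

def rubin_combine(estimates, variances):
    """Pool M completed-data analyses with Rubin's rules.

    estimates: point estimates theta_hat_m, one per completed dataset
    variances: their squared standard errors W_m
    Returns (theta_bar, T, W, B).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    theta_bar = statistics.fmean(estimates)  # pooled point estimate
    w = statistics.fmean(variances)          # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (divisor M - 1)
    t = w + (1 + 1 / m) * b                  # total variance
    return theta_bar, t, w, b

# Toy input: three completed-data estimates with equal standard errors.
theta_bar, T, W, B = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

Here $\bar\theta = 2$, $W = 0.5$, $B = 1$, and $T = 0.5 + \tfrac{4}{3} \approx 1.83$. The $(1 + 1/M)$ factor inflates $B$ to account for using a finite number of imputations.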
Tip
In a fully Bayesian analysis, missing data are simply additional unknown parameters — they are sampled alongside model parameters in each MCMC iteration. Multiple imputation approximates this for non-Bayesian analyses.
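The Tip can be illustrated with a data-augmentation Gibbs sampler for a toy normal model. The sketch assumes a known σ, a flat prior on μ, and an ignorable mechanism. The function name and data are hypothetical. Each iteration draws μ given the completed data, then redraws every missing value given μ.

```python
import random
import statistics

def gibbs_missing_normal(y, sigma=1.0, n_iter=5000, burn_in=500, seed=0):
    """Data-augmentation Gibbs sampler for y_i ~ N(mu, sigma^2), flat prior on mu.

    Entries of y that are None are treated as unknowns and redrawn every
    iteration alongside mu. sigma is assumed known to keep the sketch short.
    Returns the post-burn-in draws of mu.
    """
    rng = random.Random(seed)
    n = len(y)
    missing = [i for i, v in enumerate(y) if v is None]
    completed = [0.0 if v is None else float(v) for v in y]  # arbitrary start for missing entries
    draws = []
    for t in range(n_iter):
        # 1. mu | completed data ~ N(mean(completed), sigma^2 / n)
        mu = rng.gauss(statistics.fmean(completed), sigma / n ** 0.5)
        # 2. each missing y_i | mu ~ N(mu, sigma^2), valid because the mechanism is ignorable
        for i in missing:
            completed[i] = rng.gauss(mu, sigma)
        if t >= burn_in:
            draws.append(mu)
    return draws

y = [1.2, None, 0.8, 1.5, None, 0.9]
draws = gibbs_missing_normal(y)
post_mean = statistics.fmean(draws)
post_sd = statistics.stdev(draws)
```

Because the mechanism is ignorable, the μ draws should match the observed-data posterior $N(1.1, 0.5^2)$ up to Monte Carlo error. The imputed values are a by-product of the chain. Under MNAR, step 2 would instead need a model for the missingness mechanism.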
Key Applications
- Polls with missing demographic data: imputing covariates for poststratification
- Counted data: handling partially observed counts (e.g., election data with missing precincts)
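The covariate-imputation use case can be sketched with a toy poll. Everything in this example is hypothetical:

- two demographic groups with assumed known population shares;
- a binary response that is always observed;
- a group label that is missing more often for "yes" respondents, which makes it MAR given the response;
- an imputation model $P(g = 1 \mid y)$ with a uniform Beta prior.

Each completed dataset is poststratified, and the results are pooled with Rubin's rules.

```python
import random
import statistics

rng = random.Random(42)
POP_SHARE = [0.6, 0.4]  # hypothetical known population shares of groups 0 and 1
TRUE_P = [0.3, 0.7]     # hypothetical support within each group; population truth = 0.46

# Simulated poll: group 1 is oversampled, and the group label is missing
# more often when y == 1 (MAR given y).
poll = []
for _ in range(4000):
    g = 1 if rng.random() < 0.6 else 0
    y = 1 if rng.random() < TRUE_P[g] else 0
    g_obs = None if rng.random() < (0.4 if y == 1 else 0.1) else g
    poll.append((g_obs, y))

def poststratify(rows):
    """Poststratified estimate and approximate variance from complete (g, y) rows."""
    est, var = 0.0, 0.0
    for g in (0, 1):
        ys = [y for gg, y in rows if gg == g]
        p = statistics.fmean(ys)
        est += POP_SHARE[g] * p
        var += POP_SHARE[g] ** 2 * p * (1 - p) / len(ys)
    return est, var

def impute_once(rows):
    """Draw q_y = P(g=1 | y) from its Beta posterior, then draw each missing g."""
    q = {}
    for y in (0, 1):
        ones = sum(1 for g, yy in rows if yy == y and g == 1)
        zeros = sum(1 for g, yy in rows if yy == y and g == 0)
        q[y] = rng.betavariate(1 + ones, 1 + zeros)  # uniform Beta(1, 1) prior
    return [(g if g is not None else int(rng.random() < q[y]), y) for g, y in rows]

# Multiple imputation, then Rubin's rules.
M = 20
results = [poststratify(impute_once(poll)) for _ in range(M)]
estimates = [e for e, _ in results]
theta_bar = statistics.fmean(estimates)
W = statistics.fmean(v for _, v in results)
B = statistics.variance(estimates)
T = W + (1 + 1 / M) * B

# For comparison: dropping rows with a missing group label biases the estimate,
# because missingness depends on y.
cc_est, _ = poststratify([r for r in poll if r[0] is not None])
```

Drawing $q_y$ afresh for each completed dataset makes the imputation "proper": uncertainty in the imputation model feeds into $B$. Reusing a single point estimate of $q_y$ would understate the between-imputation variance.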
See Also
- Data Collection Models — ignorability conditions (Ch 8)
- Hierarchical Models — hierarchical imputation models
- Missing Data - Statistical Rethinking — DAG-based treatment of missing data mechanisms from Statistical Rethinking (companion note)
- MCMC Basics — in a fully Bayesian analysis, missing values are sampled alongside model parameters in each MCMC iteration