Data Collection Models
Summary
Chapter 8 of BDA3 addresses how the data collection process affects Bayesian inference. The key concept is ignorability: the condition under which the data collection mechanism can be left out of the likelihood without changing the posterior for the parameters of interest.
Ignorability
A data collection mechanism is ignorable if both of the following hold:
- Missing at random (MAR): the inclusion/missingness mechanism depends only on observed data, not on the values that go unobserved
- Distinct parameters (parameter distinctness): the parameters of the data model and of the inclusion model are a priori independent
When the mechanism is ignorable, inference can proceed from the observed-data likelihood alone, without modeling the selection process.
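The two conditions can be written out as a short derivation. The notation below is an assumption introduced for this sketch rather than taken from the note: y = (y_obs, y_mis) is the complete data, I the inclusion indicators, x fully observed covariates, θ the data-model parameters, and φ the inclusion-model parameters.

```latex
% Notation is assumed for illustration: y = (y_obs, y_mis), I = inclusion
% indicators, x = fully observed covariates, \theta = data-model parameters,
% \phi = inclusion-model parameters.
\begin{align*}
p(y_{\mathrm{obs}}, I \mid x, \theta, \phi)
  &= \int p(y \mid x, \theta)\, p(I \mid x, y, \phi)\, dy_{\mathrm{mis}} \\
  &= p(I \mid x, y_{\mathrm{obs}}, \phi) \int p(y \mid x, \theta)\, dy_{\mathrm{mis}}
     && \text{(MAR)} \\
  &= p(I \mid x, y_{\mathrm{obs}}, \phi)\, p(y_{\mathrm{obs}} \mid x, \theta), \\[4pt]
p(\theta \mid x, y_{\mathrm{obs}}, I)
  &\propto p(\theta \mid x)\, p(y_{\mathrm{obs}} \mid x, \theta)
     && \text{(distinct parameters: } p(\phi \mid x, \theta) = p(\phi \mid x)\text{)}
\end{align*}
```

Under MAR the inclusion term no longer involves y_mis and moves outside the integral. With distinct parameters it then integrates to a factor that is constant in θ, so the posterior for θ matches the one obtained from the observed-data likelihood alone.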
Applications
- Sample surveys: design weights and poststratification for non-representative samples
- Designed experiments: randomization ensures ignorability — connects to The Experimental Ideal
- Observational studies: ignorability is an assumption, not guaranteed — relates to The Selection Problem and Conditional Independence Assumption
- Censoring and truncation: require explicit modeling when the mechanism is not ignorable
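As an illustration of the censoring case, here is a minimal sketch under assumptions that are not in the note: exponential lifetimes, a known fixed censoring point, and a conjugate Gamma(a, b) prior on the rate. The function names are hypothetical. It contrasts a posterior that includes the censored units' survival terms with a naive estimate that drops censored units.

```python
import random


def simulate_censored(n, rate, cutoff, seed=0):
    """Draw exponential lifetimes and right-censor them at a known cutoff.

    Returns the recorded times and a parallel list of flags that are True
    when the lifetime was observed exactly and False when it was censored.
    """
    rng = random.Random(seed)
    times, observed = [], []
    for _ in range(n):
        t = rng.expovariate(rate)
        if t > cutoff:
            times.append(cutoff)   # only "lifetime exceeds cutoff" is known
            observed.append(False)
        else:
            times.append(t)
            observed.append(True)
    return times, observed


def gamma_posterior(times, observed, a=1.0, b=1.0):
    """Posterior Gamma(shape, rate) for the exponential rate.

    Exact observations contribute rate * exp(-rate * t) and censored ones
    contribute the survival term exp(-rate * cutoff), so the likelihood is
    rate**d * exp(-rate * sum(times)) with d the number of exact values.
    The Gamma(a, b) prior is an assumption made for this sketch.
    """
    d = sum(observed)
    return a + d, b + sum(times)


def naive_rate(times, observed):
    """Rate estimate that discards censored units (ignores the mechanism)."""
    exact = [t for t, seen in zip(times, observed) if seen]
    return len(exact) / sum(exact)


if __name__ == "__main__":
    times, observed = simulate_censored(n=20000, rate=0.5, cutoff=2.0, seed=1)
    shape, post_rate = gamma_posterior(times, observed)
    print("posterior mean of rate:", shape / post_rate)
    print("naive estimate of rate:", naive_rate(times, observed))
```

In this setup the naive estimate overstates the rate because long lifetimes are systematically dropped, while the posterior mean from the censored-data likelihood stays close to the simulated rate.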
Connection to Causal Inference
The ignorability concept directly parallels the unconfoundedness assumption in causal inference. When treatment assignment is not ignorable (it depends on unobserved potential outcomes, or on unobserved variables related to them), observational estimates are generally biased; see Activity Bias in Advertising for a dramatic example.
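A small simulation can make the parallel concrete. This is a hedged sketch with invented names and an invented data-generating process, not an example from BDA3 or from the linked notes: an unobserved variable `u` drives both treatment uptake and the outcome, and the difference in means is compared under self-selected versus randomized assignment.

```python
import random


def difference_in_means(n, true_effect, randomized, seed=0):
    """Simulate outcomes and return mean(treated) - mean(control).

    u is an unobserved driver of the outcome. When randomized is False,
    assignment also depends on u, so it is not ignorable; when True,
    assignment is a fair coin flip and is ignorable by design.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        if randomized:
            t = rng.random() < 0.5
        else:
            t = (u + rng.gauss(0.0, 1.0)) > 0.0  # self-selection on u
        y = u + (true_effect if t else 0.0) + rng.gauss(0.0, 1.0)
        (treated if t else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)


if __name__ == "__main__":
    print("self-selected:", difference_in_means(20000, 1.0, randomized=False, seed=1))
    print("randomized:   ", difference_in_means(20000, 1.0, randomized=True, seed=1))
```

With self-selection the treated group has systematically higher `u`, so the raw comparison overstates the simulated effect; under randomization the same comparison lands near it.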
See Also
- Missing Data Models — explicit treatment of missing data (Ch 18)
- Omitted Variables Bias — what happens when ignorability fails
- Observational vs Experimental Methods in Advertising — observational methods failing
- Instrumental Variables — IV relies on exogenous variation from an instrument whose assignment is ignorable, rather than on conditioning to make treatment assignment ignorable
- The Selection Problem — the frequentist framing of the same challenge ignorability addresses
- Regression and the CEF — regression as an estimator when the data collection mechanism is ignorable
- Differences-in-Differences — panel fixed effects as an alternative when ignorability fails for observational studies
- Counterfactual Inference — counterfactual prediction relies on the same ignorability assumption (pre-COVID model applied forward)
- Spurious Association and Confounds — unobserved confounds are a central case in which the data collection (assignment) mechanism is not ignorable