General Structure of Bayesian Causal Inference
Summary
Bayesian causal inference treats the missing potential outcomes as parameters to be imputed. The full-data likelihood factorizes into three components: the assignment mechanism, the outcome model, and the covariate model. Under ignorability and prior independence (Assumption 3.2), inference for causal estimands depends only on the outcome model — but the propensity score model and design stage remain essential for valid inference.
Overview
Because causal inference involves missing potential outcomes, it is inherently a missing data problem. The Bayesian paradigm naturally handles missing data through imputation from the posterior predictive distribution — making it particularly well-suited for causal inference.
This section follows §3 of Li, Ding, and Mealli (2022), which reviews the general Bayesian framework first outlined by Rubin (1978).
Setup
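For each unit $i$, four quantities are associated: $\{Y_i(0), Y_i(1), Z_i, X_i\}$, where $Y_i^{\mathrm{obs}} = Y_i(Z_i)$ is observed and $Y_i^{\mathrm{mis}} = Y_i(1 - Z_i)$ is missing.
Bayesian inference views all these quantities as random variables and specifies a model for them. Under the Bayesian model, the random variables for each unit are i.i.d. across units, governed by a parameter $\theta$.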
Full-Data Likelihood Factorization
Definition: Full-Data Likelihood Factorization
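The joint distribution of the full data (observed and missing) for each unit factorizes as:
$$\Pr(Y_i(0), Y_i(1), Z_i, X_i \mid \theta) = \Pr(Z_i \mid Y_i(0), Y_i(1), X_i; \theta_Z)\,\Pr(Y_i(0), Y_i(1) \mid X_i; \theta_Y)\,\Pr(X_i; \theta_X)$$
(under ignorability, the assignment mechanism further reduces to $\Pr(Z_i \mid X_i; \theta_Z)$, the propensity score model)
The three terms represent:
- Assignment mechanism model: $\Pr(Z_i \mid Y_i(0), Y_i(1), X_i; \theta_Z)$, which is the propensity score model under ignorability
- Potential outcome model: $\Pr(Y_i(0), Y_i(1) \mid X_i; \theta_Y)$, the outcome model
- Covariate model: $\Pr(X_i; \theta_X)$, usually replaced by the empirical distribution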
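The factorization can be read as a recipe for generating full data: covariates first, then both potential outcomes given covariates, then treatment given covariates. The sketch below follows that order under ignorability. The specific distributions and coefficients (a single normal covariate, linear outcome means, a logistic propensity score) and the function name `simulate_full_data` are illustrative assumptions, not part of the source.

```python
import numpy as np

def simulate_full_data(n, seed=0):
    """Draw (X, Y(0), Y(1), Z) in the order suggested by the factorization."""
    rng = np.random.default_rng(seed)
    # Covariate model Pr(X; theta_X): one standard-normal covariate (assumed).
    x = rng.normal(size=n)
    # Outcome model Pr(Y(0), Y(1) | X; theta_Y): linear means, unit-level
    # effects centered at 1 (assumed).
    y0 = rng.normal(0.5 * x, 1.0)
    y1 = y0 + 1.0 + rng.normal(0.0, 0.5, size=n)
    # Assignment model under ignorability, Pr(Z | X; theta_Z):
    # a logistic propensity score (assumed).
    e = 1.0 / (1.0 + np.exp(-0.8 * x))
    z = rng.binomial(1, e)
    # Only one potential outcome per unit is observed; the other is missing.
    y_obs = np.where(z == 1, y1, y0)
    y_mis = np.where(z == 1, y0, y1)
    return {"x": x, "z": z, "y_obs": y_obs, "y_mis": y_mis, "y0": y0, "y1": y1}

data = simulate_full_data(1000)
# The SATE is computable here only because both outcomes were simulated.
sate = np.mean(data["y1"] - data["y0"])
print(round(sate, 2))
```

In a real analysis only `x`, `z`, and `y_obs` would be available, which is why the missing outcomes have to be imputed from the outcome model.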
Prior Independence Assumption
Assumption 3.2 — Prior Independence
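The parameters for the assignment mechanism $\theta_Z$, outcome $\theta_Y$, and covariates $\theta_X$ are a priori distinct and independent:
$$p(\theta_Z, \theta_Y, \theta_X) = p(\theta_Z)\,p(\theta_Y)\,p(\theta_X)$$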
This assumption is:
- Unique to Bayesian causal inference — imposed primarily for computational convenience
- Potentially problematic in high dimensions: it can act as a strongly informative prior (prior dogmatism), because independent priors on $\theta_Z$ and $\theta_Y$ implicitly constrain the marginal distributions of outcomes in each treatment group
Prior Dogmatism
In high dimensions, prior independence (Assumption 3.2) effectively acts as a strongly informative prior, and increasingly so as the number of covariates grows. This is the Bayesian analogue of regularization-induced confounding (§4). Understanding why this happens is crucial for designing priors that avoid it.
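One stylized way to see the concentration, offered as an illustration rather than the argument in the source: suppose the outcome and propensity models have linear predictors with coefficient vectors `beta` and `gamma` (hypothetical names) over standardized, independent covariates. The cosine between the two vectors then measures how aligned the predictors are, which is what drives confounding. Drawing the two vectors from independent priors and tracking the prior spread of that cosine as the dimension `p` grows gives a rough picture.

```python
import numpy as np

def alignment_prior_sd(p, n_draws=2000, seed=0):
    """Prior standard deviation of cos(beta, gamma) under independent
    standard-normal priors on two p-dimensional coefficient vectors."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=(n_draws, p))    # outcome-model coefficients
    gamma = rng.normal(size=(n_draws, p))   # propensity coefficients, independent of beta
    cos = (beta * gamma).sum(axis=1) / (
        np.linalg.norm(beta, axis=1) * np.linalg.norm(gamma, axis=1)
    )
    return cos.std()

sds = {p: alignment_prior_sd(p) for p in (2, 10, 100, 1000)}
for p, sd in sds.items():
    print(p, round(sd, 3))
```

The spread shrinks roughly like $1/\sqrt{p}$, so in high dimensions the independent priors place almost all their mass on "nearly no alignment", a strong statement that the analyst may not have intended to make.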
Ignorability of the Propensity Score
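Under Assumptions 2.1 (ignorability) and 3.2 (prior independence), the observed-data likelihood based on the factorization becomes:
$$\prod_{i=1}^{N} \Pr(Z_i \mid X_i; \theta_Z) \times \prod_{i: Z_i = 1} \Pr(Y_i(1) \mid X_i; \theta_Y) \prod_{i: Z_i = 0} \Pr(Y_i(0) \mid X_i; \theta_Y) \times \prod_{i=1}^{N} \Pr(X_i; \theta_X)$$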
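Key result: the propensity score model is ignorable. The likelihood and the prior both factor across the assignment, outcome, and covariate parameters, so the posterior of the outcome parameters $\theta_Y$ does not involve the assignment parameters $\theta_Z$, and the propensity score model does not enter inference for the causal estimands considered here.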
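The step behind this result can be written out. As a sketch, write $O$ for the observed data and $\theta_Z$, $\theta_Y$, $\theta_X$ for the assignment, outcome, and covariate parameters (notation assumed here). Multiplying the observed-data likelihood by the independent priors gives

$$
\begin{aligned}
p(\theta_Z, \theta_Y, \theta_X \mid O) \;\propto\;
& \Big[\, p(\theta_Z) \prod_{i=1}^{N} \Pr(Z_i \mid X_i; \theta_Z) \Big] \\
\times\; & \Big[\, p(\theta_Y) \prod_{i: Z_i = 1} \Pr(Y_i(1) \mid X_i; \theta_Y) \prod_{i: Z_i = 0} \Pr(Y_i(0) \mid X_i; \theta_Y) \Big] \\
\times\; & \Big[\, p(\theta_X) \prod_{i=1}^{N} \Pr(X_i; \theta_X) \Big].
\end{aligned}
$$

Each bracket involves only one block of parameters, so the three blocks are independent a posteriori, and $p(\theta_Y \mid O)$ is proportional to the middle bracket alone. Any estimand that is a function of $\theta_Y$ can therefore be studied without fitting the propensity score model.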
The same ignorability argument applies to:
- The covariate model (in most settings)
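- Estimands that depend only on $\theta_Y$: PATE, CATE, MATE
Exception: the SATE involves both observed and missing potential outcomes ($Y_i^{\mathrm{obs}}$ and $Y_i^{\mathrm{mis}}$), so it is not a function of the parameters alone and requires imputation via the outcome model and data augmentation.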
Posterior Inference for Causal Effects
Example 3.1 — Covariate Adjustment in a Randomized Experiment
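Model potential outcomes as bivariate normal for each unit:
$$\begin{pmatrix} Y_i(1) \\ Y_i(0) \end{pmatrix} \Big|\; X_i, \theta_Y \;\sim\; N\!\left( \begin{pmatrix} \beta_1^\top X_i \\ \beta_0^\top X_i \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_0 \\ \rho\,\sigma_1\sigma_0 & \sigma_0^2 \end{pmatrix} \right)$$
- PATE: $\tau^{P} = E(X_i)^\top(\beta_1 - \beta_0)$, which depends only on $\theta_Y$ and $\theta_X$ (not on $\theta_Z$)
- SATE: $\tau^{S} = N^{-1}\sum_{i=1}^{N}\{Y_i(1) - Y_i(0)\}$, also independent of $\theta_Z$
- MATE: $\tau^{M} = \bar{X}^\top(\beta_1 - \beta_0)$ with $\bar{X} = N^{-1}\sum_{i=1}^{N} X_i$; as with the SATE, independent of $\theta_Z$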
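For the SATE, posterior inference requires specifying the correlation $\rho$ between the two potential outcomes. Because $Y_i(1)$ and $Y_i(0)$ are never observed together, the data carry no information about $\rho$, so the posterior distribution of the SATE is sensitive to the prior on $\rho$.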
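Bayesian inference for the SATE is more complex because it depends on $\theta_Y$ and the missing potential outcomes $Y^{\mathrm{mis}}$ jointly, involving both observed and missing quantities. The most common sampling strategy is data augmentation: iteratively simulate $\theta_Y$ from its posterior given the completed data and impute $Y^{\mathrm{mis}}$ given $\theta_Y$ and the observed data, then derive the posterior for any estimand from the draws.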
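The two alternating steps can be sketched for a deliberately simplified case. The code below assumes no covariates, unit variances for both potential outcomes, a fixed correlation `rho` supplied by the analyst, and flat priors on the two means. These simplifications, and the function name `sate_data_augmentation`, are assumptions made for illustration, not the source's specification.

```python
import numpy as np

def sate_data_augmentation(y_obs, z, rho, n_draws=5000, burn_in=500, seed=0):
    """Gibbs sampler for the SATE: bivariate normal potential outcomes with
    unit variances, fixed correlation rho, flat priors on the means."""
    rng = np.random.default_rng(seed)
    n = len(y_obs)
    treated = z == 1
    cov = np.array([[1.0, rho], [rho, 1.0]])  # Cov of (Y(1), Y(0)), assumed known
    # Start the means (mu1, mu0) at the observed group averages.
    mu = np.array([y_obs[treated].mean(), y_obs[~treated].mean()])
    draws = []
    for it in range(n_draws + burn_in):
        # Imputation step: draw each missing outcome from its conditional
        # normal given the observed outcome and the current means.
        mu_obs = np.where(treated, mu[0], mu[1])
        mu_mis = np.where(treated, mu[1], mu[0])
        y_mis = rng.normal(mu_mis + rho * (y_obs - mu_obs), np.sqrt(1.0 - rho**2))
        y1 = np.where(treated, y_obs, y_mis)
        y0 = np.where(treated, y_mis, y_obs)
        # Posterior step: with a flat prior and known covariance,
        # mu | completed data ~ N(sample mean, cov / n).
        mu = rng.multivariate_normal([y1.mean(), y0.mean()], cov / n)
        if it >= burn_in:
            draws.append(np.mean(y1 - y0))  # SATE from the completed data
    return np.array(draws)

# Simulated randomized experiment with a true effect of 1 (illustrative only).
rng = np.random.default_rng(1)
n = 200
z = rng.permutation(np.repeat([1, 0], n // 2))
y_obs = rng.normal(np.where(z == 1, 1.0, 0.0), 1.0)

draws_lo = sate_data_augmentation(y_obs, z, rho=0.0)
draws_hi = sate_data_augmentation(y_obs, z, rho=0.8)
print(draws_lo.mean(), draws_lo.std())
print(draws_hi.mean(), draws_hi.std())
```

Both runs use identical data, so any difference in the spread of the SATE draws comes entirely from the assumed `rho`. That is the sense in which the SATE posterior depends on an association parameter the data cannot inform.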
Identifiability in the Bayesian Framework
In Bayesian inference, identifiability has a different character than in the frequentist paradigm:
- A parameter is identified if any two distinct values give different distributions of the observed data
- Under the Bayesian paradigm, even non-identified parameters (like $\rho$ above) have proper posterior distributions, because the prior provides information
- Gustafson (2015) proposed: a parameter is weakly/partially identifiable if a large region of its posterior is flat, or its posterior depends crucially on the prior even with large samples
This blurring of identifiability motivates transparent parametrization: separate identifiable from non-identifiable parameters, treating the latter as sensitivity parameters (see Sensitivity Analysis in Observational Studies).
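A short worked case shows what "the posterior depends crucially on the prior" means for the correlation between potential outcomes. As a sketch, suppose the outcome model is bivariate normal with correlation $\rho$ and remaining outcome parameters $\psi$ (means and variances), and suppose the prior factors as $p(\rho, \psi) = p(\rho)\,p(\psi)$; both the notation and the prior factorization are assumptions made here. Since each unit reveals only one potential outcome, the observed-data likelihood involves only the two marginal distributions and is free of $\rho$:

$$
p(\rho, \psi \mid O) \;\propto\; p(\rho)\, p(\psi)\, L(\psi; O)
\quad\Longrightarrow\quad
p(\rho \mid O) = p(\rho).
$$

The posterior of $\rho$ equals its prior at every sample size, which is why $\rho$ is naturally treated as a sensitivity parameter rather than something to be estimated.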
Bayesian Bootstrap
An alternative general-purpose approach is the Bayesian bootstrap (Rubin, 1981). It simulates the posterior distribution of a parameter by placing Dirichlet-distributed weights on the observed units, which corresponds to a noninformative limit of a nonparametric Dirichlet process prior. It can incorporate IPW and doubly robust estimators into Bayesian inference by treating them as M-estimation problems. However, it does not capitalize on the main strengths of Bayesian inference: versatile priors and a unified framework.
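A minimal sketch of the idea for a normalized (Hájek-type) IPW estimator follows. It assumes the propensity scores `e` are known, so uncertainty in an estimated propensity model is ignored. The function name `bayesian_bootstrap_ipw` and the simulated data are hypothetical.

```python
import numpy as np

def bayesian_bootstrap_ipw(y, z, e, n_draws=2000, seed=0):
    """Bayesian bootstrap draws of a normalized IPW estimate of the ATE.
    Each draw reweights the units with Dirichlet(1, ..., 1) weights."""
    rng = np.random.default_rng(seed)
    w = rng.dirichlet(np.ones(len(y)), size=n_draws)  # shape (n_draws, n)
    w1 = w * (z / e)                # weighted inverse-probability weights, treated
    w0 = w * ((1 - z) / (1 - e))    # weighted inverse-probability weights, control
    mu1 = (w1 * y).sum(axis=1) / w1.sum(axis=1)
    mu0 = (w0 * y).sum(axis=1) / w0.sum(axis=1)
    return mu1 - mu0

# Simulated observational data with a true effect of 1 (illustrative only).
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-0.5 * x))   # propensity score, treated as known
z = rng.binomial(1, e)
y = 1.0 * z + x + rng.normal(size=n)

draws = bayesian_bootstrap_ipw(y, z, e)
lo, hi = np.quantile(draws, [0.025, 0.975])
print(draws.mean(), lo, hi)
```

The draws can be summarized like any posterior sample. Note that no prior on an outcome model enters anywhere, which is the limitation the paragraph above points to.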
Connections
- Bayesian Outcome Models: specifying $\Pr(Y_i(0), Y_i(1) \mid X_i; \theta_Y)$ in detail
- Propensity Score in Bayesian CI — why the propensity score drops from the likelihood but still matters
- Potential Outcomes Framework — ignorability assumptions that enable this factorization
- Causal Estimands — SATE vs. PATE vs. MATE distinctions in posterior inference
See Also
- Sensitivity Analysis in Observational Studies — transparent parametrization for non-identified parameters
- Bayesian Propensity Scores and IPW — practical Bayesian IPW approaches