General Structure of Bayesian Causal Inference

Summary

Bayesian causal inference treats the missing potential outcomes as unobserved random variables to be imputed from their posterior predictive distribution. The full-data likelihood factorizes into three components: the assignment mechanism, the outcome model, and the covariate model. Under ignorability and prior independence (Assumption 3.2), inference for causal estimands depends only on the outcome model, but the propensity score model and the design stage remain essential for valid inference.

Overview

Because causal inference involves missing potential outcomes, it is inherently a missing data problem. The Bayesian paradigm naturally handles missing data through imputation from the posterior predictive distribution — making it particularly well-suited for causal inference.

This section (§3 of Li et al. 2022) reviews the general Bayesian framework first outlined by Rubin (1978), as presented by Li, Ding, and Mealli (2022).

Setup

For each unit $i$, four quantities are associated: $\{Y_{i} (0), Y_{i} (1), Z_{i}, X_{i}\}$. The assignment vector is $Z = (Z_{1}, \dots, Z_{N})^{T}$; for each unit the outcome $Y_{i} (Z_{i})$ is observed and $Y_{i} (1 - Z_{i})$ is missing.

Bayesian inference views all these quantities as random variables and specifies a model for them. Under the Bayesian model, the random variables for each unit are i.i.d. draws from a distribution governed by a parameter $θ = (θ_{Z}, θ_{Y}, θ_{X})$.

Full-Data Likelihood Factorization

Definition: Full-Data Likelihood Factorization

The joint distribution of the full data (observed and missing) for each unit $i$ factorizes as:
$Pr (Y_{i} (0), Y_{i} (1), Z_{i}, X_{i} ∣ θ) = Pr (Z_{i} ∣ Y_{i} (0), Y_{i} (1), X_{i}; θ_{Z}) \cdot Pr (Y_{i} (0), Y_{i} (1) ∣ X_{i}; θ_{Y}) \cdot Pr (X_{i}; θ_{X})$
Under ignorability, the assignment mechanism $Pr (Z_{i} ∣ Y_{i} (0), Y_{i} (1), X_{i}; θ_{Z})$ reduces to $Pr (Z_{i} ∣ X_{i}; θ_{Z})$, the propensity score model.

The three terms represent:

Assignment mechanism model: $Pr (Z_{i} ∣ X_{i}; θ_{Z})$ — the propensity score model
Potential outcome model: $Pr (Y_{i} (0), Y_{i} (1) ∣ X_{i}; θ_{Y})$ — the outcome model
Covariate model: $Pr (X_{i}; θ_{X})$ — usually replaced by the empirical distribution $\hat{F}_{X}$
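The three components can be read as a generative recipe. The following is a minimal simulation sketch, assuming a single normal covariate, linear-normal potential outcomes with independent errors, and a logistic propensity score; the function name `simulate_full_data` and all numeric values are hypothetical choices for illustration, not taken from the source.

```python
import numpy as np

def simulate_full_data(n, seed=0):
    """Draw (X, Y(0), Y(1), Z) for n units from the three model components."""
    rng = np.random.default_rng(seed)
    # Covariate model Pr(X; theta_X): one standard-normal covariate (assumed).
    x = rng.normal(0.0, 1.0, size=n)
    # Outcome model Pr(Y(0), Y(1) | X; theta_Y): linear means, independent
    # unit-variance normal errors (assumed).
    y0 = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)
    y1 = 2.0 + 1.0 * x + rng.normal(0.0, 1.0, size=n)
    # Assignment model Pr(Z | X; theta_Z): logistic propensity score that
    # depends on X only, so assignment is ignorable by construction.
    propensity = 1.0 / (1.0 + np.exp(-0.8 * x))
    z = rng.binomial(1, propensity)
    # Only Y(Z) is observed; Y(1 - Z) is the missing potential outcome.
    y_obs = np.where(z == 1, y1, y0)
    y_mis = np.where(z == 1, y0, y1)
    return {"x": x, "z": z, "y0": y0, "y1": y1, "y_obs": y_obs, "y_mis": y_mis}

data = simulate_full_data(n=1000)
```

In a real analysis only `x`, `z`, and `y_obs` would be available; `y_mis` is kept here only to make the missing-data structure explicit.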

Prior Independence Assumption

Assumption 3.2 — Prior Independence

The parameters for the assignment mechanism $θ_{Z}$ , outcome $θ_{Y}$ , and covariates $θ_{X}$ are a priori distinct and independent:
$p (θ_{Z}, θ_{Y}, θ_{X}) = p (θ_{Z}) \cdot p (θ_{Y}) \cdot p (θ_{X})$

This assumption is:

Unique to Bayesian causal inference — imposed primarily for computational convenience
Potentially problematic in high dimensions — can act as a strongly informative prior (prior dogmatism), because independent priors on $θ_{Z}$ and $θ_{Y}$ implicitly constrain the marginal distributions of outcomes in each treatment group

Prior Dogmatism

Prior independence (Assumption 3.2) effectively acts as a strongly informative prior as the number of covariates $p$ grows. This is the Bayesian analogue of regularization-induced confounding (§4). Understanding why this happens is crucial for designing priors that avoid it.

Ignorability of the Propensity Score

Under Assumptions 2.1 (ignorability) and 3.2 (prior independence), the part of the observed-data likelihood that involves $θ_{Y}$ becomes:

$\prod_{i : Z_{i} = 1} Pr (Y_{i} (1) ∣ X_{i}; θ_{Y}) \cdot \prod_{i : Z_{i} = 0} Pr (Y_{i} (0) ∣ X_{i}; θ_{Y})$

The reasoning step behind this can be written out. Let $Y^{obs}$ denote the observed outcomes and $Z$, $X$ the stacked assignments and covariates (shorthand introduced here for illustration). Under ignorability, the complete observed-data likelihood is a product of three blocks, each involving one component of $θ$:
$\prod_{i = 1}^{N} Pr (Z_{i} ∣ X_{i}; θ_{Z}) \cdot \left[ \prod_{i : Z_{i} = 1} Pr (Y_{i} (1) ∣ X_{i}; θ_{Y}) \prod_{i : Z_{i} = 0} Pr (Y_{i} (0) ∣ X_{i}; θ_{Y}) \right] \cdot \prod_{i = 1}^{N} Pr (X_{i}; θ_{X})$
Multiplying by the independent priors of Assumption 3.2 gives a posterior that factorizes in the same way:
$p (θ_{Z}, θ_{Y}, θ_{X} ∣ Y^{obs}, Z, X) = p (θ_{Z} ∣ Z, X) \cdot p (θ_{Y} ∣ Y^{obs}, Z, X) \cdot p (θ_{X} ∣ X)$

Key result: the propensity score model $Pr (Z_{i} ∣ X_{i}; θ_{Z})$ is ignorable — it does not appear in the likelihood for causal estimands $τ^{S}$ , $τ^{P}$ , or $τ (x)$ .
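
To make the key result concrete, here is a minimal sketch of the outcome-model log-likelihood, assuming a Gaussian linear model for each potential outcome (that model, and the names `normal_logpdf` and `outcome_loglik`, are illustrative assumptions). The treatment indicator is used only to decide which units contribute to which product; no propensity score parameter enters the computation.

```python
import numpy as np

def normal_logpdf(y, mean, sd):
    """Log density of N(mean, sd^2) evaluated at y."""
    return -0.5 * np.log(2.0 * np.pi * sd ** 2) - 0.5 * ((y - mean) / sd) ** 2

def outcome_loglik(beta1, beta0, sigma1, sigma0, X, y_obs, z):
    """Log of prod_{Z=1} Pr(Y(1) | X) * prod_{Z=0} Pr(Y(0) | X).

    z only selects which observed outcome is Y(1) and which is Y(0);
    theta_Z (the propensity score model) does not appear anywhere.
    """
    treated = z == 1
    ll_treated = normal_logpdf(y_obs[treated], X[treated] @ beta1, sigma1).sum()
    ll_control = normal_logpdf(y_obs[~treated], X[~treated] @ beta0, sigma0).sum()
    return ll_treated + ll_control

# Tiny hypothetical data set: one treated unit and one control unit.
X_demo = np.array([[1.0, 0.0], [1.0, 1.0]])
y_demo = np.array([1.0, 2.0])
z_demo = np.array([1, 0])
ll_demo = outcome_loglik(
    beta1=np.array([1.0, 0.0]), beta0=np.array([2.0, 0.0]),
    sigma1=1.0, sigma0=1.0, X=X_demo, y_obs=y_demo, z=z_demo,
)
```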

The same ignorability argument applies to:

The covariate model $Pr (X_{i}; θ_{X})$ (in most settings)
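Estimands that are functions of the model parameters alone: PATE, CATE, and MATE (the PATE also involves $θ_{X}$ through the covariate distribution)

Exception: the SATE is a function of the potential outcomes $\{Y_{i} (0), Y_{i} (1)\}_{i = 1}^{N}$ themselves, both observed and missing, so it requires imputing the missing outcomes from the outcome model via data augmentation.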

Posterior Inference for Causal Effects

Example 3.1 — Covariate Adjustment in a Randomized Experiment

Model potential outcomes as bivariate normal for each unit:
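$\begin{pmatrix} Y_{i} (1) \\ Y_{i} (0) \end{pmatrix} ∣ (X_{i}, β_{1}, β_{0}, σ_{1}^{2}, σ_{0}^{2}, ρ) \sim N \left( \begin{pmatrix} β_{1}^{'} X_{i} \\ β_{0}^{'} X_{i} \end{pmatrix}, \begin{pmatrix} σ_{1}^{2} & ρ σ_{1} σ_{0} \\ ρ σ_{1} σ_{0} & σ_{0}^{2} \end{pmatrix} \right)$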

PATE: $τ^{P} = (β_{1} - β_{0})^{'} E (X_{i})$ — depends only on $θ_{X}$ and $θ_{Y}$ (not on $ρ$ )

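SATE: $τ^{S} = N^{-1} \sum_{i = 1}^{N} \{Y_{i} (1) - Y_{i} (0)\}$, which involves the missing potential outcomes and therefore depends on $ρ$

MATE: $τ^{M} = (β_{1} - β_{0})^{'} \bar{X}$, which conditions on the observed covariates and, like the PATE, does not depend on $ρ$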

For the SATE, posterior inference requires specifying $ρ$ (association between potential outcomes). The posterior distribution of $τ^{S}$ would be sensitive to the prior on $ρ$ .

Bayesian inference for the SATE is more complex because it depends on $Y_{i} (0)$ and $Y_{i} (1)$ jointly, involving both observed and missing quantities. The most common sampling strategy is data augmentation: alternate between drawing $θ$ from its posterior given the completed data and imputing $Y_{i}^{mis}$ from its posterior predictive distribution given $θ$ and the observed data. The resulting draws of the missing outcomes yield the posterior of any estimand.
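
The imputation step can be sketched for the bivariate normal model of Example 3.1. This is a simplified, non-iterative variant rather than a full data augmentation sampler: it assumes $ρ$ is fixed at a chosen value (a sensitivity parameter) and uses the noninformative prior $p(β_{z}, σ_{z}^{2}) ∝ 1/σ_{z}^{2}$ in each arm, so that $θ_{Y}$ can be drawn directly from its observed-data posterior before imputing. The function names and the simulated data are hypothetical.

```python
import numpy as np

def draw_arm(X, y, rng):
    """One posterior draw of (beta, sigma) for a normal linear model
    under the prior p(beta, sigma^2) proportional to 1 / sigma^2."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    rss = np.sum((y - X @ beta_hat) ** 2)
    sigma2 = rss / rng.chisquare(n - k)  # scaled inverse chi-square draw
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    return beta, np.sqrt(sigma2)

def sate_draws(X, y_obs, z, rho, n_draws=2000, seed=0):
    """Posterior draws of the SATE for a fixed correlation rho."""
    rng = np.random.default_rng(seed)
    t, c = z == 1, z == 0
    cond_scale = np.sqrt(1.0 - rho ** 2)
    out = np.empty(n_draws)
    for d in range(n_draws):
        # Step 1: draw theta_Y; each arm uses only its own observed outcomes.
        b1, s1 = draw_arm(X[t], y_obs[t], rng)
        b0, s0 = draw_arm(X[c], y_obs[c], rng)
        # Step 2: impute each missing outcome from its conditional normal
        # distribution given the observed outcome of the same unit.
        y1, y0 = y_obs.copy(), y_obs.copy()
        mean_y0 = X[t] @ b0 + rho * s0 / s1 * (y_obs[t] - X[t] @ b1)
        y0[t] = rng.normal(mean_y0, s0 * cond_scale)
        mean_y1 = X[c] @ b1 + rho * s1 / s0 * (y_obs[c] - X[c] @ b0)
        y1[c] = rng.normal(mean_y1, s1 * cond_scale)
        # Step 3: evaluate the estimand on the completed data.
        out[d] = np.mean(y1 - y0)
    return out

# Hypothetical randomized experiment with an intercept and one covariate.
rng_demo = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng_demo.normal(size=n)])
z = rng_demo.binomial(1, 0.5, size=n)
y_obs = np.where(z == 1, 2.0 + X[:, 1], 1.0 + 0.5 * X[:, 1]) + rng_demo.normal(size=n)

tau_rho_0 = sate_draws(X, y_obs, z, rho=0.0)
tau_rho_9 = sate_draws(X, y_obs, z, rho=0.9)
```

Comparing `tau_rho_0` and `tau_rho_9` gives a simple sensitivity check: in this setup the posterior mean of $τ^{S}$ should be nearly the same under both values, while the posterior spread changes with $ρ$.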

Identifiability in the Bayesian Framework

In Bayesian inference, identifiability has a different character than in the frequentist paradigm:

A parameter is identified if any two distinct values give different distributions of the observed data
Under the Bayesian paradigm, even non-identified parameters (like $ρ$ above) have posterior distributions — because the prior provides information
Gustafson (2015) proposed: a parameter is weakly/partially identifiable if a large region of its posterior is flat, or its posterior depends crucially on the prior even with large samples

This blurring of identifiability motivates transparent parametrization: separate identifiable from non-identifiable parameters, treating the latter as sensitivity parameters (see Sensitivity Analysis in Observational Studies).
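
A small numerical check illustrates why $ρ$ is not identified in the bivariate normal model: the density of an observed $Y_{i} (1)$, obtained by integrating the missing $Y_{i} (0)$ out of the joint density, is the same for every $ρ$. The sketch below uses a plain Riemann sum; the function names, grid settings, and parameter values are illustrative assumptions.

```python
import numpy as np

def bivariate_normal_pdf(y1, y0, m1, m0, s1, s0, rho):
    """Joint density of (Y(1), Y(0)) under the bivariate normal model."""
    u = (y1 - m1) / s1
    v = (y0 - m0) / s0
    q = (u ** 2 - 2.0 * rho * u * v + v ** 2) / (1.0 - rho ** 2)
    return np.exp(-0.5 * q) / (2.0 * np.pi * s1 * s0 * np.sqrt(1.0 - rho ** 2))

def observed_density_treated(y1, m1, m0, s1, s0, rho, grid_size=20001):
    """Density of an observed Y(1): integrate the missing Y(0) out numerically."""
    grid = np.linspace(m0 - 12.0 * s0, m0 + 12.0 * s0, grid_size)
    step = grid[1] - grid[0]
    return bivariate_normal_pdf(y1, grid, m1, m0, s1, s0, rho).sum() * step

# The observed-data density at y1 = 1.3 for several values of rho.
densities = {
    rho: observed_density_treated(1.3, m1=1.0, m0=0.0, s1=2.0, s0=1.0, rho=rho)
    for rho in (-0.5, 0.0, 0.9)
}
# Univariate N(m1, s1^2) density at the same point, for comparison.
marginal = np.exp(-0.5 * ((1.3 - 1.0) / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))
```

Because the observed-data likelihood is flat in $ρ$, the posterior of $ρ$ coincides with its prior under prior independence, which is why it is natural to treat it as a sensitivity parameter.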

Bayesian Bootstrap

An alternative general-purpose approach is the Bayesian bootstrap (Rubin, 1981), which simulates the posterior distribution of a parameter under a non-parametric Dirichlet process prior. By treating IPW and doubly-robust estimators as M-estimation problems, it brings them into Bayesian inference. However, it does not capitalize on the main strengths of Bayesian inference: versatile priors and a unified modeling framework.
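
As a rough illustration, the sketch below applies the Bayesian bootstrap to a normalized (Hajek-type) IPW estimator by drawing Dirichlet(1, ..., 1) weights over units. It assumes the propensity scores are supplied and held fixed across draws, which is a simplification; the function name `bayesian_bootstrap_ipw` and the simulated data are hypothetical.

```python
import numpy as np

def bayesian_bootstrap_ipw(y, z, e, n_draws=1000, seed=0):
    """Bayesian bootstrap draws of a normalized IPW treatment effect.

    y: observed outcomes, z: binary treatment, e: propensity scores
    (treated as known here, which is an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Each row is one draw of unit weights from a flat Dirichlet distribution.
    w = rng.dirichlet(np.ones(n), size=n_draws)
    w1 = w * (z / e)                  # weights for the treated mean
    w0 = w * ((1 - z) / (1 - e))      # weights for the control mean
    mu1 = (w1 @ y) / w1.sum(axis=1)
    mu0 = (w0 @ y) / w0.sum(axis=1)
    return mu1 - mu0

# Hypothetical observational data with a constant treatment effect of 2.
rng_demo = np.random.default_rng(2)
n = 500
x = rng_demo.normal(size=n)
e = 1.0 / (1.0 + np.exp(-0.5 * x))
z = rng_demo.binomial(1, e)
y = 1.0 + x + 2.0 * z + rng_demo.normal(size=n)

tau_bb = bayesian_bootstrap_ipw(y, z, e)
```

The array `tau_bb` can then be summarized like any other set of posterior draws, for example with its mean and quantiles.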

Connections

Bayesian Outcome Models — specifying $Pr (Y_{i} (0), Y_{i} (1) ∣ X_{i}; θ_{Y})$ in detail
Propensity Score in Bayesian CI — why the propensity score drops from the likelihood but still matters
Potential Outcomes Framework — ignorability assumptions that enable this factorization
Causal Estimands — SATE vs. PATE vs. MATE distinctions in posterior inference
