General Structure of Bayesian Causal Inference

Summary

Bayesian causal inference treats the missing potential outcomes as unknown quantities to be imputed from their posterior predictive distribution. The full-data likelihood factorizes into three components: the assignment mechanism, the outcome model, and the covariate model. Under ignorability and prior independence (Assumption 3.2), posterior inference for causal estimands depends only on the outcome model, although the propensity score model and the design stage remain essential for valid inference in practice.

Overview

Because causal inference involves missing potential outcomes, it is inherently a missing data problem. The Bayesian paradigm naturally handles missing data through imputation from the posterior predictive distribution — making it particularly well-suited for causal inference.

This section (§3 of Li et al. 2022) reviews the general Bayesian framework first outlined by Rubin (1978), following the presentation in Li, Ding, and Mealli (2022).

Setup

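For each unit $i$, four quantities are associated: $\{Y_i(0), Y_i(1), Z_i, X_i\}$, where $Y_i^{\mathrm{obs}} = Y_i(Z_i)$ is observed and $Y_i^{\mathrm{mis}} = Y_i(1 - Z_i)$ is missing.

Bayesian inference views all these quantities as random variables and specifies a model for them. Under the Bayesian model, the random variables are i.i.d. across units, with a joint distribution governed by a parameter $\theta = (\theta_Z, \theta_Y, \theta_X)$.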

Full-Data Likelihood Factorization

Definition: Full-Data Likelihood Factorization

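The joint distribution of the full data (observed and missing) for each unit factorizes as:

$$\Pr\{Y_i(0), Y_i(1), Z_i, X_i \mid \theta\} = \Pr\{Z_i \mid Y_i(0), Y_i(1), X_i; \theta_Z\} \cdot \Pr\{Y_i(0), Y_i(1) \mid X_i; \theta_Y\} \cdot \Pr(X_i; \theta_X)$$

(under ignorability, the assignment mechanism further reduces to $\Pr(Z_i \mid X_i; \theta_Z)$, the propensity score model)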

The three terms represent:

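  1. Assignment mechanism model, $\Pr\{Z_i \mid Y_i(0), Y_i(1), X_i; \theta_Z\}$: the propensity score model
  2. Potential outcome model, $\Pr\{Y_i(0), Y_i(1) \mid X_i; \theta_Y\}$: the outcome model
  3. Covariate model, $\Pr(X_i; \theta_X)$: usually replaced by the empirical distribution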
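
The factorization can be read as a recipe for generating the full data one component at a time. The sketch below does this in Python with NumPy. The specific distributions and coefficients (a single standard normal covariate, independent normal potential outcomes, a logistic propensity score) and the function name `simulate_full_data` are illustrative assumptions, not part of the source.

```python
import numpy as np

def simulate_full_data(n, rng):
    """Draw (X, Y(0), Y(1), Z) for n units, one factor of the likelihood at a time."""
    # Covariate model Pr(X; theta_X): one standard normal covariate (assumed).
    x = rng.normal(size=n)
    # Outcome model Pr(Y(0), Y(1) | X; theta_Y): independent normals (assumed).
    y0 = 1.0 + 0.5 * x + rng.normal(size=n)
    y1 = 2.0 + 1.0 * x + rng.normal(size=n)
    # Assignment mechanism under ignorability, Pr(Z | X; theta_Z):
    # a logistic propensity score that depends on X only (assumed).
    e = 1.0 / (1.0 + np.exp(-0.8 * x))
    z = rng.binomial(1, e)
    # Each unit reveals exactly one potential outcome; the other is missing.
    y_obs = np.where(z == 1, y1, y0)
    y_mis = np.where(z == 1, y0, y1)
    return {"x": x, "z": z, "e": e, "y0": y0, "y1": y1,
            "y_obs": y_obs, "y_mis": y_mis}

rng = np.random.default_rng(0)
data = simulate_full_data(1000, rng)
```

In a real analysis only `x`, `z`, and `y_obs` would be available; `y_mis` is what the Bayesian model has to impute.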

Prior Independence Assumption

Assumption 3.2 — Prior Independence

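The parameters for the assignment mechanism $\theta_Z$, outcome $\theta_Y$, and covariates $\theta_X$ are a priori distinct and independent:

$$\Pr(\theta_Z, \theta_Y, \theta_X) = \Pr(\theta_Z)\,\Pr(\theta_Y)\,\Pr(\theta_X)$$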

This assumption is:

  • Unique to Bayesian causal inference, and imposed primarily for computational convenience
  • Potentially problematic in high dimensions: it can act as a strongly informative prior (prior dogmatism), because independent priors on $\theta_Z$ and $\theta_Y$ implicitly constrain the marginal distributions of outcomes in each treatment group

Prior Dogmatism

In high dimensions, prior independence (Assumption 3.2) effectively acts as a strongly informative prior, and increasingly so as the number of covariates increases. This is the Bayesian analogue of regularization-induced confounding (§4). Knowing why this happens is crucial for designing priors that avoid it.

Ignorability of the Propensity Score

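Under Assumptions 2.1 (ignorability) and 3.2 (prior independence), the observed-data likelihood based on the factorization becomes:

$$\prod_{i=1}^{N} \Pr(X_i; \theta_X)\,\Pr(Z_i \mid X_i; \theta_Z)\,\Pr\{Y_i(1) \mid X_i; \theta_Y\}^{Z_i}\,\Pr\{Y_i(0) \mid X_i; \theta_Y\}^{1 - Z_i}$$

Key result: the propensity score model is ignorable. The factor $\Pr(Z_i \mid X_i; \theta_Z)$ does not involve $\theta_Y$, so it drops out of the posterior for causal estimands such as the PATE, SATE, or CATE.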
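
To spell out why the propensity score drops out, one can multiply the observed-data likelihood by the independent priors of Assumption 3.2. The display below is a sketch of that step. It writes $O$ for the observed data $\{X_i, Z_i, Y_i^{\mathrm{obs}}\}_{i=1}^{N}$ and $\theta_Z$, $\theta_Y$, $\theta_X$ for the parameters of the assignment, outcome, and covariate models; this notation is assumed here.

```latex
\begin{aligned}
\Pr(\theta_X, \theta_Z, \theta_Y \mid O) \;\propto\;
  & \Pr(\theta_X) \prod_{i=1}^{N} \Pr(X_i; \theta_X) \\
  \times\; & \Pr(\theta_Z) \prod_{i=1}^{N} \Pr(Z_i \mid X_i; \theta_Z) \\
  \times\; & \Pr(\theta_Y) \prod_{i=1}^{N}
     \Pr\{Y_i(1) \mid X_i; \theta_Y\}^{Z_i}\,
     \Pr\{Y_i(0) \mid X_i; \theta_Y\}^{1 - Z_i}
\end{aligned}
```

The three blocks share no parameters, so $\theta_X$, $\theta_Z$, and $\theta_Y$ are independent a posteriori, and the posterior of $\theta_Y$ is the same whichever propensity score model is specified.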

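The same argument extends to:

  • The covariate model (in most settings), whose parameter $\theta_X$ separates from $\theta_Y$ in the same way
  • Estimands that depend on the parameters only through $\theta_Y$: the CATE, PATE, and MATE

Exception: the SATE involves both observed and missing potential outcomes, $Y^{\mathrm{obs}}$ and $Y^{\mathrm{mis}}$, so it additionally requires imputing $Y^{\mathrm{mis}}$ via the outcome model and data augmentation.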

Posterior Inference for Causal Effects

Example 3.1 — Covariate Adjustment in a Randomized Experiment

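Model potential outcomes as bivariate normal for each unit, with means linear in the covariates:

$$\begin{pmatrix} Y_i(1) \\ Y_i(0) \end{pmatrix} \;\Big|\; X_i, \theta_Y \sim \mathcal{N}\left( \begin{pmatrix} \beta_1^\top X_i \\ \beta_0^\top X_i \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_0 \\ \rho\sigma_1\sigma_0 & \sigma_0^2 \end{pmatrix} \right), \qquad \theta_Y = (\beta_1, \beta_0, \sigma_1^2, \sigma_0^2, \rho)$$

  • PATE: $\tau^{\mathrm{PATE}} = (\beta_1 - \beta_0)^\top E(X_i)$, which depends on $\theta_Y$ only through $\beta_1$ and $\beta_0$ (not on $\rho$)
  • CATE: $\tau(x) = (\beta_1 - \beta_0)^\top x$, also independent of $\rho$
  • MATE: $\tau^{\mathrm{MATE}} = N^{-1}\sum_{i=1}^{N} \tau(X_i) = (\beta_1 - \beta_0)^\top \bar{X}$, free of $\rho$ for the same reason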

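The SATE, $\tau^{\mathrm{SATE}} = N^{-1}\sum_{i=1}^{N}\{Y_i(1) - Y_i(0)\}$, is different: posterior inference requires specifying $\rho$, the association between the two potential outcomes. Because each unit reveals only one potential outcome, $\rho$ does not enter the observed-data likelihood, so the posterior distribution of $\tau^{\mathrm{SATE}}$ is sensitive to the prior on $\rho$.

Bayesian inference for the SATE is more complex because it depends on $Y^{\mathrm{obs}}$ and $Y^{\mathrm{mis}}$ jointly, involving both observed and missing quantities. The most common sampling strategy is data augmentation: iteratively simulate $\theta_Y$ from its posterior given the data and the current imputations, impute $Y^{\mathrm{mis}}$ given $\theta_Y$ and the observed data, then derive the posterior for any estimand from the completed data.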
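
The imputation step can be sketched for a bivariate normal outcome model with linear means and a fixed, user-supplied $\rho$. Two simplifications are assumed and should be read as such. The prior is the standard non-informative $p(\beta_z, \sigma_z^2) \propto 1/\sigma_z^2$ in each arm. And since $\rho$ is held fixed and does not enter the observed-data likelihood, the sketch uses composition sampling (draw the parameters from their observed-data posterior, then impute) rather than an iterative data augmentation chain. The function names are hypothetical.

```python
import numpy as np

def draw_arm_posterior(X, y, rng):
    """One posterior draw of (beta, sigma) for a normal linear regression
    under the prior p(beta, sigma^2) proportional to 1/sigma^2."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / rng.chisquare(n - p)  # scaled inverse chi-square draw
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    return beta, np.sqrt(sigma2)

def sate_posterior(X, z, y_obs, rho, n_draws, rng):
    """Posterior draws of the SATE for a fixed correlation rho between Y(1) and Y(0)."""
    treated = z == 1
    draws = np.empty(n_draws)
    for s in range(n_draws):
        # Parameters of each arm are drawn from the observed data of that arm.
        b1, s1 = draw_arm_posterior(X[treated], y_obs[treated], rng)
        b0, s0 = draw_arm_posterior(X[~treated], y_obs[~treated], rng)
        mu1, mu0 = X @ b1, X @ b0
        # Conditional normal of the missing outcome given the observed one:
        # treated units need Y(0) given Y(1), control units need Y(1) given Y(0).
        mean_mis = np.where(treated,
                            mu0 + rho * s0 / s1 * (y_obs - mu1),
                            mu1 + rho * s1 / s0 * (y_obs - mu0))
        sd_mis = np.where(treated, s0, s1) * np.sqrt(1.0 - rho**2)
        y_mis = rng.normal(mean_mis, sd_mis)
        y1 = np.where(treated, y_obs, y_mis)
        y0 = np.where(treated, y_mis, y_obs)
        draws[s] = np.mean(y1 - y0)
    return draws

# Simulated randomized experiment (all values below are illustrative).
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept plus one covariate
z = rng.binomial(1, 0.5, size=n)
y0 = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
y1 = X @ np.array([3.0, 0.5]) + rng.normal(size=n)
y_obs = np.where(z == 1, y1, y0)
true_sate = np.mean(y1 - y0)

draws_rho0 = sate_posterior(X, z, y_obs, rho=0.0, n_draws=1000, rng=rng)
draws_rho9 = sate_posterior(X, z, y_obs, rho=0.9, n_draws=1000, rng=rng)
print(draws_rho0.mean(), draws_rho0.std(), draws_rho9.std())
```

Rerunning with different values of `rho` changes the spread of the SATE draws even though the data are the same, which is one way to see the sensitivity to the prior on $\rho$.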

Identifiability in the Bayesian Framework

In Bayesian inference, identifiability has a different character than in the frequentist paradigm:

  • A parameter is identified if any two distinct values give different distributions of the observed data
  • Under the Bayesian paradigm, even non-identified parameters (like $\rho$ above) have posterior distributions, because the prior provides information
  • Gustafson (2015) proposed: a parameter is weakly/partially identifiable if a large region of its posterior is flat, or its posterior depends crucially on the prior even with large samples

This blurring of identifiability motivates transparent parametrization: separate identifiable from non-identifiable parameters, treating the latter as sensitivity parameters (see Sensitivity Analysis in Observational Studies).

Bayesian Bootstrap

An alternative general-purpose approach is the Bayesian bootstrap (Rubin, 1981). It simulates the posterior distribution of any parameter by placing random Dirichlet weights on the observed data points, which corresponds to a non-informative limit of a non-parametric Dirichlet process prior. It can incorporate IPW and doubly-robust estimators into Bayesian inference by treating them as M-estimation problems. However, it does not capitalize on the main strengths of Bayesian inference: versatile priors and a unified framework.
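
As a rough illustration, the sketch below applies the Bayesian bootstrap to a Hajek-type (normalized) IPW estimator of the average treatment effect. It assumes the propensity scores `e` are known; a fuller version would re-estimate them inside each draw with a weighted fit. The function name `bayesian_bootstrap_ipw` and the simulated data are hypothetical.

```python
import numpy as np

def bayesian_bootstrap_ipw(y, z, e, n_draws, rng):
    """Bayesian bootstrap draws of a normalized IPW estimate of the ATE.

    e holds the propensity scores, treated as known (an assumption of this sketch).
    """
    n = len(y)
    draws = np.empty(n_draws)
    for s in range(n_draws):
        w = rng.dirichlet(np.ones(n))   # Dirichlet(1, ..., 1) weights on the units
        w1 = w * z / e                  # unit weight times inverse probability, treated
        w0 = w * (1 - z) / (1 - e)      # unit weight times inverse probability, control
        draws[s] = np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
    return draws

# Simulated observational data with a constant treatment effect of 1 (illustrative).
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-0.8 * x))    # true propensity score, assumed known
z = rng.binomial(1, e)
y = 0.5 * x + 1.0 * z + rng.normal(size=n)

bb_draws = bayesian_bootstrap_ipw(y, z, e, n_draws=1000, rng=rng)
print(bb_draws.mean(), np.quantile(bb_draws, [0.025, 0.975]))
```

The empirical quantiles of `bb_draws` serve as a credible interval for the estimand, without any outcome model being specified.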
