Propensity Score in Bayesian Causal Inference

Summary

The propensity score is central to Frequentist causal inference for ensuring balance, but under ignorability it drops from the Bayesian likelihood. Despite this, incorporating the propensity score into Bayesian analysis is essential for robust inference, particularly in observational studies with limited overlap. Three strategies exist: (1) include it as a covariate in the outcome model, (2) use dependent priors linking the assignment and outcome models, (3) use posterior predictive p-values.

The Central Tension

A major debate in Bayesian causal inference concerns the role of the propensity score. The tension:

  • Likelihood argument: Under ignorability (Assumptions 2.1 and 3.2), the propensity score model drops from the likelihood. The Bayesian posterior for causal estimands depends only on the outcome model. In this view, the propensity score seems irrelevant.

  • Practical argument: The propensity score is ubiquitous in Frequentist causal inference for constructing IPW, matching, and doubly-robust estimators — all of which ensure overlap and balance, reducing sensitivity to the outcome model.

Resolution (Li et al.): Even though the propensity score is ignorable in the likelihood sense, the design stage (ensuring covariate overlap and balance) is critical regardless of inferential mode. Three strategies exist for incorporating the propensity score into Bayesian inference.

Strategy 1: Propensity Score as Covariate in Outcome Model

Using the propensity score $e(X_i)$ as the only covariate in a Bayesian outcome model, $\Pr(Y_i(z) \mid e(X_i))$, was first proposed by Zigler (2016).

The more common approach is to include $e(X_i)$ as an additional covariate in the outcome model alongside $X_i$: $\Pr(Y_i(z) \mid X_i, e(X_i))$.

This effectively conducts outcome regression within propensity score strata.

Bayesian double robustness (Wang et al. 2012; Saarela et al. 2016):

  • When the outcome model is correct: $\Pr(Y_i(z) \mid X_i, e(X_i))$ reduces to $\Pr(Y_i(z) \mid X_i)$ because $e(X_i)$ is a function of $X_i$, so the propensity score is redundant
  • When the outcome model is misspecified: the results remain robust because the treatment and control groups are approximately balanced in covariates within propensity score strata

Implementation (BCF, Hahn et al. 2020): In the Bayesian Causal Forest, the propensity score enters the prognostic function. Adding the estimated propensity score $\hat{e}(X_i)$ as an input significantly improves empirical CATE estimation (see ^def-bcf).

Key subtlety: The propensity score enters the outcome model through a two-stage procedure:

  1. Estimate the propensity score $\hat{e}(X_i)$ from data
  2. Plug $\hat{e}(X_i)$ into the Bayesian outcome model

This is not dogmatically Bayesian (it doesn’t propagate uncertainty from the first stage), but it makes posterior inference more robust to model misspecification. The joint modeling approach (estimating the propensity score and outcome models simultaneously) has a feedback problem: the outcome model fit informs propensity score estimation, distorting its balancing property and biasing causal estimates.
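The two-stage procedure can be sketched in a few lines of standard-library Python. This is a minimal illustration on synthetic data, not the method of any cited paper: the single covariate, the gradient-ascent logistic fit, the five propensity score strata, and the normal approximation to the posterior of each arm mean are all assumptions made for the example, and the names `fit_propensity` and `stratified_ate_draws` are hypothetical.

```python
import math
import random
import statistics


def fit_propensity(x, z, steps=300, lr=0.5):
    """Stage 1: fit logit Pr(Z=1 | x) = a + b*x by gradient ascent; return e_hat."""
    a, b, n = 0.0, 0.0, len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, zi in zip(x, z):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            grad_a += zi - p
            grad_b += (zi - p) * xi
        a += lr * grad_a / n
        b += lr * grad_b / n
    return [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]


def stratified_ate_draws(y, z, e_hat, n_strata=5, n_draws=1000, seed=0):
    """Stage 2: hold e_hat fixed, stratify on it, and draw the ATE from a
    normal approximation to the posterior of the arm means in each stratum."""
    rng = random.Random(seed)
    n = len(y)
    order = sorted(range(n), key=lambda i: e_hat[i])
    size = n // n_strata
    summaries = []  # (stratum weight, [(sign, mean, std. error) per arm])
    for k in range(n_strata):
        stop = (k + 1) * size if k < n_strata - 1 else n
        idx = order[k * size:stop]
        arms = []
        for arm, sign in ((1, 1.0), (0, -1.0)):
            ys = [y[i] for i in idx if z[i] == arm]
            # Needs at least two units per arm in every stratum (overlap).
            arms.append((sign, statistics.fmean(ys),
                         statistics.stdev(ys) / math.sqrt(len(ys))))
        summaries.append((len(idx) / n, arms))
    return [
        sum(w * sum(s * rng.gauss(m, se) for s, m, se in arms)
            for w, arms in summaries)
        for _ in range(n_draws)
    ]


# Synthetic data (assumed): true effect 2.0, confounding through x.
data_rng = random.Random(1)
n = 2000
x = [data_rng.gauss(0.0, 1.0) for _ in range(n)]
z = [1 if data_rng.random() < 1.0 / (1.0 + math.exp(-xi)) else 0 for xi in x]
y = [2.0 * zi + xi + data_rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]

e_hat = fit_propensity(x, z)              # stage 1
draws = stratified_ate_draws(y, z, e_hat)  # stage 2, e_hat treated as known
post_mean = statistics.fmean(draws)
naive = (statistics.fmean([yi for yi, zi in zip(y, z) if zi == 1])
         - statistics.fmean([yi for yi, zi in zip(y, z) if zi == 0]))
```

Because `e_hat` is passed to the second stage as a fixed input, the draws ignore first-stage uncertainty, which is exactly the sense in which the procedure is not dogmatically Bayesian. Comparing `post_mean` with `naive` shows how much of the confounding the strata remove in this toy setting.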

Strategy 2: Dependent Priors

Rather than modifying the likelihood, one imposes dependent priors that link the assignment and outcome models while keeping them mathematically separate.

Example 1 (Antonelli et al. 2019): Simultaneous variable selection for propensity score and outcome models.

  • Logistic regression model for the propensity score
  • Linear regression model for the outcome
  • Spike-and-slab priors on the coefficients of both models, linked through a dependence hyperparameter
  • This hyperparameter controls the strength of prior dependence: a larger value implies a stronger prior belief that a variable selected in the outcome model is also selected in the propensity score model
  • Advantage: jointly selects variables relevant to either treatment or outcome, ensuring important confounders are included

Example 2 (Zigler & Dominici 2014):

  • and , with flat priors on and
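  • The posterior mean of the PATE is the Hájek IPW estimator

$$\hat{\tau}_{\mathrm{Hajek}} = \frac{\sum_{i} Z_i Y_i / e(X_i)}{\sum_{i} Z_i / e(X_i)} - \frac{\sum_{i} (1 - Z_i) Y_i / \{1 - e(X_i)\}}{\sum_{i} (1 - Z_i) / \{1 - e(X_i)\}}$$

when propensity scores are known. This provides a Bayesian justification for the Hájek IPW estimator.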
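For reference, the Hájek (normalized) IPW estimator itself is short enough to write out. The sketch below computes only the Frequentist estimator from known propensity scores; it does not reproduce the Bayesian derivation, and the function name `hajek_ipw` and the four-unit data set are made up for illustration.

```python
def hajek_ipw(y, z, e):
    """Hajek (normalized) IPW estimate of the average treatment effect.

    y: outcomes, z: binary treatment indicators, e: known propensity scores.
    """
    w1 = [zi / ei for zi, ei in zip(z, e)]              # weights for treated units
    w0 = [(1 - zi) / (1 - ei) for zi, ei in zip(z, e)]  # weights for control units
    mu1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0


# Toy data (hypothetical): with constant propensity scores the estimator
# reduces to the simple difference in means between the two arms.
y = [3.0, 1.0, 4.0, 2.0]
z = [1, 0, 1, 0]
constant_e = hajek_ipw(y, z, [0.5, 0.5, 0.5, 0.5])
varying_e = hajek_ipw(y, z, [0.5, 0.5, 0.25, 0.25])
```

Normalizing by the sum of the weights, rather than by the sample size, is what distinguishes the Hájek form from the plain Horvitz-Thompson IPW estimator.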

Dependent priors achieve desirable finite-sample results and are more reasonable in real-world studies. However, specification is case-dependent and there is no general solution.

Strategy 3: Posterior Predictive P-values

A not-dogmatically-Bayesian strategy: specify both a propensity score model and an outcome model, obtain posterior draws from their respective predictive distributions, and plug the posterior draws into the doubly-robust estimator (see ^def-dr).

This gives a posterior predictive distribution of the doubly-robust estimator (Ding & Liu 2016).

  • Provides a straightforward way to integrate Bayesian modeling with Frequentist procedures (doubly-robust estimation)
  • Enables proper uncertainty quantification
  • Simulation studies show advantages over the Frequentist p-value (Ding & Liu 2016 §76)
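The plug-in step can be sketched as follows, assuming the doubly-robust estimator is the standard augmented IPW (AIPW) form; the note's own definition at ^def-dr may differ in detail. The function names and the two hand-written "posterior draws" are hypothetical; in practice each draw would come from fitted Bayesian propensity score and outcome models.

```python
def aipw(y, z, e, mu1, mu0):
    """Standard AIPW estimate of the ATE for one set of fitted values.

    e: propensity scores; mu1, mu0: predicted outcomes under treatment and control.
    """
    total = 0.0
    for yi, zi, ei, m1, m0 in zip(y, z, e, mu1, mu0):
        total += (m1 - m0
                  + zi * (yi - m1) / ei
                  - (1 - zi) * (yi - m0) / (1 - ei))
    return total / len(y)


def dr_over_draws(y, z, e_draws, mu1_draws, mu0_draws):
    """Evaluate the AIPW estimator once per posterior draw."""
    return [aipw(y, z, e, m1, m0)
            for e, m1, m0 in zip(e_draws, mu1_draws, mu0_draws)]


# Toy data and two hypothetical posterior draws of (e, mu1, mu0).
y = [3.0, 1.0, 4.0, 2.0]
z = [1, 0, 1, 0]
e_draws = [[0.5, 0.5, 0.5, 0.5], [0.4, 0.4, 0.6, 0.6]]
mu1_draws = [[3.5] * 4, [3.5] * 4]
mu0_draws = [[1.5] * 4, [1.5] * 4]

dr_draws = dr_over_draws(y, z, e_draws, mu1_draws, mu0_draws)
```

The spread of `dr_draws` across many posterior draws is what serves as the distribution of the doubly-robust estimate in this strategy.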

Joint modeling alternative: Draw posterior inference simultaneously for the propensity score and outcome models. However, the feedback problem (the outcome model informs propensity score estimation) violates the unconfoundedness assumption and biases causal estimates. The suggested remedy: fit a Bayesian model for the propensity score first, then plug it into the outcome model (§11 of paper).

Summary Table

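| Strategy | Approach | Key advantage | Limitation |
|---|---|---|---|
| Covariate in outcome model | Add the estimated propensity score as a covariate | Double robustness; practical | Two-stage; not fully Bayesian |
| Dependent priors | Joint prior on the assignment and outcome model parameters | Proper uncertainty propagation; finite-sample gains | Case-dependent; no general recipe |
| Posterior predictive | Plug draws into the doubly-robust estimator | Integrates Bayesian + Frequentist | Not dogmatically Bayesian |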

Connections

See Also