Propensity Score in Bayesian Causal Inference

Summary

The propensity score $e (x) = Pr (Z = 1 ∣ X = x)$ is central to Frequentist causal inference, where it is used to ensure covariate balance, but under ignorability it drops out of the Bayesian likelihood. Despite this, incorporating the propensity score into Bayesian analysis is essential for robust inference, particularly in observational studies with limited overlap. Three strategies exist: (1) include it as a covariate in the outcome model, (2) use dependent priors linking the assignment and outcome models, (3) use posterior predictive p-values.

The Central Tension

A major debate in Bayesian causal inference concerns the role of the propensity score. The tension:

Likelihood argument: Under ignorability (Assumptions 2.1 and 3.2), the propensity score model $Pr (Z_{i} ∣ X_{i}; θ_{Z})$ drops from the likelihood. The Bayesian posterior for causal estimands depends only on the outcome model. In this view, the propensity score seems irrelevant.
Practical argument: The propensity score is ubiquitous in Frequentist causal inference for constructing IPW, matching, and doubly-robust estimators — all of which ensure overlap and balance, reducing sensitivity to the outcome model.
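The likelihood argument can be written out in two lines. This is a sketch that assumes the assignment parameters $θ_{Z}$ and outcome parameters $θ_{Y}$ are distinct and a priori independent; the prior notation $p (\cdot)$ is introduced here for illustration only. Under ignorability the observed-data likelihood factorizes as

$$Pr (Z, Y ∣ X; θ_{Z}, θ_{Y}) = \prod_{i} Pr (Z_{i} ∣ X_{i}; θ_{Z}) \, Pr (Y_{i} ∣ Z_{i}, X_{i}; θ_{Y})$$

so the posterior for the outcome parameters is

$$p (θ_{Y} ∣ Z, Y, X) \propto p (θ_{Y}) \prod_{i} Pr (Y_{i} ∣ Z_{i}, X_{i}; θ_{Y})$$

The assignment factor involves only $θ_{Z}$ and cancels in the normalization, which is why the propensity score model appears irrelevant to the posterior of causal estimands.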

Resolution (Li et al.): Even though the propensity score is ignorable in the likelihood sense, the design stage (ensuring covariate overlap and balance) is critical regardless of inferential mode. Three strategies exist for incorporating the propensity score into Bayesian inference.

Strategy 1: Propensity Score as Covariate in Outcome Model

Using the propensity score as the only covariate in a Bayesian outcome model was first proposed by Zigler (2016): $μ (Y ∣ e (X)) = Pr (Y (z) ∣ e (X))$.

The more common approach: include $\hat{e} (X)$ as an additional covariate in the outcome model alongside $X$:

$$μ (z, x, \hat{e} (x))$$

This effectively conducts outcome regression on propensity score strata.

Bayesian double robustness (Wang et al. 2012; Saarela et al. 2016):

When the outcome model is correct: $μ (z, X_{i}, e (X_{i}))$ reduces to $μ (z, X_{i})$ because $e (X_{i})$ is a function of $X_{i}$, so the propensity score is redundant
When the outcome model is misspecified: the results remain robust because the treatment and control groups are approximately balanced in covariates within propensity score strata

Implementation (BCF, Hahn et al. 2020): For the Bayesian Causal Forest, the propensity score enters the prognostic function $g_{1} (\cdot)$; adding the estimated $\hat{e} (X)$ as an input significantly improves empirical CATE estimation (see ^def-bcf).

Key subtlety: The propensity score enters the outcome model through a two-stage procedure:

Estimate $\hat{e} (X)$ from data
Plug $\hat{e} (X)$ into the Bayesian outcome model $μ (z, x, \hat{e} (x))$

This is not dogmatically Bayesian (it does not propagate uncertainty from the first stage), but it yields posterior inference that is more robust to model misspecification. The joint modeling approach (estimating $e$ and $μ$ simultaneously) has a feedback problem: the outcome model fit informs propensity score estimation, distorting its balancing property and biasing causal estimates.
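The two-stage procedure can be sketched in plain Python. This is a minimal illustration and not the method of any cited paper. It assumes a single covariate, a logistic propensity model fitted by Newton's method, and a linear outcome model in $(1, z, x, \hat{e}(x))$ with a flat prior and known noise variance, so the posterior of the treatment coefficient is normal around the least-squares fit. All function names and the simulated data are hypothetical.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(x, z, iters=25):
    """Stage 1: Newton's method for logit Pr(Z=1 | x) = a0 + a1 * x."""
    a0 = a1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            p = sigmoid(a0 + a1 * xi)
            w = p * (1.0 - p)
            g0 += zi - p              # score (gradient of the log-likelihood)
            g1 += (zi - p) * xi
            h00 += w                  # information matrix (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a0 += (h11 * g0 - h01 * g1) / det
        a1 += (h00 * g1 - h01 * g0) / det
    return a0, a1


def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    v = [0.0] * n
    for r in reversed(range(n)):
        v[r] = (M[r][n] - sum(M[r][k] * v[k] for k in range(r + 1, n))) / M[r][r]
    return v


def two_stage_posterior(y, z, x, sigma2=1.0):
    """Stage 2: plug e_hat into a linear outcome model (flat prior, known sigma2).

    Returns the posterior mean and variance of the treatment coefficient.
    Uncertainty in e_hat is NOT propagated, matching the two-stage caveat.
    """
    a0, a1 = fit_logistic(x, z)
    e_hat = [sigmoid(a0 + a1 * xi) for xi in x]
    D = [[1.0, zi, xi, ei] for zi, xi, ei in zip(z, x, e_hat)]
    k = 4
    XtX = [[sum(r[i] * r[j] for r in D) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(D, y)) for i in range(k)]
    beta = solve(XtX, Xty)                                   # posterior mean
    var_tau = sigma2 * solve(XtX, [0.0, 1.0, 0.0, 0.0])[1]   # sigma2 * [(X'X)^-1]_zz
    return beta[1], var_tau, e_hat


# Simulated data (assumed for illustration): true treatment effect is 2.
random.seed(0)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
z = [1 if random.random() < sigmoid(xi) else 0 for xi in x]
y = [1.0 + 2.0 * zi + xi + random.gauss(0, 1) for zi, xi in zip(z, x)]
tau_mean, tau_var, e_hat = two_stage_posterior(y, z, x)
```

Because `e_hat` is treated as fixed data in the second stage, `tau_var` understates the total uncertainty; that is the price of avoiding the feedback problem of joint modeling.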

Strategy 2: Dependent Priors

Rather than modifying the likelihood, one imposes dependent priors that link the assignment and outcome models while keeping them mathematically separate.

Example 1 (Antonelli et al. 2019): Simultaneous variable selection for propensity score and outcome models.

Logistic propensity score model: $\text{logit} \, Pr (Z_{i} = 1 ∣ X_{i}) = α^{T} X_{i}$
Linear outcome model: $Y_{i} ∣ Z_{i}, X_{i} \sim N (β_{Z} Z_{i} + β^{T} X_{i}, σ^{2})$, where $β_{Z}$ is the scalar treatment coefficient
Spike-and-slab priors on $α_{j}$ and $β_{j}$ with dependence hyperparameter $ω \in [1, \infty)$
$ω$ controls the strength of prior dependence: larger $ω$ implies stronger prior that a variable selected in the outcome model is also selected in the propensity score model
Advantage: jointly selects variables relevant to either treatment or outcome, ensuring important confounders are included

Example 2 (Zigler & Dominici 2014):

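$Y_{i} (1) ∣ X_{i} \sim N (μ_{1}, σ_{1}^{2} \, e (X_{i}))$ and $Y_{i} (0) ∣ X_{i} \sim N (μ_{0}, σ_{0}^{2} \, (1 - e (X_{i})))$, with flat priors on $μ_{1}$ and $μ_{0}$. The variances are proportional to $e (X_{i})$ and $1 - e (X_{i})$, so each unit's precision weight is the inverse propensity weight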
The posterior mean of the PATE $\approx$ the Hájek IPW estimator:

$$\hat{τ}^{\text{Hájek}} = \frac{\sum_{i} Z_{i} Y_{i} / e (X_{i})}{\sum_{i} Z_{i} / e (X_{i})} - \frac{\sum_{i} (1 - Z_{i}) Y_{i} / (1 - e (X_{i}))}{\sum_{i} (1 - Z_{i}) / (1 - e (X_{i}))}$$

when propensity scores are known. This provides a Bayesian justification for the Hájek IPW estimator.
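For concreteness, the Hájek estimator can be computed directly from outcomes, treatment indicators, and known propensity scores. The sketch below is illustrative only: the function name `hajek_ipw` and the toy data are assumptions, not part of the source.

```python
def hajek_ipw(y, z, e):
    """Hájek (normalized) IPW estimate of the average treatment effect.

    y: observed outcomes, z: 0/1 treatment indicators,
    e: known propensity scores, each strictly between 0 and 1.
    """
    if not (len(y) == len(z) == len(e)):
        raise ValueError("y, z and e must have the same length")
    w1 = [zi / ei for zi, ei in zip(z, e)]                # Z_i / e(X_i)
    w0 = [(1 - zi) / (1 - ei) for zi, ei in zip(z, e)]    # (1 - Z_i) / (1 - e(X_i))
    mean1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)  # weighted treated mean
    mean0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)  # weighted control mean
    return mean1 - mean0


# Toy data, made up for illustration.
y = [3.0, 1.0, 4.0, 2.0]
z = [1, 0, 1, 0]
e = [0.5, 0.5, 0.8, 0.2]
tau_hat = hajek_ipw(y, z, e)
```

Each arm's term is a weighted mean with weights $1 / e (X_{i})$ or $1 / (1 - e (X_{i}))$, which is the form a flat-prior posterior mean takes when each unit's variance is proportional to its propensity score.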

Dependent priors achieve desirable finite-sample results and are more reasonable in real-world studies. However, specification is case-dependent and there is no general solution.

Strategy 3: Posterior Predictive P-values

A not-dogmatically-Bayesian strategy: specify both a propensity score model $e (X_{i}; θ_{Z})$ and an outcome model $μ (X_{i}; θ_{Y})$, obtain posterior draws from their respective predictive distributions, and plug the posterior draws into the doubly-robust estimator $\hat{τ}^{DR}$ (see ^def-dr).

This gives a posterior predictive distribution of $\hat{τ}^{DR}$ (Ding & Liu 2016).

Provides a straightforward way to integrate Bayesian modeling with Frequentist procedures (doubly-robust estimation)
Enables proper uncertainty quantification
Simulation studies show advantages over the Frequentist p-value (Ding & Liu 2016 §76)
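The plug-in step can be sketched as follows. The sketch assumes the doubly-robust estimator takes the standard augmented IPW form (the definition referenced above is not reproduced in this note), and that posterior draws of the propensity scores and of the outcome means under treatment and control are already available. The function names and the two toy draws are hypothetical.

```python
from statistics import mean


def dr_estimate(y, z, e, mu1, mu0):
    """Augmented IPW (doubly-robust) estimate for one set of fitted values.

    e, mu1, mu0 hold per-unit propensity scores and outcome means
    under treatment and control, respectively.
    """
    terms = []
    for yi, zi, ei, m1, m0 in zip(y, z, e, mu1, mu0):
        terms.append(
            m1 - m0
            + zi * (yi - m1) / ei
            - (1 - zi) * (yi - m0) / (1 - ei)
        )
    return mean(terms)


def dr_posterior(y, z, e_draws, mu1_draws, mu0_draws):
    """Evaluate the DR estimator at each posterior draw (one value per draw)."""
    return [
        dr_estimate(y, z, e, m1, m0)
        for e, m1, m0 in zip(e_draws, mu1_draws, mu0_draws)
    ]


# Toy data and two made-up "posterior draws", for illustration only.
y = [3.0, 1.0, 4.0, 2.0]
z = [1, 0, 1, 0]
e_draws = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.8, 0.2]]
mu1_draws = [[0.0] * 4, [4.0] * 4]
mu0_draws = [[0.0] * 4, [1.0] * 4]
tau_draws = dr_posterior(y, z, e_draws, mu1_draws, mu0_draws)
```

In practice the draws would come from MCMC output of the two fitted models, and summaries such as the mean or quantiles of `tau_draws` would describe the resulting distribution of $\hat{τ}^{DR}$.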

Joint modeling alternative: Draw posterior inference simultaneously for $θ_{Z}$ and $θ_{Y}$. However, the feedback problem (the outcome model informs propensity score estimation) violates the unconfoundedness assumption and biases causal estimates. The suggested remedy: fit a Bayesian model for $ϵ = Y - \hat{μ} (X)$ first, then plug in (§11 of paper).

Summary Table

Strategy	Approach	Key advantage	Limitation
Covariate in outcome model	$μ (z, x, \hat{e} (x))$	Double robustness; practical	Two-stage; not fully Bayesian
Dependent priors	Joint prior on $(θ_{Z}, θ_{Y})$	Proper uncertainty propagation; finite-sample gains	Case-dependent; no general recipe
Posterior predictive	Plug draws into $\hat{τ}^{DR}$	Integrates Bayesian + Frequentist	Not dogmatically Bayesian

Connections

General Structure of Bayesian CI — why propensity score drops from likelihood under ignorability
Bayesian Outcome Models — outcome models that the propensity score enters
Frequentist Causal Estimation — Hájek IPW and doubly-robust estimators
Bayesian Propensity Scores and IPW — existing vault note on Bayesian IPW (Heiss blog, Liao-Zigler method)
