Bayesian Propensity Score Weighting
Summary
Propensity scores and inverse probability treatment weights (IPTW) cannot be naively inserted into a Bayesian model because they are not part of the data-generating process. Liao & Zigler (2020) resolve this by marginalizing over the posterior distribution of propensity scores: run the outcome model K times with K different weight sets drawn from the Bayesian treatment model’s posterior, then combine results with Rubin’s rules.
Overview
Inverse probability weighting (IPW) adjusts for confounding by creating pseudo-populations where treated and untreated groups have similar covariate distributions. When combined with Bayesian inference, however, a fundamental incompatibility arises: weights are not a parameter in any Bayesian likelihood.
This note documents the conceptual problem and the Liao & Zigler (2020) solution, with full R/brms implementation.
Standard Frequentist IPW Workflow
The frequentist workflow for IPW consists of four steps:
Inverse Probability of Treatment Weight (IPTW)
For a binary treatment with propensity score :
Applying these weights to an outcome model creates a pseudo-population where treated and untreated groups are comparable across confounders .
Steps:
- Treatment model (design stage): Fit logistic regression
net ~ confoundersto get . - Propensity scores: = predicted probability of treatment.
- Weights: Compute IPTW from .
- Outcome model (analysis stage): Fit weighted regression
outcome ~ treatmentwith IPTW weights; coefficient on treatment = ATE.
# Step 1: Frequentist treatment model
model_treatment_freq <- glm(net ~ income + temperature + health,
data = nets, family = binomial(link = "logit"))
# Steps 2-3: Propensity scores and IPTW
nets_with_weights <- augment(model_treatment_freq, nets,
type.predict = "response") %>%
rename(propensity = .fitted) %>%
mutate(iptw = (net_num / propensity) + ((1 - net_num) / (1 - propensity)))
# Step 4: Outcome model
model_outcome_freq <- lm(malaria_risk ~ net, data = nets_with_weights, weights = iptw)
# Coefficient on net ≈ -10 (recovers the true -10 malaria risk point decrease)What IPTW is Doing: Pseudo-populations
The weights rescale the sample to make treated and untreated groups comparable:
- Treated individuals with low (surprisingly treated) receive high weight.
- Untreated individuals with high (surprisingly untreated) receive high weight.
After weighting, the two groups mirror each other’s propensity score distributions, allowing causal interpretation of the outcome model coefficient.
The Fundamental Problem with Bayesian Propensity Scores
Average Treatment Effect (ATE)
The estimand depends on treatment , covariates , and outcome .
A correct Bayesian model would use:
The weights are nowhere in this expression. IPTWs appear conceptually in the likelihood but they are not part of the data-generating process — there is no “weight parameter” to set a prior on. As Robins, Hernán, and Wasserman (2015) state: “Bayesian inference must ignore the propensity score.”
The Liao-Zigler Solution
Liao & Zigler (2020) treat the propensity score as a latent parameter and marginalize over its posterior distribution:
Liao-Zigler Marginalization
The propensity score is estimated Bayesianly in the treatment model, then integrated out to produce a -free ATE.
Practical algorithm:
- Fit a Bayesian treatment model to get a posterior distribution over propensity scores.
- Draw samples of propensity scores from the posterior.
- For each of the samples, compute weights and run the frequentist outcome model.
- Combine the ATEs using Rubin’s rules (averaging ATEs; combining SEs to capture between-model variance).
This is analogous to multiple imputation: run the same model on slightly different data (different weights) and combine results.
R/brms Implementation
Step 1: Bayesian Treatment Model
library(brms)
library(tidyverse)
model_treatment <- brm(
bf(net ~ income + temperature + health,
decomp = "QR"), # QR decomposition for numerical stability
family = bernoulli(), # logistic regression
data = nets,
chains = 4, cores = 4, iter = 1000,
seed = 1234, backend = "cmdstanr"
)Step 2: Extract K Posterior Propensity Score Samples
# posterior_epred gives P(net=1 | confounders, theta^(k)) for each posterior draw k
pred_probs_chains <- posterior_epred(model_treatment)
# dim: (2000 draws) × (1752 people)Step 3: Nest Propensity Scores, Compute Weights, Run K Outcome Models
# Nest each draw's propensity scores into its own row
pred_probs_nested <- pred_probs_chains %>%
as_tibble(.name_repair = "unique") %>%
mutate(draw = 1:n()) %>%
pivot_longer(-draw, names_to = "row", values_to = "prob") %>%
mutate(row = as.numeric(str_remove(row, "\\.\\.\\."))) %>%
group_by(draw) %>%
nest() %>%
ungroup()
# For each draw: compute IPTW and run outcome model
outcome_models <- pred_probs_nested %>%
mutate(outcome_model = map(data, ~{
df <- bind_cols(nets, .) %>%
mutate(iptw = (net_num / prob) + ((1 - net_num) / (1 - prob)))
lm(malaria_risk ~ net, data = df, weights = iptw)
})) %>%
mutate(
tidied = map(outcome_model, tidy),
ate = map_dbl(tidied, ~filter(., term == "netTRUE") %>% pull(estimate)),
ate_se = map_dbl(tidied, ~filter(., term == "netTRUE") %>% pull(std.error))
)Step 4: Combine with Rubin’s Rules
# Average ATE (≈ -10, matching the true effect)
mean(outcome_models$ate)
# [1] -10.1
# Combine standard errors with Rubin's rules
rubin_se <- function(ates, sigmas) {
sqrt(mean(sigmas^2) + var(ates))
}
rubin_se(outcome_models$ate, outcome_models$ate_se)
# [1] 1.02 (larger than naive average of SEs: 0.659)Key point: Rubin’s rules correctly inflate the SE to account for uncertainty in the treatment model’s propensity scores — uncertainty that a purely frequentist approach ignores.
Mosquito Nets and Malaria Risk
Setup: 1,752 individuals; binary treatment (mosquito net use); outcome (malaria risk 0–100). Confounders identified by DAG: income, health, temperature. True effect (simulated): −10 malaria risk points. Results:
- Frequentist IPTW: ATE = −10.1 ± 0.66 SE
- Bayesian Liao-Zigler: ATE = −10.1 ± 1.02 SE (wider, correctly incorporating treatment model uncertainty)
Limitations and Open Questions
- The outcome model is still frequentist (
lm()). The distribution of 2,000 ATEs resembles a posterior but is not formally one — the uncertainty all comes from the treatment model, not the outcome model. - To run a fully Bayesian outcome model (
brm()) with these weights, one would need 2,000 separatebrm()calls — computationally prohibitive. - A technically correct solution (single
brm()call with per-iteration weights) was subsequently developed by Jordan Nafa.
Connections
- DAGs and Causal Identification — The DAG determines which confounders to include in the treatment model
- Nonparametric Causal Inference — Alternative approach: BART directly models the response surface without weights
- Bayesian Linear Regression — The outcome model component; Bayesian extension is the ultimate goal
- Differences-in-Differences — Another identification strategy; also benefits from Bayesian implementation
See Also
- Synthetic Control — Alternative to IPTW for single-unit treatment; uses weights differently
- Counterfactual Inference — Bayesian approach to predicting counterfactual outcomes
- Missing Data Models — Multiple imputation (which Rubin’s rules were originally designed for)