Bayesian Outcome Models

Summary

The outcome model is the central component in Bayesian causal inference. Common specifications range from linear regression to non-parametric models (BART, Gaussian processes). In high-dimensional settings, standard regularization priors can bias causal estimates through regularization-induced confounding, a phenomenon specific to causal inference.

Overview

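In Bayesian causal inference, the outcome model $f(x, z) = E[Y_i(z) \mid X_i = x]$ is specified in order to estimate the CATE $\tau(x) = E[Y_i(1) - Y_i(0) \mid X_i = x] = f(x, 1) - f(x, 0)$. There are two common approaches:

  1. Joint model: specify a single model $f(x, z)$ for both treatment arms, with $z$ entering as an input
  2. Separate models: model each arm separately with $f_z(x)$, $z \in \{0, 1\}$

In the literature these are known as the S-learner (joint) and the T-learner (separate), respectively.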

Linear Outcome Model

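The simplest outcome model is a linear regression with a treatment-covariate interaction term:

$$Y_i(z) = \beta_0 + \beta_1^\top X_i + \beta_2 z + \beta_3^\top z X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),$$

where the interaction $\beta_3^\top z X_i$ captures treatment effect heterogeneity, so that $\tau(x) = \beta_2 + \beta_3^\top x$. For the mean function, this is equivalent to fitting a separate linear regression in each treatment arm.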
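Under flat priors on the coefficients, the posterior mean of this model coincides with least squares, so a least-squares check is enough to illustrate the equivalence with per-arm regressions. A minimal NumPy sketch, assuming a single covariate and simulated data (the data-generating values are arbitrary illustrations, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                  # single covariate
z = rng.integers(0, 2, size=n)          # binary treatment indicator
# Illustrative (assumed) data-generating process with tau(x) = 0.5 + 1.5 x
y = 1.0 + 2.0 * x + z * (0.5 + 1.5 * x) + rng.normal(size=n)

# Joint model with a treatment-covariate interaction: columns [1, x, z, z*x]
design = np.column_stack([np.ones(n), x, z, z * x])
b0, bx, bz, bxz = np.linalg.lstsq(design, y, rcond=None)[0]

def fit_line(xs, ys):
    """Least-squares intercept and slope within one treatment arm."""
    a = np.column_stack([np.ones(len(xs)), xs])
    return np.linalg.lstsq(a, ys, rcond=None)[0]

a0, a1 = fit_line(x[z == 0], y[z == 0])  # control arm
c0, c1 = fit_line(x[z == 1], y[z == 1])  # treated arm

# CATE from the interacted model: tau(x) = bz + bxz * x
# CATE from the separate fits:    tau(x) = (c0 - a0) + (c1 - a1) * x
print(np.round([bz, bxz], 3), np.round([c0 - a0, c1 - a1], 3))
```

The two parametrizations span the same column space (the interacted design is a re-coding of "intercept and slope per arm"), so the fitted means and CATE coefficients agree up to floating-point error.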

Limitations: linear models are easy to implement but often too restrictive. In regions of poor covariate overlap they extrapolate a single global functional form instead of adapting to the true data-generating mechanism.

Non-Parametric Outcome Models

The recent focus on heterogeneous treatment effects has driven adoption of flexible, non-parametric outcome models. The most widely used are regression trees and their ensembles.

BART (Bayesian Additive Regression Trees)

Definition: BART for Causal Inference

BART places priors on the parameters of an ensemble of regression trees that control the tree depth and the degree of shrinkage of the mean parameters in the terminal nodes. For causal inference, one either specifies a single BART model for $f(x, z)$, treating $z$ as a covariate (S-learner, joint model), or fits a separate BART model to each treatment arm (T-learner).

  • Originally proposed by Chipman, George and McCulloch (2010)
  • For causal inference: Hill (2011) first advocated using BART as an S-learner
  • Has been shown to outperform many competing methods, including frequentist random forests, in numerous empirical comparisons (Hill 2011; Dorie et al. 2019)
  • Available in R: BayesTree, BART, dbarts packages
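To make "control the depth and the degree of shrinkage" concrete: in Chipman, George and McCulloch (2010), a node at depth $d$ is non-terminal with prior probability $\alpha(1 + d)^{-\beta}$, and each terminal-node mean gets a $N(0, \sigma_\mu^2)$ prior with $\sigma_\mu = 0.5 / (k\sqrt{m})$ after the outcome is rescaled to $[-0.5, 0.5]$, where $m$ is the number of trees. The sketch evaluates these at the commonly cited defaults $\alpha = 0.95$, $\beta = 2$, $k = 2$, $m = 200$; treat the defaults as assumptions to check against the documentation of whichever package you use.

```python
import math

def split_prob(depth, alpha=0.95, beta=2.0):
    """Prior probability that a node at this depth is split (non-terminal)."""
    return alpha * (1.0 + depth) ** (-beta)

def leaf_prior_sd(k=2.0, m=200):
    """Prior sd of each terminal-node mean, with y rescaled to [-0.5, 0.5]."""
    return 0.5 / (k * math.sqrt(m))

for d in range(4):
    print(f"depth {d}: P(split) = {split_prob(d):.4f}")
print(f"leaf prior sd = {leaf_prior_sd():.5f}")
```

Deeper nodes are increasingly unlikely to split, which keeps individual trees shallow, and $\sigma_\mu$ shrinks as $m$ grows, so each tree contributes only a small part of the overall fit.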

Bayesian Causal Forest (BCF)

Definition: Bayesian Causal Forest (BCF)

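Hahn et al. (2020) proposed BCF, which separates the outcome model as

$$f(x, z) = \mu(x) + \tau(x)\, z,$$

where $\mu(x)$ models $E[Y_i(0) \mid X_i = x]$ (the prognostic function) and $\tau(x)$ represents the heterogeneous treatment effect, with separate BART priors on $\mu(\cdot)$ and $\tau(\cdot)$.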

Advantages of BCF over standard BART:

  • Fast computation and good performance with default hyperparameters
  • Available software: bcf R package
  • Importantly, adding the estimated propensity score $\hat\pi(x)$ as an additional input to $\mu(\cdot)$, i.e. using $\mu(x, \hat\pi(x))$, significantly improves empirical estimation of the CATE
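The propensity step can be sketched in two stages: estimate $\hat\pi(x)$, then append it to the covariates fed to the prognostic function $\mu(\cdot)$ in $f(x, z) = \mu(x) + \tau(x) z$. The sketch assumes a logistic-regression propensity model fitted by Newton-Raphson (any reasonable classifier could stand in) and stops at building the inputs; fitting the BART priors themselves is left to a package such as bcf. The helper names `fit_logistic` and `propensity` are hypothetical.

```python
import numpy as np

def fit_logistic(X, z, n_iter=25):
    """Hypothetical helper: logistic-regression propensity model via Newton-Raphson."""
    A = np.column_stack([np.ones(len(X)), X])          # add an intercept column
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-A @ w))
        grad = A.T @ (z - prob)                         # score of the log-likelihood
        hess = A.T @ (A * (prob * (1 - prob))[:, None]) # Fisher information
        w = w + np.linalg.solve(hess, grad)
    return w

def propensity(X, w):
    """Hypothetical helper: fitted propensity scores pi_hat(x)."""
    A = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-A @ w))

rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.normal(size=(n, p))
z = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1]))))

w_hat = fit_logistic(X, z)
pi_hat = propensity(X, w_hat)

# Inputs for the prognostic function mu(.): the covariates plus pi_hat.
# The treatment-effect function tau(.) would be given X alone.
X_mu = np.column_stack([X, pi_hat])
```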

Gaussian Process (GP) Outcome Model

A Gaussian process (GP) prior on the outcome model $f$ provides:

  • Protection against potential bias in poor-overlap regions: credible intervals widen as overlap decreases
  • A flexible covariance function, e.g. a Gaussian kernel with a signal-to-noise ratio parameter and an inverse-bandwidth parameter

However, this widening is only partial: the GP's uncertainty does not fully adapt to the degree of overlap (see Example 4.1).
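The widening comes from the GP posterior variance, which reverts to the prior variance as a test point moves away from an arm's observed covariates; the CATE $\tau(x)$ inherits this from whichever arm has little data near $x$. A minimal one-arm sketch, assuming the parametrization $k(x, x') = \nu \exp\{-\lambda (x - x')^2\}$ with signal variance $\nu$, inverse bandwidth $\lambda$ and noise variance $\sigma^2$ (so $\nu / \sigma^2$ acts as the signal-to-noise ratio); the parametrization, hyperparameter values and data are illustrative assumptions:

```python
import numpy as np

def gauss_kernel(a, b, nu=1.0, lam=0.5):
    """Gaussian kernel nu * exp(-lam * (a - b)^2) for 1-D inputs."""
    return nu * np.exp(-lam * (a[:, None] - b[None, :]) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.1, **kern):
    """Posterior mean and sd of f at x_test under a zero-mean GP prior."""
    K = gauss_kernel(x_train, x_train, **kern) + noise_var * np.eye(len(x_train))
    K_s = gauss_kernel(x_train, x_test, **kern)
    K_ss = gauss_kernel(x_test, x_test, **kern)
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

rng = np.random.default_rng(2)
x_arm = rng.uniform(0.0, 4.0, size=30)     # this arm's covariates lie in [0, 4]
y_arm = np.sin(x_arm) + rng.normal(scale=0.3, size=30)

x_grid = np.array([2.0, 5.0, 8.0])         # inside, just beyond, far from the data
mean, sd = gp_posterior(x_arm, y_arm, x_grid)
print(np.round(sd, 3))
```

The posterior sd depends only on where the arm's covariates lie, not on the observed outcomes, which is what lets a GP flag poor overlap at all; how quickly it widens is governed by $\lambda$.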

Example 4.1 — Priors and Overlap in Estimating the CATE

Example 4.1 — Choice of Priors in Estimating the CATE (Li et al. §4b)

Setup: 250 treated ($Z_i = 1$) and 250 control ($Z_i = 0$) units, with a single covariate $X_i$. True outcome: , . True CATE for all $x$.

Three outcome model priors:

  • (i) Linear regression
  • (ii) Gaussian process with Gaussian kernel
  • (iii) BART

Findings (illustrated in Figure 1):

  • In the region of good overlap (roughly $40 \le x \le 50$), all three models agree
  • Linear model: overconfident everywhere; does not widen uncertainty in poor overlap regions
  • GP model: trades potential bias for wider credible intervals as overlap decreases, but uncertainty does not fully adapt
  • BART: narrower credible intervals than the GP (but wider than linear), yet their width stays similar regardless of overlap, so BART is overconfident in poor-overlap regions

Lesson: A desirable prior should accurately reflect uncertainty according to the degree of covariate overlap — uncertainty should increase as overlap decreases.

Challenges in High Dimensions

Two Settings

  1. Non-parametric outcome models with infinitely many (or a very large number of) parameters, regardless of the covariate dimension $p$
  2. High-dimensional covariates ($p$ large relative to the sample size $n$)

Both are increasingly common in causal inference, especially when targeting the CATE.

Regularization-Induced Confounding


In high dimensions, Bayesian regularization priors (spike-and-slab, Bayesian LASSO, model averaging) on the nuisance parameters — the regression coefficients for the covariate-outcome relationship — can induce bias in causal estimates.

Mechanism (Hahn et al. 2020; Linero 2021): under Assumption 3.2 (prior independence), many Bayesian regularization priors on the outcome model $f$ concentrate the selection bias around zero as $p \to \infty$.

This is prior dogmatism — the prior effectively removes confounding, regardless of what the data say.
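A stylized linear-Gaussian illustration of this concentration, not Linero's (2021) general construction: let the outcome model have covariate coefficients $\beta$ and the propensity model coefficients $\gamma$, each with an independent $N(0, I/p)$ prior and standard-normal covariates. In this toy setting the confounding carried by the covariates is proportional to $\beta^\top\gamma$, whose prior variance is $p \cdot (1/p)(1/p) = 1/p$, so independent priors increasingly assert "no confounding" as $p$ grows. The Monte Carlo sketch checks this; the $1/p$ scaling (which keeps each model's signal variance fixed) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws = 4000

def prior_sd_of_confounding(p):
    """Monte Carlo sd of beta'gamma under independent N(0, I/p) priors (toy model)."""
    beta = rng.normal(scale=1.0 / np.sqrt(p), size=(n_draws, p))
    gamma = rng.normal(scale=1.0 / np.sqrt(p), size=(n_draws, p))
    return np.std(np.sum(beta * gamma, axis=1))

sds = {p: prior_sd_of_confounding(p) for p in (5, 50, 500)}
for p, s in sds.items():
    print(f"p = {p:4d}: prior sd of beta'gamma = {s:.3f} (theory {1 / np.sqrt(p):.3f})")
```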

Solution: Use double machine learning strategies — regularize the propensity score model and outcome model jointly, ensuring the regularized propensity score enters the outcome model for valid causal inference. See §5 of the paper and Propensity Score in Bayesian CI.
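A small simulation can show both the bias and the remedy. The sketch assumes a linear outcome model and uses the partial-ridge solution, which equals the posterior mean under a flat prior on the intercept and treatment effect and an independent $N(0, \sigma^2/\lambda)$ prior on each covariate coefficient. It compares (a) shrinking all covariate coefficients with (b) additionally including an unpenalized estimated propensity term, here the fitted value of a linear regression of $Z$ on $X$ (a linear-probability stand-in, chosen for simplicity). The data-generating process, $\lambda$ and the helper name `shrunk_fit` are illustrative assumptions.

```python
import numpy as np

def shrunk_fit(unpenalized, X, y, lam):
    """Posterior mean with flat priors on `unpenalized` columns and an
    independent N(0, sigma^2 / lam) prior on each column of X (partial ridge)."""
    D = np.column_stack([unpenalized, X])
    penalty = np.zeros(D.shape[1])
    penalty[unpenalized.shape[1]:] = lam
    return np.linalg.solve(D.T @ D + np.diag(penalty), D.T @ y)

rng = np.random.default_rng(4)
n, p, tau = 500, 20, 1.0
X = rng.normal(size=(n, p))
z = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.0 * X[:, 0])))  # X[:, 0] confounds
y = tau * z + 2.0 * X[:, 0] + rng.normal(size=n)
lam = float(n)                                              # strong shrinkage

ones = np.ones(n)
# (a) Shrink every covariate coefficient; only intercept and z are unpenalized.
tau_naive = shrunk_fit(np.column_stack([ones, z]), X, y, lam)[1]

# (b) Add an unpenalized estimated propensity term: fitted values of a
#     least-squares regression of z on [1, X].
A = np.column_stack([ones, X])
z_hat = A @ np.linalg.lstsq(A, z, rcond=None)[0]
tau_ps = shrunk_fit(np.column_stack([ones, z, z_hat]), X, y, lam)[1]

bias_naive, bias_ps = tau_naive - tau, tau_ps - tau
print(f"bias without propensity term: {bias_naive:.3f}")
print(f"bias with propensity term:    {bias_ps:.3f}")
```

In (b), the residual $z - \hat z$ is orthogonal to the intercept and every covariate in the sample, so the treatment coefficient no longer depends on how strongly the covariate coefficients are shrunk; in (a), shrinking the confounder's coefficient pushes its effect into the treatment coefficient.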

Key References

  • Robins & Ritov (1997): non-parametric estimators have slow convergence rates in high dimensions
  • Hahn et al. (2020) Bayesian Regression Tree Models for Causal Inference — BCF, identification of regularization-induced confounding
  • Linero (2021) — rigorous treatment of Bayesian ignorability in non-parametric models

Model Averaging

High-dimensional settings often rely on sparsity-inducing priors and Bayesian model averaging:

  • Spike-and-slab priors (Antonelli et al. 2019)
  • Bayesian LASSO (Park & Casella 2008)
  • Model averaging (Raftery et al. 1997)

These achieve regularization via sparsity-inducing priors but must be used carefully due to regularization-induced confounding.
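As a rough illustration of model averaging (not the specific algorithms of the cited papers), posterior model probabilities in linear regression are often approximated with BIC, $p(M \mid y) \propto \exp(-\mathrm{BIC}_M / 2)$ under equal prior model probabilities. The sketch enumerates all subsets of a few candidate covariates while always keeping the treatment indicator, then reports posterior inclusion probabilities and the model-averaged treatment coefficient; the simulated data and the helper name `bic_linear` are assumptions.

```python
import itertools
import numpy as np

def bic_linear(D, y):
    """BIC (up to a constant) of a Gaussian linear model with design D."""
    n = len(y)
    coef = np.linalg.lstsq(D, y, rcond=None)[0]
    rss = float(np.sum((y - D @ coef) ** 2))
    return n * np.log(rss / n) + D.shape[1] * np.log(n), coef

rng = np.random.default_rng(5)
n, p = 300, 4
X = rng.normal(size=(n, p))
z = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
y = 1.0 * z + 1.5 * X[:, 0] + rng.normal(size=n)   # only X[:, 0] matters

bics, taus, subsets = [], [], []
for k in range(p + 1):
    for s in itertools.combinations(range(p), k):
        # The treatment z is forced into every candidate model.
        D = np.column_stack([np.ones(n), z] + [X[:, j] for j in s])
        bic, coef = bic_linear(D, y)
        bics.append(bic)
        taus.append(coef[1])
        subsets.append(set(s))

bics = np.array(bics)
w = np.exp(-(bics - bics.min()) / 2)    # subtract the minimum for numerical stability
w /= w.sum()                            # approximate posterior model probabilities
incl = [sum(wi for wi, s in zip(w, subsets) if j in s) for j in range(p)]
tau_bma = float(np.dot(w, taus))
print(np.round(incl, 3), round(tau_bma, 3))
```

Every model that drops the confounder `X[:, 0]` yields a biased treatment coefficient; here those models receive negligible weight because the confounder strongly predicts the outcome, but with weaker outcome signals the averaging can put weight on them, which is the regularization-induced confounding concern above.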

Connections

See Also