Bayesian Outcome Models

Summary

The outcome model $μ_{z} (x) = E [Y ∣ Z = z, X = x]$ is the central component in Bayesian causal inference. Common specifications range from linear regression to non-parametric models (BART, Gaussian Process). In high-dimensional settings, standard regularization priors can induce bias through regularization-induced confounding, a phenomenon unique to causal inference.

Overview

In Bayesian causal inference, the outcome model $μ_{z} (x) = E [Y_{i} ∣ X_{i} = x, Z_{i} = z]$ is specified to estimate the CATE $τ (x) = μ_{1} (x) - μ_{0} (x)$ . Two common approaches:

Joint model: specify a single function $μ (z, x)$ for both treatment arms, so that $μ_{z} (x) = μ (z, x)$
Separate models: model each arm separately, with its own $μ_{z} (x)$ for $z = 0, 1$

These are known in the literature as the S-learner (joint) and the T-learner (separate), respectively.

Linear Outcome Model

The simplest outcome model: linear regression with a treatment-covariate interaction term:

$μ (z, x) = β_{0} + β_{x} x + β_{z} z + β_{xz} x z$

where the $x z$ interaction captures treatment effect heterogeneity. This is equivalent to fitting a separate linear regression in each treatment group.
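As a minimal sketch of the equivalence noted above, the following fits an ordinary least-squares line separately in each arm and reads off the implied CATE. Least squares is used here only as a dependency-free, non-Bayesian stand-in for the posterior mean under a flat prior. Writing the interaction model with explicit coefficients as $μ (z, x) = β_{0} + β_{x} x + β_{z} z + β_{xz} x z$ (the coefficient names are an assumption of this sketch), the control fit gives $(β_{0}, β_{x})$ and the difference between the two fits gives $(β_{z}, β_{xz})$. The helper names `fit_line` and `cate_linear` and the toy data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single covariate."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def cate_linear(x, z, y):
    """Fit one line per arm; return tau(x) = mu_1(x) - mu_0(x) as a function."""
    a0, b0 = fit_line([xi for xi, zi in zip(x, z) if zi == 0],
                      [yi for yi, zi in zip(y, z) if zi == 0])
    a1, b1 = fit_line([xi for xi, zi in zip(x, z) if zi == 1],
                      [yi for yi, zi in zip(y, z) if zi == 1])
    # (a1 - a0) plays the role of beta_z, (b1 - b0) the role of beta_xz
    return lambda x_new: (a1 - a0) + (b1 - b0) * x_new

# Toy, noise-free data: mu(0, x) = 1 + 2x and mu(1, x) = 4 + 3x
x = [0.0, 1.0, 2.0, 3.0, 0.5, 1.5, 2.5, 3.5]
z = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1 + 2 * xi if zi == 0 else 4 + 3 * xi for xi, zi in zip(x, z)]

tau = cate_linear(x, z, y)
print(tau(2.0))  # implied CATE at x = 2, i.e. (4 - 1) + (3 - 2) * 2
```

Because the toy outcomes are exactly linear, the per-arm fits recover the generating coefficients; with noisy data the same code returns estimates rather than the truth.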

Limitations: linear models are easy to implement but often too restrictive. In regions of poor covariate overlap they extrapolate from the assumed functional form rather than adapting to the true data-generating mechanism.

Non-Parametric Outcome Models

The recent focus on heterogeneous treatment effects has driven adoption of flexible, non-parametric outcome models. The most widely used are regression trees and their ensembles.

BART (Bayesian Additive Regression Trees)

Definition: BART for Causal Inference

BART places priors on the parameters of an ensemble of regression trees to control the depth and the degree of shrinkage of the mean function in terminal nodes. For causal inference, one specifies $μ (z, x)$ as an S-learner (joint model) or fits a separate BART model for each treatment arm (T-learner).

Originally proposed by Chipman, George, and McCulloch (2010)
For causal inference: Hill (2011) first advocated using BART as an S-learner $μ (z, x)$
Has been shown to outperform many competing methods, including frequentist random forests, in numerous empirical applications (Hill 2011; Dorie et al. 2019)
Available in R: BayesTree, BART, dbarts packages
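For orientation, the sum-of-trees form on which the BART prior is placed can be written as below. The notation ($m$ trees, tree structure $T_{j}$, terminal-node values $M_{j}$, noise variance $σ^{2}$) follows the usual presentation of Chipman et al. (2010) and is an addition to this note; the S-learner version simply treats $z$ as one more input.

```latex
\mu(z, x) = \sum_{j=1}^{m} g(z, x;\, T_j, M_j),
\qquad
Y_i = \mu(Z_i, X_i) + \epsilon_i,
\quad
\epsilon_i \sim N(0, \sigma^2)
```

In this form, the prior on each $T_{j}$ controls tree depth and the prior on each $M_{j}$ shrinks the terminal-node means, which matches the two roles described in the definition above.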

Bayesian Causal Forest (BCF)

Definition: Bayesian Causal Forest (BCF)

Hahn et al. (2020) proposed BCF, which decomposes the outcome model as
$μ (z, x) = g_{1} (x) + g_{2} (x) \cdot z$
where $g_{1} (x)$ models the conditional mean of the control outcome $Y (0)$ and $g_{2} (x)$ represents the heterogeneous treatment effect, with separate BART priors on $g_{1} (\cdot)$ and $g_{2} (\cdot)$ .

Advantages of BCF over standard BART:

Fast computation; good performance of default hyperparameters
Available software: bcf R package
Importantly: adding the estimated propensity score as an additional input to $g_{1} (\cdot)$ significantly improves empirical estimation of the CATE
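Because $z$ is binary, the BCF decomposition makes the CATE a direct component of the model. A short worked step, using only the symbols from the definition above:

```latex
\tau(x) = \mu(1, x) - \mu(0, x)
        = \bigl[ g_1(x) + g_2(x) \bigr] - g_1(x)
        = g_2(x)
```

A prior placed on $g_{2}$ therefore acts on the treatment effect itself, separately from the prior on $g_{1}$.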

Gaussian Process (GP) Outcome Model

A Gaussian Process prior on $μ_{z} (\cdot)$ provides:

Potential bias reduction by widening credible intervals as overlap decreases (adaptively)
Flexible covariance function: e.g., a Gaussian kernel with signal-to-noise ratio $ρ$ and inverse bandwidth $λ$ : $Σ_{ij} = δ^{2} ρ^{2} \exp \{ - λ^{2} ∥ x_{i} - x_{j} ∥^{2} \}$

However, the GP prior's uncertainty does not fully adapt to the degree of overlap; see Example 4.1.
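To make the Gaussian kernel above concrete, here is a minimal sketch that builds the covariance matrix $Σ$ for a scalar covariate. The function name `gaussian_kernel_cov` and the parameter values are illustrative assumptions, not values from the source; $δ$ is taken as given from the formula.

```python
import math

def gaussian_kernel_cov(xs, delta=1.0, rho=1.0, lam=0.1):
    """Sigma_ij = delta^2 * rho^2 * exp(-lam^2 * (x_i - x_j)^2) for scalar inputs."""
    scale = (delta ** 2) * (rho ** 2)
    return [[scale * math.exp(-(lam ** 2) * (xi - xj) ** 2) for xj in xs]
            for xi in xs]

# Three covariate values; the hyperparameters are arbitrary illustrations
xs = [35.0, 45.0, 60.0]
sigma = gaussian_kernel_cov(xs, delta=1.0, rho=2.0, lam=0.1)
for row in sigma:
    print([round(v, 4) for v in row])
```

The prior correlation between $μ_{z} (x_{i})$ and $μ_{z} (x_{j})$ decays with squared distance, so a larger $λ$ means that distant observations contribute less information about the function at a given point.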

Example 4.1 — Priors and Overlap in Estimating the CATE

Example 4.1 — Choice of Priors in Estimating the CATE (Li et al. §4b)

Setup: 250 treated ( $Z_{i} = 1$ ) and 250 control ( $Z_{i} = 0$ ) units. A single covariate $X$ follows a Gamma distribution with mean 60 in the treated group and mean 35 in the control group, with standard deviation 8. True outcome: $Y_{i} (z) = 10 + 5 z - 0.3 X_{i} + ϵ_{i}$ , $ϵ_{i} \sim N (0, 1)$ . True CATE $= 5$ for all $x$ .
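The setup above can be simulated with a short sketch. It assumes the standard deviation of 8 applies within each arm and converts mean and standard deviation to Gamma shape and scale; the function name `simulate_example` and the seed are hypothetical. It also computes the naive difference in means, which in the population equals $5 - 0.3 (60 - 35) = - 2.5$ rather than the true CATE of 5, because $X$ is distributed differently across arms.

```python
import random
from statistics import mean

def simulate_example(n_per_arm=250, seed=0):
    """Draw (x, z, y) triples following the setup described above."""
    rng = random.Random(seed)
    sd = 8.0  # assumed to apply within each arm
    data = []
    for z, mu_x in ((1, 60.0), (0, 35.0)):
        shape = (mu_x / sd) ** 2   # Gamma mean = shape * scale
        scale = sd ** 2 / mu_x     # Gamma variance = shape * scale^2
        for _ in range(n_per_arm):
            x = rng.gammavariate(shape, scale)
            y = 10 + 5 * z - 0.3 * x + rng.gauss(0, 1)
            data.append((x, z, y))
    return data

data = simulate_example()
naive = (mean(y for x, z, y in data if z == 1)
         - mean(y for x, z, y in data if z == 0))
print(round(naive, 2))  # unadjusted contrast, confounded by X
```

Any of the three outcome models in the example would then be fit to `data`; the naive contrast is shown only to make the role of covariate adjustment visible.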

Three outcome model priors:

(i) Linear: $μ (z, x) = α_{z} + β_{z} x$

(ii) Gaussian Process with Gaussian kernel

(iii) BART

Findings (illustrated in Figure 1):

In the region of good overlap (40–50 in $X$ ): all three models agree

Linear model: overconfident everywhere; does not widen uncertainty in poor overlap regions

GP model: trades potential bias for wider credible intervals as overlap decreases, but uncertainty does not fully adapt

BART: shorter error bars than GP (wider than linear), but width remains similar regardless of overlap → overconfident in poor overlap regions

Lesson: A desirable prior should accurately reflect uncertainty according to the degree of covariate overlap — uncertainty should increase as overlap decreases.

Challenges in High Dimensions

Two Settings

Non-parametric outcome model with infinitely many or a large number of parameters (regardless of $p$ )
High-dimensional covariates ( $p$ large relative to $N$ )

Both are increasingly common in causal inference, especially when targeting the CATE.

Regularization-Induced Confounding

In high dimensions, Bayesian regularization priors (spike-and-slab, Bayesian LASSO, model averaging) on the nuisance parameters — the regression coefficients for the covariate-outcome relationship — can induce bias in causal estimates.

Mechanism (Hahn et al. 2020; Linero 2021): Under Assumption 3.2 (prior independence), many Bayesian regularization priors on $θ_{Y}$ concentrate the selection bias $δ_{z} = E [Y_{i} ∣ Z_{i} = z, X_{i}] - E [Y_{i} (z)]$ around zero as $p \to \infty$ .

This is prior dogmatism: the prior effectively asserts that there is no confounding, regardless of what the data say.

Solution: Use double machine learning strategies — regularize the propensity score model and outcome model jointly, ensuring the regularized propensity score enters the outcome model for valid causal inference. See §5 of the paper and Propensity Score in Bayesian CI.
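The mechanism described above can be illustrated with a deliberately small toy: one confounder, a treatment that depends on it, and a ridge-style penalty applied only to the confounder coefficient (the penalized estimate corresponds to a posterior mode under a Gaussian prior on that coefficient). This is an assumption-laden stand-in for the high-dimensional priors discussed here, not the analysis of Hahn et al. or Linero; the function name `fit_penalized` and all numeric values are hypothetical.

```python
import random

def fit_penalized(z, x, y, lam):
    """Minimize sum (y - tau*z - beta*x)^2 + lam * beta^2 on centered data.

    Only beta (the confounder coefficient) is penalized; tau is not.
    """
    n = len(y)
    z_bar, x_bar, y_bar = sum(z) / n, sum(x) / n, sum(y) / n
    zc = [v - z_bar for v in z]
    xc = [v - x_bar for v in x]
    yc = [v - y_bar for v in y]
    szz = sum(a * a for a in zc)
    sxx = sum(a * a for a in xc)
    szx = sum(a * b for a, b in zip(zc, xc))
    szy = sum(a * b for a, b in zip(zc, yc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    # Normal equations: [[szz, szx], [szx, sxx + lam]] @ [tau, beta] = [szy, sxy]
    det = szz * (sxx + lam) - szx ** 2
    tau = (szy * (sxx + lam) - szx * sxy) / det
    beta = (szz * sxy - szx * szy) / det
    return tau, beta

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]
z = [1.0 if xi + rng.gauss(0, 1) > 0 else 0.0 for xi in x]  # treatment depends on x
y = [1.0 * zi + 2.0 * xi + rng.gauss(0, 1) for zi, xi in zip(z, x)]  # true effect is 1

for lam in (0.0, 100.0, 1e6):
    tau, beta = fit_penalized(z, x, y, lam)
    print(lam, round(tau, 2), round(beta, 2))
```

As the penalty grows, the confounder coefficient is shrunk toward zero and the treatment coefficient absorbs the association between $X$ and $Y$, drifting away from the true effect even though the treatment coefficient itself is never penalized.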

Key References

Robins & Ritov (1997): non-parametric estimators have slow convergence rates in high dimensions
Hahn et al. (2020) Bayesian Regression Tree Models for Causal Inference — BCF, identification of regularization-induced confounding
Linero (2021) — rigorous treatment of Bayesian ignorability in non-parametric models

Model Averaging

High-dimensional settings often rely on Bayesian regularization and model averaging techniques:

Spike-and-slab priors (Antonelli et al. 2019)
Bayesian LASSO (Park & Casella 2008)
Model averaging (Raftery et al. 1997)

These achieve regularization via sparsity-inducing priors but must be used carefully due to regularization-induced confounding.

Connections

General Structure of Bayesian CI — the outcome model is the core of Bayesian causal inference
Propensity Score in Bayesian CI — strategies to incorporate propensity score into outcome model
Nonparametric Causal Inference — existing vault note on non-parametric Bayesian causal methods
