Frequentist Causal Estimation

Summary

Three main Frequentist estimator classes exist under the potential outcomes framework: outcome modeling (regression), inverse probability weighting (IPW), and doubly-robust (DR) estimators. Each exploits the ignorability assumption differently. DR estimators are consistent if either the propensity score model or the outcome model is correctly specified; both need not be.

Overview

Under the ignorability assumption, the conditional average treatment effect (CATE) and the population average treatment effect (PATE, written $τ^{P}$ below) are identified from observed data. Frequentist causal estimation operationalizes this identification through three estimator families, reviewed here as context for the Bayesian approach in General Structure of Bayesian CI.

The key identification result: under ignorability,

μ_{z} (x) = E [Y_{i} (z) ∣ X_{i} = x] = E [Y_{i} ∣ Z_{i} = z, X_{i} = x]

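so causal means equal observable conditional means: ignorability lets us condition on $Z_{i} = z$ without changing the mean of $Y_{i} (z)$, and among units with $Z_{i} = z$ the observed $Y_{i}$ is exactly $Y_{i} (z)$. See ^eq-identification.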
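The identification result can be checked numerically in a toy simulation where both potential outcomes are visible. The sketch below is illustrative only: the binary covariate, the treatment probabilities 0.2 and 0.8, and the constant effect of 2 are assumptions made for the example, not values from the source.

```python
import random

rng = random.Random(0)
rows = []  # (x, z, y0, y1); both potential outcomes are known only because this is simulated
for _ in range(40000):
    x = rng.randint(0, 1)
    e = 0.8 if x == 1 else 0.2       # treatment depends on x alone, so ignorability holds
    z = 1 if rng.random() < e else 0
    y0 = x + rng.gauss(0, 1)
    y1 = y0 + 2.0                     # assumed constant unit-level effect of 2
    rows.append((x, z, y0, y1))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

causal = {}    # E[Y(1) | X = x], averaged over every unit in the stratum
observed = {}  # E[Y | Z = 1, X = x], averaged over treated units only
for x in (0, 1):
    causal[x] = mean(y1 for xi, z, y0, y1 in rows if xi == x)
    observed[x] = mean(y1 for xi, z, y0, y1 in rows if xi == x and z == 1)

# Without conditioning on x, the treated mean overstates E[Y(1)].
marginal_gap = mean(y1 for _, z, _, y1 in rows if z == 1) - mean(y1 for _, _, _, y1 in rows)
print(causal, observed, marginal_gap)
```

Within each stratum of x the two means should agree up to sampling noise, while the unconditional comparison is off because treated units have larger x on average.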

Outcome Modeling (Regression Estimator)

The simplest approach: specify an outcome model $μ_{z} (x) = E [Y ∣ Z = z, X = x]$, estimate it from data, then impute missing potential outcomes.

The outcome-model PATE estimator:

\hat{τ}^{reg} = N^{-1} \sum_{i=1}^{N} [\hat{μ}_{1} (X_{i}) - \hat{μ}_{0} (X_{i})]

Consistent for $τ^{P}$ if the outcome model is correctly specified.
In regions of poor overlap, estimates rely on extrapolation and are therefore sensitive to model misspecification.
A misspecified linear outcome model still gives a consistent estimate in randomized experiments, but generally not in observational studies.
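The regression estimator above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `simulate`, `fit_line`, and `tau_reg` are hypothetical helper names, the data have a single confounder, and the outcome model is a separate least-squares line per treatment arm (correctly specified here by construction).

```python
import math
import random

def simulate(n, seed=0):
    """Hypothetical data: one confounder x, linear outcome, true effect 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        e = 1 / (1 + math.exp(-x))            # treatment is more likely for large x
        z = 1 if rng.random() < e else 0
        y = 1.0 + 2.0 * z + x + rng.gauss(0, 1)
        data.append((x, z, y))
    return data

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def tau_reg(data):
    """Average of mu_hat_1(x_i) - mu_hat_0(x_i) over all units."""
    fits = {}
    for arm in (0, 1):
        xs = [x for x, z, _ in data if z == arm]
        ys = [y for _, z, y in data if z == arm]
        fits[arm] = fit_line(xs, ys)
    (a0, b0), (a1, b1) = fits[0], fits[1]
    return sum((a1 + b1 * x) - (a0 + b0 * x) for x, _, _ in data) / len(data)

data = simulate(5000)
estimate = tau_reg(data)
print(estimate)  # should land near the simulated effect of 2
```

Because the fitted lines are evaluated at every unit's covariate, including units far from the other arm's data, the imputation step is where extrapolation enters.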

Inverse Probability Weighting (IPW)

Uses the propensity score $e (x) = Pr (Z_{i} = 1 ∣ X_{i} = x)$ to reweight units.

Definition: IPW Estimator

$\hat{τ}^{IPW} = N^{-1} \sum_{i=1}^{N} [\frac{Z_{i} Y_{i}}{e (X_{i})} - \frac{(1 - Z_{i}) Y_{i}}{1 - e (X_{i})}]$

Consistent if the propensity score model is correctly specified.
The propensity score is a balancing score: conditioning on $e (X_{i})$ balances the multivariate distribution of $X$ between treatment groups.
When $e (X_{i})$ is unknown (observational data), it must be estimated, e.g. via logistic regression.
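A minimal sketch of the IPW estimator with an estimated propensity score follows. The data-generating process, the helper names (`simulate`, `fit_logistic`, `tau_ipw`), and the choice of Newton-Raphson for a one-covariate logistic regression are all assumptions made for illustration; in practice a statistics library would normally do the fitting.

```python
import math
import random

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def simulate(n, seed=0):
    """Hypothetical data: Pr(Z=1 | x) = sigmoid(0.5 x), true effect 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = 1 if rng.random() < sigmoid(0.5 * x) else 0
        y = 1.0 + 2.0 * z + x + rng.gauss(0, 1)
        data.append((x, z, y))
    return data

def fit_logistic(data, steps=25):
    """Newton-Raphson for the model Pr(Z=1 | x) = sigmoid(a + b x)."""
    a = b = 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, z, _ in data:
            p = sigmoid(a + b * x)
            w = p * (1 - p)
            g0 += z - p              # score for the intercept
            g1 += (z - p) * x        # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def tau_ipw(data, e):
    """Unnormalized IPW estimate; e is a function returning Pr(Z=1 | x)."""
    total = sum(z * y / e(x) - (1 - z) * y / (1 - e(x)) for x, z, y in data)
    return total / len(data)

data = simulate(20000)
a, b = fit_logistic(data)
estimate = tau_ipw(data, lambda x: sigmoid(a + b * x))
print(a, b, estimate)  # slope should be near 0.5, estimate near 2
```

Units with fitted scores close to 0 or 1 receive very large weights, which is why overlap matters so much for this estimator.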

Hájek (normalized) IPW:

\hat{τ}^{\text{Hájek}} = \frac{\sum_{i=1}^{N} Z_{i} Y_{i} / e (X_{i})}{\sum_{i=1}^{N} Z_{i} / e (X_{i})} - \frac{\sum_{i=1}^{N} (1 - Z_{i}) Y_{i} / (1 - e (X_{i}))}{\sum_{i=1}^{N} (1 - Z_{i}) / (1 - e (X_{i}))}

The Hájek estimator normalizes the weights within each treatment group to sum to 1, which typically reduces variance relative to the unnormalized IPW estimator.
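One concrete consequence of normalization is that the Hájek estimate does not change when a constant is added to every outcome, whereas the unnormalized IPW estimate generally does. The sketch below illustrates this on simulated data; the true propensity score is used directly (possible only because the data are simulated), and all function names are hypothetical.

```python
import math
import random

def simulate(n, seed=0):
    """Hypothetical data as (e, z, y) with the true propensity e; true effect 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        e = 1 / (1 + math.exp(-0.5 * x))
        z = 1 if rng.random() < e else 0
        y = 1.0 + 2.0 * z + x + rng.gauss(0, 1)
        data.append((e, z, y))
    return data

def tau_ipw(data):
    return sum(z * y / e - (1 - z) * y / (1 - e) for e, z, y in data) / len(data)

def tau_hajek(data):
    treated_num = sum(z * y / e for e, z, y in data)
    treated_den = sum(z / e for e, z, y in data)
    control_num = sum((1 - z) * y / (1 - e) for e, z, y in data)
    control_den = sum((1 - z) / (1 - e) for e, z, y in data)
    return treated_num / treated_den - control_num / control_den

data = simulate(20000)
shifted = [(e, z, y + 100.0) for e, z, y in data]   # add a constant to every outcome

hajek, hajek_shifted = tau_hajek(data), tau_hajek(shifted)
ipw, ipw_shifted = tau_ipw(data), tau_ipw(shifted)
print(hajek, hajek_shifted)  # equal up to floating-point rounding
print(ipw, ipw_shifted)      # generally differ
```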

Doubly-Robust (DR) Estimator

Combines outcome modeling and IPW for robustness.

Definition: Doubly-Robust (DR) Estimator

$\hat{τ}^{DR} = \hat{τ}^{reg} + N^{-1} \sum_{i=1}^{N} [\frac{Z_{i} R_{i}}{e (X_{i})} - \frac{(1 - Z_{i}) R_{i}}{1 - e (X_{i})}]$
where $R_{i} = Y_{i} - \hat{μ}_{Z_{i}} (X_{i})$ is the residual from the outcome model.

Theorem: Double Robustness

$\hat{τ}^{DR}$ is consistent for $τ^{P}$ if either of the following holds (not necessarily both):

the propensity score model $e (x)$ is correctly specified, or

the outcome model $μ_{z} (x)$ is correctly specified.

The DR estimator is “doubly robust” because the large-sample bias of $\hat{τ}^{DR}$ is a product of two errors: that of the propensity score model and that of the outcome model. If either error is zero (correct specification), the bias vanishes. By contrast, $\hat{τ}^{reg}$ on its own is biased whenever the outcome model is wrong.
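Double robustness can be illustrated by deliberately breaking one model at a time. In the sketch below, which is an illustration under assumed simulated data rather than a definitive implementation, the "wrong" outcome model ignores the covariate (arm means only) and the "wrong" propensity model is the constant 0.5; the helper names are hypothetical.

```python
import math
import random

def simulate(n, seed=0):
    """Hypothetical data: Pr(Z=1 | x) = sigmoid(x), linear outcome, true effect 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = 1 if rng.random() < 1 / (1 + math.exp(-x)) else 0
        y = 1.0 + 2.0 * z + x + rng.gauss(0, 1)
        data.append((x, z, y))
    return data

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def tau_dr(data, mu, e):
    """mu(z, x) is the outcome-model prediction, e(x) the propensity-model prediction."""
    n = len(data)
    reg = sum(mu(1, x) - mu(0, x) for x, _, _ in data) / n
    # Inverse-probability-weighted residuals R_i = Y_i - mu(Z_i, X_i).
    corr = sum(z * (y - mu(1, x)) / e(x) - (1 - z) * (y - mu(0, x)) / (1 - e(x))
               for x, z, y in data) / n
    return reg + corr

data = simulate(20000)
arm = {a: ([x for x, z, _ in data if z == a], [y for _, z, y in data if z == a]) for a in (0, 1)}
line = {a: fit_line(*arm[a]) for a in (0, 1)}
ybar = {a: sum(arm[a][1]) / len(arm[a][1]) for a in (0, 1)}

def mu_linear(z, x):       # correctly specified outcome model
    return line[z][0] + line[z][1] * x

def mu_arm_mean(z, x):     # misspecified: ignores the confounder
    return ybar[z]

def e_true(x):             # correct propensity (known only because data are simulated)
    return 1 / (1 + math.exp(-x))

def e_constant(x):         # misspecified propensity
    return 0.5

naive = ybar[1] - ybar[0]                               # regression estimator under the wrong model
dr_wrong_outcome = tau_dr(data, mu_arm_mean, e_true)
dr_wrong_propensity = tau_dr(data, mu_linear, e_constant)
print(naive, dr_wrong_outcome, dr_wrong_propensity)
```

The naive difference in means is visibly biased upward, while both DR variants should land near the simulated effect of 2. Misspecifying both models at once would offer no such protection.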

Matching and Weighting Methods

Matching methods find pairs of treated and control units with similar covariates (e.g., based on propensity score or Mahalanobis distance) and estimate $τ^{P}$ by the difference in average outcomes between matched groups.

Weighting methods assign weight $w_{i}$ to each unit so the weighted covariate distribution is balanced, then compute a weighted difference in outcomes. IPW is the canonical weighting method; the Hájek estimator is its normalized version.

These can be viewed as non-parametric versions of $\hat{τ}^{IPW}$, $\hat{τ}^{reg}$, and $\hat{τ}^{DR}$ based on nearest-neighbor regressions.
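As a rough sketch of the nearest-neighbor view, the example below imputes each unit's missing potential outcome with the outcome of the closest unit in the opposite arm (one-to-one matching with replacement on a single covariate). The simulated data and the names `nearest_outcome` and `tau_match` are assumptions for illustration; real matching procedures add calipers, multiple matches, and bias corrections.

```python
import bisect
import math
import random

def simulate(n, seed=0):
    """Hypothetical data: one confounder x, true effect 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = 1 if rng.random() < 1 / (1 + math.exp(-x)) else 0
        y = 1.0 + 2.0 * z + x + rng.gauss(0, 1)
        data.append((x, z, y))
    return data

def nearest_outcome(pairs, xs_sorted, x):
    """Outcome of the unit in `pairs` whose covariate is closest to x."""
    j = bisect.bisect_left(xs_sorted, x)
    candidates = [k for k in (j - 1, j) if 0 <= k < len(xs_sorted)]
    best = min(candidates, key=lambda k: abs(xs_sorted[k] - x))
    return pairs[best][1]

def tau_match(data):
    arms = {}
    for arm in (0, 1):
        pairs = sorted((x, y) for x, z, y in data if z == arm)
        arms[arm] = (pairs, [x for x, _ in pairs])
    total = 0.0
    for x, z, y in data:
        y_other = nearest_outcome(*arms[1 - z], x)   # imputed missing potential outcome
        total += (y - y_other) if z == 1 else (y_other - y)
    return total / len(data)

data = simulate(5000)
estimate = tau_match(data)
print(estimate)  # should land near the simulated effect of 2
```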

Connections to Bayesian Causal Inference

The Bayesian approach (see General Structure of Bayesian CI) treats causal inference as a missing data problem: impute the missing potential outcomes from the posterior predictive distribution, then compute any estimand. This automatically yields uncertainty quantification for any causal functional.
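The missing-data view can be made concrete with a deliberately simple toy model. Everything in the sketch below is an assumption chosen to keep the posterior in closed form, not the model from the source: a randomized treatment, no covariates, normal outcomes with known standard deviation 1, flat priors on the two arm means, and potential outcomes treated as independent given the parameters.

```python
import random

rng = random.Random(0)
units = []  # (z, y) from a simulated randomized experiment with true effect 2
for _ in range(500):
    z = rng.randint(0, 1)
    units.append((z, 1.0 + 2.0 * z + rng.gauss(0, 1)))

def posterior_effect_draws(units, n_draws=400, sigma=1.0, seed=1):
    """Posterior draws of the sample average treatment effect by imputation."""
    rng = random.Random(seed)
    ybar, count = {}, {}
    for arm in (0, 1):
        ys = [y for z, y in units if z == arm]
        ybar[arm], count[arm] = sum(ys) / len(ys), len(ys)
    draws = []
    for _ in range(n_draws):
        # Flat prior and known sigma give mu_z | data ~ Normal(ybar_z, sigma^2 / n_z).
        mu = {a: rng.gauss(ybar[a], sigma / count[a] ** 0.5) for a in (0, 1)}
        total = 0.0
        for z, y in units:
            y_missing = rng.gauss(mu[1 - z], sigma)   # posterior predictive draw
            total += (y - y_missing) if z == 1 else (y_missing - y)
        draws.append(total / len(units))
    return draws

draws = sorted(posterior_effect_draws(units))
posterior_mean = sum(draws) / len(draws)
lower, upper = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(posterior_mean, (lower, upper))  # point summary and a 95% credible interval
```

Because each draw is a full set of imputed potential outcomes, any other functional (a median effect, a quantile difference) could be computed inside the loop in place of the mean difference.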

The propensity score — central to Frequentist approaches — has a nuanced role in Bayesian inference:

Under ignorability, the propensity score drops out of the likelihood for causal estimands (§3 of the paper)
Yet it is essential for ensuring overlap and balance in the design stage
See Propensity Score in Bayesian CI for the three strategies to incorporate it
