Logic of Regression Adjustment

Summary

Regression adjustment uses measured confounders to identify the causal effect of a treatment by blocking backdoor paths in a DAG. It recovers the Population Average Treatment Effect (PATE) for the primary treatment — not for the confounders themselves. The adjustment set is a methodological tool for a single causal path, not a license to interpret all included coefficients.

Overview

Regression adjustment is one of several strategies for estimating causal effects, alongside randomization, propensity-score weighting, and matching, and it is most often applied to observational data. Its logic stems from the potential outcomes framework (Rubin 1974, 1976) and is made precise by DAG-based identification theory (Pearl).

The key insight: adjusting for the confounders $Z$ makes treatment assignment “as good as random” conditional on $Z$, thus isolating the causal path $X \to Y$ from treatment $X$ to outcome $Y$.

Potential Outcomes Setup

Following Rubin’s potential outcomes framework, define:

  • $Y_i$: observed outcome for unit $i$
  • $X_i$: observed treatment status for unit $i$ ($X_i = 1$ if treated, $X_i = 0$ otherwise)
  • $Z_i$: set of measured confounders influencing both treatment assignment and outcome

Potential Outcomes Causal Effect

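Writing $Y_i(X_i = x)$ for the outcome unit $i$ would experience under treatment status $x$, the unit-level causal effect of treatment is the difference in potential outcomes:

$$\tau_i = Y_i(X_i = 1) - Y_i(X_i = 0)$$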

Since we observe each unit under only one treatment status (the fundamental problem of causal inference), unit-level effects are unobservable. We target population-level summaries instead.

Population Average Treatment Effect (PATE)

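In a Bayesian framework, the PATE has a posterior distribution, obtained by evaluating the following quantity at each posterior draw of the model parameters:

$$\text{PATE} = \int \mathbb{E}\left[ Y_i(X_i = 1) - Y_i(X_i = 0) \mid Z_i = z \right] \, dF_Z(z)$$

This is the expected change in outcome if all units were treated versus if no units were treated, integrated over the distribution $F_Z$ of the confounders $Z$.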
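
As a rough illustration of the PATE definition above, the sketch below computes a plug-in estimate by standardization (g-computation) on simulated data: fit an outcome model, predict every unit's outcome under treatment and under control, and average the difference over the observed confounder values. The data-generating process, the coefficient values, and all variable names are assumptions made for this example, not part of the source. It uses ordinary least squares rather than a Bayesian model; in a Bayesian workflow the same averaging step would be repeated for each posterior draw to obtain a posterior for the PATE.

```python
import numpy as np

# Simulated data (assumed for illustration): binary treatment x,
# one confounder z, and an effect of x that varies with z.
rng = np.random.default_rng(1)
n = 50_000
z = rng.normal(loc=0.5, size=n)
x = rng.binomial(1, 1 / (1 + np.exp(-z)))           # treatment depends on z
y = 1.0 + 2.0 * x + 1.5 * z + 1.0 * x * z + rng.normal(size=n)


def design(x, z):
    """Design matrix for the outcome model Y ~ 1 + X + Z + X:Z."""
    return np.column_stack([np.ones_like(z), x, z, x * z])


# Fit the outcome model by least squares.
coef, *_ = np.linalg.lstsq(design(x, z), y, rcond=None)

# Predict each unit's outcome with treatment set to 1 and to 0,
# keeping its observed confounder value, then average the difference.
y1_hat = design(np.ones(n), z) @ coef
y0_hat = design(np.zeros(n), z) @ coef
pate_hat = float(np.mean(y1_hat - y0_hat))

# In this simulation the true PATE is 2.0 + 1.0 * E[z] = 2.5.
print(f"estimated PATE: {pate_hat:.3f}")
```

Averaging over the observed values of the confounder stands in for the integral over its distribution. Because the simulated effect varies with the confounder, the PATE here is not any single regression coefficient.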

What Regression Adjustment Identifies

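Consider the simple confounded DAG in which a measured confounder $Z$ creates the backdoor path $X \leftarrow Z \rightarrow Y$. The backdoor criterion requires conditioning on $Z$ to block that path. After adjustment, the regression model is:

$$Y_i = \alpha + \beta X_i + \gamma Z_i + \varepsilon_i$$

The coefficient $\beta$ consistently estimates the causal effect of $X$ on $Y$ if:

  1. The conditional independence assumption (CIA) holds: $\{Y_i(1), Y_i(0)\} \perp X_i \mid Z_i$
  2. $Z$ is sufficient to block all backdoor paths into $X$
  3. Overlap: $0 < \Pr(X_i = 1 \mid Z_i = z) < 1$ for all values of $z$

Under these conditions, and provided the linear specification is adequate, $\beta$ recovers the causal effect of $X$. The coefficient $\gamma$, however, does not recover the causal effect of $Z$, because the CIA was invoked for $X$, not for $Z$.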
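
The following simulation is a minimal sketch of this logic, assuming a single measured confounder, a binary treatment, and a linear outcome model. The helper names (`simulate_confounded`, `ols`), the effect sizes, and the sample size are hypothetical choices for illustration. It compares the treatment coefficient from a regression that omits the confounder with one that adjusts for it.

```python
import numpy as np


def simulate_confounded(n=50_000, beta=2.0, gamma=1.5, seed=0):
    """Draw (x, y, z) from the DAG x <- z -> y with x -> y (assumed setup)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                              # measured confounder
    x = rng.binomial(1, 1 / (1 + np.exp(-0.8 * z)))     # treatment depends on z
    y = beta * x + gamma * z + rng.normal(size=n)       # outcome
    return x, y, z


def ols(y, *regressors):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


x, y, z = simulate_confounded()
naive = ols(y, x)[1]         # Y ~ X: backdoor path through z left open
adjusted = ols(y, x, z)[1]   # Y ~ X + Z: backdoor path blocked

print(f"naive: {naive:.3f}  adjusted: {adjusted:.3f}  true effect: 2.0")
```

In this setup the unadjusted coefficient overstates the effect because treated units tend to have higher confounder values, while the adjusted coefficient lands close to the simulated effect of 2.0.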

Single-Path Identification

Let $Z$ be a valid adjustment set satisfying the backdoor criterion for the causal path $X \to Y$. Then regression of $Y$ on $X$ and $Z$ identifies the causal effect of $X$ on $Y$. It does not identify the causal effect of any $Z_k \in Z$ on $Y$ unless a separate valid adjustment set for the path $Z_k \to Y$ is also conditioned on.
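
To make the single-path point concrete, here is a small simulated sketch in which the adjustment variable is valid for the treatment but is itself confounded by an unmeasured variable. The DAG (u -> z, u -> y, z -> x, z -> y, x -> y), the coefficient values, and the variable names are assumptions for this example. A continuous treatment is used only to keep the path algebra simple.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

u = rng.normal(size=n)                     # unmeasured cause of z and y
z = u + rng.normal(size=n)                 # adjustment variable
x = 0.8 * z + rng.normal(size=n)           # treatment, caused by z
y = 2.0 * x + 1.5 * z + 1.0 * u + rng.normal(size=n)

# Regression of y on x and z: z blocks every backdoor path into x.
design = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
_, b_x, b_z = coef

direct_z = 1.5                  # simulated direct effect of z on y
total_z = 1.5 + 0.8 * 2.0       # direct effect plus the path through x

print(f"coefficient on x: {b_x:.3f} (simulated effect 2.0)")
print(f"coefficient on z: {b_z:.3f} (direct {direct_z}, total {total_z})")
```

The coefficient on the treatment recovers its simulated effect, but the coefficient on the adjustment variable matches neither its direct nor its total effect, because the path through the unmeasured variable stays open for it.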

The Adjustment Set as a Sacrifice

Nafa (2022) frames the adjustment set not as a collection of “co-causes” to be interpreted, but as a sacrifice: variables we include specifically and only to block confounding paths for the treatment we care about.

The relationship between treatment and outcome is the path we care about and the adjustment set is a sacrifice we make on the altar of causal identification. — A. Jordan Nafa (2022)

This reframing has practical implications:

  • Choose the adjustment set based on DAG analysis — not on whether variables “seem important” or “have large coefficients”
  • Minimal sufficient adjustment sets are preferable: include only what is needed to block backdoor paths
  • Do not include colliders or descendants of colliders (conditioning on them opens new biasing paths)
  • Avoid the kitchen-sink approach: adding more variables does not necessarily improve causal identification
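
The warning about colliders can be checked with a short simulation. This is a sketch under assumed conditions: there is no confounding at all, and a third variable is caused by both treatment and outcome. The variable names and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

x = rng.normal(size=n)                     # treatment, unconfounded here
y = 2.0 * x + rng.normal(size=n)           # outcome; simulated effect is 2.0
c = x + y + rng.normal(size=n)             # collider: x -> c <- y


def slope_on_x(y, *regressors):
    """Coefficient on the first regressor from a least-squares fit."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])


without_collider = slope_on_x(y, x)        # Y ~ X
with_collider = slope_on_x(y, x, c)        # Y ~ X + C opens a biasing path

print(f"without collider: {without_collider:.3f}")
print(f"with collider:    {with_collider:.3f}")
```

Adding the collider moves the treatment coefficient far from the simulated effect even though the model with fewer variables was already unbiased, which is why a larger adjustment set is not automatically a better one.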

Strategies for Causal Identification of Multiple Paths

If a researcher genuinely wants causal estimates for both $X \to Y$ and $Z_k \to Y$, where $Z_k$ is one of the adjustment variables in $Z$, they must:

  1. Specify a separate DAG analysis for the path $Z_k \to Y$
  2. Find a valid adjustment set $W$ satisfying the backdoor criterion for $Z_k \to Y$ (which may differ from $Z$)
  3. Defend the identifying assumptions for both paths separately
  4. Fit separate models or use a joint identification strategy
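
The steps above can be sketched with two separate regressions, one per causal question. The example assumes a simulated linear DAG (w -> z, w -> y, z -> x, z -> y, x -> y) in which the common cause of the second variable and the outcome is measured. All names and coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

w = rng.normal(size=n)                     # measured common cause of z and y
z = w + rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)           # x lies on the path from z to y
y = 2.0 * x + 1.5 * z + 1.0 * w + rng.normal(size=n)


def ols(y, *regressors):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Question 1: effect of x on y. Adjustment set {z} blocks all backdoor paths.
effect_x = float(ols(y, x, z)[1])

# Question 2: total effect of z on y. Adjustment set {w}; x is left out
# because it is a mediator of z, not a confounder.
total_effect_z = float(ols(y, z, w)[1])

print(f"effect of x: {effect_x:.3f} (simulated 2.0)")
print(f"total effect of z: {total_effect_z:.3f} (simulated 1.5 + 0.8 * 2.0 = 3.1)")
```

Each estimate comes from its own model with its own adjustment set. Neither model's remaining coefficients are given a causal reading.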

Identifying multiple paths simultaneously requires either the first two conditions below together, or the third instead:

  • Separate exogenous variation for each path of interest (e.g., multiple instruments)
  • Strong domain-theoretic justification for independence of unobserved confounders
  • Experimental or quasi-experimental designs that address each path independently
