Logic of Regression Adjustment
Summary
Regression adjustment uses measured confounders to identify the causal effect of a treatment by blocking backdoor paths in a DAG. It recovers the Population Average Treatment Effect (PATE) for the primary treatment — not for the confounders themselves. The adjustment set is a methodological tool for a single causal path, not a license to interpret all included coefficients.
Overview
Regression adjustment is one of several strategies for estimating causal effects, alongside randomization, propensity-score weighting, and matching. Unlike randomization, which removes confounding by design, the adjustment-based strategies work with observational data. The logic of regression adjustment stems from the potential outcomes framework (Rubin 1974, 1976) and is made precise by DAG-based identification theory (Pearl).
The key insight: adjusting for confounders $Z$ makes treatment assignment “as good as random” conditional on $Z$, thus isolating the causal path $X \rightarrow Y$.
Potential Outcomes Setup
Following Rubin’s potential outcomes framework, define:
- $Y_i$: observed outcome for unit $i$
- $X_i \in \{0, 1\}$: observed treatment status
- $Z_i$: set of measured confounders influencing both treatment assignment and outcome
Potential Outcomes Causal Effect
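The unit-level causal effect of treatment is the difference in potential outcomes:
$$\tau_i = Y_i(1) - Y_i(0)$$
where $Y_i(1)$ and $Y_i(0)$ are the outcomes unit $i$ would have under treatment and under control.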
Since we observe each unit under only one treatment status (the fundamental problem of causal inference), unit-level effects are unobservable. We target population-level summaries instead.
Population Average Treatment Effect (PATE)
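In a Bayesian framework, the posterior distribution of the PATE is obtained by evaluating the following quantity at each posterior draw of the model parameters:
$$\text{PATE} = \int \mathbb{E}\left[Y_i(1) - Y_i(0) \mid Z_i = z\right] f(z)\, dz$$
This is the expected change in outcome if all units were treated versus if no units were treated, integrated over the distribution of confounders $f(z)$.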
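As a rough illustration of the PATE definition above, the sketch below simulates data with a known effect and approximates the PATE by standardization: fit an outcome model, predict every unit's outcome under treatment and under control, and average the difference over the empirical distribution of the confounder. The data-generating process, the coefficient values, and the helper `design` are assumptions made for this example. The plug-in least-squares fit stands in for a Bayesian model, where the same averaging would be repeated for each posterior draw.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process; every coefficient here is an assumption.
z = rng.normal(size=n)                       # measured confounder
p_treat = 1.0 / (1.0 + np.exp(-z))           # treatment probability rises with z
x = rng.binomial(1, p_treat)                 # binary treatment status
y0 = 1.0 + 1.5 * z + rng.normal(size=n)      # potential outcome under control
y1 = y0 + 2.0 + 0.5 * z                      # potential outcome under treatment
y = np.where(x == 1, y1, y0)                 # only one potential outcome is observed

# Known only because the data are simulated.
true_pate = np.mean(y1 - y0)

# Unadjusted contrast between treated and control units (confounded by z).
naive_diff = y[x == 1].mean() - y[x == 0].mean()


def design(x_vals, z_vals):
    """Design matrix: intercept, treatment, confounder, and their interaction."""
    return np.column_stack(
        [np.ones_like(z_vals), x_vals, z_vals, x_vals * z_vals]
    )


# Fit the outcome model by ordinary least squares.
coef, *_ = np.linalg.lstsq(design(x, z), y, rcond=None)

# Standardization: predict each unit under x=1 and x=0, then average over z.
pred_treated = design(np.ones(n), z) @ coef
pred_control = design(np.zeros(n), z) @ coef
pate_hat = np.mean(pred_treated - pred_control)

print(f"true PATE:          {true_pate:.3f}")
print(f"naive difference:   {naive_diff:.3f}")
print(f"standardized PATE:  {pate_hat:.3f}")
```

In this setup the naive difference should sit well above the true PATE, while the standardized estimate should land close to it, because the outcome model here matches the simulated process.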
What Regression Adjustment Identifies
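Given the simple confounded DAG with measured confounder $Z$ creating $X \leftarrow Z \rightarrow Y$, the backdoor criterion requires conditioning on $Z$ to block the path $X \leftarrow Z \rightarrow Y$. After adjustment:
$$Y_i = \alpha + \beta X_i + \gamma Z_i + \epsilon_i$$
The coefficient $\beta$ consistently estimates the causal effect of $X$ on $Y$, provided the outcome model is correctly specified, if:
- The conditional independence assumption (CIA) holds: $\{Y_i(1), Y_i(0)\} \perp X_i \mid Z_i$
- $Z$ is sufficient to block all backdoor paths into $X$
- Overlap: $0 < P(X_i = 1 \mid Z_i = z) < 1$ for all values of $z$
Under these conditions, $\beta$ recovers the causal effect of $X$. The coefficient $\gamma$, however, does not recover the causal effect of $Z$, because the CIA was invoked for $X$, not for $Z$.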
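A small simulation can make the effect of adjustment concrete. The sketch below assumes a single measured confounder, a binary treatment whose probability depends on that confounder, and a linear outcome model with hypothetical coefficients (true treatment effect 2.0). It compares the unadjusted difference in means with the treatment coefficient from a regression that also includes the confounder.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical linear data-generating process; all values are assumptions.
z = rng.normal(size=n)                               # measured confounder
x = rng.binomial(1, 1.0 / (1.0 + np.exp(-z)))        # treatment depends on z
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)     # true treatment effect is 2.0

# Unadjusted comparison: the backdoor path through z is left open.
naive = y[x == 1].mean() - y[x == 0].mean()

# Regression adjustment: include z to block the backdoor path.
A = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
beta_hat = coef[1]                                   # coefficient on the treatment

print(f"naive difference in means: {naive:.3f}")
print(f"adjusted coefficient:      {beta_hat:.3f}")
```

Because treated units tend to have larger confounder values, the naive contrast absorbs part of the confounder's influence on the outcome, whereas the adjusted coefficient should be close to 2.0 under these assumptions.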
Single-Path Identification
Let $Z$ be a valid adjustment set satisfying the backdoor criterion for the causal path $X \rightarrow Y$. Then regression of $Y$ on $X$ and $Z$ identifies the causal effect $X \rightarrow Y$. It does not identify the causal effect of any $Z_k \in Z$ on $Y$ unless a separate valid adjustment set for the path $Z_k \rightarrow Y$ is also conditioned on.
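The point can be illustrated with a simulated example in which the adjustment variable is a valid control for the treatment but has its own unmeasured confounder. The structure and coefficients below are hypothetical: `u` is unobserved and affects both the confounder `z` and the outcome `y`, and `z` also affects `y` partly through the treatment `x`. Under these assumptions the direct effect of `z` on `y` is 1.0 and its total effect is 1.0 + 0.7 * 2.0 = 2.4.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical structure: u -> z, u -> y, z -> x, z -> y, x -> y.
u = rng.normal(size=n)                               # unobserved; never enters the model
z = u + rng.normal(size=n)                           # measured confounder of x -> y
x = 0.7 * z + rng.normal(size=n)                     # treatment
y = 2.0 * x + 1.0 * z + 1.5 * u + rng.normal(size=n)

# Regress y on x and z only, since u is not available to the analyst.
A = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
beta_hat, gamma_hat = coef[1], coef[2]

direct_effect_z = 1.0                                # z -> y holding x fixed
total_effect_z = 1.0 + 0.7 * 2.0                     # adds the path z -> x -> y

print(f"coefficient on x: {beta_hat:.3f}  (true effect of x: 2.0)")
print(f"coefficient on z: {gamma_hat:.3f}  "
      f"(direct: {direct_effect_z}, total: {total_effect_z})")
```

Conditioning on `z` blocks every backdoor path into `x`, so the coefficient on `x` should be close to 2.0. The coefficient on `z` matches neither the direct nor the total effect of `z`, because the path through `u` was never blocked for it.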
The Adjustment Set as a Sacrifice
Nafa (2022) frames the adjustment set not as a collection of “co-causes” to be interpreted, but as a sacrifice: variables we include specifically and only to block confounding paths for the treatment we care about.
The relationship between treatment and outcome is the path we care about and the adjustment set is a sacrifice we make on the altar of causal identification. — A. Jordan Nafa (2022)
This reframing has practical implications:
- Choose the adjustment set based on DAG analysis — not on whether variables “seem important” or “have large coefficients”
- Minimal sufficient adjustment sets are preferable: include only what is needed to block backdoor paths
- Do not include colliders or descendants of colliders (conditioning on them opens new biasing paths)
- Avoid the kitchen-sink approach: adding more variables does not necessarily improve causal identification
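The warning about colliders can be checked with a short simulation. The sketch below assumes an unconfounded treatment with a true effect of 1.0 and a hypothetical variable `c` that is caused by both the treatment and the outcome. It compares the treatment coefficient with and without `c` in the regression.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical structure: x -> y, and both x and y cause the collider c.
x = rng.normal(size=n)                     # treatment, unconfounded by construction
y = 1.0 * x + rng.normal(size=n)           # true effect of x on y is 1.0
c = x + y + rng.normal(size=n)             # collider: common effect of x and y


def ols(columns, outcome):
    """Least-squares coefficients for an intercept plus the given columns."""
    A = np.column_stack([np.ones(len(outcome)), *columns])
    coef, *_ = np.linalg.lstsq(A, outcome, rcond=None)
    return coef


b_plain = ols([x], y)[1]                   # no adjustment is needed here
b_collider = ols([x, c], y)[1]             # "kitchen sink" model that adds c

print(f"coefficient on x without c: {b_plain:.3f}")
print(f"coefficient on x with c:    {b_collider:.3f}")
```

With these particular coefficients, adding `c` should push the treatment coefficient from roughly 1.0 toward zero, even though no confounding was present to begin with.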
Strategies for Causal Identification of Multiple Paths
If a researcher genuinely wants causal estimates for both $X \rightarrow Y$ and $Z_k \rightarrow Y$, they must:
- Specify a separate DAG analysis for the path $Z_k \rightarrow Y$
- Find a valid adjustment set $Z'$ satisfying the backdoor criterion for $Z_k \rightarrow Y$ (which may differ from $Z$)
- Defend the identifying assumptions for both paths separately
- Fit separate models or use a joint identification strategy
Identifying multiple paths simultaneously requires one or more of:
- Separate exogenous variation for each path of interest (e.g., multiple instruments)
- Strong domain-theoretic justification for independence of unobserved confounders
- Experimental or quasi-experimental designs that address each path independently
Connections
- DAGs and Causal Identification — Provides the formal backdoor criterion and rules for valid adjustment sets
- Potential Outcomes Framework — The PATE definition and potential outcomes notation underpinning this framework
- Table 2 Fallacy — The downstream error: misinterpreting confounder coefficients as causally identified
- Bayesian Propensity Score Weighting — An alternative adjustment strategy using propensity scores; same identification conditions apply
- The Selection Problem — Why adjustment is necessary: non-random treatment assignment creates backdoor paths
See Also
- Conditional Independence Assumption — The CIA / unconfoundedness assumption required for regression adjustment
- Omitted Variables Bias — What happens when the adjustment set is insufficient (fails to block all backdoor paths)
- Regression and the CEF — The statistical relationship between regression and the Conditional Expectation Function