Logic of Regression Adjustment

Summary

Regression adjustment uses measured confounders to identify the causal effect of a treatment by blocking backdoor paths in a DAG. It recovers the Population Average Treatment Effect (PATE) for the primary treatment — not for the confounders themselves. The adjustment set is a methodological tool for a single causal path, not a license to interpret all included coefficients.

Overview

Regression adjustment is one of several strategies for estimating causal effects, alongside randomization, propensity-score weighting, and matching. Like weighting and matching, it is used when treatment was not randomized, as in observational data. Its logic stems from the potential outcomes framework (Rubin 1974, 1976) and is made precise by DAG-based identification theory (Pearl).

The key insight: adjusting for confounders $Z$ makes treatment assignment “as good as random” conditional on $Z$, thus isolating the causal path $X \to Y$.

Potential Outcomes Setup

Following Rubin’s potential outcomes framework, define:

$Y_{i}$: observed outcome for unit $i$
$X_{i} \in \{0, 1\}$: observed treatment status
$Z$: set of measured confounders influencing both treatment assignment and outcome

Potential Outcomes Causal Effect

The unit-level causal effect of treatment $X$ is the difference in potential outcomes:
$Y_{i} (X_{i} = 1, Z_{i}) - Y_{i} (X_{i} = 0, Z_{i})$
Since we observe each unit under only one treatment status (the fundamental problem of causal inference), unit-level effects are unobservable. We target population-level summaries instead.

Population Average Treatment Effect (PATE)

The PATE is the estimand defined below; in a Bayesian framework, its posterior distribution follows from the posterior distribution of the model parameters:
$PATE = \int \left( E [Y_{i} (X_{i} = 1, Z_{i})] - E [Y_{i} (X_{i} = 0, Z_{i})] \right) d P (Z_{i})$
This is the expected change in outcome if all units were treated versus if no units were treated, averaged over the population distribution of the confounders $Z$.
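A minimal simulation sketch of this averaging, assuming a hypothetical data-generating process (the coefficients, sample size, and variable names are illustrative and not from the source). It fits an outcome regression, predicts every unit's outcome under $X = 1$ and under $X = 0$, and averages the difference over the empirical distribution of $Z$. It produces point estimates only; a Bayesian version would repeat the final averaging step for each posterior draw.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process: Z confounds X and Y,
# and the treatment effect varies with Z (interaction term).
z = rng.normal(size=n)
p_treat = 1.0 / (1.0 + np.exp(-0.8 * z))      # P(X = 1 | Z)
x = rng.binomial(1, p_treat)
y = 1.0 + 2.0 * x + 1.5 * z + 0.5 * x * z + rng.normal(size=n)


def design(x_col, z_col):
    """Design matrix with intercept, X, Z, and the X*Z interaction."""
    return np.column_stack([np.ones_like(z_col), x_col, z_col, x_col * z_col])


# Outcome regression fitted by ordinary least squares.
coef = np.linalg.lstsq(design(x, z), y, rcond=None)[0]

# Predict each unit's outcome under both treatment states, keeping its own Z,
# then average the difference over the observed distribution of Z.
y1_hat = design(np.ones(n), z) @ coef
y0_hat = design(np.zeros(n), z) @ coef
pate_hat = np.mean(y1_hat - y0_hat)

# Unadjusted comparison of treated and untreated means, for contrast.
naive = y[x == 1].mean() - y[x == 0].mean()

print(f"adjusted PATE estimate: {pate_hat:.3f}")
print(f"naive difference in means: {naive:.3f}")
```

Under the assumed process the true PATE is 2.0, because the interaction term averages out when $E[Z] = 0$. The naive difference in means should land well above that value, since treated units tend to have higher $Z$.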

What Regression Adjustment Identifies

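Given the simple confounded DAG in which $Z$ affects both treatment and outcome ($X \leftarrow Z \to Y$) and an unobserved variable $U$ confounds $Z$ and $Y$ ($Z \leftarrow U \to Y$), the backdoor criterion requires conditioning on $Z$ to block the path $X \leftarrow Z \to Y$. Conditioning on $Z$ also blocks the longer path $X \leftarrow Z \leftarrow U \to Y$. After adjustment: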

$E [Y ∣ X, Z] = α + β_{X} X + β_{Z} Z + \dots$

The coefficient $β_{X}$ consistently estimates the causal effect of $X$ on $Y$ if:

The conditional independence assumption (CIA) holds: $Y (x) ⊥ X ∣ Z$
$Z$ is sufficient to block all backdoor paths into $X$
Overlap: $0 < P (X = 1 ∣ Z) < 1$ for all values of $Z$
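One way to see why these conditions suffice is the following sketch. It additionally assumes consistency, $Y = Y(X)$, which is not stated in the list above.

$$
\begin{aligned}
E[Y(1) - Y(0)] &= E_{Z}\big[ E[Y(1) \mid Z] - E[Y(0) \mid Z] \big] && \text{(iterated expectations)} \\
&= E_{Z}\big[ E[Y(1) \mid X = 1, Z] - E[Y(0) \mid X = 0, Z] \big] && \text{(CIA, with overlap so both conditionals exist)} \\
&= E_{Z}\big[ E[Y \mid X = 1, Z] - E[Y \mid X = 0, Z] \big] && \text{(consistency)}
\end{aligned}
$$

If a linear specification such as $E[Y \mid X, Z] = α + β_{X} X + β_{Z} Z$ is correct, the inner difference equals $β_{X}$ for every value of $Z$, so its average over $Z$ is $β_{X}$ as well.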

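Under these conditions, and provided the functional form of the regression is correctly specified, $β_{X}$ recovers the causal effect of $X \to Y$. The coefficient $β_{Z}$, however, does not recover the causal effect of $Z \to Y$, because the CIA was invoked for $X$, not for $Z$: in the DAG above, the unobserved $U$ still confounds $Z$ and $Y$.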

Single-Path Identification

Let $Z$ be a valid adjustment set satisfying the backdoor criterion for the causal path $X \to Y$. Then regression of $Y$ on $X$ and $Z$ identifies the causal effect $X \to Y$. It does not identify the causal effect of any $z_{k} \in Z$ on $Y$ unless a separate valid adjustment set for the path $z_{k} \to Y$ is also conditioned on.
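The single-path claim can be checked in a small simulation. The sketch below assumes a hypothetical linear data-generating process (all coefficients and names are illustrative, not from the source) in which an unmeasured $U$ affects both $Z$ and $Y$. Regressing $Y$ on $X$ and $Z$ should recover the effect of $X$, while the coefficient on $Z$ absorbs part of the effect of $U$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Assumed DAG: U -> Z, U -> Y, Z -> X, Z -> Y, X -> Y (U is never observed).
u = rng.normal(size=n)
z = u + rng.normal(size=n)
x = rng.binomial(1, 1.0 / (1.0 + np.exp(-z)))   # treatment depends on Z only
y = 1.0 * x + 0.5 * z + 1.0 * u + rng.normal(size=n)

# Regress Y on an intercept, X, and Z; U is left out because it is unmeasured.
design = np.column_stack([np.ones(n), x, z])
coef = np.linalg.lstsq(design, y, rcond=None)[0]
beta_x, beta_z = coef[1], coef[2]

print(f"beta_X: {beta_x:.3f}  (structural effect of X on Y is 1.0)")
print(f"beta_Z: {beta_z:.3f}  (structural coefficient of Z in Y is 0.5)")
```

In this setup $Z$ blocks every backdoor path into $X$, so $β_{X}$ should sit near 1.0. The path $Z \leftarrow U \to Y$ remains open for $Z$, so $β_{Z}$ should sit well above 0.5.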

The Adjustment Set as a Sacrifice

Nafa (2022) frames the adjustment set not as a collection of “co-causes” to be interpreted, but as a sacrifice: variables we include specifically and only to block confounding paths for the treatment we care about.

The relationship between treatment and outcome is the path we care about and the adjustment set is a sacrifice we make on the altar of causal identification. — A. Jordan Nafa (2022)

This reframing has practical implications:

Choose the adjustment set based on DAG analysis — not on whether variables “seem important” or “have large coefficients”
Minimal sufficient adjustment sets are preferable: include only what is needed to block backdoor paths
Do not include colliders or descendants of colliders (conditioning on them opens new biasing paths)
Avoid the kitchen-sink approach: adding more variables does not necessarily improve causal identification
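The warning about colliders can be illustrated with a short simulation. This is a sketch under an assumed data-generating process with no confounding at all (the variable `c` and all coefficients are hypothetical): the unadjusted regression is already correct, and adding the collider to the adjustment set introduces bias.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Assumed DAG: X -> Y, X -> C, Y -> C. There is no confounder,
# and C is a collider on the path X -> C <- Y.
x = rng.binomial(1, 0.5, size=n)
y = 1.0 * x + rng.normal(size=n)
c = x + y + rng.normal(size=n)


def ols(columns, outcome):
    """Least-squares coefficients; index 0 is the intercept."""
    design = np.column_stack([np.ones(len(outcome)), *columns])
    return np.linalg.lstsq(design, outcome, rcond=None)[0]


beta_x_unadjusted = ols([x], y)[1]        # Y ~ X
beta_x_with_collider = ols([x, c], y)[1]  # Y ~ X + C

print(f"without collider: {beta_x_unadjusted:.3f}")
print(f"with collider:    {beta_x_with_collider:.3f}")
```

With these assumed coefficients the unadjusted estimate should be close to the true effect of 1.0, while the estimate that conditions on `c` should be pulled toward zero. More covariates made the estimate worse, not better.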

Strategies for Causal Identification of Multiple Paths

If a researcher genuinely wants causal estimates for both $X \to Y$ and $Z \to Y$, they must:

Specify a separate DAG analysis for the path $Z \to Y$
Find a valid adjustment set $W$ satisfying the backdoor criterion for $Z \to Y$ (which may differ from $Z$)
Defend the identifying assumptions for both paths separately
Fit separate models or use a joint identification strategy
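The steps above can be sketched as two separate regressions. The example assumes a hypothetical linear process in which the confounder of $Z \to Y$, here called `w`, is measured, and it uses a continuous treatment purely to keep the total effect of $Z$ linear; neither choice comes from the source.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Assumed DAG: W -> Z, W -> Y, Z -> X, Z -> Y, X -> Y, with W observed.
w = rng.normal(size=n)
z = w + rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)   # continuous treatment, for simplicity
y = 1.0 * x + 0.5 * z + 1.0 * w + rng.normal(size=n)


def ols(columns, outcome):
    """Least-squares coefficients; index 0 is the intercept."""
    design = np.column_stack([np.ones(len(outcome)), *columns])
    return np.linalg.lstsq(design, outcome, rcond=None)[0]


# Model for X -> Y: Z blocks the backdoor paths into X.
effect_x = ols([x, z], y)[1]

# Model for Z -> Y (total effect): adjust for W, and leave out X,
# which lies on the causal path from Z to Y.
effect_z = ols([z, w], y)[1]

# Reading the Z coefficient off the first model instead.
z_coef_from_x_model = ols([x, z], y)[2]

print(f"effect of X on Y:          {effect_x:.3f}")
print(f"total effect of Z on Y:    {effect_z:.3f}")
print(f"Z coefficient in X model:  {z_coef_from_x_model:.3f}")
```

Under these assumptions the total effect of $Z$ on $Y$ is $0.5 + 0.8 \times 1.0 = 1.3$. The second model should recover it, whereas the $Z$ coefficient from the first model should not, because that model conditions on the mediator $X$ and omits `w`.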

Identifying multiple paths simultaneously requires at least one of the following:

Separate exogenous variation for each path of interest (e.g., multiple instruments)
Strong domain-theoretic justification for independence of unobserved confounders
Experimental or quasi-experimental designs that address each path independently

Connections

DAGs and Causal Identification — Provides the formal backdoor criterion and rules for valid adjustment sets
Potential Outcomes Framework — The PATE definition and potential outcomes notation underpinning this framework
Table 2 Fallacy — The downstream error: misinterpreting confounder coefficients as causally identified
Bayesian Propensity Score Weighting — An alternative adjustment strategy using propensity scores; same identification conditions apply
The Selection Problem — Why adjustment is necessary: non-random treatment assignment creates backdoor paths
