Identifying Assumptions for Staggered DiD

Summary

Point identification of $ATT(g, t)$ in the Callaway–Sant’Anna framework rests, beyond random sampling, on three ingredients: limited treatment anticipation (Assumption 3, with horizon $δ \geq 0$), one of two conditional parallel-trends assumptions, based either on a never-treated group (Assumption 4) or on not-yet-treated groups (Assumption 5), and an overlap/common-support condition (Assumption 6). The parallel-trends assumptions allow covariate-specific trends, which makes them weaker than the randomization-based or unconditional parallel-trends assumptions used elsewhere.

Overview

These conditions extend the canonical parallel-trends assumption to multiple groups and periods and, crucially, allow it to hold only after conditioning on covariates $X$. This matters when groups differ in observables that drive untreated outcome dynamics (e.g. job-training programs where age, education, and employment history differ across participants; Heckman et al. 1997). They also accommodate bounded anticipation of treatment. The choice between Assumptions 4 and 5 governs which comparison group is valid; the anticipation horizon $δ$ governs the reference period.

Main Content

Assumption 3 — Limited Treatment Anticipation

There is a known $δ \geq 0$ such that
$E[Y_{t}(g) \mid X, G_{g} = 1] = E[Y_{t}(0) \mid X, G_{g} = 1] \text{ a.s. for all } g \in G,\ t < g - δ.$
When $δ = 0$ this is the standard no-anticipation condition: units do not respond before treatment starts. Setting $δ > 0$ permits anticipation of up to $δ$ periods (e.g. $δ = 1$ if units react one period early). Under Assumption 3, $ATT(g, t) = 0$ for all pre-treatment periods $t < g - δ$. The parallel-trends assumptions become stronger as $δ$ grows (Remark 1), a previously unnoticed trade-off.
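The bookkeeping implied by Assumption 3 can be sketched in a few lines of Python. The function names `reference_period` and `unaffected_periods` are hypothetical helpers introduced here for illustration, and periods are assumed to be the integers 1, ..., T.

```python
def reference_period(g: int, delta: int = 0) -> int:
    """Last period assumed free of anticipation effects: g - delta - 1."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    return g - delta - 1


def unaffected_periods(g: int, delta: int, T: int) -> list:
    """Periods t in 1..T with t < g - delta, where Assumption 3 implies ATT(g, t) = 0."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    return [t for t in range(1, T + 1) if t < g - delta]


# Group first treated in period 4, observed over T = 6 periods.
print(reference_period(4, delta=0), unaffected_periods(4, 0, 6))
print(reference_period(4, delta=1), unaffected_periods(4, 1, 6))
```

Raising $δ$ from 0 to 1 moves the reference period one step earlier and shrinks the set of periods in which the effect is assumed to be zero.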

Assumption 4 — Conditional Parallel Trends (Never-Treated comparison)

Let $δ$ be as in Assumption 3. For each $g \in G$ and $t \in \{2, \dots, T\}$ with $t \geq g - δ$,
$E[Y_{t}(0) - Y_{t-1}(0) \mid X, G_{g} = 1] = E[Y_{t}(0) - Y_{t-1}(0) \mid X, C = 1] \text{ a.s.}$
Conditional on covariates, group $g$ and the never-treated group ($C = 1$) would have followed parallel untreated paths. This version is favored when a sizable never-treated group exists that is similar to the eventually treated units. Under $δ = 0$ it places no restriction on observed pre-treatment trends.
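As a rough illustration of how a never-treated comparison is used, the sketch below computes a simple difference-in-differences for one group and period on a toy panel. It makes two simplifying assumptions: there are no covariates (so it is the unconditional analogue of Assumption 4, not the conditional version stated above), and the long difference is taken from the reference period $g - δ - 1$. The data and the function name `att_never_treated` are made up for this example.

```python
from statistics import mean

# Toy panel: unit -> (first-treated period, or None if never treated; outcomes by period).
panel = {
    "a": (3, {1: 1.0, 2: 2.0, 3: 5.0, 4: 6.0}),
    "b": (3, {1: 2.0, 2: 3.0, 3: 6.0, 4: 7.0}),
    "c": (None, {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}),
    "d": (None, {1: 4.0, 2: 5.0, 3: 6.0, 4: 7.0}),
}


def att_never_treated(panel, g, t, delta=0):
    """Mean outcome change in group g minus mean change among never-treated units."""
    base = g - delta - 1  # reference period
    treated = [y[t] - y[base] for grp, y in panel.values() if grp == g]
    control = [y[t] - y[base] for grp, y in panel.values() if grp is None]
    if not treated or not control:
        raise ValueError("need at least one unit in each group")
    return mean(treated) - mean(control)


print(att_never_treated(panel, g=3, t=3))
print(att_never_treated(panel, g=3, t=4))
```

In this toy data the untreated outcome rises by one unit per period for everyone, so the comparison removes the common trend and leaves the constructed effect of 2.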

Assumption 5 — Conditional Parallel Trends (Not-Yet-Treated comparison)

Let $δ$ be as in Assumption 3. For each $g \in G$ and $(s, t) \in \{2, \dots, T\}^{2}$ with $t \geq g - δ$ and $t + δ \leq s < \bar{g}$,
$E[Y_{t}(0) - Y_{t-1}(0) \mid X, G_{g} = 1] = E[Y_{t}(0) - Y_{t-1}(0) \mid X, D_{s} = 0, G_{g} = 0] \text{ a.s.}$
The comparison group consists of units not yet treated by time $t + δ$. This version is favored when there is no never-treated group or it is too small, since it exploits more comparison units and so yields more informative inference. The drawback is that, unlike Assumption 4, it does restrict pre-treatment trends, which can fail when early periods experienced different shocks than later ones (Marcus & Sant’Anna 2020). Practitioners uncomfortable using never-treated units (who may behave differently) can drop them and proceed under Assumption 5 (Remark 2).
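The selection of comparison units under Assumption 5 can be sketched as follows. This assumes each unit is described only by its first-treated period (`None` for never-treated), and reads "$D_{s} = 0$ at $s = t + δ$" as "first treated strictly after $t + δ$". The function name `comparison_units` and the `include_never_treated` switch (mirroring the option in Remark 2) are hypothetical.

```python
def comparison_units(first_treated, g, t, delta=0, include_never_treated=True):
    """Units outside group g that are still untreated at period t + delta."""
    cutoff = t + delta
    chosen = []
    for unit, g_u in first_treated.items():
        if g_u is None:
            # Never-treated units qualify unless the analyst drops them.
            if include_never_treated:
                chosen.append(unit)
        elif g_u != g and g_u > cutoff:
            # Eventually treated, but not yet treated by the cutoff period.
            chosen.append(unit)
    return sorted(chosen)


cohorts = {"a": 3, "b": 4, "c": 5, "d": None}
print(comparison_units(cohorts, g=3, t=3))
print(comparison_units(cohorts, g=3, t=3, delta=1))
print(comparison_units(cohorts, g=3, t=3, include_never_treated=False))
```

The comparison set shrinks as $t$ or $δ$ grows, because later-treated cohorts drop out once they reach their own anticipation window.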

Assumption 6 — Overlap (common support)

For each $t \in \{2, \dots, T\}$ and $g \in G$, there exists $ε > 0$ with
$P(G_{g} = 1) > ε \text{ and } p_{g,t}(X) < 1 - ε \text{ a.s.}$
A positive fraction of the population starts treatment in period $g$, and the generalized propensity score is uniformly bounded away from one. This rules out “irregular identification” (Khan & Tamer 2010) and extends the overlap conditions of Heckman et al., Abadie (2005), and Sant’Anna & Zhao (2020).
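A quick diagnostic in the spirit of Assumption 6 might look like the sketch below. It assumes the generalized propensity scores have already been estimated elsewhere and are passed in as plain numbers. The threshold `eps=0.01` is an arbitrary illustrative choice, since the assumption only requires that some $ε > 0$ exists. This is a descriptive check, not a formal test, and `overlap_diagnostic` is a hypothetical name.

```python
def overlap_diagnostic(n_in_group, n_total, pscores, eps=0.01):
    """Report the group share, the largest propensity score, and whether both clear eps."""
    if n_total <= 0 or not pscores:
        raise ValueError("need a non-empty sample and at least one propensity score")
    share = n_in_group / n_total  # sample analogue of P(G_g = 1)
    worst = max(pscores)          # score closest to one
    return {
        "group_share": share,
        "max_pscore": worst,
        "ok": share > eps and worst < 1 - eps,
    }


print(overlap_diagnostic(20, 100, [0.1, 0.5, 0.9]))
print(overlap_diagnostic(20, 100, [0.1, 0.995]))
```

A score very close to one signals covariate values at which almost every unit belongs to group $g$, so few or no comparison units are available there.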

Conditional vs. unconditional. The unconditional versions of Assumptions 4–5 are still weaker than the parallel-trends conditions in de Chaisemartin & D’Haultfœuille (2020) and Sun & Abraham (2020), and weaker than the randomization-of-adoption-date assumption in Athey & Imbens (2018). Allowing conditioning on $X$ permits covariate-specific trends; ignoring them when present biases unconditional DiD. Only pre-treatment covariates may be used — post-treatment covariates can be affected by treatment (Wooldridge 2005b).
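The bias from ignoring covariate-specific trends can be seen in a small deterministic example. The numbers are constructed for illustration: the untreated outcome change is 1 when $X = 0$ and 3 when $X = 1$, the true effect is 2, and treated units are mostly $X = 1$ while comparison units are mostly $X = 0$. The conditional version below is a simple stratified comparison weighted by the treated group's covariate distribution. It is a stand-in for illustration, not the doubly-robust estimand of the framework.

```python
from statistics import mean

# Each row: (x, treated flag, outcome change from the reference period to t).
rows = [
    (0, 1, 3.0), (1, 1, 5.0), (1, 1, 5.0), (1, 1, 5.0),  # treated units
    (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0), (1, 0, 3.0),  # comparison units
]


def unconditional_did(rows):
    """Difference in mean outcome changes, ignoring covariates."""
    treated = [d for _, tr, d in rows if tr == 1]
    control = [d for _, tr, d in rows if tr == 0]
    return mean(treated) - mean(control)


def conditional_did(rows):
    """Within-stratum differences, averaged over the treated distribution of x."""
    treated_x = [x for x, tr, _ in rows if tr == 1]
    total = 0.0
    for x in sorted(set(treated_x)):
        treated = [d for xi, tr, d in rows if tr == 1 and xi == x]
        control = [d for xi, tr, d in rows if tr == 0 and xi == x]
        weight = treated_x.count(x) / len(treated_x)
        total += weight * (mean(treated) - mean(control))
    return total


print(unconditional_did(rows))  # mixes the effect with the difference in trends
print(conditional_did(rows))    # compares like with like within each stratum
```

Here the unconditional comparison returns 3.0 and the stratified comparison returns the constructed effect of 2.0. The stratified version needs comparison units in every stratum where treated units appear, which is the role of the overlap condition.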

Do not pre-test to pick the assumption

It is tempting to use statistical pre-tests to choose between parallel-trends versions, but Roth (2020) shows this distorts inference. The authors recommend choosing based on the application’s context, not data-driven tests (Sec. 2.3, fn. 8).

Examples

Minimum wage: covariates that make parallel trends plausible

County characteristics used to justify conditional parallel trends: census region, county population, median income, fraction white, fraction with HS education, poverty rate. Treated counties differ markedly (much less likely Southern; population ~94k vs ~53k; 89% vs 83% white) — so unconditional parallel trends is suspect and conditioning is warranted. Sant’Anna & Song (2019) propensity-score specification tests fail to reject correct specification.

Connections

Conditions for the identification results in Doubly-Robust Estimands for ATT(g,t).
Apply to the target defined in Group-Time Average Treatment Effects.
Visualized as a conditioning structure in Directed Acyclic Graphs.
Anticipation horizon $δ$ pins the reference period $g - δ - 1$ used by the estimands.
