Identifying Assumptions for Staggered DiD
Summary
Point-identification of $ATT(g,t)$ in the Callaway–Sant’Anna framework rests on four assumptions beyond random sampling: limited treatment anticipation (Assumption 3, with horizon $\delta$), one of two conditional parallel trends assumptions, based either on a never-treated group (Assumption 4) or on not-yet-treated groups (Assumption 5), and an overlap/common-support condition (Assumption 6). Both parallel-trends variants allow covariate-specific trends, making them strictly weaker than randomization-based or unconditional parallel-trends assumptions used elsewhere.
Overview
These conditions extend the canonical parallel-trends assumption to multiple groups and periods, and crucially allow it to hold only after conditioning on covariates — important when groups differ in observables that drive untreated outcome dynamics (e.g. job-training programs where age, education, employment history differ across participants; Heckman et al. 1997). They also accommodate (bounded) anticipation of treatment. The choice between Assumptions 4 and 5 governs which comparison group is valid; the anticipation horizon governs the reference period.
Main Content
Assumption 3 — Limited Treatment Anticipation
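There is a known $\delta \ge 0$ such that
$$
\mathbb{E}[Y_t(g) \mid X, G_g = 1] = \mathbb{E}[Y_t(0) \mid X, G_g = 1] \quad \text{a.s.}
$$
for all $g \in \mathcal{G}$ and $t \in \{1, \dots, \mathcal{T}\}$ such that $t < g - \delta$.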
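When $\delta = 0$ this is the standard no-anticipation condition (units do not respond before treatment starts). Setting $\delta > 0$ permits anticipation up to $\delta$ periods (e.g. $\delta = 1$ if units react one period early). Under Assumption 3, $ATT(g,t) = 0$ for all pre-treatment periods $t < g - \delta$. The parallel-trends assumptions become stronger as $\delta$ grows (Remark 1), a previously unnoticed trade-off between allowing for anticipation and the strength of the parallel-trends assumption.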
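The role of the anticipation horizon can be made concrete with a small sketch. It assumes integer-valued periods and writes the horizon as `delta`; the function names are illustrative and not taken from any package. The reference period is taken to be the last period before anticipation can begin, `g - delta - 1`.

```python
def reference_period(g: int, delta: int = 0) -> int:
    """Last period in which group g is assumed unaffected: g - delta - 1."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    return g - delta - 1


def is_pre_treatment(g: int, t: int, delta: int = 0) -> bool:
    """True when t < g - delta, i.e. Assumption 3 implies a zero effect at t."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    return t < g - delta


# Hypothetical group first treated in 2006.
print(reference_period(2006, delta=0))        # 2005
print(reference_period(2006, delta=1))        # 2004
print(is_pre_treatment(2006, 2005, delta=1))  # False: 2005 may already be affected
```

Raising `delta` moves the reference period further back, which is why a longer anticipation horizon requires parallel trends to hold over more periods.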
Assumption 4 — Conditional Parallel Trends (Never-Treated comparison)
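Let $\delta$ be as in Assumption 3. For each $g \in \mathcal{G}$ and $t \in \{2, \dots, \mathcal{T}\}$ with $t \ge g - \delta$,
$$
\mathbb{E}[Y_t(0) - Y_{t-1}(0) \mid X, G_g = 1] = \mathbb{E}[Y_t(0) - Y_{t-1}(0) \mid X, C = 1] \quad \text{a.s.}
$$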
Conditional on covariates, group $g$ and the never-treated group ($C = 1$) would have followed parallel untreated paths. Favored when a sizable never-treated group exists that is similar to the eventually treated units. Under $\delta = 0$ it places no restriction on observed pre-treatment trends.
Assumption 5 — Conditional Parallel Trends (Not-Yet-Treated comparison)
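Let $\delta$ be as in Assumption 3. For each $g \in \mathcal{G}$ and $(s,t) \in \{2, \dots, \mathcal{T}\} \times \{2, \dots, \mathcal{T}\}$ with $t \ge g - \delta$ and $t + \delta \le s < \bar{g}$,
$$
\mathbb{E}[Y_t(0) - Y_{t-1}(0) \mid X, G_g = 1] = \mathbb{E}[Y_t(0) - Y_{t-1}(0) \mid X, D_s = 0, G_g = 0] \quad \text{a.s.}
$$
Here $D_s$ is the treatment indicator in period $s$ and $\bar{g}$ is the largest first-treatment period in the data.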
Uses groups not yet treated by time $t + \delta$ as the comparison. Favored when the never-treated group is absent or too small, since it exploits more comparison units and so yields more informative inference. Drawback: unlike Assumption 4, it does restrict pre-treatment trends, which can fail when early periods experienced different shocks than later ones (Marcus & Sant’Anna 2020). Practitioners uncomfortable using never-treated units (who may behave differently) can drop them and proceed under Assumption 5 (Remark 2).
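The difference between the two comparison groups can be sketched as a pair of boolean masks. This is a minimal illustration, assuming each unit's first-treatment period is stored in an array with `np.inf` for never-treated units; the function name and the `kind` labels are hypothetical choices made for this sketch.

```python
import numpy as np


def comparison_mask(first_treated, t, delta=0, kind="nevertreated"):
    """Boolean mask of valid comparison units for period t.

    first_treated: first-treatment period per unit, np.inf if never treated
                   (an encoding assumed for this sketch).
    """
    first_treated = np.asarray(first_treated, dtype=float)
    if kind == "nevertreated":
        # Assumption 4: only units that are never treated.
        return np.isinf(first_treated)
    if kind == "notyettreated":
        # Assumption 5: units still untreated at t + delta,
        # which includes never-treated units.
        return first_treated > t + delta
    raise ValueError(f"unknown kind: {kind!r}")


# Hypothetical units first treated in 2004, 2006, 2007, and never.
g = [2004, 2006, 2007, np.inf]
print(comparison_mask(g, t=2005, kind="nevertreated"))
print(comparison_mask(g, t=2005, kind="notyettreated"))
print(comparison_mask(g, t=2005, delta=1, kind="notyettreated"))
```

With `delta=1` the 2006 cohort drops out of the not-yet-treated comparison set for `t=2005`, because it may already be reacting to upcoming treatment.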
Assumption 6 — Overlap (common support)
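For each $t \in \{2, \dots, \mathcal{T}\}$, $g \in \mathcal{G}$, there exists $\varepsilon > 0$ with
$$
P(G_g = 1) > \varepsilon \quad \text{and} \quad p_{g,t}(X) < 1 - \varepsilon \quad \text{a.s.}
$$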
A positive fraction of the population starts treatment in period $g$, and the generalized propensity score $p_{g,t}(X)$ is uniformly bounded away from one. Rules out “irregular identification” (Khan & Tamer 2010). Extends the overlap conditions of Heckman et al., Abadie (2005), and Sant’Anna & Zhao (2020).
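A rough diagnostic for the two parts of the overlap condition might look as follows. This is only a sketch: it assumes the generalized propensity scores have already been estimated by some model, and the function name and the tolerance `eps` are illustrative choices rather than values suggested by the source.

```python
import numpy as np


def check_overlap(in_group, pscore, eps=0.01):
    """Check both parts of the overlap condition for one (g, t) pair.

    in_group: 0/1 indicator of membership in group g, over the full sample.
    pscore:   estimated generalized propensity scores, over the subsample
              of group g plus its comparison units (assumed precomputed).
    """
    in_group = np.asarray(in_group, dtype=bool)
    pscore = np.asarray(pscore, dtype=float)
    group_share = float(in_group.mean())
    max_pscore = float(pscore.max())
    return {
        "group_share": group_share,
        "max_pscore": max_pscore,
        "ok": group_share > eps and max_pscore < 1 - eps,
    }


# Hypothetical inputs: half the sample is in group g, scores stay below 0.9.
print(check_overlap([1, 0, 0, 1], [0.2, 0.5, 0.9]))
```

Estimated scores close to one signal that some covariate values occur almost only among group $g$ units, so no comparable untreated units exist for them.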
Conditional vs. unconditional. The unconditional versions of Assumptions 4–5 are still weaker than the parallel-trends conditions in de Chaisemartin & D’Haultfœuille (2020) and Sun & Abraham (2020), and weaker than the randomization-of-adoption-date assumption in Athey & Imbens (2018). Allowing conditioning on $X$ permits covariate-specific trends; ignoring them when present biases unconditional DiD. Only pre-treatment covariates may be used, because post-treatment covariates can be affected by treatment (Wooldridge 2005b).
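The bias from ignoring covariate-specific trends can be seen in a deterministic toy example. Everything here is hypothetical: a binary covariate, an untreated trend of `1 + 2x`, a true effect of 1, and a simple cell-matching adjustment standing in for the doubly-robust estimands discussed elsewhere.

```python
import numpy as np

# Hypothetical two-period population with a binary pre-treatment covariate x.
x_treated = np.array([1] * 8 + [0] * 2)  # treated units mostly have x = 1
x_comp = np.array([1] * 2 + [0] * 8)     # comparison units mostly have x = 0
true_att = 1.0


def untreated_change(x):
    """Covariate-specific untreated trend (an assumed form for this sketch)."""
    return 1.0 + 2.0 * x


dy_treated = untreated_change(x_treated) + true_att
dy_comp = untreated_change(x_comp)

# Unconditional DiD: compares raw mean changes and picks up the trend gap.
unconditional = dy_treated.mean() - dy_comp.mean()

# Conditional DiD: compare within covariate cells, then average over
# the covariate distribution of the treated units.
comp_mean_by_x = {v: dy_comp[x_comp == v].mean() for v in (0, 1)}
conditional = np.mean(
    [dy - comp_mean_by_x[int(v)] for dy, v in zip(dy_treated, x_treated)]
)

print(unconditional)  # about 2.2, well above the true effect of 1
print(conditional)    # recovers the true effect of 1
```

The gap between the two estimates comes entirely from the different covariate mix of the two groups, since within each covariate cell the untreated trends are identical.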
Do not pre-test to pick the assumption
It is tempting to use statistical pre-tests to choose between parallel-trends versions, but Roth (2020) shows this distorts inference. The authors recommend choosing based on the application’s context, not data-driven tests (Sec. 2.3, fn. 8).
Examples
Minimum wage: covariates that make parallel trends plausible
County characteristics used to justify conditional parallel trends: census region, county population, median income, fraction white, fraction with HS education, poverty rate. Treated counties differ markedly (much less likely Southern; population ~94k vs ~53k; 89% vs 83% white) — so unconditional parallel trends is suspect and conditioning is warranted. Sant’Anna & Song (2019) propensity-score specification tests fail to reject correct specification.
Connections
- Conditions for the identification results in Doubly-Robust Estimands for ATT(g,t).
- Apply to the target defined in Group-Time Average Treatment Effects.
- Visualized as a conditioning structure in Directed Acyclic Graphs.
- Anticipation horizon $\delta$ pins the reference period used by the estimands.
See Also
- The Experimental Ideal — randomization as the strongest (and stronger) benchmark
- Synthetic Control — alternative when parallel trends is implausible
- de Chaisemartin & D’Haultfœuille (2020); Sun & Abraham (2020) — stronger parallel-trends variants