Doubly-Robust Estimands for ATT(g,t)
Summary
Theorem 1 shows that the group-time ATT is nonparametrically point-identified in three observationally equivalent ways: outcome regression (OR), inverse probability weighting (IPW), and doubly robust (DR). The result extends Heckman et al. (1997, 1998), Abadie (2005), and Sant’Anna & Zhao (2020) to multiple groups and periods. The reference period is $g - \delta - 1$ (the last period before anticipation could matter); the comparison group is the never-treated under Assumption 4 or the not-yet-treated under Assumption 5. The DR form is preferred in practice because it stays consistent if either the propensity-score model or the outcome-regression model is correct, not necessarily both.
Overview
CS’s identification results are constructive: each estimand is a population expectation that becomes a plug-in estimator by replacing nuisance functions (propensity score $p_g(X)$, outcome regression $m_{g,t,\delta}(X)$) with parametric fits and the expectation with a sample average. The three approaches differ in which part of the data-generating process they model: OR models the comparison group’s outcome evolution; IPW models the probability of group membership; DR models both but needs only one to be correct (Robins-style double robustness).
Main Content
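Define the population outcome regressions for the two comparison groups:

$$m^{nev}_{g,t,\delta}(X) = E[Y_t - Y_{g-\delta-1} \mid X, C = 1], \qquad m^{ny}_{g,t,\delta}(X) = E[Y_t - Y_{g-\delta-1} \mid X, D_{t+\delta} = 0, G_g = 0].$$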
The three estimands (never-treated comparison)
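Let the change be the “long difference” $Y_t - Y_{g-\delta-1}$, and let $p_g(X) = P(G_g = 1 \mid X, G_g + C = 1)$ be the generalized propensity score. Then:

$$ATT^{nev}_{ipw}(g,t;\delta) = E\left[\left(\frac{G_g}{E[G_g]} - \frac{\frac{p_g(X)\,C}{1 - p_g(X)}}{E\left[\frac{p_g(X)\,C}{1 - p_g(X)}\right]}\right)\left(Y_t - Y_{g-\delta-1}\right)\right] \tag{2.2}$$

$$ATT^{nev}_{or}(g,t;\delta) = E\left[\frac{G_g}{E[G_g]}\left(Y_t - Y_{g-\delta-1} - m^{nev}_{g,t,\delta}(X)\right)\right] \tag{2.3}$$

$$ATT^{nev}_{dr}(g,t;\delta) = E\left[\left(\frac{G_g}{E[G_g]} - \frac{\frac{p_g(X)\,C}{1 - p_g(X)}}{E\left[\frac{p_g(X)\,C}{1 - p_g(X)}\right]}\right)\left(Y_t - Y_{g-\delta-1} - m^{nev}_{g,t,\delta}(X)\right)\right] \tag{2.4}$$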
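The not-yet-treated analogues (2.5)–(2.7) replace $C$ with $(1 - D_{t+\delta})(1 - G_g)$ and $p_g(X)$ with $p_{g,t+\delta}(X)$, and use $m^{ny}_{g,t,\delta}(X)$ in place of $m^{nev}_{g,t,\delta}(X)$.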
Theorem 1: Nonparametric identification of ATT(g,t)
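Let Assumptions 1, 2, 3, 6 hold. (i) If Assumption 4 (never-treated) holds, then for all $g \in \mathcal{G}_\delta$ and $t \in \{2, \dots, \mathcal{T} - \delta\}$, with $t \ge g - \delta$,

$$ATT(g,t) = ATT^{nev}_{ipw}(g,t;\delta) = ATT^{nev}_{or}(g,t;\delta) = ATT^{nev}_{dr}(g,t;\delta).$$

(ii) If Assumption 5 (not-yet-treated) holds, then for all $g \in \mathcal{G}_\delta$ and $t \in \{2, \dots, \mathcal{T} - \delta\}$, with $g - \delta \le t < \bar g - \delta$, where $\bar g = \max_i G_i$ is the latest first-treatment date,

$$ATT(g,t) = ATT^{ny}_{ipw}(g,t;\delta) = ATT^{ny}_{or}(g,t;\delta) = ATT^{ny}_{dr}(g,t;\delta).$$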
Here $\mathcal{G}_\delta = \mathcal{G} \cap \{2 + \delta, \dots, \mathcal{T}\}$ (drops early-treated groups when anticipation is allowed). The three estimands are identical as identification targets, but generally differ as estimators once nuisance functions are fitted (Remark 5).
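The index restrictions in Theorem 1 can be enumerated mechanically. The following is a minimal sketch, not code from the paper: the function name `identified_pairs` and its parameters are hypothetical, and it assumes that never-treated units carry an infinite treatment date, so the not-yet-treated upper bound on $t$ only binds when everyone is eventually treated.

```python
import math


def identified_pairs(cohorts, n_periods, delta=0, comparison="nev",
                     has_never_treated=True):
    """List (g, t, reference_period) triples covered by Theorem 1.

    cohorts: first-treatment periods of the eventually-treated groups
    n_periods: number of calendar periods (time runs 1..n_periods)
    delta: anticipation horizon
    comparison: "nev" (Assumption 4) or "ny" (Assumption 5)
    has_never_treated: assumption of this sketch; if True, g_bar is infinite
    """
    if comparison not in ("nev", "ny"):
        raise ValueError("comparison must be 'nev' or 'ny'")
    # G_delta: cohorts first treated in periods 2 + delta, ..., n_periods.
    eligible = sorted(g for g in set(cohorts) if 2 + delta <= g <= n_periods)
    # g_bar: latest first-treatment date in the data.
    g_bar = math.inf if has_never_treated else max(cohorts)
    pairs = []
    for g in eligible:
        for t in range(2, n_periods - delta + 1):
            if t < g - delta:
                continue  # outside the range t >= g - delta of Theorem 1
            if comparison == "ny" and t >= g_bar - delta:
                continue  # no unit is still untreated by period t + delta
            pairs.append((g, t, g - delta - 1))
    return pairs


nev_pairs = identified_pairs([2, 3, 4], n_periods=4, comparison="nev")
ny_pairs = identified_pairs([2, 3, 4], n_periods=4, comparison="ny",
                            has_never_treated=False)
```

With three cohorts, four periods, and no anticipation, the not-yet-treated list contains no entry for the last cohort, which matches the statement that its effect is unidentified when everyone is eventually treated.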
Key structural insights from Theorem 1 (proved in Appendix A via Lemmas A.1–A.2):
- Reference period $g - \delta - 1$. This is the most recent period before anticipation could matter; the more anticipation allowed (larger $\delta$), the further back the reference goes.
- Comparison-group choice = parallel-trends choice. Assumption 4 → never-treated as a fixed comparison; Assumption 5 → not-yet-treated-by-$t+\delta$. Under Assumption 5 with everyone eventually treated (no never-treated units, so $\bar g = \max_i G_i$ is finite), one can only identify $ATT(g,t)$ for $t < \bar g - \delta$ (the last cohort’s effect is unidentified).
- The DR proof. The DR estimand equals the OR estimand minus a term that vanishes by the law of iterated expectations whenever the outcome regression is the true conditional mean; symmetrically, it equals the IPW estimand minus a term that vanishes whenever the propensity score is correct. Hence the DR form is consistent if either model is right.
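The double-robustness argument in the last bullet can be written out in two lines. This is a sketch in shorthand introduced here, not the paper's notation: $\Delta Y = Y_t - Y_{g-\delta-1}$, $w^{treat} = G_g / E[G_g]$, $w^{comp} = \frac{p(X) C}{1 - p(X)} \big/ E\left[\frac{p(X) C}{1 - p(X)}\right]$, and $m(X)$, $p(X)$ are the working outcome-regression and propensity-score models for the never-treated comparison.

$$
\begin{aligned}
ATT^{nev}_{dr} &= \underbrace{E\left[w^{treat}\,(\Delta Y - m(X))\right]}_{\text{OR form}} \;-\; E\left[w^{comp}\,(\Delta Y - m(X))\right], \\
ATT^{nev}_{dr} &= \underbrace{E\left[(w^{treat} - w^{comp})\,\Delta Y\right]}_{\text{IPW form}} \;-\; E\left[(w^{treat} - w^{comp})\,m(X)\right].
\end{aligned}
$$

In the first line, $w^{comp}$ is a function of $(X, C)$ that is nonzero only when $C = 1$, so the remainder is zero whenever $m(X) = E[\Delta Y \mid X, C = 1]$, whatever $p(X)$ is. In the second line, if $p(X) = P(G_g = 1 \mid X, G_g + C = 1)$ then $p(X)/(1 - p(X)) = P(G_g = 1 \mid X)/P(C = 1 \mid X)$, which gives $E[w^{comp} h(X)] = E[G_g h(X)]/E[G_g] = E[w^{treat} h(X)]$ for any function $h$, so the remainder is zero whatever $m(X)$ is.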
Unconditional collapse (no role for covariates)
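When Assumptions 3–5 hold unconditionally on $X$, (2.2)–(2.4) collapse to the intuitive 2x2-style contrast

$$ATT^{nev}_{unc}(g,t;\delta) = E[Y_t - Y_{g-\delta-1} \mid G_g = 1] - E[Y_t - Y_{g-\delta-1} \mid C = 1], \tag{2.8}$$

and (2.5)–(2.7) collapse to

$$ATT^{ny}_{unc}(g,t;\delta) = E[Y_t - Y_{g-\delta-1} \mid G_g = 1] - E[Y_t - Y_{g-\delta-1} \mid D_{t+\delta} = 0, G_g = 0]. \tag{2.9}$$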
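The sample analogue of the never-treated unconditional contrast is a difference of two mean long differences. A minimal sketch follows, with a hypothetical function name and a made-up five-unit dataset; units that belong to neither cohort $g$ nor the never-treated group are simply ignored.

```python
from statistics import mean


def att_unc_nev(y_t, y_ref, in_cohort_g, never_treated):
    """Mean long difference for cohort g minus that of the never-treated.

    y_t, y_ref: outcomes in period t and in the reference period g - delta - 1
    in_cohort_g, never_treated: 0/1 indicators for G_g = 1 and C = 1
    """
    long_diff = [after - before for after, before in zip(y_t, y_ref)]
    treated = [d for d, flag in zip(long_diff, in_cohort_g) if flag]
    comparison = [d for d, flag in zip(long_diff, never_treated) if flag]
    return mean(treated) - mean(comparison)


# Illustrative data: units 1-2 are cohort g, units 3-4 never treated,
# unit 5 belongs to another cohort and does not enter the contrast.
y_ref = [1, 2, 3, 4, 5]
y_t = [4, 5, 4, 5, 9]
estimate = att_unc_nev(y_t, y_ref, [1, 1, 0, 0, 0], [0, 0, 1, 1, 0])
```

Here the treated long differences are both 3 and the comparison long differences are both 1, so the contrast is 2.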
A TWFE regression is NOT ATT(g,t) with covariates
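Remarks 3–4: subsetting to periods $t$ and $g - \delta - 1$ and groups $G_g = 1$ and $C = 1$ and running the 2x2 regression $Y = \alpha_1 + \alpha_2 G_g + \alpha_3 1\{T = t\} + \beta\,(G_g \times 1\{T = t\}) + \epsilon$ gives $\beta = ATT^{nev}_{unc}(g,t;\delta)$ in the unconditional case. But adding covariates linearly (e.g. a $\gamma' X$ term) does not recover $ATT(g,t)$ unless one assumes homogeneous-in-$X$ effects AND rules out covariate-specific trends (Słoczyński 2018). The estimands above need neither restriction.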
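The unconditional equivalence can be checked numerically. The sketch below assumes the regression meant in Remarks 3–4 is the standard saturated 2x2 specification (intercept, group dummy, period dummy, interaction); the data are simulated and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
treated = rng.integers(0, 2, size=n)   # 1 if G_g = 1, 0 if C = 1
y_ref = rng.normal(size=n) + treated   # outcome in period g - delta - 1
y_t = y_ref + 0.5 + 2.0 * treated + rng.normal(size=n)  # outcome in period t

# Stack the two periods into long format: one row per unit-period.
y = np.concatenate([y_ref, y_t])
group = np.concatenate([treated, treated])
post = np.concatenate([np.zeros(n), np.ones(n)])
design = np.column_stack([np.ones(2 * n), group, post, group * post])

# OLS; the interaction coefficient is the regression-based estimate.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta = coef[3]

# Difference of mean long differences, computed directly.
long_diff = y_t - y_ref
contrast = long_diff[treated == 1].mean() - long_diff[treated == 0].mean()
```

Because the regression is saturated in group and period, `beta` and `contrast` agree up to floating-point error. No such algebraic identity holds once covariates enter linearly.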
Doubly-robust plug-in estimators (Sec. 4)
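The feasible DR estimators are Hájek-type (weights are normalized to sum to one, improving finite-sample behavior):

$$\widehat{ATT}^{nev}_{dr}(g,t;\delta) = E_n\left[\left(\hat w^{treat}_g - \hat w^{comp}_g\right)\left(Y_t - Y_{g-\delta-1} - \hat m^{nev}_{g,t,\delta}(X)\right)\right] \tag{4.1}$$

with $\hat w^{treat}_g = G_g / E_n[G_g]$ and $\hat w^{comp}_g = \frac{\hat p_g(X)\,C}{1 - \hat p_g(X)} \Big/ E_n\left[\frac{\hat p_g(X)\,C}{1 - \hat p_g(X)}\right]$, where $E_n$ denotes the sample average, $\hat p_g$ is a fitted (e.g. logit) propensity score, and $\hat m^{nev}_{g,t,\delta}$ is a fitted (e.g. OLS) outcome regression. Estimation is two-step: (1) fit nuisances per $(g,t)$; (2) plug into the sample analogue. The not-yet-treated version (4.2) is symmetric.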
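The two-step recipe can be sketched for a single $(g,t)$ with the never-treated comparison. This is an illustration under stated assumptions rather than the authors' implementation: the helper names `fit_logit` and `dr_att_nev` are hypothetical, the logit is fitted by a plain Newton-Raphson loop, the outcome regression is OLS on the never-treated units, and the data are simulated with a true effect of 2 and a covariate-specific trend.

```python
import numpy as np


def fit_logit(design, d, n_iter=50, tol=1e-10):
    """Logit coefficients by Newton-Raphson (design includes an intercept)."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        grad = design.T @ (d - p)
        hess = (design * (p * (1.0 - p))[:, None]).T @ design
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def dr_att_nev(y_t, y_ref, g, c, x):
    """Hajek-type DR estimate for one (g, t), never-treated comparison.

    g, c: 0/1 float arrays for G_g and C; x: (n, k) covariates, no intercept.
    """
    dy = y_t - y_ref                               # long difference
    design = np.column_stack([np.ones(len(dy)), x])
    # Step 1a: propensity score, fitted on units with G_g + C = 1 only.
    sub = (g + c) == 1
    ps_coef = fit_logit(design[sub], g[sub])
    p_hat = 1.0 / (1.0 + np.exp(-design @ ps_coef))
    # Step 1b: outcome regression, fitted on never-treated units only.
    or_coef, *_ = np.linalg.lstsq(design[c == 1], dy[c == 1], rcond=None)
    m_hat = design @ or_coef
    # Step 2: plug into the sample analogue with normalized weights.
    w_treat = g / g.mean()
    odds = p_hat * c / (1.0 - p_hat)
    w_comp = odds / odds.mean()
    return np.mean((w_treat - w_comp) * (dy - m_hat))


# Simulated data (illustrative): the trend depends on x, and so does
# selection into cohort g, so the unconditional contrast is biased.
rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=(n, 1))
other = rng.random(n) < 0.2                        # units from other cohorts
p_true = 1.0 / (1.0 + np.exp(-0.5 * x[:, 0]))
g = ((rng.random(n) < p_true) & ~other).astype(float)
c = ((g == 0) & ~other).astype(float)
y_ref = rng.normal(size=n)
y_t = y_ref + 1.0 + x[:, 0] + 2.0 * g + rng.normal(size=n)

att_hat = dr_att_nev(y_t, y_ref, g, c, x)
dy = y_t - y_ref
naive = dy[g == 1].mean() - dy[c == 1].mean()      # ignores covariates
```

In this design both working models are correctly specified, so `att_hat` should land close to 2, while `naive` absorbs the difference in covariate distributions between cohort $g$ and the never-treated group.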
Examples
Minimum wage: which estimand for which assumption
Under unconditional parallel trends (Panel a) covariates play no role, so the simple contrast (2.8) is used. Under conditional parallel trends (Panel b) the DR estimator (4.1) is used: a logit generalized propensity score per characteristic (with quadratic terms for population and median income) and an OLS outcome regression with the same covariates. The whole exercise (all $ATT(g,t)$ plus 1000 bootstrap iterations) runs in ~3.0 seconds on a laptop.
Connections
- Identifies the target of Group-Time Average Treatment Effects under Identifying Assumptions for Staggered DiD.
- DR estimators feed the asymptotics in Simultaneous Inference via Multiplier Bootstrap.
- Aggregated into summaries in Aggregating Group-Time Effects.
- Generalizes the regression-adjustment logic in Mostly Harmless Econometrics.
See Also
- Sant’Anna & Zhao (2020) — the 2-period DR DiD estimator extended here
- Abadie (2005) — semiparametric IPW DiD
- How to use Bayesian propensity scores and inverse probability weights — Bayesian IPW counterpart