Doubly-Robust Estimands for ATT(g,t)

Summary

Theorem 1 shows the group-time ATT is nonparametrically point-identified three observationally-equivalent ways — outcome regression (OR), inverse probability weighting (IPW), and doubly-robust (DR) — extending Heckman et al. (1997/98), Abadie (2005), and Sant’Anna & Zhao (2020) to multiple groups and periods. The reference period is $g - δ - 1$ (the last period before anticipation could matter); the comparison group is the never-treated under Assumption 4 or the not-yet-treated under Assumption 5. The DR form is preferred in practice because it stays consistent if either the propensity-score model or the outcome-regression model is correct, not necessarily both.

Overview

CS’s identification results are constructive: each estimand is a population expectation that becomes a plug-in estimator by replacing nuisance functions (propensity score $p_{g}$, outcome regression $m_{g, t, δ}$) with parametric fits and the expectation with a sample average. The three approaches differ in which part of the data-generating process they model: OR models the comparison group’s outcome evolution; IPW models the probability of group membership; DR models both but needs only one correct (Robins-style double robustness).

Main Content

Define the population outcome regressions for the comparison groups:

$m_{g, t, δ}^{nev}(X) = E[Y_{t} - Y_{g - δ - 1} \mid X, C = 1], \quad m_{g, t, δ}^{ny}(X) = E[Y_{t} - Y_{g - δ - 1} \mid X, D_{t + δ} = 0, G_{g} = 0].$

The three estimands (never-treated comparison)

Let the change be the “long difference” $Y_{t} - Y_{g - δ - 1}$. Then:
$ATT_{or}^{nev}(g, t; δ) = E\left[\frac{G_{g}}{E[G_{g}]} \left(Y_{t} - Y_{g - δ - 1} - m_{g, t, δ}^{nev}(X)\right)\right] \quad (2.3)$
$ATT_{ipw}^{nev}(g, t; δ) = E\left[\left(\frac{G_{g}}{E[G_{g}]} - \frac{\frac{p_{g}(X) C}{1 - p_{g}(X)}}{E\left[\frac{p_{g}(X) C}{1 - p_{g}(X)}\right]}\right) \left(Y_{t} - Y_{g - δ - 1}\right)\right] \quad (2.2)$
$ATT_{dr}^{nev}(g, t; δ) = E\left[\left(\frac{G_{g}}{E[G_{g}]} - \frac{\frac{p_{g}(X) C}{1 - p_{g}(X)}}{E\left[\frac{p_{g}(X) C}{1 - p_{g}(X)}\right]}\right) \left(Y_{t} - Y_{g - δ - 1} - m_{g, t, δ}^{nev}(X)\right)\right] \quad (2.4)$
The not-yet-treated analogues (2.5)–(2.7) replace $\frac{p_{g}(X) C}{1 - p_{g}(X)}$ with $\frac{p_{g, t + δ}(X) (1 - D_{t + δ}) (1 - G_{g})}{1 - p_{g, t + δ}(X)}$ and use $m_{g, t, δ}^{ny}$.
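
As a rough illustration, the sample analogues of (2.2)–(2.4) can be sketched in a few lines of NumPy. The function name `att_estimands` and its argument names are hypothetical, and the nuisance values `p` and `m` are assumed to be supplied by the caller (known or already fitted); this is a sketch under those assumptions, not the source's implementation.

```python
import numpy as np

def att_estimands(dy, G, C, p, m):
    """Sample analogues of (2.2)-(2.4) for one (g, t), never-treated comparison.

    dy : long difference Y_t - Y_{g-delta-1}, shape (n,)
    G  : 1.0 if the unit belongs to group g, else 0.0
    C  : 1.0 if the unit is never treated, else 0.0
    p  : propensity score p_g(X) at each unit (assumed given)
    m  : outcome regression m^{nev}_{g,t,delta}(X) at each unit (assumed given)
    """
    w_treat = G / G.mean()            # G_g / E[G_g]
    odds = p * C / (1.0 - p)          # p_g(X) C / (1 - p_g(X))
    w_comp = odds / odds.mean()       # normalized comparison weights
    att_or = np.mean(w_treat * (dy - m))
    att_ipw = np.mean((w_treat - w_comp) * dy)
    att_dr = np.mean((w_treat - w_comp) * (dy - m))
    return att_or, att_ipw, att_dr
```

With a constant propensity score and `C = 1 - G`, the IPW value reduces to a plain difference in mean long differences between group $g$ and the never-treated.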

Theorem 1: Nonparametric identification of $ATT(g, t)$

Let Assumptions 1, 2, 3, 6 hold. (i) If Assumption 4 (never-treated) holds, then for all $g \in G_{δ}$, $t \in \{2, \dots, T - δ\}$ with $t \geq g - δ$,
$ATT(g, t) = ATT_{ipw}^{nev}(g, t; δ) = ATT_{or}^{nev}(g, t; δ) = ATT_{dr}^{nev}(g, t; δ).$
(ii) If Assumption 5 (not-yet-treated) holds, then for all $g \in G_{δ}$, $t \in \{2, \dots, T - δ\}$ with $g - δ \leq t < \bar{g} - δ$,
$ATT(g, t) = ATT_{ipw}^{ny}(g, t; δ) = ATT_{or}^{ny}(g, t; δ) = ATT_{dr}^{ny}(g, t; δ).$
Here $G_{δ} = G \cap \{2 + δ, 3 + δ, \dots, T\}$ (drops early-treated groups when anticipation is allowed). The three estimands are identical as identification targets, but generally differ as estimators once nuisance functions are fitted (Remark 5).
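
To make the index ranges in Theorem 1 concrete, the following sketch enumerates the $(g, t)$ pairs each part covers. The helper `identified_pairs` and its `never_treated` flag are hypothetical names introduced for illustration; the sketch assumes periods are labelled $1, \dots, T$ and that $\bar{g}$ is infinite whenever a never-treated group exists.

```python
def identified_pairs(groups, T, delta=0, comparison="nev", never_treated=True):
    """List the (g, t) pairs covered by Theorem 1.

    groups        : finite first-treatment periods g present in the data
    T             : number of periods, labelled 1..T
    delta         : anticipation horizon
    comparison    : "nev" for part (i), "ny" for part (ii)
    never_treated : whether a never-treated group exists (then g_bar = inf)
    """
    # G_delta keeps only groups first treated in periods 2 + delta, ..., T
    g_delta = sorted(g for g in set(groups) if 2 + delta <= g <= T)
    g_bar = float("inf") if never_treated else max(groups)
    pairs = []
    for g in g_delta:
        for t in range(2, T - delta + 1):
            if t < g - delta:
                continue  # pre-treatment period, outside the theorem's range
            if comparison == "ny" and not t < g_bar - delta:
                continue  # no not-yet-treated units left to compare against
            pairs.append((g, t))
    return pairs
```

For example, with groups 3, 4, 5, $T = 5$, $δ = 0$ and no never-treated units, the not-yet-treated route covers only $(3,3)$, $(3,4)$ and $(4,4)$: nothing is identified for the last cohort.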

Key structural insights from Theorem 1 (proved in Appendix A via Lemmas A.1–A.2):

Reference period $g - δ - 1$. This is the most recent period before anticipation could matter; the more anticipation allowed (larger $δ$), the further back the reference goes.
Comparison-group choice = parallel-trends choice. Assumption 4 uses the never-treated as a fixed comparison group; Assumption 5 uses units not yet treated by $t + δ$. Under Assumption 5 with everyone eventually treated ($\bar{g} < \infty$), $ATT(g, t)$ is identified only for $t < \bar{g} - δ$, so the last cohort’s effect is unidentified.
The DR proof. The DR estimand equals $ATT_{ipw}$ minus a term that vanishes by the law of iterated expectations when the propensity score is correct; symmetrically, it equals $ATT_{or}$ minus a term that vanishes when the outcome regression is correct. Hence it remains consistent if either model is right.
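
The decomposition behind the DR argument can be written out explicitly. The shorthand here is introduced only for this sketch and is not the source's notation: $ΔY = Y_{t} - Y_{g - δ - 1}$, $w^{treat} = G_{g} / E[G_{g}]$, $w^{comp}$ is the normalized comparison weight from (2.2), and $m = m_{g, t, δ}^{nev}$. Expanding the product in (2.4) gives two equivalent forms:
$ATT_{dr}^{nev} = ATT_{ipw}^{nev} - E\left[(w^{treat} - w^{comp}) \, m(X)\right]$
$ATT_{dr}^{nev} = ATT_{or}^{nev} - E\left[w^{comp} \, (ΔY - m(X))\right]$
In the first line the correction term is zero when $p_{g}$ is the true propensity score, because both weights then average any function of $X$ over the covariate distribution of group $g$. In the second line it is zero when $m$ is the true conditional mean of $ΔY$ among the never-treated, because $E[C (ΔY - m(X)) \mid X] = 0$ whatever weights are used.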

Unconditional collapse (no role for covariates)

When Assumptions 3–5 hold unconditionally on $X$, (2.2)–(2.4) collapse to the intuitive 2x2-style contrast
$ATT_{unc}^{nev}(g, t; δ) = E[Y_{t} - Y_{g - δ - 1} \mid G_{g} = 1] - E[Y_{t} - Y_{g - δ - 1} \mid C = 1], \quad (2.8)$
and (2.5)–(2.7) collapse to
$ATT_{unc}^{ny}(g, t; δ) = E[Y_{t} - Y_{g - δ - 1} \mid G_{g} = 1] - E[Y_{t} - Y_{g - δ - 1} \mid D_{t + δ} = 0]. \quad (2.9)$
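
A minimal sketch of the sample versions of (2.8) and (2.9), assuming a balanced panel stored as a wide array. The function name `att_unc` and the data layout (column `s - 1` holds period `s`, never-treated units coded as `np.inf`) are assumptions made for this example.

```python
import numpy as np

def att_unc(Y, G, g, t, delta=0, comparison="nev"):
    """Unconditional contrast (2.8) or (2.9) from a balanced wide panel.

    Y : array of shape (n, T); column s - 1 holds the outcome in period s
    G : first-treatment period per unit, np.inf for never-treated units
    g : group of interest; must satisfy g >= 2 + delta so the
        reference period g - delta - 1 exists
    """
    G = np.asarray(G, dtype=float)
    dy = Y[:, t - 1] - Y[:, g - delta - 2]   # Y_t - Y_{g - delta - 1}
    treated = G == g                          # G_g = 1
    if comparison == "nev":
        comp = np.isinf(G)                    # C = 1
    else:
        comp = G > t + delta                  # D_{t + delta} = 0
    return dy[treated].mean() - dy[comp].mean()
```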

A TWFE regression is NOT $ATT(g, t)$ with covariates

Remarks 3–4: subsetting to periods $\{g - 1, t\}$ and groups $\{G_{g} = 1 \text{ or } C = 1\}$ and running $Y = α_{1} + α_{2} G_{g} + α_{3} 1\{T = t\} + β (G_{g} \times 1\{T = t\}) + ϵ$ gives $β = ATT(g, t)$ in the unconditional case. But adding covariates linearly ($+ \tilde{γ} X$) does not recover $ATT(g, t)$ unless one assumes homogeneous-in-$X$ effects AND rules out covariate-specific trends (Słoczyński 2018). The estimands above need neither restriction.
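
The unconditional equivalence can be checked numerically. The sketch below uses simulated data (all numbers are made up for illustration) and shows that the interaction coefficient from the two-period, two-group regression coincides with the difference in mean long differences.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
G = rng.binomial(1, 0.4, size=n)      # 1 = group g, 0 = never treated
y_pre = rng.normal(size=n)            # outcome in period g - 1
y_post = y_pre + 1.0 + 2.0 * G + rng.normal(size=n)  # outcome in period t

# Stack two rows per unit (pre, post) and build the saturated 2x2 design
y = np.concatenate([y_pre, y_post])
grp = np.concatenate([G, G])
post = np.concatenate([np.zeros(n), np.ones(n)])
X = np.column_stack([np.ones(2 * n), grp, post, grp * post])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta = coef[3]                        # coefficient on G_g x 1{T = t}

# Difference in mean long differences, as in (2.8)
dy = y_post - y_pre
contrast = dy[G == 1].mean() - dy[G == 0].mean()
```

Because the regression is saturated in group and period, `beta` and `contrast` agree up to floating-point error; the sketch says nothing about the covariate-adjusted case, where the text notes the equivalence breaks down.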

Doubly-robust plug-in estimators (Sec. 4)

The feasible DR estimators are Hájek-type (weights sum to one, improving finite-sample behavior):
$\widehat{ATT}_{dr}^{nev}(g, t; δ) = E_{n}\left[\left(\hat{w}_{g}^{treat} - \hat{w}_{g}^{comp, nev}\right) \left(Y_{t} - Y_{g - δ - 1} - \hat{m}_{g, t, δ}^{nev}(X; \hat{β}_{g, t, δ}^{nev})\right)\right] \quad (4.1)$
with $\hat{w}_{g}^{treat} = \frac{G_{g}}{E_{n}[G_{g}]}$ and $\hat{w}_{g}^{comp, nev} = \frac{\hat{p}_{g}(X; \hat{π}_{g}) C / (1 - \hat{p}_{g}(X; \hat{π}_{g}))}{E_{n}\left[\hat{p}_{g}(X; \hat{π}_{g}) C / (1 - \hat{p}_{g}(X; \hat{π}_{g}))\right]}$, where $\hat{p}_{g}$ is a fitted (e.g. logit) propensity score and $\hat{m}_{g, t, δ}^{nev}$ a fitted (e.g. OLS) outcome regression. Estimation is two-step: (1) fit nuisances per $(g, t)$; (2) plug into the sample analogue. The not-yet-treated version (4.2) is symmetric.
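
The two-step recipe for (4.1) can be sketched with NumPy alone. Everything here is an illustrative assumption rather than the source's code: the helper names `fit_logit` and `att_dr_nev`, the hand-rolled Newton-Raphson logit, and the simulated data with a true effect of 1. The inputs are assumed to be already restricted to units with $G_{g} = 1$ or $C = 1$.

```python
import numpy as np

def fit_logit(Z, d, n_iter=25):
    """Logit coefficients by Newton-Raphson; Z already contains a constant."""
    coef = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ coef))
        grad = Z.T @ (d - p)
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z
        coef = coef + np.linalg.solve(hess, grad)
    return coef

def att_dr_nev(dy, G, X):
    """Hajek-type DR estimate for one (g, t), never-treated comparison.

    dy : long difference Y_t - Y_{g-delta-1}
    G  : 1 for group g, 0 for never-treated (so C = 1 - G)
    X  : covariates, shape (n,) or (n, k)
    """
    Z = np.column_stack([np.ones(len(dy)), X])
    C = 1 - G
    # Step 1a: logit propensity score for membership in group g
    p_hat = 1.0 / (1.0 + np.exp(-Z @ fit_logit(Z, G)))
    # Step 1b: OLS outcome regression fitted on comparison units only
    beta, *_ = np.linalg.lstsq(Z[C == 1], dy[C == 1], rcond=None)
    m_hat = Z @ beta
    # Step 2: plug into the sample analogue with normalized weights
    w_treat = G / G.mean()
    odds = p_hat * C / (1.0 - p_hat)
    w_comp = odds / odds.mean()
    return np.mean((w_treat - w_comp) * (dy - m_hat))

# Simulated check: trends depend on X, so the raw contrast is biased
rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=n)
G = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * X)))
dy = 1.0 + 2.0 * X + 1.0 * G + rng.normal(size=n)   # true ATT = 1

att_hat = att_dr_nev(dy, G, X)
naive = dy[G == 1].mean() - dy[G == 0].mean()
```

In this design both nuisance models are correctly specified, so `att_hat` should land near 1, while `naive` picks up the covariate-driven difference in trends.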

Examples

Minimum wage: which estimand for which assumption

Under unconditional parallel trends (Panel a) covariates play no role, so the simple contrast (2.8) is used. Under conditional parallel trends (Panel b) the DR estimator (4.1) is used: a logit generalized propensity score per characteristic (with quadratic terms for population and median income) and an OLS outcome regression with the same covariates. The whole exercise (all $g, t$ plus 1000 bootstrap iterations) runs in ~3.0 seconds on a laptop.

Connections

Identifies the target of Group-Time Average Treatment Effects under Identifying Assumptions for Staggered DiD.
DR estimators feed the asymptotics in Simultaneous Inference via Multiplier Bootstrap.
Aggregated into summaries in Aggregating Group-Time Effects.
Generalizes the regression-adjustment logic in Mostly Harmless Econometrics.
