Group-Time Average Treatment Effects

Summary

The group-time average treatment effect $ATT(g, t) = E[Y_{t}(g) - Y_{t}(0) \mid G_{g} = 1]$ is the disaggregated building block of the Callaway–Sant’Anna framework: the average effect at calendar time $t$ for the cohort first treated in period $g$. It imposes no restriction on treatment-effect heterogeneity across groups or over time, so, unlike a single TWFE coefficient, it keeps a well-defined causal interpretation under such heterogeneity and can be flexibly aggregated to answer many policy questions.

Overview

In the canonical two-period setup the target is $ATT = E[Y_{2}(2) - Y_{2}(0) \mid G_{2} = 1]$. With multiple periods and staggered adoption there is no single “treated” timing, so the paper indexes effects by both the cohort $g$ (when treatment starts) and the calendar period $t$ (when the effect is measured). Fixing $g$ and varying $t$ traces effect dynamics for that cohort; fixing $t$ and varying $g$ compares cohorts. Because $ATT(g, t)$ is defined directly on potential outcomes, not as a regression coefficient, it sidesteps the TWFE negative-weighting problem entirely (see Difference-in-Differences with Multiple Time Periods - Overview).

Main Content

Setup and notation

There are $T$ periods, $t = 1, \dots, T$. $D_{i, t} \in \{0, 1\}$ is the treatment status of unit $i$ at time $t$.

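Group $G_{i}$ is the first period in which unit $i$ is treated, with $G_{i} = \infty$ if it is never treated. $G_{i, g} = \mathbf{1}\{G_{i} = g\}$ indicates membership of cohort $g$, and $C_{i} = \mathbf{1}\{G_{i} = \infty\} = 1 - D_{i, T}$ flags the never-treated. $\bar{g} = \max_{i} G_{i}$ is the largest value of $G$ in the data: the last-treated group when every unit is eventually treated, and $\infty$ when never-treated units exist.

$\mathcal{G} = \operatorname{supp}(G) \setminus \{\bar{g}\} \subseteq \{2, \dots, T\}$ is the set of identifiable groups. Dropping $\bar{g}$ removes $\infty$ when a never-treated group exists; when none exists it removes the last-treated cohort, which has no valid comparison group. $\mathcal{X} = \operatorname{supp}(X) \subseteq \mathbb{R}^{k}$ is the support of the pre-treatment covariates.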

Generalized propensity score $p_{g}(X) = P(G_{g} = 1 \mid X, G_{g} + C = 1)$: the probability of being first treated at $g$, conditional on $X$ and on being either in group $g$ or never-treated. (More generally, $p_{g, s}(X)$ conditions on being in group $g$ or “not-yet-treated by $s$”.)
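The notation above can be made concrete with a minimal sketch that derives $G_{i}$, $C_{i}$, $G_{i, g}$, $\bar{g}$ and the identifiable set from treatment paths. The three units and their paths are hypothetical, and the helper name `first_treated_period` is an illustration, not something from the paper.

```python
import math

def first_treated_period(d_path):
    """Return G for one unit: the first period (1-indexed) with D = 1, or inf if never treated."""
    for t, d in enumerate(d_path, start=1):
        if d == 1:
            return t
    return math.inf

# Hypothetical treatment paths D_{i,1}, ..., D_{i,T} with T = 4 (illustrative data only).
T = 4
D = {
    "a": [0, 1, 1, 1],  # first treated in period 2
    "b": [0, 0, 0, 1],  # first treated in period 4
    "c": [0, 0, 0, 0],  # never treated
}

G = {i: first_treated_period(path) for i, path in D.items()}
C = {i: int(g == math.inf) for i, g in G.items()}  # never-treated flag C_i
G_ind = {(i, g): int(G[i] == g) for i in D for g in range(2, T + 1)}  # indicators G_{i,g}

g_bar = max(G.values())                           # largest G in the data (inf here)
identifiable = sorted(set(G.values()) - {g_bar})  # supp(G) without g_bar

print(G, C, identifiable)
```

Because unit "c" is never treated, $\bar{g} = \infty$ in this toy panel and both treated cohorts stay in the identifiable set. Removing unit "c" would make $\bar{g} = 4$ and leave only cohort 2.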

Potential outcomes (multi-stage adoption)

The framework combines Robins’ dynamic potential outcomes with Heckman et al.’s multi-stage adoption: $Y_{i, t}(0)$ is the untreated potential outcome (the unit never participates through $T$), and $Y_{i, t}(g)$ is the potential outcome if the unit is first treated in period $g$. Observed and potential outcomes are linked via
$Y_{i, t} = Y_{i, t}(0) + \sum_{g = 2}^{T} \bigl(Y_{i, t}(g) - Y_{i, t}(0)\bigr) \cdot G_{i, g}. \qquad (2.1)$
We observe exactly one potential-outcome path per unit.

Assumption 1 (Irreversibility): $D_{1} = 0$ a.s., and $D_{t - 1} = 1 \Rightarrow D_{t} = 1$ a.s. This is staggered adoption: units never “forget” treatment.

Assumption 2 (Random sampling): the panel $\{Y_{i, 1}, \dots, Y_{i, T}, X_{i}, D_{i, 1}, \dots, D_{i, T}\}_{i = 1}^{n}$ is iid (results extend to repeated cross-sections, Appendix B).

The unit index $i$ is suppressed henceforth.
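As a rough illustration of equation (2.1) and Assumption 1, the sketch below builds the observed outcome path from a set of potential-outcome paths and checks irreversibility for a treatment path. The function names and the numeric potential outcomes are hypothetical; the example values happen to set $Y_{t}(g) = Y_{t}(0)$ before period $g$, which is a choice made for the illustration and not something the code enforces.

```python
import math

def is_irreversible(d_path):
    """Assumption 1: D_1 = 0, and D_{t-1} = 1 implies D_t = 1."""
    if d_path[0] != 0:
        return False
    return all(not (prev == 1 and cur == 0) for prev, cur in zip(d_path, d_path[1:]))

def observed_outcome(y0, y_by_g, g_i, T):
    """Equation (2.1): Y_t = Y_t(0) + sum_{g=2}^{T} (Y_t(g) - Y_t(0)) * 1{G = g}."""
    observed = []
    for t in range(T):  # list index t corresponds to period t + 1
        y = y0[t]
        for g in range(2, T + 1):
            y += (y_by_g[g][t] - y0[t]) * int(g_i == g)
        observed.append(y)
    return observed

# Hypothetical potential-outcome paths for one unit, T = 3.
T = 3
y0 = [1.0, 2.0, 3.0]                                   # Y_t(0)
y_by_g = {2: [1.0, 2.5, 4.0], 3: [1.0, 2.0, 3.5]}      # Y_t(2), Y_t(3)

print(observed_outcome(y0, y_by_g, 3, T))         # path seen if first treated in period 3
print(observed_outcome(y0, y_by_g, math.inf, T))  # path seen if never treated
print(is_irreversible([0, 0, 1]), is_irreversible([0, 1, 0]))
```

Only one indicator $G_{i, g}$ is ever nonzero, so the sum selects a single path; the other potential outcomes remain counterfactual.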

The group-time average treatment effect

The main building block is
$ATT(g, t) = E[Y_{t}(g) - Y_{t}(0) \mid G_{g} = 1],$
the average treatment effect at period $t$ among units first treated in period $g$. It places no restriction on (a) heterogeneity across groups, (b) the timing $g$, or (c) how effects evolve over $t$. The family $\{ATT(g, t)\}$ can therefore be read directly for heterogeneity or aggregated to answer questions about: (i) the overall effect of participating by $T$; (ii) across-group heterogeneity; (iii) effects by length of exposure $e = t - g$ (event study); (iv) how effects evolve over calendar time. In the two-group, two-period (2x2) case $ATT(g, t)$ collapses to the canonical ATT.
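A toy check of the definition may help. In a simulation both potential outcomes are known, so $ATT(g, t)$ can be computed directly and compared with a difference-in-differences against the never-treated that uses period $g - 1$ as the base. This is a sketch on made-up data, not the paper's estimator: it assumes unconditional parallel trends relative to the never-treated and no anticipation, neither of which is stated in this note, and every number and function name is hypothetical.

```python
import math
from statistics import mean

T = 3
# Hypothetical units: (fixed effect alpha, first-treatment period G, unit-level effect shifter h).
units = [(1, 2, -1), (3, 2, 1), (2, 3, -2), (4, 3, 2), (0, math.inf, 0), (6, math.inf, 0)]

def y0(alpha, t):
    """Untreated potential outcome: unit effect plus a common time trend (parallel trends by construction)."""
    return alpha + t

def effect(g, t, h):
    """Y_t(g) - Y_t(0): zero before treatment (no anticipation), heterogeneous in g, t and unit afterwards."""
    return 10 * g + (t - g) + h if t >= g else 0

def y_obs(alpha, g, h, t):
    """Observed outcome for a unit whose first-treatment period is g."""
    return y0(alpha, t) + effect(g, t, h)

def att_true(g, t):
    """Definition: mean of Y_t(g) - Y_t(0) among units with G = g."""
    return mean(effect(G, t, h) for (_, G, h) in units if G == g)

def att_did(g, t):
    """Change from base period g - 1 to t for cohort g, minus the same change for the never-treated."""
    treated = mean(y_obs(a, G, h, t) - y_obs(a, G, h, g - 1) for (a, G, h) in units if G == g)
    control = mean(y_obs(a, G, h, t) - y_obs(a, G, h, g - 1) for (a, G, h) in units if G == math.inf)
    return treated - control

for g, t in [(2, 2), (2, 3), (3, 3)]:
    print(g, t, att_true(g, t), att_did(g, t))
```

The two columns agree only because the simulated untreated outcomes share a common trend; with different trends across cohorts the comparison would no longer recover the definition.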

Why this avoids TWFE bias. A single $\beta$ in a TWFE regression must average all cohort-period effects with estimator-determined (possibly negative) weights. By contrast, each $ATT(g, t)$ is its own estimand; aggregation weights $w(g, t)$ are then chosen transparently by the researcher to match the question (see Aggregating Group-Time Effects). The simple combinations the paper proposes use non-negative weights, which rules out the “positive-for-all-but-negative-estimate” pathology.
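To see why non-negative weights matter, here is a minimal sketch of one aggregation: an event-study summary that averages $ATT(g, g + e)$ over the cohorts observed at exposure length $e$, weighting each cohort by its share of units. The function name, the $ATT$ values and the group sizes are hypothetical inputs chosen for illustration; the exact weighting schemes are covered in Aggregating Group-Time Effects.

```python
def event_study_aggregate(att, group_sizes, e, T):
    """Weighted average of ATT(g, g + e) over cohorts with g + e <= T, weights = cohort shares."""
    eligible = [g for g in group_sizes if g + e <= T and (g, g + e) in att]
    if not eligible:
        raise ValueError("no cohort is observed at this exposure length")
    total = sum(group_sizes[g] for g in eligible)
    weights = {g: group_sizes[g] / total for g in eligible}  # non-negative, sum to one
    theta = sum(weights[g] * att[(g, g + e)] for g in eligible)
    return theta, weights

# Hypothetical inputs: three group-time effects and two cohort sizes, T = 3.
att = {(2, 2): 1.0, (2, 3): 2.0, (3, 3): 3.0}
group_sizes = {2: 30, 3: 10}

theta0, w0 = event_study_aggregate(att, group_sizes, e=0, T=3)
theta1, w1 = event_study_aggregate(att, group_sizes, e=1, T=3)
print(theta0, w0)
print(theta1, w1)
```

Because the weights are non-negative and sum to one, the summary is a convex combination: it always lies between the smallest and largest $ATT(g, t)$ it averages, so it cannot be negative when every component is positive.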

Examples

Minimum wage: reading $ATT(g, t)$ directly

With periods $t$ running 2001-2007 and groups $g \in \{2004, 2006, 2007\}$, the cohort first raising its minimum wage in 2004 (Illinois) has $ATT(2004, 2004) \approx -3.4\%$, $ATT(2004, 2005) \approx -7.1\%$, $ATT(2004, 2006) \approx -12.5\%$, and $ATT(2004, 2007) \approx -13.6\%$. That is, teen employment falls more the longer the higher wage is in place. Each number is interpretable on its own without any homogeneity assumption.
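The same four numbers can be re-indexed by length of exposure $e = t - g$, which is how an event-study reading of this cohort would look. The snippet only rearranges the values quoted above; the variable names are illustrative.

```python
# ATT(2004, t) in percent, as quoted in the example above.
g = 2004
att_2004 = {2004: -3.4, 2005: -7.1, 2006: -12.5, 2007: -13.6}

# Re-index by exposure length e = t - g.
by_exposure = {t - g: effect for t, effect in att_2004.items()}

# Check that the effect becomes more negative with each extra year of exposure.
grows_with_exposure = all(
    by_exposure[e + 1] < by_exposure[e] for e in range(max(by_exposure))
)

for e, effect in sorted(by_exposure.items()):
    print(f"e = {e}: {effect}%")
print(grows_with_exposure)
```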

Connections

Identified under conditions in Identifying Assumptions for Staggered DiD.
Estimated via the formulas in Doubly-Robust Estimands for ATT(g,t).
Combined into summaries in Aggregating Group-Time Effects.
Inference on the whole family in Simultaneous Inference via Multiplier Bootstrap.
