Simultaneous Inference via Multiplier Bootstrap
Summary
The DR estimators of ATT(g,t) are $\sqrt{n}$-asymptotically linear and jointly normal (Theorem 2), with a doubly-robust influence function. Rather than plug-in standard errors, the paper uses a fast multiplier bootstrap (Theorem 3, Algorithm 1) that perturbs the influence function by random weights. The procedure needs no propensity re-estimation per draw, always retains observations from every group, and yields simultaneous (uniform) confidence bands that cover the entire path of ATT(g,t)'s with asymptotic probability $1-\alpha$ (Corollary 1), avoiding multiple-testing distortions. The minimum-wage application shows why the approach matters: the heterogeneity-robust estimates find a clear negative employment effect where TWFE finds none.
Overview
Inference is in the large-$n$, fixed-$T$ paradigm. Because researchers typically plot many ATT(g,t)'s (or aggregated summary parameters), pointwise bands would understate joint uncertainty and ignore multiple testing. The multiplier bootstrap produces bands that hold uniformly and account for the dependence across estimates, which makes them better suited to visualizing overall estimation uncertainty than pointwise intervals.
Main Content
Theorem 2 — Asymptotic linearity & joint normality of DR estimators
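Under Assumptions 1–4 and 6–8, for each post-treatment group-time pair $(g,t)$, and provided the DR consistency claim (4.5) holds (either the propensity working model or the never-treated outcome-regression working model is correctly specified):

$$\sqrt{n}\,\big(\widehat{ATT}(g,t) - ATT(g,t)\big) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\psi_{g,t}(W_i) + o_p(1),$$

where $W_i$ is unit $i$'s data and $\psi_{g,t}$ is the influence function of the DR estimator.

Stacking over $(g,t)$, $\sqrt{n}\,(\widehat{ATT} - ATT) \xrightarrow{d} N(0,\Sigma)$ with $\Sigma = E[\Psi(W)\Psi(W)']$, where $\Psi$ collects the $\psi_{g,t}$. The influence function has three pieces: a treated-weight term, a comparison-weight term, and an estimation-effect term correcting for first-step nuisance estimation. The third term makes the limiting variance account for estimating the propensity score and the outcome regression. Assumptions 7–8 require the nuisances to be smooth parametric models with $\sqrt{n}$-asymptotically linear estimators (logit/probit/(N)LS all qualify) plus weak integrability.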
Theorem 3 — Validity of the multiplier bootstrap
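Under the assumptions of Theorem 2, define a bootstrap draw by perturbing the empirical influence function with iid mean-zero, unit-variance, finite-third-moment weights $V_i$, drawn independently of the sample (e.g. Mammen (1993) two-point: $P(V = 1-\kappa) = \kappa/\sqrt{5}$, $P(V = \kappa) = 1 - \kappa/\sqrt{5}$, $\kappa = (\sqrt{5}+1)/2$):

$$\widehat{ATT}^{*} = \widehat{ATT} + \frac{1}{n}\sum_{i=1}^{n} V_i\,\widehat{\Psi}(W_i). \tag{4.6}$$

Then, conditional on the sample, $\sqrt{n}\,(\widehat{ATT}^{*} - \widehat{ATT})$ converges in distribution to $N(0,\Sigma)$, and for any continuous functional $\Gamma$, $\Gamma\big(\sqrt{n}\,(\widehat{ATT}^{*} - \widehat{ATT})\big)$ converges likewise to $\Gamma(N(0,\Sigma))$. Advantages: (1) trivial and fast, since each draw only reweights and never re-estimates the propensity score; (2) every group is always represented (the empirical bootstrap can drop a group); (3) simultaneous bands are easy; (4) it extends to clustering by drawing cluster-level $V$'s (Remark 10).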
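The reweighting step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, and `psi_hat` is assumed to be an already-computed $n \times K$ matrix of estimated influence-function values, one column per $(g,t)$ cell. The optional `cluster_ids` argument draws one weight per cluster, in the spirit of Remark 10.

```python
import numpy as np

def mammen_weights(rng, size):
    """Draw Mammen (1993) two-point weights (mean 0, variance 1)."""
    kappa = (np.sqrt(5.0) + 1.0) / 2.0
    p_low = kappa / np.sqrt(5.0)              # P(V = 1 - kappa)
    is_low = rng.random(size) < p_low
    return np.where(is_low, 1.0 - kappa, kappa)

def multiplier_bootstrap(att_hat, psi_hat, n_boot, rng, cluster_ids=None):
    """Return an (n_boot, K) array of bootstrap draws of the estimates.

    att_hat     : (K,) point estimates, one per (g, t) cell
    psi_hat     : (n, K) estimated influence-function values (assumed given)
    cluster_ids : optional (n,) labels; units in a cluster share one weight
    """
    n = psi_hat.shape[0]
    if cluster_ids is None:
        v = mammen_weights(rng, (n_boot, n))
    else:
        labels = np.asarray(cluster_ids).ravel()
        _, idx = np.unique(labels, return_inverse=True)
        v_cluster = mammen_weights(rng, (n_boot, idx.max() + 1))
        v = v_cluster[:, idx]                 # copy cluster weight to each unit
    # ATT* = ATT-hat + (1/n) * sum_i V_i * psi-hat_i; nothing is re-estimated.
    return att_hat + (v @ psi_hat) / n
```

Because each draw is a single matrix product, thousands of draws cost little, which is consistent with the short run time the note reports for the application.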
Algorithm 1 — Studentized simultaneous confidence band
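1. Draw a realization of the weights $\{V_i\}_{i=1}^{n}$.
2. Compute $\widehat{ATT}^{*}(g,t)$ via (4.6) and form $\hat{R}^{*}(g,t) = \sqrt{n}\,\big(\widehat{ATT}^{*}(g,t) - \widehat{ATT}(g,t)\big)$.
3. Repeat steps 1–2 $B$ times.
4. Estimate the scale $\hat{\Sigma}^{1/2}(g,t) = \big(q_{0.75}(g,t) - q_{0.25}(g,t)\big)/(z_{0.75} - z_{0.25})$, i.e. the interquartile range of the $B$ draws of $\hat{R}^{*}(g,t)$ normalized by the standard normal IQR (a robust scale).
5. Form $t^{*} = \max_{(g,t)} |\hat{R}^{*}(g,t)|\,\hat{\Sigma}^{-1/2}(g,t)$ per draw; let $\hat{c}_{1-\alpha}$ be the empirical $(1-\alpha)$-quantile of these.
6. Band: $\hat{C}(g,t) = \big[\widehat{ATT}(g,t) \pm \hat{c}_{1-\alpha}\,\hat{\Sigma}^{1/2}(g,t)/\sqrt{n}\big]$.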
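The band construction can be sketched as follows, assuming the bootstrap draws are already available as a $B \times K$ array. The function name and argument layout are hypothetical, and the code is an illustration of the steps rather than the paper's own code.

```python
import numpy as np
from statistics import NormalDist

def simultaneous_band(att_hat, boot_draws, n, alpha=0.05):
    """Studentized sup-t band from multiplier-bootstrap draws.

    att_hat    : (K,) point estimates, one per (g, t) cell
    boot_draws : (B, K) bootstrap draws of those estimates
    n          : sample size used in the sqrt(n) scaling
    Returns (lower, upper, critical_value).
    """
    # Recentered, sqrt(n)-scaled draws of the limiting distribution.
    r_star = np.sqrt(n) * (boot_draws - att_hat)
    # Robust scale: bootstrap IQR over the standard normal IQR.
    q75, q25 = np.quantile(r_star, [0.75, 0.25], axis=0)
    z = NormalDist()
    sigma_half = (q75 - q25) / (z.inv_cdf(0.75) - z.inv_cdf(0.25))
    # Max studentized deviation across cells, one value per draw.
    t_star = np.max(np.abs(r_star) / sigma_half, axis=1)
    c_hat = np.quantile(t_star, 1.0 - alpha)
    half_width = c_hat * sigma_half / np.sqrt(n)
    return att_hat - half_width, att_hat + half_width, c_hat
```

With more than one cell, the returned critical value will typically exceed the pointwise normal value of about 1.96, which is what widens the band enough to cover all cells jointly.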
Corollary 1 — Uniform coverage
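Under the assumptions of Theorem 2, for any $0 < \alpha < 1$, the band $\hat{C}(g,t)$ from Algorithm 1 satisfies, as $n \to \infty$,

$$P\big(ATT(g,t) \in \hat{C}(g,t)\ \text{for all } (g,t)\big) \to 1-\alpha.$$

The band covers all ATT(g,t) simultaneously, with no multiple-testing inflation. (Remark 11: setting the scale estimate $\hat{\Sigma}^{1/2}(g,t) = 1$ for every $(g,t)$ gives a valid but wider constant-width band.)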
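A small simulation illustrates why pointwise intervals fall short of joint coverage. It assumes $k = 7$ independent standard normal estimation errors, which is a simplification chosen for illustration: the actual estimates are dependent, and the bootstrap critical value adapts to that dependence. Under independence, the joint coverage of pointwise 95% intervals is $0.95^{7}$, roughly 0.70.

```python
import numpy as np
from statistics import NormalDist

def joint_coverage(crit, k, reps, rng):
    """Share of replications where all k standardized errors lie in
    [-crit, crit], assuming independent standard normal errors."""
    z = rng.normal(size=(reps, k))
    return np.mean(np.max(np.abs(z), axis=1) <= crit)

rng = np.random.default_rng(0)
k = 7
# Usual pointwise 95% critical value.
pointwise_crit = NormalDist().inv_cdf(0.975)
# Critical value with joint 95% coverage under independence:
# solve (2 * Phi(c) - 1) ** k = 0.95 for c.
uniform_crit = NormalDist().inv_cdf(0.5 + 0.5 * 0.95 ** (1.0 / k))

cov_pointwise = joint_coverage(pointwise_crit, k, 100_000, rng)
cov_uniform = joint_coverage(uniform_crit, k, 100_000, rng)
print(f"pointwise: {cov_pointwise:.3f}, uniform: {cov_uniform:.3f}")
```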
Inference for summary parameters & pre-testing
Corollary 2 (Sec. 4.2): plug-in estimators of any aggregation are $\sqrt{n}$-asymptotically linear and normal, so the same bootstrap delivers (multiple-testing-robust) bands across event-times, groups, or calendar-times. Remark 12: pre-treatment "placebo" ATT(g,t) for periods $t < g$ (which equal 0 under the assumptions) can be estimated by swapping the long difference $Y_t - Y_{g-1}$ (base period $g-1$, absent anticipation) for the short difference $Y_t - Y_{t-1}$; plotting them lets one assess the parallel-trends assumption.
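As a rough sketch of how the same draws carry over to a summary parameter, the code below forms a weighted average of the cell estimates and reuses the bootstrap draws for its standard error. It treats the aggregation weights as known constants, which is a simplifying assumption: when weights are estimated (for example from group sizes), the influence function of the aggregate needs an extra term that this sketch omits. The function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def aggregate_fixed_weights(att_hat, boot_draws, weights):
    """Weighted average of ATT(g,t) cells with a bootstrap standard error.

    att_hat    : (K,) cell estimates
    boot_draws : (B, K) bootstrap draws of the cell estimates
    weights    : (K,) nonnegative weights, assumed known (simplification)
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                        # normalize to sum to one
    theta_hat = float(att_hat @ w)
    theta_draws = boot_draws @ w           # (B,) draws of the aggregate
    # IQR-based scale, the same robust device used in Algorithm 1.
    q75, q25 = np.quantile(theta_draws, [0.75, 0.25])
    z = NormalDist()
    se = (q75 - q25) / (z.inv_cdf(0.75) - z.inv_cdf(0.25))
    return theta_hat, theta_draws, se
```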
Examples
Minimum wage on teen employment — full findings (Sec. 5)
Data/design. 2,284 counties, 29 states, 2001–2007 (federal MW flat at $5.15). Groups = year state first raised MW; never-raisers = comparison. Outcome: county teen employment (QWI). Covariates: region, population, % white, % HS grads, poverty rate, median income (2000 County Data Book). DR estimation = logit generalized propensity score (quadratics in population, median income) + OLS outcome regression; 1000 multiplier-bootstrap iterations clustered at the county level; runs in ~3 s.
Result: group-time effects (Fig. 1). Under unconditional parallel trends, 5 of 7 ATT(g,t) estimates are significantly negative (range -2.3% to -13.6%); simple group-size-weighted average -5.2%; overall ≈ -3.9%. Under conditional parallel trends (DR, Panel b), 3 of 7 are significant, range -0.9% (insignificant) to -7.1%; overall ≈ -3.1%.
Result: the TWFE contrast. A TWFE post-treatment dummy with unit + region-year FE gives only -0.008 (insignificant) under the conditional design (and -0.037 unconditional), i.e. TWFE says "no/weak effect."
Interpretation. The heterogeneity-robust CS estimates find a clear, dynamically-growing negative effect of the minimum wage on teen employment that TWFE conceals.
Caveats: some pre-treatment placebo estimates differ from zero (mild evidence against parallel trends), and the size of MW increases varies across states.
Key takeaway: in a textbook-complicated application, the choice of estimation method changes the qualitative conclusion. ^min-wage-full
Connections
- Provides inference for the estimators in Doubly-Robust Estimands for ATT(g,t) and the summaries in Aggregating Group-Time Effects.
- Validates the empirical conclusions previewed in Difference-in-Differences with Multiple Time Periods - Overview.
- Pre-treatment placebo plots test Identifying Assumptions for Staggered DiD.
See Also
- Chernozhukov et al. (2018); Kline & Santos (2012) — related multiplier-bootstrap band procedures
- Mammen (1993) — the two-point bootstrap weight
- Synthetic Control — alternative inference (permutation) for staggered policy effects
- Dube, Lester & Reich (2010); Meer & West (2016) — the minimum-wage literature contrasted