Synthetic Control Inference and Diagnostics

Summary

With aggregate data and small donor pools, asymptotic inference is unavailable for synthetic controls. Abadie, Diamond, and Hainmueller (2010) propose permutation inference based on the RMSPE ratio — the ratio of post-intervention fit to pre-intervention fit. This statistic is preferred over the raw treatment effect because it accounts for heterogeneous pre-treatment fit across donor units. Diagnostic checks (backdating, leave-one-out robustness) assess the credibility of the synthetic control counterfactual.

Overview

Standard inference for treatment effects relies on large-sample approximations. Synthetic control studies typically have one treated unit and a small donor pool ($J \approx 10$ to $50$), far too few units for asymptotic theory. Moreover, the units are aggregate entities (states, countries), not random draws from a well-defined population, which makes classical sampling-based inference theoretically problematic.

The solution is permutation inference (Fisher 1935): reassign treatment to each donor unit in turn, estimate a “placebo effect” for that unit, and compare the treated unit’s estimated effect to the distribution of placebo effects.

The RMSPE Ratio Test Statistic

The raw treatment effect $\hat{\tau}_{1 t}$ is a poor test statistic for permutation inference because units with poor pre-treatment fit naturally generate large post-intervention gaps, even without any real treatment effect. The RMSPE ratio corrects for this.

Definition: RMSPE (Root Mean Squared Prediction Error)

For unit $j$ and time interval $[t_{1}, t_{2}]$ :
$R_{j} (t_{1}, t_{2}) = \left( \frac{1}{t_{2} - t_{1} + 1} \sum_{t = t_{1}}^{t_{2}} (Y_{j t} - \hat{Y}_{j t}^{N})^{2} \right)^{1/2}$
where $\hat{Y}_{j t}^{N} = \sum_{k \neq j} w_{jk} Y_{k t}$ is the synthetic control estimate for unit $j$ .

Definition: RMSPE Ratio (Abadie 2021, Eq. 12)

For unit $j$ , the RMSPE ratio is:
$r_{j} = \frac{R_{j} (T_{0} + 1, T)}{R_{j} (1, T_{0})}$
That is, the ratio of the post-intervention RMSPE to the pre-intervention RMSPE. This measures the fit of the synthetic control in the posttreatment period relative to its fit in the pretreatment period.

Why the ratio matters:

A large post-treatment gap ( $R_{j} (T_{0} + 1, T)$ large) is only meaningful if the pre-treatment fit was good ( $R_{j} (1, T_{0})$ small)
A unit with poor pre-treatment fit might generate a large post-treatment deviation purely by chance — this would falsely inflate significance if we used the raw gap
The ratio scales the treatment effect signal by the noise level specific to each placebo unit
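The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the cited papers: the gap series are made-up numbers, and `rmspe` and `rmspe_ratio` are hypothetical helper names. Each gap is $Y_{j t} - \hat{Y}_{j t}^{N}$, with the first `t0` entries covering periods $1, \dots, T_{0}$.

```python
import math

def rmspe(gaps):
    """Root mean squared prediction error of a list of gaps Y_jt - Yhat_jt^N."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))

def rmspe_ratio(gaps, t0):
    """Post/pre RMSPE ratio r_j.

    gaps[:t0] holds periods 1..T0 (pre-intervention),
    gaps[t0:] holds periods T0+1..T (post-intervention).
    """
    return rmspe(gaps[t0:]) / rmspe(gaps[:t0])

# Hypothetical gap series with T0 = 4 pre-intervention periods.
well_fit = [0.1, -0.1, 0.1, -0.1, 1.0, 1.0]    # tight pre-fit, post gap of 1.0
poorly_fit = [1.0, -1.0, 1.0, -1.0, 1.0, 1.0]  # same post gap, loose pre-fit

r_well = rmspe_ratio(well_fit, 4)    # about 10: post gap is large relative to pre-fit
r_poor = rmspe_ratio(poorly_fit, 4)  # about 1: post gap is no larger than pre-period noise
```

Both units have the same raw post-intervention gap, but only the well-fitted unit has a large ratio, which is the point of normalizing by pre-treatment fit.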

Permutation Inference Procedure

Theorem: Permutation Inference for Synthetic Controls (Abadie, Diamond, and Hainmueller 2010)

Procedure:

For each unit $j = 1, \dots, J + 1$ (treated unit + all donor units), estimate a synthetic control as if $j$ were the treated unit, using all other units as the donor pool

Compute $r_{j}$ for each unit

The p-value for the two-sided test is:

$p = \frac{1}{J + 1} \sum_{j = 1}^{J + 1} 1_{+} (r_{j} - r_{1})$
where $1_{+} (\cdot)$ returns 1 for nonnegative arguments and 0 otherwise; $r_{1}$ is the RMSPE ratio for the actual treated unit

Interpretation: $p$ is the fraction of units (including the treated unit) that have an RMSPE ratio at least as large as the treated unit’s. Under the null of no treatment effect, the treated unit’s $r_{1}$ should be typical among the $J + 1$ units.
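As a rough sketch of the last step, the p-value can be computed directly from a list of RMSPE ratios. The ratios below are invented for illustration, `permutation_p_value` is a hypothetical name, and the treated unit is assumed to sit at index 0. Estimating each placebo synthetic control is not shown.

```python
def permutation_p_value(ratios, treated_index=0):
    """Fraction of units whose RMSPE ratio is at least the treated unit's.

    `ratios` holds r_j for all J + 1 units, treated unit included, so the
    treated unit always counts itself and p is never below 1 / (J + 1).
    """
    r_treated = ratios[treated_index]
    n_at_least = sum(1 for r in ratios if r >= r_treated)
    return n_at_least / len(ratios)

# Hypothetical ratios: treated unit first, followed by J = 9 placebo units.
ratios = [12.4, 1.1, 0.8, 3.0, 2.2, 0.9, 1.7, 1.3, 2.5, 0.6]
p = permutation_p_value(ratios)  # only the treated unit qualifies, so p = 1/10
```

Because the smallest attainable value is $1 / (J + 1)$, a donor pool of 9 units can never yield $p < 0.1$, which is one reason the size of the donor pool matters for this test.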

Practical filtering: Donor units with very high pre-treatment RMSPE $R_{j} (1, T_{0})$ — that is, units whose synthetic controls fit poorly even before the intervention — are typically excluded from the permutation distribution. Including them would dilute the significance of the treated unit’s ratio. A common threshold is $R_{j} (1, T_{0}) < c \cdot R_{1} (1, T_{0})$ for some $c$ (e.g., $c = 5$ ).
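A minimal sketch of this filtering step follows, assuming the pre-treatment RMSPEs and ratios have already been computed. The numbers are hypothetical, as are the function names; the treated unit is assumed to be at index 0 and is always retained.

```python
def keep_well_fitted(pre_rmspe, c=5.0, treated_index=0):
    """Indices of units with R_j(1, T0) < c * R_1(1, T0); the treated unit is kept."""
    cutoff = c * pre_rmspe[treated_index]
    return [j for j, r in enumerate(pre_rmspe)
            if j == treated_index or r < cutoff]

def p_value(ratios, treated_position=0):
    """Share of ratios at least as large as the treated unit's ratio."""
    r_treated = ratios[treated_position]
    return sum(1 for r in ratios if r >= r_treated) / len(ratios)

# Hypothetical values; index 0 is the treated unit.
pre_rmspe = [1.0, 2.0, 8.0, 0.5, 6.0]
ratios = [9.0, 1.5, 10.0, 2.0, 0.7]

kept = keep_well_fitted(pre_rmspe, c=5.0)        # units 2 and 4 exceed the cutoff of 5.0
p_all = p_value(ratios)                          # 2 of 5 units
p_filtered = p_value([ratios[j] for j in kept])  # 1 of 3 units
```

Filtering shrinks the permutation distribution, so the smallest attainable p-value grows as units are dropped; reporting results for several values of $c$ makes that trade-off visible.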

One-sided inference: Replacing $Y_{j t} - \hat{Y}_{j t}^{N}$ with its positive or negative parts before computing RMSPE yields one-sided tests with potential power gains. This is useful in comparative case study settings where the direction of the effect is known a priori.
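The one-sided variant can be sketched as below. This is an illustration under one stated assumption: the positive or negative part is applied to the post-intervention gaps only, while the pre-intervention denominator stays the ordinary RMSPE. The text above does not pin that choice down, and the data and function names are hypothetical.

```python
import math

def rmspe(gaps):
    """Ordinary root mean squared prediction error."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))

def one_sided_ratio(gaps, t0, sign):
    """RMSPE ratio using only one side of the post-intervention gaps.

    sign=+1 keeps the positive part max(gap, 0); sign=-1 keeps the magnitude
    of the negative part max(-gap, 0). Assumption: only the numerator is
    made one-sided.
    """
    post_parts = [max(sign * g, 0.0) for g in gaps[t0:]]
    return rmspe(post_parts) / rmspe(gaps[:t0])

# Hypothetical gaps with T0 = 4; the post-period gaps are mostly negative.
gaps = [0.1, -0.1, 0.1, -0.1, -1.0, -1.0, 0.5, -1.0]

r_negative = one_sided_ratio(gaps, 4, sign=-1)  # ignores the single positive gap
r_positive = one_sided_ratio(gaps, 4, sign=+1)  # ignores the three negative gaps
```

When the effect is expected to be negative, `r_negative` is the statistic to feed into the permutation p-value, computed the same way for every placebo unit.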

Design-Based Interpretation

This inference is design-based, not sampling-based. The randomness comes from the assignment mechanism — which unit was treated — not from sampling. Abadie, Diamond, and Hainmueller (2010) use a uniform benchmark (each assignment equally probable), but one could incorporate domain knowledge about assignment probabilities.

Robustness Checks and Diagnostics

1. Backdating

Definition: Backdating (Abadie 2021, Section 7)

Backdating artificially moves the intervention date backward in time. If the synthetic control is credible, applying it with the intervention backdated should produce near-zero estimated effects for the backdated periods (before the actual intervention).

Use: A synthetic control that shows large “effects” in the backdated period is likely misspecified — either the pre-treatment fit is spurious, or the parallel-trends analog is violated. Conversely, near-zero effects in the backdated period lend credibility to the synthetic control.

In the German reunification example (Abadie 2021, Figure 3), backdating the reunification to 1980 shows that the synthetic control closely tracks West Germany’s GDP over 1980–1990 (the backdated post-treatment period, before the actual intervention), providing evidence that the estimated post-1990 gap is not a statistical artifact.
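A backdating check can be summarized by comparing prediction error across three windows. The sketch below assumes the gaps come from a synthetic control whose weights were fitted only on the periods before the backdated date; the numbers and the `backdating_check` helper are hypothetical and do not reproduce the German reunification results.

```python
import math

def rmspe(gaps):
    """Root mean squared prediction error of a list of gaps."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))

def backdating_check(gaps, n_fit, n_holdout):
    """RMSPE in the fitting, hold-out, and true post-intervention windows.

    gaps[:n_fit]                 periods used to fit the weights
    gaps[n_fit:n_fit+n_holdout]  backdated "post" periods before the real intervention
    gaps[n_fit+n_holdout:]       periods after the real intervention
    """
    end_holdout = n_fit + n_holdout
    return {
        "fit": rmspe(gaps[:n_fit]),
        "holdout": rmspe(gaps[n_fit:end_holdout]),
        "post": rmspe(gaps[end_holdout:]),
    }

# Hypothetical gaps: 4 fitting periods, 2 hold-out periods, 2 true post periods.
gaps = [0.2, -0.2, 0.2, -0.2, 0.3, -0.3, -2.0, -2.0]
check = backdating_check(gaps, n_fit=4, n_holdout=2)
```

A hold-out RMSPE close to the fitting RMSPE, followed by a much larger post-intervention RMSPE, is the pattern that supports the counterfactual.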

2. Leave-One-Out Robustness

Definition: Leave-One-Out Robustness (Abadie 2021, Section 7)

Reestimate the synthetic control repeatedly, each time excluding one of the units that contributes positively to the synthetic control (i.e., one of the units with $w_{j} > 0$ ).

If the main result is robust to leave-one-out, the estimated treatment effects across these runs should keep the sign of the full-sample estimate and cluster closely around it. In the German reunification example (Figure 4 in Abadie 2021), this means the leave-one-out estimates remain negative.

If the main conclusion reverses when a single unit is excluded, this warrants investigation: the excluded unit may have experienced an unrelated shock, or the treated unit’s characteristics may be too close to a single donor unit.
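The leave-one-out loop can be sketched as follows. Everything here is illustrative: the data are invented, and `inverse_distance_weights` is a deliberately crude stand-in for a real synthetic control fitter (it is NOT the constrained optimization used in practice). Any function with the same signature could be passed as `fit_weights`.

```python
def inverse_distance_weights(treated_pre, donors_pre):
    """Crude stand-in fitter: weights proportional to inverse squared pre-period distance."""
    inv = {}
    for name, series in donors_pre.items():
        dist = sum((a - b) ** 2 for a, b in zip(treated_pre, series))
        inv[name] = 1.0 / (dist + 1e-12)
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}

def mean_post_gap(treated, donors, weights, t0):
    """Average post-intervention gap between the treated unit and its synthetic control."""
    gaps = [treated[t] - sum(w * donors[k][t] for k, w in weights.items())
            for t in range(t0, len(treated))]
    return sum(gaps) / len(gaps)

def leave_one_out(treated, donors, t0, fit_weights):
    """Re-estimate the effect, dropping each positively weighted donor in turn."""
    full_w = fit_weights(treated[:t0], {k: v[:t0] for k, v in donors.items()})
    results = {"full": mean_post_gap(treated, donors, full_w, t0)}
    for dropped, w in full_w.items():
        if w <= 0:
            continue  # only donors that contribute to the synthetic control
        rest = {k: v for k, v in donors.items() if k != dropped}
        w_loo = fit_weights(treated[:t0], {k: v[:t0] for k, v in rest.items()})
        results[dropped] = mean_post_gap(treated, rest, w_loo, t0)
    return results

# Hypothetical panel: 3 pre-intervention periods, 2 post-intervention periods.
treated = [1.0, 2.0, 3.0, 2.0, 2.5]
donors = {
    "A": [1.1, 2.1, 3.1, 4.0, 5.0],
    "B": [0.9, 1.9, 2.9, 4.2, 4.8],
    "C": [1.5, 2.5, 3.5, 4.5, 5.5],
}
results = leave_one_out(treated, donors, t0=3, fit_weights=inverse_distance_weights)
```

Comparing the entries of `results` against `results["full"]` shows whether any single donor drives the estimate.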

3. In-Time Placebo Tests

A related diagnostic: estimate a synthetic control for the actual treated unit but with the intervention date moved to a period when no intervention occurred. If the synthetic control’s “effect” in the placebo post-period is comparable to the actual estimated effect, the credibility of the original estimate is weakened.

This is distinct from the in-space placebo tests (permutation inference above), which iterate over donor units. In-time placebos iterate over intervention dates for the same treated unit.

Confidence Intervals

Point estimates and p-values are the primary inferential outputs. Confidence intervals can be constructed via test inversion (Firpo and Possebom 2018): invert the permutation test over a range of treatment effect magnitudes $τ_{0}$ to find all values not rejected at level $α$ .
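Test inversion can be illustrated with a simplified sketch. It makes two assumptions that are not taken from the cited paper: the null is a constant additive effect $τ_{0}$ in every post-intervention period, and the placebo gaps are held fixed rather than re-estimated after adjusting the treated unit’s outcomes. The data and names are hypothetical.

```python
import math

def rmspe(gaps):
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))

def ratio(gaps, t0):
    return rmspe(gaps[t0:]) / rmspe(gaps[:t0])

def confidence_set(treated_gaps, placebo_gaps, t0, tau_grid, alpha):
    """Grid values tau0 whose permutation p-value exceeds alpha (not rejected)."""
    placebo_ratios = [ratio(g, t0) for g in placebo_gaps]
    kept = []
    for tau0 in tau_grid:
        # Under the null, removing tau0 from the post-period gaps leaves pure noise.
        adjusted = treated_gaps[:t0] + [g - tau0 for g in treated_gaps[t0:]]
        r1 = ratio(adjusted, t0)
        n_at_least = 1 + sum(1 for r in placebo_ratios if r >= r1)
        p = n_at_least / (len(placebo_ratios) + 1)
        if p > alpha:
            kept.append(tau0)
    return kept

# Hypothetical data: T0 = 2, treated post-period gap of -1.0, nine placebo units.
treated_gaps = [0.1, -0.1, -1.0, -1.0]
scales = (0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7, 1.9, 2.5)
placebo_gaps = [[0.1, -0.1, s * 0.1, -s * 0.1] for s in scales]

tau_grid = [-1.4, -1.2, -1.0, -0.8, -0.6, 0.0]
kept = confidence_set(treated_gaps, placebo_gaps, 2, tau_grid, alpha=0.1)
```

With these numbers the retained values are the grid points nearest the estimated gap of $-1.0$, and $τ_{0} = 0$ is rejected; a finer grid would give a finer approximation to the interval.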

Cattaneo, Feng, and Titiunik (2021) propose prediction intervals for $\hat{\tau}_{i t}$ that account for estimation uncertainty in $P_{t}^{N}$ (the untreated potential outcome model) and irreducible uncertainty from the unobserved $u_{t}$ .

Connections

Synthetic Control Bias Theory: The bias bound theory explains why the RMSPE ratio is the right test statistic — the bound depends on pre-treatment fit, so the ratio normalizes by that fit
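Synthetic Control: The California Prop 99 example reports a permutation p-value ≈ 0.029, which equals 1/35; under the definition above, this means that among 35 units only the treated unit has an RMSPE ratio at least as large as $r_{1}$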
Differences-in-Differences: DiD uses standard errors from panel regression; SC uses permutation inference because the small- $J$ setting makes asymptotics unavailable
Synthetic Control Extensions: The multi-unit extensions generalize permutation inference to $I > 1$ treated units
