Generalized Synthetic Control Method

Summary

The Generalized Synthetic Control (GSC) method (Xu 2017) imputes treated counterfactuals using an interactive fixed effects (IFE) model estimated on the control group. A 3-step procedure — (1) estimate IFE on controls, (2) project treated pretreatment outcomes onto factor space, (3) impute counterfactuals — handles multiple treated units, variable treatment timing, and heterogeneous treatment effects in a single run. Built-in cross-validation selects the number of factors; parametric bootstrap provides standard errors and confidence intervals. DID is the special case with constant factor loadings; canonical SC is the special case with one treated unit.

Overview

Two standard approaches to causal inference with time-series cross-sectional (TSCS) data both have critical limitations:

Difference-in-differences requires parallel pre-trends — that treated and control units would have evolved identically absent treatment. This is often implausible when unobserved time-varying confounders affect units differentially.
Synthetic control matches on pre-treatment trajectories but applies only to a single treated aggregate unit and provides no formal uncertainty estimates.

The GSC method addresses both by explicitly modeling the unobserved time-varying confounders as an interactive fixed effects structure. Because the IFE model is estimated once on all control units jointly, it (1) pools information across controls efficiently, (2) handles multiple treated units in a single run, (3) recovers unit-specific factor loadings for each treated unit, and (4) supports parametric bootstrap inference.

Framework

Setting

Let $T$ and $C$ denote the sets of treated and control units. There are $N = N_{tr} + N_{co}$ units in total, each observed for $T$ periods (the symbol $T$ is the treated set in expressions such as $i \in T$ and the number of periods elsewhere). Unit $i$ is first treated at time $T_{0 i} + 1$ and is observed for $q_{i} = T - T_{0 i}$ post-treatment periods. For notational simplicity, assume all treated units start treatment at the same time, $T_{0 i} = T_{0}$ .

The IFE Model

Assumption 1: Functional Form (Xu 2017, p. 60)

The outcome of unit $i$ at time $t$ is generated by:
$Y_{i t} = δ_{i t} D_{i t} + x_{i t}^{'} β + λ_{i}^{'} f_{t} + ε_{i t}$
where:

$δ_{i t}$ = heterogeneous treatment effect for unit $i$ at time $t$ (individual treatment indicator $D_{i t} = 1$ if $i \in T$ and $t > T_{0}$ )

$x_{i t}$ = $(k \times 1)$ vector of observed covariates with common coefficients $β$

$f_{t}$ = $(r \times 1)$ vector of unobserved common factors (time-varying)

$λ_{i}$ = $(r \times 1)$ vector of unknown factor loadings (unit-specific intercepts)

$ε_{i t}$ = zero-mean idiosyncratic shocks

Identification constraints: $F^{'} F / T = I_{r}$ (factors normalized), $Λ_{co}^{'} Λ_{co}$ diagonal.

Special cases of this model:

DID: Set $f_{t} = ξ_{t}$ (scalar time trend) and $λ_{i} = 1$ for all units → $λ_{i}^{'} f_{t} = ξ_{t}$ (additive time fixed effect). Parallel trends is the restriction that factor loadings are identical across units.
Canonical SC: Restrict to $N_{t r} = 1$ and use the SC weighting scheme instead of the IFE estimator.
Two-way fixed effects: $λ_{i}^{'} f_{t} = α_{i} + ξ_{t}$ (additive unit + time fixed effects, the “twoway FE” restriction).
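
To see how the two-way FE restriction sits inside the IFE model, one convenient (not unique) choice is $r = 2$, with the first factor held constant over time and the second loading held constant across units, before any normalization is applied:
$f_{t} = (1, ξ_{t})^{'}, \quad λ_{i} = (α_{i}, 1)^{'} \quad \Rightarrow \quad λ_{i}^{'} f_{t} = α_{i} \cdot 1 + 1 \cdot ξ_{t} = α_{i} + ξ_{t}$
Dropping the first component, so that $λ_{i} = 1$ and $f_{t} = ξ_{t}$, leaves only the additive time effect, which is the DID case listed above.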

Decomposition of Confounders

The term $λ_{i}^{'} f_{t}$ captures unobserved time-varying confounders that affect different units with different magnitudes ( $λ_{i}$ ). For example, if a law is passed in a state because public opinion becomes more liberal, and shifting ideology affects liberal vs. conservative states differently, then $f_{t}$ captures the national ideology trend and $λ_{i}$ captures each state’s sensitivity to it. Two-way FE models would incorrectly assume $λ_{i} = λ$ for all $i$ .

ATT Estimand

Definition: Average Treatment Effect on the Treated (ATT)

The target estimand at time $t > T_{0}$ is:
$ATT_{t, t > T_{0}} = \frac{1}{N_{tr}} \sum_{i \in T} [Y_{i t} (1) - Y_{i t} (0)]$
Because $Y_{i t} (1)$ is observed for treated units in post-treatment periods, the problem reduces to estimating $Y_{i t} (0)$ — the counterfactual untreated potential outcome.

As in Abadie, Diamond, and Hainmueller (2010), the treatment effects $δ_{i t}$ are treated as fixed given the sample (not as random draws). The estimand is the ATT in the sample drawn, not the population ATT.

Identification Assumptions

Assumption 2: Strict Exogeneity (Xu 2017, p. 61)

$ε_{i t} \perp\!\!\!\perp D_{j s}, x_{j s}, λ_{j}, f_{s} \quad \forall i, j, t, s$
The idiosyncratic error term of any unit at any time period is independent of all treatment assignments, observed covariates, unobserved factors and factor loadings.

This is weaker than the strict exogeneity required by two-way FE models. Two-way FE requires $(ε_{i t} + λ_{i}^{'} f_{t}) ⊥ D_{j s}$ , which demands that the composite error (including the factor component) be unrelated to treatment. That is implausible when factor loadings are correlated with treatment assignment. GSC only requires the idiosyncratic part $ε_{i t}$ to be exogenous.
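
The contrast can be made concrete with a noiseless toy panel. The sketch below is an illustration under made-up assumptions (one common factor that trends linearly, a true treatment effect of zero, and hypothetical function names). It shows that a classic two-group DID contrast picks up a spurious effect as soon as treated units load more heavily on the factor than controls do.

```python
# Toy panel: Y_it(0) = lambda_i * f_t with f_t = t, no noise, true effect = 0.
# All numbers and names are illustrative assumptions, not taken from Xu (2017).

def mean(values):
    return sum(values) / len(values)

def did_estimate(lam_treated, lam_control, n_periods=10, n_pre=6):
    """2x2 DID: (treated post - treated pre) - (control post - control pre)."""
    factor = [float(t) for t in range(1, n_periods + 1)]  # common factor f_t
    pre, post = factor[:n_pre], factor[n_pre:]
    treated_change = lam_treated * mean(post) - lam_treated * mean(pre)
    control_change = lam_control * mean(post) - lam_control * mean(pre)
    return treated_change - control_change

# Identical loadings: parallel trends holds and DID recovers the true zero effect.
did_equal = did_estimate(lam_treated=1.0, lam_control=1.0)

# Treated units load twice as heavily on the factor: DID reports a nonzero
# "effect" even though no treatment effect exists in the data.
did_unequal = did_estimate(lam_treated=2.0, lam_control=1.0)

print(did_equal, did_unequal)
```

The spurious estimate equals the loading gap times the change in the factor's mean, which is exactly the part of $λ_{i}^{'} f_{t}$ that additive unit and time effects cannot absorb.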

Assumptions 3–5 (Xu 2017, pp. 61–62):

Assumption 3 (Weak serial dependence): Error terms allow mild autocorrelation; strong dependence (unit roots) ruled out
Assumption 4 (Regularity conditions): Moment conditions for consistency of $\hat{β}$ and the factor space $span (F)$
Assumption 5 (Cross-sectional independence and homoscedasticity): Required for valid parametric bootstrap; heteroscedasticity across time allowed

The Three-Step GSC Estimator

Definition: GSC Estimation Procedure (Xu 2017, Section 3)

Step 1: Estimate the IFE model on control group data only:
$(\hat{β}, \hat{F}, \hat{Λ}_{co}) = \underset{β, F, Λ_{co}}{\arg\min} \sum_{i \in C} (Y_{i} - X_{i} β - F λ_{i})^{'} (Y_{i} - X_{i} β - F λ_{i})$
subject to $\hat{F}^{'} \hat{F} / T = I_{r}$ and $\hat{Λ}_{co}^{'} \hat{Λ}_{co}$ diagonal.

Step 2: Estimate factor loadings for each treated unit by projecting its pretreatment outcomes onto the estimated factor space:
$\hat{λ}_{i} = \underset{λ_{i}}{\arg\min} (Y_{i}^{0} - X_{i}^{0} \hat{β} - \hat{F}^{0} λ_{i})^{'} (Y_{i}^{0} - X_{i}^{0} \hat{β} - \hat{F}^{0} λ_{i}) = (\hat{F}^{0'} \hat{F}^{0})^{-1} \hat{F}^{0'} (Y_{i}^{0} - X_{i}^{0} \hat{β}), \quad i \in T$
where superscript $^{0}$ denotes pretreatment periods $t = 1, \dots, T_{0}$ .

Step 3: Impute the treated counterfactual for $i \in T$ , $t > T_{0}$ :
$\hat{Y}_{i t} (0) = x_{i t}^{'} \hat{β} + \hat{λ}_{i}^{'} \hat{f}_{t}$
Treatment effect estimate:
$\hat{δ}_{i t} = Y_{i t} - \hat{Y}_{i t} (0), i \in T, t > T_{0}$
ATT estimate:
$\widehat{ATT}_{t} = \frac{1}{N_{tr}} \sum_{i \in T} \hat{δ}_{i t}, \quad t > T_{0}$
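
The three steps can be sketched in a few lines of NumPy under one simplifying assumption: there are no covariates, so $β$ drops out and the least-squares IFE fit on the controls reduces to a rank-$r$ principal-components decomposition. The function name, argument layout, and simulated panel below are hypothetical, and this is not the gsynth implementation, which also handles covariates and additive fixed effects.

```python
import numpy as np

def gsc_no_covariates(Y_co, Y_tr, T0, r):
    """Three-step GSC sketch without covariates.

    Y_co: (T, N_co) control outcomes; Y_tr: (T, N_tr) treated outcomes;
    T0: number of pretreatment periods; r: number of factors (r >= 1).
    Returns (att, delta): att has one entry per post-treatment period.
    """
    T = Y_co.shape[0]
    # Step 1: rank-r principal components of the control panel, scaled so
    # that F_hat' F_hat / T = I_r (the normalization used in the text).
    U, _, _ = np.linalg.svd(Y_co, full_matrices=False)
    F_hat = np.sqrt(T) * U[:, :r]                        # (T, r)
    # Step 2: OLS of each treated unit's pretreatment outcomes on F_hat^0.
    F0 = F_hat[:T0]
    lam_tr, *_ = np.linalg.lstsq(F0, Y_tr[:T0], rcond=None)  # (r, N_tr)
    # Step 3: impute Y_it(0), then difference against observed outcomes.
    Y0_hat = F_hat @ lam_tr                              # (T, N_tr)
    delta = Y_tr[T0:] - Y0_hat[T0:]                      # unit-level effects
    att = delta.mean(axis=1)                             # average over treated
    return att, delta

# Noiseless simulated panel (an assumption chosen so the answer is known).
rng = np.random.default_rng(0)
T, T0, N_co, N_tr, r = 20, 12, 30, 3, 2
F = rng.normal(size=(T, r))
Y_co = F @ rng.normal(size=(r, N_co))
Y_tr = F @ rng.normal(size=(r, N_tr))
Y_tr[T0:] += 5.0                                         # true effect of 5

att, delta = gsc_no_covariates(Y_co, Y_tr, T0, r)
print(np.round(att, 6))
```

Because the simulated data contain no idiosyncratic noise, the imputed counterfactuals are exact and every post-treatment ATT equals the planted effect; with noise, the estimates would scatter around it.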

Remark 2: Consistency of the GSC Estimator (Xu 2017, p. 64)

Under Assumptions 1–4, the bias of the GSC estimator shrinks to zero as the sample grows, i.e.:
$E (\widehat{ATT}_{t} \mid D, X, Λ, F) \to ATT_{t} \text{ as } N_{co}, T_{0} \to \infty$
( $N_{t r}$ held fixed). Intuitively, both large $N_{co}$ and large $T_{0}$ are necessary: $N_{co}$ for consistent estimation of $\hat{β}$ and the factor space, $T_{0}$ for consistent estimation of each treated unit’s factor loadings $\hat{λ}_{i}$ .

Key implication: When $T_{0}$ is small, the “incidental parameters” problem leads to imprecise $\hat{λ}_{i}$ and hence biased treatment effect estimates. This is a crucial difference from two-way FE models.
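
A back-of-the-envelope calculation shows why $T_{0}$ matters. As a simplifying assumption (not the paper's full argument), treat $\hat{F}$ and $\hat{β}$ as if they equaled the true $F$ and $β$, and take the pretreatment errors to be serially uncorrelated with common variance $σ^{2}$. Substituting $Y_{i}^{0} - X_{i}^{0} β = F^{0} λ_{i} + ε_{i}^{0}$ into the Step 2 projection gives:
$\hat{λ}_{i} - λ_{i} = (F^{0'} F^{0})^{-1} F^{0'} ε_{i}^{0}, \quad Var (\hat{λ}_{i} \mid F) = σ^{2} (F^{0'} F^{0})^{-1}$
If the pretreatment factors roughly satisfy the same normalization as the full sample, $F^{0'} F^{0} \approx T_{0} I_{r}$, each loading has variance of about $σ^{2} / T_{0}$. The precision of $\hat{λ}_{i}$ therefore improves only as the pretreatment window lengthens, regardless of how many control units are available.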

Model Selection: Cross-Validation for Number of Factors

The number of factors $r$ is unknown and must be selected. Xu (2017) proposes a leave-one-out cross-validation procedure that uses the DID data structure — pretreatment treated observations naturally serve as a validation set.

Algorithm 1: Cross-Validation for Number of Factors (Xu 2017, pp. 63–64)

For a given candidate $r$ :

Step 1: Estimate the IFE model on control group data, obtaining $\hat{β}^{r}$ and $\hat{F}^{r}$ .

Step 2: For each pretreatment period $s \in \{1, \dots, T_{0}\}$ (leave-one-out loop):

(a) Hold back period $s$ ; estimate factor loadings for each treated unit $i$ using all other pretreatment periods $t \neq s$ :

$\hat{λ}_{i, -s}^{r} = (\hat{F}_{-s}^{r 0'} \hat{F}_{-s}^{r 0})^{-1} \hat{F}_{-s}^{r 0'} (Y_{i}^{0} - X_{i}^{0} \hat{β}^{r})_{t \neq s}$

(b) Predict the treated outcome at period $s$ :

$\hat{Y}_{i s} (0) = x_{i s}^{'} \hat{β}^{r} + \hat{λ}_{i, - s}^{r'} \hat{f}_{s}^{r}$

(c) Save prediction error: $e_{i s} = Y_{i s} - \hat{Y}_{i s} (0)$ for $i \in T$

Step 3: Compute MSPE for this $r$ :
$MSPE (r) = \sum_{s = 1}^{T_{0}} \sum_{i \in T} e_{i s}^{2} / T_{0}$
Steps 4–5: Repeat for different values of $r$ ; select the $r^{*}$ that minimizes MSPE.

The procedure is computationally inexpensive: for each $r$ , the IFE model is estimated only once (Step 1); all other steps are OLS projections.
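
The leave-one-out loop can be sketched as follows, again assuming no covariates and candidate values $r \geq 1$ only. The helper names and the simulated two-factor panel are hypothetical and not part of gsynth.

```python
import numpy as np

def estimate_factors(Y_co, r):
    """Step 1: rank-r principal components of controls, with F'F / T = I_r."""
    T = Y_co.shape[0]
    U, _, _ = np.linalg.svd(Y_co, full_matrices=False)
    return np.sqrt(T) * U[:, :r]

def loo_mspe(Y_co, Y_tr, T0, r):
    """Leave-one-period-out MSPE on treated units' pretreatment outcomes."""
    F_hat = estimate_factors(Y_co, r)          # estimated once per candidate r
    F0, Y0 = F_hat[:T0], Y_tr[:T0]
    total = 0.0
    for s in range(T0):                        # Step 2: hold back period s
        keep = np.arange(T0) != s
        # (a) loadings from all other pretreatment periods
        lam, *_ = np.linalg.lstsq(F0[keep], Y0[keep], rcond=None)
        # (b) predict the held-out period for every treated unit
        pred = F0[s] @ lam
        # (c) accumulate squared prediction errors
        total += float(np.sum((Y0[s] - pred) ** 2))
    return total / T0                          # Step 3

def select_r(Y_co, Y_tr, T0, candidates):
    """Steps 4-5: evaluate each candidate and return the MSPE minimizer."""
    scores = {r: loo_mspe(Y_co, Y_tr, T0, r) for r in candidates}
    return min(scores, key=scores.get), scores

# Simulated panel with two true factors and small noise (illustrative only).
rng = np.random.default_rng(1)
T, T0, N_co, N_tr = 30, 20, 40, 5
F = rng.normal(size=(T, 2))
Y_co = F @ rng.normal(size=(2, N_co)) + rng.normal(scale=0.1, size=(T, N_co))
Y_tr = F @ rng.normal(size=(2, N_tr)) + rng.normal(scale=0.1, size=(T, N_tr))

best_r, scores = select_r(Y_co, Y_tr, T0, candidates=(1, 2, 3))
print(best_r, scores)
```

On a panel like this, one would expect MSPE to fall sharply once the candidate $r$ reaches the true number of factors and to change little beyond it. Only the pretreatment rows of the treated units enter the score, so the post-treatment outcomes never influence the choice of $r$.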

Inference: Parametric Bootstrap

Unlike the permutation inference of canonical SC (see Synthetic Control Inference and Diagnostics), GSC produces frequentist uncertainty estimates via parametric bootstrap — suitable when $N_{t r}$ is moderate and asymptotic theory applies.

Algorithm 2: Parametric Bootstrap for ATT Variance (Xu 2017, pp. 64–65)

Motivation: The target is $Var (\widehat{ATT}_{t} \mid D, X, Λ, F)$, the variance conditional on treatment assignment, covariates, factor loadings, and factors. Only $ε_{i t}$ is random.

Step 1 (Collect prediction errors): In each of $B_{1}$ rounds, treat one control unit $j$ as if it were treated (“fake treated”), apply GSC using the rest of the control group, and record the prediction errors $ε_{j}^{P} = Y_{j} - \hat{Y}_{j} (0)$ . Collect $e^{P} = \{ε_{(1)}^{P}, \dots, ε_{(B_{1})}^{P}\}$ .

Step 2 (Apply GSC to original data): Obtain $\widehat{ATT}_{t}$ , estimated coefficients $\hat{β}$ , $\hat{F}$ , $\hat{Λ}_{co}$ , fitted values $\hat{Y}_{co} (0)$ , and in-sample residuals $\hat{e}$ .

Step 3 (Bootstrap loop, $B_{2}$ times): Construct bootstrapped sample $S^{(k)}$ by:
$\tilde{Y}_{i}^{(k)} (0) = X_{i} \hat{β} + \hat{F} \hat{λ}_{i} + \tilde{ε}_{i}, \quad i \in C$
$\tilde{Y}_{j}^{(k)} (0) = X_{j} \hat{β} + \hat{F} \hat{λ}_{j} + ε_{j}^{P, (k)}, \quad j \in T$
Apply GSC to $S^{(k)}$ ; obtain bootstrapped $\widehat{ATT}_{t}^{(k)}$ .

Step 4: Compute variance:
$Var (\widehat{ATT}_{t} \mid D, X, Λ, F) = \frac{1}{B_{2}} \sum_{k = 1}^{B_{2}} \left(\widehat{ATT}_{t}^{(k)} - \frac{1}{B_{2}} \sum_{j = 1}^{B_{2}} \widehat{ATT}_{t}^{(j)}\right)^{2}$
Use percentile method for confidence intervals (Efron and Tibshirani 1993).
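
Step 4 is simple enough to spell out. The sketch below takes a list of bootstrapped ATT values as given (the draws are made-up numbers, and the function names are hypothetical). It computes the variance with the $1 / B_{2}$ divisor from the formula above, plus a percentile interval using linear interpolation between order statistics, which is one common convention among several.

```python
import statistics

def bootstrap_variance(att_draws):
    """Variance of the bootstrap draws with divisor B_2 (population variance)."""
    return statistics.pvariance(att_draws)

def quantile(sorted_values, q):
    """Linearly interpolated quantile (0 <= q <= 1) of already sorted values."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def percentile_ci(att_draws, level=0.95):
    """Percentile-method interval: cut equal tail mass from both ends."""
    ordered = sorted(att_draws)
    alpha = (1.0 - level) / 2.0
    return quantile(ordered, alpha), quantile(ordered, 1.0 - alpha)

# Made-up bootstrap draws, kept tiny so the arithmetic is easy to follow.
draws = [1.0, 2.0, 3.0, 4.0, 5.0]
variance = bootstrap_variance(draws)
ci_low, ci_high = percentile_ci(draws)
print(variance, ci_low, ci_high)
```

In practice $B_{2}$ would be in the hundreds or thousands, so the interpolation rule has little effect on the reported interval.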

Key design decision: Treated and control units draw residuals from different empirical distributions. Control residuals $\tilde{ε}_{i}$ come from the in-sample IFE model residuals $\hat{e}$ (the model is fit to the controls, so these tend to be small). Treated residuals $ε_{j}^{P}$ come from the out-of-sample prediction errors $e^{P}$ collected in Step 1, where a held-out control stands in for a treated unit; because that unit's outcomes are predicted rather than fitted, these errors are typically larger.

Monte Carlo Performance

Simulation DGP (Xu 2017, Eq. 3):

$Y_{i t} = δ_{i t} D_{i t} + x_{i t, 1} \cdot 1 + x_{i t, 2} \cdot 3 + λ_{i}^{'} f_{t} + α_{i} + ξ_{t} + 5 + ε_{i t}$

with $r = 2$ factors; treated factor loadings shifted from control range by parameter $w \in [0, 1]$ (at $w = 0$ : common support; at $w \to 1$ : no overlap).

Table 1: Finite Sample Properties (Xu 2017, Table 1)

Setup: 5 treated units, $T_{0} \in \{15, 30, 50\}$ , $N_{co} \in \{40, 80, 120, 200\}$ , 5,000 simulations, $B = 1{,}000$ bootstraps, estimand = $ATT_{T_{0} + 5}$ (true value = 5).

Key results:

GSC has limited bias even when $T_{0} = 15$ and $N_{co} = 40$ ; bias → 0 as both grow

Coverage of 95% CI ≈ 0.947 across all configurations — valid bootstrap inference

Monte Carlo (Online Appendix): GSC has less bias than DID/twoway-FE estimators in the presence of time-varying confounders; less bias than the IFE estimator when treatment effects are heterogeneous across units; more efficient than the original synthetic matching method

Empirical Application: Election Day Registration and Voter Turnout

Example: EDR Laws and Voter Turnout (Xu 2017, Section 5)

Setup: 47 US states, 1920–2012 presidential elections. Nine states adopted Election Day Registration (EDR) laws (treated); 38 states never adopted them (controls). Outcome: voter turnout as % of voting-age population.

DID result (Table 2, cols 1–2): The fixed effects estimates of the EDR effect are 0.87 and 0.78 percentage points across the two specifications (SE ≈ 3 percentage points). But Figure 2(a) shows that DID’s parallel trends assumption fails: average predicted turnout diverges substantially from actual turnout in pretreatment periods.

GSC result (Table 2, cols 3–4): Cross-validation selects $r = 2$ factors. The estimated ATT is 5.13 and 4.90 percentage points across the two specifications (SE ≈ 2.27 percentage points). Figure 2(b) shows near-perfect pretreatment fit.

Factor interpretation (Figure 3):

Factor 1: captures the sharp post-1965 turnout increase in southern states (Voting Rights Act removing Jim Crow laws)

Factor 2: captures a general downward trend in turnout; southern states have the largest loadings on Factor 1 (Figure 3b)

Treated unit loadings mostly lie within the convex hull of control loadings → reliable interpolation

Heterogeneous effects by adoption wave (Table 3):

1st wave (ME, MN, WI): ATT ≈ 7.27 pp (SE 3.33) — large, significant

2nd wave (ID, NH, WY): ATT ≈ 2.17 pp (SE 2.82) — modest, insignificant

3rd wave (MT, IA, CT): ATT ≈ −1.14 pp (SE 3.00) — null/negative

Interpretation: Early adopters enacted EDR to genuinely increase turnout; later adopters introduced it to opt out of the NVRA, and by then registration costs had already fallen nationally.
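
As a rough check on the “significant” and “insignificant” labels, the ratio of each wave's ATT to its standard error can be compared with 1.96. This normal approximation is an assumption made here for illustration; the paper's own intervals come from the percentile bootstrap, so the comparison is only indicative.

```python
# ATT and SE by adoption wave, copied from the Table 3 figures quoted above.
waves = {
    "1st wave (ME, MN, WI)": (7.27, 3.33),
    "2nd wave (ID, NH, WY)": (2.17, 2.82),
    "3rd wave (MT, IA, CT)": (-1.14, 3.00),
}

# Ratio of estimate to standard error for each wave.
ratios = {name: att / se for name, (att, se) in waves.items()}

for name, z in ratios.items():
    verdict = "exceeds" if abs(z) > 1.96 else "falls below"
    print(f"{name}: ATT/SE = {z:.2f} ({verdict} 1.96 in absolute value)")
```

Only the first wave clears the conventional threshold, which matches the qualitative labels in the list above.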

GSC vs. Related Methods

Method	Treated units	Parallel trends	Uncertainty	Factor selection
Two-way FE (DID)	Multiple	Required	OLS SE	N/A (implicit $r = 0$ )
IFE estimator (Bai 2009)	Multiple	Not required	Asymptotic	Manual or IC
Canonical SC (ADH 2010)	1	Not required	Permutation	N/A (weights, not factors)
GSC (Xu 2017)	Multiple	Not required	Bootstrap	Cross-validation
Penalized SC (AL 2019)	Multiple	Not required	Permutation	Penalty ( $λ$ )
Matrix completion	Multiple	Not required	Bootstrap	Low-rank constraint

Diagnostics Recommended by Xu

Plot raw data, fitted values, and imputed counterfactuals for each treated unit
Plot factor loadings of both treated and control units (as in Figure 3b) — check that treated units lie within the convex hull of control units; extrapolation is a source of bias
Compare with DID and IFE estimates as robustness check
Use the gsynth R package (Xu 2016, Harvard Dataverse) for implementation
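
The convex-hull diagnostic can be automated when $r = 2$, as in the EDR application. The sketch below assumes the estimated loadings are available as two-dimensional points and that the control loadings are not all collinear. The example coordinates and unit labels are invented, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(point, hull):
    """True if point lies inside or on a counter-clockwise hull (3+ vertices)."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))

# Invented loadings: controls span a square, one treated unit sits outside it.
control_loadings = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0), (0.2, 0.1)]
treated_loadings = {"A": (0.5, 0.5), "B": (1.5, 0.0)}

hull = convex_hull(control_loadings)
flags = {unit: inside_hull(lam, hull) for unit, lam in treated_loadings.items()}
print(flags)
```

A treated unit flagged as outside the hull relies on extrapolation beyond the control loadings, so its imputed counterfactual deserves extra scrutiny. For $r > 2$ the same question becomes a linear-programming feasibility check rather than a planar test.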

Implementation Note

The R package gsynth implements GSC with:

gsynth(Y ~ D + X, data, index = c("unit","time"), force = "two-way", CV = TRUE, r = c(0, 5), se = TRUE, inference = "parametric", nboots = 1000)

Connections

Synthetic Control Bias Theory: The linear factor model (Abadie et al.) and IFE model (Bai 2009 / Xu) are the same structure; GSC is the estimation method that directly fits this model rather than using weighted matching
Differences-in-Differences: DID is the special case $λ_{i} = λ$ (constant loadings) of Assumption 1; GSC relaxes this restriction
Synthetic Control: SC is the special case $N_{t r} = 1$ ; GSC extends to $N_{t r} > 1$ with parametric inference
Synthetic Control Extensions: GSC is the primary method recommended for multiple treated units; contrasts with the penalized SC of Abadie and L’Hour (2019)
Bayesian Difference in Differences: Bayesian analog for aggregate time-series; Xu cites Pang (2014) Bayesian multilevel factor models as a complementary approach
