Standard Errors and Clustering

Summary

Getting the standard errors right is crucial for valid inference. Key issues include heteroskedasticity, clustering, serial correlation in panels, and finite-sample bias of robust standard errors.

Robust Standard Errors

Heteroskedasticity-consistent (Eicker-White) standard errors:

  • Valid under minimal assumptions
  • Should be the default in applied work
  • If robust and conventional SEs differ by more than ~30%, investigate why
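
The comparison in the last bullet can be sketched for the simplest case, a regression of y on a single regressor plus an intercept. This is a minimal illustration, not a reference implementation: the function names and the toy data are hypothetical, and the robust formula shown is the basic HC0 variant with no small-sample adjustment.

```python
import math

def slope_standard_errors(x, y):
    """Slope of y = a + b*x + e with its conventional and HC0 robust SEs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    resid = [(yi - y_bar) - slope * d for d, yi in zip(dx, y)]

    # Conventional SE: assumes one common error variance, estimated with n - 2 df.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)

    # Eicker-White (HC0): each squared residual is weighted by its own (x - x_bar)^2.
    se_robust = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, se_conventional, se_robust


def relative_gap(se_conventional, se_robust):
    """Absolute difference between the two SEs as a share of the conventional SE."""
    return abs(se_robust - se_conventional) / se_conventional


# Hypothetical toy data: residuals are larger near the middle of the x range.
x = [-2, -1, 0, 1, 2]
y = [-2, -3, 3, 1, 6]
slope, se_c, se_r = slope_standard_errors(x, y)
needs_a_look = relative_gap(se_c, se_r) > 0.30  # the ~30% rule of thumb
```

In this toy sample the robust SE is smaller than the conventional one, a reminder that the gap can go in either direction.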

The Clustering Problem

When errors are correlated within groups (states, schools, firms), ignoring clustering understates standard errors, often dramatically.

The Moulton Factor

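For a group-level regressor with n observations per group, the ratio of the correct sampling variance to the conventional one is:

  $$\frac{V(\hat{\beta})}{V_c(\hat{\beta})} = 1 + (n - 1)\rho$$

where \rho is the intraclass correlation of the errors. With n = 100 and \rho = 0.1, the ratio is 10.9, so conventional standard errors are understated by a factor of \sqrt{10.9} ≈ 3.3.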
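
The factor of ~3.3 can be reproduced with a few lines of arithmetic. The sketch below assumes the standard equal-group-size form of the Moulton factor, sqrt(1 + (n - 1) * rho), and uses 100 observations per group with an intraclass correlation of 0.1 as illustrative inputs consistent with that figure. The function name is hypothetical.

```python
import math

def moulton_factor(group_size, rho):
    """Ratio of the correct SE to the conventional SE for a group-level
    regressor, assuming equal group sizes: sqrt(1 + (n - 1) * rho)."""
    return math.sqrt(1.0 + (group_size - 1) * rho)


factor = moulton_factor(100, 0.1)  # variance ratio 10.9, SE ratio about 3.3
```

The factor equals 1 when rho is zero or when each group holds a single observation, and it grows with group size even when the intraclass correlation is small.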

The Moulton Problem in DD

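Differences-in-differences (DD) regressions with state-level treatment and individual-level data are especially vulnerable: the treatment varies only across states, so it is exactly the kind of group-level regressor the Moulton factor describes. Always cluster standard errors at the level of treatment assignment.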
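
To show what clustering at the level of treatment assignment changes mechanically, here is a minimal sketch for a single regressor plus an intercept. It sums the score (x - x_bar) * residual within each cluster before squaring, which is the basic cluster-robust sandwich without any small-sample adjustment. The function name and the four-state toy data are hypothetical.

```python
import math
from collections import defaultdict

def cluster_robust_se(x, y, cluster):
    """Slope of y = a + b*x + e and its cluster-robust SE (no df adjustment)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    resid = [(yi - y_bar) - slope * d for d, yi in zip(dx, y)]

    # Add up the scores within each cluster, then square the cluster totals.
    # Within-cluster correlation therefore shows up in the variance estimate.
    score = defaultdict(float)
    for d, e, g in zip(dx, resid, cluster):
        score[g] += d * e
    meat = sum(s * s for s in score.values())
    return slope, math.sqrt(meat) / sxx


# Hypothetical data: treatment x varies only by state, two people per state,
# and the two residuals within a state are identical.
state = ["a", "a", "b", "b", "c", "c", "d", "d"]
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2, 2, 0, 0, 6, 6, 2, 2]

slope, se_clustered = cluster_robust_se(x, y, state)
# Treating every observation as its own cluster reproduces the HC0 robust SE.
_, se_unclustered = cluster_robust_se(x, y, list(range(len(x))))
```

With perfectly correlated residuals and two observations per state, the clustered SE here exceeds the unclustered one by a factor of sqrt(2).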

Serial Correlation in Panels

In DD models with many time periods, serial correlation in the errors leaves standard errors too small even after clustering at the state-year level, because shocks persist within a state across years. Solutions:

  • Cluster at the state (group) level
  • Aggregate to the state-year level before estimation
  • Use parametric corrections for AR(1) errors
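
The aggregation remedy in the list above amounts to collapsing individual records into one observation per state-year cell before running the regression. A minimal sketch, assuming records arrive as (state, year, outcome) tuples; the function name and the sample records are hypothetical.

```python
from collections import defaultdict

def collapse_to_cells(records):
    """Collapse (state, year, outcome) records to {(state, year): (mean, count)}."""
    total = defaultdict(float)
    count = defaultdict(int)
    for state, year, outcome in records:
        total[(state, year)] += outcome
        count[(state, year)] += 1
    # Keep the cell count so it can serve as a weight in the cell-level regression.
    return {cell: (total[cell] / count[cell], count[cell]) for cell in total}


records = [
    ("CA", 2000, 1.0),
    ("CA", 2000, 3.0),
    ("CA", 2001, 5.0),
    ("TX", 2000, 2.0),
]
cells = collapse_to_cells(records)
```

The cell means then replace the individual outcomes as the dependent variable, with one row per state and year.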

Fewer than 42 Clusters

With few clusters, cluster-robust standard errors are biased downward. Remedies:

  • Wild cluster bootstrap
  • Effective degrees of freedom corrections
  • Aggregation to cluster means
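
As an illustration of the first remedy, the sketch below implements one common variant of the wild cluster bootstrap: the null of a zero slope is imposed, each cluster's restricted residuals are multiplied by a single random sign (Rademacher weights), and the p-value is the share of bootstrap t-statistics at least as large in absolute value as the observed one. It covers only a single regressor plus an intercept; the function names, the default of 999 replications, the seed, and the toy data are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict

def cluster_t_stat(x, y, cluster):
    """t-statistic for the slope of y = a + b*x + e using a cluster-robust SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    resid = [(yi - y_bar) - slope * d for d, yi in zip(dx, y)]
    score = defaultdict(float)
    for d, e, g in zip(dx, resid, cluster):
        score[g] += d * e
    se = math.sqrt(sum(s * s for s in score.values())) / sxx
    if se == 0.0:
        return math.inf if slope != 0.0 else 0.0
    return slope / se


def wild_cluster_bootstrap_p(x, y, cluster, reps=999, seed=0):
    """Two-sided bootstrap p-value for H0: slope = 0 (null imposed)."""
    rng = random.Random(seed)
    t_obs = abs(cluster_t_stat(x, y, cluster))
    # Under H0 the restricted model is y = a + u, so u = y - mean(y).
    y_bar = sum(y) / len(y)
    restricted_resid = [yi - y_bar for yi in y]
    groups = list(dict.fromkeys(cluster))  # unique clusters, original order
    exceed = 0
    for _ in range(reps):
        # One random sign per cluster keeps the within-cluster dependence intact.
        sign = {g: rng.choice((-1.0, 1.0)) for g in groups}
        y_star = [y_bar + sign[g] * u for g, u in zip(cluster, restricted_resid)]
        if abs(cluster_t_stat(x, y_star, cluster)) >= t_obs:
            exceed += 1
    return exceed / reps


# Hypothetical data: 8 clusters of 3 observations, 4 treated clusters,
# a true effect of 5, and small cluster-level and within-cluster noise.
cluster = [g for g in range(8) for _ in range(3)]
x = [1.0 if g < 4 else 0.0 for g in cluster]
cluster_effect = [0.3, -0.3, 0.2, -0.2, 0.3, -0.3, 0.2, -0.2]
within_noise = [0.1, -0.2, 0.1]
y = [5.0 * xi + cluster_effect[g] + within_noise[i % 3]
     for i, (xi, g) in enumerate(zip(x, cluster))]

p_value = wild_cluster_bootstrap_p(x, y, cluster)
```

With G clusters there are only 2^G distinct sign patterns, so with very few clusters the attainable p-values are coarse; that limitation belongs to this weight scheme, not to the code.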

Finite-Sample Bias of Robust SEs

Robust standard errors can themselves be biased in small samples: they tend to be too small when some observations have high leverage. The size of the bias depends on the leverages (the diagonal elements of the projection, or hat, matrix) and is worse in unbalanced designs.
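
To make the role of leverage concrete, the sketch below computes the leverages for a regression on a single regressor plus an intercept, together with two widely used leverage-adjusted variants of the robust SE, usually labelled HC2 and HC3. These variants are not named in the text above and are offered as one possible illustration. The function name and the toy data are hypothetical.

```python
import math

def leverage_adjusted_ses(x, y):
    """Leverages plus HC0, HC2 and HC3 slope SEs for y = a + b*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    resid = [(yi - y_bar) - slope * d for d, yi in zip(dx, y)]

    # Leverage of observation i in a regression with an intercept and one x.
    # The leverages sum to 2, the number of estimated coefficients.
    leverage = [1.0 / n + d * d / sxx for d in dx]

    def se(power):
        # power 0: HC0; power 1: HC2, e^2 / (1 - h); power 2: HC3, e^2 / (1 - h)^2.
        # Assumes every leverage is strictly below 1.
        meat = sum((d * e) ** 2 / (1.0 - h) ** power
                   for d, e, h in zip(dx, resid, leverage))
        return math.sqrt(meat) / sxx

    return {"leverage": leverage, "HC0": se(0), "HC2": se(1), "HC3": se(2)}


# Hypothetical toy data: the end points carry the most leverage (0.6 each).
x = [-2, -1, 0, 1, 2]
y = [-2, -3, 3, 1, 6]
result = leverage_adjusted_ses(x, y)
```

Because each squared residual is divided by a number below one, the adjusted SEs are never smaller than HC0, and the inflation is largest for the high-leverage observations.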

See Also