Randomization Inference - Overview

Summary

Randomization inference treats the treatment assignment mechanism, not a sampling model on the outcomes, as the sole source of randomness for statistical inference. The potential outcomes are regarded as fixed constants attached to a finite population of units; only $Z$, the realized assignment, is random. This design-based view, originating with Fisher (1935) and Neyman (1923/1990), justifies the Fisher Randomization Test (FRT) and yields finite-sample exact $p$-values under a sharp null. Wu & Ding (2021) extend the FRT to weak (Neyman) nulls by pairing it with a studentized statistic that is simultaneously finite-sample exact under the sharp null and asymptotically valid under the weak null.

Overview

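In the Potential Outcomes Framework for a completely randomized experiment (CRE) with $N$ units and $J$ treatment levels, each unit $i$ has a vector of potential outcomes $(Y_i(1), \dots, Y_i(J))$, one per treatment level $j = 1, \dots, J$. The experimenter fixes group sizes $N_1, \dots, N_J$ summing to $N$ and assigns treatments so that every realization $\mathbf{z} = (z_1, \dots, z_N)$ with $\sum_{i=1}^N 1\{z_i = j\} = N_j$ for all $j$ has equal probability $\mathrm{pr}(Z = \mathbf{z}) = N_1! \cdots N_J! / N!$. Unit $i$'s observed outcome is $Y_i = Y_i(Z_i) = \sum_{j=1}^J 1\{Z_i = j\}\, Y_i(j)$.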
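The assignment mechanism can be simulated by shuffling a fixed multiset of treatment labels. A uniform shuffle makes every arrangement with the prescribed group sizes equally likely. The sketch below assumes levels are coded $1, \dots, J$ and uses a hypothetical science table `potential`, where `potential[i][j-1]` holds $Y_i(j)$. The function names are illustrative and do not come from any library.

```python
import random


def draw_cre_assignment(group_sizes, rng):
    """Draw Z for a CRE with N_j = group_sizes[j - 1] units on level j.

    Shuffling a fixed multiset of labels uniformly at random makes every
    arrangement with these group sizes equally likely (prob N_1!...N_J!/N!).
    """
    labels = [j for j, n_j in enumerate(group_sizes, start=1) for _ in range(n_j)]
    rng.shuffle(labels)
    return labels


def observed_outcomes(potential, z):
    """Reveal Y_i = Y_i(Z_i); potential[i][j - 1] holds Y_i(j)."""
    return [potential[i][z_i - 1] for i, z_i in enumerate(z)]


# Hypothetical science table: N = 5 units, J = 2 levels (fixed constants).
potential = [[1.0, 3.0], [2.0, 2.5], [0.5, 4.0], [1.5, 1.0], [3.0, 3.5]]
rng = random.Random(0)
z = draw_cre_assignment([2, 3], rng)
y = observed_outcomes(potential, z)
print(z, y)
```

Only `z` is random here. The table `potential` never changes, which mirrors the design-based view.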

Randomization inference asks: given that the assignment was randomized, how surprising is the observed test statistic $T = T(Z, Y)$, with $Y = (Y_1, \dots, Y_N)$, relative to the distribution it would have had over all the assignments that could equally have occurred? Because the potential outcomes are held fixed, the reference distribution is generated entirely by permuting (re-assigning) $Z$. This is the randomization distribution.

Two foundational null hypotheses anchor the field (see Sharp vs Weak Null Hypotheses):

  • Fisher's sharp null $H_{0F}: Y_i(1) = \cdots = Y_i(J)$ for every unit $i = 1, \dots, N$, i.e., no effect for anyone. Sharpness means the null plus the observed data recover all missing potential outcomes, so any statistic has a known, exact null distribution.
  • Neyman's weak null $H_{0N}: \bar{Y}(1) = \cdots = \bar{Y}(J)$, where $\bar{Y}(j) = N^{-1}\sum_{i=1}^N Y_i(j)$ (zero average effect). It leaves room for treatment-effect heterogeneity and does not pin down the missing potential outcomes.
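The asymmetry between the two nulls can be made concrete. Under Fisher's sharp null, the missing potential outcomes are imputed mechanically. Under Neyman's weak null, different science tables can agree with the same observed data and still satisfy the null. The sketch below assumes two levels and hypothetical numbers, and the helper names are illustrative.

```python
def impute_under_sharp_null(y_obs, n_levels):
    """Full potential-outcome table implied by Fisher's sharp null.

    The sharp null says Y_i(1) = ... = Y_i(J) for every unit, so each
    missing potential outcome equals the observed one.
    """
    return [[y_i] * n_levels for y_i in y_obs]


def level_means(table):
    """Finite-population means Ybar(j) of a potential-outcome table."""
    n = len(table)
    return [sum(row[j] for row in table) / n for j in range(len(table[0]))]


# Hypothetical observed data: units 1-2 got level 1, units 3-4 got level 2.
z_obs = [1, 1, 2, 2]
y_obs = [1, 2, 3, 4]
sharp_table = impute_under_sharp_null(y_obs, n_levels=2)

# A second table that agrees with every observed entry (row i, column z_i)
# and also satisfies the weak null, yet has heterogeneous unit-level effects.
weak_table = [[1, 5], [2, -2], [6, 3], [1, 4]]

print(level_means(sharp_table), level_means(weak_table))  # both [2.5, 2.5]
```

Both tables reproduce the observed data and have equal level means. Only the first is implied by the sharp null, so the weak null alone cannot fix a reference distribution.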

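The central tension Wu & Ding (2021) resolve is that the FRT is purpose-built for sharp nulls, yet practitioners want to test weak nulls. Consider a two-arm experiment with treatment ($Z_i = 1$) and control ($Z_i = 0$), and let $\hat{\bar{Y}}(j)$ denote the sample mean of the observed outcomes in arm $j$. Naively running an FRT with a non-studentized statistic, such as the difference in means $\hat\tau = \hat{\bar{Y}}(1) - \hat{\bar{Y}}(0)$, can fail to control type I error for the weak null under variance heterogeneity (heteroscedastic potential outcomes), especially when group sizes are unequal. The fix is studentization.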
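For two arms, a common studentized statistic divides the difference in means by Neyman's standard-error estimate $\sqrt{\hat{S}_{11}/N_1 + \hat{S}_{00}/N_0}$, where $\hat{S}_{jj}$ is the sample variance of the observed outcomes in arm $j$. The sketch below assumes treatment is coded $1$ and control $0$. It illustrates the idea and is not claimed to be Wu & Ding's own code.

```python
from statistics import mean, variance


def split_by_arm(y, z):
    """Observed outcomes for treated (z == 1) and control (z == 0) units."""
    treated = [y_i for y_i, z_i in zip(y, z) if z_i == 1]
    control = [y_i for y_i, z_i in zip(y, z) if z_i == 0]
    return treated, control


def diff_in_means(y, z):
    treated, control = split_by_arm(y, z)
    return mean(treated) - mean(control)


def studentized_diff(y, z):
    """tau_hat divided by Neyman's (conservative) standard-error estimate.

    statistics.variance uses the N_j - 1 divisor (sample variance), so
    each arm needs at least two units.
    """
    treated, control = split_by_arm(y, z)
    tau_hat = mean(treated) - mean(control)
    se_hat = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5
    return tau_hat / se_hat


# Hypothetical data: units 1-2 treated, units 3-4 control.
y = [3.0, 5.0, 1.0, 2.0]
z = [1, 1, 0, 0]
print(diff_in_means(y, z), studentized_diff(y, z))
```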

Main Content

Design-based (randomization) inference ^def-design-based

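Potential outcomes are fixed; the only randomness is the treatment assignment $Z = (Z_1, \dots, Z_N)$, drawn from a known distribution determined by the experimental design. Means $\bar{Y}(j) = N^{-1}\sum_{i=1}^N Y_i(j)$ and covariances $S_{jk} = (N-1)^{-1}\sum_{i=1}^N \{Y_i(j) - \bar{Y}(j)\}\{Y_i(k) - \bar{Y}(k)\}$ are fixed finite-population parameters, not expectations over a superpopulation. Inference targets these finite-population quantities.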
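Because only the assignment is random, an expectation in this framework is an average over the equally likely assignments, with the potential outcomes held fixed. The brute-force sketch below uses hypothetical numbers and two arms coded $1$/$0$. It checks that the difference in means averages exactly to the finite-population effect $\bar{Y}(1) - \bar{Y}(0)$, where $\bar{Y}(j)$ is the mean of $Y_i(j)$ over all $N$ units.

```python
from itertools import combinations
from statistics import mean


def design_mean_of_diff_in_means(y1, y0, n_treated):
    """Average tau_hat over every CRE assignment with n_treated treated units.

    y1[i] = Y_i(1) and y0[i] = Y_i(0) are fixed constants; the enumeration
    varies only which units are treated, mirroring the design-based view.
    """
    n = len(y1)
    estimates = []
    for treated in combinations(range(n), n_treated):
        t = set(treated)
        est = mean(y1[i] for i in t) - mean(y0[i] for i in range(n) if i not in t)
        estimates.append(est)
    return mean(estimates)


# Hypothetical finite population with heterogeneous unit-level effects.
y1 = [3.0, 5.0, 4.0, 6.0, 2.0]
y0 = [1.0, 2.0, 2.0, 3.0, 0.5]
ate = mean(y1) - mean(y0)  # finite-population parameter
print(ate, design_mean_of_diff_in_means(y1, y0, n_treated=2))
```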

Randomization distribution ^def-rand-dist

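Conditioning on the observed data $(Z, Y)$, fill in all potential outcomes using the sharp null (under $H_{0F}$, $Y_i(j) = Y_i$ for every $j$). Then recompute the statistic $T(\pi Z, Y)$ for every permutation $\pi$ of the unit labels, i.e., for every re-assignment $\pi Z$. The collection $\{T(\pi Z, Y) : \pi \in \Pi\}$, where $\Pi$ is the set of all $N!$ permutations, is the randomization distribution. It serves as the reference null distribution for the $p$-value $p_{\mathrm{FRT}} = |\Pi|^{-1}\sum_{\pi \in \Pi} 1\{T(\pi Z, Y) \ge T(Z, Y)\}$.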
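The definition translates directly into code. The sketch below assumes a two-arm design coded $1$/$0$ and treats large values of the statistic as evidence against the null. All names are hypothetical. It enumerates every distinct re-assignment when that is feasible. Otherwise it falls back to Monte Carlo draws and applies the usual $(1 + \text{hits})/(1 + \text{draws})$ correction so the approximate $p$-value stays valid.

```python
import random
from itertools import permutations
from statistics import mean


def abs_diff_in_means(y, z):
    treated = [y_i for y_i, z_i in zip(y, z) if z_i == 1]
    control = [y_i for y_i, z_i in zip(y, z) if z_i == 0]
    return abs(mean(treated) - mean(control))


def frt_pvalue(y, z, stat, n_draws=None, seed=0, tol=1e-12):
    """Fisher randomization test p-value under the sharp null.

    Under the sharp null the observed y already contains every potential
    outcome, so the statistic for a re-assignment z' is just stat(y, z').
    `tol` treats floating-point near-ties as ties ("at least as extreme").
    """
    t_obs = stat(y, z)
    if n_draws is None:
        # Exact: every distinct rearrangement of z is equally likely in a CRE.
        space = set(permutations(z))
        hits = sum(stat(y, list(zz)) >= t_obs - tol for zz in space)
        return hits / len(space)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        zz = list(z)
        rng.shuffle(zz)
        hits += int(stat(y, zz) >= t_obs - tol)
    # Counting the observed assignment keeps the Monte Carlo p-value valid.
    return (1 + hits) / (1 + n_draws)


# Hypothetical data: 8 units, 4 treated (70 distinct assignments).
y = [4.1, 5.3, 3.8, 6.0, 2.9, 3.5, 4.0, 2.2]
z = [1, 1, 1, 1, 0, 0, 0, 0]
print(frt_pvalue(y, z, abs_diff_in_means))
print(frt_pvalue(y, z, abs_diff_in_means, n_draws=4000))
```

Enumerating `set(permutations(z))` is only practical for small $N$. Beyond that, the Monte Carlo branch is the usual choice.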

Conservative type I error control ^def-conservative

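The FRT with statistic $T$ conservatively controls type I error at level $\alpha$ if $\mathrm{pr}(p_{\mathrm{FRT}} \le \alpha) \le \alpha$ under the null for every $\alpha \in (0, 1)$, or, asymptotically, if $\limsup_{N \to \infty} \mathrm{pr}(p_{\mathrm{FRT}} \le \alpha) \le \alpha$. Wu & Ding's studentized test is asymptotically conservative (not exact) for the weak null, but exact for the sharp null in finite samples.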
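Finite-sample exactness under the sharp null can be checked by brute force on a toy example. Hold the outcomes fixed, let each equally likely assignment play the role of the observed one, compute its exact FRT $p$-value, and count how often $p \le \alpha$. The sketch below uses hypothetical data, two arms coded $1$/$0$, and illustrative helper names. It should never report a rejection rate above $\alpha$.

```python
from itertools import permutations
from statistics import mean


def abs_diff_in_means(y, z):
    treated = [y_i for y_i, z_i in zip(y, z) if z_i == 1]
    control = [y_i for y_i, z_i in zip(y, z) if z_i == 0]
    return abs(mean(treated) - mean(control))


def exact_frt_pvalue(y, z, stat, tol=1e-12):
    """Share of equally likely re-assignments with statistic >= the observed one."""
    space = set(permutations(z))
    t_obs = stat(y, z)
    return sum(stat(y, zz) >= t_obs - tol for zz in space) / len(space)


def rejection_rate_under_sharp_null(y, z, stat, alpha):
    """pr(p_FRT <= alpha) when the sharp null holds, computed exactly.

    Under the sharp null the outcome vector y is the same whatever assignment
    is realized, so the probability is an average over the assignment space.
    """
    space = set(permutations(z))
    return sum(exact_frt_pvalue(y, zz, stat) <= alpha for zz in space) / len(space)


# Hypothetical data: 6 units, 3 treated, so 20 equally likely assignments.
y = [0.3, 1.7, 2.2, 4.1, 5.0, 0.9]
z = [1, 1, 1, 0, 0, 0]
for alpha in (0.05, 0.1, 0.2, 0.5):
    print(alpha, rejection_rate_under_sharp_null(y, z, abs_diff_in_means, alpha))
```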

Why design-based asymptotics are needed ^thm-fp-asymptotics

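The exact sampling distribution of a statistic under the weak null depends on the unknown potential-outcome covariances $S_{jk}$, $j \neq k$. These have no unbiased estimator, because potential outcomes $Y_i(j)$ and $Y_i(k)$ are never jointly observed. Wu & Ding therefore adopt finite-population asymptotics (Li & Ding 2017). Assumption 1 supplies the regularity conditions: $N_j / N \to e_j \in (0, 1)$, the $S_{jk}$ have finite limits, and the maximum squared deviation is negligible, $\max_{i,j}\{Y_i(j) - \bar{Y}(j)\}^2 / N \to 0$. Under these conditions, $\sqrt{N}(\hat{\bar{Y}} - \bar{Y}) \to \mathcal{N}(0, V)$ in distribution, where $\hat{\bar{Y}} = (\hat{\bar{Y}}(1), \dots, \hat{\bar{Y}}(J))^\top$, $\bar{Y} = (\bar{Y}(1), \dots, \bar{Y}(J))^\top$, and $V = \mathrm{diag}\{S_{jj}/e_j\} - S$ with $S = (S_{jk})$. This covariance is conservatively estimable by $\hat{V} = \mathrm{diag}\{N\hat{S}_{jj}/N_j\}$, where $\hat{S}_{jj}$ is the sample variance of the observed outcomes in arm $j$. The probability limit of $\hat{V}$ is $\mathrm{diag}\{S_{jj}/e_j\}$, which dominates $V$ in the positive semidefinite order because $S \succeq 0$.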
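For two arms coded $1$/$0$, the covariance result specializes to Neyman's variance formula $\mathrm{Var}(\hat\tau) = S_{11}/N_1 + S_{00}/N_0 - S_\tau/N$. Here $S_{jj}$ is the finite-population variance of $Y_i(j)$ and $S_\tau$ is that of the unit-level effects $Y_i(1) - Y_i(0)$. Because $S_\tau$ involves pairs that are never jointly observed, it is dropped, which leaves the conservative target $S_{11}/N_1 + S_{00}/N_0$. The brute-force sketch below uses hypothetical numbers and illustrative names to check both facts on a toy population.

```python
from itertools import combinations
from statistics import mean


def fp_cov(a, b):
    """Finite-population covariance with the N - 1 divisor."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def exact_design_variance(y1, y0, n1):
    """Var(tau_hat) over all equally likely CRE assignments (brute force)."""
    n = len(y1)
    est = []
    for treated in combinations(range(n), n1):
        t = set(treated)
        est.append(mean(y1[i] for i in t) - mean(y0[i] for i in range(n) if i not in t))
    m = mean(est)
    return mean((e - m) ** 2 for e in est)  # average over the assignment space


def neyman_variance(y1, y0, n1):
    """S_11/N_1 + S_00/N_0 - S_tau/N."""
    n, n0 = len(y1), len(y1) - n1
    tau_i = [a - b for a, b in zip(y1, y0)]
    return fp_cov(y1, y1) / n1 + fp_cov(y0, y0) / n0 - fp_cov(tau_i, tau_i) / n


def conservative_target(y1, y0, n1):
    """Drops the unidentifiable -S_tau/N term, so it can only overstate."""
    n0 = len(y1) - n1
    return fp_cov(y1, y1) / n1 + fp_cov(y0, y0) / n0


# Hypothetical finite population with heterogeneous unit-level effects.
y1 = [3.0, 5.0, 4.0, 6.0, 2.0, 7.5]
y0 = [1.0, 2.0, 2.0, 3.0, 0.5, 1.0]
for n1 in (2, 3, 4):
    print(n1, exact_design_variance(y1, y0, n1), neyman_variance(y1, y0, n1),
          conservative_target(y1, y0, n1))
```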

The remaining notes develop: the FRT mechanics and the sharp-null exactness result; the sharp-vs-weak distinction (Fisher vs Neyman); the [[Studentized Randomization Tests|studentized statistic that achieves dual validity]]; and the broader permutation principle and its relation to the bootstrap.

Examples

A 60-second mental model. Suppose $N = 4$ units, two treated ($Z_i = 1$) and two control ($Z_i = 0$), with observed outcomes $Y_1, Y_2$ (treated) and $Y_3, Y_4$ (control). The difference in means is $\hat\tau = (Y_1 + Y_2)/2 - (Y_3 + Y_4)/2$. Under the sharp null $Y_i(1) = Y_i(0)$ for all $i$, every observed value would be the same regardless of assignment, so we simply re-deal which two of the four numbers are “treated.” There are $\binom{4}{2} = 6$ equally likely assignments. Computing $\hat\tau$ (or $|\hat\tau|$) for each gives the randomization distribution, and the exact $p$-value is the fraction of these values that are at least as large as the observed one. This is the kernel of every randomization test. Subsequent notes replace $\hat\tau$ with the studentized $t = \hat\tau / \widehat{\mathrm{se}}(\hat\tau)$ so the same procedure also handles the weak null under heteroscedasticity.
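The mental model can be run end to end. The sketch below uses hypothetical outcome values, with the first two units treated. It lists all six re-deals, computes both $|\hat\tau|$ and a studentized $|t|$ for each (using Neyman's standard error $\sqrt{\hat{S}_{11}/2 + \hat{S}_{00}/2}$), and reports the two exact $p$-values. The helper names are illustrative.

```python
from itertools import combinations
from statistics import mean, variance


def tau_hat(treated, control):
    return mean(treated) - mean(control)


def t_stat(treated, control):
    """Difference in means over Neyman's standard-error estimate."""
    se = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5
    return tau_hat(treated, control) / se


# Hypothetical observed outcomes; units 0 and 1 (0-based) were treated.
y = [3.0, 5.0, 1.0, 2.0]
observed_treated = (0, 1)


def split(treated_idx):
    treated = [y[i] for i in treated_idx]
    control = [y[i] for i in range(len(y)) if i not in treated_idx]
    return treated, control


# Under the sharp null y never changes; only the labels are re-dealt.
deals = list(combinations(range(4), 2))  # C(4, 2) = 6 assignments
for stat in (tau_hat, t_stat):
    dist = [abs(stat(*split(d))) for d in deals]
    t_obs = abs(stat(*split(observed_treated)))
    p = sum(v >= t_obs - 1e-12 for v in dist) / len(dist)
    print(stat.__name__, [round(v, 3) for v in dist], "p =", round(p, 3))
```

With these particular numbers, both statistics rank the observed assignment and its mirror image highest, giving $p = 2/6$ either way. In general the two statistics order the assignments differently.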

Connections

See Also