Sharp vs Weak Null Hypotheses
Summary
The sharp (Fisher) null asserts that the treatment changes no individual’s potential outcomes ($Y_i(1) = \cdots = Y_i(J)$ for all units $i$) and, together with the observed data, pins down the entire Science Table. The weak (Neyman) null asserts only that certain averages of potential outcomes are equal (e.g. $\bar{Y}(1) = \bar{Y}(2)$, zero average treatment effect in a two-arm experiment) and leaves treatment-effect heterogeneity unrestricted, so it does not determine the missing potential outcomes. The sharp null implies the weak null, never the reverse. This gap is exactly why a Fisher Randomization Test built for the sharp null can be invalid for the weak null, and why studentization is needed.
Overview
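Both nulls live in the Potential Outcomes Framework with fixed potential outcomes $Y_i(j)$ for units $i = 1, \dots, N$ and treatment arms $j = 1, \dots, J$, finite-population means $\bar{Y}(j) = N^{-1} \sum_{i=1}^{N} Y_i(j)$, and covariances $S(j,k) = (N-1)^{-1} \sum_{i=1}^{N} \{Y_i(j) - \bar{Y}(j)\}\{Y_i(k) - \bar{Y}(k)\}$. They differ in how much they constrain: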
- A sharp / strong / Fisher null confines all individual potential outcomes. With the data it recovers the whole Science Table, so every statistic has a known exact null distribution (see Fisher Randomization Test and the Sharp Null).
- A weak / average / Neyman null confines only averages of the potential outcomes. It is, by Rubin’s (2005) definition, “any hypothesis that is not sharp.” It does not recover the missing potential outcomes, because many heterogeneous Science Tables share the same averages.
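The general weak hypothesis in this paper is written as a linear contrast of mean potential outcomes:

$$H_{0\mathrm{N}}: C\bar{Y} = x,$$

where $\bar{Y} = (\bar{Y}(1), \dots, \bar{Y}(J))^\top$, $C$ is a full-row-rank contrast matrix with $C 1_J = 0$, and $x$ is a fixed vector (usually $x = 0$). The treatment-control ATE null is the case $C = (1, -1)$ with $x = 0$; one-way ANOVA is another.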
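The two conditions on the contrast matrix can be checked numerically. The sketch below is illustrative only: the helper name `is_valid_contrast`, the particular ANOVA contrast (arm 1 against each other arm), and the arm means are assumptions, not taken from the paper.

```python
import numpy as np

def is_valid_contrast(C):
    """Check the two stated conditions: rows sum to zero and full row rank."""
    C = np.atleast_2d(np.asarray(C, dtype=float))
    rows_sum_to_zero = np.allclose(C @ np.ones(C.shape[1]), 0.0)
    full_row_rank = np.linalg.matrix_rank(C) == C.shape[0]
    return bool(rows_sum_to_zero and full_row_rank)

# Two-arm ATE null: mean of arm 1 minus mean of arm 2 equals zero.
C_ate = np.array([[1.0, -1.0]])

# One-way ANOVA null with J = 3 arms: arm 1 compared with each other arm.
C_anova = np.array([[1.0, -1.0, 0.0],
                    [1.0, 0.0, -1.0]])

# Hypothetical finite-population means in which all three arms agree.
ybar = np.array([2.0, 2.0, 2.0])
weak_null_holds = bool(np.allclose(C_anova @ ybar, np.zeros(2)))
```

Any other full-row-rank matrix whose rows sum to zero encodes a different weak null over the same arm means.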
Main Content
Sharp (Fisher) null ^def-sharp
$H_{0\mathrm{F}}: Y_i(1) = \cdots = Y_i(J)$ for all $i = 1, \dots, N$. Constrains every individual potential outcome; together with the data it determines the full Science Table. It leaves no room for treatment-effect heterogeneity: every unit-level effect is zero.
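To see how the sharp null of no effect recovers the Science Table, the following minimal sketch imputes every missing potential outcome from the observed one and enumerates all assignments to get an exact p-value. The data, the function names, and the choice of the absolute difference in means as the statistic are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def impute_science_table(y_obs, n_arms):
    """Under the sharp null of no effect, a unit's observed outcome is its
    outcome under every arm, so each column of the table equals y_obs."""
    y_obs = np.asarray(y_obs, dtype=float)
    return np.tile(y_obs[:, None], (1, n_arms))

def exact_frt_pvalue(y_obs, z):
    """Exact two-arm FRT p-value for the absolute difference in means,
    enumerating every assignment with the same number of treated units."""
    y_obs = np.asarray(y_obs, dtype=float)
    z = np.asarray(z)
    n, n1 = len(z), int(z.sum())

    def abs_diff_in_means(zz):
        return abs(y_obs[zz == 1].mean() - y_obs[zz == 0].mean())

    t_obs = abs_diff_in_means(z)
    null_stats = []
    for treated in combinations(range(n), n1):
        zz = np.zeros(n, dtype=int)
        zz[list(treated)] = 1
        null_stats.append(abs_diff_in_means(zz))
    # Small tolerance guards against floating-point ties.
    return float(np.mean(np.array(null_stats) >= t_obs - 1e-12))

# Hypothetical data: five units, three of them treated.
y_obs = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
z = np.array([0, 1, 0, 1, 1])
table = impute_science_table(y_obs, n_arms=2)
p_value = exact_frt_pvalue(y_obs, z)
```

Because the imputed table is fully known, the same enumeration works for any other statistic; only `abs_diff_in_means` would change.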
Weak (Neyman) null ^def-weak
$H_{0\mathrm{N}}: C\bar{Y} = x$. Constrains only averages/contrasts of means. The ATE special case is $\bar{Y}(1) - \bar{Y}(2) = 0$. It permits arbitrary unit-level treatment-effect heterogeneity and does not determine the missing potential outcomes.
Logical relationship: sharp ⟹ weak (strictly) ^thm-sharp-implies-weak
The sharp null implies the weak null (if every individual effect is zero, the average effect is zero), but not conversely: average effects can vanish while individual effects are large and offsetting. Hence the set of data-generating processes consistent with the weak null strictly contains those consistent with the sharp null. A test calibrated to the sharp null need not control type I error over the larger weak-null set.
Why the gap breaks the naive FRT — variance heterogeneity ^thm-frt-invalid
To apply the FRT to a weak null, one imputes the Science Table via a compatible artificial sharp null that imposes treatment-unit additivity (constant effects beyond the tested contrast) and implies the weak null being tested. The catch: under the true weak null the potential outcomes may have unequal variances across arms (heteroscedasticity). A non-studentized statistic such as the difference in means then has a randomization distribution that does not match its true sampling distribution, so the FRT with the difference in means (or the classical $F$ statistic) fails to control type I error, even asymptotically, unless variances are equal or the design is a balanced two-arm design ($J = 2$ with equal arm sizes $N_1 = N_2$). This is the modern face of the Neyman–Fisher controversy (Neyman 1935; Ding & Dasgupta 2018). For $J \ge 3$ even balanced designs do not rescue the non-studentized statistics.
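A small simulation can make the failure concrete. The sketch below is an illustration under assumed settings, not the paper's experiment: it fixes a hypothetical two-arm Science Table in which the weak null holds exactly (equal means) but the treated potential outcomes are far more variable, uses an unbalanced design (20 of 100 units treated), and compares Monte Carlo FRT rejection rates for the difference in means and for a studentized difference. All names and numeric settings are assumptions.

```python
import numpy as np

def two_statistics(y, zmat, n1):
    """Absolute difference in means and its studentized version,
    computed for each row (assignment) of the boolean matrix zmat."""
    n0 = y.size - n1
    zf = zmat.astype(float)
    s1, q1 = zf @ y, zf @ y**2               # treated sums, sums of squares
    s0, q0 = y.sum() - s1, (y**2).sum() - q1  # control sums, sums of squares
    m1, m0 = s1 / n1, s0 / n0
    v1 = (q1 - n1 * m1**2) / (n1 - 1)         # arm-specific sample variances
    v0 = (q0 - n0 * m0**2) / (n0 - 1)
    diff = np.abs(m1 - m0)
    return diff, diff / np.sqrt(v1 / n1 + v0 / n0)

def random_assignments(rng, n, n1, size):
    """Each row marks a uniformly random subset of n1 treated units."""
    return np.argsort(rng.random((size, n)), axis=1) < n1

def rejection_rates(y1, y0, n1, reps=400, perms=200, alpha=0.05, seed=0):
    """Share of random assignments in which each Monte Carlo FRT rejects."""
    rng = np.random.default_rng(seed)
    n = y1.size
    rejections = np.zeros(2)
    for z in random_assignments(rng, n, n1, reps):
        y = np.where(z, y1, y0)               # observed outcomes
        t_obs = two_statistics(y, z[None, :], n1)
        t_perm = two_statistics(y, random_assignments(rng, n, n1, perms), n1)
        for k in range(2):
            p = (1 + np.sum(t_perm[k] >= t_obs[k][0])) / (1 + perms)
            rejections[k] += p <= alpha
    return rejections / reps

# Hypothetical Science Table: equal means (weak null true), unequal variances.
rng = np.random.default_rng(1)
n, n1 = 100, 20
y0 = rng.normal(0.0, 1.0, n)
y0 -= y0.mean()
y1 = rng.normal(0.0, 5.0, n)
y1 -= y1.mean()

rate_diff_in_means, rate_studentized = rejection_rates(y1, y0, n1)
```

With the smaller arm carrying the larger variance, the permutation variance of the difference in means understates its true sampling variance, so one would expect `rate_diff_in_means` to land well above the nominal 0.05 while `rate_studentized` stays near or below it. Setting `n1 = 50` is a quick way to explore the balanced two-arm case.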
When non-studentized statistics happen to be valid ^def-special-valid
The naive statistics are proper for the weak null only in special cases: (i) homoscedastic potential outcomes, $S(1,1) = \cdots = S(J,J)$; (ii) $J = 2$ with a balanced design (Corollary 3); or (iii) binary outcomes under the ANOVA-type null $\bar{Y}(1) = \cdots = \bar{Y}(J)$, since for binary data the mean determines the variance, $S(j,j) = \frac{N}{N-1}\,\bar{Y}(j)\{1 - \bar{Y}(j)\}$, so equal means force equal variances. For general weak nulls of binary outcomes (those that do not force all arm means to be equal) the variances can still differ, so studentization is still recommended.
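The binary-outcome case rests on a simple identity: with the $N-1$ divisor, the finite-population variance of a 0/1 vector with mean $p$ is $\frac{N}{N-1} p(1-p)$. A quick numerical check, using two made-up binary columns with the same mean:

```python
import numpy as np

def finite_pop_variance(y):
    """Finite-population variance with the N - 1 divisor."""
    return float(np.var(np.asarray(y, dtype=float), ddof=1))

def binary_variance_from_mean(p, n):
    """Variance implied by the mean p of n binary outcomes."""
    return n / (n - 1) * p * (1 - p)

# Two hypothetical binary potential-outcome columns with the same mean (0.4)
# but different units responding.
y_a = np.array([1, 1, 0, 0, 0, 0, 1, 0, 1, 0])
y_b = np.array([0, 1, 0, 1, 0, 1, 0, 0, 1, 0])

var_a = finite_pop_variance(y_a)
var_b = finite_pop_variance(y_b)
implied = binary_variance_from_mean(y_a.mean(), y_a.size)
```

Both columns give the same variance because it depends on the data only through the mean, which is why equal means rule out heteroscedasticity for binary outcomes.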
Examples
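Heterogeneity that hides in the average. Consider a two-arm experiment in which every unit has a nonzero individual effect $\tau_i$ (treated minus control potential outcome), but the positive and negative effects offset exactly. The average effect is zero, so the weak null holds. But the sharp null is badly false (no individual has zero effect). Because the effects vary across units, the treated and control potential outcomes can have different variances. A randomization test that imputes assuming constant effects assigns the same variance to both arms, which can inflate false rejections for a non-studentized statistic when the arms are of unequal size. The studentized statistic (next note) divides by an arm-specific variance estimate and remains asymptotically valid.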
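A concrete instance with made-up numbers (four units; the values below are illustrative assumptions, not the figures from the original example) shows offsetting effects producing a zero average alongside very different arm variances:

```python
import numpy as np

# Hypothetical Science Table for four units.
y_control = np.array([1.0, 2.0, 3.0, 4.0])
tau = np.array([-3.0, -1.0, 1.0, 3.0])   # individual effects, all nonzero
y_treated = y_control + tau              # [-2, 1, 4, 7]

average_effect = float(tau.mean())                 # weak null: this is zero
sharp_null_holds = bool(np.all(tau == 0.0))        # sharp null: false here

# Finite-population variances (N - 1 divisor) of the two columns.
var_control = float(np.var(y_control, ddof=1))     # 5/3
var_treated = float(np.var(y_treated, ddof=1))     # 15
```

Here the treated column is nine times as variable as the control column even though the two means coincide at 2.5, which is exactly the situation a constant-effect imputation cannot represent.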
Connections
- Fisher Randomization Test and the Sharp Null — the test designed for the sharp null; imputation via a compatible artificial sharp null lets it reach the weak null.
- Studentized Randomization Tests — the studentized statistic that restores validity for the weak null.
- Randomization Inference - Overview — broader framing of design-based inference.
- Potential Outcomes Framework — defines , , and the Science Table.
See Also
- Power Analysis and Sample Size — sharp vs weak nulls have different effective alternatives and power.
- The Experimental Ideal — randomization balances groups on average, the natural target of a weak null.