Studentized Randomization Tests

Summary

The central result of Wu & Ding (2021): running the FRT with a studentized (Wald-type) statistic $X^{2} = N (C\hat{\bar{Y}} - x)^{T} (C\hat{D}C^{T})^{-1} (C\hat{\bar{Y}} - x)$, the estimated contrast scaled by a heteroscedasticity-robust covariance estimator, yields a test with dual validity. It is finite-sample exact under the sharp null $H_{0F}$ (a free property of any FRT) and asymptotically conservative (valid type I error control) under the weak null $H_{0N}(C, x)$. It is model-free and agnostic to treatment-effect heterogeneity. Non-studentized statistics ($|\hat{\tau}|$, the $F$ statistic, the Box-type statistic $B$) lack this property and can fail to control type I error under the weak null. Practical recommendation: always use $X^{2}$.

Overview

Recall (from Sharp vs Weak Null Hypotheses) the weak null $H_{0N}(C, x): C\bar{Y} = x$. The estimator $\hat{\bar{Y}} = (\hat{\bar{Y}}(1), \dots, \hat{\bar{Y}}(J))^{T}$ of the arm means satisfies $N^{1/2}(\hat{\bar{Y}} - \bar{Y}) \xrightarrow{d} \mathcal{N}(0_{J}, V)$ with $V = D - S \preceq D$, where $D = \operatorname{diag}\{S(1,1)/p_{1}, \dots, S(J,J)/p_{J}\}$. The true $V$ depends on the cross-arm covariances $S(j, k)$, $j \neq k$, which cannot be estimated because no unit is observed under two arms. The diagonal “Neyman” estimator, however,

$$\hat{D} = N \operatorname{diag}\{\hat{S}(1,1)/N_{1}, \dots, \hat{S}(J,J)/N_{J}\} \xrightarrow{p} D \succeq V$$

is conservative (it over-estimates the true variance). Studentizing by $C \hat{D} C^{T}$ is what makes the FRT robust to variance heterogeneity.
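The conservativeness claim can be checked exactly on a toy finite population. The sketch below assumes a hypothetical science table with invented numbers. Both potential outcomes are known for all $N = 6$ units, with $J = 2$ and $C = (1, -1)$. It enumerates all 20 complete randomizations and makes two comparisons: $N\operatorname{Var}(\hat{\tau})$ against $CVC^{T}$, and the average Neyman estimate $C\hat{D}C^{T}$ against $CDC^{T}$. The code uses plain Python and the standard library only; `fp_cov` is a hypothetical helper.

```python
from itertools import combinations
from statistics import mean, variance

# Hypothetical science table: BOTH potential outcomes for N = 6 units and J = 2 arms.
# In a real experiment only one of them is observed per unit.
Y1 = [3.0, 5.0, 2.0, 8.0, 4.0, 6.0]  # Y_i(1)
Y2 = [1.0, 4.0, 2.0, 3.0, 0.0, 2.0]  # Y_i(2)
N, N1 = len(Y1), 3
N2 = N - N1
p1, p2 = N1 / N, N2 / N


def fp_cov(a, b):
    """Finite-population covariance S(j, k) with divisor N - 1."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (w - mb) for u, w in zip(a, b)) / (len(a) - 1)


S11, S22, S12 = fp_cov(Y1, Y1), fp_cov(Y2, Y2), fp_cov(Y1, Y2)

# Project D = diag{S(1,1)/p1, S(2,2)/p2} and V = D - S onto C = (1, -1).
CDC = S11 / p1 + S22 / p2
CVC = CDC - (S11 + S22 - 2 * S12)  # C S C^T = S(1,1) + S(2,2) - 2 S(1,2) >= 0

# Enumerate every complete randomization that puts N1 units in arm 1.
tau_hats, neyman = [], []
for arm1 in combinations(range(N), N1):
    arm2 = [i for i in range(N) if i not in arm1]
    y1_obs = [Y1[i] for i in arm1]
    y2_obs = [Y2[i] for i in arm2]
    tau_hats.append(mean(y1_obs) - mean(y2_obs))
    # C D_hat C^T = N * (S_hat(1,1)/N1 + S_hat(2,2)/N2), arm-specific sample variances
    neyman.append(N * (variance(y1_obs) / N1 + variance(y2_obs) / N2))

tau_bar = mean(tau_hats)
exact = N * mean([(t - tau_bar) ** 2 for t in tau_hats])  # exact N * Var(tau_hat)

print(f"N*Var(tau_hat) = {exact:.4f}   C V C^T = {CVC:.4f}")
print(f"E[C D_hat C^T] = {mean(neyman):.4f}   C D C^T = {CDC:.4f}")
```

The arm sample variances are unbiased for $S(j,j)$ under complete randomization, so the average estimate lands on $CDC^{T}$. That value exceeds $CVC^{T}$ by $CSC^{T} = S(1,1) + S(2,2) - 2S(1,2) \geq 0$, the finite-population variance of the unit-level effects.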

Proposition 4 (the criterion). The FRT with statistic $T$ asymptotically controls type I error at any level for $H_{0N}(C, x)$ if, under the null, the asymptotic sampling distribution of $T$ is stochastically dominated by the asymptotic randomization distribution of $T_{\pi} \mid W$ (written $T \leq_{st} T_{\pi} \mid W$). A statistic with this property is called proper. The whole game is to find a proper $T$, and $X^{2}$ is one.

Main Content

The studentized statistic $X^{2}$

$$X^{2} = N (C\hat{\bar{Y}} - x)^{T} (C\hat{D}C^{T})^{-1} (C\hat{\bar{Y}} - x).$$
This is a Wald-type quadratic form that uses the conservative robust covariance estimator $C\hat{D}C^{T}$ for $N^{1/2}(C\hat{\bar{Y}} - x)$. In the treatment-control case ($C = (1, -1)$, $x = 0$) it reduces to the squared studentized ATE,
$$X^{2} = \frac{\hat{\tau}^{2}}{\hat{S}(1,1)/N_{1} + \hat{S}(2,2)/N_{2}} = t^{2},$$
i.e. the square of Neyman’s ATE estimate over its (heteroscedasticity-robust) standard error.
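The quadratic form can be sketched in plain Python as follows. Several details are assumptions for illustration: the helper names `solve` and `studentized_x2`, the toy outcomes, and the small Gauss–Jordan solver, which stands in for a numerical linear-algebra library. The usage lines check the two-arm reduction to $t^{2}$.

```python
from statistics import mean, variance


def solve(A, b):
    """Solve A z = b for a small dense A by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * w for u, w in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def studentized_x2(arms, C, x):
    """X^2 = N (C Ybar_hat - x)^T (C D_hat C^T)^{-1} (C Ybar_hat - x).

    arms: J lists of observed outcomes, one per arm.
    C: m x J contrast matrix as a list of rows; x: length-m null value.
    """
    N = sum(len(arm) for arm in arms)
    ybar = [mean(arm) for arm in arms]
    d_hat = [N * variance(arm) / len(arm) for arm in arms]  # diagonal of D_hat
    dev = [sum(c * y for c, y in zip(row, ybar)) - xk for row, xk in zip(C, x)]
    # (C D_hat C^T)[k][l] = sum_j C[k][j] * d_hat[j] * C[l][j]
    cdc = [[sum(ck * dj * cl for ck, dj, cl in zip(rk, d_hat, rl)) for rl in C] for rk in C]
    z = solve(cdc, dev)
    return N * sum(dk * zk for dk, zk in zip(dev, z))


# Usage on hypothetical two-arm data: with C = (1, -1) and x = 0, X^2 equals t^2.
treated = [4.1, 5.3, 6.0, 3.8, 5.5]
control = [3.0, 2.2, 4.9, 3.1]
x2 = studentized_x2([treated, control], C=[[1, -1]], x=[0])
se = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5
t = (mean(treated) - mean(control)) / se
print(x2, t**2)
```

With $x = 0$, $X^{2}$ is unchanged if $C$ is replaced by any matrix with the same row space. For example, $(1,-1,0), (0,1,-1)$ and $(1,-1,0), (1,0,-1)$ both encode equal means across three arms.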

Theorem 1 — $X^{2}$ is proper (dual validity)

Under Assumption 1, the sampling distribution satisfies, under $H_{0 N} (C, x)$ ,
$$X^{2} \xrightarrow{d} \sum_{j=1}^{m} a_{j}\xi_{j}^{2}, \qquad a_{j} \in [0, 1],$$
This limit is a weighted sum of independent $\chi_{1}^{2}$ variates (the $\xi_{j}$ are i.i.d. $\mathcal{N}(0,1)$) with weights at most 1. Under the stronger Assumption 2, with $\pi \sim \mathrm{Unif}(\Pi_{N})$, the randomization distribution satisfies
$$X_{\pi}^{2} \mid W \xrightarrow{d} \chi_{m}^{2} \quad \text{almost surely}.$$
Because $\chi_{m}^{2} = \sum_{j=1}^{m} \xi_{j}^{2}$ stochastically dominates $\sum_{j} a_{j}\xi_{j}^{2}$ (each $a_{j} \leq 1$, so $\sum_{j} a_{j}\xi_{j}^{2} \leq \sum_{j} \xi_{j}^{2}$ for every realization of the $\xi_{j}$), the FRT with $X^{2}$ asymptotically and conservatively controls type I error under the weak null. Combined with finite-sample exactness under the sharp null, this makes $X^{2}$ valid for both classes of null hypotheses.
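The domination step is a coupling argument: draw one set of $\xi_{j}$ and evaluate both sums on it. The sketch assumes hypothetical weights $a = (1, 0.6, 0.2)$, so $m = 3$, and uses the $\chi^{2}_{3}$ 0.95 quantile $7.8147$ as the critical value. It checks the pointwise inequality and compares Monte Carlo rejection rates.

```python
import random

rng = random.Random(0)
a = [1.0, 0.6, 0.2]        # hypothetical weights a_j in [0, 1]
m = len(a)
q95 = 7.814727903251178    # 0.95 quantile of chi^2_3
draws = 100_000

reject_weighted = reject_chi2 = 0
for _ in range(draws):
    xi = [rng.gauss(0.0, 1.0) for _ in range(m)]            # i.i.d. N(0, 1)
    chi2 = sum(v * v for v in xi)                            # one chi^2_m draw
    weighted = sum(aj * (v * v) for aj, v in zip(a, xi))     # sum_j a_j xi_j^2 on the same xi
    assert weighted <= chi2                                  # a_j <= 1: dominance draw by draw
    reject_weighted += weighted > q95
    reject_chi2 += chi2 > q95

rate_weighted, rate_chi2 = reject_weighted / draws, reject_chi2 / draws
print(f"P(sum a_j xi_j^2 > q) ~ {rate_weighted:.4f} <= P(chi2_3 > q) ~ {rate_chi2:.4f}")
```

Using the $\chi^{2}_{m}$ critical value, which is what the asymptotic randomization distribution supplies, therefore rejects no more often than nominal.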

Box-type statistic $B$ is NOT proper

The Box-type statistic $B = N\hat{\bar{Y}}^{T}M\hat{\bar{Y}}/\operatorname{tr}(M\hat{D})$, with $M = C^{T}(CC^{T})^{-1}C$, has an asymptotic sampling mean no larger than its asymptotic randomization mean (mean ratio $\leq 1$). That first-moment condition is necessary, but not sufficient, for the stochastic-dominance criterion of Proposition 4. Hence the FRT with $B$ cannot control type I error in general, even asymptotically. Exceptions: equal variances, or a one-dimensional hypothesis ($C$ a row vector, in which case $B = X^{2}$).
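Below is a small sketch of $B$ next to $X^{2}$, assuming hypothetical three-arm data with unequal spreads and the null $C\bar{Y} = 0$. The helpers (`inv`, `sandwich`, `quad`, `x2_and_box`) are illustrative and only handle $m \leq 2$. The code never forms $M$. Instead it uses $\hat{\bar{Y}}^{T}M\hat{\bar{Y}} = (C\hat{\bar{Y}})^{T}(CC^{T})^{-1}C\hat{\bar{Y}}$ and $\operatorname{tr}(M\hat{D}) = \operatorname{tr}\{(CC^{T})^{-1}C\hat{D}C^{T}\}$.

```python
from statistics import mean, variance


def inv(A):
    """Inverse of a 1x1 or 2x2 matrix (all this illustration needs)."""
    if len(A) == 1:
        return [[1.0 / A[0][0]]]
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def sandwich(C, w):
    """C diag(w) C^T for an m x J contrast matrix C given as a list of rows."""
    return [[sum(ck * wj * cl for ck, wj, cl in zip(rk, w, rl)) for rl in C] for rk in C]


def quad(v, A):
    """The quadratic form v^T A v."""
    return sum(v[i] * A[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))


def x2_and_box(arms, C):
    """Return (X^2, B) for H0: C Ybar = 0."""
    N = sum(len(arm) for arm in arms)
    ybar = [mean(arm) for arm in arms]
    d_hat = [N * variance(arm) / len(arm) for arm in arms]  # diagonal of D_hat
    cy = [sum(c * y for c, y in zip(row, ybar)) for row in C]
    cdc = sandwich(C, d_hat)
    x2 = N * quad(cy, inv(cdc))
    # Ybar^T M Ybar = cy^T (C C^T)^{-1} cy; tr(M D_hat) = tr((C C^T)^{-1} C D_hat C^T).
    cct_inv = inv(sandwich(C, [1.0] * len(ybar)))
    m = len(C)
    trace = sum(cct_inv[i][k] * cdc[k][i] for i in range(m) for k in range(m))
    box = N * quad(cy, cct_inv) / trace
    return x2, box


# Hypothetical three-arm data with clearly unequal spreads.
arms = [[0.1, -0.2, 0.0, 0.3, -0.1], [0.5, -0.4, 0.9, 0.2, 0.0], [2.0, -1.0, 3.5, 0.5, 1.5]]
x2_row, box_row = x2_and_box(arms, [[1, -1, 0]])          # row-vector C
x2_two, box_two = x2_and_box(arms, [[1, -1, 0], [0, 1, -1]])  # m = 2 contrasts
print(x2_row, box_row)
print(x2_two, box_two)
```

With the row vector $C = (1, -1, 0)$ the two statistics coincide. With two contrasts they differ, and only $X^{2}$ carries the guarantee of Theorem 1.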

OLS $F$ statistic is NOT proper; Huber–White repairs it

The classical regression $F$ statistic uses a pooled variance $\hat{\sigma}^{2}$, which presumes homoscedasticity. The potential-outcomes framework makes no such assumption: each arm may have its own variance $S(j,j)$. Under $H_{0N}(C, 0_{m})$, $mF \xrightarrow{d} \sum_{j} \lambda_{j}(\dots)\xi_{j}^{2}$ with weights that can exceed 1, so $F$ is improper and can fail to control type I error under heteroscedasticity. Replacing $\hat{\sigma}^{2}(X^{T}X)^{-1}$ (with $X$ the regression design matrix) by the Huber–White estimator $\hat{D}_{HW} = N \operatorname{diag}\{(N_{j} - 1)\hat{S}(j,j)/N_{j}^{2}\}$ gives $X_{HW}^{2} = N (C\hat{\bar{Y}})^{T} (C\hat{D}_{HW}C^{T})^{-1} C\hat{\bar{Y}}$. This is asymptotically equivalent to $X^{2}$ because $(N_{j} - 1)/N_{j} \to 1$. Regression-based inference, including covariate adjustment, must therefore pair the FRT with the robust (HW) covariance.
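The three denominators can be compared directly in the two-arm case ($m = 1$), where $F$ is the square of the pooled-variance $t$ statistic. The sketch uses hypothetical data chosen to be strongly heteroscedastic: a small noisy arm and a large quiet arm. `three_statistics` is a hypothetical helper, and `hw` is the diagonal Huber–White form $\hat{D}_{HW}$ above, projected on $C = (1, -1)$ and divided by $N$.

```python
from statistics import mean, variance


def three_statistics(y1, y2):
    """(X^2, X^2_HW, F) for H0: Ybar(1) = Ybar(2), i.e. C = (1, -1) and x = 0."""
    n1, n2 = len(y1), len(y2)
    tau2 = (mean(y1) - mean(y2)) ** 2
    s1, s2 = variance(y1), variance(y2)                       # arm-specific S_hat(j, j)
    neyman = s1 / n1 + s2 / n2                                # C D_hat C^T / N
    hw = (n1 - 1) * s1 / n1**2 + (n2 - 1) * s2 / n2**2        # C D_hat_HW C^T / N
    sigma2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)  # pooled OLS residual variance
    ols = sigma2 * (1 / n1 + 1 / n2)                          # homoscedastic OLS variance of tau_hat
    return tau2 / neyman, tau2 / hw, tau2 / ols               # with m = 1, m * F = F


# Hypothetical data: a small, noisy arm and a large, quiet arm.
y1 = [0.0, 4.0, -3.0, 6.0, 1.0]
y2 = [0.5, 0.6, 0.4, 0.5, 0.7, 0.3, 0.5, 0.6, 0.4, 0.5, 0.6, 0.4, 0.5, 0.5, 0.5]
x2, x2_hw, f = three_statistics(y1, y2)
print(f"X^2 = {x2:.3f}, X^2_HW = {x2_hw:.3f}, F = {f:.3f}")
```

On data like these the large quiet arm dominates the pooled variance, so $F$ comes out well above $X^{2}$. By contrast, $X^{2}_{HW}$ differs from $X^{2}$ only through the $(N_{j} - 1)/N_{j}$ factors. Those factors matter at $N_{1} = 5$ and fade as the arms grow.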

Two valid tests, one statistic ^def-two-tests

Theorem 1 yields two asymptotically conservative tests from $X^{2}$ : (a) the FRT — compare observed $X^{2}$ to its randomization distribution (also finite-sample exact for $H_{0 F}$ ); (b) the $χ_{m}^{2}$ approximation — reject if $X^{2}$ exceeds the $1 - α$ quantile of $χ_{m}^{2}$ (no Monte Carlo). The FRT has the extra finite-sample-exactness property; in simulations and applications it tends to be slightly more conservative than the $χ^{2}$ approximation.
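Both tests can be sketched for the two-arm case, where $X^{2} = t^{2}$ and $m = 1$. The sketch makes four assumptions:

- the outcomes are hypothetical;
- the FRT is Monte Carlo, using 5,000 random re-randomizations instead of full enumeration;
- the $p$-value follows the common $(1 + \#\{X^{2}_{\pi} \geq X^{2}\})/(1 + R)$ convention;
- the $\chi^{2}_{m}$ upper tail comes from a closed-form recursion of the regularized upper incomplete gamma function rather than a statistics library.

All function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def x2_two_arm(y1, y2):
    """X^2 for C = (1, -1), x = 0; equals Neyman's t^2."""
    tau = mean(y1) - mean(y2)
    return tau**2 / (variance(y1) / len(y1) + variance(y2) / len(y2))


def chi2_sf(x, m):
    """P(chi^2_m > x) for integer m >= 1, using sf_{k+2} = sf_k + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)."""
    if x <= 0:
        return 1.0
    k = 2 if m % 2 == 0 else 1
    sf = math.exp(-x / 2) if k == 2 else math.erfc(math.sqrt(x / 2))
    while k < m:
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf


def frt_pvalue(y1, y2, stat, draws=5_000, seed=0):
    """Monte Carlo FRT under the sharp null: reshuffle labels with arm sizes fixed."""
    rng = random.Random(seed)
    pooled, n1 = list(y1) + list(y2), len(y1)
    t_obs = stat(y1, y2)
    hits = 0
    for _ in range(draws):
        rng.shuffle(pooled)
        hits += stat(pooled[:n1], pooled[n1:]) >= t_obs
    return (1 + hits) / (1 + draws)  # counts the observed assignment once


treated = [4.1, 5.3, 6.0, 3.8, 5.5, 4.7, 6.2, 5.0]  # hypothetical outcomes
control = [3.0, 2.2, 4.9, 3.1, 3.6, 2.8, 4.0, 3.3]
x2 = x2_two_arm(treated, control)
p_frt = frt_pvalue(treated, control, x2_two_arm)   # test (a)
p_chi2 = chi2_sf(x2, m=1)                          # test (b)
print(f"X^2 = {x2:.2f}, FRT p = {p_frt:.4f}, chi^2_1 p = {p_chi2:.2g}")
```

The Monte Carlo FRT $p$-value can never fall below $1/(1 + R)$, whereas the $\chi^{2}$ tail can be arbitrarily small. That is one visible way the FRT ends up slightly more conservative on strong signals.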

Practical recommendation ^summary-recommendation

Use the FRT with the studentized statistic $X^{2}$ (equivalently $t^{2}$ for the ATE, or the Huber–White-robust $F$). It is model-free, finite-sample exact under the sharp null, asymptotically valid under the weak null, and robust to treatment-effect heterogeneity and unequal variances. It also extends to stratified, clustered, factorial, ANOVA, trend-test, and binary-outcome designs. Avoid the non-studentized $|\hat{\tau}|$, plain $F$, and Box-type $B$ for weak nulls, except in the narrow special cases (equal variances; $J = 2$ balanced; binary outcomes under the equal-means null).

Examples

Charness & Gneezy (2009), financial incentives for exercise (paper’s Sec. 7.1). $N = 120$ college students were split into $J = 3$ arms (no / small / large incentive), with $N_{1} = N_{2} = N_{3} = 40$; the outcome is the change in weekly gym visits. Sample means are $\approx (-0.029, 0.054, 0.640)$ and sample variances $(0.152, 0.386, 1.489)$, which is clearly heteroscedastic. Testing the weak null $\bar{Y}(1) = \bar{Y}(2) = \bar{Y}(3)$ at the 1% level, the FRT with $X^{2}$ and its $\chi^{2}$ approximation give congruent, significant results. The $F$-based test, by contrast, is overly conservative for these data: its $p$-values are inflated. Guided by the theory, one trusts the $X^{2}$ $p$-values over the $F$ ones. (A tiny jitter was added to the outcomes because many were exactly 0, which would otherwise produce degenerate permuted groups.)
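As a rough check on the $\chi^{2}$ approximation, the sketch below plugs the rounded summary statistics quoted above into $X^{2}$ for the equal-means null. The contrast $C = ((1,-1,0), (0,1,-1))$ is an assumed choice; any $C$ with the same row space gives the same $X^{2}$. The sketch cannot reproduce the FRT, which needs the (jittered) raw outcomes. Because of rounding, its output only approximates the paper's numbers.

```python
import math

# Rounded summary statistics quoted in the text (no / small / large incentive).
means = [-0.029, 0.054, 0.640]
variances = [0.152, 0.386, 1.489]
sizes = [40, 40, 40]

# Equal-means null written as C Ybar = 0; this particular C is an assumed choice.
C = [[1, -1, 0], [0, 1, -1]]
v = [s / n for s, n in zip(variances, sizes)]                # S_hat(j, j) / N_j
cy = [sum(c * y for c, y in zip(row, means)) for row in C]   # C Ybar_hat
# A = C diag(v) C^T = C D_hat C^T / N; the factor N cancels in X^2.
A = [[sum(ck * vj * cl for ck, vj, cl in zip(rk, v, rl)) for rl in C] for rk in C]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]
x2 = sum(cy[i] * A_inv[i][j] * cy[j] for i in range(2) for j in range(2))
p_chi2 = math.exp(-x2 / 2)  # P(chi^2_2 > x2) = exp(-x2 / 2)
print(f"X^2 ~ {x2:.2f}, chi^2_2 p-value ~ {p_chi2:.4f}")
```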

Sanity computation of $X^{2}$. Two arms, $\hat{\tau} = 2.5$, $\hat{S}(1,1)/N_{1} = 1.0$, $\hat{S}(2,2)/N_{2} = 0.56$. Then $X^{2} = 2.5^{2}/(1.0 + 0.56) = 6.25/1.56 \approx 4.01 = t^{2}$ (so $t \approx 2.0$). Compare $4.01$ to the randomization distribution of $X^{2}$, which is asymptotically $\chi_{1}^{2}$ with $0.95$ quantile $3.84$: a borderline rejection at 5%. Crucially, the denominator uses arm-specific variances, not a pooled one.
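Here is the same arithmetic in code, with the $\chi^{2}_{1}$ tail $P(\chi^{2}_{1} > X^{2}) = P(|Z| > t) = \operatorname{erfc}(\sqrt{X^{2}/2})$ added. Only the three inputs come from the text; the $p$-value is a derived quantity.

```python
import math

tau_hat = 2.5
se2 = 1.0 + 0.56                         # S_hat(1,1)/N1 + S_hat(2,2)/N2 (arm-specific, not pooled)
x2 = tau_hat**2 / se2                    # X^2 = t^2
t = math.sqrt(x2)
q95 = 3.841458820694124                  # 0.95 quantile of chi^2_1 (= 1.959964^2)
p_chi2 = math.erfc(math.sqrt(x2 / 2))    # P(chi^2_1 > x2) = P(|Z| > t)
print(round(x2, 2), round(t, 2), x2 > q95, p_chi2)
```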

Connections

Fisher Randomization Test and the Sharp Null — supplies the finite-sample-exact half of the dual validity and the imputation machinery.
Sharp vs Weak Null Hypotheses — explains the heteroscedasticity problem that studentization solves.
Randomization Inference - Overview — finite-population asymptotics and the conservative estimator $\hat{D}$ .
Permutation Tests and Exact Inference — studentization is the same device that fixes permutation tests under unequal variances (Behrens–Fisher).
