Partial Pooling as Multiple Comparisons Correction

Summary

In a hierarchical model, partial pooling shrinks group-level estimates toward the grand mean. This shrinkage reduces the z-score for any pairwise comparison by a factor of $1/\sqrt{1 + σ_{\bar{y}}^{2} / σ_{θ}^{2}}$, which is always less than 1. The effect is strongest when group-level variance is small relative to within-group variance, which is precisely when multiple comparisons are most concerning. This provides a natural, data-adaptive multiple comparisons correction without the power loss of classical methods.

Overview

Classical multiple comparisons corrections (Bonferroni, FDR) adjust the threshold for significance — they widen confidence intervals or lower the $p$ -value cutoff, keeping point estimates unchanged. Multilevel models take a fundamentally different approach: they adjust the estimates themselves through partial pooling, pulling them toward each other. This section formalizes why this works as a multiple comparisons correction.

The Multilevel Model

Consider a simple normal-normal hierarchical model for group effects:

$$\bar{y}_{j} \mid θ_{j} \sim N(θ_{j}, σ_{\bar{y}}^{2}), \quad j = 1, \dots, J$$

$$θ_{j} \sim N(μ, σ_{θ}^{2})$$

where $\bar{y}_{j}$ is the observed mean for group $j$, $σ_{\bar{y}}^{2}$ is the sampling variance of that mean (taken here to be the same for every group), and $σ_{θ}^{2}$ is the between-group variance of the true effects $θ_{j}$.

The Algebra of Shrinkage

Theorem: Posterior Mean and Variance Under Partial Pooling (Gelman et al., 2009, Sec. 3.2)

For the normal-normal hierarchical model, the posterior mean and standard deviation for group $j$ are:
$$\text{posterior } E(θ_{j}) = \left(\frac{1}{σ_{θ}^{2}} μ + \frac{1}{σ_{\bar{y}}^{2}} \bar{y}_{j}\right) \Big/ \left(\frac{1}{σ_{θ}^{2}} + \frac{1}{σ_{\bar{y}}^{2}}\right)$$

$$\text{posterior sd}(θ_{j}) = 1 \Big/ \sqrt{\frac{1}{σ_{θ}^{2}} + \frac{1}{σ_{\bar{y}}^{2}}}$$

Both expressions condition on $μ$, $σ_{\bar{y}}$, and $σ_{θ}$. The posterior mean is a precision-weighted average of the prior mean $μ$ and the data $\bar{y}_{j}$. The smaller $σ_{θ}^{2}$ (more similar groups), the more the estimate is pulled toward $μ$.
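The precision weighting can be checked numerically. The following is a minimal sketch, assuming $μ$, $σ_{θ}$, and $σ_{\bar{y}}$ are known (in practice they are estimated from the data); the function name `posterior_theta` and the example numbers are illustrative and do not come from the source.

```python
import math

def posterior_theta(ybar_j, mu, sigma_theta, sigma_ybar):
    """Posterior mean and sd of theta_j in the normal-normal model,
    assuming mu, sigma_theta and sigma_ybar are known."""
    prec_prior = 1.0 / sigma_theta**2   # precision of the group-level distribution
    prec_data = 1.0 / sigma_ybar**2     # precision of the observed group mean
    post_mean = (prec_prior * mu + prec_data * ybar_j) / (prec_prior + prec_data)
    post_sd = math.sqrt(1.0 / (prec_prior + prec_data))
    return post_mean, post_sd

# Illustrative numbers (assumed): observed mean 20, grand mean 0,
# between-group sd 5, sampling sd 10.
post_mean, post_sd = posterior_theta(20.0, 0.0, 5.0, 10.0)
print(post_mean, post_sd)
```

With these inputs the weight on the data is $σ_{θ}^{2} / (σ_{θ}^{2} + σ_{\bar{y}}^{2}) = 25/125 = 0.2$, so the estimate moves 80% of the way from the observed mean toward $μ$.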

Z-Score Shrinkage for Comparisons

The key result for multiple comparisons: what happens to the z-score when comparing two groups?

Theorem: Z-Score Shrinkage Factor (Gelman et al., 2009, Sec. 3.2)

For a comparison $θ_{j} - θ_{k}$ between two groups:
$$\text{posterior } E(θ_{j} - θ_{k}) = \frac{σ_{θ}^{2}}{σ_{\bar{y}}^{2} + σ_{θ}^{2}} (\bar{y}_{j} - \bar{y}_{k})$$

$$\text{posterior sd}(θ_{j} - θ_{k}) = \sqrt{2}\, σ_{\bar{y}} σ_{θ} \Big/ \sqrt{σ_{\bar{y}}^{2} + σ_{θ}^{2}}$$

The posterior z-score for the comparison is:

$$z_{\text{Bayes}} = \underbrace{\frac{\bar{y}_{j} - \bar{y}_{k}}{\sqrt{2}\, σ_{\bar{y}}}}_{\text{classical z-score}} \cdot \underbrace{\frac{1}{\sqrt{1 + σ_{\bar{y}}^{2} / σ_{θ}^{2}}}}_{\text{shrinkage factor}}$$
The shrinkage factor is always $< 1$ and approaches 0 as $σ_{θ}^{2} \to 0$ (groups are identical). It approaches 1 as $σ_{θ}^{2} \to \infty$ (groups are unrelated, no pooling).
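The factorization into a classical z-score and a shrinkage factor comes from dividing the posterior mean of the comparison by its posterior standard deviation. A short derivation, conditional on known $σ_{\bar{y}}$ and $σ_{θ}$ and using $\sqrt{2}\,σ_{\bar{y}}σ_{θ}/\sqrt{σ_{\bar{y}}^{2}+σ_{θ}^{2}}$ for the posterior standard deviation:

```latex
\begin{aligned}
z_{\text{Bayes}}
  &= \frac{\sigma_{\theta}^{2}}{\sigma_{\bar{y}}^{2} + \sigma_{\theta}^{2}}
     (\bar{y}_{j} - \bar{y}_{k})
     \cdot
     \frac{\sqrt{\sigma_{\bar{y}}^{2} + \sigma_{\theta}^{2}}}
          {\sqrt{2}\,\sigma_{\bar{y}}\,\sigma_{\theta}} \\
  &= \frac{\bar{y}_{j} - \bar{y}_{k}}{\sqrt{2}\,\sigma_{\bar{y}}}
     \cdot
     \frac{\sigma_{\theta}}{\sqrt{\sigma_{\bar{y}}^{2} + \sigma_{\theta}^{2}}} \\
  &= \frac{\bar{y}_{j} - \bar{y}_{k}}{\sqrt{2}\,\sigma_{\bar{y}}}
     \cdot
     \frac{1}{\sqrt{1 + \sigma_{\bar{y}}^{2} / \sigma_{\theta}^{2}}}
\end{aligned}
```

The last step divides the numerator and denominator of the second fraction by $σ_{θ}$.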

Definition: Variance Ratio

The variance ratio $σ_{θ}^{2} / σ_{\bar{y}}^{2}$ determines the degree of shrinkage:

Small (groups similar): strong shrinkage, large reduction in z-scores, effective multiple comparisons correction

Large (groups different): weak shrinkage, z-scores close to classical values, little correction needed

This is the key adaptive property: the model corrects most when correction is most needed.
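To see how quickly the correction fades as the variance ratio grows, the shrinkage factor $1/\sqrt{1 + σ_{\bar{y}}^{2}/σ_{θ}^{2}}$ can be tabulated. This is an illustrative sketch: the function names and the grid of ratios are assumptions, and the variance parameters are treated as known.

```python
import math

def shrinkage_factor(variance_ratio):
    """Factor multiplying the classical z-score, where
    variance_ratio = sigma_theta^2 / sigma_ybar^2."""
    return 1.0 / math.sqrt(1.0 + 1.0 / variance_ratio)

def z_scores(ybar_j, ybar_k, sigma_ybar, sigma_theta):
    """Classical and partially pooled z-scores for theta_j - theta_k."""
    z_classical = (ybar_j - ybar_k) / (math.sqrt(2.0) * sigma_ybar)
    z_bayes = z_classical * shrinkage_factor(sigma_theta**2 / sigma_ybar**2)
    return z_classical, z_bayes

# Hypothetical grid of variance ratios, from similar to very different groups.
for ratio in (0.1, 1.0, 10.0, 100.0):
    print(f"ratio={ratio:>6}: factor={shrinkage_factor(ratio):.3f}")
```

At a ratio of 0.1 the factor is about 0.30, so a classical z-score of 3 shrinks to below 1. At a ratio of 100 the factor is about 0.995 and the two z-scores nearly coincide.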

Why This Is Better Than Classical Corrections

Property	Classical (Bonferroni/FDR)	Multilevel (Partial Pooling)
Adjusts	Threshold / interval width	Point estimates + intervals
Point estimates	Unchanged	Improved (shrunk toward mean)
Interval width	Always wider	Can be narrower than classical
Adapts to data	No (fixed penalty for $m$ tests)	Yes (adapts to variance ratio)
Power	Reduced	Preserved or improved
Type S error	Not addressed	Reduced by shrinkage
Type M error	Not addressed	Reduced by shrinkage

Simulation Evidence

Example: Eight Schools — Small Effects (Gelman et al., 2009, Sec. 4.2)

Setup: Simulate 8 school treatment effects from $N(0, 5^{2})$ with standard errors from Rubin (1981). Perform 1000 replications, computing all $\binom{8}{2} = 28$ pairwise comparisons.

Classical results: 7% of comparisons are statistically significant. Of these, only 63% have the correct sign — a devastating Type S error rate of 37%.

Bayesian results: Only 0.5% of comparisons are significant, with 89% correct sign. The shrinkage has already performed the multiple comparisons correction.

Across replications: Classical analysis finds at least one significant comparison in 47% of simulations. Bayesian analysis: only 5%.

Example: Eight Schools — Large Effects (Gelman et al., 2009, Sec. 4.2)

Setup: Same as above but with $N(0, 10^{2})$, so that effects are now large relative to standard errors.

Classical results: 12% significant, 86% correct sign.

Bayesian results: 3% significant, 96% correct sign.

Interpretation: With larger effects, less shrinkage occurs (higher variance ratio), so the Bayesian model makes fewer corrections. But it still reduces false claims while maintaining higher accuracy on the claims it does make.
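The structure of these two simulations can be sketched as follows. This is a simplified version, not the source's code: it treats $μ = 0$ and the between-school standard deviation `tau` as known rather than fitting the full multilevel model, so its printed rates are not expected to match the percentages quoted above. The function name `simulate`, the seed, and the 1.96 threshold are assumptions. The standard errors are the eight schools values from Rubin (1981).

```python
import math
import random
from itertools import combinations

# Standard errors of the eight schools estimates (Rubin, 1981).
SIGMA = [15, 10, 16, 11, 9, 11, 10, 18]
Z_CRIT = 1.96  # two-sided 95% threshold (assumed)

def simulate(tau, n_reps=1000, seed=0):
    """Share of pairwise comparisons flagged as significant, and share of
    those with the correct sign, for classical and pooled estimates.
    Simplifying assumption: mu = 0 and tau are treated as known."""
    rng = random.Random(seed)
    counts = {"classical": [0, 0], "pooled": [0, 0]}  # [claims, correct sign]
    total = 0
    for _ in range(n_reps):
        theta = [rng.gauss(0.0, tau) for _ in SIGMA]
        y = [rng.gauss(t, s) for t, s in zip(theta, SIGMA)]
        # Shrinkage weight on the data for each school.
        w = [tau**2 / (tau**2 + s**2) for s in SIGMA]
        for j, k in combinations(range(len(SIGMA)), 2):
            total += 1
            true_diff = theta[j] - theta[k]
            estimates = {
                "classical": (y[j] - y[k],
                              math.sqrt(SIGMA[j]**2 + SIGMA[k]**2)),
                # Posterior variance of theta_j is w_j * sigma_j^2.
                "pooled": (w[j] * y[j] - w[k] * y[k],
                           math.sqrt(w[j] * SIGMA[j]**2 + w[k] * SIGMA[k]**2)),
            }
            for name, (diff, sd) in estimates.items():
                if abs(diff / sd) > Z_CRIT:
                    counts[name][0] += 1
                    counts[name][1] += (diff > 0) == (true_diff > 0)
    return {
        name: {
            "significant": claims / total,
            # None when no comparison was flagged at all.
            "correct_sign": correct / claims if claims else None,
        }
        for name, (claims, correct) in counts.items()
    }

for tau in (5, 10):
    print(tau, simulate(tau))
```

Because the hyperparameters are fixed at their true values, the pooled analysis here shrinks more confidently than a fitted model would. Replacing the known `tau` with an estimate from each simulated dataset would bring the sketch closer to the setup described in the source.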

Example: State Test Scores — High Variance Ratio (Gelman et al., 2009, Sec. 4.1)

Setup: NAEP 4th-grade math scores across 40+ states. The variance ratio $σ_{θ}^{2} / σ_{\bar{y}}^{2}$ is large because true state differences are substantial.

Result: The multilevel model produces more significant comparisons than the classical FDR-corrected analysis — more claims with confidence, fewer ambiguous cases. The hierarchical model adapts: when there’s clear evidence of real differences, it doesn’t over-correct.

The Intuition

Why does partial pooling work as a multiple comparisons correction?

Multiple comparisons are a problem because of uncertainty — if we knew the true effects, there would be nothing to correct
Classical inference uses only within-group information for each estimate — ignoring what other groups tell us
Multilevel models recognize that groups are measuring the same phenomenon — each group’s estimate is informed by all groups
Greater uncertainty → more shrinkage — precisely the groups most prone to Type S/M errors get the strongest correction
The correction is built into the model, not applied as a post hoc patch

Model Fitting

The multilevel model can be fit in R with lme4:

library(lme4)  # provides lmer(); y, treatment and group must exist in the workspace
ihdp.fit <- lmer(y ~ treatment + (1 + treatment | group))

Functions in the arm package can then sample from the posterior distribution for site-specific treatment effects. Full Bayesian fitting via Stan or BUGS allows for more flexible priors.

Connections

Greenland & Robins (1991) make a similar argument, framing multiple comparisons as an “opportunity to improve estimates through judicious use of prior information”
The James-Stein estimator (Efron & Morris, 1975) is a frequentist analogue — pooled estimates dominate unpooled ones in $\geq 3$ dimensions
Efron (2006) draws connections between empirical Bayes, hierarchical Bayes, and FDR
