Multiple Testing Corrections

Summary

When many statistical tests are performed simultaneously, the probability of at least one false positive rises rapidly. Multiple testing corrections adjust significance thresholds to control error rates. The two main frameworks are FWER (e.g. Bonferroni: bound the probability of any false positive) and FDR (e.g. Benjamini-Hochberg: bound the expected proportion of false discoveries among rejections).

The Problem

With $m$ independent tests, each at $α = 0.05$, and every null hypothesis true:

$$P(\text{at least one false positive}) = 1 - (1 - α)^{m}$$

Tests ($m$)	P(at least one FP)
1	5.0%
10	40.1%
100	99.4%
1000	~100%

This is why the Garden of Forking Paths is so dangerous: even when only one test is reported, the many analysis choices that could have been made act as implicit multiple tests and inflate false positives.
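The table values follow directly from the formula. A minimal sketch, assuming independent tests with every null hypothesis true (the function name `prob_any_false_positive` is illustrative, not from any library):

```python
def prob_any_false_positive(m: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across m independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m


# Reproduce the rows of the table for m = 1, 10, 100, 1000.
for m in (1, 10, 100, 1000):
    print(f"m = {m:>4}: {prob_any_false_positive(m):.1%}")
```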

Family-Wise Error Rate (FWER)

Bonferroni Correction

The simplest approach: reject $H_{i}$ only if $p_{i} \leq α / m$.

$$α_{\text{adjusted}} = \frac{α}{m}$$

Controls: probability that at least one false positive occurs (FWER $\leq α$)
Pro: simple, valid under any dependency structure
Con: low power; in large-scale studies it often yields no significant results

Warning

Bonferroni becomes extremely conservative as $m$ grows. With 10,000 tests, the threshold drops to $5 \times 10^{-6}$, potentially missing many real effects.
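A minimal sketch of the Bonferroni rule in plain Python. The function name `bonferroni_reject` and the example p-values are made up for illustration:

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per test: True if H_i is rejected at alpha / m."""
    m = len(p_values)
    threshold = alpha / m  # the adjusted per-test significance level
    return [p <= threshold for p in p_values]


# Hypothetical p-values from five tests; the threshold is 0.05 / 5 = 0.01.
p_values = [0.03, 0.001, 0.2, 0.014, 0.012]
print(bonferroni_reject(p_values))
```

With these illustrative inputs only the test with p = 0.001 falls below the adjusted threshold, even though four of the five p-values are below the unadjusted 0.05.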

Holm’s Step-Down Procedure

A less conservative FWER method:

Sort p-values: $p_{(1)} \leq p_{(2)} \leq \dots \leq p_{(m)}$
Starting at $i = 1$, reject $H_{(i)}$ if $p_{(i)} \leq α / (m - i + 1)$ and move on to $i + 1$
Stop at the first non-rejection and retain that hypothesis and all remaining ones

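Uniformly at least as powerful as Bonferroni: Holm rejects every hypothesis that Bonferroni rejects, and often more, while still controlling FWER under any dependency structure.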
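The step-down logic can be sketched as follows. This is an illustrative plain-Python version (the name `holm_reject` and the example p-values are assumptions), not a reference implementation:

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down: return one boolean per test, in the original order."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):  # rank 0 corresponds to i = 1
        if p_values[idx] <= alpha / (m - rank):  # alpha / (m - i + 1)
            reject[idx] = True
        else:
            break  # first non-rejection: retain this and all larger p-values
    return reject


# Hypothetical p-values from five tests.
p_values = [0.03, 0.001, 0.2, 0.014, 0.012]
print(holm_reject(p_values))
```

On these illustrative inputs Holm rejects the three smallest p-values (0.001, 0.012, 0.014), whereas a fixed threshold of 0.05 / 5 = 0.01 would reject only the first.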

False Discovery Rate (FDR)

Concept

Instead of preventing all false positives, FDR controls the expected proportion of false discoveries among rejected hypotheses:

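$$\mathrm{FDR} = E\left[\frac{V}{\max(R, 1)}\right]$$

Here $V$ is the number of false positives and $R$ is the total number of rejections; the ratio is taken to be 0 when nothing is rejected.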

Benjamini-Hochberg (BH) Procedure

Sort p-values: $p_{(1)} \leq p_{(2)} \leq \dots \leq p_{(m)}$
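Find the largest $k$ such that $p_{(k)} \leq \frac{k}{m} α$ (if no such $k$ exists, reject nothing)
Reject all $H_{(1)}, \dots, H_{(k)}$, including any whose own p-value exceeds the threshold for its rank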

Controls FDR at level $α$ under independence (or positive dependence)
Much more powerful than Bonferroni for large-scale testing
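A minimal sketch of the BH step-up procedure in plain Python. The function name `benjamini_hochberg_reject` and the example p-values are illustrative assumptions:

```python
def benjamini_hochberg_reject(p_values, alpha=0.05):
    """BH step-up: return one boolean per test, in the original order."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value is at or below (rank / m) * alpha
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k = rank  # keep scanning: a later rank may still qualify
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True  # reject everything up to and including rank k
    return reject


# Hypothetical p-values from five tests.
p_values = [0.03, 0.001, 0.2, 0.014, 0.012]
print(benjamini_hochberg_reject(p_values))
```

Unlike a step-down method, the loop does not stop at the first failure: it scans every rank and keeps the largest one that qualifies. On these illustrative inputs BH rejects four of the five hypotheses.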

Q-Values (Storey)

The q-value of a test is the minimum FDR at which that test would be called significant. It is the FDR analogue of the p-value, which plays the same role for the per-test false positive rate.
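As a rough illustration, the sketch below computes BH-adjusted p-values, which coincide with q-values under the conservative assumption that the proportion of true nulls is 1. Storey's method additionally estimates that proportion from the data, which this sketch deliberately omits. The function name `bh_adjusted_p_values` is illustrative:

```python
def bh_adjusted_p_values(p_values):
    """BH-adjusted p-values (q-values assuming all nulls could be true)."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from five tests.
p_values = [0.03, 0.001, 0.2, 0.014, 0.012]
for p, q in zip(p_values, bh_adjusted_p_values(p_values)):
    print(f"p = {p:<6} adjusted = {q:.4f}")
```

Rejecting every test whose adjusted value is at most $α$ gives the same decisions as running the BH procedure at level $α$, which is what makes these values convenient for ranking results.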

When to Use Each

Method	Best for	Error controlled
Bonferroni	Few tests, each individually important	FWER (any FP)
Holm	Few tests, want more power than Bonferroni	FWER
BH/FDR	Many tests, batch follow-up (genomics, imaging)	FDR (proportion of FP)
Q-values	Ranking results by reliability	FDR

Connection to Bayesian Approaches

Tip

Hierarchical Models provide a natural Bayesian alternative to multiple testing corrections. Partial pooling shrinks estimates toward the grand mean, automatically regularizing extreme results — achieving a similar effect to FDR control but derived from the model structure rather than an ad hoc correction. See Forking Paths and Bayesian Approaches.
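To make the shrinkage idea concrete, here is a deliberately simplified sketch. It is an empirical-Bayes approximation rather than a full hierarchical model: it assumes every estimate has the same known standard error and estimates the between-group variance by the method of moments. The function name `shrink_estimates` and the example numbers are hypothetical:

```python
from statistics import mean, variance


def shrink_estimates(estimates, std_error):
    """Partially pool estimates toward their grand mean (normal-normal model)."""
    grand_mean = mean(estimates)
    # Assumption: observed variance = between-group variance + std_error ** 2.
    tau2 = max(variance(estimates) - std_error ** 2, 0.0)
    # Fraction of each deviation that is kept; 0 means complete pooling.
    weight = tau2 / (tau2 + std_error ** 2)
    return [grand_mean + weight * (y - grand_mean) for y in estimates]


# Hypothetical effect estimates: three near zero and one extreme value.
print(shrink_estimates([0.0, 0.0, 0.0, 4.0], std_error=1.0))
```

The extreme estimate is pulled toward the grand mean, and the noisier the estimates are relative to their spread, the stronger the pull.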
