Multiple Testing Corrections

Summary

When many statistical tests are performed simultaneously, the probability of at least one false positive rises quickly with the number of tests. Multiple testing corrections adjust significance thresholds to control error rates. The two main frameworks are the family-wise error rate (FWER), which bounds the probability of any false positive (e.g., Bonferroni), and the false discovery rate (FDR), which bounds the expected proportion of false discoveries among the rejected hypotheses (e.g., Benjamini-Hochberg).

The Problem

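With $m$ independent tests, each at $\alpha = 0.05$, and every null hypothesis true, the probability of at least one false positive is $1 - (1 - \alpha)^m$: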

| Tests ($m$) | P(at least one FP) |
|---|---|
| 1 | 5.0% |
| 10 | 40.1% |
| 100 | 99.4% |
| 1000 | ~100% |
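The figures in the table follow directly from $1 - (1 - \alpha)^m$. A minimal sketch, assuming independent tests with every null hypothesis true and $\alpha = 0.05$ (the function name is illustrative, not from the source):

```python
def prob_at_least_one_fp(m: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across m independent tests,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


# Recompute the table rows for m = 1, 10, 100, 1000.
for m in (1, 10, 100, 1000):
    print(f"{m:>5} tests: {prob_at_least_one_fp(m):.1%}")
```

The independence assumption matters: under positive correlation between tests the inflation is slower, but it does not disappear.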

This is why the Garden of Forking Paths is so dangerous — even without explicit testing, the implicit multiplicity inflates false positives.

Family-Wise Error Rate (FWER)

Bonferroni Correction

The simplest approach: reject hypothesis $H_i$ only if $p_i \le \alpha / m$, where $m$ is the number of tests.

  • Controls: probability that any false positive occurs
  • Pro: simple, conservative, valid under any dependency structure
  • Con: very conservative — often yields no significant results in large-scale studies

Warning

Bonferroni becomes extremely conservative as $m$ grows. With 10,000 tests at $\alpha = 0.05$, the per-test threshold drops to $5 \times 10^{-6}$, potentially missing many real effects.
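The Bonferroni rule is a single comparison per test. A minimal sketch in plain Python; the function name and the example p-values are hypothetical, chosen only for illustration:

```python
def bonferroni(p_values, alpha=0.05):
    """Return one reject/retain flag per p-value using the Bonferroni threshold alpha / m."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values; with m = 4 the threshold is 0.05 / 4 = 0.0125.
p_values = [0.02, 0.001, 0.3, 0.011]
print(bonferroni(p_values))
```

Here only 0.001 and 0.011 fall below the threshold, so 0.02 is retained even though it would pass an uncorrected test at 0.05.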

Holm’s Step-Down Procedure

A less conservative FWER method:

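  1. Sort p-values: $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$
  2. Starting from $i = 1$, reject $H_{(i)}$ if $p_{(i)} \le \frac{\alpha}{m - i + 1}$
  3. Stop at the first non-rejection and retain all remaining hypotheses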

Holm rejects every hypothesis that Bonferroni rejects and possibly more, so it is uniformly at least as powerful while still controlling FWER.
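The step-down procedure can be sketched as follows. This is an illustrative plain-Python version, not a reference implementation; the function name and example p-values are hypothetical:

```python
def holm(p_values, alpha=0.05):
    """Holm step-down: compare the i-th smallest p-value (i = 1..m) with
    alpha / (m - i + 1) and stop at the first one that fails."""
    m = len(p_values)
    # Indices of the p-values in ascending order, so flags map back to the input order.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):  # rank is 0-based, so the divisor is m - rank
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # first non-rejection: retain this and all larger p-values
    return reject


# Hypothetical p-values (m = 4); thresholds are 0.0125, 0.0167, 0.025, 0.05.
p_values = [0.02, 0.001, 0.3, 0.011]
print(holm(p_values))
```

On these values Holm also rejects the hypothesis with p = 0.02, because by the third step the threshold has relaxed to 0.025, whereas the Bonferroni threshold of 0.0125 would retain it.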

False Discovery Rate (FDR)

Concept

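Instead of preventing all false positives, FDR controls the expected proportion of false discoveries among rejected hypotheses:

$$\mathrm{FDR} = \mathbb{E}\left[\frac{V}{\max(R, 1)}\right]$$

where $V$ is the number of false rejections and $R$ is the total number of rejections.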

Benjamini-Hochberg (BH) Procedure

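  1. Sort p-values: $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$
  2. Find the largest $k$ such that $p_{(k)} \le \frac{k}{m} \alpha$
  3. Reject all $H_{(i)}$ with $i \le k$
  • Controls FDR at level $\alpha$ under independence (or positive dependence)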
  • Much more powerful than Bonferroni for large-scale testing
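The BH steps above can be sketched in plain Python. This is an illustrative version under the stated independence (or positive dependence) assumption; the function name and example p-values are hypothetical:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """BH step-up: find the largest rank k with p_(k) <= (k / m) * alpha
    and reject the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k = rank  # keep scanning: a later rank may still satisfy the bound
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical p-values (m = 5); rank thresholds are 0.01, 0.02, 0.03, 0.04, 0.05.
p_values = [0.035, 0.001, 0.6, 0.028, 0.025]
print(benjamini_hochberg(p_values))
```

Note the step-up behaviour: 0.025 exceeds its own rank threshold of 0.02, yet it is still rejected because a larger p-value (0.035 at rank 4) meets its threshold. Bonferroni on the same data would reject only 0.001.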

Q-Values (Storey)

The q-value of a test is the minimum FDR at which that test would be called significant. It is the FDR analogue of the p-value, which plays the same role for the per-test false positive rate.
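As a rough illustration, the sketch below computes BH-adjusted p-values, which coincide with q-values only under the conservative assumption that the proportion of true nulls is fixed at 1. Storey's method additionally estimates that proportion from the data, which this sketch does not attempt. The function name and example values are hypothetical:

```python
def bh_adjusted_p_values(p_values):
    """BH-adjusted p-values: for the i-th smallest p-value, the minimum over
    ranks j >= i of p_(j) * m / j. Assumes the proportion of true nulls is 1."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so the running minimum enforces monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values; thresholding the adjusted values at 0.05
# gives the same rejections as the BH procedure at alpha = 0.05.
p_values = [0.035, 0.001, 0.6, 0.028, 0.025]
q = bh_adjusted_p_values(p_values)
print([round(v, 5) for v in q])
print([v <= 0.05 for v in q])
```

Reporting adjusted values rather than reject/retain flags lets readers apply whatever FDR level suits their follow-up budget.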

When to Use Each

| Method | Best for | Error controlled |
|---|---|---|
| Bonferroni | Few tests, each individually important | FWER (any FP) |
| Holm | Few tests, want more power than Bonferroni | FWER |
| BH/FDR | Many tests, batch follow-up (genomics, imaging) | FDR (proportion of FP) |
| Q-values | Ranking results by reliability | FDR |

Connection to Bayesian Approaches

Tip

Hierarchical Models provide a natural Bayesian alternative to multiple testing corrections. Partial pooling shrinks estimates toward the grand mean, automatically regularizing extreme results — achieving a similar effect to FDR control but derived from the model structure rather than an ad hoc correction. See Forking Paths and Bayesian Approaches.

See Also