Power Analysis and Sample Size
Summary
Power analysis determines the minimum sample size needed to detect a meaningful effect. Under-powered studies risk missing real effects (Type II error); over-powered studies waste resources. The required sample size depends on the significance level ($\alpha$), the desired power ($1-\beta$), and the expected effect size.
Core Concepts
| Term | Definition | Typical Value |
|---|---|---|
| Type I error ($\alpha$) | False positive: rejecting a true null | 0.05 or 0.01 |
| Type II error ($\beta$) | False negative: failing to reject a false null | 0.20 |
| Power ($1-\beta$) | Probability of detecting a real effect | 0.80 or 0.90 |
| Effect size | Magnitude of the difference you want to detect | From prior studies or pilot data |
Beyond Type I and Type II
In under-powered studies the more practically dangerous errors are Type S (sign) and Type M (magnitude) errors — getting the direction of an effect wrong, or dramatically over-estimating its size. These are not controlled by conventional power analysis and are exacerbated when sample sizes are small.
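A minimal simulation sketch of how Type S and Type M errors arise when power is low. It assumes the estimate is normally distributed around a true effect with a known standard error; the function name `design_errors`, its arguments, and the example values are illustrative assumptions rather than anything defined in this note.

```python
import random
from statistics import mean


def design_errors(true_effect, std_error, z_crit=1.96, n_sims=20000, seed=1):
    """Simulate replications and summarise the statistically significant ones.

    Returns (power, type_s_rate, exaggeration_ratio). All argument names and
    defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each replication yields an estimate drawn around the true effect.
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_sims)]
    # Keep only estimates that would be declared significant (two-tailed).
    significant = [e for e in estimates if abs(e) > z_crit * std_error]
    if not significant:
        raise ValueError("no significant replications; increase n_sims")
    power = len(significant) / n_sims
    # Type S: a significant estimate whose sign disagrees with the true effect.
    type_s = mean(1.0 if e * true_effect < 0 else 0.0 for e in significant)
    # Type M: average overstatement of the true magnitude among significant results.
    exaggeration = mean(abs(e) for e in significant) / abs(true_effect)
    return power, type_s, exaggeration


# Hypothetical design: true effect equal to one standard error.
power, type_s, exaggeration = design_errors(true_effect=1.0, std_error=1.0)
print(f"power={power:.2f}, Type S rate={type_s:.3f}, exaggeration={exaggeration:.1f}x")
```

Shrinking `std_error` relative to `true_effect` (a larger sample) should push the exaggeration ratio toward 1 and the Type S rate toward 0.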
Key Normal Deviates
| $\alpha$ (two-tailed) | $z_{\alpha/2}$ | Power | $z_{\beta}$ |
|---|---|---|---|
| 0.05 | 1.96 | 80% | 0.84 |
| 0.01 | 2.58 | 90% | 1.28 |
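The tabulated deviates can be reproduced from the standard normal quantile function. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names `z_two_tailed` and `z_power` are illustrative, not part of any library.

```python
from statistics import NormalDist


def z_two_tailed(alpha):
    """Critical value for a two-tailed test at significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_power(power):
    """Normal deviate corresponding to the desired power (1 - beta)."""
    return NormalDist().inv_cdf(power)


for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: z = {z_two_tailed(alpha):.2f}")
for power in (0.80, 0.90):
    print(f"power={power}: z = {z_power(power):.2f}")
```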
Sample Size Formulas
Comparing Two Means (t-test)
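$$n = \frac{2\sigma^2 \,(z_{\alpha/2} + z_{\beta})^2}{\Delta^2} \quad \text{per group}$$
where $\sigma$ is the pooled SD, $\Delta$ is the minimum detectable difference, and $z_{\alpha/2}$ and $z_{\beta}$ are the normal deviates for the chosen significance level and power.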
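As a worked example, the usual normal-approximation calculation for two means can be sketched as follows. The function name and the example inputs (SD of 10, difference of 5) are hypothetical, and the result is a per-group size for equal allocation; an exact t-based calculation would give a slightly larger number.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n from n = 2 * sigma^2 * (z_alpha/2 + z_beta)^2 / delta^2."""
    if sigma <= 0 or delta == 0:
        raise ValueError("sigma must be positive and delta non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # deviate for desired power
    n = 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)  # round up to whole subjects


# Hypothetical inputs: pooled SD of 10, minimum detectable difference of 5.
print(n_per_group_two_means(sigma=10, delta=5))
print(n_per_group_two_means(sigma=10, delta=5, power=0.90))
```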
Comparing Two Proportions
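$$n = \frac{2\,\bar{p}\,\bar{q}\,(z_{\alpha/2} + z_{\beta})^2}{\Delta^2} \quad \text{per group}$$
where $\bar{p}$ is the average of the two proportions, $\Delta$ is the difference between them, and $\bar{q} = 1 - \bar{p}$.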
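A minimal sketch of the two-proportion calculation, using the pooled-variance normal approximation with the average proportion. The function name and the example proportions (50% versus 60%) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n from n = 2 * p_bar * q_bar * (z_alpha/2 + z_beta)^2 / delta^2."""
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("p1 and p2 must be distinct proportions in (0, 1)")
    p_bar = (p1 + p2) / 2   # average proportion
    q_bar = 1 - p_bar
    delta = p1 - p2         # difference to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * p_bar * q_bar * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)


# Hypothetical inputs: detect a rise from 50% to 60%.
print(n_per_group_two_proportions(0.50, 0.60))
```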
Survey / Single Proportion
$$n = \frac{z_{\alpha/2}^2 \; p\,(1 - p)}{E^2}$$
where $p$ is the expected prevalence and $E$ is the margin of error.
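The single-proportion calculation can be sketched in the same way. The function name is illustrative, and the example uses the conservative assumption of 50% prevalence with a margin of error of 5 percentage points.

```python
import math
from statistics import NormalDist


def n_single_proportion(p, margin, alpha=0.05):
    """Survey n from n = z_alpha/2^2 * p * (1 - p) / margin^2."""
    if not 0 < p < 1 or margin <= 0:
        raise ValueError("p must be in (0, 1) and margin must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z_alpha ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical inputs: p = 0.5 maximises p * (1 - p), so it is the worst case.
print(n_single_proportion(p=0.5, margin=0.05))
```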
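Correlation
$$n = \left( \frac{z_{\alpha/2} + z_{\beta}}{\tfrac{1}{2} \ln \frac{1 + r}{1 - r}} \right)^2 + 3$$
where $r$ is the expected correlation; the denominator is Fisher's $z$ transformation of $r$.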
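For correlations, a common approach is the Fisher $z$ approximation, sketched below under the assumption of a two-tailed test of $r = 0$. The function name and the example correlation of 0.3 are illustrative.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Total n from n = ((z_alpha/2 + z_beta) / atanh(r))^2 + 3 (Fisher z)."""
    if not -1 < r < 1 or r == 0:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    fisher_z = math.atanh(r)  # equals 0.5 * ln((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


# Hypothetical input: expected correlation of 0.3.
print(n_for_correlation(0.3))
```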
Practical Adjustments
- Attrition: inflate the sample to $n_{\text{adj}} = n / (1 - d)$, where $d$ is the expected dropout rate
- One-tailed tests: ~20% fewer subjects
- Non-randomized designs: add ~20% more subjects
- Crossover designs: ~25% of parallel group requirement
- Categorical outcomes require larger samples than continuous for equivalent power
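Two of these adjustments are easy to check numerically. The sketch below inflates a sample for dropout by dividing by one minus the dropout rate, and compares one-tailed with two-tailed requirements at the same significance level and power. The function names, the starting sample of 63, and the 20% dropout rate are hypothetical.

```python
import math
from statistics import NormalDist


def adjust_for_attrition(n, dropout_rate):
    """Inflate n so that about n subjects remain after dropout: n / (1 - d)."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n / (1 - dropout_rate))


def one_tailed_ratio(alpha=0.05, power=0.80):
    """Ratio of one-tailed to two-tailed sample size at the same alpha and power."""
    nd = NormalDist()
    z_beta = nd.inv_cdf(power)
    one_tailed = (nd.inv_cdf(1 - alpha) + z_beta) ** 2
    two_tailed = (nd.inv_cdf(1 - alpha / 2) + z_beta) ** 2
    return one_tailed / two_tailed


# Hypothetical inputs: 63 subjects per group, 20% expected dropout.
print(adjust_for_attrition(63, 0.20))
print(round(one_tailed_ratio(), 2))
```

At $\alpha = 0.05$ and 80% power the ratio comes out near 0.79, which is where the "~20% fewer subjects" rule of thumb comes from.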
Tip
Always base effect size estimates on prior literature or pilot data. Overly optimistic effect sizes lead to under-powered studies — one of the key contributors to the replication crisis.
Connection to Bayesian Approaches
In Bayesian analysis, the concept of “power” is less central — instead, one can use posterior predictive simulation to assess whether the planned sample provides adequate precision for quantities of interest. See Fitting and Validating Computation for simulation-based approaches.
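One way to make the precision idea concrete is to simulate datasets at a planned sample size and look at how wide the resulting posterior intervals are. The sketch below is only an illustration under strong assumptions: a binary outcome, a conjugate Beta(1, 1) prior, and an assumed true proportion. The function name and all parameter values are hypothetical, and this is not the procedure described in Fitting and Validating Computation.

```python
import random
from statistics import mean


def expected_interval_width(n, assumed_p, n_datasets=200, n_draws=2000, seed=0):
    """Average width of the central 95% posterior interval at sample size n.

    Assumes a Binomial likelihood with a Beta(1, 1) prior, so the posterior
    is Beta(1 + k, 1 + n - k) for k observed successes.
    """
    rng = random.Random(seed)
    widths = []
    for _ in range(n_datasets):
        # Simulate one dataset of n binary outcomes under the assumed proportion.
        k = sum(rng.random() < assumed_p for _ in range(n))
        # Approximate the posterior interval with sorted Monte Carlo draws.
        draws = sorted(rng.betavariate(1 + k, 1 + n - k) for _ in range(n_draws))
        lower = draws[int(0.025 * n_draws)]
        upper = draws[int(0.975 * n_draws) - 1]
        widths.append(upper - lower)
    return mean(widths)


# Hypothetical planning question: is n = 100 or n = 400 precise enough?
for n in (100, 400):
    print(n, round(expected_interval_width(n, assumed_p=0.3), 3))
```

The planned sample is adequate if the expected width is below whatever precision the analysis needs; that threshold is a design decision, not something the simulation supplies.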
See Also
- The Experimental Ideal — the experimental framework that power analysis serves
- Researcher Degrees of Freedom — how underpowered studies amplify forking paths
- Activity Bias in Advertising — a case where more data doesn’t help if identification fails
- Multiple Testing Corrections — multiple outcomes or interim analyses require both power adjustments and multiplicity corrections
- Forking Paths and Bayesian Approaches — under-powered studies interact with analytic flexibility to inflate false discovery rates
- Fitting and Validating Computation — simulation-based calibration as a Bayesian alternative to classical power analysis
- Hierarchical Models — multilevel designs increase effective power via partial pooling; power analysis for hierarchical models differs from flat designs