Survival Analysis
Summary
Survival analysis studies the time until an event of interest occurs. Its defining feature is censoring — some subjects don’t experience the event during the study, so their exact event time is unknown. The three core methods are Kaplan-Meier estimation, the log-rank test, and Cox proportional hazards regression.
Core Concepts
Survival Function
The probability of surviving (no event) past time $t$, where $T$ denotes the random event time:
$$S(t) = P(T > t)$$
Displayed as a step-function that declines at each event time.
Hazard Function
The instantaneous rate of event occurrence at time $t$, given survival to $t$:
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
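The hazard ratio (HR) compares hazard rates between two groups: HR > 1 means a higher event rate in the comparison group (typically treated or exposed) than in the reference group, and HR < 1 means a lower one.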
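
The survival and hazard functions carry the same information, which a short derivation makes explicit. The cumulative hazard $H(t)$ is notation introduced here for illustration and does not appear elsewhere in this note:

$$H(t) = \int_0^t h(u)\,du, \qquad S(t) = \exp\bigl(-H(t)\bigr), \qquad h(t) = -\frac{d}{dt}\log S(t)$$

As a special case, a constant hazard $h(t) = \lambda$ gives $S(t) = e^{-\lambda t}$, the exponential survival model.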
Censoring
| Type | Description |
|---|---|
| Right censoring | Event hasn’t occurred by study end or subject lost to follow-up (most common) |
| Left censoring | Event occurred before observation began |
| Interval censoring | Event occurred between two known timepoints |
Warning
Censoring must be noninformative — the reason for censoring should be unrelated to the event risk. If sicker patients drop out more (informative censoring), results are biased.
The Three Core Methods
1. Kaplan-Meier Estimator
Nonparametric estimate of $S(t)$:
$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$
where $d_i$ = events at time $t_i$ and $n_i$ = subjects at risk just before $t_i$.
- Produces the familiar step-function survival curve
- Reports median survival (the earliest time at which $\hat{S}(t) \le 0.5$) and survival at fixed timepoints (e.g., 5-year survival)
- Cannot adjust for covariates
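
The product-limit formula can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a tested library: the function names `kaplan_meier` and `median_survival` and the toy data are hypothetical, and no confidence intervals are computed.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t_i, S_hat(t_i))] at each distinct event time.

    times  -- observed follow-up time per subject
    events -- 1 if the event was observed, 0 if right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    curve, surv = [], 1.0
    for t in sorted(deaths):
        # n_i: subjects still under observation just before t
        n_at_risk = sum(1 for u in times if u >= t)
        # multiply the running product by (1 - d_i / n_i)
        surv *= 1.0 - deaths[t] / n_at_risk
        curve.append((t, surv))
    return curve


def median_survival(curve):
    """Earliest event time at which the estimate is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy data: 10 subjects, follow-up in months.
times = [2, 3, 3, 5, 6, 8, 9, 12, 12, 15]
events = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:>2}  S_hat={s:.3f}")
print("median survival:", median_survival(curve))
```

Censored subjects never trigger a drop in the curve, but they do count toward the risk set until their censoring time, which is how the estimator uses their partial information.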
2. Log-Rank Test
Tests $H_0$: no difference in survival between groups.
- Compares the entire survival distribution, not just specific timepoints
- Distribution-free (no parametric assumptions)
- Limitation: poor power when survival curves cross (one group favored early, another late)
- Cannot estimate effect size or adjust for confounders
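
A two-group version of the test can be sketched as follows. It compares observed and expected event counts in one group at every event time, using the standard hypergeometric variance; the function name `logrank_test` and the toy data are hypothetical, and the sketch assumes both groups contribute enough data for the variance to be non-zero.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value on 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in pooled if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in pooled if u == t and e == 1)     # events, both groups
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 0)
        # observed minus expected events in group A under H0
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data: follow-up times and event indicators per group.
control_t, control_e = [2, 3, 5, 6, 8, 9], [1, 1, 1, 0, 1, 1]
treated_t, treated_e = [4, 7, 10, 12, 14, 15], [1, 0, 1, 1, 0, 0]

chi2, p = logrank_test(control_t, control_e, treated_t, treated_e)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

Because every event time contributes with equal weight, early and late differences of opposite sign cancel in the sum, which is the mechanism behind the poor power for crossing curves.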
3. Cox Proportional Hazards Model
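The workhorse for multivariable survival analysis:
$$h(t \mid X) = h_0(t) \exp(\beta_1 X_1 + \cdots + \beta_p X_p)$$
- $h_0(t)$: baseline hazard (left unspecified, which makes the model semiparametric)
- $e^{\beta_j}$: hazard ratio per one-unit increase in covariate $X_j$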
- Adjusts for confounders while estimating treatment effects
- No distributional assumption on the baseline hazard, so the shape of the survival-time distribution is left unspecified
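
To make the estimation step concrete, here is a minimal sketch of fitting a Cox model with a single covariate by Newton-Raphson on the partial likelihood. It assumes the Breslow approximation for ties and a data set where the maximum is finite; the function name `cox_fit_one_covariate` and the toy data are hypothetical. Real analyses should use an established library.

```python
import math


def cox_fit_one_covariate(times, events, x, n_iter=25, tol=1e-10):
    """Maximise the Cox partial likelihood for one covariate.

    Returns (beta_hat, approximate standard error).
    """
    beta, info = 0.0, float("nan")
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if e_i != 1:
                continue  # censored subjects only appear inside risk sets
            # Risk set: everyone still under observation at t_i.
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk))
            score += x_i - s1 / s0                 # first derivative of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2       # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # info was evaluated just before the final (tiny) step.
    return beta, 1.0 / math.sqrt(info)


# Hypothetical toy data: x = 1 for treated, 0 for control.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]
x = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]

beta_hat, se = cox_fit_one_covariate(times, events, x)
print(f"beta = {beta_hat:.3f}, HR = {math.exp(beta_hat):.3f}, SE = {se:.3f}")
```

Note that the baseline hazard never appears in the code: it cancels out of each risk-set ratio, which is why it can be left unspecified.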
Proportional Hazards Assumption
Warning
The model assumes that hazard ratios are constant over time. If the treatment effect changes over the study period (e.g., surgery helps early but not late), the PH assumption is violated. Always test this before interpreting results.
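
One common informal check follows directly from the model. Writing $S_0(t)$ for the baseline survival function (notation assumed here), proportional hazards implies

$$S(t \mid X) = S_0(t)^{\exp(\beta^\top X)} \quad\Longrightarrow\quad \log\bigl(-\log S(t \mid X)\bigr) = \beta^\top X + \log\bigl(-\log S_0(t)\bigr)$$

so plots of $\log(-\log \hat{S}(t))$ for different groups should be roughly parallel, separated by a constant vertical shift. Curves that converge, diverge, or cross suggest a violation. This is a graphical diagnostic only; formal tests, such as those based on Schoenfeld residuals, are a separate tool.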
When to Use Each
| Method | Purpose | Covariates? | Effect size? |
|---|---|---|---|
| Kaplan-Meier | Visualize & describe survival | No | No |
| Log-Rank | Compare groups (unadjusted) | No | No |
| Cox PH | Multivariable analysis | Yes | Yes (HR) |
Sample Size for Survival Studies
Power depends on the number of events, not total sample size:
- Calculate events needed to detect a minimum HR at desired power
- Estimate proportion of subjects who will experience the event
- Derive total sample size
Rule of thumb: at least 10 events per covariate in a Cox model.
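
The three steps above can be sketched with Schoenfeld's approximation for the required number of events, which is one widely used formula and not necessarily the one this note has in mind. The function names and the example inputs (HR of 0.7, 60% event probability) are hypothetical.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80, alloc=0.5):
    """Events needed for a two-sided log-rank test (Schoenfeld approximation).

    alloc -- proportion of subjects allocated to one arm (0.5 = balanced)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    d = (z_alpha + z_beta) ** 2 / (alloc * (1 - alloc) * math.log(hazard_ratio) ** 2)
    return math.ceil(d)


def total_sample_size(n_events, event_probability):
    """Subjects to enrol so that the expected number of events reaches n_events."""
    return math.ceil(n_events / event_probability)


# Step 1: events needed to detect HR = 0.7 at 80% power, two-sided alpha = 0.05.
d = required_events(0.7)
# Steps 2 and 3: assume 60% of subjects experience the event during follow-up.
n = total_sample_size(d, 0.6)
print(f"events needed: {d}, total subjects: {n}")
```

The total sample size enters only through the event probability, so longer follow-up or a higher-risk population reduces the number of subjects needed for the same power.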
Advanced Extensions
- Parametric models: Weibull, exponential — more efficient if distributional assumptions hold
- Competing risks: multiple event types that preclude each other (e.g., death from cancer vs. death from other causes)
- Recurrent events: events that can happen multiple times (e.g., hospitalizations)
- Frailty models: random effects for clustered data (analogous to Hierarchical Models)
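
As a small illustration of the parametric route, the exponential model has a closed-form maximum-likelihood estimate under right censoring: the hazard rate is the number of observed events divided by the total follow-up time. The sketch below assumes a constant hazard; the function names and toy data are hypothetical.

```python
import math


def exponential_rate(times, events):
    """MLE of the constant hazard under right censoring: events / total time at risk."""
    return sum(events) / sum(times)


def exponential_survival(rate, t):
    """S(t) = exp(-rate * t) under the exponential model."""
    return math.exp(-rate * t)


# Hypothetical toy data: follow-up in months, 1 = event, 0 = censored.
times = [2, 3, 3, 5, 6, 8, 9, 12, 12, 15]
events = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

rate = exponential_rate(times, events)
print(f"hazard rate: {rate:.3f} per month")
print(f"model median survival: {math.log(2) / rate:.2f} months")
print(f"S(12) under the model: {exponential_survival(rate, 12):.3f}")
```

The whole curve is summarised by one parameter, which is the source of the efficiency gain and also of the bias if the true hazard is not constant.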
Connection to Bayesian Methods
Bayesian survival analysis places priors on hazard functions or regression coefficients:
- Bayesian Cox models: priors on $\beta$ provide regularization, especially useful with many covariates
- Nonparametric Bayesian: gamma or beta process priors on the cumulative baseline hazard, or Dirichlet process priors on the survival distribution
- Posterior predictive checks (Model Checking) apply directly — simulate event times and compare to data
See Also
- The Experimental Ideal — experimental design that survival analysis often evaluates
- Generalized Linear Models — Cox model shares the GLM structure
- Missing Data Models — censoring is a form of missing data
- Power Analysis and Sample Size — sample size calculation for survival studies