Survival Analysis

Summary

Survival analysis studies the time until an event of interest occurs. Its defining feature is censoring — some subjects don’t experience the event during the study, so their exact event time is unknown. The three core methods are Kaplan-Meier estimation, the log-rank test, and Cox proportional hazards regression.

Core Concepts

Survival Function

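The probability of surviving (no event) past time $t$:

$$S(t) = P(T > t)$$

where $T$ is the time to the event. Estimated survival curves (such as Kaplan-Meier) are displayed as step functions that drop at each observed event time.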

Hazard Function

The instantaneous rate of event occurrence at time $t$, given survival to $t$:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

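The hazard ratio (HR) compares hazard rates between two groups: HR > 1 means the comparison group has a higher event rate than the reference group at any given time, and HR < 1 means a lower one.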
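
The hazard and the survival function describe the same distribution from two angles. For a continuous event time, the standard link goes through the cumulative hazard, written here as $H(t)$ (a symbol introduced only for this illustration):

$$H(t) = \int_0^t h(u)\,du, \qquad S(t) = \exp\{-H(t)\}$$

For example, a constant hazard $h(t) = \lambda$ gives $S(t) = e^{-\lambda t}$, which is the exponential model.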

Censoring

| Type | Description |
| --- | --- |
| Right censoring | Event hasn’t occurred by study end, or subject lost to follow-up (most common) |
| Left censoring | Event occurred before observation began |
| Interval censoring | Event occurred between two known timepoints |
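
As a rough illustration of how right-censored data is usually stored, the sketch below records each subject as a follow-up time plus an event indicator. The names `Observation` and `encode` are hypothetical helpers for this note, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    time: float   # length of follow-up
    event: bool   # True = event observed, False = right-censored


def encode(event_time: Optional[float], censor_time: float) -> Observation:
    """Keep whichever comes first: the event or the end of follow-up.

    event_time  : true event time, or None if it was never observed
    censor_time : time at which follow-up stopped (study end or dropout)
    """
    if event_time is not None and event_time <= censor_time:
        return Observation(event_time, True)
    # The event, if any, happens after follow-up ends, so only a lower
    # bound on the event time is known.
    return Observation(censor_time, False)


subject_a = encode(3.0, 5.0)    # event seen at t = 3
subject_b = encode(None, 5.0)   # still event-free when follow-up ended at t = 5
```

A censored subject still contributes information: the analysis knows the event time exceeds the recorded follow-up time.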

Warning

Censoring must be noninformative — the reason for censoring should be unrelated to the event risk. If sicker patients drop out more (informative censoring), results are biased.

The Three Core Methods

1. Kaplan-Meier Estimator

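Nonparametric estimate of $S(t)$:

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

where $d_i$ = events at time $t_i$ and $n_i$ = subjects at risk just before $t_i$.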

  • Produces the familiar step-function survival curve
  • Reports median survival (the earliest time at which $\hat{S}(t) \le 0.5$) and survival at fixed timepoints (e.g., 5-year survival)
  • Cannot adjust for covariates
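
The product-limit calculation can be sketched in a few lines of plain Python. This is a minimal illustration, assuming right-censored data supplied as parallel lists of follow-up times and 0/1 event flags; `kaplan_meier` and `median_survival` are names chosen for this example, not a library API.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t_i, S_hat(t_i))] for each distinct time with at least one event.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # every subject leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths[t]
        if d:
            surv *= 1.0 - d / at_risk   # multiply in the factor (1 - d_i / n_i)
            curve.append((t, surv))
        # Subjects censored at t count as at risk at t, so remove them afterwards.
        at_risk -= exits[t]
    return curve


def median_survival(curve):
    """Earliest time at which the estimated survival is 0.5 or lower."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 0.5


curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
# Steps at t = 1, 2, 3, 5 with survival of about 0.833, 0.667, 0.444, 0.0
median = median_survival(curve)  # 3
```

Censored subjects never trigger a step in the curve, but they shrink the risk set, which makes later steps larger.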

2. Log-Rank Test

Tests $H_0$: no difference in survival between groups.

  • Compares the entire survival distribution, not just specific timepoints
  • Distribution-free (no parametric assumptions)
  • Limitation: poor power when survival curves cross (one group favored early, another late)
  • Yields a p-value only: it does not estimate an effect size, and the unstratified test cannot adjust for confounders
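
A two-group version of the test can be sketched as follows. This is an illustrative implementation under stated assumptions: exactly two groups, the usual hypergeometric variance at each event time, and a large-sample chi-square approximation with one degree of freedom. The function name `logrank_test` is hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0   # sum over event times of (observed - expected) in group A
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                  # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)     # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)            # events at t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no event time has subjects at risk in both groups")
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
```

Because every event time contributes with equal weight, early and late differences of opposite sign cancel in `obs_minus_exp`, which is why the test loses power when curves cross.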

3. Cox Proportional Hazards Model

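The workhorse for multivariable survival analysis:

$$h(t \mid X) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)$$

  • $h_0(t)$: baseline hazard (left unspecified, which makes the model semiparametric)
  • $e^{\beta_j}$: hazard ratio for covariate $X_j$ (per one-unit increase, other covariates held fixed)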
  • Adjusts for confounders while estimating treatment effects
  • No distributional assumptions on survival times
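
To show why the baseline hazard never needs to be specified, here is a minimal sketch that fits a single-covariate Cox model by maximizing the partial likelihood with Newton-Raphson. It assumes one covariate, Breslow-style handling of ties, and no step-halving, so it is a teaching aid rather than a substitute for a tested library. The function names are hypothetical.

```python
import math


def cox_score_and_info(beta, times, events, x):
    """Score (first derivative) and information (negative second derivative)
    of the log partial likelihood for one covariate."""
    score, info = 0.0, 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if not e_i:
            continue  # censored subjects appear only inside risk sets
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:                    # subject j is still at risk at t_i
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        mean = s1 / s0                        # risk-weighted mean of x in the risk set
        score += x_i - mean
        info += s2 / s0 - mean * mean
    return score, info


def fit_cox_1d(times, events, x, tol=1e-10, max_iter=50):
    """Return the coefficient beta that maximizes the partial likelihood."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_and_info(beta, times, events, x)
        if info <= 0:
            raise ValueError("no variation in x within any risk set")
        step = score / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


# Toy data (made up for illustration): x = 1 marks the exposed group.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 0, 1, 1, 1]
x = [1, 1, 0, 1, 0, 1, 0, 0]
beta = fit_cox_1d(times, events, x)
hazard_ratio = math.exp(beta)
```

The baseline hazard cancels out of every risk-set ratio, so only the ordering of event times and the covariate values enter the fit. If one group always fails first, the estimate diverges and this simple routine will not converge.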

Proportional Hazards Assumption

Warning

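The model assumes that hazard ratios are constant over time. If the treatment effect changes over the study period (e.g., surgery helps early but not late), the PH assumption is violated and a single HR becomes a misleading average. Always check this (for example with Schoenfeld residuals or log-minus-log survival plots) before interpreting results.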

When to Use Each

| Method | Purpose | Covariates? | Effect size? |
| --- | --- | --- | --- |
| Kaplan-Meier | Visualize & describe survival | No | No |
| Log-Rank | Compare groups (unadjusted) | No | No |
| Cox PH | Multivariable analysis | Yes | Yes (HR) |

Sample Size for Survival Studies

Power depends on the number of events, not total sample size:

  1. Calculate events needed to detect a minimum HR at desired power
  2. Estimate proportion of subjects who will experience the event
  3. Derive total sample size
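
The three steps can be sketched with Schoenfeld's approximation for a two-arm comparison. The note does not name a specific formula, so this choice is an assumption, as are the two-sided test, the default allocation ratio, and the function names.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Events needed to detect `hazard_ratio` with a two-sided level-alpha test.

    allocation : proportion of subjects assigned to one arm (0.5 = 1:1)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    log_hr = math.log(hazard_ratio)
    d = (z_alpha + z_beta) ** 2 / (allocation * (1 - allocation) * log_hr ** 2)
    return math.ceil(d)


def required_sample_size(n_events, event_probability):
    """Total subjects needed if a fraction `event_probability` will have the event."""
    return math.ceil(n_events / event_probability)


n_events = required_events(0.7)                 # 247 events for HR = 0.7, 80% power
n_total = required_sample_size(n_events, 0.6)   # 412 subjects if 60% have the event
```

The event probability in the second step depends on accrual and follow-up time, so longer follow-up can reduce the number of subjects needed without changing the number of events.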

Rule of thumb: at least 10 events per covariate in a Cox model.

Advanced Extensions

  • Parametric models: Weibull, exponential — more efficient if distributional assumptions hold
  • Competing risks: multiple event types that preclude each other (e.g., death from cancer vs. death from other causes)
  • Recurrent events: events that can happen multiple times (e.g., hospitalizations)
  • Frailty models: random effects for clustered data (analogous to Hierarchical Models)

Connection to Bayesian Methods

Bayesian survival analysis places priors on hazard functions or regression coefficients:

  • Bayesian Cox models: priors on $\beta$ provide regularization, especially useful with many covariates
  • Nonparametric Bayesian: gamma or beta process priors on the cumulative baseline hazard, or Dirichlet process priors on the survival distribution
  • Posterior predictive checks (Model Checking) apply directly — simulate event times and compare to data

See Also