The Selection Problem
Summary
Selection bias arises because individuals who receive treatment differ systematically from those who don’t, in ways that would produce different outcomes even in the absence of treatment. This is the fundamental challenge that all causal inference methods aim to overcome.
Potential Outcomes Framework
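For individual $i$ with treatment $D_i \in \{0, 1\}$:
- $Y_{0i}$: outcome without treatment
- $Y_{1i}$: outcome with treatment
- Causal effect: $Y_{1i} - Y_{0i}$ (never directly observed for any individual)
The observed outcome:
$$Y_i = Y_{0i} + (Y_{1i} - Y_{0i}) D_i$$
The Decomposition
$$\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{observed difference}} = \underbrace{E[Y_{1i} - Y_{0i} \mid D_i = 1]}_{\text{average treatment effect on the treated}} + \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection bias}}$$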
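
The split of the observed difference into a treatment effect and a selection term comes from a single add-and-subtract step. A short derivation sketch, assuming the notation of Mostly Harmless Econometrics ($Y_{0i}$ and $Y_{1i}$ for the potential outcomes without and with treatment, $D_i$ for the treatment indicator, $Y_i$ for the observed outcome):

$$
\begin{aligned}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
&= E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \\
&= \big( E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1] \big)
 + \big( E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \big)
\end{aligned}
$$

The first line uses the fact that the treated reveal $Y_{1i}$ and the untreated reveal $Y_{0i}$. The second line adds and subtracts the counterfactual $E[Y_{0i} \mid D_i = 1]$, which is never observed. The first bracket is the average effect of treatment on the treated; the second is selection bias, the gap in untreated outcomes between the two groups.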
Selection Bias Can Be Large
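In the hospital example, where the outcome is health status, selection bias is negative: people who seek hospital care would have been in worse health than non-patients even without treatment. The bias is large enough to completely mask a positive treatment effect, making hospitals appear harmful.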
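
A minimal simulation sketch of this masking effect. Everything in it is hypothetical: the latent health variable, the constant treatment effect of 0.5, and the selection rule are assumptions chosen only so that sicker people are more likely to be treated. None of the numbers come from the hospital example itself.

```python
import random
from statistics import fmean

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 50_000

rows = []
for _ in range(n):
    health = rng.gauss(0.0, 1.0)      # hypothetical latent health, higher = healthier
    y0 = health                       # potential outcome without treatment
    y1 = y0 + 0.5                     # potential outcome with treatment (assumed effect)
    # Assumed selection rule: people in worse health are more likely to seek treatment.
    d = 1 if health + rng.gauss(0.0, 0.5) < -0.5 else 0
    y = y0 + (y1 - y0) * d            # observed outcome
    rows.append((d, y0, y1, y))

treated = [r for r in rows if r[0] == 1]
control = [r for r in rows if r[0] == 0]

# Naive comparison of observed outcomes, the only quantity real data would reveal.
naive = fmean(r[3] for r in treated) - fmean(r[3] for r in control)
# Average treatment effect on the treated (computable only because this is a simulation).
att = fmean(r[2] - r[1] for r in treated)
# Selection bias: difference in untreated outcomes between the two groups.
selection_bias = fmean(r[1] for r in treated) - fmean(r[1] for r in control)

print(f"naive difference: {naive:.3f}")
print(f"ATT:              {att:.3f}")
print(f"selection bias:   {selection_bias:.3f}")
```

Under these assumed parameters the naive difference is negative even though the treatment effect is positive, because the selection bias is negative and larger in magnitude. The naive difference also equals the sum of the other two quantities up to floating-point error.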
Solutions
| Method | How it addresses selection bias |
|---|---|
| Random assignment | Makes treatment assignment $D_i$ independent of potential outcomes |
| Matching | Controls for observables that drive selection |
| Instrumental variables (IV) | Uses exogenous variation in treatment |
| Fixed effects | Controls for time-invariant unobservables |
| Regression discontinuity (RD) | Exploits discontinuities created by arbitrary assignment rules |
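
The random-assignment row can be illustrated with a small simulation that compares self-selection with a coin flip on the same kind of population. This is a sketch under stated assumptions: the function name `selection_bias_and_naive`, the effect size, and the self-selection rule are all hypothetical.

```python
import random
from statistics import fmean

def selection_bias_and_naive(randomize, n=50_000, effect=0.5, seed=1):
    """Return (selection_bias, naive_difference) for a simulated population."""
    rng = random.Random(seed)
    y0_treated, y0_control, y1_treated = [], [], []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)      # potential outcome without treatment
        y1 = y0 + effect              # potential outcome with treatment
        if randomize:
            d = rng.random() < 0.5    # coin flip, unrelated to potential outcomes
        else:
            # Assumed self-selection: worse baseline outcome -> more likely treated.
            d = y0 + rng.gauss(0.0, 0.5) < -0.5
        if d:
            y0_treated.append(y0)
            y1_treated.append(y1)
        else:
            y0_control.append(y0)
    # Selection bias: gap in untreated outcomes between treated and control.
    bias = fmean(y0_treated) - fmean(y0_control)
    # Naive difference: treated reveal y1, controls reveal y0.
    naive = fmean(y1_treated) - fmean(y0_control)
    return bias, naive

bias_self, naive_self = selection_bias_and_naive(randomize=False)
bias_rand, naive_rand = selection_bias_and_naive(randomize=True)

print(f"self-selected: bias={bias_self:.3f}, naive={naive_self:.3f}")
print(f"randomized:    bias={bias_rand:.3f}, naive={naive_rand:.3f}")
```

With randomization the selection term is close to zero, up to sampling noise, so the naive difference recovers the assumed effect. The other methods in the table aim for the same result without controlling assignment directly.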
See Also
- The Experimental Ideal
- Omitted Variables Bias
- Mostly Harmless Econometrics - Overview
- Data Collection Models — Bayesian treatment of ignorability and selection mechanisms
- Synthetic Control Extensions — penalized and matrix completion methods that address selection when pre-treatment fit is imperfect
- Xu 2016 - Overview — GSC as a general solution to the selection problem under time-varying confounding
- Synthetic Control Inference and Diagnostics — permutation inference for the synthetic control estimator that addresses selection