The Selection Problem

Summary

Selection bias arises because individuals who receive treatment differ systematically from those who don’t: their outcomes would differ even in the absence of treatment. This is the fundamental challenge that all causal inference methods aim to overcome.

Potential Outcomes Framework

For individual $i$ with treatment $D_{i} \in \{0, 1\}$:

$Y_{0 i}$ : outcome without treatment
$Y_{1 i}$ : outcome with treatment
Causal effect: $Y_{1 i} - Y_{0 i}$ (never directly observed for any individual)

The observed outcome:

$$Y_{i} = Y_{0 i} + (Y_{1 i} - Y_{0 i}) D_{i}$$
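
The observed-outcome equation acts as a switch: treated individuals reveal $Y_{1 i}$ and untreated individuals reveal $Y_{0 i}$. A minimal sketch follows, assuming NumPy; the function name `observed_outcome` and the four data points are hypothetical and chosen only for illustration.

```python
import numpy as np

def observed_outcome(y0, y1, d):
    """Observed outcome Y = Y0 + (Y1 - Y0) * D for arrays of equal length."""
    y0, y1, d = np.asarray(y0), np.asarray(y1), np.asarray(d)
    return y0 + (y1 - y0) * d

# Hypothetical potential outcomes for four individuals (illustrative values only)
y0 = np.array([3.0, 5.0, 2.0, 4.0])   # outcome without treatment
y1 = np.array([4.0, 5.5, 2.0, 6.0])   # outcome with treatment
d = np.array([1, 0, 1, 0])            # treatment indicator

y = observed_outcome(y0, y1, d)
# Treated units (d == 1) show y1; untreated units (d == 0) show y0.
# The other potential outcome of each individual stays hidden.
```

In real data only `y` and `d` are available, which is why the individual causal effect `y1 - y0` cannot be computed directly.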

The Decomposition

$$E[Y_{i} \mid D_{i} = 1] - E[Y_{i} \mid D_{i} = 0] = \underbrace{E[Y_{1 i} - Y_{0 i} \mid D_{i} = 1]}_{\text{ATT}} + \underbrace{E[Y_{0 i} \mid D_{i} = 1] - E[Y_{0 i} \mid D_{i} = 0]}_{\text{selection bias}}$$
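
The decomposition can be derived in three steps. Substitute the observed outcome (treated individuals show $Y_{1 i}$, untreated individuals show $Y_{0 i}$), add and subtract the counterfactual term $E[Y_{0 i} \mid D_{i} = 1]$, then regroup. A sketch using only the notation defined above:

$$
\begin{aligned}
E[Y_{i} \mid D_{i} = 1] - E[Y_{i} \mid D_{i} = 0]
&= E[Y_{1 i} \mid D_{i} = 1] - E[Y_{0 i} \mid D_{i} = 0] \\
&= E[Y_{1 i} \mid D_{i} = 1] - E[Y_{0 i} \mid D_{i} = 1] + E[Y_{0 i} \mid D_{i} = 1] - E[Y_{0 i} \mid D_{i} = 0] \\
&= E[Y_{1 i} - Y_{0 i} \mid D_{i} = 1] + \left( E[Y_{0 i} \mid D_{i} = 1] - E[Y_{0 i} \mid D_{i} = 0] \right)
\end{aligned}
$$

The first term is the ATT. The bracketed term is the selection bias: the gap in untreated outcomes between those who take treatment and those who do not.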

Selection Bias Can Be Large

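In the hospital example, which compares the health of people who have and have not been hospitalized, selection bias is negative: sicker people are the ones who seek hospital care, so their health would be worse even without treatment. The bias is large enough to completely mask a positive treatment effect, making hospitals appear harmful.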
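
A small simulation can make the masking concrete. The sketch below assumes NumPy and uses entirely made-up parameters (share of sick people, health levels, a constant treatment effect of 0.5, hospitalization probabilities). It is an illustration of the decomposition, not an estimate from real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical population: 30% are in poor health (assumed share)
sick = rng.random(n) < 0.3

# Health without hospitalization: lower for sick people (assumed levels)
y0 = np.where(sick, 2.0, 4.0) + rng.normal(0.0, 0.5, n)

# Assumed constant positive treatment effect
effect = 0.5
y1 = y0 + effect

# Sick people are far more likely to go to hospital (assumed probabilities)
p_hospital = np.where(sick, 0.8, 0.1)
d = rng.random(n) < p_hospital

# Observed health: y1 for the hospitalized, y0 for everyone else
y = np.where(d, y1, y0)

naive = y[d].mean() - y[~d].mean()              # observed difference in means
att = (y1[d] - y0[d]).mean()                    # effect on the treated
selection_bias = y0[d].mean() - y0[~d].mean()   # gap in untreated outcomes

print(f"naive difference: {naive:.3f}")
print(f"ATT:              {att:.3f}")
print(f"selection bias:   {selection_bias:.3f}")
```

Under these assumptions the naive difference comes out negative even though every individual benefits from treatment, because the negative selection bias outweighs the ATT. The identity `naive == att + selection_bias` holds up to floating-point error.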

Solutions

Method	How it addresses selection bias
Random assignment	Makes $D_{i}$ independent of potential outcomes
Matching	Controls for observables that drive selection
Instrumental variables (IV)	Uses exogenous variation in treatment
Fixed effects	Controls for time-invariant unobservables
Regression discontinuity (RD)	Exploits assignment rules with an arbitrary cutoff
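
The first row of the table can be checked numerically: when $D_{i}$ is assigned by a coin flip, treated and untreated groups have the same expected $Y_{0 i}$, so the selection bias term is close to zero and the difference in means recovers the treatment effect. This is a minimal sketch assuming NumPy, with hypothetical parameters (the population shares, health levels, and the effect of 0.5 are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical population with heterogeneous baseline health (assumed values)
sick = rng.random(n) < 0.3
y0 = np.where(sick, 2.0, 4.0) + rng.normal(0.0, 0.5, n)
y1 = y0 + 0.5   # assumed constant treatment effect

# Random assignment: treatment is a fair coin flip, unrelated to health
d = rng.random(n) < 0.5
y = np.where(d, y1, y0)

selection_bias_rand = y0[d].mean() - y0[~d].mean()
naive_rand = y[d].mean() - y[~d].mean()

print(f"selection bias under random assignment: {selection_bias_rand:.4f}")
print(f"difference in means:                    {naive_rand:.4f}")
```

The selection bias is not exactly zero in any finite sample, only zero in expectation, so the printed value should be small and shrink as `n` grows.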
