Omitted Variables Bias
Summary
The omitted variables bias (OVB) formula describes the mechanical relationship between regression coefficients in models with different sets of control variables. It shows how omitting relevant variables biases the coefficients on included variables.
The Formula
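If the “long” (correct) regression is:
$$Y_i = \alpha + \beta X_i + \gamma Z_i + \varepsilon_i$$
and the “short” (omitted variable) regression is:
$$Y_i = \tilde{\alpha} + \tilde{\beta} X_i + \tilde{\varepsilon}_i$$
Then:
$$\tilde{\beta} = \beta + \gamma\delta$$
where $\delta$ is the coefficient from regressing the omitted variable $Z_i$ on $X_i$.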
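A short derivation sketch may help show where the formula comes from. The notation is an assumption chosen for illustration: $Y_i$ is the outcome, $X_i$ the included regressor, $Z_i$ the omitted variable, $\beta$ and $\gamma$ their long-regression coefficients, $\tilde{\beta}$ the short-regression coefficient, and $\delta$ the slope from regressing $Z_i$ on $X_i$. The long-regression residual $\varepsilon_i$ is uncorrelated with $X_i$ by construction, so:
$$
\begin{aligned}
\tilde{\beta} &= \frac{\operatorname{Cov}(Y_i, X_i)}{\operatorname{Var}(X_i)} \\
&= \frac{\operatorname{Cov}(\alpha + \beta X_i + \gamma Z_i + \varepsilon_i,\; X_i)}{\operatorname{Var}(X_i)} \\
&= \beta + \gamma\,\frac{\operatorname{Cov}(Z_i, X_i)}{\operatorname{Var}(X_i)} \\
&= \beta + \gamma\delta
\end{aligned}
$$
The last step uses the fact that the bivariate regression slope of $Z_i$ on $X_i$ is $\operatorname{Cov}(Z_i, X_i)/\operatorname{Var}(X_i)$.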
Interpreting the Bias
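The bias is the product of two components:
- $\gamma$: the effect of the omitted variable on the outcome (its coefficient in the long regression)
- $\delta$: the coefficient from regressing the omitted variable on the included regressor, which has the same sign as their correlation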
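| Sign of $\gamma$ (omitted variable's effect on the outcome) | Sign of $\delta$ (omitted variable's relation to the included regressor) | Bias direction |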
|---|---|---|
| Positive | Positive | Upward (overestimate) |
| Positive | Negative | Downward (underestimate) |
| Negative | Positive | Downward (underestimate) |
| Negative | Negative | Upward (overestimate) |
Schooling Example
For returns to schooling where “ability” ($Z_i$) is omitted:
- Ability likely has a positive effect on wages ($\gamma > 0$)
- Ability is likely positively correlated with schooling ($\delta > 0$)
- Therefore OLS without ability controls likely overestimates the return to schooling
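The reasoning in the list above can be illustrated with a small simulation. This is a minimal sketch under assumed parameters, not an estimate from real data: the data-generating process, the true return of 0.08, the ability effect of 0.10, and the helper `ols` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process; every number here is an assumption.
ability = rng.normal(size=n)
# Ability raises schooling, so the omitted variable and the regressor
# are positively related (delta > 0).
schooling = 12 + 2.0 * ability + rng.normal(size=n)
# Ability also raises wages directly (gamma = 0.10 > 0).
log_wage = 1.0 + 0.08 * schooling + 0.10 * ability + rng.normal(scale=0.5, size=n)


def ols(y, *regressors):
    """Return OLS coefficients [intercept, slopes...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


short_return = ols(log_wage, schooling)[1]           # ability omitted
long_return = ols(log_wage, schooling, ability)[1]   # ability controlled

print(f"short regression return to schooling: {short_return:.3f}")
print(f"long regression return to schooling:  {long_return:.3f}")
```

Under these assumed parameters the population slope from regressing ability on schooling is 2 / (4 + 1) = 0.4, so the short coefficient should land near 0.08 + 0.10 × 0.4 = 0.12, above the long coefficient of about 0.08.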
The OVB Formula is Mechanical
The formula describes the relationship between short and long regressions whether or not either has a causal interpretation. It applies to any pair of nested regression specifications and, for OLS estimates, holds as an exact algebraic identity in any sample.
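Because the relationship is mechanical, it can be checked on arbitrary data with no causal structure at all. The sketch below is one way to do that, assuming NumPy and a hypothetical helper `ols`. The variables `x`, `z`, and `y` are made-up series. The check is that the short coefficient equals the long coefficient plus the omitted variable's long-regression coefficient times the slope from the auxiliary regression of the omitted variable on the included one.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Arbitrary, non-causal data: the identity should still hold.
x = rng.normal(size=n)
z = rng.uniform(size=n) + 0.3 * x**2
y = rng.exponential(size=n) + np.sin(x) - z


def ols(y, *regressors):
    """Return OLS coefficients [intercept, slopes...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


_, beta_long, gamma_long = ols(y, x, z)   # long: y on x and z
_, beta_short = ols(y, x)                 # short: y on x only
_, delta = ols(z, x)                      # auxiliary: z on x

# Gap between the short coefficient and what the formula implies.
gap = beta_short - (beta_long + gamma_long * delta)
print(f"short coefficient:    {beta_short:.6f}")
print(f"long + gamma * delta: {beta_long + gamma_long * delta:.6f}")
print(f"difference:           {gap:.2e}")
```

The difference should be zero up to floating-point error, since all three regressions include an intercept and are fit on the same sample.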
See Also
- Conditional Independence Assumption
- Regression and the CEF
- Instrumental Variables
- Bayesian Linear Regression — Bayesian shrinkage as regularization that partially mitigates OVB in high-dimensional settings
- The Experimental Ideal — randomization eliminates OVB by construction; the gold-standard contrast to observational confounding
- Research Questions in Econometrics — FAQ #3 (identification strategy) is directly aimed at the OVB threat