Regression and the CEF
Summary
The population regression function is the best linear approximation to the conditional expectation function (CEF). This relationship holds regardless of whether the CEF is actually linear, giving regression a robust interpretation even when functional form assumptions fail.
The CEF
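The conditional expectation function $E[Y_i \mid X_i]$ is:
- The best predictor of $Y_i$ given $X_i$ (it minimizes mean squared error over all functions of $X_i$)
- A function that decomposes any random variable: $Y_i = E[Y_i \mid X_i] + \varepsilon_i$, where $\varepsilon_i$ is mean-independent of $X_i$ and therefore uncorrelated with any function of $X_i$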
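Both properties can be checked numerically. The following is a minimal sketch on simulated data, not an example from the source: the discrete regressor, the quadratic CEF, and all variable names are assumptions chosen for illustration. With a discrete regressor the CEF can be estimated by cell means, and the resulting residual is orthogonal in-sample to any function of the regressor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed design: discrete X in {0,...,4} and a deliberately nonlinear CEF, E[Y | X] = X^2.
x = rng.integers(0, 5, size=n)
y = x.astype(float) ** 2 + rng.normal(size=n)

# Estimate the CEF by the mean of Y within each value of X.
cell_means = np.array([y[x == v].mean() for v in range(5)])
cef = cell_means[x]
eps = y - cef  # CEF residual

# The residual averages to zero against any function h(X).
test_functions = [x, x ** 2, np.sin(x)]
max_abs_cov = max(abs(np.mean(eps * h)) for h in test_functions)

# The CEF has lower mean squared error than the best-fitting straight line.
slope, intercept = np.polyfit(x, y, 1)
mse_cef = np.mean(eps ** 2)
mse_linear = np.mean((y - (intercept + slope * x)) ** 2)

print("max |mean(eps * h(X))|:", max_abs_cov)
print("MSE of CEF:", mse_cef, " MSE of linear fit:", mse_linear)
```

The orthogonality here is an in-sample identity (each cell's residuals sum to zero), so it holds up to floating-point error rather than only approximately.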
Three Justifications for Regression
1. Linear CEF Theorem
If the CEF is linear, then the population regression function equals the CEF.
2. Best Linear Predictor Theorem
The regression function $X_i'\beta$ is the best linear predictor of $Y_i$ given $X_i$ in the minimum mean squared error (MMSE) sense.
3. Regression-CEF Theorem (the key one)
Even when the CEF is nonlinear, regression provides the MMSE linear approximation to it:
$$\beta = \arg\min_b E\left[\left(E[Y_i \mid X_i] - X_i'b\right)^2\right]$$
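The third theorem can be illustrated with a short simulation. This is a sketch under assumed settings (an exponential CEF, a discrete regressor, hypothetical variable names), not code from the source. It fits the same linear model three ways: to $Y_i$ itself, to the estimated CEF evaluated at each observation, and to the cell means weighted by cell size. All three give the same coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Assumed design: discrete X in {0,...,5} with a nonlinear CEF, E[Y | X] = exp(0.5 X).
x = rng.integers(0, 6, size=n)
y = np.exp(0.5 * x) + rng.normal(size=n)

# (a) OLS of Y on a constant and X.
X = np.column_stack([np.ones(n), x])
beta_y, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimate the CEF by cell means.
values, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
cell_means = np.bincount(inverse, weights=y) / counts

# (b) OLS of the estimated CEF, evaluated at each observation, on the same regressors.
beta_cef_micro, *_ = np.linalg.lstsq(X, cell_means[inverse], rcond=None)

# (c) Weighted least squares on the grouped data: one row per value of X,
#     weighted by the number of observations in the cell.
Xg = np.column_stack([np.ones(len(values)), values])
w = np.sqrt(counts)
beta_grouped, *_ = np.linalg.lstsq(Xg * w[:, None], cell_means * w, rcond=None)

print("OLS on Y:          ", beta_y)
print("OLS on CEF:        ", beta_cef_micro)
print("WLS on cell means: ", beta_grouped)
```

The grouped version (c) is the practical payoff: when only cell means and cell counts are available, weighted regression on the grouped data reproduces the microdata coefficients.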
Regression Anatomy
The coefficient on the $k$-th regressor in a multivariate regression:
$$\beta_k = \frac{\mathrm{Cov}(Y_i, \tilde{x}_{ki})}{V(\tilde{x}_{ki})}$$
where $\tilde{x}_{ki}$ is the residual from regressing $x_{ki}$ on all other covariates. This is the Frisch-Waugh result: each multivariate coefficient equals the bivariate coefficient after “partialling out” other variables.
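The anatomy formula is easy to verify numerically. The sketch below uses simulated data with two correlated regressors; the data-generating process and the names `x1`, `x2`, and `x1_tilde` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Assumed design: two correlated regressors and a linear outcome equation.
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Full multivariate regression of y on a constant, x1, and x2.
X = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Partial out: regress x1 on the other covariates (constant and x2), keep the residual.
Z = np.column_stack([np.ones(n), x2])
gamma, *_ = np.linalg.lstsq(Z, x1, rcond=None)
x1_tilde = x1 - Z @ gamma

# Bivariate slope of y on the residualized regressor: Cov(y, x1_tilde) / V(x1_tilde).
beta_anatomy = np.cov(y, x1_tilde)[0, 1] / np.var(x1_tilde, ddof=1)

print("coefficient on x1, full regression:", beta_full[1])
print("coefficient on x1, anatomy formula:", beta_anatomy)
```

Note that $Y_i$ itself does not need to be residualized: because $\tilde{x}_{ki}$ is orthogonal to the other covariates, only the part of $Y_i$ that covaries with $\tilde{x}_{ki}$ contributes to the numerator.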
Robust Standard Errors
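The heteroskedasticity-consistent (robust) asymptotic covariance matrix of $\hat{\beta}$, with $e_i = Y_i - X_i'\beta$:
$$E[X_i X_i']^{-1} \, E[X_i X_i' e_i^2] \, E[X_i X_i']^{-1}$$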
Always Use Robust Standard Errors
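Since regression approximates a possibly nonlinear CEF, heteroskedasticity is the natural state of affairs: the residual $Y_i - X_i'\beta$ contains the approximation error $E[Y_i \mid X_i] - X_i'\beta$, whose size varies with $X_i$. As a rule of thumb, robust and conventional standard errors that differ by more than about 30% may indicate a problem and are worth investigating.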
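To make the comparison concrete, here is a minimal sketch that computes conventional and robust standard errors by hand on simulated heteroskedastic data. It uses the simplest sample analogue of the sandwich formula (often labelled HC0), with no small-sample correction; the data-generating process and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

# Assumed design: the noise standard deviation grows with x (heteroskedasticity).
x = rng.uniform(0, 2, size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n) * (0.2 + x)

X = np.column_stack([np.ones(n), x])
k = X.shape[1]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta  # OLS residuals

XtX_inv = np.linalg.inv(X.T @ X)

# Conventional covariance: s^2 (X'X)^{-1}, valid under homoskedasticity.
V_conv = XtX_inv * (e @ e) / (n - k)

# Robust (HC0) covariance: (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}.
meat = (X * (e ** 2)[:, None]).T @ X
V_hc0 = XtX_inv @ meat @ XtX_inv

se_conv = np.sqrt(np.diag(V_conv))
se_hc0 = np.sqrt(np.diag(V_hc0))

print("conventional SEs:", se_conv)
print("robust (HC0) SEs:", se_hc0)
print("ratio robust / conventional:", se_hc0 / se_conv)
```

The printed ratio is the quantity the rule of thumb refers to. In applied work a regression package's built-in robust option would normally be used instead of this hand-rolled version, which omits the degrees-of-freedom adjustments that variants such as HC1 to HC3 apply.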
Saturated Models
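A saturated model has a separate parameter for every possible combination of values of the (discrete) covariates. It fits the CEF perfectly, because with one parameter per cell the CEF is itself a linear function of the dummy regressors. Example: with two dummies $x_{1i}$ and $x_{2i}$, the saturated model includes both main effects and their interaction: $E[Y_i \mid x_{1i}, x_{2i}] = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \delta \, x_{1i} x_{2i}$.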
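The two-dummy case can be checked directly. In this sketch (simulated data; the names `d1` and `d2` and the coefficient values are hypothetical), the fitted values of the saturated regression coincide with the four cell means, and the interaction coefficient equals the difference-in-differences of those means.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8_000

# Assumed design: two binary covariates with an interaction in the outcome.
d1 = rng.integers(0, 2, size=n)
d2 = rng.integers(0, 2, size=n)
y = 1.0 + 0.5 * d1 - 0.3 * d2 + 2.0 * d1 * d2 + rng.normal(size=n)

# Saturated model: constant, both main effects, and the interaction (4 cells, 4 parameters).
X = np.column_stack([np.ones(n), d1, d2, d1 * d2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta


def cell_mean(a, b):
    """Sample mean of y in the cell d1 == a, d2 == b."""
    return y[(d1 == a) & (d2 == b)].mean()


# Fitted values reproduce the cell means exactly (up to floating point).
for a in (0, 1):
    for b in (0, 1):
        in_cell = (d1 == a) & (d2 == b)
        print((a, b), "cell mean:", cell_mean(a, b), "fitted:", fitted[in_cell][0])

# The interaction coefficient is the difference-in-differences of cell means.
did = cell_mean(1, 1) - cell_mean(1, 0) - cell_mean(0, 1) + cell_mean(0, 0)
print("interaction coefficient:", beta[3], " diff-in-diff of means:", did)
```

Dropping the interaction column would leave three parameters for four cells, so the fitted values would in general no longer match the cell means.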
See Also
- Conditional Independence Assumption
- Omitted Variables Bias
- Mostly Harmless Econometrics - Overview
- Bayesian Linear Regression — the Bayesian perspective on regression, with priors providing natural regularization
- Asymptotics and Frequentist Connections — Bayesian posteriors converge to OLS estimates under flat priors
- Data Collection Models — Bayesian ignorability as the prerequisite for regression’s causal interpretation
- Local Average Treatment Effects — IV/LATE as the estimand when regression cannot recover the full ATE due to non-compliance
- Directed Acyclic Graphs — DAGs identify the adjustment set that gives regression a causal interpretation
- Spurious Association and Confounds — Statistical Rethinking’s treatment of multivariate regression and confounds, the Bayesian parallel to MHE’s CEF analysis