Empirical Bayes Interpretation of Shrinkage

Summary

Shrinkage estimators are Bayes estimators with an estimated prior. In the simplest setting ($M = 0$, $σ_{0}^{2} = 1$), the James-Stein factor $(N - 2)/S$, with $S = \sum_{i} z_{i}^{2}$, is an unbiased estimate of the prior shrinkage term $1/(A + 1)$, formed from the marginal distribution of the pooled data. This parametric empirical Bayes viewpoint links the James-Stein Estimator to hierarchical Bayesian models (where the prior gets its own prior) and to regularization and partial pooling: shrinkage, ridge-type penalties, and random-effects models all “borrow strength” by learning a prior from many parallel cases. Efron frames it as “learning from the experience of others.”
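The claim that $(N - 2)/S$ estimates $1/(A + 1)$ can be checked by simulation. The sketch below assumes the simplest setting named above ($M = 0$, $σ_{0}^{2} = 1$). The function name and the simulation sizes are illustrative choices, not taken from the source. Because $S/(A + 1)$ follows a $χ^{2}_{N}$ distribution and $E\{1/χ^{2}_{N}\} = 1/(N - 2)$, the simulated average should land very close to $1/(A + 1)$.

```python
import numpy as np


def js_factor_check(A, N=50, reps=20_000, seed=0):
    """Average (N - 2)/S over simulated data sets and return it with 1/(A + 1).

    Assumed model (M = 0, sigma_0^2 = 1):
        mu_i ~ N(0, A),  z_i | mu_i ~ N(mu_i, 1),  so marginally z_i ~ N(0, A + 1).
    """
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, np.sqrt(A), size=(reps, N))  # true means drawn from the prior
    z = mu + rng.normal(0.0, 1.0, size=(reps, N))     # one noisy observation per case
    S = np.sum(z ** 2, axis=1)                         # S = sum_i z_i^2, one per data set
    return np.mean((N - 2) / S), 1.0 / (A + 1.0)


est, target = js_factor_check(A=1.0)
print(f"mean of (N - 2)/S = {est:.4f}   1/(A + 1) = {target:.4f}")
```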

Overview

The defining feature of empirical Bayes is that the prior is not assumed but estimated. Two equivalent readings of the James-Stein Estimator:

Frequentist: a clever biased estimator that dominates the MLE for $N \geq 3$ (see Stein’s Paradox and Risk Dominance).
Empirical Bayes: the Bayes rule $\hat{μ}^{(\mathrm{Bayes})} = \left(1 - \frac{1}{A + 1}\right) z$ with the unknown hyperparameter replaced by a data estimate.

Both readings describe the same shrinkage. The EB reading is the bridge to modern hierarchical modeling and regularization.

Main Content

Parametric empirical Bayes: estimate the hyperparameter ^parametric-eb

Under $μ_{i} \sim N(M, A)$ and $z_{i} \mid μ_{i} \sim N(μ_{i}, σ_{0}^{2})$, the Bayes posterior mean (eqs. 1.32-1.34) is
$\hat{μ}_{i}^{(\mathrm{Bayes})} = M + B (z_{i} - M), \quad B = \frac{A}{A + σ_{0}^{2}}.$
The hyperparameters $(M, A)$ are unknown, so we estimate them from the marginal distribution $z_{i} \sim N(M, A + σ_{0}^{2})$. First, $\hat{M} = \bar{z}$. Second, with $S = \sum_{i} (z_{i} - \bar{z})^{2}$, the shrinkage term comes from $E\{(N - 3) σ_{0}^{2} / S\} = 1 - B$. This holds because $S/(A + σ_{0}^{2}) \sim χ^{2}_{N-1}$ and $E\{1/χ^{2}_{N-1}\} = 1/(N - 3)$. Plugging in gives the EB estimator (eq. 1.35)
$\hat{μ}_{i}^{(JS)} = \bar{z} + \left(1 - \frac{(N - 3) σ_{0}^{2}}{S}\right) (z_{i} - \bar{z}).$
The shrinkage factor is learned from the data, not specified in advance; this is the essence of parametric EB.
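A minimal NumPy sketch of the plug-in estimator above follows. It assumes one known $σ_{0}^{2}$ shared by all cases. The function name `james_stein_eb` and the simulation settings are illustrative assumptions. The sketch follows eq. 1.35 as written, without the positive-part truncation that some implementations add when $S$ is small.

```python
import numpy as np


def james_stein_eb(z, sigma0_sq):
    """EB / James-Stein estimate shrinking toward the grand mean (eq. 1.35 form).

    M is estimated by z-bar and B by 1 - (N - 3) * sigma0_sq / S, with
    S = sum_i (z_i - z-bar)^2. No positive-part truncation is applied.
    """
    z = np.asarray(z, dtype=float)
    N = z.size
    if N < 4:
        raise ValueError("need N >= 4 so that N - 3 > 0")
    zbar = z.mean()
    S = np.sum((z - zbar) ** 2)
    B_hat = 1.0 - (N - 3) * sigma0_sq / S  # estimate of B = A / (A + sigma0^2)
    return zbar + B_hat * (z - zbar)


# Toy check: zbar = 2 and S = 10, so B_hat = 1 - 2/10 = 0.8,
# giving the values 0.4, 1.2, 2.0, 2.8, 3.6
print(james_stein_eb([0.0, 1.0, 2.0, 3.0, 4.0], sigma0_sq=1.0))

# Simulated comparison with the MLE (z itself) under an assumed prior
rng = np.random.default_rng(1)
N, M, A, sigma0_sq = 200, 5.0, 0.5, 1.0
mu = rng.normal(M, np.sqrt(A), size=N)
z = rng.normal(mu, np.sqrt(sigma0_sq))
mu_eb = james_stein_eb(z, sigma0_sq)
print("MLE squared error:", np.sum((z - mu) ** 2))
print("EB  squared error:", np.sum((mu_eb - mu) ** 2))
```

In this assumed setting the true $B$ is $0.5/1.5 = 1/3$, so the EB squared error should sit far below the MLE's.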

Shrinkage toward a regression line (borrowing strength via covariates) ^reg-shrinkage

The prior mean can itself depend on covariates: $μ_{i} \overset{\mathrm{ind}}{\sim} N(M_{0} + M_{1} \cdot \mathrm{age}_{i}, A)$ (eq. 1.38). The EB estimate (eq. 1.39) then shrinks toward the fitted regression line $\hat{μ}_{i}^{(\mathrm{reg})} = \hat{M}_{0} + \hat{M}_{1} \cdot \mathrm{age}_{i}$:
$\hat{μ}_{i}^{(JS)} = \hat{μ}_{i}^{(\mathrm{reg})} + \left(1 - \frac{(N - 4) σ_{0}^{2}}{S}\right) (z_{i} - \hat{μ}_{i}^{(\mathrm{reg})}), \quad S = \sum_{i} (z_{i} - \hat{μ}_{i}^{(\mathrm{reg})})^{2}.$
The constant is $N - 4$ rather than $N - 3$ because fitting two coefficients leaves $S$ with $N - 2$ degrees of freedom.
Tukey’s phrase “borrowing strength” captures this: each case is improved by the experience of all the others, here channeled through a regression fit. This is the conceptual ancestor of random-effects regression.
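Here is a sketch of the regression version. It assumes a single covariate `x` (standing in for age) and a known common $σ_{0}^{2}$. The function name and the toy numbers are hypothetical. The line is fitted by ordinary least squares, and the residuals are then shrunk by the estimated factor.

```python
import numpy as np


def js_toward_regression(z, x, sigma0_sq):
    """Shrink z toward an OLS line in the covariate x (eq. 1.39 form).

    The prior mean M0 + M1 * x_i is fitted by least squares; S is the residual
    sum of squares, and N - 4 reflects the two fitted coefficients.
    """
    z = np.asarray(z, dtype=float)
    x = np.asarray(x, dtype=float)
    N = z.size
    if N < 5:
        raise ValueError("need N >= 5 so that N - 4 > 0")
    X = np.column_stack([np.ones(N), x])          # columns: intercept, covariate
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)  # (M0_hat, M1_hat)
    mu_reg = X @ coef                             # fitted regression line
    S = np.sum((z - mu_reg) ** 2)
    B_hat = 1.0 - (N - 4) * sigma0_sq / S
    return mu_reg + B_hat * (z - mu_reg)


# Toy data: the OLS line is exactly 1 + 2x, residuals (1, -1, 0, 0, -1, 1), S = 4,
# so B_hat = 1 - 2/4 = 0.5 and each residual is halved
x = np.arange(6.0)
z = np.array([2.0, 2.0, 5.0, 7.0, 8.0, 12.0])
print(js_toward_regression(z, x, sigma0_sq=1.0))
```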

Link to hierarchical Bayes and regularization ^hierarchical-link

Hierarchical Bayes: instead of plugging in point estimates $(\hat{M}, \hat{A})$, place a hyperprior on $(M, A)$ and integrate over it. EB is the “plug-in” approximation to a fully Bayesian hierarchical model. It ignores uncertainty in the estimated prior, which EB confidence intervals must later correct for.

Regularization: the factor $B < 1$ acts mathematically as a ridge penalty that pulls estimates toward a center. Minimizing $\sum_{i} (z_{i} - μ_{i})^{2} + λ \sum_{i} (μ_{i} - \bar{z})^{2}$ gives $\hat{μ}_{i} = \bar{z} + \frac{1}{1 + λ}(z_{i} - \bar{z})$. This is the Bayes rule with $B = 1/(1 + λ)$, that is, $λ = σ_{0}^{2}/A$, and EB estimates this penalty from the data. Shrinkage trades a little bias for a large variance reduction. That is the bias-variance tradeoff underlying Overfitting and Information Criteria.
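The equivalence can be checked numerically. This sketch assumes known $A$ and $σ_{0}^{2}$; in EB one would plug in estimates, for example $\hat{λ} = (1 - \hat{B})/\hat{B}$. It solves the penalized problem with a generic least-squares routine and compares the result with the Bayes shrinkage formula. The function name and the numbers are illustrative.

```python
import numpy as np


def penalized_fit(z, lam):
    """Minimize sum (z_i - mu_i)^2 + lam * sum (mu_i - zbar)^2 by least squares.

    The penalty is encoded as extra pseudo-observations sqrt(lam) * mu_i ~ sqrt(lam) * zbar,
    so a generic least-squares solver returns the penalized minimizer.
    """
    z = np.asarray(z, dtype=float)
    N = z.size
    I = np.eye(N)
    X_aug = np.vstack([I, np.sqrt(lam) * I])
    y_aug = np.concatenate([z, np.sqrt(lam) * np.full(N, z.mean())])
    mu, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return mu


z = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
A, sigma0_sq = 2.0, 1.0
lam = sigma0_sq / A            # penalty implied by the prior
B = A / (A + sigma0_sq)        # Bayes shrinkage factor, equal to 1 / (1 + lam)
mu_pen = penalized_fit(z, lam)
mu_bayes = z.mean() + B * (z - z.mean())
print(mu_pen)
print(mu_bayes)
```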

Robbins (nonparametric EB): the same “estimate the prior from the marginal” idea, but recovering the entire posterior-mean curve rather than one hyperparameter (see Robbins Formula and Poisson Empirical Bayes).
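In the Poisson case, Robbins' formula reads $E\{μ \mid z\} = (z + 1) f(z + 1)/f(z)$, where $f$ is the marginal distribution of the counts. Replacing $f$ by observed frequencies gives a fully nonparametric estimate. The sketch below checks it against a gamma prior, chosen only because its posterior mean $(z + k)θ/(1 + θ)$ is known in closed form. The prior, the sample size and the names are assumptions for illustration.

```python
import numpy as np


def robbins(counts):
    """Robbins' estimate of E[mu | z] for z = 0, ..., len(counts) - 2.

    counts[z] is the number of cases observed with value z; the marginal f(z)
    is replaced by these empirical frequencies (the total cancels in the ratio).
    """
    counts = np.asarray(counts, dtype=float)
    z = np.arange(counts.size - 1)
    with np.errstate(divide="ignore", invalid="ignore"):  # empty cells give inf/nan
        return (z + 1) * counts[1:] / counts[:-1]


rng = np.random.default_rng(2)
k, theta = 2.0, 1.0                                # assumed gamma prior: shape k, scale theta
mu = rng.gamma(shape=k, scale=theta, size=200_000)
z = rng.poisson(mu)                                # z_i | mu_i ~ Poisson(mu_i)
est = robbins(np.bincount(z))
exact = (np.arange(3) + k) * theta / (1 + theta)   # gamma-Poisson posterior mean
print(est[:3], exact)
```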

Schematic: case 1 learning from the others (Efron Fig. 1.1) ^learning-from-others

The $N - 1$ “other” cases are observed first, yielding estimates $(\hat{M}, \hat{A})$ of the prior parameters. The estimated prior $N(\hat{M}, \hat{A})$ then supplements the direct evidence $z_{1} \sim N(μ_{1}, 1)$ for estimating $μ_{1}$. (In practice $\hat{μ}_{1}^{(JS)}$ uses $z_{1}$ along with the others, which improves accuracy.) “Which others?” is the central design question: with thousands of parallel cases, the borrowed experience is vast.
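A sketch of the schematic follows: fit the prior to the other $N - 1$ cases only, then apply the Bayes rule to case $i$. Here $(\hat{M}, \hat{A})$ come from simple moment matching on the marginal $N(M, A + σ_{0}^{2})$, with $\hat{A}$ truncated at zero. This moment estimator and the function name are assumptions for illustration, not the James-Stein construction itself.

```python
import numpy as np


def learn_from_others(z, i, sigma0_sq=1.0):
    """Estimate mu_i from a prior N(M_hat, A_hat) fitted to the other N - 1 cases."""
    z = np.asarray(z, dtype=float)
    others = np.delete(z, i)                           # drop case i
    M_hat = others.mean()
    A_hat = max(others.var(ddof=1) - sigma0_sq, 0.0)   # marginal variance is A + sigma0^2
    B_hat = A_hat / (A_hat + sigma0_sq)
    return M_hat + B_hat * (z[i] - M_hat)              # Bayes rule with the estimated prior


z = np.array([10.0, 0.0, 1.0, 2.0, 3.0, 4.0])
# Others: mean 2, sample variance 2.5, so A_hat = 1.5 and B_hat = 0.6;
# case 0 is estimated as 2 + 0.6 * (10 - 2) = 6.8
print(learn_from_others(z, i=0))
```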

Examples

Baseball: shrinkage = estimated prior in action (Efron Table 1.1)

Take $\bar{z} = 0.265$ and $σ_{0}^{2} = \bar{z}(1 - \bar{z})/45$, the binomial variance of an average over 45 at-bats. The estimated shrinkage factor $1 - (N - 3) σ_{0}^{2} / S$ pulls all 18 players toward $0.265$. Clemente ($z = .400$) is shrunk to $\hat{μ}^{(JS)} = .294$; Alvis ($z = .156$) is pulled up to $.242$. The shrinkage factor was never assumed. It was estimated from how spread out the 18 averages are (the marginal $S$). The result is a prediction-error ratio of $0.28$ relative to the MLE.
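The quoted numbers can be cross-checked with a few lines of arithmetic. The 18 individual averages are not reproduced in this note. The sketch therefore infers the shrinkage factor from Clemente's two rounded values and applies it to Alvis, so the implied $S$ is only approximate.

```python
import math

zbar, at_bats, N = 0.265, 45, 18
sigma0_sq = zbar * (1 - zbar) / at_bats   # binomial variance of a 45-at-bat average
sigma0 = math.sqrt(sigma0_sq)             # about 0.066

# Shrinkage factor implied by Clemente: .294 = zbar + c * (.400 - zbar)
c = (0.294 - zbar) / (0.400 - zbar)
# The same factor applied to Alvis (z = .156)
alvis_js = zbar + c * (0.156 - zbar)
# Spread of the averages implied by c = 1 - (N - 3) * sigma0^2 / S
S_implied = (N - 3) * sigma0_sq / (1 - c)

print(f"sigma0 = {sigma0:.3f}, c = {c:.3f}, Alvis JS = {alvis_js:.3f}, S = {S_implied:.4f}")
```

Recovering Alvis's $.242$ from Clemente's row confirms that both players share one estimated shrinkage factor.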

Limited translation = protecting against a misestimated prior

Because EB plugs in a single estimated prior, outliers such as Clemente can be over-shrunk. The limited-translation estimator $\hat{μ}_{i}^{(D)}$ (eq. 1.37) keeps each estimate within $Dσ_{0}$ of its own $z_{i}$. With $D = 1$ the cap is $σ_{0} = 0.066$, so Clemente's prediction becomes $\hat{μ}_{1}^{(D)} = 0.334$ rather than $\hat{μ}_{1}^{(JS)} = 0.294$. The estimator loses only about 10% of the overall JS advantage. This is a pragmatic acknowledgment that the estimated prior is imperfect for unusual cases.
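Below is a sketch of the limited-translation rule, written as a clip of the JS estimate to the interval $z_{i} \pm Dσ_{0}$. This matches the one-sided form of eq. 1.37 whenever the JS estimate lies between $\bar{z}$ and $z_{i}$, which holds when the estimated shrinkage factor is between 0 and 1. The inputs are the rounded values quoted in the text. The Alvis result is computed here by the same rule, not taken from the source.

```python
import math

import numpy as np


def limited_translation(z, mu_js, sigma0, D=1.0):
    """Keep each JS estimate within D * sigma0 of its own observation z_i."""
    z = np.asarray(z, dtype=float)
    mu_js = np.asarray(mu_js, dtype=float)
    return np.clip(mu_js, z - D * sigma0, z + D * sigma0)


sigma0 = math.sqrt(0.265 * (1 - 0.265) / 45)   # about 0.066, as in the baseball example
z = [0.400, 0.156]                              # Clemente, Alvis (observed averages)
mu_js = [0.294, 0.242]                          # their James-Stein estimates
print(np.round(limited_translation(z, mu_js, sigma0), 3))
```

Clemente's estimate is pulled back to $.400 - σ_{0} \approx .334$. By the same cap, Alvis's estimate is held to about $.156 + σ_{0} \approx .222$.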

Connections

James-Stein Estimator — the shrinkage estimator interpreted here as estimated-prior Bayes.
Robbins Formula and Poisson Empirical Bayes — the nonparametric form of estimating the prior.
Stein’s Paradox and Risk Dominance — why the estimated-prior shrinkage still wins frequentist-wise.
Hierarchical Models — the fully Bayesian version that integrates over the prior’s uncertainty.
Partial Pooling as Multiple Comparisons Correction — partial pooling is shrinkage with an estimated prior.
Overfitting and Information Criteria — shrinkage/regularization and the bias-variance tradeoff.
Empirical Bayes - Overview — the umbrella program.
