James-Stein Estimator
Summary
The James-Stein (JS) estimator shrinks the vector of observed values $z$ toward a center by an empirically estimated factor: $\hat\mu^{(JS)} = \left(1 - \frac{N-2}{S}\right) z$ with $S = \|z\|^2$. It is exactly the Bayes estimator $\hat\mu^{(Bayes)} = \left(1 - \frac{1}{A+1}\right) z$ with the unknown shrinkage term $1/(A+1)$ replaced by its unbiased estimate $(N-2)/S$ formed from the marginal distribution of $z$. The general “shrink toward the grand mean” form pulls each $z_i$ toward $\bar z$. This is empirical Bayes in action: the prior is estimated from the $N$ parallel cases.
Overview
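Consider $N$ parallel normal estimation problems (Efron eq. 1.7):
$$\mu_i \sim \mathcal{N}(0, A) \quad\text{and}\quad z_i \mid \mu_i \sim \mathcal{N}(\mu_i, 1), \qquad i = 1, 2, \dots, N,$$
with total squared-error loss $L(\mu, \hat\mu) = \|\hat\mu - \mu\|^2 = \sum_{i=1}^N (\hat\mu_i - \mu_i)^2$ and risk $R(\mu) = E_\mu\{L(\mu, \hat\mu)\}$.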
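The obvious estimator, used implicitly in every regression and ANOVA, is the MLE $\hat\mu^{(MLE)} = z$, with constant risk $R^{(MLE)}(\mu) = N$ for every $\mu$. If the prior variance $A$ were known, Bayes rule (eq. 1.10) gives posterior $\mu_i \mid z_i \sim \mathcal{N}(B z_i, B)$ with $B = A/(A+1)$, and the Bayes estimator (eq. 1.16) is
$$\hat\mu^{(Bayes)} = B z = \left(1 - \frac{1}{A+1}\right) z.$$
With $A = 1$ this shrinks the MLE halfway toward $0$. But if $A$ is unknown we cannot use it: this is precisely where empirical Bayes enters (see Empirical Bayes - Overview).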
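As a minimal sketch of the known-prior case, assuming the model $\mu_i \sim \mathcal{N}(0, A)$, $z_i \mid \mu_i \sim \mathcal{N}(\mu_i, 1)$ with $A$ given, the Bayes rule is a single multiplicative shrink. The function name `bayes_estimate` and the parameter name `a` are illustrative choices, not Efron's notation.

```python
def bayes_estimate(z, a):
    """Posterior-mean estimate B * z with B = 1 - 1/(a + 1).

    Assumes mu_i ~ N(0, a) and z_i | mu_i ~ N(mu_i, 1), with `a` known.
    """
    if a < 0:
        raise ValueError("prior variance a must be non-negative")
    b = 1.0 - 1.0 / (a + 1.0)  # shrinkage factor B = a / (a + 1)
    return [b * zi for zi in z]


# With a = 1 the factor is 1/2, so every observation is halved.
print(bayes_estimate([2.0, -4.0, 0.5], a=1.0))  # [1.0, -2.0, 0.25]
```

The sketch only works when `a` is supplied, which is exactly the quantity that is unavailable in practice.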
Main Content
Estimating the shrinkage factor from the marginal ^marginal-estimate
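Integrating the prior out, the marginal distribution of $z$ (eq. 1.20) is
$$z_i \sim \mathcal{N}(0, A + 1) \quad\text{independently for } i = 1, \dots, N.$$
Hence $S = \|z\|^2 \sim (A+1)\,\chi^2_N$, which yields the key unbiased estimate of the shrinkage term:
$$E\left\{\frac{N-2}{S}\right\} = \frac{1}{A+1}.$$
The unknown Bayes quantity $1/(A+1)$ is thus estimable directly from the pooled data; see Robbins Formula and Poisson Empirical Bayes for the analogous nonparametric move.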
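The unbiasedness claim rests on a standard fact about the inverse moment of a chi-square variable. A short derivation, assuming $N > 2$ and writing $S = \|z\|^2 = \sum_i z_i^2$:

$$
\begin{aligned}
z_i \sim \mathcal{N}(0, A+1) \text{ independently}
  &\;\Longrightarrow\; \frac{S}{A+1} \sim \chi^2_N, \\
E\left\{\frac{1}{\chi^2_N}\right\} = \frac{1}{N-2}
  &\;\Longrightarrow\; E\left\{\frac{N-2}{S}\right\}
   = \frac{N-2}{A+1}\cdot\frac{1}{N-2} = \frac{1}{A+1}.
\end{aligned}
$$

The condition $N > 2$ is what keeps the inverse moment finite, which is why the shrink-toward-zero estimator needs at least three parallel problems.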
James-Stein estimator (shrink toward 0) ^js-estimator
Substituting the unbiased estimate $(N-2)/S$ for $1/(A+1)$ in the Bayes rule gives the James-Stein estimator (Efron eq. 1.23):
$$\hat\mu^{(JS)} = \left(1 - \frac{N-2}{S}\right) z, \qquad S = \|z\|^2.$$
The name “empirical Bayes” is apt: the Bayes estimator (1.16) is itself empirically estimated from the data. This is only possible because $N$ similar problems are under simultaneous consideration.
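The substitution can be sketched in a few lines of plain Python, assuming unit noise variance and shrinkage toward $0$, with $S$ the sum of squared observations. The name `james_stein` is illustrative. The sketch does not guard against a negative factor, which occurs whenever $S < N - 2$.

```python
def james_stein(z):
    """James-Stein estimate (1 - (N - 2)/S) * z, shrinking toward 0.

    Assumes z_i | mu_i ~ N(mu_i, 1); S is the sum of squared z_i.
    """
    n = len(z)
    if n < 3:
        raise ValueError("shrinkage toward 0 needs N >= 3")
    s = sum(zi * zi for zi in z)  # S = ||z||^2
    if s == 0.0:
        raise ValueError("S is zero, so the factor is undefined")
    factor = 1.0 - (n - 2) / s  # estimate of B = 1 - 1/(A + 1)
    return [factor * zi for zi in z]


# Here N = 5 and S = 18, so the factor is 1 - 3/18 = 5/6.
z = [3.0, -1.0, 2.0, 0.0, -2.0]
js = james_stein(z)
print(js)
```

Every coordinate is multiplied by the same data-driven factor, so the estimate for one case depends on all the others through $S$.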
James-Stein estimator (shrink toward the grand mean) ^js-grand-mean
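We need not shrink toward $0$. Starting from the more general prior $\mu_i \sim \mathcal{N}(M, A)$, $z_i \mid \mu_i \sim \mathcal{N}(\mu_i, \sigma_0^2)$ (eq. 1.32), the Bayes rule $\hat\mu_i^{(Bayes)} = M + B(z_i - M)$ with $B = A/(A + \sigma_0^2)$ has empirical Bayes form (eq. 1.35):
$$\hat\mu_i^{(JS)} = \bar z + \left(1 - \frac{(N-3)\,\sigma_0^2}{S}\right)(z_i - \bar z),$$
with $\bar z = \sum_i z_i / N$ and $S = \sum_i (z_i - \bar z)^2$. Each $z_i$ is pulled toward the grand mean $\bar z$ by a data-estimated factor; the risk-dominance theorem holds now for $N \ge 4$ (one degree of freedom is spent estimating $M$).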
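A minimal sketch of the grand-mean version, assuming a common known noise variance $\sigma_0^2$ and taking $S$ as the sum of squared deviations from $\bar z$. The names `james_stein_grand_mean` and `sigma0_sq` are illustrative.

```python
def james_stein_grand_mean(z, sigma0_sq=1.0):
    """Shrink each z_i toward the grand mean z_bar.

    Uses the factor 1 - (N - 3) * sigma0_sq / S, where S is the sum of
    squared deviations of the z_i from z_bar.
    """
    n = len(z)
    if n < 4:
        raise ValueError("shrinkage toward the grand mean needs N >= 4")
    z_bar = sum(z) / n
    s = sum((zi - z_bar) ** 2 for zi in z)
    if s == 0.0:
        raise ValueError("all z_i are equal, so the factor is undefined")
    factor = 1.0 - (n - 3) * sigma0_sq / s
    return [z_bar + factor * (zi - z_bar) for zi in z]


# Here z_bar = 3 and S = 10, so the factor is 1 - 2/10 = 0.8.
print(james_stein_grand_mean([1.0, 2.0, 3.0, 4.0, 5.0]))
```

The grand mean itself is left unchanged by the shrinkage; only the deviations from it are scaled.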
Overall Bayes risk and the modest EB penalty ^js-risk
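The overall Bayes risk of $\hat\mu^{(Bayes)}$ is $R_A^{(Bayes)} = N\,\frac{A}{A+1}$, versus $R_A^{(MLE)} = N$. The James-Stein estimator pays only a small penalty for not knowing $A$ (eqs. 1.24-1.25):
$$R_A^{(JS)} = N\,\frac{A}{A+1} + \frac{2}{A+1}, \qquad \frac{R_A^{(JS)}}{R_A^{(Bayes)}} = 1 + \frac{2}{N A}.$$
For $N = 10$, $A = 1$, $R_A^{(JS)}$ is only 20% greater than the true Bayes risk, so almost all the Bayesian savings are recovered without knowing the prior.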
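The 20% figure can be checked by direct arithmetic. The sketch below assumes the risk expressions $R_A^{(Bayes)} = NA/(A+1)$, $R_A^{(JS)} = NA/(A+1) + 2/(A+1)$ and $R_A^{(MLE)} = N$; the function names are illustrative.

```python
def bayes_risk(n, a):
    """Overall Bayes risk N * A / (A + 1) when A is known."""
    return n * a / (a + 1.0)


def js_risk(n, a):
    """Overall James-Stein risk: the Bayes risk plus 2 / (A + 1)."""
    return n * a / (a + 1.0) + 2.0 / (a + 1.0)


def mle_risk(n):
    """Risk of the MLE, constant in mu and in A."""
    return float(n)


n, a = 10, 1.0
print(bayes_risk(n, a), js_risk(n, a), mle_risk(n))  # 5.0 6.0 10.0
print(js_risk(n, a) / bayes_risk(n, a))              # 1.2
```

The penalty ratio $1 + 2/(NA)$ shrinks as either the number of cases or the prior variance grows.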
Limited-translation compromise ^limited-translation
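To protect genuinely unusual cases from being over-shrunk, Efron’s limited-translation estimator (eq. 1.37) follows the JS estimate but never deviates more than $D\sigma_0$ from $z_i$:
$$\hat\mu_i^{(D)} = \begin{cases} \max\left(\hat\mu_i^{(JS)},\; z_i - D\sigma_0\right) & \text{for } z_i > \bar z, \\ \min\left(\hat\mu_i^{(JS)},\; z_i + D\sigma_0\right) & \text{for } z_i \le \bar z. \end{cases}$$
Taking $D = 1$ in the baseball data costs only ~10% of the overall JS advantage while sharply limiting damage to outliers like Clemente.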
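The limiting step can be sketched as a clamp of each JS estimate to the interval $[z_i - D\sigma_0,\; z_i + D\sigma_0]$. This matches the two-case definition whenever the JS estimate lies on the grand-mean side of $z_i$, which the grand-mean formula guarantees. The function name `limited_translation` is illustrative; the inputs are the four players' values from the baseball table in the Examples section, with $\sigma_0$ taken from the binomial variance at $\bar z = .265$.

```python
import math


def limited_translation(z, js, d, sigma0):
    """Clamp each JS estimate to within d * sigma0 of its own z_i."""
    if d < 0 or sigma0 < 0:
        raise ValueError("d and sigma0 must be non-negative")
    bound = d * sigma0
    return [min(max(ji, zi - bound), zi + bound) for zi, ji in zip(z, js)]


z_bar = 0.265
sigma0 = math.sqrt(z_bar * (1 - z_bar) / 45)  # about 0.066

z = [0.400, 0.378, 0.178, 0.156]   # Clemente, F. Robinson, Munson, Alvis
js = [0.294, 0.289, 0.247, 0.242]  # their JS estimates

limited = limited_translation(z, js, d=1.0, sigma0=sigma0)
for zi, ji, li in zip(z, js, limited):
    print(f"z={zi:.3f}  JS={ji:.3f}  limited={li:.3f}")
```

With $D = 1$, Clemente's estimate is held at $.400 - \sigma_0 \approx .334$ rather than being pulled all the way to $.294$.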
Examples
Baseball batting averages (Efron Table 1.1, $N = 18$)
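Early-1970-season batting averages $z_i$ (hits/45 at-bats) predict true season averages $\mu_i$. With grand average $\bar z = .265$ and $\sigma_0^2 = \bar z (1 - \bar z)/45$ (binomial variance), the JS estimates (1.35) shrink each player toward $\bar z$:

| Player | hits/AB | $z_i$ (MLE) | $\mu_i$ (true) | $\hat\mu_i^{(JS)}$ |
| --- | --- | --- | --- | --- |
| Clemente | 18/45 | .400 | .346 | .294 |
| F. Robinson | 17/45 | .378 | .298 | .289 |
| Munson | 8/45 | .178 | .316 | .247 |
| Alvis | 7/45 | .156 | .200 | .242 |
| Grand Average | | .265 | .265 | .265 |

The ratio of total prediction errors is
$$\frac{\sum_{i=1}^{18} \left(\hat\mu_i^{(JS)} - \mu_i\right)^2}{\sum_{i=1}^{18} \left(\hat\mu_i^{(MLE)} - \mu_i\right)^2} = 0.28,$$
a roughly 3.5x accuracy gain for the empirical Bayes estimates. (The $z_i$ are binomial here, violating the exact normal theorem conditions, but the JS effect is quite insensitive to the model.)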
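As a consistency check on the listed rows, the shrinkage factor can be backed out of the table by solving $\hat\mu_i^{(JS)} = \bar z + c\,(z_i - \bar z)$ for $c$. The sketch below uses only the four players shown (not the full table), so it cannot reproduce the prediction-error ratio; the helper name `implied_factor` is illustrative.

```python
# Rows copied from the table: (player, z_i, JS estimate).
rows = [
    ("Clemente", 0.400, 0.294),
    ("F. Robinson", 0.378, 0.289),
    ("Munson", 0.178, 0.247),
    ("Alvis", 0.156, 0.242),
]
z_bar = 0.265
sigma0_sq = z_bar * (1 - z_bar) / 45  # binomial variance, about 0.0043


def implied_factor(z, js, center):
    """Solve js = center + factor * (z - center) for factor."""
    return (js - center) / (z - center)


factors = {name: implied_factor(z, js, z_bar) for name, z, js in rows}
for name, f in factors.items():
    print(f"{name}: implied shrinkage factor {f:.3f}")
```

All four rows imply a factor of about 0.21, up to rounding in the three-decimal table entries, as expected when a single common factor is applied to every player.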
Regression-based shrinkage (kidney data)
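Combining covariate information with shrinkage (eq. 1.39), JS shrinks toward a fitted regression line $\hat\mu_i^{(reg)}$ rather than toward $\bar z$:
$$\hat\mu_i^{(JS)} = \hat\mu_i^{(reg)} + \left(1 - \frac{(N-4)\,\sigma_0^2}{S}\right)\left(z_i - \hat\mu_i^{(reg)}\right), \qquad S = \sum_i \left(z_i - \hat\mu_i^{(reg)}\right)^2.$$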
Connections
- Empirical Bayes - Overview: places JS in the broader EB program.
- Robbins Formula and Poisson Empirical Bayes: the nonparametric sibling; both estimate the prior from the marginal.
- Stein’s Paradox and Risk Dominance: the proof that JS dominates the MLE for $N \ge 3$.
- Empirical Bayes Interpretation of Shrinkage: why the data-based shrinkage factor amounts to an estimated prior; link to hierarchical Bayes.
- Partial Pooling as Multiple Comparisons Correction: shrinkage toward $\bar z$ as partial pooling.
- Hierarchical Models: the fully Bayesian generalization.