James-Stein Estimator

Summary

The James-Stein (JS) estimator shrinks the vector of observed values $z$ toward a center by an empirically estimated factor: $\hat\mu^{(\mathrm{JS})} = \left(1 - \frac{N-2}{S}\right) z$ with $S = \|z\|^2$. It is exactly the Bayes estimator with the unknown shrinkage term $1/(A+1)$ replaced by its unbiased estimate $(N-2)/S$, formed from the marginal distribution of $z$. The general “shrink toward the grand mean” form pulls each $z_i$ toward $\bar z$. This is empirical Bayes in action: the prior is estimated from the $N$ parallel cases.

Overview

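Consider $N$ parallel normal estimation problems (Efron eq. 1.7):

$$z_i \mid \mu_i \sim \mathcal{N}(\mu_i, 1) \quad \text{independently}, \qquad i = 1, \dots, N,$$

with total squared-error loss $L(\mu, \hat\mu) = \|\hat\mu - \mu\|^2 = \sum_{i=1}^{N} (\hat\mu_i - \mu_i)^2$ and risk $R(\mu) = E_\mu\{\|\hat\mu - \mu\|^2\}$.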

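The obvious estimator, used implicitly in every regression and ANOVA, is the MLE $\hat\mu^{(\mathrm{MLE})} = z$, with constant risk $R^{(\mathrm{MLE})}(\mu) = N$ for every $\mu$. If the prior $\mu_i \sim \mathcal{N}(0, A)$ were known, Bayes rule (eq. 1.10) would give the posterior $\mu_i \mid z_i \sim \mathcal{N}(B z_i, B)$ with $B = A/(A+1)$, and the Bayes estimator (eq. 1.16) is

$$\hat\mu^{(\mathrm{Bayes})} = B z = \left(1 - \frac{1}{A+1}\right) z.$$

With $A = 1$ this shrinks the MLE halfway toward $0$. But if $A$ is unknown we cannot use this rule; this is precisely where empirical Bayes enters (see Empirical Bayes - Overview).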
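A Monte Carlo sketch can make the MLE versus Bayes comparison concrete. It draws each $\mu_i$ from the prior $\mathcal{N}(0, A)$, then $z_i$, and averages the total squared error of the MLE $z$ and of the oracle Bayes rule $Bz$ (which is allowed to know $A$). The function name `simulate_risks`, the settings $N = 10$, $A = 1$, the replication count and the seed are illustrative assumptions; theory predicts averages near $N = 10$ for the MLE and $NA/(A+1) = 5$ for the Bayes rule.

```python
import random


def simulate_risks(N=10, A=1.0, reps=20000, seed=0):
    """Monte Carlo Bayes risks of the MLE and the oracle Bayes rule.

    Model (illustrative): mu_i ~ N(0, A), z_i | mu_i ~ N(mu_i, 1), i = 1..N.
    """
    rng = random.Random(seed)
    B = A / (A + 1.0)  # Bayes shrinkage factor; assumes A is known
    loss_mle = loss_bayes = 0.0
    for _ in range(reps):
        mu = [rng.gauss(0.0, A ** 0.5) for _ in range(N)]  # draw from the prior
        z = [m + rng.gauss(0.0, 1.0) for m in mu]          # unit-variance noise
        loss_mle += sum((zi - m) ** 2 for zi, m in zip(z, mu))
        loss_bayes += sum((B * zi - m) ** 2 for zi, m in zip(z, mu))
    return loss_mle / reps, loss_bayes / reps


r_mle, r_bayes = simulate_risks()
print(f"MLE risk ~ {r_mle:.2f} (theory 10), Bayes risk ~ {r_bayes:.2f} (theory 5)")
```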

Main Content

Estimating the shrinkage factor from the marginal ^marginal-estimate

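Integrating the prior out, the marginal distribution of $z_i$ (eq. 1.20) is

$$z_i \sim \mathcal{N}(0, A+1) \quad \text{independently}, \qquad i = 1, \dots, N.$$

Hence $S = \|z\|^2 \sim (A+1)\chi^2_N$, and since $E\{1/\chi^2_N\} = 1/(N-2)$ for $N \ge 3$, this yields the key unbiased estimate of the shrinkage term:

$$E\left\{\frac{N-2}{S}\right\} = \frac{1}{A+1}.$$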

The unknown Bayes quantity $1/(A+1)$ is thus estimable directly from the pooled data; see Robbins Formula and Poisson Empirical Bayes for the analogous nonparametric move.
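The unbiasedness claim can be checked numerically. This sketch assumes the marginal model $z_i \sim \mathcal{N}(0, A+1)$, computes $(N-2)/S$ with $S = \|z\|^2$ on many simulated data sets, and averages the results; the helper name `mean_shrinkage_estimate` and the defaults $N = 10$, $A = 1$ are hypothetical choices for illustration.

```python
import random


def mean_shrinkage_estimate(N=10, A=1.0, reps=50000, seed=1):
    """Average of (N - 2)/S over data sets drawn from z_i ~ N(0, A + 1)."""
    rng = random.Random(seed)
    sd = (A + 1.0) ** 0.5  # marginal standard deviation
    total = 0.0
    for _ in range(reps):
        S = sum(rng.gauss(0.0, sd) ** 2 for _ in range(N))  # S = ||z||^2
        total += (N - 2) / S
    return total / reps


est = mean_shrinkage_estimate()
print(est, 1 / (1.0 + 1))  # the two should be close; the target is 1/(A+1) = 0.5
```

Any single $(N-2)/S$ is noisy; only its average over data sets matches the Bayes quantity $1/(A+1)$.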

James-Stein estimator (shrink toward 0) ^js-estimator

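Substituting the unbiased estimate $(N-2)/S$ for $1/(A+1)$ in the Bayes rule gives the James-Stein estimator (Efron eq. 1.23):

$$\hat\mu^{(\mathrm{JS})} = \left(1 - \frac{N-2}{S}\right) z, \qquad S = \|z\|^2.$$
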
The name “empirical Bayes” is apt: the Bayes estimator (1.16) is itself empirically estimated from the data. This is only possible because similar problems are under simultaneous consideration.
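A minimal sketch of the estimator follows, assuming unit noise variance as in the model above, together with a Monte Carlo comparison against the MLE at one fixed, arbitrary mean vector $\mu$ (no prior is drawn here, so this estimates frequentist risk). `james_stein` and `frequentist_risks` are hypothetical helper names.

```python
import random


def james_stein(z):
    """Shrink z toward 0 by the factor 1 - (N - 2)/S, with S = ||z||^2.

    Assumes unit noise variance, as in the model above; needs N >= 3.
    """
    N = len(z)
    if N < 3:
        raise ValueError("James-Stein shrinkage needs N >= 3")
    S = sum(zi * zi for zi in z)
    factor = 1.0 - (N - 2) / S  # empirical stand-in for B = A/(A+1)
    return [factor * zi for zi in z]


def frequentist_risks(mu, reps=20000, seed=2):
    """Monte Carlo total squared-error risk of the MLE and JS at a fixed mu."""
    rng = random.Random(seed)
    loss_mle = loss_js = 0.0
    for _ in range(reps):
        z = [m + rng.gauss(0.0, 1.0) for m in mu]
        est = james_stein(z)
        loss_mle += sum((zi - m) ** 2 for zi, m in zip(z, mu))
        loss_js += sum((e - m) ** 2 for e, m in zip(est, mu))
    return loss_mle / reps, loss_js / reps


print(james_stein([3.0, 4.0, 0.0]))  # here S = 25, so the factor is 0.96
mu = [0.5 * i for i in range(-5, 5)]  # arbitrary fixed mean vector, N = 10
r_mle, r_js = frequentist_risks(mu)
print(r_mle, r_js)
```

In runs like this the JS average loss falls below the MLE's value of about $N = 10$, consistent with James-Stein dominating the MLE when $N \ge 3$. The raw factor $1 - (N-2)/S$ turns negative when $S < N - 2$; a common variant not discussed in this note, the positive-part James-Stein estimator, truncates it at 0.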

James-Stein estimator (shrink toward the grand mean) ^js-grand-mean

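We need not shrink toward $0$. Starting from the more general model $\mu_i \sim \mathcal{N}(M, A)$, $z_i \mid \mu_i \sim \mathcal{N}(\mu_i, \sigma_0^2)$ (eq. 1.32), the Bayes rule $\hat\mu_i^{(\mathrm{Bayes})} = M + B(z_i - M)$ with $B = A/(A + \sigma_0^2)$ has the empirical Bayes form (eq. 1.35):

$$\hat\mu_i^{(\mathrm{JS})} = \bar z + \left(1 - \frac{(N-3)\,\sigma_0^2}{S}\right)(z_i - \bar z),$$

with $\bar z = \sum_i z_i / N$ and $S = \sum_i (z_i - \bar z)^2$. Each $z_i$ is pulled toward the grand mean $\bar z$ by a data-estimated factor; the risk-dominance theorem (JS has smaller total risk than the MLE for every $\mu$) now requires $N \ge 4$, since one degree of freedom is spent estimating $M$.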
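The grand-mean form can be sketched the same way. This assumes the sampling variance $\sigma_0^2$ is known and shared by all $z_i$; the function name `james_stein_grand_mean` and the four-point example are illustrative only.

```python
def james_stein_grand_mean(z, sigma0_sq):
    """Shrink each z_i toward the grand mean zbar by a data-estimated factor.

    sigma0_sq is the known, common sampling variance of each z_i; needs N >= 4.
    """
    N = len(z)
    if N < 4:
        raise ValueError("grand-mean James-Stein needs N >= 4")
    zbar = sum(z) / N
    S = sum((zi - zbar) ** 2 for zi in z)  # spread around the grand mean
    factor = 1.0 - (N - 3) * sigma0_sq / S
    return [zbar + factor * (zi - zbar) for zi in z]


est = james_stein_grand_mean([1.0, 2.0, 3.0, 6.0], sigma0_sq=1.0)
print(est)
```

In the example $\bar z = 3$ and $S = 14$, so each deviation from 3 is multiplied by $1 - 1/14 = 13/14$. Because every deviation is scaled by the same factor, the estimates keep the same average as the data.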

Overall Bayes risk and the modest EB penalty ^js-risk

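The overall Bayes risk of $\hat\mu^{(\mathrm{MLE})}$ is $N$, versus $R^{(\mathrm{Bayes})} = N A/(A+1)$ for the Bayes rule. The James-Stein estimator pays only a small penalty for not knowing $A$ (eqs. 1.24-1.25):

$$R^{(\mathrm{JS})} = N\,\frac{A}{A+1} + \frac{2}{A+1}, \qquad \frac{R^{(\mathrm{JS})}}{R^{(\mathrm{Bayes})}} = 1 + \frac{2}{NA}.$$

For $N = 10$ and $A = 1$, $R^{(\mathrm{JS})} = 6$ is only 20% greater than the true Bayes risk $R^{(\mathrm{Bayes})} = 5$, while $R^{(\mathrm{MLE})} = 10$: James-Stein recovers 4 of the 5 units of Bayesian savings without knowing the prior.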
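The risk formulas are easy to tabulate, and a short simulation can confirm the James-Stein Bayes risk. This sketch assumes the zero-centered model $\mu_i \sim \mathcal{N}(0, A)$ with unit noise; `bayes_risks` and `mc_js_bayes_risk` are hypothetical names, and the replication count and seed are arbitrary.

```python
import random


def bayes_risks(N, A):
    """Exact overall Bayes risks under mu_i ~ N(0, A), z_i | mu_i ~ N(mu_i, 1)."""
    r_mle = float(N)
    r_bayes = N * A / (A + 1.0)
    r_js = N * A / (A + 1.0) + 2.0 / (A + 1.0)
    return r_mle, r_bayes, r_js


def mc_js_bayes_risk(N, A, reps=20000, seed=3):
    """Monte Carlo check of the JS Bayes risk: draw mu from the prior, then z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        mu = [rng.gauss(0.0, A ** 0.5) for _ in range(N)]
        z = [m + rng.gauss(0.0, 1.0) for m in mu]
        factor = 1.0 - (N - 2) / sum(zi * zi for zi in z)  # JS toward 0
        total += sum((factor * zi - m) ** 2 for zi, m in zip(z, mu))
    return total / reps


r_mle, r_bayes, r_js = bayes_risks(10, 1.0)
print(r_mle, r_bayes, r_js)                # 10.0 5.0 6.0
print(r_js / r_bayes)                      # 1.2, i.e. 1 + 2/(N*A)
print((r_mle - r_js) / (r_mle - r_bayes))  # 0.8 of the Bayesian savings
print(mc_js_bayes_risk(10, 1.0))           # should land near 6
```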

Limited-translation compromise ^limited-translation

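To protect genuinely unusual cases from being over-shrunk, Efron’s limited-translation estimator (eq. 1.37) follows the JS estimate but never lets it deviate more than $D\sigma_0$ from $z_i$:

$$\hat\mu_i^{(\mathrm{LT})} = \begin{cases} \max\left(\hat\mu_i^{(\mathrm{JS})},\ z_i - D\sigma_0\right) & \text{if } z_i > \bar z, \\ \min\left(\hat\mu_i^{(\mathrm{JS})},\ z_i + D\sigma_0\right) & \text{if } z_i \le \bar z. \end{cases}$$
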
Taking $D = 1$ in the baseball data costs only about 10% of the overall JS advantage while sharply limiting the damage to outliers such as Clemente.
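Because the JS shrinkage factor never exceeds 1, each JS value lies on the grand-mean side of its $z_i$, so the max/min rule amounts to clipping each JS value into the band $[z_i - D\sigma_0,\ z_i + D\sigma_0]$. The sketch below relies on that equivalence, takes $D = 1$ as an illustrative value, and reuses the four rows of the baseball table in the Examples section, with $\sigma_0$ computed from the rounded grand average .265; `limited_translation` is a hypothetical name.

```python
import math


def limited_translation(z, js, sigma0, D=1.0):
    """Limit each JS estimate to at most D * sigma0 away from its own z_i.

    The JS factor never exceeds 1, so each JS value lies on the grand-mean
    side of z_i; clipping into [z_i - D*sigma0, z_i + D*sigma0] then agrees
    with the max/min form of the limited-translation rule.
    """
    band = D * sigma0
    return [min(max(e, zi - band), zi + band) for zi, e in zip(z, js)]


# Illustration on the four table rows; sigma0 uses the rounded grand mean .265.
sigma0 = math.sqrt(0.265 * (1 - 0.265) / 45)  # about 0.0658
z = [0.400, 0.378, 0.178, 0.156]   # Clemente, F. Robinson, Munson, Alvis
js = [0.294, 0.289, 0.247, 0.242]  # their JS estimates from the table
lt = limited_translation(z, js, sigma0, D=1.0)
print([round(v, 3) for v in lt])  # [0.334, 0.312, 0.244, 0.222]
```

All four of these players hit the cap; Clemente's estimate, for instance, moves from .294 up to about .334.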

Examples

Baseball batting averages (Efron Table 1.1, $N = 18$)

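Early-1970-season batting averages $z_i = \text{hits}/45$ (first 45 at-bats) predict the true season averages $\mu_i$. With grand average $\bar z = .265$ and $\sigma_0^2 = \bar z(1 - \bar z)/45 \approx .00433$ (binomial variance), the JS estimates (1.35) shrink each player toward $\bar z$:

| Player | hits/AB | $\hat\mu_i^{(\mathrm{MLE})}$ | $\mu_i$ (true) | $\hat\mu_i^{(\mathrm{JS})}$ |
| --- | --- | --- | --- | --- |
| Clemente | 18/45 | .400 | .346 | .294 |
| F. Robinson | 17/45 | .378 | .298 | .289 |
| Munson | 8/45 | .178 | .316 | .247 |
| Alvis | 7/45 | .156 | .200 | .242 |
| Grand average | | .265 | .265 | .265 |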

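The ratio of total prediction errors is

$$\frac{\sum_{i=1}^{18} \left(\hat\mu_i^{(\mathrm{JS})} - \mu_i\right)^2}{\sum_{i=1}^{18} \left(\hat\mu_i^{(\mathrm{MLE})} - \mu_i\right)^2} \approx 0.28,$$

a roughly 3.5-fold accuracy gain for the empirical Bayes estimates. (The $z_i$ are binomial here, violating the theorem's exact normal assumptions, but the JS effect is quite insensitive to the model.)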
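The error-ratio arithmetic can be sketched on the rows printed in the table. Only 4 of the 18 players appear here, so this does not reproduce the full-table ratio; it only illustrates the calculation. The variable names are illustrative.

```python
# Four rows of the table: (MLE z_i, true mu_i, JS estimate)
rows = {
    "Clemente": (0.400, 0.346, 0.294),
    "F. Robinson": (0.378, 0.298, 0.289),
    "Munson": (0.178, 0.316, 0.247),
    "Alvis": (0.156, 0.200, 0.242),
}

# Total squared prediction error of each estimator against the truth
sse_mle = sum((mle - true) ** 2 for mle, true, _ in rows.values())
sse_js = sum((js - true) ** 2 for _, true, js in rows.values())
ratio = sse_js / sse_mle
print(round(ratio, 3))  # 0.307 on these four players only

sigma0_sq = 0.265 * (1 - 0.265) / 45  # binomial variance at the grand mean
print(round(sigma0_sq, 5))  # 0.00433
```

Munson's large MLE miss (.178 against a true .316) accounts for most of the MLE total on these rows.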

Regression-based shrinkage (kidney data)

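Combining covariate information with shrinkage (eq. 1.39), JS shrinks toward a fitted regression line rather than toward $\bar z$:

$$\hat\mu_i^{(\mathrm{JS})} = \hat M_0 + \hat M_1 x_i + \left(1 - \frac{(N-4)\,\sigma_0^2}{S}\right)\left(z_i - \hat M_0 - \hat M_1 x_i\right), \qquad S = \sum_i \left(z_i - \hat M_0 - \hat M_1 x_i\right)^2,$$

where $\hat M_0 + \hat M_1 x_i$ is the least-squares line of $z_i$ on the covariate $x_i$.
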
See Empirical Bayes Interpretation of Shrinkage.
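A sketch of shrinkage toward a fitted line, assuming a single covariate $x_i$, a known common variance $\sigma_0^2$, and the constant $N - 4$ for two estimated coefficients (following the $N - 2$, $N - 3$ pattern of the earlier forms). The data are synthetic, not the kidney data, and `js_toward_line` is a hypothetical name.

```python
def js_toward_line(x, z, sigma0_sq):
    """Shrink each z_i toward a least-squares line in the covariate x_i.

    Uses the factor 1 - (N - 4) * sigma0_sq / S, with S the residual sum of
    squares; N - 4 is an assumed constant for two fitted coefficients.
    """
    N = len(z)
    if N < 5:
        raise ValueError("need N >= 5 so that N - 4 is positive")
    xbar, zbar = sum(x) / N, sum(z) / N
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    m1 = sxz / sxx         # least-squares slope
    m0 = zbar - m1 * xbar  # least-squares intercept
    fit = [m0 + m1 * xi for xi in x]
    S = sum((zi - fi) ** 2 for zi, fi in zip(z, fit))  # residual sum of squares
    factor = 1.0 - (N - 4) * sigma0_sq / S
    return [fi + factor * (zi - fi) for zi, fi in zip(z, fit)]


# Synthetic illustration only (not the kidney data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [1.2, 1.8, 3.5, 3.9, 5.4, 5.8]
est = js_toward_line(x, z, sigma0_sq=0.1)
print([round(v, 3) for v in est])
```

The estimates keep the least-squares fit and scale every residual by the same factor, so a case far from the line is pulled back toward it in proportion to its residual.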

Connections

See Also