Choosing the Global Scale and Effective Nonzeros

Summary

The single most consequential hyperparameter of the horseshoe is the global scale $\tau$, and there was no principled way to set it. Piironen & Vehtari define the effective number of nonzero coefficients $m_{\text{eff}}$ and derive its prior mean as a function of $\tau$. Inverting that relation turns a prior guess $p_0$ for the number of relevant variables into a concrete scale $\tau_0$. The key lesson: $\tau$ must scale as $\sigma/\sqrt{n}$, so the popular default $\tau \sim \mathrm{C}^+(0,1)$ is usually a poor choice.

Overview

For a fixed $\tau$, the implied sparsity depends on the dimension $D$, the noise level $\sigma$, and the sample size $n$, so reasoning about a single $\kappa_j$ is not enough. The right object is the aggregate effective model size. The authors prefer full Bayesian inference for $\tau$ over plug-in estimates (the marginal-likelihood estimate can collapse to $\hat{\tau} = 0$ for very sparse vectors; cross-validation ignores posterior uncertainty), but the prior on $\tau$ still needs a sensible location, which $\tau_0$ supplies. This note builds directly on the shrinkage factor $\kappa_j$ of Global-Local Shrinkage Priors and feeds the slab logic of Regularized Horseshoe (Finnish Horseshoe).

Main Content

Effective number of nonzero coefficients

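For uncorrelated, zero-mean predictors with variances $s_j^2$, each coefficient's shrinkage factor is $\kappa_j = 1/(1 + n\sigma^{-2}\tau^2 s_j^2\lambda_j^2)$, and the effective number of nonzero coefficients is

$$m_{\text{eff}} = \sum_{j=1}^{D} (1 - \kappa_j).$$

When the $\kappa_j$ are near 0 or 1 (as they are for the horseshoe), $m_{\text{eff}}$ counts how many coefficients are active (unshrunk), giving an interpretable measure of effective model size.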
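A minimal sketch of the two definitions, assuming NumPy and known predictor variances; the helper names `shrinkage_factors` and `effective_nonzeros` and all numbers are hypothetical, chosen only to show how $m_{\text{eff}}$ reads off the $\kappa_j$.

```python
import numpy as np


def shrinkage_factors(tau, lam, n, sigma, s=1.0):
    """kappa_j = 1 / (1 + n * sigma^-2 * tau^2 * s_j^2 * lambda_j^2).

    Assumes uncorrelated, zero-mean predictors with variances s_j^2.
    """
    lam = np.asarray(lam, dtype=float)
    s = np.asarray(s, dtype=float)
    return 1.0 / (1.0 + n * tau**2 * s**2 * lam**2 / sigma**2)


def effective_nonzeros(kappa):
    """m_eff = sum_j (1 - kappa_j)."""
    return float(np.sum(1.0 - np.asarray(kappa, dtype=float)))


# Hypothetical values chosen so that sqrt(n) * tau / sigma = 1.
n, sigma, tau = 100, 1.0, 0.1
lam = np.array([10.0, 1.0, 0.01])             # large, unit and tiny local scales
kappa = shrinkage_factors(tau, lam, n, sigma)
m = effective_nonzeros(kappa)                 # about 0.99 + 0.5 + 0.0001
print(kappa.round(4), round(m, 3))
```

The half-shrunk middle coefficient contributes 0.5, which is why $m_{\text{eff}}$ is only a clean count when the $\kappa_j$ sit near 0 or 1.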

Prior mean and variance of $m_{\text{eff}}$

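Using $\mathrm{E}[\kappa_j \mid \tau,\sigma] = \frac{1}{1+a_j}$ and $\mathrm{Var}[\kappa_j \mid \tau,\sigma] = \frac{a_j}{2(1+a_j)^2}$, with $a_j = \sigma^{-1}\sqrt{n}\,\tau s_j$ and $\lambda_j \sim \mathrm{C}^+(0,1)$:

$$\mathrm{E}[m_{\text{eff}} \mid \tau,\sigma] = \sum_{j=1}^{D} \frac{a_j}{1+a_j}, \qquad \mathrm{Var}[m_{\text{eff}} \mid \tau,\sigma] = \sum_{j=1}^{D} \frac{a_j}{2(1+a_j)^2}.$$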

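For standardized predictors ($s_j = 1$) these simplify to

$$\mathrm{E}[m_{\text{eff}} \mid \tau,\sigma] = \frac{\tau\sigma^{-1}\sqrt{n}}{1+\tau\sigma^{-1}\sqrt{n}}\,D, \qquad \mathrm{Var}[m_{\text{eff}} \mid \tau,\sigma] = \frac{\tau\sigma^{-1}\sqrt{n}}{2\,(1+\tau\sigma^{-1}\sqrt{n})^2}\,D.$$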
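The moment formulas translate directly into a few lines of NumPy. This is a sketch assuming $\lambda_j \sim \mathrm{C}^+(0,1)$ and independent $\kappa_j$; `meff_prior_moments` is a hypothetical helper name and the setting is hypothetical. With standardized predictors and $\sqrt{n}\,\tau/\sigma = 1$, every $a_j = 1$, so the mean is $D/2$ and the variance $D/8$.

```python
import numpy as np


def meff_prior_moments(tau, n, sigma, s):
    """Prior mean and variance of m_eff given (tau, sigma), with lambda_j ~ C+(0, 1).

    a_j        = sqrt(n) * tau * s_j / sigma
    E[m_eff]   = sum_j a_j / (1 + a_j)
    Var[m_eff] = sum_j a_j / (2 * (1 + a_j)^2)
    """
    a = np.sqrt(n) * tau * np.asarray(s, dtype=float) / sigma
    mean = float(np.sum(a / (1.0 + a)))
    var = float(np.sum(a / (2.0 * (1.0 + a) ** 2)))
    return mean, var


# Hypothetical setting with standardized predictors (s_j = 1).
D, n, sigma, tau = 1000, 100, 1.0, 0.1
mean, var = meff_prior_moments(tau, n, sigma, np.ones(D))
print(mean, var)   # a_j = 1 for every j, so mean = D/2 and var = D/8
```

Passing a non-constant `s` gives the general sums; the standardized case is just the special case where every term is identical.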

Prior-guess formula for the global scale

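Solving $\mathrm{E}[m_{\text{eff}} \mid \tau,\sigma] = p_0$ for $\tau$ with standardized predictors gives the scale $\tau_0$ that places most of the prior mass for $m_{\text{eff}}$ near a prior guess $p_0$:

$$\tau_0 = \frac{p_0}{D - p_0}\,\frac{\sigma}{\sqrt{n}}.$$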
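The inversion is one line of arithmetic. The sketch below, with hypothetical helper names `tau0` and `expected_meff` and a hypothetical setting ($D = 1000$, $n = 100$, $\sigma = 1$, $p_0 = 10$), computes $\tau_0$ and plugs it back into the standardized-predictor mean to confirm the round trip.

```python
import math


def tau0(p0, D, n, sigma):
    """Prior-guess global scale: tau0 = p0 / (D - p0) * sigma / sqrt(n)."""
    if not 0 < p0 < D:
        raise ValueError("the prior guess must satisfy 0 < p0 < D")
    return p0 / (D - p0) * sigma / math.sqrt(n)


def expected_meff(tau, D, n, sigma):
    """E[m_eff | tau, sigma] for standardized predictors, lambda_j ~ C+(0, 1)."""
    a = tau * math.sqrt(n) / sigma
    return D * a / (1.0 + a)


t = tau0(10, 1000, 100, 1.0)
print(t)                                  # 10/990 * 1/10, roughly 0.00101
print(expected_meff(t, 1000, 100, 1.0))   # recovers p0 = 10 up to rounding
```

Note how small $\tau_0$ is compared with the unit scale of the $\mathrm{C}^+(0,1)$ default.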

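Either fix $\tau = \tau_0$ or, better, use $\tau_0$ as the scale of a weakly informative half-normal or half-Cauchy hyperprior, e.g. $\tau \sim \mathrm{N}^+(0,\tau_0^2)$ or $\tau \sim \mathrm{C}^+(0,\tau_0^2)$. Two structural facts follow: (i) $\tau$ must scale as $\sigma/\sqrt{n}$ to keep beliefs about $m_{\text{eff}}$ invariant to $\sigma$ and $n$; (ii) $\tau_0$ is typically far from $1$ or $\sigma$, the scales of the common defaults $\tau \sim \mathrm{C}^+(0,1)$ and $\tau \sim \mathrm{C}^+(0,\sigma^2)$.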
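Fact (i) can be checked numerically. The sketch below redefines two small hypothetical helpers, `tau0` and `expected_meff`, so it stands alone, and evaluates the standardized-predictor prior mean at $\tau_0(n,\sigma)$ and, for contrast, at the fixed value $\tau = 1$ (the scale of the $\mathrm{C}^+(0,1)$ default, used here as a point value for simplicity). All $(n, \sigma)$ settings are hypothetical.

```python
import math


def tau0(p0, D, n, sigma):
    """tau0 = p0 / (D - p0) * sigma / sqrt(n)."""
    return p0 / (D - p0) * sigma / math.sqrt(n)


def expected_meff(tau, D, n, sigma):
    """E[m_eff | tau, sigma] for standardized predictors, lambda_j ~ C+(0, 1)."""
    a = tau * math.sqrt(n) / sigma
    return D * a / (1.0 + a)


D, p0 = 1000, 5
rows = []
for n, sigma in [(50, 1.0), (200, 1.0), (200, 3.0), (5000, 0.5)]:
    scaled = expected_meff(tau0(p0, D, n, sigma), D, n, sigma)  # stays at p0
    fixed = expected_meff(1.0, D, n, sigma)                      # drifts with n and sigma
    rows.append((n, sigma, scaled, fixed))
    print(f"n={n:5d} sigma={sigma:3.1f}  tau0 -> {scaled:6.2f}   tau=1 -> {fixed:7.2f}")
```

With $\tau_0$ the product $\tau\sqrt{n}/\sigma$ equals $p_0/(D-p_0)$ for every setting, which is exactly why the implied mean never moves.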

Connection to the oracle result

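Why the default $\tau \sim \mathrm{C}^+(0,1)$ is dubious. Sampling $\tau$ from its prior and $\lambda_j \sim \mathrm{C}^+(0,1)$, then computing $\kappa_j$ and $m_{\text{eff}}$, shows that a fixed $\tau = \tau_0$ gives a near-symmetric prior on $m_{\text{eff}}$ around $p_0$; a half-normal $\tau \sim \mathrm{N}^+(0,\tau_0^2)$ skews it toward smaller values; and a half-Cauchy $\tau \sim \mathrm{C}^+(0,\tau_0^2)$ adds a thick right tail. But $\tau \sim \mathrm{C}^+(0,1)$ places far too much mass on large $m_{\text{eff}}$, favoring solutions with most coefficients unshrunk, which is sensible only when $\tau$ is strongly identified by the data. Crucially, the first three priors induce the same prior on $m_{\text{eff}}$ whatever the values of $n$ and $\sigma$; $\mathrm{C}^+(0,1)$ does not.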
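The simulation described above can be sketched as follows, assuming NumPy, standardized predictors and a hypothetical setting ($D = 1000$, $n = 100$, $\sigma = 1$, $p_0 = 5$); `sample_meff` is a hypothetical helper and the dictionary keys are just labels. Each draw of $\tau$ gets its own $D$ half-Cauchy local scales, and the printed summaries are the median and a central 90% interval of $m_{\text{eff}}$.

```python
import numpy as np


def sample_meff(tau_draws, D, n, sigma, rng):
    """One m_eff draw per tau draw; lambda_j ~ C+(0, 1), standardized predictors."""
    tau = np.asarray(tau_draws, dtype=float)[:, None]           # shape (S, 1)
    lam = np.abs(rng.standard_cauchy(size=(tau.shape[0], D)))   # half-Cauchy local scales
    kappa = 1.0 / (1.0 + n * tau**2 * lam**2 / sigma**2)        # shrinkage factors
    return (1.0 - kappa).sum(axis=1)                             # m_eff for each draw


rng = np.random.default_rng(0)
D, n, sigma, p0, S = 1000, 100, 1.0, 5, 2000
t0 = p0 / (D - p0) * sigma / np.sqrt(n)

tau_priors = {
    "fixed tau0": np.full(S, t0),
    "half-normal": np.abs(rng.normal(0.0, t0, size=S)),          # N+(0, tau0^2)
    "half-Cauchy(tau0)": t0 * np.abs(rng.standard_cauchy(size=S)),
    "half-Cauchy(1)": np.abs(rng.standard_cauchy(size=S)),        # the common default
}
draws = {name: sample_meff(t, D, n, sigma, rng) for name, t in tau_priors.items()}
for name, m in draws.items():
    lo, med, hi = np.percentile(m, [5, 50, 95])
    print(f"{name:>18}: median {med:7.1f}   90% interval [{lo:6.1f}, {hi:6.1f}]")
```

Rerunning with a different `n` or `sigma` should leave the first three summaries essentially unchanged up to Monte Carlo noise (they all scale with `t0`), while the `half-Cauchy(1)` row moves.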

Examples

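  • Worked $\tau_0$: plug the dimension $D$, the sample size $n$, the noise level $\sigma$ and a prior guess of $p_0$ relevant variables into $\tau_0 = \frac{p_0}{D - p_0}\frac{\sigma}{\sqrt{n}}$.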
  • Five of a hundred: $p_0 = 5$ of $D = 100$ gives $\tau_0 = \frac{5}{95}\,\frac{\sigma}{\sqrt{n}} \approx 0.053\,\sigma/\sqrt{n}$.
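  • Sampling $m_{\text{eff}}$: to inspect any prior on $\tau$, draw $\tau$ from it and $\lambda_j$ from the local prior, compute $\kappa_j$ from the shrinkage-factor formula, then $m_{\text{eff}} = \sum_j (1 - \kappa_j)$. This works for any scale-mixture prior, even when closed-form moments are unavailable.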
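The examples and the sampling recipe fit in one short sketch. Assuming NumPy and standardized predictors, the block below computes $\tau_0$ for five relevant variables out of a hundred (with hypothetical $n = 50$ and $\sigma = 2$), then samples $m_{\text{eff}}$ through a generic helper, `sample_meff_generic` (a hypothetical name), whose global and local samplers are pluggable. For the horseshoe with fixed $\tau = \tau_0$, the Monte Carlo mean and variance can be compared with the closed-form values $p_0$ and $Da/(2(1+a)^2)$, where $a = p_0/(D - p_0)$.

```python
import numpy as np


def sample_meff_generic(sample_tau, sample_lam, D, n, sigma, S, rng):
    """Monte Carlo draws of m_eff for any scale-mixture prior (standardized predictors).

    sample_tau(rng, S) must return S global scales and sample_lam(rng, shape) an
    array of local scales; both are user-supplied, hypothetical hooks.
    """
    tau = np.asarray(sample_tau(rng, S), dtype=float)[:, None]
    lam = np.asarray(sample_lam(rng, (S, D)), dtype=float)
    kappa = 1.0 / (1.0 + n * tau**2 * lam**2 / sigma**2)
    return (1.0 - kappa).sum(axis=1)


# "Five of a hundred": p0 = 5 of D = 100; n = 50 and sigma = 2 are hypothetical.
D, p0, n, sigma, S = 100, 5, 50, 2.0, 20000
t0 = p0 / (D - p0) * sigma / np.sqrt(n)   # (5/95) * sigma / sqrt(n)


def fixed_tau(rng, size):
    """Point-mass 'prior' at tau0 (rng unused, kept for a uniform signature)."""
    return np.full(size, t0)


def half_cauchy(rng, shape):
    """Horseshoe local scales lambda_j ~ C+(0, 1)."""
    return np.abs(rng.standard_cauchy(size=shape))


rng = np.random.default_rng(1)
m = sample_meff_generic(fixed_tau, half_cauchy, D, n, sigma, S, rng)

a = p0 / (D - p0)                               # a_j when tau = tau0
print(m.mean(), p0)                             # Monte Carlo vs closed-form mean
print(m.var(), D * a / (2 * (1 + a) ** 2))      # Monte Carlo vs closed-form variance
```

Replacing `half_cauchy` with another local-scale sampler inspects a different scale-mixture prior; the closed-form comparison in the last two lines applies only to the half-Cauchy case with fixed $\tau$.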
