Choosing the Global Scale and Effective Nonzeros
Summary
The single most consequential hyperparameter of the horseshoe is the global scale $\tau$, yet there had been no principled way to set it. Piironen & Vehtari define the effective number of nonzero coefficients $m_{\text{eff}}$ and derive its prior mean as a function of $\tau$. Inverting that relation turns a prior guess $p_0$ for the number of relevant variables into a concrete scale $\tau_0$. The key lesson: $\tau_0$ must scale as $\sigma/\sqrt{n}$, so the popular default $\tau \sim \mathcal{C}^+(0, 1)$ is usually a poor choice.
Overview
For a fixed $\tau$, the implied sparsity depends on the dimension $D$, the noise level $\sigma$, and the sample size $n$, so reasoning about a single shrinkage factor $\kappa_j$ is not enough. The right object is the aggregate effective model size $m_{\text{eff}}$. The authors prefer full Bayesian inference for $\tau$ over plug-in estimates (the marginal-likelihood estimate can collapse to $\hat\tau = 0$ for very sparse vectors, and cross-validation ignores posterior uncertainty about $\tau$), but the prior on $\tau$ still needs a sensible location, which $\tau_0$ supplies. This note builds directly on the shrinkage factor $\kappa_j$ of Global-Local Shrinkage Priors and feeds the slab logic of Regularized Horseshoe (Finnish Horseshoe).
Main Content
Effective number of nonzero coefficients
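For uncorrelated predictors the shrinkage factor is $\kappa_j = 1/(1 + n\sigma^{-2}\tau^2 s_j^2\lambda_j^2)$, where $s_j^2$ is the sample variance of predictor $j$. The effective number of nonzero coefficients is
$$m_{\text{eff}} = \sum_{j=1}^{D} (1 - \kappa_j).$$
When the $\kappa_j$ are near 0 or 1 (as they typically are under the horseshoe), $m_{\text{eff}}$ counts how many coefficients are active (unshrunk), which makes it an interpretable measure of effective model size.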
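A minimal Python sketch of the definition, assuming uncorrelated predictors so that $\kappa_j = 1/(1 + n\sigma^{-2}\tau^2 s_j^2\lambda_j^2)$ holds. The helper names `shrinkage_factor` and `effective_nonzeros` and every numeric setting are hypothetical illustrations. With three very large local scales and 97 tiny ones, $m_{\text{eff}}$ lands close to 3.

```python
def shrinkage_factor(lam, tau, sigma, n, s=1.0):
    # kappa_j = 1 / (1 + n * sigma^-2 * tau^2 * s_j^2 * lambda_j^2)
    return 1.0 / (1.0 + n * tau**2 * s**2 * lam**2 / sigma**2)


def effective_nonzeros(lams, tau, sigma, n, s=None):
    # m_eff = sum_j (1 - kappa_j); s holds the predictor sds s_j (default: all 1)
    if s is None:
        s = [1.0] * len(lams)
    return sum(1.0 - shrinkage_factor(lam, tau, sigma, n, sj) for lam, sj in zip(lams, s))


# Hypothetical configuration: 3 large local scales, 97 tiny ones.
lams = [100.0] * 3 + [1e-3] * 97
m = effective_nonzeros(lams, tau=0.01, sigma=1.0, n=100)
print(round(m, 3))  # about 2.97: each large lambda_j gives kappa_j = 1/101
```

Coefficients with $\kappa_j \approx 0$ contribute almost 1 each and those with $\kappa_j \approx 1$ contribute almost nothing, which is why the sum reads as a count.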
Prior mean and variance of $m_{\text{eff}}$
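Under the horseshoe ($\lambda_j \sim \mathcal{C}^+(0, 1)$), $\mathrm{E}[\kappa_j \mid \tau, \sigma] = \frac{1}{1 + a_j}$ and $\mathrm{Var}[\kappa_j \mid \tau, \sigma] = \frac{a_j}{2(1 + a_j)^2}$, with $a_j = \sigma^{-1}\tau\sqrt{n}\, s_j$. Because the $\lambda_j$ are independent given $\tau$, summing over $j$ gives
$$\mathrm{E}[m_{\text{eff}} \mid \tau, \sigma] = \sum_{j=1}^{D} \frac{a_j}{1 + a_j}, \qquad \mathrm{Var}[m_{\text{eff}} \mid \tau, \sigma] = \sum_{j=1}^{D} \frac{a_j}{2(1 + a_j)^2}.$$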
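For standardized predictors ($s_j = 1$ for all $j$) every $a_j$ equals $\tau\sigma^{-1}\sqrt{n}$, and these simplify to
$$\mathrm{E}[m_{\text{eff}} \mid \tau, \sigma] = \frac{\tau\sigma^{-1}\sqrt{n}}{1 + \tau\sigma^{-1}\sqrt{n}}\, D, \qquad \mathrm{Var}[m_{\text{eff}} \mid \tau, \sigma] = \frac{\tau\sigma^{-1}\sqrt{n}}{2\left(1 + \tau\sigma^{-1}\sqrt{n}\right)^2}\, D.$$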
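The moment formulas can be evaluated directly. This is a sketch under the horseshoe assumption ($\lambda_j \sim \mathcal{C}^+(0,1)$, independent given $\tau$), implementing $\mathrm{E}[m_{\text{eff}}] = \sum_j a_j/(1+a_j)$ and $\mathrm{Var}[m_{\text{eff}}] = \sum_j a_j/(2(1+a_j)^2)$ with $a_j = \sigma^{-1}\tau\sqrt{n}\,s_j$. The function name `meff_moments` and the example values are hypothetical.

```python
import math


def meff_moments(tau, sigma, n, s):
    """Prior mean and variance of m_eff given tau and sigma, assuming the
    horseshoe (lambda_j ~ C+(0, 1)) with independent lambda_j."""
    mean = var = 0.0
    for sj in s:
        a = tau * math.sqrt(n) * sj / sigma  # a_j = sigma^-1 * tau * sqrt(n) * s_j
        mean += a / (1.0 + a)                 # E[1 - kappa_j] = a_j / (1 + a_j)
        var += a / (2.0 * (1.0 + a) ** 2)     # Var[kappa_j] = a_j / (2 (1 + a_j)^2)
    return mean, var


# Standardized predictors with tau = sigma / sqrt(n), so every a_j = 1:
mean, var = meff_moments(tau=0.1, sigma=1.0, n=100, s=[1.0] * 100)
print(mean, var)  # 50.0 12.5, i.e. D/2 and D/8
```

Passing non-unit `s` covers the general (unstandardized) case; with all $s_j = 1$ the loop reproduces the simplified closed forms.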
Prior-guess formula for the global scale
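Setting $\mathrm{E}[m_{\text{eff}} \mid \tau, \sigma] = p_0$ for standardized predictors and solving for $\tau$ gives the scale that places most prior mass for $m_{\text{eff}}$ near a prior guess $p_0$:
$$\tau_0 = \frac{p_0}{D - p_0}\,\frac{\sigma}{\sqrt{n}}.$$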
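Either fix $\tau = \tau_0$ or, better, use $\tau_0$ as the scale of a weakly informative half-normal or half-Cauchy hyperprior, e.g. $\tau \sim \mathcal{N}^+(0, \tau_0^2)$ or $\tau \sim \mathcal{C}^+(0, \tau_0^2)$ (the second argument is the squared scale). Two structural facts follow: (i) $\tau_0$ must scale as $\sigma/\sqrt{n}$ to keep beliefs about $m_{\text{eff}}$ invariant to $\sigma$ and $n$; (ii) $\tau_0$ is typically far from $1$ or $\sigma$, the scales used by the defaults $\tau \sim \mathcal{C}^+(0, 1)$ and $\tau \sim \mathcal{C}^+(0, \sigma^2)$.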
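A short sketch of the prior-guess formula $\tau_0 = \frac{p_0}{D-p_0}\frac{\sigma}{\sqrt{n}}$ for standardized predictors, with a round-trip check that plugging $\tau_0$ back into $\mathrm{E}[m_{\text{eff}} \mid \tau, \sigma] = D\,a/(1+a)$, $a = \tau\sqrt{n}/\sigma$, returns $p_0$. The helper names `tau0` and `expected_meff` and all inputs are hypothetical.

```python
import math


def tau0(p0, D, sigma, n):
    """Prior-guess global scale for standardized predictors."""
    if not 0 < p0 < D:
        raise ValueError("need 0 < p0 < D")
    return p0 / (D - p0) * sigma / math.sqrt(n)


def expected_meff(tau, D, sigma, n):
    """E[m_eff | tau, sigma] for standardized predictors under the horseshoe."""
    a = tau * math.sqrt(n) / sigma
    return D * a / (1.0 + a)


t = tau0(p0=5, D=100, sigma=1.0, n=100)
print(t, expected_meff(t, D=100, sigma=1.0, n=100))  # second value is 5 up to rounding
# Doubling sigma while quadrupling n leaves sigma / sqrt(n), and hence tau0, unchanged.
print(tau0(p0=5, D=100, sigma=2.0, n=400))
```

Only the ratio $\sigma/\sqrt{n}$ enters, which is structural fact (i) in code form.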
Connection to the oracle result
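For the simplified normal-means model ($X = I$, hence $D = n$, with $\sigma = 1$), van der Pas et al. (2014) prove that the minimax-optimal scale is, up to a log factor, $p^*/n$, where $p^*$ is the true number of nonzeros. Here each coefficient is informed by a single unit-variance observation, so $a_j = \tau$ and $\mathrm{E}[m_{\text{eff}} \mid \tau] = n\tau/(1 + \tau)$. Setting $p_0 = p^*$ gives $\tau_0 = p^*/(n - p^*)$, which approaches $p^*/n$ as $n \to \infty$ with $p^*/n \to 0$. So $m_{\text{eff}}$-based tuning recovers the oracle scale while applying far more generally.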
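A quick numerical check of the limit, as a sketch assuming the normal-means setting ($X = I$, $\sigma = 1$, so $a_j = \tau$). The helper `tau0_normal_means` and the value $p^* = 10$ are hypothetical; the printed ratio $\tau_0/(p^*/n) = n/(n - p^*)$ should approach 1 as $n$ grows.

```python
def tau0_normal_means(p_star, n):
    """tau0 for the normal-means model (X = I, sigma = 1): a_j = tau and
    E[m_eff | tau] = n * tau / (1 + tau); setting this to p_star gives p_star / (n - p_star)."""
    return p_star / (n - p_star)


p_star = 10  # hypothetical true number of nonzeros
for n in (100, 10_000, 1_000_000):
    t0 = tau0_normal_means(p_star, n)
    print(n, t0, t0 / (p_star / n))  # ratio n / (n - p_star) tends to 1
```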
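Why the default $\tau \sim \mathcal{C}^+(0, 1)$ is dubious. Sampling $\tau$ and $\lambda_j$, then computing $m_{\text{eff}}$, shows: fixing $\tau = \tau_0$ gives a nearly symmetric prior around $p_0$; a half-normal $\mathcal{N}^+(0, \tau_0^2)$ skews it toward smaller values; a half-Cauchy $\mathcal{C}^+(0, \tau_0^2)$ adds a thick right tail. But $\tau \sim \mathcal{C}^+(0, 1)$ places far too much mass on large $m_{\text{eff}}$, favoring solutions in which most coefficients are unshrunk; that is sensible only when $\tau$ is strongly identified by the data. Crucially, the first three priors keep the same prior for $m_{\text{eff}}$ under changes in $\sigma$ or $n$; $\mathcal{C}^+(0, 1)$ does not.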
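The comparison can be reproduced by simulation. This is a minimal sketch, assuming standardized predictors, $\lambda_j \sim \mathcal{C}^+(0,1)$, and the hypothetical settings $D = 100$, $p_0 = 5$, $\sigma = 1$; it reports the median of the implied $m_{\text{eff}}$ prior under the four hyperpriors (using the $\mathcal{C}^+(0, s^2)$ convention, where $s$ is the scale) for two sample sizes. The helper names are hypothetical. Because the seed is reused, the three $\tau_0$-based priors give matching medians across $n$, while $\mathcal{C}^+(0,1)$ drifts upward.

```python
import math
import random
import statistics


def half_cauchy(rng, scale=1.0):
    # Inverse-CDF draw from C+(0, scale^2): F(x) = (2/pi) * arctan(x / scale).
    return scale * math.tan(math.pi * rng.random() / 2.0)


def meff_draws(tau_sampler, D, sigma, n, draws, seed):
    """Prior draws of m_eff: tau ~ tau_sampler, lambda_j ~ C+(0, 1), s_j = 1."""
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        tau = tau_sampler(rng)
        c = n * tau**2 / sigma**2  # so kappa_j = 1 / (1 + c * lambda_j^2)
        out.append(sum(1.0 - 1.0 / (1.0 + c * half_cauchy(rng) ** 2) for _ in range(D)))
    return out


def prior_medians(n, D=100, p0=5, sigma=1.0, draws=500, seed=0):
    """Median of the implied m_eff prior under four hyperpriors for tau."""
    t0 = p0 / (D - p0) * sigma / math.sqrt(n)
    priors = {
        "fixed tau0": lambda r: t0,
        "N+(0, tau0^2)": lambda r: abs(r.gauss(0.0, t0)),
        "C+(0, tau0^2)": lambda r: half_cauchy(r, t0),
        "C+(0, 1)": lambda r: half_cauchy(r),
    }
    return {name: statistics.median(meff_draws(f, D, sigma, n, draws, seed))
            for name, f in priors.items()}


for n in (100, 10_000):
    print(n, {name: round(med, 1) for name, med in prior_medians(n).items()})
```

Medians are a coarse summary; plotting histograms of `meff_draws` output shows the skew and tails described above.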
Examples
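- Worked $\tau_0$: for standardized predictors, plug the dimension $D$, sample size $n$, noise level $\sigma$ and a prior guess $p_0$ of relevant variables into $\tau_0 = \frac{p_0}{D - p_0}\frac{\sigma}{\sqrt{n}}$.
- Five of a hundred: $p_0 = 5$ of $D = 100$ gives $\tau_0 = \frac{5}{95}\frac{\sigma}{\sqrt{n}} \approx 0.053\,\sigma/\sqrt{n}$.
- Sampling $m_{\text{eff}}$: to inspect any prior on $\tau$, draw $\tau$ and $\lambda_1, \dots, \lambda_D$, compute each $\kappa_j$ from the shrinkage-factor formula, then $m_{\text{eff}} = \sum_j (1 - \kappa_j)$. This works for any scale-mixture prior, even when closed-form moments are unavailable.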
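The sampling recipe in the last example can be written generically, as a sketch assuming standardized predictors. The local-scale sampler is passed in, so the same routine handles the horseshoe and, as an illustrative swap, a Laplace prior obtained from the scale mixture $\lambda_j^2 \sim \text{Exp}(\text{mean } 2)$. The function names and settings are hypothetical; for the horseshoe at $a_j = 1$ the estimate should sit near the closed-form value $D/2 = 10$.

```python
import math
import random


def mc_meff_mean(tau, D, sigma, n, draw_lambda, draws=4000, seed=1):
    """Monte Carlo estimate of E[m_eff | tau, sigma] for standardized predictors;
    draw_lambda(rng) returns one local scale lambda_j from any mixing distribution."""
    rng = random.Random(seed)
    c = n * tau**2 / sigma**2
    total = 0.0
    for _ in range(draws):
        total += sum(1.0 - 1.0 / (1.0 + c * draw_lambda(rng) ** 2) for _ in range(D))
    return total / draws


def horseshoe_lambda(rng):
    return math.tan(math.pi * rng.random() / 2.0)  # lambda_j ~ C+(0, 1)


def laplace_lambda(rng):
    # lambda_j^2 ~ Exp(mean 2) makes beta_j / tau ~ Laplace(0, 1)
    return math.sqrt(rng.expovariate(0.5))


tau, D, sigma, n = 0.1, 20, 1.0, 100  # hypothetical settings; a_j = 1 for every j
a = tau * math.sqrt(n) / sigma
print(mc_meff_mean(tau, D, sigma, n, horseshoe_lambda), D * a / (1 + a))  # MC vs closed form
print(mc_meff_mean(tau, D, sigma, n, laplace_lambda))  # no closed form needed
```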
Connections
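- Aggregates the shrinkage factor $\kappa_j$ of Global-Local Shrinkage Priors.
- Tunes $\tau$ for The Horseshoe Prior; the same $\tau_0$ (with $p_0$ now a guess for the number of coefficients far from zero) carries over to the Regularized Horseshoe (Finnish Horseshoe), where $\beta_j \sim \mathcal{N}(0, \tau^2\tilde\lambda_j^2)$ with $\tilde\lambda_j^2 = \frac{c^2\lambda_j^2}{c^2 + \tau^2\lambda_j^2}$ and slab width $c$.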
- “Effective model size” is the Bayesian-shrinkage analogue of effective parameters in Overfitting and Information Criteria.
- The $\tau_0 \propto \sigma/\sqrt{n}$ scaling and the pooling strength set by $\tau$ echo Hierarchical Linear Models.
See Also
- Regularized Horseshoe (Finnish Horseshoe): uses $\tau_0$ and additionally regularizes large coefficients with a slab of width $c$
- The Horseshoe Prior: the prior whose $\tau$ this note calibrates
- Global-Local Shrinkage Priors: source of the shrinkage factor $\kappa_j$
- Horseshoe and Regularized Horseshoe Priors: overview hub
- Overfitting and Information Criteria: effective number of parameters
- Hierarchical Linear Models: global scale as a pooling hyperparameter