Global-Local Shrinkage Priors

Summary

Global-local shrinkage priors write each regression coefficient as a zero-mean Gaussian whose variance is the product of a single global scale (which shrinks everything toward zero) and a local scale (which lets individual coefficients escape). They are the continuous, easy-to-sample alternative to discrete spike-and-slab priors. Every such prior induces a shrinkage factor $\kappa_j \in [0, 1]$ of the same form, and its prior density distinguishes ridge (mass at a single interior value), lasso (peaked interior), and the horseshoe (U-shaped, mass at both 0 and 1).

Overview

Two prior families dominate sparse Bayesian estimation: discrete two-component spike-and-slab priors (Spike-and-Slab Prior for Covariate Selection) and continuous shrinkage priors. The spike-and-slab is intuitive (with a Dirac delta spike it amounts to Bayesian model averaging), but its posterior is sensitive to the slab width and the inclusion probability, and inference over the model space is expensive, often requiring approximations such as expectation propagation (EP) or variational inference (VI). Continuous shrinkage priors are easy to implement, can be sampled with generic tools such as Stan, and can match spike-and-slab performance. This note covers the shared scaffolding; specific members are The Horseshoe Prior and the Regularized Horseshoe (Finnish Horseshoe).

Main Content

Global-local scale mixture

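A global-local shrinkage prior on the coefficients $\beta_j$ of the regression $y_i = \beta^\top x_i + \varepsilon_i$, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$, is a scale mixture of Gaussians

$$\beta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \tau^2 \lambda_j^2), \qquad \lambda_j \sim p(\lambda_j), \qquad j = 1, \dots, D,$$

where $\tau$ is the global scale common to all coefficients and $\lambda_j$ is the local scale specific to $\beta_j$. The choice of the local mixing density $p(\lambda_j)$ defines the family member.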
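To make the scale-mixture construction concrete, here is a minimal Python sketch that draws coefficients from the prior by first drawing a local scale and then a Gaussian with standard deviation equal to the global scale times the local scale. The function name `sample_global_local`, the value of the global scale, and the two local choices (half-Cauchy and a fixed local scale of 1) are illustrative assumptions, not part of the note.

```python
import numpy as np

def sample_global_local(n_draws, tau, rng, local="half_cauchy"):
    """Draw beta_j ~ N(0, tau^2 * lambda_j^2) with lambda_j from a local prior.

    Hypothetical helper for illustration only.
    """
    if local == "half_cauchy":      # horseshoe-type local scales, C+(0, 1)
        lam = np.abs(rng.standard_cauchy(n_draws))
    elif local == "fixed":          # lambda_j = 1: plain Gaussian (ridge) prior
        lam = np.ones(n_draws)
    else:
        raise ValueError(f"unknown local prior: {local}")
    beta = rng.normal(loc=0.0, scale=tau * lam)
    return beta, lam

rng = np.random.default_rng(0)
tau = 0.1  # assumed global scale
beta_hs, _ = sample_global_local(200_000, tau, rng, "half_cauchy")
beta_ridge, _ = sample_global_local(200_000, tau, rng, "fixed")

for name, b in [("half-Cauchy local", beta_hs), ("fixed local", beta_ridge)]:
    near_zero = np.mean(np.abs(b) < 0.1 * tau)   # mass very close to zero
    far_out = np.mean(np.abs(b) > 5.0 * tau)     # mass far in the tails
    print(f"{name:18s} P(|b|<0.1*tau)={near_zero:.3f}  P(|b|>5*tau)={far_out:.3f}")
```

With the half-Cauchy local scales, the draws put more mass both very near zero and far out in the tails than the fixed-scale Gaussian does, which is the behavior the local scale is meant to provide.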

The shrinkage factor

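With uncorrelated predictors ($\mathbb{E}[x_j] = 0$, $\operatorname{Var}(x_j) = s_j^2$, so that $X^\top X \approx n\,\mathrm{diag}(s_1^2, \dots, s_D^2)$), the conditional posterior mean is $\bar\beta_j = (1 - \kappa_j)\,\hat\beta_j$ relative to the MLE $\hat\beta_j$, where

$$\kappa_j = \frac{1}{1 + n\,\sigma^{-2}\,\tau^2 s_j^2 \lambda_j^2}.$$

$\kappa_j = 1$ means complete shrinkage to zero; $\kappa_j = 0$ means the coefficient is left at its MLE. This expression holds for any scale-mixture-of-Gaussians prior, regardless of $p(\lambda_j)$; only the implied prior on $\kappa_j$ differs across priors.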
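The shrinkage identity can be checked numerically. The sketch below assumes the usual form of the shrinkage factor, $\kappa_j = 1 / (1 + n\sigma^{-2}\tau^2 s_j^2\lambda_j^2)$, builds a design matrix with exactly orthogonal columns so that $X^\top X = n\,\mathrm{diag}(s_j^2)$, and compares the conditional posterior mean with $(1 - \kappa_j)$ times the MLE. All numerical values (sample size, scales, true coefficients) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, D = 50, 4
sigma, tau = 1.5, 0.3                       # assumed noise and global scales
s = np.array([0.5, 1.0, 2.0, 4.0])          # predictor standard deviations s_j
lam = np.array([0.1, 1.0, 3.0, 10.0])       # fixed local scales, for determinism

# Orthonormal columns from a reduced QR, rescaled so X^T X = n * diag(s^2).
Q, _ = np.linalg.qr(rng.normal(size=(n, D)))
X = Q * (np.sqrt(n) * s)

beta_true = np.array([0.0, 2.0, 0.0, -1.0])
y = X @ beta_true + sigma * rng.normal(size=n)

# Maximum-likelihood estimate.
beta_mle = np.linalg.solve(X.T @ X, X.T @ y)

# Conditional posterior mean under beta_j ~ N(0, tau^2 * lam_j^2).
prior_precision = np.diag(1.0 / (tau**2 * lam**2))
posterior_precision = X.T @ X / sigma**2 + prior_precision
beta_post = np.linalg.solve(posterior_precision, X.T @ y / sigma**2)

# Shrinkage factor and the identity beta_post = (1 - kappa) * beta_mle.
kappa = 1.0 / (1.0 + n * tau**2 * s**2 * lam**2 / sigma**2)
print("kappa:", np.round(kappa, 4))
print("identity holds:", np.allclose(beta_post, (1.0 - kappa) * beta_mle))
```

The identity is exact here only because the columns are exactly orthogonal; with correlated predictors it is an approximation that motivates the shrinkage-factor view rather than an equality.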

Implied prior on (horseshoe)

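For the half-Cauchy choice $\lambda_j \sim C^+(0, 1)$, at fixed $\tau$ the shrinkage factor follows

$$p(\kappa_j \mid \tau, \sigma) = \frac{1}{\pi}\,\frac{a_j}{(a_j^2 - 1)\,\kappa_j + 1}\,\frac{1}{\sqrt{\kappa_j}\,\sqrt{1 - \kappa_j}}, \qquad a_j = \tau\,\sigma^{-1}\sqrt{n}\,s_j.$$

When $a_j = 1$ this reduces to $\mathrm{Beta}(1/2, 1/2)$, the U-shaped “horseshoe” density with spikes at $\kappa_j = 0$ and $\kappa_j = 1$.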
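The density above follows from a change of variables. The derivation below is a sketch that assumes the notation $\kappa_j = 1/(1 + a_j^2\lambda_j^2)$ with $a_j = \tau\sigma^{-1}\sqrt{n}\,s_j$ and the half-Cauchy density $p(\lambda_j) = \frac{2}{\pi}\frac{1}{1 + \lambda_j^2}$ on $\lambda_j > 0$.

```latex
\begin{align*}
\lambda_j &= \frac{1}{a_j}\sqrt{\frac{1-\kappa_j}{\kappa_j}},
&
\left|\frac{d\lambda_j}{d\kappa_j}\right|
  &= \frac{1}{2\,a_j\,\kappa_j^{3/2}\sqrt{1-\kappa_j}},
&
1+\lambda_j^2 &= \frac{(a_j^2-1)\,\kappa_j+1}{a_j^2\,\kappa_j},
\\[6pt]
p(\kappa_j \mid \tau,\sigma)
  &= \frac{2}{\pi}\,\frac{1}{1+\lambda_j^2}\,
     \left|\frac{d\lambda_j}{d\kappa_j}\right|
   = \frac{2}{\pi}\cdot
     \frac{a_j^2\,\kappa_j}{(a_j^2-1)\,\kappa_j+1}\cdot
     \frac{1}{2\,a_j\,\kappa_j^{3/2}\sqrt{1-\kappa_j}}
\\[6pt]
  &= \frac{1}{\pi}\,
     \frac{a_j}{(a_j^2-1)\,\kappa_j+1}\,
     \frac{1}{\sqrt{\kappa_j}\,\sqrt{1-\kappa_j}} .
\end{align*}
```

Setting $a_j = 1$ removes the middle factor and leaves $\frac{1}{\pi}\kappa_j^{-1/2}(1-\kappa_j)^{-1/2}$, which is the $\mathrm{Beta}(1/2, 1/2)$ density.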

Where the classics sit in $\kappa$-space. The shape of $p(\kappa_j)$ is the cleanest way to compare priors:

  • Ridge / Gaussian ($\lambda_j = 1$ fixed, i.e. a plain $\beta_j \sim \mathcal{N}(0, \tau^2)$): all coefficients share one variance, so $\kappa_j$ concentrates at a single interior value. Every coefficient is shrunk uniformly, with no separation of signal from noise.
  • Lasso / Laplace (double-exponential, $\lambda_j^2$ exponentially distributed): a single interior mode for $\kappa_j$; it shrinks moderately and cannot simultaneously leave strong signals unshrunk and crush noise.
  • Horseshoe (half-Cauchy $\lambda_j$): a $\mathrm{Beta}(1/2, 1/2)$-like U-shape, with mass at $\kappa_j = 0$ (relevant, no shrinkage, thanks to the heavy Cauchy tails) and at $\kappa_j = 1$ (irrelevant, complete shrinkage). This bimodality is exactly the sparse behavior we want. See The Horseshoe Prior.
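The three shapes can be compared by simulation. The sketch below maps draws of the local scale to the shrinkage factor via $\kappa = 1/(1 + a^2\lambda^2)$ with $a = 1$, and reports how much prior mass falls near 0, in the interior, and near 1. The cut-offs 0.1 and 0.9 are arbitrary, and the exponential with mean 2 on $\lambda^2$ (which gives a unit-scale Laplace marginal) is an assumed convention for the lasso case.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 200_000
a = 1.0  # assumed value of tau * sqrt(n) * s_j / sigma

def kappa_from_lambda(lam, a):
    """Shrinkage factor implied by a local scale lam at a given a."""
    return 1.0 / (1.0 + a**2 * lam**2)

local_scales = {
    "ridge": np.ones(m),                                    # lambda = 1 fixed
    "lasso": np.sqrt(rng.exponential(scale=2.0, size=m)),   # lambda^2 ~ Exp, mean 2
    "horseshoe": np.abs(rng.standard_cauchy(m)),            # lambda ~ C+(0, 1)
}

summary = {}
for name, lam in local_scales.items():
    k = kappa_from_lambda(lam, a)
    summary[name] = (
        np.mean(k < 0.1),                    # little shrinkage
        np.mean((k >= 0.1) & (k <= 0.9)),    # moderate shrinkage
        np.mean(k > 0.9),                    # near-complete shrinkage
    )
    low, mid, high = summary[name]
    print(f"{name:10s} P(k<0.1)={low:.3f}  P(0.1<=k<=0.9)={mid:.3f}  P(k>0.9)={high:.3f}")
```

Under these assumptions the ridge case puts all of its mass in the interior, the lasso puts little at either end, and the horseshoe puts roughly a fifth of its mass in each extreme bin.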

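Changing $\tau$ (equivalently $a_j$) tilts the U: small $\tau$ (e.g. $a_j = 0.1$) pushes mass toward $\kappa_j = 1$ (more coefficients shrunk), while large $\tau$ pushes mass toward $\kappa_j = 0$. Because for fixed $\tau$ the sparsity also depends on the dimension $D$, one must reason about all $\kappa_j$ jointly, which leads to the effective number of nonzero coefficients $m_{\mathrm{eff}}$ in Choosing the Global Scale and Effective Nonzeros.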
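The tilt can be quantified in closed form for the half-Cauchy case. Since $\kappa_j > 1/2$ exactly when $\lambda_j < 1/a_j$, the prior probability of shrinking by more than half is $\frac{2}{\pi}\arctan(1/a_j)$, and integrating gives $\mathbb{E}[1 - \kappa_j] = a_j/(1 + a_j)$. The sketch below checks both against simulation. It assumes a common value $a$ for all coefficients and takes $m_{\mathrm{eff}} = \sum_j (1 - \kappa_j)$ as the definition of the effective number of nonzeros; the function names are hypothetical.

```python
import math
import numpy as np

def prob_kappa_above_half(a):
    """P(kappa > 1/2) for lambda ~ C+(0,1): kappa > 1/2 iff lambda < 1/a."""
    return 2.0 / math.pi * math.atan(1.0 / a)

def mean_effective_nonzeros(a, D):
    """Prior mean of m_eff = sum_j (1 - kappa_j), using E[1 - kappa_j] = a / (1 + a)."""
    return D * a / (1.0 + a)

rng = np.random.default_rng(3)
D = 100          # assumed number of coefficients
results = {}
for a in (0.1, 1.0, 10.0):
    lam = np.abs(rng.standard_cauchy(200_000))
    kappa = 1.0 / (1.0 + a**2 * lam**2)
    results[a] = {
        "p_exact": prob_kappa_above_half(a),
        "p_mc": float(np.mean(kappa > 0.5)),
        "meff_exact": mean_effective_nonzeros(a, D),
        "meff_mc": float(D * np.mean(1.0 - kappa)),
    }
    r = results[a]
    print(f"a={a:5.1f}  P(kappa>0.5)={r['p_exact']:.3f} (MC {r['p_mc']:.3f})  "
          f"E[m_eff]={r['meff_exact']:.1f} (MC {r['meff_mc']:.1f})")
```

Because the prior mean of $m_{\mathrm{eff}}$ scales linearly with $D$ at fixed $a$, the same global scale implies a very different expected number of active coefficients in problems of different size.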

Examples

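  • Why scale predictors: the shrinkage factor depends on the predictor scale $s_j$ (a larger $s_j$ gives a smaller $\kappa_j$ for the same $\tau$ and $\lambda_j$), so variables with larger scale are treated as more relevant a priori. Standardize to $s_j = 1$ unless the raw scales genuinely carry relevance information; alternatively absorb the scale into the local prior by giving $\lambda_j$ a half-Cauchy prior with scale $1/s_j$, so that $s_j \lambda_j \sim C^+(0, 1)$.
  • Reading the U-shape: with small $a_j$ (e.g. $a_j = 0.1$), $p(\kappa_j)$ piles up near $\kappa_j = 1$, so a priori most coefficients are expected to be shrunk to zero. This is the sparse regime.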
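The effect of predictor scale can be seen directly in the prior mean of the shrinkage factor. The sketch below assumes half-Cauchy local scales, for which $\mathbb{E}[\kappa_j] = 1/(1 + a_j)$ with $a_j = \tau\sigma^{-1}\sqrt{n}\,s_j$, and compares raw columns on very different scales with their standardized versions. The data, the global scale, and the helper name `prior_mean_shrinkage` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, tau, sigma = 100, 0.05, 1.0     # assumed sample size and scales

# Three predictors whose standard deviations differ by factors of ten.
X = rng.normal(size=(n, 3)) * np.array([0.1, 1.0, 10.0])

def prior_mean_shrinkage(X, tau, sigma):
    """E[kappa_j] = 1 / (1 + a_j) under lambda_j ~ C+(0, 1)."""
    s = X.std(axis=0)                              # empirical predictor scales s_j
    a = tau * np.sqrt(X.shape[0]) * s / sigma
    return 1.0 / (1.0 + a)

raw = prior_mean_shrinkage(X, tau, sigma)

# Standardize each column to zero mean and unit standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
standardized = prior_mean_shrinkage(X_std, tau, sigma)

print("raw scales          E[kappa]:", np.round(raw, 3))
print("standardized scales E[kappa]:", np.round(standardized, 3))
```

On the raw scale the large-variance predictor is shrunk far less a priori than the small-variance one; after standardization all three receive the same prior shrinkage.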

Connections

See Also