Horseshoe and Regularized Horseshoe Priors

Summary

The horseshoe is a global-local shrinkage prior for sparse high-dimensional regression (Carvalho, Polson & Scott 2010). Each coefficient gets its own heavy-tailed local scale $\lambda_j$ multiplied by a global scale $\tau$, so the prior aggressively shrinks noise coefficients toward zero while leaving genuinely large signals essentially unpenalized. The regularized (“Finnish”) horseshoe (Piironen & Vehtari 2017) adds a slab that caps the effective scale of the largest coefficients, fixing the heavy-tail instability of the plain horseshoe in weakly-identified or separable problems and letting the user encode a prior guess at the number of relevant predictors.

Overview

In high-dimensional regression where most coefficients are expected to be (near) zero but a few are large, neither a single Gaussian ridge prior (over-shrinks signals) nor a Laplace/LASSO prior (under-shrinks noise, biases signals) is ideal. Global-local shrinkage priors resolve this by giving every coefficient its own scale, drawn from a heavy-tailed distribution. The horseshoe is the canonical example and a default choice for sparse Bayesian regression. This note supports Bayesian Linear Regression, where it appears as the state-of-the-art shrinkage option.

Main Content

Definition: Horseshoe prior (Carvalho, Polson & Scott 2010)

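For regression coefficients $\beta_j$, $j = 1, \dots, p$:

$$\beta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \tau^2 \lambda_j^2), \qquad \lambda_j \sim \mathrm{C}^+(0, 1), \qquad \tau \sim \mathrm{C}^+(0, \tau_0)$$

where $\mathrm{C}^+$ is the half-Cauchy distribution, $\lambda_j$ is the local (per-coefficient) scale, and $\tau$ is the global scale shared across coefficients, with $\tau_0$ the scale of its prior.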
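As a minimal sketch of the definition above, the prior can be simulated by drawing half-Cauchy scales as absolute values of Cauchy draws and then scaling standard normal noise. The function name `sample_horseshoe_prior` and the `tau0` argument (the scale of the half-Cauchy on the global scale) are illustrative assumptions, not part of either cited paper.

```python
import numpy as np

def sample_horseshoe_prior(p, tau0=1.0, n_draws=1000, seed=0):
    """Draw coefficient vectors from a horseshoe prior (illustrative sketch).

    p       : number of coefficients
    tau0    : assumed scale of the half-Cauchy prior on the global scale
    n_draws : number of independent prior draws
    """
    rng = np.random.default_rng(seed)
    # A half-Cauchy draw is the absolute value of a Cauchy draw.
    tau = tau0 * np.abs(rng.standard_cauchy(size=(n_draws, 1)))  # one global scale per draw
    lam = np.abs(rng.standard_cauchy(size=(n_draws, p)))         # one local scale per coefficient
    # Conditional on the scales, each coefficient is Gaussian with sd tau * lam.
    beta = rng.normal(size=(n_draws, p)) * tau * lam
    return beta, lam, tau

beta, lam, tau = sample_horseshoe_prior(p=50)
print(beta.shape)  # (1000, 50)
```

Most simulated coefficients sit very close to zero while a few are extremely large, which is the heavy-tailed, spiked-at-zero behaviour the definition is designed to produce.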

Name / intuition. With $\tau = 1$ and unit noise, the shrinkage factor $\kappa_j = 1/(1 + \lambda_j^2)$ has a $\mathrm{Beta}(1/2, 1/2)$ distribution: a symmetric “horseshoe” shape with mass piled at $\kappa_j = 0$ (no shrinkage, signal kept) and $\kappa_j = 1$ (total shrinkage, noise killed). The half-Cauchy tails on $\lambda_j$ allow arbitrarily large signals to escape shrinkage; the global $\tau$ controls overall sparsity.
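The horseshoe shape can be checked by simulation. The sketch below assumes a global scale of 1 and unit noise, draws half-Cauchy local scales, maps them to the shrinkage factor 1 / (1 + local scale squared), and compares the result with two known properties of Beta(1/2, 1/2): mean 1/2 and variance 1/8. Variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Half-Cauchy local scales (global scale fixed at 1, unit noise assumed).
lam = np.abs(rng.standard_cauchy(size=200_000))

# Shrinkage factor: 0 means the signal is kept, 1 means it is shrunk to zero.
kappa = 1.0 / (1.0 + lam**2)

# Beta(1/2, 1/2) has mean 1/2 and variance 1/8.
kappa_mean = kappa.mean()
kappa_var = kappa.var()

# A coarse density estimate: the end bins should dominate the middle ones.
density, edges = np.histogram(kappa, bins=10, range=(0.0, 1.0), density=True)
print(kappa_mean, kappa_var)
print(density.round(2))
```

The histogram is highest in the first and last bins and lowest in the middle, which is the U shape that gives the prior its name.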

Definition: Regularized (Finnish) horseshoe (Piironen & Vehtari 2017)

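Replace the local scale $\lambda_j$ with a slab-truncated version $\tilde\lambda_j$:

$$\beta_j \mid \lambda_j, \tau, c \sim \mathcal{N}(0, \tau^2 \tilde\lambda_j^2), \qquad \tilde\lambda_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2}, \qquad \lambda_j \sim \mathrm{C}^+(0, 1)$$

For coefficients far below the slab scale ($\tau^2 \lambda_j^2 \ll c^2$) this reduces to the ordinary horseshoe; for coefficients far above it ($\tau^2 \lambda_j^2 \gg c^2$) the prior tends to $\mathcal{N}(0, c^2)$, a Gaussian slab of width $c$ that regularizes otherwise unbounded large coefficients.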
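The two limiting regimes can be verified numerically. The following sketch assumes the usual form of the regularized local scale, with local scale `lam`, global scale `tau`, and slab scale `c`; the function name and the example values are illustrative.

```python
import numpy as np

def regularized_local_scale(lam, tau, c):
    """Slab-truncated local scale: sqrt(c^2 lam^2 / (c^2 + tau^2 lam^2))."""
    lam = np.asarray(lam, dtype=float)
    return np.sqrt(c**2 * lam**2 / (c**2 + tau**2 * lam**2))

tau, c = 0.1, 2.0
lam = np.array([0.01, 1.0, 1e4])   # tiny, moderate, and huge local scales

lam_tilde = regularized_local_scale(lam, tau, c)
effective_sd = tau * lam_tilde      # prior sd of each coefficient

# Small local scale: effective sd is close to tau * lam (plain horseshoe).
# Huge local scale: effective sd approaches, but never exceeds, the slab scale c.
print(effective_sd)
```

With these values the first entry matches the plain horseshoe scale almost exactly, while the last is capped just under `c = 2.0`.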

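Choosing the global scale. Piironen & Vehtari recommend setting the scale $\tau_0$ of the prior on $\tau$ from a prior guess $p_0$ of the number of relevant predictors:

$$\tau_0 = \frac{p_0}{p - p_0} \cdot \frac{\sigma}{\sqrt{n}}$$

where $p$ is the number of predictors, $n$ the sample size, and $\sigma$ the noise scale. This turns vague sparsity beliefs into a calibrated prior.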
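As a worked example of this rule, the helper below computes the global scale from a guessed number of relevant predictors. The function name `global_scale_guess` and the example numbers are hypothetical; the noise scale is taken as known, which in practice it usually is not.

```python
import math

def global_scale_guess(p0, p, n, sigma=1.0):
    """Global scale implied by a prior guess p0 of the number of relevant predictors.

    Computes p0 / (p - p0) * sigma / sqrt(n).
    """
    if not 0 < p0 < p:
        raise ValueError("p0 must satisfy 0 < p0 < p")
    if n <= 0:
        raise ValueError("n must be positive")
    return p0 / (p - p0) * sigma / math.sqrt(n)

# Example: 100 predictors, 400 observations, about 5 expected to matter.
tau0 = global_scale_guess(p0=5, p=100, n=400, sigma=1.0)
print(round(tau0, 5))  # 0.00263
```

The resulting scale is much smaller than 1, which reflects how strongly a belief in 5 relevant predictors out of 100 concentrates the prior on sparse solutions.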

Why “regularized”

The plain horseshoe’s Cauchy tails place no bound on the largest coefficients. In well-identified problems this is harmless, but under weak identification or separation (e.g. logistic regression with quasi-separable data) the unbounded tails cause poorly-behaved, hard-to-sample posteriors. The slab term caps the effective prior variance of large signals, restoring stable, geometry-friendly posteriors while preserving the horseshoe’s sharp noise-vs-signal separation.

Connections

  • A shrinkage prior for Bayesian Linear Regression — contrasts with ridge (Gaussian, over-shrinks signals) and Bayesian LASSO (Laplace, under-shrinks noise).
  • Conceptually a continuous relaxation of spike-and-slab variable selection; compare the Spike-and-Slab Prior for Covariate Selection used in structural time-series.
  • Global-local scale mixtures are estimated with the same HMC/NUTS machinery as other hierarchical priors (Efficient MCMC, Hierarchical Models); non-centered parameterization of $\beta_j$ is usually needed for good geometry.
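The non-centered parameterization mentioned in the last item can be sketched as follows: instead of sampling each coefficient directly from a Gaussian whose scale depends on other parameters, the sampler works with a standard normal variable and the coefficient is computed deterministically from it. This is a plain NumPy illustration of the transform only, assuming the regularized horseshoe scale; the names are hypothetical and no particular probabilistic programming library is implied.

```python
import numpy as np

def beta_from_noncentered(z, lam, tau, c):
    """Map a standard normal variable z to a coefficient.

    Assumes the regularized horseshoe: beta = z * tau * lam_tilde, where
    lam_tilde^2 = c^2 lam^2 / (c^2 + tau^2 lam^2).
    """
    lam_tilde = np.sqrt(c**2 * lam**2 / (c**2 + tau**2 * lam**2))
    return z * tau * lam_tilde

rng = np.random.default_rng(1)
z = rng.normal(size=100_000)        # what the sampler would explore
lam, tau, c = 1.0, 0.5, 2.0         # scales held fixed for the check

beta = beta_from_noncentered(z, lam, tau, c)

# With the scales fixed, beta is Gaussian with sd tau * lam_tilde.
expected_sd = tau * np.sqrt(c**2 * lam**2 / (c**2 + tau**2 * lam**2))
print(beta.std(), expected_sd)
```

Because the standard normal variable is a priori independent of the scales, the sampler avoids the funnel-shaped dependence between a coefficient and its scale that the centered form induces.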

See Also