Horseshoe and Regularized Horseshoe Priors

Summary

The horseshoe is a global-local shrinkage prior for sparse high-dimensional regression (Carvalho, Polson & Scott 2010). Each coefficient gets its own heavy-tailed local scale $λ_{j}$ multiplied by a global scale $τ$, so the prior aggressively shrinks noise coefficients toward zero while leaving genuinely large signals essentially unpenalized. The regularized (“Finnish”) horseshoe (Piironen & Vehtari 2017) adds a slab that caps the effective scale of the largest coefficients, fixing the heavy-tail instability of the plain horseshoe in weakly-identified or separable problems and letting the user encode a prior guess at the number of relevant predictors.

Overview

In high-dimensional regression where most coefficients are expected to be (near) zero but a few are large, neither a single Gaussian ridge prior (over-shrinks signals) nor a Laplace/LASSO prior (under-shrinks noise, biases signals) is ideal. Global-local shrinkage priors resolve this by giving every coefficient its own scale, drawn from a heavy-tailed distribution. The horseshoe is the canonical example and a default choice for sparse Bayesian regression. This note supports Bayesian Linear Regression, where it appears as the state-of-the-art shrinkage option.

Main Content

Definition: Horseshoe prior (Carvalho, Polson & Scott 2010)

For regression coefficients $β_{j}$, $j = 1, \dots, D$:
$β_{j} \mid λ_{j}, τ \sim N(0, λ_{j}^{2} τ^{2}), \quad λ_{j} \sim C^{+}(0, 1), \quad τ \sim C^{+}(0, τ_{0}),$
where $C^{+}$ is the half-Cauchy distribution, $λ_{j}$ is the local (per-coefficient) scale, and $τ$ is the global scale shared across coefficients.
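The hierarchy above can be simulated directly. The following is a minimal NumPy sketch, not a reference implementation: the function name `sample_horseshoe_prior` and the values `D=50`, `tau0=0.1` are illustrative assumptions. It relies on the fact that the absolute value of a standard Cauchy draw is half-Cauchy.

```python
import numpy as np

def sample_horseshoe_prior(D, tau0=1.0, n_draws=1000, seed=0):
    """Draw coefficient vectors from the horseshoe prior; returns shape (n_draws, D)."""
    rng = np.random.default_rng(seed)
    # Half-Cauchy draws are obtained as |Cauchy|.
    lam = np.abs(rng.standard_cauchy(size=(n_draws, D)))          # local scales, C+(0, 1)
    tau = tau0 * np.abs(rng.standard_cauchy(size=(n_draws, 1)))   # global scale, C+(0, tau0)
    # beta_j | lam_j, tau ~ N(0, lam_j^2 tau^2)
    return rng.normal(size=(n_draws, D)) * lam * tau

# Illustrative values only (assumed, not taken from the note).
beta = sample_horseshoe_prior(D=50, tau0=0.1)
abs_beta = np.abs(beta)
print("median |beta|:", np.median(abs_beta))
print("99.9% quantile of |beta|:", np.quantile(abs_beta, 0.999))
```

Most draws sit very close to zero, while the upper quantiles are orders of magnitude larger: the spike-plus-heavy-tail behavior the definition is designed to produce.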

Name / intuition. With $τ = 1$ and unit noise, the shrinkage factor $κ_{j} = 1/ (1 + λ_{j}^{2})$ has a $Beta (1/2, 1/2)$ distribution — a symmetric “horseshoe” shape with mass piled at $κ_{j} \approx 0$ (no shrinkage, signal kept) and $κ_{j} \approx 1$ (total shrinkage, noise killed). The half-Cauchy tails on $λ_{j}$ allow arbitrarily large signals to escape shrinkage; the global $τ$ controls overall sparsity.
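The $Beta(1/2, 1/2)$ claim can be checked numerically. The sketch below (sample size and grid points are arbitrary choices) draws $λ_{j}$ from a half-Cauchy, forms $κ_{j} = 1/(1 + λ_{j}^{2})$, and compares the empirical CDF with the closed-form $Beta(1/2, 1/2)$ CDF, $F(x) = \frac{2}{π} \arcsin \sqrt{x}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Local scales from C+(0, 1), with tau = 1 and unit noise as in the text.
lam = np.abs(rng.standard_cauchy(size=200_000))
kappa = 1.0 / (1.0 + lam**2)  # shrinkage factor in (0, 1]

def beta_half_half_cdf(x):
    """CDF of Beta(1/2, 1/2), i.e. the arcsine distribution."""
    return (2.0 / np.pi) * np.arcsin(np.sqrt(x))

grid = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
empirical = np.array([(kappa <= x).mean() for x in grid])
theoretical = beta_half_half_cdf(grid)
max_gap = np.max(np.abs(empirical - theoretical))

print("empirical  :", np.round(empirical, 3))
print("theoretical:", np.round(theoretical, 3))
```

The two rows should agree up to Monte Carlo error, and the mass near both endpoints of $(0, 1)$ is what gives the density its horseshoe shape.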

Definition: Regularized (Finnish) horseshoe (Piironen & Vehtari 2017)

Replace the local scale $λ_{j}$ with a slab-truncated version $\tilde{λ}_{j}$:
$β_{j} \mid λ_{j}, τ, c \sim N(0, \tilde{λ}_{j}^{2} τ^{2}), \quad \tilde{λ}_{j}^{2} = \frac{c^{2} λ_{j}^{2}}{c^{2} + τ^{2} λ_{j}^{2}}, \quad c^{2} \sim \text{Inv-Gamma}(ν/2, ν s^{2}/2).$
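For coefficients far below the slab scale ($τ λ_{j} ≪ c$), $\tilde{λ}_{j} \approx λ_{j}$ and the prior reduces to the ordinary horseshoe; for coefficients far above it ($τ λ_{j} ≫ c$), $\tilde{λ}_{j}^{2} τ^{2} \to c^{2}$ and the prior tends to $N(0, c^{2})$, a Gaussian slab of width $c$ that regularizes otherwise unbounded large coefficients. The hyperparameters $ν$ and $s$ set the slab's degrees of freedom and scale: marginalizing over $c^{2}$ gives a Student-$t_{ν}(0, s^{2})$ slab.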
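Both limits are easy to confirm with a few lines of arithmetic. In this sketch the helper name `regularized_prior_sd` and the values of `tau` and `c` are illustrative assumptions; the formula itself is the $\tilde{λ}_{j}^{2}$ definition given above.

```python
import numpy as np

def regularized_prior_sd(lam, tau, c):
    """Prior standard deviation of beta_j under the regularized horseshoe: tau * lam_tilde."""
    lam_tilde_sq = (c**2 * lam**2) / (c**2 + tau**2 * lam**2)
    return tau * np.sqrt(lam_tilde_sq)

tau, c = 0.1, 2.0  # assumed values for illustration

# tau * lam << c: behaves like the plain horseshoe, sd close to tau * lam.
small = regularized_prior_sd(1.0, tau, c)
# tau * lam >> c: sd saturates at the slab width c.
large = regularized_prior_sd(1e6, tau, c)

print("small-scale sd:", small, "vs plain horseshoe:", tau * 1.0)
print("large-scale sd:", large, "vs slab width:", c)
```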

Choosing the global scale. Piironen & Vehtari recommend setting $τ_{0}$ from a prior guess $p_{0}$ of the number of relevant predictors:
$τ_{0} = \frac{p_{0}}{D - p_{0}} \cdot \frac{σ}{\sqrt{n}},$
where $D$ is the number of predictors, $n$ the sample size, and $σ$ the noise scale. This turns vague sparsity beliefs into a calibrated prior.
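As a worked example, the Piironen & Vehtari rule $τ_{0} = \frac{p_{0}}{D - p_{0}} \cdot \frac{σ}{\sqrt{n}}$ can be wrapped in a small helper. The function name `global_scale_guess` and the numbers below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def global_scale_guess(p0, D, n, sigma=1.0):
    """Global scale tau0 implied by a prior guess p0 of the number of relevant predictors."""
    if not 0 < p0 < D:
        raise ValueError("p0 must satisfy 0 < p0 < D")
    return p0 / (D - p0) * sigma / math.sqrt(n)

# Hypothetical problem: 100 predictors, about 5 expected to matter, 400 observations.
tau0 = global_scale_guess(p0=5, D=100, n=400, sigma=1.0)
print(f"tau0 = {tau0:.5f}")  # (5 / 95) * (1 / 20)
```

With these inputs $τ_{0}$ is roughly 0.0026, far smaller than the unit scale often used by default, which shows how strongly a sparse prior guess tightens the global scale.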

Why “regularized”

The plain horseshoe’s half-Cauchy local scales leave the largest coefficients essentially unregularized. In well-identified problems this is harmless, but under weak identification or separation (e.g. logistic regression with quasi-separable data) the likelihood does not pin those coefficients down either, so the posterior inherits the heavy tails and becomes hard to sample. The slab scale $c$ caps the effective prior variance of large signals at about $c^{2}$, which stabilizes the posterior geometry while preserving the horseshoe’s sharp noise-vs-signal separation.
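The effect of the cap can be seen by comparing the prior standard deviation of $β_{j}$ under the two priors for the same half-Cauchy draws. This is a rough sketch with $τ$ and $c$ held fixed at assumed values, rather than sampled from their own priors.

```python
import numpy as np

rng = np.random.default_rng(1)
tau, c = 0.1, 2.0  # fixed here for illustration only

lam = np.abs(rng.standard_cauchy(size=100_000))  # local scales, C+(0, 1)

# Plain horseshoe: prior sd is tau * lam, with no upper bound.
sd_plain = tau * lam
# Regularized horseshoe: prior sd is tau * lam_tilde, bounded above by c.
sd_reg = tau * np.sqrt(c**2 * lam**2 / (c**2 + tau**2 * lam**2))

print("largest plain sd      :", sd_plain.max())
print("largest regularized sd:", sd_reg.max())
print("median ratio reg/plain:", np.median(sd_reg / sd_plain))
```

The largest plain-horseshoe scale is far above $c$, while the regularized scale never exceeds it; the median ratio stays close to 1, so small coefficients are treated almost identically by both priors.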

Connections

A shrinkage prior for Bayesian Linear Regression — contrasts with ridge (Gaussian, over-shrinks signals) and Bayesian LASSO (Laplace, under-shrinks noise).
Conceptually a continuous relaxation of spike-and-slab variable selection; compare the Spike-and-Slab Prior for Covariate Selection used in structural time-series.
Global-local scale mixtures are estimated with the same HMC/NUTS machinery as other hierarchical priors (Efficient MCMC, Hierarchical Models); a non-centered parameterization, writing $β_{j} = τ λ_{j} z_{j}$ with $z_{j} \sim N(0, 1)$, is usually needed for good posterior geometry.
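One common way to write the non-centered form is sketched below in plain NumPy rather than in any particular probabilistic-programming library. The sampler would work with standard-normal variables $z_{j}$, and the coefficients are recovered deterministically as $β_{j} = τ \tilde{λ}_{j} z_{j}$. The function name `noncentered_beta` and all numeric values are assumptions for illustration.

```python
import numpy as np

def noncentered_beta(z, lam, tau, c):
    """Map standard-normal z to regularized-horseshoe coefficients beta = tau * lam_tilde * z."""
    lam_tilde = np.sqrt(c**2 * lam**2 / (c**2 + tau**2 * lam**2))
    return tau * lam_tilde * z

rng = np.random.default_rng(0)
D = 10
z = rng.normal(size=D)                       # what the sampler would explore
lam = np.abs(rng.standard_cauchy(size=D))    # local scales, C+(0, 1)
tau = 0.05 * abs(rng.standard_cauchy())      # global scale, C+(0, 0.05)
c = 2.0                                      # slab scale, fixed for simplicity

beta = noncentered_beta(z, lam, tau, c)
print(np.round(beta, 4))
```

Because $z_{j}$ is a priori independent of the scales, the sampler avoids the funnel-shaped dependence between $β_{j}$ and $(λ_{j}, τ)$ that the centered form induces.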
