The Horseshoe Prior

Summary

The horseshoe prior (Carvalho, Polson & Scott, 2009/2010) is the global-local shrinkage prior with half-Cauchy local scales $λ_{j} \sim C^{+} (0, 1)$ . Its name comes from the $Beta (\frac{1}{2}, \frac{1}{2})$ , horseshoe-shaped density it induces on the shrinkage factor $κ_{j}$ : mass piles up at $κ = 0$ (signals untouched) and $κ = 1$ (noise crushed). The heavy Cauchy tails give tail-robustness — strong signals are barely shrunk — which is usually an asset but becomes a liability under weak likelihoods.

Overview

The horseshoe is the most prominent member of the Global-Local Shrinkage Priors family and the starting point for the Regularized Horseshoe (Finnish Horseshoe). It has shown performance comparable to the gold-standard Spike-and-Slab Prior for Covariate Selection across many examples while remaining a simple continuous prior that can be sampled directly in Stan. This note covers its definition, the horseshoe-shaped $κ$ density, tail-robustness, and the connection to spike-and-slab.

Main Content

Horseshoe prior

For the linear Gaussian regression $y_{i} = β^{T} x_{i} + ε_{i}$ , $ε_{i} \sim N (0, σ^{2})$ :
$β_{j} ∣ λ_{j}, τ \sim N (0, τ^{2} λ_{j}^{2}), \quad λ_{j} \sim C^{+} (0, 1), \quad j = 1, \dots, D .$
The global scale $τ$ pulls all coefficients toward zero; the thick half-Cauchy tails of $λ_{j}$ allow some coefficients to escape. Large $τ$ gives diffuse priors with little shrinkage; $τ \to 0$ shrinks all $β_{j}$ to zero. An intercept $β_{0}$ , if present, gets a flat prior.
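The effect of $τ$ can be seen by drawing from the prior. The following is a minimal sketch using only the Python standard library; the function name `sample_horseshoe_prior` and the values of $τ$ and $D$ are illustrative assumptions, not part of the original note.

```python
import math
import random

def sample_horseshoe_prior(D, tau, rng):
    """Draw (beta, lam) with lam_j ~ C+(0, 1) and beta_j | lam_j, tau ~ N(0, tau^2 lam_j^2)."""
    # Half-Cauchy by inverse CDF: if U ~ Uniform(0, 1), then tan(pi * U / 2) ~ C+(0, 1).
    lam = [math.tan(0.5 * math.pi * rng.random()) for _ in range(D)]
    # rng.gauss takes the standard deviation, here tau * lam_j.
    beta = [rng.gauss(0.0, tau * l) for l in lam]
    return beta, lam

rng = random.Random(0)
for tau in (1.0, 0.01):  # hypothetical global scales, chosen only for contrast
    beta, _ = sample_horseshoe_prior(10_000, tau, rng)
    abs_sorted = sorted(abs(b) for b in beta)
    median, largest = abs_sorted[len(abs_sorted) // 2], abs_sorted[-1]
    print(f"tau={tau}: median |beta| = {median:.4f}, max |beta| = {largest:.1f}")
```

A small $τ$ pulls the bulk of the draws toward zero, while the half-Cauchy local scales still produce occasional draws far larger than the median.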

Horseshoe-shaped density on $κ_{j}$

With $λ_{j} \sim C^{+} (0, 1)$ , the shrinkage factor $κ_{j} = (1 + n σ^{- 2} τ^{2} s_{j}^{2} λ_{j}^{2})^{- 1}$ follows, at fixed $τ, σ$ ,
$p (κ_{j} ∣ τ, σ) = \frac{1}{π} \frac{a_{j}}{(a_{j}^{2} - 1) κ_{j} + 1} \frac{1}{\sqrt{κ_{j}} \sqrt{1 - κ_{j}}}, \quad a_{j} = τ σ^{- 1} \sqrt{n} s_{j} .$
For $a_{j} = 1$ this is exactly $Beta (\frac{1}{2}, \frac{1}{2})$ , the symmetric U-shaped “horseshoe” with unbounded density at both $κ_{j} = 0$ and $κ_{j} = 1$ . A priori we therefore expect both relevant variables ( $κ_{j} \approx 0$ , no shrinkage) and irrelevant variables ( $κ_{j} \approx 1$ , complete shrinkage).
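The density can be checked numerically. The sketch below takes $a_{j}$ to be the quantity for which $κ_{j} = (1 + a_{j}^{2} λ_{j}^{2})^{- 1}$, consistent with the definition of $κ_{j}$ above. It compares simulated draws of $κ_{j}$ with the CDF implied by the half-Cauchy distribution of $λ_{j}$. All function names and the chosen values of $a$ are illustrative assumptions.

```python
import math
import random

def kappa_density(kappa, a):
    """p(kappa | tau, sigma) on 0 < kappa < 1, with a = tau * sqrt(n) * s / sigma."""
    return (a / math.pi) / ((a * a - 1.0) * kappa + 1.0) / math.sqrt(kappa * (1.0 - kappa))

def kappa_cdf(kappa, a):
    """P(kappa_j <= kappa), obtained from the half-Cauchy CDF of lambda_j."""
    return 1.0 - (2.0 / math.pi) * math.atan(math.sqrt((1.0 - kappa) / kappa) / a)

def simulate_kappa(a, size, rng):
    """Draw kappa_j = 1 / (1 + a^2 lambda_j^2) with lambda_j ~ C+(0, 1)."""
    out = []
    for _ in range(size):
        lam = math.tan(0.5 * math.pi * rng.random())
        out.append(1.0 / (1.0 + (a * lam) ** 2))
    return out

rng = random.Random(0)
for a in (0.1, 1.0, 10.0):  # hypothetical values of a_j
    draws = simulate_kappa(a, 50_000, rng)
    low = sum(k < 0.1 for k in draws) / len(draws)
    high = sum(k > 0.9 for k in draws) / len(draws)
    print(f"a={a}: P(kappa<0.1) ~ {low:.3f} (CDF {kappa_cdf(0.1, a):.3f}), "
          f"P(kappa>0.9) ~ {high:.3f} (CDF {1 - kappa_cdf(0.9, a):.3f})")
```

For $a_{j} = 1$ the two end masses are equal, which is the symmetric horseshoe. Small $a_{j}$ moves mass toward $κ_{j} = 1$ and large $a_{j}$ moves it toward $κ_{j} = 0$.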

Tail-robustness. Because the half-Cauchy local scale has Cauchy-like tails, the marginal prior on $β_{j}$ is heavy-tailed too. For uncorrelated predictors and conditional on the scales, the posterior mean is $\bar{β}_{j} = (1 - κ_{j}) \hat{β}_{j}$, where $\hat{β}_{j}$ is the maximum-likelihood estimate. For large signals $κ_{j} \to 0$, so $\bar{β}_{j} \to \hat{β}_{j}$: strong coefficients are essentially not shrunk. Carvalho, Polson and Scott view this as a key strength, since signals are not over-shrunk. The downside, which motivates the Regularized Horseshoe (Finnish Horseshoe), is that nothing regularizes the largest coefficients: under a weak or flat likelihood (for example, logistic regression with separable data), the heavy tails let $∣ β_{j} ∣$ grow without bound and posterior means can fail to exist, the same pathology as with a Cauchy prior.
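Tail-robustness can be illustrated in the simplest special case. The sketch below assumes a single observation $y \sim N (β, 1)$ with $τ = σ = 1$, a setting chosen for illustration rather than taken from the note. With $κ = (1 + λ^{2})^{- 1}$ and the $Beta (\frac{1}{2}, \frac{1}{2})$ prior, the posterior of $κ$ is proportional to $e^{- κ y^{2} / 2} (1 - κ)^{- 1/2}$, and the posterior mean of $β$ is $(1 - E [κ ∣ y]) y$. The function name `expected_kappa` is hypothetical.

```python
import math

def expected_kappa(y, grid_size=20_000):
    """E[kappa | y] for y ~ N(beta, 1) under the horseshoe with tau = sigma = 1."""
    num = den = 0.0
    for i in range(grid_size):
        u = (i + 0.5) / grid_size           # midpoint rule on u in (0, 1)
        kappa = 1.0 - u * u                  # substituting kappa = 1 - u^2 removes the (1 - kappa)^(-1/2) singularity
        w = math.exp(-0.5 * kappa * y * y)   # unnormalised posterior weight in the u variable
        num += kappa * w
        den += w
    return num / den

for y in (0.0, 1.0, 2.0, 5.0, 10.0):
    k = expected_kappa(y)
    print(f"y={y:5.1f}  E[kappa|y]={k:.3f}  posterior mean={(1.0 - k) * y:.3f}")
```

The expected shrinkage falls toward zero as $∣ y ∣$ grows, so large observations are left almost untouched while small ones are pulled strongly toward zero.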

Relation to spike-and-slab

Writing the spike-and-slab with a point-mass spike (spike width $ε = 0$; here $ε$ is the spike scale, not the regression noise) as $β_{j} ∣ λ_{j}, c \sim N (0, c^{2} λ_{j}^{2})$, $λ_{j} \sim Ber (π)$, the shrinkage factor takes only two values: $κ_{j} = (1 + n σ^{- 2} s_{j}^{2} c^{2})^{- 1}$ (slab, prob. $π$) and $κ_{j} = 1$ (spike, prob. $1 - π$). Letting $c \to \infty$ sends the slab value to $κ_{j} = 0$, so all mass sits at $κ = 0$ and $κ = 1$, the discrete analogue of the horseshoe’s continuous U-shape. This resemblance helps explain why the two priors tend to perform similarly.
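The two-point behaviour is easy to tabulate. The following is a small sketch; the function names and the values of $n$, $σ$, $s_{j}$ and the inclusion probability are illustrative assumptions.

```python
def slab_kappa(c, n, sigma, s):
    """Shrinkage factor for a coefficient in the slab: 1 / (1 + n s^2 c^2 / sigma^2)."""
    return 1.0 / (1.0 + n * (s * c / sigma) ** 2)

def prior_mean_kappa(c, pi_incl, n, sigma, s):
    """E[kappa_j]: the slab value with probability pi_incl, the spike value 1 otherwise."""
    return pi_incl * slab_kappa(c, n, sigma, s) + (1.0 - pi_incl) * 1.0

# Hypothetical setting: n = 100 observations, unit noise and predictor scale.
for c in (0.1, 1.0, 10.0, 100.0):
    print(f"c={c:6.1f}  slab kappa={slab_kappa(c, n=100, sigma=1.0, s=1.0):.6f}  "
          f"E[kappa]={prior_mean_kappa(c, 0.2, n=100, sigma=1.0, s=1.0):.4f}")
```

As $c$ grows the slab value approaches 0 and the prior mean of $κ_{j}$ approaches $1 - π$, the spike probability.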

Examples

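$Beta (\frac{1}{2}, \frac{1}{2})$ intuition: the density $\propto κ^{- 1/2} (1 - κ)^{- 1/2}$ is the arcsine distribution. It is unbounded at both endpoints and lowest at $κ = \frac{1}{2}$; about 41% of the prior mass lies within 0.1 of an endpoint, compared with 20% under a uniform prior, encoding “each coefficient is either clearly in or clearly out.”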
Default- $τ$ pitfall: the popular default $τ \sim C^{+} (0, 1)$ ignores that $a_{j}$ depends on $n$ and $σ$ ; it implies an implausibly large effective model size unless $τ$ is strongly identified by data (see Choosing the Global Scale and Effective Nonzeros).
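The size of the problem can be sketched numerically. The code below assumes the effective number of nonzero coefficients is $m_{eff} = \sum_{j} (1 - κ_{j})$, as in Piironen and Vehtari's treatment, and uses the fact that $E [κ_{j} ∣ τ, σ] = (1 + a_{j})^{- 1}$ with $a_{j} = τ σ^{- 1} \sqrt{n} s_{j}$, which follows from integrating $κ_{j} = (1 + a_{j}^{2} λ_{j}^{2})^{- 1}$ against the half-Cauchy density. The values of $n$, $D$ and $τ$ are hypothetical and the function names are illustrative.

```python
import math
import random

def expected_meff(tau, sigma, n, D, s=1.0):
    """Closed form D * a / (1 + a), using E[kappa_j | tau, sigma] = 1 / (1 + a)."""
    a = tau * math.sqrt(n) * s / sigma
    return D * a / (1.0 + a)

def simulate_meff(tau, sigma, n, D, rng, s=1.0):
    """One prior draw of m_eff = sum_j (1 - kappa_j) with lambda_j ~ C+(0, 1)."""
    a = tau * math.sqrt(n) * s / sigma
    total = 0.0
    for _ in range(D):
        lam = math.tan(0.5 * math.pi * rng.random())
        total += 1.0 - 1.0 / (1.0 + (a * lam) ** 2)
    return total

rng = random.Random(0)
n, D, sigma = 100, 1000, 1.0  # hypothetical problem size
for tau in (1.0, 0.01):
    draws = [simulate_meff(tau, sigma, n, D, rng) for _ in range(50)]
    print(f"tau={tau}: simulated mean m_eff = {sum(draws) / len(draws):.1f}, "
          f"closed form = {expected_meff(tau, sigma, n, D):.1f}")
```

In this setting $τ = 1$ gives $a_{j} = 10$, so the prior expects roughly ten in eleven coefficients to be effectively nonzero, which is rarely a plausible belief in a sparse problem.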

Connections

A specific instance of Global-Local Shrinkage Priors (half-Cauchy local scales).
Extended to fix tail problems by the Regularized Horseshoe (Finnish Horseshoe).
Its $τ$ is set via $m_{eff}$ in Choosing the Global Scale and Effective Nonzeros.
The continuous analogue of Spike-and-Slab Prior for Covariate Selection.
Prior on regression coefficients of Bayesian Linear Regression; the shared global $τ$ mirrors Hierarchical Linear Models and the multiplicity control of Partial Pooling as Multiple Comparisons Correction.
