The Horseshoe Prior

Summary

The horseshoe prior (Carvalho, Polson & Scott, 2009/2010) is the global-local shrinkage prior with half-Cauchy local scales $\lambda_j \sim C^+(0, 1)$. Its name comes from the U-shaped, horseshoe-like density it induces on the shrinkage factor $\kappa_j \in (0, 1)$: mass piles up near $\kappa_j = 0$ (signals untouched) and $\kappa_j = 1$ (noise crushed). The heavy Cauchy tails give tail-robustness, meaning strong signals are barely shrunk, which is usually an asset but becomes a liability under weak likelihoods.

Overview

The horseshoe is the most prominent member of the Global-Local Shrinkage Priors family and the starting point for the Regularized Horseshoe (Finnish Horseshoe). It has shown performance comparable to the gold-standard Spike-and-Slab Prior for Covariate Selection across many examples while remaining a simple continuous prior that samples in Stan. This note covers its definition, the horseshoe-shaped density, tail-robustness, and the connection to spike-and-slab.

Main Content

Horseshoe prior

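For the linear Gaussian regression $y_i = \beta^\top x_i + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$, $i = 1, \dots, n$:

$$\beta_j \mid \lambda_j, \tau \sim N(0, \tau^2 \lambda_j^2), \qquad \lambda_j \sim C^+(0, 1), \qquad j = 1, \dots, D.$$

The global scale $\tau$ pulls all coefficients toward zero; the thick half-Cauchy tails of $\lambda_j$ allow some coefficients to escape. Large $\tau$ gives diffuse priors with little shrinkage; $\tau \to 0$ shrinks all $\beta_j$ to zero. An intercept $\beta_0$, if present, gets a flat prior.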
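The hierarchy above can be simulated directly. The following is a minimal sketch using only the Python standard library, assuming a fixed $\tau = 1$; the helper names (`half_cauchy`, `sample_horseshoe`, `tail_and_spike`) and the thresholds 0.1 and 5 are illustrative choices, not part of any library.

```python
import math
import random


def half_cauchy(rng, scale=1.0):
    """Draw from C+(0, scale) by inverting its CDF, F(x) = (2/pi) * atan(x / scale)."""
    return scale * math.tan(0.5 * math.pi * rng.random())


def sample_horseshoe(rng, tau=1.0):
    """One prior draw of beta_j: lambda_j ~ C+(0, 1), then beta_j ~ N(0, tau^2 * lambda_j^2)."""
    lam = half_cauchy(rng)
    return rng.gauss(0.0, tau * lam)


def tail_and_spike(draws, small=0.1, large=5.0):
    """Fraction of draws with |beta| < small, and fraction with |beta| > large."""
    n = len(draws)
    spike = sum(abs(b) < small for b in draws) / n
    tail = sum(abs(b) > large for b in draws) / n
    return spike, tail


rng = random.Random(0)  # fixed seed so the comparison is reproducible
hs = [sample_horseshoe(rng) for _ in range(100_000)]
normal = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
print("horseshoe (spike, tail):", tail_and_spike(hs))
print("N(0, 1)   (spike, tail):", tail_and_spike(normal))
```

Compared with a standard normal, the horseshoe draws should show more mass both very close to zero and far out in the tails, which is the "spike plus heavy tail" behaviour described above.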

Horseshoe-shaped density on $\kappa_j$

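With $a_j = \tau \sigma^{-1} \sqrt{n}\, s_j$ (where $s_j^2$ is the variance of predictor $j$, assuming uncorrelated zero-mean predictors), the shrinkage factor $\kappa_j = 1 / (1 + a_j^2 \lambda_j^2)$ follows, at fixed $\tau$ and $\sigma$,

$$p(\kappa_j \mid \tau, \sigma) = \frac{1}{\pi} \, \frac{a_j}{(a_j^2 - 1)\kappa_j + 1} \, \frac{1}{\sqrt{\kappa_j}\,\sqrt{1 - \kappa_j}}.$$

For $a_j = 1$ this is exactly $\mathrm{Beta}(1/2, 1/2)$, the symmetric U-shaped “horseshoe” with unbounded density at both $\kappa_j = 0$ and $\kappa_j = 1$. A priori we therefore expect both relevant variables ($\kappa_j \approx 0$, no shrinkage) and irrelevant variables ($\kappa_j \approx 1$, complete shrinkage).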
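The density of the shrinkage factor can be checked numerically. The sketch below assumes $\kappa_j = 1/(1 + a^2 \lambda_j^2)$ with $\lambda_j \sim C^+(0, 1)$ and a single scalar `a` standing in for $\tau \sigma^{-1} \sqrt{n}\, s_j$; the function names are hypothetical. The CDF follows from $P(\lambda_j \ge x) = 1 - (2/\pi)\arctan x$.

```python
import math


def kappa_density(kappa, a=1.0):
    """Density of kappa_j at fixed a = tau * sqrt(n) * s_j / sigma, for 0 < kappa < 1."""
    return (a / math.pi) / ((a * a - 1.0) * kappa + 1.0) / math.sqrt(kappa * (1.0 - kappa))


def kappa_cdf(kappa, a=1.0):
    """P(kappa_j <= kappa): kappa_j is small exactly when lambda_j is large."""
    x = math.sqrt((1.0 - kappa) / kappa) / a  # lambda value that maps to this kappa
    return 1.0 - (2.0 / math.pi) * math.atan(x)


def endpoint_mass(eps, a=1.0):
    """Prior mass in kappa_j < eps or kappa_j > 1 - eps."""
    return kappa_cdf(eps, a) + (1.0 - kappa_cdf(1.0 - eps, a))


for a in (0.1, 1.0, 10.0):
    low = kappa_cdf(0.1, a)
    high = 1.0 - kappa_cdf(0.9, a)
    print(f"a={a}: P(kappa<0.1)={low:.3f}, P(kappa>0.9)={high:.3f}")
```

At `a = 1` the two endpoint masses are equal, reflecting the symmetric horseshoe shape. At `a = 10` about 0.81 of the mass sits below $\kappa_j = 0.1$, so the prior favours leaving most coefficients unshrunk.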

Tail-robustness. Because the local scale $\lambda_j$ is half-Cauchy, the marginal prior on $\beta_j$ has heavy tails too. The shrinkage factor satisfies $\kappa_j \to 0$ for large signals, so the posterior mean $\bar\beta_j = (1 - \kappa_j)\hat\beta_j$ (given the scales) stays close to the maximum-likelihood estimate $\hat\beta_j$: strong coefficients are essentially not shrunk. Carvalho–Polson–Scott view this as a key strength: signals are not over-shrunk. The downside (motivating the Regularized Horseshoe (Finnish Horseshoe)) is that there is no way to regularize the largest coefficients: under a weak or flat likelihood (for example, separable logistic regression), the heavy tails let coefficients drift to arbitrarily large values and posterior means can vanish (fail to exist), sharing the pathologies of the Cauchy prior.
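Tail-robustness can be illustrated in the simplest setting. The sketch below assumes a one-observation normal-means model, $y \sim N(\beta, 1)$ with $\beta \sim N(0, \tau^2 \lambda^2)$ and $\lambda \sim C^+(0, 1)$, which is a simplification of the regression above rather than the full model. It computes the fraction of $y$ retained by the posterior mean, $E[1 - \kappa \mid y]$, by midpoint integration after the substitution $\lambda = \tan\theta$, under which the half-Cauchy prior becomes uniform on $(0, \pi/2)$. The function name and grid size are illustrative.

```python
import math


def retained_fraction(y, tau=1.0, grid=20_000):
    """E[1 - kappa | y], so that E[beta | y] = retained_fraction(y) * y."""
    num = 0.0
    den = 0.0
    for i in range(grid):
        theta = (i + 0.5) * (0.5 * math.pi) / grid  # midpoint rule on (0, pi/2)
        lam = math.tan(theta)                       # half-Cauchy draw is uniform in theta
        var = 1.0 + (tau * lam) ** 2                # marginal variance of y given lam
        weight = math.exp(-0.5 * y * y / var) / math.sqrt(var)  # N(y; 0, var) up to a constant
        kappa = 1.0 / var
        num += (1.0 - kappa) * weight
        den += weight
    return num / den


for y in (0.0, 0.5, 2.0, 5.0, 10.0):
    print(f"y={y}: fraction of y retained = {retained_fraction(y):.3f}")
```

The retained fraction should rise toward 1 as $|y|$ grows, so large observations are left almost untouched while small ones are pulled strongly toward zero.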

Relation to spike-and-slab

Examples

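  • $\mathrm{Beta}(1/2, 1/2)$ intuition: the $\mathrm{Beta}(1/2, 1/2)$ density is the arcsine distribution; it is unbounded at both endpoints, and about 41% of the prior mass lies in $\kappa_j < 0.1$ or $\kappa_j > 0.9$ (versus 20% under a uniform prior), encoding “each coefficient is either clearly in or clearly out.”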
  • Default-$\tau$ pitfall: the popular default $\tau \sim C^+(0, 1)$ ignores that the shrinkage factor $\kappa_j$ depends on $n$ and $\sigma$; it implies an implausibly large effective model size unless $\tau$ is strongly identified by data (see Choosing the Global Scale and Effective Nonzeros).
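The default-$\tau$ pitfall can be made concrete with a small calculation. The sketch below assumes standardized, uncorrelated predictors ($s_j = 1$), defines the effective model size as $m_{\mathrm{eff}} = \sum_j (1 - \kappa_j)$, and uses the identity $E[\kappa_j] = 1/(1 + a)$ for $\lambda_j \sim C^+(0, 1)$ with $a = \tau \sqrt{n} / \sigma$. All function names are hypothetical, and `tau_for_target` is simply the algebraic inverse of the prior mean, offered as an illustration rather than a recommendation from this note.

```python
import math
import random


def expected_meff(tau, sigma, n, D):
    """Prior mean of m_eff = sum_j (1 - kappa_j), using E[kappa_j] = 1 / (1 + a)."""
    a = tau * math.sqrt(n) / sigma
    return D * a / (1.0 + a)


def simulated_meff(tau, sigma, n, D, reps, rng):
    """Monte Carlo check: draw lambda_j ~ C+(0, 1) and average sum_j (1 - kappa_j)."""
    a = tau * math.sqrt(n) / sigma
    total = 0.0
    for _ in range(reps):
        for _ in range(D):
            lam = math.tan(0.5 * math.pi * rng.random())  # half-Cauchy via inverse CDF
            total += 1.0 - 1.0 / (1.0 + (a * lam) ** 2)
    return total / reps


def tau_for_target(p0, sigma, n, D):
    """Invert expected_meff: the tau whose prior mean m_eff equals p0."""
    return p0 / (D - p0) * sigma / math.sqrt(n)


n, D, sigma = 100, 1000, 1.0
print("tau = 1:", expected_meff(1.0, sigma, n, D))
print("simulated:", simulated_meff(1.0, sigma, n, D, reps=20, rng=random.Random(0)))
print("tau for 10 expected nonzeros:", tau_for_target(10, sigma, n, D))
```

With $n = 100$, $D = 1000$ and $\sigma = 1$, fixing $\tau = 1$ gives a prior mean of about 909 effectively nonzero coefficients out of 1000, which is rarely a plausible belief in a sparse problem.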

Connections

See Also