The Horseshoe Prior
Summary
The horseshoe prior (Carvalho, Polson & Scott, 2009/2010) is the global-local shrinkage prior with half-Cauchy local scales $\lambda_j \sim \mathrm{C}^+(0,1)$. Its name comes from the U-shaped, horseshoe-like density it induces on the shrinkage factor $\kappa_j \in [0,1]$: mass piles up at $\kappa_j \approx 0$ (signals untouched) and at $\kappa_j \approx 1$ (noise crushed). The heavy Cauchy tails give tail-robustness (strong signals are barely shrunk), which is usually an asset but becomes a liability under weak likelihoods.
Overview
The horseshoe is the most prominent member of the Global-Local Shrinkage Priors family and the starting point for the Regularized Horseshoe (Finnish Horseshoe). It has shown performance comparable to the gold-standard Spike-and-Slab Prior for Covariate Selection across many examples, while remaining a continuous prior that is straightforward to sample in Stan. This note covers its definition, the horseshoe-shaped density, tail-robustness, and the connection to spike-and-slab.
Main Content
Horseshoe prior
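For the linear Gaussian regression $y_i = \beta^\top x_i + \varepsilon_i$, $\varepsilon_i \sim \mathrm{N}(0, \sigma^2)$, $i = 1, \dots, n$, the horseshoe prior on the $D$ coefficients is:

$$
\beta_j \mid \lambda_j, \tau \sim \mathrm{N}(0, \lambda_j^2 \tau^2), \qquad \lambda_j \sim \mathrm{C}^+(0, 1), \qquad j = 1, \dots, D.
$$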
The global scale $\tau$ pulls all coefficients toward zero; the thick half-Cauchy tails of the local scales $\lambda_j$ allow some coefficients to escape the shrinkage. Large $\tau$ gives diffuse priors with little shrinkage; $\tau \to 0$ shrinks all $\beta_j$ to zero. An intercept $\beta_0$, if present, gets a flat prior.
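To make the "pull toward zero, but let some escape" behaviour concrete, here is a minimal Monte Carlo sketch of the prior using NumPy. It assumes $\tau = 1$, and the helper name `sample_horseshoe` is hypothetical. It compares horseshoe draws with a standard normal baseline on two quantities: prior mass very near zero and prior mass far out in the tails.

```python
import numpy as np


def sample_horseshoe(tau, size, rng):
    """Draw beta_j ~ N(0, lambda_j^2 tau^2) with half-Cauchy local scales lambda_j ~ C+(0, 1)."""
    lam = np.abs(rng.standard_cauchy(size))  # |Cauchy(0, 1)| is half-Cauchy C+(0, 1)
    return tau * lam * rng.standard_normal(size)


rng = np.random.default_rng(0)
n_draws = 200_000
beta_hs = sample_horseshoe(tau=1.0, size=n_draws, rng=rng)
beta_gauss = rng.standard_normal(n_draws)  # N(0, 1) baseline with no local scales

# Fraction of prior mass very close to zero (spike) and far out in the tails.
hs_near = np.mean(np.abs(beta_hs) < 0.1)
gauss_near = np.mean(np.abs(beta_gauss) < 0.1)
hs_tail = np.mean(np.abs(beta_hs) > 5.0)
gauss_tail = np.mean(np.abs(beta_gauss) > 5.0)
print(f"P(|beta| < 0.1): horseshoe {hs_near:.3f}, normal {gauss_near:.3f}")
print(f"P(|beta| > 5):   horseshoe {hs_tail:.3f}, normal {gauss_tail:.2e}")
```

Both effects come from the same mixing over $\lambda_j$: small local scales pile mass near zero, while the rare large ones push mass into the tails that a normal prior essentially never reaches.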
Horseshoe-shaped density on $\kappa_j$
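Assume uncorrelated, zero-mean predictors with sample variances $s_j^2$. Then, given $\lambda_j$, $\tau$ and $\sigma$, the posterior mean is $\bar\beta_j = (1 - \kappa_j)\hat\beta_j$, where $\hat\beta_j$ is the maximum-likelihood estimate and

$$
\kappa_j = \frac{1}{1 + n \sigma^{-2} \tau^2 s_j^2 \lambda_j^2}
$$

is the shrinkage factor. With $a_j = \sqrt{n}\,\sigma^{-1} \tau s_j$ and $\lambda_j \sim \mathrm{C}^+(0,1)$, the shrinkage factor follows, at fixed $\tau$ (and $\sigma$),

$$
p(\kappa_j \mid \tau) = \frac{1}{\pi} \, \frac{a_j}{(a_j^2 - 1)\kappa_j + 1} \, \kappa_j^{-1/2} (1 - \kappa_j)^{-1/2}.
$$

For $a_j = 1$ this is exactly $\mathrm{Beta}(\tfrac12, \tfrac12)$, the symmetric U-shaped "horseshoe" with unbounded density at both $\kappa_j = 0$ and $\kappa_j = 1$. A priori we therefore expect both relevant variables ($\kappa_j \approx 0$, no shrinkage) and irrelevant variables ($\kappa_j \approx 1$, complete shrinkage).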
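The density of $\kappa_j$ can be checked numerically. The sketch below uses NumPy, and `kappa_density` and `sample_kappa` are illustrative names, not from any library. It codes $p(\kappa_j \mid \tau) = \frac{a_j}{\pi((a_j^2-1)\kappa_j+1)}\kappa_j^{-1/2}(1-\kappa_j)^{-1/2}$ with $a_j = \sqrt{n}\,\sigma^{-1}\tau s_j$. It then confirms that $a_j = 1$ reproduces the arcsine ($\mathrm{Beta}(\tfrac12,\tfrac12)$) density and that the formula integrates to about 1 for an arbitrary $a_j = 3$. Finally, it compares simulated endpoint mass with the exact arcsine value.

```python
import numpy as np


def kappa_density(kappa, a):
    """p(kappa_j | tau) = a / (pi * ((a^2 - 1) kappa + 1)) * kappa^(-1/2) * (1 - kappa)^(-1/2)."""
    return a / (np.pi * ((a**2 - 1.0) * kappa + 1.0) * np.sqrt(kappa * (1.0 - kappa)))


def sample_kappa(a, size, rng):
    """Draw kappa_j = 1 / (1 + a^2 lambda_j^2) with lambda_j ~ C+(0, 1)."""
    lam = np.abs(rng.standard_cauchy(size))
    return 1.0 / (1.0 + (a * lam) ** 2)


# Midpoint grid on (0, 1): the density is infinite at the endpoints, so never evaluate there.
N = 100_000
grid = (np.arange(N) + 0.5) / N
arcsine = 1.0 / (np.pi * np.sqrt(grid * (1.0 - grid)))  # Beta(1/2, 1/2) density

max_gap = np.max(np.abs(kappa_density(grid, 1.0) - arcsine))  # a_j = 1 should give the arcsine law
total_mass = np.mean(kappa_density(grid, 3.0))  # crude midpoint integral; should be close to 1

rng = np.random.default_rng(1)
kappa = sample_kappa(1.0, 200_000, rng)
endpoint_mass = np.mean((kappa < 0.1) | (kappa > 0.9))
# Exact Beta(1/2, 1/2) mass within 0.1 of either endpoint: its CDF is (2/pi) arcsin(sqrt(x)).
exact_endpoint_mass = 2 * (2 / np.pi) * np.arcsin(np.sqrt(0.1))
print(max_gap, total_mass, endpoint_mass, exact_endpoint_mass)
```

The midpoint integral converges slowly because of the $\kappa^{-1/2}$ singularities, so `total_mass` is only close to 1, not exact. The endpoint mass of about 0.41 is roughly twice what a uniform prior on $[0,1]$ would place within 0.1 of the endpoints.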
Tail-robustness. Because the half-Cauchy local scale $\lambda_j$ has Cauchy tails, the marginal prior on $\beta_j$ has heavy tails too. The posterior shrinkage factor satisfies $\mathbb{E}[\kappa_j \mid y] \to 0$ for large signals, so $\bar\beta_j \approx \hat\beta_j$: strong coefficients are essentially not shrunk. Carvalho, Polson & Scott view this as a key strength, since signals are not over-shrunk. The downside (motivating the Regularized Horseshoe (Finnish Horseshoe)) is that there is no way to regularize the largest coefficients. Under a weak or flat likelihood (e.g. separable data in logistic regression), the heavy tails let $|\beta_j|$ drift toward infinity, and posterior means may fail to exist, sharing the pathologies of the Cauchy prior.
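Tail-robustness can be seen in the simplest setting. This sketch assumes a single observation $y \sim \mathrm{N}(\beta, 1)$ with $\tau = \sigma = 1$ (so $a_j = 1$ and $\kappa = 1/(1+\lambda^2)$). It uses NumPy, and `posterior_mean_kappa` is a hypothetical helper. Marginally $y \mid \kappa \sim \mathrm{N}(0, 1/\kappa)$, so the posterior is $p(\kappa \mid y) \propto (1-\kappa)^{-1/2} e^{-\kappa y^2/2}$. The substitution $\kappa = 1 - u^2$ removes the singularity, leaving a smooth integrand over $u \in (0,1)$. The posterior mean of $\beta$ is then $(1 - \mathbb{E}[\kappa \mid y])\,y$.

```python
import numpy as np

# Midpoint grid for u in (0, 1); kappa = 1 - u**2 turns (1 - kappa)^(-1/2) dkappa into 2 du.
U = (np.arange(200_000) + 0.5) / 200_000


def posterior_mean_kappa(y):
    """E[kappa | y] for y ~ N(beta, 1), beta ~ N(0, lambda^2), lambda ~ C+(0, 1)."""
    kappa = 1.0 - U**2
    weights = np.exp(-0.5 * kappa * y**2)  # prior x likelihood after the change of variables
    return np.sum(kappa * weights) / np.sum(weights)


for y in [0.0, 1.0, 2.0, 5.0, 10.0]:
    e_kappa = posterior_mean_kappa(y)
    hs_mean = (1.0 - e_kappa) * y  # horseshoe posterior mean of beta
    normal_mean = 0.5 * y  # N(0, 1) prior on beta: kappa is fixed at 1/2
    print(f"y={y:5.1f}  E[kappa|y]={e_kappa:.3f}  horseshoe={hs_mean:.2f}  normal prior={normal_mean:.2f}")
```

At $y = 0$ the expected shrinkage is $2/3$. As $y$ grows, $\mathbb{E}[\kappa \mid y]$ decays roughly like $2/y^2$, so the gap $y - \mathbb{E}[\beta \mid y]$ shrinks toward zero. Under the normal prior the gap $y/2$ grows without bound.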
Relation to spike-and-slab
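Write the spike-and-slab with $\lambda_j \in \{0, 1\}$ as $\beta_j \mid \lambda_j, \tau \sim \mathrm{N}(0, \lambda_j^2 \tau^2)$, $\lambda_j \sim \mathrm{Bernoulli}(\pi)$. Then the shrinkage factor takes only two values: $\kappa_j = 1/(1 + n\sigma^{-2}\tau^2 s_j^2)$ (slab, prob. $\pi$) and $\kappa_j = 1$ (spike, prob. $1 - \pi$). Letting $\tau \to \infty$ puts all mass at $\kappa_j = 0$ and $\kappa_j = 1$, the discrete analogue of the horseshoe's continuous U-shape. This shared two-ended structure helps explain why the two priors often perform similarly.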
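The two-point distribution of $\kappa_j$ under spike-and-slab can be tabulated directly. This plain-Python sketch assumes $n = \sigma = s_j = 1$ and a hypothetical inclusion probability $\pi = 0.2$; the function names are illustrative. It shows the slab value of $\kappa_j$ collapsing toward 0 as $\tau$ grows, while the spike stays at 1.

```python
def slab_kappa(tau, n=1, sigma=1.0, s=1.0):
    """Shrinkage factor of a slab draw (lambda_j = 1): 1 / (1 + n sigma^-2 tau^2 s_j^2)."""
    return 1.0 / (1.0 + n * tau**2 * s**2 / sigma**2)


def expected_kappa(tau, pi_incl):
    """Prior mean of kappa_j: slab value with prob. pi_incl, spike value 1 with prob. 1 - pi_incl."""
    return pi_incl * slab_kappa(tau) + (1.0 - pi_incl) * 1.0


pi_incl = 0.2  # hypothetical prior inclusion probability
for tau in [1.0, 10.0, 100.0]:
    print(
        f"tau={tau:6.1f}: slab kappa = {slab_kappa(tau):.5f} (prob. {pi_incl}), "
        f"spike kappa = 1 (prob. {1 - pi_incl:.1f}), E[kappa] = {expected_kappa(tau, pi_incl):.4f}"
    )
```

In the large-$\tau$ limit, $\mathbb{E}[\kappa_j]$ approaches $1 - \pi$: a fraction $\pi$ of coefficients is left untouched and the rest is zeroed, mirroring the two spikes of the horseshoe density.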
Examples
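- $\mathrm{Beta}(\tfrac12, \tfrac12)$ intuition: the density $p(\kappa) = 1/\bigl(\pi\sqrt{\kappa(1-\kappa)}\bigr)$ is the arcsine distribution. Prior mass concentrates near the two endpoints (about 41% lies within 0.1 of 0 or 1, versus 20% under a uniform prior), encoding "each coefficient is either clearly in or clearly out."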
- Default-$\tau$ pitfall: the popular default $\tau \sim \mathrm{C}^+(0,1)$ ignores that the appropriate scale of $\tau$ depends on $n$ and $\sigma$. It implies an implausibly large effective model size unless $\tau$ is strongly identified by the data (see Choosing the Global Scale and Effective Nonzeros).
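The default-$\tau$ pitfall can be quantified with a short NumPy sketch. It assumes the effective number of nonzeros is $m_{\text{eff}} = \sum_j (1 - \kappa_j)$. It also uses $\mathbb{E}[\kappa_j \mid \tau] = 1/(1 + a_j)$, which follows from integrating $\kappa_j$ against the density $p(\kappa_j \mid \tau)$ above. The problem sizes ($D = 1000$ predictors, $n = 100$ observations, $\sigma = s_j = 1$) are hypothetical, and `expected_m_eff` is an illustrative name.

```python
import numpy as np


def expected_m_eff(tau, D, n, sigma=1.0, s=1.0):
    """Prior mean of m_eff = sum_j (1 - kappa_j) given tau, using E[kappa_j | tau] = 1 / (1 + a_j)."""
    a = np.sqrt(n) * tau * s / sigma  # a_j = sqrt(n) * tau * s_j / sigma
    return D * a / (1.0 + a)


D, n = 1000, 100  # hypothetical: 1000 candidate predictors, 100 observations
rng = np.random.default_rng(2)
tau_default = np.abs(rng.standard_cauchy(200_000))  # draws from the default tau ~ C+(0, 1)
m_eff_default = expected_m_eff(tau_default, D, n)

print("median of E[m_eff | tau] under tau ~ C+(0, 1):", np.median(m_eff_default))
print("share of tau draws implying m_eff > 500:", np.mean(m_eff_default > 500))
print("E[m_eff | tau = 0.001]:", expected_m_eff(0.001, D, n))
```

Under these assumptions the default prior puts its median near 909 of 1000 predictors "in the model", and more than 90% of its draws imply over half of them. A $\tau$ three orders of magnitude smaller corresponds to roughly 10 effective nonzeros, which is why the scale of $\tau$ has to be tied to $n$, $\sigma$ and a plausible model size.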
Connections
- A specific instance of Global-Local Shrinkage Priors (half-Cauchy local scales).
- Extended to fix tail problems by the Regularized Horseshoe (Finnish Horseshoe).
- Its global scale $\tau$ is set via a prior guess $p_0$ for the effective number of nonzero coefficients in Choosing the Global Scale and Effective Nonzeros.
- The continuous analogue of Spike-and-Slab Prior for Covariate Selection.
- Prior on regression coefficients of Bayesian Linear Regression; the shared global scale $\tau$ mirrors Hierarchical Linear Models and the multiplicity control of Partial Pooling as Multiple Comparisons Correction.
See Also
- Global-Local Shrinkage Priors — the parent framework
- Regularized Horseshoe (Finnish Horseshoe) — the slab-regularized fix
- Choosing the Global Scale and Effective Nonzeros — setting $\tau$
- Spike-and-Slab Prior for Covariate Selection — the discrete gold standard
- Horseshoe and Regularized Horseshoe Priors — overview hub
- Partial Pooling as Multiple Comparisons Correction — shrinkage as multiplicity control