Horseshoe and Regularized Horseshoe Priors

Summary

Piironen & Vehtari (2017) address two long-standing problems with the horseshoe prior for sparse Bayesian regression: (1) there was no principled way to set the global shrinkage scale $\tau$, and (2) the horseshoe leaves large coefficients completely unregularized, which is harmful under weak likelihoods (e.g. separable logistic regression). Their solutions are the effective number of nonzeros $m_\text{eff}$, which turns prior beliefs about sparsity ($p_0$) into a concrete prior for $\tau$, and the regularized (Finnish) horseshoe, which adds a Student-$t$ slab of scale $s$ to softly cap the largest coefficients.

Overview

This is the hub note for an Obsidian cluster on global-local shrinkage priors. The paper is an extension of Piironen & Vehtari (2017a) and targets regression/classification with many predictors of which only a few are expected to be nonzero.

Four companion notes break the contribution into pieces. The two main theoretical advances are summarized below; details live in those notes.

Main Content

The model is the standard linear Gaussian regression with a horseshoe prior on the coefficients.

Horseshoe prior for linear regression

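For the linear model $y_i = \beta^\top x_i + \varepsilon_i$, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$, $i = 1, \dots, n$, the horseshoe prior is the global-local scale mixture

$$\beta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \tau^2 \lambda_j^2), \qquad \lambda_j \sim \mathrm{C}^{+}(0, 1), \qquad j = 1, \dots, D,$$

where $\tau$ is the global scale (pulls all coefficients toward 0) and the half-Cauchy local scales $\lambda_j$ have heavy tails that let some $\beta_j$ escape the shrinkage. An intercept $\beta_0$ gets a relatively flat prior (no reason to shrink it).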
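To get a feel for the global-local mixture, the following minimal sketch draws from the horseshoe prior, $\beta_j \sim \mathcal{N}(0, \tau^2 \lambda_j^2)$ with $\lambda_j \sim \mathrm{C}^{+}(0, 1)$, using only the Python standard library. The function names, the value `tau=0.01`, and the number of draws are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sample_half_cauchy(rng, scale=1.0):
    """Draw from C+(0, scale) by inverting the CDF F(x) = (2/pi) * atan(x / scale)."""
    return scale * math.tan(math.pi * rng.random() / 2.0)


def sample_horseshoe_prior(D, tau, rng):
    """Draw beta_1..beta_D with beta_j ~ N(0, tau^2 * lambda_j^2), lambda_j ~ C+(0, 1)."""
    betas = []
    for _ in range(D):
        lam = sample_half_cauchy(rng)             # local scale lambda_j
        betas.append(rng.gauss(0.0, tau * lam))   # conditional Gaussian draw
    return betas


rng = random.Random(0)
draws = sample_horseshoe_prior(D=10_000, tau=0.01, rng=rng)  # illustrative values

abs_sorted = sorted(abs(b) for b in draws)
median_abs = abs_sorted[len(abs_sorted) // 2]
max_abs = abs_sorted[-1]
print(f"median |beta| = {median_abs:.4g}, max |beta| = {max_abs:.4g}")
```

Most draws sit close to zero (on the order of $\tau$), while a few are orders of magnitude larger: the heavy half-Cauchy tails are what let individual coefficients escape the global shrinkage.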

Shrinkage factor

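Assuming uncorrelated predictors with zero mean and variances $\mathrm{Var}(x_j) = s_j^2$ (so $X^\top X \approx n\,\mathrm{diag}(s_1^2, \dots, s_D^2)$), the posterior mean given the hyperparameters satisfies $\bar\beta_j = (1 - \kappa_j)\,\hat\beta_j$, where $\hat\beta$ is the MLE and

$$\kappa_j = \frac{1}{1 + n\,\sigma^{-2}\,\tau^2 s_j^2 \lambda_j^2}$$

is the shrinkage factor: $\kappa_j = 1$ is complete shrinkage to zero, $\kappa_j = 0$ is no shrinkage. As $\tau \to 0$, $\bar\beta_j \to 0$; as $\tau \to \infty$, $\bar\beta_j \to \hat\beta_j$.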
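The shrinkage factor $\kappa_j = 1 / (1 + n \sigma^{-2} \tau^2 s_j^2 \lambda_j^2)$ and its two limits can be checked numerically. This is a small sketch under the same uncorrelated-predictor assumption; the function names and the example values of `n`, `tau`, and `beta_hat` are hypothetical.

```python
def shrinkage_factor(tau, lam, n, sigma=1.0, s=1.0):
    """kappa_j = 1 / (1 + n * sigma^-2 * tau^2 * s_j^2 * lambda_j^2)."""
    return 1.0 / (1.0 + n * (tau * s * lam / sigma) ** 2)


def posterior_mean(beta_hat, kappa):
    """Posterior mean given the hyperparameters: (1 - kappa_j) * beta_hat_j."""
    return (1.0 - kappa) * beta_hat


n = 100          # illustrative sample size
beta_hat = 3.0   # illustrative MLE for one coefficient

for tau in (1e-4, 1e-2, 1e-1, 1.0, 1e2):
    kappa = shrinkage_factor(tau, lam=1.0, n=n)
    print(f"tau={tau:g}  kappa={kappa:.6f}  "
          f"posterior mean={posterior_mean(beta_hat, kappa):.6f}")
```

With $\lambda_j = s_j = \sigma = 1$ and $n = 100$, the crossover $\kappa_j = 0.5$ happens at $\tau = 0.1 = \sigma / \sqrt{n}$, which already hints at why the natural scale for $\tau$ is $\sigma / \sqrt{n}$ rather than 1.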

Regularized (Finnish) horseshoe

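Replace the local scale $\lambda_j$ by a slab-truncated version $\tilde\lambda_j$:

$$\beta_j \mid \lambda_j, \tau, c \sim \mathcal{N}(0, \tau^2 \tilde\lambda_j^2), \qquad \tilde\lambda_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2}.$$

When $\tau^2 \lambda_j^2 \ll c^2$ (small coefficient), $\tilde\lambda_j^2 \to \lambda_j^2$ and we recover the original horseshoe; when $\tau^2 \lambda_j^2 \gg c^2$ (large coefficient), $\tilde\lambda_j^2 \to c^2 / \tau^2$, so the prior approaches $\mathcal{N}(0, c^2)$, a Gaussian slab of width $c$ that "soft-truncates" the heavy Cauchy tails. Letting $c \to \infty$ recovers the unregularized horseshoe.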
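The two regimes of the regularized local scale $\tilde\lambda_j^2 = c^2 \lambda_j^2 / (c^2 + \tau^2 \lambda_j^2)$ can be seen by computing the implied prior standard deviation $\tau \tilde\lambda_j$ of $\beta_j$. The sketch below uses hypothetical function names and illustrative values for $\tau$ and $c$.

```python
import math


def regularized_lambda_sq(lam, tau, c):
    """lambda_tilde_j^2 = c^2 * lambda_j^2 / (c^2 + tau^2 * lambda_j^2)."""
    if math.isinf(c):
        # c -> infinity recovers the original (unregularized) horseshoe.
        return lam ** 2
    return (c ** 2 * lam ** 2) / (c ** 2 + tau ** 2 * lam ** 2)


def prior_sd(lam, tau, c):
    """Prior standard deviation of beta_j given lambda_j, tau, c."""
    return tau * math.sqrt(regularized_lambda_sq(lam, tau, c))


tau, c = 0.01, 2.0  # illustrative values
for lam in (1.0, 1e2, 1e4, 1e8):
    print(f"lambda={lam:g}  horseshoe sd={tau * lam:g}  "
          f"regularized sd={prior_sd(lam, tau, c):.6g}")
```

For small $\lambda_j$ the two standard deviations agree, while for very large $\lambda_j$ the regularized value saturates at $c$ instead of growing without bound.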

Prior guess for the global scale

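If $p_0$ is the prior guess for the number of relevant predictors out of $D$, set the global scale so that the prior mean of the effective number of nonzero coefficients, $m_\text{eff} = \sum_{j=1}^{D} (1 - \kappa_j)$, equals $p_0$:

$$\tau_0 = \frac{p_0}{D - p_0}\,\frac{\sigma}{\sqrt{n}}.$$

$\tau$ must scale as $\sigma / \sqrt{n}$ to keep prior beliefs about $m_\text{eff}$ consistent, which is exactly why the default $\tau \sim \mathrm{C}^{+}(0, 1)$ is a dubious choice (it ignores $\sigma$ and $n$ and puts far too much mass on large $\tau$).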
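As a worked check, the sketch below computes $\tau_0 = \frac{p_0}{D - p_0} \frac{\sigma}{\sqrt{n}}$ and the prior mean of the effective number of nonzeros for a fixed $\tau$, which for standardized predictors ($s_j = 1$) is $\mathrm{E}[m_\text{eff} \mid \tau, \sigma] = \frac{\tau \sigma^{-1} \sqrt{n}}{1 + \tau \sigma^{-1} \sqrt{n}}\, D$. The function names and the numbers (`D=1000`, `n=100`, `sigma=1`, `p0=5`) are hypothetical and chosen only for illustration.

```python
import math


def tau0(p0, D, n, sigma=1.0):
    """Global scale whose implied prior mean of m_eff equals p0."""
    if not 0 < p0 < D:
        raise ValueError("need 0 < p0 < D")
    return p0 / (D - p0) * sigma / math.sqrt(n)


def expected_m_eff(tau, D, n, sigma=1.0):
    """Prior mean of m_eff given tau, assuming standardized predictors (s_j = 1)."""
    a = tau * math.sqrt(n) / sigma
    return a / (1.0 + a) * D


# Hypothetical problem size, for illustration only.
D, n, sigma, p0 = 1000, 100, 1.0, 5

t0 = tau0(p0, D, n, sigma)
print(f"tau0 = {t0:.6f}")                                               # about 0.000503
print(f"E[m_eff | tau0]  = {expected_m_eff(t0, D, n, sigma):.2f}")      # recovers p0 = 5
print(f"E[m_eff | tau=1] = {expected_m_eff(1.0, D, n, sigma):.2f}")     # about 909
```

In this illustrative setting, fixing $\tau = 1$ corresponds to expecting roughly 909 of the 1000 coefficients to be nonzero, the opposite of a sparsity assumption.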

The paper also shows the regularized horseshoe is the continuous counterpart of the spike-and-slab prior with a finite slab width, whereas the original horseshoe corresponds to spike-and-slab with an infinitely wide slab. See Spike-and-Slab Prior for Covariate Selection.

Examples

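  • Setting $\tau_0$: With $D$ predictors, $n$ observations, noise scale $\sigma$, and a prior guess of $p_0$ relevant variables: $\tau_0 = \frac{p_0}{D - p_0}\,\frac{\sigma}{\sqrt{n}}$. For $p_0 \ll D$ and $\sigma$ of order 1, this is far from the scale 1 used by the naive default.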
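  • Logistic regression / separation: When data are separable the likelihood is flat, the MLE diverges, and the Cauchy-tailed horseshoe lets the largest $|\beta_j|$ grow without bound, so posterior means may fail to exist. A finite slab scale $c$ (e.g. via $c^2 \sim \text{Inv-Gamma}(\nu/2, \nu s^2/2)$, giving a Student-$t_\nu(0, s^2)$ slab) caps this. For binary classification a workable plug-in for $\sigma$ is the pseudo-scale $\tilde\sigma = 1 / \sqrt{\bar y (1 - \bar y)}$, e.g. $\tilde\sigma = 2$ for $\bar y = 0.5$.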
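The two ingredients of the classification example can be sketched in a few lines: a plug-in pseudo-scale for $\sigma$, assuming the logistic-link form $\tilde\sigma^2 = 1 / (\bar y (1 - \bar y))$, and draws of the slab scale $c$ under an inverse-gamma prior $c^2 \sim \text{Inv-Gamma}(\nu/2, \nu s^2/2)$. The function names and the values `nu=4`, `s=2` are illustrative assumptions, not recommendations from this note.

```python
import math
import random


def pseudo_sigma(y_bar):
    """Plug-in pseudo-scale for a logistic model: 1 / sqrt(y_bar * (1 - y_bar))."""
    if not 0.0 < y_bar < 1.0:
        raise ValueError("y_bar must lie strictly between 0 and 1")
    return 1.0 / math.sqrt(y_bar * (1.0 - y_bar))


def sample_slab_scale(nu, s, rng):
    """Draw c with c^2 ~ Inv-Gamma(nu/2, nu*s^2/2).

    If G ~ Gamma(shape=alpha, scale=1), then beta / G ~ Inv-Gamma(alpha, beta).
    """
    alpha = nu / 2.0
    beta = nu * s ** 2 / 2.0
    return math.sqrt(beta / rng.gammavariate(alpha, 1.0))


rng = random.Random(0)
c_draws = sorted(sample_slab_scale(nu=4, s=2.0, rng=rng) for _ in range(20_000))
c_median = c_draws[len(c_draws) // 2]

print(f"pseudo sigma at y_bar=0.5: {pseudo_sigma(0.5)}")
print(f"median slab scale c (nu=4, s=2): {c_median:.3f}")
```

The pseudo-scale can stand in for $\sigma$ when computing $\tau_0$, and the draws of $c$ concentrate around the chosen slab scale $s$ while still allowing occasionally wider slabs.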

Connections

See Also