Spike-and-Slab Prior for Covariate Selection

Summary

The spike-and-slab prior enables automatic selection of which control time series to include in the regression component of the BSTS model. The “spike” places point mass at zero (a variable is excluded); the “slab” is a weakly informative Gaussian for included variables. A Gibbs sampler updates the binary inclusion vector jointly with the regression coefficients, which allows selection among tens or hundreds of candidates while guarding against overfitting.

Overview

When constructing a synthetic control, there may be many candidate predictor series. Including all of them would cause overfitting; pre-selecting a subset requires strong domain knowledge. The spike-and-slab prior provides an automatic Bayesian solution.

Main Content

Definition: Spike-and-Slab Prior

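Let $\varrho = (\varrho_1, \dots, \varrho_J)$ be binary inclusion indicators, where $\varrho_j = 1$ if $\beta_j \neq 0$ and $\varrho_j = 0$ otherwise. Let $\beta_\varrho$ denote the nonzero elements of $\beta$. The full prior factorizes as:

$$p(\varrho, \beta, 1/\sigma_\epsilon^2) = p(\varrho)\, p(\sigma_\epsilon^2 \mid \varrho)\, p(\beta_\varrho \mid \varrho, \sigma_\epsilon^2)$$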

Spike (inclusion probability):

$$p(\varrho) = \prod_{j=1}^{J} \pi_j^{\varrho_j} (1 - \pi_j)^{1 - \varrho_j}$$

where $\pi_j$ is the prior probability of including predictor $j$.
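As a small illustration, an independent-Bernoulli spike can be evaluated on the log scale as follows. This is a minimal sketch; the function name `log_spike_prior` and the example probabilities are assumptions made for illustration, not part of the original text.

```python
import numpy as np

def log_spike_prior(rho, pi):
    """Log of an independent-Bernoulli prior over the inclusion vector rho."""
    rho = np.asarray(rho, dtype=bool)
    pi = np.asarray(pi, dtype=float)
    # Each predictor contributes log(pi_j) if included and log(1 - pi_j) otherwise.
    # Probabilities of exactly 0 or 1 yield -inf for the impossible configuration.
    with np.errstate(divide="ignore"):
        terms = np.where(rho, np.log(pi), np.log1p(-pi))
    return float(terms.sum())

# Illustrative values: three candidate predictors, only the first one included.
pi = np.array([0.5, 0.1, 0.1])
rho = np.array([1, 0, 0])
log_p = log_spike_prior(rho, pi)  # equals log(0.5 * 0.9 * 0.9)
```

Working on the log scale avoids underflow when the number of candidate predictors is large.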

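Slab (Gaussian prior for included coefficients):

$$\beta_\varrho \mid \sigma_\epsilon^2 \sim \mathcal{N}\!\left(b_\varrho,\; \sigma_\epsilon^2 \left(\Sigma_\varrho^{-1}\right)^{-1}\right) \tag{2.10}$$

where $b$ is the prior mean of the coefficients and $\Sigma_\varrho^{-1}$ denotes the rows and columns of the prior precision matrix $\Sigma^{-1}$ that correspond to included predictors.

Inverse-Gamma prior on error variance (equivalently, a Gamma prior on the precision):

$$\frac{1}{\sigma_\epsilon^2} \sim \mathcal{G}\!\left(\frac{\nu_\epsilon}{2},\; \frac{s_\epsilon}{2}\right)$$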

Setting the Inclusion Prior

Rather than setting each $\pi_j$ individually, the recommended approach is to elicit an expected model size $M$ and set:

$$\pi_j = \frac{M}{J}$$

This scales naturally with $J$ (total number of predictors) and avoids having to specify a hierarchical prior.

Special cases:

  • $\pi_j = 1$: Force predictor $j$ into the model
  • $\pi_j = 0$: Force predictor $j$ out of the model
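The elicitation rule and its two special cases can be sketched in a few lines. The helper `inclusion_probabilities` and its arguments `force_in` and `force_out` are hypothetical names chosen for this example.

```python
import numpy as np

def inclusion_probabilities(n_predictors, expected_model_size,
                            force_in=(), force_out=()):
    """Prior inclusion probabilities from an expected model size."""
    # Every predictor starts with the same probability: expected size / total count.
    pi = np.full(n_predictors, expected_model_size / n_predictors)
    pi = np.clip(pi, 0.0, 1.0)
    # Special cases: probability 1 forces a predictor in, probability 0 forces it out.
    pi[np.asarray(list(force_in), dtype=int)] = 1.0
    pi[np.asarray(list(force_out), dtype=int)] = 0.0
    return pi

# Illustrative values: 100 candidates, about 3 expected in the model,
# predictor 0 always included and predictor 5 always excluded.
pi = inclusion_probabilities(100, 3, force_in=[0], force_out=[5])
```

Note that forcing predictors in or out changes the implied expected model size, so the sum of `pi` no longer equals the elicited value exactly.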

Zellner’s g-Prior for the Slab

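The precision matrix $\Sigma^{-1}$ in equation (2.10) uses a g-prior (Zellner 1986), averaged with its diagonal:

$$\Sigma^{-1} = \frac{g}{n} \left\{ w\, X^\top X + (1 - w)\, \mathrm{diag}\!\left(X^\top X\right) \right\}$$

  • $g$: number of observations' worth of prior weight; default $g = 1$
  • $w$: mixing weight between $X^\top X$ (full correlation structure) and its diagonal (independent); default $w = 1/2$
  • $n$: number of observations

Interpretation: the prior carries $g$ observations' worth of information, since $\frac{1}{n} X^\top X$ is the average information in a single observation and $\frac{g}{n} X^\top X$ is the prior information matrix. The averaging with the diagonal ensures propriety when $X^\top X$ is not positive definite.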
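A short numerical sketch shows why averaging with the diagonal helps. The function name `gprior_precision` and the simulated design matrix are assumptions for illustration; the defaults follow the values described above.

```python
import numpy as np

def gprior_precision(X, g=1.0, w=0.5):
    """Prior precision: a g-prior averaged with its own diagonal."""
    n = X.shape[0]
    xtx = X.T @ X
    # w = 1 recovers the pure g-prior; w = 0 treats the predictors as independent.
    return (g / n) * (w * xtx + (1.0 - w) * np.diag(np.diag(xtx)))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
# Duplicate a column so that X'X is singular (not positive definite).
X = np.column_stack([X, X[:, 0]])

prior_precision = gprior_precision(X)
smallest_eig_xtx = np.linalg.eigvalsh(X.T @ X).min()        # numerically zero
smallest_eig_prior = np.linalg.eigvalsh(prior_precision).min()  # strictly positive
```

With any `w` below 1 and no all-zero columns, the diagonal term is positive definite, so the averaged matrix is too, even when predictors are perfectly collinear.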

Sufficient Statistics for Posterior Sampling

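Given the spike-and-slab structure, the posterior sufficient statistics for $(\beta_\varrho, \sigma_\epsilon^2)$ given $\varrho$ are:

$$
\begin{aligned}
V_\varrho^{-1} &= \left(X^\top X\right)_\varrho + \Sigma_\varrho^{-1}, &
\tilde{\beta}_\varrho &= \left(V_\varrho^{-1}\right)^{-1} \left(X_\varrho^\top \dot{y}_{1:n} + \Sigma_\varrho^{-1} b_\varrho\right), \\
N &= \nu_\epsilon + n, &
S &= s_\epsilon + \dot{y}_{1:n}^\top \dot{y}_{1:n} + b_\varrho^\top \Sigma_\varrho^{-1} b_\varrho - \tilde{\beta}_\varrho^\top V_\varrho^{-1} \tilde{\beta}_\varrho
\end{aligned}
\tag{2.13}
$$

where $\dot{y}_{1:n}$ denotes the observations with the time-series component subtracted out, and the subscript $\varrho$ selects the rows and columns of included predictors.

These are updated efficiently in the Gibbs sampler by drawing each $\varrho_j$ given $\varrho_{-j}$ (all others fixed).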
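The sufficient statistics for a fixed inclusion vector can be computed as in the sketch below. It assumes the standard conjugate normal/inverse-Gamma updates; the function name `posterior_stats`, the argument names, and the simulated data are illustrative assumptions rather than a reference implementation.

```python
import numpy as np

def posterior_stats(X, y_dot, rho, prior_precision, b, nu, s):
    """Conjugate posterior statistics for the predictors flagged in rho."""
    rho = np.asarray(rho, dtype=bool)
    b = np.asarray(b, dtype=float)
    X_r = X[:, rho]                            # included columns only
    P_r = prior_precision[np.ix_(rho, rho)]    # matching block of the prior precision
    b_r = b[rho]
    V_inv = X_r.T @ X_r + P_r                  # posterior precision, up to sigma^2
    beta_tilde = np.linalg.solve(V_inv, X_r.T @ y_dot + P_r @ b_r)
    N = nu + X.shape[0]
    S = s + y_dot @ y_dot + b_r @ P_r @ b_r - beta_tilde @ V_inv @ beta_tilde
    return V_inv, beta_tilde, N, S

# Simulated data (assumption): predictors 0 and 2 carry the signal.
rng = np.random.default_rng(1)
n, J = 100, 4
X = rng.normal(size=(n, J))
y_dot = X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=n)

xtx = X.T @ X
prior_precision = (1.0 / n) * (0.5 * xtx + 0.5 * np.diag(np.diag(xtx)))
V_inv, beta_tilde, N, S = posterior_stats(
    X, y_dot, rho=[1, 0, 1, 0], prior_precision=prior_precision,
    b=np.zeros(J), nu=0.01, s=0.01,
)
```

All matrices here have dimension equal to the number of included predictors (two in this example), not the number of candidates.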

Computational Efficiency

Each full-conditional evaluates easily because $\varrho_j$ takes only two values. The dimension of the matrices in (2.13) is $\sum_j \varrho_j$ (number of included variables), which is small if the model is truly sparse. Thus even with hundreds of candidates, the algorithm is fast.
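One sweep of the inclusion sampler might look like the sketch below. It rests on several assumptions not stated in the text: a zero prior mean, inclusion probabilities strictly between 0 and 1, and a marginal posterior for the inclusion vector proportional to $|\Sigma_\varrho^{-1}|^{1/2} |V_\varrho^{-1}|^{-1/2}\, p(\varrho)\, S^{-(N/2 - 1)}$, the form commonly used with this conjugate prior. Function names and simulated data are hypothetical.

```python
import numpy as np

def log_marginal(X, y_dot, rho, pi, prior_precision, nu, s):
    """Unnormalized log posterior of the inclusion vector (assumed form, b = 0)."""
    N = nu + X.shape[0]
    # Independent-Bernoulli spike; pi is assumed to lie strictly inside (0, 1).
    log_prior = np.sum(np.where(rho, np.log(pi), np.log1p(-pi)))
    if not rho.any():
        # Empty model: the determinant ratio is 1 and S reduces to s + y'y.
        return log_prior - (N / 2 - 1) * np.log(s + y_dot @ y_dot)
    X_r = X[:, rho]
    P_r = prior_precision[np.ix_(rho, rho)]
    V_inv = X_r.T @ X_r + P_r
    beta_tilde = np.linalg.solve(V_inv, X_r.T @ y_dot)
    S = s + y_dot @ y_dot - beta_tilde @ V_inv @ beta_tilde
    logdet_P = np.linalg.slogdet(P_r)[1]
    logdet_V = np.linalg.slogdet(V_inv)[1]
    return 0.5 * (logdet_P - logdet_V) + log_prior - (N / 2 - 1) * np.log(S)

def gibbs_sweep(X, y_dot, rho, pi, prior_precision, nu, s, rng):
    """Redraw every indicator once, each conditional on all the others."""
    rho = rho.copy()
    for j in rng.permutation(rho.size):
        log_post = np.empty(2)
        for value in (0, 1):          # the indicator has only two possible states
            rho[j] = bool(value)
            log_post[value] = log_marginal(X, y_dot, rho, pi, prior_precision, nu, s)
        # P(rho_j = 1 | rest), computed stably as 1 / (1 + exp(l0 - l1)).
        p_include = np.exp(-np.logaddexp(0.0, log_post[0] - log_post[1]))
        rho[j] = rng.random() < p_include
    return rho

# Simulated data (assumption): only predictors 0 and 3 affect the response.
rng = np.random.default_rng(0)
n, J = 200, 10
X = rng.normal(size=(n, J))
y_dot = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(size=n)

xtx = X.T @ X
prior_precision = (1.0 / n) * (0.5 * xtx + 0.5 * np.diag(np.diag(xtx)))
pi = np.full(J, 2 / J)                # expected model size of 2

rho = np.zeros(J, dtype=bool)
draws = []
for sweep in range(120):
    rho = gibbs_sweep(X, y_dot, rho, pi, prior_precision, nu=0.01, s=0.01, rng=rng)
    if sweep >= 20:                   # discard a short burn-in
        draws.append(rho)
inclusion_freq = np.mean(draws, axis=0)   # posterior inclusion frequency per predictor
```

Each evaluation solves a linear system whose size is the number of currently included predictors, which is why the cost stays low when the model is sparse. A full sampler would also redraw the coefficients, the error variance, and the latent state, which this sketch omits.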

Connections

See Also