Spike-and-Slab Prior for Covariate Selection
Summary
The spike-and-slab prior enables automatic selection of which control time series to include in the regression component of the BSTS model. The “spike” places point mass at zero (a variable is excluded); the “slab” is a weakly informative Gaussian for included variables. A Gibbs sampler updates the binary inclusion vector jointly with regression coefficients, allowing selection from tens or hundreds of candidates without overfitting.
Overview
When constructing a synthetic control, there may be many candidate predictor series. Including all of them would cause overfitting; pre-selecting a subset requires strong domain knowledge. The spike-and-slab prior provides an automatic Bayesian solution.
Main Content
Definition: Spike-and-Slab Prior
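Let $\varrho = (\varrho_1, \ldots, \varrho_J)$ be binary inclusion indicators, where $\varrho_j = 1$ if $\beta_j \neq 0$ and $\varrho_j = 0$ otherwise, and let $\beta_\varrho$ denote the subvector of nonzero coefficients. The full prior factorizes as:
$$p(\varrho, \beta, 1/\sigma_\varepsilon^2) = p(\varrho)\, p(\sigma_\varepsilon^2 \mid \varrho)\, p(\beta_\varrho \mid \varrho, \sigma_\varepsilon^2)$$
Spike (inclusion probability):
$$\varrho \sim \prod_{j=1}^{J} \pi_j^{\varrho_j} (1 - \pi_j)^{1 - \varrho_j}$$
where $\pi_j$ is the prior probability of including predictor $j$.
Slab (Gaussian prior for included coefficients):
$$\beta_\varrho \mid \sigma_\varepsilon^2 \sim \mathcal{N}\!\left(b_\varrho,\ \sigma_\varepsilon^2 \left(\Sigma_\varrho^{-1}\right)^{-1}\right) \tag{2.10}$$
where $b_\varrho$ is the prior mean and $\Sigma_\varrho^{-1}$ is the submatrix of the prior precision $\Sigma^{-1}$ for the included predictors.
Inverse-Gamma prior on error variance:
$$\frac{1}{\sigma_\varepsilon^2} \sim \mathcal{G}\!\left(\frac{\nu_\varepsilon}{2}, \frac{s_\varepsilon}{2}\right)$$
where $\mathcal{G}(a, b)$ is a Gamma distribution with shape $a$ and rate $b$, so that $\sigma_\varepsilon^2$ itself is inverse-gamma.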
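To make the generative structure concrete, here is a minimal NumPy sketch that draws one sample in the order of the factorization: $\varrho$, then $\sigma_\varepsilon^2$, then $\beta_\varrho$. The function name `sample_spike_slab_prior` and its argument layout are illustrative assumptions, not part of any library; $\mathcal{G}$ is assumed to be shape/rate, so NumPy's `scale` argument is $2/s_\varepsilon$.

```python
import numpy as np


def sample_spike_slab_prior(pi, b, Sigma_inv, nu, s, rng):
    """Draw (rho, sigma2, beta) from the spike-and-slab prior (illustrative sketch).

    pi        : (J,) prior inclusion probabilities pi_j
    b         : (J,) prior mean; b_rho is its sub-vector for included predictors
    Sigma_inv : (J, J) prior precision; Sigma_rho^{-1} is its included sub-block
    nu, s     : 1/sigma^2 ~ Gamma(shape=nu/2, rate=s/2)
    """
    J = len(pi)
    # Spike: independent Bernoulli(pi_j) inclusion indicators.
    rho = rng.random(J) < pi
    # Error variance: draw the precision 1/sigma^2; NumPy takes scale = 1/rate.
    sigma2 = 1.0 / rng.gamma(shape=nu / 2.0, scale=2.0 / s)
    # Excluded coefficients sit exactly at zero (the spike).
    beta = np.zeros(J)
    idx = np.flatnonzero(rho)
    if idx.size > 0:
        # Slab: beta_rho ~ N(b_rho, sigma2 * (Sigma_rho^{-1})^{-1}).
        cov = sigma2 * np.linalg.inv(Sigma_inv[np.ix_(idx, idx)])
        beta[idx] = rng.multivariate_normal(b[idx], cov)
    return rho, sigma2, beta


rng = np.random.default_rng(0)
rho, sigma2, beta = sample_spike_slab_prior(
    pi=np.full(5, 0.4), b=np.zeros(5), Sigma_inv=np.eye(5), nu=5.0, s=5.0, rng=rng
)
```

Over many draws, each predictor is included with frequency close to its $\pi_j$, and every excluded coefficient is exactly zero rather than merely small.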
Setting the Inclusion Prior
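Rather than setting each $\pi_j$ individually, the recommended approach is to elicit an expected model size $M$ and set:
$$\pi_j = \frac{M}{J} \quad \text{for all } j$$
This scales naturally with $J$ (the total number of predictors), because the prior expected number of included predictors, $\sum_j \pi_j$, equals $M$ regardless of $J$. It also avoids having to specify a hierarchical prior.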
Special cases:
- $\pi_j = 1$: force predictor $j$ into the model
- $\pi_j = 0$: force predictor $j$ out of the model
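The rule $\pi_j = M/J$ and the two special cases fit in a few lines. This sketch assumes predictors are indexed $0, \ldots, J-1$; the helper name `inclusion_probabilities` and its `force_in`/`force_out` arguments are hypothetical conveniences for the special cases.

```python
import numpy as np


def inclusion_probabilities(J, expected_size, force_in=(), force_out=()):
    """pi_j = M / J for every predictor, then apply hard overrides (sketch)."""
    if not 0 <= expected_size <= J:
        raise ValueError("expected model size M must lie in [0, J]")
    pi = np.full(J, expected_size / J)
    pi[list(force_in)] = 1.0   # pi_j = 1: predictor j is always included
    pi[list(force_out)] = 0.0  # pi_j = 0: predictor j is never included
    return pi


# The prior expected model size sum(pi) is M whether there are 20 or 200 candidates.
for J in (20, 200):
    print(J, inclusion_probabilities(J, 4).sum())
```

Overrides shift $\sum_j \pi_j$ away from $M$; if the overall expected size should stay at $M$, one option is to rescale the remaining $\pi_j$ after fixing the forced ones.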
Zellner’s g-Prior for the Slab
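The prior precision matrix $\Sigma^{-1}$ in equation (2.10) uses a g-prior (Zellner 1986):
$$\Sigma^{-1} = \frac{g}{n}\left\{ w\, X^\top X + (1 - w)\operatorname{diag}\!\left(X^\top X\right) \right\}$$
- $g$: number of observations' worth of prior weight
- $w$: mixing weight between $X^\top X$ (full correlation structure) and its diagonal (independent predictors); default $w = 1/2$
- $n$: number of observations
Interpretation: since $X^\top X / n$ is the average information contributed by one observation, $\Sigma^{-1}$ is a prior information matrix carrying $g$ observations' worth of information. Averaging $X^\top X$ with its diagonal keeps $\Sigma^{-1}$ positive definite, and hence the prior proper, even when $X^\top X$ itself is not positive definite (for example, with collinear predictors or $J > n$).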
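The g-prior precision can be computed directly from the design matrix. The sketch below (hypothetical helper `g_prior_precision`; $g = 1$ is an arbitrary illustrative value) also checks the propriety claim: with a duplicated column, $X^\top X$ is singular, so $w = 1$ yields a singular precision while $w = 1/2$ does not.

```python
import numpy as np


def g_prior_precision(X, g, w):
    """Sigma^{-1} = (g / n) * (w * X'X + (1 - w) * diag(X'X))  (sketch)."""
    n = X.shape[0]
    xtx = X.T @ X
    # np.diag(np.diag(.)) keeps only the diagonal of X'X, as a matrix.
    return (g / n) * (w * xtx + (1.0 - w) * np.diag(np.diag(xtx)))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X = np.column_stack([X, X[:, 0]])  # duplicated column: X'X is singular

for w in (1.0, 0.5):
    smallest = np.linalg.eigvalsh(g_prior_precision(X, g=1.0, w=w)).min()
    print(f"w={w}: smallest eigenvalue {smallest:.3g}")
```

With $w = 1$ the smallest eigenvalue is zero up to rounding error. With $w < 1$ it is bounded away from zero, because the diagonal term is positive definite whenever no column of $X$ is identically zero.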
Sufficient Statistics for Posterior Sampling
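Given the spike-and-slab structure, the posterior sufficient statistics for a given $\varrho$ are:
$$\begin{aligned} V_\varrho^{-1} &= \left(X^\top X\right)_\varrho + \Sigma_\varrho^{-1}, & \tilde{\beta}_\varrho &= \left(V_\varrho^{-1}\right)^{-1}\left(X_\varrho^\top y^* + \Sigma_\varrho^{-1} b_\varrho\right), \\ N &= \nu_\varepsilon + n, & S_\varrho &= s_\varepsilon + y^{*\top} y^* + b_\varrho^\top \Sigma_\varrho^{-1} b_\varrho - \tilde{\beta}_\varrho^\top V_\varrho^{-1} \tilde{\beta}_\varrho \end{aligned} \tag{2.13}$$
where $y^*$ is the vector of observations with the non-regression state contribution subtracted, $X_\varrho$ holds the columns of $X$ for the included predictors, and $(X^\top X)_\varrho = X_\varrho^\top X_\varrho$. These statistics are recomputed as the Gibbs sampler draws each $\varrho_j$ given $\varrho_{-j}$ (all other indicators held fixed).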
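The sufficient statistics for a fixed inclusion vector $\varrho$ can be computed as follows. This is a sketch: `sufficient_stats` is a hypothetical name, `y_star` stands for $y^*$ (assumed to be already computed by subtracting the state contribution), and it uses `np.linalg.solve` rather than forming $V_\varrho$ explicitly.

```python
import numpy as np


def sufficient_stats(rho, X, y_star, b, Sigma_inv, nu, s):
    """Return (V_inv, beta_tilde, N, S) for inclusion vector rho (sketch)."""
    idx = np.flatnonzero(rho)
    N = nu + len(y_star)
    if idx.size == 0:
        # Empty model: no coefficients, so S reduces to s + y*'y*.
        return None, None, N, s + y_star @ y_star
    X_r = X[:, idx]
    P_r = Sigma_inv[np.ix_(idx, idx)]  # Sigma_rho^{-1}
    b_r = b[idx]
    V_inv = X_r.T @ X_r + P_r
    # beta_tilde = V_rho (X_rho' y* + Sigma_rho^{-1} b_rho), via a linear solve.
    beta_tilde = np.linalg.solve(V_inv, X_r.T @ y_star + P_r @ b_r)
    S = s + y_star @ y_star + b_r @ P_r @ b_r - beta_tilde @ V_inv @ beta_tilde
    return V_inv, beta_tilde, N, S


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y_star = X[:, :2] @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=200)
rho = np.array([True, True, False, False])
V_inv, beta_tilde, N, S = sufficient_stats(
    rho, X, y_star, b=np.zeros(4), Sigma_inv=1e-3 * np.eye(4), nu=1.0, s=1.0
)
```

With a weak prior (small $\Sigma^{-1}$) and plenty of data, $\tilde\beta_\varrho$ is close to the least-squares fit on the included columns, here roughly $(1, -2)$. Algebraically, $S_\varrho - s_\varepsilon$ equals the residual sum of squares at $\tilde\beta_\varrho$ plus the prior penalty $(\tilde\beta_\varrho - b_\varrho)^\top \Sigma_\varrho^{-1} (\tilde\beta_\varrho - b_\varrho)$, which is a useful check on an implementation.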
Computational Efficiency
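Each full conditional $p(\varrho_j \mid \varrho_{-j}, y^*)$ is cheap to evaluate because $\varrho_j$ takes only two values, so only two candidate models need to be scored. The matrices in (2.13) are $|\varrho| \times |\varrho|$, where $|\varrho|$ is the number of included variables, which is small if the model is truly sparse. Thus the algorithm stays fast even with hundreds of candidate predictors.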
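A single-site Gibbs sweep over $\varrho$ can be sketched by scoring both values of each $\varrho_j$ with the collapsed posterior. Integrating out $\beta_\varrho$ and $\sigma_\varepsilon^2$ under the conjugate prior above gives, up to a constant, $\log p(\varrho \mid y^*) = \log p(\varrho) + \tfrac12 \log|\Sigma_\varrho^{-1}| - \tfrac12 \log|V_\varrho^{-1}| - \tfrac{N}{2}\log S_\varrho$. This expression is derived here under that prior rather than quoted from the source, and the function names are hypothetical. The sketch rebuilds each candidate model from scratch for readability, but it only ever factors $|\varrho| \times |\varrho|$ matrices, which is what keeps each step cheap for sparse models.

```python
import numpy as np


def log_model_score(rho, X, y_star, b, Sigma_inv, pi, nu, s):
    """log p(rho | y*) up to an additive constant (collapsed over beta, sigma^2)."""
    rho = np.asarray(rho, dtype=bool)
    # log p(rho) under the Bernoulli(pi_j) spike; pi_j in {0, 1} gives -inf for
    # the forbidden value, which enforces forced inclusion/exclusion.
    with np.errstate(divide="ignore"):
        log_prior = np.sum(np.where(rho, np.log(pi), np.log1p(-pi)))
    N = nu + len(y_star)
    idx = np.flatnonzero(rho)
    if idx.size == 0:
        return log_prior - 0.5 * N * np.log(s + y_star @ y_star)
    X_r, P_r, b_r = X[:, idx], Sigma_inv[np.ix_(idx, idx)], b[idx]
    V_inv = X_r.T @ X_r + P_r
    beta_tilde = np.linalg.solve(V_inv, X_r.T @ y_star + P_r @ b_r)
    S = s + y_star @ y_star + b_r @ P_r @ b_r - beta_tilde @ V_inv @ beta_tilde
    logdet_P = np.linalg.slogdet(P_r)[1]
    logdet_V = np.linalg.slogdet(V_inv)[1]
    return log_prior + 0.5 * (logdet_P - logdet_V) - 0.5 * N * np.log(S)


def gibbs_sweep_rho(rho, X, y_star, b, Sigma_inv, pi, nu, s, rng):
    """One pass of single-site updates rho_j | rho_{-j}, y* in random order.

    rho must start consistent with any pi_j equal to 0 or 1.
    """
    rho = np.array(rho, dtype=bool)  # copy, so the caller's array is untouched
    for j in rng.permutation(len(rho)):
        rho[j] = False
        score_out = log_model_score(rho, X, y_star, b, Sigma_inv, pi, nu, s)
        rho[j] = True
        score_in = log_model_score(rho, X, y_star, b, Sigma_inv, pi, nu, s)
        # P(rho_j = 1 | rest) = e^in / (e^in + e^out), computed stably.
        p_in = np.exp(score_in - np.logaddexp(score_in, score_out))
        rho[j] = rng.random() < p_in
    return rho


rng = np.random.default_rng(0)
n, J = 200, 6
X = rng.normal(size=(n, J))
y_star = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.normal(size=n)
xtx = X.T @ X
# Illustrative g-prior precision with g = 1 and w = 1/2.
Sigma_inv = (1.0 / n) * (0.5 * xtx + 0.5 * np.diag(np.diag(xtx)))
pi = np.full(J, 2 / J)
rho = np.zeros(J, dtype=bool)
for _ in range(20):
    rho = gibbs_sweep_rho(rho, X, y_star, np.zeros(J), Sigma_inv, pi, 0.01, 0.01, rng)
```

In this simulated example the two predictors that generate `y_star` should be included after a few sweeps, while the other four enter only occasionally. A more efficient sampler could update a Cholesky factor of $V_\varrho^{-1}$ incrementally instead of refactoring it for every candidate; the sketch keeps the direct computation for clarity.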
Connections
- Used by MCMC Inference for CausalImpact — Gibbs sampler updates
- Part of Bayesian Structural Time-Series Model — applies to the static regression component
- Related to Nonparametric Models Overview — different approach to regularization
- In the advertising application (CausalImpact Empirical Application), expected model size with expected
See Also
- Bayesian Structural Time-Series Model — full model
- MCMC Inference for CausalImpact — how this prior is sampled
- The Horseshoe Prior — continuous global-local alternative to spike-and-slab for sparsity