Spike-and-Slab Prior for Covariate Selection

Summary

The spike-and-slab prior enables automatic selection of which control time series to include in the regression component of the BSTS model. The “spike” places point mass at zero (a variable is excluded); the “slab” is a weakly informative Gaussian for included variables. A Gibbs sampler updates the binary inclusion vector jointly with regression coefficients, allowing selection from tens or hundreds of candidates without overfitting.

Overview

When constructing a synthetic control, there may be many candidate predictor series. Including all of them would cause overfitting; pre-selecting a subset requires strong domain knowledge. The spike-and-slab prior provides an automatic Bayesian solution.

Main Content

Definition: Spike-and-Slab Prior

Let $ϱ = (ϱ_{1}, \dots, ϱ_{J})$ be binary inclusion indicators, where $ϱ_{j} = 1$ if $β_{j} \neq 0$ and $ϱ_{j} = 0$ otherwise. The full prior factorizes as:
$p (ϱ, β, 1/ σ_{ε}^{2}) = p (ϱ) \cdot p (σ_{ε}^{2} ∣ ϱ) \cdot p (β_{ϱ} ∣ ϱ, σ_{ε}^{2}) \qquad (2.8)$
Spike (inclusion probability):
$p (ϱ) = \prod_{j = 1}^{J} π_{j}^{ϱ_{j}} (1 - π_{j})^{1 - ϱ_{j}} \qquad (2.9)$
where $π_{j}$ is the prior probability of including predictor $j$ .

Slab (Gaussian prior for included coefficients):
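$β_{ϱ} ∣ ϱ, σ_{ε}^{2} \sim \mathcal{N} (b_{ϱ}, σ_{ε}^{2} (Σ_{ϱ}^{- 1})^{- 1}) \qquad (2.10)$
Gamma prior on the error precision (equivalently, an inverse-Gamma prior on the error variance $σ_{ε}^{2}$):
$\frac{1}{σ_{ε}^{2}} \sim \mathcal{G} \left( \frac{ν_{ε}}{2}, \frac{s_{ε}}{2} \right) \qquad (2.11)$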

Setting the Inclusion Prior $π_{j}$

Rather than setting $π_{j}$ individually, the recommended approach is to elicit an expected model size $M$ and set:

$π_{j} = \frac{M}{J}$

This scales naturally with $J$ (total number of predictors) and avoids having to specify a hierarchical prior.

Special cases:

$π_{j} = 1$ : Force predictor $j$ into the model
$π_{j} = 0$ : Force predictor $j$ out of the model
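
As a small illustration of the rule above, the following sketch builds the vector of prior inclusion probabilities from an expected model size and then overrides individual entries. The helper name `inclusion_prior` and its arguments are hypothetical, chosen for this example only.

```python
def inclusion_prior(num_predictors, expected_model_size, force_in=(), force_out=()):
    """Return prior inclusion probabilities pi_j = M / J with optional overrides.

    Hypothetical helper for illustration; predictor indices are 0-based.
    """
    if num_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    if not 0 <= expected_model_size <= num_predictors:
        raise ValueError("expected model size must lie between 0 and J")
    pi = [expected_model_size / num_predictors] * num_predictors
    for j in force_in:
        pi[j] = 1.0  # pi_j = 1: predictor j is always included
    for j in force_out:
        pi[j] = 0.0  # pi_j = 0: predictor j is never included
    return pi


# J = 60 candidates, expected model size M = 3, first predictor forced in,
# last predictor forced out.
pi = inclusion_prior(num_predictors=60, expected_model_size=3, force_in=[0], force_out=[59])
print(pi[0], pi[1], pi[59])
```

Forcing entries in or out shifts the prior expected model size, which is $\sum_{j} π_{j}$, away from $M$.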

Zellner’s g-Prior for the Slab

The precision matrix $Σ^{- 1}$ in equation (2.10) uses a g-prior (Zellner 1986):

$Σ^{- 1} = \frac{g}{n} \left\{ w X^{⊤} X + (1 - w) \operatorname{diag} (X^{⊤} X) \right\} \qquad (2.12)$

$g$ : number of observations' worth of prior weight; default $g = 1$
$w$ : mixing weight between $X^{⊤} X$ (full correlation structure) and diagonal (independent); default $w = 0.5$
$n$ : number of observations

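Interpretation: $X^{⊤} X$ accumulates the information from all $n$ observations, so $\frac{g}{n} X^{⊤} X$ is a prior information matrix carrying $g$ observations' worth of information. Averaging with the diagonal keeps $Σ^{- 1}$ positive definite, and hence the prior proper, even when $X^{⊤} X$ is singular (as long as $w < 1$ and no predictor column is identically zero).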
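
To make (2.12) concrete, here is a minimal NumPy sketch of the precision matrix, together with a toy design in which two columns are perfectly collinear. The function name `g_prior_precision` is hypothetical; the defaults mirror the values of $g$ and $w$ listed above.

```python
import numpy as np


def g_prior_precision(X, g=1.0, w=0.5):
    """Prior precision (g / n) * (w * X'X + (1 - w) * diag(X'X)), as in (2.12)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xtx = X.T @ X
    return (g / n) * (w * xtx + (1.0 - w) * np.diag(np.diag(xtx)))


# Second column is exactly twice the first, so X'X is singular.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
precision = g_prior_precision(X)        # defaults g = 1, w = 0.5
singular = g_prior_precision(X, w=1.0)  # pure g-prior, no diagonal averaging

# Rank 1 for the pure g-prior, full rank once the diagonal is mixed in.
print(np.linalg.matrix_rank(singular), np.linalg.matrix_rank(precision))
```

With $w = 1$ the precision inherits the rank deficiency of $X^{⊤} X$, which is the situation the diagonal term is meant to prevent.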

Sufficient Statistics for Posterior Sampling

Given the spike-and-slab structure, the posterior sufficient statistics for $(ϱ, β, σ_{ε}^{2})$ are:

$V_{ϱ}^{- 1} = (X^{⊤} X)_{ϱ} + Σ_{ϱ}^{- 1}, \qquad \tilde{β}_{ϱ} = (V_{ϱ}^{- 1})^{- 1} (X_{ϱ}^{⊤} \dot{y}_{1 : n} + Σ_{ϱ}^{- 1} b_{ϱ}) \qquad (2.13)$

$N = ν_{ε} + n, \qquad S_{ϱ} = s_{ε} + \dot{y}_{1 : n}^{⊤} \dot{y}_{1 : n} + b_{ϱ}^{⊤} Σ_{ϱ}^{- 1} b_{ϱ} - \tilde{β}_{ϱ}^{⊤} V_{ϱ}^{- 1} \tilde{β}_{ϱ}$

The Gibbs sampler draws each $ϱ_{j}$ in turn given $ϱ_{- j}$ (all other indicators held fixed), re-evaluating these statistics for the two candidate values of $ϱ_{j}$.
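
The quantities in (2.13) and the expression for $S_{ϱ}$ translate almost directly into linear algebra. The sketch below is one way to evaluate them for a fixed inclusion vector with NumPy. The function `posterior_stats` and all argument names are hypothetical, and `y_dot` is assumed to hold $\dot{y}_{1 : n}$ as a plain vector.

```python
import numpy as np


def posterior_stats(X, y_dot, rho, prior_precision, b, nu, s):
    """Evaluate V^{-1}, beta_tilde, N and S for one inclusion vector rho.

    Hypothetical helper. `rho` is a boolean mask over the J predictors;
    `prior_precision` and `b` are the full J x J Sigma^{-1} and length-J prior mean.
    """
    idx = np.flatnonzero(rho)
    X_r = X[:, idx]                             # columns of included predictors
    prec_r = prior_precision[np.ix_(idx, idx)]  # Sigma_rho^{-1}
    b_r = b[idx]
    V_inv = X_r.T @ X_r + prec_r
    # Solve V^{-1} beta = X'y + Sigma^{-1} b instead of forming an explicit inverse.
    beta_tilde = np.linalg.solve(V_inv, X_r.T @ y_dot + prec_r @ b_r)
    N = nu + len(y_dot)
    S = s + y_dot @ y_dot + b_r @ prec_r @ b_r - beta_tilde @ V_inv @ beta_tilde
    return V_inv, beta_tilde, N, S


# Tiny hand-checkable example: only the first of two predictors is included.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
y_dot = np.array([2.0, 3.0, 1.0])
rho = np.array([True, False])
V_inv, beta_tilde, N, S = posterior_stats(
    X, y_dot, rho, prior_precision=np.eye(2), b=np.zeros(2), nu=1.0, s=1.0
)
print(V_inv, beta_tilde, N, S)
```

Only the rows and columns selected by `rho` enter the computation, so the cost depends on the number of included predictors rather than on $J$.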

Computational Efficiency

Each full-conditional $p (ϱ_{j} ∣ ϱ_{- j}, \dots)$ is cheap to evaluate because $ϱ_{j}$ takes only two values. The matrices in (2.13) have dimension $\sum_{j} ϱ_{j}$ (the number of included variables), which is small if the model is truly sparse. Thus the algorithm remains fast even with hundreds of candidates.
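
The two-point update can be sketched as follows. This is a minimal illustration, not the sampler used by any particular package. It assumes that $β$ and $σ_{ε}^{2}$ are integrated out, using the standard conjugate normal-gamma result $p (ϱ ∣ \dot{y}) \propto p (ϱ) \, |Σ_{ϱ}^{- 1}|^{1/2} \, |V_{ϱ}^{- 1}|^{- 1/2} \, S_{ϱ}^{- N/2}$. That expression is not stated in this note, and the exponent convention should be checked against the reference before relying on it. All function and variable names are hypothetical, the data are simulated, and every $π_{j}$ is assumed to lie strictly between 0 and 1.

```python
import numpy as np


def log_marginal(rho, X, y_dot, prior_precision, b, pi, nu, s):
    """Unnormalised log p(rho | y_dot) with beta and sigma^2 integrated out (assumed form)."""
    idx = np.flatnonzero(rho)
    half_N = 0.5 * (nu + len(y_dot))
    # log p(rho): independent Bernoulli terms, assuming 0 < pi_j < 1.
    log_prior = np.sum(np.where(rho, np.log(pi), np.log1p(-pi)))
    if idx.size == 0:
        # Empty model: determinant terms vanish and S reduces to s + y'y.
        return log_prior - half_N * np.log(s + y_dot @ y_dot)
    X_r = X[:, idx]
    prec_r = prior_precision[np.ix_(idx, idx)]
    b_r = b[idx]
    V_inv = X_r.T @ X_r + prec_r
    beta_tilde = np.linalg.solve(V_inv, X_r.T @ y_dot + prec_r @ b_r)
    S = s + y_dot @ y_dot + b_r @ prec_r @ b_r - beta_tilde @ V_inv @ beta_tilde
    _, logdet_prior = np.linalg.slogdet(prec_r)
    _, logdet_post = np.linalg.slogdet(V_inv)
    return log_prior + 0.5 * (logdet_prior - logdet_post) - half_N * np.log(S)


def gibbs_sweep(rho, rng, **model):
    """Resample every rho_j once from its two-point full conditional."""
    rho = rho.copy()
    for j in rng.permutation(rho.size):
        log_p = np.empty(2)
        for value in (0, 1):
            rho[j] = bool(value)
            log_p[value] = log_marginal(rho, **model)
        with np.errstate(over="ignore"):  # exp may overflow to inf, giving probability 0
            prob_include = 1.0 / (1.0 + np.exp(log_p[0] - log_p[1]))
        rho[j] = rng.random() < prob_include
    return rho


# Simulated data: 10 candidates, only the first two carry signal.
rng = np.random.default_rng(0)
n, J = 200, 10
X = rng.normal(size=(n, J))
y_dot = 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=n)
xtx = X.T @ X
model = dict(
    X=X, y_dot=y_dot,
    prior_precision=(1.0 / n) * (0.5 * xtx + 0.5 * np.diag(np.diag(xtx))),  # g = 1, w = 0.5
    b=np.zeros(J), pi=np.full(J, 2.0 / J), nu=1.0, s=1.0,
)
rho = np.zeros(J, dtype=bool)
draws = []
for _ in range(50):
    rho = gibbs_sweep(rho, rng, **model)
    draws.append(rho)
inclusion_freq = np.mean(draws, axis=0)  # fraction of sweeps in which each predictor is included
print(np.round(inclusion_freq, 2))
```

Each call to `log_marginal` only factorizes matrices whose size is the current number of included predictors, which is why the sweep stays cheap when the model is sparse.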

Connections

Used by MCMC Inference for CausalImpact — Gibbs sampler updates $ϱ$
Part of Bayesian Structural Time-Series Model — applies to the static regression component
Related to Nonparametric Models Overview — different approach to regularization
In the advertising application (CausalImpact Empirical Application), the prior sets an expected model size of $M = 3$ and an expected $R^{2} \approx 0.8$
