MCMC Inference for CausalImpact

Summary

Posterior inference in the BSTS model uses a Gibbs sampler that alternates between a data-augmentation step (sampling states $\alpha$ given parameters $\theta$, using the Kalman filter and fast mean smoother) and a parameter-simulation step (sampling $\theta$ given states $\alpha$, with the spike-and-slab Gibbs draw for variable selection). The algorithm is linear in the number of time points and runs in under 30 seconds for typical datasets.

Overview

The posterior $p(\theta, \alpha \mid y_{1:n})$ is not available in closed form due to the spike-and-slab prior. The Gibbs sampler alternates between two steps:

Gibbs Sampler Steps

Step 1 — Data Augmentation (State Simulation)

Sample the full state sequence $\alpha$ given parameters $\theta$ and data $y_{1:n}$:

$$\alpha \sim p(\alpha \mid y_{1:n}, \theta)$$

Algorithm: Uses the simulation smoother of Durbin & Koopman (2002), which improves on the earlier forward-filtering, backward-sampling algorithms (Carter & Kohn 1994, Frühwirth-Schnatter 1994).

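Key property: Because $p(y_{1:n}, \alpha \mid \theta)$ is jointly multivariate Gaussian, the variance of $p(\alpha \mid y_{1:n}, \theta)$ does not depend on $y_{1:n}$. The sampler:

  1. Generates $(y^*_{1:n}, \alpha^*) \sim p(y_{1:n}, \alpha \mid \theta)$
  2. Subtracts $E(\alpha^* \mid y^*_{1:n}, \theta)$ to get zero-mean noise
  3. Adds $E(\alpha \mid y_{1:n}, \theta)$ (via the Kalman filter) to restore the correct mean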
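The three steps above can be sketched for the simplest case, a scalar local level model ($y_t = \mu_t + \varepsilon_t$, $\mu_{t+1} = \mu_t + \eta_t$). This is a minimal illustration, not the source's implementation: the function names, the proper initial prior `N(a1, P1)`, and the variance values are assumptions, and the conditional mean is computed with a plain Kalman filter plus Rauch-Tung-Striebel smoother rather than the fast mean smoother of Durbin & Koopman.

```python
import numpy as np

def smoothed_mean(y, sig_eps2, sig_eta2, a1=0.0, P1=1e6):
    """E(mu_{1:n} | y_{1:n}, theta) for the local level model."""
    n = len(y)
    a_pred, P_pred = np.empty(n), np.empty(n)   # one-step-ahead moments
    a_filt, P_filt = np.empty(n), np.empty(n)   # filtered moments
    a, P = a1, P1
    for t in range(n):                          # forward Kalman filter
        a_pred[t], P_pred[t] = a, P
        K = P / (P + sig_eps2)                  # Kalman gain
        a_filt[t] = a + K * (y[t] - a)
        P_filt[t] = (1.0 - K) * P
        a, P = a_filt[t], P_filt[t] + sig_eta2
    sm = a_filt.copy()
    for t in range(n - 2, -1, -1):              # backward (RTS) mean recursion
        J = P_filt[t] / P_pred[t + 1]
        sm[t] = a_filt[t] + J * (sm[t + 1] - a_pred[t + 1])
    return sm

def draw_states(y, sig_eps2, sig_eta2, rng, a1=0.0, P1=1e6):
    """One draw from p(mu_{1:n} | y_{1:n}, theta) by mean correction."""
    n = len(y)
    # Step 1: simulate (y*, mu*) from the joint model p(y, mu | theta).
    mu_star = np.empty(n)
    mu_star[0] = a1 + np.sqrt(P1) * rng.standard_normal()
    for t in range(1, n):
        mu_star[t] = mu_star[t - 1] + np.sqrt(sig_eta2) * rng.standard_normal()
    y_star = mu_star + np.sqrt(sig_eps2) * rng.standard_normal(n)
    # Step 2: subtract E(mu* | y*, theta) to leave zero-mean noise.
    noise = mu_star - smoothed_mean(y_star, sig_eps2, sig_eta2, a1, P1)
    # Step 3: add E(mu | y, theta) to restore the correct mean.
    return smoothed_mean(y, sig_eps2, sig_eta2, a1, P1) + noise

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(0.0, 0.3, 50)) + rng.normal(0.0, 1.0, 50)  # synthetic data
alpha_draw = draw_states(y, sig_eps2=1.0, sig_eta2=0.09, rng=rng)
```

Each draw costs two filter-and-smoother passes, both linear in the number of time points, which is where the linear complexity comes from.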

Computational complexity: Linear in $m$ (total time points, pre + post), quadratic in $d$ (state dimension). For a typical number of time points and covariates, 10,000 iterations take under 30 seconds.

Step 2 — Parameter Simulation

Sample $\theta$ given states $\alpha$ and data $y_{1:n}$.

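For variance parameters (the disturbance variances of the state components): Because the error terms $\eta_t$ are available given $\alpha$, the posterior of each precision $1/\sigma^2$ is Gamma by conjugacy with the prior in Eq. 2.7 (equivalently, each variance $\sigma^2$ has an inverse-Gamma posterior).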
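The conjugate variance draw can be sketched as follows. This is a minimal sketch assuming a prior of the form $1/\sigma^2 \sim \text{Gamma}(\nu/2, s/2)$ in shape/rate form; the function name `draw_variance`, the hyperparameter values, and the synthetic state path are illustrative assumptions, not taken from the source.

```python
import numpy as np

def draw_variance(resid, nu, s, rng):
    """Draw sigma^2 given disturbances, with prior 1/sigma^2 ~ Gamma(nu/2, s/2)."""
    shape = 0.5 * (nu + resid.size)
    rate = 0.5 * (s + np.sum(resid ** 2))
    precision = rng.gamma(shape, 1.0 / rate)   # NumPy's gamma takes scale = 1 / rate
    return 1.0 / precision

rng = np.random.default_rng(0)
mu = np.cumsum(rng.normal(0.0, 0.3, size=500))  # stand-in for a sampled state path
eta = np.diff(mu)                               # implied disturbances mu_{t+1} - mu_t
draws = np.array([draw_variance(eta, nu=0.01, s=0.01, rng=rng) for _ in range(2000)])
```

Once the state path is fixed, the disturbances are plain differences, so the draw needs only their sum of squares.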

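For static regression coefficients ($\beta$, $\gamma$, $\sigma_\varepsilon^2$): Gibbs sampling from the spike-and-slab posterior (see Spike-and-Slab Prior for Covariate Selection). Each inclusion indicator $\gamma_j$ is drawn in turn given all the others, $\gamma_{-j}$; then $\beta$ and $\sigma_\varepsilon^2$ are drawn using conjugate formulae.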
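One sweep of this scheme might look like the sketch below. It assumes a zero-mean conjugate slab $\beta_\gamma \mid \sigma_\varepsilon^2 \sim N(0, \sigma_\varepsilon^2 P_0^{-1})$, a Gamma prior on $1/\sigma_\varepsilon^2$, and independent Bernoulli inclusion probabilities `pi`, and it uses the textbook normal/inverse-gamma marginal likelihood, which may differ in detail from the source's formula. All function names, hyperparameter values, and the Zellner-style `prior_prec` are hypothetical; `y_dot` stands for the response with the other state components subtracted.

```python
import numpy as np

def slab_posterior(idx, X, y, prior_prec, s):
    """Posterior precision, mean, and scale S for the covariates in idx."""
    if idx.size == 0:
        return None, None, s + y @ y
    Xg = X[:, idx]
    Pn = Xg.T @ Xg + prior_prec[np.ix_(idx, idx)]  # posterior precision / sigma^2
    b = np.linalg.solve(Pn, Xg.T @ y)              # posterior mean (prior mean 0)
    return Pn, b, s + y @ y - b @ Pn @ b

def log_post_gamma(gamma, X, y, prior_prec, nu, s, pi):
    """Unnormalised log p(gamma | y), with beta and sigma^2 integrated out."""
    idx = np.flatnonzero(gamma)
    Pn, _, S = slab_posterior(idx, X, y, prior_prec, s)
    lp = idx.size * np.log(pi) + (gamma.size - idx.size) * np.log1p(-pi)
    lp -= 0.5 * (nu + y.size) * np.log(S)
    if idx.size:
        P0 = prior_prec[np.ix_(idx, idx)]
        lp += 0.5 * (np.linalg.slogdet(P0)[1] - np.linalg.slogdet(Pn)[1])
    return lp

def gibbs_sweep(gamma, X, y, prior_prec, nu, s, pi, rng):
    """Draw each gamma_j given gamma_{-j}, then (beta, sigma^2) given gamma."""
    gamma = gamma.copy()
    for j in range(gamma.size):
        lp = np.empty(2)
        for v in (0, 1):                           # gamma_j has only two values
            gamma[j] = bool(v)
            lp[v] = log_post_gamma(gamma, X, y, prior_prec, nu, s, pi)
        p_include = 1.0 / (1.0 + np.exp(np.clip(lp[0] - lp[1], -700, 700)))
        gamma[j] = rng.random() < p_include
    idx = np.flatnonzero(gamma)
    Pn, b, S = slab_posterior(idx, X, y, prior_prec, s)
    sigma2 = 1.0 / rng.gamma(0.5 * (nu + y.size), 2.0 / S)  # scale = 1 / rate
    beta = np.zeros(gamma.size)
    if idx.size:                                   # beta ~ N(b, sigma2 * Pn^{-1})
        z = rng.standard_normal(idx.size)
        beta[idx] = b + np.sqrt(sigma2) * np.linalg.solve(np.linalg.cholesky(Pn).T, z)
    return gamma, beta, sigma2

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y_dot = 2.0 * X[:, 0] + rng.normal(size=n)         # only the first covariate matters
prior_prec = X.T @ X / n                           # assumed Zellner-style prior
gamma, incl = np.zeros(3, dtype=bool), np.zeros(3)
for _ in range(300):
    gamma, beta, sigma2 = gibbs_sweep(gamma, X, y_dot, prior_prec, 0.01, 0.01, 0.5, rng)
    incl += gamma
incl /= 300                                        # posterior inclusion frequencies
```

Every matrix in the sweep has dimension equal to the number of currently included covariates, so the cost stays small when the model is sparse.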

Posterior Predictive Simulation

After fitting the model on pre-intervention data $y_{1:n}$, the key quantity is the posterior predictive distribution over counterfactuals:

$$p(\tilde{y}_{n+1:m} \mid y_{1:n}, x_{1:m})$$

This is the distribution of what would have happened had no intervention occurred. It:

  • Is conditioned only on pre-intervention outcomes and all control series (not on parameter estimates)
  • Integrates out the coefficients $\beta$ and inclusion indicators $\gamma$, so there is no commitment to any particular set of covariates
  • Is a joint distribution over all post-intervention time points (not a collection of marginals) — preserves serial correlation

Sampling: Each Gibbs iteration draws a complete counterfactual trajectory using the Kalman filter run forward through the post-intervention period.
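As an illustration of the sampling step, the sketch below simulates one counterfactual trajectory per posterior draw for a local level plus regression model by running the state equations forward from the last pre-intervention state. The function name, the model choice, and the stand-in "posterior draws" are hypothetical; a real analysis would take the draws from the Gibbs sampler.

```python
import numpy as np

def simulate_counterfactual(mu_n, beta, sig_eps2, sig_mu2, X_post, rng):
    """One trajectory over the post-period for a local level + regression model."""
    h = X_post.shape[0]
    # Random-walk level carried forward from the last pre-intervention state.
    mu = mu_n + np.cumsum(np.sqrt(sig_mu2) * rng.standard_normal(h))
    eps = np.sqrt(sig_eps2) * rng.standard_normal(h)   # observation noise
    return mu + X_post @ beta + eps

rng = np.random.default_rng(1)
K, h = 1000, 30
X_post = rng.normal(size=(h, 2))                       # post-period control series
# Stand-in posterior draws (hypothetical values, not output of a real sampler).
draws = [dict(mu_n=10.0 + 0.1 * rng.standard_normal(),
              beta=np.array([1.0, 0.0]) + 0.05 * rng.standard_normal(2),
              sig_eps2=0.25, sig_mu2=0.01) for _ in range(K)]
paths = np.array([simulate_counterfactual(X_post=X_post, rng=rng, **d) for d in draws])

lo, hi = np.percentile(paths, [2.5, 97.5], axis=0)     # pointwise 95% band
cum_lo, cum_hi = np.percentile(paths.sum(axis=1), [2.5, 97.5])  # cumulative total
```

Because each row of `paths` is a complete trajectory, intervals for cumulative quantities are computed per trajectory, which respects the serial correlation that a set of pointwise marginals would lose.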

Why Integrating Out Parameters Matters

  • Integrating out $\beta$ and $\gamma$ means no arbitrary covariate selection: the posterior predictive averages over all candidate subsets weighted by posterior probability
  • Integrating out the variance parameters means no commitment to point estimates of the noise level, so uncertainty is propagated in full
  • The result is wider but properly calibrated uncertainty intervals
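Written out, the integration takes the following form. This is a sketch in which $\tilde{y}_{n+1:m}$ is the counterfactual series, $y_{1:n}$ the pre-intervention outcomes, $x_{1:m}$ the control series, $\theta$ the parameters and $\alpha$ the states; $K$ (the number of retained Gibbs draws) and the superscript $(k)$ are notation introduced here for illustration.

```latex
p(\tilde{y}_{n+1:m} \mid y_{1:n}, x_{1:m})
  = \int p(\tilde{y}_{n+1:m} \mid \theta, \alpha, x_{1:m}) \,
         p(\theta, \alpha \mid y_{1:n}, x_{1:m}) \, d\theta \, d\alpha
  \approx \frac{1}{K} \sum_{k=1}^{K}
         p(\tilde{y}_{n+1:m} \mid \theta^{(k)}, \alpha^{(k)}, x_{1:m})
```

In practice the mixture on the right is not evaluated as a density. Each draw $(\theta^{(k)}, \alpha^{(k)})$ contributes one simulated trajectory, and the empirical distribution of those trajectories approximates the left-hand side.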

Connections

See Also