Variational BOED - Overview

Summary

Foster et al. (2019), Variational Bayesian Optimal Experimental Design (NeurIPS). Introduces four fast variational EIG estimators that sidestep the double intractability of the EIG using amortized variational inference. By learning a functional approximation to an intractable density (posterior or marginal) and reusing it across outcomes, cost drops from NMC’s $O(NM)$ to $O(N + K)$ (with $N$ outer samples, $M$ inner samples per outcome, and $K$ optimization steps), and convergence in total cost $T$ improves from $O(T^{-1/3})$ to $O(T^{-1/2})$, matching ordinary Monte Carlo. Three estimators give bounds (one lower, two upper) that can sandwich the EIG; one is asymptotically consistent even with an imperfect variational family; two handle implicit-likelihood models.

Overview

Research question. Estimating the EIG (Expected Information Gain) is the central bottleneck of BOED, especially for real-time sequential experiments. The classic nested Monte Carlo (NMC) estimator is consistent but prohibitively slow (its error decays only as $O(T^{-1/3})$ in total cost $T = O(NM)$) because it makes a separate inner estimate of the marginal $p(y_n \mid d)$ for every outcome $y_n$.
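To make the nested structure concrete, here is a minimal sketch of NMC on a toy model chosen for this note (not one of the paper's benchmarks): $\theta \sim N(0, 1)$ and $y \mid \theta, d \sim N(d\theta, \sigma^2)$, for which the EIG has the closed form $\tfrac{1}{2}\log(1 + d^2/\sigma^2)$. The function names and sample sizes are illustrative assumptions.

```python
import math
import random

def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def nmc_eig(d, sigma, n_outer, m_inner, rng):
    """Nested Monte Carlo EIG estimate for the toy linear-Gaussian model."""
    total = 0.0
    for _ in range(n_outer):
        theta = rng.gauss(0.0, 1.0)          # theta ~ p(theta)
        y = rng.gauss(d * theta, sigma)      # y ~ p(y | theta, d)
        log_lik = log_normal_pdf(y, d * theta, sigma)
        # Inner estimate of the marginal p(y | d), recomputed from scratch
        # for this particular y with fresh prior samples.
        inner = 0.0
        for _ in range(m_inner):
            theta_m = rng.gauss(0.0, 1.0)
            inner += math.exp(log_normal_pdf(y, d * theta_m, sigma))
        total += log_lik - math.log(inner / m_inner)
    return total / n_outer

def true_eig(d, sigma):
    """Closed-form EIG of the toy model (assumed here only for checking)."""
    return 0.5 * math.log(1.0 + d * d / (sigma * sigma))

rng = random.Random(0)
estimate = nmc_eig(d=1.0, sigma=1.0, n_outer=1000, m_inner=100, rng=rng)
print(f"NMC estimate: {estimate:.3f}, closed form: {true_eig(1.0, 1.0):.3f}")
```

The two nested loops are the point: the run above spends 1000 × 100 likelihood evaluations, and nothing learned about the marginal at one outcome is reused at the next.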

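Key insight. Amortized variational inference lets us learn a functional approximation, e.g. a map $y \mapsto q_p(\theta \mid y, d)$ or $y \mapsto q_m(y \mid d)$, once, then evaluate it at many points. Information is shared across outcomes, so with $K$ optimization steps and $N$ evaluation samples the total cost becomes $T = O(N + K)$ and the rate becomes $O(T^{-1/2})$ (proved in Convergence Rates and Estimator Selection).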
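The following sketch illustrates the amortization idea on an assumed toy model, $\theta \sim N(0, 1)$, $y \mid \theta, d \sim N(d\theta, \sigma^2)$, which is not taken from the paper. The variational family $q_p(\theta \mid y) = N(ay, s^2)$ and the closed-form maximum-likelihood fit are simplifications chosen for this note; the paper trains its approximations by stochastic gradient instead.

```python
import math
import random

def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def simulate(d, sigma, n, rng):
    """Draw n (theta, y) pairs from the joint p(theta) p(y | theta, d)."""
    pairs = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        pairs.append((theta, rng.gauss(d * theta, sigma)))
    return pairs

def fit_posterior(pairs):
    """Fit q_p(theta | y) = N(a*y, s^2) by maximum likelihood.

    For this family the maximizer is available in closed form
    (least squares for a, mean squared residual for s^2).
    """
    a = sum(t * y for t, y in pairs) / sum(y * y for _, y in pairs)
    s2 = sum((t - a * y) ** 2 for t, y in pairs) / len(pairs)
    return a, math.sqrt(s2)

def posterior_eig(pairs, a, s):
    """Average of log q_p(theta | y) - log p(theta) over fresh samples."""
    total = 0.0
    for theta, y in pairs:
        total += log_normal_pdf(theta, a * y, s) - log_normal_pdf(theta, 0.0, 1.0)
    return total / len(pairs)

rng = random.Random(0)
d, sigma = 1.0, 1.0
a, s = fit_posterior(simulate(d, sigma, 2000, rng))            # learned once
estimate = posterior_eig(simulate(d, sigma, 2000, rng), a, s)  # reused for every outcome
exact = 0.5 * math.log(1.0 + d * d / (sigma * sigma))
print(f"a={a:.3f}, s={s:.3f}, estimate={estimate:.3f}, closed form={exact:.3f}")
```

There is no inner loop: the fitted map from $y$ to a posterior approximation is evaluated once per outcome, so the work grows with the number of samples plus the fitting cost rather than with their product.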

Contribution. Four estimators, each with distinct advantages, plus a general-purpose implementation in the probabilistic programming system Pyro.

Main Content

The four estimators at a glance (Foster 2019, Table 1)

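| Estimator | Eq. | Approximates | Bound on EIG | Implicit-likelihood? | Consistent? |
|---|---|---|---|---|---|
| $\hat{\mu}_{\text{post}}$ ([[Variational Posterior Estimator (Barber-Agakov)]]) | 6 | posterior, via $q_p(\theta \mid y, d)$ | lower | yes | no |
| $\hat{\mu}_{\text{marg}}$ ([[Variational Marginal Estimator]]) | 9 | marginal, via $q_m(y \mid d)$ | upper | no | no |
| $\hat{\mu}_{\text{VNMC}}$ ([[Variational NMC Estimator]]) | 11 | posterior proposal + NMC | upper (tight as $L \to \infty$) | no | yes |
| $\hat{\mu}_{m+\ell}$ ([[Implicit Likelihood Estimator]]) | 12 | marginal and likelihood, via $q_m(y \mid d)$ and $q_\ell(y \mid \theta, d)$ | neither (bounded error) | yes | no |
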
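  • $\hat{\mu}_{\text{post}}$ (lower bound, tight iff $q_p$ = true posterior) is the Barber–Agakov bound repurposed for design; best when $\theta$ is low-dimensional relative to $y$, so the posterior is the easier density to approximate.
  • $\hat{\mu}_{\text{marg}}$ (upper bound, tight iff $q_m$ = true marginal) targets $p(y \mid d)$; best when $y$ is low-dimensional relative to $\theta$.
  • $\hat{\mu}_{\text{VNMC}}$ (upper bound) is the only estimator guaranteed to converge to the true EIG even when the variational family does not contain the target: it keeps NMC’s consistency while using the learned posterior as a proposal to speed up the inner estimate.
  • $\hat{\mu}_{m+\ell}$ handles implicit likelihoods (models with nuisance latents that you can sample but whose likelihood you cannot evaluate, e.g. random effects).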
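
For reference, the population quantities that the four estimators target can be written as below. The symbols $q_p$, $q_m$, $q_\ell$ (variational posterior, marginal, and likelihood) and the inner index $j$ are notation assumed for this note; the expressions are the standard forms of these bounds, not a transcription of the paper's numbered equations.

$$
\begin{aligned}
\mu_{\text{post}}(d) &= \mathbb{E}_{p(y,\theta \mid d)}\!\left[\log \frac{q_p(\theta \mid y, d)}{p(\theta)}\right] \le \mathrm{EIG}(d), \\
\mu_{\text{marg}}(d) &= \mathbb{E}_{p(y,\theta \mid d)}\!\left[\log \frac{p(y \mid \theta, d)}{q_m(y \mid d)}\right] \ge \mathrm{EIG}(d), \\
\mu_{\text{VNMC}}(d, L) &= \mathbb{E}\!\left[\log p(y \mid \theta_0, d) - \log \frac{1}{L}\sum_{j=1}^{L} \frac{p(\theta_j)\, p(y \mid \theta_j, d)}{q_p(\theta_j \mid y, d)}\right] \ge \mathrm{EIG}(d), \\
\mu_{m+\ell}(d) &= \mathbb{E}_{p(y,\theta \mid d)}\!\left[\log \frac{q_\ell(y \mid \theta, d)}{q_m(y \mid d)}\right].
\end{aligned}
$$

In the third line the expectation is over $(\theta_0, y) \sim p(y, \theta \mid d)$ and $\theta_1, \dots, \theta_L \sim q_p(\theta \mid y, d)$. The gaps of the first two bounds are KL divergences, which is why each is tight exactly when the approximation matches its target:

$$
\mathrm{EIG}(d) - \mu_{\text{post}}(d) = \mathbb{E}_{p(y \mid d)}\!\big[\mathrm{KL}\big(p(\theta \mid y, d)\,\|\,q_p(\theta \mid y, d)\big)\big], \qquad
\mu_{\text{marg}}(d) - \mathrm{EIG}(d) = \mathrm{KL}\big(p(y \mid d)\,\|\,q_m(y \mid d)\big).
$$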

Why bounds are useful

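Having a lower bound ($\hat{\mu}_{\text{post}}$) and an upper bound ($\hat{\mu}_{\text{marg}}$ or $\hat{\mu}_{\text{VNMC}}$) lets you trap the true EIG of a design: if design $d_1$’s lower bound exceeds design $d_2$’s upper bound, $d_1$ is guaranteed to be the more informative design, up to Monte Carlo error in the two bound estimates, without ever computing the EIG exactly. This sandwiching is exploited heavily in Foster 2020 for design verification.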
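A small sketch of this comparison, again on an assumed toy model ($\theta \sim N(0, 1)$, $y \mid \theta, d \sim N(d\theta, \sigma^2)$) with hand-picked Gaussian variational families fitted in closed form. The two designs and all function names are hypothetical, and both outputs are noisy Monte Carlo estimates of the bounds, so the ordering holds only up to sampling error.

```python
import math
import random

def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def simulate(d, sigma, n, rng):
    """Draw n (theta, y) pairs from the joint p(theta) p(y | theta, d)."""
    pairs = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        pairs.append((theta, rng.gauss(d * theta, sigma)))
    return pairs

def lower_bound(d, sigma, n, rng):
    """Posterior (Barber-Agakov) estimate with q_p(theta | y) = N(a*y, s^2)."""
    train = simulate(d, sigma, n, rng)
    a = sum(t * y for t, y in train) / sum(y * y for _, y in train)
    s = math.sqrt(sum((t - a * y) ** 2 for t, y in train) / n)
    test = simulate(d, sigma, n, rng)
    return sum(log_normal_pdf(t, a * y, s) - log_normal_pdf(t, 0.0, 1.0)
               for t, y in test) / n

def upper_bound(d, sigma, n, rng):
    """Marginal estimate with q_m(y) = N(0, v), v fitted by maximum likelihood."""
    train = simulate(d, sigma, n, rng)
    v = sum(y * y for _, y in train) / n
    test = simulate(d, sigma, n, rng)
    return sum(log_normal_pdf(y, d * t, sigma) - log_normal_pdf(y, 0.0, math.sqrt(v))
               for t, y in test) / n

rng = random.Random(0)
sigma = 1.0
lower_d1 = lower_bound(2.0, sigma, 2000, rng)   # design d1 = 2.0
upper_d2 = upper_bound(0.5, sigma, 2000, rng)   # design d2 = 0.5
d1_is_better = lower_d1 > upper_d2
print(f"lower(d1)={lower_d1:.3f}, upper(d2)={upper_d2:.3f}, d1 better: {d1_is_better}")
```

Only one bound per design is needed for the comparison, which is why it can be cheaper than estimating both EIG values accurately.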

Baselines compared against (Foster 2019 §5)

  • NMC (Nested Estimation and Nested Monte Carlo) — the consistent-but-slow reference.
  • Laplace approximation to the posterior — fast but can be badly biased; exact only for Gaussian linear models.
  • LFIRE (Likelihood-Free Inference by Ratio Estimation) — implicit-likelihood baseline.
  • Donsker–Varadhan (DV) representation of the KL — an MI bound included for illustration.

Examples

Four benchmark design problems (Foster 2019 §6.1)

The estimators are validated on A/B testing (Gaussian linear model, explicit likelihood), revealed preference (economics utility model, explicit), mixed effects (item-response psychology, implicit likelihood with nuisance variables), and extrapolation (predict labels in a target region, implicit). All four methods outperform NMC; Laplace wins only on the Gaussian A/B model where it is exact. See Convergence Rates and Estimator Selection for the bias²/variance table.

Connections

  • Builds directly on Nested Estimation and Nested Monte Carlo — these estimators exist to beat NMC’s rate.
  • Repurposes mutual-information bounds from representation learning: $\hat{\mu}_{\text{post}}$ = Barber–Agakov; $\hat{\mu}_{\text{marg}}$ = a variational marginal MI bound; see Poole et al. 2019.
  • Generalized by Foster 2020, which makes these bounds differentiable in the design too, fusing estimation and optimization.

See Also