Unified SGD BOED - Overview

Summary

Foster et al. (2020), A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments (AISTATS). Replaces the standard two-stage Bayesian optimal experimental design (BOED) pipeline, in which the expected information gain (EIG) is estimated pointwise and then handed to a separate outer optimizer, with a single stochastic gradient ascent (SGA) that simultaneously tightens a variational lower bound on the EIG and optimizes the design $d$. Introduces three lower bounds, BA (Barber–Agakov), ACE (adaptive contrastive estimation), and PCE (prior contrastive estimation), plus a likelihood-free extension. Because it relies on gradients, it scales to high-dimensional designs (hundreds of dimensions) where gradient-free outer optimizers fail. Recommended default: ACE.

Overview

The problem with two-stage BOED. Existing methods estimate the EIG $I(d)$ on a point-by-point basis and feed each estimate to an outer optimizer (Bayesian optimization, grid search). This is inefficient: it adds a level of nesting, must re-estimate $I(d)$ for every candidate $d$, and typically forces gradient-free optimization that does not scale to high-dimensional designs.

The unified idea. Build a variational lower bound $\mathcal{L}(d, \phi) \le I(d)$ and maximize it jointly over $(d, \phi)$ by stochastic gradient ascent. Optimizing the variational parameters $\phi$ tightens the bound (so EIG estimates stay accurate); optimizing $d$ moves the design toward high-EIG regions. One loop does both: there is no outer optimizer, and gradients let it scale.
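
A minimal sketch of the single joint loop, using the BA bound on a hypothetical one-dimensional toy model that is not from the paper: $\theta \sim N(0, 1)$ and $y \mid \theta, d \sim N(d\theta, \sigma^2 (1 + d^2)^2)$, so the EIG has an interior optimum at $|d| = 1$. The variational family $q_\phi(\theta \mid y) = N(a y, s^2)$, the function names, the step size, and the batch size are all illustrative assumptions; gradients are written out by hand so that only the standard library is needed.

```python
import math
import random

SIGMA = 0.5  # base noise scale of the hypothetical toy model


def noise_std(d):
    # Observation noise grows with |d|, which puts the EIG optimum at |d| = 1.
    return SIGMA * (1.0 + d * d)


def true_eig(d):
    # Closed-form EIG for this linear-Gaussian toy model.
    return 0.5 * math.log(1.0 + (d / noise_std(d)) ** 2)


def ba_gradients(d, a, log_s, batch_size, rng):
    """Monte Carlo gradients of E[log q_phi(theta | y)] w.r.t. (d, a, log_s)."""
    s2 = math.exp(2.0 * log_s)
    g_d = g_a = g_log_s = 0.0
    for _ in range(batch_size):
        theta = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, 1.0)
        y = d * theta + noise_std(d) * eps     # reparameterized outcome
        dy_dd = theta + 2.0 * SIGMA * d * eps  # pathwise derivative dy/dd
        resid = theta - a * y
        g_a += resid * y / s2
        g_log_s += resid * resid / s2 - 1.0
        g_d += (resid * a / s2) * dy_dd
    return g_d / batch_size, g_a / batch_size, g_log_s / batch_size


def ba_bound(d, a, log_s, num_samples, rng):
    """Monte Carlo estimate of the BA bound E[log q] + H[p(theta)]."""
    prior_entropy = 0.5 * math.log(2.0 * math.pi * math.e)
    s = math.exp(log_s)
    total = 0.0
    for _ in range(num_samples):
        theta = rng.gauss(0.0, 1.0)
        y = d * theta + noise_std(d) * rng.gauss(0.0, 1.0)
        z = (theta - a * y) / s
        total += -0.5 * z * z - log_s - 0.5 * math.log(2.0 * math.pi)
    return total / num_samples + prior_entropy


def joint_sga(num_steps=3000, batch_size=64, lr=0.02, seed=0):
    """One loop: every step updates the design d and the variational (a, log_s)."""
    rng = random.Random(seed)
    d, a, log_s = 0.3, 0.0, 0.0
    for _ in range(num_steps):
        g_d, g_a, g_log_s = ba_gradients(d, a, log_s, batch_size, rng)
        d += lr * g_d
        a += lr * g_a
        log_s += lr * g_log_s
    return d, a, log_s
```

Calling `joint_sga()` and then `ba_bound(...)` at the returned values gives a design and a bound estimate that can be compared against `true_eig(1.0)`; no separate outer optimizer appears anywhere in the loop.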

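A lower bound is essential: maximizing over $d$ with an upper bound would give an ill-posed max–min problem, because tightening an upper bound means minimizing over $\phi$ while the design search maximizes over $d$. Foster 2020 uses lower bounds whose gradients with respect to $(d, \phi)$ are tractable expectations over sampled $(\theta, y)$.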
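
The asymmetry can be written out as follows. The symbols $\mathcal{L}$ and $\mathcal{U}$ for a generic lower and upper bound are placeholders assumed for this note, not necessarily the paper's notation.

```latex
% Lower bound: tightening and design search are both maximizations,
% so they merge into one joint maximization.
\mathcal{L}(d,\phi) \le I(d) \;\; \forall \phi
\quad\Longrightarrow\quad
\max_{d} I(d) \;\ge\; \max_{d}\max_{\phi}\, \mathcal{L}(d,\phi)
\;=\; \max_{(d,\phi)} \mathcal{L}(d,\phi).

% Upper bound: tightening is a minimization over \phi,
% which works against the maximization over d.
\mathcal{U}(d,\phi) \ge I(d) \;\; \forall \phi
\quad\Longrightarrow\quad
\max_{d} I(d) \;\le\; \max_{d}\min_{\phi}\, \mathcal{U}(d,\phi).
```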

Main Content

The three lower bounds

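| Bound | Eq. | Idea | Tight when | Note |
|---|---|---|---|---|
| $I_{BA}$ (Barber–Agakov) | 7 | learn posterior approximation $q_\phi(\theta \mid y)$, optimize $(d, \phi)$ jointly | $q_\phi$ = true posterior | the one-stage version of Foster 2019's posterior bound |
| $I_{ACE}$ (adaptive contrastive) | 11 | add $L$ contrastive samples to the denominator | $q_\phi$ = posterior or $L \to \infty$ | Adaptive Contrastive Estimation (ACE) |
| $I_{PCE}$ (prior contrastive) | 12 | use the prior to draw contrastive samples (no $q_\phi$ to learn) | $L \to \infty$ | Prior Contrastive Estimation (PCE) |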
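
For reference, the three bounds can be written in their commonly stated forms as below. The notation is assumed for this note: $\theta_0$ is the sample that generated $y$, $\theta_{1:L}$ are the contrastive samples, and $H[p(\theta)]$ is the prior entropy. The equation numbers in the table refer to the paper and should be checked against it.

```latex
I_{BA}(d,\phi) =
  \mathbb{E}_{p(\theta)\,p(y\mid\theta,d)}\bigl[\log q_\phi(\theta\mid y)\bigr]
  + H[p(\theta)]

I_{ACE}(d,\phi) =
  \mathbb{E}_{p(\theta_0)\,p(y\mid\theta_0,d)\,q_\phi(\theta_{1:L}\mid y)}
  \left[\log
    \frac{p(y\mid\theta_0,d)}
         {\frac{1}{L+1}\sum_{\ell=0}^{L}
            \frac{p(\theta_\ell)\,p(y\mid\theta_\ell,d)}{q_\phi(\theta_\ell\mid y)}}
  \right]

I_{PCE}(d) =
  \mathbb{E}_{p(\theta_0)\,p(y\mid\theta_0,d)\,p(\theta_{1:L})}
  \left[\log
    \frac{p(y\mid\theta_0,d)}
         {\frac{1}{L+1}\sum_{\ell=0}^{L} p(y\mid\theta_\ell,d)}
  \right]
```

Setting $q_\phi(\theta \mid y) = p(\theta)$ in the ACE expression cancels the prior ratio and recovers PCE.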

ACE improves on BA by being tight in two ways (a good $q_\phi$ or many contrastive samples), and connects to the InfoNCE bound from representation learning. PCE drops the learned network entirely: it is cheaper, and effective when the prior is a good proposal for the posterior $p(\theta \mid y, d)$.
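
The two routes to tightness can be illustrated with a small Monte Carlo sketch. It assumes a hypothetical linear-Gaussian toy model, $\theta \sim N(0, 1)$ and $y \mid \theta, d \sim N(d\theta, \sigma^2)$, for which both the exact posterior and the EIG are known in closed form. The proposal family $q(\theta \mid y) = N(\texttt{q\_slope} \cdot y, \texttt{q\_std}^2)$ and all function names are illustrative, not the paper's implementation; choosing the prior as the proposal turns the estimator into PCE.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def log_mean_exp(values):
    # Numerically stable log of the arithmetic mean of exp(values).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def ace_estimate(d, num_contrastive, q_slope, q_std,
                 num_outer=4000, sigma=1.0, seed=0):
    """Monte Carlo estimate of the ACE bound with q(theta | y) = N(q_slope * y, q_std^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_outer):
        theta0 = rng.gauss(0.0, 1.0)       # parameter that generates y
        y = rng.gauss(d * theta0, sigma)
        thetas = [theta0] + [rng.gauss(q_slope * y, q_std)
                             for _ in range(num_contrastive)]
        # log of p(theta) p(y | theta, d) / q(theta | y) for theta_0 .. theta_L
        log_weights = [
            log_normal_pdf(t, 0.0, 1.0)
            + log_normal_pdf(y, d * t, sigma)
            - log_normal_pdf(t, q_slope * y, q_std)
            for t in thetas
        ]
        total += log_normal_pdf(y, d * theta0, sigma) - log_mean_exp(log_weights)
    return total / num_outer


def exact_posterior(d, sigma=1.0):
    # Returns (slope, std) such that p(theta | y, d) = N(slope * y, std^2).
    precision = 1.0 + (d / sigma) ** 2
    return d / (sigma * sigma * precision), math.sqrt(1.0 / precision)


def true_eig(d, sigma=1.0):
    return 0.5 * math.log(1.0 + (d / sigma) ** 2)
```

With the exact posterior as proposal, a single contrastive sample already gives an estimate close to `true_eig(d)`; with the prior as proposal (`q_slope=0.0, q_std=1.0`, i.e. PCE), the estimate cannot exceed $\log(L+1)$ and approaches the EIG only as the number of contrastive samples grows.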

Key results

  • Theorem 1 (Adaptive Contrastive Estimation (ACE)): $I_{ACE}(d, \phi)$ is a valid EIG lower bound, monotonically increasing in $L$, exact as $L \to \infty$, and exact for any $L$ if $q_\phi$ equals the true posterior. The gap to the true EIG is an expected KL divergence.
  • Theorem 2 (Likelihood-Free ACE and Gradient Estimation): replacing the likelihood with an unnormalized approximation still gives a valid lower bound, which allows implicit-likelihood models to be handled in a single optimization.
  • Gradient estimators for $\nabla_d$: score-function (REINFORCE), reparameterization, and Rao–Blackwellization for discrete $y$.
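
The difference between the first two gradient estimators can be seen on a deliberately simple stand-in problem, which is an assumption of this note rather than an example from the paper: estimate $\nabla_d\, \mathbb{E}_{y \sim N(d, \sigma^2)}[f(y)]$ with $f(y) = y^2$, whose exact value is $2d$. Both estimators are unbiased; the sketch compares their sample variance.

```python
import random


def integrand(y):
    # Stand-in for a bound integrand; chosen so the exact gradient 2*d is known.
    return y * y


def score_function_grad(d, sigma, rng):
    # REINFORCE: f(y) * d/dd log N(y; d, sigma^2). Needs no derivative of f.
    y = rng.gauss(d, sigma)
    return integrand(y) * (y - d) / (sigma * sigma)


def reparam_grad(d, sigma, rng):
    # Pathwise: write y = d + sigma * eps and differentiate f through y.
    eps = rng.gauss(0.0, 1.0)
    y = d + sigma * eps
    return 2.0 * y  # f'(y) * dy/dd, with dy/dd = 1


def mean_and_variance(samples):
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var


def compare(d=1.5, sigma=1.0, num_samples=20000, seed=0):
    rng = random.Random(seed)
    sf = [score_function_grad(d, sigma, rng) for _ in range(num_samples)]
    rp = [reparam_grad(d, sigma, rng) for _ in range(num_samples)]
    return mean_and_variance(sf), mean_and_variance(rp)
```

In this toy setting the reparameterized estimator has much lower variance, which is the usual reason to prefer it when the outcome can be written as a differentiable function of $d$ and noise; the score-function form remains available when it cannot.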

Two-stage vs one-stage (the headline comparison)

On a 400-dimensional regression design, the gradient methods (BA/ACE/PCE) achieve roughly double the final EIG of two-stage baselines (Bayesian optimization / random search + VNMC). On biomolecular docking (100-dim) they beat human experts. The advantage grows with dimension.

Examples

Five experiments (Foster 2020 §4)

  • Death process (2-D epidemiology, EIG surface known): gradient methods beat BO even in low dimension.
  • Regression (400-D): ~2× EIG over BO/random search.
  • Advertising (ablation over the design dimension, analytic EIG): gradient methods dominate as the dimension grows.
  • Biomolecular docking (100-D, Lyu et al. 2019): ACE beats expert designs.
  • CES (6-D iterated behavioural economics): ACE/PCE reduce posterior entropy faster than the Foster 2019 marginal+BO baseline.

See High-Dimensional Design Applications.

Connections

  • Builds on Foster 2019: $I_{BA}$ is the one-stage version of Foster 2019's posterior bound; the VNMC upper bound is reused to verify designs (trap the EIG between ACE-lower and VNMC-upper).
  • Generalized / contextualized by Rainforth 2023, which presents this unified SGA scheme (their Eq. 15) as the turning point that made gradient-based EIG optimization consistent.
  • Connects to InfoNCE / contrastive representation learning (PCE ≈ InfoNCE with $\theta$ and $y$ as the two views).
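
The idea of trapping the EIG between a lower and an upper bound can be sketched in a simplified form. The sketch assumes a hypothetical linear-Gaussian toy model ($\theta \sim N(0, 1)$, $y \mid \theta, d \sim N(d\theta, \sigma^2)$) and uses prior samples on both sides: PCE as the lower bound and plain nested Monte Carlo as the upper bound. That is the special case of VNMC with the prior as proposal, not the variational version, and the function names are illustrative.

```python
import math
import random


def log_lik(y, theta, d, sigma):
    z = (y - d * theta) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_mean_exp(values):
    # Numerically stable log of the arithmetic mean of exp(values).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def eig_sandwich(d, num_inner=30, num_outer=3000, sigma=1.0, seed=0):
    """Return (lower, upper) Monte Carlo estimates that bracket the EIG in expectation."""
    rng = random.Random(seed)
    lower = upper = 0.0
    for _ in range(num_outer):
        theta0 = rng.gauss(0.0, 1.0)
        y = rng.gauss(d * theta0, sigma)
        log_p0 = log_lik(y, theta0, d, sigma)
        fresh = [log_lik(y, rng.gauss(0.0, 1.0), d, sigma)
                 for _ in range(num_inner)]
        lower += log_p0 - log_mean_exp([log_p0] + fresh)  # PCE: theta0 in the denominator
        upper += log_p0 - log_mean_exp(fresh)             # NMC: theta0 left out
    return lower / num_outer, upper / num_outer


def true_eig(d, sigma=1.0):
    return 0.5 * math.log(1.0 + (d / sigma) ** 2)
```

The only difference between the two estimators is whether $\theta_0$ appears in the denominator; the width of the resulting interval shrinks as `num_inner` grows, which gives a rough check on how far a reported lower bound can be from the true EIG.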

See Also