Unified SGD BOED - Overview
Summary
Foster et al. (2020), A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments (AISTATS). Replaces the standard two-stage BOED pipeline (estimate the EIG pointwise, then hand it to a separate outer optimizer) with a single stochastic gradient ascent (SGA) that simultaneously tightens a variational lower bound on the EIG and optimizes the design $\xi$. Introduces three lower bounds, BA (Barber–Agakov), ACE (adaptive contrastive estimation), and PCE (prior contrastive estimation), plus a likelihood-free extension. Because the design is optimized by gradient ascent, the method scales to high-dimensional designs (hundreds of dimensions) where gradient-free outer optimizers fail. Recommended default: ACE.
Overview
The problem with two-stage BOED. Existing methods estimate the EIG $I(\xi)$ point by point and feed each estimate to an outer optimizer (Bayesian optimization, grid search). This is inefficient: it adds a level of nesting, must re-estimate $I(\xi)$ for every candidate $\xi$, and typically forces gradient-free optimization that does not scale to high-dimensional designs.
The unified idea. Build a variational lower bound $\mathcal{L}(\xi, \phi) \le I(\xi)$ and maximize it jointly over $(\xi, \phi)$ by stochastic gradient ascent. Optimizing the variational parameters $\phi$ tightens the bound (so EIG estimates stay accurate); optimizing $\xi$ moves the design toward high-EIG regions. One loop does both: there is no outer optimizer, and gradients let it scale.
A lower bound is essential: maximizing over $\xi$ while tightening an upper bound (which means minimizing over $\phi$) would give an ill-posed max–min problem. Foster 2020 uses lower bounds whose gradients with respect to $(\xi, \phi)$ are tractable expectations that can be estimated from samples.
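
The single-loop idea can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the paper's code: it assumes a linear-Gaussian model ($\theta \sim N(0,1)$, $y = \xi\theta + \varepsilon$ with $\varepsilon \sim N(0,1)$), a design constrained to $|\xi| \le 1$, and a Gaussian variational posterior $q_\phi(\theta \mid y) = N(a y, e^{2r})$ with $\phi = (a, r)$. The function names `ba_joint_sga` and `ba_bound` are hypothetical. Gradients of the BA bound are derived by hand using the reparameterization $y = \xi\theta + \varepsilon$.

```python
import math
import random


def ba_joint_sga(steps=2000, batch=64, lr=0.05, xi_max=1.0, seed=0):
    """Jointly ascend the BA bound in the design xi and variational params (a, r).

    Toy model (an assumption for illustration): theta ~ N(0, 1),
    y = xi * theta + eps with eps ~ N(0, 1), q(theta | y) = N(a * y, exp(2 r)).
    """
    rng = random.Random(seed)
    xi, a, r = 0.3, 0.0, 0.0
    for _ in range(steps):
        g_xi = g_a = g_r = 0.0
        for _ in range(batch):
            theta = rng.gauss(0.0, 1.0)
            eps = rng.gauss(0.0, 1.0)
            y = xi * theta + eps          # reparameterized outcome
            resid = theta - a * y         # posterior-mean residual
            w = math.exp(-2.0 * r)        # inverse variance of q
            # Gradients of log q(theta | y) with respect to a, r and xi.
            g_a += resid * y * w
            g_r += -1.0 + resid * resid * w
            g_xi += resid * a * theta * w  # pathwise: dy/dxi = theta
        # One ascent step updates the design and the bound parameters together.
        xi += lr * g_xi / batch
        a += lr * g_a / batch
        r += lr * g_r / batch
        xi = max(-xi_max, min(xi_max, xi))  # project onto the design constraint
    return xi, a, r


def ba_bound(xi, a, r, n=20000, seed=1):
    """Monte Carlo estimate of the BA bound E[log q(theta | y)] + H[p(theta)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        y = xi * theta + rng.gauss(0.0, 1.0)
        resid = theta - a * y
        total += -0.5 * math.log(2.0 * math.pi) - r - 0.5 * resid * resid * math.exp(-2.0 * r)
    prior_entropy = 0.5 * math.log(2.0 * math.pi * math.e)
    return total / n + prior_entropy


if __name__ == "__main__":
    xi, a, r = ba_joint_sga()
    print("design:", xi, "bound:", ba_bound(xi, a, r))
```

For this toy model the true EIG is $\tfrac12\log(1+\xi^2)$, so the design should move to the constraint boundary and the bound should approach $\tfrac12\log 2$ as $q_\phi$ approaches the exact posterior. In practice an automatic-differentiation framework would replace the hand-derived gradients.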
Main Content
The three lower bounds
| Bound | Eq. | Idea | Tight when | Note |
|---|---|---|---|---|
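| $\mathcal{L}_{BA}$ (Barber–Agakov) | 7 | learn a posterior approximation $q_\phi(\theta \mid y)$, optimize $(\xi, \phi)$ jointly | $q_\phi$ = true posterior | the one-stage version of Foster 2019’s variational posterior estimator $\hat\mu_{\text{post}}$ |
| $\mathcal{L}_{ACE}$ (adaptive contrastive) | 11 | add $L$ contrastive samples to the denominator | $q_\phi$ = true posterior or $L \to \infty$ | Adaptive Contrastive Estimation (ACE) |
| $\mathcal{L}_{PCE}$ (prior contrastive) | 12 | use the prior to draw contrastive samples (no $q_\phi$ to learn) | $L \to \infty$ | Prior Contrastive Estimation (PCE) |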
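
For reference, the three bounds can be written out as follows. The notation is assumed here ($\theta$ for the latent variable, $y$ for the outcome, $\xi$ for the design, $q_\phi(\theta \mid y)$ for the learned posterior approximation, $L$ for the number of contrastive samples), and the forms are the standard statements of these bounds; the paper's Eqs. 7, 11, and 12 remain the authoritative versions.

$$
\begin{aligned}
I(\xi) &= \mathbb{E}_{p(\theta)\,p(y\mid\theta,\xi)}\left[\log\frac{p(y\mid\theta,\xi)}{p(y\mid\xi)}\right],\\
\mathcal{L}_{BA}(\xi,\phi) &= \mathbb{E}_{p(\theta)\,p(y\mid\theta,\xi)}\left[\log q_\phi(\theta\mid y)\right] + H[p(\theta)],\\
\mathcal{L}_{ACE}(\xi,\phi) &= \mathbb{E}_{p(\theta_0)\,p(y\mid\theta_0,\xi)\,q_\phi(\theta_{1:L}\mid y)}\left[\log\frac{p(y\mid\theta_0,\xi)}{\frac{1}{L+1}\sum_{\ell=0}^{L}\frac{p(\theta_\ell)\,p(y\mid\theta_\ell,\xi)}{q_\phi(\theta_\ell\mid y)}}\right],\\
\mathcal{L}_{PCE}(\xi) &= \mathbb{E}_{p(\theta_0)\,p(y\mid\theta_0,\xi)\,p(\theta_{1:L})}\left[\log\frac{p(y\mid\theta_0,\xi)}{\frac{1}{L+1}\sum_{\ell=0}^{L}p(y\mid\theta_\ell,\xi)}\right].
\end{aligned}
$$

Setting $q_\phi(\theta \mid y) = p(\theta)$ in $\mathcal{L}_{ACE}$ cancels the prior ratio and recovers $\mathcal{L}_{PCE}$, which is why PCE has no variational parameters.
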
ACE improves on BA by being tight in two ways (a good $q_\phi$ or many contrastive samples), and connects to the InfoNCE bound from representation learning. PCE drops the learned network entirely: it is cheaper, and effective when the prior is a good proposal for the posterior $p(\theta \mid y, \xi)$.
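
Because PCE needs no learned network, it can be estimated in a few lines. The sketch below assumes a toy linear-Gaussian model ($\theta \sim N(0,1)$, $y \mid \theta, \xi \sim N(\xi\theta, 1)$), for which the true EIG is $\tfrac12\log(1+\xi^2)$; the model and the helper names `log_lik`, `logsumexp`, and `pce_estimate` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def log_lik(y, theta, xi):
    """log N(y; xi * theta, 1) for the assumed toy model."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (y - xi * theta) ** 2


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def pce_estimate(xi, num_contrastive, num_outer=2000, seed=0):
    """Monte Carlo estimate of the PCE lower bound on the EIG at design xi."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_outer):
        theta0 = rng.gauss(0.0, 1.0)            # sample that generated y
        y = xi * theta0 + rng.gauss(0.0, 1.0)
        logs = [log_lik(y, theta0, xi)]
        # Contrastive samples come straight from the prior.
        logs += [log_lik(y, rng.gauss(0.0, 1.0), xi) for _ in range(num_contrastive)]
        log_denominator = logsumexp(logs) - math.log(num_contrastive + 1)
        total += logs[0] - log_denominator
    return total / num_outer


if __name__ == "__main__":
    for num_contrastive in (1, 10, 100):
        print(num_contrastive, pce_estimate(2.0, num_contrastive))
    print("analytic EIG:", 0.5 * math.log(1.0 + 2.0 ** 2))
```

Each term in the average is at most $\log(L+1)$, so a small number of contrastive samples caps the estimate; increasing $L$ lifts the cap and tightens the bound toward the analytic value.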
Key results
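- Theorem 1 (Adaptive Contrastive Estimation (ACE)): $\mathcal{L}_{ACE}(\xi, \phi)$ is a valid EIG lower bound, monotonically increasing in the number of contrastive samples $L$, exact as $L \to \infty$, and exact for any $L$ if $q_\phi$ equals the true posterior. The gap to the EIG is an expected KL divergence.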
- Theorem 2 (Likelihood-Free ACE and Gradient Estimation): replacing the likelihood with an unnormalized approximation still gives a valid lower bound — enabling implicit-likelihood models in a single optimization.
- Gradient estimators for $\nabla_\xi \mathcal{L}$: score-function (REINFORCE), reparameterization, and Rao–Blackwellization for discrete $y$.
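
The three gradient estimators listed above rest on standard identities, sketched here for a generic integrand $f(y, \xi)$ with $\theta$ held fixed. The symbols $f$, $g$, and $\varepsilon$ are notation assumed for this illustration.

$$
\begin{aligned}
\text{Reparameterization } (y = g(\varepsilon; \theta, \xi),\ \varepsilon \sim p(\varepsilon)):\quad
&\nabla_\xi\, \mathbb{E}_{p(y\mid\theta,\xi)}[f(y,\xi)] = \mathbb{E}_{p(\varepsilon)}\big[\nabla_\xi f(g(\varepsilon;\theta,\xi), \xi)\big],\\
\text{Score function:}\quad
&\nabla_\xi\, \mathbb{E}_{p(y\mid\theta,\xi)}[f(y,\xi)] = \mathbb{E}_{p(y\mid\theta,\xi)}\big[f(y,\xi)\,\nabla_\xi \log p(y\mid\theta,\xi) + \nabla_\xi f(y,\xi)\big],\\
\text{Discrete } y \text{ (sum out } y\text{)}:\quad
&\nabla_\xi\, \mathbb{E}_{p(y\mid\theta,\xi)}[f(y,\xi)] = \sum_{y} \nabla_\xi \big[p(y\mid\theta,\xi)\, f(y,\xi)\big].
\end{aligned}
$$

The reparameterized form usually has the lowest variance but requires a differentiable sampling path; the score-function form needs only a differentiable log-likelihood; summing over a finite outcome space removes the sampling noise in $y$ altogether.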
Two-stage vs one-stage (the headline comparison)
On a 400-dimensional regression design, the gradient methods (BA/ACE/PCE) achieve roughly double the final EIG of two-stage baselines (Bayesian optimization / random search + VNMC). On biomolecular docking (100-dim) they beat human experts. The advantage grows with dimension.
Examples
Five experiments (Foster 2020 §4)
- Death process (2-D epidemiology, EIG surface known): gradient methods beat BO even in low dimension.
- Regression (400-D): roughly 2× the EIG of BO / random search.
- Advertising (ablation over design dimension, analytic EIG): gradient methods dominate as the dimension grows.
- Biomolecular docking (100-D, Lyu et al. 2019): ACE beats expert designs.
- CES (6-D iterated behavioural economics): ACE/PCE reduce posterior entropy faster than the Foster 2019 marginal + BO baseline.

See High-Dimensional Design Applications.
Connections
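- Builds on Foster 2019: $\mathcal{L}_{BA}$ is the one-stage counterpart of the variational posterior estimator $\hat\mu_{\text{post}}$; the VNMC upper bound is reused to verify designs (trapping the EIG between the ACE lower bound and the VNMC upper bound).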
- Generalized / contextualized by Rainforth 2023, which presents this unified SGA scheme (their Eq. 15) as the turning point that made gradient-based EIG optimization consistent.
- Connects to InfoNCE / contrastive representation learning (PCE ≈ InfoNCE with $(\theta, y)$ as the two views).
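
The verification idea in the first bullet, trapping the EIG between a lower and an upper bound, can be sketched in a simplified form. The example below swaps in the prior as the proposal for both bounds, so it uses PCE (lower) and plain nested Monte Carlo (upper) rather than ACE and VNMC. The toy linear-Gaussian model and the name `eig_sandwich` are assumptions made for illustration.

```python
import math
import random


def log_lik(y, theta, xi):
    """log N(y; xi * theta, 1) for the assumed toy model."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (y - xi * theta) ** 2


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def eig_sandwich(xi, num_inner=200, num_outer=2000, seed=0):
    """Return (lower, upper) Monte Carlo estimates bracketing the EIG at xi."""
    rng = random.Random(seed)
    lower = upper = 0.0
    for _ in range(num_outer):
        theta0 = rng.gauss(0.0, 1.0)
        y = xi * theta0 + rng.gauss(0.0, 1.0)
        log_p0 = log_lik(y, theta0, xi)
        fresh = [log_lik(y, rng.gauss(0.0, 1.0), xi) for _ in range(num_inner)]
        # Upper bound (NMC): the denominator averages only fresh prior samples.
        upper += log_p0 - (logsumexp(fresh) - math.log(num_inner))
        # Lower bound (PCE): the denominator also includes the generating sample.
        lower += log_p0 - (logsumexp(fresh + [log_p0]) - math.log(num_inner + 1))
    return lower / num_outer, upper / num_outer


if __name__ == "__main__":
    low, high = eig_sandwich(2.0)
    print("lower:", low, "upper:", high, "analytic:", 0.5 * math.log(5.0))
```

The ordering of the two bounds holds in expectation, not for every Monte Carlo draw, so a narrow bracket is evidence that the EIG of a candidate design has been pinned down only up to sampling noise.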
See Also
- Adaptive Contrastive Estimation (ACE) — the recommended default bound (Theorem 1)
- Prior Contrastive Estimation (PCE) — the no-network contrastive bound
- Likelihood-Free ACE and Gradient Estimation — implicit likelihoods + gradient estimators
- High-Dimensional Design Applications — the five experiments