Unified SGD BOED - Overview

Summary

Foster et al. (2020), A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments (AISTATS). The paper replaces the standard two-stage Bayesian optimal experimental design (BOED) pipeline, in which the expected information gain (EIG) is estimated pointwise and then handed to a separate outer optimizer, with a single stochastic gradient ascent (SGA) that simultaneously tightens a variational lower bound on the EIG and optimizes the design $ξ$. It introduces three lower bounds, BA (Barber–Agakov), ACE (adaptive contrastive estimation), and PCE (prior contrastive estimation), plus a likelihood-free extension. Because it uses gradients, it scales to high-dimensional designs (hundreds of dimensions) where gradient-free outer optimizers fail. Recommended default: ACE.

Overview

The problem with two-stage BOED. Existing methods estimate $I(ξ) = \mathrm{EIG}(ξ)$ point by point and feed each estimate to an outer optimizer (e.g. Bayesian optimization or grid search). This is inefficient: it adds a level of nesting, must re-estimate $I(ξ)$ for every candidate $ξ$, and typically forces gradient-free optimization that does not scale to high-dimensional designs.

The unified idea. Build a variational lower bound $\mathcal{L}(ξ, ϕ) \leq I(ξ)$ and maximize it jointly over $(ξ, ϕ)$ by stochastic gradient ascent. Optimizing $ϕ$ tightens the bound (so EIG estimates stay accurate); optimizing $ξ$ moves the design toward high-EIG regions. One loop does both: there is no outer optimizer, and gradients let it scale.
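
Written out, with $\mathcal{L}(ξ, ϕ)$ denoting the bound, the quantity being bounded and the joint update look roughly as follows. The step size $η_t$ and the gradient-estimate symbol $\hat{g}_t$ are notation introduced here for illustration, not taken from the paper.

$$
I(ξ) = \mathbb{E}_{p(θ)\, p(y ∣ θ, ξ)}\left[\log \frac{p(y ∣ θ, ξ)}{p(y ∣ ξ)}\right],
\qquad
p(y ∣ ξ) = \mathbb{E}_{p(θ)}\left[p(y ∣ θ, ξ)\right]
$$

$$
(ξ_{t+1}, ϕ_{t+1}) = (ξ_t, ϕ_t) + η_t\, \hat{g}_t,
\qquad
\mathbb{E}\left[\hat{g}_t\right] = \nabla_{(ξ, ϕ)}\, \mathcal{L}(ξ_t, ϕ_t)
$$

The intractable marginal $p(y ∣ ξ)$ is what forces nested estimation in the two-stage approach; the bound replaces it with something whose gradient can be estimated without bias from samples.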

A lower bound is essential: with an upper bound, maximizing over $ξ$ while tightening over $ϕ$ would give an ill-posed max–min problem. Foster 2020 uses lower bounds whose gradients with respect to $(ξ, ϕ)$ are tractable expectations over $p(θ) p(y ∣ θ, ξ)$ (plus the contrastive samples, for ACE and PCE).

Main Content

The three lower bounds

Bound	Eq.	Idea	Tight when	Note
$I_{BA}$ (Barber–Agakov)	7	learn an approximate posterior $q_{ϕ}(θ ∣ y)$ and optimize $(ξ, ϕ)$ jointly	$q_{ϕ} =$ true posterior	the one-stage version of Foster 2019’s $\hat{μ}_{post}$
$I_{ACE}$ (adaptive contrastive)	11	add $L$ contrastive samples $θ_{1:L} \sim q_{ϕ}$ to the denominator	$q_{ϕ} =$ true posterior, or $L \to \infty$	Adaptive Contrastive Estimation (ACE)
$I_{PCE}$ (prior contrastive)	12	use the prior $p(θ)$ to draw contrastive samples (no $ϕ$ to learn)	$L \to \infty$	Prior Contrastive Estimation (PCE)

ACE improves on BA by being tight in two ways (a good $q_{ϕ}$ or many contrastive samples), and connects to the InfoNCE bound from representation learning. PCE drops the learned network entirely: it is cheaper, and effective when the prior is a good proposal for estimating the marginal $p(y ∣ ξ)$.
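
For reference, the three bounds in the table are usually written as below. This is a sketch in this note's notation, with $θ_0$ the parameter that generated $y$ and $H[p(θ)]$ the prior entropy; the exact typesetting and equation numbers should be checked against the paper.

$$
I_{BA}(ξ, ϕ) = \mathbb{E}_{p(θ)\, p(y ∣ θ, ξ)}\left[\log q_{ϕ}(θ ∣ y)\right] + H[p(θ)]
$$

$$
I_{ACE}(ξ, ϕ) = \mathbb{E}_{p(θ_0)\, p(y ∣ θ_0, ξ)\, q_{ϕ}(θ_{1:L} ∣ y)}\left[\log \frac{p(y ∣ θ_0, ξ)}{\frac{1}{L+1}\sum_{\ell=0}^{L} \frac{p(θ_\ell)\, p(y ∣ θ_\ell, ξ)}{q_{ϕ}(θ_\ell ∣ y)}}\right]
$$

$$
I_{PCE}(ξ) = \mathbb{E}_{p(θ_0)\, p(y ∣ θ_0, ξ)\, p(θ_{1:L})}\left[\log \frac{p(y ∣ θ_0, ξ)}{\frac{1}{L+1}\sum_{\ell=0}^{L} p(y ∣ θ_\ell, ξ)}\right]
$$

Setting $q_{ϕ}$ to the prior in ACE recovers PCE. Because the $\ell = 0$ term sits in the PCE denominator, the PCE integrand can never exceed $\log(L+1)$, which is one reason PCE needs large $L$ when the EIG is high.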

Key results

Theorem 1 (ACE): $I_{ACE}$ is a valid EIG lower bound, non-decreasing in $L$, exact as $L \to \infty$, and exact for any $L$ if $q_{ϕ}$ equals the true posterior. The gap to the true EIG can be written as an expected KL divergence.
Theorem 2 (likelihood-free ACE): replacing the likelihood with an unnormalized approximation $f_{ψ}(θ, y) \geq 0$ still gives a valid lower bound, which enables implicit-likelihood models to be handled in a single optimization.
Gradient estimators for $\partial I_{ACE} / \partial ξ$: score-function (REINFORCE), reparameterization, and Rao–Blackwellization for discrete $y$.
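
To make the difference between the first two estimator families concrete, here is a minimal NumPy sketch on a toy problem, not on the ACE bound itself. It assumes $y \sim N(ξ, 1)$ and the test function $f(y) = y^2$, so the true gradient of $\mathbb{E}[f(y)]$ with respect to $ξ$ is $2ξ$. The function names are hypothetical and chosen for illustration only.

```python
import numpy as np

def score_function_grad(f, xi, n, rng):
    """REINFORCE: d/dxi E_{y~N(xi,1)}[f(y)] = E[f(y) * d/dxi log N(y; xi, 1)]."""
    y = rng.normal(loc=xi, scale=1.0, size=n)
    score = y - xi            # d/dxi log N(y; xi, 1)
    return f(y) * score       # per-sample gradient estimates

def reparam_grad(df, xi, n, rng):
    """Reparameterization: y = xi + eps with eps ~ N(0,1), so d/dxi f(y) = f'(xi + eps)."""
    eps = rng.normal(size=n)
    return df(xi + eps)       # per-sample gradient estimates

rng = np.random.default_rng(0)
xi = 1.5
f = lambda y: y ** 2          # toy objective (assumption, not from the paper)
df = lambda y: 2.0 * y        # its derivative, needed for the pathwise estimator

sf = score_function_grad(f, xi, 200_000, rng)
rp = reparam_grad(df, xi, 200_000, rng)
true_grad = 2.0 * xi

print(f"true gradient      : {true_grad:.3f}")
print(f"score-function mean: {sf.mean():.3f}  (per-sample var {sf.var():.1f})")
print(f"reparam mean       : {rp.mean():.3f}  (per-sample var {rp.var():.1f})")
```

Both estimators are unbiased here, but the reparameterized one has much lower variance on this toy problem. It requires $y$ to be a differentiable function of $ξ$ and noise, which is why the score-function form (optionally with Rao–Blackwellization) is the fallback for discrete $y$.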

Two-stage vs one-stage (the headline comparison)

On a 400-dimensional regression design, the gradient methods (BA/ACE/PCE) achieve roughly double the final EIG of two-stage baselines (Bayesian optimization / random search + VNMC). On biomolecular docking (100-dim) they beat human experts. The advantage grows with dimension.

Examples

Five experiments (Foster 2020 §4)

Death process (2-D epidemiology, EIG surface known): gradient methods beat BO even in low dimension.
Regression (400-D): roughly 2× the EIG of BO/random search.
Advertising (ablation over dimension $D$, analytic EIG): gradient methods dominate as $D$ grows.
Biomolecular docking (100-D, Lyu et al. 2019): ACE beats expert designs.
CES (6-D iterated behavioural economics): ACE/PCE reduce posterior entropy faster than the Foster 2019 marginal+BO baseline.
See High-Dimensional Design Applications.

Connections

Builds on Foster 2019: $I_{BA}$ is the one-stage $\hat{μ}_{post}$; the VNMC upper bound is reused to verify designs (trap the EIG between ACE-lower and VNMC-upper).
Generalized / contextualized by Rainforth 2023, which presents this unified SGA scheme (their Eq. 15) as the turning point that made gradient-based EIG optimization consistent.
Connects to InfoNCE / contrastive representation learning (PCE ≈ InfoNCE with $θ, y$ as the two views).
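
The idea of trapping the EIG between a lower and an upper bound can be sketched on a toy model where the answer is known. The example below assumes a linear-Gaussian model, $θ \sim N(0, 1)$ and $y ∣ θ, ξ \sim N(ξθ, σ^2)$, whose EIG is $\tfrac{1}{2}\log(1 + ξ^2/σ^2)$. It uses PCE as the lower bound and plain nested Monte Carlo with prior samples as the upper bound, which is what VNMC reduces to when its proposal is the prior; a learned proposal would tighten both sides. All function names are hypothetical.

```python
import numpy as np

def log_lik(y, theta, xi, sigma):
    """log N(y; xi * theta, sigma^2), broadcasting over arrays."""
    return -0.5 * ((y - xi * theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def logmeanexp(a, axis):
    """Numerically stable log of the mean of exp(a) along an axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).mean(axis=axis))

def eig_sandwich(xi, sigma=1.0, n_outer=5000, n_contrast=50, seed=0):
    """Return (PCE lower estimate, NMC upper estimate) of the EIG at design xi."""
    rng = np.random.default_rng(seed)
    theta0 = rng.normal(size=n_outer)                    # theta_0 ~ p(theta)
    y = xi * theta0 + sigma * rng.normal(size=n_outer)   # y ~ p(y | theta_0, xi)
    theta_c = rng.normal(size=(n_outer, n_contrast))     # theta_{1:L} ~ p(theta)
    ll0 = log_lik(y, theta0, xi, sigma)                  # shape (n_outer,)
    ll_c = log_lik(y[:, None], theta_c, xi, sigma)       # shape (n_outer, L)
    # PCE: theta_0 is included in the denominator -> lower bound in expectation
    ll_all = np.concatenate([ll0[:, None], ll_c], axis=1)
    lower = np.mean(ll0 - logmeanexp(ll_all, axis=1))
    # NMC: only fresh prior samples in the denominator -> upper bound in expectation
    upper = np.mean(ll0 - logmeanexp(ll_c, axis=1))
    return lower, upper

xi, sigma = 2.0, 1.0
lower, upper = eig_sandwich(xi, sigma)
exact = 0.5 * np.log(1.0 + xi ** 2 / sigma ** 2)  # closed form for this toy model
print(f"PCE lower={lower:.3f}  exact={exact:.3f}  NMC upper={upper:.3f}")
```

Both estimates are noisy, so the ordering holds in expectation rather than for every random seed; increasing `n_contrast` narrows the gap between the two.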
