Variational Posterior Estimator (Barber-Agakov)
Summary
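The variational posterior estimator learns an amortized approximation $q_p(\theta \mid y)$ to the true posterior $p(\theta \mid y, d)$ and substitutes it into the EIG. It yields a lower bound $\mathcal{L}_{\text{post}}(d) \le \text{EIG}(d)$, which is tight iff $q_p(\theta \mid y)$ equals the true posterior. This is the Barber–Agakov (BA) mutual-information bound, which Foster et al. (2019) connect to experimental design for the first time. Best when $\theta$ is lower-dimensional than $y$.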
Overview
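The posterior form of the EIG (Expected Information Gain), $\text{EIG}(d) = \mathbb{E}_{p(\theta)p(y\mid\theta,d)}\big[\log p(\theta \mid y, d) - \log p(\theta)\big]$, needs the intractable posterior $p(\theta \mid y, d)$. Rather than run inference separately for every outcome $y$ (as NMC effectively does), we amortize the inner expectation: learn one parametric family $q_p(\theta \mid y; \phi)$ that maps any outcome to an approximate posterior, then reuse it everywhere.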
Main Content
Definition: Variational posterior estimator (Foster 2019, Eqs. 6–7)
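Define the lower bound and its Monte Carlo estimator:
$$\mathcal{L}_{\text{post}}(d) \triangleq \mathbb{E}_{p(\theta)p(y\mid\theta,d)}\big[\log q_p(\theta \mid y) - \log p(\theta)\big]$$
$$\hat{\mu}_{\text{post}}(d) \triangleq \frac{1}{N}\sum_{n=1}^{N} \log \frac{q_p(\theta_n \mid y_n)}{p(\theta_n)}$$
with $\theta_n, y_n \overset{\text{i.i.d.}}{\sim} p(\theta)\,p(y \mid \theta, d)$ (sample $\theta_n \sim p(\theta)$, then $y_n \sim p(y \mid \theta_n, d)$).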
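A minimal sketch of $\hat{\mu}_{\text{post}}$, assuming a toy linear-Gaussian model chosen here for illustration ($\theta \sim \mathcal{N}(0,1)$, $y \mid \theta, d \sim \mathcal{N}(d\theta, 1)$), where the true EIG is $\tfrac12\log(1+d^2)$ and the posterior is Gaussian. The family $q_p(\theta \mid y) = \mathcal{N}(ay+b, s^2)$ and the function names `mu_post` and `exact_posterior_params` are hypothetical. With $q_p$ set to the exact posterior the estimate matches the true EIG; shifting its mean makes it strictly smaller.

```python
import numpy as np

def normal_logpdf(x, mean, std):
    """Log density of N(mean, std^2) evaluated elementwise at x."""
    return -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((x - mean) / std) ** 2

def mu_post(d, a, b, s, n=100_000, prior_std=1.0, noise_std=1.0, seed=0):
    """Monte Carlo estimate of L_post(d) for the toy model y = d*theta + noise,
    using the hypothetical Gaussian family q_p(theta | y) = N(a*y + b, s^2)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, prior_std, size=n)        # theta_n ~ p(theta)
    y = rng.normal(d * theta, noise_std)               # y_n ~ p(y | theta_n, d)
    log_q = normal_logpdf(theta, a * y + b, s)         # log q_p(theta_n | y_n)
    log_prior = normal_logpdf(theta, 0.0, prior_std)   # log p(theta_n)
    return np.mean(log_q - log_prior)

def exact_posterior_params(d, prior_std=1.0, noise_std=1.0):
    """(a, b, s) at which q_p coincides with the true Gaussian posterior."""
    var = 1.0 / (1.0 / prior_std**2 + d**2 / noise_std**2)
    return var * d / noise_std**2, 0.0, np.sqrt(var)

d = 2.0
true_eig = 0.5 * np.log(1 + d**2)        # analytic EIG when prior_std = noise_std = 1
a, b, s = exact_posterior_params(d)
print(true_eig, mu_post(d, a, b, s))     # approximately equal: the bound is tight
print(mu_post(d, a, b + 0.3, s))         # smaller: q_p's mean is mis-specified
```

Using the same seed for both calls keeps the comparison between the tight and the mis-specified $q_p$ free of extra sampling noise.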
Lower-bound property and tightness (Foster 2019, Appendix A)
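$\mathcal{L}_{\text{post}}(d)$ is a lower bound on the EIG, $\mathcal{L}_{\text{post}}(d) \le \text{EIG}(d)$, with equality iff $q_p(\theta \mid y) = p(\theta \mid y, d)$ for $p(y \mid d)$-almost all $y$. The gap is the expected KL divergence from the true posterior to the approximation:
$$\text{EIG}(d) - \mathcal{L}_{\text{post}}(d) = \mathbb{E}_{p(y\mid d)}\Big[\mathrm{KL}\big(p(\theta \mid y, d)\,\big\|\,q_p(\theta \mid y)\big)\Big]$$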
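The gap identity follows from the factorization $p(\theta)\,p(y \mid \theta, d) = p(y \mid d)\,p(\theta \mid y, d)$. A short derivation sketch (standard algebra, not a transcription of Foster's Appendix A):

```latex
\begin{aligned}
\text{EIG}(d) - \mathcal{L}_{\text{post}}(d)
&= \mathbb{E}_{p(\theta)p(y\mid\theta,d)}\big[\log p(\theta\mid y,d) - \log p(\theta)\big]
 - \mathbb{E}_{p(\theta)p(y\mid\theta,d)}\big[\log q_p(\theta\mid y) - \log p(\theta)\big] \\
&= \mathbb{E}_{p(y\mid d)}\,\mathbb{E}_{p(\theta\mid y,d)}\left[\log \frac{p(\theta\mid y,d)}{q_p(\theta\mid y)}\right] \\
&= \mathbb{E}_{p(y\mid d)}\big[\mathrm{KL}\big(p(\theta\mid y,d)\,\|\,q_p(\theta\mid y)\big)\big] \;\ge\; 0 .
\end{aligned}
```

The $\log p(\theta)$ terms cancel, and non-negativity of the KL divergence gives both the bound and the equality condition.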
This is the Barber–Agakov (2003) bound, originally for mutual information over noisy channels; the connection to experiment design had not previously been made.
Training the variational parameters
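Optimize $\phi$ by maximizing the bound. No reparameterization is needed because the sampling distribution $p(\theta)\,p(y \mid \theta, d)$ does not depend on $\phi$ (Foster 2019, Eq. 8):
$$\nabla_\phi \mathcal{L}_{\text{post}}(d; \phi) = \mathbb{E}_{p(\theta)p(y\mid\theta,d)}\big[\nabla_\phi \log q_p(\theta \mid y; \phi)\big]$$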
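Because $\text{EIG}(d)$ does not depend on $\phi$, the gap identity implies that maximizing $\mathcal{L}_{\text{post}}$ over $\phi$ is equivalent to minimizing the expected forward KL, $\mathbb{E}_{p(y\mid d)}\big[\mathrm{KL}\big(p(\theta \mid y, d)\,\|\,q_p(\theta \mid y; \phi)\big)\big]$. This fits a mass-covering amortized approximation of the true posterior; for an exponential-family $q_p$ it amounts to moment matching.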
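A minimal training sketch under the same assumed toy model ($\theta \sim \mathcal{N}(0,1)$, $y \mid \theta, d \sim \mathcal{N}(d\theta, 1)$) and hypothetical family $q_p(\theta \mid y) = \mathcal{N}(ay+b, s^2)$ with $\phi = (a, b, \log s)$. Each step draws fresh samples from $p(\theta)\,p(y \mid \theta, d)$ and ascends the average score $\nabla_\phi \log q_p$, written out by hand. The function name `train_q_post` and the learning-rate and batch settings are illustrative choices, not values from the source.

```python
import numpy as np

def train_q_post(d, steps=3000, batch=1000, lr=0.02, seed=0):
    """Stochastic gradient ascent on L_post(d) w.r.t. phi = (a, b, log_s) for the
    hypothetical family q_p(theta | y) = N(a*y + b, s^2) in the toy model
    theta ~ N(0, 1), y = d*theta + N(0, 1). The samples do not depend on phi,
    so the gradient is just the average score of log q_p."""
    rng = np.random.default_rng(seed)
    a, b, log_s = 0.0, 0.0, 0.0
    for _ in range(steps):
        theta = rng.normal(size=batch)           # theta ~ p(theta)
        y = rng.normal(d * theta, 1.0)           # y ~ p(y | theta, d)
        s2 = np.exp(2 * log_s)
        r = theta - (a * y + b)                  # residual w.r.t. q_p's mean
        # Gradients of log q_p(theta | y); log p(theta) has zero gradient in phi.
        grad_a = np.mean(r * y) / s2
        grad_b = np.mean(r) / s2
        grad_log_s = np.mean(r**2) / s2 - 1.0
        a += lr * grad_a
        b += lr * grad_b
        log_s += lr * grad_log_s
    return a, b, np.exp(log_s)

a, b, s = train_q_post(d=2.0)
# Exact posterior for d = 2: mean 0.4 * y, variance 0.2 (std about 0.447).
print(a, b, s)
```

The learned parameters should approach the exact posterior, which is what minimizing the expected forward KL predicts when the true posterior lies inside the family.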
When to use it
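Because $q_p(\theta \mid y)$ amortizes a distribution over $\theta$, it is preferable when $\dim(\theta) < \dim(y)$: a conditional density over $\theta$ is then a simpler density-estimation target than the marginal $q_m(y \mid d)$ over $y$. Empirically (Foster 2019, Fig. 1) it substantially outperforms the marginal estimator on the A/B-test benchmark, plausibly because $\theta$ is lower-dimensional than $y$ there.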
Examples
Sequential BOED form (Foster 2019, Eq. 14)
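In iterated design the prior $p(\theta)$ becomes the running posterior $p(\theta \mid d_{1:t-1}, y_{1:t-1})$. Substituting it and dropping the design-independent constant (the entropy of the running posterior) gives a bound that is usable for choosing $d_t$ up to that constant. Note, however, that evaluating the full bound requires the density of the running posterior through its $\log p(\theta \mid d_{1:t-1}, y_{1:t-1})$ term (a limitation the marginal/implicit estimators avoid).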
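To make the dropped constant explicit, write $h_{t-1} = (d_{1:t-1}, y_{1:t-1})$ and $\mathcal{L}^{(t)}_{\text{post}}$ for the step-$t$ bound (both shorthands introduced here). The $-\log p(\theta \mid h_{t-1})$ term involves neither $y$ nor $d$, so its expectation reduces to the entropy of the running posterior. A derivation sketch:

```latex
\begin{aligned}
\mathcal{L}^{(t)}_{\text{post}}(d)
&= \mathbb{E}_{p(\theta\mid h_{t-1})\,p(y\mid\theta,d)}\big[\log q_p(\theta\mid y) - \log p(\theta\mid h_{t-1})\big] \\
&= \mathbb{E}_{p(\theta\mid h_{t-1})\,p(y\mid\theta,d)}\big[\log q_p(\theta\mid y)\big]
 + \mathrm{H}\big[p(\theta\mid h_{t-1})\big].
\end{aligned}
```

Only the first term depends on $d$, and it needs just samples $\theta$ from the running posterior, simulated outcomes $y$, and evaluations of $q_p$. The entropy term is where the running posterior's density enters.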
Connections
- Predecessor of the one-stage ACE bound: the Barber–Agakov objective in Foster 2020 is exactly this bound, now optimized jointly over the design and the variational parameters; ACE then improves on it by adding contrastive samples.
- Lower bound: pairs with the upper bounds $\mathcal{U}_{\text{marg}}$ (marginal) and $\mathcal{U}_{\text{VNMC}}$ (variational NMC) to sandwich the EIG.
- Same KL-gap structure appears in the ELBO / variational inference generally.
See Also
- Variational Marginal Estimator — the dual (upper) bound targeting the marginal $p(y \mid d)$
- Adaptive Contrastive Estimation (ACE) — the joint-optimization successor
- Convergence Rates and Estimator Selection — rate and estimator choice