Variational Posterior Estimator (Barber-Agakov)

Summary

The variational posterior estimator learns an amortized approximation $q_p(\theta \mid y, d)$ to the true posterior $p(\theta \mid y, d)$ and substitutes it into the EIG. It yields a lower bound $\mathcal{L}_{post}(d) \le \mathrm{EIG}(d)$, tight iff $q_p(\theta \mid y, d)$ equals the true posterior. This is the Barber–Agakov (BA) mutual-information bound, which Foster 2019 connects to experimental design for the first time. Best when $\dim(\theta) \ll \dim(y)$.

Overview

The posterior form of the EIG (Expected Information Gain) needs the intractable posterior $p(\theta \mid y, d)$. Rather than run inference separately for every outcome $y$ (as NMC effectively does), we amortize the inner expectation: learn one parametric family $q_p(\theta \mid y, d)$ that maps any outcome $y$ to an approximate posterior, then reuse it everywhere.

Main Content

Definition: Variational posterior estimator (Foster 2019, Eqs. 6–7)

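Define the lower bound and its Monte Carlo estimator:

$$\mathcal{L}_{post}(d) \triangleq \mathbb{E}_{p(y,\theta \mid d)}\left[\log \frac{q_p(\theta \mid y, d)}{p(\theta)}\right], \qquad \hat{\mu}_{post}(d) \triangleq \frac{1}{N}\sum_{n=1}^{N} \log \frac{q_p(\theta_n \mid y_n, d)}{p(\theta_n)},$$

with $(y_n, \theta_n) \sim p(y, \theta \mid d)$ i.i.d. (sample $\theta_n \sim p(\theta)$, then $y_n \sim p(y \mid \theta_n, d)$).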
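
The estimator can be sketched on a toy problem where everything is known in closed form. The model below (scalar $\theta \sim N(0,1)$, $y \mid \theta, d \sim N(d\theta, \sigma^2)$) and the linear-Gaussian family $q_p(\theta \mid y, d) = N(wy + b, s^2)$ are illustrative assumptions, not taken from Foster 2019, and the function names are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def mu_post_hat(d, q_params, n_samples, rng, sigma=1.0):
    """Monte Carlo estimate of the posterior lower bound for the toy model
    theta ~ N(0, 1), y | theta, d ~ N(d * theta, sigma^2),
    using q_p(theta | y, d) = N(w * y + b, s^2) with q_params = (w, b, s)."""
    w, b, s = q_params
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(0.0, 1.0)        # theta_n ~ p(theta)
        y = rng.gauss(d * theta, sigma)    # y_n ~ p(y | theta_n, d)
        log_q = log_normal_pdf(theta, w * y + b, s)
        log_prior = log_normal_pdf(theta, 0.0, 1.0)
        total += log_q - log_prior
    return total / n_samples


def exact_posterior_params(d, sigma=1.0):
    """(w, b, s) of the true Gaussian posterior of the toy model."""
    s2 = 1.0 / (1.0 + d * d / (sigma * sigma))
    return (s2 * d / (sigma * sigma), 0.0, math.sqrt(s2))


def exact_eig(d, sigma=1.0):
    """Closed-form EIG of the toy model (Gaussian channel mutual information)."""
    return 0.5 * math.log(1.0 + d * d / (sigma * sigma))


rng = random.Random(0)
d = 2.0
print("exact EIG       :", exact_eig(d))
print("q_p = posterior :", mu_post_hat(d, exact_posterior_params(d), 20000, rng))
print("q_p = prior     :", mu_post_hat(d, (0.0, 0.0, 1.0), 20000, rng))
```

With $q_p$ set to the true posterior the estimate should agree with the closed-form EIG up to Monte Carlo error. With $q_p$ set to the prior, every term in the sum is zero, so the bound is trivially loose.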

Lower-bound property and tightness (Foster 2019, Appendix A)

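$\mathcal{L}_{post}(d)$ is a lower bound on the EIG, $\mathcal{L}_{post}(d) \le \mathrm{EIG}(d)$, with equality iff $q_p(\theta \mid y, d) = p(\theta \mid y, d)$ for almost all $y, \theta$. The gap is the expected KL divergence from the true posterior to the approximation:

$$\mathrm{EIG}(d) - \mathcal{L}_{post}(d) = \mathbb{E}_{p(y \mid d)}\left[\mathrm{KL}\left(p(\theta \mid y, d) \,\|\, q_p(\theta \mid y, d)\right)\right] \ge 0.$$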

This is the Barber–Agakov (2003) bound, originally for mutual information over noisy channels; the connection to experiment design had not previously been made.
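
The gap identity follows in two lines. The sketch below writes $q_p(\theta \mid y, d)$ for the variational posterior and $\mathcal{L}_{post}(d)$ for the bound, starts from the posterior form of the EIG, and subtracts the bound. It is a reconstruction from the definitions above rather than a transcription of Appendix A:

$$\begin{aligned}
\mathrm{EIG}(d) - \mathcal{L}_{post}(d)
&= \mathbb{E}_{p(y,\theta \mid d)}\left[\log \frac{p(\theta \mid y, d)}{p(\theta)}\right] - \mathbb{E}_{p(y,\theta \mid d)}\left[\log \frac{q_p(\theta \mid y, d)}{p(\theta)}\right] \\
&= \mathbb{E}_{p(y \mid d)}\,\mathbb{E}_{p(\theta \mid y, d)}\left[\log \frac{p(\theta \mid y, d)}{q_p(\theta \mid y, d)}\right] \\
&= \mathbb{E}_{p(y \mid d)}\left[\mathrm{KL}\left(p(\theta \mid y, d) \,\|\, q_p(\theta \mid y, d)\right)\right] \ge 0.
\end{aligned}$$

The prior terms cancel, and non-negativity of the KL divergence gives both the bound and the tightness condition.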

Training the variational parameters

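Optimize the variational parameters $\phi$ by maximizing the bound. No reparameterization is needed because the sampling distribution $p(y, \theta \mid d)$ does not depend on $\phi$ (Foster 2019, Eq. 8):

$$\phi^* = \arg\max_{\phi}\, \mathcal{L}_{post}(d, \phi), \qquad \mathcal{L}_{post}(d, \phi) = \mathbb{E}_{p(y,\theta \mid d)}\left[\log q_p(\theta \mid y, d, \phi)\right] + H[p(\theta)].$$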

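Maximizing the bound $\mathcal{L}_{post}(d, \phi)$ is equivalent to minimizing the expected forward KL $\mathbb{E}_{p(y \mid d)}\left[\mathrm{KL}\left(p(\theta \mid y, d) \,\|\, q_p(\theta \mid y, d, \phi)\right)\right]$, i.e. learning an amortized proposal that, for an exponential-family $q_p$, moment-matches the true posterior.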
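
A minimal training sketch, under the same illustrative assumptions as before (toy model $\theta \sim N(0,1)$, $y \mid \theta, d \sim N(d\theta, \sigma^2)$, and a hypothetical family $q_p(\theta \mid y, d) = N(wy + b, e^{2\rho})$): because the samples come from the model and not from $q_p$, the gradient of the bound is just the average score $\nabla_\phi \log q_p$ over simulated pairs. Plain stochastic gradient ascent is used here for brevity; the step size, batch size, and parameterization are arbitrary choices.

```python
import math
import random


def train_posterior(d, steps=3000, batch=64, lr=0.01, sigma=1.0, seed=0):
    """Fit q_p(theta | y, d) = N(w * y + b, exp(2 * rho)) by stochastic
    gradient ascent on E[log q_p(theta | y, d)] for the toy model
    theta ~ N(0, 1), y | theta, d ~ N(d * theta, sigma^2).
    Returns (w, b, std)."""
    rng = random.Random(seed)
    w, b, rho = 0.0, 0.0, 0.0  # rho is the log standard deviation of q_p
    for _ in range(steps):
        var = math.exp(2.0 * rho)
        gw = gb = grho = 0.0
        for _ in range(batch):
            # Simulate from the model; no dependence on (w, b, rho),
            # so no reparameterization trick is required.
            theta = rng.gauss(0.0, 1.0)
            y = rng.gauss(d * theta, sigma)
            r = theta - (w * y + b)      # residual of the posterior-mean fit
            gw += r * y / var            # d log q / d w
            gb += r / var                # d log q / d b
            grho += r * r / var - 1.0    # d log q / d rho
        w += lr * gw / batch
        b += lr * gb / batch
        rho += lr * grho / batch
    return w, b, math.exp(rho)


w, b, std = train_posterior(2.0)
print("learned (w, b, std):", (w, b, std))
print("exact   (w, b, std):", (0.4, 0.0, math.sqrt(0.2)))
```

For $d = 2$, $\sigma = 1$ the true posterior is $N(0.4\,y,\ 0.2)$, so the learned parameters should settle near those values. In this toy case the variational family contains the true posterior and the bound becomes tight.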

When to use it

Because $q_p$ amortizes a distribution over $\theta$, it is preferable when $\dim(\theta) \ll \dim(y)$ (a simpler density-estimation target than $p(y \mid d)$). Empirically (Foster 2019, Fig. 1) it substantially outperforms the marginal estimator on the A/B-test benchmark, plausibly because $\theta$ is lower-dimensional than $y$ there.

Examples

Sequential BOED form (Foster 2019, Eq. 14)

In iterated design the prior $p(\theta)$ becomes the running posterior $p(\theta \mid d_{1:t-1}, y_{1:t-1})$. Substituting it into the bound and dropping the design-independent constant gives a usable bound. Note, however, that evaluating it requires the density of the running posterior (a limitation the marginal/implicit estimators avoid).
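
The substitution can be written out as follows. This is a sketch derived from Bayes' rule, assuming the observations are conditionally independent given $\theta$, and it may differ in notation from Eq. 14. The shorthand $h_{t-1} = (d_{1:t-1}, y_{1:t-1})$ is introduced here for the history. The running posterior is

$$p(\theta \mid h_{t-1}) = \frac{p(\theta)\prod_{i=1}^{t-1} p(y_i \mid \theta, d_i)}{p(y_{1:t-1} \mid d_{1:t-1})},$$

so replacing $p(\theta)$ by $p(\theta \mid h_{t-1})$ in the bound gives

$$\mathcal{L}_{post}(d_t) = \mathbb{E}_{p(\theta \mid h_{t-1})\,p(y_t \mid \theta, d_t)}\left[\log \frac{q_p(\theta \mid y_t, d_t)}{p(\theta)\prod_{i=1}^{t-1} p(y_i \mid \theta, d_i)}\right] + \log p(y_{1:t-1} \mid d_{1:t-1}).$$

The final term does not depend on $d_t$ and can be dropped when comparing designs. The remaining expectation still needs the unnormalized running-posterior density $p(\theta)\prod_{i=1}^{t-1} p(y_i \mid \theta, d_i)$ to be evaluated pointwise.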

Connections

  • Is the one-stage ACE bound’s predecessor: Foster 2020’s BA bound is exactly this bound, now optimized jointly over design and variational parameters; ACE then improves on it by adding contrastive samples.
  • Lower bound: pairs with the upper bounds $\mathcal{U}_{marg}$ / $\mathcal{U}_{VNMC}$ to sandwich the EIG.
  • Same KL-gap structure appears in the ELBO / variational inference generally.

See Also