Variational Posterior Estimator (Barber-Agakov)

Summary

The variational posterior estimator $\hat{μ}_{post}$ learns an amortized approximation $q_{p} (θ ∣ y, d)$ to the true posterior and substitutes it into the expected information gain (EIG). It yields a lower bound $L_{post} (d) \leq EIG (d)$, tight iff $q_{p}$ equals the true posterior. This is the Barber–Agakov (BA) mutual-information bound, here connected to experimental design for the first time. It is best suited to problems where $\dim (θ) ≪ \dim (y)$.

Overview

The posterior form of the expected information gain (EIG) needs the intractable posterior $p (θ ∣ y, d)$. Rather than run inference separately for every outcome $y$ (as nested Monte Carlo, NMC, effectively does), we amortize the inner expectation: learn one parametric family $q_{p} (θ ∣ y, d, ϕ)$ that maps any outcome $y$ to an approximate posterior, then reuse it everywhere.

Main Content

Definition: Variational posterior estimator (Foster 2019, Eqs. 6–7)

Define the lower bound and its Monte Carlo estimator:
$EIG (d) \geq L_{post} (d) := E_{p (y, θ ∣ d)} [\log \frac{q_{p} (θ ∣ y, d)}{p (θ)}] \approx \hat{μ}_{post} (d) := \frac{1}{N} \sum_{n = 1}^{N} \log \frac{q_{p} (θ_{n} ∣ y_{n}, d)}{p (θ_{n})},$
with $(y_{n}, θ_{n}) \overset{i.i.d.}{\sim} p (y, θ ∣ d)$ (sample $θ \sim p (θ)$, then $y \sim p (y ∣ θ, d)$).
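
As a rough illustration of the estimator above, the following Python sketch evaluates $\hat{μ}_{post}$ on a toy conjugate model chosen here for convenience (it is not taken from Foster 2019): $θ \sim N (0, 1)$ and $y ∣ θ, d \sim N (d θ, σ^{2})$. In that model the exact posterior is Gaussian and $EIG (d) = \frac{1}{2} \log (1 + d^{2} / σ^{2})$, so plugging in the exact posterior as $q_{p}$ should recover the EIG up to Monte Carlo error, while plugging in the prior gives a bound of exactly zero. All function names are hypothetical.

```python
import math
import random

SIGMA = 1.0  # observation noise std (assumed toy value)


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def posterior_estimator(d, log_q, n_samples=20000, seed=0):
    """Monte Carlo average of log q(theta | y, d) - log p(theta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(0.0, 1.0)      # theta ~ p(theta) = N(0, 1)
        y = rng.gauss(d * theta, SIGMA)  # y ~ p(y | theta, d)
        total += log_q(theta, y, d) - log_normal_pdf(theta, 0.0, 1.0)
    return total / n_samples


def exact_posterior_log_q(theta, y, d):
    """Exact conjugate posterior of the toy model (tight bound)."""
    precision = d ** 2 + SIGMA ** 2
    return log_normal_pdf(theta, d * y / precision, SIGMA ** 2 / precision)


def prior_log_q(theta, y, d):
    """Deliberately poor q that ignores y (bound collapses to zero)."""
    return log_normal_pdf(theta, 0.0, 1.0)


d = 2.0
exact_eig = 0.5 * math.log(1.0 + d ** 2 / SIGMA ** 2)
tight = posterior_estimator(d, exact_posterior_log_q)
loose = posterior_estimator(d, prior_log_q)
print(f"exact EIG: {exact_eig:.3f}  tight q: {tight:.3f}  prior as q: {loose:.3f}")
```

Any other choice of $q_{p}$ lands, in expectation, between these two values, which is one way to sanity-check a learned approximation on a model with a known answer.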

Lower-bound property and tightness (Foster 2019, Appendix A)

$L_{post} (d)$ is a lower bound on the EIG, $EIG (d) \geq L_{post} (d)$ , with equality iff $q_{p} (θ ∣ y, d) = p (θ ∣ y, d)$ for almost all $y$ . The gap is the expected KL divergence from the true posterior to the approximation:
$EIG (d) - L_{post} (d) = E_{p (y ∣ d)} [KL (p (θ ∣ y, d) ∥ q_{p} (θ ∣ y, d))] \geq 0.$
This is the Barber–Agakov (2003) bound, originally for mutual information over noisy channels; the connection to experiment design had not previously been made.
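
The gap identity can be checked in one line by writing the EIG in its posterior form and subtracting the bound; the prior terms cancel and the remaining log-ratio is a KL divergence averaged over outcomes (a sketch of the standard argument, not a transcription of Appendix A):
$EIG (d) - L_{post} (d) = E_{p (y, θ ∣ d)} [\log \frac{p (θ ∣ y, d)}{p (θ)} - \log \frac{q_{p} (θ ∣ y, d)}{p (θ)}] = E_{p (y ∣ d)} [E_{p (θ ∣ y, d)} [\log \frac{p (θ ∣ y, d)}{q_{p} (θ ∣ y, d)}]] = E_{p (y ∣ d)} [KL (p (θ ∣ y, d) ∥ q_{p} (θ ∣ y, d))].$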

Training the variational parameters

Optimize $ϕ$ by maximizing the bound — no reparameterization needed because $p (y, θ ∣ d)$ does not depend on $ϕ$ (Foster 2019, Eq. 8):

$ϕ^{*} = \arg\max_{ϕ} E_{p (y, θ ∣ d)} [\log \frac{q_{p} (θ ∣ y, d, ϕ)}{p (θ)}], \qquad \nabla_{ϕ} L_{post} \approx \frac{1}{S} \sum_{i = 1}^{S} \nabla_{ϕ} \log q_{p} (θ_{i} ∣ y_{i}, d, ϕ).$

Maximizing $L_{post}$ is equivalent to minimizing the expected forward KL $E_{p (y ∣ d)} [KL (p (θ ∣ y, d) ∥ q_{p})]$, because the two sum to $EIG (d)$, which does not depend on $ϕ$. In other words, training learns an amortized proposal by moment-matching the true posterior.
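
To make the training objective concrete, here is a minimal Python sketch for one simple variational family, assumed here purely for illustration: $q_{p} (θ ∣ y) = N (a y + b, s^{2})$ with $ϕ = (a, b, s^{2})$ at a fixed design. For this family the maximizer of the Monte Carlo objective is available in closed form (ordinary least squares plus the residual variance), so the sketch uses that closed form in place of stochastic gradient ascent. The toy model ($θ \sim N (0, 1)$, $y ∣ θ, d \sim N (d θ, σ^{2})$) and all function names are hypothetical.

```python
import random


def simulate(d, n, sigma=1.0, seed=0):
    """Draw (theta, y) pairs from the joint p(y, theta | d) of the toy model."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        pairs.append((theta, rng.gauss(d * theta, sigma)))
    return pairs


def fit_linear_gaussian_q(pairs):
    """Maximize the average log q(theta | y) over (a, b, s2) in closed form.

    For q = N(a*y + b, s2) this is least-squares regression of theta on y,
    with s2 set to the mean squared residual.
    """
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov_ty = sum((t - mean_t) * (y - mean_y) for t, y in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    a = cov_ty / var_y
    b = mean_t - a * mean_y
    s2 = sum((t - a * y - b) ** 2 for t, y in pairs) / n
    return a, b, s2


d = 2.0
a, b, s2 = fit_linear_gaussian_q(simulate(d, 20000))
# Exact posterior for sigma = 1: mean = d*y / (d^2 + 1), variance = 1 / (d^2 + 1)
print(f"a = {a:.3f} (exact {d / (d ** 2 + 1):.3f}), b = {b:.3f}, s2 = {s2:.3f} (exact {1 / (d ** 2 + 1):.3f})")
```

Only samples from the joint are needed, never the posterior density itself. For richer families such as neural-network parameterizations no closed form exists, and the stochastic gradient above would be used instead.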

When to use it

Because $\hat{μ}_{post}$ amortizes a distribution over $θ$, it is preferable when $\dim (θ) ≪ \dim (y)$: its density-estimation target lives in $θ$-space, which is then simpler than the $y$-space target of $\hat{μ}_{marg}$. Empirically (Foster 2019, Fig. 1) it substantially outperforms the marginal estimator on the A/B-test benchmark, plausibly because $θ$ is lower-dimensional than $y$ there.

Examples

Sequential BOED form (Foster 2019, Eq. 14)

In iterated design the prior $p (θ)$ becomes the running posterior $p (θ ∣ d_{1 : t - 1}, y_{1 : t - 1})$. Substituting and dropping the design-independent constant $\log p (y_{1 : t - 1} ∣ d_{1 : t - 1})$ gives a usable bound, but note that $\hat{μ}_{post}$ requires the density of the running posterior (a limitation the marginal/implicit estimators avoid).
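
One way to see where that constant comes from, assuming past outcomes are conditionally independent given $θ$ (a reconstruction from Bayes' rule, not a transcription of Eq. 14):
$\log p (θ ∣ d_{1 : t - 1}, y_{1 : t - 1}) = \log p (θ) + \sum_{i = 1}^{t - 1} \log p (y_{i} ∣ θ, d_{i}) - \log p (y_{1 : t - 1} ∣ d_{1 : t - 1}).$
Only the first two terms depend on $θ$, so the integrand of the bound can be evaluated up to an additive constant that is the same for every candidate design $d_{t}$ and therefore leaves the ranking of designs unchanged.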

Connections

Predecessor of the one-stage ACE bound: Foster 2020’s $I_{B A}$ is exactly this bound, now optimized jointly over design and variational parameters; ACE then improves on it by adding contrastive samples.
Lower bound: pairs with the upper bounds $\hat{μ}_{marg}$ / $\hat{μ}_{VNMC}$ to sandwich the EIG.
A similar KL-gap structure appears in the ELBO / variational inference generally, although there the gap is the reverse KL, $KL (q ∥ p)$, rather than the forward KL seen here.
