Variational NMC Estimator

Summary

The variational nested Monte Carlo (VNMC) estimator $\hat{μ}_{VNMC}$ combines a learned posterior proposal $q_{v} (θ ∣ y, d)$ with importance-sampled NMC. It gives an upper bound $U_{VNMC} (d, L)$ on the EIG that is tight when $q_{v}$ is the true posterior and that converges to the EIG as the number of inner samples $L \to \infty$ . This is the key property: VNMC is the only one of the four estimators that remains asymptotically consistent even when the variational family does not contain the target. It keeps NMC’s consistency while using a learned proposal to recover much of the speed of the variational estimators.

Overview

The posterior and marginal estimators are fast but converge to a biased answer if the variational family cannot represent the target. NMC has the opposite profile: unbiased in the limit but slow. VNMC interpolates: use a learned proposal to make NMC efficient, keeping NMC’s asymptotic consistency. Think of it as NMC (Nested Estimation and Nested Monte Carlo) where the inner importance-sampling proposal $q_{v}$ is learned rather than fixed to the prior.

Main Content

Definition: VNMC bound and estimator (Foster 2019, Eqs. 10–11)

The upper bound uses one sample $y, θ_{0}$ from the model and $L$ samples $θ_{1 : L}$ from the proposal:
$EIG (d) \leq U_{VNMC} (d, L) := E \left[ \log p (y ∣ θ_{0}, d) - \log \frac{1}{L} \sum_{ℓ = 1}^{L} \frac{p (θ_{ℓ}, y ∣ d)}{q_{v} (θ_{ℓ} ∣ y, d)} \right],$
with the expectation over $y, θ_{0 : L} \sim p (y, θ_{0} ∣ d) \prod_{ℓ = 1}^{L} q_{v} (θ_{ℓ} ∣ y, d)$ . The final EIG estimator uses $M ≫ L$ inner samples after training $ϕ$ :
$\hat{μ}_{VNMC} (d) := \frac{1}{N} \sum_{n = 1}^{N} \left( \log p (y_{n} ∣ θ_{n, 0}, d) - \log \frac{1}{M} \sum_{m = 1}^{M} \frac{p (y_{n}, θ_{n, m} ∣ d)}{q_{v} (θ_{n, m} ∣ y_{n}, d, ϕ_{K})} \right) .$
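The estimator can be sketched on a toy problem where the true EIG is known. The block below is a minimal illustration, not the implementation from Foster 2019: it assumes a linear-Gaussian model $θ \sim N (0, 1)$ , $y ∣ θ, d \sim N (d θ, σ^{2})$ , and a hypothetical Gaussian proposal family $q_{v} (θ ∣ y, d) = N (a y, b^{2})$ . The names `vnmc_estimate`, `true_eig`, `exact_posterior_proposal`, `a` and `b` are all illustrative choices.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def log_mean_exp(log_values):
    """Numerically stable log of the average of exp(log_values)."""
    top = max(log_values)
    return top + math.log(sum(math.exp(v - top) for v in log_values) / len(log_values))


def vnmc_estimate(d, sigma, a, b, N, M, rng):
    """VNMC estimate for the toy model theta ~ N(0, 1), y | theta ~ N(d * theta, sigma^2).

    The proposal is the (hypothetical) Gaussian family q(theta | y) = N(a * y, b^2).
    """
    total = 0.0
    for _ in range(N):
        # Outer sample (theta_0, y) from the model.
        theta0 = rng.gauss(0.0, 1.0)
        y = rng.gauss(d * theta0, sigma)
        # Inner samples from the proposal, with log importance weights
        # log p(theta, y | d) - log q(theta | y, d).
        log_w = []
        for _ in range(M):
            theta = rng.gauss(a * y, b)
            log_joint = log_normal_pdf(theta, 0.0, 1.0) + log_normal_pdf(y, d * theta, sigma)
            log_w.append(log_joint - log_normal_pdf(theta, a * y, b))
        total += log_normal_pdf(y, d * theta0, sigma) - log_mean_exp(log_w)
    return total / N


def true_eig(d, sigma):
    """Closed-form EIG of the linear-Gaussian toy model."""
    return 0.5 * math.log(1.0 + (d / sigma) ** 2)


def exact_posterior_proposal(d, sigma):
    """(a, b) for which N(a * y, b^2) equals the true posterior of the toy model."""
    var = 1.0 / (1.0 + (d / sigma) ** 2)
    return var * d / sigma ** 2, math.sqrt(var)


rng = random.Random(0)
a, b = exact_posterior_proposal(1.0, 1.0)
print("true EIG        :", true_eig(1.0, 1.0))
print("posterior as q  :", vnmc_estimate(1.0, 1.0, a, b, N=2000, M=10, rng=rng))
print("prior as q (NMC):", vnmc_estimate(1.0, 1.0, 0.0, 1.0, N=2000, M=10, rng=rng))
```

Setting `a = 0, b = 1` makes the proposal equal to the prior, so each weight reduces to the likelihood and the function behaves like plain NMC. With the exact posterior as proposal, every weight equals $p (y ∣ d)$ and the inner average carries no bias for any `M`.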

Lemma 1 — Properties of the VNMC bound (Foster 2019)

For any model $p (θ) p (y ∣ θ, d)$ and valid $q_{v} (θ ∣ y, d)$ :

Monotone tightening: $lim_{L \to \infty} U_{VNMC} (d, L) = EIG (d)$ and $U_{VNMC} (d, L_{2}) \leq U_{VNMC} (d, L_{1})$ for $L_{2} \geq L_{1} \geq 1$ .

Exactness: $U_{VNMC} (d, L) = EIG (d) \forall L \geq 1$ iff $q_{v} (θ ∣ y, d) = p (θ ∣ y, d)$ for all $y, θ$ .

Gap as expected KL: $U_{VNMC} (d, L) - EIG (d) = E_{p (y ∣ d)} \left[ KL \left( \prod_{ℓ = 1}^{L} q_{v} (θ_{ℓ} ∣ y, d) \,\Big\|\, \frac{1}{L} \sum_{ℓ = 1}^{L} p (θ_{ℓ} ∣ y, d) \prod_{k \neq ℓ} q_{v} (θ_{k} ∣ y, d) \right) \right]$ .
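
Worked case $L = 1$ . As a sanity check on the expected-KL form of the gap, the single-sample case can be derived directly. This is a sketch that uses only the definition of $U_{VNMC}$ , the factorisation $p (θ, y ∣ d) = p (y ∣ d) p (θ ∣ y, d)$ , and the standard identity $EIG (d) = E_{p (y, θ_{0} ∣ d)} [\log p (y ∣ θ_{0}, d) - \log p (y ∣ d)]$ :
$U_{VNMC} (d, 1) = E_{p (y, θ_{0} ∣ d)} [\log p (y ∣ θ_{0}, d)] - E_{p (y ∣ d) q_{v} (θ_{1} ∣ y, d)} \left[ \log \frac{p (y ∣ d) p (θ_{1} ∣ y, d)}{q_{v} (θ_{1} ∣ y, d)} \right]$
$= E_{p (y, θ_{0} ∣ d)} [\log p (y ∣ θ_{0}, d) - \log p (y ∣ d)] + E_{p (y ∣ d)} [KL (q_{v} (θ ∣ y, d) \,\|\, p (θ ∣ y, d))]$
$= EIG (d) + E_{p (y ∣ d)} [KL (q_{v} (θ ∣ y, d) \,\|\, p (θ ∣ y, d))] .$
The KL term is non-negative and vanishes exactly when $q_{v}$ equals the true posterior, which is the exactness property restricted to $L = 1$ .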

The defining advantage: consistency without a perfect family

Property 1 (monotone tightening) means we can obtain asymptotically unbiased EIG estimates even for an imperfect $q_{v}$ simply by increasing $L$ . Training has two stages: first run $K$ steps of stochastic gradient descent on $U_{VNMC} (d, L)$ with fixed $L$ (fast, cost $K L$ ); then form the final estimator $\hat{μ}_{VNMC}$ with $M ≫ L$ inner samples (slow refinement, cost $NM$ ), which shrinks the residual bias left by an imperfect $q_{v}$ . Standard NMC is the special case where the proposal is naively the prior, $q_{v} (θ ∣ y, d) = p (θ)$ , so that each importance weight reduces to the likelihood $p (y ∣ θ, d)$ . It skips the cheap first stage and so needs a far larger budget for the same accuracy.
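
The first stage can be sketched on a toy problem. The block below is a minimal illustration under stated assumptions, not the training procedure of Foster 2019: it assumes the linear-Gaussian model $θ \sim N (0, 1)$ , $y ∣ θ, d \sim N (d θ, σ^{2})$ , a hypothetical proposal family $q_{v} (θ ∣ y, d) = N (a y, b^{2})$ , the simplest setting $L = 1$ , and hand-written reparameterised gradients in place of an autodiff library. The function name `train_proposal` and the hyperparameters `batch` and `lr` are illustrative.

```python
import math
import random


def train_proposal(d, sigma, K, batch, lr, rng):
    """Minimise U_VNMC(d, L=1) over q(theta | y) = N(a * y, b^2) by SGD.

    For L = 1 only the term -E[log p(theta_1, y | d) - log q(theta_1 | y, d)]
    depends on (a, b), so descending U is ascending that expectation.
    """
    a, log_b = 0.0, 0.0  # start at the prior N(0, 1), i.e. plain NMC
    for _ in range(K):
        b = math.exp(log_b)
        grad_a, grad_log_b = 0.0, 0.0
        for _ in range(batch):
            theta0 = rng.gauss(0.0, 1.0)
            y = rng.gauss(d * theta0, sigma)
            eps = rng.gauss(0.0, 1.0)
            theta = a * y + b * eps  # reparameterised draw from q
            # d/dtheta of log p(theta) + log p(y | theta, d)
            dlogp = -theta + d * (y - d * theta) / sigma ** 2
            grad_a += dlogp * y
            # +1 is the gradient of the Gaussian entropy term log b
            grad_log_b += dlogp * b * eps + 1.0
        a += lr * grad_a / batch
        log_b += lr * grad_log_b / batch
    return a, math.exp(log_b)


rng = random.Random(0)
a, b = train_proposal(d=1.0, sigma=1.0, K=500, batch=32, lr=0.05, rng=rng)
print("trained proposal: a =", a, " b =", b)
```

For $d = σ = 1$ the true posterior of this toy model is $N (y / 2, 1 / 2)$ , so the trained values can be compared against $a = 0.5$ and $b = \sqrt{0.5}$ . In the second stage the trained parameters would be plugged into $\hat{μ}_{VNMC}$ with $M ≫ L$ inner samples.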

Cost and rate

Total cost is $T = O (K L + NM)$ . With the NMC allocation $M \propto \sqrt{N}$ , $\hat{μ}_{VNMC}$ converges at $O ((NM)^{- 1/3})$ in its second stage. Unlike $\hat{μ}_{post}$ and $\hat{μ}_{marg}$ , it keeps improving past the variational plateau because it removes asymptotic bias (Foster 2019, Fig. 2).
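
The budget arithmetic can be made concrete with a small helper. This is a sketch under an assumption that comes from the standard NMC analysis rather than from this note: the mean squared error behaves like $C_{1} / N + C_{2} / M^{2}$ , so balancing the two terms gives $M = c \sqrt{N}$ and a root-mean-square error of order $(NM)^{- 1/3}$ . The function name `second_stage_allocation` and the constant `c` are hypothetical.

```python
import math


def second_stage_allocation(total_budget, K, L, c=1.0):
    """Split the budget left after training into N outer and M inner samples.

    Assumes M = c * sqrt(N), so N * M = c * N**1.5 should match the remaining budget.
    """
    remaining = total_budget - K * L  # first-stage cost is K * L
    if remaining <= 0:
        raise ValueError("training cost K * L exhausts the budget")
    N = max(1, round((remaining / c) ** (2.0 / 3.0)))
    M = max(1, round(c * math.sqrt(N)))
    return N, M


N, M = second_stage_allocation(total_budget=1_005_000, K=500, L=10)
print("N =", N, " M =", M)
```

In practice `c` would be tuned to the problem, since the constants $C_{1}$ and $C_{2}$ depend on the model and on how good the trained proposal is.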

Examples

VNMC pre-training (Foster 2019 §6.2, Fig. 2)

On the A/B-test design point, EIG estimates are plotted with $M = N$ ; “0 steps” of pre-training corresponds to plain NMC. Spending some budget training $q_{v}$ (125–2500 steps) gives noticeably better estimates, and increasing $N, M$ continues to improve them: VNMC does not plateau like the pure variational estimators.

Connections

Bridges NMC (consistent, slow) and the variational estimators (fast, biased). Foster 2019 notes the variational/MC interplay is not analogous to standard inference because the NMC EIG estimator is itself inherently biased.
In Foster 2020, the VNMC upper bound is paired with the ACE lower bound to bracket the true EIG when verifying high-dimensional designs.
Property 3 (gap = expected KL of a product proposal) parallels the importance-weighted autoencoder (IWAE) bound structure.
