Nested Estimation and Nested Monte Carlo

Summary

Because the EIG is a nested expectation (an outer expectation over $p(\theta, y \mid \xi)$ of a log-ratio whose inner term, $p(y \mid \xi)$ or $p(\theta \mid y, \xi)$, is itself an intractable expectation), it cannot be estimated by conventional Monte Carlo. The standard tool is the nested Monte Carlo (NMC) estimator, which is biased for any finite inner sample size $M$, costs $T = O(NM)$ for $N$ outer samples, and converges in root-mean-squared error only at $O(T^{-1/3})$ (with the optimal allocation $M \propto \sqrt{N}$), far slower than the $O(T^{-1/2})$ of ordinary MC. This slow rate is the bottleneck that variational, debiasing (MLMC), and gradient methods exist to break.

Overview

The expected information gain (EIG) is doubly intractable: both the posterior $p(\theta \mid y, \xi)$ and the marginal likelihood $p(y \mid \xi)$ are unavailable in closed form. Whichever form of the EIG we use, the integrand contains a term that is both intractable and varies between realizations of $y$, so we must estimate a fresh inner integral for every outer sample. This is the defining feature of nested estimation and the source of its poor convergence.

Main Content

Definition: Nested Monte Carlo EIG estimator (Rainforth 2023 Eq. 7 / Foster 2019 Eq. 4)

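Approximate the inner marginal $p(y \mid \xi)$ with an $M$-sample average over fresh prior draws $\theta_{n,m} \sim p(\theta)$:

$$
\hat{\mu}_{\mathrm{NMC}}(\xi) = \frac{1}{N} \sum_{n=1}^{N} \log \frac{p(y_n \mid \theta_{n,0}, \xi)}{\frac{1}{M} \sum_{m=1}^{M} p(y_n \mid \theta_{n,m}, \xi)}, \qquad \theta_{n,m} \sim p(\theta), \quad y_n \sim p(y \mid \theta_{n,0}, \xi).
$$

Total computational cost is $T = O(NM)$.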
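A minimal sketch of the estimator above, assuming a toy linear-Gaussian model that is not part of the source: $\theta \sim \mathcal{N}(0, 1)$ and $y \mid \theta, \xi \sim \mathcal{N}(\xi\theta, \sigma^2)$, for which the EIG is known in closed form as $\tfrac{1}{2}\log(1 + \xi^2/\sigma^2)$. The function names (`log_lik`, `log_mean_exp`, `nmc_eig`) are illustrative only.

```python
import numpy as np

def log_lik(y, theta, xi, sigma=1.0):
    """Log density of y | theta, xi ~ Normal(xi * theta, sigma^2) (assumed toy model)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - xi * theta) / sigma) ** 2

def log_mean_exp(a, axis):
    """Numerically stable log(mean(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.mean(np.exp(a - a_max), axis=axis))

def nmc_eig(xi, N, M, rng, sigma=1.0):
    """Nested Monte Carlo EIG estimate with N outer and M inner samples."""
    theta0 = rng.standard_normal(N)                    # theta_{n,0} ~ p(theta)
    y = xi * theta0 + sigma * rng.standard_normal(N)   # y_n ~ p(y | theta_{n,0}, xi)
    theta_in = rng.standard_normal((N, M))             # fresh inner draws theta_{n,m} ~ p(theta)
    log_num = log_lik(y, theta0, xi, sigma)            # log p(y_n | theta_{n,0}, xi)
    # log of the M-sample average that approximates p(y_n | xi)
    log_den = log_mean_exp(log_lik(y[:, None], theta_in, xi, sigma), axis=1)
    return np.mean(log_num - log_den)

rng = np.random.default_rng(0)
xi = 1.0
estimate = nmc_eig(xi, N=4000, M=64, rng=rng)
exact = 0.5 * np.log(1.0 + xi**2)   # closed form for the toy model with sigma = 1
print(f"NMC estimate: {estimate:.4f}, exact EIG: {exact:.4f}")
```

The inner array has shape `(N, M)`, which makes the $O(NM)$ cost visible: every outer sample pays for its own block of $M$ likelihood evaluations.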

Convergence rate of NMC (Rainforth et al. 2018)

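With $N$ outer and $M$ inner samples, the NMC estimator has asymptotic mean-squared error $\mathrm{MSE} \approx C_1/N + C_2/M^2$ for model-dependent constants $C_1, C_2$, and is consistent as $N, M \to \infty$ (under weak conditions). Balancing the two error terms at fixed budget $T = NM$ gives the optimal allocation $M \propto \sqrt{N}$, yielding an overall rate of

$$
\sqrt{\mathrm{MSE}} = O\!\left(T^{-1/3}\right).
$$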

Compare $O(T^{-1/2})$ for conventional (non-nested) Monte Carlo. Two undesirable properties: (i) NMC is biased for any finite $M$ (a nonlinear function, here $\log$, of unbiased inner estimates is biased), and (ii) it is expensive because cost scales as $O(NM)$.
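The balancing argument can be written out as a short worked derivation. It treats the asymptotic error model $C_1/N + C_2/M^2$ as exact and $M$ as continuous, which is a simplification:

$$
\begin{aligned}
\text{minimize over } M:\quad f(M) &= \frac{C_1}{N} + \frac{C_2}{M^2} \quad \text{subject to } NM = T \\
&= \frac{C_1 M}{T} + \frac{C_2}{M^2}, \\
f'(M) = \frac{C_1}{T} - \frac{2 C_2}{M^3} = 0
\;&\Longrightarrow\; M^{*} = \left(\frac{2 C_2\, T}{C_1}\right)^{1/3} \propto T^{1/3}, \qquad N^{*} = \frac{T}{M^{*}} \propto T^{2/3}, \\
M^{*} \propto \sqrt{N^{*}}, \qquad f(M^{*}) &\propto T^{-2/3} \;\Longrightarrow\; \sqrt{\mathrm{MSE}} = O\!\left(T^{-1/3}\right).
\end{aligned}
$$

At the optimum both error terms scale as $T^{-2/3}$, so neither the outer variance nor the inner bias dominates.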

Importance-sampled NMC

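Replacing the simple inner average with an importance-sampling estimate using a proposal $q(\theta \mid y)$ improves the constants and reduces finite-sample bias (Rainforth 2023, Eq. 8):

$$
\hat{\mu}_{\mathrm{IS}}(\xi) = \frac{1}{N} \sum_{n=1}^{N} \log \frac{p(y_n \mid \theta_{n,0}, \xi)}{\frac{1}{M} \sum_{m=1}^{M} \frac{p(y_n \mid \theta_{n,m}, \xi)\, p(\theta_{n,m})}{q(\theta_{n,m} \mid y_n)}}, \qquad \theta_{n,0} \sim p(\theta), \quad y_n \sim p(y \mid \theta_{n,0}, \xi), \quad \theta_{n,m} \sim q(\theta \mid y_n).
$$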

Learning a good amortized proposal $q_\phi(\theta \mid y)$ is precisely what the variational NMC estimator does; standard NMC is the special case $q(\theta \mid y) = p(\theta)$.
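To illustrate how much the proposal matters, here is a hedged sketch of importance-sampled NMC on an assumed toy model ($\theta \sim \mathcal{N}(0,1)$, $y \mid \theta, \xi \sim \mathcal{N}(\xi\theta, \sigma^2)$), chosen because its posterior is available in closed form. The helpers `prior_proposal` and `posterior_proposal` are hypothetical stand-ins for a learned $q_\phi$. Using the exact posterior as proposal makes every importance weight equal to $p(y \mid \xi)$, so the inner estimate has no bias even for tiny $M$.

```python
import numpy as np

def log_normal(x, mean, std):
    """Log density of Normal(mean, std^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((x - mean) / std) ** 2

def prior_proposal(y, xi, sigma):
    """q(theta | y) = p(theta): recovers standard NMC."""
    return np.zeros_like(y), 1.0

def posterior_proposal(y, xi, sigma):
    """Exact Gaussian posterior of the assumed toy model (idealized proposal)."""
    var = sigma**2 / (xi**2 + sigma**2)
    return xi * y / (xi**2 + sigma**2), np.sqrt(var)

def is_nmc_eig(xi, N, M, rng, proposal, sigma=1.0):
    """Importance-sampled NMC EIG estimate with inner draws from the proposal."""
    theta0 = rng.standard_normal(N)                    # theta_{n,0} ~ p(theta)
    y = xi * theta0 + sigma * rng.standard_normal(N)   # y_n ~ p(y | theta_{n,0}, xi)
    q_mean, q_std = proposal(y, xi, sigma)
    theta_in = q_mean[:, None] + q_std * rng.standard_normal((N, M))  # theta_{n,m} ~ q
    # log importance weights: log p(y|theta) + log p(theta) - log q(theta|y)
    log_w = (log_normal(y[:, None], xi * theta_in, sigma)
             + log_normal(theta_in, 0.0, 1.0)
             - log_normal(theta_in, q_mean[:, None], q_std))
    w_max = log_w.max(axis=1, keepdims=True)
    log_den = w_max[:, 0] + np.log(np.mean(np.exp(log_w - w_max), axis=1))
    log_num = log_normal(y, xi * theta0, sigma)
    return np.mean(log_num - log_den)

xi = 1.0
exact = 0.5 * np.log(1.0 + xi**2)   # closed form for the toy model with sigma = 1
# Same seed for both runs so the outer samples (theta_{n,0}, y_n) coincide.
est_prior = is_nmc_eig(xi, 4000, 2, np.random.default_rng(0), prior_proposal)
est_post = is_nmc_eig(xi, 4000, 2, np.random.default_rng(0), posterior_proposal)
print(f"prior proposal: {est_prior:.4f}, posterior proposal: {est_post:.4f}, exact: {exact:.4f}")
```

With only $M = 2$ inner samples the prior-proposal estimate overshoots noticeably, while the posterior-proposal estimate is limited only by the outer-sample noise. In practice the exact posterior is unavailable, which is why a proposal has to be learned.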

Two routes past the wall

The review (Rainforth 2023 §3) frames modern progress as two complementary families:

  1. Debiasing schemes (Multi-Level Monte Carlo). Goda et al. (2022) express the EIG as a telescoping sum of NMC estimators and use randomized MLMC with antithetic coupling to produce a fully unbiased, finite-variance estimator of the EIG and its gradient. With a randomization distribution over levels $w_\ell \propto 2^{-\tau \ell}$ ($1 < \tau < 2$), it recovers the standard $O(T^{-1/2})$ rate and removes the variational family’s approximation error, at higher per-sample cost. See The Computational Revolution in EIG Estimation.
  2. Functional / variational approximation. Learn an amortized approximation to the intractable density ($p(\theta \mid y, \xi)$ or $p(y \mid \xi)$) once and reuse it across outcomes, sharing information instead of re-estimating it for every $y$. This drops the cost from $O(NM)$ to $O(N + K)$, where $K$ counts the training steps used to fit the approximation, and yields $O(T^{-1/2})$ estimators (Variational BOED - Overview). A learned normalized approximation also automatically gives a variational bound on the EIG.
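As a rough illustration of the second route, the sketch below fits one amortized Gaussian approximation $q(\theta \mid y) = \mathcal{N}(a y + b, s^2)$ and reuses it for every outcome, then evaluates the posterior lower bound $\mathbb{E}[\log q(\theta \mid y)] + H[p(\theta)]$. Everything specific here is an assumption made for illustration: the toy model ($\theta \sim \mathcal{N}(0,1)$, $y \mid \theta, \xi \sim \mathcal{N}(\xi\theta, \sigma^2)$), the linear-Gaussian family, and the closed-form least-squares fit, which stands in for the stochastic-gradient training used in practice.

```python
import numpy as np

def simulate(xi, n, rng, sigma=1.0):
    """Draw n joint samples (theta, y) from the assumed toy model."""
    theta = rng.standard_normal(n)
    y = xi * theta + sigma * rng.standard_normal(n)
    return theta, y

def fit_gaussian_posterior(theta, y):
    """Maximum-likelihood fit of q(theta | y) = Normal(a*y + b, s^2) via least squares."""
    a, b = np.polyfit(y, theta, 1)          # slope, intercept
    resid = theta - (a * y + b)
    return a, b, np.sqrt(np.mean(resid**2))

def posterior_lower_bound(xi, N, K, rng, sigma=1.0):
    """Variational lower bound on the EIG using K training and N evaluation samples."""
    theta_tr, y_tr = simulate(xi, K, rng, sigma)
    a, b, s = fit_gaussian_posterior(theta_tr, y_tr)   # learned once, shared across all y
    theta, y = simulate(xi, N, rng, sigma)             # fresh samples, no inner loop
    log_q = -0.5 * np.log(2 * np.pi * s**2) - 0.5 * ((theta - (a * y + b)) / s) ** 2
    prior_entropy = 0.5 * np.log(2 * np.pi * np.e)     # entropy of Normal(0, 1)
    return np.mean(log_q) + prior_entropy

rng = np.random.default_rng(0)
xi = 1.0
bound = posterior_lower_bound(xi, N=4000, K=2000, rng=rng)
exact = 0.5 * np.log(1.0 + xi**2)   # closed form for the toy model with sigma = 1
print(f"variational bound: {bound:.4f}, exact EIG: {exact:.4f}")
```

The total cost is $N + K$ model simulations with no per-outcome inner loop. The bound is tight here only because the assumed family contains the true posterior; with a misspecified family a gap would remain however many samples are used.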

Examples

Why the $\log$ makes NMC biased

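For fixed $y$, the inner average $\hat{p}_M(y \mid \xi) = \frac{1}{M} \sum_{m=1}^{M} p(y \mid \theta_m, \xi)$ with $\theta_m \sim p(\theta)$ is an unbiased estimate of $p(y \mid \xi)$. But $\mathbb{E}[\log \hat{p}_M(y \mid \xi)] \le \log \mathbb{E}[\hat{p}_M(y \mid \xi)] = \log p(y \mid \xi)$ by Jensen’s inequality (the logarithm is concave), so the inner estimate is negatively biased in $\log$-space. Because $\log p(y \mid \xi)$ enters the EIG with a negative sign, the overall NMC EIG is biased upward. The bias is $O(1/M)$ and vanishes only as $M \to \infty$, which is the root cause of both the slow rate and the need for $M \propto \sqrt{N}$.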
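The Jensen gap can be checked numerically. The sketch below assumes a toy model ($\theta \sim \mathcal{N}(0,1)$, $y \mid \theta, \xi \sim \mathcal{N}(\xi\theta, \sigma^2)$, so $p(y \mid \xi) = \mathcal{N}(0, \xi^2 + \sigma^2)$ exactly) and a fixed outcome $y = 1$, both chosen only for illustration. It averages $\log \hat{p}_M$ over many independent replications and compares the result with the exact $\log p(y \mid \xi)$.

```python
import numpy as np

def log_normal(x, mean, std):
    """Log density of Normal(mean, std^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((x - mean) / std) ** 2

rng = np.random.default_rng(0)
xi, sigma, y = 1.0, 1.0, 1.0                                 # assumed toy setting
log_p_y = log_normal(y, 0.0, np.sqrt(xi**2 + sigma**2))      # exact log marginal
R = 50_000                                                   # independent replications per M

biases = {}
for M in (1, 4, 16, 64):
    theta = rng.standard_normal((R, M))                      # theta_m ~ p(theta)
    # p_hat is unbiased for p(y | xi); its logarithm is not unbiased for log p(y | xi)
    p_hat = np.mean(np.exp(log_normal(y, xi * theta, sigma)), axis=1)
    biases[M] = np.mean(np.log(p_hat)) - log_p_y
    print(f"M={M:3d}  bias of log p_hat: {biases[M]:+.4f}")
```

The printed biases should all be negative and shrink toward zero as $M$ grows, roughly in proportion to $1/M$ once $M$ is moderately large.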

Connections

See Also