Nested Estimation and Nested Monte Carlo
Summary
Because the EIG is a nested expectation (an outer expectation over $p(\theta)\,p(y \mid \theta, \xi)$ of a log-ratio whose inner term, $p(y \mid \xi)$ or $p(\theta \mid y, \xi)$, is itself an intractable expectation), it cannot be estimated by conventional Monte Carlo. The standard tool is the nested Monte Carlo (NMC) estimator, which is biased for finite inner sample size $M$, costs $T = O(NM)$, and converges only at an RMSE rate of $O(T^{-1/3})$ (optimally, with $M \propto \sqrt{N}$), far slower than the $O(T^{-1/2})$ of ordinary MC. This slow rate is the bottleneck that variational, debiasing (MLMC), and gradient methods exist to break.
Overview
The intractability of the EIG (Expected Information Gain) is twofold: both the posterior $p(\theta \mid y, \xi)$ and the marginal likelihood $p(y \mid \xi)$ are unavailable in closed form. Whichever form of the EIG we use, the integrand contains a term that is both intractable and varies between realizations of $y$, so we must estimate a fresh inner integral for every outer sample. This is the defining feature of nested estimation and the source of its poor convergence.
Main Content
Definition: Nested Monte Carlo EIG estimator (Rainforth 2023 Eq. 7 / Foster 2019 Eq. 4)
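Approximate the inner marginal $p(y_n \mid \xi)$ with an $M$-sample average over fresh prior draws $\theta_{n,m} \sim p(\theta)$:
$$\hat{\mu}_{\text{NMC}}(\xi) = \frac{1}{N} \sum_{n=1}^{N} \log \frac{p(y_n \mid \theta_{n,0}, \xi)}{\frac{1}{M} \sum_{m=1}^{M} p(y_n \mid \theta_{n,m}, \xi)}, \qquad \theta_{n,0} \sim p(\theta), \quad y_n \sim p(y \mid \theta_{n,0}, \xi).$$
Total computational cost is $T = O(NM)$.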
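To make the definition concrete, here is a minimal NumPy sketch of the estimator. The model is an assumption chosen only because its EIG is known in closed form: a scalar linear-Gaussian model with $\theta \sim N(0, 1)$ and $y \mid \theta, \xi \sim N(\xi\theta, \sigma^2)$, for which the EIG is $\tfrac{1}{2}\log(1 + \xi^2/\sigma^2)$. The function names `log_lik`, `nmc_eig`, and `exact_eig` are illustrative, not taken from any library.

```python
import numpy as np

def log_lik(y, theta, xi, sigma=1.0):
    """Log N(y; xi * theta, sigma^2): the toy likelihood assumed for this sketch."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - xi * theta) / sigma) ** 2

def nmc_eig(xi, N, M, rng, sigma=1.0):
    """Nested Monte Carlo EIG estimate with N outer and M inner samples."""
    theta0 = rng.standard_normal(N)                    # theta_{n,0} ~ p(theta)
    y = xi * theta0 + sigma * rng.standard_normal(N)   # y_n ~ p(y | theta_{n,0}, xi)
    theta_in = rng.standard_normal((N, M))             # fresh theta_{n,m} ~ p(theta)
    ll_in = log_lik(y[:, None], theta_in, xi, sigma)   # shape (N, M)
    # Log of the inner average, computed stably by shifting with the row maximum.
    mx = ll_in.max(axis=1, keepdims=True)
    log_marg = mx[:, 0] + np.log(np.exp(ll_in - mx).mean(axis=1))
    return np.mean(log_lik(y, theta0, xi, sigma) - log_marg)

def exact_eig(xi, sigma=1.0):
    """Closed-form EIG of the assumed linear-Gaussian toy model."""
    return 0.5 * np.log1p(xi**2 / sigma**2)

rng = np.random.default_rng(0)
estimate = nmc_eig(xi=1.0, N=4000, M=400, rng=rng)
truth = exact_eig(1.0)
print(f"NMC estimate: {estimate:.4f}, exact EIG: {truth:.4f}")
```

The `(N, M)` array of inner likelihood evaluations is where the $O(NM)$ cost shows up directly: every outer sample pays for its own $M$ inner draws.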
Convergence rate of NMC (Rainforth et al. 2018)
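The NMC estimator has asymptotic mean-squared error $\mathbb{E}\big[(\hat{\mu}_{\text{NMC}} - \text{EIG})^2\big] \approx C_1/N + C_2/M^2$ for model-dependent constants $C_1, C_2$, and is consistent as $N, M \to \infty$ (under weak conditions). Balancing the two error terms at fixed budget $T = NM$ gives the optimal allocation $M \propto \sqrt{N}$, yielding an overall rate of
$$\text{MSE} = O(T^{-2/3}), \qquad \text{i.e. } \text{RMSE} = O(T^{-1/3}).$$
Compare $O(T^{-1})$ in MSE, or $O(T^{-1/2})$ in RMSE, for conventional (non-nested) Monte Carlo. Two undesirable properties follow: (i) NMC is biased for any finite $M$ (a nonlinear function, here $\log$, of unbiased inner estimates is itself biased), and (ii) it is expensive because cost scales as $NM$.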
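The allocation and rate quoted above follow from a short constrained minimization. The worked derivation below is a sketch that treats $N$ and $M$ as continuous and takes the asymptotic form $C_1/N + C_2/M^2$ at face value, with $T = NM$ the total budget:

$$
\begin{aligned}
\text{MSE}(M) &\approx \frac{C_1}{N} + \frac{C_2}{M^2} = \frac{C_1 M}{T} + \frac{C_2}{M^2} \qquad (N = T/M), \\
\frac{d\,\text{MSE}}{dM} &= \frac{C_1}{T} - \frac{2 C_2}{M^3} = 0 \;\Longrightarrow\; M^\ast = \left(\frac{2 C_2 T}{C_1}\right)^{1/3}, \qquad N^\ast = \frac{T}{M^\ast} \propto T^{2/3}, \\
\text{MSE}(M^\ast) &= 3 \cdot 2^{-2/3}\, C_1^{2/3} C_2^{1/3}\, T^{-2/3}.
\end{aligned}
$$

Since $M^\ast \propto T^{1/3}$ and $N^\ast \propto T^{2/3}$, the optimum satisfies $M^\ast \propto \sqrt{N^\ast}$, and the RMSE decays as $T^{-1/3}$.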
Importance-sampled NMC
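Replacing the simple inner average with an importance-sampling estimate using a proposal $q(\theta \mid y)$ improves the constants and reduces finite-sample bias (Rainforth 2023, Eq. 8):
$$\hat{\mu}_{\text{IS}}(\xi) = \frac{1}{N} \sum_{n=1}^{N} \log \frac{p(y_n \mid \theta_{n,0}, \xi)}{\frac{1}{M} \sum_{m=1}^{M} \frac{p(y_n \mid \theta_{n,m}, \xi)\, p(\theta_{n,m})}{q(\theta_{n,m} \mid y_n)}}, \qquad \theta_{n,m} \sim q(\theta \mid y_n).$$
Learning a good amortized proposal is precisely what the variational NMC estimator does; standard NMC is the special case $q(\theta \mid y) = p(\theta)$.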
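The effect of the proposal can be seen in a small sketch. It assumes the same scalar linear-Gaussian toy model as a stand-in ($\theta \sim N(0,1)$, $y \mid \theta, \xi \sim N(\xi\theta, \sigma^2)$), because there the exact posterior is Gaussian and can be used as an idealized proposal. With $q$ equal to the exact posterior, every importance weight equals $p(y \mid \xi)$, so the inner estimate has zero variance even for $M = 1$. The names `log_normal` and `is_nmc_eig` are illustrative.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log density of N(mean, var) evaluated at x (broadcasts over arrays)."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def is_nmc_eig(xi, N, M, rng, sigma=1.0, use_posterior=True):
    """Importance-sampled NMC on the assumed toy model.

    use_posterior=True  -> proposal is the exact Gaussian posterior (idealized).
    use_posterior=False -> proposal is the prior, i.e. standard NMC.
    """
    theta0 = rng.standard_normal(N)
    y = xi * theta0 + sigma * rng.standard_normal(N)
    if use_posterior:
        q_mean = (xi * y / (xi**2 + sigma**2))[:, None]
        q_var = sigma**2 / (xi**2 + sigma**2)
    else:
        q_mean, q_var = np.zeros((N, 1)), 1.0
    theta_in = q_mean + np.sqrt(q_var) * rng.standard_normal((N, M))
    log_w = (log_normal(y[:, None], xi * theta_in, sigma**2)  # likelihood
             + log_normal(theta_in, 0.0, 1.0)                 # prior
             - log_normal(theta_in, q_mean, q_var))           # proposal
    mx = log_w.max(axis=1, keepdims=True)
    log_marg = mx[:, 0] + np.log(np.exp(log_w - mx).mean(axis=1))
    return np.mean(log_normal(y, xi * theta0, sigma**2) - log_marg)

# Same seed gives the same outer samples, so only the inner draws differ.
est_m1 = is_nmc_eig(1.0, N=4000, M=1, rng=np.random.default_rng(0))
est_m8 = is_nmc_eig(1.0, N=4000, M=8, rng=np.random.default_rng(0))
print(est_m1, est_m8)
```

In practice the exact posterior is unavailable, which is why a learned proposal is used instead; the closer it is to the posterior, the fewer inner samples are needed.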
Two routes past the wall
The review (Rainforth 2023 §3) frames modern progress as two complementary families:
- Debiasing schemes (Multi-Level Monte Carlo). Goda et al. (2022) express the EIG as a telescoping sum of NMC estimators and use randomized MLMC with antithetic coupling to produce a fully unbiased, finite-variance estimator of the EIG and its gradient. With a suitably chosen randomization distribution over levels, it recovers the standard $O(T^{-1/2})$ rate and avoids the approximation error that a variational family introduces, at higher per-sample cost. See The Computational Revolution in EIG Estimation.
- Functional / variational approximation. Learn an amortized approximation to the intractable density ($p(\theta \mid y, \xi)$ or $p(y \mid \xi)$) once and reuse it across outcomes, sharing information instead of re-estimating per $y$. This drops the cost from $O(NM)$ to $O(N + K)$, where $K$ is the number of optimization steps spent fitting the approximation, and yields $O(T^{-1/2})$ estimators (Variational BOED - Overview). A learned normalized approximation also automatically gives a variational bound on the EIG.
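The amortization idea in the second bullet can be illustrated with a minimal sketch of a variational posterior (Barber-Agakov style) lower bound, $\text{EIG} \ge \mathbb{E}[\log q(\theta \mid y)] + H[p(\theta)]$. Everything specific here is an assumption made for illustration: the scalar linear-Gaussian toy model ($\theta \sim N(0,1)$, $y \mid \theta, \xi \sim N(\xi\theta, \sigma^2)$), the Gaussian family $q(\theta \mid y) = N(ay + b, s^2)$, and fitting by least squares rather than by stochastic gradient steps. The function name `posterior_bound_eig` is hypothetical.

```python
import numpy as np

def posterior_bound_eig(xi, N, rng, sigma=1.0):
    """Variational posterior lower bound on the EIG for the assumed toy model."""
    theta = rng.standard_normal(2 * N)
    y = xi * theta + sigma * rng.standard_normal(2 * N)
    th_fit, y_fit = theta[:N], y[:N]      # samples used to fit q
    th_ev, y_ev = theta[N:], y[N:]        # held-out samples used to evaluate the bound

    # One shared fit of q(theta | y) = N(a*y + b, s2), reused for every outcome y.
    A = np.column_stack([y_fit, np.ones(N)])
    (a, b), *_ = np.linalg.lstsq(A, th_fit, rcond=None)
    s2 = np.mean((th_fit - (a * y_fit + b)) ** 2)

    # No inner loop: each held-out pair costs a single density evaluation.
    log_q = -0.5 * np.log(2 * np.pi * s2) - 0.5 * (th_ev - (a * y_ev + b)) ** 2 / s2
    prior_entropy = 0.5 * np.log(2 * np.pi * np.e)   # entropy of the N(0, 1) prior
    return np.mean(log_q) + prior_entropy

bound = posterior_bound_eig(1.0, N=10000, rng=np.random.default_rng(0))
print(f"variational bound: {bound:.4f}, exact EIG: {0.5 * np.log(2.0):.4f}")
```

Because the true posterior of this toy model lies inside the chosen Gaussian family, the bound is essentially tight here; with a misspecified family it would sit strictly below the EIG.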
Examples
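Why the $\log$ makes NMC biased
For fixed $y$, the inner average $\hat{p}_M(y) = \frac{1}{M} \sum_{m=1}^{M} p(y \mid \theta_m, \xi)$ is an unbiased estimate of $p(y \mid \xi)$. But $\mathbb{E}[\log \hat{p}_M(y)] \le \log \mathbb{E}[\hat{p}_M(y)] = \log p(y \mid \xi)$ by Jensen’s inequality, so the inner estimate is negatively biased in $\log$-space and the overall NMC EIG is biased upward. The bias is $O(1/M)$ and vanishes only as $M \to \infty$, which is the root cause of both the slow rate and the need to grow $M$ with $N$ ($M \propto \sqrt{N}$).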
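The Jensen gap can be checked numerically. The sketch below assumes a scalar linear-Gaussian toy model ($\theta \sim N(0,1)$, $y \mid \theta, \xi \sim N(\xi\theta, \sigma^2)$), for which the exact marginal is $N(0, \xi^2 + \sigma^2)$, and estimates $\mathbb{E}[\log \hat{p}_M(y)] - \log p(y \mid \xi)$ by repeating the inner average many times. The names `normal_pdf` and `log_marginal_bias` are illustrative.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def log_marginal_bias(y, M, reps, rng, xi=1.0, sigma=1.0):
    """Monte Carlo estimate of E[log p_hat_M(y)] - log p(y | xi)."""
    theta = rng.standard_normal((reps, M))                     # prior draws
    p_hat = normal_pdf(y, xi * theta, sigma**2).mean(axis=1)   # unbiased for p(y | xi)
    log_p = np.log(normal_pdf(y, 0.0, xi**2 + sigma**2))       # exact log marginal
    return np.mean(np.log(p_hat)) - log_p

rng = np.random.default_rng(0)
bias_small_M = log_marginal_bias(y=1.0, M=4, reps=20000, rng=rng)
bias_large_M = log_marginal_bias(y=1.0, M=64, reps=20000, rng=rng)
print(f"M=4: {bias_small_M:.4f}, M=64: {bias_large_M:.4f}")
```

Both values should come out negative, with the $M = 64$ gap much smaller in magnitude, in line with the $O(1/M)$ decay. Because the log marginal enters the EIG estimate with a minus sign, this negative bias is what pushes the NMC estimate upward.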
Connections
- Motivates every fast estimator in this topic: Variational Posterior Estimator (Barber-Agakov), Variational Marginal Estimator, Variational NMC Estimator, and the contrastive bounds Adaptive Contrastive Estimation (ACE) / Prior Contrastive Estimation (PCE).
- Shared machinery with general nested-expectation problems and with simulation-based inference more broadly (Introduction to Bayesian Computation).
- The contrastive bounds use a finite number of inner (“contrastive”) samples on purpose — turning the NMC bias into a controlled bound rather than an error to be eliminated.
See Also
- Expected Information Gain — the nested expectation being estimated
- The Computational Revolution in EIG Estimation — debiasing (MLMC) vs variational, side by side
- Variational NMC Estimator — NMC with a learned proposal; asymptotically consistent
- Convergence Rates and Estimator Selection — the $O(T^{-1/2})$ guarantee that beats NMC