Nested Estimation and Nested Monte Carlo

Summary

Because the EIG is a nested expectation, that is, an outer expectation over $(θ, y)$ of a log-ratio whose inner term ($p (y ∣ ξ)$ or $p (θ ∣ y, ξ)$) is itself an intractable expectation, it cannot be estimated by conventional Monte Carlo. The standard tool is the nested Monte Carlo (NMC) estimator, which is biased for finite inner sample size $M$, costs $C = NM$ for $N$ outer samples, and converges only at $O (C^{- 1/3})$ in RMSE (attained with $M \propto \sqrt{N}$), far slower than the $O (C^{- 1/2})$ of ordinary MC. This slow rate is the bottleneck that variational, debiasing (MLMC), and gradient methods exist to break.

Overview

The EIG (expected information gain) is doubly intractable: in general, neither the posterior $p (θ ∣ y, ξ)$ nor the marginal likelihood $p (y ∣ ξ) = E_{p (θ)} [p (y ∣ θ, ξ)]$ is available in closed form. Whichever form of the EIG we use, the integrand contains a term that is both intractable and different for each realization of $y$, so we must estimate a fresh inner integral for every outer sample. This is the defining feature of nested estimation and the source of its poor convergence.

Main Content

Definition: Nested Monte Carlo EIG estimator (Rainforth 2023 Eq. 7 / Foster 2019 Eq. 4)

Approximate the inner marginal $p (y_{n} ∣ ξ)$ with an $M$ -sample average over fresh prior draws $θ_{n, m} \sim p (θ)$ :
$\hat{μ}_{NMC} (ξ) := \frac{1}{N} \sum_{n = 1}^{N} \log \frac{p (y_{n} ∣ θ_{n, 0}, ξ)}{\frac{1}{M} \sum_{m = 1}^{M} p (y_{n} ∣ θ_{n, m}, ξ)}, \quad θ_{n, 0}, θ_{n, m} \overset{i.i.d.}{\sim} p (θ), \quad y_{n} \sim p (y ∣ θ_{n, 0}, ξ)$
Total computational cost is $C = NM$ .
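As a concrete illustration, the estimator can be sketched in NumPy on a toy model that is not part of the source: a linear-Gaussian experiment with $θ \sim N (0, 1)$ and $y ∣ θ, ξ \sim N (ξ θ, σ^{2})$, chosen only because its EIG has the closed form $\frac{1}{2} \log (1 + ξ^{2} / σ^{2})$ to check against. The function names `log_lik`, `nmc_eig`, and `exact_eig` are hypothetical, and the inner average is taken in log-space (log-sum-exp) for numerical stability.

```python
import numpy as np


def log_lik(y, theta, xi, sigma):
    """Log density of y under the assumed toy likelihood N(xi * theta, sigma^2)."""
    z = (y - xi * theta) / sigma
    return -0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)


def nmc_eig(xi, N, M, sigma=1.0, seed=0):
    """Nested Monte Carlo EIG estimate with N outer and M inner samples."""
    rng = np.random.default_rng(seed)
    theta0 = rng.standard_normal(N)                    # theta_{n,0} ~ p(theta)
    y = xi * theta0 + sigma * rng.standard_normal(N)   # y_n ~ p(y | theta_{n,0}, xi)
    theta_in = rng.standard_normal((N, M))             # fresh theta_{n,m} for every n

    log_num = log_lik(y, theta0, xi, sigma)            # shape (N,)
    log_w = log_lik(y[:, None], theta_in, xi, sigma)   # shape (N, M)

    # log of the M-sample average of likelihoods, via a stable log-sum-exp
    shift = log_w.max(axis=1, keepdims=True)
    log_den = shift[:, 0] + np.log(np.exp(log_w - shift).mean(axis=1))

    return float(np.mean(log_num - log_den))


def exact_eig(xi, sigma=1.0):
    """Closed-form EIG of the toy linear-Gaussian model, for comparison only."""
    return 0.5 * np.log1p((xi / sigma) ** 2)
```

A call such as `nmc_eig(1.0, N=2000, M=500)` should land close to `exact_eig(1.0)`, which is $\frac{1}{2} \log 2 \approx 0.347$. The `(N, M)` array of inner draws makes the $C = NM$ cost directly visible.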

Convergence rate of NMC (Rainforth et al. 2018)

The NMC estimator has asymptotic mean-squared error $MSE = O (\frac{a}{N} + \frac{b}{M^{2}})$ for model-dependent constants $a, b$, and is consistent as $N, M \to \infty$ (under weak conditions). Balancing the two error terms at fixed budget $C = NM$ gives the optimal allocation $M \propto \sqrt{N}$ (equivalently $N \propto M^{2}$), yielding an overall rate of
$RMSE = O (C^{- 1/3}) .$
Compare this with the $O (C^{- 1/2})$ rate of conventional (non-nested) Monte Carlo. NMC has two undesirable properties: (i) it is biased for any finite $M$, because $\log$ is nonlinear and the $\log$ of an unbiased inner estimate is itself biased, and (ii) it is expensive, because the cost scales as $C = NM$.
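The allocation and the rate follow from a short calculation, sketched here by treating $N$ and $M$ as continuous and taking the leading-order form $MSE \approx a / N + b / M^{2}$ as given. Substituting $M = C / N$ and minimizing over $N$:
$MSE (N) \approx \frac{a}{N} + \frac{b N^{2}}{C^{2}}, \qquad - \frac{a}{N^{2}} + \frac{2 b N}{C^{2}} = 0 \;\Rightarrow\; N^{*} = (\frac{a}{2 b})^{1/3} C^{2/3}, \quad M^{*} = \frac{C}{N^{*}} = (\frac{2 b}{a})^{1/3} C^{1/3} .$
Hence $(M^{*})^{2} = (2 b / a) N^{*}$, so the inner sample size grows like the square root of the outer one, and at the optimum
$MSE^{*} = \frac{a}{N^{*}} + \frac{b}{(M^{*})^{2}} = \frac{3 a}{2 N^{*}} = \frac{3}{2} a^{2/3} (2 b)^{1/3} C^{- 2/3} ,$
whose square root is the $O (C^{- 1/3})$ RMSE rate.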

Importance-sampled NMC

Replacing the simple inner average with an importance-sampling estimate using a proposal $q (θ ∣ y, ξ)$ improves the constants $a, b$ and reduces finite-sample bias (Rainforth 2023, Eq. 8):

$\hat{μ}_{NMC, q} (ξ) := \frac{1}{N} \sum_{n = 1}^{N} \log \frac{p (y_{n} ∣ θ_{n}, ξ)}{\frac{1}{M} \sum_{m = 1}^{M} \frac{p (y_{n} ∣ θ_{m}^{'}, ξ) \, p (θ_{m}^{'})}{q (θ_{m}^{'} ∣ y_{n})}}, \quad θ_{m}^{'} \sim q (θ_{m}^{'} ∣ y_{n}) .$

Learning a good amortized proposal $q$ is precisely what the variational NMC estimator does — standard NMC is the special case $q = p (θ)$ .
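The effect of the proposal can be seen in a minimal NumPy sketch on a toy linear-Gaussian model, which is an assumption made for illustration and not taken from the source: $θ \sim N (0, 1)$, $y ∣ θ, ξ \sim N (ξ θ, σ^{2})$, whose posterior is Gaussian with mean $ξ y / (ξ^{2} + σ^{2})$ and variance $σ^{2} / (ξ^{2} + σ^{2})$. The names `log_normal` and `is_nmc_eig` and the `inflate` knob (which widens the proposal away from the exact posterior) are hypothetical.

```python
import numpy as np


def log_normal(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x (broadcasts over arrays)."""
    z = (x - mean) / sd
    return -0.5 * z**2 - np.log(sd) - 0.5 * np.log(2.0 * np.pi)


def is_nmc_eig(xi, N, M, sigma=1.0, inflate=1.0, seed=0):
    """Importance-sampled NMC with a Gaussian proposal q(theta | y_n).

    inflate=1.0 uses the exact posterior of the assumed toy model as q;
    larger values widen q, mimicking an imperfect learned proposal.
    """
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(N)                     # theta_n ~ p(theta)
    y = xi * theta + sigma * rng.standard_normal(N)    # y_n ~ p(y | theta_n, xi)

    q_mean = xi * y / (xi**2 + sigma**2)               # posterior mean, shape (N,)
    q_sd = inflate * sigma / np.sqrt(xi**2 + sigma**2)
    theta_p = q_mean[:, None] + q_sd * rng.standard_normal((N, M))  # theta'_m ~ q

    # log importance weights: log p(y_n | theta') + log p(theta') - log q(theta' | y_n)
    log_w = (log_normal(y[:, None], xi * theta_p, sigma)
             + log_normal(theta_p, 0.0, 1.0)
             - log_normal(theta_p, q_mean[:, None], q_sd))

    shift = log_w.max(axis=1, keepdims=True)
    log_den = shift[:, 0] + np.log(np.exp(log_w - shift).mean(axis=1))
    log_num = log_normal(y, xi * theta, sigma)
    return float(np.mean(log_num - log_den))
```

With `inflate=1.0` every importance weight equals $p (y_{n} ∣ ξ)$ exactly, so the inner estimate has zero variance and `M=1` gives the same value as any larger `M`. The nesting then costs nothing, which is the limiting case a learned proposal tries to approach.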

Two routes past the $O (C^{- 1/3})$ wall

The review (Rainforth 2023 §3) frames modern progress as two complementary families:

Debiasing schemes (Multi-Level Monte Carlo). Goda et al. (2022) express the EIG as a telescoping sum of NMC estimators and use randomized MLMC with antithetic coupling to produce a fully unbiased, finite-variance estimator of the EIG and its gradient. With a randomization distribution $r (ℓ) \propto 2^{- τ ℓ}$ ( $1 < τ < 2$ ), it recovers the standard $O (C^{- 1/2})$ rate and removes the variational family’s approximation error — at higher per-sample cost. See The Computational Revolution in EIG Estimation.
Functional / variational approximation. Learn an amortized approximation to the intractable density ($p (y ∣ ξ)$ or $p (θ ∣ y, ξ)$) once and reuse it across outcomes, sharing information instead of re-estimating it for every $y$. This drops the cost from $O (NM)$ to $O (N + M)$ and yields estimators that converge at $O (T^{- 1/2})$ in the total computational cost $T$ (see Variational BOED - Overview). A learned normalized approximation also automatically gives a variational bound on the EIG.
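The second route can be illustrated with a deliberately simple sketch of a variational marginal: fit one approximation $q (y ∣ ξ)$ from simulated outcomes, then run ordinary, non-nested Monte Carlo against it. Everything specific here is an assumption for illustration: the toy model $θ \sim N (0, 1)$, $y ∣ θ, ξ \sim N (ξ θ, σ^{2})$, the Gaussian family for $q$, the moment-matching fit standing in for gradient-based training, and the names `log_normal`, `variational_marginal_eig`, and `K` (number of fitting samples).

```python
import numpy as np


def log_normal(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z**2 - np.log(sd) - 0.5 * np.log(2.0 * np.pi)


def variational_marginal_eig(xi, N, K, sigma=1.0, seed=0):
    """EIG estimate using a single fitted Gaussian q(y | xi) in place of p(y | xi)."""
    rng = np.random.default_rng(seed)

    # Step 1 (cost K): fit q once from simulated outcomes by moment matching,
    # which is the maximum-likelihood fit within the Gaussian family.
    y_fit = xi * rng.standard_normal(K) + sigma * rng.standard_normal(K)
    q_mean, q_sd = y_fit.mean(), y_fit.std()

    # Step 2 (cost N): plain Monte Carlo, with no inner loop per outcome.
    theta = rng.standard_normal(N)
    y = xi * theta + sigma * rng.standard_normal(N)
    log_ratio = log_normal(y, xi * theta, sigma) - log_normal(y, q_mean, q_sd)
    return float(np.mean(log_ratio))
```

The total cost is additive in the two sample sizes instead of multiplicative. In expectation over the evaluation samples, the result equals the EIG plus $KL (p (y ∣ ξ) \,\|\, q (y ∣ ξ))$, so it is an upper bound that is tight only when $q$ matches the true marginal; in this toy model the marginal is Gaussian, so the gap comes only from the finite-sample fit.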

Examples

Why the $\log$ makes NMC biased

For fixed $y_{n}$, $\hat{p} := \frac{1}{M} \sum_{m} p (y_{n} ∣ θ_{n, m}, ξ)$ is an unbiased estimate of $p (y_{n} ∣ ξ)$. But $E [\log \hat{p}] \leq \log E [\hat{p}] = \log p (y_{n} ∣ ξ)$ by Jensen’s inequality, so the inner estimate is negatively biased in $\log$-space; because it enters the EIG with a minus sign, the overall NMC estimate is biased upward. The bias is $O (1/ M)$ and vanishes only as $M \to \infty$, which is the root cause of both the slow rate and the need to grow $M$ with $N$ (optimally $M \propto \sqrt{N}$).
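The $O (1/ M)$ size of the bias can be made explicit with a second-order Taylor (delta-method) expansion, sketched here under the added assumption that $p (y_{n} ∣ θ, ξ)$ has finite variance $σ_{n}^{2}$ under the prior. The symbols $p$ and $σ_{n}^{2}$ are shorthand introduced for this sketch. With $\hat{p}$ the inner $M$-sample average and $p = p (y_{n} ∣ ξ)$:
$\log \hat{p} \approx \log p + \frac{\hat{p} - p}{p} - \frac{(\hat{p} - p)^{2}}{2 p^{2}} \;\Rightarrow\; E [\log \hat{p}] \approx \log p - \frac{Var (\hat{p})}{2 p^{2}} = \log p - \frac{σ_{n}^{2}}{2 M p^{2}} .$
Each log-marginal is therefore underestimated by roughly $σ_{n}^{2} / (2 M p^{2})$, and the NMC estimate of the EIG is overestimated by the average of this quantity over $n$.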

Connections

Motivates every fast estimator in this topic: Variational Posterior Estimator (Barber-Agakov), Variational Marginal Estimator, Variational NMC Estimator, and the contrastive bounds Adaptive Contrastive Estimation (ACE) / Prior Contrastive Estimation (PCE).
Shared machinery with general nested-expectation problems and with simulation-based inference more broadly (Introduction to Bayesian Computation).
The contrastive bounds use a finite number of inner (“contrastive”) samples on purpose — turning the NMC bias into a controlled bound rather than an error to be eliminated.
