The Computational Revolution in EIG Estimation

Summary

The review’s §3 organizes the recent breakthroughs in EIG estimation into three threads, all aimed at escaping nested Monte Carlo’s biased, $O (C^{- 1/3})$ trap. (1) Debiasing via Multi-Level Monte Carlo (MLMC) — Goda et al. (2022) produce a fully unbiased, finite-variance EIG (and gradient) estimator recovering the $O (C^{- 1/2})$ rate. (2) Functional / variational approximation — learn an amortized approximation to the intractable density; a normalized one automatically gives a variational bound. (3) Implicit-likelihood estimation — bounds that need only samples of $y ∣ θ$ , enabling simulator-based models.

Overview

Whichever form of the EIG we use, we hit a doubly intractable nested expectation (see Nested Estimation and Nested Monte Carlo). The traditional fixes both have serious drawbacks: nested Laplace approximations are biased, and nested Monte Carlo is consistent but slow, biased at any finite inner sample size $M$, and costs on the order of $NM$ likelihood evaluations for $N$ outer samples. The review frames modern progress as two largely complementary families (debiasing vs functional approximation), plus the special case of implicit models.
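For reference, the nested Monte Carlo baseline can be sketched on a toy problem. This is a minimal illustration, not code from the review: the linear-Gaussian model ($θ \sim N(0, 1)$, $y ∣ θ, ξ \sim N(ξ θ, σ^{2})$), the use of the prior as the inner proposal, and all function names are assumptions chosen because the exact EIG, $\frac{1}{2} \log (1 + ξ^{2} / σ^{2})$, is known in closed form for this model.

```python
import numpy as np


def log_lik(y, theta, xi, sigma):
    """Log density of y | theta, xi for the assumed toy model N(xi * theta, sigma^2)."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((y - xi * theta) / sigma) ** 2


def log_mean_exp(a, axis):
    """Numerically stable log of the mean of exp(a) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.mean(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def nmc_eig(xi, sigma, N, M, seed=0):
    """Nested Monte Carlo EIG estimate: N outer samples, M inner samples each."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(N)                    # outer draws from the prior
    y = xi * theta + sigma * rng.standard_normal(N)   # simulated outcomes
    theta_inner = rng.standard_normal((N, M))         # fresh prior draws per outer sample
    # Inner estimate of log p(y | xi); this is the step that makes the cost N * M.
    log_marginal = log_mean_exp(log_lik(y[:, None], theta_inner, xi, sigma), axis=1)
    return float(np.mean(log_lik(y, theta, xi, sigma) - log_marginal))


def exact_eig(xi, sigma):
    """Closed-form EIG for the assumed linear-Gaussian toy model."""
    return 0.5 * np.log1p(xi**2 / sigma**2)
```

Calling `nmc_eig(1.0, 1.0, N=2000, M=200)` and comparing against `exact_eig(1.0, 1.0)` gives a feel for the cost: every outer sample pays for its own block of `M` inner likelihood evaluations, and shrinking `M` increases the bias rather than just the noise.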

Main Content

Thread 1 — Debiasing schemes (Multi-Level Monte Carlo)

Goda et al. (2022) unbiased MLMC EIG (Rainforth 2023, Eqs. 9–11)

Express the EIG as the expectation of the $N = 1$ , $M = \infty$ NMC estimator, then write that as a telescoping sum:
$EIG_{θ} (ξ) = E [\hat{μ}_{1, \infty, q}] = E [\sum_{ℓ = 0}^{\infty} Δ_{ℓ}], \quad Δ_{ℓ} := \hat{μ}_{1, M_{0} 2^{ℓ}, q} - \frac{1}{2} (\hat{μ}_{1, M_{0} 2^{ℓ - 1}, q}^{(a)} + \hat{μ}_{1, M_{0} 2^{ℓ - 1}, q}^{(b)}),$
where the level- $ℓ$ inner samples are split into two antithetically coupled halves $(a), (b)$ . An importance sampler over levels, $r (ℓ) \propto 2^{- τ ℓ}$ with $1 < τ < 2$ , produces an unbiased estimate of the infinite sum from a single sampled term:
$EIG_{θ} (ξ) = E_{ℓ \sim r, Δ_{ℓ}} [Δ_{ℓ} / r (ℓ)] .$
The antithetic coupling gives the estimator (and its $ξ$ -gradient) finite expected variance and cost, recovering the standard (unnested) Monte Carlo rate $O (C^{- 1/2})$ . Cost per sample can still be significant ( $M_{0} 2^{ℓ} + 1$ likelihood evaluations), but it needs no variational family — and so removes any family-misspecification bias.
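The single-term construction above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the Goda et al. implementation: it reuses a toy linear-Gaussian model ($θ \sim N(0, 1)$, $y ∣ θ, ξ \sim N(ξ θ, σ^{2})$), takes the prior as the proposal $q$, and assumes the convention $Δ_{0} := \hat{μ}_{1, M_{0}, q}$ for the base level so that the sum telescopes. The function names and default values of `M0` and `tau` are hypothetical.

```python
import numpy as np


def log_lik(y, theta, xi, sigma):
    """Log density of y | theta, xi for the assumed toy model N(xi * theta, sigma^2)."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((y - xi * theta) / sigma) ** 2


def log_mean_exp(a):
    """Numerically stable log of the mean of exp(a) for a 1-D array."""
    m = np.max(a)
    return m + np.log(np.mean(np.exp(a - m)))


def mlmc_single_term(xi, sigma, M0, tau, rng):
    """Draw one level l ~ r(l) and return Delta_l / r(l)."""
    p = 1.0 - 2.0 ** (-tau)                  # r(l) = p * 2^(-tau * l), l = 0, 1, 2, ...
    level = int(rng.geometric(p)) - 1        # NumPy's geometric starts at 1
    r = p * 2.0 ** (-tau * level)

    theta = rng.standard_normal()
    y = xi * theta + sigma * rng.standard_normal()
    outer = log_lik(y, theta, xi, sigma)     # the "+ 1" likelihood evaluation

    M = M0 * 2**level
    inner = log_lik(y, rng.standard_normal(M), xi, sigma)
    fine = outer - log_mean_exp(inner)       # N = 1 NMC term with M inner samples
    if level == 0:
        delta = fine                         # assumed base-level convention
    else:
        half = M // 2
        # Antithetic coupling: both coarse terms reuse halves of the same inner samples.
        coarse_a = outer - log_mean_exp(inner[:half])
        coarse_b = outer - log_mean_exp(inner[half:])
        delta = fine - 0.5 * (coarse_a + coarse_b)
    return delta / r


def mlmc_eig(xi, sigma, n_terms, M0=4, tau=1.5, seed=0):
    """Average of independent single-term estimates."""
    rng = np.random.default_rng(seed)
    terms = [mlmc_single_term(xi, sigma, M0, tau, rng) for _ in range(n_terms)]
    return float(np.mean(terms))
```

The choice `tau=1.5` sits inside the stated range $1 < τ < 2$: a larger value makes deep (expensive) levels rarer, a smaller value reduces the weight $1 / r (ℓ)$ placed on them. For this toy model the result can be compared with the closed form $\frac{1}{2} \log (1 + ξ^{2} / σ^{2})$.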

Thread 2 — Functional and variational approximation

Rather than re-estimate the nested term $p (y ∣ ξ)$ from scratch for each $y$ , exploit its smoothness and learn a functional approximation $q (y ∣ ξ) \approx p (y ∣ ξ)$ , then plug it into the EIG via standard Monte Carlo. Costs become additive ( $O (C^{- 1/2})$ achievable), not multiplicative.

Variational bounds from normalized approximations (Rainforth 2023, Eqs. 12–14)

If $q (y ∣ ξ)$ is a valid normalized density, it produces a variational upper bound: $EIG_{θ} (ξ) \leq E_{p (θ) p (y ∣ θ, ξ)} [\log p (y ∣ θ, ξ) - \log q (y ∣ ξ)]$, with equality iff $q (y ∣ ξ) = p (y ∣ ξ)$. An amortized inference network $q (θ ∣ y, ξ) \approx p (θ ∣ y, ξ)$ instead gives a lower bound: $EIG_{θ} (ξ) \geq E_{p (θ) p (y ∣ θ, ξ)} [\log q (θ ∣ y, ξ) - \log p (θ)]$, with equality iff $q (θ ∣ y, ξ) = p (θ ∣ y, ξ)$ (this is the classical Barber–Agakov MI bound). The expectation of the importance-sampled NMC estimator is itself a variational upper bound, $EIG_{θ} (ξ) \leq E [\hat{μ}_{1, M, q}]$, which tightens as $M$ increases, and the learned $q$ can also serve as the NMC proposal.

These are precisely the Foster 2019 estimators ( $\hat{μ}_{marg}$ ↔ upper, $\hat{μ}_{post}$ ↔ lower) and the ACE / PCE family, since the EIG is a mutual information and any MI bound applies.
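The marginal (upper) and posterior (lower) bounds can be illustrated on a toy problem. The sketch below is an assumption-laden example, not the Foster 2019 code: it uses a linear-Gaussian model ($θ \sim N(0, 1)$, $y ∣ θ, ξ \sim N(ξ θ, σ^{2})$), Gaussian families for both $q (y ∣ ξ)$ and $q (θ ∣ y, ξ)$, and closed-form maximum-likelihood fits in place of gradient training. Fitting $q$ by maximum likelihood on simulated pairs is equivalent to tightening the corresponding bound, since only the $q$ term depends on the variational parameters. All function names are hypothetical.

```python
import numpy as np


def gauss_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mean) ** 2 / var


def variational_bounds(xi, sigma, n_train, n_eval, seed=0):
    """Return Monte Carlo estimates (lower, upper) of the two variational EIG bounds."""
    rng = np.random.default_rng(seed)

    def simulate(n):
        theta = rng.standard_normal(n)
        y = xi * theta + sigma * rng.standard_normal(n)
        return theta, y

    # Fit the variational families on one batch of simulated (theta, y) pairs.
    theta_tr, y_tr = simulate(n_train)
    s2 = np.mean(y_tr**2)                               # q(y | xi) = N(0, s2)
    a = np.sum(theta_tr * y_tr) / np.sum(y_tr**2)       # q(theta | y, xi) = N(a * y, v)
    v = np.mean((theta_tr - a * y_tr) ** 2)

    # Evaluate both bounds on fresh samples with plain (unnested) Monte Carlo.
    theta, y = simulate(n_eval)
    log_lik = gauss_logpdf(y, xi * theta, sigma**2)
    upper = np.mean(log_lik - gauss_logpdf(y, 0.0, s2))
    lower = np.mean(gauss_logpdf(theta, a * y, v) - gauss_logpdf(theta, 0.0, 1.0))
    return float(lower), float(upper)
```

Because the Gaussian families contain the true marginal and posterior here, both estimates should land close to the exact value $\frac{1}{2} \log (1 + ξ^{2} / σ^{2})$. With a misspecified family the gap between them would not close, which is the family-misspecification bias discussed in this note. The returned numbers are noisy estimates of the bounds, so the ordering lower ≤ upper holds in expectation rather than for every finite sample.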

Thread 3 — Estimation for implicit models (§3.3.2)

When $y ∣ θ, ξ$ can be sampled but $p (y ∣ θ, ξ)$ cannot be evaluated, an extra intractable term appears. Approaches: approximate that density in isolation; estimate the ratio $p (y ∣ θ, ξ) / p (y ∣ ξ)$ by logistic regression (LFIRE-style); or use variational bounds that allow implicit likelihoods (the Implicit Likelihood Estimator $\hat{μ}_{m + ℓ}$, likelihood-free ACE). Implicit priors are easier than implicit likelihoods: formulations based on the likelihood form of the EIG (Eqs. 3, 7, 12) avoid the prior density.
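The ratio-estimation route can be sketched with a small classifier. This is a simplified, amortized illustration and not LFIRE itself: it trains one logistic regression to separate joint pairs $(θ, y)$ from pairs with $y$ shuffled (which mimics draws from $p (θ) p (y ∣ ξ)$), so that with balanced classes the fitted logit approximates $\log p (y ∣ θ, ξ) / p (y ∣ ξ)$. The toy simulator, the quadratic feature map, and the Newton solver are all assumptions made for the example; only samples of $y$ are used, never likelihood values.

```python
import numpy as np


def features(theta, y):
    """Quadratic features; chosen because the toy log-ratio is quadratic in (theta, y)."""
    return np.stack([np.ones_like(theta), theta**2, theta * y, y**2], axis=1)


def fit_logistic(X, labels, n_iter=25):
    """Logistic regression by Newton's method with a tiny ridge term for stability."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - labels)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-6 * np.eye(X.shape[1])
        w -= np.linalg.solve(hess, grad)
    return w


def ratio_eig(xi, sigma, n, seed=0):
    """Estimate the EIG as the average learned log-ratio over fresh joint samples."""
    rng = np.random.default_rng(seed)

    def simulate(m):
        # Stand-in for a black-box simulator: only samples are returned.
        theta = rng.standard_normal(m)
        return theta, xi * theta + sigma * rng.standard_normal(m)

    theta, y = simulate(n)
    y_shuffled = rng.permutation(y)              # breaks the theta-y dependence
    X = np.vstack([features(theta, y), features(theta, y_shuffled)])
    labels = np.concatenate([np.ones(n), np.zeros(n)])
    w = fit_logistic(X, labels)

    theta_new, y_new = simulate(n)
    return float(np.mean(features(theta_new, y_new) @ w))
```

For the toy model the estimate can be checked against $\frac{1}{2} \log (1 + ξ^{2} / σ^{2})$. In a real simulator-based problem the feature map would need to be replaced by a flexible classifier, and the quality of the EIG estimate then depends on how well that classifier recovers the true ratio.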

Debiasing vs variational — the trade-off

Criterion	MLMC debiasing	Variational/functional
Bias	none (unbiased)	family-misspecification bias (unless $L, M \to \infty$ )
Rate	$O (C^{- 1/2})$	$O (C^{- 1/2})$ (when family contains target)
Per-sample cost	higher ( $M_{0} 2^{ℓ} + 1$ evals)	lower
Needs variational family?	no	yes
Gives a usable gradient?	yes ( $\nabla_{ξ} EIG$ directly)	yes (differentiate the bound)

Connections

Directly extends Nested Estimation and Nested Monte Carlo — the two threads are its two escape routes.
Subsumes Foster 2019 (variational bounds) and feeds Optimization and Gradient Schemes for BED (which uses these bounds’ gradients).
MLMC debiasing is an alternative to variational families that the review notes “has not yet been empirically compared” to them at scale (an open question).
