Convergence Rates and Estimator Selection
Summary
The variational EIG estimators converge at $O(T^{-1/2})$ in total cost $T$, matching ordinary Monte Carlo and beating NMC's $O(T^{-1/3})$, when the variational family contains the target. The error decomposes into MC variance (term I, $O(N^{-1/2})$), optimization gap (term II, $O(K^{-1/2})$), and an irreducible family-misspecification bias (term III). This note states the convergence theorem, summarizes the empirical bias²/variance comparison, and gives the practical rules for choosing among the four estimators.
Overview
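Foster 2019 §4 breaks the total error of a variational EIG estimator $\hat\mu(d)$ into three terms via the triangle inequality, where $\mathcal{B}(d, \phi)$ is the bound (e.g. $\mathcal{L}_{post}$, $\mathcal{U}_{marg}$, $\mathcal{U}_{VNMC}$), $\phi^*$ its optimal parameters, $\phi_K$ the parameters after $K$ optimization steps, and $\|X\|_2 = \sqrt{\mathbb{E}[X^2]}$:

$$
\|\hat\mu(d) - \mu(d)\|_2 \le \underbrace{\|\hat\mu(d) - \mathcal{B}(d, \phi_K)\|_2}_{\text{I}} + \underbrace{\|\mathcal{B}(d, \phi_K) - \mathcal{B}(d, \phi^*)\|_2}_{\text{II}} + \underbrace{|\mathcal{B}(d, \phi^*) - \mu(d)|}_{\text{III}}
$$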
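Term I shrinks as $O(N^{-1/2})$ in the number of Monte Carlo samples $N$ (Monte Carlo averaging); term II shrinks as $O(K^{-1/2})$ in the number of optimization steps $K$ (stochastic optimization); term III is a constant removable only by enlarging the variational family (or, for VNMC, by increasing the number of inner samples $L$).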
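The $O(N^{-1/2})$ behaviour of term I can be checked numerically. The sketch below is an illustration only, not an experiment from Foster 2019: it assumes a toy integrand (standard normal draws with true mean 0), and the function name `mc_rmse` is hypothetical. Quadrupling $N$ should roughly halve the RMSE of the sample mean.

```python
import numpy as np


def mc_rmse(num_samples, num_repeats=2000, seed=0):
    """RMSE of a plain Monte Carlo mean of standard normal draws.

    Toy stand-in for term I: the true mean is 0, so the estimate
    itself is the error.
    """
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal((num_repeats, num_samples))
    estimates = draws.mean(axis=1)  # one estimate per repeat
    return float(np.sqrt(np.mean(estimates**2)))


# Under an N^{-1/2} rate, going from N = 100 to N = 400 halves the RMSE,
# so this ratio should be close to 2.
ratio = mc_rmse(100) / mc_rmse(400)
print(f"RMSE(100) / RMSE(400) = {ratio:.2f}")
```

The ratio is itself a noisy estimate, so it will sit near 2 rather than hit it exactly.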
Main Content
Theorem 1 — convergence (Foster 2019, §4)
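Let $\mathcal{X}$ be a measurable space, $\Phi$ a convex subset of a finite-dimensional inner-product space, $X_1, X_2, \dots$ i.i.d. random variables taking values in $\mathcal{X}$, and $f: \mathcal{X} \times \Phi \to \mathbb{R}$ measurable. For $\mu(\phi) = \mathbb{E}[f(X_1, \phi)]$ and $\hat\mu_N(\phi) = \frac{1}{N}\sum_{n=1}^{N} f(X_n, \phi)$, if $\sup_{\phi \in \Phi} \|f(X_1, \phi)\|_2 < \infty$ then $\sup_{\phi \in \Phi} \|\hat\mu_N(\phi) - \mu(\phi)\|_2 = O(N^{-1/2})$. If additionally $\phi^*$ is the unique minimizer of $\mu$ and (Assumption 1) holds, then after $K$ steps of Polyak–Ruppert-averaged SGD, $\|\mu(\phi_K) - \mu(\phi^*)\|_2 = O(K^{-1/2})$, so

$$
\|\hat\mu_N(\phi_K) - \mu(\phi^*)\|_2 = O\left(N^{-1/2} + K^{-1/2}\right).
$$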
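Theorem 1 applies directly to $\hat\mu_{post}$, $\hat\mu_{marg}$, and $\hat\mu_{VNMC}$ (with $L$ fixed), and via Lemma 2 to $\hat\mu_{m+\ell}$. With total cost $T = O(N + K)$ and $N \propto K$ the rate is $O(T^{-1/2})$, versus NMC's $O(T^{-1/3})$.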
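The optimization half of the theorem relies on Polyak–Ruppert averaging, i.e. reporting the running average of the SGD iterates rather than the last iterate. A minimal sketch follows, under loud assumptions: the objective is a toy quadratic $\mu(\phi) = \tfrac12\mathbb{E}[(\phi - X)^2]$ with $X \sim \mathcal{N}(1, 1)$ (so $\phi^* = 1$), the step schedule $k^{-0.75}$ is one common choice rather than the paper's, and `averaged_sgd` is a hypothetical name.

```python
import numpy as np


def averaged_sgd(num_steps, step_scale=0.5, seed=0):
    """Polyak-Ruppert averaged SGD on a toy quadratic objective.

    Minimizes 0.5 * E[(phi - X)^2] with X ~ N(1, 1); the minimizer is 1.
    Returns (last iterate, averaged iterate).
    """
    rng = np.random.default_rng(seed)
    phi = 0.0      # current SGD iterate
    phi_bar = 0.0  # running average of the iterates
    for k in range(1, num_steps + 1):
        x = 1.0 + rng.standard_normal()
        grad = phi - x  # stochastic gradient of 0.5 * (phi - x)^2
        phi -= step_scale * k ** -0.75 * grad
        phi_bar += (phi - phi_bar) / k  # incremental mean update
    return phi, phi_bar


last, averaged = averaged_sgd(20_000)
print(f"last iterate: {last:.4f}, averaged iterate: {averaged:.4f}")
```

In this toy problem the averaged iterate typically lands closer to $\phi^* = 1$ than the last iterate, which is the reason for averaging.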
VNMC asymptotic debiasing (Foster 2019, §4)
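Because $\mathcal{U}_{VNMC}(d, L) \to \mu(d)$ as $L \to \infty$, term III can be driven to zero without enlarging the family. Train with $L$ fixed, at rate $O(T^{-1/2})$, until the family gap dominates; then increase the number of inner samples together with $N$, as in NMC, so that $\hat\mu_{VNMC}$ converges at $O(T^{-1/3})$. The overall procedure is a fast variational stage followed by a slower NMC refinement.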
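The debiasing effect of more inner samples can be seen in a small simulation. The sketch below is not the paper's VNMC: it assumes a toy conjugate model ($\theta \sim \mathcal{N}(0,1)$, $y \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$, for which the EIG is $\tfrac12\log(1 + 1/\sigma^2)$) and uses the prior as the inner proposal, i.e. plain NMC with an untrained proposal. All function names are hypothetical.

```python
import numpy as np


def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((x - mean) / std) ** 2


def nmc_eig(num_outer, num_inner, noise_std=1.0, seed=0):
    """Nested Monte Carlo EIG estimate with the prior as inner proposal."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(num_outer)                  # outer prior draws
    y = theta + noise_std * rng.standard_normal(num_outer)  # simulated outcomes
    theta_inner = rng.standard_normal((num_outer, num_inner))
    log_lik = log_normal_pdf(y, theta, noise_std)
    inner = log_normal_pdf(y[:, None], theta_inner, noise_std)
    # Log of the inner average, computed stably (log-mean-exp).
    shift = inner.max(axis=1, keepdims=True)
    log_marginal = shift[:, 0] + np.log(np.mean(np.exp(inner - shift), axis=1))
    return float(np.mean(log_lik - log_marginal))


def true_eig(noise_std=1.0):
    """Closed-form EIG for the toy Gaussian model."""
    return 0.5 * np.log(1.0 + 1.0 / noise_std**2)


for num_inner in (4, 32, 256):
    estimate = nmc_eig(10_000, num_inner)
    print(f"inner samples = {num_inner:4d}  estimate = {estimate:.4f}  truth = {true_eig():.4f}")
```

With few inner samples the estimate sits above the true value, and the gap closes as the inner sample count grows. A trained proposal would be expected to close it faster, which is the motivation for VNMC.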
Empirical comparison (Foster 2019, Table 2)
Qualitative summary of bias² and variance over 5 runs on four benchmarks ("best" marks the lowest MSE in a column):
| Estimator | A/B test | Preference | Mixed effects | Extrapolation |
|---|---|---|---|---|
| $\hat\mu_{post}$ | good | good | — | best var |
| $\hat\mu_{marg}$ | — | best | — | — |
| $\hat\mu_{VNMC}$ | good | good | — | — |
| $\hat\mu_{m+\ell}$ | — | — | best | best |
| NMC (baseline) | poor | poor | — | — |
| Laplace (baseline) | best (exact) | — | — | — |
Takeaways: Laplace wins only on the Gaussian A/B model where it is exact; all variational estimators outperform NMC; the implicit estimators ($\hat\mu_{post}$, $\hat\mu_{m+\ell}$) win on the implicit-likelihood problems.
Estimator-selection rules (Foster 2019 §5, Table 1)
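- Dimension. $\hat\mu_{post}$ and $\hat\mu_{VNMC}$ approximate a distribution over $\theta$; prefer them when $\dim(\theta) \ll \dim(y)$. $\hat\mu_{marg}$ and $\hat\mu_{m+\ell}$ approximate a distribution over $y$; prefer them when $\dim(y) \ll \dim(\theta)$.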
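- Explicit vs implicit likelihood. $\hat\mu_{marg}$ and $\hat\mu_{VNMC}$ require an explicit likelihood; $\hat\mu_{post}$ and $\hat\mu_{m+\ell}$ do not. With an explicit likelihood, prefer $\hat\mu_{marg}$ over $\hat\mu_{m+\ell}$ (use the known likelihood).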
- Consistency vs speed. $\hat\mu_{VNMC}$ is the only one guaranteed to converge to the true EIG even when the variational family is wrong; prefer it when compute is not constrained, and prefer the others when it is.
- Bounds. Pair a lower ($\hat\mu_{post}$) and an upper ($\hat\mu_{marg}$ or $\hat\mu_{VNMC}$) bound to sandwich the true EIG of competing designs.
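The first three rules can be written down as a small decision helper. This is a sketch of one possible reading of the rules, not code from Foster 2019: the function name, the string labels, and the tie-break when the two dimensions are equal are all assumptions made for illustration.

```python
def suggest_estimators(explicit_likelihood, dim_theta, dim_y, need_consistency=False):
    """Return candidate estimator labels, following the selection rules.

    Labels are hypothetical shorthand: "post", "marg", "vnmc", "m+l".
    """
    if need_consistency:
        # Only VNMC is guaranteed consistent, and it needs an explicit likelihood.
        if not explicit_likelihood:
            raise ValueError("VNMC requires an explicit likelihood")
        return ["vnmc"]

    # Estimators that approximate a distribution over theta.
    theta_side = ["post", "vnmc"] if explicit_likelihood else ["post"]
    # Estimators that approximate a distribution over y; with an explicit
    # likelihood, marg is preferred over m+l.
    y_side = ["marg"] if explicit_likelihood else ["m+l"]

    # Assumption: ties (equal dimensions) go to the theta-side estimators.
    return theta_side if dim_theta <= dim_y else y_side


print(suggest_estimators(explicit_likelihood=True, dim_theta=2, dim_y=10))
print(suggest_estimators(explicit_likelihood=False, dim_theta=50, dim_y=1))
```

The helper only narrows the candidates; the bounds rule still suggests pairing a lower and an upper bound when both are available.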
Examples
Optimal budget split (Foster 2019, Fig. 1d)
Fixing total budget $T = N + K$ and sweeping the ratio $K/T$, RMSE is minimized for $K/T$ between roughly 0.5 and 0.9; that is, spend a majority of the budget on variational optimization and the rest on the final MC estimate. Setting $N = K$ recovers the theoretical $O(T^{-1/2})$ rate (Fig. 1c).
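A toy error model gives some intuition for why the best split can favour optimization. The sketch below assumes the error behaves like $a/\sqrt{N} + b/\sqrt{K}$ with $N = (1 - r)T$ and $K = rT$; the constants $a$, $b$ and the function `best_split` are hypothetical, and the paper's 0.5 to 0.9 range is an empirical finding, not a consequence of this model.

```python
import numpy as np


def best_split(mc_const, opt_const, total_budget, grid_size=99):
    """Grid-search the ratio r = K / T that minimizes a toy error model.

    Assumed error model: mc_const / sqrt(N) + opt_const / sqrt(K),
    with N = (1 - r) * T Monte Carlo samples and K = r * T SGD steps.
    """
    ratios = np.linspace(0.01, 0.99, grid_size)
    num_mc = (1.0 - ratios) * total_budget
    num_opt = ratios * total_budget
    error = mc_const / np.sqrt(num_mc) + opt_const / np.sqrt(num_opt)
    return float(ratios[np.argmin(error)])


# Equal constants give an even split; a larger optimization constant
# pushes the best ratio above one half.
print(best_split(1.0, 1.0, 10_000))
print(best_split(1.0, 4.0, 10_000))
```

Under this model the minimizer satisfies $r/(1-r) = (b/a)^{2/3}$, so $b = 4a$ gives $r \approx 0.72$.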
Connections
- Quantifies the payoff over NMC: $O(T^{-1/2})$ vs $O(T^{-1/3})$.
- Selection rules are referenced throughout Variational BOED - Overview and inherited by Foster 2020, which adds the design gradient.
- Strong assumptions (Assumption 1 for SGD convergence); in practice $\phi_K$ converges to a local optimum $\phi^\dagger$, adding $|\mathcal{B}(d, \phi^\dagger) - \mathcal{B}(d, \phi^*)|$ to term III.
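To put the first connection in concrete terms, the two rates $O(T^{-1/2})$ and $O(T^{-1/3})$ can be inverted to give the budget needed for a target error. The sketch assumes the error is exactly $c \cdot T^{-\alpha}$ with unit constant $c$, which the asymptotic rates do not guarantee; `budget_for_error` is a hypothetical helper.

```python
def budget_for_error(target_error, rate_exponent, constant=1.0):
    """Budget T at which constant * T**(-rate_exponent) equals target_error."""
    return (constant / target_error) ** (1.0 / rate_exponent)


# Budget needed to reach an error of 0.01 under each rate (unit constants).
variational_budget = budget_for_error(0.01, 1 / 2)  # T^{-1/2} rate
nmc_budget = budget_for_error(0.01, 1 / 3)          # T^{-1/3} rate
print(f"variational: {variational_budget:.3g}, NMC: {nmc_budget:.3g}")
```

Under these assumptions the variational rate needs about $10^4$ units of cost where the NMC rate needs about $10^6$, and the gap widens as the target error shrinks.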
See Also
- Variational Posterior Estimator (Barber-Agakov) · Variational Marginal Estimator · Variational NMC Estimator · Implicit Likelihood Estimator — the four estimators
- Unified SGD BOED - Overview — the next step: differentiate the bound in the design too