Convergence Rates and Estimator Selection

Summary

The variational EIG estimators converge at $O(T^{-1/2})$ in total cost $T$, matching ordinary Monte Carlo and beating NMC's $O(T^{-1/3})$, when the variational family contains the target. The error decomposes into MC variance (term I, $O(N^{-1/2})$ in the number of evaluation samples $N$), optimization gap (term II, $O(K^{-1/2})$ in the number of optimization steps $K$), and an irreducible family-misspecification bias (term III). This note states the convergence theorem, summarizes the empirical bias²/variance comparison, and gives the practical rules for choosing among the four estimators.

Overview

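Foster 2019 §4 breaks the total error of a variational EIG estimator into three terms via the triangle inequality, where $\mathcal{B}(\phi)$ is the bound (e.g. $\mathcal{L}_{\text{post}}$, $\mathcal{U}_{\text{marg}}$, $\mathcal{U}_{\text{VNMC}}$), $\phi^*$ its optimal parameters, $\phi_K$ the parameters after $K$ optimization steps, $I$ the true EIG, and $\|\cdot\|_2$ the $L^2$ norm:

$$\|\hat\mu_N(\phi_K) - I\|_2 \le \underbrace{\|\hat\mu_N(\phi_K) - \mathcal{B}(\phi_K)\|_2}_{\text{I}} + \underbrace{\|\mathcal{B}(\phi_K) - \mathcal{B}(\phi^*)\|_2}_{\text{II}} + \underbrace{|\mathcal{B}(\phi^*) - I|}_{\text{III}}$$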

Term I shrinks as $O(N^{-1/2})$ (the standard Monte Carlo rate); term II shrinks as $O(K^{-1/2})$ (stochastic optimization); term III is a constant removable only by enlarging the variational family (or, for VNMC, by increasing the number of inner samples $L$).

Main Content

Theorem 1 — convergence (Foster 2019, §4)

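Let $\mathcal{X}$ be a measurable space, $\Phi$ a convex subset of a finite-dimensional inner-product space, $X_1, X_2, \dots$ i.i.d. random variables taking values in $\mathcal{X}$, and $f : \mathcal{X} \times \Phi \to \mathbb{R}$ measurable. For $\mu(\phi) = \mathbb{E}[f(X_1, \phi)]$ and $\hat\mu_N(\phi) = \frac{1}{N}\sum_{n=1}^{N} f(X_n, \phi)$, if $\sup_{\phi \in \Phi} \|f(X_1, \phi)\|_2 < \infty$ then $\sup_{\phi \in \Phi} \|\hat\mu_N(\phi) - \mu(\phi)\|_2 = O(N^{-1/2})$. If additionally $\phi^*$ is the unique minimizer of $\mu$ and Assumption 1 holds, then after $K$ steps of Polyak–Ruppert-averaged SGD, $\|\mu(\phi_K) - \mu(\phi^*)\|_2 = O(K^{-1/2})$, so

$$\|\hat\mu_N(\phi_K) - \mu(\phi^*)\|_2 = O(N^{-1/2} + K^{-1/2}) = O(T^{-1/2}) \quad \text{when } N \propto K.$$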

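The theorem applies directly to $\hat\mu_{\text{post}}$, $\hat\mu_{\text{marg}}$, and $\hat\mu_{\text{VNMC}}$ (with the inner sample count fixed, $M = L$), and via Lemma 2 to $\hat\mu_{m+\ell}$. With total cost $T = O(N + K)$ this gives $O(T^{-1/2})$, versus NMC's $O(T^{-1/3})$.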
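The two rates in Theorem 1 can be checked numerically on a toy problem. The sketch below is an illustration only, not taken from Foster 2019: it assumes $f(x, \phi) = (x - \phi)^2$ with $X \sim N(m, 1)$, so that $\phi^* = m$ and $\mu(\phi^*) = 1$, and the function name `averaged_sgd_error`, the step-size schedule, and all constants are hypothetical choices. It runs Polyak–Ruppert-averaged SGD for $K$ steps, evaluates $\hat\mu_N$ on $N = K$ fresh samples, and reports the RMSE against $\mu(\phi^*)$.

```python
import numpy as np

def averaged_sgd_error(budget, trials=200, seed=0):
    """RMSE of mu_hat_N(phi_bar_K) against mu(phi*) with N = K = budget // 2.

    Toy setting (an assumption for illustration): f(x, phi) = (x - phi)^2,
    X ~ N(m, 1), hence phi* = m and mu(phi*) = 1.
    """
    rng = np.random.default_rng(seed)
    m = 1.5
    K = N = budget // 2
    phi = np.zeros(trials)       # one independent SGD chain per trial
    phi_bar = np.zeros(trials)   # running Polyak-Ruppert average of the iterates
    for t in range(1, K + 1):
        x = rng.normal(m, 1.0, size=trials)
        grad = -2.0 * (x - phi)              # gradient of (x - phi)^2 in phi
        phi = phi - 0.5 * t ** -0.75 * grad  # step size 0.5 * t^(-3/4)
        phi_bar += (phi - phi_bar) / t       # incremental mean of phi_1..phi_t
    # Final Monte Carlo estimate on N fresh samples at the averaged parameter
    x_eval = rng.normal(m, 1.0, size=(trials, N))
    mu_hat = np.mean((x_eval - phi_bar[:, None]) ** 2, axis=1)
    return float(np.sqrt(np.mean((mu_hat - 1.0) ** 2)))

if __name__ == "__main__":
    for T in (200, 2_000, 20_000):
        print(T, averaged_sgd_error(T))
```

If the $O(T^{-1/2})$ rate holds here, multiplying the budget by 100 should shrink the RMSE by roughly a factor of 10. In this smooth toy problem the optimization term decays faster than the $O(K^{-1/2})$ bound, so the Monte Carlo term dominates.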

VNMC asymptotic debiasing (Foster 2019, §4)

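Because $\mathcal{U}_{\text{VNMC}}(L)$ tends to the true EIG $I$ as $L \to \infty$, term III can be driven to zero without enlarging the family. Train with $L$ fixed at rate $O(T^{-1/2})$ until the family gap dominates, then increase the evaluation inner sample count $M$ with $N$ ($M \propto \sqrt{N}$) so $\hat\mu_{\text{VNMC}}$ converges at $O((NM)^{-1/3})$. Total cost $T = O(KL + NM)$: a fast variational stage plus a slower NMC refinement.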
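The debiasing effect can be seen on a model where the EIG is known in closed form. The following is a minimal sketch under stated assumptions, not the paper's experiment: it takes $\theta \sim N(0, 1)$, $y \mid \theta \sim N(\theta, \sigma^2)$, whose EIG is $\tfrac{1}{2}\log(1 + 1/\sigma^2)$, and a deliberately misspecified, untrained proposal $q(\theta \mid y) = N(a y, s^2)$. The names `vnmc_estimate`, `a`, and `s2` are hypothetical. With one inner sample the estimate sits well above the true EIG; increasing the inner sample count pulls it down toward the truth even though the proposal stays wrong.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x (elementwise)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def vnmc_estimate(N, M, sigma2=1.0, a=0.3, s2=0.8, seed=0):
    """VNMC-style upper-bound estimate of the EIG with N outer, M inner samples."""
    rng = np.random.default_rng(seed)
    theta0 = rng.normal(0.0, 1.0, size=N)                 # theta_{n,0} ~ p(theta)
    y = theta0 + np.sqrt(sigma2) * rng.normal(size=N)     # y_n ~ p(y | theta_{n,0})
    # Inner samples theta_{n,m} ~ q(theta | y_n) = N(a * y_n, s2)
    theta = a * y[:, None] + np.sqrt(s2) * rng.normal(size=(N, M))
    log_w = (log_normal_pdf(theta, 0.0, 1.0)               # log p(theta)
             + log_normal_pdf(y[:, None], theta, sigma2)   # + log p(y | theta)
             - log_normal_pdf(theta, a * y[:, None], s2))  # - log q(theta | y)
    # log of the inner importance-weighted average, computed stably
    log_marginal = np.logaddexp.reduce(log_w, axis=1) - np.log(M)
    return float(np.mean(log_normal_pdf(y, theta0, sigma2) - log_marginal))

if __name__ == "__main__":
    print("true EIG:", 0.5 * np.log(2.0))
    for M in (1, 4, 16, 64):
        print(M, vnmc_estimate(N=20_000, M=M))
```

In a full VNMC run the proposal parameters would first be trained by SGD on the bound; they are held fixed here to isolate the effect of the inner sample count.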

Empirical comparison (Foster 2019, Table 2)

Bias² and variance over 5 runs on four benchmarks (lower MSE in bold):

Estimator  A/B test  Preference  Mixed effects  Extrapolation
good  good  best var
best
good  good
best  best
NMC (baseline)  poor  poor
Laplace (baseline)  best (exact)

Takeaways: Laplace wins only on the Gaussian A/B model where it is exact; all variational estimators outperform NMC; the implicit estimators ($\hat\mu_{\text{post}}$, $\hat\mu_{m+\ell}$) win on the implicit-likelihood problems.

Estimator-selection rules (Foster 2019 §5, Table 1)

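  1. Dimension. $\hat\mu_{\text{marg}}$ and $\hat\mu_{m+\ell}$ approximate a distribution over $y$; prefer them when $\dim(y) < \dim(\theta)$. $\hat\mu_{\text{post}}$ and $\hat\mu_{\text{VNMC}}$ approximate a distribution over $\theta$; prefer them when $\dim(\theta) < \dim(y)$.
  2. Explicit vs implicit likelihood. $\hat\mu_{\text{marg}}$ and $\hat\mu_{\text{VNMC}}$ require an explicit likelihood; $\hat\mu_{\text{post}}$ and $\hat\mu_{m+\ell}$ do not. With an explicit likelihood, prefer $\hat\mu_{\text{marg}}$ over $\hat\mu_{m+\ell}$ (use the known likelihood).
  3. Consistency vs speed. $\hat\mu_{\text{VNMC}}$ is the only one guaranteed to converge to the true EIG even when the variational family is wrong; prefer it when compute is not constrained, and prefer the others when it is.
  4. Bounds. Pair a lower ($\hat\mu_{\text{post}}$) and an upper ($\hat\mu_{\text{marg}}$ or $\hat\mu_{\text{VNMC}}$) bound to sandwich the true EIG of competing designs.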
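Rule 4 can be illustrated on a model with a closed-form EIG. The sketch below is an assumption-laden toy, not an experiment from the paper: $\theta \sim N(0, 1)$, $y \mid \theta \sim N(\theta, 1)$, so the true EIG is $\tfrac{1}{2}\log 2$. Both variational approximations are fixed and deliberately wrong (hypothetical choices $q(\theta \mid y) = N(0.3y, 0.8)$ and $q(y) = N(0, 4)$; the true posterior is $N(y/2, 1/2)$ and the true marginal is $N(0, 2)$), so the posterior estimate falls below the true value and the marginal estimate falls above it.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x (elementwise)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def sandwich_bounds(N=100_000, seed=0):
    """Return (lower, true_eig, upper) for the toy linear-Gaussian model."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=N)          # theta ~ N(0, 1)
    y = theta + rng.normal(size=N)      # y | theta ~ N(theta, 1)
    # Posterior lower bound: E[log q(theta | y) - log p(theta)]
    lower = np.mean(log_normal_pdf(theta, 0.3 * y, 0.8)
                    - log_normal_pdf(theta, 0.0, 1.0))
    # Marginal upper bound: E[log p(y | theta) - log q(y)]
    upper = np.mean(log_normal_pdf(y, theta, 1.0)
                    - log_normal_pdf(y, 0.0, 4.0))
    true_eig = 0.5 * np.log(2.0)
    return float(lower), float(true_eig), float(upper)

if __name__ == "__main__":
    lo, eig, hi = sandwich_bounds()
    print(f"{lo:.3f} <= {eig:.3f} <= {hi:.3f}")
```

The width of the interval reflects how far each approximation is from its target, so a tight sandwich is evidence that both families fit well for that design.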

Examples

Optimal budget split (Foster 2019, Fig. 1d)

Fixing total budget $T = N + K$ and sweeping the ratio $K/T$, RMSE is minimized for $K/T$ between roughly 0.5 and 0.9, i.e. spend a majority of the budget on variational optimization and the rest on the final MC estimate. Setting $N = K$ recovers the theoretical $O(T^{-1/2})$ rate (Fig. 1c).
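One way to see why the optimum leans toward optimization is to take the bound from Theorem 1 at face value. This is a sketch under an assumed error model, not a result from the paper: suppose the error is $a N^{-1/2} + b K^{-1/2}$ with $N + K = T$, where $a$ and $b$ are hypothetical problem-dependent constants. Setting the derivative in $K$ to zero gives $K/N = (b/a)^{2/3}$, hence $K/T = b^{2/3} / (a^{2/3} + b^{2/3})$, independent of $T$. The code evaluates this formula and checks it against a brute-force sweep.

```python
import numpy as np

def optimal_fraction(a, b):
    """K/T minimizing a * (T - K)**-0.5 + b * K**-0.5 (independent of T)."""
    return b ** (2 / 3) / (a ** (2 / 3) + b ** (2 / 3))

def grid_fraction(a, b, T=10_000, points=9_999):
    """The same optimum located by sweeping K/T on a grid, as a sanity check."""
    r = np.linspace(0.0001, 0.9999, points)                # candidate K/T values
    err = a / np.sqrt((1 - r) * T) + b / np.sqrt(r * T)    # assumed error model
    return float(r[np.argmin(err)])

if __name__ == "__main__":
    for ratio in (1, 8, 27):
        print(ratio, optimal_fraction(1.0, ratio), grid_fraction(1.0, ratio))
```

Under this model, an optimum between 0.5 and 0.9 corresponds to $b/a$ between 1 and 27, i.e. an optimization constant at least as large as the Monte Carlo constant.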

Connections

  • Quantifies the payoff over NMC: $O(T^{-1/2})$ vs $O(T^{-1/3})$.
  • Selection rules are referenced throughout Variational BOED - Overview and inherited by Foster 2020, which adds the design gradient.
  • The theory rests on strong assumptions (Assumption 1 for SGD convergence); in practice $\phi_K$ converges to a local optimum $\phi^\dagger$, adding $|\mathcal{B}(\phi^\dagger) - \mathcal{B}(\phi^*)|$ to term III.
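The rate gap in the first bullet translates into a large difference in budget. As a back-of-the-envelope sketch that ignores constant factors (an assumption; the helper `budget_for_error` is hypothetical), solving $T^{-r} = \varepsilon$ for $T$ gives $T = \varepsilon^{-1/r}$:

```python
def budget_for_error(eps, rate):
    """Budget T such that T**(-rate) == eps, ignoring constant factors."""
    return eps ** (-1.0 / rate)

if __name__ == "__main__":
    eps = 0.01
    print("variational, rate 1/2:", budget_for_error(eps, 1 / 2))
    print("NMC, rate 1/3:       ", budget_for_error(eps, 1 / 3))
```

For a target error of 0.01 this is on the order of $10^4$ evaluations at rate $1/2$ against $10^6$ at rate $1/3$, and the gap widens as the target tightens.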

See Also