Convergence Rates and Estimator Selection

Summary

The variational EIG estimators converge at $O (T^{- 1/2})$ in total cost $T = O (N + K)$ — matching ordinary Monte Carlo and beating NMC’s $O (T^{- 1/3})$ — when the variational family contains the target. The error decomposes into MC variance (term I, $\propto N^{- 1/2}$ ), optimization gap (term II, $\propto K^{- 1/2}$ ), and an irreducible family-misspecification bias (term III). This note states the convergence theorem, summarizes the empirical bias²/variance comparison, and gives the practical rules for choosing among the four estimators.

Overview

Foster 2019 §4 breaks the total error of a variational EIG estimator $\hat{μ}(d, ϕ_{K})$ into three terms via the triangle inequality, where $B(d, ϕ)$ is the bound (e.g. $L_{post}$, $U_{marg}$, $U_{VNMC}$) and $ϕ^{*}$ its optimal parameters:

$$\underbrace{∥ \hat{μ}(d, ϕ_{K}) - \mathrm{EIG}(d) ∥_{2}}_{\text{total error}} \leq \underbrace{∥ \hat{μ}(d, ϕ_{K}) - B(d, ϕ_{K}) ∥_{2}}_{\text{I: MC variance}} + \underbrace{∥ B(d, ϕ_{K}) - B(d, ϕ^{*}) ∥_{2}}_{\text{II: optimization}} + \underbrace{∣ B(d, ϕ^{*}) - \mathrm{EIG}(d) ∣}_{\text{III: family gap}} .$$

Term I shrinks as $N^{-1/2}$ (the standard Monte Carlo rate for an i.i.d. average); term II shrinks as $K^{-1/2}$ (averaged stochastic optimization); term III is a constant that can be removed only by enlarging the variational family (or, for VNMC, by increasing $L$).

Main Content

Theorem 1 — $O (T^{- 1/2})$ convergence (Foster 2019, §4)

Let $\mathcal{X}$ be a measurable space, $Φ$ a convex subset of a finite-dimensional inner-product space, $X_{1}, X_{2}, \dots$ i.i.d. random variables taking values in $\mathcal{X}$, and $f : \mathcal{X} \times Φ \to \mathbb{R}$ measurable. For $μ(ϕ) := E[f(X_{1}, ϕ)]$ and $\hat{μ}_{N}(ϕ) := \frac{1}{N} \sum_{n=1}^{N} f(X_{n}, ϕ)$, if $\sup_{ϕ \in Φ} ∥ f(X_{1}, ϕ) ∥_{2} < \infty$ then $\sup_{ϕ \in Φ} ∥ \hat{μ}_{N}(ϕ) - μ(ϕ) ∥_{2} = O(N^{-1/2})$. If additionally $ϕ^{*}$ is the unique minimizer of $μ$ and Assumption 1 holds, then after $K$ steps of Polyak–Ruppert-averaged SGD, $∥ μ(ϕ_{K}) - μ(ϕ^{*}) ∥_{2} = O(K^{-1/2})$, so
$$∥ \hat{μ}_{N}(ϕ_{K}) - μ(ϕ^{*}) ∥_{2} = O(N^{-1/2} + K^{-1/2}) = O(T^{-1/2}) \quad \text{if } N \propto K .$$
The theorem applies directly to $\hat{μ}_{marg}$, $-\hat{μ}_{post}$, and $\hat{μ}_{VNMC}$ (with $M = L$), and via Lemma 2 to $\hat{μ}_{m + ℓ}$. Total cost is $T = O(N + K)$, versus NMC’s $T = O(NM)$.
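The rate in Theorem 1 can be checked numerically on a toy problem. The sketch below is not one of the EIG estimators: it assumes a hypothetical objective $f(x, ϕ) = (x - ϕ)^2$ with $X \sim N(0, 1)$ and $Φ = [-2, 2]$, so that $μ(ϕ) = 1 + ϕ^2$, $ϕ^{*} = 0$ and $μ(ϕ^{*}) = 1$. The step-size schedule, the even split $N = K = T/2$, and the function name `toy_rmse` are all illustrative choices.

```python
import numpy as np

def toy_rmse(T, reps=200, seed=0):
    """RMSE of mu_hat_N(phi_K) against mu(phi*) = 1 for total budget T = N + K."""
    rng = np.random.default_rng(seed)
    K = T // 2          # optimization steps
    N = T - K           # samples for the final Monte Carlo estimate
    phi = np.full(reps, 1.5)    # one independent SGD chain per repetition
    phi_bar = np.zeros(reps)    # running Polyak-Ruppert average of the iterates
    for t in range(1, K + 1):
        x = rng.standard_normal(reps)
        grad = 2.0 * (phi - x)                      # d/dphi of (x - phi)^2
        step = 0.5 * t ** -0.75                     # decaying step size (assumed schedule)
        phi = np.clip(phi - step * grad, -2.0, 2.0) # projection onto Phi = [-2, 2]
        phi_bar += (phi - phi_bar) / t
    x_eval = rng.standard_normal((reps, N))         # fresh samples for the estimate
    mu_hat = np.mean((x_eval - phi_bar[:, None]) ** 2, axis=1)
    return float(np.sqrt(np.mean((mu_hat - 1.0) ** 2)))

if __name__ == "__main__":
    for T in (200, 800, 3200):
        print(T, toy_rmse(T))
```

If the $O(T^{-1/2})$ rate holds, each fourfold increase in $T$ should roughly halve the printed RMSE.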

VNMC asymptotic debiasing (Foster 2019, §4)

Because $U_{VNMC}(d, L) \to \mathrm{EIG}(d)$ as $L \to \infty$, term III can be driven to zero without enlarging the family. First train $ϕ$ with fixed $L$, which converges at rate $O(K^{-1/2})$ until the family gap dominates. Then increase $N$ and $M$ with $M \propto \sqrt{N}$, so that $\hat{μ}_{VNMC}$ converges at the NMC rate $O((NM)^{-1/3})$. Total cost is $T = O(KL + NM)$: a fast variational stage plus a slower NMC refinement.
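The cost accounting for the two stages can be sketched as follows. This assumes the standard NMC allocation $M \propto \sqrt{N}$ (here simply $M = \lfloor \sqrt{N} \rfloor$), under which an error of order $N^{-1/2} + M^{-1}$ equals $O((NM)^{-1/3})$. The function name and the returned fields are hypothetical, and the rate value is a proxy with all constants set to 1.

```python
import math

def vnmc_two_stage_cost(k_steps, l_train, n_outer):
    """Cost of a variational stage (K steps, L inner samples) plus an NMC refinement."""
    m_inner = max(1, math.isqrt(n_outer))   # M = floor(sqrt(N)), assumed allocation
    train_cost = k_steps * l_train          # stage 1: O(K L)
    refine_cost = n_outer * m_inner         # stage 2: O(N M)
    return {
        "M": m_inner,
        "train_cost": train_cost,
        "refine_cost": refine_cost,
        "total_cost": train_cost + refine_cost,
        # Rate proxy for stage 2 only: (N M)^(-1/3), constants ignored.
        "refine_rate": refine_cost ** (-1.0 / 3.0),
    }

if __name__ == "__main__":
    print(vnmc_two_stage_cost(k_steps=1000, l_train=10, n_outer=10_000))
```

With these example numbers the refinement stage dominates the total cost, which is the sense in which it is the slower of the two stages.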

Empirical comparison (Foster 2019, Table 2)

Qualitative summary of the bias² and variance results over 5 runs on four benchmarks:

Estimator	A/B test	Preference	Mixed effects	Extrapolation
$\hat{μ}_{post}$	good	good	—	best var
$\hat{μ}_{marg}$	—	best	—	—
$\hat{μ}_{VNMC}$	good	good	—	—
$\hat{μ}_{m + ℓ}$	—	—	best	best
NMC (baseline)	poor	poor	—	—
Laplace (baseline)	best (exact)	—	—	—

Takeaways: Laplace wins only on the Gaussian A/B model, where it is exact; all variational estimators outperform NMC; the implicit-likelihood estimator $\hat{μ}_{m + ℓ}$ wins on the implicit-likelihood problems.

Estimator-selection rules (Foster 2019 §5, Table 1)

Dimension. $\hat{μ}_{marg}$ and $\hat{μ}_{m + ℓ}$ approximate a distribution over $y$; prefer them when $\dim(y) ≪ \dim(θ)$. $\hat{μ}_{post}$ and $\hat{μ}_{VNMC}$ approximate a distribution over $θ$; prefer them when $\dim(θ) ≪ \dim(y)$.
Explicit vs implicit likelihood. $\hat{μ}_{marg}$ and $\hat{μ}_{VNMC}$ require an explicit likelihood; $\hat{μ}_{post}$ and $\hat{μ}_{m + ℓ}$ do not. With an explicit likelihood, prefer $\hat{μ}_{marg}$ over $\hat{μ}_{m + ℓ}$, since $\hat{μ}_{marg}$ uses the known likelihood instead of learning an approximation to it.
Consistency vs speed. $\hat{μ}_{VNMC}$ is the only estimator guaranteed to converge to the true EIG even when the variational family is wrong. Prefer it when compute is not constrained; prefer the others when it is.
Bounds. Pair a lower bound ($\hat{μ}_{post}$) with an upper bound ($\hat{μ}_{marg}$ or $\hat{μ}_{VNMC}$) to sandwich the true EIG of competing designs.
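The dimension, likelihood, and consistency rules can be encoded as a small lookup. The sketch below is only an illustration: the function name, the string labels, and the use of a strict `<` in place of "$≪$" are assumptions, and the bound-pairing rule is not covered.

```python
def suggest_estimators(dim_y, dim_theta, explicit_likelihood, compute_constrained=True):
    """Return estimator labels suggested by the selection rules (illustrative only)."""
    # Explicit vs implicit: marg and VNMC need an explicit likelihood.
    usable = ["post", "m+l"]
    if explicit_likelihood:
        usable += ["marg", "VNMC"]

    # Consistency vs speed: VNMC is the only one that reaches the true EIG
    # under a misspecified family, so pick it when compute allows.
    if explicit_likelihood and not compute_constrained:
        return ["VNMC"]

    # Dimension: approximate the lower-dimensional variable.
    # Strict "<" stands in for "much smaller than" (a simplification).
    if dim_y < dim_theta:
        preferred = {"marg", "m+l"}      # approximate a distribution over y
    elif dim_theta < dim_y:
        preferred = {"post", "VNMC"}     # approximate a distribution over theta
    else:
        preferred = set(usable)
    picks = [name for name in usable if name in preferred]

    # With an explicit likelihood, marg uses it directly, so it supersedes m+l.
    if "marg" in picks and "m+l" in picks:
        picks.remove("m+l")
    return picks

if __name__ == "__main__":
    print(suggest_estimators(dim_y=1, dim_theta=10, explicit_likelihood=True))
    print(suggest_estimators(dim_y=1, dim_theta=10, explicit_likelihood=False))
```

Returning a list rather than a single name leaves room for the bounds rule, where a lower and an upper bound are run side by side.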

Examples

Optimal budget split (Foster 2019, Fig. 1d)

Fixing total budget $T = N + K$ and sweeping the ratio $K / T$ , RMSE is minimized for $K / T$ between roughly 0.5 and 0.9 — i.e. spend a majority of the budget on variational optimization, the rest on the final MC estimate. Setting $N \propto K$ recovers the theoretical $O (T^{- 1/2})$ rate (Fig. 1c).
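One way to see why a split above one half can be optimal is to minimize a proxy for terms I and II, $c_{mc} N^{-1/2} + c_{opt} K^{-1/2}$, subject to $N + K = T$. Setting the derivative to zero gives $K / N = (c_{opt} / c_{mc})^{2/3}$, so the optimizer gets the larger share whenever its constant is larger. The sketch below sweeps the ratio on a grid; the constants `c_mc` and `c_opt` are hypothetical and this is a model of the bound, not a reproduction of the experiment in Fig. 1d.

```python
import numpy as np

def best_split(total, c_mc=1.0, c_opt=2.0):
    """Grid-search the ratio K/T minimizing c_mc / sqrt(N) + c_opt / sqrt(K)."""
    ratios = np.linspace(0.05, 0.95, 91)    # candidate K/T values, step 0.01
    k = ratios * total                      # optimization budget
    n = total - k                           # Monte Carlo budget
    proxy = c_mc / np.sqrt(n) + c_opt / np.sqrt(k)
    return float(ratios[np.argmin(proxy)])

def closed_form_split(c_mc=1.0, c_opt=2.0):
    """Stationary point of the same proxy: K/T = r / (1 + r), r = (c_opt/c_mc)^(2/3)."""
    r = (c_opt / c_mc) ** (2.0 / 3.0)
    return r / (1.0 + r)

if __name__ == "__main__":
    print(best_split(1000), closed_form_split())
```

The minimizing ratio in this proxy does not depend on $T$, which is consistent with choosing $N \propto K$.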

Connections

Quantifies the payoff over NMC: $O (T^{- 1/2})$ vs $O (T^{- 1/3})$ .
Selection rules are referenced throughout Variational BOED - Overview and inherited by Foster 2020, which adds the design gradient.
Caveat: the theorem relies on strong assumptions (Assumption 1 for SGD convergence); in practice $ϕ$ may converge to a local optimum $ϕ^{†}$, which adds $∣ B(d, ϕ^{†}) - B(d, ϕ^{*}) ∣$ to term III.
