Adaptive Contrastive Estimation (ACE)
Summary
Adaptive Contrastive Estimation (ACE) is the expected information gain (EIG) lower bound recommended by Foster 2020. It augments the Barber–Agakov (BA) bound with $L$ contrastive samples in the denominator, alongside the original sample $\theta_0$ from which $y$ was drawn. The resulting bound is tight in two complementary regimes (when the inference network $q_\phi(\theta \mid y)$ is good, or when $L$ is large) and is optimized jointly over the design $d$ and the network parameters $\phi$ by stochastic gradient ascent.
Overview
The Barber–Agakov bound $I_{\mathrm{BA}}(d, \phi)$ is tight only if the inference network $q_\phi(\theta \mid y)$ can represent the true posterior $p(\theta \mid y, d)$. When it cannot, $I_{\mathrm{BA}}$ is loose. ACE fixes this adaptively: it borrows VNMC’s idea of contrastive/importance samples, but arranges them so that the result is still a valid lower bound (VNMC is an upper bound, so it cannot be used for joint maximization over $(d, \phi)$). Including the original sample $\theta_0$ in the denominator prevents the catastrophic under-estimation of the marginal likelihood $p(y \mid d)$ that pure contrastive samples would cause.
Main Content
Definition: ACE lower bound (Foster 2020, Eq. 11)
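With $\theta_0 \sim p(\theta)$, $y \sim p(y \mid \theta_0, d)$, and contrastive samples $\theta_1, \dots, \theta_L \sim q_\phi(\theta \mid y)$:

$$
I_{\mathrm{ACE}}(d, \phi, L) = \mathbb{E}\left[ \log \frac{p(y \mid \theta_0, d)}{\frac{1}{L+1} \sum_{\ell=0}^{L} \frac{p(\theta_\ell)\, p(y \mid \theta_\ell, d)}{q_\phi(\theta_\ell \mid y)}} \right]
$$

with the expectation taken over $p(\theta_0)\, p(y \mid \theta_0, d) \prod_{\ell=1}^{L} q_\phi(\theta_\ell \mid y)$. The denominator is a self-normalized importance estimate of the marginal $p(y \mid d)$ using the $L$ contrasts plus $\theta_0$.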
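The definition above can be checked numerically on a toy problem. The following is a minimal sketch, not the paper's implementation: it assumes a hypothetical linear-Gaussian model ($\theta \sim \mathcal{N}(0, 1)$, $y \mid \theta, d \sim \mathcal{N}(d\theta, \sigma^2)$), for which the EIG is $\tfrac{1}{2}\log(1 + d^2/\sigma^2)$, and a hypothetical Gaussian inference network $q_\phi(\theta \mid y) = \mathcal{N}(a y, s^2)$ with $\phi = (a, s)$. The names `ace_estimate`, `log_normal`, and `log_mean_exp` are illustrative.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def ace_estimate(d, a, s, L, n_outer=5000, sigma=1.0, seed=0):
    """Monte Carlo estimate of the ACE bound for the assumed toy model.

    Model (assumption): theta ~ N(0, 1), y | theta ~ N(d * theta, sigma^2).
    Inference network (assumption): q(theta | y) = N(a * y, s^2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        theta0 = rng.gauss(0.0, 1.0)           # original sample from the prior
        y = rng.gauss(d * theta0, sigma)       # outcome simulated from theta0
        # theta0 followed by L contrastive samples drawn from q(theta | y)
        thetas = [theta0] + [rng.gauss(a * y, s) for _ in range(L)]
        # log importance weights: log p(theta) + log p(y | theta, d) - log q(theta | y)
        log_w = [
            log_normal(t, 0.0, 1.0) + log_normal(y, d * t, sigma) - log_normal(t, a * y, s)
            for t in thetas
        ]
        total += log_normal(y, d * theta0, sigma) - log_mean_exp(log_w)
    return total / n_outer


true_eig = 0.5 * math.log(1 + 1.0 ** 2)        # analytic EIG at d = 1, sigma = 1
exact_q = ace_estimate(1.0, 0.5, math.sqrt(0.5), L=10)  # q equals the true posterior
poor_q_l0 = ace_estimate(1.0, 0.0, 1.0, L=0)   # q equals the prior, no contrasts
poor_q_l30 = ace_estimate(1.0, 0.0, 1.0, L=30) # same poor q, more contrasts
```

In this sketch, the exact-posterior setting should land near `true_eig` for any `L`, while the prior-as-network setting gives essentially zero at `L=0` and moves toward `true_eig` as `L` grows, up to Monte Carlo noise.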
Theorem 1 — Properties of ACE (Foster 2020)
For any model $p(\theta)\, p(y \mid \theta, d)$ and inference network $q_\phi(\theta \mid y)$:
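- Lower bound with KL error: $I(d) - I_{\mathrm{ACE}}(d, \phi, L) = \mathbb{E}_{p(y \mid d)}\left[\mathrm{KL}\left(P(\theta_{0:L} \mid y) \,\|\, Q(\theta_{0:L} \mid y)\right)\right] \ge 0$, where $P(\theta_{0:L} \mid y) = \frac{1}{L+1} \sum_{\ell=0}^{L} p(\theta_\ell \mid y, d) \prod_{k \ne \ell} q_\phi(\theta_k \mid y)$ and $Q(\theta_{0:L} \mid y) = \prod_{\ell=0}^{L} q_\phi(\theta_\ell \mid y)$.
- Asymptotic exactness: $I_{\mathrm{ACE}}(d, \phi, L) \to I(d)$ as $L \to \infty$.
- Monotone in $L$: $I_{\mathrm{ACE}}(d, \phi, L_1) \le I_{\mathrm{ACE}}(d, \phi, L_2)$ for $L_1 \le L_2$.
- Exact with perfect network: if $q_\phi(\theta \mid y) = p(\theta \mid y, d)$ then $I_{\mathrm{ACE}}(d, \phi, L) = I(d)$ for all $L$.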
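The first property can be seen directly from the definition. The derivation below is a sketch worked out from the ACE definition rather than quoted from the paper; the symbols $P$ and $Q$ are introduced here for illustration, with $P(\theta_{0:L} \mid y) = \frac{1}{L+1} \sum_{\ell} p(\theta_\ell \mid y, d) \prod_{k \ne \ell} q_\phi(\theta_k \mid y)$ and $Q(\theta_{0:L} \mid y) = \prod_{\ell} q_\phi(\theta_\ell \mid y)$.

```latex
\begin{aligned}
I(d) - I_{\mathrm{ACE}}(d,\phi,L)
 &= \mathbb{E}\!\left[\log \frac{p(y \mid \theta_0, d)}{p(y \mid d)}\right]
  - \mathbb{E}\!\left[\log \frac{p(y \mid \theta_0, d)}
      {\frac{1}{L+1}\sum_{\ell=0}^{L} \frac{p(\theta_\ell)\,p(y \mid \theta_\ell, d)}{q_\phi(\theta_\ell \mid y)}}\right] \\
 &= \mathbb{E}\!\left[\log \frac{1}{L+1}\sum_{\ell=0}^{L}
      \frac{p(\theta_\ell \mid y, d)}{q_\phi(\theta_\ell \mid y)}\right]
  \qquad \text{(Bayes' rule: } p(\theta)\,p(y \mid \theta, d) = p(\theta \mid y, d)\,p(y \mid d)\text{)} \\
 &= \mathbb{E}_{p(y \mid d)}\,\mathbb{E}_{P(\theta_{0:L} \mid y)}\!\left[
      \log \frac{P(\theta_{0:L} \mid y)}{Q(\theta_{0:L} \mid y)}\right]
  = \mathbb{E}_{p(y \mid d)}\!\left[\mathrm{KL}\!\left(P \,\|\, Q\right)\right] \;\ge\; 0 .
\end{aligned}
```

The last step uses the fact that the integrand is symmetric in $\theta_0, \dots, \theta_L$, so the sampling distribution $p(\theta_0 \mid y, d) \prod_{\ell \ge 1} q_\phi(\theta_\ell \mid y)$ can be replaced by its symmetrized version $P$, and that $P / Q$ equals the averaged ratio inside the logarithm. When $q_\phi$ equals the true posterior, $P = Q$ and the gap vanishes, which is the fourth property.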
Why ACE beats BA
The BA bound $I_{\mathrm{BA}}(d, \phi)$ (Foster 2020, Eq. 7) is recovered as the $L = 0$ case. ACE adds a second route to tightness: even a poor $q_\phi$ gives an accurate bound if $L$ is large (property 2). This is the same adaptive-tightening logic as VNMC, but oriented to give a lower bound suitable for joint maximization. Foster 2020 reports that, across all five experiments, ACE generally does at least as well as the better of BA and PCE, hence the recommendation to use it as the default.
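To make the reduction to BA explicit, setting $L = 0$ in the ACE definition leaves only the $\theta_0$ term in the denominator. The short calculation below is a sketch from the definition; $H[p(\theta)]$ denotes the entropy of the prior.

```latex
\begin{aligned}
I_{\mathrm{ACE}}(d,\phi,0)
 &= \mathbb{E}_{p(\theta_0)\,p(y \mid \theta_0, d)}\!\left[
      \log \frac{p(y \mid \theta_0, d)\; q_\phi(\theta_0 \mid y)}
                {p(\theta_0)\, p(y \mid \theta_0, d)}\right] \\
 &= \mathbb{E}_{p(\theta_0)\,p(y \mid \theta_0, d)}\!\left[\log q_\phi(\theta_0 \mid y)\right]
    + H\!\left[p(\theta)\right]
  \;=\; I_{\mathrm{BA}}(d,\phi) .
\end{aligned}
```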
Connection to InfoNCE
ACE generalizes the InfoNCE mutual-information bound of representation learning: treating $\theta$ and $y$ as the paired variables, the contrastive denominator plays the role of the InfoNCE critic. Prior Contrastive Estimation (PCE) is the special case using the prior as the contrastive distribution.
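The PCE special case mentioned above can be sketched in a few lines. This is an illustrative example, not code from the paper: it assumes the same hypothetical linear-Gaussian toy model ($\theta \sim \mathcal{N}(0, 1)$, $y \mid \theta, d \sim \mathcal{N}(d\theta, \sigma^2)$), and `pce_estimate` is a made-up name. Because the contrasts come from the prior, the prior terms cancel and the denominator is a plain average of likelihoods, which caps every term at $\log(L + 1)$, as with InfoNCE.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def pce_estimate(d, L, n_outer=5000, sigma=1.0, seed=0):
    """Monte Carlo estimate of the PCE bound for the assumed toy model.

    Contrastive samples are drawn from the prior N(0, 1), so no inference
    network is learned.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        theta0 = rng.gauss(0.0, 1.0)
        y = rng.gauss(d * theta0, sigma)
        # theta0 plus L contrasts, all from the prior
        thetas = [theta0] + [rng.gauss(0.0, 1.0) for _ in range(L)]
        log_liks = [log_normal(y, d * t, sigma) for t in thetas]
        # each term is at most log(L + 1) because the numerator is in the sum
        total += log_liks[0] - log_mean_exp(log_liks)
    return total / n_outer


# With d = 3 the analytic EIG, 0.5 * log(10), exceeds log(2), so L = 1 saturates.
capped = pce_estimate(3.0, L=1)
# With d = 1 and more contrasts the estimate approaches 0.5 * log(2).
close = pce_estimate(1.0, L=30)
```

The cap at $\log(L + 1)$ is the practical reason a learned contrastive distribution can help: when the EIG is large, prior samples alone need many contrasts before the bound becomes informative.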
Examples
Death process trajectory (Foster 2020 §4.2, Figs. 1–2)
On the 2-D death-process design problem (infected counts are measured at two times), ACE’s SGA trajectory climbs the known EIG surface to the optimum. Its final EIG exceeds those of BA (0.9822), PCE (0.9822), and Bayesian optimization + NMC (0.9732), and it converges faster in wall-clock time.
Connections
- Improves BA ($L = 0$ special case) by adaptive contrastive tightening.
- Lower-bound dual of VNMC (which is the analogous upper bound); pairing ACE-lower with VNMC-upper traps the true EIG to verify designs (High-Dimensional Design Applications).
- Specializes to Prior Contrastive Estimation (PCE) when $q_\phi(\theta \mid y)$ is replaced by the prior $p(\theta)$; both are gradient-optimized via Likelihood-Free ACE and Gradient Estimation.
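The bracketing idea in the second connection can be illustrated on a toy problem. The sketch below is an assumption-laden example, not the procedure from either paper: it reuses a hypothetical linear-Gaussian model ($\theta \sim \mathcal{N}(0, 1)$, $y \mid \theta, d \sim \mathcal{N}(d\theta, \sigma^2)$) and a hypothetical Gaussian network $q(\theta \mid y) = \mathcal{N}(a y, s^2)$, and it takes the VNMC estimator to be the same ratio as ACE with $\theta_0$ left out of the denominator. The name `bracket_eig` is illustrative.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def bracket_eig(d, a, s, L, n_outer=5000, sigma=1.0, seed=0):
    """Return (ACE lower estimate, VNMC upper estimate) from shared samples.

    Requires L >= 1 so that the VNMC denominator is non-empty.
    """
    rng = random.Random(seed)

    def log_weight(theta, y):
        # log p(theta) + log p(y | theta, d) - log q(theta | y)
        return (log_normal(theta, 0.0, 1.0) + log_normal(y, d * theta, sigma)
                - log_normal(theta, a * y, s))

    lower = 0.0
    upper = 0.0
    for _ in range(n_outer):
        theta0 = rng.gauss(0.0, 1.0)
        y = rng.gauss(d * theta0, sigma)
        log_num = log_normal(y, d * theta0, sigma)
        contrast_w = [log_weight(rng.gauss(a * y, s), y) for _ in range(L)]
        # ACE: theta0 is included in the denominator, giving a lower bound
        lower += log_num - log_mean_exp([log_weight(theta0, y)] + contrast_w)
        # VNMC: contrasts only, giving an upper bound in expectation
        upper += log_num - log_mean_exp(contrast_w)
    return lower / n_outer, upper / n_outer


lo, hi = bracket_eig(d=1.0, a=0.3, s=1.0, L=20)
true_eig = 0.5 * math.log(2.0)  # analytic EIG for d = 1, sigma = 1
```

Both quantities are Monte Carlo estimates, so the ordering `lo <= true_eig <= hi` holds only in expectation; a narrow gap between the two is evidence that the design's EIG has been pinned down.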
See Also
- Prior Contrastive Estimation (PCE) — the no-learning contrastive variant
- Likelihood-Free ACE and Gradient Estimation — Theorem 2 and the estimators
- Variational NMC Estimator — the upper-bound counterpart from Foster 2019