Adaptive Contrastive Estimation (ACE)

Summary

Adaptive Contrastive Estimation (ACE) is the expected information gain (EIG) lower bound recommended by Foster 2020. It augments the Barber–Agakov (BA) bound with $L$ contrastive samples $θ_{1:L} \sim q_{ϕ}(θ ∣ y)$ in the denominator, alongside the original sample $θ_{0}$ from which $y$ was drawn. The resulting bound $I_{ACE}(ξ, ϕ, L)$ is tight in two complementary regimes, when the inference network $q_{ϕ}$ is good or when $L \to \infty$, and is optimized jointly over the design $ξ$ and the parameters $ϕ$ by stochastic gradient ascent.

Overview

The Barber–Agakov bound $I_{BA}$ is tight only if the inference network $q_{ϕ}(θ ∣ y)$ can represent the true posterior; when it cannot, $I_{BA}$ is loose. ACE fixes this adaptively: it borrows VNMC's idea of contrastive (importance) samples, but arranges them so that the result remains a valid lower bound. VNMC itself is an upper bound, which makes it unusable for joint maximization over $ξ$. Including the original sample $θ_{0}$ in the denominator prevents the catastrophic under-estimation of $p(y ∣ ξ)$ that pure contrastive samples would cause.

Main Content

Definition: ACE lower bound (Foster 2020, Eq. 11)

With $θ_{0} \sim p(θ)$, $y \sim p(y ∣ θ_{0}, ξ)$, and contrastive samples $θ_{1:L} \sim q_{ϕ}(θ ∣ y)$:
$I_{ACE}(ξ, ϕ, L) = \mathbb{E}\left[\log \frac{p(y ∣ θ_{0}, ξ)}{\frac{1}{L+1} \sum_{ℓ=0}^{L} \frac{p(θ_{ℓ})\, p(y ∣ θ_{ℓ}, ξ)}{q_{ϕ}(θ_{ℓ} ∣ y)}}\right],$
where the expectation is over $p(θ_{0})\, p(y ∣ θ_{0}, ξ)\, q_{ϕ}(θ_{1:L} ∣ y)$. The denominator is an importance-sampling estimate of the marginal $p(y ∣ ξ)$ built from the $L$ contrasts plus $θ_{0}$.
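The definition can be turned into a nested Monte Carlo estimator. The sketch below is an illustration only, not Foster 2020's implementation: it assumes a toy linear-Gaussian model ($θ \sim N(0, 1)$, $y ∣ θ, ξ \sim N(ξθ, 1)$) chosen because its EIG is known in closed form, $\frac{1}{2}\log(1 + ξ^{2})$, and a hypothetical Gaussian inference network $q_{ϕ}(θ ∣ y) = N(a y, s^{2})$ with $ϕ = (a, s)$. All function names are made up for this example.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def true_eig(xi):
    """Closed-form EIG of the assumed toy model."""
    return 0.5 * math.log(1.0 + xi ** 2)


def ace_estimate(xi, a, s, L, n_outer=5000, seed=0):
    """Monte Carlo estimate of I_ACE(xi, phi, L) for the assumed toy model.

    Assumptions (not from the source): theta ~ N(0, 1), y | theta ~ N(xi * theta, 1),
    and q_phi(theta | y) = N(a * y, s^2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        theta0 = rng.gauss(0.0, 1.0)                   # theta_0 ~ p(theta)
        y = rng.gauss(xi * theta0, 1.0)                # y ~ p(y | theta_0, xi)
        thetas = [theta0] + [rng.gauss(a * y, s) for _ in range(L)]  # add L contrasts from q
        # log of p(theta_l) p(y | theta_l, xi) / q(theta_l | y) for l = 0..L
        log_w = [
            log_normal(t, 0.0, 1.0) + log_normal(y, xi * t, 1.0) - log_normal(t, a * y, s)
            for t in thetas
        ]
        log_denominator = logsumexp(log_w) - math.log(L + 1)
        total += log_normal(y, xi * theta0, 1.0) - log_denominator
    return total / n_outer


if __name__ == "__main__":
    xi = 2.0
    print("true EIG          :", true_eig(xi))
    print("prior as q, L = 0 :", ace_estimate(xi, a=0.0, s=1.0, L=0))
    print("prior as q, L = 30:", ace_estimate(xi, a=0.0, s=1.0, L=30, n_outer=2000))
    print("exact posterior q :", ace_estimate(xi, a=0.4, s=1.0 / math.sqrt(5.0), L=5))
```

In this toy model the exact posterior is $N(ξ y / (1 + ξ^{2}),\ 1 / (1 + ξ^{2}))$, so at $ξ = 2$ the last call uses a perfect network. The first call uses the prior as $q_{ϕ}$ with no contrasts, for which every term of the estimator cancels to zero.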

Theorem 1 — Properties of ACE (Foster 2020)

For any model $p (θ) p (y ∣ θ, ξ)$ and inference network $q_{ϕ} (θ ∣ y)$ :

1. Lower bound with KL error: $I(ξ) - I_{ACE}(ξ, ϕ, L) = \mathbb{E}_{p(y ∣ ξ)}\left[\mathrm{KL}\left(P(θ_{0:L} ∣ y) \,\Big\|\, \prod_{ℓ=0}^{L} q_{ϕ}(θ_{ℓ} ∣ y)\right)\right] \geq 0$, where $P(θ_{0:L} ∣ y) = \frac{1}{L+1} \sum_{ℓ=0}^{L} p(θ_{ℓ} ∣ y, ξ) \prod_{k \neq ℓ} q_{ϕ}(θ_{k} ∣ y)$.

2. Asymptotic exactness: $\lim_{L \to \infty} I_{ACE}(ξ, ϕ, L) = I(ξ)$.

3. Monotone in $L$: $I_{ACE}(ξ, ϕ, L_{2}) \geq I_{ACE}(ξ, ϕ, L_{1})$ for $L_{2} \geq L_{1} \geq 0$.

4. Exact with perfect network: if $q_{ϕ}(θ ∣ y) = p(θ ∣ y, ξ)$ then $I_{ACE}(ξ, ϕ, L) = I(ξ)$ for all $L$.
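
A short derivation sketch of the KL error term in Theorem 1, in the notation above; the shorthand $r_{ℓ}$ is introduced here only for illustration. By Bayes' rule each importance weight factorizes as
$\frac{p(θ_{ℓ})\, p(y ∣ θ_{ℓ}, ξ)}{q_{ϕ}(θ_{ℓ} ∣ y)} = p(y ∣ ξ)\, r_{ℓ}, \qquad r_{ℓ} = \frac{p(θ_{ℓ} ∣ y, ξ)}{q_{ϕ}(θ_{ℓ} ∣ y)}.$
Subtracting the ACE objective from $I(ξ) = \mathbb{E}\left[\log \frac{p(y ∣ θ_{0}, ξ)}{p(y ∣ ξ)}\right]$ then gives
$I(ξ) - I_{ACE}(ξ, ϕ, L) = \mathbb{E}\left[\log \frac{1}{L+1} \sum_{ℓ=0}^{L} r_{ℓ}\right] = \mathbb{E}\left[\log \frac{P(θ_{0:L} ∣ y)}{\prod_{ℓ=0}^{L} q_{ϕ}(θ_{ℓ} ∣ y)}\right],$
with the expectation over $p(y ∣ ξ)\, p(θ_{0} ∣ y, ξ) \prod_{ℓ=1}^{L} q_{ϕ}(θ_{ℓ} ∣ y)$. The integrand is symmetric under permutations of $θ_{0:L}$, so the sampling distribution can be replaced by its symmetrized version, which is exactly $P(θ_{0:L} ∣ y)$. The right-hand side is therefore the expected KL divergence from the theorem and is non-negative. When $q_{ϕ}(θ ∣ y) = p(θ ∣ y, ξ)$, every $r_{ℓ} = 1$ and the gap vanishes for any $L$, which is the perfect-network property.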

Why ACE beats BA

$I_{BA}$ (Foster 2020, Eq. 7) is recovered as the $L = 0$ case. ACE adds a second route to tightness: even a poor $q_{ϕ}$ gives an accurate bound if $L$ is large (property 2). This is the same adaptive-tightening logic as VNMC, but oriented to give a lower bound suitable for joint maximization over $(ξ, ϕ)$. Foster 2020 reports that across all five experiments ACE generally does at least as well as the better of BA and PCE, hence the recommendation to use it as the default.
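
The $L = 0$ reduction can be checked directly from the definition. With no contrastive samples, the denominator contains only the $θ_{0}$ term, and the likelihood cancels:
$I_{ACE}(ξ, ϕ, 0) = \mathbb{E}\left[\log \frac{p(y ∣ θ_{0}, ξ)\, q_{ϕ}(θ_{0} ∣ y)}{p(θ_{0})\, p(y ∣ θ_{0}, ξ)}\right] = \mathbb{E}\left[\log q_{ϕ}(θ_{0} ∣ y)\right] + H[p(θ)],$
where $H[p(θ)] = -\mathbb{E}_{p(θ)}[\log p(θ)]$ is the prior entropy. The right-hand side is the usual form of the BA bound, so the bound depends on the data only through the inference network, which is why a poor $q_{ϕ}$ leaves it loose.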

Connection to InfoNCE

ACE generalizes the InfoNCE mutual-information bound of representation learning: with $θ \leftrightarrow x$ and $y \leftrightarrow z$, the contrastive denominator plays the role of the InfoNCE critic. Prior Contrastive Estimation (PCE) is the special case that uses the prior as the contrastive distribution.
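
When the contrasts come from the prior, the importance weights reduce to plain likelihoods, and the estimator takes the familiar InfoNCE shape. The sketch below illustrates this on an assumed toy model ($θ \sim N(0, 1)$, $y ∣ θ, ξ \sim N(ξθ, 1)$, not taken from the source) with hypothetical function names. It also shows a consequence of the formula: because the denominator always contains the $θ_{0}$ term, each summand is at most $\log(L + 1)$, so the PCE estimate cannot exceed $\log(L + 1)$ however informative the design is.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def pce_estimate(xi, L, n_outer=2000, seed=0):
    """Monte Carlo PCE estimate for the assumed toy model
    theta ~ N(0, 1), y | theta ~ N(xi * theta, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        theta0 = rng.gauss(0.0, 1.0)
        y = rng.gauss(xi * theta0, 1.0)
        # Contrasts are drawn from the prior, so p(theta) / q(theta | y) = 1
        # and only the likelihoods remain in the denominator.
        thetas = [theta0] + [rng.gauss(0.0, 1.0) for _ in range(L)]
        log_lik = [log_normal(y, xi * t, 1.0) for t in thetas]
        total += log_lik[0] - (logsumexp(log_lik) - math.log(L + 1))
    return total / n_outer


if __name__ == "__main__":
    xi, L = 50.0, 4
    print("true EIG    :", 0.5 * math.log(1.0 + xi ** 2))
    print("log(L + 1)  :", math.log(L + 1))
    print("PCE estimate:", pce_estimate(xi, L))
```

With a very informative design such as $ξ = 50$ in this toy model, the true EIG is far above $\log 5$, so the PCE estimate with $L = 4$ saturates near its cap. This is one motivation for replacing the prior with an adaptive $q_{ϕ}$.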

Examples

Death process trajectory (Foster 2020 §4.2, Figs. 1–2)

On the 2-D death-process design ($ξ_{1}, ξ_{2} \geq 0$; infected counts are measured at two times), ACE's SGA trajectory climbs the known EIG surface to the optimum, reaching a final EIG of $0.9830 \pm 0.0001$. This beats BA (0.9822), PCE (0.9822), and Bayesian optimization + NMC (0.9732), and ACE also converges faster in wall-clock time.

Connections

Improves BA ( $L = 0$ special case) by adaptive contrastive tightening.
Lower-bound dual of VNMC (which is the analogous upper bound); pairing ACE-lower with VNMC-upper traps the true EIG to verify designs (High-Dimensional Design Applications).
Specializes to Prior Contrastive Estimation (PCE) when $q_{ϕ}$ is replaced by the prior; both are gradient-optimized via Likelihood-Free ACE and Gradient Estimation.
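
The ACE-lower / VNMC-upper pairing mentioned above can be sketched on a toy problem. This is an illustration under stated assumptions, not the source's procedure: the model ($θ \sim N(0, 1)$, $y ∣ θ, ξ \sim N(ξθ, 1)$), the Gaussian inference network $q_{ϕ}(θ ∣ y) = N(a y, s^{2})$, and all function names are hypothetical. The VNMC estimate is taken in its usual form, which averages the same importance weights over the $L$ contrasts only, without the $θ_{0}$ term.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def eig_bracket(xi, a, s, L, n_outer=4000, seed=0):
    """Return (ACE lower estimate, VNMC upper estimate) from shared samples.

    Assumed toy model: theta ~ N(0, 1), y | theta ~ N(xi * theta, 1),
    q_phi(theta | y) = N(a * y, s^2). Requires L >= 1.
    """
    rng = random.Random(seed)
    lower_total = 0.0
    upper_total = 0.0

    def log_weight(theta, y):
        # log of p(theta) p(y | theta, xi) / q(theta | y)
        return (log_normal(theta, 0.0, 1.0) + log_normal(y, xi * theta, 1.0)
                - log_normal(theta, a * y, s))

    for _ in range(n_outer):
        theta0 = rng.gauss(0.0, 1.0)
        y = rng.gauss(xi * theta0, 1.0)
        contrasts = [rng.gauss(a * y, s) for _ in range(L)]
        log_w = [log_weight(t, y) for t in contrasts]
        log_lik0 = log_normal(y, xi * theta0, 1.0)
        # ACE: theta_0 is included in the denominator (lower bound).
        lower_total += log_lik0 - (logsumexp([log_weight(theta0, y)] + log_w) - math.log(L + 1))
        # VNMC: contrasts only (upper bound).
        upper_total += log_lik0 - (logsumexp(log_w) - math.log(L))
    return lower_total / n_outer, upper_total / n_outer


if __name__ == "__main__":
    xi = 2.0
    lower, upper = eig_bracket(xi, a=0.3, s=0.8, L=3)
    print("ACE lower :", lower)
    print("true EIG  :", 0.5 * math.log(1.0 + xi ** 2))
    print("VNMC upper:", upper)
```

Both numbers are Monte Carlo estimates, so the bracket holds only up to sampling error. Reusing the same samples for both estimates, as done here, reduces the noise in the width of the bracket.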
