Likelihood-Free ACE and Gradient Estimation
Summary
Two extensions that make ACE deployable. (1) Likelihood-free ACE (Theorem 2): when the likelihood $p(y \mid \theta, \xi)$ is implicit, replacing it with an unnormalized approximation $f_\psi(\theta, y)$, trained jointly with the inference network $q_\phi(\theta \mid y)$, still yields a valid EIG lower bound, so design and inference are learned in one optimization. (2) Gradient estimators for $\nabla_\xi I_{\mathrm{ACE}}$: the high-variance score-function (REINFORCE) form, the lower-variance reparameterization form, and Rao–Blackwellization for discrete outcomes.
Overview
ACE/PCE as stated need a pointwise likelihood $p(y \mid \theta, \xi)$ and gradients of the bound in $\xi$ and $\phi$. Section 3.4 removes the first requirement (implicit models); Section 3.6 supplies the second (how to compute $\nabla_\xi$ and $\nabla_\phi$ of the bound with low variance). Together they let a single stochastic gradient ascent (SGA) loop optimize design, inference network, and (if needed) a likelihood surrogate simultaneously.
Main Content
Likelihood-free ACE
Theorem 2 — Unnormalized likelihood still bounds the EIG (Foster 2020)
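Consider a model $p(y \mid \theta, \xi)$ and inference network $q_\phi(\theta \mid y)$. Let $f_\psi(\theta, y)$ be an unnormalized likelihood approximation. Then
$$I(\xi) \;\ge\; \mathbb{E}\left[ \log \frac{f_\psi(\theta_0, y)}{\frac{1}{L+1} \sum_{\ell=0}^{L} \frac{p(\theta_\ell)\, f_\psi(\theta_\ell, y)}{q_\phi(\theta_\ell \mid y)}} \right],$$
with the expectation over $p(\theta_0)\, p(y \mid \theta_0, \xi)\, q_\phi(\theta_{1:L} \mid y)$. So one can train $\psi$ (likelihood surrogate), $\phi$ (inference network), and $\xi$ (design) jointly by maximizing a single lower bound. This is the basis for ACE on implicit-likelihood models such as random-effects models.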
This is the contrastive-bound analogue of Foster 2019’s [[Implicit Likelihood Estimator]], but crucially it preserves the lower-bound property (whereas that estimator only bounds the error), so it can be safely maximized over $\xi$, $\phi$, and $\psi$.
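A minimal sketch of how such a bound could be estimated by Monte Carlo, assuming a toy linear-Gaussian model ($\theta \sim N(0,1)$, $y \mid \theta, \xi \sim N(\xi\theta, 1)$) that is not taken from the paper. The function names `lf_ace_integrand` and `lf_ace_bound` are hypothetical, the inference network is replaced by the exact posterior for illustration, and the surrogate `log_f` is any user-supplied unnormalized log-likelihood.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def lf_ace_integrand(y, thetas, log_f, log_prior, log_q):
    """Likelihood-free ACE integrand for a single draw.

    thetas[0] is theta_0 (the sample that generated y); thetas[1:] are the
    contrastive samples drawn from q(. | y).
    """
    log_w = [log_prior(t) + log_f(t, y) - log_q(t, y) for t in thetas]
    log_denominator = logsumexp(log_w) - math.log(len(thetas))
    return log_f(thetas[0], y) - log_denominator


def lf_ace_bound(xi, log_f, num_outer=2000, num_contrastive=10, seed=0):
    """Monte Carlo estimate of the bound for the toy model (assumption)."""
    rng = random.Random(seed)
    post_var = 1.0 / (1.0 + xi ** 2)  # exact posterior variance of theta | y

    def log_prior(t):
        return log_normal(t, 0.0, 1.0)

    def log_q(t, y):  # stand-in for a trained inference network
        return log_normal(t, xi * y * post_var, post_var)

    total = 0.0
    for _ in range(num_outer):
        theta0 = rng.gauss(0.0, 1.0)
        y = rng.gauss(xi * theta0, 1.0)  # plays the role of the simulator
        contrastive = [
            rng.gauss(xi * y * post_var, math.sqrt(post_var))
            for _ in range(num_contrastive)
        ]
        thetas = [theta0] + contrastive
        total += lf_ace_integrand(y, thetas, log_f, log_prior, log_q)
    return total / num_outer
```

In this toy setting, a surrogate that equals the true log-likelihood up to an additive constant gives the same integrand as the exact likelihood, because the constant cancels between numerator and denominator. In practice both `log_f` and `log_q` would be learned, and the estimate would sit below the true EIG.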
Gradient estimation for ACE (Foster 2020 §3.6)
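Write $g(y, \theta_{0:L}, \xi, \phi) = \log \dfrac{p(y \mid \theta_0, \xi)}{\frac{1}{L+1} \sum_{\ell=0}^{L} \frac{p(\theta_\ell)\, p(y \mid \theta_\ell, \xi)}{q_\phi(\theta_\ell \mid y)}}$ for the integrand, so that $I_{\mathrm{ACE}}(\xi, \phi) = \mathbb{E}[g]$.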
Score-function (REINFORCE) gradient (Eqs. 16–17)
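$$\nabla_\xi I_{\mathrm{ACE}} = \mathbb{E}\left[ \nabla_\xi g + g\, \nabla_\xi \log p(y \mid \theta_0, \xi) \right],$$
with the expectation over $p(\theta_0)\, p(y \mid \theta_0, \xi)\, q_\phi(\theta_{1:L} \mid y)$. Unbiased but high variance.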
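The identity behind this estimator can be sketched in two steps. Here $g$ denotes the integrand of the bound, and the derivation assumes that differentiation and integration can be exchanged and that $q_\phi(\theta_{1:L} \mid y)$ has no dependence on $\xi$ once $y$ is held fixed:

$$
\begin{aligned}
\nabla_\xi I_{\mathrm{ACE}}
&= \nabla_\xi \int p(\theta_0)\, p(y \mid \theta_0, \xi)\, q_\phi(\theta_{1:L} \mid y)\, g \;\mathrm{d}\theta_{0:L}\, \mathrm{d}y \\
&= \int p(\theta_0)\, q_\phi(\theta_{1:L} \mid y) \left[ p(y \mid \theta_0, \xi)\, \nabla_\xi g + g\, \nabla_\xi p(y \mid \theta_0, \xi) \right] \mathrm{d}\theta_{0:L}\, \mathrm{d}y \\
&= \mathbb{E}\left[ \nabla_\xi g + g\, \nabla_\xi \log p(y \mid \theta_0, \xi) \right].
\end{aligned}
$$

The last line uses $\nabla_\xi p = p\, \nabla_\xi \log p$. The second term multiplies the integrand by a zero-mean score, which is where the high variance comes from.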
Reparameterized gradient (Eq. 18)
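Introduce noise variables $\epsilon, \epsilon_{1:L}$ independent of $\xi$ with $y = y(\theta_0, \xi, \epsilon)$ and $\theta_\ell = \theta_\ell(y, \phi, \epsilon_\ell)$ for $\ell = 1, \dots, L$. Then
$$\nabla_\xi I_{\mathrm{ACE}} = \mathbb{E}\left[ \frac{\mathrm{d}}{\mathrm{d}\xi}\, g\big(y(\theta_0, \xi, \epsilon),\, \theta_{0:L},\, \xi,\, \phi\big) \right],$$
with the expectation over $p(\theta_0)\, p(\epsilon)\, p(\epsilon_{1:L})$, where the total derivative flows through $y$ and, via $y$, through $\theta_{1:L}$. Typically much lower variance than the score-function estimator, which matters for hard design problems.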
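The variance gap between the two estimators can be illustrated on a toy problem. The sketch below assumes a linear-Gaussian model ($\theta \sim N(0,1)$, $y = \xi\theta_0 + \epsilon$ with $\epsilon \sim N(0,1)$) and fixes the contrastive distribution to the prior, so the bound reduces to PCE and the contrastive samples do not depend on $y$. The model, the hand-written derivatives, and all function names are illustrative assumptions rather than the paper's implementation.

```python
import math
import random


def log_lik(y, theta, xi):
    """log N(y; xi * theta, 1)."""
    return -0.5 * (math.log(2.0 * math.pi) + (y - xi * theta) ** 2)


def integrand(y, thetas, xi):
    """g = log p(y|theta_0) - log mean_l p(y|theta_l); thetas[0] is theta_0."""
    logs = [log_lik(y, t, xi) for t in thetas]
    m = max(logs)
    log_mean = m + math.log(sum(math.exp(v - m) for v in logs) / len(logs))
    return logs[0] - log_mean


def partials(y, thetas, xi):
    """Return (dg/dxi with y held fixed, dg/dy with xi held fixed)."""
    logs = [log_lik(y, t, xi) for t in thetas]
    m = max(logs)
    exps = [math.exp(v - m) for v in logs]
    z = sum(exps)
    weights = [e / z for e in exps]  # softmax over theta_{0:L}
    resid = [y - xi * t for t in thetas]
    dlog_dxi = [r * t for r, t in zip(resid, thetas)]  # d log p / d xi
    dlog_dy = [-r for r in resid]  # d log p / d y
    dg_dxi = dlog_dxi[0] - sum(w * d for w, d in zip(weights, dlog_dxi))
    dg_dy = dlog_dy[0] - sum(w * d for w, d in zip(weights, dlog_dy))
    return dg_dxi, dg_dy


def score_function_sample(theta0, eps, contrastive, xi):
    """One score-function draw: y is treated as a fixed sample."""
    y = xi * theta0 + eps
    thetas = [theta0] + contrastive
    dg_dxi, _ = partials(y, thetas, xi)
    score = (y - xi * theta0) * theta0  # d/dxi log p(y | theta_0, xi)
    return dg_dxi + integrand(y, thetas, xi) * score


def reparam_sample(theta0, eps, contrastive, xi):
    """One reparameterized draw: differentiate through y = xi*theta_0 + eps."""
    y = xi * theta0 + eps
    thetas = [theta0] + contrastive
    dg_dxi, dg_dy = partials(y, thetas, xi)
    return dg_dxi + dg_dy * theta0  # chain rule with dy/dxi = theta_0


def mean_and_variance(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, var


def compare(xi=1.0, num_samples=5000, num_contrastive=10, seed=0):
    """Return ((mean, var) score-function, (mean, var) reparameterized)."""
    rng = random.Random(seed)
    sf, rp = [], []
    for _ in range(num_samples):
        theta0 = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, 1.0)
        contrastive = [rng.gauss(0.0, 1.0) for _ in range(num_contrastive)]
        sf.append(score_function_sample(theta0, eps, contrastive, xi))
        rp.append(reparam_sample(theta0, eps, contrastive, xi))
    return mean_and_variance(sf), mean_and_variance(rp)
```

Both estimators target the gradient of the same finite-$L$ bound, so their sample means should agree up to Monte Carlo error while their per-sample variances differ. With a learned, $y$-dependent $q_\phi$, the reparameterized path would also have to pass through the contrastive samples, which this sketch omits.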
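Rao–Blackwellized gradient for discrete $y$ (Eq. 19)
When $y$ is discrete, sum over outcomes instead of sampling:
$$\nabla_\xi I_{\mathrm{ACE}} = \mathbb{E}_{p(\theta_0)}\left[ \sum_{y} \mathbb{E}_{q_\phi(\theta_{1:L} \mid y)}\Big[ \nabla_\xi \big( p(y \mid \theta_0, \xi)\, g(y, \theta_{0:L}, \xi, \phi) \big) \Big] \right],$$
with the outer expectation over $p(\theta_0)$ and, for each outcome $y$, the inner expectation over $q_\phi(\theta_{1:L} \mid y)$. Used for the death-process experiment (66 discrete outcomes).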
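A small sketch of the summed-out estimator, assuming a toy binary-outcome model ($\theta \sim N(0,1)$, $y \in \{0, 1\}$ with $p(y = 1 \mid \theta, \xi) = \mathrm{sigmoid}(\xi\theta)$) rather than the death process itself. The contrastive distribution is fixed to the prior, so the same contrastive samples serve every outcome. All function names are hypothetical and the derivatives are written by hand.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lik(y, theta, xi):
    """p(y | theta, xi) for y in {0, 1}: Bernoulli(sigmoid(xi * theta))."""
    p1 = sigmoid(xi * theta)
    return p1 if y == 1 else 1.0 - p1


def dlik_dxi(y, theta, xi):
    """Derivative of p(y | theta, xi) with respect to xi."""
    p1 = sigmoid(xi * theta)
    d = p1 * (1.0 - p1) * theta
    return d if y == 1 else -d


def weighted_integrand(thetas, xi):
    """sum_y p(y|theta_0, xi) * g(y, theta_{0:L}, xi); thetas[0] is theta_0."""
    total = 0.0
    for y in (0, 1):
        liks = [lik(y, t, xi) for t in thetas]
        g = math.log(liks[0]) - math.log(sum(liks) / len(liks))
        total += liks[0] * g
    return total


def rao_blackwell_grad(thetas, xi):
    """d/dxi of weighted_integrand, summing analytically over both outcomes."""
    grad = 0.0
    for y in (0, 1):
        liks = [lik(y, t, xi) for t in thetas]
        dliks = [dlik_dxi(y, t, xi) for t in thetas]
        g = math.log(liks[0]) - math.log(sum(liks) / len(liks))
        dg = dliks[0] / liks[0] - sum(dliks) / sum(liks)
        grad += dliks[0] * g + liks[0] * dg  # product rule on p * g
    return grad


def estimate_gradient(xi, num_samples=2000, num_contrastive=10, seed=0):
    """Average the per-draw gradient over prior samples of theta_{0:L}."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        thetas = [rng.gauss(0.0, 1.0) for _ in range(num_contrastive + 1)]
        total += rao_blackwell_grad(thetas, xi)
    return total / num_samples
```

Because no outcome is sampled, the only remaining randomness is in the draws of $\theta_{0:L}$. The cost per draw grows linearly with the number of outcomes, which stays manageable for a few dozen outcomes.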
The $\phi$-gradient is handled analogously: if the contrastive samples $\theta_{1:L}$ are reparameterizable, use the double reparameterization of Tucker et al. (2018).
Examples
Where each gradient is used (Foster 2020 §4)
Rao–Blackwellization drives the discrete death process (66 outcomes). Reparameterization is the workhorse for the continuous high-dimensional designs (400-D regression, 100-D docking). Likelihood-free ACE (Theorem 2) is what allows the gradient methods to be applied to implicit-likelihood models like the mixed-effects / CES settings.
Connections
- Extends Adaptive Contrastive Estimation (ACE) and Prior Contrastive Estimation (PCE) to implicit models and supplies their training gradients.
- Bound-preserving analogue of [[Implicit Likelihood Estimator]] (Foster 2019), improving it from an error bound to a valid EIG lower bound.
- Independent parallel work: Kleinegesse & Gutmann (2020) showed the MINE-style MI bound can likewise be used in implicit settings, collapsing posterior and likelihood approximations into one critic — see Optimization and Gradient Schemes for BED.
See Also
- Adaptive Contrastive Estimation (ACE) — the bound being differentiated/extended
- High-Dimensional Design Applications — the experiments these gradients enable
- Optimization and Gradient Schemes for BED — Rainforth 2023’s view of stochastic-gradient design