Likelihood-Free ACE and Gradient Estimation

Summary

Two extensions that make ACE deployable. (1) Likelihood-free ACE (Theorem 2): when the likelihood $p(y \mid \theta, \xi)$ is implicit, replacing it with an unnormalized approximation $f_\psi$, trained jointly with the design $\xi$ and the inference network $q(\theta \mid y; \phi)$, still yields a valid EIG lower bound, so design and inference are learned in one optimization. (2) Gradient estimators for the bound with respect to $\xi$: the high-variance score-function (REINFORCE) form, the lower-variance reparameterization form, and Rao–Blackwellization for discrete outcomes.

Overview

ACE/PCE as stated need a pointwise likelihood $p(y \mid \theta, \xi)$ and gradients of the bound in $(\xi, \phi)$. Section 3.4 removes the first requirement (implicit models); Section 3.6 supplies the second (how to compute $\partial/\partial\xi$ and $\partial/\partial\phi$ of the bound with low variance). Together they let a single stochastic gradient ascent (SGA) loop optimize design, inference network, and (if needed) a likelihood surrogate simultaneously.

Main Content

Likelihood-free ACE

Theorem 2 — Unnormalized likelihood still bounds the EIG (Foster 2020)

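Consider a model $p(\theta)\,p(y \mid \theta, \xi)$ and inference network $q(\theta \mid y; \phi)$. Let $f_\psi(\theta, y)$ be an unnormalized likelihood approximation. Then

$$
I(\xi) \;\ge\; \mathbb{E}\left[ \log \frac{f_\psi(\theta_0, y)}{\frac{1}{L+1} \sum_{\ell=0}^{L} \frac{p(\theta_\ell)\, f_\psi(\theta_\ell, y)}{q(\theta_\ell \mid y; \phi)}} \right],
$$

with the expectation over $p(\theta_0)\,p(y \mid \theta_0, \xi)\,q(\theta_{1:L} \mid y; \phi)$. So one can train $\psi$ (likelihood surrogate), $\phi$ (inference network), and $\xi$ (design) jointly by maximizing a single lower bound. This is the basis for ACE on implicit-likelihood models such as random-effects models.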

This is the contrastive-bound analogue of Foster 2019’s [[Implicit Likelihood Estimator]], but crucially it preserves the lower-bound property (whereas the earlier estimator only bounds the error), so it can be safely maximized over $\psi$, $\phi$, and $\xi$.
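The role of the normalizing constant can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes a toy linear-Gaussian model ($\theta \sim N(0,1)$, $y \sim N(\xi\theta, 1)$) whose EIG is known in closed form, a hypothetical surrogate equal to the true likelihood times an arbitrary constant `c`, and a Gaussian proposal `q` with hand-set parameters. The names `make_log_f` and `lf_ace_estimate` are illustrative only.

```python
import math
import random

def log_normal(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)

def make_log_f(c):
    """Hypothetical surrogate: true likelihood N(y; xi*theta, 1) scaled by a constant c."""
    def log_f(theta, y, xi):
        return math.log(c) + log_normal(y, xi * theta, 1.0)
    return log_f

def lf_ace_estimate(xi, log_f, q_slope, q_std, L=10, n=2000, seed=0):
    """Monte Carlo estimate of the bound with q(theta | y) = N(q_slope * y, q_std^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        theta0 = rng.gauss(0.0, 1.0)             # theta_0 ~ p(theta)
        y = xi * theta0 + rng.gauss(0.0, 1.0)    # y ~ p(y | theta_0, xi)
        thetas = [theta0] + [rng.gauss(q_slope * y, q_std) for _ in range(L)]
        # log of p(theta) f(theta, y) / q(theta | y) for theta_0 and each contrastive sample
        log_w = [log_normal(t, 0.0, 1.0) + log_f(t, y, xi)
                 - log_normal(t, q_slope * y, q_std) for t in thetas]
        m = max(log_w)                           # log-sum-exp for numerical stability
        log_den = m + math.log(sum(math.exp(w - m) for w in log_w) / (L + 1))
        total += log_f(theta0, y, xi) - log_den
    return total / n

xi = 1.0
true_eig = 0.5 * math.log(1.0 + xi ** 2)         # closed form for this toy model
post_slope, post_std = xi / (1.0 + xi ** 2), (1.0 + xi ** 2) ** -0.5
est_c1 = lf_ace_estimate(xi, make_log_f(1.0), post_slope, post_std)
est_c7 = lf_ace_estimate(xi, make_log_f(7.0), post_slope, post_std)
est_prior_q = lf_ace_estimate(xi, make_log_f(7.0), 0.0, 1.0)  # crude proposal: the prior
print(true_eig, est_c1, est_c7, est_prior_q)
```

The constant `c` appears in both the numerator and every term of the denominator, so the two estimates with the exact-posterior proposal agree up to floating-point error. With the cruder proposal the estimate should sit at or below the true EIG up to Monte Carlo noise.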

Gradient estimation for ACE (Foster 2020 §3.6)

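Write $g(y, \theta_{0:L}, \phi, \xi)$ for the integrand, so that the bound is $\mathbb{E}[g]$.

Score-function (REINFORCE) gradient (Eqs. 16–17)

$$
\frac{\partial}{\partial \xi}\,\mathbb{E}[g] = \mathbb{E}\left[ \frac{\partial g}{\partial \xi} + g\,\frac{\partial}{\partial \xi} \log p(y \mid \theta_0, \xi) \right],
$$

with the expectation over $p(\theta_0)\,p(y \mid \theta_0, \xi)\,q(\theta_{1:L} \mid y; \phi)$. Unbiased but high variance.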
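A minimal sketch of the score-function estimator, under stated assumptions rather than as the paper's code: the toy model is $\theta \sim N(0,1)$, $y \sim N(\xi\theta, 1)$, and the contrastive samples are drawn from the prior (the PCE special case), so no inference network is needed. The partial derivative of the integrand and the score of the likelihood are written out by hand; `score_function_gradient` is an illustrative name.

```python
import math
import random

def log_lik(y, theta, xi):
    """log N(y; xi * theta, 1)."""
    return -0.5 * (y - xi * theta) ** 2 - 0.5 * math.log(2 * math.pi)

def score_function_gradient(xi, L=10, n=20000, seed=0):
    """Return (gradient estimate, standard error) for the contrastive bound at xi."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        thetas = [rng.gauss(0.0, 1.0) for _ in range(L + 1)]   # theta_0 and contrastive draws
        y = xi * thetas[0] + rng.gauss(0.0, 1.0)               # y ~ p(y | theta_0, xi)
        logp = [log_lik(y, t, xi) for t in thetas]
        m = max(logp)
        expd = [math.exp(lp - m) for lp in logp]
        s = sum(expd)
        g = logp[0] - (m + math.log(s / (L + 1)))              # integrand g
        dlogp = [(y - xi * t) * t for t in thetas]             # d log p(y | theta, xi) / d xi
        dg_dxi = dlogp[0] - sum(e / s * d for e, d in zip(expd, dlogp))  # y held fixed
        samples.append(dg_dxi + g * dlogp[0])                  # partial term + g * score
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, math.sqrt(var / n)

grad, std_err = score_function_gradient(1.0)
print(grad, std_err)
```

The `g * dlogp[0]` term is what makes this estimator noisy: it multiplies the integrand by a zero-mean score, so its fluctuations do not shrink as the bound tightens.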

Reparameterized gradient (Eq. 18)

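Introduce noise variables $\epsilon$ independent of $\xi$ with $\epsilon \sim p(\epsilon)$ and $y = y(\theta_0, \xi, \epsilon)$. Then

$$
\frac{\partial}{\partial \xi}\,\mathbb{E}[g] = \mathbb{E}\left[ \frac{\mathrm{d}}{\mathrm{d}\xi}\, g\big(y(\theta_0, \xi, \epsilon), \theta_{0:L}, \phi, \xi\big) \right],
$$

with the expectation over $p(\theta_0)\,p(\epsilon)$ and the contrastive samples $\theta_{1:L}$; the total derivative is taken along the sample path, with $\epsilon$ held fixed. Typically much lower variance than the score-function estimator, which matters for hard design problems.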
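The variance gap can be illustrated on a toy problem. This is a sketch under assumptions, not the paper's experiment: $\theta \sim N(0,1)$, $y = \xi\theta_0 + \epsilon$ with $\epsilon \sim N(0,1)$, and contrastive samples drawn from the prior (the PCE special case) so that the pathwise derivative only has to flow through $y$. Both estimators are evaluated on the same draws; `gradient_samples` and `mean_and_var` are illustrative names.

```python
import math
import random

def log_lik(y, theta, xi):
    """log N(y; xi * theta, 1)."""
    return -0.5 * (y - xi * theta) ** 2 - 0.5 * math.log(2 * math.pi)

def gradient_samples(xi, L=10, n=10000, seed=0):
    """Per-sample score-function and reparameterized gradients on shared draws."""
    rng = random.Random(seed)
    score_fn, reparam = [], []
    for _ in range(n):
        thetas = [rng.gauss(0.0, 1.0) for _ in range(L + 1)]
        eps = rng.gauss(0.0, 1.0)
        y = xi * thetas[0] + eps                   # y(theta_0, xi, eps), so dy/dxi = theta_0
        logp = [log_lik(y, t, xi) for t in thetas]
        m = max(logp)
        expd = [math.exp(lp - m) for lp in logp]
        s = sum(expd)
        w = [e / s for e in expd]                  # normalized likelihood weights
        g = logp[0] - (m + math.log(s / (L + 1)))
        resid = [y - xi * t for t in thetas]
        # partial derivatives of g with respect to xi (y fixed) and y (xi fixed)
        dg_dxi = resid[0] * thetas[0] - sum(wi * r * t for wi, r, t in zip(w, resid, thetas))
        dg_dy = -resid[0] + sum(wi * r for wi, r in zip(w, resid))
        score_fn.append(dg_dxi + g * resid[0] * thetas[0])   # partial + g * score
        reparam.append(dg_dxi + dg_dy * thetas[0])           # total derivative along the path
    return score_fn, reparam

def mean_and_var(xs):
    mean = sum(xs) / len(xs)
    return mean, sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

sf, rp = gradient_samples(1.0)
(sf_mean, sf_var), (rp_mean, rp_var) = mean_and_var(sf), mean_and_var(rp)
print(sf_mean, sf_var, rp_mean, rp_var)
```

Both estimators target the same gradient, so their means should agree up to Monte Carlo noise, while the per-sample variance of the reparameterized form should be noticeably smaller in this setting.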

Rao–Blackwellized gradient for discrete (Eq. 19)

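When $y$ is discrete, sum over outcomes instead of sampling:

$$
\frac{\partial}{\partial \xi}\,\mathbb{E}[g] = \mathbb{E}_{p(\theta_0)}\left[ \sum_{y} \mathbb{E}_{q(\theta_{1:L} \mid y; \phi)}\left[ \frac{\partial}{\partial \xi}\Big( p(y \mid \theta_0, \xi)\, g(y, \theta_{0:L}, \phi, \xi) \Big) \right] \right].
$$

Used for the death-process experiment (66 discrete outcomes).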
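A minimal sketch of Rao–Blackwellization over a discrete outcome, under assumptions that are not taken from the paper: a single-stage binomial model with a hypothetical population size `N = 10` and infection probability $1 - e^{-\theta\xi}$, a lognormal prior on $\theta$, and contrastive samples drawn from the prior (the PCE special case). The outcome $y$ is summed out exactly, so only the $\theta$ samples are random.

```python
import math
import random

N = 10  # hypothetical population size, giving outcomes y = 0..N

def log_lik(y, theta, xi):
    """log Binomial(y; N, eta) with eta = 1 - exp(-theta * xi)."""
    eta = -math.expm1(-theta * xi)
    return math.log(math.comb(N, y)) + y * math.log(eta) - (N - y) * theta * xi

def dlog_lik_dxi(y, theta, xi):
    """Derivative of log_lik with respect to xi."""
    eta = -math.expm1(-theta * xi)
    return y * theta * math.exp(-theta * xi) / eta - (N - y) * theta

def rb_terms(thetas, xi):
    """Sum over y of p(y | theta_0, xi) * g, and the xi-derivative of that sum."""
    value, grad = 0.0, 0.0
    for y in range(N + 1):
        logp = [log_lik(y, t, xi) for t in thetas]
        dlogp = [dlog_lik_dxi(y, t, xi) for t in thetas]
        m = max(logp)
        expd = [math.exp(lp - m) for lp in logp]
        s = sum(expd)
        g = logp[0] - (m + math.log(s / len(thetas)))
        dg = dlogp[0] - sum(e / s * d for e, d in zip(expd, dlogp))
        p0 = math.exp(logp[0])
        value += p0 * g
        grad += p0 * (dg + g * dlogp[0])   # product rule on p(y | theta_0, xi) * g
    return value, grad

def rb_estimate(xi, L=10, n=500, seed=0):
    """Average rb_terms over prior draws of theta_{0:L}."""
    rng = random.Random(seed)
    value = grad = 0.0
    for _ in range(n):
        thetas = [rng.lognormvariate(0.0, 0.5) for _ in range(L + 1)]
        v, gr = rb_terms(thetas, xi)
        value += v / n
        grad += gr / n
    return value, grad

bound, grad = rb_estimate(1.0)
print(bound, grad)
```

Because the sum over $y$ is exact, the estimate is a smooth deterministic function of $\xi$ once the $\theta$ draws are fixed, which makes it easy to check the hand-written derivative against a finite difference with the same seed.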

The $\phi$-gradient is handled analogously: if the contrastive samples $\theta_{1:L}$ are reparameterizable, use the double reparameterization of Tucker et al. (2018).

Examples

Where each gradient is used (Foster 2020 §4)

Rao–Blackwellization drives the discrete death process (66 outcomes). Reparameterization is the workhorse for the continuous high-dimensional designs (400-D regression, 100-D docking). Likelihood-free ACE (Theorem 2) is what allows the gradient methods to be applied to implicit-likelihood models like the mixed-effects / CES settings.

Connections

See Also