Likelihood-Free ACE and Gradient Estimation

Summary

Two extensions that make ACE deployable. (1) Likelihood-free ACE (Theorem 2): when the likelihood $p(y ∣ θ, ξ)$ is implicit, replacing it with an unnormalized approximation $f_{ψ}(θ, y) \geq 0$, trained jointly with $(ξ, ϕ)$, still yields a valid EIG lower bound, so design and inference are learned in one optimization. (2) Gradient estimators for $\partial I_{ACE} / \partial ξ$: the high-variance score-function (REINFORCE) form, the lower-variance reparameterization form, and Rao–Blackwellization for discrete outcomes.

Overview

ACE and PCE as stated need a pointwise likelihood and gradients of the bound with respect to $ξ$. Section 3.4 removes the first requirement (implicit models); Section 3.6 supplies the second (how to compute $\partial I_{ACE} / \partial ξ$ and $\partial I_{ACE} / \partial ϕ$ with low variance). Together they let a single stochastic gradient ascent (SGA) loop optimize the design, the inference network, and (if needed) a likelihood surrogate simultaneously.

Main Content

Likelihood-free ACE

Theorem 2 — Unnormalized likelihood still bounds the EIG (Foster 2020)

Consider a model $p (θ) p (y ∣ θ, ξ)$ and inference network $q_{ϕ} (θ ∣ y)$ . Let $f_{ψ} (θ, y) \geq 0$ be an unnormalized likelihood approximation. Then
$I(ξ) \geq \mathbb{E}\left[\log \frac{f_{ψ}(θ_{0}, y)}{\frac{1}{L + 1} \sum_{ℓ = 0}^{L} \frac{p(θ_{ℓ}) f_{ψ}(θ_{ℓ}, y)}{q_{ϕ}(θ_{ℓ} ∣ y)}}\right],$
where the expectation is over $p(θ_{0}) p(y ∣ θ_{0}, ξ) q_{ϕ}(θ_{1:L} ∣ y)$. One can therefore train $ψ$ (likelihood surrogate), $ϕ$ (inference network), and $ξ$ (design) jointly by maximizing a single lower bound, which is the basis for ACE on implicit-likelihood models such as random-effects models.
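To make the bound concrete, here is a minimal Monte Carlo sketch on a toy model that is not from the paper: $θ \sim N(0, 1)$, $y = ξ θ + ϵ$ with $ϵ \sim N(0, 1)$, for which the true EIG is $\frac{1}{2} \log(1 + ξ^{2})$. The surrogate $f_{ψ}$ is assumed to be the Gaussian likelihood kernel times an arbitrary constant `exp(log_scale)`, and $q_{ϕ}$ is either the exact posterior or the prior; the function name and all parameters are hypothetical. Nothing is trained here; the sketch only evaluates the Theorem 2 expression.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def lf_ace_bound(xi, L=10, n=2000, log_scale=0.0, exact_posterior=True, seed=0):
    """Monte Carlo estimate of the Theorem 2 bound on the toy linear-Gaussian model."""
    rng = random.Random(seed)

    def log_f(theta, y):
        # Assumed surrogate: Gaussian kernel times exp(log_scale), not normalized in y.
        return log_scale - 0.5 * (y - xi * theta) ** 2

    total = 0.0
    for _ in range(n):
        theta0 = rng.gauss(0.0, 1.0)            # theta_0 ~ p(theta)
        y = xi * theta0 + rng.gauss(0.0, 1.0)   # y ~ p(y | theta_0, xi)
        if exact_posterior:
            q_mean, q_std = xi * y / (1 + xi ** 2), (1 + xi ** 2) ** -0.5
        else:
            q_mean, q_std = 0.0, 1.0            # q equal to the prior (PCE-style)
        # theta_0 plus L contrastive samples theta_1..theta_L ~ q(theta | y)
        thetas = [theta0] + [rng.gauss(q_mean, q_std) for _ in range(L)]
        log_w = [
            log_normal_pdf(t, 0.0, 1.0) + log_f(t, y) - log_normal_pdf(t, q_mean, q_std)
            for t in thetas
        ]
        total += log_f(theta0, y) - log_mean_exp(log_w)
    return total / n


if __name__ == "__main__":
    xi = 1.0
    print("true EIG      :", 0.5 * math.log(1 + xi ** 2))
    print("exact q       :", lf_ace_bound(xi))
    print("exact q, c=e^3:", lf_ace_bound(xi, log_scale=3.0))
    print("prior as q    :", lf_ace_bound(xi, exact_posterior=False))
```

In this toy case the multiplicative constant in $f_{ψ}$ appears in both numerator and denominator and cancels, which is one way to see why normalization is not needed. With the exact posterior as $q_{ϕ}$ the estimate should sit at the true EIG up to Monte Carlo noise; with the prior it should sit below it.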

This is the contrastive-bound analogue of Foster 2019’s [[Implicit Likelihood Estimator| $\hat{μ}_{m + ℓ}$ ]], but crucially it preserves the lower-bound property (whereas $\hat{μ}_{m + ℓ}$ only bounds the error), so it can be safely maximized over $ξ$.

Gradient estimation for ACE (Foster 2020 §3.6)

Write $g(y, θ_{0:L}, ϕ, ξ) := \log \frac{p(y ∣ θ_{0}, ξ)}{\frac{1}{L + 1} \sum_{ℓ = 0}^{L} \frac{p(θ_{ℓ}) p(y ∣ θ_{ℓ}, ξ)}{q_{ϕ}(θ_{ℓ} ∣ y)}}$ for the integrand.

Score-function (REINFORCE) gradient (Eqs. 16–17)

$\frac{\partial I_{ACE}}{\partial ξ} = \mathbb{E}\left[\frac{\partial g}{\partial ξ} + g \cdot \frac{\partial}{\partial ξ} \log p(y ∣ θ_{0}, ξ)\right],$
expectation over $p (θ_{0}) p (y ∣ θ_{0}, ξ) q_{ϕ} (θ_{1 : L} ∣ y)$ . Unbiased but high variance.
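The score-function form can be recovered in two steps, sketched here with the same symbols and assuming that the derivative and the integral can be interchanged. Writing the bound as an integral,
$I_{ACE} = \int p(θ_{0}) \, p(y ∣ θ_{0}, ξ) \, q_{ϕ}(θ_{1:L} ∣ y) \, g \; dy \, dθ_{0:L},$
the only factors that depend on $ξ$ are $p(y ∣ θ_{0}, ξ)$ and $g$, so the product rule gives
$\frac{\partial I_{ACE}}{\partial ξ} = \int p(θ_{0}) \, q_{ϕ}(θ_{1:L} ∣ y) \left[ p(y ∣ θ_{0}, ξ) \frac{\partial g}{\partial ξ} + g \, \frac{\partial p(y ∣ θ_{0}, ξ)}{\partial ξ} \right] dy \, dθ_{0:L}.$
Substituting $\frac{\partial p}{\partial ξ} = p \, \frac{\partial \log p}{\partial ξ}$ turns the integral back into an expectation under the sampling distribution and yields the expression above. The second term multiplies the score by $g$ itself, which is the usual source of the high variance.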

Reparameterized gradient (Eq. 18)

Introduce noise variables $ϵ, ϵ_{1 : L}^{'}$ independent of $(ξ, ϕ)$ with $y = y (θ_{0}, ξ, ϵ)$ and $θ_{ℓ} = θ (y, ϕ, ϵ_{ℓ}^{'})$ . Then
$\frac{\partial I_{ACE}}{\partial ξ} = \mathbb{E}\left[\frac{\partial g}{\partial ξ} + \frac{\partial g}{\partial y} \frac{\partial y}{\partial ξ} + \sum_{ℓ = 1}^{L} \frac{\partial g}{\partial θ_{ℓ}} \frac{\partial θ_{ℓ}}{\partial y} \frac{\partial y}{\partial ξ}\right],$
expectation over $p (ϵ) p (ϵ_{1 : L}^{'})$ . Typically much lower variance than the score-function estimator — important for hard design problems.
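The variance gap between the two estimators can be checked numerically. The sketch below is a toy illustration, not the paper's experiment: it assumes $θ \sim N(0, 1)$, $y = ξ θ_{0} + ϵ$ with $ϵ \sim N(0, 1)$, and takes $q_{ϕ}$ equal to the prior (the PCE special case), so the contrastive samples do not depend on $y$ and the third term of the reparameterized gradient vanishes. All function names are hypothetical, and the partial derivatives of $g$ are written out by hand for this model.

```python
import math
import random


def softmax(logits):
    """Normalized weights proportional to exp(logits), computed stably."""
    m = max(logits)
    e = [math.exp(a - m) for a in logits]
    s = sum(e)
    return [x / s for x in e]


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def gradient_samples(xi, L=10, n=4000, seed=0):
    """Per-sample score-function and reparameterized gradients of the toy bound."""
    rng = random.Random(seed)
    sf, rp = [], []
    for _ in range(n):
        theta = [rng.gauss(0.0, 1.0) for _ in range(L + 1)]  # theta[0] generates y
        eps = rng.gauss(0.0, 1.0)
        y = xi * theta[0] + eps
        resid = [y - xi * t for t in theta]
        # log p(y | theta_l, xi) up to an additive constant that cancels in g
        log_lik = [-0.5 * r * r for r in resid]
        w = softmax(log_lik)
        m = max(log_lik)
        log_den = m + math.log(sum(math.exp(a - m) for a in log_lik) / (L + 1))
        g = log_lik[0] - log_den
        # Partial derivative of g in xi with y held fixed
        dg_dxi = resid[0] * theta[0] - sum(wl * r * t for wl, r, t in zip(w, resid, theta))
        # Score of the sampling distribution: d/dxi log p(y | theta_0, xi)
        score = resid[0] * theta[0]
        sf.append(dg_dxi + g * score)
        # Reparameterized: add (dg/dy) * (dy/dxi), with dy/dxi = theta_0
        dg_dy = -resid[0] + sum(wl * r for wl, r in zip(w, resid))
        rp.append(dg_dxi + dg_dy * theta[0])
    return sf, rp


if __name__ == "__main__":
    sf, rp = gradient_samples(1.0)
    print("score-function  mean, var:", mean(sf), variance(sf))
    print("reparameterized mean, var:", mean(rp), variance(rp))
```

Both estimators target the same gradient, so their sample means should agree up to Monte Carlo noise, while the reparameterized samples should show a visibly smaller variance on this toy problem.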

Rao–Blackwellized gradient for discrete $y$ (Eq. 19)

When $y$ is discrete, sum over outcomes instead of sampling:
$\frac{\partial I_{ACE}}{\partial ξ} = \sum_{y \in \mathcal{Y}} \mathbb{E}\left[\frac{\partial g}{\partial ξ} \, p(y ∣ θ_{0}, ξ) + g \, \frac{\partial}{\partial ξ} p(y ∣ θ_{0}, ξ)\right],$
expectation over $p (θ_{0}) \prod_{ℓ} q_{ϕ} (θ_{ℓ} ∣ y)$ . Used for the death-process experiment (66 discrete outcomes).
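A small sketch of the summed-out estimator, under assumptions that are not from the paper: a binary outcome with $p(y = 1 ∣ θ, ξ) = \mathrm{sigmoid}(ξ θ)$, a standard normal prior, and $q_{ϕ}$ equal to the prior so that the same contrastive samples can be reused for every $y$ (in general $θ_{1:L}$ would be drawn from $q_{ϕ}(\cdot ∣ y)$ separately for each outcome). Function names are hypothetical. Because $y$ is enumerated rather than sampled, the objective is a deterministic function of $ξ$ once the $θ$ samples are fixed, so the gradient can be checked against a finite difference.

```python
import math
import random


def lik(y, theta, xi):
    """p(y | theta, xi) for the assumed Bernoulli-sigmoid toy model."""
    p1 = 1.0 / (1.0 + math.exp(-xi * theta))
    return p1 if y == 1 else 1.0 - p1


def dlik_dxi(y, theta, xi):
    """Derivative of p(y | theta, xi) with respect to xi."""
    p1 = 1.0 / (1.0 + math.exp(-xi * theta))
    d = theta * p1 * (1.0 - p1)
    return d if y == 1 else -d


def draw_thetas(n=500, L=5, seed=0):
    """n rows of [theta_0, ..., theta_L], all drawn from the N(0, 1) prior."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(L + 1)] for _ in range(n)]


def rb_objective_and_grad(xi, thetas):
    """Rao-Blackwellized bound estimate and its xi-gradient, summing over y in {0, 1}."""
    obj, grad = 0.0, 0.0
    for th in thetas:
        for y in (0, 1):
            p = [lik(y, t, xi) for t in th]
            dp = [dlik_dxi(y, t, xi) for t in th]
            den = sum(p) / len(p)      # (1/(L+1)) sum_l p(y | theta_l, xi); p/q = 1 here
            dden = sum(dp) / len(dp)
            g = math.log(p[0]) - math.log(den)
            dg_dxi = dp[0] / p[0] - dden / den
            obj += p[0] * g                     # p(y | theta_0, xi) * g
            grad += dg_dxi * p[0] + g * dp[0]   # product rule, as in the formula above
    return obj / len(thetas), grad / len(thetas)


if __name__ == "__main__":
    thetas = draw_thetas()
    xi, h = 1.5, 1e-5
    _, grad = rb_objective_and_grad(xi, thetas)
    up, _ = rb_objective_and_grad(xi + h, thetas)
    down, _ = rb_objective_and_grad(xi - h, thetas)
    print("analytic gradient :", grad)
    print("finite difference :", (up - down) / (2 * h))
```

The cost is one evaluation of $g$ per outcome per sample, which is why this approach suits small outcome spaces such as the 66-outcome death process.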

The $ϕ$-gradient $\partial I_{ACE} / \partial ϕ$ is handled analogously: if the contrastive samples $θ_{1:L}$ are reparameterizable, use the doubly reparameterized gradient estimator of Tucker et al. (2018).

Examples

Where each gradient is used (Foster 2020 §4)

Rao–Blackwellization drives the discrete death process (66 outcomes). Reparameterization is the workhorse for the continuous high-dimensional designs (400-D regression, 100-D docking). Likelihood-free ACE (Theorem 2) is what allows the gradient methods to be applied to implicit-likelihood models like the mixed-effects / CES settings.

Connections

Extends Adaptive Contrastive Estimation (ACE) and Prior Contrastive Estimation (PCE) to implicit models and supplies their training gradients.
Bound-preserving analogue of [[Implicit Likelihood Estimator| $\hat{μ}_{m + ℓ}$ ]] (Foster 2019), improving it from an error bound to a valid EIG lower bound.
Independent parallel work: Kleinegesse & Gutmann (2020) showed the MINE-style MI bound can likewise be used in implicit settings, collapsing posterior and likelihood approximations into one critic — see Optimization and Gradient Schemes for BED.
