Implicit Likelihood Estimator

Summary

The implicit-likelihood estimator $\hat{μ}_{m + ℓ}$ handles models whose likelihood $p (y ∣ θ, d)$ can be sampled but not evaluated (e.g. random-effects / nuisance-variable models, where $p (y ∣ θ, d) = E_{p (ψ ∣ θ)} [p (y ∣ θ, ψ, d)]$ is intractable). It learns two approximations, a marginal $q_{m} (y ∣ d)$ and a likelihood $q_{ℓ} (y ∣ θ, d)$, and plugs both into the EIG. Unlike the other three estimators (posterior, marginal, and VNMC), it is not a bound on the EIG, but Lemma 2 bounds its error, so minimizing that bound trains it.

Overview

The marginal and VNMC estimators both assume the likelihood $p (y ∣ θ, d)$ can be evaluated pointwise. Many important models do not allow this: introduce nuisance latents $ψ$ (random effects, latent confounders) and the likelihood becomes an intractable integral, even though $y$ can still be simulated by sampling $ψ$ and then $y$. The variational posterior estimator $\hat{μ}_{post}$ works here unchanged (it needs only samples), but $\hat{μ}_{marg}$ does not, because it contains $p (y ∣ θ, d)$ explicitly. The fix is to approximate the likelihood too.
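To make "can be sampled but not evaluated" concrete, here is a minimal sketch of a simulator with a nuisance latent. The model itself (standard-normal prior on $θ$, Gaussian $ψ$ independent of $θ$, Gaussian noise) and the names `simulate`, `sigma_psi`, and `sigma_y` are illustrative assumptions, not taken from Foster 2019. In this toy the integral over $ψ$ happens to be tractable, but the code never uses that fact: it only samples.

```python
import numpy as np

def simulate(d, n, rng, sigma_psi=1.0, sigma_y=0.5):
    """Draw n joint samples (theta, y) for design d.

    The nuisance variable psi is sampled and then discarded, so the caller
    only ever sees draws from p(y, theta | d), never a density value.
    """
    theta = rng.normal(0.0, 1.0, size=n)        # theta ~ p(theta), assumed N(0, 1)
    psi = rng.normal(0.0, sigma_psi, size=n)    # psi ~ p(psi | theta), assumed independent of theta
    y = rng.normal(d * theta + psi, sigma_y)    # y ~ p(y | theta, psi, d)
    return theta, y

rng = np.random.default_rng(0)
theta, y = simulate(d=2.0, n=1000, rng=rng)
```

Pairs `(theta, y)` like these are all that the estimators discussed on this page require when the likelihood is implicit.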

Main Content

Definition: Implicit-likelihood estimator (Foster 2019, Eq. 12)

Using a marginal approximation $q_{m} (y ∣ d)$ and a likelihood approximation $q_{ℓ} (y ∣ θ, d)$:
$EIG (d) \approx I_{m + ℓ} (d) := E_{p (y, θ ∣ d)} \left[ \log \frac{q_{ℓ} (y ∣ θ, d)}{q_{m} (y ∣ d)} \right] \approx \hat{μ}_{m + ℓ} (d) := \frac{1}{N} \sum_{n = 1}^{N} \log \frac{q_{ℓ} (y_{n} ∣ θ_{n}, d)}{q_{m} (y_{n} ∣ d)},$
where the pairs $(y_{n}, θ_{n})$ are samples from $p (y, θ ∣ d)$.
This is not a bound on the EIG (unlike the other three estimators).
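The Monte Carlo average in the definition can be sketched in a few lines. The function name `eig_m_plus_l` and the callable arguments `log_q_lik` and `log_q_marg` are hypothetical, chosen for this illustration. The check uses a linear-Gaussian toy model (an assumption, not from the paper) in which both densities are known exactly, so the estimate can be compared with the closed-form EIG.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated elementwise at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def eig_m_plus_l(log_q_lik, log_q_marg, y, theta, d):
    """Average of log q_l(y_n | theta_n, d) - log q_m(y_n | d) over joint samples."""
    return float(np.mean(log_q_lik(y, theta, d) - log_q_marg(y, d)))

# Toy model: theta ~ N(0, 1), y = d * theta + noise, noise ~ N(0, s2).
rng = np.random.default_rng(0)
d, s2, n = 2.0, 0.25, 200_000
theta = rng.normal(size=n)
y = d * theta + rng.normal(scale=np.sqrt(s2), size=n)

# Here the "approximations" are set to the exact densities of the toy model.
log_q_lik = lambda y, th, d: gaussian_logpdf(y, d * th, s2)
log_q_marg = lambda y, d: gaussian_logpdf(y, 0.0, d ** 2 + s2)

estimate = eig_m_plus_l(log_q_lik, log_q_marg, y, theta, d)
exact = 0.5 * np.log(1.0 + d ** 2 / s2)   # closed-form EIG for this toy model
```

With exact densities the estimate should agree with `exact` up to Monte Carlo noise. With learned densities the gap can have either sign, which is what "not a bound" means in practice.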

Lemma 2 — EIG estimation error bound (Foster 2019)

For any valid $q_{m} (y ∣ d)$ and $q_{ℓ} (y ∣ θ, d)$, the EIG estimation error is bounded:
$∣ I_{m + ℓ} (d) - EIG (d) ∣ \leq - E_{p (y, θ ∣ d)} [\log q_{m} (y ∣ d) + \log q_{ℓ} (y ∣ θ, d)] + C,$
where $C = - H [p (y ∣ d)] - E_{p (θ)} [H [p (y ∣ θ, d)]]$ does not depend on $q_{m}$ or $q_{ℓ}$. The RHS is $0$ iff $q_{m} = p (y ∣ d)$ and $q_{ℓ} = p (y ∣ θ, d)$ for almost all $y, θ$.
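One way to see where the bound comes from is a short derivation sketch that uses only the definitions of entropy and KL divergence (it is not quoted from the paper). Each cross-entropy term splits into an entropy plus a KL divergence:
$- E_{p (y ∣ d)} [\log q_{m} (y ∣ d)] = H [p (y ∣ d)] + \mathrm{KL} (p (y ∣ d) \,\|\, q_{m} (y ∣ d)),$
$- E_{p (y, θ ∣ d)} [\log q_{ℓ} (y ∣ θ, d)] = E_{p (θ)} [H [p (y ∣ θ, d)]] + E_{p (θ)} [\mathrm{KL} (p (y ∣ θ, d) \,\|\, q_{ℓ} (y ∣ θ, d))].$
Adding these and substituting $C$ cancels the entropies, so the RHS of Lemma 2 equals
$\mathrm{KL} (p (y ∣ d) \,\|\, q_{m} (y ∣ d)) + E_{p (θ)} [\mathrm{KL} (p (y ∣ θ, d) \,\|\, q_{ℓ} (y ∣ θ, d))] \geq 0.$
The error itself is a difference of the same two terms,
$I_{m + ℓ} (d) - EIG (d) = \mathrm{KL} (p (y ∣ d) \,\|\, q_{m} (y ∣ d)) - E_{p (θ)} [\mathrm{KL} (p (y ∣ θ, d) \,\|\, q_{ℓ} (y ∣ θ, d))],$
and the absolute value of a difference of two non-negative numbers is at most their sum, which gives the bound. It also shows why the estimator can err in either direction.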

Training implication

Lemma 2 says: learn $q_{m}$ and $q_{ℓ}$ by maximizing $E_{p (y, θ ∣ d)} [\log q_{m} (y ∣ d) + \log q_{ℓ} (y ∣ θ, d)]$ via stochastic gradient ascent (two ordinary maximum-likelihood density-estimation problems), then substitute into Eq. 12. In general $q_{m}$ and $q_{ℓ}$ are learned separately with no weight sharing; Foster 2019 §A.4 discusses the coupled case $q_{m} (y ∣ d) = E_{p (θ)} [q_{ℓ} (y ∣ θ, d)]$.
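The train-then-substitute recipe can be sketched end to end on a toy problem. This sketch makes several assumptions that are not from Foster 2019: the simulator is a linear model with a Gaussian nuisance term, both $q_{m}$ and $q_{ℓ}$ are Gaussian families, and closed-form Gaussian maximum likelihood is swapped in for stochastic gradient ascent (both maximize the same objective; the closed form just keeps the example short). All function names are hypothetical.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated elementwise at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def simulate(d, n, rng):
    """Sample (theta, y); the nuisance psi is drawn and marginalized by discarding it."""
    theta = rng.normal(size=n)                       # theta ~ N(0, 1)
    psi = rng.normal(size=n)                         # nuisance psi ~ N(0, 1)
    y = d * theta + psi + 0.5 * rng.normal(size=n)   # observation noise std 0.5
    return theta, y

def fit_q_marg(y):
    """Maximum-likelihood Gaussian q_m(y | d): sample mean and variance."""
    return y.mean(), y.var()

def fit_q_lik(theta, y):
    """Maximum-likelihood linear-Gaussian q_l(y | theta, d) = N(a + b * theta, var)."""
    b, a = np.polyfit(theta, y, 1)                   # slope first, then intercept
    resid = y - (a + b * theta)
    return a, b, np.mean(resid ** 2)

rng = np.random.default_rng(0)
d = 2.0

# Step 1: train both approximations on simulated pairs.
theta_tr, y_tr = simulate(d, 50_000, rng)
m_mean, m_var = fit_q_marg(y_tr)
a, b, l_var = fit_q_lik(theta_tr, y_tr)

# Step 2: substitute into the estimator using fresh samples.
theta_ev, y_ev = simulate(d, 50_000, rng)
log_ratio = (gaussian_logpdf(y_ev, a + b * theta_ev, l_var)
             - gaussian_logpdf(y_ev, m_mean, m_var))
estimate = float(log_ratio.mean())

# In this toy, psi + noise has variance 1 + 0.25, so the EIG is known in closed form.
exact = 0.5 * np.log(1.0 + d ** 2 / 1.25)
```

Because the Gaussian families contain the true densities of this toy model, the estimate should land close to `exact`. With a misspecified family the Lemma 2 bound stays strictly positive and the estimate can be biased in either direction.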

Where it sits among the four

From Foster 2019, Table 1: $\hat{μ}_{m + ℓ}$ is the only estimator marked implicit-likelihood ✓ besides $\hat{μ}_{post}$, and it relies on approximating a distribution over $y$ (so prefer it, like $\hat{μ}_{marg}$, when $\dim (y) ≪ \dim (θ)$). It gave the lowest empirical MSE on the mixed-effects benchmark (the implicit-likelihood problem) in Table 2.

Examples

Mixed-effects / item-response model (Foster 2019 §6.1, §6.3)

A psychology item-response model has common fixed effects $θ$ (of interest) and per-participant random effects (nuisance $ψ$), making $p (y ∣ θ, d)$ implicit. $\hat{μ}_{m + ℓ}$ gives the best EIG accuracy here and is used to drive the online adaptive face-perception experiment on Mechanical Turk, producing lower-entropy posteriors than random design. See Sequential and Adaptive BED.

Connections

Generalizes [[Variational Marginal Estimator|$\hat{μ}_{marg}$]] (recover it when the likelihood is explicit, $q_{ℓ} = p (y ∣ θ, d)$).
Anticipates likelihood-free ACE (Foster 2020, Theorem 2), which instead keeps the contrastive structure and replaces the likelihood with an unnormalized approximation while preserving a valid lower bound.
Related to simulation-based / likelihood-free inference (LFIRE, ABC) for implicit models.
