Prior Contrastive Estimation (PCE)
Summary
Prior Contrastive Estimation (PCE) is the simplification of ACE that draws the contrastive samples from the prior $p(\theta)$ instead of from a learned inference network $q_\phi(\theta \mid y)$. This removes the need to learn any variational parameters (there is no $\phi$), so PCE is cheaper and simpler to train. The cost is that it is tight only as $L \to \infty$: it loses ACE’s “good network” route to tightness. PCE is essentially the InfoNCE mutual-information bound applied to experimental design.
Overview
ACE’s inference network must be learned, which adds parameters and training cost. If the prior $p(\theta)$ is already a reasonable proposal for estimating the marginal $p(y \mid \xi)$, we can skip the network and use prior draws as contrasts. The bound then depends only on the design $\xi$, so optimization is a pure design problem.
Main Content
Definition: PCE lower bound (Foster 2020, Eq. 12)
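With $\theta_0 \sim p(\theta)$, $y \sim p(y \mid \theta_0, \xi)$, and $L$ contrastive samples $\theta_{1:L} \sim p(\theta)$ drawn independently from the prior:
$$\mathcal{L}_{\text{PCE}}(\xi, L) = \mathbb{E}\left[\log \frac{p(y \mid \theta_0, \xi)}{\frac{1}{L+1}\sum_{\ell=0}^{L} p(y \mid \theta_\ell, \xi)}\right],$$
with the expectation over $p(\theta_0)\,p(y \mid \theta_0, \xi)\,p(\theta_{1:L})$. This is the $q_\phi(\theta \mid y) = p(\theta)$ special case of ACE, so it inherits Theorem 1: it is a valid lower bound on the EIG, monotone in $L$, and tight as $L \to \infty$ (but only case-2 tightness: there is no “perfect network” route).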
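To make the definition concrete, here is a minimal Monte Carlo sketch of $\mathcal{L}_{\text{PCE}}$ in plain Python. It assumes a hypothetical one-dimensional linear-Gaussian model, $\theta \sim \mathcal{N}(0, 1)$ and $y \mid \theta, \xi \sim \mathcal{N}(\xi\theta, \sigma^2)$, chosen only because its EIG, $\tfrac12\log(1 + \xi^2/\sigma^2)$, is known in closed form. The function names and the fixed seed are illustrative, and the denominator is computed with log-sum-exp for numerical stability.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def pce_estimate(xi, L, n_outer, sigma=1.0, seed=0):
    """Monte Carlo estimate of the PCE bound for the (assumed) toy model
    theta ~ N(0, 1), y | theta, xi ~ N(xi * theta, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        theta0 = rng.gauss(0.0, 1.0)                          # theta_0 ~ p(theta)
        y = xi * theta0 + sigma * rng.gauss(0.0, 1.0)         # y ~ p(y | theta_0, xi)
        contrasts = [rng.gauss(0.0, 1.0) for _ in range(L)]   # theta_{1:L} ~ p(theta)
        # log p(y | theta_l, xi) for l = 0..L; theta_0 is part of the contrast set.
        log_liks = [log_normal_pdf(y, xi * t, sigma) for t in [theta0] + contrasts]
        # log p(y | theta_0) - log[(1/(L+1)) * sum_l p(y | theta_l)]
        total += log_liks[0] - (logsumexp(log_liks) - math.log(L + 1))
    return total / n_outer


def true_eig(xi, sigma=1.0):
    """Closed-form EIG of the toy model: 0.5 * log(1 + xi^2 / sigma^2)."""
    return 0.5 * math.log(1.0 + (xi / sigma) ** 2)
```

For example, `pce_estimate(2.0, L=1, n_outer=2000)` can never exceed $\log 2 \approx 0.69$, below the true EIG $\tfrac12\log 5 \approx 0.80$, while `L=100` brings the estimate close to it, in line with the monotonicity in $L$.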
Connection to InfoNCE (Foster 2020, Eq. 13)
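The InfoNCE (information noise-contrastive estimation) bound from representation learning (van den Oord et al. 2018) is, for data $x$, representations $c$, and critic $f$, with $(x_0, c) \sim p(x, c)$ and negatives $x_{1:L} \sim p(x)$:
$$I(x; c) \ge \mathbb{E}\left[\log \frac{f(x_0, c)}{\frac{1}{L+1}\sum_{\ell=0}^{L} f(x_\ell, c)}\right].$$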
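Writing $\theta$ for $x$ and $y$ for $c$, PCE is the case where the optimal critic is known: it is the likelihood, $f(\theta, y) = p(y \mid \theta, \xi)$. The optimal critic is only defined up to a positive factor depending on $y$ alone, because such a factor cancels between numerator and denominator, and the likelihood $p(y \mid \theta, \xi) = p(y \mid \xi)\, p(\theta \mid y, \xi)/p(\theta)$ is exactly such a rescaling of the density ratio. So PCE is the experimental-design instance of InfoNCE with a known critic.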
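The correspondence can be checked numerically. This sketch writes a generic InfoNCE term for an arbitrary log-critic, plugs in the likelihood of a hypothetical linear-Gaussian toy model (design $\xi = 2$ and noise $\sigma = 1$ are assumed values), and compares it with the same critic multiplied by an arbitrary positive factor of $y$ alone. The two scores agree because that factor cancels inside the ratio, which is why the likelihood can play the role of the optimal critic.

```python
import math
import random

XI = 2.0     # assumed design value for the toy model
SIGMA = 1.0  # assumed observation-noise standard deviation


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def infonce_term(log_critic, x0, negatives, c):
    """One InfoNCE sample, log f(x0, c) - log[(1/(L+1)) * sum_{l=0..L} f(x_l, c)],
    where log_critic(x, c) returns log f(x, c) and negatives holds x_{1:L}."""
    scores = [log_critic(x, c) for x in [x0] + negatives]
    return scores[0] - (logsumexp(scores) - math.log(len(scores)))


def log_likelihood(theta, y):
    """log p(y | theta, XI) for the toy model y ~ N(XI * theta, SIGMA^2)."""
    z = (y - XI * theta) / SIGMA
    return -0.5 * z * z - math.log(SIGMA) - 0.5 * math.log(2.0 * math.pi)


def pce_critic(theta, y):
    # PCE: x -> theta, c -> y, and the critic is the known likelihood.
    return log_likelihood(theta, y)


def rescaled_critic(theta, y):
    # Likelihood times an arbitrary positive factor g(y) = exp(3 y^2). The factor
    # depends on y only, so it cancels between numerator and denominator.
    return log_likelihood(theta, y) + 3.0 * y * y


def demo(L=16, seed=1):
    """Draw theta_0 ~ p(theta), y ~ p(y | theta_0), theta_{1:L} ~ p(theta) and
    score the same draw with both critics."""
    rng = random.Random(seed)
    theta0 = rng.gauss(0.0, 1.0)
    y = XI * theta0 + SIGMA * rng.gauss(0.0, 1.0)
    negatives = [rng.gauss(0.0, 1.0) for _ in range(L)]
    return (infonce_term(pce_critic, theta0, negatives, y),
            infonce_term(rescaled_critic, theta0, negatives, y))
```

`demo()` returns two numbers that agree up to floating-point rounding; averaging `infonce_term(pce_critic, ...)` over many independent draws is the PCE estimator.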
Unnormalized prior densities
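A practical bonus: PCE and ACE only need the prior up to proportionality. If $p(\theta) = \tilde p(\theta)/Z$, with $Z$ independent of $\xi$ and $\tilde p$ an unnormalized density, then (Foster 2020, Eq. 15)
$$\mathcal{L}_{\text{ACE}}(\xi, \phi, L) = \mathbb{E}\left[\log \frac{p(y \mid \theta_0, \xi)}{\frac{1}{L+1}\sum_{\ell=0}^{L} \tilde p(\theta_\ell)\, p(y \mid \theta_\ell, \xi) / q_\phi(\theta_\ell \mid y)}\right] + \log Z,$$
and the derivatives of $\log Z$ with respect to $\xi$ and $\phi$ vanish, so the unnormalized objective has the same gradients and the same optimal design. In PCE the prior density cancels from the ratio entirely, so only samples from the prior are needed. This matters in iterated design, where the prior at step $t$ is the previous posterior $p(\theta \mid \xi_{1:t-1}, y_{1:t-1}) \propto p(\theta)\prod_{s=1}^{t-1} p(y_s \mid \theta, \xi_s)$, known only up to its normalizing constant.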
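A small sketch of the proportionality argument, again under an assumed linear-Gaussian toy model ($\theta \sim \mathcal{N}(0,1)$, $y \sim \mathcal{N}(\xi\theta, \sigma^2)$): it evaluates a Monte Carlo ACE estimate once with the normalized prior log density and once with that density shifted by an arbitrary $\log Z$, reusing the same random draws. The two estimates differ by exactly $\log Z$, a constant in $\xi$ and $\phi$. The Gaussian proposal $q(\theta \mid y) = \mathcal{N}(0.5\,y, 0.6^2)$ is a hypothetical stand-in for a trained inference network, and $\theta_0$ is still drawn exactly from the prior; with only $\tilde p$ available those draws would need a sampler such as MCMC.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def ace_estimate(log_prior, xi, L, n_outer, sigma=1.0, q_slope=0.5, q_std=0.6, seed=0):
    """Monte Carlo ACE estimate for the assumed toy model theta ~ N(0, 1),
    y ~ N(xi * theta, sigma^2). log_prior may be unnormalized: it is used
    only inside the importance weights. The proposal
    q(theta | y) = N(q_slope * y, q_std^2) is a hypothetical stand-in for a
    learned inference network."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        # Exact prior draws; with only p_tilde available these would come from MCMC.
        theta0 = rng.gauss(0.0, 1.0)
        y = xi * theta0 + sigma * rng.gauss(0.0, 1.0)
        contrasts = [rng.gauss(q_slope * y, q_std) for _ in range(L)]
        # log[ p_tilde(theta_l) p(y | theta_l, xi) / q(theta_l | y) ] for l = 0..L
        log_w = [
            log_prior(t) + log_normal_pdf(y, xi * t, sigma)
            - log_normal_pdf(t, q_slope * y, q_std)
            for t in [theta0] + contrasts
        ]
        total += log_normal_pdf(y, xi * theta0, sigma) - (logsumexp(log_w) - math.log(L + 1))
    return total / n_outer


LOG_Z = math.log(7.0)  # arbitrary normalizing constant, chosen for illustration


def log_prior_normalized(theta):
    return log_normal_pdf(theta, 0.0, 1.0)


def log_prior_unnormalized(theta):
    # p_tilde(theta) = Z * p(theta)
    return log_prior_normalized(theta) + LOG_Z
```

Calling `ace_estimate` with the same `seed` for both prior functions gives `exact == shifted + LOG_Z` up to rounding, so any optimizer over $\xi$ or $\phi$ sees identical gradients.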
When to prefer PCE vs ACE
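- PCE: the prior is an adequate proposal for estimating $p(y \mid \xi)$; no variational training is wanted; the dimension is low to moderate. PCE performed well in low dimensions but degraded as dimension increased, because the prior becomes an inefficient proposal. It also has a hard ceiling: since $\theta_0$ appears in the denominator, every term is at most $\log(L+1)$, so $\mathcal{L}_{\text{PCE}}(\xi, L) \le \log(L+1)$ and large EIGs are under-estimated unless $L$ is very large.
- ACE: the inference network can closely approximate the posterior, or sampling from the prior is inefficient (high dimension). ACE and BA learn adaptive proposals and so avoid this under-estimation.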
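The under-estimation can be seen on a hypothetical linear-Gaussian toy model with a very informative design ($\xi = 50$, $\sigma = 1$, assumed values), where the true EIG exceeds the PCE ceiling of $\log(L+1)$ for $L = 10$. The sketch compares PCE with ACE whose proposal is the exact posterior, which is available in closed form here and stands in for a perfectly trained inference network. In that case every importance weight equals $p(y \mid \xi)$, so the ACE estimate targets the EIG for any $L$.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def contrastive_estimate(xi, L, n_outer, use_posterior, sigma=1.0, seed=0):
    """PCE (use_posterior=False) or ACE with the exact posterior as its
    proposal (use_posterior=True) for the assumed toy model
    theta ~ N(0, 1), y ~ N(xi * theta, sigma^2)."""
    rng = random.Random(seed)
    post_var = 1.0 / (1.0 + (xi / sigma) ** 2)  # exact posterior variance
    post_std = math.sqrt(post_var)
    total = 0.0
    for _ in range(n_outer):
        theta0 = rng.gauss(0.0, 1.0)
        y = xi * theta0 + sigma * rng.gauss(0.0, 1.0)
        post_mean = post_var * xi * y / sigma ** 2  # exact posterior mean
        if use_posterior:
            contrasts = [rng.gauss(post_mean, post_std) for _ in range(L)]
        else:
            contrasts = [rng.gauss(0.0, 1.0) for _ in range(L)]
        log_w = []
        for t in [theta0] + contrasts:
            lw = log_normal_pdf(y, xi * t, sigma)  # PCE weight: p(y | theta_l, xi)
            if use_posterior:
                # ACE weight: p(theta_l) p(y | theta_l, xi) / q(theta_l | y)
                lw += log_normal_pdf(t, 0.0, 1.0) - log_normal_pdf(t, post_mean, post_std)
            log_w.append(lw)
        total += log_normal_pdf(y, xi * theta0, sigma) - (logsumexp(log_w) - math.log(L + 1))
    return total / n_outer
```

With these settings the true EIG is $\tfrac12\log 2501 \approx 3.91$ nats while $\log 11 \approx 2.40$: the PCE estimate cannot rise above 2.40 however many outer samples are drawn, whereas the ACE estimate should sit near 3.91 up to Monte Carlo error. In realistic high-dimensional problems the posterior has no closed form, which is why ACE learns $q_\phi$ instead.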
Connections
- Special case of Adaptive Contrastive Estimation (ACE): set $q_\phi(\theta \mid y) = p(\theta)$.
- InfoNCE / NCE lineage: ties BOED to contrastive representation learning and noise-contrastive estimation.
- In Rainforth 2023, PCE is one of the “contrastive bounds” cited as enabling consistent stochastic-gradient design optimization (their §3.3.1).
See Also
- Adaptive Contrastive Estimation (ACE) — the adaptive (learned-proposal) generalization
- Likelihood-Free ACE and Gradient Estimation — gradient estimation that applies to PCE too
- High-Dimensional Design Applications — where PCE wins (low-D) and loses (high-D)