Expected Information Gain

Summary

The expected information gain (EIG) is the objective function of Bayesian experimental design: the expected reduction in entropy (uncertainty) about the latent variable $\theta$ from running an experiment with design $\xi$, averaged over not-yet-observed outcomes $y$. It equals the mutual information $I(\theta; y \mid \xi)$. The Bayesian optimal design is $\xi^* = \arg\max_{\xi} \mathrm{EIG}_\theta(\xi)$. This note gives the four equivalent forms of the EIG and explains why it is hard to compute.

Overview

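We hold a prior $p(\theta)$ over a latent quantity of interest $\theta$ (model parameters, a function optimum, a future prediction, or anything else) and a model $p(y \mid \theta, \xi)$ for the outcome $y$ of an experiment run under design $\xi$. After observing $y$, Bayes' rule updates us to the posterior $p(\theta \mid y, \xi) \propto p(\theta)\, p(y \mid \theta, \xi)$.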

The information gain of a particular realized outcome $y$ is the drop in Shannon entropy from prior to posterior. Before running the experiment we do not know $y$, so to score a design $\xi$ we take the expectation over outcomes, giving the EIG.

Main Content

Definition: Information Gain (Rainforth 2023, Eq. 1)

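For a hypothetical outcome $y$ under design $\xi$, the information gain in $\theta$ is the reduction in Shannon entropy from the prior to the posterior:

$$\mathrm{IG}_\theta(\xi, y) = \mathrm{H}[p(\theta)] - \mathrm{H}[p(\theta \mid y, \xi)].$$

Because $y$ is unknown at design time, this cannot be optimized directly.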

Definition: Expected Information Gain (Rainforth 2023, Eqs. 2–3)

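The EIG averages information gain over outcomes via the marginal predictive $p(y \mid \xi) = \mathbb{E}_{p(\theta)}[p(y \mid \theta, \xi)]$:

$$\mathrm{EIG}_\theta(\xi) = \mathbb{E}_{p(y \mid \xi)}\big[\mathrm{IG}_\theta(\xi, y)\big] = \mathbb{E}_{p(y \mid \xi)}\big[\mathrm{H}[p(\theta)] - \mathrm{H}[p(\theta \mid y, \xi)]\big].$$

The Bayesian optimal design is $\xi^* = \arg\max_{\xi} \mathrm{EIG}_\theta(\xi)$.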
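When both the latent and the outcome are discrete, the two definitions above can be evaluated exactly by enumeration. The following is a minimal sketch under assumed notation (`prior[k]` for the prior over latent values, `lik[k][j]` for the probability of outcome `j` given latent value `k` under a fixed design); the two-hypothesis toy model and the design names are hypothetical and not taken from the source.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def expected_information_gain(prior, lik):
    """Exact EIG for a discrete model under one fixed design.

    prior[k]  : prior probability of latent value k
    lik[k][j] : probability of outcome j given latent value k
    """
    n_outcomes = len(lik[0])
    eig = 0.0
    for j in range(n_outcomes):
        # Joint over latent values for this outcome, and its sum,
        # which is the marginal predictive probability of outcome j.
        joint = [p * row[j] for p, row in zip(prior, lik)]
        evidence = sum(joint)
        if evidence == 0.0:
            continue  # impossible outcome contributes nothing
        post = [q / evidence for q in joint]  # Bayes' rule
        info_gain = entropy(prior) - entropy(post)
        eig += evidence * info_gain  # average over outcomes
    return eig


# Hypothetical toy problem: two hypotheses, binary outcome, two candidate designs.
prior = [0.5, 0.5]
designs = {
    "uninformative": [[0.5, 0.5], [0.5, 0.5]],
    "informative": [[0.9, 0.1], [0.1, 0.9]],
}
eig_by_design = {
    name: expected_information_gain(prior, lik) for name, lik in designs.items()
}
best_design = max(eig_by_design, key=eig_by_design.get)
```

In this toy problem the uninformative design leaves the posterior equal to the prior, so its EIG is zero, and the arg-max over designs selects the informative one.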

Equivalent forms of the EIG (mutual information)

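Counting the entropy-reduction definition above, the EIG can be written four equivalent ways, each suggesting a different estimator. Writing the joint as $p(\theta, y \mid \xi) = p(\theta)\, p(y \mid \theta, \xi)$:

$$\mathrm{EIG}_\theta(\xi) = \mathbb{E}_{p(\theta, y \mid \xi)}\left[\log \frac{p(\theta \mid y, \xi)}{p(\theta)}\right] = \mathbb{E}_{p(\theta, y \mid \xi)}\left[\log \frac{p(\theta, y \mid \xi)}{p(\theta)\, p(y \mid \xi)}\right] = \mathbb{E}_{p(\theta, y \mid \xi)}\left[\log \frac{p(y \mid \theta, \xi)}{p(y \mid \xi)}\right].$$

The middle form shows the EIG is the mutual information between latent $\theta$ and outcome $y$. The right form (a “likelihood” form) is convenient when ; the left (“posterior”) form when .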
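A short derivation of why these forms agree, written in notation assumed here ($\theta$ for the latent, $y$ for the outcome, $\xi$ for the design, $\mathrm{H}$ for Shannon entropy). It relies only on the two factorizations of the joint, $p(\theta, y \mid \xi) = p(\theta)\, p(y \mid \theta, \xi) = p(y \mid \xi)\, p(\theta \mid y, \xi)$, and on the prior not depending on the design:

$$
\begin{aligned}
\mathbb{E}_{p(y \mid \xi)}\big[\mathrm{H}[p(\theta)] - \mathrm{H}[p(\theta \mid y, \xi)]\big]
&= -\mathbb{E}_{p(\theta)}[\log p(\theta)] + \mathbb{E}_{p(y \mid \xi)}\,\mathbb{E}_{p(\theta \mid y, \xi)}[\log p(\theta \mid y, \xi)] \\
&= \mathbb{E}_{p(\theta, y \mid \xi)}\left[\log \frac{p(\theta \mid y, \xi)}{p(\theta)}\right] \\
&= \mathbb{E}_{p(\theta, y \mid \xi)}\left[\log \frac{p(\theta, y \mid \xi)}{p(\theta)\, p(y \mid \xi)}\right] \\
&= \mathbb{E}_{p(\theta, y \mid \xi)}\left[\log \frac{p(y \mid \theta, \xi)}{p(y \mid \xi)}\right].
\end{aligned}
$$

The second line uses the fact that $p(\theta)$ is the $\theta$-marginal of the joint; the third multiplies numerator and denominator by $p(y \mid \xi)$; the fourth cancels $p(\theta)$ after factorizing the joint as $p(\theta)\, p(y \mid \theta, \xi)$.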

Why the EIG is hard: double intractability

Every form contains an intractable normalizing density:

  • the posterior $p(\theta \mid y, \xi)$ (left form), and/or
  • the marginal likelihood $p(y \mid \xi)$ (right form),

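neither of which is generally available in closed form. A naive Monte Carlo estimator of, e.g., the likelihood form,

$$\mathrm{EIG}_\theta(\xi) \approx \frac{1}{N} \sum_{n=1}^{N} \log \frac{p(y_n \mid \theta_n, \xi)}{p(y_n \mid \xi)}, \qquad (\theta_n, y_n) \sim p(\theta)\, p(y \mid \theta, \xi),$$

fails because each $p(y_n \mid \xi) = \mathbb{E}_{p(\theta)}[p(y_n \mid \theta, \xi)]$ is itself an intractable integral. This makes the EIG a nested (doubly intractable) expectation requiring nested estimation, whose conventional estimators converge slowly (for nested Monte Carlo, root-mean-square error $\mathcal{O}(T^{-1/3})$ in the total sample budget $T$, versus $\mathcal{O}(T^{-1/2})$ for standard Monte Carlo). Overcoming this is the entire technical program of variational EIG estimation and the unified gradient approach.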
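To make the nesting concrete, here is a minimal sketch of a nested Monte Carlo estimator of the likelihood form, in which every outer sample needs its own inner average to approximate the marginal likelihood of its outcome. The toy model is an assumption chosen for illustration, not taken from the source: a standard normal prior on the latent, and a unit-variance Gaussian outcome with mean equal to the design times the latent. It is chosen because its EIG has the closed form $\tfrac{1}{2}\log(1 + \xi^2)$ to compare against. All function names are hypothetical.

```python
import math
import random


def log_lik(y, theta, xi):
    """Log density of N(y; xi * theta, 1): the toy likelihood assumed here."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (y - xi * theta) ** 2


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def nmc_eig(xi, n_outer, n_inner, rng):
    """Nested Monte Carlo estimate of the EIG for the toy Gaussian model."""
    total = 0.0
    for _ in range(n_outer):
        # Outer sample from the joint: latent from the prior, then an outcome.
        theta = rng.gauss(0.0, 1.0)
        y = rng.gauss(xi * theta, 1.0)
        # Inner estimate of the log marginal likelihood of y,
        # using fresh, independent prior samples.
        inner = [log_lik(y, rng.gauss(0.0, 1.0), xi) for _ in range(n_inner)]
        total += log_lik(y, theta, xi) - log_mean_exp(inner)
    return total / n_outer


rng = random.Random(0)
xi = 2.0
estimate = nmc_eig(xi, n_outer=2000, n_inner=200, rng=rng)
exact = 0.5 * math.log(1.0 + xi ** 2)  # closed form for this toy model only
```

The total cost is `n_outer * n_inner` likelihood evaluations, and the inner average introduces a bias that shrinks only as `n_inner` grows, which is the source of the slow convergence.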

Decision-theoretic reading

The EIG is the expected utility of an experiment when utility is the information / log-score utility $U(\theta, y, \xi) = \log p(\theta \mid y, \xi) - \log p(\theta)$. More general BED replaces this with any utility that is a functional of the posterior (Bernardo 1979; Chaloner & Verdinelli 1995), but the KL/entropy utility is the most common and typically best-performing choice. See Information-Theoretic Design Objectives and Decision Analysis.

Examples

Discrete-outcome (Rao–Blackwellized) EIG

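When $y$ takes a small number of discrete values $y_1, \dots, y_C$, the inner marginal can be enumerated rather than sampled, giving a lower-variance estimator (Rainforth 2023, Eq. 6):

$$\mathrm{EIG}_\theta(\xi) \approx \frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} p(y_c \mid \theta_n, \xi) \log p(y_c \mid \theta_n, \xi) \;-\; \sum_{c=1}^{C} \left[\frac{1}{N} \sum_{n=1}^{N} p(y_c \mid \theta_n, \xi)\right] \log \left[\frac{1}{N} \sum_{n=1}^{N} p(y_c \mid \theta_n, \xi)\right], \qquad \theta_n \sim p(\theta).$$

This is exactly the trick used for the death-process experiment in High-Dimensional Design Applications.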
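The enumeration idea can be sketched on a hypothetical coin-flip model that is not the death-process experiment and is assumed here only for illustration: the latent is a coin bias with a uniform prior, the design is the number of flips, and the outcome is the number of heads. Latent values are sampled, but outcomes are summed over exactly, so a single set of prior samples serves both the conditional term and the marginal term. Function names are hypothetical.

```python
import math
import random


def binom_pmf(k, n, theta):
    """Binomial probability of k heads in n flips with bias theta."""
    return math.comb(n, k) * theta ** k * (1.0 - theta) ** (n - k)


def xlogx(p):
    """p * log(p), with the convention 0 * log(0) = 0."""
    return p * math.log(p) if p > 0.0 else 0.0


def rao_blackwell_eig(n_flips, n_samples, rng):
    """EIG estimate that enumerates the discrete outcome instead of sampling it."""
    n_outcomes = n_flips + 1
    marginal = [0.0] * n_outcomes  # running sums for the outcome marginal
    cond_term = 0.0                # running sum of sum_c p log p given theta
    for _ in range(n_samples):
        theta = rng.random()  # uniform prior on the coin bias
        for c in range(n_outcomes):
            p = binom_pmf(c, n_flips, theta)
            cond_term += xlogx(p)
            marginal[c] += p
    cond_term /= n_samples
    marginal = [m / n_samples for m in marginal]
    # Negative expected conditional entropy plus the marginal entropy.
    return cond_term - sum(xlogx(m) for m in marginal)


rng = random.Random(0)
estimate = rao_blackwell_eig(n_flips=1, n_samples=20000, rng=rng)
exact_one_flip = math.log(2.0) - 0.5  # closed form for one flip, uniform prior
```

For a single flip the marginal over outcomes is uniform and the expected conditional entropy under a uniform prior is 1/2 nat, which gives the closed-form value used for comparison.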

Connections

See Also