Expected Information Gain
Summary
The expected information gain (EIG) is the objective function of Bayesian experimental design: the expected reduction in entropy (uncertainty) about the latent variable $\theta$ from running an experiment with design $\xi$, averaged over not-yet-observed outcomes $y$. It equals the mutual information $I(\theta; y \mid \xi)$. The Bayesian optimal design is $\xi^* = \arg\max_{\xi \in \Xi} \mathrm{EIG}_\theta(\xi)$. This note gives the four equivalent forms of the EIG and explains why it is hard to compute.
Overview
We hold a prior $p(\theta)$ over a latent quantity of interest $\theta$ (model parameters, a function optimum, a future prediction, or anything else) and a model $p(y \mid \theta, \xi)$ for the outcome $y$ of an experiment run under design $\xi$. After observing $y$, Bayes’ rule updates us to the posterior $p(\theta \mid y, \xi) \propto p(\theta)\, p(y \mid \theta, \xi)$.
The information gain of a particular realized outcome $y$ is the drop in Shannon entropy from prior to posterior. Before running the experiment we do not know $y$, so to score a design we take the expectation over outcomes, which gives the EIG.
Main Content
Definition: Information Gain (Rainforth 2023, Eq. 1)
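For a hypothetical outcome $y$ under design $\xi$, the information gain in $\theta$ is the reduction in Shannon entropy from the prior to the posterior:
$$\mathrm{IG}_\theta(\xi, y) = \mathrm{H}[p(\theta)] - \mathrm{H}[p(\theta \mid y, \xi)].$$
Because $y$ is unknown at design time, this cannot be optimized directly.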
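As a small worked illustration of this definition, the sketch below computes the information gain for a discrete latent with two hypotheses. The function names and the numbers (a uniform prior and likelihood values 0.9 and 0.1 for the realized outcome) are assumptions made for the example, not taken from the source.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                      # treat 0 * log(0) as 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

def information_gain(prior, likelihood_y):
    """Entropy drop from prior to posterior for one realized outcome y.

    prior[k]        = p(theta_k)
    likelihood_y[k] = p(y | theta_k, xi) evaluated at the realized y
    """
    prior = np.asarray(prior, dtype=float)
    likelihood_y = np.asarray(likelihood_y, dtype=float)
    joint = prior * likelihood_y
    posterior = joint / joint.sum()  # Bayes' rule
    return entropy(prior) - entropy(posterior)

# Hypothetical numbers: two equally likely hypotheses, outcome favours the first.
ig = information_gain([0.5, 0.5], [0.9, 0.1])
print(f"information gain: {ig:.4f} nats")
```

For a single outcome the gain can be negative (a surprising outcome can leave the posterior more spread out than the prior, e.g. `information_gain([0.9, 0.1], [0.1, 0.9])`); only its expectation over outcomes is guaranteed to be non-negative.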
Definition: Expected Information Gain (Rainforth 2023, Eqs. 2–3)
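The EIG averages information gain over outcomes via the marginal predictive $p(y \mid \xi) = \mathbb{E}_{p(\theta)}[p(y \mid \theta, \xi)]$:
$$\mathrm{EIG}_\theta(\xi) = \mathbb{E}_{p(y \mid \xi)}\big[\mathrm{H}[p(\theta)] - \mathrm{H}[p(\theta \mid y, \xi)]\big].$$
The Bayesian optimal design is $\xi^* = \arg\max_{\xi \in \Xi} \mathrm{EIG}_\theta(\xi)$.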
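When both the latent and the outcome are discrete and small, the EIG can be computed exactly and the optimal design found by direct search. The sketch below assumes a hypothetical toy problem: a binary latent observed through a symmetric noisy channel whose flip probability is the design. The names `exact_eig` and `channel` and the candidate designs are illustrative only.

```python
import numpy as np

def exact_eig(prior, lik):
    """Exact EIG (nats) for discrete theta and y.

    prior[k]  = p(theta_k)
    lik[k, c] = p(y_c | theta_k, xi)
    """
    prior = np.asarray(prior, dtype=float)
    lik = np.asarray(lik, dtype=float)
    joint = prior[:, None] * lik           # p(theta_k, y_c | xi)
    marg = joint.sum(axis=0)               # p(y_c | xi)
    with np.errstate(divide="ignore", invalid="ignore"):
        # cells with zero joint mass contribute nothing
        log_ratio = np.where(joint > 0, np.log(lik) - np.log(marg), 0.0)
    return float(np.sum(joint * log_ratio))

def channel(eps):
    """Hypothetical likelihood: y equals theta, flipped with probability eps."""
    return np.array([[1 - eps, eps], [eps, 1 - eps]])

prior = np.array([0.5, 0.5])
designs = [0.5, 0.3, 0.1]                  # candidate flip probabilities
eigs = np.array([exact_eig(prior, channel(e)) for e in designs])
best = designs[int(np.argmax(eigs))]       # design with the largest EIG
print(dict(zip(designs, eigs.round(4))), "best:", best)
```

A flip probability of 0.5 makes the outcome independent of the latent, so its EIG is zero; the least noisy channel is selected.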
Equivalent forms of the EIG (mutual information)
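The EIG can be written four equivalent ways, each suggesting a different estimator: the entropy-reduction form used in the definition, and the three expectation forms below. Writing the joint as $p(\theta, y \mid \xi) = p(\theta)\, p(y \mid \theta, \xi)$:
$$\mathrm{EIG}_\theta(\xi) = \mathbb{E}_{p(\theta, y \mid \xi)}\!\left[\log \frac{p(\theta \mid y, \xi)}{p(\theta)}\right] = \mathbb{E}_{p(\theta, y \mid \xi)}\!\left[\log \frac{p(\theta, y \mid \xi)}{p(\theta)\, p(y \mid \xi)}\right] = \mathbb{E}_{p(\theta, y \mid \xi)}\!\left[\log \frac{p(y \mid \theta, \xi)}{p(y \mid \xi)}\right].$$
The middle form shows the EIG is the mutual information $I(\theta; y \mid \xi)$ between latent and outcome. The right form (a “likelihood” form) is convenient when the marginal $p(y \mid \xi)$ is the easier density to approximate (for example, when $y$ is low-dimensional); the left (“posterior”) form when the posterior $p(\theta \mid y, \xi)$ is the easier one (for example, when $\theta$ is low-dimensional).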
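The equivalence follows from Bayes’ rule alone. A short derivation, assuming the notation $\theta$ for the latent, $y$ for the outcome, $\xi$ for the design, and a prior $p(\theta)$ that does not depend on $\xi$:

$$
\begin{aligned}
\mathbb{E}_{p(y \mid \xi)}\big[\mathrm{H}[p(\theta)] - \mathrm{H}[p(\theta \mid y, \xi)]\big]
&= -\mathbb{E}_{p(\theta)}[\log p(\theta)] + \mathbb{E}_{p(y \mid \xi)}\,\mathbb{E}_{p(\theta \mid y, \xi)}[\log p(\theta \mid y, \xi)] \\
&= \mathbb{E}_{p(\theta, y \mid \xi)}\!\left[\log \frac{p(\theta \mid y, \xi)}{p(\theta)}\right] \\
&= \mathbb{E}_{p(\theta, y \mid \xi)}\!\left[\log \frac{p(\theta, y \mid \xi)}{p(\theta)\, p(y \mid \xi)}\right] \\
&= \mathbb{E}_{p(\theta, y \mid \xi)}\!\left[\log \frac{p(y \mid \theta, \xi)}{p(y \mid \xi)}\right].
\end{aligned}
$$

The second line uses the fact that the $\theta$-marginal of the joint is the prior; the third multiplies numerator and denominator by $p(y \mid \xi)$; the fourth substitutes $p(\theta, y \mid \xi) = p(\theta)\, p(y \mid \theta, \xi)$ and cancels $p(\theta)$.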
Why the EIG is hard: double intractability
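Every form contains an intractable normalizing density:
- the posterior $p(\theta \mid y, \xi)$ (left form), and/or
- the marginal likelihood $p(y \mid \xi) = \mathbb{E}_{p(\theta)}[p(y \mid \theta, \xi)]$ (right form),
neither of which is generally available in closed form. A naive Monte Carlo estimator of, e.g., the likelihood form,
$$\frac{1}{N} \sum_{n=1}^{N} \big[\log p(y_n \mid \theta_n, \xi) - \log p(y_n \mid \xi)\big], \qquad (\theta_n, y_n) \sim p(\theta)\, p(y \mid \theta, \xi),$$
fails because each $p(y_n \mid \xi)$ is itself an intractable integral. This makes the EIG a nested (doubly-intractable) expectation requiring nested estimation, whose conventional estimators converge slowly ($\mathcal{O}(T^{-1/3})$ in the total sample budget $T$). Overcoming this is the entire technical program of variational EIG estimation and the unified gradient approach.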
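To make the nesting concrete, here is a minimal sketch of a nested Monte Carlo estimator of the likelihood form, in which each inner marginal is replaced by an average over fresh prior samples. It assumes a hypothetical linear-Gaussian model, $\theta \sim \mathcal{N}(0, 1)$ and $y \mid \theta, \xi \sim \mathcal{N}(\xi\theta, \sigma^2)$, chosen only because its EIG has the closed form $\tfrac{1}{2}\log(1 + \xi^2/\sigma^2)$ to check against. All function names are illustrative.

```python
import numpy as np

def log_normal_pdf(y, mean, sigma):
    """Log density of N(mean, sigma^2) at y (broadcasts over arrays)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mean) / sigma) ** 2

def nmc_eig(xi, sigma=1.0, n_outer=1000, n_inner=1000, seed=0):
    """Nested Monte Carlo EIG estimate for the assumed linear-Gaussian model."""
    rng = np.random.default_rng(seed)
    # Outer samples: (theta_n, y_n) drawn from the joint p(theta) p(y | theta, xi).
    theta = rng.standard_normal(n_outer)
    y = xi * theta + sigma * rng.standard_normal(n_outer)
    log_lik = log_normal_pdf(y, xi * theta, sigma)
    # Inner samples: fresh prior draws to estimate each marginal p(y_n | xi).
    theta_inner = rng.standard_normal((n_outer, n_inner))
    inner = log_normal_pdf(y[:, None], xi * theta_inner, sigma)   # shape (N, M)
    shift = inner.max(axis=1, keepdims=True)                      # stable log-mean-exp
    log_marg = shift[:, 0] + np.log(np.mean(np.exp(inner - shift), axis=1))
    return float(np.mean(log_lik - log_marg))

def gaussian_eig_exact(xi, sigma=1.0):
    """Closed-form EIG for the assumed linear-Gaussian model."""
    return 0.5 * np.log1p(xi**2 / sigma**2)

estimate = nmc_eig(2.0)
print(f"NMC: {estimate:.3f}   exact: {gaussian_eig_exact(2.0):.3f}")
```

The cost is `n_outer * n_inner` likelihood evaluations per design, which is what makes this estimator expensive inside a design-optimization loop.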
Decision-theoretic reading
The EIG is the expected utility of an experiment when utility is the information / log-score utility $U(\theta, y, \xi) = \log p(\theta \mid y, \xi) - \log p(\theta)$. More general BED replaces this with any utility that is a functional of the posterior (Bernardo 1979; Chaloner & Verdinelli 1995), but the KL/entropy utility is the most common and typically best-performing choice. See Information-Theoretic Design Objectives and Decision Analysis.
Examples
Discrete-outcome (Rao–Blackwellized) EIG
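When $y$ takes a small number of discrete values $y_1, \dots, y_C$, the sum over outcomes can be enumerated rather than sampled, with the inner marginal $p(y_c \mid \xi)$ estimated from the same prior samples $\theta_n \sim p(\theta)$, giving a lower-variance estimator (Rainforth 2023, Eq. 6):
$$\widehat{\mathrm{EIG}}_\theta(\xi) = \frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} p(y_c \mid \theta_n, \xi)\, \log \frac{p(y_c \mid \theta_n, \xi)}{\frac{1}{N} \sum_{m=1}^{N} p(y_c \mid \theta_m, \xi)}.$$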
This is exactly the trick used for the death-process experiment in High-Dimensional Design Applications.
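The enumeration idea can be sketched in a few lines. The code below is an illustration of an estimator of this type, not a reproduction of the cited equation: it takes a matrix of likelihood values at prior samples, sums over all outcomes exactly, and reuses the same samples for the marginal. The binary-outcome model at the end (an exponential prior with $p(y = 1 \mid \theta, \xi) = 1 - e^{-\theta\xi}$) is a hypothetical stand-in, not the death-process model itself.

```python
import numpy as np

def rb_eig(lik):
    """Enumerated (Rao-Blackwellized) EIG estimate in nats.

    lik[n, c] = p(y_c | theta_n, xi) with theta_n ~ p(theta); rows sum to 1.
    """
    lik = np.asarray(lik, dtype=float)
    marg = lik.mean(axis=0)                       # estimate of p(y_c | xi)
    with np.errstate(divide="ignore", invalid="ignore"):
        # outcomes with zero likelihood contribute 0 (0 * log 0 := 0)
        terms = np.where(lik > 0, lik * (np.log(lik) - np.log(marg)), 0.0)
    return float(terms.sum(axis=1).mean())        # sum over outcomes, average over samples

# Hypothetical binary-outcome model, used only to exercise the estimator.
rng = np.random.default_rng(0)
theta = rng.exponential(1.0, size=5000)           # assumed prior samples
xi = 1.5                                          # assumed design value
p1 = 1.0 - np.exp(-theta * xi)                    # p(y = 1 | theta, xi)
estimate = rb_eig(np.column_stack([1.0 - p1, p1]))
print(f"enumerated EIG estimate: {estimate:.4f} nats")
```

Because no outcomes are sampled, the only Monte Carlo noise comes from the draws of $\theta$; for $C$ outcomes the estimate always lies between 0 and $\log C$.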
Connections
- Generalizes to sequential settings via the incremental EIG conditioned on the history $h_{t-1}$ of past designs and outcomes; see Sequential and Adaptive BED.
- Is a mutual information, so any MI lower/upper bound (Barber–Agakov, InfoNCE/PCE, NWJ, MINE) becomes an EIG estimator — the basis of Variational BOED - Overview and Adaptive Contrastive Estimation (ACE).
- Contrasts with the Fisher-information / alphabetic-optimality criteria of classical design, which approximate or replace the EIG — see Information-Theoretic Design Objectives.
See Also
- Lindley’s Information Measure — the 1956 origin of this measure (Definitions 1–2, additivity, the design rule)
- Nested Estimation and Nested Monte Carlo — why and how the EIG is estimated
- Decision Analysis — expected-utility framing of experimentation
- Probability and Bayesian Inference — the prior→posterior update underlying information gain