From Designs to Policies (Deep Adaptive Design)

Summary

The review’s signature “what’s new” contribution: replace per-step design optimization with a pre-trained design policy. Deep adaptive design (DAD) learns a policy network $\pi_\phi$ that maps the experiment history $h_{t-1}$ directly to the next design $\xi_t$, trained offline by maximizing the total expected information gain (EIG) over all $T$ steps. At deployment, a single forward pass gives each design in real time, with no inference and no optimization, and the learned policy is non-myopic, fixing both flaws of traditional Bayesian adaptive design (BAD).

Overview

Even with a fast EIG estimator and optimizer, traditional Bayesian adaptive design is expensive and myopic: it must (i) run Bayesian inference to update the model at every step, and (ii) greedily maximize only the next step’s incremental EIG, ignoring downstream effects. DAD removes both problems by shifting all computation offline into training a policy, which can then be deployed near-instantly.

Main Content

The policy idea

Definition: Design policy and total EIG (Rainforth 2023 §4, Eqs. 16–17)

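A design policy is a network $\pi_\phi$ with parameters $\phi$, mapping the history $h_{t-1} = \{(\xi_k, y_k)\}_{k=1}^{t-1}$ to the next design $\xi_t = \pi_\phi(h_{t-1})$ via a single forward pass. It is trained to maximize the total expected information gain over all $T$ steps:

$$\mathcal{I}_{1 \to T}(\pi_\phi) = \mathbb{E}_{p(\theta)\, p(h_T \mid \theta, \pi_\phi)}\left[\log \frac{p(h_T \mid \theta, \pi_\phi)}{p(h_T \mid \pi_\phi)}\right],$$

where $h_T$ is generated autoregressively ($\xi_t = \pi_\phi(h_{t-1})$, $y_t \sim p(y_t \mid \theta, \xi_t, h_{t-1})$). Because incremental EIGs are additive, the total EIG equals the expected sum of the incremental EIGs, $\mathcal{I}_{1 \to T}(\pi_\phi) = \mathbb{E}\left[\sum_{t=1}^{T} I_{h_{t-1}}(\xi_t)\right]$ (Eq. 17), so optimizing the total EIG is exactly the non-myopic objective.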
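
To make the objective concrete, the following is a minimal Python sketch that rolls out a policy autoregressively and estimates a lower bound on its total EIG. Everything model-specific is an assumption made for illustration: the toy linear-Gaussian model (theta ~ N(0, 1), y = theta * xi + noise, outcomes conditionally independent given theta), the two-parameter policy function, and all names are hypothetical. Because the total EIG itself is intractable, the sketch swaps in the sequential prior contrastive estimation (sPCE) lower bound that Foster et al. (2021) use to train DAD; it is not code from the review.

```python
import numpy as np

def policy(phi, designs, outcomes):
    """Hypothetical toy policy pi_phi: history -> next design in (-1, 1)."""
    if designs:
        summary = float(np.mean(np.array(designs) * np.array(outcomes)))
    else:
        summary = 0.0  # empty history h_0
    return float(np.tanh(phi[0] + phi[1] * summary))

def spce_total_eig(phi, T=5, L=64, n_outer=200, sigma=1.0, seed=0):
    """Monte Carlo estimate of the sPCE lower bound on the total EIG."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_outer):
        # thetas[0] generates the data; thetas[1:] are contrastive prior samples.
        thetas = rng.standard_normal(L + 1)
        designs, outcomes = [], []
        log_lik = np.zeros(L + 1)  # log p(h_T | theta_l, pi), up to a shared constant
        for _step in range(T):
            xi = policy(phi, designs, outcomes)                 # xi_t = pi_phi(h_{t-1})
            y = thetas[0] * xi + sigma * rng.standard_normal()  # y_t ~ p(y | theta_0, xi_t)
            designs.append(xi)
            outcomes.append(y)
            # Gaussian log-likelihood; the normalizing constant cancels in the ratio.
            log_lik += -0.5 * ((y - thetas * xi) / sigma) ** 2
        # Numerically stable log of the average likelihood over all L + 1 samples.
        peak = log_lik.max()
        log_mean = peak + np.log(np.mean(np.exp(log_lik - peak)))
        total += log_lik[0] - log_mean
    return total / n_outer

uninformative = spce_total_eig(np.array([0.0, 0.0]))  # always picks xi = 0
informative = spce_total_eig(np.array([3.0, 0.0]))    # always picks xi near 1
print(uninformative, informative)
```

The zero-design policy scores 0 because its outcomes carry no information about theta, and the bound can never exceed log(L + 1). In DAD this quantity is maximized over phi by stochastic gradient ascent, which would call for an autodiff framework rather than plain NumPy.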

Why it works (Figure 2 of the review)

  • Offline training, live deployment. All cost is paid once, upfront, in learning the policy $\pi_\phi$. During the experiment, each design decision is a single forward pass: real-time adaptation with no per-step inference or optimization.
  • Amortization across realizations. The same learned policy $\pi_\phi$ serves many runs of the experiment (e.g. many survey participants), amortizing the training cost.
  • Non-myopic. Optimizing the total EIG accounts for how each design influences the information gathered at future steps. Traditional BAD’s greedy per-step rule corresponds to one particular, generally sub-optimal, policy: $\pi(h_{t-1}) = \arg\max_{\xi_t} I_{h_{t-1}}(\xi_t)$, where $I_{h_{t-1}}(\xi_t)$ is the incremental EIG of design $\xi_t$ given history $h_{t-1}$.
  • Justified generality. A policy as a function of history loses no generality: for a given model, all design-relevant information is contained in the history (the posterior belief state is determined by model + history).
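
The deployment and amortization points above can be sketched as a short loop. This is an illustration under stated assumptions, not DAD itself: staircase_policy is a hand-written stand-in for a trained network (a halving staircase on [0, 1]), and make_participant simulates a hypothetical respondent with a hidden threshold. What the sketch is meant to show is the shape of the computation: each design is a pure function of the history, with no posterior update and no optimizer inside the loop, and one policy is reused across participants.

```python
def deploy(policy, run_experiment, T):
    """Run one adaptive experiment using only policy forward passes."""
    history = []                   # h_0: nothing observed yet
    for _ in range(T):
        xi = policy(history)       # xi_t = pi(h_{t-1}); no inference, no optimization
        y = run_experiment(xi)     # outcome y_t from the live experiment
        history.append((xi, y))    # h_t extends h_{t-1} with (xi_t, y_t)
    return history

def staircase_policy(history):
    """Hand-written stand-in for a trained network: a halving staircase on [0, 1]."""
    xi = 0.5
    for k, (_, y) in enumerate(history):
        step = 0.25 / 2 ** k
        xi += -step if y == 1 else step
    return xi

def make_participant(threshold):
    """Hypothetical participant who answers 1 once the stimulus reaches their threshold."""
    return lambda xi: 1 if xi >= threshold else 0

# The same policy is reused for every participant (amortization across realizations).
histories = {th: deploy(staircase_policy, make_participant(th), T=5) for th in (0.3, 0.8)}
for th, h in histories.items():
    print(th, [xi for xi, _ in h])
```

Both participants receive the same first design, since the history is empty at that point; the sequences then diverge as their responses differ.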

Lineage

  • Huan & Marzouk (2016) first framed adaptive design via dynamic programming / RL, learning a map from posterior representations (the RL “state”) to designs — but still required posterior updates at each step.
  • DAD (Foster et al. 2021) maps histories directly to designs, removing inference at deployment. It is trained with variational EIG bounds (see The Computational Revolution in EIG Estimation) and stochastic gradients (see Optimization and Gradient Schemes for BED), and uses a custom permutation-invariant policy architecture.
  • iDAD generalized DAD to implicit-likelihood models and refined the policy architecture.
  • Subsequent work uses reinforcement learning to learn the policy, reflecting that BAD is a Bayes-adaptive Markov decision process with the incremental EIG as reward.
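
The permutation-invariant architecture mentioned for DAD can be sketched in a few lines. In Foster et al. (2021), each (design, outcome) pair is passed through a shared encoder, the encodings are summed, and an emitter network maps the pooled vector to the next design; the paper motivates this for models whose outcomes are conditionally independent given theta. The layer sizes, random untrained weights, scalar designs, and function names below are assumptions for illustration only.

```python
import numpy as np

def init_params(rng, hidden=16, embed=8):
    """Random (untrained) weights for a small encoder and emitter."""
    return {
        "W1": rng.standard_normal((2, hidden)) * 0.5, "b1": np.zeros(hidden),
        "W2": rng.standard_normal((hidden, embed)) * 0.5, "b2": np.zeros(embed),
        "W3": rng.standard_normal((embed, hidden)) * 0.5, "b3": np.zeros(hidden),
        "W4": rng.standard_normal((hidden, 1)) * 0.5, "b4": np.zeros(1),
    }

def policy(params, history):
    """Map a history of (xi_k, y_k) pairs to the next scalar design."""
    pairs = np.asarray(history, dtype=float).reshape(-1, 2)   # shape (t, 2); (0, 2) if empty
    # Encoder: the same network is applied to every (design, outcome) pair.
    hidden = np.tanh(pairs @ params["W1"] + params["b1"])
    embeddings = hidden @ params["W2"] + params["b2"]          # shape (t, embed)
    # Sum pooling makes the representation independent of the order of the pairs.
    pooled = embeddings.sum(axis=0)                            # zeros for an empty history
    # Emitter: pooled representation -> next design.
    out = np.tanh(pooled @ params["W3"] + params["b3"]) @ params["W4"] + params["b4"]
    return float(out[0])

rng = np.random.default_rng(0)
params = init_params(rng)
history = [(0.1, 1.2), (-0.4, 0.3), (0.9, -0.7)]
forward = policy(params, history)
shuffled = policy(params, history[::-1])
first = policy(params, [])          # design for the empty history h_0
print(forward, shuffled, first)
```

Reversing the history leaves the proposed design unchanged up to floating-point rounding, while the fixed-size pooled vector lets one network handle histories of any length.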

Empirical payoff

Beyond the transformative speed gains, DAD has also been found to improve the quality of designs relative to traditional greedy strategies, an effect attributed to learning non-myopic policies and to avoiding the errors of approximate per-step inference.

Connections

See Also