Dynamic Treatment Regimes

Routing Summary

Estimating optimal dynamic treatment regimes (DTRs) — sequences of decision rules for personalized, multi-stage treatment — from clinical-trial or observational data (Schulte, Tsiatis, Laber & Davidian 2014). Covers the two main methods, Q-learning and A-learning, and the bias–variance / robustness trade-off between them. Contains 5 notes.

Concept Map

Concept | Note | Type | Depends On | Key Result
------- | ---- | ---- | ---------- | ----------
DTR review; Q vs A-learning | Q- and A-learning - Overview | overview | Potential Outcomes Framework | Q-learning more efficient when correct; A-learning more robust
Sequential potential outcomes; optimal regime; assumptions | Dynamic Treatment Regimes Framework | definition | Causal Estimands | Identified under consistency + sequential randomization + positivity
Q-functions, value functions, backward induction | Optimal Regime via Dynamic Programming | theorem | Dynamic Treatment Regimes Framework | Observed = potential under assumptions; presentation-invariant
Backward-regression estimation of Q-functions | Q-learning | concept | Optimal Regime via Dynamic Programming | Consistent only if all Q-function models are correct; value-to-go response is nonlinear
Contrast/advantage functions, g-estimation, double robustness | A-learning and Robustness | concept | Q-learning | Consistent if contrast + (propensity OR nuisance) correct; robust but less efficient

Notes

  • Q- and A-learning - Overview — CONTAINS: research question, Q-vs-A comparison table, three key findings, relation to metalearners/g-computation.
  • Dynamic Treatment Regimes Framework — CONTAINS: notation; Def. sequential potential outcomes (Eq. 1); Def. DTR & optimal regime (Eqs. 3-4); the 4 identification assumptions; feasible regimes; SMART vs. observational designs.
  • Optimal Regime via Dynamic Programming — CONTAINS: backward-induction recursion (Eqs. 5-8); Def. Q- and value functions (Eqs. 9-14); identification theorem (Eq. 19); midstream presentation-invariance theorem (Eq. 25).
  • Q-learning — CONTAINS: backward WLS/OLS estimating equations (Eqs. 26-28); linear two-decision example (Eq. 29); nonlinearity/misspecification of the value-to-go response; flexible-model remedies.
  • A-learning and Robustness — CONTAINS: Def. contrast/advantage function; g-estimation equations (Eqs. 30-31); double-robustness property; efficiency-vs-robustness trade-off and simulation findings (Figs. 1-6).
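
To make the backward-regression idea in the Q-learning note concrete, here is a minimal sketch of two-stage Q-learning with ordinary least squares. It is an illustration under stated assumptions, not the paper's code: the function names (fit_ols, q_learning_two_stage, rule, simulate), the binary 0/1 treatment coding, the linear working models, and the simulated data-generating model are all hypothetical choices made for this example.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already holds an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def q_learning_two_stage(x1, a1, x2, a2, y):
    """Two-stage Q-learning with linear working models and treatments coded 0/1.

    Returns (psi1, psi2): estimated contrast coefficients for stages 1 and 2.
    """
    ones = np.ones_like(x1)

    # Stage 2: regress Y on main effects plus a2 * (1, x2).
    main2 = np.column_stack([ones, x1, a1, a1 * x1, x2])
    contrast2 = np.column_stack([ones, x2])
    beta2 = fit_ols(np.column_stack([main2, a2[:, None] * contrast2]), y)
    n_main = main2.shape[1]
    psi2 = beta2[n_main:]

    # Pseudo-outcome: fitted stage-2 Q-function maximized over a2 in {0, 1}.
    v2 = main2 @ beta2[:n_main] + np.maximum(contrast2 @ psi2, 0.0)

    # Stage 1: regress the pseudo-outcome on (1, x1) plus a1 * (1, x1).
    main1 = np.column_stack([ones, x1])
    beta1 = fit_ols(np.column_stack([main1, a1[:, None] * main1]), v2)
    psi1 = beta1[main1.shape[1]:]
    return psi1, psi2


def rule(psi, x):
    """Estimated decision rule: treat (1) when psi[0] + psi[1] * x is positive."""
    return (psi[0] + psi[1] * np.asarray(x) > 0).astype(int)


def simulate(n=5000, seed=0):
    """Hypothetical randomized two-stage study (assumed model, for illustration only).

    True stage-2 contrast: 1 - 2 * x2. True stage-1 contrast: 0.5 + x1.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    a1 = rng.integers(0, 2, size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    a2 = rng.integers(0, 2, size=n)
    y = x1 + x2 + a1 * (0.5 + x1) + a2 * (1.0 - 2.0 * x2) + rng.normal(size=n)
    return x1, a1, x2, a2, y


if __name__ == "__main__":
    x1, a1, x2, a2, y = simulate()
    psi1, psi2 = q_learning_two_stage(x1, a1, x2, a2, y)
    print("stage-1 contrast coefficients:", psi1)
    print("stage-2 contrast coefficients:", psi2)
    print("stage-2 rule at x2 = -1 and x2 = 2:", rule(psi2, [-1.0, 2.0]))
```

In this simulation the stage-1 pseudo-outcome contains a max term, so its main effect is nonlinear in x1 even though the stage-2 model is linear. This is the value-to-go nonlinearity the Q-learning note refers to. The stage-1 contrast is still recoverable here only because a1 is randomized independently of x1 in the assumed design; with observational data a misspecified main effect could bias the estimated rule.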

Sources

  • q- and a- learning.pdf — Schulte, Tsiatis, Laber & Davidian (2014), “Q- and A-learning Methods for Estimating Optimal Dynamic Treatment Regimes”, Statistical Science 29(4):640–661. Demonstrated on the STAR*D depression study (§7).

See Also