Dynamic Treatment Regimes Framework

Summary

This note sets out the conceptual framework for optimal dynamic treatment regimes (DTRs). With $K$ ordered decision points, potential outcomes are defined over all possible treatment histories $\bar a_K = (a_1, \dots, a_K)$. A regime $d = (d_1, \dots, d_K)$ assigns, at each decision $k$, a treatment as a function of the realized history $(\bar s_k, \bar a_{k-1})$. The optimal regime maximizes the population mean potential outcome. Estimating it from observed data rests on consistency together with the stable unit treatment value assumption (SUTVA), sequential randomization (no unmeasured confounders at each decision), and positivity (every treatment option in the regime class is represented in the data).

Overview

To define and estimate an optimal regime we need a careful potential-outcomes setup for sequential decisions. Large values of a final outcome $Y$ are preferred; $Y$ may be measured after the $K$th decision or be a function of the whole trajectory. This note states the estimand and the assumptions under which it is identifiable; the methods that estimate it are in Optimal Regime via Dynamic Programming, Q-learning, and A-learning and Robustness.

Main Content

Notation

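  • $K$ ordered decision points $k = 1, \dots, K$; at each, a finite set $\mathcal{A}_k$ of treatment options $a_k$.
  • $S_1$: baseline covariates; $S_k$ ($k = 2, \dots, K$): covariate information accrued between decisions $k-1$ and $k$.
  • $\bar a_k = (a_1, \dots, a_k)$: a treatment history; $\bar a_k \in \bar{\mathcal{A}}_k = \mathcal{A}_1 \times \cdots \times \mathcal{A}_k$.
  • $A_k$: the observed (recorded) treatment at decision $k$; $Y$: observed outcome.
  • $\Psi_k(\bar s_k, \bar a_{k-1}) \subseteq \mathcal{A}_k$: the set of treatment options permitted for a patient with that history (encodes ethical/feasibility/policy restrictions); the regime class $\mathcal{D}$ is $\Psi$-specific.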

Definition: Potential outcomes for sequential treatments (Robins 1986; §2, Eq. 1)

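The full set of potential outcomes is

$$W = \{ S_2^*(a_1), S_3^*(\bar a_2), \dots, S_K^*(\bar a_{K-1}), Y^*(\bar a_K) \ \text{for all } \bar a_K \in \bar{\mathcal{A}}_K \},$$

where $S_k^*(\bar a_{k-1})$ is the covariate value that would arise between decisions $k-1$ and $k$ had the patient received history $\bar a_{k-1} = (a_1, \dots, a_{k-1})$, and $Y^*(\bar a_K)$ is the outcome that would result under the full treatment history $\bar a_K = (a_1, \dots, a_K)$.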
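To make the size of this set concrete, the following minimal sketch lists a label for every potential outcome, given the option set at each decision. The function name `potential_outcome_index` and the tuple labels are hypothetical, introduced only for illustration.

```python
from itertools import product

def potential_outcome_index(option_sets):
    """List a label for every potential outcome in the full set.

    option_sets[k-1] is the finite set of options at decision k (hypothetical input).
    Labels are ("S", k, history) for the covariate that would arise between
    decisions k-1 and k under treatments a_1..a_{k-1} (k = 2..K), and
    ("Y", K, history) for the outcome under the full history a_1..a_K.
    """
    K = len(option_sets)
    labels = []
    for k in range(2, K + 1):
        # Intermediate covariates depend only on treatments given before decision k.
        for hist in product(*option_sets[:k - 1]):
            labels.append(("S", k, hist))
    # The final outcome depends on the complete treatment history.
    for hist in product(*option_sets):
        labels.append(("Y", K, hist))
    return labels

# Two decisions, two options each.
W_index = potential_outcome_index([(0, 1), (0, 1)])
for label in W_index:
    print(label)
```

With two binary decisions there are two potential intermediate covariates and four potential final outcomes; only one of each is ever observed for a given patient.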

Definition: Dynamic treatment regime and optimal regime (§2-3, Eqs. 3-4)

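A dynamic treatment regime is a set of rules $d = (d_1, \dots, d_K)$ where rule $d_k(\bar s_k, \bar a_{k-1})$ maps the realized history to a treatment. Writing $Y^*(d)$ for the potential outcome under regime $d$, the regime $d^{\mathrm{opt}} \in \mathcal{D}$ is optimal if

$$E\{Y^*(d^{\mathrm{opt}})\} \ge E\{Y^*(d)\} \quad \text{for all } d \in \mathcal{D}.$$

Optimality is predicated on the chosen class $\mathcal{D}$ (the restrictions $\Psi_k$); the class is conceived from scientific/policy objectives, not from the available data.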
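The definition can be illustrated with a small simulation in which the potential outcomes are known by construction, so the mean potential outcome of each regime in a small class can be approximated directly by Monte Carlo. Everything here is an assumption made for illustration: the two-decision structural model, its coefficients, the three candidate rules, and the simplification that each rule looks only at the most recent covariate rather than the whole history. It is not an estimation method for real data, where only one potential outcome per patient is observed.

```python
import random
from itertools import product

# Hypothetical toy structural model: two decisions, binary options {0, 1}.
def draw_patient(rng):
    """Draw the patient-level randomness shared by all of that patient's potential outcomes."""
    return {"s1": rng.gauss(0, 1), "e2": rng.gauss(0, 1), "ey": rng.gauss(0, 1)}

def s2_star(u, a1):
    """Covariate that would arise after decision 1 had the patient received a1."""
    return 0.5 * u["s1"] + 0.2 * a1 + u["e2"]

def y_star(u, a1, a2):
    """Final outcome under the full history (a1, a2); larger is better."""
    return a1 * u["s1"] + a2 * s2_star(u, a1) + u["ey"]

# Candidate decision rules; each maps the latest covariate to a treatment.
RULES = {
    "never": lambda s: 0,
    "always": lambda s: 1,
    "if_positive": lambda s: 1 if s > 0 else 0,
}

def y_star_regime(u, d1, d2):
    """Potential outcome under a regime: apply rule d1, let the covariate evolve, apply rule d2."""
    a1 = RULES[d1](u["s1"])
    s2 = s2_star(u, a1)
    a2 = RULES[d2](s2)
    return y_star(u, a1, a2)

def regime_values(n=20000, seed=0):
    """Monte Carlo approximation of the mean potential outcome of every regime in the class."""
    rng = random.Random(seed)
    patients = [draw_patient(rng) for _ in range(n)]
    return {
        (d1, d2): sum(y_star_regime(u, d1, d2) for u in patients) / n
        for d1, d2 in product(RULES, RULES)
    }

values = regime_values()
d_opt = max(values, key=values.get)  # optimal only within this nine-regime class
print(d_opt, round(values[d_opt], 2))
```

Shrinking or enlarging `RULES` changes the class and can change which regime is selected, which is the sense in which optimality is relative to the chosen class.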

Identification assumptions

Assumptions for identifying $d^{\mathrm{opt}}$ from observed data (§2)

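  1. Consistency (often treated as one component of SUTVA): the observed covariates and outcome equal the potential ones under the treatments actually received, that is, $S_k = S_k^*(\bar A_{k-1})$ for $k = 2, \dots, K$ and $Y = Y^*(\bar A_K)$.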
  2. Stable Unit Treatment Value Assumption (Rubin 1978): a patient’s covariates/outcome are unaffected by how treatments are allocated to other patients.
  3. Sequential randomization / no unmeasured confounders (Robins 1994): at each decision, the observed treatment is conditionally independent of the future potential outcomes given the history: $A_k \perp W \mid (\bar S_k, \bar A_{k-1})$ for $k = 1, \dots, K$, where $W$ is the full set of potential outcomes. Satisfied by design in a SMART; unverifiable in observational data.
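  4. Positivity (§3, Eq. 15): every permitted treatment option occurs with positive probability in the data: $\operatorname{pr}(A_k = a_k \mid \bar S_k = \bar s_k, \bar A_{k-1} = \bar a_{k-1}) > 0$ for histories $(\bar s_k, \bar a_{k-1})$ in the support of $(\bar S_k, \bar A_{k-1})$ and $a_k \in \Psi_k(\bar s_k, \bar a_{k-1})$.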

Feasible regimes. Estimability of $d^{\mathrm{opt}}$ requires the treatment options in $\Psi_k$ to be represented in the data; the largest class so representable is the class of feasible regimes (Robins 2004). If the class of interest $\mathcal{D}$ is not contained in this feasible class, the class of interest must be revised or new data found.
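A rough empirical diagnostic for this condition is to tabulate, for each observed history at a given decision, which treatments were actually recorded, and compare that with the permitted set. The sketch below assumes histories have been coarsened to discrete, hashable summaries; the function names, the record layout, and the example data are all hypothetical. Absence of an option in a finite sample only suggests, and does not prove, a violation of positivity in the population.

```python
from collections import defaultdict

def empirical_feasible_sets(records):
    """Map each observed history to the set of treatments recorded with it.

    records: iterable of (history, treatment) pairs at one decision point,
    where history is a hashable summary of past covariates and treatments.
    """
    seen = defaultdict(set)
    for history, treatment in records:
        seen[history].add(treatment)
    return dict(seen)

def positivity_violations(records, permitted):
    """Return {history: permitted options never observed with that history}.

    permitted(history) plays the role of the analyst-specified permitted set.
    """
    violations = {}
    for history, observed in empirical_feasible_sets(records).items():
        missing = set(permitted(history)) - observed
        if missing:
            violations[history] = missing
    return violations

# Hypothetical second-decision data: history = (response status, first treatment).
records = [
    (("responder", 0), "maintain"),
    (("responder", 0), "maintain"),
    (("nonresponder", 0), "switch"),
    (("nonresponder", 0), "augment"),
    (("nonresponder", 1), "switch"),
]

def permitted(history):
    """Hypothetical restriction: responders continue, non-responders switch or augment."""
    status, _ = history
    return {"maintain"} if status == "responder" else {"switch", "augment"}

print(positivity_violations(records, permitted))
```

Here non-responders who started on option 1 were never given "augment", so a regime class that permits it for them is not supported by these data.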

Study designs

  • Observational study: treatment follows routine clinical practice; sequential randomization is an untestable assumption.
  • SMART (Sequential Multiple Assignment Randomized Trial; Lavori & Dawson 2000; Murphy 2005): participants are re-randomized at each decision point (randomization probabilities may depend on history), making sequential randomization hold by design — the gold standard for DTR data.
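As a minimal sketch of what re-randomization with history-dependent probabilities looks like, the simulation below assigns each treatment using only the recorded history and an independent random draw, which is why sequential randomization holds by construction. The response model, the option names, and the probabilities in `P_SWITCH` are assumptions chosen for illustration, including the design choice that responders have a single permitted option.

```python
import random

# Hypothetical design: probability of "switch" for non-responders, by first treatment.
P_SWITCH = {0: 0.5, 1: 0.7}

def run_smart(n=2000, seed=1):
    """Simulate a two-decision SMART; assignment uses only recorded history and a coin."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s1 = rng.gauss(0, 1)                 # baseline covariate
        a1 = rng.choice([0, 1])              # first randomization, probability 1/2 each
        # Hypothetical response model for the intermediate covariate.
        responder = (s1 + 0.5 * a1 + rng.gauss(0, 1)) > 0
        if responder:
            a2 = "maintain"                  # single permitted option for responders
        else:
            # Re-randomize with a known probability that depends on the history.
            a2 = "switch" if rng.random() < P_SWITCH[a1] else "augment"
        rows.append({"s1": s1, "a1": a1, "responder": responder, "a2": a2})
    return rows

def switch_share(rows, a1):
    """Observed share of 'switch' among non-responders who received a1 first."""
    group = [r for r in rows if not r["responder"] and r["a1"] == a1]
    return sum(r["a2"] == "switch" for r in group) / len(group)

rows = run_smart()
print(round(switch_share(rows, 0), 2), round(switch_share(rows, 1), 2))
```

The observed shares should sit near the design probabilities, and because those probabilities are known and positive for every permitted option, positivity for this class also holds by design.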

Connections

See Also