Dynamic Treatment Regimes Framework

Summary

This note sets out the conceptual framework for optimal dynamic treatment regimes (DTRs). With $K$ ordered decision points, potential outcomes are defined over all possible treatment histories $\bar{a}_{K} = (a_{1}, \dots, a_{K})$. A regime $d = (d_{1}, \dots, d_{K})$ assigns, at each decision $k$, a treatment as a function of the realized history $(\bar{s}_{k}, \bar{a}_{k - 1})$. The optimal regime $d^{opt}$ maximizes the population mean potential outcome within a chosen class of regimes. Estimating it from observed data requires three assumptions: consistency (SUTVA), sequential randomization (no unmeasured confounders at each decision), and positivity (every treatment option in the regime class is represented in the data).

Overview

To define and estimate an optimal regime we need a careful potential-outcomes setup for sequential decisions. Large values of a final outcome $Y$ are preferred; $Y$ may be measured after the $K$th decision or be a function of the whole trajectory. This note states the estimand and the assumptions under which it is identifiable; the methods that estimate it are covered in the notes Optimal Regime via Dynamic Programming, Q-learning, and A-learning and Robustness.

Main Content

Notation

$K$ ordered decision points $k = 1, \dots, K$; at each, a finite set $\mathcal{A}_{k}$ of treatment options.
$S_{1}$: baseline covariates; $S_{k}$ ($k \geq 2$): covariate information accrued between decisions $k - 1$ and $k$.
$\bar{a}_{k} = (a_{1}, \dots, a_{k})$: a treatment history; $\bar{S}_{k} = (S_{1}, \dots, S_{k})$: the covariate history, with realized value $\bar{s}_{k}$.
$A_{k}$: the observed (recorded) treatment at decision $k$, with $\bar{A}_{k} = (A_{1}, \dots, A_{k})$; $Y$: observed outcome.
$\Psi_{k}(\bar{s}_{k}, \bar{a}_{k - 1}) \subseteq \mathcal{A}_{k}$: the set of treatment options permitted for a patient with that history (encodes ethical, feasibility, and policy restrictions); the regime class $\mathcal{D}$ is specific to $\Psi = (\Psi_{1}, \dots, \Psi_{K})$.

Definition: Potential outcomes for sequential treatments (Robins 1986; §2, Eq. 1)

The full set of potential outcomes is
$W^{*} = \{ S_{2}^{*}(a_{1}), S_{3}^{*}(\bar{a}_{2}), \dots, S_{K}^{*}(\bar{a}_{K - 1}), Y^{*}(\bar{a}_{K}) \text{ for all } \bar{a}_{K} \in \bar{\mathcal{A}}_{K} \},$
where $\bar{\mathcal{A}}_{K}$ is the set of all possible treatment histories, $S_{k}^{*}(\bar{a}_{k - 1})$ is the covariate value that would arise between decisions $k - 1$ and $k$ had the patient received history $\bar{a}_{k - 1}$, and $Y^{*}(\bar{a}_{K})$ is the outcome that would result under the full treatment history $\bar{a}_{K}$.
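To get a feel for the size of $W^{*}$, the following minimal sketch enumerates its index set for a hypothetical case with $K = 2$, two options at the first decision and three at the second. The option labels and variable names are illustrative assumptions, not part of the framework.

```python
from itertools import product

# Hypothetical option sets for K = 2 decisions (labels are illustrative only).
options = [("a", "b"), ("c", "d", "e")]
K = len(options)

# All full treatment histories (a_1, ..., a_K): the Cartesian product of the option sets.
histories = list(product(*options))

# Index set of W*: one S_k*(a-bar_{k-1}) per partial history for k = 2..K,
# plus one Y*(a-bar_K) per full history.
potential_vars = []
for k in range(2, K + 1):
    for prefix in product(*options[: k - 1]):
        potential_vars.append((f"S{k}*", prefix))
for full in histories:
    potential_vars.append(("Y*", full))

print(len(histories))       # 2 * 3 = 6 full treatment histories
print(len(potential_vars))  # 2 intermediate covariates + 6 outcomes = 8
```

Each patient is observed under only one of these histories, which is why assumptions are needed to link the observed data to $W^{*}$.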

Definition: Dynamic treatment regime and optimal regime (§2-3, Eqs. 3-4)

A dynamic treatment regime $d = (d_{1}, \dots, d_{K})$ is an ordered set of decision rules; rule $d_{k}$ maps the realized history $(\bar{s}_{k}, \bar{a}_{k - 1})$ to a permitted treatment $d_{k}(\bar{s}_{k}, \bar{a}_{k - 1}) \in \Psi_{k}(\bar{s}_{k}, \bar{a}_{k - 1})$. Writing $Y^{*}(d)$ for the potential outcome under regime $d$, the regime $d^{opt} \in \mathcal{D}$ is optimal if
$E\{Y^{*}(d) \mid S_{1} = s_{1}\} \leq E\{Y^{*}(d^{opt}) \mid S_{1} = s_{1}\} \text{ for all } d \in \mathcal{D} \text{ and all } s_{1} \in \mathcal{S}_{1},$
where $\mathcal{S}_{1}$ is the set of possible values of $S_{1}$. Optimality is relative to the chosen class $\mathcal{D}$, that is, to the restrictions $\Psi$; the class is conceived from scientific and policy objectives, not from the available data.
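The definition can be made concrete with a brute-force sketch. It assumes a hypothetical toy model with $K = 2$, binary covariates and treatments, no restrictions (every option permitted at every history), and made-up formulas for the distribution of $S_{2}^{*}(a_{1})$ and the mean of $Y^{*}(a_{1}, a_{2})$. The function names `p_s2`, `mean_y`, and `value` are illustrative. The code enumerates every deterministic regime and keeps those that attain the largest conditional mean for every $s_{1}$.

```python
from itertools import product

# Hypothetical toy model with K = 2; all numbers below are made up for illustration.

def p_s2(s1, a1):
    """pr{S_2*(a1) = 1 | S_1 = s1}."""
    return 0.3 + 0.4 * a1 - 0.1 * s1

def mean_y(s1, a1, s2, a2):
    """E{Y*(a1, a2) | S_1 = s1, S_2*(a1) = s2}."""
    return 1.0 + s1 + 0.5 * a1 + a2 * (2 * s2 - 1)

def value(d1, d2, s1):
    """E{Y*(d) | S_1 = s1} for rules d1[s1] and d2[(s1, s2, a1)]."""
    a1 = d1[s1]
    total = 0.0
    for s2 in (0, 1):
        prob = p_s2(s1, a1) if s2 == 1 else 1.0 - p_s2(s1, a1)
        total += prob * mean_y(s1, a1, s2, d2[(s1, s2, a1)])
    return total

# Unrestricted class of regimes: every option is permitted at every history.
s1_values = (0, 1)
h2 = list(product((0, 1), repeat=3))  # stage-2 histories (s1, s2, a1)
d1_rules = [dict(zip(s1_values, acts)) for acts in product((0, 1), repeat=2)]
d2_rules = [dict(zip(h2, acts)) for acts in product((0, 1), repeat=len(h2))]
regimes = [(d1, d2) for d1 in d1_rules for d2 in d2_rules]

# Best achievable value for each s1, then the regimes attaining it for all s1.
best = {s1: max(value(d1, d2, s1) for d1, d2 in regimes) for s1 in s1_values}
optimal = [
    (d1, d2)
    for d1, d2 in regimes
    if all(abs(value(d1, d2, s1) - best[s1]) < 1e-12 for s1 in s1_values)
]

d1_opt, d2_opt = optimal[0]
print(len(regimes), len(optimal))
print(d1_opt, best)
```

In this toy model the maximizer is not unique: rules at stage-2 histories that are never reached under the optimal first-stage rule do not affect the value. Enumeration is workable only for very small problems like this one.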

Identification assumptions

Assumptions for identifying $d^{opt}$ from observed data (§2)

Consistency (SUTVA part 1): the observed covariates and outcome equal the potential ones under the treatments actually received, that is, $S_{k} = S_{k}^{*}(\bar{A}_{k - 1})$ for $k = 2, \dots, K$ and $Y = Y^{*}(\bar{A}_{K})$.

Stable Unit Treatment Value Assumption (Rubin 1978): a patient’s covariates/outcome are unaffected by how treatments are allocated to other patients.

Sequential randomization / no unmeasured confounders (Robins 1994): at each decision, the observed treatment is conditionally independent of the potential outcomes given the observed history, that is, $A_{k} \perp\!\!\!\perp W^{*} \mid \bar{S}_{k}, \bar{A}_{k - 1}$ for $k = 1, \dots, K$. This holds by design in a SMART and is unverifiable in observational data.

Positivity (§3, Eq. 15): every permitted treatment option occurs with positive probability in the data, that is, $\mathrm{pr}(A_{k} = a_{k} \mid \bar{S}_{k} = \bar{s}_{k}, \bar{A}_{k - 1} = \bar{a}_{k - 1}) > 0$ for all histories $(\bar{s}_{k}, \bar{a}_{k - 1})$ in $\Gamma_{k}$ and all $a_{k} \in \Psi_{k}(\bar{s}_{k}, \bar{a}_{k - 1})$.
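A rough empirical screen for positivity at a single decision point can be sketched as follows. The function name `positivity_violations`, the encoding of a history as a tuple, and the toy records are all assumptions made for illustration. The check only covers histories that appear in the data, and a zero count in a finite sample is a warning sign rather than proof that the true probability is zero.

```python
from collections import Counter

def positivity_violations(records, permitted):
    """Return (history, option) pairs that are permitted but never observed.

    records:   iterable of (history, treatment) pairs at one decision point k,
               where history is a hashable encoding of (s-bar_k, a-bar_{k-1}).
    permitted: function mapping a history to its permitted options Psi_k(history).
    """
    counts = Counter(records)
    histories = {h for h, _ in counts}
    return sorted(
        (h, a) for h in histories for a in permitted(h) if counts[(h, a)] == 0
    )

# Hypothetical stage-2 data: history = (s1, a1, s2), treatment = a2.
records = [((0, 0, 1), 0), ((0, 0, 1), 1), ((1, 1, 0), 0), ((1, 1, 0), 0)]

def permitted(history):
    # Assumption for this example: both options are permitted at every history.
    return (0, 1)

print(positivity_violations(records, permitted))  # [((1, 1, 0), 1)]
```

Here option 1 is permitted for history `(1, 1, 0)` but nobody with that history received it, so a regime recommending it there could not be evaluated from these data.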

Feasible regimes. Estimability of $d^{opt}$ requires the treatment options in $\Psi_{k}$ to be represented in the data; the largest class so representable is the class of feasible regimes, indexed by $\Psi^{\max}$ (Robins 2004). If $\Psi \not\subseteq \Psi^{\max}$, that is, if some permitted option is not represented in the data, the class of interest must be revised or new data found.
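As a worked instance of what these assumptions deliver, take $K = 2$. Under consistency, SUTVA, sequential randomization, and positivity, the value of a regime $d$ can be written purely in terms of the observed-data distribution. The display below is the standard g-computation formula specialized to two decisions, stated as a sketch rather than derived; the outer expectation is over $S_{2}$ given $S_{1} = s_{1}$ and $A_{1} = d_{1}(s_{1})$.

```latex
E\{Y^{*}(d) \mid S_{1} = s_{1}\}
  = E\bigl[\, E\{Y \mid \bar{S}_{2},\ A_{1} = d_{1}(s_{1}),\ A_{2} = d_{2}(\bar{S}_{2}, d_{1}(s_{1}))\}
      \bigm| S_{1} = s_{1},\ A_{1} = d_{1}(s_{1}) \bigr]
```

Positivity ensures that the conditioning events on the right-hand side have positive probability, and sequential randomization is what lets conditional means among patients who happened to follow $d$ stand in for means of potential outcomes.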

Study designs

Observational study: treatment follows routine clinical practice; sequential randomization is an untestable assumption.
SMART (Sequential Multiple Assignment Randomized Trial; Lavori & Dawson 2000; Murphy 2005): participants are re-randomized at each decision point (randomization probabilities may depend on history), making sequential randomization hold by design — the gold standard for DTR data.
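The following minimal sketch simulates a hypothetical two-stage SMART in which the second-stage randomization probability depends on the intermediate response. The design probabilities (0.5 at stage 1; 0.5 for responders and 0.7 for non-responders at stage 2), the response model, and the function name `simulate_smart` are all assumptions chosen for illustration. Because the assignment probabilities are fixed by the design and depend only on the recorded history, sequential randomization holds by construction.

```python
import random

def simulate_smart(n, seed=0):
    """Simulate a hypothetical two-stage SMART with binary treatments.

    Returns a list of dicts with keys s1, a1, s2, a2, p1, p2, where p1 and p2
    are the known design probabilities of the treatment actually received.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s1 = int(rng.random() < 0.5)             # baseline covariate
        a1 = int(rng.random() < 0.5)             # stage 1: 1:1 randomization
        s2 = int(rng.random() < 0.3 + 0.4 * a1)  # intermediate response (made-up model)
        q2 = 0.5 if s2 == 1 else 0.7             # pr(A2 = 1 | history), set by design
        a2 = int(rng.random() < q2)              # stage 2: history-dependent randomization
        rows.append({"s1": s1, "a1": a1, "s2": s2, "a2": a2,
                     "p1": 0.5, "p2": q2 if a2 == 1 else 1.0 - q2})
    return rows

rows = simulate_smart(20000)
nonresp = [r for r in rows if r["s2"] == 0]
share = sum(r["a2"] for r in nonresp) / len(nonresp)
print(round(share, 2))  # should be close to the design probability 0.7
```

Every design probability here lies strictly between 0 and 1, so positivity also holds for the unrestricted class of regimes in this simulated trial.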

Connections

A sequential extension of the Potential Outcomes Framework; the estimand generalizes the single-stage average treatment effect (Causal Estimands).
The sequential-randomization assumption is the multi-stage analog of unconfoundedness used in g-computation.
Provides the estimand that Optimal Regime via Dynamic Programming characterizes and Q-learning/A-learning and Robustness estimate.
