T-Learner and Minimax Rate

Summary

The T-learner (two-learner) fits separate base learners for the treatment and control response functions, then estimates CATE as their difference. With suitable base learners it attains the minimax optimal rate for CATE estimation when the treatment groups are balanced and the CATE has the same smoothness as the response functions. It is suboptimal for unbalanced treatment groups: the smaller group’s response function is estimated with higher variance, which propagates into the CATE estimate.

Definition and Algorithm

Definition: T-Learner

Step 1 (first stage): Fit separate response functions on each arm:
$\hat{μ}_{0} (x)$ estimates $μ_{0} (x) = E [Y (0) ∣ X = x]$, fit on control units $\{i : W_{i} = 0\}$
$\hat{μ}_{1} (x)$ estimates $μ_{1} (x) = E [Y (1) ∣ X = x]$, fit on treated units $\{i : W_{i} = 1\}$
Step 2: Estimate CATE as:
$\hat{τ}^{T} (x) = \hat{μ}_{1} (x) - \hat{μ}_{0} (x)$
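The two steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes an ordinary least-squares base learner (any regressor could be swapped in), and the names `fit_linear`, `predict_linear`, `t_learner_cate`, and the toy data-generating process are hypothetical choices made for the example.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Evaluate a fitted linear model at the rows of X."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coef


def t_learner_cate(X, y, w, X_new):
    """T-learner: one model per arm, CATE estimate = mu1_hat - mu0_hat."""
    X, y, w = np.asarray(X, float), np.asarray(y, float), np.asarray(w)
    X_new = np.asarray(X_new, float)
    coef0 = fit_linear(X[w == 0], y[w == 0])  # Step 1: control units only
    coef1 = fit_linear(X[w == 1], y[w == 1])  # Step 1: treated units only
    # Step 2: difference of the two fitted response functions
    return predict_linear(coef1, X_new) - predict_linear(coef0, X_new)


# Toy data (assumed for illustration): mu0(x) = 1 + 2x, mu1(x) = 3 + 2.5x,
# so the true CATE is tau(x) = 2 + 0.5x.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 1))
w = rng.integers(0, 2, size=400)
noise = rng.normal(0, 0.1, size=400)
y = np.where(w == 1, 3 + 2.5 * X[:, 0], 1 + 2 * X[:, 0]) + noise

X_new = np.array([[-0.5], [0.0], [0.5]])
tau_hat = t_learner_cate(X, y, w, X_new)
print(tau_hat)
```

Note that neither fit ever sees the other arm's data; this is the separation the rest of the note relies on.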

Minimax Rate Theorem

Theorem 1: Minimax Rate of T-Learner (Künzel et al. 2019)

For a family of superpopulations $P$ from $S (a_{0}, a_{τ})$ (where $a_{0}$ controls base function smoothness and $a_{τ}$ controls CATE smoothness), there exist base learners for the T-learner such that:
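$\sup_{P \in S (a_{0}, a_{τ})} \mathrm{EMSE} (P, \hat{τ}^{T}) \leq C \left( (m - n)^{- a_{0}} + n^{- a_{0}} \right)$
where $m$ is the total number of units, $n$ is the number of treated units (so $m - n$ is the number of control units), and $C$ is a constant. Künzel et al. write $m$ for the number of control units, so in their notation the same bound has the form $C (m^{- a} + n^{- a})$ with $a$ the response-function rate.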

Interpretation: The T-learner rate is limited by the smaller of the two sample sizes ( $n$ treated vs. $m - n$ control). When groups are balanced, both arms converge at rate $m^{- a_{0}}$ up to constants, which is minimax optimal if the response functions and CATE have the same smoothness.

Key limitation: If the treatment group is much smaller ( $n ≪ m - n$ ), the T-learner is limited by $n^{- a_{0}}$ — it cannot exploit the large control group to improve estimation of the treatment response.
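To see how imbalance affects the bound, the sketch below evaluates its shape, one term per arm, for a balanced and an unbalanced split of the same total sample. The constant $C$ is unknown, so `C=1.0` and the exponent `a0=1.0` are purely illustrative assumptions; only the comparison between the two splits is meaningful. The helper name `t_learner_rate_bound` is hypothetical.

```python
def t_learner_rate_bound(n_treated, n_control, a0, C=1.0):
    """Shape of the bound: C * (n_control**-a0 + n_treated**-a0)."""
    if n_treated <= 0 or n_control <= 0:
        raise ValueError("both arms need at least one unit")
    return C * (n_control ** (-a0) + n_treated ** (-a0))


# Same total of 1100 units, split two ways (illustrative numbers).
balanced = t_learner_rate_bound(n_treated=550, n_control=550, a0=1.0)
unbalanced = t_learner_rate_bound(n_treated=100, n_control=1000, a0=1.0)

# Share of the unbalanced bound that comes from the treated-arm term alone.
treated_share = (100 ** -1.0) / unbalanced

print(balanced, unbalanced, treated_share)
```

With these assumed values the unbalanced bound is about three times the balanced one, and the treated-arm term accounts for roughly 90% of it: adding more control units shrinks only the already-small term.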

When T-Learner Fails

Consider an experiment where:

Control group: $m - n = 1000$ observations
Treatment group: $n = 100$ observations
True CATE: $τ (x) \approx$ constant

The T-learner fits $\hat{μ}_{1}$ on only 100 observations, so $\hat{μ}_{1}$ has high variance and the CATE estimate inherits it. The large control group sharpens $\hat{μ}_{0}$ but does nothing for $\hat{μ}_{1}$.
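A small Monte Carlo sketch makes this failure mode concrete. It assumes a hypothetical data-generating process (linear $μ_{0}$, constant CATE of 1, unit-variance Gaussian noise) and a linear base learner fit with `np.polyfit`; the evaluation point `x0` and the number of replications are arbitrary choices. Under these assumptions the variance of each fitted value scales like one over the arm size, so the treated-arm estimate should be roughly ten times noisier than the control-arm estimate.

```python
import numpy as np


def fit_predict_at(x, y, x0):
    """Fit y ~ intercept + slope * x by least squares; return the fit at x0."""
    slope, intercept = np.polyfit(x, y, deg=1)  # highest degree first
    return intercept + slope * x0


rng = np.random.default_rng(42)
n_control, n_treated, n_reps, x0 = 1000, 100, 500, 0.5
mu0_hats, mu1_hats = [], []

for _ in range(n_reps):
    x_c = rng.uniform(-1, 1, n_control)
    x_t = rng.uniform(-1, 1, n_treated)
    # Assumed truth: mu0(x) = 1 + 2x and mu1(x) = mu0(x) + 1 (constant CATE).
    y_c = 1.0 + 2.0 * x_c + rng.normal(0, 1, n_control)
    y_t = 2.0 + 2.0 * x_t + rng.normal(0, 1, n_treated)
    mu0_hats.append(fit_predict_at(x_c, y_c, x0))  # control arm only
    mu1_hats.append(fit_predict_at(x_t, y_t, x0))  # treated arm only

mu0_hats, mu1_hats = np.array(mu0_hats), np.array(mu1_hats)
tau_hats = mu1_hats - mu0_hats  # T-learner CATE estimate at x0, per replication

var_mu0, var_mu1, var_tau = mu0_hats.var(), mu1_hats.var(), tau_hats.var()
print(var_mu0, var_mu1, var_tau)
```

Because the two arms are fit independently, the variance of the CATE estimate is approximately the sum of the two arm variances, and almost all of it comes from the 100-unit treated arm.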

The X-learner addresses precisely this failure mode — see X-Learner.

Properties

Key advantage:

Completely separates treatment and control estimation → no interference between groups
Theorem 1 guarantees minimax optimality when groups are balanced and $a_{0} = a_{τ}$

Key weakness:

Suboptimal when $n ≪ m - n$ (or vice versa): limited by smaller group’s sample size
Cannot exploit cross-group information (unlike X-learner)

Connections

Extends Metalearners for CATE framework
Limitation motivates X-Learner design
Rate result uses the definition of the function families $S (a)$
