X-Learner
Summary
The X-learner is a two-stage metalearner that imputes individual treatment effects (ITEs) by cross-applying fitted response functions, then regresses the imputed ITEs on covariates. It is particularly powerful when treatment and control groups are unbalanced, as it can exploit the large group to improve estimation in the small group. It achieves a minimax optimal rate that adapts to both CATE smoothness and response function smoothness.
Overview
The X-learner solves the key failure mode of the T-learner: when treatment groups are unbalanced, one response function is estimated precisely and the other poorly — but the T-learner cannot cross-use information.
The X-learner crosses the group information in Stage 2: it uses the well-estimated control model to impute counterfactuals for treated units, and vice versa.
Algorithm
Definition: X-Learner (Three Steps)
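Stage 1: Estimate response functions (same as T-learner):
$$\hat{\mu}_0(x) = \hat{\mathbb{E}}[Y \mid X = x, W = 0], \qquad \hat{\mu}_1(x) = \hat{\mathbb{E}}[Y \mid X = x, W = 1]$$
Stage 2: Impute individual treatment effects:
For treated units $i$ (with $W_i = 1$):
$$\tilde{D}_i^1 = Y_i - \hat{\mu}_0(X_i)$$
(observed treated outcome minus imputed control outcome using $\hat{\mu}_0$)
For control units $i$ (with $W_i = 0$):
$$\tilde{D}_i^0 = \hat{\mu}_1(X_i) - Y_i$$
(imputed treatment outcome using $\hat{\mu}_1$ minus observed control outcome)
Stage 2 (continued): Regress imputed ITEs on covariates, separately within each group:
$$\hat{\tau}_1(x) = \hat{\mathbb{E}}[\tilde{D}^1 \mid X = x], \qquad \hat{\tau}_0(x) = \hat{\mathbb{E}}[\tilde{D}^0 \mid X = x]$$
Stage 3: Combine with propensity score weights:
$$\hat{\tau}(x) = g(x)\,\hat{\tau}_0(x) + (1 - g(x))\,\hat{\tau}_1(x)$$
where $g(x) \in [0, 1]$ is a weighting function, often set to the propensity score $\hat{e}(x)$.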
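The three stages can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: it uses ordinary least squares as the base learner in every stage, a known constant propensity as the weight, and simulated data with a linear treatment effect. The names `fit_ols`, `x_learner`, and `propensity` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def x_learner(X, w, y, propensity):
    """Return a function mapping covariates to an estimated CATE.

    X: (N, d) covariates, w: (N,) 0/1 treatment, y: (N,) outcomes,
    propensity: function returning g(x) in [0, 1] (assumed known here).
    """
    X0, y0 = X[w == 0], y[w == 0]
    X1, y1 = X[w == 1], y[w == 1]
    # Stage 1: response functions, fitted separately per arm.
    mu0 = fit_ols(X0, y0)
    mu1 = fit_ols(X1, y1)
    # Stage 2: impute individual treatment effects by cross-applying models.
    d1 = y1 - mu0(X1)          # treated: observed minus imputed control
    d0 = mu1(X0) - y0          # control: imputed treated minus observed
    # Stage 2 (continued): regress imputed effects on covariates.
    tau1 = fit_ols(X1, d1)
    tau0 = fit_ols(X0, d0)
    # Stage 3: weighted combination of the two CATE estimates.
    def tau_hat(Xn):
        g = propensity(Xn)
        return g * tau0(Xn) + (1 - g) * tau1(Xn)
    return tau_hat

# Simulated unbalanced design: roughly 10% of units are treated.
rng = np.random.default_rng(0)
N = 2000
X = rng.normal(size=(N, 2))
w = (rng.random(N) < 0.1).astype(int)
tau_true = 1.0 + 0.5 * X[:, 0]
y = X @ np.array([1.0, -2.0]) + w * tau_true + rng.normal(scale=0.5, size=N)

tau_hat = x_learner(X, w, y, propensity=lambda Xn: np.full(len(Xn), 0.1))
rmse = float(np.sqrt(np.mean((tau_hat(X) - tau_true) ** 2)))
print(f"RMSE of estimated CATE: {rmse:.3f}")
```

Any regression method could stand in for `fit_ols`; the linear model is chosen here only to keep the sketch self-contained.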
Intuition
The X-learner uses the large group (say, control with many observations) to improve estimation for the small group (treatment):
- $\hat{\mu}_0$ (the control response function) is estimated precisely from many control observations
- This precise $\hat{\mu}_0$ is cross-applied to treated units to impute their counterfactual
- The imputed ITE is then a cleaner signal for regressing on the covariates $X$ to estimate the CATE
In the second stage, $\hat{\tau}_1$ is estimated from treated units with imputed ITEs; these have reduced variance because $\hat{\mu}_0$ is very accurate.
Minimax Rate Theorem
Theorem 2: Minimax Optimality of X-Learner
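Assume we observe $m$ control and $n$ treated units, with $m \gg n$ (unbalanced design). For families satisfying Conditions 1-6 (Lipschitz continuity, bounded propensity score, bounded moments):
$$\mathrm{EMSE}(\hat{\tau}_X) = O\left(N^{-a_\mu} + n^{-a_\tau}\right)$$
where $N = m + n$ is the total sample size, $n$ is the smaller group size, and $a_\mu$, $a_\tau$ are the rate exponents attainable for the response functions and the CATE (larger for smoother functions).
Key insight: The X-learner rate is $O(N^{-a_\mu} + n^{-a_\tau})$. If $a_\tau > a_\mu$ (CATE is smoother than the response functions), the X-learner can achieve $O(N^{-a_\mu})$, the full-data rate, rather than being bottlenecked by the small group.
Contrast with T-learner: The T-learner is bounded by $O(n^{-a_\mu})$ regardless. The X-learner additionally exploits $N$ when the CATE function is smooth.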
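To make the comparison concrete, the arithmetic below plugs hypothetical numbers into the two rates, ignoring constants. It assumes the X-learner error scales as N^(-a_mu) + n^(-a_tau) and the T-learner's dominant term as n^(-a_mu), where a_mu and a_tau are rate exponents for the response functions and the CATE. The sample sizes and exponents are illustrative choices, not values taken from the theorem.

```python
# Hypothetical design: 10,000 units in total, only 100 in the smaller arm.
N, n = 10_000, 100
# Hypothetical exponents: slow rate for the response functions,
# parametric-like rate for a very smooth CATE.
a_mu, a_tau = 0.5, 1.0

t_rate = n ** -a_mu                  # small-group term: 100^(-0.5), about 0.1
x_rate = N ** -a_mu + n ** -a_tau    # about 0.01 + 0.01 = 0.02

print(f"T-learner dominant term: {t_rate:.3f}")
print(f"X-learner rate:          {x_rate:.3f}")
```

Under these assumed values the X-learner term is about five times smaller, because the slow exponent applies to the large sample and only the fast exponent applies to the small group.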
Conditions for X-Learner Advantage
The X-learner outperforms T-learner when:
- Unbalanced groups: One arm has far more observations than the other
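- Smooth CATE: $a_\tau > a_\mu$, i.e. the CATE can be estimated at a faster rate than the response functions because the treatment effect is simpler than the response functions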
- Large control group: Can impute good counterfactuals for treated units
When the CATE is constant (or near-zero), the X-learner advantage is largest because the CATE is then as smooth as it can be (a constant is infinitely smooth).
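A small simulation can illustrate these conditions. The sketch below is only an illustration under assumptions chosen for this example: a one-dimensional covariate, a nonlinear control response sin(3x), a constant CATE of 1, about 5% treated units, degree-5 polynomials for the response functions, and a linear second-stage model. The function names and all numeric settings are hypothetical.

```python
import numpy as np

def simulate(rng, n_total=2000, p_treat=0.05, noise=0.5):
    """Unbalanced design with a wiggly response and a constant CATE."""
    x = rng.uniform(-1, 1, n_total)
    w = rng.random(n_total) < p_treat            # boolean treatment flag
    tau = np.ones(n_total)                       # constant treatment effect
    y = np.sin(3 * x) + w * tau + rng.normal(scale=noise, size=n_total)
    return x, w, y, tau

def t_and_x_mse(rng, deg_mu=5, deg_tau=1, p_treat=0.05):
    """Return (T-learner MSE, X-learner MSE) for one simulated dataset."""
    x, w, y, tau = simulate(rng, p_treat=p_treat)
    # Stage 1 (shared by both learners): flexible polynomial fits per arm.
    mu0 = np.polyfit(x[~w], y[~w], deg_mu)
    mu1 = np.polyfit(x[w], y[w], deg_mu)
    # T-learner: difference of the two fitted response functions.
    t_hat = np.polyval(mu1, x) - np.polyval(mu0, x)
    # X-learner stage 2: impute effects, then fit a simple model to them.
    d1 = y[w] - np.polyval(mu0, x[w])
    d0 = np.polyval(mu1, x[~w]) - y[~w]
    tau1 = np.polyfit(x[w], d1, deg_tau)
    tau0 = np.polyfit(x[~w], d0, deg_tau)
    # Stage 3: weight by the (here known) propensity score.
    x_hat = p_treat * np.polyval(tau0, x) + (1 - p_treat) * np.polyval(tau1, x)
    return np.mean((t_hat - tau) ** 2), np.mean((x_hat - tau) ** 2)

rng = np.random.default_rng(0)
results = np.array([t_and_x_mse(rng) for _ in range(50)])
t_mse, x_mse = results.mean(axis=0)
print(f"T-learner MSE: {t_mse:.4f}, X-learner MSE: {x_mse:.4f}")
```

The gain comes from the second stage: the imputed effects are regressed with a model as simple as the CATE itself, whereas the T-learner inherits the variance of a flexible fit on the small treated group.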
Propensity Score as Weight
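The weighting function $g(x)$ balances $\hat{\tau}_0$ and $\hat{\tau}_1$ in $\hat{\tau}(x) = g(x)\,\hat{\tau}_0(x) + (1 - g(x))\,\hat{\tau}_1(x)$. Using $g(x) = \hat{e}(x)$:
- When $\hat{e}(x)$ is small (few treated units), weight is put on $\hat{\tau}_1$ (which relies on $\hat{\mu}_0$, estimated from the larger control group)
- When $\hat{e}(x)$ is large (many treated), weight is put on $\hat{\tau}_0$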
This ensures that the better-estimated CATE component dominates.
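The weighting step can be shown on toy numbers. The snippet below is a sketch that assumes the combination tau_hat = g * tau0_hat + (1 - g) * tau1_hat with g set to an estimated propensity score; the function name `combine` and the input values are hypothetical.

```python
import numpy as np

def combine(tau0_hat, tau1_hat, e_hat):
    """Weighted combination with g(x) = e_hat(x), clipped to [0, 1]."""
    g = np.clip(e_hat, 0.0, 1.0)
    return g * tau0_hat + (1.0 - g) * tau1_hat

# Two points with the same component estimates but different propensities.
tau0_hat = np.array([2.0, 2.0])
tau1_hat = np.array([1.0, 1.0])
e_hat = np.array([0.1, 0.9])

combined = combine(tau0_hat, tau1_hat, e_hat)
# First point (few treated): 0.1 * 2 + 0.9 * 1 = 1.1, close to tau1_hat.
# Second point (many treated): 0.9 * 2 + 0.1 * 1 = 1.9, close to tau0_hat.
print(combined)
```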
Connections
- Extends T-Learner and Minimax Rate by adding Stage 2 imputation
- Uses propensity score — see Propensity Score in Bayesian CI for Bayesian treatment
- Applied to real experiments in Metalearner Simulation Results
- Software: hte R library implements X-, T-, S-learner with confidence intervals
See Also
- Metalearners for CATE — general framework
- T-Learner and Minimax Rate — foundation that X-learner improves on
- S-Learner — simpler alternative
- Metalearner Simulation Results — empirical performance