Metalearners for CATE
Summary
A metalearner (or meta-algorithm) is any algorithm that takes base ML learners as inputs and combines them to estimate the conditional average treatment effect (CATE). The framework decouples the structural problem of CATE estimation from the choice of base learner, enabling use of any supervised ML method (random forests, BART, neural nets) as a drop-in component.
Overview
Why metalearners? Estimating CATE directly is hard because one never observes both potential outcomes for the same unit. Metalearners exploit the structure of the problem — splitting it into subproblems where standard supervised ML excels.
Setup and Notation
Definition: Potential Outcomes Setup
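For each unit $i$ with covariates $X_i \in \mathbb{R}^d$:
- $Y_i(0)$ = potential outcome under control
- $Y_i(1)$ = potential outcome under treatment
- $W_i \in \{0, 1\}$ = treatment indicator
- Observed outcome: $Y_i = Y_i(W_i)$
CATE: $\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x]$
ATE: $\mathrm{ATE} = \mathbb{E}[Y(1) - Y(0)]$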
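The notation above can be made concrete with a small simulation. The sketch below is illustrative only: the data-generating process (a CATE of `1 + 2x`, a sinusoidal control response, and a fair-coin treatment assignment) is an assumption chosen for the example, not something taken from the source. It shows that each unit reveals only one potential outcome, and that under randomized assignment the difference in observed group means lands close to the average of the simulated effects.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process (all functional forms are assumptions).
x = rng.uniform(-1.0, 1.0, size=n)               # covariate X_i
tau = 1.0 + 2.0 * x                              # assumed CATE: tau(x) = 1 + 2x
y0 = np.sin(3.0 * x) + rng.normal(0.0, 0.1, n)   # Y_i(0), outcome under control
y1 = y0 + tau                                    # Y_i(1), outcome under treatment
w = rng.integers(0, 2, size=n)                   # W_i, assigned by a fair coin flip

# Only one potential outcome is ever observed: Y_i = Y_i(W_i).
y_obs = np.where(w == 1, y1, y0)

true_ate = tau.mean()                            # average of the simulated effects
naive_ate = y_obs[w == 1].mean() - y_obs[w == 0].mean()
print(f"average simulated effect:           {true_ate:.3f}")
print(f"difference in observed group means: {naive_ate:.3f}")
```

In real data `y0` and `y1` are never both available, which is why `tau` cannot be fitted directly by regressing an observed effect on `x`.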
Definition: Metalearner
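A metalearner (or meta-algorithm) is an algorithm that:
- Takes one or more supervised learning base learners ($M$, or $M_0$ and $M_1$) as inputs
- Uses these base learners to estimate response functions $\mu_0(x) = \mathbb{E}[Y(0) \mid X = x]$, $\mu_1(x) = \mathbb{E}[Y(1) \mid X = x]$
- Combines the estimates to produce $\hat{\tau}(x)$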
The base learner can be any supervised ML method that minimizes expected squared error (regression) or any analogous loss.
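To illustrate the definition, here is a minimal sketch of the two simplest combinations: one base learner fitted on covariates plus the treatment indicator (the S-learner pattern) and two base learners fitted per treatment group (the T-learner pattern). Everything named here is hypothetical: `QuadraticRegressor` is a stand-in base learner written for this example, and `s_learner`, `t_learner`, and the simulated data are assumptions. Any object exposing `fit` and `predict` could be swapped in.

```python
import numpy as np

class QuadraticRegressor:
    """Hypothetical base learner: least squares on quadratic features."""

    def _features(self, X):
        X = np.asarray(X, dtype=float)
        n, d = X.shape
        cols = [np.ones(n)] + [X[:, j] for j in range(d)]
        # Squares and pairwise interactions, so a model on (X, W) can
        # represent an effect that varies with X.
        cols += [X[:, j] * X[:, k] for j in range(d) for k in range(j, d)]
        return np.column_stack(cols)

    def fit(self, X, y):
        self.coef_, *_ = np.linalg.lstsq(self._features(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._features(X) @ self.coef_

def t_learner(make_learner, X, w, y):
    """Fit mu_0 and mu_1 separately, then take their difference."""
    mu0 = make_learner().fit(X[w == 0], y[w == 0])
    mu1 = make_learner().fit(X[w == 1], y[w == 1])
    return lambda X_new: mu1.predict(X_new) - mu0.predict(X_new)

def s_learner(make_learner, X, w, y):
    """Fit one model on (X, W), then contrast predictions at W=1 and W=0."""
    mu = make_learner().fit(np.column_stack([X, w]), y)

    def tau_hat(X_new):
        ones, zeros = np.ones(len(X_new)), np.zeros(len(X_new))
        return (mu.predict(np.column_stack([X_new, ones]))
                - mu.predict(np.column_stack([X_new, zeros])))

    return tau_hat

# Simulated data with an assumed CATE of tau(x) = 1 + 2x.
rng = np.random.default_rng(0)
n = 2_000
X = rng.uniform(-1.0, 1.0, size=(n, 1))
w = rng.integers(0, 2, size=n)
y = X[:, 0] ** 2 + w * (1.0 + 2.0 * X[:, 0]) + rng.normal(0.0, 0.1, n)

grid = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)
true_tau = 1.0 + 2.0 * grid[:, 0]
tau_t = t_learner(QuadraticRegressor, X, w, y)(grid)
tau_s = s_learner(QuadraticRegressor, X, w, y)(grid)
print("T-learner max abs error:", np.max(np.abs(tau_t - true_tau)))
print("S-learner max abs error:", np.max(np.abs(tau_s - true_tau)))
```

Passing the learner as a factory (`make_learner`) keeps the combination logic independent of the base method, which is the decoupling the definition describes. The example is well specified by construction, so both estimates should track the assumed CATE closely; real data offers no such guarantee.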
Superpopulation Model
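Units are drawn i.i.d. from a superpopulation $\mathcal{P}$ over $(Y(0), Y(1), X, W)$. The treatment indicator $W \sim \mathrm{Bern}(e(X))$ where $e(x) = \mathbb{P}(W = 1 \mid X = x)$ is the propensity score.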
Families of Distributions and Minimax Rate
Definition: Family with Bounded Minimax Rate
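For $a \in (0, 1]$, $S(a)$ is the set of all families $F$ of distributions with a minimax rate of at most $N^{-a}$:

$$\sup_{P \in F} \mathrm{EMSE}(P, \hat{\mu}^N) \le C N^{-a}$$

for some constant $C$, where $\hat{\mu}^N$ is the best estimator using $N$ samples.
- $S(1)$: families where we can estimate the response at the parametric rate $N^{-1}$
- $S(2/(2+d))$: nonparametric regression of Lipschitz functions on $[0,1]^d$, which requires rate $N^{-2/(2+d)}$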
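A quick calculation shows what these exponents mean in practice. The sketch below assumes the bound has the form $C N^{-a}$, with the standard Lipschitz exponent $a = 2/(2+d)$ for nonparametric regression in $d$ dimensions, and sets $C = 1$ purely for illustration (the true constant is problem-specific and generally unknown). The helper `samples_needed` is a hypothetical name introduced here.

```python
import math

def samples_needed(target_error, a, C=1.0):
    """Approximate smallest N with C * N**(-a) <= target_error.

    Assumes the bound is tight and that C is known; both are
    simplifications made for this illustration.
    """
    return math.ceil((C / target_error) ** (1.0 / a))

target = 0.01
print("parametric, a = 1:", samples_needed(target, 1.0))
for d in (1, 2, 5, 10):
    a = 2.0 / (2.0 + d)   # Lipschitz rate exponent in d dimensions
    print(f"d = {d}, a = {a:.3f}:", samples_needed(target, a))
```

The required sample size grows rapidly with $d$ because the exponent $a$ shrinks toward zero, which is one way to see why a smoother (higher-$a$) CATE function is worth exploiting.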
Key implication for CATE: Since CATE is a difference of two conditional means, its estimation rate depends on the smoothness of both response functions and the CATE function itself. The X-learner exploits the case where the CATE is smoother than the response functions.
EMSE for CATE
Definition: EMSE for CATE Estimator
The Expected Mean Squared Error for a CATE estimator over observations with treated units:
where the are importance weights ensuring the loss is meaningful when treatment groups are unequal.
Three Metalearners
| Learner | Strategy | Key Advantage | Key Weakness |
|---|---|---|---|
| S-Learner | Single model on $(X, W)$ | Borrows strength across groups | Treatment indicator may be regularized to zero |
| T-Learner | Separate models for $\mu_0$ and $\mu_1$ | Clean separation | Suboptimal for unbalanced groups |
| X-Learner | Two-stage: impute ITEs, then regress | Best for unbalanced treatment | More complex; requires propensity score |
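
The X-learner row compresses several steps, so a sketch may help. The code below follows the commonly cited formulation from Künzel et al.: fit both response functions, impute individual effects in each group using the other group's model, regress the imputed effects on the covariates, and blend the two resulting estimates with a weighting function $g(x)$, here taken to be the propensity score. The base learner `fit_poly`, the function `x_learner`, the known constant propensity of 0.1, and the simulated data are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fit_poly(x, y, degree=2):
    """Hypothetical base learner: polynomial least squares in one covariate."""
    coef = P.polyfit(x, y, degree)
    return lambda x_new: P.polyval(x_new, coef)

def x_learner(x, w, y, propensity, fit=fit_poly):
    x0, y0 = x[w == 0], y[w == 0]   # control group
    x1, y1 = x[w == 1], y[w == 1]   # treated group

    # Stage 1: estimate the two response functions.
    mu0_hat = fit(x0, y0)
    mu1_hat = fit(x1, y1)

    # Stage 2: impute individual effects, then regress them on x.
    d1 = y1 - mu0_hat(x1)           # treated: observed minus predicted control
    d0 = mu1_hat(x0) - y0           # control: predicted treated minus observed
    tau1_hat = fit(x1, d1)
    tau0_hat = fit(x0, d0)

    # Combine the two CATE estimates with weights g(x) and 1 - g(x).
    def tau_hat(x_new):
        g = propensity(x_new)
        return g * tau0_hat(x_new) + (1.0 - g) * tau1_hat(x_new)

    return tau_hat

# Unbalanced design: roughly 10% treated; assumed CATE tau(x) = 1 + 2x.
rng = np.random.default_rng(0)
n = 5_000
x = rng.uniform(-1.0, 1.0, size=n)
w = (rng.uniform(size=n) < 0.1).astype(int)
y = x ** 2 + w * (1.0 + 2.0 * x) + rng.normal(0.0, 0.1, n)

grid = np.linspace(-1.0, 1.0, 21)
tau_hat = x_learner(x, w, y, propensity=lambda x_new: np.full_like(x_new, 0.1))
estimate = tau_hat(grid)
print("X-learner max abs error:", np.max(np.abs(estimate - (1.0 + 2.0 * grid))))
```

With few treated units, $g(x)$ is small, so the blend leans on `tau1_hat`, which is built from the control-group response model fitted on the larger sample. In practice the propensity score would usually be estimated rather than known.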
Connections
- Builds on Causal Estimands — CATE is the target quantity
- Potential Outcomes Framework — the theoretical foundation
- Propensity Score in Bayesian CI — propensity score used by X-learner as weighting function
- Nonparametric Causal Inference — BART is a common base learner for metalearners
See Also
- Künzel 2019 - Overview — paper context
- S-Learner, T-Learner and Minimax Rate, X-Learner — the three metalearners