Metalearners for CATE

Summary

A metalearner (or meta-algorithm) is any algorithm that takes supervised machine-learning base learners as inputs and combines them to estimate the conditional average treatment effect (CATE). The framework decouples the structural problem of CATE estimation from the choice of base learner, so any supervised ML method (random forests, BART, neural nets) can be used as a drop-in component.

Overview

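Why metalearners? The CATE cannot be fit directly as a regression target: both potential outcomes are never observed for the same unit, so no individual treatment effect is ever available as a label. Metalearners exploit the structure of the problem by splitting it into subproblems, such as estimating the response under treatment and under control, where standard supervised ML excels.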

Setup and Notation

Definition: Potential Outcomes Setup

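For each unit $i$ with covariates $X_i \in \mathbb{R}^d$:

  • $Y_i(0)$ = potential outcome under control
  • $Y_i(1)$ = potential outcome under treatment
  • $W_i \in \{0, 1\}$ = treatment indicator
  • Observed outcome: $Y_i = Y_i(W_i)$

CATE: $\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x]$

ATE: $\mathrm{ATE} = \mathbb{E}[Y(1) - Y(0)] = \mathbb{E}[\tau(X)]$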
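A minimal simulation can make the setup concrete. The sketch below assumes a made-up data-generating process (uniform covariate, true CATE of 1 + 2x, randomized treatment with probability 0.5); the function name `simulate` and all constants are illustrative and not from the source. It shows that each unit reveals only one potential outcome, and that under randomization the difference in group means approximates the ATE.

```python
import random

def simulate(n, seed=0):
    """Draw n units from an assumed toy superpopulation (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()                      # covariate X_i ~ Uniform[0, 1]
        y0 = x + rng.gauss(0, 0.1)            # potential outcome under control
        y1 = y0 + 1.0 + 2.0 * x               # potential outcome under treatment, tau(x) = 1 + 2x
        w = 1 if rng.random() < 0.5 else 0    # randomized treatment indicator
        y = y1 if w == 1 else y0              # observed outcome Y_i = Y_i(W_i)
        rows.append((x, y0, y1, w, y))
    return rows

rows = simulate(20000)

# The ATE is only computable here because the simulation keeps both potential outcomes.
true_ate = sum(y1 - y0 for _, y0, y1, _, _ in rows) / len(rows)

# With real data only (x, w, y) is available; under randomization the
# difference in observed group means estimates the ATE.
treated = [y for _, _, _, w, y in rows if w == 1]
control = [y for _, _, _, w, y in rows if w == 0]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)

print(f"true ATE: {true_ate:.3f}, difference in means: {diff_in_means:.3f}")
```

The difference in means recovers a single number; estimating how the effect varies with x is the job of the metalearners described in this note.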

Definition: Metalearner

A metalearner (or meta-algorithm) is an algorithm that:

  1. Takes one or more supervised base learners as inputs
  2. Uses these base learners to estimate the response functions $\mu_0(x) = \mathbb{E}[Y(0) \mid X = x]$ and $\mu_1(x) = \mathbb{E}[Y(1) \mid X = x]$
  3. Combines the estimates to produce $\hat{\tau}(x)$

The base learner can be any supervised ML method that minimizes expected squared error (regression) or any analogous loss.
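The three steps of the definition can be sketched in their simplest form, which corresponds to the T-learner strategy: fit one response model per treatment arm and subtract. In this sketch the base learner is a hypothetical one-dimensional least-squares line (`fit_line`), chosen only so the example stays dependency-free; any function with the same "fit, then return a predictor" shape could be swapped in. The toy data are noise-free and assumed for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Hypothetical base learner: 1-D least-squares line, returns a predict function."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def t_learner(xs, ws, ys, base_learner=fit_line):
    """Fit one response model per arm and return tau_hat(x) = mu1_hat(x) - mu0_hat(x)."""
    # Steps 1-2: estimate the response functions with the supplied base learner.
    mu0 = base_learner([x for x, w in zip(xs, ws) if w == 0],
                       [y for y, w in zip(ys, ws) if w == 0])
    mu1 = base_learner([x for x, w in zip(xs, ws) if w == 1],
                       [y for y, w in zip(ys, ws) if w == 1])
    # Step 3: combine the two estimates into a CATE estimate.
    return lambda x: mu1(x) - mu0(x)

# Assumed toy data: control response x, treated response 1 + 3x, so tau(x) = 1 + 2x.
xs = [i / 10 for i in range(10)]
ws = [i % 2 for i in range(10)]
ys = [x + w * (1 + 2 * x) for x, w in zip(xs, ws)]

tau_hat = t_learner(xs, ws, ys)
print(tau_hat(0.5))
```

Passing the base learner as an argument is what makes the structure "meta": the combination logic does not change when the line fit is replaced by a forest or a neural net.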

Superpopulation Model

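Units are drawn i.i.d. from a superpopulation $\mathcal{P}$ over $(Y(0), Y(1), X, W)$. The treatment indicator satisfies $W \mid X = x \sim \mathrm{Bernoulli}(e(x))$, where $e(x) = \mathbb{P}(W = 1 \mid X = x)$ is the propensity score.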

Families of Distributions and Minimax Rate

Definition: Family with Bounded Minimax Rate

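For $a \in (0, 1]$, $S(a)$ is the set of all families $F$ of distributions with a minimax rate of at most $N^{-a}$:

$$\sup_{\mathcal{P} \in F} \mathbb{E}\left[(\mu(X) - \hat{\mu}_N(X))^2\right] \le C N^{-a}$$

for some constant $C$, where $\hat{\mu}_N$ is the best estimator of the response function $\mu$ using $N$ samples.

  • $a = 1$: families where the response function can be estimated at the parametric rate $N^{-1}$
  • $a = 2/(2 + d)$: nonparametric regression with Lipschitz response functions of $d$-dimensional covariates requires rate $N^{-2/(2+d)}$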
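A little arithmetic shows how much the exponent matters. The sketch below assumes the bound has the form C · N^(-a) with C = 1, and assumes the standard exponent a = 2/(2 + d) for Lipschitz regression in d dimensions; the function names are illustrative, not from the source.

```python
def rate_bound(n, a, c=1.0):
    """Bound c * n^(-a) on the worst-case error for a family with rate exponent a."""
    return c * n ** (-a)

def lipschitz_exponent(d):
    """Assumed exponent a = 2 / (2 + d) for Lipschitz regression in d dimensions."""
    return 2.0 / (2.0 + d)

n = 10_000
parametric = rate_bound(n, a=1.0)
nonparametric = {d: rate_bound(n, lipschitz_exponent(d)) for d in (1, 2, 10)}

print(f"parametric (a = 1): {parametric:.2e}")
for d, bound in nonparametric.items():
    print(f"Lipschitz, d = {d} (a = {lipschitz_exponent(d):.3f}): {bound:.2e}")
```

With the same sample size, the bound degrades quickly as the dimension grows, which is why it helps when the function being estimated belongs to an easier family than the response functions do.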

Key implication for CATE: Since CATE is a difference of two conditional means, its estimation rate depends on the smoothness of both response functions and the CATE function itself. The X-learner exploits the case where the CATE is smoother than the response functions.

EMSE for CATE

Definition: EMSE for CATE Estimator

The Expected Mean Squared Error for a CATE estimator over observations with treated units:

where the are importance weights ensuring the loss is meaningful when treatment groups are unequal.

Three Metalearners

| Learner | Strategy | Key Advantage | Key Weakness |
|---|---|---|---|
| S-Learner | Single model on $(X, W)$ | Borrows strength across groups | Treatment indicator may be regularized to zero |
| T-Learner | Separate models for $\mu_0$ and $\mu_1$ | Clean separation | Suboptimal for unbalanced groups |
| X-Learner | Two-stage: impute ITEs, then regress | Best for unbalanced treatment | More complex; requires propensity score |
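The X-learner's two-stage strategy is the least obvious of the three, so a sketch may help. It assumes the usual formulation: fit a response model per arm, impute individual treatment effects (ITEs) by crossing each arm's outcomes with the other arm's model, regress the imputed effects on the covariates, and blend the two resulting CATE estimates with a propensity weight. Here the base learner is a hypothetical 1-D least-squares line (`fit_line`), the propensity is approximated by the constant share of treated units, and the unbalanced toy data are noise-free; all of these are assumptions made for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Hypothetical base learner: 1-D least-squares line, returns a predict function."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def x_learner(xs, ws, ys, base_learner=fit_line):
    """Sketch of the X-learner with a constant propensity estimate."""
    x0 = [x for x, w in zip(xs, ws) if w == 0]
    y0 = [y for y, w in zip(ys, ws) if w == 0]
    x1 = [x for x, w in zip(xs, ws) if w == 1]
    y1 = [y for y, w in zip(ys, ws) if w == 1]

    # Stage 1: response models for each arm.
    mu0 = base_learner(x0, y0)
    mu1 = base_learner(x1, y1)

    # Stage 2: impute ITEs using the opposite arm's model, then regress them on x.
    d1 = [y - mu0(x) for x, y in zip(x1, y1)]   # treated: observed minus predicted control
    d0 = [mu1(x) - y for x, y in zip(x0, y0)]   # control: predicted treated minus observed
    tau1 = base_learner(x1, d1)
    tau0 = base_learner(x0, d0)

    # Stage 3: blend with a propensity weight (assumed constant: share of treated units).
    g = len(x1) / len(xs)
    return lambda x: g * tau0(x) + (1 - g) * tau1(x)

# Assumed unbalanced toy data: 4 treated and 16 control units, tau(x) = 1 + 2x.
xs = [i / 20 for i in range(20)]
ws = [1 if i % 5 == 0 else 0 for i in range(20)]
ys = [x + w * (1 + 2 * x) for x, w in zip(xs, ws)]

tau_hat = x_learner(xs, ws, ys)
print(tau_hat(0.5))
```

With few treated units the weight g is small, so the final estimate leans on `tau1`, which is built from the control-arm model that was fit on the larger group. This is one way to read the "best for unbalanced treatment" entry in the table.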

Connections

See Also