S-Learner

Summary

The S-learner (single learner) estimates the CATE by fitting a single regression model on all data, with the treatment indicator included as a feature. The CATE estimate is the difference between the model's predictions with the treatment indicator set to 1 vs. 0. It is simple to implement but may underperform when the base learner regularizes the treatment indicator toward zero.

Overview

The S-learner is the most straightforward metalearner. It treats treatment status as just another covariate and delegates all structure learning to the base learner.

Definition and Algorithm

Definition: S-Learner

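Step 1: Fit a single response function using all observed data:

  $$\mu(x, w) = \mathbb{E}[Y^{obs} \mid X = x, W = w]$$

using any supervised learning method that estimates the conditional mean.

Step 2: Estimate the CATE as the difference in predictions:

  $$\hat{\tau}_S(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)$$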
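
The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name `s_learner_cate`, the choice of scikit-learn's `GradientBoostingRegressor` as base learner, and the synthetic data (true CATE equal to 1 + x0) are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def s_learner_cate(X, w, y, X_new, base_learner=None):
    """Hypothetical helper: S-learner CATE estimates at the rows of X_new."""
    if base_learner is None:
        # Assumed default; any regressor with fit/predict would do.
        base_learner = GradientBoostingRegressor(random_state=0)
    # Step 1: fit one model on the covariates plus the treatment indicator.
    base_learner.fit(np.column_stack([X, w]), y)
    # Step 2: predict with the indicator forced to 1 and to 0, then subtract.
    ones = np.ones(len(X_new))
    zeros = np.zeros(len(X_new))
    mu1 = base_learner.predict(np.column_stack([X_new, ones]))
    mu0 = base_learner.predict(np.column_stack([X_new, zeros]))
    return mu1 - mu0


# Synthetic data (assumption for illustration): true CATE is 1 + x0.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
w = rng.integers(0, 2, size=n)
y = X[:, 1] + (1.0 + X[:, 0]) * w + rng.normal(scale=0.5, size=n)

tau_hat = s_learner_cate(X, w, y, X)
print("mean estimated CATE:", tau_hat.mean())
```

Because the treatment indicator is appended as the last feature column, the same helper works with any regressor that exposes `fit` and `predict`; only the base learner changes, not the procedure.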

Properties and Performance

Key advantage:

  • Borrows strength across treatment and control groups — all data used in one model
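  • With a linear base learner (and no treatment-covariate interaction terms), gives $\hat{\tau}(x)$ equal to the coefficient on the treatment indicator for every $x$ (constant ATE)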
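
The constant-effect result for a linear base learner follows in one line. The derivation below assumes the treatment indicator enters additively, with no treatment-covariate interactions; the coefficient names $\hat{\alpha}$, $\hat{\beta}$, $\hat{\gamma}$ are notation introduced here for illustration.

```latex
\hat{\mu}(x, w) = \hat{\alpha} + x^\top \hat{\beta} + \hat{\gamma}\, w
\quad\Longrightarrow\quad
\hat{\tau}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0) = \hat{\gamma}
\quad \text{for all } x.
```

Adding interaction terms between the indicator and the covariates would let the estimate vary with $x$.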

Key weakness:

  • Many ML algorithms (e.g., random forests) regularize features equally. If the treatment effect is small relative to other variation, the treatment indicator may effectively be shrunk toward zero, so that $\hat{\tau}(x) \approx 0$ even when effects are heterogeneous
  • RF as base learner: the treatment indicator is assigned the same split probability as each covariate, so it is selected only a small fraction of the time (roughly $1/(p+1)$ with $p$ covariates), and the treatment effect tends to be underestimated accordingly
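
The shrinkage problem is easy to reproduce with a penalized linear model. The sketch below swaps in scikit-learn's `Lasso` for the random forest discussed above because its behavior is simpler to see. The synthetic data, the true effect of 0.1, and the penalty `alpha=0.2` are assumptions chosen so that the penalty dominates the small treatment signal.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (assumption): small constant treatment effect of 0.1,
# dwarfed by the contribution of the first covariate.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
w = rng.integers(0, 2, size=n)
true_effect = 0.1
y = 2.0 * X[:, 0] + true_effect * w + rng.normal(size=n)

features = np.column_stack([X, w])


def s_learner_tau(model):
    """Fit one model on (X, W); return predictions at W=1 minus W=0."""
    model.fit(features, y)
    f1 = np.column_stack([X, np.ones(n)])
    f0 = np.column_stack([X, np.zeros(n)])
    return model.predict(f1) - model.predict(f0)


tau_ols = s_learner_tau(LinearRegression())   # unpenalized baseline
tau_lasso = s_learner_tau(Lasso(alpha=0.2))   # penalty applied to every feature

print("OLS S-learner CATE:  ", tau_ols[0])
print("Lasso S-learner CATE:", tau_lasso[0])
```

Under these assumptions the L1 penalty removes the treatment indicator from the model entirely, so the S-learner reports a zero effect for every unit, while the unpenalized fit returns a constant estimate near the true value.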

When S-learner performs well:

  • Treatment effect is constant (or close to constant) and truly small
  • The CATE function is simpler than either response function
  • Large datasets where regularization pressure is low

When it fails:

  • Highly heterogeneous CATE
  • Base learner regularizes features to zero (random forests, LASSO)
  • Propensity score far from 0.5

Illustration (Fig. 1A, 1B)

In a simple example with one covariate and piecewise linear response functions:

  • $\hat{\mu}_0$ (blue) fits the control group and matches the data well
  • $\hat{\mu}_1$ (dashed) fits the treated group, but without borrowing strength from the controls its fit is relatively poor
  • The S-learner combines both groups in one model, producing a smoother but potentially biased $\hat{\tau}$

Connections

See Also