S-Learner
Summary
The S-learner (single learner) estimates the CATE by fitting a single regression model on all data with the treatment indicator $W$ included as a feature. The CATE estimate is the difference in predictions with $W = 1$ vs. $W = 0$. It is simple to implement but may underperform when the base learner regularizes the treatment indicator toward zero.
Overview
The S-learner is the most straightforward metalearner. It treats treatment status as just another covariate and delegates all structure learning to the base learner.
Definition and Algorithm
Definition: S-Learner
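Step 1: Fit a single response function using all observed data:
$$\mu(x, w) = \mathbb{E}[Y \mid X = x, W = w]$$
using any supervised learning method that estimates the conditional mean; call the fitted model $\hat{\mu}$.
Step 2: Estimate the CATE as the difference in predictions:
$$\hat{\tau}_S(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)$$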
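The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `LeastSquares` class is a hypothetical stand-in base learner (plain OLS written with NumPy), the function names `fit_s_learner` and `predict_cate` are invented here, and the synthetic data assume a constant treatment effect of 1.5.

```python
import numpy as np


class LeastSquares:
    """Hypothetical stand-in base learner: OLS with an intercept."""

    def fit(self, features, y):
        design = np.column_stack([np.ones(len(features)), features])
        self.coef_, *_ = np.linalg.lstsq(design, y, rcond=None)
        return self

    def predict(self, features):
        design = np.column_stack([np.ones(len(features)), features])
        return design @ self.coef_


def fit_s_learner(X, w, y, base_learner):
    """Step 1: fit one model on all units, with w appended as a feature."""
    features = np.column_stack([X, w])
    return base_learner.fit(features, y)


def predict_cate(model, X):
    """Step 2: difference in predictions with w set to 1 and to 0."""
    n = X.shape[0]
    mu1 = model.predict(np.column_stack([X, np.ones(n)]))
    mu0 = model.predict(np.column_stack([X, np.zeros(n)]))
    return mu1 - mu0


# Synthetic data (assumed for illustration): constant true effect of 1.5.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
w = rng.integers(0, 2, size=n)
y = 1.0 + X[:, 0] + 1.5 * w + rng.normal(scale=0.5, size=n)

model = fit_s_learner(X, w, y, LeastSquares())
tau_hat = predict_cate(model, X)
```

Any object exposing `fit` and `predict` in the same way could be passed as `base_learner`; the S-learner itself does not depend on the choice.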
Properties and Performance
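Key advantages:
- Borrows strength across treatment and control groups: all data are used in one model
- With a linear base learner without treatment-covariate interactions, gives a CATE estimate $\hat{\tau}_S(x) = \hat{\beta}_W$, the fitted coefficient on the treatment indicator (constant ATE)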
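The constant estimate in the linear case follows directly from the definition. As a short worked step, assume the fitted model is linear with no treatment-covariate interaction terms (the coefficient names $\hat{\beta}_0$, $\hat{\beta}$, $\hat{\beta}_W$ are notation introduced here):

$$\hat{\mu}(x, w) = \hat{\beta}_0 + \hat{\beta}^\top x + \hat{\beta}_W w$$

$$\hat{\tau}_S(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0) = \left(\hat{\beta}_0 + \hat{\beta}^\top x + \hat{\beta}_W\right) - \left(\hat{\beta}_0 + \hat{\beta}^\top x\right) = \hat{\beta}_W$$

The covariate terms cancel, so the estimate cannot vary with $x$ unless interaction terms between $w$ and $x$ are added to the model.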
Key weakness:
- Many ML algorithms (e.g., random forests) regularize all features equally. If the treatment effect is small relative to other variation, the treatment indicator may effectively be shrunk toward zero, giving $\hat{\tau}_S(x) \approx 0$ even when effects are heterogeneous
- RF as base learner: the treatment indicator is assigned the same split probability as each covariate, so it is selected for only a fraction of the splits, and the treatment effect is underestimated proportionally
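The shrinkage effect can be illustrated with a small experiment. The sketch below swaps in a ridge-penalized linear base learner, rather than the random forest discussed above, because ridge has a closed-form solution and needs only NumPy; the mechanism shown (a penalty shared by all features pulling the treatment coefficient toward zero) is analogous, not identical. The function name, the penalty values, and the synthetic data with a true effect of 0.5 are all assumptions made for illustration.

```python
import numpy as np


def s_learner_ridge_effect(X, w, y, penalty):
    """S-learner with a ridge base learner; returns the constant CATE estimate.

    Every feature, including the treatment indicator, shares one penalty.
    """
    features = np.column_stack([X, w]).astype(float)
    # Center features and outcome so the intercept is left unpenalized.
    Z = features - features.mean(axis=0)
    y_centered = y - y.mean()
    gram = Z.T @ Z + penalty * np.eye(Z.shape[1])
    coef = np.linalg.solve(gram, Z.T @ y_centered)
    # For a linear model, mu_hat(x, 1) - mu_hat(x, 0) is the coefficient on w,
    # which is the last column of the feature matrix.
    return coef[-1]


# Synthetic data (assumed for illustration): true effect is 0.5.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
w = rng.integers(0, 2, size=n)
y = X[:, 0] + 0.5 * w + rng.normal(size=n)

tau_unpenalized = s_learner_ridge_effect(X, w, y, penalty=0.0)
tau_penalized = s_learner_ridge_effect(X, w, y, penalty=5000.0)
print(tau_unpenalized, tau_penalized)
```

With no penalty the estimate should land near the true effect, while the heavily penalized fit should return a much smaller value, which is the underestimation the S-learner is prone to when the base learner treats the treatment indicator like any other feature.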
When S-learner performs well:
- Treatment effect is constant (or close to constant) and truly small
- The CATE function is simpler than either response function
- Large datasets where regularization pressure is low
When it fails:
- Highly heterogeneous CATE
- Base learner regularizes features to zero (random forests, LASSO)
- Propensity score far from 0.5
Illustration (Fig. 1A, 1B)
In a simple piecewise linear example with one covariate:
- $\hat{\mu}_0$ (blue) fits the control group and matches the data well
- $\hat{\mu}_1$ (dashed) fits the treated group, but without borrowing it is relatively poor
- The S-learner combines both, producing a smoother but potentially biased CATE estimate
Connections
- Part of Metalearners for CATE framework
- Contrast with T-Learner and Minimax Rate (separate models) and X-Learner (two-stage imputation)
- Nonparametric Causal Inference — BART S-learner is common in Bayesian causal inference
See Also
- Metalearners for CATE — framework context
- T-Learner and Minimax Rate — next level: separate models per arm
- X-Learner — most sophisticated metalearner