Propensity Score and the Balancing Property

Summary

The propensity score is the probability of receiving treatment given the observed covariates. Rosenbaum and Rubin (1983) proved two key properties: it is a balancing score (conditional on the propensity score, the distribution of the covariates $X$ is the same in the treated and control groups), and under strong ignorability it is sufficient to remove all bias due to observed covariates. Matching or conditioning on the scalar propensity score is therefore equivalent, for bias removal, to conditioning on the full covariate vector.

Overview

The fundamental problem of causal inference is that for each individual $i$ only one of the potential outcomes $Y_i(0)$, $Y_i(1)$ is observed. In a randomized experiment, assignment is independent of potential outcomes by design. In an observational study we must posit an assignment mechanism; the key assumption is strong ignorability (Rosenbaum and Rubin, 1983). The propensity score collapses a high-dimensional matching problem (Chapin’s curse of dimensionality) into a one-dimensional one while preserving the ability to balance covariates.

Main Content

Propensity score ^def-propensity

The propensity score for individual $i$ is the probability of receiving the treatment given the observed covariates:

$$e_i(X_i) = P(T_i = 1 \mid X_i)$$

In practice the true score is unknown outside randomized experiments and must be estimated, most commonly by logistic regression, or by nonparametric methods such as boosted CART / generalized boosted models (gbm).
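As a rough illustration of the logistic-regression route, the sketch below fits $P(T = 1 \mid X)$ by plain gradient ascent on synthetic data, using only the Python standard library. The function names (`fit_propensity`, `predict_propensity`), the learning rate, the iteration count, and the simulated assignment mechanism are all assumptions made for this example; in practice one would normally call an established statistics package instead.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_propensity(X, T, lr=1.0, n_iter=300):
    """Fit a logistic model for P(T=1 | X) by batch gradient ascent.

    X is a list of covariate rows (lists of floats), T a list of 0/1
    treatment indicators. Returns (intercept, coefficients).
    lr and n_iter are illustrative defaults, not tuned values.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, ti in zip(X, T):
            resid = ti - sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))
            g0 += resid
            for j in range(p):
                g[j] += resid * xi[j]
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


def predict_propensity(X, b0, b):
    """Estimated propensity score for each row of X."""
    return [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) for xi in X]


# Synthetic data with a known assignment mechanism (illustration only):
# true score is sigmoid(-0.5 + 1.0 * x).
random.seed(0)
X = [[random.gauss(0, 1)] for _ in range(1000)]
T = [1 if random.random() < sigmoid(-0.5 + 1.0 * x[0]) else 0 for x in X]

b0, b = fit_propensity(X, T)
e_hat = predict_propensity(X, b0, b)
```

Because the model includes an intercept, the fitted scores average to the observed treatment rate once the fit has converged, which is a quick sanity check on any estimated score.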

Strong ignorability of treatment assignment ^thm-ignorability

Treatment assignment $T$ is strongly ignorable given covariates $X$ if:

  1. Unconfoundedness / “no hidden bias”: $T \perp (Y(0), Y(1)) \mid X$, and
  2. Positivity / overlap: $0 < P(T = 1 \mid X) < 1$ for all $X$.
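Positivity concerns the true score and cannot be verified directly, but estimated scores can at least flag obvious problems. The following is a minimal sketch of such a check; the helper name `overlap_summary`, the threshold `eps`, and the toy scores are hypothetical choices for illustration.

```python
def overlap_summary(e_hat, T, eps=0.01):
    """Summarize overlap of estimated propensity scores between groups.

    Returns the common-support interval (the range of scores observed in
    both groups), the number of scores within eps of 0 or 1, and the
    number of units whose score falls outside the common support.
    """
    treated = [e for e, t in zip(e_hat, T) if t == 1]
    control = [e for e, t in zip(e_hat, T) if t == 0]
    lo = max(min(treated), min(control))
    hi = min(max(treated), max(control))
    n_extreme = sum(1 for e in e_hat if e < eps or e > 1 - eps)
    n_outside = sum(1 for e in e_hat if e < lo or e > hi)
    return {"common_support": (lo, hi),
            "n_extreme": n_extreme,
            "n_outside": n_outside}


# Toy scores: four controls followed by four treated units.
scores = [0.05, 0.2, 0.4, 0.6, 0.3, 0.5, 0.8, 0.995]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
summary = overlap_summary(scores, groups)
```

Units outside the common support have no comparable counterpart in the other group, so effect estimates for them rest on extrapolation.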

The first component is also called “ignorable,” “no hidden bias,” or “unconfounded.” It is more plausible than it first sounds: matching on or controlling for the observed covariates also controls for unobserved covariates insofar as they are correlated with the observed ones, so the only unobserved covariates of concern are those unrelated to the observed ones. Sensitivity analysis assesses how conclusions change under departures from this assumption (see Covariate Balance Diagnostics).

Balancing property of the propensity score (Rosenbaum-Rubin 1983) ^thm-balancing

The propensity score is a balancing score: at each value of the propensity score $e(X)$, the distribution of the covariates $X$ that define it is the same in the treated and control groups,

$$P(X \mid T = 1, e(X)) = P(X \mid T = 0, e(X)).$$

Thus grouping individuals with similar propensity scores replicates a mini-randomized experiment with respect to the observed covariates.
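The balancing property can be checked numerically. The simulation below is a sketch under made-up assumptions (two independent standard-normal covariates, a true score of the form logistic(0.8·x1 + 0.8·x2), and 20 equal-sized strata); it compares the raw treated-minus-control difference in the mean of `x1` with the same difference inside propensity-score strata.

```python
import math
import random
from statistics import fmean

random.seed(1)
n = 20000

# Two covariates and a known (true) propensity score; illustration only.
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
e = [1.0 / (1.0 + math.exp(-(0.8 * a + 0.8 * b))) for a, b in zip(x1, x2)]
t = [1 if random.random() < ei else 0 for ei in e]


def mean_diff(indices):
    """Treated-minus-control difference in the mean of x1 over indices."""
    treated = [x1[i] for i in indices if t[i] == 1]
    control = [x1[i] for i in indices if t[i] == 0]
    return fmean(treated) - fmean(control)


# Imbalance in the full sample, ignoring the propensity score.
raw_diff = mean_diff(range(n))

# Imbalance within 20 equal-sized strata of the propensity score.
order = sorted(range(n), key=lambda i: e[i])
k = 20
size = n // k
within = [mean_diff(order[j * size:(j + 1) * size]) for j in range(k)]
avg_abs_within = fmean(abs(d) for d in within)

print(f"raw difference in mean x1:       {raw_diff:.3f}")
print(f"average |difference| in strata:  {avg_abs_within:.3f}")
```

Strata have finite width, so some imbalance remains inside them (mostly in the extreme strata); the property holds exactly only at a single value of the score.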

Ignorability given the propensity score ^thm-ps-ignorability

If treatment assignment is strongly ignorable given $X$, then it is also ignorable given the propensity score $e(X)$. Consequently, the difference in mean outcomes between treated and control individuals at a particular propensity-score value is an unbiased estimate of the treatment effect at that value. This justifies matching/conditioning on the scalar propensity score rather than on the full multivariate $X$.
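The step from ignorability to unbiasedness can be spelled out in a few lines. This is a sketch rather than the original proof: it writes $Y = T\,Y(1) + (1 - T)\,Y(0)$ for the observed outcome (a consistency assumption that the statement above leaves implicit) and $e$ for a fixed value of the propensity score $e(X)$.

$$
\begin{aligned}
E[Y \mid T = 1, e(X) = e] - E[Y \mid T = 0, e(X) = e]
&= E[Y(1) \mid T = 1, e(X) = e] - E[Y(0) \mid T = 0, e(X) = e] \\
&= E[Y(1) \mid e(X) = e] - E[Y(0) \mid e(X) = e] \\
&= E[Y(1) - Y(0) \mid e(X) = e].
\end{aligned}
$$

The first equality uses the definition of the observed outcome, the second uses ignorability given $e(X)$, and positivity ensures that both conditional means on the left-hand side are defined.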

Variable selection for the propensity model ^def-variable-selection

Include all variables related to both treatment assignment and the outcome. Researchers should be liberal in including potential confounders, since excluding a confounder is costly in bias while including an irrelevant variable costs only a small increase in variance (in small samples, prioritize variables related to the outcome). The diagnostic target is covariate balance, not the logistic-regression coefficients, so the usual model-building criteria (c-statistic, stepwise selection, collinearity concerns) do not apply. Never include a variable that may have been affected by the treatment. Misestimating the propensity score matters less than misspecifying the outcome model, because the score is only a tool to obtain balance.

Linear propensity score ^def-linear-ps

Matching is often done on the linear propensity score, $\operatorname{logit}(e) = \log[e / (1 - e)]$, rather than on the propensity score $e$ itself; Rosenbaum and Rubin (1985) and others found this particularly effective at reducing bias (the logit is closer to normally distributed, which improves the behavior of affinely invariant distance measures).
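The transformation itself is one line. In the sketch below, the clipping constant `eps` is an assumption added so that estimated scores of exactly 0 or 1 do not produce infinite values; it is not part of the definition.

```python
import math


def linear_propensity(e, eps=1e-6):
    """Return logit(e) = log(e / (1 - e)) for a propensity score e.

    Scores are clipped to [eps, 1 - eps] first (an illustrative safeguard).
    """
    e = min(max(e, eps), 1.0 - eps)
    return math.log(e / (1.0 - e))


# The logit maps (0, 1) onto the whole real line and is symmetric about 0.5.
lps = [linear_propensity(e) for e in (0.2, 0.5, 0.8)]
```

Scores near 0 or 1 are stretched out on the logit scale, so distances between units in the tails are no longer compressed as they are on the probability scale.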

Examples

Variance ratio guideline for calipers: if the variance of the linear propensity score in the treated group is twice that of the controls, a caliper of 0.2 SD of the linear propensity score removes about 98% of the bias of a normally distributed covariate (Rosenbaum and Rubin, 1985). When the treated-group variance is much larger, smaller calipers are needed; a default of 0.25 SD of the linear propensity score is generally suggested.
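To make the caliper rule concrete, here is a minimal sketch of greedy 1:1 nearest-neighbor matching on the linear propensity score. Several choices are assumptions of this example rather than statements from the text: the SD is taken as the square root of the average of the two group variances, matching is without replacement in the order the treated units are listed, and the function name `caliper_match` is hypothetical.

```python
import math
from statistics import variance


def caliper_match(lps_treated, lps_control, width=0.25):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    lps_treated, lps_control: linear propensity scores for each group.
    width: caliper in units of the SD of the linear propensity score.
    Returns (pairs, caliper); pairs holds (treated_index, control_index).
    """
    # Assumed SD convention: root of the average of the group variances.
    sd = math.sqrt((variance(lps_treated) + variance(lps_control)) / 2)
    caliper = width * sd
    available = list(range(len(lps_control)))
    pairs = []
    for i, lt in enumerate(lps_treated):
        if not available:
            break
        # Closest remaining control on the linear propensity score.
        j = min(available, key=lambda c: abs(lps_control[c] - lt))
        if abs(lps_control[j] - lt) <= caliper:
            pairs.append((i, j))
            available.remove(j)
        # Otherwise the treated unit is left unmatched.
    return pairs, caliper


# Toy linear propensity scores (illustration only).
treated = [0.0, 1.0, 3.0]
control = [0.1, 0.9, 1.2, -2.0]
pairs, caliper = caliper_match(treated, control)
```

In this toy example the third treated unit has no control within the caliper and is dropped, which is the intended trade: closer matches in exchange for a smaller matched sample.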

Connections

See Also