Propensity Score and the Balancing Property
Summary
The propensity score is the probability of receiving treatment given the observed covariates. Rosenbaum and Rubin (1983) proved two key properties. First, it is a balancing score: conditional on the propensity score, the distribution of the observed covariates $X$ is the same in the treated and control groups. Second, under strong ignorability, it is sufficient to remove all bias due to the observed covariates, so matching or conditioning on the scalar propensity score is equivalent to conditioning on the full covariate vector.
Overview
The fundamental problem of causal inference is that for each individual $i$ only one of the potential outcomes $Y_i(0)$, $Y_i(1)$ is observed. In a randomized experiment, assignment is independent of potential outcomes by design. In an observational study we must posit an assignment mechanism; the key assumption is strong ignorability (Rosenbaum and Rubin, 1983). The propensity score collapses a high-dimensional matching problem (Chapin’s curse of dimensionality) into a one-dimensional one while preserving the ability to balance covariates.
Main Content
Propensity score ^def-propensity
The propensity score for individual $i$ is the probability of receiving the treatment ($T_i = 1$) given the observed covariates $X_i$:
$$e_i(X_i) = P(T_i = 1 \mid X_i)$$
In practice the true score is unknown outside randomized experiments and must be estimated, most commonly by logistic regression, or by nonparametric methods such as boosted CART / generalized boosted models (gbm).
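A minimal sketch of the logistic-regression route, assuming a single covariate and simulated data. The names `sigmoid` and `fit_logistic` and the simulation parameters are illustrative and not from the source; the fit uses plain Newton-Raphson so the example needs only the standard library, whereas real analyses would normally use a statistics package.

```python
import math
import random

def sigmoid(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, t, iters=25):
    """Fit P(T=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Gradient (g) and negative Hessian (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += ti - p
            g1 += (ti - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Simulated data (assumption): one covariate, true score sigmoid(-0.5 + x).
random.seed(0)
n = 500
x = [random.gauss(0.0, 1.0) for _ in range(n)]
t = [1 if random.random() < sigmoid(-0.5 + xi) else 0 for xi in x]

b0, b1 = fit_logistic(x, t)
e_hat = [sigmoid(b0 + b1 * xi) for xi in x]  # estimated propensity scores
```

Each entry of `e_hat` is an estimated probability of treatment for one unit; these are the values that matching or subclassification would then operate on.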
Strong ignorability of treatment assignment ^thm-ignorability
Treatment assignment $T$ is strongly ignorable given covariates $X$ if:
- Unconfoundedness / “no hidden bias”: $(Y(0), Y(1)) \perp T \mid X$, and
- Positivity / overlap: $0 < P(T = 1 \mid X) < 1$ for all $X$.
The first component is also called “ignorable,” “no hidden bias,” or “unconfounded.” It is more plausible than it first sounds: matching on or controlling for the observed covariates also controls for unobserved covariates insofar as they are correlated with the observed ones, so the only unobserved covariates of concern are those unrelated to the observed ones. Sensitivity analysis assesses how conclusions change under departures from this assumption (see Covariate Balance Diagnostics).
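Unlike unconfoundedness, the positivity component can be probed empirically. One simple check, sketched here under the assumption that estimated scores are already available, is to find the interval of scores covered by both groups and flag units outside it. The function name `common_support` and the data are made up for illustration.

```python
def common_support(scores, treated):
    """Return the (low, high) interval of propensity scores covered by both groups."""
    ps_t = [s for s, t in zip(scores, treated) if t == 1]
    ps_c = [s for s, t in zip(scores, treated) if t == 0]
    low = max(min(ps_t), min(ps_c))
    high = min(max(ps_t), max(ps_c))
    return low, high

# Illustrative (made-up) estimated scores and treatment indicators.
scores = [0.05, 0.20, 0.35, 0.50, 0.40, 0.60, 0.80, 0.95]
treated = [0, 0, 0, 0, 1, 1, 1, 1]

low, high = common_support(scores, treated)
# Units whose score has no counterpart range in the other group.
outside = [s for s in scores if s < low or s > high]
```

With these numbers only scores between 0.40 and 0.50 are covered by both groups, so most units fall outside the overlap region, which is a warning sign for any comparison that extrapolates beyond it.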
Balancing property of the propensity score (Rosenbaum-Rubin 1983) ^thm-balancing
The propensity score $e(X)$ is a balancing score: at each value of the propensity score, the distribution of the covariates $X$ that define it is the same in the treated ($T = 1$) and control ($T = 0$) groups:
$$p(X \mid T = 1, e(X)) = p(X \mid T = 0, e(X)).$$
Thus grouping individuals with similar propensity scores replicates a mini-randomized experiment with respect to the observed covariates.
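The reason this holds can be sketched in two lines. The notation is assumed here: $T$ is the treatment indicator, $X$ the covariates, and $e(X) = P(T = 1 \mid X)$. Because $e(X)$ is a function of $X$, conditioning on both adds nothing beyond $X$, and the law of iterated expectations gives the second line:

$$
\begin{aligned}
P(T = 1 \mid X, e(X)) &= P(T = 1 \mid X) = e(X), \\
P(T = 1 \mid e(X)) &= E\big[\, P(T = 1 \mid X) \mid e(X) \,\big] = E\big[\, e(X) \mid e(X) \,\big] = e(X).
\end{aligned}
$$

The two probabilities agree, so once $e(X)$ is known, $X$ carries no further information about $T$. That is, $T \perp X \mid e(X)$, which is the balancing property.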
Ignorability given the propensity score ^thm-ps-ignorability
If treatment assignment is strongly ignorable given $X$, then it is also ignorable given the propensity score $e(X)$. Consequently, the difference in mean outcomes between treated and control individuals at a particular propensity-score value is an unbiased estimate of the treatment effect at that value. This justifies matching or conditioning on the scalar propensity score rather than on the full multivariate $X$.
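A toy illustration of this result, assuming made-up data with a constant treatment effect of 2 and two propensity-score strata. The function name `stratified_effect` and the stratum edges are hypothetical; the sketch compares the naive difference in means with the average of within-stratum differences.

```python
def stratified_effect(scores, treated, outcomes, edges):
    """Average the treated-minus-control mean difference within score strata,
    weighting each usable stratum by its number of units."""
    total, n_used = 0.0, 0
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, s in enumerate(scores) if lo <= s < hi]
        y1 = [outcomes[i] for i in idx if treated[i] == 1]
        y0 = [outcomes[i] for i in idx if treated[i] == 0]
        if not y1 or not y0:
            continue  # no within-stratum comparison is possible
        diff = sum(y1) / len(y1) - sum(y0) / len(y0)
        total += diff * len(idx)
        n_used += len(idx)
    return total / n_used

# Made-up data: the low-score stratum has baseline outcome 10, the high-score
# stratum has baseline 20, and treatment adds 2 in both.
scores = [0.25] * 4 + [0.75] * 4
treated = [1, 0, 0, 0, 1, 1, 1, 0]
outcomes = [12, 10, 10, 10, 22, 22, 22, 20]

y_t = [y for y, t in zip(outcomes, treated) if t == 1]
y_c = [y for y, t in zip(outcomes, treated) if t == 0]
naive = sum(y_t) / len(y_t) - sum(y_c) / len(y_c)
effect = stratified_effect(scores, treated, outcomes, edges=[0.0, 0.5, 1.0])
```

Here the naive comparison gives 7 because treated units are concentrated in the high-baseline stratum, while comparing within propensity-score strata recovers the effect of 2 built into the data.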
Variable selection for the propensity model ^def-variable-selection
Include all variables related to both treatment assignment and the outcome. Researchers should be liberal in including potential confounders: excluding a true confounder can be costly in bias, while including a variable unrelated to the outcome costs only a small increase in variance (in small samples, prioritize variables related to the outcome). The diagnostic target is covariate balance, not the logistic-regression coefficients, so the usual concerns of predictive model building (the c-statistic, stepwise selection, collinearity) do not apply. Never include a variable that may have been affected by the treatment. Misestimating the propensity score generally matters less than misspecifying the outcome model, because the score is only a tool for obtaining balance.
Linear propensity score ^def-linear-ps
Matching is often done on the linear (logit) propensity score, $\operatorname{logit}(e(X)) = \log\big(e(X) / (1 - e(X))\big)$, rather than on $e(X)$ itself; Rosenbaum and Rubin (1985) and others found this particularly effective at reducing bias (the logit is closer to normally distributed, which improves the behavior of affinely invariant distance measures).
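The transformation itself is a one-liner. A small sketch, with the function name `linear_propensity_score` chosen here for illustration:

```python
import math

def linear_propensity_score(e):
    """Logit of a propensity score e, which must lie strictly in (0, 1)."""
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return math.log(e / (1.0 - e))

scores = [0.1, 0.5, 0.9]
linear = [linear_propensity_score(e) for e in scores]
```

A score of 0.5 maps to 0, and scores symmetric around 0.5 map to values symmetric around 0, so the logit stretches out the regions near 0 and 1 where raw scores are compressed.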
Examples
Variance ratio guideline for calipers: if the variance of the linear propensity score in the treated group is twice that of the controls, a caliper of 0.2 SD of the linear propensity score removes about 98% of the bias of a normally distributed covariate (Rosenbaum and Rubin, 1985). When the treated-group variance is much larger, smaller calipers are needed; a default of 0.25 SD of the linear propensity score is generally suggested.
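A sketch of how such a caliper might be applied, under several assumptions that are not in the source: greedy 1:1 nearest-neighbor matching without replacement, treated units processed in the order given, and the SD taken as the square root of the average of the two group variances. The function name `caliper_match` and the data are made up for illustration.

```python
import math
import statistics

def caliper_match(lps_treated, lps_control, caliper_sd=0.25):
    """Greedy 1:1 nearest-neighbor matching on the linear propensity score.

    A pair is kept only if the two scores differ by at most caliper_sd times
    the pooled SD. Returns (pairs, width), where pairs holds
    (treated_index, control_index) tuples.
    """
    pooled_sd = math.sqrt(
        (statistics.variance(lps_treated) + statistics.variance(lps_control)) / 2.0
    )
    width = caliper_sd * pooled_sd
    available = set(range(len(lps_control)))
    pairs = []
    for i, lt in enumerate(lps_treated):
        if not available:
            break
        # Closest remaining control; sorted() makes tie-breaking deterministic.
        j = min(sorted(available), key=lambda k: abs(lps_control[k] - lt))
        if abs(lps_control[j] - lt) <= width:
            pairs.append((i, j))
            available.remove(j)
    return pairs, width

# Made-up linear propensity scores.
lps_treated = [0.0, 1.0, 3.0]
lps_control = [0.1, 0.9, -1.0, 2.0]

pairs, width = caliper_match(lps_treated, lps_control)
```

With these values the caliper width is about 0.35, so the first two treated units find controls within range, while the third (score 3.0) has no control closer than 1.0 and is left unmatched.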
Connections
- Foundational layer for Propensity Score Matching - Overview.
- The unconfoundedness component is exactly the Conditional Independence Assumption; defined over the Potential Outcomes Framework.
- The positivity component motivates Common Support and Overlap.
- The balancing property is what is checked in Covariate Balance Diagnostics and exploited in Matching Methods and Distance Measures.
- Matching on observed confounders addresses Omitted Variables Bias and The Selection Problem only to the extent ignorability holds.
- The same score underlies weighting estimators: Bayesian Propensity Score Weighting, Bayesian Inverse Probability Weighting, Frequentist Causal Estimation.