Sensitivity Analysis in Observational Studies

Summary

Unconfoundedness is untestable in observational studies. Sensitivity analysis assesses how robust causal conclusions are to unmeasured confounding. There are two main classes of parametrization: (1) via distributions involving an unmeasured confounder $U$, and (2) via the distributions of potential outcomes directly. The E-value (VanderWeele & Ding 2017) provides a model-free, easy-to-compute measure of robustness; copula-based methods provide a flexible Bayesian approach.

Overview

Unconfoundedness (no unmeasured confounding, ^def-ignorability) holds by design in randomized experiments but is fundamentally untestable in observational studies. Sensitivity analysis asks: how would the conclusions change if there were an unmeasured confounder?

Two broad classes of sensitivity analysis methods differ in how they parametrize the confounding. Both can be used in a Bayesian framework.

Parametrization via Unmeasured Confounders ($U$)

Setup (Cornfield et al. 1959; Rosenbaum & Rubin 1983)

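Let $U$ be an unmeasured binary confounder such that, conditional on the observed covariates $X$ and $U$, unconfoundedness holds for treatment $Z$ and potential outcomes $Y(0), Y(1)$: $Z \perp \{Y(0), Y(1)\} \mid X, U$.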

The joint distribution of all variables factorizes as:

$$p\{Y(0), Y(1), Z, U \mid X\} = p\{Y(0), Y(1) \mid X, U\}\; p(Z \mid X, U)\; p(U \mid X).$$

Sensitivity parameters: the association between $U$ and treatment ($U$-$Z$), and between $U$ and outcome ($U$-$Y$). The observed-data distribution is identified; the sensitivity parameters are not.

Cornfield Inequality

Theorem: Cornfield Inequality

If a hidden confounder $U$ could explain away the observed association between treatment $Z$ and outcome $Y$, the association between $U$ and $Z$, and between $U$ and $Y$, must both be at least as large as the observed association between $Z$ and $Y$.

Originally motivated by the debate over whether smoking causes lung cancer — a hidden genetic factor would need an implausibly large association with both smoking and cancer to explain away the observed effect.
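
The "both at least as large" requirement can be checked numerically on the risk ratio scale. The sketch below writes RR_ZU for the treatment-confounder risk ratio and RR_UY for the confounder-outcome risk ratio, and uses the bounding factor of Ding & VanderWeele (2016), B = RR_ZU · RR_UY / (RR_ZU + RR_UY − 1), as the largest risk ratio such a confounder could produce on its own. The function names and the example numbers are illustrative assumptions, not taken from the note.

```python
def bounding_factor(rr_zu: float, rr_uy: float) -> float:
    """Ding & VanderWeele (2016) bounding factor for risk ratios >= 1."""
    if rr_zu < 1 or rr_uy < 1:
        raise ValueError("risk ratios are expected to be >= 1")
    return rr_zu * rr_uy / (rr_zu + rr_uy - 1)


def can_explain_away(rr_obs: float, rr_zu: float, rr_uy: float) -> bool:
    """True if a confounder with these associations could fully account for rr_obs."""
    return bounding_factor(rr_zu, rr_uy) >= rr_obs


rr_obs = 2.0  # hypothetical observed treatment-outcome risk ratio

# One association weaker than rr_obs: the confounder cannot explain the
# effect away, however strong the other association is.
print(can_explain_away(rr_obs, rr_zu=1.5, rr_uy=50.0))

# Both associations well above rr_obs: it can.
print(can_explain_away(rr_obs, rr_zu=4.0, rr_uy=4.0))

# Both exactly equal to rr_obs: necessary, but not sufficient.
print(can_explain_away(rr_obs, rr_zu=2.0, rr_uy=2.0))
```

Because B never exceeds the smaller of the two risk ratios, a confounder that is weakly associated with either the treatment or the outcome cannot explain away a strong observed effect.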

Rosenbaum & Rubin (1983) Logistic Model

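For a binary treatment $Z$ and binary outcome $Y$ with a binary $U$ (and the observed covariate $X$ as a stratification variable):

  • Logistic model for $Z \mid X, U$, logistic model for $Y(z) \mid X, U$, Bernoulli for $U \mid X$
  • Sensitivity parameters: the log-odds ratios for the $U$-$Z$ and $U$-$Y$ associations
  • Treat the sensitivity parameters as fixed (frequentist) or assign them priors (Bayesian); compute the causal estimate over a plausible range of values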
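
To see what the two log-odds-ratio sensitivity parameters control, the following simulation sketch draws data from a logistic model with a binary hidden confounder and compares the naive risk difference with the one standardized over the confounder. It is not the procedure of Rosenbaum & Rubin (1983) or Dorie et al. (2016): covariates are omitted, and the names `alpha`, `delta`, `beta`, the intercepts, and all numeric values are hypothetical choices for illustration.

```python
import math
import random


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, alpha, delta, beta=0.5, pi_u=0.5, seed=0):
    """Draw (u, z, y) triples from a logistic model with a binary confounder.

    alpha: log-odds ratio linking U to treatment Z (sensitivity parameter)
    delta: log-odds ratio linking U to outcome Y (sensitivity parameter)
    beta:  conditional log-odds effect of Z on Y (hypothetical value)
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = int(rng.random() < pi_u)
        z = int(rng.random() < expit(-0.5 + alpha * u))
        y = int(rng.random() < expit(-1.0 + beta * z + delta * u))
        rows.append((u, z, y))
    return rows


def mean_y(rows, z, u=None):
    """Mean outcome among units with treatment z (and, optionally, confounder u)."""
    ys = [yi for (ui, zi, yi) in rows if zi == z and (u is None or ui == u)]
    return sum(ys) / len(ys)


def naive_rd(rows):
    """Risk difference ignoring U (what the analyst can actually compute)."""
    return mean_y(rows, 1) - mean_y(rows, 0)


def adjusted_rd(rows):
    """Risk difference standardized over U (possible only if U were observed)."""
    n = len(rows)
    rd = 0.0
    for k in (0, 1):
        weight = sum(1 for (ui, _, _) in rows if ui == k) / n
        rd += weight * (mean_y(rows, 1, k) - mean_y(rows, 0, k))
    return rd


# With alpha = 0 the hidden U does not confound; with alpha = 1.5 it does.
for alpha in (0.0, 1.5):
    rows = simulate(50_000, alpha=alpha, delta=1.5)
    print(alpha, round(naive_rd(rows), 3), round(adjusted_rd(rows), 3))
```

A sensitivity analysis runs this logic in reverse: since $U$ is never observed, one posits values (or priors) for the two log-odds ratios and reports how the adjusted estimate moves across them.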

Bayesian analogue (Dorie et al. 2016): Straightforward. Place priors on the sensitivity parameters, use data augmentation to impute $U$, and obtain the posterior distribution of the causal estimand.

E-Value (VanderWeele & Ding 2017)

Definition: E-Value

The E-value is the minimum strength of association (on the risk ratio scale) that an unmeasured confounder would need to have with both the treatment and outcome — above and beyond measured covariates — to fully explain away the observed treatment-outcome association.

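Mathematically, define the sensitivity parameters as the treatment-confounder ($\mathrm{RR}_{ZU}$) association and the outcome-confounder ($\mathrm{RR}_{UY}$) association, both on the risk ratio scale. Ding & VanderWeele (2016) bound the bias that these two associations can jointly produce; the smallest common value of $\mathrm{RR}_{ZU}$ and $\mathrm{RR}_{UY}$ at which that bound reaches the observed risk ratio is the E-value. For an observed risk ratio $\mathrm{RR} \ge 1$, this gives $\text{E-value} = \mathrm{RR} + \sqrt{\mathrm{RR}(\mathrm{RR} - 1)}$.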
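
As a small worked example, the E-value for an observed risk ratio RR ≥ 1 is RR + sqrt(RR · (RR − 1)), with risk ratios below 1 inverted first (VanderWeele & Ding 2017). The sketch below computes it for a point estimate and for the confidence limit closest to the null; the function names and the input numbers are illustrative assumptions.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a risk ratio (VanderWeele & Ding 2017)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective effects: work with the inverse
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_ci(rr: float, lo: float, hi: float) -> float:
    """E-value for the confidence limit closest to the null.

    Returns 1.0 when the interval already contains 1: no unmeasured
    confounding is needed to move the interval to the null.
    """
    if lo <= 1.0 <= hi:
        return 1.0
    limit = lo if rr > 1 else hi
    return e_value(limit)


print(round(e_value(3.9), 2))               # 7.26
print(round(e_value_ci(3.9, 1.8, 8.7), 2))  # 3.0
```

Read the first number as: an unmeasured confounder would need risk ratios of about 7.26 with both treatment and outcome, beyond the measured covariates, to fully explain away an observed risk ratio of 3.9.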

Advantages:

  • Model-free — avoids specifying a model for the unmeasured confounder
  • Simple to calculate from summary statistics
  • Intuitive: a larger E-value means more robust conclusions
  • Avoids the “repeating the analysis” requirement of other sensitivity methods

Limitation: The analysis must make additional (arguably stronger) assumptions than the original analysis to assess unmeasured confounding.

Parametrization via Distributions of Potential Outcomes

Motivation

An alternative parametrization is motivated by an equivalent mathematical expression of unconfoundedness:

$$\Pr\{Y(z) \mid Z = 1, X\} = \Pr\{Y(z) \mid Z = 0, X\}, \quad z = 0, 1.$$

This says the distributions of potential outcomes in the two treatment arms are comparable (for the same $X$). Sensitivity analysis in this class models the difference between $\Pr\{Y(z) \mid Z = 1, X\}$ and $\Pr\{Y(z) \mid Z = 0, X\}$, rather than modeling an unobserved $U$.
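
One simple way to make this class concrete is an additive mean shift between the arms. The sketch below is a deliberately crude stand-in and not the copula model discussed next: it ignores covariates and assumes the unobserved means differ from the observed ones by fixed amounts `d0` and `d1`. Those shift parameters, the function name, and the data are all hypothetical.

```python
def ate_under_shift(y_treated, y_control, d0=0.0, d1=0.0):
    """Average treatment effect under an assumed mean shift between arms.

    d0: assumed E[Y(0) | Z=1] - E[Y(0) | Z=0]
    d1: assumed E[Y(1) | Z=0] - E[Y(1) | Z=1]
    d0 = d1 = 0 corresponds to unconfoundedness (no covariates here).
    """
    n1, n0 = len(y_treated), len(y_control)
    p = n1 / (n1 + n0)          # share of treated units
    m1 = sum(y_treated) / n1    # identified: E[Y(1) | Z=1]
    m0 = sum(y_control) / n0    # identified: E[Y(0) | Z=0]
    ey1 = p * m1 + (1 - p) * (m1 + d1)  # E[Y(1)] mixes observed and shifted mean
    ey0 = p * (m0 + d0) + (1 - p) * m0  # E[Y(0)] likewise
    return ey1 - ey0


y_treated = [3.0, 4.0, 5.0]            # hypothetical outcomes, treated arm
y_control = [1.0, 2.0, 3.0, 2.0, 2.0]  # hypothetical outcomes, control arm

# Report the estimate over a grid of shifts instead of a single number.
for d0 in (0.0, 0.5, 1.0):
    print(d0, ate_under_shift(y_treated, y_control, d0=d0))
```

The observed means `m1` and `m0` are identified by the data; the shifts are not, so they play the role of sensitivity parameters.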

Copula-Based Sensitivity (Franks, D’Amour, Feller 2020)

Definition: Copula-Based Sensitivity Analysis

Franks et al. (2020) used a copula to connect the two identifiable marginal distributions of outcomes:

  • $\Pr\{Y(1) \mid Z = 1, X\}$: identifiable from treated units
  • $\Pr\{Y(0) \mid Z = 0, X\}$: identifiable from control units

The copula parameters are the sensitivity parameters — they parametrize the non-identifiable joint distribution. Bayesian inference places priors on the copula parameters.

  • Separates identifiable from non-identifiable parameters clearly — transparent parametrization
  • Bayesian framework naturally handles the non-identified copula parameters as sensitivity priors
  • Connects to Copula Estimation vault note
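
The separation between identifiable marginals and a non-identifiable dependence parameter can be illustrated with a generic Gaussian copula. This is a sketch of the general copula idea only, not the specific construction or copula family of Franks et al. (2020): the two samples, the parameter `rho`, and all function names are hypothetical, and covariates are omitted.

```python
import math
import random
from statistics import NormalDist


def empirical_quantile(sorted_sample, u):
    """Quantile function of an observed sample (an identifiable marginal)."""
    idx = min(int(u * len(sorted_sample)), len(sorted_sample) - 1)
    return sorted_sample[idx]


def gaussian_copula_draws(sample_a, sample_b, rho, n=20_000, seed=0):
    """Joint draws with marginals given by the two samples, coupled via rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    draws = []
    for _ in range(n):
        g1 = rng.gauss(0.0, 1.0)
        g2 = rho * g1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        u1, u2 = std_normal.cdf(g1), std_normal.cdf(g2)  # correlated uniforms
        draws.append((empirical_quantile(a_sorted, u1),
                      empirical_quantile(b_sorted, u2)))
    return draws


def pearson(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs)
    va = sum((a - ma) ** 2 for a, _ in pairs)
    vb = sum((b - mb) ** 2 for _, b in pairs)
    return cov / math.sqrt(va * vb)


data_rng = random.Random(1)
treated = [data_rng.gauss(1.0, 1.0) for _ in range(500)]   # hypothetical arm
control = [data_rng.expovariate(1.0) for _ in range(500)]  # hypothetical arm

for rho in (0.0, 0.8):
    draws = gaussian_copula_draws(treated, control, rho)
    mean_b = sum(b for _, b in draws) / len(draws)
    print(rho, round(mean_b, 2), round(pearson(draws), 2))
```

Changing `rho` leaves each marginal (approximately) unchanged while altering the joint distribution, which is why the data cannot inform it and a prior or a range of values is required.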

Rosenbaum’s Sensitivity Parameter

Rosenbaum’s original framework uses $\Gamma$ as the sensitivity parameter: a bound on the ratio of the odds of treatment for two units with the same observed covariates but potentially different unmeasured factors.

  • For a sharp null hypothesis of no treatment effect, repeat the Fisher randomization test with a matched sample over increasing values of $\Gamma$ to find the threshold at which the p-value crosses significance
  • A larger threshold $\Gamma$ implies more robust conclusions
  • Grounded in Fisherian randomization inference; no natural Bayesian analogue
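
The threshold search in the first bullet can be sketched for the simplest matched design: matched pairs with a binary outcome, analysed with a one-sided sign (McNemar-type) test on the discordant pairs. Under the sharp null with hidden bias at most Γ, the chance that the treated unit is the one with the event in a discordant pair is at most Γ / (1 + Γ), which yields a worst-case binomial p-value. The function names, the significance level, and the counts below are illustrative assumptions.

```python
import math


def worst_case_p_value(n_discordant: int, n_treated_events: int, gamma: float) -> float:
    """Upper bound on the one-sided sign-test p-value for hidden bias <= gamma."""
    p_plus = gamma / (1.0 + gamma)  # equals 1/2 when gamma = 1 (no hidden bias)
    return sum(
        math.comb(n_discordant, k) * p_plus**k * (1.0 - p_plus) ** (n_discordant - k)
        for k in range(n_treated_events, n_discordant + 1)
    )


def gamma_threshold(n_discordant, n_treated_events, alpha=0.05, upper=50.0, tol=1e-4):
    """Smallest gamma (up to tol) at which the worst-case p-value exceeds alpha."""
    if worst_case_p_value(n_discordant, n_treated_events, 1.0) > alpha:
        return 1.0  # not significant even without hidden bias
    if worst_case_p_value(n_discordant, n_treated_events, upper) <= alpha:
        return math.inf  # still significant at the largest gamma considered
    lo, hi = 1.0, upper
    while hi - lo > tol:  # bisection: the bound is increasing in gamma
        mid = (lo + hi) / 2.0
        if worst_case_p_value(n_discordant, n_treated_events, mid) > alpha:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical study: 100 discordant pairs, treated unit had the event in 70.
for gamma in (1.0, 1.5, 2.0):
    print(gamma, worst_case_p_value(100, 70, gamma))
print(gamma_threshold(100, 70))
```

At Γ = 1 the bound reduces to the ordinary randomization p-value; the reported threshold is the amount of hidden bias in treatment odds that the finding can tolerate before losing significance.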

Comparison of Methods

| Method | Parametrization | Bayesian-friendly? | Key feature |
| --- | --- | --- | --- |
| Cornfield / Rosenbaum & Rubin | Hidden binary $U$ | Yes (prior on parameters) | Directly models confounder |
| E-value | Treatment/outcome associations | Yes (Bayesian E-value) | Model-free; easy to report |
| Copula (Franks et al. 2020) | Potential outcome distributions | Yes (prior on copula params) | Transparent parametrization |
| Rosenbaum’s $\Gamma$ | Odds ratio for treatment | Difficult | Fisher randomization basis |

Identifiability and Transparent Parametrization

A key insight from §6: sensitivity analysis is a form of transparent parametrization — explicitly separating identified parameters (which the data inform) from non-identified parameters (sensitivity parameters, which require prior information or a range of values).

This connects to the broader theme of identifiability in Bayesian causal inference (see ^warn-prior-dogmatism): even non-identified parameters have posteriors in the Bayesian framework, making it especially important to be explicit about what the data can and cannot tell us.

Connections

See Also