Bayesian Difference in Differences

Summary

Difference in differences (DiD) is re-framed as a Bayesian linear model, yielding a full posterior distribution over the treatment effect and enabling principled counterfactual prediction. The core assumptions (parallel trends, no spillovers) are unchanged; the Bayesian approach adds uncertainty quantification and natural counterfactual inference.

When to Use DiD

See the frequentist treatment at Differences-in-Differences. DiD is appropriate when:

  • You have pre/post measurements
  • You have treatment and control groups (quasi-experimental — not randomised)
  • The parallel trends assumption holds: absent treatment, both groups would have evolved similarly

The Model

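Expected outcome for observation $i$:

$$\mu_i = \beta_c + (\beta_\Delta \cdot \mathrm{group}_i) + (\mathrm{trend} \cdot t_i) + (\Delta \cdot \mathrm{treated}_i \cdot \mathrm{group}_i)$$

Parameter          Meaning
$\beta_c$          Control group intercept
$\beta_\Delta$     Treatment group intercept offset from control
$\mathrm{trend}$   Common time trend (parallel trends assumption embodied here)
$\Delta$           Causal impact of treatment (the quantity of interest)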

Parallel trends assumption

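The model forces a single shared slope (the trend term) for both groups. This is the parallel trends assumption in parametric form. If the groups' trends actually differ, the model is misspecified and the estimate of $\Delta$ is biased, because the difference in slopes is absorbed into the treatment effect.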

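Note: treated is a binary variable that is 1 only for treatment-group observations after the intervention; it equals (t > t_intervention) * group. Because treated already includes the group indicator, multiplying it by group again in the model leaves the term unchanged.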
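To make the roles of the parameters concrete, the following minimal sketch evaluates the expected outcome at the four group-by-period cells and checks that the difference in differences of those means equals Δ. The function names, the parameter values, and the extra `trend_gap` argument (a group-specific slope offset, used only to illustrate a parallel-trends violation) are assumptions for illustration and are not part of the source model.

```python
def is_treated(t, t_intervention, group):
    """1 only for treatment-group observations after the intervention."""
    return int(t > t_intervention) * group


def expected_outcome(t, group, t_intervention, control_intercept,
                     treat_intercept_delta, trend, delta, trend_gap=0.0):
    """Expected outcome mu under the DiD model.

    trend_gap is a hypothetical extra slope for the treatment group; the
    DiD model assumes it is zero (parallel trends).
    """
    treated = is_treated(t, t_intervention, group)
    return (control_intercept
            + treat_intercept_delta * group
            + (trend + trend_gap * group) * t
            + delta * treated * group)


def did_of_means(params, t_pre=0.0, t_post=1.0, t_intervention=0.5, trend_gap=0.0):
    """(treated post - treated pre) - (control post - control pre)."""
    def mu(t, group):
        return expected_outcome(t, group, t_intervention,
                                trend_gap=trend_gap, **params)
    return (mu(t_post, 1) - mu(t_pre, 1)) - (mu(t_post, 0) - mu(t_pre, 0))


# Illustrative parameter values, not estimates from any dataset.
params = dict(control_intercept=1.0, treat_intercept_delta=0.25,
              trend=1.0, delta=0.5)

print(did_of_means(params))                 # parallel trends hold: recovers delta
print(did_of_means(params, trend_gap=0.3))  # trends differ: delta plus the slope gap
```

With a nonzero `trend_gap`, the second value exceeds `delta` by the slope gap times the elapsed time, which is the bias that appears when parallel trends fail.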

PyMC Implementation

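import pymc as pm

with pm.Model() as model:
    # Data containers, so the inputs can be swapped later with pm.set_data.
    # (Newer PyMC releases deprecate pm.MutableData in favour of pm.Data.)
    t       = pm.MutableData("t",       df["t"].values)
    treated = pm.MutableData("treated", df["treated"].values)
    group   = pm.MutableData("group",   df["group"].values)

    # Priors
    control_intercept     = pm.Normal("control_intercept", 0, 5)
    treat_intercept_delta = pm.Normal("treat_intercept_delta", 0, 1)
    trend                 = pm.Normal("trend", 0, 5)
    Δ                     = pm.Normal("Δ", 0, 1)
    sigma                 = pm.HalfNormal("sigma", 1)

    # Expected outcome, registered by name so it can be requested in predictions
    mu = pm.Deterministic(
        "mu",
        control_intercept + (treat_intercept_delta * group)
        + (trend * t) + (Δ * treated * group),
    )
    # Likelihood
    pm.Normal("obs", mu, sigma, observed=df["y"].values)

    idata = pm.sample()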

Counterfactual Inference

A key advantage of the Bayesian approach: explicit counterfactual predictions.

# Counterfactual: treatment group NOT treated after the intervention.
# t_post holds the post-intervention time points.
with model:
    pm.set_data({
        "t":       t_post,
        "group":   [1]*len(t_post),
        "treated": [0]*len(t_post),   # <-- the counterfactual intervention
    })
    # Request only "mu"; this requires mu to be a named variable in the model.
    ppc_counterfactual = pm.sample_posterior_predictive(idata, var_names=["mu"])

The gap between the counterfactual prediction and the observed post-treatment outcomes is the treatment effect $\Delta$, now expressed as a full posterior distribution.
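The logic can be sketched without PyMC, assuming posterior draws of the parameters are already available. The draws below are made-up placeholders standing in for `idata.posterior`, and `mu` is a hypothetical helper that mirrors the model's expected outcome. For the treatment group, the factual and counterfactual expectations differ only in `treated`, so their gap in each draw is that draw's Δ.

```python
# Placeholder posterior draws (hypothetical values, not real sampler output).
draws = [
    {"control_intercept": 1.02, "treat_intercept_delta": 0.24, "trend": 0.98, "delta": 0.47},
    {"control_intercept": 0.97, "treat_intercept_delta": 0.27, "trend": 1.03, "delta": 0.55},
    {"control_intercept": 1.00, "treat_intercept_delta": 0.25, "trend": 1.00, "delta": 0.51},
]


def mu(draw, t, group, treated):
    """Expected outcome for one posterior draw."""
    return (draw["control_intercept"]
            + draw["treat_intercept_delta"] * group
            + draw["trend"] * t
            + draw["delta"] * treated * group)


t_post = 1.0  # a post-intervention time point (assumed)

# Treatment group: factual (treated=1) minus counterfactual (treated=0), per draw.
gaps = [mu(d, t_post, 1, 1) - mu(d, t_post, 1, 0) for d in draws]

# Each gap matches that draw's delta, so the gaps are draws from delta's posterior.
for gap, d in zip(gaps, draws):
    print(round(gap, 6), d["delta"])
```

This sketch compares model expectations only; comparing against observed outcomes would add observation noise on top of the same gap.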

Posterior of the Treatment Effect

import arviz as az
az.plot_posterior(idata.posterior["Δ"], ref_val=true_delta)  # true_delta: known effect, available only with simulated data

Unlike the frequentist approach, which typically reports a point estimate and a confidence interval, the Bayesian posterior can be used directly for:

  • Probability that $\Delta > 0$ (or that it exceeds any other threshold of interest)
  • Decision-theoretic loss functions
  • Full propagation into downstream calculations
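As a rough sketch of these three uses, the snippet below works from a list of posterior draws of Δ. The draws are simulated stand-ins (in practice they would come from `idata.posterior["Δ"]`), and the loss function, payoff, and cost figures are hypothetical choices made only for illustration.

```python
import random
import statistics

random.seed(0)
# Stand-in for posterior draws of delta; the Normal(0.5, 0.1) shape is
# purely illustrative and not a result from any dataset.
delta_draws = [random.gauss(0.5, 0.1) for _ in range(4000)]

# 1. Posterior probability that the effect is positive.
p_positive = sum(d > 0 for d in delta_draws) / len(delta_draws)


# 2. Expected loss of reporting a point estimate, under a hypothetical
#    asymmetric loss that penalises overestimation twice as heavily.
def loss(estimate, truth):
    error = estimate - truth
    return 2.0 * error if error > 0 else -error


def expected_loss(estimate):
    return statistics.fmean(loss(estimate, d) for d in delta_draws)


# 3. Propagation into a downstream quantity, draw by draw: a hypothetical
#    payoff of 100 per unit of effect minus a fixed cost of 40.
net_benefit = [100.0 * d - 40.0 for d in delta_draws]
p_worthwhile = sum(b > 0 for b in net_benefit) / len(net_benefit)

posterior_mean = statistics.fmean(delta_draws)
print(f"P(delta > 0)                = {p_positive:.3f}")
print(f"expected loss at the mean   = {expected_loss(posterior_mean):.4f}")
print(f"P(net benefit > 0)          = {p_worthwhile:.3f}")
```

Because every downstream quantity is computed per draw, its uncertainty follows from the posterior without any extra approximation.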

Classic Application: Card & Krueger (1992)

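New Jersey raised its minimum wage in April 1992; neighbouring Pennsylvania did not. The DiD estimate of the employment effect is:

$$\hat{\Delta} = (\bar{y}_{\mathrm{NJ},\mathrm{post}} - \bar{y}_{\mathrm{NJ},\mathrm{pre}}) - (\bar{y}_{\mathrm{PA},\mathrm{post}} - \bar{y}_{\mathrm{PA},\mathrm{pre}})$$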

The Bayesian version gives a posterior over the employment effect, propagating uncertainty from both pre/post estimates.
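The two-by-two arithmetic behind a classical DiD estimate can be sketched as follows. The function name and the per-unit figures are hypothetical and are not the Card & Krueger data; they only show how the four group-by-period means combine.

```python
from statistics import fmean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group's mean minus change in the control group's mean."""
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return treated_change - control_change


# Hypothetical per-unit outcomes (NOT real study data).
treated_pre, treated_post = [20.0, 21.0, 19.0], [20.5, 21.5, 19.5]
control_pre, control_post = [23.0, 24.0, 22.0], [21.0, 22.0, 20.0]

estimate = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(estimate)
```

This returns a single number; the Bayesian model replaces it with a distribution over the same contrast.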

Source

  • Difference in differences — PyMC example by Benjamin T. Vincent (2022); Card & Krueger minimum wage example
  • Recommended textbooks: The Effect (Huntington-Klein) and Causal Inference: The Mixtape (Cunningham)