Bayesian Difference in Differences

Summary

Difference in differences (DiD) is reframed as a Bayesian linear model, yielding a full posterior distribution over the treatment effect $Δ$ and enabling principled counterfactual prediction. The core assumptions (parallel trends, no spillovers) are unchanged; the Bayesian approach adds full posterior uncertainty about $Δ$ and natural counterfactual inference.

When to Use DiD

See the frequentist treatment at Differences-in-Differences. DiD is appropriate when:

You have pre/post measurements
You have treatment and control groups (quasi-experimental — not randomised)
The parallel trends assumption holds: absent treatment, both groups would have evolved similarly

The Model

Expected outcome for observation $i$:

$$μ_{i} = β_{c} + (β_{Δ} \cdot \text{group}_{i}) + (\text{trend} \cdot t_{i}) + (Δ \cdot \text{treated}_{i} \cdot \text{group}_{i})$$

Parameter	Meaning
$β_{c}$	Control group intercept
$β_{Δ}$	Treatment group intercept offset from control
$trend$	Common time trend (parallel trends assumption embodied here)
$Δ$	Causal impact of treatment (the quantity of interest)

Parallel trends assumption

The model forces a single shared slope for both groups. This is the parallel trends assumption in parametric form. If trends differ by group, the model is misspecified and $Δ$ is biased.
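To see the bias concretely, here is a noise-free least-squares sketch in plain NumPy (ordinary least squares, not the Bayesian model). It fits the same shared-slope design, with treated = 1 for the treatment group after t = 0.5, to two hypothetical datasets: one where trends are parallel and one where the treatment group has an extra slope of 0.3. The helper name `fit_shared_slope_did` and all parameter values are illustrative assumptions.

```python
import numpy as np

def fit_shared_slope_did(t, group, y, t_intervention=0.5):
    """Least-squares fit of y ~ 1 + group + t + treated with ONE shared slope.

    Returns the estimated coefficient on `treated` (the DiD effect).
    """
    treated = (t > t_intervention) * group
    X = np.column_stack([np.ones_like(t), group, t, treated])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[3]

# Hypothetical noise-free panel: 50 time points per group on [0, 1].
t_grid = np.linspace(0.0, 1.0, 50)
t = np.concatenate([t_grid, t_grid])
group = np.concatenate([np.zeros(50), np.ones(50)])
treated = (t > 0.5) * group
true_delta = 0.5

# Case 1: parallel trends hold (both groups share slope 1.0).
y_parallel = 1.0 + 0.25 * group + 1.0 * t + true_delta * treated
# Case 2: the treatment group has an extra slope of 0.3 (assumption violated).
y_diverging = y_parallel + 0.3 * group * t

delta_parallel = fit_shared_slope_did(t, group, y_parallel)
delta_diverging = fit_shared_slope_did(t, group, y_diverging)
print(f"parallel trends:  Δ estimate = {delta_parallel:.3f}")   # recovers 0.5
print(f"diverging trends: Δ estimate = {delta_diverging:.3f}")  # biased away from 0.5
```

In the diverging case the shared slope cannot absorb the extra group-specific trend, so part of it leaks into the coefficient on `treated`.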

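Note: treated is a binary variable that is 1 only for the treatment group after the intervention: it equals (t > t_intervention) * group. Because treated already contains the group factor, the extra * group in the $Δ$ term is redundant (it multiplies by 1 whenever treated is 1) but harmless.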
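A minimal NumPy sketch of the expected-outcome equation and the treated indicator, evaluated on four hypothetical observations (one per group and period). The function names `is_treated` and `expected_outcome`, the intervention time 0.5, and the parameter values are assumptions for illustration.

```python
import numpy as np

def is_treated(t, t_intervention, group):
    """1 for treatment-group observations after the intervention, else 0."""
    return (np.asarray(t) > t_intervention) * np.asarray(group)

def expected_outcome(t, group, treated, control_intercept,
                     treat_intercept_delta, trend, delta):
    """μ_i = β_c + β_Δ·group_i + trend·t_i + Δ·treated_i·group_i."""
    return (control_intercept
            + treat_intercept_delta * group
            + trend * t
            + delta * treated * group)

t = np.array([0.2, 0.8, 0.2, 0.8])
group = np.array([0, 0, 1, 1])
treated = is_treated(t, 0.5, group)
print(treated)  # [0 0 0 1]

# treated already includes the group factor, so multiplying by group again changes nothing.
print(np.array_equal(treated * group, treated))  # True

mu = expected_outcome(t, group, treated, 1.0, 0.25, 1.0, 0.5)
print(mu)
```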

PyMC Implementation

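import pymc as pm
import arviz as az

# df: DataFrame with columns t, group (0/1), treated (0/1) and outcome y
with pm.Model() as model:
    # Data containers (pm.Data in recent PyMC releases), swappable later via pm.set_data
    t       = pm.MutableData("t",       df["t"].values)
    treated = pm.MutableData("treated", df["treated"].values)
    group   = pm.MutableData("group",   df["group"].values)

    # Priors
    control_intercept     = pm.Normal("control_intercept", 0, 5)
    treat_intercept_delta = pm.Normal("treat_intercept_delta", 0, 1)
    trend                 = pm.Normal("trend", 0, 5)
    Δ                     = pm.Normal("Δ", 0, 1)
    sigma                 = pm.HalfNormal("sigma", 1)

    # Expected outcome, registered as a Deterministic so that it can be
    # requested later with var_names=["mu"]
    mu = pm.Deterministic(
        "mu",
        control_intercept + (treat_intercept_delta * group)
        + (trend * t) + (Δ * treated * group),
    )
    pm.Normal("obs", mu, sigma, observed=df["y"].values)

    idata = pm.sample()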
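PyMC samples this posterior with MCMC. As a quick cross-check without a sampler, note that if `sigma` is treated as known, the same linear model with independent zero-mean Normal priors has a closed-form Gaussian posterior. This is a swapped-in technique (conjugate Bayesian linear regression), not what the PyMC example does; the simulated data, the known `sigma`, and the helper name `conjugate_did_posterior` are hypothetical.

```python
import numpy as np

def conjugate_did_posterior(t, group, treated, y, sigma, prior_sds):
    """Posterior mean and sd of (β_c, β_Δ, trend, Δ) for a Gaussian linear
    model with KNOWN noise sd `sigma` and independent Normal(0, sd) priors."""
    X = np.column_stack([np.ones_like(t), group, t, treated * group])
    prior_precision = np.diag(1.0 / np.asarray(prior_sds, dtype=float) ** 2)
    post_precision = X.T @ X / sigma**2 + prior_precision
    post_cov = np.linalg.inv(post_precision)
    post_mean = post_cov @ (X.T @ y / sigma**2)  # prior means are all zero
    return post_mean, np.sqrt(np.diag(post_cov))

rng = np.random.default_rng(42)
n = 200  # hypothetical number of observations per group
t = np.concatenate([rng.uniform(0, 1, n), rng.uniform(0, 1, n)])
group = np.repeat([0.0, 1.0], n)
treated = (t > 0.5) * group
true_params = dict(b_c=1.0, b_delta=0.25, trend=1.0, delta=0.5)
sigma = 0.1
y = (true_params["b_c"] + true_params["b_delta"] * group
     + true_params["trend"] * t + true_params["delta"] * treated * group
     + rng.normal(0, sigma, 2 * n))

# Prior sds mirror the PyMC model: Normal(0,5), Normal(0,1), Normal(0,5), Normal(0,1).
mean, sd = conjugate_did_posterior(t, group, treated, y, sigma, [5, 1, 5, 1])
print(f"Δ posterior: mean {mean[3]:.3f}, sd {sd[3]:.3f}")
```

With a learned `sigma` (as in the PyMC model) the posterior is no longer exactly Gaussian, which is one reason to use a sampler in practice.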

Counterfactual Inference

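A key advantage of the Bayesian approach is explicit counterfactual prediction: swap in data where the treatment group is never treated and push every posterior draw through the model.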

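# Counterfactual: treatment group NOT treated after intervention.
# t_post: array of post-intervention time points (assumed to be defined).
# Requires mu to be registered with pm.Deterministic in the model.
with model:
    pm.set_data({
        "t":       t_post,
        "group":   [1]*len(t_post),
        "treated": [0]*len(t_post),   # <-- the counterfactual intervention
    })
    ppc_counterfactual = pm.sample_posterior_predictive(idata, var_names=["mu"])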

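For each posterior draw, the gap between the counterfactual prediction of mu and the model's prediction with treated = 1 is exactly $Δ$, so the treatment effect comes out as a full posterior distribution rather than a single number. Comparing the counterfactual against the raw observed post-treatment outcomes gives the same gap plus observation noise.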
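The claim that the factual-minus-counterfactual gap equals $Δ$ draw by draw can be checked with plain NumPy. The draws below are hypothetical stand-ins for `idata.posterior`, and `t_post` is an assumed grid of post-intervention times; only the structure of the computation matters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 1000
# Hypothetical stand-in posterior draws; in practice take them from idata.posterior.
draws = {
    "control_intercept": rng.normal(1.0, 0.02, n_draws),
    "treat_intercept_delta": rng.normal(0.25, 0.03, n_draws),
    "trend": rng.normal(1.0, 0.04, n_draws),
    "delta": rng.normal(0.5, 0.03, n_draws),
}
t_post = np.linspace(0.6, 1.0, 20)  # hypothetical post-intervention times

def mu(draws, t, group, treated):
    # Broadcast: rows = posterior draws, columns = time points.
    return (draws["control_intercept"][:, None]
            + draws["treat_intercept_delta"][:, None] * group
            + draws["trend"][:, None] * t[None, :]
            + draws["delta"][:, None] * treated * group)

mu_factual = mu(draws, t_post, group=1, treated=1)
mu_counterfactual = mu(draws, t_post, group=1, treated=0)
gap = mu_factual - mu_counterfactual  # shape (n_draws, len(t_post))
print(np.allclose(gap, draws["delta"][:, None]))  # True
```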

Posterior of the Treatment Effect

import arviz as az

# true_delta: the known effect used to simulate the data
az.plot_posterior(idata, var_names=["Δ"], ref_val=true_delta)

Unlike the frequentist approach, which yields a point estimate and a confidence interval, the posterior of $Δ$ can be used directly for:

Probability that $Δ > 0$
Decision-theoretic loss functions
Full propagation into downstream calculations
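The three uses above can be sketched with posterior draws in NumPy. The draws are hypothetical stand-ins (in practice something like `idata.posterior["Δ"].values.ravel()`), and the linear utility, benefit per unit, rollout cost, and 1,000-unit scale-up are illustrative assumptions, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in posterior draws of Δ.
delta_draws = rng.normal(0.5, 0.1, 4000)

# 1. Probability that the effect is positive.
p_positive = np.mean(delta_draws > 0)

# 2. Decision rule under an assumed linear utility:
#    a benefit per unit of Δ against a fixed, hypothetical rollout cost.
benefit_per_unit, rollout_cost = 10.0, 3.0
expected_net_gain = np.mean(benefit_per_unit * delta_draws - rollout_cost)
adopt = expected_net_gain > 0

# 3. Propagate Δ into a downstream quantity (total effect over a hypothetical
#    1,000 units) and report an equal-tailed 94% credible interval.
total_effect = 1_000 * delta_draws
lo, hi = np.percentile(total_effect, [3, 97])
print(p_positive, expected_net_gain, adopt, (lo, hi))
```

Because every quantity is computed per draw, the downstream interval inherits the full uncertainty in $Δ$ rather than a plug-in point estimate.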

Classic Application: Card & Krueger (1994)

New Jersey raised its minimum wage in 1992; neighbouring Pennsylvania did not. The DiD estimate is:

$$\hat{Δ}_{DiD} = (\bar{y}_{NJ,post} - \bar{y}_{NJ,pre}) - (\bar{y}_{PA,post} - \bar{y}_{PA,pre})$$
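The 2×2 formula is simple arithmetic on four cell means. The sketch below uses hypothetical means (not Card & Krueger's data) and also checks a standard equivalence: in the saturated regression y ~ 1 + nj + post + nj·post fitted to the cell means, the interaction coefficient equals the same DiD number. The function name `did_from_means` is illustrative.

```python
import numpy as np

def did_from_means(y_nj_pre, y_nj_post, y_pa_pre, y_pa_post):
    """Classic 2x2 DiD: change in the treated state minus change in the control state."""
    return (y_nj_post - y_nj_pre) - (y_pa_post - y_pa_pre)

# Hypothetical cell means (NOT the real Card & Krueger data).
print(did_from_means(10.0, 12.0, 8.0, 9.0))  # 1.0

# Equivalence check: one row per cell, saturated design 1 + nj + post + nj*post.
nj   = np.array([1, 1, 0, 0])
post = np.array([0, 1, 0, 1])
y    = np.array([10.0, 12.0, 8.0, 9.0])
X = np.column_stack([np.ones(4), nj, post, nj * post])
coef = np.linalg.solve(X, y)  # 4 equations, 4 unknowns: exact fit
print(coef[3])                # interaction coefficient = DiD
```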

The Bayesian version gives a posterior over the employment effect, propagating uncertainty from all four group means (NJ and PA, before and after) into $Δ$.
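A minimal sketch of that propagation, assuming a flat prior and a normal approximation so that each cell mean has posterior Normal(sample mean, sample sd / √n). This is a simplification chosen for illustration (not the PyMC example's model), and the store-level outcomes are simulated, not real data; `cell_mean_posterior_draws` is a hypothetical helper.

```python
import numpy as np

def cell_mean_posterior_draws(samples, n_draws, rng):
    """Approximate posterior draws of a group mean (flat prior, normal approximation)."""
    samples = np.asarray(samples, dtype=float)
    se = samples.std(ddof=1) / np.sqrt(samples.size)
    return rng.normal(samples.mean(), se, n_draws)

rng = np.random.default_rng(7)
# Hypothetical outcomes for each state x period cell (simulated, not real data).
cells = {
    ("NJ", "pre"):  rng.normal(10.0, 2.0, 400),
    ("NJ", "post"): rng.normal(12.0, 2.0, 400),
    ("PA", "pre"):  rng.normal(8.0, 2.0, 400),
    ("PA", "post"): rng.normal(9.0, 2.0, 400),
}
draws = {k: cell_mean_posterior_draws(v, 4000, rng) for k, v in cells.items()}

# DiD computed draw by draw, so uncertainty from all four means propagates.
did_draws = (draws[("NJ", "post")] - draws[("NJ", "pre")]) - (
    draws[("PA", "post")] - draws[("PA", "pre")])
lo, hi = np.percentile(did_draws, [3, 97])
print(did_draws.mean(), (lo, hi))
```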

Connections

Frequentist DiD: Differences-in-Differences — fixed effects, common trends, classical CI
Counterfactual Inference — related counterfactual framework (pre/post, same group, no control)
Bayesian Non-parametric Causal Inference — non-parametric alternative when parallel trends is implausible
The Experimental Ideal and The Selection Problem — why we need quasi-experimental designs

Source

Difference in differences — PyMC example by Benjamin T. Vincent (2022); Card & Krueger minimum wage example
Recommended textbooks: The Effect (Huntington-Klein) and Causal Inference: The Mixtape (Cunningham)
