Directed Acyclic Graphs

Summary

A Directed Acyclic Graph (DAG) is a causal diagram representing proposed cause-and-effect relationships between variables. DAGs make confounding explicit and provide algorithmic tools (backdoor adjustment, d-separation) to identify valid adjustment sets: sets of variables to condition on in order to isolate the causal effect of a treatment on an outcome.

Overview

Causal inference requires more than data — it requires a model of the causal structure. A DAG supplements data with encoded assumptions about which variables cause which, enabling us to:

  1. Identify confounding sources (backdoor paths)
  2. Determine the optimal adjustment set (which variables to control for)
  3. Derive the do-calculus formula for the intervention distribution

Causal Inference

Causal inference is the process of reasoning and drawing conclusions about cause-and-effect relationships between variables while accounting for potential confounding factors and biases. It asks: “What is the effect of a treatment X on an outcome Y, independent of other variables that affect both?”

DAG Structure

Directed Acyclic Graph (DAG)

A DAG consists of:

  • Nodes: variables (treatments, outcomes, confounders, mediators)
  • Directed edges (arrows): causal relationships; X → Y means “X causes Y”
  • Acyclic: no variable can cause itself (no cycles)
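As a minimal sketch, a DAG can be stored as a mapping from each node to its children, and the acyclicity requirement checked with a topological sort (Kahn's algorithm). The function name `is_acyclic` and the node names Z, X, Y are illustrative assumptions, not part of any particular library.

```python
from collections import deque


def is_acyclic(children):
    """Return True if the directed graph has no cycle (Kahn's algorithm).

    `children` maps each node to a list of the nodes it points to.
    """
    nodes = set(children) | {c for cs in children.values() for c in cs}
    indegree = {n: 0 for n in nodes}
    for cs in children.values():
        for c in cs:
            indegree[c] += 1

    # Start from nodes with no incoming arrows and peel the graph layer by layer.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for c in children.get(node, ()):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)

    # If a cycle exists, its nodes never reach indegree 0 and are never visited.
    return visited == len(nodes)


# Hypothetical graphs: Z causes X and Y, and X causes Y.
dag = {"Z": ["X", "Y"], "X": ["Y"]}
# Not a DAG: X and Y cause each other.
cyclic = {"X": ["Y"], "Y": ["X"]}

print(is_acyclic(dag), is_acyclic(cyclic))
```

Storing children rather than parents is an arbitrary choice here; either direction supports the same check.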

Key roles:

  • Treatment (exposure): the intervention or variable whose causal effect we want to measure
  • Outcome: the variable we measure the effect on
  • Confounder: a variable that causes both treatment and outcome, creating spurious association

Confounders and the Identification Problem

Confounder

A confounder is a variable Z that causes both the treatment X and the outcome Y (i.e., X ← Z → Y). Its presence creates a spurious association between X and Y that does not reflect a direct causal effect.

Example: Gender (G) affects both whether someone takes a drug (D) and their natural recovery (R). Simply observing D and R gives a biased estimate of the drug’s effect.

Approaches to de-confounding:

| Method | Applicable to | Mechanism |
| --- | --- | --- |
| Randomized Controlled Trial (RCT) | Future studies | Breaks the arrow from the confounder Z into the treatment X by random assignment |
| Stratification | Observed data | Compute effects separately per stratum of the confounder Z; take the weighted average |
| Conditioning/Backdoor Adjustment | Observed data (DAG-based) | Mathematical formula using observational data |
| Inverse Probability Weighting | Observed data | See Bayesian Inverse Probability Weighting |
| Instrumental Variables | Observed data | See Instrumental Variables |
| Difference-in-Differences | Panel data | See Differences-in-Differences |

Paths and Junctions

Path

A path is a sequence of edges connecting treatment and outcome in a DAG, regardless of the direction of the arrows. Example: in the drug DAG above (G → D, G → R, D → R), one path from D to R is D ← G → R.

Three Junction Patterns

Fork

A fork at node Z:

X ← Z → Y

Z is a common cause of X and Y. Without conditioning on Z, there is a spurious correlation between X and Y (e.g., age → shoe size and age → reading ability, so shoe size correlates with reading ability).

Rule: Conditioning on Z blocks the path. Not conditioning leaves it open.
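The fork rule can be illustrated with a small simulation of the age example. This is a sketch under assumed numbers: the age range, noise levels, and sample size are invented for illustration, and the `pearson` helper is written out by hand so the block needs only the standard library.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible

# Fork: age causes both shoe size and reading ability (hypothetical units).
ages = [rng.randint(5, 10) for _ in range(20000)]
shoe = [a + rng.gauss(0, 1) for a in ages]
reading = [a + rng.gauss(0, 1) for a in ages]

# Not conditioning on age: the fork path is open, so the two correlate.
overall = pearson(shoe, reading)

# Conditioning on age (looking only at 7-year-olds): the path is blocked.
idx = [i for i, a in enumerate(ages) if a == 7]
within = pearson([shoe[i] for i in idx], [reading[i] for i in idx])

print(f"overall correlation:        {overall:.2f}")
print(f"correlation within age = 7: {within:.2f}")
```

Under these assumptions the overall correlation is strongly positive, while the correlation inside a single age group is close to zero.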

Chain

A chain at node M:

X → M → Y

M mediates the effect of X on Y. If we condition on the intermediate variable M, we “freeze” it, blocking the flow of information from X to Y.

Rule: Conditioning on M blocks the path. Not conditioning leaves it open. (Same rule as for forks: counterintuitive but true.)

Collider

A collider at node C:

X → C ← Y

C is jointly caused by X and Y. A collider is naturally blocked (no spurious association between X and Y). But conditioning on C opens the path and creates a spurious association.

Rule: Conditioning on C unblocks the path. Not conditioning leaves it blocked. (Opposite of forks and chains!)

Collider in Action: Sports College

  • Sporting ability (S) → Bursary (B) ← Academic ability (A)
  • B is a collider: without conditioning, S and A are uncorrelated (athletes are not necessarily more/less academic)
  • After conditioning on B (studying only bursary recipients): S and A become negatively correlated! Students with low sports ability must have high academic ability to receive the bursary, and vice versa.
  • This is Berkson’s bias (selection bias from conditioning on a collider).
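The bursary example can be reproduced with a short simulation. This is a sketch under assumed mechanics: abilities are drawn as independent standard normals and a bursary is awarded when their sum exceeds a made-up threshold of 1. None of these numbers come from real data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is reproducible

# S and A are generated independently: no arrow between them.
sport = [rng.gauss(0, 1) for _ in range(20000)]
academic = [rng.gauss(0, 1) for _ in range(20000)]

# Collider: the bursary B depends on both S and A (hypothetical rule).
bursary = [s + a > 1.0 for s, a in zip(sport, academic)]

# Without conditioning on B, the collider path is blocked.
overall = pearson(sport, academic)

# Conditioning on B = 1 (bursary recipients only) opens the path.
s_sel = [s for s, b in zip(sport, bursary) if b]
a_sel = [a for a, b in zip(academic, bursary) if b]
selected = pearson(s_sel, a_sel)

print(f"all students:       {overall:.2f}")
print(f"bursary recipients: {selected:.2f}")
```

Under these assumptions the correlation is near zero across all students and clearly negative among recipients, which is the selection effect described above.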

Three Rules for Paths

Path Blocking Rules

  1. Fork (X ← Z → Y): conditioning on Z blocks the path; leaving Z unconditioned leaves it open
  2. Chain (X → M → Y): conditioning on M blocks the path; leaving M unconditioned leaves it open
  3. Collider (X → C ← Y): conditioning on C opens the path; leaving C unconditioned leaves it blocked

Additionally: if a descendant of a collider is conditioned on, the same effect occurs (the path is opened).

Simpson’s Paradox

Simpson’s paradox arises when an aggregate association changes, vanishes, or even reverses within subgroups. Fork structures generate it: the confounder creates a spurious association that exists “overall” but changes or disappears once you condition on the fork node.

Backdoor Criterion and Adjustment Sets

Backdoor Path

A backdoor path from treatment X to outcome Y is any path that starts with an arrow pointing into X (i.e., a path that leaves X “backwards”, against the arrow). Such a path typically runs through a fork: a common cause whose arrows lead towards both X and Y.

More precisely (Pearl): A back-door path is any path from X to Y that starts with an arrow pointing into X. (The Book of Why, p158)

Front-door paths start with an arrow pointing out of X.
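The two definitions translate directly into code: enumerate every path between treatment and outcome while ignoring arrow direction, then look at the first edge. The sketch below is illustrative only. The graph (Z → X, Z → Y, X → M, M → Y) and the function names are assumptions chosen for the example.

```python
def undirected_paths(edges, start, end):
    """Yield every simple path from start to end, ignoring arrow direction.

    `edges` is a set of (parent, child) pairs.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def walk(path):
        node = path[-1]
        if node == end:
            yield list(path)
            return
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in path:  # simple paths only: never revisit a node
                path.append(nxt)
                yield from walk(path)
                path.pop()

    yield from walk([start])


def classify(edges, path):
    """Backdoor if the first edge points INTO the treatment, else front-door."""
    treatment, first_step = path[0], path[1]
    return "backdoor" if (first_step, treatment) in edges else "front-door"


# Hypothetical DAG: Z confounds X and Y; M mediates the effect of X on Y.
edges = {("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y")}

paths = {tuple(p): classify(edges, p) for p in undirected_paths(edges, "X", "Y")}
for path, kind in paths.items():
    print(" - ".join(path), "=>", kind)
```

Here X - M - Y is classified as front-door and X - Z - Y as backdoor. Exhaustive path enumeration is fine for small diagrams but grows quickly with graph size.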

Valid Adjustment Set

A valid adjustment set is any set of nodes Z such that, when conditioned on:

  1. All backdoor paths from X to Y are blocked
  2. At least one front-door path from X to Y remains open
  3. No new spurious paths are created

Pearl’s official rules:

  1. Block all spurious paths between X and Y
  2. Leave all directed paths from X to Y unperturbed (or open them if needed)
  3. Create no new spurious paths

There can be zero, one, or multiple valid adjustment sets. Here, the optimal adjustment set is the valid set that minimizes the number of variables conditioned on.

Finding Adjustment Sets in a Complex DAG

Given paths:

  1. (backdoor, via fork at )
  2. (complex — is collider here!)
  3. (backdoor, via fork at )
  4. (backdoor)
  5. (front-door, but is a collider — naturally blocked)

Analysis:

  • Condition on : blocks paths 1, 3, 4. BUT opens path 2 (since is a collider in path 2).
  • Add conditioning on or : closes path 2.
  • Path 5 (): is a collider. Must condition on to open this front-door path!

Valid adjustment sets: and and Optimal: or (minimum size)

Backdoor Adjustment Formula

Backdoor Adjustment (do-Calculus)

If Z is a valid adjustment set, the causal effect of the intervention do(X = x) on Y is:

$$P(Y = y \mid do(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z)$$

The left side is an interventional distribution (what would happen if we set X = x). The right side is expressed entirely in terms of observational distributions (what we can measure from data).

This is what makes conditioning on a valid adjustment set equivalent to (simulating) a randomized controlled trial on historical observational data.
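A worked sketch of the adjustment, using the drug example with gender G as the adjustment set: P(R | do(D = d)) = Σ_g P(R | D = d, G = g) P(G = g). The patient counts below are entirely hypothetical, chosen so that the naive comparison and the adjusted one disagree in sign.

```python
# counts[(gender, drug)] = (number of patients, number who recovered)
# All figures are made up for illustration.
counts = {
    ("men", 1): (100, 90),
    ("men", 0): (400, 320),
    ("women", 1): (400, 200),
    ("women", 0): (100, 40),
}


def naive(d):
    """P(R = 1 | D = d): pool everyone who took (or did not take) the drug."""
    n = sum(c[0] for (_, dd), c in counts.items() if dd == d)
    r = sum(c[1] for (_, dd), c in counts.items() if dd == d)
    return r / n


def adjusted(d):
    """P(R = 1 | do(D = d)) = sum over g of P(R = 1 | D = d, G = g) * P(G = g)."""
    total = sum(n for n, _ in counts.values())
    result = 0.0
    for g in sorted({g for g, _ in counts}):
        # P(G = g) is taken over the whole population, not per treatment arm.
        p_g = sum(n for (gg, _), (n, _r) in counts.items() if gg == g) / total
        n, r = counts[(g, d)]
        result += (r / n) * p_g
    return result


naive_effect = naive(1) - naive(0)
adjusted_effect = adjusted(1) - adjusted(0)
print(f"naive difference:    {naive_effect:+.2f}")
print(f"adjusted difference: {adjusted_effect:+.2f}")
```

With these counts the naive difference is -0.14 while the adjusted difference is +0.10: the drug looks harmful in the pooled data only because the group with the lower baseline recovery rate takes it more often.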

d-Separation and d-Connection

d-Separation

A path p is d-separated by a set of conditioning nodes Z if and only if:

  1. p contains a chain A → B → C or a fork A ← B → C such that the middle node B is in Z, OR
  2. p contains a collider A → B ← C such that B is not in Z and no descendant of B is in Z
d-separation = the path is blocked by the conditioning set

d-Connection

A path is d-connected given Z (i.e., unblocked) when both of the following hold:

  1. Every chain or fork on it has its middle node outside Z, AND
  2. Every collider on it is in Z or has a descendant in Z

Note: the empty conditioning set is valid — a path with only chains/forks and no colliders is naturally d-connected without conditioning.
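The blocking rules can be checked mechanically by walking along a path three nodes at a time. The following is a minimal sketch for a single path, not a full d-separation algorithm over all paths. The function names and the extra node L (a hypothetical descendant of the bursary node B) are assumptions for illustration.

```python
def descendants(children, node):
    """All nodes reachable from `node` by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def path_is_open(children, path, conditioned):
    """True if `path` is d-connected given the set `conditioned`.

    `children` maps each node to the nodes it points to; `path` is a list
    of adjacent nodes, with arrow directions looked up in `children`.
    """
    conditioned = set(conditioned)
    for a, b, c in zip(path, path[1:], path[2:]):
        is_collider = b in children.get(a, ()) and b in children.get(c, ())
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is conditioned on.
            if b not in conditioned and not (descendants(children, b) & conditioned):
                return False
        elif b in conditioned:
            # A chain or fork blocks when its middle node is conditioned on.
            return False
    return True


# Collider from the sports college example, plus a hypothetical child L of B.
college = {"S": ["B"], "A": ["B"], "B": ["L"]}
# Fork: Z is a common cause of X and Y.
fork = {"Z": ["X", "Y"]}

print(path_is_open(college, ["S", "B", "A"], set()))   # collider, nothing conditioned
print(path_is_open(college, ["S", "B", "A"], {"B"}))   # conditioning on the collider
print(path_is_open(college, ["S", "B", "A"], {"L"}))   # conditioning on a descendant
print(path_is_open(fork, ["X", "Z", "Y"], {"Z"}))      # conditioning on the fork node
```

A path with fewer than three nodes has no junctions, so the function reports it as open regardless of the conditioning set.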

Glossary of Additional Terms

| Term | Definition |
| --- | --- |
| Exogenous variable | A node with no incoming arrows; its causes are outside the model. Denoted U in structural causal models |
| Endogenous variable | A node with incoming arrows; its causes are inside the model. Denoted V |
| Unobserved confounder | A confounder not measured or included; shown as a dashed node U with arrows into both treatment and outcome, or as a dashed bidirected arc between them |
| Unconditional dependence | A path is naturally open with no conditioning |
| Unconditional independence | A path is naturally blocked (collider) with no conditioning |
| Conditional dependence | A path that has been opened by conditioning (on a collider) |
| Conditional independence | A path that has been closed by conditioning (on a fork/chain middle node) |

Key Insights from DAG Analysis

  1. More conditioning is not always better: conditioning on a collider opens a spurious path; conditioning on a mediator blocks the causal pathway you want to measure
  2. The optimal adjustment set is minimal: condition on just enough to block all backdoors, no more
  3. DAGs make assumptions explicit: writing down a DAG requires committing to a causal story, making the assumptions falsifiable
  4. Data alone cannot reveal causality: DAGs must supplement the data with domain knowledge

Connections

See Also