Directed Acyclic Graphs
Summary
A Directed Acyclic Graph (DAG) is a causal diagram representing proposed cause-and-effect relationships between variables. DAGs make confounding explicit and provide algorithmic tools (backdoor adjustment, d-separation) to identify valid adjustment sets: sets of variables that, when conditioned on, isolate the causal effect of a treatment on an outcome.
Overview
Causal inference requires more than data — it requires a model of the causal structure. A DAG supplements data with encoded assumptions about which variables cause which, enabling us to:
- Identify confounding sources (backdoor paths)
- Determine the optimal adjustment set (which variables to control for)
- Derive the do-calculus formula for the intervention distribution
Causal Inference
Causal inference is the process of reasoning and drawing conclusions about cause-and-effect relationships between variables while accounting for potential confounding factors and biases. It asks: “What is the effect of $X$ on $Y$, independent of other variables that affect both?”
DAG Structure
Directed Acyclic Graph (DAG)
A DAG consists of:
- Nodes: variables (treatments, outcomes, confounders, mediators)
- Directed edges (arrows): causal relationships; $X \rightarrow Y$ means “$X$ causes $Y$”
- Acyclic: no variable can cause itself (no cycles)
Key roles:
- Treatment (exposure): the intervention or variable whose causal effect we want to measure
- Outcome: the variable we measure the effect on
- Confounder: a variable that causes both treatment and outcome, creating spurious association
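The structure described above can be sketched in code. The following is a minimal illustration, assuming a DAG is stored as a list of `(cause, effect)` pairs; the function name `is_acyclic` and the node labels `Z`, `X`, `Y` are hypothetical choices for this example. It checks the "acyclic" requirement with Kahn's algorithm.

```python
from collections import deque

def is_acyclic(edges):
    """Return True if the directed graph given as (cause, effect) pairs has no cycle."""
    nodes = {n for edge in edges for n in edge}
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no remaining causes.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes on a cycle never reach indegree 0, so they are never removed.
    return removed == len(nodes)

# Hypothetical example: confounder Z, treatment X, outcome Y.
dag = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
cyclic = dag + [("Y", "Z")]  # adds a feedback loop Z -> X -> Y -> Z

print(is_acyclic(dag))     # True
print(is_acyclic(cyclic))  # False
```

An edge list keeps the sketch short; a real project would more likely use a graph library.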
Confounders and the Identification Problem
Confounder
A confounder is a variable $Z$ that causes both the treatment $X$ and the outcome $Y$ (i.e., $X \leftarrow Z \rightarrow Y$). Its presence creates a spurious association between $X$ and $Y$ that does not reflect a direct causal effect.
Example: Gender (G) affects both whether someone takes a drug (D) and their natural recovery (R). Simply observing D and R gives a biased estimate of the drug’s effect.
Approaches to de-confounding:
| Method | Applicable | Mechanism |
|---|---|---|
| Randomized Controlled Trial (RCT) | Future studies | Breaks the $Z \rightarrow X$ arrow by random assignment |
| Stratification | Observed data | Compute effects separately per stratum of $Z$, then take the weighted average |
| Conditioning/Backdoor Adjustment | Observed data (DAG-based) | Mathematical formula using observational data |
| Inverse Probability Weighting | Observed data | See Bayesian Inverse Probability Weighting |
| Instrumental Variables | Observed data | See Instrumental Variables |
| Difference-in-Differences | Panel data | See Differences-in-Differences |
Paths and Junctions
Path
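A path is a sequence of edges connecting treatment and outcome in a DAG, regardless of the direction of the arrows. Example: in the drug DAG above, with edges $G \rightarrow D$, $G \rightarrow R$, and $D \rightarrow R$, one path from $D$ to $R$ is $D \leftarrow G \rightarrow R$.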
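To make the definition concrete, here is a small sketch that lists every path between two nodes while ignoring arrow direction. It uses the Gender/Drug/Recovery example from earlier; the edge `D -> R` (the drug affects recovery) and the function name `all_paths` are assumptions made for illustration.

```python
def all_paths(edges, start, end):
    """List every path from start to end, ignoring arrow direction, visiting no node twice."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    paths = []

    def walk(node, visited):
        if node == end:
            paths.append(list(visited))
            return
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in visited:
                walk(nxt, visited + [nxt])

    walk(start, [start])
    return paths

# Assumed DAG: G -> D, G -> R, D -> R
edges = [("G", "D"), ("G", "R"), ("D", "R")]
for p in all_paths(edges, "D", "R"):
    print(p)
# ['D', 'G', 'R']
# ['D', 'R']
```

The first path runs through the confounder and the second is the direct causal arrow.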
Three Junction Patterns
Fork
A fork at node $Z$: $X \leftarrow Z \rightarrow Y$
$Z$ is a common cause of $X$ and $Y$. Without conditioning on $Z$, there is a spurious correlation between $X$ and $Y$ (e.g., age → shoe size and age → reading ability, so shoe size correlates with reading ability).
Rule: Conditioning on $Z$ blocks the path. Not conditioning leaves it open.
Chain
A chain at node $Z$: $X \rightarrow Z \rightarrow Y$
$Z$ mediates the effect of $X$ on $Y$. If we condition on the intermediate variable $Z$, we “freeze” it, blocking the flow of information from $X$ to $Y$.
Rule: Conditioning on $Z$ blocks the path. Not conditioning leaves it open. (Same rule as forks: counterintuitive but true.)
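The shared rule for forks and chains can be checked with a small simulation. This is a sketch under stated assumptions: all variables are binary, the probabilities (0.5, 0.8/0.2, 0.7/0.3) are invented for illustration, and the helper names are hypothetical. In both structures the middle node is labelled `z`, and the association between `x` and `y` is measured as a difference in proportions.

```python
import random

def bern(p, rng):
    """Draw 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

def risk_difference(rows, stratum=None):
    """P(Y=1 | X=1) - P(Y=1 | X=0), optionally restricted to rows with Z == stratum."""
    if stratum is not None:
        rows = [r for r in rows if r[1] == stratum]
    y1 = [y for x, z, y in rows if x == 1]
    y0 = [y for x, z, y in rows if x == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def simulate_fork(n, rng):
    # X <- Z -> Y: Z is a common cause; X has no effect on Y.
    rows = []
    for _ in range(n):
        z = bern(0.5, rng)
        x = bern(0.8 if z else 0.2, rng)
        y = bern(0.7 if z else 0.3, rng)
        rows.append((x, z, y))
    return rows

def simulate_chain(n, rng):
    # X -> Z -> Y: X affects Y only through Z.
    rows = []
    for _ in range(n):
        x = bern(0.5, rng)
        z = bern(0.8 if x else 0.2, rng)
        y = bern(0.7 if z else 0.3, rng)
        rows.append((x, z, y))
    return rows

rng = random.Random(0)
fork = simulate_fork(40000, rng)
chain = simulate_chain(40000, rng)

for name, rows in [("fork", fork), ("chain", chain)]:
    print(name, "unconditioned:", round(risk_difference(rows), 3))
    for z in (0, 1):
        print(name, f"given Z={z}:", round(risk_difference(rows, z), 3))
```

In both cases the unconditioned difference should be clearly positive, while the difference within each stratum of `Z` should be close to zero. The two structures are indistinguishable by this test even though only the chain carries a causal effect of `X` on `Y`.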
Collider
A collider at node $Z$: $X \rightarrow Z \leftarrow Y$
$Z$ is jointly caused by $X$ and $Y$. A collider is naturally blocked (no spurious association between $X$ and $Y$). But conditioning on $Z$ opens the path and creates a spurious association.
Rule: Conditioning on $Z$ unblocks the path. Not conditioning leaves it blocked. (Opposite of forks and chains!)
Collider in Action: Sports College
- Sporting ability (S) → Bursary (B) ← Academic ability (A)
- $B$ is a collider: without conditioning, S and A are uncorrelated (athletes are not necessarily more/less academic)
- After conditioning on $B$ (studying only bursary recipients): S and A become negatively correlated! Students with low sports ability must have high academic ability to receive the bursary, and vice versa.
- This is Berkson’s bias (selection bias from conditioning on a collider).
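The bursary example can be reproduced with a short simulation. This is a sketch, not a model of any real admissions process: it assumes S and A are independent standard normal scores and that a bursary is awarded when `S + A` exceeds 1.0. Both the distributions and the threshold are invented for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(1)

# (S, A) pairs drawn independently, so there is no association in the population.
students = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20000)]

# Assumed bursary rule: awarded when combined ability exceeds a threshold.
recipients = [(s, a) for s, a in students if s + a > 1.0]

corr_all = pearson(*zip(*students))
corr_recipients = pearson(*zip(*recipients))

print("all students:      ", round(corr_all, 3))
print("bursary recipients:", round(corr_recipients, 3))
```

The correlation among all students should be near zero, while the correlation among recipients should be clearly negative, which is the collider (Berkson) effect described above.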
Three Rules for Paths
Path Blocking Rules
- Fork ($X \leftarrow Z \rightarrow Y$): conditioning on $Z$ blocks the path; leaving $Z$ unconditioned leaves it open
- Chain ($X \rightarrow Z \rightarrow Y$): conditioning on $Z$ blocks the path; leaving $Z$ unconditioned leaves it open
- Collider ($X \rightarrow Z \leftarrow Y$): conditioning on $Z$ opens the path; leaving $Z$ unconditioned leaves it blocked
Additionally: if a descendant of a collider is conditioned on, the same effect occurs (the path is opened).
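These rules, including the descendant rule, translate fairly directly into code. The sketch below assumes a path is given as a list of node names and the DAG as a list of `(cause, effect)` pairs; the function names and the extra node `C` (a hypothetical descendant of the bursary node `B`) are illustrative only.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in edges:
            if cause == current and effect not in found:
                found.add(effect)
                stack.append(effect)
    return found

def is_blocked(path, edges, given):
    """Apply the fork/chain/collider rules to each interior node of `path`."""
    edge_set = set(edges)
    given = set(given)
    for left, mid, right in zip(path, path[1:], path[2:]):
        collider = (left, mid) in edge_set and (right, mid) in edge_set
        if collider:
            # A collider blocks unless it, or one of its descendants, is conditioned on.
            if mid not in given and not (descendants(edges, mid) & given):
                return True
        elif mid in given:
            # Fork or chain: conditioning on the middle node blocks the path.
            return True
    return False

# Sports college collider S -> B <- A, plus a hypothetical descendant B -> C.
college = [("S", "B"), ("A", "B"), ("B", "C")]
print(is_blocked(["S", "B", "A"], college, set()))   # True: collider, unconditioned
print(is_blocked(["S", "B", "A"], college, {"B"}))   # False: conditioning opens it
print(is_blocked(["S", "B", "A"], college, {"C"}))   # False: descendant of collider

# Fork X <- Z -> Y.
fork_edges = [("Z", "X"), ("Z", "Y")]
print(is_blocked(["X", "Z", "Y"], fork_edges, set()))  # False: open
print(is_blocked(["X", "Z", "Y"], fork_edges, {"Z"}))  # True: blocked
```

A path is blocked as soon as any one interior node blocks it, which is why the function returns early.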
Simpson’s Paradox
Simpson’s paradox occurs when an association seen in the aggregate data weakens, disappears, or reverses within subgroups. Fork structures generate it: a confounder creates a spurious aggregate association that changes once you condition on the fork node.
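A small numeric illustration may help. The counts below are invented for this example (they are not taken from any study): two strata `g0` and `g1` of a confounder, with the drug given mostly to the stratum that recovers poorly. The names `counts` and `rate` are hypothetical.

```python
# Hypothetical counts, invented for illustration: (recovered, total)
counts = {
    ("g0", "drug"): (90, 100),
    ("g0", "no drug"): (850, 1000),
    ("g1", "drug"): (300, 1000),
    ("g1", "no drug"): (25, 100),
}

def rate(arm, stratum=None):
    """Recovery rate for a treatment arm, overall or within one stratum."""
    recovered = total = 0
    for (g, a), (r, n) in counts.items():
        if a == arm and (stratum is None or g == stratum):
            recovered += r
            total += n
    return recovered / total

for stratum in (None, "g0", "g1"):
    label = "overall" if stratum is None else stratum
    print(label, round(rate("drug", stratum), 3), round(rate("no drug", stratum), 3))
```

With these counts the drug arm recovers more often within each stratum but less often overall, because the strata differ in both baseline recovery and in how often the drug is taken.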
Backdoor Criterion and Adjustment Sets
Backdoor Path
A backdoor path from treatment $X$ to outcome $Y$ is any path that starts with an arrow pointing into $X$ (i.e., a path that leaves $X$ “backwards”, against the direction of the arrow). Such a path typically runs through a common cause of $X$ and $Y$, that is, a fork whose arrows lead toward both $X$ and $Y$.
More precisely (Pearl): A back-door path is any path from X to Y that starts with an arrow pointing into X. (The Book of Why, p158)
Front-door paths start with an arrow pointing out of $X$.
Valid Adjustment Set
A valid adjustment set is any set of nodes $Z$ such that, when conditioned on:
- All backdoor paths from $X$ to $Y$ are blocked
- All directed (front-door) paths from $X$ to $Y$ remain open, so $Z$ contains no mediator or other descendant of $X$
- No new spurious paths are created
Pearl’s official rules:
- Block all spurious paths between $X$ and $Y$
- Leave all directed paths from $X$ to $Y$ unperturbed
- Create no new spurious paths
There can be zero, one, or multiple valid adjustment sets. The optimal adjustment set minimizes the number of variables conditioned on.
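For small graphs, valid adjustment sets can be found by brute force. The sketch below is one possible reading of the backdoor criterion: a candidate set is accepted if it contains no descendant of the treatment and blocks every path that starts with an arrow into the treatment. The DAG used here (`A`, `B`, `C`, `X`, `Y`) is hypothetical and chosen so that one node is both a collider on one backdoor path and a chain node on another; all function names are illustrative.

```python
from itertools import combinations

def undirected_paths(edges, start, end):
    """Every simple path from start to end, ignoring arrow direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
            continue
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths

def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in edges:
            if cause == current and effect not in found:
                found.add(effect)
                stack.append(effect)
    return found

def path_blocked(path, edges, given):
    for left, mid, right in zip(path, path[1:], path[2:]):
        if (left, mid) in edges and (right, mid) in edges:  # collider
            if mid not in given and not (descendants(edges, mid) & given):
                return True
        elif mid in given:  # fork or chain
            return True
    return False

def satisfies_backdoor(edges, x, y, given):
    given = set(given)
    if given & descendants(edges, x):
        return False  # conditioning on a descendant of the treatment is not allowed
    for path in undirected_paths(edges, x, y):
        starts_with_arrow_into_x = (path[1], x) in edges
        if starts_with_arrow_into_x and not path_blocked(path, edges, given):
            return False
    return True

# Hypothetical DAG with backdoor paths X <- A -> B <- C -> Y and X <- B <- C -> Y.
edges = {("A", "X"), ("A", "B"), ("C", "B"), ("C", "Y"), ("B", "X"), ("X", "Y")}
candidates = ["A", "B", "C"]
valid_sets = [
    set(subset)
    for size in range(len(candidates) + 1)
    for subset in combinations(candidates, size)
    if satisfies_backdoor(edges, "X", "Y", subset)
]
print(valid_sets)       # smallest sets come first
print(valid_sets[0])    # a minimum-size valid set
```

In this graph, conditioning on `B` alone is not valid: it blocks the second backdoor path but opens the first one, where `B` is a collider. Enumerating all subsets is exponential in the number of candidates, so this approach only suits small diagrams.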
Finding Adjustment Sets in a Complex DAG
Given paths:
- (backdoor, via fork at )
- (complex — is collider here!)
- (backdoor, via fork at )
- (backdoor)
- (front-door, but is a collider — naturally blocked)
Analysis:
- Condition on : blocks paths 1, 3, 4. BUT opens path 2 (since is a collider in path 2).
- Add conditioning on or : closes path 2.
- Path 5 (): is a collider. Must condition on to open this front-door path!
Valid adjustment sets: and and Optimal: or (minimum size)
Backdoor Adjustment Formula
Backdoor Adjustment (do-Calculus)
If $Z$ is a valid adjustment set, the causal effect of the intervention $do(X = x)$ on $Y$ is:
$$P(Y = y \mid do(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z)$$
The left side is an interventional distribution (what would happen if we set $X = x$). The right side is expressed entirely in terms of observational distributions (what we can measure from data).
This is what makes conditioning on a valid adjustment set equivalent to (simulating) a randomized controlled trial on historical observational data.
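The adjustment can be worked through exactly on a tiny discrete model. The sketch below reuses the Gender (G), Drug (D), Recovery (R) example with G as the adjustment set; every probability in it is invented for illustration, and the function names are hypothetical. It compares the naive observational contrast with the adjusted one, computed as the sum over strata of $P(R = 1 \mid D = d, G = g)\,P(G = g)$.

```python
# Hypothetical model parameters (invented for illustration).
p_g = {0: 0.5, 1: 0.5}                      # P(G = g)
p_d1_given_g = {0: 0.2, 1: 0.8}             # P(D = 1 | G = g)
p_r1_given_dg = {                           # P(R = 1 | D = d, G = g), keyed by (d, g)
    (0, 0): 0.3, (1, 0): 0.4,
    (0, 1): 0.7, (1, 1): 0.8,
}

def p_d_given_g(d, g):
    return p_d1_given_g[g] if d == 1 else 1 - p_d1_given_g[g]

def observational(d):
    """P(R=1 | D=d): strata are weighted by P(G=g | D=d)."""
    p_d = sum(p_d_given_g(d, g) * p_g[g] for g in p_g)
    joint = sum(p_r1_given_dg[(d, g)] * p_d_given_g(d, g) * p_g[g] for g in p_g)
    return joint / p_d

def interventional(d):
    """P(R=1 | do(D=d)) by backdoor adjustment: strata are weighted by P(G=g)."""
    return sum(p_r1_given_dg[(d, g)] * p_g[g] for g in p_g)

naive_effect = observational(1) - observational(0)
adjusted_effect = interventional(1) - interventional(0)

print("naive contrast:   ", round(naive_effect, 3))     # about 0.34
print("adjusted contrast:", round(adjusted_effect, 3))  # about 0.1
```

In this model the drug raises recovery by 0.1 within each stratum, which the adjusted contrast recovers. The naive contrast is inflated because the group with better natural recovery also takes the drug more often.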
d-Separation and d-Connection
d-Separation
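A path $p$ is d-separated by a set of conditioning nodes $Z$ if and only if:
- $p$ contains a fork $A \leftarrow B \rightarrow C$ or chain $A \rightarrow B \rightarrow C$ such that the middle node $B \in Z$, OR
- $p$ contains a collider $A \rightarrow B \leftarrow C$ such that $B \notin Z$ and no descendant of $B$ is in $Z$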
d-separation = the path is blocked by the conditioning set
d-Connection
A path is d-connected given $Z$ (i.e., unblocked) when both of the following hold:
- Every chain or fork on it has its middle node NOT in $Z$, AND
- Every collider on it is in $Z$ or has a descendant in $Z$
Note: the empty conditioning set is valid — a path with only chains/forks and no colliders is naturally d-connected without conditioning.
Glossary of Additional Terms
| Term | Definition |
|---|---|
| Exogenous variable | A node with no incoming arrows; its causes are outside the model. Denoted $U$ in structural causal models |
| Endogenous variable | A node with incoming arrows; its causes are inside the model. Denoted $V$ |
| Unobserved confounder | A confounder not measured or included; shown as a dashed $U$ node with arrows into both treatment and outcome, or as a dashed bidirected arc between them |
| Unconditional dependence | A path is naturally open with no conditioning |
| Unconditional independence | A path is naturally blocked (collider) with no conditioning |
| Conditional dependence | A path that has been opened by conditioning (on a collider) |
| Conditional independence | A path that has been closed by conditioning (on a fork/chain middle node) |
Key Insights from DAG Analysis
- More conditioning is not always better: conditioning on a collider opens a spurious path; conditioning on a mediator blocks the causal pathway you want to measure
- The optimal adjustment set is minimal: condition on just enough to block all backdoors, no more
- DAGs make assumptions explicit: writing down a DAG requires committing to a causal story, making the assumptions falsifiable
- Data alone cannot reveal causality: DAGs must supplement the data with domain knowledge
Connections
- The Selection Problem — DAGs formalize the problem of selection bias
- The Experimental Ideal — RCTs break all backdoor paths; DAGs show why
- Bayesian Inverse Probability Weighting — IPW uses the adjustment set from a DAG
- Nonparametric Causal Inference — BART + propensity scores also uses DAGs to identify confounders
- Instrumental Variables — IVs are variables that affect treatment but have no direct arrow to the outcome
- Differences-in-Differences — DiD adjusts for time-invariant confounders not in the DAG
See Also
- The Selection Problem — Potential outcomes framework for the same problem
- Bayesian Inverse Probability Weighting — Practical application of DAG adjustment sets
- Instrumental Variables — When backdoor adjustment is insufficient (unobserved confounders)
- Regression and the CEF — regression is the estimator applied once a valid adjustment set is identified from the DAG
- Conditional Independence Assumption — CIA is the statistical assumption that a valid DAG adjustment set justifies
- PC Algorithm and Constraint-Based Discovery — learning the DAG from data (vs reasoning about a given DAG)