DAGs and Causal Identification
Summary
Directed Acyclic Graphs (DAGs) are the primary tool for representing, visualizing, and reasoning about causal structures in observational data. They encode causal assumptions, identify confounders, and guide which variables to condition on to isolate treatment effects. Key concepts include forks, chains, colliders, d-separation, and the backdoor adjustment formula.
Overview
Causal conclusions cannot be drawn from data alone: the data must be supplemented by a causal model that encodes assumptions about the direction and nature of relationships between variables. DAGs provide this model in a compact, visual, and mathematically precise way.
Causal Inference
Causal inference is the process of estimating and reasoning about cause-and-effect relationships between variables while accounting for potential confounding factors and biases.
Core vocabulary:
- Treatment ($X$): The action or intervention whose effect we want to measure (the “independent variable”).
- Outcome ($Y$): The variable we want to understand (the “dependent variable”).
- Confounder: A variable that causally affects both treatment and outcome, creating spurious association between them.
Basic DAG Structure
A DAG consists of nodes (variables) connected by directed edges (arrows indicating causal direction). The graph is acyclic — no variable can cause itself through a chain of relationships.
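As a minimal sketch of this structure, a DAG can be stored as a list of (cause, effect) pairs and checked for cycles. The node names and the helper `is_acyclic` are illustrative assumptions, not part of any library.

```python
from collections import deque

def is_acyclic(edges):
    """Return True if the directed graph given as (cause, effect) pairs has no cycle."""
    nodes = {n for edge in edges for n in edge}
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no remaining parents.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Every node can be removed only when the graph contains no cycle.
    return removed == len(nodes)

# Hypothetical graphs: a confounded treatment effect, and a feedback loop.
confounded = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
feedback = [("X", "Y"), ("Y", "X")]
print(is_acyclic(confounded))  # True
print(is_acyclic(feedback))    # False
```

The second graph fails the check because each variable would cause itself through the other, which is exactly what the acyclicity requirement rules out.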
Path in a DAG
A path is any sequence of causal links (arrows) connecting the treatment and outcome, regardless of arrow direction. For example, $X \leftarrow Z \rightarrow Y$ is a valid path between $X$ and $Y$ even though its arrows point in different directions.
Confounding and De-confounding
Confounders
A confounder $Z$ is a variable that causes both the treatment $X$ and the outcome $Y$: $X \leftarrow Z \rightarrow Y$. The confounder creates a spurious association between $X$ and $Y$; estimating their relationship without adjusting for $Z$ yields a biased estimate of the causal effect.
Approaches to de-confounding:
| Approach | When applicable | Mechanism |
|---|---|---|
| Randomized Controlled Trial (RCT) | Prospective study | Random assignment breaks the confounder → treatment link |
| Stratification | Observational, discrete confounders | Compute the effect within strata of the confounder, then average |
| Conditioning / Backdoor adjustment | Observational, any measured confounders | Mathematical adjustment using Pearl’s formula |
| Controlling | Real-world trial | Hold the confounder fixed experimentally |
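The first row of the table can be illustrated with a small simulation. This is a sketch under assumed numbers: the true treatment effect is set to 1.0, the confounder effect to 2.0, and the assignment probabilities are invented for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def naive_effect(assign_treatment, n=20000, seed=0):
    """Difference in mean outcome between treated and untreated units."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        z = rng.random() < 0.5                   # confounder
        x = assign_treatment(z, rng)             # treatment assignment rule
        y = 1.0 * x + 2.0 * z + rng.gauss(0, 1)  # true effect of x is 1.0
        (treated if x else untreated).append(y)
    return mean(treated) - mean(untreated)

def observational(z, rng):
    # The confounder drives treatment uptake.
    return rng.random() < (0.8 if z else 0.2)

def randomized(z, rng):
    # A coin flip ignores the confounder entirely.
    return rng.random() < 0.5

obs_effect = naive_effect(observational)
rct_effect = naive_effect(randomized)
print(f"observational: {obs_effect:.2f}")
print(f"randomized:    {rct_effect:.2f}")
```

Under these assumptions the observational comparison overstates the effect, because treated units are more likely to have the confounder present, while the randomized comparison lands close to the true value.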
Conditioning vs. Controlling
- Conditioning: Achieves the effect of an RCT mathematically on historical data using backdoor adjustment.
- Controlling: Literally holds a variable constant in a real-world trial.
Key insight from Pearl: We should condition only on the variables needed to block confounding, not on every variable available. Conditioning on the wrong variables can introduce new biases (see: colliders below).
The Three Junction Patterns
Every node in the interior of a path (not treatment or outcome endpoints) participates in exactly one of three junction patterns:
Fork
$X \leftarrow Z \rightarrow Y$: $Z$ is a common cause of $X$ and $Y$, that is, a confounder. Conditioning on $Z$ blocks the path; leaving $Z$ unconditioned leaves the path open.
Shoe Size and Reading Ability
Age ($Z$) causes both shoe size ($X$) and reading ability ($Y$): $X \leftarrow Z \rightarrow Y$.
- Unconditioned: Strong spurious correlation between shoe size and reading ability (larger shoes → better readers, confounded by age).
- Conditioned on age: Flat regression line, with no correlation between shoe size and reading ability within a fixed age group. This is the mechanism behind Simpson’s Paradox: an association that appears, disappears, or reverses when the data are aggregated or disaggregated.
Rule: Conditioning on the intermediate node in a fork blocks the path.
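The fork rule can be checked on simulated data. The sketch below assumes invented coefficients for how age drives shoe size and reading score; only the structure (age causes both, neither causes the other) comes from the example above.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(0)
rows = []
for _ in range(20000):
    age = rng.randint(5, 10)               # common cause
    shoe = 0.5 * age + rng.gauss(0, 0.5)   # depends only on age
    reading = 10 * age + rng.gauss(0, 5)   # depends only on age
    rows.append((age, shoe, reading))

# Unconditioned: pool all children together.
overall = corr([r[1] for r in rows], [r[2] for r in rows])

# Conditioned: compute the correlation separately within each age group.
within = {}
for age in range(5, 11):
    group = [r for r in rows if r[0] == age]
    within[age] = corr([r[1] for r in group], [r[2] for r in group])

print(f"unconditioned: {overall:+.2f}")
for age, value in within.items():
    print(f"age {age}: {value:+.2f}")
```

The pooled correlation is strongly positive, while the within-age correlations hover around zero, which is the blocked path in numerical form.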
Chain
$X \rightarrow Z \rightarrow Y$: $Z$ is a mediator that transmits the causal effect of $X$ on $Y$. Conditioning on $Z$ blocks the path (kills the causal transmission); leaving $Z$ unconditioned leaves the path open.
Drug → Blood Pressure → Recovery
$X \rightarrow Z \rightarrow Y$: the drug ($X$) affects recovery ($Y$) through blood pressure ($Z$).
- Unconditioned: Strong correlation between drug and recovery.
- Conditioned on blood pressure: Drug and recovery become uncorrelated because the mediating pathway is blocked.
Rule: Conditioning on the intermediate node in a chain blocks the path.
Note: Chains and forks imply the same conditional independence (the end nodes are independent given the middle node), so you cannot tell them apart from data alone. This is why DAGs are necessary.
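To illustrate why the two structures look alike in data, the sketch below simulates one linear chain and one linear fork and compares the raw correlation of the end nodes with their partial correlation given the middle node. All coefficients and the helper names `corr` and `partial_corr` are assumptions made for this illustration.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def partial_corr(x, y, z):
    """Correlation of x and y after linearly adjusting both for z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5

rng = random.Random(1)
n = 20000

# Chain: X -> Z -> Y
cx = [rng.gauss(0, 1) for _ in range(n)]
cz = [-0.8 * x + rng.gauss(0, 0.6) for x in cx]
cy = [-0.7 * z + rng.gauss(0, 0.7) for z in cz]

# Fork: X <- Z -> Y
fz = [rng.gauss(0, 1) for _ in range(n)]
fx = [0.8 * z + rng.gauss(0, 0.6) for z in fz]
fy = [0.7 * z + rng.gauss(0, 0.7) for z in fz]

chain = (corr(cx, cy), partial_corr(cx, cy, cz))
fork = (corr(fx, fy), partial_corr(fx, fy, fz))
print(f"chain: raw {chain[0]:+.2f}, given Z {chain[1]:+.2f}")
print(f"fork:  raw {fork[0]:+.2f}, given Z {fork[1]:+.2f}")
```

Both structures show a clear raw correlation that vanishes once the middle node is adjusted for, so the data alone give no way to decide which arrows were used to generate them.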
Collider
$X \rightarrow Z \leftarrow Y$: $Z$ is a collider, the node where two causal arrows collide. This is the critical pattern where intuition fails:
- Unconditioned: The path is blocked (no spurious association between $X$ and $Y$).
- Conditioned on $Z$: The path is unblocked; conditioning on the collider creates a spurious association.
Sports Ability, Academic Ability, and Bursaries
$X \rightarrow Z \leftarrow Y$: both sporting ability ($X$) and academic ability ($Y$) cause bursary awards ($Z$).
- Unconditioned: No correlation between sporting and academic ability (the two are independent in the population).
- Conditioned on bursary: At a fixed bursary score, students with low sporting ability tend to have high academic ability and vice versa, so a negative correlation is induced.
Rule: Conditioning on the intermediate node in a collider unblocks the path.
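A small simulation makes the collider effect visible. It assumes, purely for illustration, that the bursary score is the sum of two independent standard-normal abilities plus noise, and it approximates "a fixed bursary score" by a narrow band of scores.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(2)
students = []
for _ in range(20000):
    sport = rng.gauss(0, 1)                          # independent of academic ability
    academic = rng.gauss(0, 1)
    bursary = sport + academic + rng.gauss(0, 0.5)   # collider: caused by both
    students.append((sport, academic, bursary))

# Unconditioned: all students.
everyone = corr([s[0] for s in students], [s[1] for s in students])

# Conditioned: only students whose bursary score falls in a narrow band.
band = [s for s in students if 1.0 <= s[2] < 1.5]
within_band = corr([s[0] for s in band], [s[1] for s in band])

print(f"all students:        {everyone:+.2f}")
print(f"fixed bursary band:  {within_band:+.2f}")
```

Across all students the abilities are essentially uncorrelated, but within the band a strong negative correlation appears even though neither ability influences the other.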
Summary of Conditioning Rules
| Junction | Unconditioned | Conditioned |
|---|---|---|
| Fork | Open (spurious) | Blocked |
| Chain | Open | Blocked |
| Collider | Blocked | Open (spurious!) |
Backdoor Paths and Backdoor Adjustment
Backdoor Path
A backdoor path from treatment $X$ to outcome $Y$ is any path that starts with an arrow pointing into $X$ (i.e., $X \leftarrow \cdots Y$). These paths carry spurious, non-causal association and must be blocked. A front-door path starts with an arrow out of $X$ (i.e., $X \rightarrow \cdots Y$). Front-door paths whose arrows all point away from $X$ carry the causal effect and must remain open.
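The distinction can be sketched in code by listing every path between treatment and outcome and inspecting the direction of the first arrow. The example graph and the helpers `all_paths` and `classify` are hypothetical.

```python
def all_paths(edges, start, goal):
    """Enumerate simple paths between start and goal, ignoring arrow direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            yield path
            continue
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in path:
                stack.append(path + [nxt])

def classify(edges, treatment, outcome):
    """Split paths by whether the first arrow points into or out of the treatment."""
    edge_set = set(edges)
    result = {"backdoor": [], "front-door": []}
    for path in all_paths(edges, treatment, outcome):
        # A backdoor path begins with an arrow pointing INTO the treatment.
        kind = "backdoor" if (path[1], treatment) in edge_set else "front-door"
        result[kind].append(path)
    return result

# Hypothetical DAG: Z confounds X and Y; M mediates the effect of X on Y.
edges = [("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y")]
result = classify(edges, "X", "Y")
print(result)
```

In this graph the path through `Z` is the one to block, and the path through `M` is the one to leave alone.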
Backdoor Adjustment Formula
To estimate the causal effect of treatment $X$ on outcome $Y$ with confounder $Z$:
$$P(Y \mid do(X = x)) = \sum_{z} P(Y \mid X = x, Z = z)\, P(Z = z)$$
The $do(\cdot)$ operator represents an intervention (setting $X$ to a value), not mere observation. The right-hand side is expressed entirely in observable quantities.
Intuition: Stratify by $Z$, compute the $X \rightarrow Y$ effect within each stratum, then average over the population distribution of $Z$.
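The stratify-then-average recipe can be worked through exactly on a small discrete example. The probabilities below are invented for illustration, with a binary confounder `z`, binary treatment `x`, and binary outcome `y`.

```python
# Assumed distributions (hypothetical numbers).
p_z = {0: 0.5, 1: 0.5}                      # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X = 1 | Z = z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.3,  # P(Y = 1 | X = x, Z = z), keyed by (x, z)
                 (0, 1): 0.5, (1, 1): 0.7}

def p_x_given_z(x, z):
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]

def p_y1_do(x):
    """Backdoor adjustment: average the stratum-specific risk over P(Z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)

def p_y1_observed(x):
    """Plain conditioning: strata are weighted by P(Z | X = x) instead of P(Z)."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z) / p_x

adjusted = p_y1_do(1) - p_y1_do(0)
naive = p_y1_observed(1) - p_y1_observed(0)
print(f"adjusted effect: {adjusted:.2f}")  # 0.20
print(f"naive contrast:  {naive:.2f}")     # 0.44
```

Within each stratum the treatment raises the outcome probability by 0.2, and the adjusted estimate recovers that value. The naive contrast is more than twice as large because treated units are concentrated in the stratum with the higher baseline risk.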
Valid Adjustment Sets
Valid Adjustment Set
A set of nodes is a valid adjustment set if, when conditioned on, it:
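- Blocks all backdoor paths from treatment to outcome, and
- Leaves every causal (directed) path from treatment to outcome open, which the backdoor criterion ensures by excluding descendants of the treatment from the set.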
Finding valid adjustment sets in practice: Use dagitty (R) or dowhy (Python):
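library(dagitty)
# dag: a dagitty graph with its exposure and outcome declared
adjustmentSets(dag) # returns the minimal valid adjustment sets by default; pass type = "all" for all of them
The optimal adjustment set is a valid adjustment set with the minimum number of nodes.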
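For small graphs, the search can also be sketched by brute force in plain Python. This is not how dagitty or dowhy work internally; it is an illustrative enumeration that applies the junction rules to every backdoor path and excludes descendants of the treatment. The example graph and all function names are assumptions.

```python
from itertools import combinations

def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in edges:
            if cause == current and effect not in seen:
                seen.add(effect)
                stack.append(effect)
    return seen

def undirected_paths(edges, start, goal, path=None):
    """Yield simple paths from start to goal, ignoring arrow direction."""
    path = path or [start]
    if path[-1] == goal:
        yield path
        return
    for a, b in edges:
        for here, there in ((a, b), (b, a)):
            if here == path[-1] and there not in path:
                yield from undirected_paths(edges, start, goal, path + [there])

def is_blocked(edges, path, conditioned):
    """Apply the fork/chain/collider rules to every interior node of a path."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if (prev, mid) in edges and (nxt, mid) in edges:   # collider
            if not ({mid} | descendants(edges, mid)) & conditioned:
                return True
        elif mid in conditioned:                           # fork or chain
            return True
    return False

def backdoor_adjustment_sets(edges, treatment, outcome):
    """Enumerate sets of non-descendants of the treatment that block all backdoor paths."""
    nodes = {n for edge in edges for n in edge}
    candidates = sorted(nodes - {treatment, outcome} - descendants(edges, treatment))
    backdoor = [p for p in undirected_paths(edges, treatment, outcome)
                if (p[1], treatment) in edges]
    return [set(subset)
            for size in range(len(candidates) + 1)
            for subset in combinations(candidates, size)
            if all(is_blocked(edges, p, set(subset)) for p in backdoor)]

# Hypothetical DAG: Z is a confounder; C is a collider between A and B.
edges = [("Z", "X"), ("Z", "Y"), ("X", "Y"),
         ("A", "X"), ("A", "C"), ("B", "C"), ("B", "Y")]
valid = backdoor_adjustment_sets(edges, "X", "Y")
print(valid[0])                 # smallest valid set: {'Z'}
print({"C", "Z"} in valid)      # False: conditioning on C opens the path through A and B
```

Sets are returned in order of increasing size, so the first entry is a minimum-size adjustment set. The enumeration is exponential in the number of candidate nodes, so it is only practical for teaching-sized graphs.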
Worked Example
For the DAG with paths:
- (backdoor)
- (backdoor; is a collider here)
- (backdoor; is a fork)
- (backdoor)
- (front-door; is a collider — must condition on to open this path)
Valid adjustment sets: , , .
d-Separation and d-Connection
d-Separation
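A path is d-separated (blocked) by a conditioning set if and only if at least one of the following holds:
- The path contains a fork or chain whose middle node is in the conditioning set, or
- The path contains a collider such that neither the collider nor any descendant of the collider is in the conditioning set.
If neither condition applies, the path is d-connected (unblocked).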
Practical note: The descendant-of-collider clause (condition 2) rarely matters in practice but appears frequently in the technical literature. Conditioning on a descendant of a collider opens the path, just as conditioning on the collider itself does.
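The two blocking conditions, including the descendant clause, can be sketched as a check on a single path. The graphs and the function names `descendants` and `path_is_blocked` are hypothetical.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    children = {}
    for cause, effect in edges:
        children.setdefault(cause, []).append(effect)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def path_is_blocked(edges, path, conditioned):
    """Return True if the conditioning set d-separates the given path."""
    edge_set = set(edges)
    conditioned = set(conditioned)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is conditioned on.
            if not ({mid} | descendants(edges, mid)) & conditioned:
                return True
        elif mid in conditioned:
            # A conditioned fork or chain node blocks the path.
            return True
    return False

fork = [("Z", "X"), ("Z", "Y")]
collider = [("X", "C"), ("Y", "C"), ("C", "D")]  # D is a descendant of the collider C

print(path_is_blocked(fork, ["X", "Z", "Y"], set()))       # False: open fork
print(path_is_blocked(fork, ["X", "Z", "Y"], {"Z"}))       # True
print(path_is_blocked(collider, ["X", "C", "Y"], set()))   # True: collider blocks
print(path_is_blocked(collider, ["X", "C", "Y"], {"D"}))   # False: descendant opens it
```

The last call shows the descendant clause at work: conditioning on `D` alone is enough to unblock the path through `C`.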
Terminology Reference
| Term | Meaning |
|---|---|
| Exogenous variable | Has no incoming arrows — causes others but is not caused within the model |
| Endogenous variable | Has at least one incoming arrow — its value is explained within the model |
| Unobserved confounder | A confounder that is not measured; creates an unresolvable backdoor path |
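| Unconditional dependence | Association between two variables carried by a path that is open without any conditioning |
| Conditional independence | Independence of two variables given a set of nodes, which holds when conditioning on that set blocks every path between them |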
| Mediator | Middle node of a chain; transmits causal effect |
| Covariate | A variable that affects the outcome but is not of primary interest; may be added to improve precision but is not required for identification |
Connections
- The Selection Problem — DAGs make selection bias explicit by showing which variables confound treatment assignment
- The Experimental Ideal — RCTs eliminate backdoor paths by randomizing treatment; DAGs show why this works
- Bayesian Propensity Score Weighting — Uses DAGs to identify which confounders to include in the propensity score model
- Missing Data Models — Uses DAGs to distinguish MCAR/MAR/MNAR and identify valid imputation strategies
- Nonparametric Causal Inference — BART + propensity scores uses DAG-identified confounders
- Differences-in-Differences — Common trends assumption can be encoded as a DAG restriction
See Also
- Synthetic Control — Uses the potential outcomes framework; DAGs clarify the no-interference assumption
- Instrumental Variables — An instrument must be exogenous in the DAG (no open path from the instrument to the outcome except through the treatment)
- Bayesian Propensity Score Weighting — Bayesian implementation of backdoor adjustment via IPTW