Causal Discovery - Overview

Summary

Causal discovery (a.k.a. causal structure search/learning) infers causal relations among variables by analyzing the statistical properties of purely observational data, when controlled interventions or randomized experiments are too costly, slow, unethical, or impossible. The output is a directed graphical causal model (DGCM). Usually this is not a single DAG but an equivalence class of DAGs, represented as a CPDAG or a PAG. Glymour, Zhang & Spirtes (2019) review three families: constraint-based methods (PC, FCI), score-based methods (GES), and methods built on functional causal models (LiNGAM, ANM, PNL).

Overview

Almost all of science is about identifying causal relations. Two procedures have existed since the 17th century: (1) manipulation — intervene and see what changes; and (2) observation — watch natural variation without intervening. Randomized experiments are the gold standard for the first, but are frequently infeasible. Causal discovery pursues the second route computationally, recovering causal structure from observational (or mixed observational/experimental) data.

The objects recovered are directed graphical causal models (DGCMs), equivalent to causal Bayesian networks / structural equation models (SEMs) / functional causal models (FCMs). A DGCM has three components:

a set of random variables (nodes);
a set of directed edges, where $X_{i} \to X_{j}$ asserts that fixing all other variables and exogenously varying $X_{i}$ would change $X_{j}$ (that is, $X_{i}$ is a direct cause of $X_{j}$);
a joint distribution over the variables that satisfies the Markov condition relative to the graph.
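
The Markov condition in the third component can be stated as a factorization of the joint distribution. As a sketch, writing $\mathrm{Pa}(X_{i})$ for the parents of $X_{i}$ in the graph (notation assumed here, not taken from the note):

$$P(X_{1}, \dots, X_{n}) = \prod_{i=1}^{n} P\big(X_{i} \mid \mathrm{Pa}(X_{i})\big)$$

For the chain $X_{1} \to X_{2} \to X_{3}$ this gives $P(X_{1})\,P(X_{2} \mid X_{1})\,P(X_{3} \mid X_{2})$, which implies that $X_{3}$ is independent of $X_{1}$ given $X_{2}$.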

A key conceptual point: causal discovery is “nothing but statistical estimation of parameters describing a graphical causal structure,” but with the twist that the structure itself (which edges exist and their orientation) is what is being estimated. Because several distinct DAGs can imply exactly the same conditional independencies, observational data alone generally cannot pin down a unique DAG — only a Markov equivalence class.
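
To make the equivalence-class point concrete, here is a minimal Python sketch of the standard graphical criterion (Verma and Pearl): two DAGs are Markov equivalent exactly when they share the same skeleton and the same unshielded colliders. The function names and the edge-set representation are illustrative choices, not part of the reviewed paper.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected adjacencies of a DAG given as a set of (parent, child) pairs."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Unshielded colliders a -> c <- b, where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pas in parents.items():
        for a, b in combinations(sorted(pas), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def markov_equivalent(g1, g2):
    """Same skeleton and same v-structures means the same implied independencies."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain = {("X", "Y"), ("Y", "Z")}      # X -> Y -> Z
fork = {("Y", "X"), ("Y", "Z")}       # X <- Y -> Z
collider = {("X", "Y"), ("Z", "Y")}   # X -> Y <- Z

print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```

The chain and the fork both imply only that X and Z are independent given Y, so observational data cannot separate them; the collider implies a different independence pattern and forms its own class.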

Main Content

Goal of causal structure learning

Given i.i.d. samples from an unknown joint distribution $P$ generated by an unknown DGCM $G$, recover as much of $G$ as the data permit. Typically this is a CPDAG (when latent confounders are assumed absent, via PC/GES), a PAG (when latent confounders are possible, via FCI), or a fully oriented DAG plus functional model (under FCM identifiability conditions, via LiNGAM/ANM/PNL).

Three families of methods

Constraint-based: exploit conditional independence (CI) constraints in the data to recover the equivalence class. Examples: PC (assumes no latent confounders) and FCI (tolerates latent confounders). Both are asymptotically correct under the Markov and faithfulness assumptions, provided the CI test is reliable.
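
The CI test is the basic building block of this family. Below is a minimal sketch of one common choice for linear-Gaussian data, a partial-correlation test with the Fisher z-transform, using only the Python standard library. The simulated chain, the coefficients, and all function names are assumptions made for illustration; this is not the full PC algorithm, only the test it would call repeatedly.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def partial_corr(x, z, y):
    """Correlation of x and z after linearly removing one conditioning variable y."""
    r_xz, r_xy, r_zy = corr(x, z), corr(x, y), corr(z, y)
    return (r_xz - r_xy * r_zy) / math.sqrt((1 - r_xy ** 2) * (1 - r_zy ** 2))

def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: the (partial) correlation is zero."""
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Simulated chain X -> Y -> Z (illustrative coefficients).
rng = random.Random(0)
n = 5000
X = [rng.gauss(0, 1) for _ in range(n)]
Y = [0.8 * x + rng.gauss(0, 1) for x in X]
Z = [0.8 * y + rng.gauss(0, 1) for y in Y]

r_marginal = corr(X, Z)
r_given_y = partial_corr(X, Z, Y)
p_marginal = fisher_z_pvalue(r_marginal, n, 0)
p_given_y = fisher_z_pvalue(r_given_y, n, 1)
print("corr(X, Z)     =", round(r_marginal, 3), " p =", p_marginal)
print("corr(X, Z | Y) =", round(r_given_y, 3), " p =", p_given_y)
```

In this setup X and Z are strongly dependent marginally, while the partial correlation given Y should be close to zero, which is the pattern a constraint-based method uses to remove the X-Z adjacency.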

Score-based: search the space of equivalence classes to optimize a model-fit score (e.g., BIC). Example: Greedy Equivalence Search (GES).
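
The score itself can be illustrated without the search. The sketch below computes a BIC for tiny linear-Gaussian graphs, dropping constants that are identical across graphs. It is deliberately limited to at most one parent per node, and the helper names and the `parent_of` mapping are hypothetical; GES adds a greedy search over equivalence classes on top of a score like this one.

```python
import math
import random

def node_bic(data, child, parent=None):
    """BIC term for one node in a linear-Gaussian model (lower is better).
    Sketch limitation: zero or one parent, fitted by simple least squares."""
    y = data[child]
    n = len(y)
    my = sum(y) / n
    rss = sum((v - my) ** 2 for v in y)
    k = 2  # intercept and noise variance
    if parent is not None:
        x = data[parent]
        mx = sum(x) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        rss -= sxy ** 2 / sxx  # residual sum of squares after regressing on the parent
        k += 1                 # one extra regression coefficient
    return n * math.log(rss / n) + k * math.log(n)

def graph_bic(data, parent_of):
    """Sum of node terms; parent_of maps a child name to its single parent."""
    return sum(node_bic(data, v, parent_of.get(v)) for v in data)

rng = random.Random(0)
n = 2000
xs = [rng.gauss(0, 1) for _ in range(n)]
data = {"X": xs, "Y": [0.8 * x + rng.gauss(0, 1) for x in xs]}

bic_empty = graph_bic(data, {})
bic_x_to_y = graph_bic(data, {"Y": "X"})
bic_y_to_x = graph_bic(data, {"X": "Y"})
print(bic_empty, bic_x_to_y, bic_y_to_x)
```

The two oriented graphs receive the same score up to floating-point error, and both beat the empty graph. This is why a score-based search on linear-Gaussian data returns an equivalence class rather than a single DAG.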

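Functional causal model (FCM) based: assume $Y = f(X, \varepsilon)$ with noise $\varepsilon$ independent of $X$ ($\varepsilon \perp\!\!\!\perp X$) and $f$ restricted to a suitable class, then exploit the resulting asymmetry between cause and effect to identify causal direction beyond the equivalence class. Examples: LiNGAM (linear, non-Gaussian noise), ANM (additive noise), post-nonlinear (PNL).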
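
The noise asymmetry can be seen in a two-variable toy case. The sketch below assumes a linear model with uniform (non-Gaussian) noise, fits a regression in both directions, and checks whether the residual looks independent of the regressor. The dependence score used here (absolute correlation between squared values) is a crude proxy chosen for brevity, not the independence measure used by LiNGAM or ANM implementations, and all names are illustrative.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def residuals(target, regressor):
    """Residuals of the least-squares fit target ~ a + b * regressor."""
    n = len(target)
    mt, mr = sum(target) / n, sum(regressor) / n
    slope = (sum((r - mr) * (t - mt) for r, t in zip(regressor, target))
             / sum((r - mr) ** 2 for r in regressor))
    return [(t - mt) - slope * (r - mr) for r, t in zip(regressor, target)]

def dependence(regressor, resid):
    """Crude dependence proxy: |corr| between squared regressor and squared residual."""
    return abs(corr([r * r for r in regressor], [e * e for e in resid]))

# True model (assumed for the demo): X -> Y, linear, uniform noise.
rng = random.Random(1)
n = 5000
X = [rng.uniform(-1, 1) for _ in range(n)]
Y = [x + rng.uniform(-1, 1) for x in X]

score_x_to_y = dependence(X, residuals(Y, X))  # residual should look independent of X
score_y_to_x = dependence(Y, residuals(X, Y))  # residual stays dependent on Y
direction = "X -> Y" if score_x_to_y < score_y_to_x else "Y -> X"
print(score_x_to_y, score_y_to_x, direction)
```

Both regressions leave residuals that are uncorrelated with the regressor, so second-order statistics cannot tell the directions apart. The difference shows up only in higher-order dependence, and it would vanish if the noise were Gaussian.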

Comparison of the fundamental methods (Table 1)

Property	PC	FCI	GES	LiNGAM/PNL/ANM
Faithfulness required?	Yes	Yes	Weaker condition	No
Special assumptions on data distribution?	No	No	Yes (often linear-Gaussian or multinomial)	Yes (e.g., non-Gaussian / nonlinear)
Handles confounders?	No	Yes	No	No
Output	Markov equivalence class (CPDAG)	Partial ancestral graph (PAG)	Markov equivalence class (CPDAG)	DAG + causal model (under identifiability)

In the large-sample limit, PC and GES converge to the same Markov equivalence class under nearly identical assumptions.

Practical challenges

Reliable causal discovery must also confront: causality in time series (subsampling/aggregation, Granger causality limits), measurement error, nonstationary/heterogeneous data, selection bias, missing data, and the deterministic case. The paper notes there is no consensus on parameter (penalty) choice and that bootstrapping edge frequencies is a useful diagnostic of stability — though stable output is not necessarily correct.
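
The bootstrap diagnostic mentioned above can be sketched generically: resample rows with replacement, rerun the discovery procedure, and record how often each edge appears. In the sketch below `discover_edges` is a hypothetical stand-in (a simple correlation threshold) for whatever real algorithm is being assessed; the data, the threshold, and the function names are assumptions for illustration.

```python
import math
import random
from collections import Counter

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def discover_edges(data, threshold=0.2):
    """Hypothetical stand-in for a discovery algorithm: returns the undirected
    pairs whose absolute correlation exceeds the threshold."""
    names = sorted(data)
    return {(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(corr(data[a], data[b])) > threshold}

def bootstrap_edge_frequencies(data, discover, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each edge is returned."""
    rng = random.Random(seed)
    n = len(next(iter(data.values())))
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        sample = {k: [v[i] for i in idx] for k, v in data.items()}
        counts.update(discover(sample))
    return {edge: c / n_boot for edge, c in counts.items()}

rng = random.Random(42)
n = 500
a = [rng.gauss(0, 1) for _ in range(n)]
data = {"A": a,
        "B": [0.8 * x + rng.gauss(0, 1) for x in a],  # depends on A
        "C": [rng.gauss(0, 1) for _ in range(n)]}     # independent of A and B

freqs = bootstrap_edge_frequencies(data, discover_edges)
print(freqs)
```

A high frequency says the edge is insensitive to resampling, not that it is causal: a systematic bias in the algorithm or the data would be reproduced in every resample.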

Examples

The authors illustrate biological applications: recovering the protein-signaling network of Sachs et al. (2005) with FASK, and ranking Arabidopsis flowering-time genes with a PC + IDA “Causal Stability” pipeline (top 25 genes contained 5 known causes plus 4 newly confirmed novel causes).
Contrast with NOTEARS - Overview: NOTEARS recasts DAG learning as a smooth continuous optimization problem with an algebraic acyclicity constraint, whereas the constraint-based and score-based methods here are combinatorial (CI-test search or greedy score search). Both target the same underlying structure-learning problem (vault gap #9).
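
For reference, the acyclicity constraint used by NOTEARS is $h(W) = \mathrm{tr}(e^{W \circ W}) - d = 0$, where $W$ is a $d \times d$ weighted adjacency matrix and $\circ$ is the elementwise product. The sketch below evaluates it with a truncated power series in plain Python; the function name, the truncation length, and the example matrices are illustrative assumptions, and a real implementation would use a library matrix exponential.

```python
def notears_h(W, terms=30):
    """h(W) = tr(exp(W * W)) - d with an elementwise square, via a truncated
    power series. It is zero when W has no directed cycles and positive otherwise."""
    d = len(W)
    A = [[w * w for w in row] for row in W]  # elementwise square, nonnegative
    power = [[float(i == j) for j in range(d)] for i in range(d)]  # A^0 / 0!
    trace = float(d)
    for k in range(1, terms):
        # power becomes A^k / k!
        power = [[sum(power[i][m] * A[m][j] for m in range(d)) / k
                  for j in range(d)] for i in range(d)]
        trace += sum(power[i][i] for i in range(d))
    return trace - d

dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -2.0],
       [0.0, 0.0, 0.0]]   # 0 -> 1 -> 2, acyclic
cyclic = [[0.0, 1.0],
          [1.0, 0.0]]     # 0 -> 1 -> 0, a 2-cycle

print(notears_h(dag))     # 0.0
print(notears_h(cyclic))
```

Because $h$ is smooth in $W$, the combinatorial "is this graph a DAG?" check becomes an equality constraint that a gradient-based solver can handle.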

Connections

Builds on Directed Acyclic Graphs and Summary Causal DAGs (the DAG / d-separation machinery).
Foundational assumptions detailed in Markov and Faithfulness Assumptions.
Method deep-dives: PC Algorithm and Constraint-Based Discovery, GES and Score-Based Discovery, Functional Causal Models (LiNGAM, ANM).
Continuous-optimization alternative: NOTEARS - Overview.
Relation to expert/knowledge-driven structure building: BN Construction Methods Comparison.
