Functional Causal Models (LiNGAM, ANM)

Summary

Constraint-based and score-based methods can only recover a Markov equivalence class: they cannot orient an edge between two variables when no conditioning set distinguishes the directions. Functional causal model (FCM) methods break this tie by modeling each effect as $Y = f(X, E)$ with noise $E$ independent of the cause $X$, and exploiting a noise asymmetry: the independence holds for the true direction but is violated for the reverse. This identifies causal direction beyond the equivalence class. Key models: LiNGAM (linear, non-Gaussian), ANM (nonlinear additive noise), and PNL (post-nonlinear, the most general).

Overview

Recall the limitation: with only two variables and no usable conditional-independence relation, CI-based discovery gives no direction at all. FCM methods add assumptions about the functional form and noise so that the data distribution carries a footprint of direction. The general FCM:

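$$Y = f(X, E), \qquad E \perp\!\!\!\perp X,$$

where $f$ lies in a constrained function class and the map $(X, E) \mapsto (X, Y)$ is invertible, so $E$ can be recovered from the observed $X$ and $Y$. The decisive property: the independence between estimated noise and the hypothesized cause holds for only one direction (under the model’s identifiability conditions), so one fits the FCM both ways, tests whether the estimated noise is independent of the hypothesized cause (for example $\hat{E} \perp\!\!\!\perp X$ for $X \to Y$), and picks the direction giving independence.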
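A short worked step may make the role of invertibility concrete. It considers an FCM $Y = f(X, E)$ with $E$ independent of $X$; the symbol $g$ for the inverse map is notation introduced here for illustration, not taken from the source. If $E = g(X, Y)$ recovers the noise, a change of variables gives

$$p_{X,Y}(x, y) = p_X(x)\, p_E\big(g(x, y)\big)\, \left|\frac{\partial g(x, y)}{\partial y}\right|.$$

For additive noise, $Y = f(X) + E$, the inverse is $g(x, y) = y - f(x)$ with $\partial g / \partial y = 1$, so

$$p_{X,Y}(x, y) = p_X(x)\, p_E\big(y - f(x)\big).$$

Direction is identifiable when the joint density admits such a factorization, within the assumed function class, for one ordering of the variables but not the other.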

Main Content

The independence-of-noise test for direction

Given two variables $X$ and $Y$ believed directly causally related with no confounder, fit the FCM for both directions $X \to Y$ and $Y \to X$; for each, test independence between the estimated noise and the hypothesized cause. The direction yielding an independent noise term is the plausible causal direction. Without extra assumptions on $f$, an independent-noise representation exists for both directions (Hyvärinen-Pajunen; Zhang et al.), so a constrained function class is essential.
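The fit-both-ways procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation. The function class is restricted to straight lines. Independence is checked with a permutation test on a biased HSIC statistic (Gaussian kernel, median-heuristic bandwidth). HSIC is chosen here as a common kernel independence measure; the source does not prescribe a test. The data are simulated with uniform noise and ground truth X -> Y. The names gram, hsic_pvalue, residual and direction_pvalues are hypothetical helpers.

```python
import numpy as np

def gram(v):
    """Gaussian-kernel Gram matrix with a median-heuristic bandwidth."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / np.median(d2[d2 > 0]))

def hsic_pvalue(a, b, n_perm=200, seed=0):
    """Permutation p-value for H0: a and b are independent (biased HSIC)."""
    rng = np.random.default_rng(seed)
    n = len(a)
    H = np.eye(n) - 1.0 / n                  # centring matrix I - 11^T / n
    Kc, L = H @ gram(a) @ H, gram(b)
    stat = np.sum(Kc * L) / n ** 2           # equals trace(K H L H) / n^2
    null = np.empty(n_perm)
    for i in range(n_perm):
        p = rng.permutation(n)               # shuffling b breaks any dependence
        null[i] = np.sum(Kc * L[np.ix_(p, p)]) / n ** 2
    return (1 + np.sum(null >= stat)) / (1 + n_perm)

def residual(cause, effect):
    """Noise estimate from a least-squares line effect ~ cause."""
    slope, intercept = np.polyfit(cause, effect, 1)
    return effect - (slope * cause + intercept)

def direction_pvalues(x, y):
    """p-value of the noise-independence test for each hypothesized direction."""
    return {"X->Y": hsic_pvalue(residual(x, y), x),
            "Y->X": hsic_pvalue(residual(y, x), y)}

# Simulated pair (assumption): X -> Y, linear, uniform noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 400)
y = x + rng.uniform(-1, 1, 400)
pvals = direction_pvalues(x, y)
print(pvals)  # independence should be rejected only for the reverse direction
```

The direction whose p-value does not lead to rejection is taken as plausible. If both or neither are rejected, the assumed function class or the no-confounder assumption is probably wrong for the data, and no direction should be reported.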

LiNGAM — Linear Non-Gaussian Acyclic Model

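Two-variable form: $Y = bX + E$ with $E \perp\!\!\!\perp X$. In matrix form $\mathbf{X} = \mathbf{B}\mathbf{X} + \mathbf{E}$, where $\mathbf{B}$ can be permuted (rows and columns together) to strictly lower-triangular form (acyclicity) and $\mathbf{E}$ has independent components; equivalently $\mathbf{E} = (\mathbf{I} - \mathbf{B})\mathbf{X}$.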

  • Identifiability (Darmois–Skitovich / ICA). If at most one of $X$ and $E$ is Gaussian, the causal direction is identifiable. Pure linear-Gaussian is the atypical case where the asymmetry vanishes (regression residuals are independent of the predictor in both directions).
  • Estimation. ICA-LiNGAM applies ICA, then permutes and rescales to recover $\mathbf{B}$. As the number of variables grows, ICA may hit local optima; remedies impose sparsity on $\mathbf{B}$, or use DirectLiNGAM (recursive regression + independence tests for ordering), or the Two-Step method. Overcomplete ICA can even estimate latent confounders in the linear case.
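The matrix form can be made concrete with a small simulation. The sketch below assumes a known, hypothetical 3-variable coefficient matrix B. It illustrates only the algebra (generating data, recovering the noise as (I - B)X, and finding the permutation that makes B strictly lower-triangular). It does not estimate B from data, which is the job of ICA-LiNGAM or DirectLiNGAM. The helper causal_order is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical model, variables deliberately listed out of causal order:
# X2 -> X0, X2 -> X1, X0 -> X1.  B[i, j] is the direct effect of X_j on X_i.
B = np.array([[0.0, 0.0, 0.8],
              [1.5, 0.0, -0.5],
              [0.0, 0.0, 0.0]])
E = rng.laplace(size=(3, n))               # independent non-Gaussian noise
X = np.linalg.solve(np.eye(3) - B, E)      # X = B X + E  <=>  X = (I - B)^{-1} E

def causal_order(B, tol=1e-12):
    """Ordering that makes B strictly lower-triangular; raises if B is cyclic."""
    remaining = list(range(B.shape[0]))
    order = []
    while remaining:
        # A variable with no parents among the remaining ones can come next.
        roots = [i for i in remaining
                 if all(abs(B[i, j]) <= tol for j in remaining)]
        if not roots:
            raise ValueError("B contains a cycle")
        order.append(roots[0])
        remaining.remove(roots[0])
    return order

order = causal_order(B)
B_perm = B[np.ix_(order, order)]           # permute rows and columns together
E_rec = (np.eye(3) - B) @ X                # noise recovered from the data

print(order)
print(B_perm)
```

In the permuted matrix every nonzero entry sits below the diagonal, so each variable depends only on variables earlier in the ordering.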

ANM — nonlinear Additive Noise Model

$Y = f(X) + E$, with $f$ a (generally nonlinear) function and $E$ additive noise independent of $X$. Nonlinearity itself becomes a source of identifiability: for a nonlinear $f$ the additive-noise model typically holds in only one direction.
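A minimal ANM sketch follows, with several assumptions made for illustration. The mechanism is a hypothetical f(x) = x^3 with Gaussian noise. The regression uses a degree-5 polynomial as a stand-in for a flexible nonparametric regressor. Dependence is scored with a biased HSIC statistic, and smaller means closer to independence. The names hsic and anm_score are hypothetical helpers.

```python
import numpy as np

def hsic(a, b):
    """Biased HSIC with Gaussian kernels (median-heuristic bandwidth)."""
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d2 / np.median(d2[d2 > 0]))
    n = len(a)
    H = np.eye(n) - 1.0 / n                      # centring matrix
    return np.sum((H @ gram(a) @ H) * gram(b)) / n ** 2

def anm_score(cause, effect, degree=5):
    """Fit effect = f(cause) + noise with polynomial f; score HSIC(residual, cause)."""
    # Standardizing keeps the polynomial fit well conditioned and does not
    # change whether an additive-noise model holds.
    c = (cause - cause.mean()) / cause.std()
    e = (effect - effect.mean()) / effect.std()
    coeffs = np.polyfit(c, e, degree)
    resid = e - np.polyval(coeffs, c)
    return hsic(resid, c)

# Simulated pair (assumption): X -> Y with f(x) = x^3 and Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 500)
y = x ** 3 + rng.normal(0.0, 1.0, 500)

scores = {"X->Y": anm_score(x, y), "Y->X": anm_score(y, x)}
print(scores)  # the lower score marks the more plausible direction
```

Comparing raw scores is a shortcut. A fuller treatment would turn each score into a p-value and decline to decide when neither direction passes.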

PNL — Post-Nonlinear model (most general)

$Y = f_2(f_1(X) + E)$, with $f_1$ nonlinear, $E$ independent noise, and $f_2$ an invertible post-nonlinear distortion (modeling sensor/measurement nonlinearity). PNL subsumes LiNGAM and ANM as special cases and is identifiable in the generic case except for five specified situations (Zhang & Hyvärinen, 2009b), the most notable non-identifiable case being the linear-Gaussian one.
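The structure of the PNL model, and the way it contains the other two, can be sketched in a few lines. The particular choices of f1 (a cubic) and f2 (a tanh saturation) are hypothetical. Both functions are treated as known here. In practice they must be estimated from data, which this sketch does not attempt.

```python
import numpy as np

def pnl(x, e, f1, f2):
    """Post-nonlinear model: Y = f2(f1(X) + E)."""
    return f2(f1(x) + e)

# Hypothetical choices, for illustration only.
f1 = lambda x: x ** 3 + x                  # nonlinear effect of the cause
f2 = lambda z: np.tanh(z / 5.0)            # invertible sensor-style saturation
f2_inv = lambda y: 5.0 * np.arctanh(y)
identity = lambda z: z

rng = np.random.default_rng(0)
x = rng.uniform(-1.5, 1.5, 1000)
e = rng.laplace(scale=0.5, size=1000)

y = pnl(x, e, f1, f2)

# Because f2 is invertible, the noise is recoverable from (X, Y).
e_rec = f2_inv(y) - f1(x)

# Special cases: f2 = identity gives ANM; f1 linear as well gives LiNGAM.
y_anm = pnl(x, e, f1, identity)
y_lingam = pnl(x, e, lambda v: 0.8 * v, identity)

print(np.max(np.abs(e_rec - e)))           # recovery error, numerically tiny
```

Undoing f2 first and then subtracting f1(x) is what makes the noise-independence test applicable to PNL. A non-invertible distortion would leave the noise unrecoverable.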

Why FCMs beat the equivalence class — and their cost

FCMs add assumptions on the data distribution / functional form that CI relations do not, so they can output a fully oriented DAG (Table 1) under their identifiability conditions — orienting edges PC/GES must leave undirected. The tradeoffs: (1) they assume no confounder between the pair; (2) results can be misleading if the assumed function class is too restrictive to approximate the true mechanism; (3) nonlinear FCMs are less computationally efficient than the linear case; (4) discretization of continuous data tends to destroy the asymmetry, making discrete-case direction hard. A common hybrid: estimate the MEC with CI tests (even kernel-based nonparametric ones), then apply FCMs to orient the remaining undirected edges.

Examples

  • Figure 3 (linear case). 1,000 points of $(X, Y)$ generated from a linear model with $X \to Y$. Regressing $Y$ on $X$ vs. $X$ on $Y$: when $X$ and $E$ are Gaussian (case 1), residuals look independent both ways, so the direction is not identifiable. When $X$ and $E$ are uniform (case 2) or super-Gaussian/Laplace (case 3), the regression residual is independent of the predictor only in the correct (causal) direction, exposing the asymmetry.
  • Cramér’s decomposition (theoretical backing): a sum of independent real variables is Gaussian only if every summand is Gaussian — so non-Gaussian noise is “generic,” supporting LiNGAM’s applicability while warning that near-Gaussian errors make direction hard to call.
  • Biology (FASK, Two-Step). These procedures get an initial skeleton from adjacency search, then use non-Gaussian signal features to direct edges (FASK allows cycles/2-cycles), recovering much of the Sachs protein-signaling network.
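The three cases of the Figure 3 example can be imitated in simulation. The sketch below is an approximation of that setup rather than a reproduction: the unit coefficient, the seeds and the Laplace choice for the super-Gaussian case are assumptions. Dependence is summarized by the absolute correlation between squared residual and squared predictor. This is a crude moment-based proxy chosen for brevity, not a full independence test. The names sq_corr, residual, simulate and cases are hypothetical.

```python
import numpy as np

def sq_corr(resid, predictor):
    """|corr(resid^2, centred predictor^2)|: near 0 when the two look independent."""
    p = predictor - predictor.mean()
    return abs(np.corrcoef(resid ** 2, p ** 2)[0, 1])

def residual(pred, target):
    """Residual of the least-squares line target ~ pred."""
    slope, intercept = np.polyfit(pred, target, 1)
    return target - (slope * pred + intercept)

def simulate(sampler, n=1000, seed=0):
    """Draw X and E from the same family, set Y = X + E, regress both ways."""
    rng = np.random.default_rng(seed)
    x, e = sampler(rng, n), sampler(rng, n)
    y = x + e                               # ground truth X -> Y (assumed coefficient 1)
    return {"Y on X": sq_corr(residual(x, y), x),
            "X on Y": sq_corr(residual(y, x), y)}

cases = {
    "case 1 (Gaussian)": lambda rng, n: rng.standard_normal(n),
    "case 2 (uniform)": lambda rng, n: rng.uniform(-1, 1, n),
    "case 3 (Laplace)": lambda rng, n: rng.laplace(size=n),
}

for name, sampler in cases.items():
    print(name, simulate(sampler))
```

In the Gaussian case both entries should be close to zero. In the other two, only the "X on Y" entry should be clearly nonzero. That entry shrinks as the distributions approach Gaussian, which is the practical side of the near-Gaussian warning above.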

Connections

See Also