NOTEARS - Overview

Summary

NOTEARS (Zheng, Aragam, Ravikumar & Xing, 2018) reformulates the NP-hard problem of learning the structure of a directed acyclic graph (DAG / Bayesian network) from a combinatorial search over graphs into a purely continuous optimization problem over real matrices. The key device is a smooth, exact algebraic characterization of acyclicity — — which replaces the discrete acyclicity constraint with one equality constraint. The resulting program is solved with off-the-shelf numerical solvers (augmented Lagrangian + L-BFGS) in ~50 lines of Python, and matches or beats specialized state-of-the-art structure-learning algorithms.

Overview

Learning DAGs from data is NP-hard ([Chickering, 1996; Chickering et al., 2004]), mainly because of the acyclicity constraint: the space of DAGs on nodes is combinatorial and grows superexponentially in . Essentially all prior score-based methods cope with this by local heuristics — order search, greedy edge addition, coordinate descent — that add/remove one edge at a time and check acyclicity incrementally, often requiring structural assumptions (bounded in-degree or treewidth) that are impossible to verify in real data.

NOTEARS makes a fundamentally different move. It converts the combinatorial program (left) into a continuous one (right):

Here is the weighted adjacency matrix of a linear structural equation model (SEM), is a (regularized least-squares) score, and is the smooth acyclicity function. Because the two programs are equivalent, NOTEARS eliminates the need for any graph-specialized search machinery — it just runs a numerical solver over .

Main Content

The four contributions

  1. Smooth acyclicity function with computable derivatives over , replacing the combinatorial constraint with a smooth equality constraint. See Smooth Characterization of Acyclicity.
  2. An equality-constrained program (ECP) that jointly estimates the structure and parameters of a sparse DAG from (possibly high-dimensional) data, solvable by standard numerical solvers to stationarity. See NOTEARS Algorithm.
  3. Empirical effectiveness against state-of-the-art baselines (FGS, PC, GES, LiNGAM). See NOTEARS Experiments.
  4. Comparison to the exact global minimizer (GOBNILP): NOTEARS attains scores close to global optimality in practice despite only being guaranteed to find stationary points.

Why the name

NOTEARS = Non-combinatorial Optimization via Trace Exponential

and Augmented lagRangian for Structure learning. The method is deliberately simple — implementable in ~50 lines of Python — and requires no background in graphical models. Code: https://github.com/xunzheng/notears.

Intellectual lineage / analogy

The authors frame NOTEARS by analogy to undirected graphical models: there, recasting structure learning as a continuous log-det convex program ([Banerjee et al., 2008]) sparked rapid progress. DAG learning had never benefited similarly because of the acyclicity constraint. NOTEARS supplies the missing “closed-form, continuous program” for the directed case — though unlike the undirected case the resulting program is nonconvex.

Connections

  • Generalizes the modeling target: operates on the continuous space rather than the discrete DAG space — see DAG Structure Learning Problem.
  • Contrast with local search: prior methods (FGS, GES, hill-climbing, MMHC) do local edge-at-a-time updates; NOTEARS does global updates of the entire matrix each step.
  • Relates to linear SEM: the weighted adjacency matrix is the coefficient matrix of a linear SEM — see Confirmatory Factor Analysis and SEM for the Bayesian-SEM treatment of structural equation models, and Spurious Association and Confounds for DAG semantics in causal inference.

See Also