Smooth Characterization of Acyclicity
Summary
The central technical contribution of NOTEARS. It constructs a function whose zero level set exactly characterizes acyclic graphs and which is smooth with an easily computed gradient:
The construction proceeds in two steps — first for binary matrices via the matrix exponential (Prop. 2), then extended to real weighted matrices via the Hadamard product (Theorem 1). This replaces the combinatorial DAG constraint with a single smooth equality constraint, making program (3) amenable to black-box optimization.
Overview
To make program (3) (see DAG Structure Learning Problem) solvable by standard optimizers, NOTEARS replaces the combinatorial constraint with one smooth equality constraint . The function must satisfy four desiderata:
Desiderata for the acyclicity function (NOTEARS §3)
(a) if and only if is acyclic (); (b) the values of quantify the “DAG-ness” of the graph (how far from acyclic); (c) is smooth; (d) and its derivatives are easy to compute.
Property (b) is useful for diagnostics. Naïve “distance to ” notions (e.g. min distance, or sum of edge weights on cyclic paths) violate (c) [non-smooth] or (d) [hard].
The build-up: Prop. 1 (infinite series, conceptually clean but needs spectral radius ) → Prop. 2 (matrix exponential, numerically stable, but binary/discrete domain) → Theorem 1 (real matrices, satisfies all of (a)–(d)).
Main Content
Step 1 — Binary adjacency matrices
Proposition 1: Infinite-series characterization (NOTEARS §3.1, Eq. 5)
Suppose and its spectral radius . Then is a DAG if and only if
Proof. The fact that counts the number of length- closed walks in the directed graph. An acyclic graph has for all . So has no cycles iff . Then since the Neumann series converges:
The result follows.
Why Prop. 1 is impractical
The condition is strong — automatically true for DAGs but false in general, and projecting onto it is nontrivial. Using a finite series avoids the spectral-radius condition but is numerically unstable: entries of can exceed machine precision for even small . A characterization that is both universally valid and numerically stable is needed.
Proposition 2: Matrix-exponential characterization (NOTEARS §3.1, Eq. 6)
A binary matrix is a DAG if and only if
Proof. As in Prop. 1, has no cycles iff for all and all , which holds iff .
Why the matrix exponential is the right tool
The matrix exponential is well-defined for all square matrices (everywhere convergent). The added bonus over : the factor re-weights the count of length- closed walks, taming the rapid (ill-conditioned) growth of as the number of edges grows with . Still, Prop. 2 is defined on a discrete domain , so it is not yet smooth.
Step 2 — Real weighted matrices (the main result)
The characterization fails for an arbitrary real , but holds for any nonnegative weighted matrix (the same proof goes through). To handle both signs, replace by the Hadamard product (entrywise ):
Theorem 1: Smooth acyclicity characterization (NOTEARS §3.2, Eqs. 7-8)
A matrix is a DAG if and only if
where is the Hadamard (entrywise) product and is the matrix exponential. Moreover, has the simple gradient
and satisfies all desiderata (a)–(d).
Proof idea & interpretation
The proof of (7) mirrors (6); desiderata (c)–(d) follow from the gradient (8). For (b): the power series of counts closed walks in ; replacing with counts weighted closed walks where each edge has weight . Thus larger means either (a) has more cycles than , or (b) the cycles in are more heavily weighted. Also for all (every series term is nonnegative), so the space of DAGs is exactly the set of global minima of .
Computational and theoretical takeaways
- Cost: evaluating and only requires a matrix exponential, computable in via well-studied scaling-and-squaring algorithms ([Al-Mohy & Higham, 2009]), available in any scientific-computing library. This cost is the main computational bottleneck of NOTEARS.
- Stationarity caveat: because is nonconvex, is not equivalent to the first-order condition — so a black-box solver finds stationary points, not guaranteed global minima.
- Novelty: the link between traces of matrix powers and graph cycles is classical ([Harary & Manvel, 1971]), but the authors note this smooth characterization of acyclicity had not appeared in the DAG-learning literature before.
Examples
Why and not (sign cancellation)
Setup. Consider a 2-cycle with weights , .
Problem with raw . A length-2 closed walk contributes to . Negative and positive cycle contributions can cancel, so could hold even when a cycle exists — the characterization breaks for signed matrices.
Fix. Using the same walk contributes . All cyclic contributions are strictly positive, so exactly when a cycle is present and exactly when acyclic.
Interpretation. Squaring the entries makes the “weighted closed-walk count” a faithful, sign- insensitive cycle detector — recovering desideratum (a) over all of .
Connections
- Enables the optimization: with smooth, the DAG constraint becomes the equality-constrained program (ECP) solved in NOTEARS Algorithm.
- Quantifies DAG-ness: property (b) underwrites the thresholding step in the algorithm — near-machine-precision leaves only tiny cycle-inducing edges, removable by a small threshold.
- Foundational influence: this trace-exponential trick seeded a family of later “differentiable DAG learning” methods (DAG-GNN, GraN-DAG, GOLEM) — see NOTEARS - Overview.
See Also
- DAG Structure Learning Problem — the constraint this note smooths
- NOTEARS Algorithm — uses and inside an augmented Lagrangian
- NOTEARS - Overview — paper-level context