Do-Calculus in Summary Causal DAGs: Soundness, Completeness, and ATE Computation

Summary

Section 6 proves that Pearl’s do-calculus rules remain sound (Theorem 6.1) and complete (Theorem 6.2) when applied to summary causal DAGs. Causal effect identification, including ATE computation, can therefore be performed directly on the summary DAG rather than on the original DAG. The proof relies on the equivalence between the recursive basis (RB) of the summary DAG and that of its canonical DAG (Theorem 4.1). ATE computation on summary DAGs uses the same backdoor/do-calculus machinery as on original DAGs, with adjustment sets derived from s-separation.

Overview

The fundamental utility of causal DAGs is enabling causal effect identification via do-calculus. If summarization invalidated do-calculus, summary DAGs would be useless for inference. Section 6 shows this does not happen: the do-calculus rules remain sound and complete for summary causal DAGs, making them directly usable for causal inference.

Main Content

Background: Do-Calculus

Pearl’s do-calculus consists of three rules for manipulating interventional distributions $P (\cdot ∣ do (\cdot))$. These rules allow deriving interventional distributions from observational data when identification is possible.
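For reference, the three rules in their standard form are given here for disjoint node sets $X, Y, Z, W$ of a causal DAG $G$. The notation is an assumption of this statement rather than something taken from the note: $G_{\overline{X}}$ is $G$ with all edges into $X$ removed, $G_{\underline{Z}}$ is $G$ with all edges out of $Z$ removed, and lowercase letters denote values of the corresponding sets.

Rule 1 (insertion/deletion of observations):
$P (y ∣ do (x), z, w) = P (y ∣ do (x), w) \quad \text{if } (Y \perp Z ∣ X, W) \text{ in } G_{\overline{X}}$

Rule 2 (action/observation exchange):
$P (y ∣ do (x), do (z), w) = P (y ∣ do (x), z, w) \quad \text{if } (Y \perp Z ∣ X, W) \text{ in } G_{\overline{X} \underline{Z}}$

Rule 3 (insertion/deletion of actions):
$P (y ∣ do (x), do (z), w) = P (y ∣ do (x), w) \quad \text{if } (Y \perp Z ∣ X, W) \text{ in } G_{\overline{X} \overline{Z (W)}}$

In Rule 3, $Z (W)$ is the set of nodes in $Z$ that are not ancestors of any node in $W$ in $G_{\overline{X}}$. Each side condition is a d-separation statement in a manipulated graph, which is why results about separation in summary DAGs carry over to do-calculus.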

ATE via Do-Calculus

The Average Treatment Effect (ATE) of treatment $T$ on outcome $O$ is:
$ATE (T, O) = E [O ∣ do (T = 1)] - E [O ∣ do (T = 0)]$
To estimate ATE from observational data, one must control for confounding variables — variables that affect both $T$ and $O$ — using the backdoor criterion or do-calculus.

The backdoor criterion requires finding a set $Z$ of variables that blocks all backdoor paths from $T$ to $O$ (paths that start with an edge into $T$). If such a $Z$ exists:
$P (O ∣ do (T)) = \sum_{Z} P (O ∣ T, Z) P (Z)$
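The adjustment formula can be checked on a toy example. The sketch below assumes a single binary confounder $Z$ with edges $Z \to T$, $Z \to O$, $T \to O$; all probabilities and helper names (`joint`, `prob`, `backdoor_expectation`, `naive_expectation`) are illustrative inventions, not values or functions from the source.

```python
from itertools import product

# Hypothetical model: Z -> T, Z -> O, T -> O, all variables binary.
p_z = {0: 0.6, 1: 0.4}                       # P(Z = z)
p_t1_given_z = {0: 0.2, 1: 0.7}              # P(T = 1 | Z = z)
p_o1_given_tz = {(0, 0): 0.1, (1, 0): 0.3,   # P(O = 1 | T = t, Z = z), keyed by (t, z)
                 (0, 1): 0.5, (1, 1): 0.7}


def joint(z, t, o):
    """Observational joint P(Z = z, T = t, O = o)."""
    pt = p_t1_given_z[z] if t == 1 else 1 - p_t1_given_z[z]
    po = p_o1_given_tz[(t, z)] if o == 1 else 1 - p_o1_given_tz[(t, z)]
    return p_z[z] * pt * po


def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(t=1, z=0)."""
    total = 0.0
    for z, t, o in product((0, 1), repeat=3):
        row = {"z": z, "t": t, "o": o}
        if all(row[name] == value for name, value in fixed.items()):
            total += joint(z, t, o)
    return total


def backdoor_expectation(t):
    """E[O | do(T = t)] = sum_z P(O = 1 | T = t, Z = z) P(Z = z) for binary O."""
    return sum(
        prob(o=1, t=t, z=z) / prob(t=t, z=z) * prob(z=z)
        for z in (0, 1)
    )


def naive_expectation(t):
    """E[O | T = t] with no adjustment, confounded by Z."""
    return prob(o=1, t=t) / prob(t=t)


ate_adjusted = backdoor_expectation(1) - backdoor_expectation(0)
ate_naive = naive_expectation(1) - naive_expectation(0)
print(f"adjusted ATE = {ate_adjusted:.2f}, naive difference = {ate_naive:.2f}")
```

In this toy model the unadjusted difference is twice the adjusted ATE, because $Z$ raises both the probability of treatment and the outcome.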

6.1 Do-Calculus Soundness in Summary Causal DAGs

Theorem 6.1 — Soundness of Do-Calculus for Summary Causal DAGs

Let $(H, f)$ be a summary causal DAG for $G$, and let $G_{H}$ be the corresponding canonical causal DAG. Let $P (\cdot ∣ do (\cdot))$ denote the interventional distribution compatible with the summary causal DAG $(H, f)$.

Pearl’s do-calculus rules are sound for summary causal DAGs: any causal relationship derived by applying the three do-calculus rules to $(H, f)$ is valid — i.e., it holds in the interventional distribution compatible with $(H, f)$ .

Proof sketch: The do-calculus rules are defined in terms of d-separation in the manipulated graph. By s-separation (Theorem 4.2), all CIs derived from $(H, f)$ hold in every compatible DAG $G \in \{G_{i}\}_{H}$. Therefore, applying do-calculus to $(H, f)$ produces results that are valid across all compatible DAGs.
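To make "d-separation in the manipulated graph" concrete, here is a minimal sketch on an ordinary three-node DAG. It uses the standard moralized-ancestral-graph test for d-separation; it does not implement s-separation, and the graph and function names (`d_separated`, `remove_incoming`, `remove_outgoing`) are hypothetical choices for illustration.

```python
def ancestors(edges, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(edges, xs, ys, zs):
    """True if xs and ys are d-separated given zs in the DAG `edges`."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    # 1. Restrict to the ancestral subgraph of xs, ys and zs.
    keep = ancestors(edges, xs | ys | zs)
    sub = [(u, v) for u, v in edges if u in keep and v in keep]
    # 2. Moralize: drop directions and connect parents that share a child.
    nbrs = {n: set() for n in keep}
    parents = {}
    for u, v in sub:
        nbrs[u].add(v)
        nbrs[v].add(u)
        parents.setdefault(v, set()).add(u)
    for ps in parents.values():
        for a in ps:
            nbrs[a] |= ps - {a}
    # 3. xs and ys are d-separated iff no undirected path avoids zs.
    seen = xs - zs
    stack = list(seen)
    while stack:
        n = stack.pop()
        if n in ys:
            return False
        for m in nbrs[n] - zs - seen:
            seen.add(m)
            stack.append(m)
    return True


def remove_incoming(edges, targets):
    """Manipulated graph with all edges into `targets` removed."""
    return [(u, v) for u, v in edges if v not in targets]


def remove_outgoing(edges, sources):
    """Manipulated graph with all edges out of `sources` removed."""
    return [(u, v) for u, v in edges if u not in sources]


# Hypothetical DAG: Z confounds treatment T and outcome O.
dag = [("Z", "T"), ("Z", "O"), ("T", "O")]

# Backdoor check: T and O must be d-separated by Z once edges out of T are cut.
backdoor_graph = remove_outgoing(dag, {"T"})
z_blocks_backdoor = d_separated(backdoor_graph, {"T"}, {"O"}, {"Z"})
empty_set_blocks_backdoor = d_separated(backdoor_graph, {"T"}, {"O"}, set())

# Intervention graph: cutting edges into T makes T independent of Z.
do_graph = remove_incoming(dag, {"T"})
t_indep_z_under_do = d_separated(do_graph, {"T"}, {"Z"}, set())
```

Here `z_blocks_backdoor` is true and `empty_set_blocks_backdoor` is false, matching the intuition that $Z$ must be adjusted for. The soundness theorem says that the analogous separation checks on $(H, f)$ license the same rule applications in every compatible DAG.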

6.2 Do-Calculus Completeness in Summary Causal DAGs

Theorem 6.2 — Completeness of Do-Calculus for Summary Causal DAGs

Let $(H, f)$ be a summary causal DAG for $G$, and let $X, Y \subseteq V (H)$ be disjoint sets. For node sets $Z$ and $W$, let $Z (W)$ be the set of nodes in $Z$ that are not ancestors of any node in $W$, as in Rule 3 of do-calculus.

If $Y$ is d-connected to $Z$ in $(H, f)$ given $f^{- 1} (Z)$ in $G_{H}^{\overline{f (X)}}$ (the manipulated graph with edges into $f (X)$ removed), then there exists a causal DAG $G^{*} \in \{G_{i}\}_{H}$ compatible with $(H, f)$ such that $f (Y)$ is d-connected to $f (Z)$ in $G^{* \overline{X}}$. More formally:
$P (Y ∣ do (X)) \text{ is identifiable from } (H, f) \iff P (f (Y) ∣ do (f (X))) \text{ is identifiable from } G_{H}$
Significance: Do-calculus is also complete for summary DAGs — any identifiable causal effect in the summary DAG is also identifiable in the canonical causal DAG (and vice versa). This means the summary DAG does not lose identifiability relative to the canonical representation.

Proof sketch: Relies on the equivalence between the RB of $H$ and $G_{H}$ (Theorem 4.1) and the Shpitser-Pearl completeness of do-calculus for DAGs.

ATE Computation on Summary Causal DAGs

Practical causal inference using a summary DAG follows the same steps as on the original DAG:

ATE from Summary DAG (REDSHIFT Example)

Original DAG: 12 nodes, 23 edges (GPT-4 constructed, with 21 correct + 1 inverted + 1 missed edge).

Query: What is the causal effect of Query Template on Elapsed Time?

Using CaGReS summary ($k = 5$):

Compute the summary DAG with CaGReS — 5 cluster nodes, ~9 edges.

Identify the treatment cluster containing Query Template and outcome cluster containing Elapsed Time.

Find the adjustment set $Z$ using the backdoor criterion on the summary DAG (via s-separation).

Estimate the ATE by adjusting for $Z$: $ATE = \sum_{Z} (E [O ∣ T = 1, Z] - E [O ∣ T = 0, Z]) P (Z)$.
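The steps above can be sketched on a toy summary DAG. Everything in this example is hypothetical: the cluster map `f`, the cluster names, the summary edges, and the nodes Table Size and Concurrency are invented for illustration and are not the REDSHIFT DAG or CaGReS output. Using the parents of the treatment cluster as the adjustment set is one simple choice that satisfies the backdoor criterion in a fully observed DAG; it is an assumption of this sketch, not the source's procedure.

```python
# Hypothetical cluster map f: original node -> summary cluster.
f = {
    "Query Template": "C_template",
    "Table Size": "C_workload",       # invented node
    "Concurrency": "C_workload",      # invented node
    "Plan Time": "C_monitor",
    "Lock Wait Time": "C_monitor",
    "Elapsed Time": "C_outcome",
}

# Hypothetical summary-DAG edges between clusters.
summary_edges = [
    ("C_workload", "C_template"),
    ("C_workload", "C_outcome"),
    ("C_template", "C_monitor"),
    ("C_monitor", "C_outcome"),
    ("C_template", "C_outcome"),
]


def members(cluster):
    """Original nodes that f maps to `cluster`, in sorted order."""
    return sorted(node for node, c in f.items() if c == cluster)


def parent_clusters(edges, cluster):
    """Clusters with an edge into `cluster` in the summary DAG."""
    return {u for u, v in edges if v == cluster}


def is_internal(u, v):
    """True if the original edge u -> v falls inside a single cluster."""
    return f[u] == f[v]


def plan_adjustment(edges, treatment, outcome):
    """Locate treatment/outcome clusters and pick an adjustment set.

    The adjustment set is the set of parent clusters of the treatment
    cluster, expanded back to original variables.
    """
    t_cluster, o_cluster = f[treatment], f[outcome]
    if t_cluster == o_cluster:
        raise ValueError("treatment and outcome share a cluster")
    adj_clusters = parent_clusters(edges, t_cluster)
    adj_nodes = sorted(n for c in adj_clusters for n in members(c))
    return t_cluster, o_cluster, adj_clusters, adj_nodes


t_cluster, o_cluster, adj_clusters, adj_nodes = plan_adjustment(
    summary_edges, "Query Template", "Elapsed Time"
)
print("treatment cluster:", t_cluster)
print("outcome cluster:", o_cluster)
print("adjust for:", adj_nodes)

# An edge such as Plan Time -> Lock Wait Time is internal to C_monitor,
# so it does not appear in the summary DAG at all.
edge_is_subsumed = is_internal("Plan Time", "Lock Wait Time")
```

The resulting variable list would then be passed to whichever estimator is used for the adjustment formula. The check that treatment and outcome fall in different clusters matters because an effect between two nodes in the same cluster cannot be read off the summary DAG.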

Robustness advantage: The summary DAG subsumes the extraneous edges added by GPT-4 (5 random edges). Estimating ATE on the 28-edge erroneous DAG would require adjusting for incorrect confounders. The summary DAG filters this noise: the erroneous edges from the original DAG’s monitoring view (e.g., Plan Time → Lock Wait Time) are subsumed by groupings that prioritize the core causal paths.

The Minimization Principle for ATE

ATE Minimization (Robustness)

To minimize the adjustment set $U$ in causal estimations from the summary DAG $(H, f)$, the adjustment set should be ordered before all nodes in the treatment cluster in $G_{H}$. Alternatively, an upper and lower bound on the adjustment set can be derived by considering all subsets of $U$’s cluster in $H$.

Connections

The soundness result validates CaGReS as a practical tool for causal inference: the summary DAG it produces is guaranteed to give valid causal conclusions.
Frequentist Causal Estimation adjustment methods (backdoor criterion, IPW, outcome modeling) apply directly: by Theorems 6.1–6.2 they work on summary DAGs as they do on original DAGs.
The completeness result implies that a causal effect is identifiable in the summary DAG exactly when it is identifiable in the canonical DAG $G_{H}$, so the summary representation loses no identifiability relative to $G_{H}$.
Connects to Potential Outcomes Framework — the ATE formula $E [O ∣ do (T = 1)] - E [O ∣ do (T = 0)]$ is the do-calculus version of the potential outcomes ATE $E [Y (1) - Y (0)]$ .
