Entropy-Based BN Evaluation

Summary

Entropy of each node’s posterior marginal distribution is used as an information-theoretic measure of structural quality across Bayesian networks. Lower entropy indicates more structured, less uncertain predictions. LLM-generated BNs achieve lower mean and minimum entropy than both BIC-based and human expert BNs.

Overview

When comparing BN structures generated by different methods, one needs an objective metric. The authors use Shannon entropy of the marginal posterior distribution at each node, then summarize across nodes.

Main Content

Definition: Node Entropy in a BN

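For a discrete random variable $X$ with marginal probability distribution $P(X)$, the entropy is:

$$H(X) = -\sum_{x} P(x) \log P(x)$$

where the sum runs over the states $x$ of $X$. The base of the logarithm fixes the unit (bits for base 2, nats for base $e$).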

This is computed from the posterior marginal distribution at each node, given the observed data.
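As a minimal sketch of the definition above, the function below computes the Shannon entropy of one node's marginal distribution. The function name `shannon_entropy`, the choice of base 2, and the two example distributions are illustrative assumptions, not taken from the paper.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution given as a sequence of probabilities."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # States with p == 0 are skipped, following the convention 0 * log 0 = 0.
    return -sum(p * math.log(p) / math.log(base) for p in probs if p > 0)

# Hypothetical marginals for a four-state node.
peaked = [0.97, 0.01, 0.01, 0.01]   # sharply peaked: low entropy
flat = [0.25, 0.25, 0.25, 0.25]     # uniform: maximum entropy, log2(4) = 2 bits

print(shannon_entropy(peaked))
print(shannon_entropy(flat))
```

The peaked distribution gives a much smaller value than the uniform one, which is the contrast the interpretation below relies on.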

Interpretation:

  • Low entropy → the node’s distribution is sharply peaked → the BN effectively learns structure, identifying clear patterns and dependencies
  • High entropy → the node’s distribution is flat/uncertain → the BN captures less signal; more randomness in the model

Note: Entropy is computed from the posterior marginal, not the prior, so it reflects how the BN propagates evidence.

Summary Statistics Compared

Six descriptive statistics are computed per BN (across all nodes):

Statistic          Meaning
Mean               Average entropy across all nodes
Min                Node with lowest entropy (most certain)
25th percentile    Lower quartile
Median (50%)       Middle node
75th percentile    Upper quartile
Max                Node with highest entropy (most uncertain)
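The per-BN summary could be computed along these lines. This is a sketch under stated assumptions: `summarize_entropies` is a hypothetical helper, the node names and entropy values are invented, and the quartile interpolation rule (`method="inclusive"`) is a guess, since the source does not say which one the authors used.

```python
import statistics

def summarize_entropies(node_entropies):
    """Summarize per-node entropies of one BN (hypothetical helper)."""
    values = sorted(node_entropies.values())
    # Quartiles with linear interpolation between order statistics (assumed rule).
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "min": values[0],
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": values[-1],
    }

# Invented entropies for a five-node network, for illustration only.
example = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0}
summary = summarize_entropies(example)
for name, value in summary.items():
    print(f"{name:>4}: {value:.4f}")
```

Running the same helper on each BN's node entropies would give one column of a comparison table per method.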

Results (Table 5 in paper)

Statistic    LLM       BIC       Expert
Mean         1.4237    1.4770    1.4775
Min          0.8897    0.9119    0.9282
25%          1.1654    1.1919    1.1473
50%          1.2884    1.3226    1.2075
75%          1.4882    1.5410    1.5555
Max          2.9855    3.0144    3.2357

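Key finding: The LLM BN has the lowest mean, minimum, 75th-percentile, and maximum entropy, which is read as the most structured overall model, and it is lower than the BIC BN on every statistic. The Expert BN, however, has the lowest 25th percentile and median (1.1473 and 1.2075), suggesting that for the lower half of nodes the human expert is more precise. The LLM's advantage is therefore in the mean and the tails rather than uniform across the distribution.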
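A row-by-row check of the reported values makes it easy to see which method is lowest on each statistic. The numbers below are copied from the results table; the variable names are illustrative.

```python
# Entropy summary values as reported in the results table (Table 5 in the paper).
table = {
    "Mean": {"LLM": 1.4237, "BIC": 1.4770, "Expert": 1.4775},
    "Min":  {"LLM": 0.8897, "BIC": 0.9119, "Expert": 0.9282},
    "25%":  {"LLM": 1.1654, "BIC": 1.1919, "Expert": 1.1473},
    "50%":  {"LLM": 1.2884, "BIC": 1.3226, "Expert": 1.2075},
    "75%":  {"LLM": 1.4882, "BIC": 1.5410, "Expert": 1.5555},
    "Max":  {"LLM": 2.9855, "BIC": 3.0144, "Expert": 3.2357},
}

# Method with the lowest entropy for each statistic.
lowest = {stat: min(row, key=row.get) for stat, row in table.items()}

# Statistics on which the LLM BN is lower than the BIC BN.
llm_beats_bic = [stat for stat, row in table.items() if row["LLM"] < row["BIC"]]

for stat, method in lowest.items():
    print(f"{stat:>4}: lowest = {method}")
print("LLM lower than BIC on:", ", ".join(llm_beats_bic))
```

The check shows the LLM BN lowest on Mean, Min, 75%, and Max, and the Expert BN lowest on 25% and 50%.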

Interpretation

  • The LLM-based approach leads to models with lower overall uncertainty, consistent with its characterization as a “strong performer” against traditional methods
  • BIC graphs show more logical inconsistencies (revealed by expert review) which likely inflate entropy by introducing backward or confounded edges
  • The LLM’s ability to reason about causal direction (not just statistical association) produces a more coherent structure

Limitations of This Metric

  • Entropy measured on a single small dataset (400 rows); results may not generalize
  • A BN with overly confident (low-entropy) CPTs could be overfitting, not genuinely more informative
  • Does not directly measure causal accuracy — a correctly structured BN could still have high entropy if the domain has genuine uncertainty
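
The last limitation follows from the standard bounds on Shannon entropy. As a short worked statement, assuming a node $X$ with $k$ states (the symbol $k$ is notation introduced here, not the paper's):

```latex
0 \le H(X) \le \log k,
\qquad
H(X) = \log k \iff P(x) = \tfrac{1}{k} \ \text{for all states } x
```

A node whose true distribution is close to uniform, for example a fair binary outcome with $k = 2$ and $H(X) = 1$ bit, will sit near the upper bound however accurate the structure is, so a high entropy value alone does not show that the graph is wrong.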

Connections

See Also