Entropy-Based BN Evaluation
Summary
Entropy of each node’s posterior marginal distribution is used as an information-theoretic measure of structural quality across Bayesian networks. Lower entropy indicates more structured, less uncertain predictions. LLM-generated BNs achieve lower mean and minimum entropy than both BIC-based and human expert BNs.
Overview
When comparing BN structures generated by different methods, one needs an objective metric. The authors use Shannon entropy of the marginal posterior distribution at each node, then summarize across nodes.
Main Content
Definition: Node Entropy in a BN
For a discrete random variable $X$ with marginal probability distribution $P(X)$, the entropy is:
$$H(X) = -\sum_{x} P(x) \log P(x)$$
This is computed from the posterior marginal distribution at each node, given the observed data.
Interpretation:
- Low entropy → the node’s distribution is sharply peaked → the BN effectively learns structure, identifying clear patterns and dependencies
- High entropy → the node’s distribution is flat/uncertain → the BN captures less signal; more randomness in the model
Note: Entropy is computed from the posterior marginal, not the prior, so it reflects how the BN propagates evidence.
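As an illustration of the definition and the low/high interpretation above, here is a minimal sketch of node entropy for a single marginal distribution. The function name `shannon_entropy`, the example distributions, and the choice of logarithm base are assumptions for illustration; these notes do not record which base the paper uses, so the base is left as a parameter.

```python
import math

def shannon_entropy(probs, base=math.e):
    """Shannon entropy of a discrete distribution given as a list of probabilities.

    The log base is an assumption (natural log by default); pass base=2 for bits.
    """
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Zero-probability states contribute nothing (p * log p -> 0 as p -> 0).
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Hypothetical marginals for a four-state node.
peaked = [0.97, 0.01, 0.01, 0.01]  # sharply peaked -> low entropy
flat = [0.25, 0.25, 0.25, 0.25]    # uniform -> maximum entropy for four states

print(shannon_entropy(peaked), shannon_entropy(flat))
```

A uniform distribution over k states gives the maximum possible value, log(k), so nodes with more states can reach higher entropy than nodes with fewer.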
Summary Statistics Compared
Six descriptive statistics are computed per BN (across all nodes):
| Statistic | Meaning |
|---|---|
| Mean | Average entropy across all nodes |
| Min | Node with lowest entropy (most certain) |
| 25th percentile | Lower quartile |
| Median (50%) | Middle node |
| 75th percentile | Upper quartile |
| Max | Node with highest entropy (most uncertain) |
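The statistics in the table above can be sketched as follows, given a list of per-node entropies. The helper name `entropy_summary` and the input values are hypothetical, and the quartile method (linear interpolation, via `method="inclusive"`) is an assumption, since these notes do not say how the paper computed percentiles.

```python
import statistics

def entropy_summary(node_entropies):
    """Summarize per-node entropies with the six statistics listed above."""
    values = sorted(node_entropies)
    # Quartiles by linear interpolation between data points (assumed method).
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "min": values[0],
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": values[-1],
    }

# Hypothetical per-node entropies, not taken from the paper.
summary = entropy_summary([1.0, 2.0, 3.0, 4.0, 5.0])
print(summary)
```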
Results (Table 5 in paper)
| Statistic | LLM | BIC | Expert |
|---|---|---|---|
| Mean | 1.4237 | 1.4770 | 1.4775 |
| Min | 0.8897 | 0.9119 | 0.9282 |
| 25% | 1.1654 | 1.1919 | 1.1473 |
| 50% | 1.2884 | 1.3226 | 1.2075 |
| 75% | 1.4882 | 1.5410 | 1.5555 |
| Max | 2.9855 | 3.0144 | 3.2357 |
Key finding: LLM-BN has the lowest mean and minimum entropy (and also the lowest 75th percentile and maximum), meaning it produces the most structured model overall. It is not lowest everywhere, however: the Expert BN has the lowest 25th percentile (1.1473) and median (1.2075), below both LLM and BIC, suggesting that for a subset of nodes the human expert is more precise.
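One way to check which method is lowest on each statistic is to take the row-wise minimum of the results table. The sketch below copies the table values as given in these notes; the helper name `lowest_per_statistic` is hypothetical.

```python
# Values copied from the results table above (Table 5 in the paper).
table = {
    "Mean": {"LLM": 1.4237, "BIC": 1.4770, "Expert": 1.4775},
    "Min":  {"LLM": 0.8897, "BIC": 0.9119, "Expert": 0.9282},
    "25%":  {"LLM": 1.1654, "BIC": 1.1919, "Expert": 1.1473},
    "50%":  {"LLM": 1.2884, "BIC": 1.3226, "Expert": 1.2075},
    "75%":  {"LLM": 1.4882, "BIC": 1.5410, "Expert": 1.5555},
    "Max":  {"LLM": 2.9855, "BIC": 3.0144, "Expert": 3.2357},
}

def lowest_per_statistic(rows):
    """Map each statistic to the method with the lowest entropy in that row."""
    return {stat: min(vals, key=vals.get) for stat, vals in rows.items()}

for stat, method in lowest_per_statistic(table).items():
    print(f"{stat}: {method}")
```

With these values, the LLM column is lowest for the mean, minimum, 75th percentile, and maximum, while the Expert column is lowest for the 25th percentile and median.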
Interpretation
- The LLM-based approach leads to a model with lower overall uncertainty, consistent with its characterization as a “strong performer” against traditional methods
- BIC graphs show more logical inconsistencies (revealed by expert review) which likely inflate entropy by introducing backward or confounded edges
- The LLM’s ability to reason about causal direction (not just statistical association) produces a more coherent structure
Limitations of This Metric
- Entropy measured on a single small dataset (400 rows); results may not generalize
- A BN with overly confident (low-entropy) CPTs could be overfitting, not genuinely more informative
- Does not directly measure causal accuracy — a correctly structured BN could still have high entropy if the domain has genuine uncertainty
Connections
- Builds on BN Construction Methods Comparison — three-way structural comparison
- Used to evaluate BN III from LLM Expert Elicitation for Bayesian Networks
See Also
- Shaposhnyk 2025 - Overview — paper context
- BN Construction Methods Comparison — structural comparison