Shaposhnyk 2025 - Overview
Summary
Shaposhnyk, Zahorska & Yanushkevich (2025) investigate whether LLMs (GPT-4o and Claude) can replace or augment human expert elicitation when constructing Bayesian networks (BNs). On the Sleep Health and Lifestyle dataset, LLM-generated BNs achieve lower mean entropy (interpreted as more structured, less uncertain) than BNs built with BIC-based statistical methods or by human experts, and they exhibit fewer logical inconsistencies.
Research Question and Contribution
Problem: Constructing probabilistic causal graph models (Bayesian networks) typically requires time-consuming human expert elicitation. Statistical/data-driven methods (BIC, PC algorithm, MIIC) can automate structure learning, but they lack domain knowledge and can produce logically inconsistent relationships.
Contribution:
- A dual-LLM expert elicitation framework: one LLM (GPT-4o) proposes causal relationships; a second LLM (Claude) verifies and identifies confounders/inconsistencies
- Comparison of three BN construction strategies: human expert, information criteria (BIC/MIIC), and LLM
- Entropy-based evaluation showing LLM-BNs are more structured and consistent
- Case study demonstrating LLM-BN for health decision support
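The propose-then-verify loop in the first contribution can be sketched as below. This is a minimal illustration, not the authors' implementation: `elicit_edges`, `stub_proposer`, and `stub_verifier` are hypothetical names, the two stubs stand in for real GPT-4o and Claude calls, and the example edges and the rejection rule are invented for demonstration rather than taken from the paper.

```python
from typing import Callable, Iterable

Edge = tuple[str, str]  # (cause, effect)


def elicit_edges(
    variables: list[str],
    propose: Callable[[list[str]], Iterable[Edge]],
    verify: Callable[[Edge], bool],
) -> tuple[list[Edge], list[Edge]]:
    """Split the proposer's edges into (confirmed, rejected) using the verifier."""
    confirmed: list[Edge] = []
    rejected: list[Edge] = []
    for edge in propose(variables):
        (confirmed if verify(edge) else rejected).append(edge)
    return confirmed, rejected


# Hypothetical stand-in for the proposing LLM (GPT-4o in the paper).
def stub_proposer(variables: list[str]) -> list[Edge]:
    return [
        ("Stress_Level", "Quality_of_Sleep"),
        ("Physical_Activity", "Quality_of_Sleep"),
        ("Quality_of_Sleep", "Heart_Rate"),
    ]


# Hypothetical stand-in for the verifying LLM (Claude in the paper).
# The rule here is arbitrary and only exists to produce one rejection.
def stub_verifier(edge: Edge) -> bool:
    cause, _effect = edge
    return cause != "Quality_of_Sleep"


variables = ["Stress_Level", "Physical_Activity", "Quality_of_Sleep", "Heart_Rate"]
confirmed, rejected = elicit_edges(variables, stub_proposer, stub_verifier)
```

Passing the two models in as plain callables keeps the control flow testable without network access; in practice each callable would wrap a prompt and a response parser.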
Published: arXiv:2504.10397v1 [cs.AI], 14 April 2025
Paper Structure
| Section | Content |
|---|---|
| §1 Introduction | Motivation; LLM as expert elicitation proxy |
| §2 Related Work | BN learning methods; LLM-based causal discovery |
| §3 Problem Formulation | Research question; contributions |
| §4 Methodology | Data selection, BN construction, expert elicitation via LLM |
| §5 Causal Modeling | Three BN structures compared (human, BIC, LLM); SEM validation; entropy |
| §6 BN for Decision Support | CPT construction; case studies on sleep/stress |
| §7 Conclusion | Summary; limitations (hallucination, small dataset) |
Key Results
- LLM-BNs: Lowest mean entropy (1.42), lowest min entropy (0.89) — most structured
- BIC-BNs: Mean entropy 1.48; more uncertain, with more logical inconsistencies (e.g., reversed causal directions)
- Human expert BNs: Mean entropy 1.48, median slightly lower (1.21) — suggesting expert networks are more structured in some nodes but not uniformly
- 10 of the 12 LLM-proposed relationships were confirmed by the second LLM, lending confidence to the deduced structure
- SEM validation: all LLM-BN relationships statistically significant except Physical_Activity → Quality_of_Sleep
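To make the entropy summary statistics concrete, here is a small sketch of how per-node entropy and its mean, median, and minimum could be computed. It assumes Shannon entropy in bits over each node's marginal distribution; this summary does not state the paper's exact definition (log base, marginal versus conditional), so treat that as an assumption. The distributions are illustrative and are not values from the paper.

```python
import math
from statistics import mean, median


def shannon_entropy(probs: list[float]) -> float:
    """Shannon entropy in bits of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing (limit of p * log p as p -> 0).
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Illustrative marginals only; NOT taken from the paper.
marginals = {
    "Stress_Level": [0.5, 0.5],
    "Quality_of_Sleep": [0.25, 0.25, 0.25, 0.25],
    "Physical_Activity": [1.0],
}

per_node = {name: shannon_entropy(p) for name, p in marginals.items()}
summary = {
    "mean": mean(per_node.values()),
    "median": median(per_node.values()),
    "min": min(per_node.values()),
}
```

Under this reading, a lower mean means the nodes' distributions are on average more concentrated, which matches the "more structured, less uncertain" interpretation used above.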
Limitations
- Dataset is small (400 rows × 13 columns) — some relationships may not be well-represented
- LLMs are prone to hallucination; the dual-LLM cross-check is intended to mitigate this
- Bidirectional dependencies (Sleep Duration ↔ Stress Level, Heart Rate ↔ Stress Level) require manual direction resolution
- Contextual constraints, hallucinated dependencies, and training data biases may affect LLM-generated structures
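Because a Bayesian network must be a directed acyclic graph, the bidirectional pairs noted above have to be reduced to a single direction before the structure is usable. The sketch below shows one way to flag such pairs and check acyclicity with the standard-library `graphlib` module. The helper names are hypothetical, and the directions kept in the example are an arbitrary choice for illustration, not the resolution used in the paper.

```python
from graphlib import CycleError, TopologicalSorter

Edge = tuple[str, str]  # (cause, effect)


def bidirectional_pairs(edges: list[Edge]) -> list[Edge]:
    """Return each unordered pair {a, b} that appears as both a->b and b->a."""
    edge_set = set(edges)
    pairs = {
        tuple(sorted((a, b)))
        for a, b in edge_set
        if a != b and (b, a) in edge_set
    }
    return sorted(pairs)


def is_dag(edges: list[Edge]) -> bool:
    """True if the directed graph defined by `edges` has no cycle."""
    sorter = TopologicalSorter()
    for cause, effect in edges:
        sorter.add(effect, cause)  # effect depends on cause
    try:
        sorter.prepare()
    except CycleError:
        return False
    return True


def drop_reverse(edges: list[Edge], keep: set[Edge]) -> list[Edge]:
    """Remove the reverse of every edge listed in `keep`."""
    drop = {(b, a) for a, b in keep}
    return [e for e in edges if e not in drop]


edges = [
    ("Sleep_Duration", "Stress_Level"),
    ("Stress_Level", "Sleep_Duration"),
    ("Heart_Rate", "Stress_Level"),
    ("Stress_Level", "Heart_Rate"),
]
flagged = bidirectional_pairs(edges)

# Arbitrary manual choice for illustration only.
keep = {("Stress_Level", "Sleep_Duration"), ("Stress_Level", "Heart_Rate")}
resolved = drop_reverse(edges, keep)
```

Here `flagged` lists the pairs that need a manual decision, and `is_dag(resolved)` can be used as a final check before conditional probability tables are attached.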
See Also
- LLM Expert Elicitation for Bayesian Networks — dual-LLM methodology
- BN Construction Methods Comparison — human vs BIC vs LLM comparison
- Entropy-Based BN Evaluation — entropy metric
- LLM-BN Decision Support Application — health decision support case study
- Yamashita 2020 - Overview — complementary approach: interactive human elicitation