LLM Expert Elicitation for Bayesian Networks

Summary

The dual-LLM approach applies two LLMs in sequence: LLM-1 (GPT-4o) proposes causal relationships from domain knowledge, and LLM-2 (Claude) verifies those relationships, identifies confounders, and flags inconsistencies. The resulting Bayesian network structures are reported to have lower entropy and fewer logical inconsistencies than structures built with BIC-based learning or human expert elicitation.

Overview

Motivation: Traditional expert elicitation is slow and subjective. Statistical structure learning (BIC, MIIC) has no access to domain knowledge and can produce causally inconsistent graphs. LLMs are used here as proxies for domain experts, combining broad training knowledge with the ability to reason about causal plausibility.

Dual-LLM Architecture

Data → LLM-1 (GPT-4o): Find causal structure → LLM-2 (Claude): Evaluate relationships
                ↓                                          ↓
        Initial causal graph                  Verified/refined structure → BN III

Step 1 — LLM-1 Causal Discovery Prompt (Table 2):

The prompt instructs LLM-1 to act as a domain expert and:

Interpret statistically suggested causal relationships from a domain knowledge perspective
Assess plausibility in the context of the domain (e.g., sleep health)
Provide reasoned explanations for why relationships are natural or unexpected

Step 2 — LLM-2 Verification Prompt (Table 3):

LLM-2 receives LLM-1’s output and is instructed to:

Assess the plausibility of each proposed relationship
Identify confounding factors or alternative explanations
Suggest corrections or additional relationships where appropriate

This two-step design creates a “dual-expert” system: having a second model cross-check the first is intended to reduce hallucinated relationships and improve consistency.
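The two steps above can be sketched as a small pipeline. This is a minimal illustration, not the prompts or client code behind Tables 2 and 3: the `Edge` and `Verdict` types, the `dual_llm_elicit` function, and both stub functions are hypothetical names introduced here, and the stubs stand in for real model calls so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Edge:
    parent: str
    child: str


@dataclass(frozen=True)
class Verdict:
    edge: Edge
    plausible: bool
    note: str = ""


# Hypothetical callable types: each wraps whatever client code talks to a model.
Proposer = Callable[[List[str]], List[Edge]]
Verifier = Callable[[List[Edge]], List[Verdict]]


def dual_llm_elicit(
    variables: List[str], propose: Proposer, verify: Verifier
) -> Tuple[List[Edge], List[Verdict]]:
    """Step 1: LLM-1 proposes edges. Step 2: LLM-2 judges each proposed edge."""
    proposed = propose(variables)
    verdicts = verify(proposed)
    accepted = [v.edge for v in verdicts if v.plausible]
    flagged = [v for v in verdicts if not v.plausible]
    return accepted, flagged


def stub_proposer(variables: List[str]) -> List[Edge]:
    # Stand-in for LLM-1; the second edge is deliberately implausible.
    return [
        Edge("Sleep_Duration", "Stress_Level"),
        Edge("Quality_of_Sleep", "Gender"),
    ]


def stub_verifier(edges: List[Edge]) -> List[Verdict]:
    # Stand-in for LLM-2 with a toy rule: Gender cannot be an effect.
    verdicts = []
    for e in edges:
        ok = e.child != "Gender"
        note = "" if ok else "Gender is not caused by other variables"
        verdicts.append(Verdict(e, ok, note))
    return verdicts


accepted, flagged = dual_llm_elicit(
    ["Gender", "Sleep_Duration", "Stress_Level", "Quality_of_Sleep"],
    stub_proposer,
    stub_verifier,
)
```

Passing the two models in as plain callables keeps the orchestration independent of any particular vendor API, so either model could be swapped without touching the pipeline.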

Key Implementation Choices

State-of-the-art prompt engineering applied to improve output quality
Sequential application of multiple LLMs: consistency is verified across models, and disagreements are flagged as potential misinterpretations
SEM validation: final relationships are verified using structural equation modeling (SEM) before inclusion in the BN
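One way to make "disagreements flagged" concrete is to reduce each model's output to a set of directed edges and compare the sets. The sketch below assumes that representation; the function name `compare_edge_sets` and the two example edge sets are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

EdgeT = Tuple[str, str]  # (parent, child)


def compare_edge_sets(
    edges_a: Iterable[EdgeT], edges_b: Iterable[EdgeT]
) -> Dict[str, Set[EdgeT]]:
    """Split two edge sets into agreements, direction conflicts, and one-sided edges."""
    a, b = set(edges_a), set(edges_b)
    a_only, b_only = a - b, b - a
    # A direction conflict: model A says X -> Y while model B says Y -> X.
    conflicts = {e for e in a_only if (e[1], e[0]) in b_only}
    return {
        "agreed": a & b,
        "direction_conflicts": conflicts,
        "only_a": a_only - conflicts,
        "only_b": {e for e in b_only if (e[1], e[0]) not in conflicts},
    }


# Illustrative inputs, not the outputs reported in the source.
model_a = [
    ("Sleep_Duration", "Stress_Level"),
    ("Heart_Rate", "Stress_Level"),
    ("Daily_Steps", "Stress_Level"),
]
model_b = [
    ("Sleep_Duration", "Stress_Level"),
    ("Stress_Level", "Heart_Rate"),
]

report = compare_edge_sets(model_a, model_b)
```

Direction conflicts are reported from model A's point of view; they are the natural candidates for the bidirectional-dependency review described later in this note.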

Confounders Identified by LLM-2

Variables not in the dataset but identified as likely confounders:

Psychological well-being (depression may affect sleep duration, stress, and physical activity)
Work schedule flexibility (affects both stress and sleep patterns)
Socioeconomic status and income (influences stress and work conditions)

Bidirectional Dependencies Resolved

LLM analysis identified several bidirectional relationships:

Sleep Duration ↔ Stress Level
Heart Rate ↔ Stress Level

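Since BNs are DAGs (no cycles), each bidirectional pair had to be given a single direction, which was chosen by logical reasoning about the domain. In BN III, for example, the first pair appears as Sleep_Duration → Stress_Level. Overall, LLM-2 confirmed 10 of the 12 relationships proposed by LLM-1, which supports confidence in the proposed structure.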
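The acyclicity constraint can be checked mechanically. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to test whether a set of edges forms a DAG and to list which orientations of a bidirectional pair keep it acyclic. The helper names `is_dag` and `acyclic_orientations` are hypothetical, and the fixed edges are the other BN III edges listed in the results table of this note.

```python
from graphlib import CycleError, TopologicalSorter
from typing import List, Tuple

EdgeT = Tuple[str, str]  # (parent, child)


def is_dag(edges: List[EdgeT]) -> bool:
    """Return True if the directed edges contain no cycle."""
    sorter = TopologicalSorter()
    for parent, child in edges:
        sorter.add(child, parent)  # the child depends on its parent
    try:
        sorter.prepare()  # raises CycleError when a cycle exists
    except CycleError:
        return False
    return True


def acyclic_orientations(fixed: List[EdgeT], pair: Tuple[str, str]) -> List[EdgeT]:
    """List the orientations of `pair` that keep the graph acyclic."""
    x, y = pair
    return [e for e in [(x, y), (y, x)] if is_dag(fixed + [e])]


fixed_edges = [
    ("Daily_Steps", "Stress_Level"),
    ("Gender", "Occupation"),
    ("Occupation", "Stress_Level"),
    ("Stress_Level", "Quality_of_Sleep"),
    ("Physical_Activity", "Quality_of_Sleep"),
    ("Sleep_Duration", "Quality_of_Sleep"),
]
pair = ("Sleep_Duration", "Stress_Level")

both_directions = fixed_edges + [pair, (pair[1], pair[0])]
options = acyclic_orientations(fixed_edges, pair)
```

Keeping both directions produces a cycle, so `is_dag(both_directions)` is false. For this pair, either single orientation is acyclic, which shows that the DAG constraint alone does not pick the direction; the choice still rests on domain reasoning.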

Resulting Causal Structure (BN III)

Key edges in the LLM-derived BN, with SEM path estimates (all significant except where noted; p-values shown as 0 are presumably rounded at the reported precision):

Parent → Child	Estimate	p-value
Daily_Steps → Stress_Level	0.5585	0
Sleep_Duration → Stress_Level	−0.7539	0
Gender → Occupation	−1.3380	0.0001
Occupation → Stress_Level	−0.0475	0.0008
Stress_Level → Quality_of_Sleep	−0.2180	0.0001
Physical_Activity → Quality_of_Sleep	0.0137	0.5989 (not significant)
Sleep_Duration → Quality_of_Sleep	0.5249	0
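The table can be turned into a parent map for the network by keeping only significant edges. The sketch below copies the estimates and p-values from the table; the 0.05 threshold and the names `SEM_EDGES`, `ALPHA`, and `split_by_significance` are assumptions made for illustration, since the source does not state the significance level used.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (parent, child, estimate, p_value), copied from the table above.
# p-values reported as 0 are stored as 0.0.
SEM_EDGES = [
    ("Daily_Steps", "Stress_Level", 0.5585, 0.0),
    ("Sleep_Duration", "Stress_Level", -0.7539, 0.0),
    ("Gender", "Occupation", -1.3380, 0.0001),
    ("Occupation", "Stress_Level", -0.0475, 0.0008),
    ("Stress_Level", "Quality_of_Sleep", -0.2180, 0.0001),
    ("Physical_Activity", "Quality_of_Sleep", 0.0137, 0.5989),
    ("Sleep_Duration", "Quality_of_Sleep", 0.5249, 0.0),
]

ALPHA = 0.05  # assumed significance level


def split_by_significance(edges, alpha: float = ALPHA):
    """Separate edges into significant and non-significant (parent, child) pairs."""
    significant: List[Tuple[str, str]] = []
    not_significant: List[Tuple[str, str]] = []
    for parent, child, _estimate, p_value in edges:
        target = significant if p_value < alpha else not_significant
        target.append((parent, child))
    return significant, not_significant


significant, not_significant = split_by_significance(SEM_EDGES)

# Parent map: each child node lists its significant parents in table order.
parents: Dict[str, List[str]] = defaultdict(list)
for parent, child in significant:
    parents[child].append(parent)
```

Under this threshold only Physical_Activity → Quality_of_Sleep is dropped, matching the "not significant" note in the table.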

Comparison to Human and BIC Methods

See BN Construction Methods Comparison for the full three-way comparison.

Connections

Compare to NLP Causal Extraction Methods — earlier rule-based approach to automated causal extraction (Yamashita 2020)
Builds on BN Construction Methods Comparison for evaluation context
Applied in LLM-BN Decision Support Application
BNs are DAGs — the graph structure produced is a directed acyclic graph; d-separation and the back-door criterion apply to these structures
Bidirectional dependency resolution (Sleep Duration ↔ Stress) uses the same reasoning as Canonical Causal DAGs (fork vs. pipe identification)
SEM validation step connects to Confirmatory Factor Analysis and SEM
Alternative prompting approach for causal structure: Code Prompts for Causal Structure
