Knowledge Elicitation for Causal Models

Routing Summary

This folder covers methods for eliciting and constructing causal knowledge from human experts or LLMs, plus methods for LLM causal reasoning. It contains 15 notes from three papers (Yamashita 2020, Shaposhnyk 2025, Liu 2025).

Concept Map

| Concept | Note | Type | Depends On | Key Result |
| --- | --- | --- | --- | --- |
| Cause-precondition-effect model | Causal Model - Cause Precondition Effect | definition | | Enables indirect causality via preconditions; maps 4 FRAM aspects to precondition |
| Interactive workshop method | Interactive Knowledge Elicitation Method | concept | Causal Model - Cause Precondition Effect | 20 events, 15 preconditions elicited from 2 participants |
| NLP causal extraction | NLP Causal Extraction Methods | concept | Causal Model - Cause Precondition Effect | Method A: 46/100; Method B: 63/100; combined: 87/100 |
| Dual-LLM elicitation | LLM Expert Elicitation for Bayesian Networks | concept | BN Construction Methods Comparison | 10/12 LLM relationships confirmed; lower entropy than BIC |
| BN construction methods | BN Construction Methods Comparison | concept | | LLM-BN: mean entropy 1.42 vs BIC 1.48 vs Expert 1.48 |
| BN entropy evaluation | Entropy-Based BN Evaluation | definition | BN Construction Methods Comparison | lower = more structured |
| BN decision support | LLM-BN Decision Support Application | example | LLM Expert Elicitation for Bayesian Networks | P(High stress \| poor sleep, nurse) = 41.56% |
| Abductive NLG + Counterfactual reasoning tasks | LLM Causal Reasoning Tasks | definition | Liu 2025 - Overview | αNLG: max ; TimeTravel: min-edit ending under counterfactual |
| Code prompt causal structure encoding | Code Prompts for Causal Structure | concept | LLM Causal Reasoning Tasks | if hypothesis(): ending() encodes causal DAG edge |
| Code vs text prompt evaluation | Code vs Text Prompt Evaluation | example | Code Prompts for Causal Structure | Code prompts +5.1% BLEU, +5.3% BERTScore; Code-LLMs +14% BLEU avg over paired general-purpose |
| Code prompt aspects (information/structure/format/language) | Code Prompt Aspects Analysis | concept | Code vs Text Prompt Evaluation | Conditional structure is critical: removing it causes ~10% BLEU / ~6% BERTScore drop |
| Fine-tuning on conditional statement corpus | Fine-tuning on Conditional Statements | concept | Code Prompt Aspects Analysis | 4,085 CodeAlpaca instances; gains transfer to text prompts; largest gain in first 20% of data |
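
The "lower = more structured" reading of the entropy row can be illustrated with a minimal sketch of Shannon entropy for a single discrete node. The function name `shannon_entropy`, the base-2 logarithm, and both example distributions are assumptions made for illustration; they are not values or code from the Shaposhnyk paper.

```python
import math


def shannon_entropy(probs, base=2.0):
    """Shannon entropy H = -sum(p * log(p)) of a discrete distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability states contribute nothing to the sum.
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# Hypothetical three-state node distributions (illustrative only).
uniform = [1 / 3, 1 / 3, 1 / 3]  # no state is favoured
peaked = [0.8, 0.1, 0.1]         # one state dominates

print(shannon_entropy(uniform))
print(shannon_entropy(peaked))
```

The peaked distribution yields the smaller value, which is the sense in which a lower mean node entropy suggests a network that constrains its variables more tightly.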

Notes

  • Yamashita 2020 - Overview — CONTAINS: paper overview, 10-page HCII 2020 conference paper; NLP+GUI knowledge elicitation for disaster scenarios
  • Causal Model - Cause Precondition Effect — CONTAINS: definitions of cause, precondition, effect; comparison to FRAM; countermeasure elicitation strategy; worked example (blackout/medical equipment)
  • Interactive Knowledge Elicitation Method — CONTAINS: 4-phase workshop procedure; GUI design; preliminary experiment results (20 events, 15 preconditions)
  • NLP Causal Extraction Methods — CONTAINS: Method A (clue expressions, 5 sentence patterns); Method B (sentence decomposition); Word2Vec deduplication; verification results (46/63/87)
  • Shaposhnyk 2025 - Overview — CONTAINS: paper overview, arXiv 2025; LLM as proxy expert for BN construction
  • LLM Expert Elicitation for Bayesian Networks — CONTAINS: dual-LLM architecture (GPT-4o + Claude); prompt templates; identified confounders; SEM-validated BN III structure
  • BN Construction Methods Comparison — CONTAINS: BN I (human expert), BN II (BIC/MIIC), BN III (LLM) structures; SEM validation results; entropy comparison table
  • Entropy-Based BN Evaluation — CONTAINS: Shannon entropy definition for BN nodes; full descriptive statistics table (LLM/BIC/Expert); interpretation
  • LLM-BN Decision Support Application — CONTAINS: CPT construction; Bayes formula inference; worked nurse/doctor stress examples
  • Liu 2025 - Overview — CONTAINS: 4 research questions; key contributions (code prompts, Code-LLM advantage, structural analysis, fine-tuning); experimental setup overview; results summary table
  • LLM Causal Reasoning Tasks — CONTAINS: αNLG task definition (abductive, maximize ); TimeTravel task definition (counterfactual, minimize edit); causal DAG mapping; evaluation metrics
  • Code Prompts for Causal Structure — CONTAINS: 4 code properties for causality; abductive code template; counterfactual if/elif template; code-to-DAG mapping table; comparison to text prompts
  • Code vs Text Prompt Evaluation — CONTAINS: Tables 2–6; zero-shot + one-shot results; format perturbation results; human evaluation (Table 6); alignment tax discussion
  • Code Prompt Aspects Analysis — CONTAINS: 4 intervention dimensions (information/structure/format/language); Table 8 full results; key finding that conditional structure is critical
  • Fine-tuning on Conditional Statements — CONTAINS: CodeAlpaca-20k filtering procedure; training setup; Table 10 gains; data fraction analysis (Fig. 5); conditional vs uniform baseline comparison
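
The filtering step mentioned in the last note can be sketched roughly as follows. This is only an assumed approximation: the keyword regex, the `output` field name, and the sample records are hypothetical, and the paper's actual CodeAlpaca-20k filtering procedure may differ.

```python
import re

# Assumption: matching the keywords `if` / `elif` is a rough proxy for
# "contains a conditional statement". It can also match comments or prose.
CONDITIONAL = re.compile(r"\b(?:if|elif)\b")


def has_conditional(code: str) -> bool:
    """Return True when the text contains an `if` or `elif` keyword."""
    return bool(CONDITIONAL.search(code))


def filter_conditional(instances, field="output"):
    """Keep only instances whose code field contains a conditional keyword."""
    return [ex for ex in instances if has_conditional(ex.get(field, ""))]


# Hypothetical instruction-tuning records (not real dataset rows).
samples = [
    {"instruction": "Print positive x", "output": "if x > 0:\n    print(x)"},
    {"instruction": "Say hello", "output": "print('hello')"},
]

kept = filter_conditional(samples)
print(len(kept))
```

A parser-based check would be more precise for a single language, but a keyword match is language-agnostic, which matters if the corpus mixes programming languages.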

Sources

Cross-Cutting Theme

These papers address two related challenges around LLMs and causal knowledge, eliciting causal structure and improving causal reasoning, through three approaches:

  • Yamashita: human-in-the-loop with NLP assistance (semi-automated structure elicitation)
  • Shaposhnyk: fully automated via LLMs acting as domain experts (BN construction)
  • Liu: using code prompts with conditional statements to elicit and improve LLMs’ own causal reasoning abilities (abductive + counterfactual)

A unifying thread: conditional structure (whether in a Bayesian network or a code if statement) is the fundamental representation of causal knowledge, and LLMs can both elicit it and reason with it.
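
To make the link between a code `if` statement and a causal edge concrete, the sketch below parses a Python-style snippet and reads each `if cond(): effect()` as a directed edge from `cond` to `effect`. The helper name `conditional_edges` and this particular mapping are assumptions for illustration, not the procedure used in Liu 2025.

```python
import ast


def conditional_edges(source: str) -> list[tuple[str, str]]:
    """Return (condition, consequence) pairs for each `if cond(): effect()`."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        # Only handle conditions that are plain function calls.
        if isinstance(node, ast.If) and isinstance(node.test, ast.Call):
            cause = ast.unparse(node.test.func)
            for stmt in node.body:
                if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
                    edges.append((cause, ast.unparse(stmt.value.func)))
    return edges


snippet = "if hypothesis():\n    ending()\n"
print(conditional_edges(snippet))
```

Because `elif` branches are nested `ast.If` nodes, the same walk also picks up the alternative branches of an if/elif template. `ast.unparse` requires Python 3.9 or later.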

See Also