Table 2-2.
figure 2-2
GPT-4o annotation of 200 randomly sampled unsupported claims (Cohen’s 𝜅 = 0.657, LLM-LLM IAA, Claude vs ChatGPT; computed offline, annotation script not archived in repository) partitions gaps into four categories:

To characterise the nature of LGKC-identified gaps, a random sample of 200 unsupported claims was drawn from the MedChat-QA evaluation set and annotated using GPT-4o with retrieval-augmented evidence from PubMed abstracts. Each claim was assigned to one of four mutually exclusive categories defined by whether the underlying pharmacological relationship exists in the literature and, if so, how its absence from the KG should be interpreted. Inter-annotator agreement was assessed by replicating the annotation using a second LLM (Claude), yielding Cohen’s 𝜅 = 0.657, a level conventionally interpreted as substantial agreement. The resulting c
These two paragraphs repeat each other.
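Since the excerpt states that the annotation script was not archived, a short sketch of how the agreement figure could be recomputed may help the authors make it reproducible. This is only an illustration under assumptions: the function name `cohens_kappa` and the category labels `C1` to `C4` are hypothetical placeholders, not the paper's own names, and the toy labels below are not the paper's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    labels_a, labels_b: equal-length sequences of category labels
    (hypothetical placeholders here, e.g. "C1".."C4").
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: share of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label
    # frequencies, summed over all categories.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example (not the paper's data): 4 claims, 2 categories.
toy_a = ["C1", "C1", "C2", "C2"]
toy_b = ["C1", "C2", "C2", "C2"]
kappa = cohens_kappa(toy_a, toy_b)  # observed 0.75, expected 0.5
```

Archiving something like this alongside the two label files for the 200 claims would let a reader verify the reported 𝜅 directly.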
GENE_ASSOCIATED
Not introduced in the abbreviations; maybe GENE_ASSOCIATED_WITH_DISEASE?
Note. △ Source-scoped: entity vocabulary bounded to source catalog. Fully-supported question rate by relation (all claims KG-confirmed): CF = 32.2% lowest, independently confirms CF = 0 direct edges finding.
Where are MedicationQA and PubMedQA, which you mentioned earlier?
Figure 2-5 Cross-KG × benchmark LGKC heatmap (OpenBioLLM-8B, K=10). △ = source-scoped. ‡ = text-mined source circularity
The table looks odd, with stripes in the blocks.
nine systems,
According to Table 2.4, this should be eight systems.
𝜑(𝑐, 𝐺)
Not introduced before; maybe K(G, c)?
probabilistic lower-bound metric
Too strong; suggest instead: "an operational estimate of KG coverage over schema-compatible pharmacological claims elicited from an LLM."