8 Matching Annotations
  1. Jun 2026
    1. GPT-4o annotation of 200 randomly sampled unsupported claims (Cohen’s 𝜅=0.657 LLM-LLM IAA, Claude vs ChatGPT; computed offline, annotation script not archived in repository) partitions gaps into four categories: To characterise the nature of LGKC-identified gaps, a random sample of 200 unsupported claims was drawn from the MedChat-QA evaluation set and annotated using GPT-4o with retrieval-augmented evidence from PubMed abstracts. Each claim was assigned to one of four mutually exclusive categories defined by whether the underlying pharmacological relationship exists in the literature and, if so, how its absence from the KG should be interpreted. Inter-annotator agreement was assessed by replicating the annotation using a second LLM (Claude), yielding Cohen’s 𝜅 = 0.657, a level conventionally interpreted as substantial agreement. The resulting c

      These two paragraphs repeat each other.

    2. Note. △ Source-scoped: entity vocabulary bounded to source catalog. Fully-supported question rate by relation (all claims KG-confirmed): CF=32.2% lowest, independently confirms CF=0 direct edges finding.

      Where are the MedicationQA and PubMedQA results you mentioned earlier?

    3. Figure 2-5 Cross-KG × benchmark LGKC heatmap (OpenBioLLM-8B, K=10). △ = source-scoped. ‡ = text-mined source circularity

      The table looks odd, with stripes in the blocks.

    4. probabilistic lower-bound metric

      Too strong. Suggested rewording: "an operational estimate of KG coverage over schema-compatible pharmacological claims elicited from an LLM."
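
The agreement figure quoted in annotation 1 (Cohen's 𝜅 between two LLM annotators over four mutually exclusive categories) could be recomputed along the following lines. This is only a minimal sketch: the quoted passage says the annotation script is not archived, so this is not the authors' code, and the function name `cohens_kappa`, the category labels `A` to `D`, and the toy label lists are all hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the two annotators' marginal label
    # frequencies, summed over categories.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (categories A-D), NOT the paper's annotations.
annotator_1 = ["A", "A", "B", "B", "C", "D", "A", "B"]
annotator_2 = ["A", "B", "B", "B", "C", "D", "A", "A"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Archiving a script of this kind together with both annotators' label files would let a reader verify the reported 𝜅 = 0.657 directly.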
