2,633 Matching Annotations
  1. Nov 2025
    1. nan

      Oncogenic evidence:

      Oncogenic: The abstract discusses mutations in the PIK3CA gene, specifically mentioning H1047R and H1047L, and indicates that these mutations are present in invasive breast carcinoma. The presence of these mutations suggests they contribute to tumor development, which aligns with the definition of oncogenic variants.

    1. nan

      Diagnostic, Prognostic, Oncogenic evidence:

      Diagnostic: The study investigates the frequency of PIK3CA mutations and amplifications in nasopharyngeal carcinoma (NPC), indicating that PIK3CA gene amplification is associated with advanced tumor stage and lymph node involvement, which helps classify the disease.

      Prognostic: The findings show that patients with PIK3CA copy number gain have significantly reduced overall survival time, suggesting that this variant correlates with disease outcome independent of therapy.

      Oncogenic: The study highlights that PIK3CA gene amplification is frequent in NPC and is associated with advanced disease, indicating its role in tumor development or progression.

    1. nan

      Oncogenic evidence:

      Oncogenic: The abstract discusses the involvement of the PIK3CA gene and its common activating missense mutations in a variety of human tumor types, indicating that these mutations contribute to tumor development or progression. This aligns with the definition of oncogenic variants, which are known to drive cancer.

    1. nan

      Predictive, Prognostic evidence:

      Predictive: The study indicates that PIK3CA mutations were associated with shorter time to progression (TTP) in trastuzumab-treated breast cancer patients, suggesting a correlation with resistance to therapy. This aligns with the definition of predictive evidence, as it discusses the impact of the variant on treatment response.

      Prognostic: The presence of PIK3CA mutations was linked to decreased survival from the initiation of trastuzumab treatment, indicating that this variant correlates with disease outcome independent of therapy. This supports the classification as prognostic evidence, as it relates to survival metrics.

    1. nan

      Predictive, Prognostic evidence:

      Predictive: PIK3CA mutations predicted longer local recurrence-free survival, indicating a correlation with treatment response or disease outcome based on therapy context.

      Prognostic: The study reports that patients with high S-phase fraction had longer recurrence-free survival if they carried mutations in the PIK3CA gene, suggesting that this variant correlates with disease outcome independent of therapy.

    1. nan

      Prognostic, Oncogenic evidence:

      Oncogenic: The abstract discusses somatic mutations of PIK3CA and their role in the pathogenesis and progression of human breast cancers, indicating that these mutations contribute to tumor development.

      Prognostic: The study reports that PIK3CA mutations are significantly associated with a favorable prognosis, demonstrating that the mutation status serves as an independent prognostic factor.

    1. nan

      Predictive evidence:

      Predictive: The study discusses how increased activity of the PI3K pathway in cancer is associated with resistance to chemotherapeutic agents, suggesting that PI3K inhibitors like GDC-0941 could overcome this resistance and enhance the effectiveness of doxorubicin treatment. This indicates a correlation between the variant's activity and the response to therapy, fulfilling the criteria for predictive evidence.

    1. nan

      Predictive evidence:

      Predictive: The study demonstrates that GDC-0941 sensitizes breast cancer cells to ABT-737, indicating a correlation with enhanced response to therapy when these agents are combined. This suggests that the variant may influence treatment sensitivity, as the combination leads to increased cytotoxicity and apoptosis in breast cancer cells.

    1. nan

      Predictive, Functional evidence:

      Predictive: The study investigates the pharmacokinetics and activity of GDC-0941, a PI3K pathway inhibitor, in various mouse models, suggesting that the variant's interaction with P-glycoprotein and breast cancer resistance protein may influence treatment response in patients with brain tumors.

      Functional: The research demonstrates that GDC-0941 is a substrate of P-glycoprotein and Bcrp1, indicating that the variant alters the molecular function of these transporters, which impacts the drug's brain penetration and pharmacokinetics.

    1. nan

      Predictive evidence:

      Predictive: The study evaluates the relationship between GDC-0941 plasma concentrations and tumor reduction, indicating that activating or transforming mutations in the PI3K pathway correlate with the response to this specific therapy. The mention of "tumor pharmacodynamic biomarker" responses and their association with antitumor efficacy further supports this classification.

    1. nan

      Predictive evidence:

      Predictive: The study discusses the resistance to HER2 inhibitors in breast cancer and investigates the combinatorial activity of GDC-0941, a PI3K inhibitor, with standard therapies, indicating that the variant's presence may correlate with treatment response. The results show that the combination of GDC-0941 with HER2-targeted therapies leads to significant growth inhibition, suggesting a predictive relationship between the variant and therapeutic efficacy.

    1. nan

      Predictive evidence:

      Predictive: The study discusses how trastuzumab treatment correlates with the disruption of HER2/HER3 interactions and leads to antiproliferative effects, indicating that the variant's presence may influence response to therapy. The mention of a selective PI3K inhibitor being effective in combination with trastuzumab further supports the predictive nature of the variant in relation to treatment response.

    1. nan

      Oncogenic evidence:

      Oncogenic: The abstract discusses that somatic mutations in the PIK3CA gene play a role in tumor initiation, indicating that these mutations contribute to tumor development in breast cancer. The mention of PIK3CA mutations being present in both in situ and invasive breast carcinomas supports the classification of this variant as oncogenic.

    1. nan

      Diagnostic, Prognostic evidence:

      Diagnostic: The study establishes that IDH1 mutations are a strong genetic marker for distinguishing between secondary glioblastomas and primary glioblastomas, indicating their role in defining and classifying these disease subtypes.

      Prognostic: The results indicate that glioblastoma patients with IDH1 mutations have significantly longer survival compared to those without, suggesting that these mutations correlate with better disease outcomes independent of therapy.

    1. nan

      Diagnostic, Prognostic, Oncogenic evidence:

      Prognostic: The study reports that loss of heterozygosity (LOH) 10q is predictive of shorter survival, indicating a correlation between this genetic alteration and disease outcome independent of therapy. The observed survival rates at various time points further support the prognostic implications of the genetic alterations discussed.

      Diagnostic: The abstract mentions the frequency of various genetic alterations in glioblastomas, including TP53 mutations, which are used to classify and define the disease subtypes. The association of these mutations with primary and secondary glioblastomas provides evidence for their role in disease classification.

      Oncogenic: The presence of TP53 mutations and LOH 10q in glioblastomas suggests that these somatic variants contribute to tumor development and progression, particularly in the context of secondary glioblastomas. The study highlights the different mechanisms of mutation acquisition in these tumor subtypes, reinforcing the oncogenic nature of these alterations.

    1. nan

      Prognostic evidence:

      Prognostic: The study investigates the relationship between histologic factors and survival in glioblastoma patients, indicating that certain histologic features correlate with patient outcomes, particularly noting the strong negative relationship between advancing age and duration of postoperative survival. This suggests that the variant's presence or absence may influence prognosis independent of therapy.

    1. nan

      Diagnostic, Prognostic, Oncogenic evidence:

      Prognostic: The abstract discusses median survival times for various glioma types, indicating that the prognosis of diffusely infiltrating gliomas is poorer, with specific survival rates provided for different grades. This correlates the presence of certain genetic alterations, such as TP53 mutations and LOH 10q, with survival outcomes, thus providing prognostic evidence.

      Diagnostic: The abstract mentions the frequency of TP53 mutations in different glioma subtypes, which can be used to classify or define these tumors. The association of specific mutations with certain tumor types supports the use of these variants as diagnostic markers.

      Oncogenic: The abstract indicates that TP53 mutations and other genetic alterations are frequent in gliomas and discusses their association with tumor characteristics and survival, suggesting that these mutations contribute to tumor development or progression. This aligns with the definition of oncogenic evidence.

    1. nan

      Diagnostic, Oncogenic evidence:

      Diagnostic: The abstract discusses the distinct disease subtypes of glioblastoma, indicating that primary and secondary glioblastomas are characterized by different genetic pathways and mutation patterns. This classification of glioblastoma based on genetic alterations supports the use of these variants as diagnostic markers for defining and confirming disease subtypes.

      Oncogenic: The abstract mentions that TP53 mutations are the most frequent and earliest detectable genetic alteration in the progression to secondary glioblastoma, suggesting that these mutations contribute to tumor development. This indicates that the somatic variants discussed are involved in oncogenic processes related to glioblastoma.

    1. nan

      Predisposing, Functional evidence:

      Predisposing: The abstract mentions "germline mutations that affect components of the Ras-Raf-MEK-ERK pathway," indicating that these inherited mutations confer risk for developmental disorders, which aligns with the definition of predisposing variants.

      Functional: The abstract states that "many of these mutant alleles encode proteins with aberrant biochemical and functional properties," suggesting that the variants alter molecular or biochemical function, which supports the functional evidence type.

    1. nan

      Oncogenic evidence:

      Oncogenic: The study provides evidence that RAS oncogenes, specifically K- or NRAS genes activated by point mutation, contribute to tumor development in various types of lung carcinomas, indicating their role in oncogenesis. The detection of activated protooncogenes in the majority of lung tumor DNAs supports the conclusion that these variants are involved in tumor progression.

    1. nan

      Oncogenic evidence:

      Oncogenic: The study discusses mutations in the N-ras gene found in patients with cutaneous melanoma, suggesting that these mutations contribute to tumor development, particularly in the context of UV exposure. The mention of mutations being localized at dipyrimidine sites known to be targets of UV damage further supports the notion of these mutations playing a role in tumor progression.

    1. nan

      Predictive, Diagnostic, Oncogenic evidence:

      Diagnostic: The study investigates the incidence of BRAF and NRAS mutations in various melanoma subtypes, indicating that these mutations are used to classify and define the molecular genetic profiles of different melanoma types, particularly in mucosal melanomas.

      Predictive: The mention of "possible new therapeutic options of anti-RAF treatment" for patients with BRAF mutations suggests that these mutations may correlate with response to specific therapies, indicating a predictive relationship.

      Oncogenic: The study discusses the unique oncogenetic pathways of tumor development in different melanoma subtypes, implying that BRAF mutations contribute to tumor development in the context of melanoma, which supports the classification as oncogenic.

    1. nan

      Predictive, Diagnostic evidence:

      Diagnostic: The study discusses the identification of deleterious mutations in moderate-risk breast and ovarian cancer genes and Lynch syndrome genes among patients lacking BRCA1/2 mutations, indicating that these mutations are associated with hereditary breast and/or ovarian cancer (HBOC). This suggests that the presence of these mutations can be used to classify and confirm the risk of disease in individuals, thus supporting their role as diagnostic markers.

      Predictive: The findings indicate that the identification of non-BRCA1/2 mutations can lead to changes in clinical management and additional disease-specific screening or prevention measures, suggesting that these mutations correlate with the response to clinical interventions. This highlights the potential for these variants to influence treatment decisions based on their presence.

    1. nan

      Diagnostic, Predisposing evidence:

      Predisposing: The study discusses the identification of germline BRCA1/2 mutations and other pathogenic variants in genes associated with cancer risk, indicating that these variants confer inherited risk for developing cancer. The mention of "germline-DNA sequencing panel for cancer-risk assessment" supports this classification.

      Diagnostic: The study evaluates the performance of a customized sequencing panel for cancer-risk assessment, which is used to identify and confirm pathogenic variants in patients, thereby classifying them based on their genetic mutations. The identification of variants in genes like ATM, BLM, and others suggests their role in defining disease risk.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses a selective R132H-IDH1 inhibitor (AGI-5198) that blocks the ability of the mutant enzyme to produce R-2-hydroxyglutarate, indicating that the presence of the R132H variant correlates with the response to this specific therapy, as it impaired the growth of IDH1-mutant glioma cells.

      Oncogenic: The abstract mentions that the mutant IDH1 (specifically the R132H variant) promotes glioma growth, suggesting that this somatic variant contributes to tumor development or progression.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses the novel pan-mutant IDH1 inhibitor BAY1436032, which specifically inhibits R-2HG production and induces myeloid differentiation in AML cells carrying IDH1 mutations, including R132G. This indicates a correlation between the presence of the R132G variant and the response to the therapy.

      Oncogenic: The abstract mentions that mutations in IDH1, including R132G, contribute to tumor development by leading to the production of the oncometabolite R-2HG, which promotes tumorigenesis through mechanisms such as histone and DNA hypermethylation. This supports the classification of R132G as an oncogenic variant.

    1. nan

      Predictive, Oncogenic evidence:

      Oncogenic: The abstract discusses somatic gain-of-function mutations in IDH1 and IDH2, indicating that these mutations contribute to tumor development and progression by causing a block in cellular differentiation, which is characteristic of oncogenic behavior.

      Predictive: The study highlights that the mutant IDH2 enzyme can be targeted by the inhibitor AG-221, which not only suppresses 2HG production but also induces cellular differentiation in IDH2 mutation-positive AML cells, suggesting a correlation with treatment response.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses the development of BAY 1436032 as a pan-inhibitor targeting IDH1 mutations, including R132H and R132L, and highlights its ability to significantly prolong survival in mice with tumors carrying the IDH1 R132H mutation, indicating a correlation with treatment response.

      Oncogenic: The abstract mentions that mutations in codon 132 of IDH1, including R132H and R132L, are frequent in various tumors, suggesting that these somatic variants contribute to tumor development and progression through their neomorphic enzyme activity.

    1. nan

      Predictive, Diagnostic evidence:

      Predictive: The study demonstrates that IDH mutant (IDHm) intrahepatic cholangiocarcinoma (ICC) cells show a striking response to the multikinase inhibitor dasatinib, indicating a correlation between the IDH mutations and sensitivity to this specific therapy.

      Diagnostic: The abstract states that IDH mutations define a distinct subtype of ICC, suggesting that these mutations are used to classify and identify this specific malignancy.

    1. nan

      Predictive, Oncogenic evidence:

      Oncogenic: The abstract discusses the presence of the IDH1 mutation R132H in a patient with metastatic pancreatic ductal adenocarcinoma (PDA), indicating that this somatic variant is associated with tumor development in this specific cancer type. The mention of the mutation being detected through molecular profiling supports its role in oncogenesis.

      Predictive: The abstract notes that the patient received a mutant IDH1 inhibitor (AG-120) as a treatment option, but there was no response. This indicates that the R132H variant was evaluated in the context of therapy, suggesting a predictive relationship regarding treatment response.

    1. nan

      Predictive evidence:

      Predictive: The study indicates that patients with non-small-cell lung cancer harboring EGFR mutations, including the L858R point mutation, respond well to the EGFR-specific tyrosine kinase inhibitor gefitinib, suggesting a correlation between the presence of this variant and improved treatment outcomes. The results show that gefitinib leads to significantly longer progression-free survival compared to standard chemotherapy, highlighting the predictive nature of the L858R mutation in response to therapy.

    1. nan

      Predictive evidence:

      Predictive: The study indicates that depletion of Akt3 in triple-negative breast cancer (TNBC) sensitizes cells to the pan-Akt inhibitor GSK690693, suggesting that the variant correlates with sensitivity to a specific therapy. This implies a potential therapeutic target for treatment in TNBC, aligning with the predictive evidence type.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses the resistance mechanisms to the AKT inhibitor MK2206, indicating that the expression of AKT3 in resistant cells correlates with a loss of sensitivity to this therapy. This suggests that AKT3 plays a role in the response to treatment, making it predictive of resistance to AKT inhibitors.

      Oncogenic: The upregulation of AKT3 in AKT inhibitor-resistant breast cancer cells suggests that it contributes to tumor progression and resistance mechanisms, indicating its role as an oncogenic driver in this context.

    1. nan

      Predictive, Diagnostic evidence:

      Predictive: The study evaluates the efficacy of dovitinib, an FGFR inhibitor, in patients with advanced SCC of the lung whose tumors demonstrated FGFR1 amplification, indicating that the variant correlates with response to a specific therapy. The mention of "overall response" and "disease control rate" further supports this classification as it directly relates to treatment outcomes.

      Diagnostic: The study specifies that patients were enrolled based on their tumors demonstrating FGFR1 amplification of > 5 copies, which indicates that this variant is used to classify and confirm a specific subtype of lung cancer. This association with a defined characteristic of the disease supports the diagnostic classification.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study demonstrates that ATM-deficient MCL cell lines, such as Granta-519 and UPN2, are more sensitive to PARP-1 inhibition, indicating a correlation between the ATM variant status and response to therapy. Additionally, the use of the PARP-1 inhibitor olaparib significantly decreased tumor growth and increased overall survival in mice with ATM-deficient tumors, further supporting the predictive nature of this variant in therapeutic response.

      Oncogenic: The characterization of ATM alterations in MCL and the observation that ATM-deficient cells exhibit defective DNA damage signaling and increased sensitivity to treatment suggest that these somatic variants contribute to tumor development and progression in this cancer type. The study's focus on the functional consequences of ATM mutations in the context of cancer supports the classification of these variants as oncogenic.

    1. nan

      Oncogenic, Functional evidence:

      Functional: The study describes how the CHEK2 mutation p.R474C alters the tertiary structure of the CHK2 protein by disrupting a salt bridge, indicating a change in molecular function. Additionally, the cell-based transfection analysis showed that this variant was unstable and scarcely activated, further supporting its functional impact.

      Oncogenic: The conclusion that the homozygous CHEK2 variant p.R474C was contributory in the case of familial cancer suggests that this somatic variant may play a role in tumor development or progression, particularly given the context of multiple primary cancers in the patients.

    1. nan

      Diagnostic evidence:

      Diagnostic: The abstract discusses the updated recommendations for the diagnosis and management of acute myeloid leukemia (AML), which includes a revised version of the ELN genetic categories. This indicates that the variants are being used to classify or define the disease, aligning with the diagnostic evidence type.

    1. nan

      Predictive, Functional evidence:

      Predictive: The study indicates that the JAK1 S703I mutation is sensitive to treatment with the JAK1/2 inhibitor ruxolitinib, suggesting a correlation between this variant and therapeutic response. This is supported by the observation that the mutant PDX model showed sensitivity to the treatment, while other non-activating mutants did not.

      Functional: The JAK1 S703I mutation was shown to activate the JAK-STAT signaling pathway and drive cell proliferation in vitro, indicating that this variant alters molecular function. The introduction of this mutation into cell lines demonstrated its capability to enhance signaling in the absence of cytokine stimulation.

    1. nan

      Predictive, Diagnostic evidence:

      Predictive: The study assesses the ability of tumor genomic loss of heterozygosity (LOH) to predict response to rucaparib, indicating that LOH high status correlates with a better response to this PARP inhibitor in patients with ovarian carcinoma. This is supported by the findings that progression-free survival was significantly longer in the LOH high subgroup compared to the LOH low subgroup, demonstrating the predictive nature of the variant in relation to treatment outcomes.

      Diagnostic: The classification of patients into homologous recombination deficiency subgroups based on tumor mutational analysis, including BRCA mutant and LOH high, indicates that these variants are used to define and classify the disease subtype. This classification is essential for determining the appropriate treatment strategy, thus supporting the diagnostic evidence type.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The abstract discusses the ability of the L858R mutation to predict sensitivity to gefitinib, stating that it has been examined for its predictive capabilities in the context of treatment response. The mention of patients with the L858R mutation who did not respond to gefitinib further emphasizes the predictive nature of this variant regarding treatment resistance.

      Oncogenic: The L858R mutation is described as a common somatic mutation in the EGFR gene associated with non-small-cell lung cancer (NSCLC), indicating its role in tumor development or progression. The context of the mutation being part of the EGFR gene, which is known to drive cancer, supports its classification as oncogenic.

    1. nan

      Predisposing, Oncogenic evidence:

      Predisposing: The abstract mentions that DDX41 mutations are identified as both germline and acquired somatic mutations in families with multiple cases of hematologic malignancies, indicating that these mutations confer inherited risk for developing diseases like myelodysplastic syndrome and acute myeloid leukemia.

      Oncogenic: The abstract suggests that DDX41 acts as a tumor suppressor and discusses the identification of mutations in families with hematologic malignancies, implying that these somatic mutations contribute to tumor development or progression.

    1. nan

      Predisposing, Oncogenic, Functional evidence:

      Predisposing: The study describes a familial acute myeloid leukemia (AML) syndrome caused by germline mutations in the DDX41 gene, indicating that these mutations confer inherited risk for developing the disease.

      Oncogenic: The abstract mentions that DDX41 is affected by somatic mutations in sporadic cases of myeloid neoplasms, suggesting that these mutations contribute to tumor development or progression.

      Functional: The abstract states that DDX41 lesions caused altered pre-mRNA splicing and RNA processing, indicating that the variant affects molecular or biochemical function.

    1. nan

      Oncogenic evidence:

      Oncogenic: The variant N356fs is mentioned in the results section as a somatic mutation associated with T-ALL, indicating its contribution to tumor development or progression. The classification as "homo (LOH)" further supports its role in oncogenesis within the context of leukemia.

    1. nan

      Diagnostic, Predisposing evidence:

      Predisposing: The study discusses inherited mutations in ETV6, including the N385fs variant, which are associated with susceptibility to acute leukemia, indicating that these mutations confer inherited risk for developing the disease.

      Diagnostic: The presence of the ETV6 N385fs mutation is used to classify and confirm the diagnosis of leukemia in the affected individuals, as it segregates with the disease in the studied kindreds.

    1. nan

      Predisposing evidence:

      Predisposing: The study discusses a "familial platelet disorder with propensity to myeloid malignancy," indicating that the germline heterozygous mutations in Runt-related transcription factor 1 confer inherited risk for developing myeloid malignancies. This aligns with the definition of a predisposing variant as it is explicitly described as germline and associated with an inherited condition.

    1. nan

      Diagnostic, Functional evidence:

      Diagnostic: The abstract mentions that "Platelet dysfunction suggestive of defective delta-granule release could be of values for the diagnosis of FPD/AML," indicating that the variant is used to define or confirm a disease, specifically in the context of autosomal dominant thrombocytopenia.

      Functional: The results section describes various assays performed to investigate platelet function and delta-granule release, which implies that the variant affects molecular or biochemical functions related to platelet aggregation and granule release.

    1. nan

      Diagnostic, Predisposing evidence:

      Predisposing: The study discusses "inherited thrombocytopenias," indicating that the variants are germline and confer inherited risk for developing this disorder. The mention of "inherited or sustained thrombocytopenia of unknown etiology" supports the classification as predisposing.

      Diagnostic: The identification of "pathogenic" or "likely pathogenic" variants in patients with thrombocytopenia suggests that these variants are used to confirm or classify the disease. The use of whole exome sequencing to elucidate potential pathogenic genetic variants further supports this classification.

    1. nan

      Predisposing, Oncogenic evidence:

      Predisposing: The abstract discusses "monoallelic RUNX1 germline mutations" found in families with familial platelet disorder (FPD), indicating that these inherited mutations confer a genetic predisposition to developing acute myeloid leukemia (AML).

      Oncogenic: The identification of "a second RUNX1 alteration" in AML cases, including "acquired point mutations" and "duplication of the altered RUNX1 allele," suggests that these somatic changes contribute to tumor development and progression in the context of AML.

    1. nan

      Diagnostic, Prognostic evidence:

      Diagnostic: The study discusses GATA2 deficiency and its association with various diseases, including myelodysplastic syndromes (MDS) and acute myeloid leukemia (AML), indicating that the variant is used to classify and confirm these conditions. The identification of genotype-phenotype associations further supports its role as a diagnostic marker.

      Prognostic: The abstract mentions that monocytopenia and lymphocytopenia correlate with the presence of disease, suggesting that the variant may have implications for disease outcomes independent of therapy. This correlation indicates a potential prognostic value of the GATA2 deficiency in predicting disease severity or progression.

    1. nan

      Predisposing, Functional evidence:

      Predisposing: The study identifies the p.Thr355del variant as a heritable mutation associated with familial myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML), indicating that it confers inherited risk for developing these diseases.

      Functional: The abstract mentions that the mutations, including p.Thr355del, affect transactivation of target genes, cellular differentiation, apoptosis, and global gene expression, demonstrating that this variant alters molecular or biochemical function.

    1. nan

      Prognostic evidence:

      Prognostic: The abstract indicates that KRAS G13D is associated with poor survival outcomes in mCRC patients, suggesting that this variant correlates with disease prognosis independent of therapy. The mention of "inferior PFS and OS" highlights its relevance in predicting patient outcomes.

    1. nan

      Predictive, Diagnostic, Oncogenic evidence:

      Predictive: The study discusses the patient's response to pembrolizumab, indicating that the hypermutated glioblastoma may be susceptible to checkpoint blockade therapy, which correlates with the variant's potential impact on treatment response. The mention of "an objective radiographic response" suggests a relationship between the variant and the effectiveness of the therapy.

      Diagnostic: The abstract states that the patient has a "POLE germline alteration," which is used to define the hypermutated genotype of the glioblastoma, indicating that this variant is associated with a specific disease subtype. This classification supports the use of the variant as a biomarker for identifying patients with this particular tumor profile.

      Oncogenic: The presence of a hypermutated genotype in the glioblastoma suggests that the variant contributes to tumor development or progression, as indicated by the tumor's characteristics and the context of the disease. The study implies that the POLE alteration plays a role in the tumor's aggressive behavior and its response to treatment.

    1. nan

      Predictive, Diagnostic evidence:

      Predictive: The study indicates that patients with TP53 mutations had a significantly higher response rate to decitabine therapy, with 100% of those with TP53 mutations achieving bone marrow blast clearance compared to only 41% of wild-type TP53 patients. This suggests that the presence of TP53 mutations correlates with a favorable clinical response to the treatment.

      Diagnostic: The abstract mentions that TP53 mutations are associated with an unfavorable-risk cytogenetic profile, which is used to classify patients in terms of their risk for poor outcomes. This association supports the use of TP53 mutations as a biomarker for defining patient subtypes in AML and MDS.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses how MET inhibitors suppressed tumor growth in xenograft models and reports a case where treatment with the targeted inhibitor crizotinib led to substantial tumor shrinkage, indicating a correlation between the MET fusion variant and response to therapy.

      Oncogenic: The identification of MET fusions that activated MAPK signaling and induced aggressive glial tumors in vivo provides evidence that these somatic variants contribute to tumor development and progression in pediatric glioblastoma.

    1. nan

      Predictive, Prognostic evidence:

      Predictive: The study discusses TP53 and MDM2 alterations as being associated with cisplatin resistance, indicating that these variants may correlate with treatment response and resistance to chemotherapy. The mention of actionable alterations in cisplatin-resistant GCTs suggests potential sensitivity to targeted therapies, further supporting the predictive nature of these findings.

      Prognostic: The abstract states that TP53 and MDM2 alterations predicted adverse prognosis independent of the IGCCCG model, indicating that these variants correlate with inferior outcomes such as increased risk of cancer-related death. This suggests that the presence of these alterations can provide prognostic information regarding disease outcome.

    1. nan

      Predictive, Oncogenic evidence:

      Oncogenic: The L768S mutation is described as contributing to tumor development, as it exhibited a significant increase in tyrosine kinase-specific activity and promoted rapid growth in xenograft experiments. This indicates that the mutation plays a role in the oncogenic process within HER2-negative breast cancer.

      Predictive: The study discusses how the L768S mutation affects sensitivity to HER2-targeted therapies, indicating that it may correlate with treatment response. Specifically, the presence of this mutation in HER2-negative tumors suggests a potential for these tumors to benefit from targeted therapies, highlighting its predictive nature regarding treatment outcomes.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses the antitumor activity of dabrafenib plus trametinib specifically in patients with BRAF(V600E)-mutant NSCLC, indicating a correlation between the presence of this variant and the response to the therapy. The mention of "BRAF inhibition has shown antitumour activity" supports the predictive nature of the evidence regarding treatment response.

      Oncogenic: The abstract states that BRAF mutations, including V600E, act as oncogenic drivers in non-small cell lung cancer, which indicates that these variants contribute to tumor development or progression. This classification is supported by the context of the study focusing on a specific mutation known to drive cancer.

    1. nan

      Predictive, Diagnostic evidence:

      Predictive: The study discusses how CDK12 deficiency serves as a clinically relevant biomarker of PARP1/2 inhibitor sensitivity, indicating that the presence of this variant correlates with response to therapy. The mention of "sensitivity to PARP1/2 inhibition" directly links the variant to treatment outcomes, fulfilling the criteria for predictive evidence.

      Diagnostic: The identification of CDK12 as a biomarker of PARP1/2 inhibitor sensitivity suggests its role in classifying patients who may benefit from this specific therapy, thus providing diagnostic evidence. The abstract states that CDK12 is one of the genes significantly mutated in high-grade serous ovarian cancer, further supporting its use in defining a disease subtype.

    1. nan

      Diagnostic evidence:

      Diagnostic: The study identifies 3' end deletions in the EPCAM gene as a novel cause of Lynch syndrome, indicating that these deletions are used to classify and confirm the diagnosis of this genetic condition. The mention of the frequency of EPCAM deletions in confirmed Lynch syndrome families further supports its role as a diagnostic marker.

    1. nan

      Predisposing evidence:

      Predisposing: The abstract describes familial platelet disorder with predisposition to acute myelogenous leukaemia (FPD/AML) as an autosomal dominant disorder, indicating that the variant confers inherited risk for developing the disease. The mention of "haploinsufficiency of CBFA2" causing a congenital platelet defect and predisposing individuals to leukaemia further supports this classification.

    1. nan

      Predictive, Prognostic evidence:

      Prognostic: The abstract states that "the circulating concentrations of C-X-C motif chemokine ligand 10 (CXCL10), Fms-related tyrosine kinase 3 ligand (FLT3LG), interferon gamma (IFNG), and C-C motif chemokine ligand 4 (CCL4) were significantly associated with overall survival in both cohorts," indicating that these biomarkers correlate with disease outcome independent of therapy.

      Predictive: The conclusion mentions that "High circulating levels of CXCL10 and FLT3LG predicted worse survival for patients with OS," suggesting that these biomarkers may correlate with treatment response or resistance, which aligns with predictive evidence.

    1. nan

      Predictive, Diagnostic, Oncogenic evidence:

      Predictive: The abstract states that BRAF inhibitors are the standard treatment for metastatic melanoma with BRAF V600 mutations, indicating a correlation between the presence of this variant and the response to therapy. Additionally, the results highlight improved survival outcomes with BRAF inhibitors in patients with BRAF mutations, further supporting the predictive nature of this evidence.

      Diagnostic: The abstract mentions that all patients had BRAF V600E mutations, which implies that this variant is used to classify and confirm the presence of a specific subtype of melanoma. This association with a specific disease subtype qualifies the evidence as diagnostic.

      Oncogenic: The results section discusses the prevalence of BRAF mutations in malignant melanoma, specifically noting that the BRAF V600E mutation is the most common. This indicates that the variant contributes to tumor development or progression, supporting its classification as oncogenic.

    1. nan

      Predictive evidence:

      Predictive: The study discusses the correlation between isocitrate dehydrogenase 1/2 mutations and the response to venetoclax treatment, indicating that these mutations may serve as predictive markers of response consistent with BCL2 dependence. This suggests that the presence of these mutations could influence the effectiveness of the therapy in patients with AML.

    1. nan

      Diagnostic, Prognostic evidence:

      Diagnostic: The presence of the T790M mutation is used to define a clinical subset of patients with acquired resistance to EGFR TKIs, indicating its role in classifying disease progression and prognosis.

      Prognostic: The study reports that patients with the T790M mutation have a significantly longer postprogression survival compared to those without the mutation, suggesting that T790M correlates with a more favorable disease outcome.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses the emergence of imatinib-resistant clones in patients with metastatic gastrointestinal stromal tumors (GIST) and correlates this with disease progression, indicating that the presence of specific mutations in the KIT and PDGFRA kinases is associated with resistance to imatinib therapy.

      Oncogenic: The identification of new activating kinase mutations in 80% of patients with progressive disease suggests that these somatic variants contribute to tumor development or progression in the context of GISTs.

    1. nan

      Oncogenic evidence:

      Oncogenic: The study discusses clonal chromosome aberrations involving 2p23 and the rearrangement of the ALK gene, suggesting that these alterations contribute to the development of inflammatory myofibroblastic tumors (IMT). This indicates that the variant plays a role in tumor progression, aligning with the definition of oncogenic evidence.

    1. nan

      Predictive, Prognostic evidence:

      Predictive: The study discusses how mutation scoring based on in vitro inhibitory concentration of TKI-mutation pairs can predict long-term clinical outcomes, indicating that the T315I variant has a low sensitivity to treatment, which correlates with response rates to therapy.

      Prognostic: The results indicate that tumors with low and intermediate mutation scores, including those with the T315I variant, had worse event-free and overall survival rates compared to those with highly sensitive mutations, demonstrating a correlation with disease outcome independent of therapy.

    1. nan

      Predisposing, Oncogenic evidence:

      Predisposing: The study identifies germline mutations in the ALK gene as the main cause of familial neuroblastoma, indicating that these inherited mutations confer a risk for developing the disease.

      Oncogenic: The abstract mentions that somatically acquired mutations in the ALK gene were predicted to be oncogenic drivers, as they resulted in constitutive phosphorylation and were linked to tumor growth.

    1. nan

      Predictive evidence:

      Predictive: The abstract states that EGFR gene mutations, including G719X, predict favorable responses to EGFR tyrosine kinase inhibitors (TKIs) in advanced non-small cell lung cancer (NSCLC). This indicates a correlation between the presence of the G719X variant and the effectiveness of specific therapies, qualifying it as predictive evidence.

    1. nan

      Predictive, Diagnostic evidence:

      Predictive: The study discusses the efficacy of trastuzumab-DM1 (T-DM1) in patients with HER2-positive metastatic breast cancer, indicating that the response rates were higher among patients with confirmed HER2-positive tumors. This suggests a correlation between the HER2 variant status and the response to the therapy, classifying it as predictive evidence.

      Diagnostic: The abstract mentions that the response rates were higher among patients with confirmed HER2-positive tumors, which implies that the HER2 status is used to classify and confirm the disease subtype (HER2-positive metastatic breast cancer). This supports the classification as diagnostic evidence.

    1. nan

      Diagnostic evidence:

      Diagnostic: The abstract discusses the identification of individuals at increased risk of hereditary cancer syndromes and presents criteria for cancer genetic consultation referral, indicating that the variant is used to classify or confirm a disease or subtype. This aligns with the definition of the Diagnostic evidence type, as it involves the use of specific criteria to identify individuals who may harbor genetic variants associated with cancer risk.

    1. nan

      Oncogenic evidence:

      Oncogenic: The abstract discusses somatic mutations of EGFR that are associated with "oncogene addiction," indicating that these mutations contribute to tumor development or progression. The mention of "oncogenic shock" further supports the idea that these mutations play a critical role in the cancer's response to treatment, which is characteristic of oncogenic variants.

    1. nan

      Predictive, Oncogenic evidence:

      Oncogenic: The study identifies rearrangements of the RET proto-oncogene, specifically the CCDC6-RET fusion, as potential driver mutations in lung adenocarcinoma, indicating that this somatic variant contributes to tumor development and progression. The evidence is supported by the demonstration of the biological relevance of the CCDC6-RET gene products in promoting cell growth and survival.

      Predictive: The study evaluates the efficacy of RET inhibitors, such as vandetanib, in reducing cell viability and demonstrates that treatment with these inhibitors can effectively target the CCDC6-RET fusion, suggesting a correlation with response to therapy. This indicates that the presence of the RET fusion may predict sensitivity to RET-targeted treatments.

    1. nan

      Predictive, Diagnostic, Oncogenic evidence:

      Oncogenic: The study identifies RET rearrangements as rare oncogenic alterations in non-small-cell lung cancer (NSCLC), indicating that these variants contribute to tumor development or progression.

      Predictive: The results demonstrate that vandetanib, a tyrosine kinase inhibitor, shows clinical antitumor activity in patients with advanced RET-rearranged NSCLC, suggesting that the presence of RET rearrangements correlates with response to this specific therapy.

      Diagnostic: The study defines RET rearrangement as a new molecular subgroup of NSCLC, indicating its role in classifying patients for targeted therapy, which aligns with the criteria for diagnostic evidence.

    1. nan

      Predictive evidence:

      Predictive: The study reports that the response rate of 33% with dabrafenib in patients with advanced BRAF V600E-mutant lung cancers indicates a correlation between the BRAF V600E variant and sensitivity to this specific therapy. This suggests that the presence of the BRAF V600E variant can predict treatment response to BRAF inhibitors.

    1. nan

      Predictive evidence:

      Predictive: The study discusses the efficacy of the CDK4/6 inhibitor ribociclib in combination with letrozole, indicating that this treatment correlates with improved progression-free survival in patients with HR-positive, HER2-negative advanced breast cancer. This suggests a predictive relationship between the variant (in this case, the treatment regimen) and the response to therapy.

    1. nan

      Predictive, Prognostic evidence:

      Predictive: The study discusses how the variant rs9637468 is associated with overall survival (OS) in patients treated with gemcitabine, indicating that it may help predict response to this therapy in pancreatic cancer. The mention of "gemcitabine response" directly links the variant to treatment outcomes, fulfilling the criteria for predictive evidence.

      Prognostic: The results indicate that rs9637468 is associated with overall survival (OS) of patients treated with gemcitabine, which correlates with disease outcome independent of therapy. This association with OS suggests that the variant may have prognostic implications for patients undergoing treatment.

    1. nan

      Prognostic, Predisposing evidence:

      Predisposing: The study discusses "familial CEBPA mutations" and indicates that these mutations are "germ-line," which confers an inherited risk for developing acute myeloid leukemia (AML). This aligns with the definition of predisposing evidence as it highlights the genetic basis of the disease in affected families.

      Prognostic: The abstract mentions a "cumulative incidence of relapse in familial AML was 56% at 10 years" and "long-term overall survival (10-year overall survival, 67%)," indicating that the presence of CEBPA mutations correlates with disease outcomes independent of therapy. This supports the classification as prognostic evidence.

    1. nan

      Diagnostic, Oncogenic evidence:

      Diagnostic: The abstract states that mutations occurring exclusively at arginine-625 in SF3B1 are associated with low-grade uveal melanomas, indicating that these mutations can be used to classify a distinct molecular subset of the disease.

      Oncogenic: The results section describes the identification of deleterious somatic variants in SF3B1, specifically noting that the p.R625C alteration was found in multiple tumor samples, suggesting its role in tumor development or progression in uveal melanoma.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses how alterations in the mTOR pathway, specifically a nonsense mutation in TSC2, correlate with sensitivity to everolimus, indicating that this variant may predict treatment response. The mention of resistance developing in a patient after an 18-month response further emphasizes the predictive nature of the variant in relation to therapy outcomes.

      Oncogenic: The identification of a mutation in TSC2 as a negative regulator of mTOR suggests that this somatic variant contributes to tumor development or progression, particularly in the context of anaplastic thyroid carcinoma. The study implies that the mutation plays a role in the tumor's response to treatment, reinforcing its oncogenic potential.

    1. nan

      Predictive evidence:

      Predictive: The abstract states that mutations in the Bcr-Abl kinase domain may cause resistance to tyrosine kinase inhibitors (TKIs) in chronic myeloid leukemia patients, indicating a correlation between the variant and treatment response. This suggests that the presence of specific mutations can influence the effectiveness of therapy, which aligns with the predictive evidence type.

    1. nan

      Predictive evidence:

      Predictive: The abstract discusses how NPM1 mutations in acute myeloid leukemia (AML) are associated with sensitivity to arsenic trioxide (ATO) and the beneficial effects of combining ATO with all-trans retinoic acid (ATRA) in treatment, indicating a correlation with treatment response. The statement about NPM1 mutant downregulation by ATO/ATRA potentiating response to daunorubicin further supports this classification.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses the sensitivity of non-small-cell lung cancer (NSCLC) harboring the ALK rearrangement to the ALK inhibitor ceritinib, indicating a correlation between the presence of the ALK variant and the response to therapy. The overall response rate of 58% among patients treated with ceritinib further supports this predictive evidence regarding treatment efficacy.

      Oncogenic: The mention of ALK rearrangement in NSCLC suggests that this somatic variant contributes to tumor development or progression, as it is associated with the cancer type being studied. The context of resistance mutations in ALK also implies its role in oncogenesis within the framework of the disease.

    1. nan

      Predictive evidence:

      Predictive: The study discusses the objective response rate (ORR) of alectinib in patients with crizotinib-refractory ALK-positive NSCLC, indicating that the variant (ALK rearrangement) correlates with treatment response to alectinib, a specific therapy. The mention of ORR and progression-free survival highlights the predictive nature of the variant in relation to therapy outcomes.

    1. nan

      Predictive, Diagnostic, Oncogenic evidence:

      Predictive: The abstract discusses the occurrence of BCR-ABL mutations, including T315I, and their contribution to resistance to tyrosine kinase inhibitor therapy, indicating a correlation between the variant and treatment response. The mention of "imatinib-resistant" mutations suggests that T315I is associated with resistance to this specific therapy.

      Diagnostic: The abstract notes the incidence of the T315I mutation in patients with varying Sokal scores, implying its role in classifying or defining a subset of patients with chronic myeloid leukemia based on mutation status. This association with specific patient characteristics supports its use as a diagnostic marker.

      Oncogenic: The T315I mutation is mentioned in the context of patients progressing to accelerated phase/blast crisis, indicating that it contributes to tumor progression in chronic myeloid leukemia. This suggests that T315I has oncogenic properties, as it is associated with a more aggressive disease state.

    1. nan

      Predictive, Diagnostic evidence:

      Predictive: The study evaluates the response of patients with CML and ALL to the BCR-ABL tyrosine kinase inhibitor STI571, indicating that the presence of the BCR-ABL variant correlates with treatment response, as evidenced by the reported response rates in patients with myeloid and lymphoid blast crises.

      Diagnostic: The abstract mentions that BCR-ABL is present in virtually all cases of chronic myeloid leukemia (CML) and in a significant percentage of acute lymphoblastic leukemia (ALL), suggesting its role in defining and confirming these diseases.

    2. nan

      Predictive, Diagnostic evidence:

      Predictive: The study evaluates the response of patients with chronic myeloid leukemia (CML) and acute lymphoblastic leukemia (ALL) to the BCR-ABL tyrosine kinase inhibitor STI571, indicating a correlation between the presence of the BCR-ABL variant and treatment response. The results show that a significant percentage of patients experienced a response to the therapy, demonstrating the predictive nature of the variant in relation to treatment efficacy.

      Diagnostic: The presence of the BCR-ABL variant is used to classify patients with CML and ALL, as it is noted to be present in virtually all cases of CML and in a significant percentage of ALL cases. This establishes the variant as a key biomarker for diagnosing these diseases.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses the use of STI571, a specific inhibitor of the BCR-ABL tyrosine kinase, in treating patients with chronic myeloid leukemia (CML), indicating that the presence of the BCR-ABL variant correlates with response to this therapy. The results show significant antileukemic activity and complete hematologic responses in patients treated with doses of 300 mg or more, highlighting the variant's role in treatment sensitivity.

      Oncogenic: The abstract mentions that BCR-ABL is a constitutively activated tyrosine kinase that causes chronic myeloid leukemia (CML), indicating that this somatic variant contributes to tumor development and progression in this disease context. The evidence of its transforming function supports its classification as oncogenic.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The abstract discusses the response rates of various EGFR mutations, including L861Q, to EGFR-tyrosine kinase inhibitors (TKIs) like gefitinib and erlotinib, indicating that these mutations show moderate sensitivities to these therapies. This suggests a correlation between the presence of the L861Q variant and the effectiveness of specific treatments, which aligns with the predictive evidence type.

      Oncogenic: The mention of somatic mutations in the EGFR gene, including L861Q, in the context of lung adenocarcinoma implies that these mutations contribute to tumor development or progression. This classification is supported by the context of the study focusing on mutations that are prevalent in cancer and their implications for targeted therapy.

    1. nan

      Oncogenic, Functional evidence:

      Functional: The study describes the use of a mouse embryonic stem cell-based functional assay to characterize BRCA2 variants, indicating that these variants alter molecular or biochemical function, such as their ability to rescue lethality in Brca2-deficient cells and their effect on sensitivity to DNA-damaging agents.

      Oncogenic: The classification of the BRCA2 variants as pathogenic or non-pathogenic based on their effects on genomic integrity and homologous recombination suggests that these somatic variants contribute to tumor development or progression.

    1. nan

      Predictive, Prognostic evidence:

      Predictive: The study indicates that miR-34a significantly sensitized the anticancer effects of 5-fluorouracil (5-FU), suggesting that the variant correlates with response to this specific therapy in pancreatic ductal adenocarcinoma (PDAC) patients.

      Prognostic: The loss of expression of miR-34a is associated with disease progression and poor prognosis in PDAC patients, indicating that this variant correlates with disease outcome independent of therapy.

    1. nan

      Diagnostic, Predisposing evidence:

      Diagnostic: The study discusses the identification of germline mutations in the STK11 gene that are associated with Peutz-Jeghers syndrome, indicating that these mutations can be used to define and confirm the diagnosis of this autosomal-dominant disorder.

      Predisposing: The abstract explicitly states that the identified mutations in the STK11 gene are germline mutations, which confer inherited risk for developing Peutz-Jeghers syndrome, a condition characterized by an increased risk for various neoplasms.

    1. nan

      Predisposing evidence:

      Predisposing: The study discusses hereditary cancer syndromes and identifies truncating germline mutations in the LKB1 gene associated with Peutz-Jeghers syndrome, indicating that these mutations confer an inherited risk for developing the disease. The mention of "germline mutations" and the context of a hereditary syndrome supports this classification.

    1. nan

      Predictive evidence:

      Predictive: The G497W mutation in SMO is associated with resistance to vismodegib, as it interferes with drug binding, indicating that this variant correlates with treatment response. The abstract specifically mentions that the mutation contributes to primary resistance in a patient undergoing treatment with vismodegib.

    1. nan

      Diagnostic evidence:

      Diagnostic: The study establishes that HMGA2-LPP and LPP-HMGA2 fusion genes are specific to lipoma, while TLS-CHOP and EWS-CHOP are specific to liposarcoma, indicating their use in classifying these adipocytic tumors. This classification supports the notion that these fusion genes can be used as biomarkers for diagnosing specific tumor types.

    1. nan

      Predictive, Functional evidence:

      Predictive: The study discusses how PDK1 and SGK1 contribute to the resistance of PIK3CA-mutant cancer cells to PI3Kalpha inhibition, indicating that the variant's presence correlates with treatment response. This suggests that targeting these kinases can restore sensitivity to therapy, which aligns with predictive evidence.

      Functional: The results mention measuring S6 phosphorylation as a readout of mTORC1 activity, indicating that the variant affects molecular function related to signaling pathways. This suggests that the variant alters biochemical activity, which is characteristic of functional evidence.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study evaluates gefitinib in patients with metastatic colorectal cancer and discusses clinical outcomes such as progression-free survival and response rates, indicating a focus on the relationship between the variant (in this case, the EGFR signaling pathway) and treatment response.

      Oncogenic: The study analyzes tumor biopsies for activation of the EGFR signaling pathway, which is indicative of the variant's role in tumor development or progression, particularly in the context of evaluating gefitinib's efficacy.

    1. nan

      Predictive, Diagnostic evidence:

      Predictive: The study compares the efficacy of afatinib and gefitinib in treatment-naive patients with EGFR mutation-positive non-small-cell lung cancer, specifically mentioning the Leu858Arg variant. This indicates that the variant is associated with treatment response, making it predictive of therapy outcomes.

      Diagnostic: The abstract states that patients with a common EGFR mutation, including Leu858Arg, were included in the study, suggesting that this variant is used to classify patients as having EGFR mutation-positive NSCLC. This supports its role as a diagnostic marker for the disease.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study indicates that the substitution of threonine with isoleucine in the Abl kinase domain is associated with resistance to the Abl tyrosine kinase inhibitor STI-571, suggesting that this variant correlates with treatment response. The phrase "sufficient to confer STI-571 resistance" directly links the variant to therapeutic implications.

      Oncogenic: The evidence presented shows that the threonine to isoleucine substitution contributes to drug resistance, which is a critical aspect of tumor progression in the context of chronic myeloid leukemia. This variant's role in conferring resistance indicates its involvement in oncogenic processes.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses how the addition of retinoic acid (RA) to chemotherapy may improve survival in patients with NPM1 mutations, indicating a correlation between the NPM1 variant and response to therapy. The mention of "proposed benefit of adding RA to chemotherapy in NPM1 mutant AMLs" supports this classification.

      Oncogenic: The abstract states that NPM1 mutations contribute to the disorganization of promyelocytic leukemia (PML) nuclear bodies and lead to differentiation and apoptosis in AML, suggesting that the NPM1 variant plays a role in tumor development or progression. This aligns with the definition of oncogenic variants contributing to cancer biology.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses how overexpression of CRIPTO1 in EGFR-mutated NSCLC cells leads to resistance against EGFR tyrosine kinase inhibitors (EGFR-TKIs) like erlotinib, indicating a correlation between the variant and treatment response. The mention of "erlotinib sensitivity" and "erlotinib resistance" directly relates to the predictive nature of the variant's impact on therapy outcomes.

      Oncogenic: The evidence presented shows that CRIPTO1 contributes to the development of resistance in EGFR-mutated NSCLC cells, suggesting its role in tumor progression. The activation of SRC and induction of epithelial-to-mesenchymal transition (EMT) further supports the oncogenic behavior of the variant in the context of cancer development.

    1. nan

      Predictive evidence:

      Predictive: The abstract discusses the acquired resistance involving the BTK C481S mutation in the context of treatment with the BTK inhibitor ibrutinib, indicating that this variant is associated with resistance to therapy. This suggests that the presence of the C481S mutation may predict a lack of response to ibrutinib treatment in patients with mantle cell lymphoma.

    1. nan

      Predictive, Diagnostic evidence:

      Predictive: The abstract discusses the treatment of a patient with advanced colorectal cancer using trastuzumab based on the presence of the ERBB2 p.L755S mutation, indicating a correlation between the variant and the therapeutic approach, which aligns with predictive evidence regarding treatment response.

      Diagnostic: The mention of the ERBB2 p.L755S mutation in the context of identifying genetic driver events in tumors suggests its role in classifying or defining the disease, supporting its classification as diagnostic evidence.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses the effectiveness of the combination of selinexor and ibrutinib in CLL, particularly noting that it is effective in a cell line harboring the BTK C481S mutation, which suggests that this variant may influence treatment response and resistance to therapy.

      Oncogenic: The mention of the BTK C481S mutation in the context of acquired resistance to ibrutinib indicates that this somatic variant contributes to tumor progression and the development of resistance mechanisms in CLL.

    1. nan

      Predictive, Oncogenic evidence:

      Predictive: The study discusses the response to targeted therapy in patients with FGFR2 fusions, indicating that these genetic alterations correlate with treatment response. The mention of the E384X variant as a negative regulator of EGFR suggests its potential relevance in therapeutic contexts, particularly in relation to treatment strategies targeting the EGFR pathway.

      Oncogenic: The E384X variant is described as an SNV in ERRFI1, which is a negative regulator of EGFR, implying its role in tumor biology and potentially contributing to tumor development or progression. This suggests that the variant may have oncogenic implications, particularly in the context of cholangiocarcinoma.

  2. Oct 2025
    1. TLDR: When working with LLMs, the risks for the L&D workflow and its impact on substantive learning are real:

       • Hallucination: LLMs invent plausible-sounding facts that aren’t true
       • Drift: LLM outputs wander from your brief without clear constraints
       • Generic-ness: LLMs surface that which is most common, leading to homogenisation and standardisation of “mediocre”
       • Mixed pedagogical quality: LLMs do not produce outputs which are guaranteed to follow evidence-based practice
       • Mis-calibrated trust: LLMs invite us to read guesswork as dependable, factual knowledge

       These aren’t edge cases or occasional glitches; they’re inherent to how AI / all LLMs function. Prediction machines can’t verify truth. Pattern-matching can’t guarantee validity. Statistical likelihood doesn’t equal quality.

      Real inherent issue using AI for learning.

  3. Sep 2025
  4. Aug 2025
    1. Companies have invested billions into AI, 95 percent getting zero return

      MIT report: 95% of companies see no profit from investments in generative AI, which amounted to approximately $35 billion.

      Most AI pilots have no measurable impact on company profits. Attempts to implement tools like ChatGPT into the workplace primarily increase the productivity of individual employees, not the earnings of the entire company.

    1. Rewrote the keyboard shortcut code to remove an unmaintained dependency — PR #35425 (by diondiondion)

      GitHub Copilot's explanation of this checkout seemed rather impressive to me:


      At checkout 35425 in the Mastodon repository, the former implementation of keyboard shortcuts, which relied on the third-party library react-hotkeys, has been replaced by a custom React-based solution. Here is a detailed breakdown of the technologies and architectural patterns now used:


      1. Custom React Hook and Component

      • File Introduced:
      • app/javascript/mastodon/components/hotkeys/index.tsx
      • Key Elements:
      • A custom React hook (useHotkeys) and a component (Hotkeys) have been implemented to handle keyboard shortcut logic directly within the application.
      • This solution leverages React's built-in hooks (useEffect, useRef) for event handling and state management.

      2. TypeScript for Type Safety

      • TypeScript Usage:
      • The new implementation is written in TypeScript, providing stronger type safety and maintainability.
      • For example, handler maps are strongly typed, and key matcher logic uses TypeScript types for clarity and correctness.

      3. Utility Functions for Key Handling

      • File Introduced:
      • app/javascript/mastodon/components/hotkeys/utils.ts
      • Features:
      • Utility functions such as normalizeKey and type guards like isKeyboardEvent help standardize and robustly handle keyboard events across browsers.

      4. Storybook Integration for Testing

      • File Introduced:
      • app/javascript/mastodon/components/hotkeys/hotkeys.stories.tsx
      • Purpose:
      • A Storybook story is included for interactive demonstration and testing of the hotkeys component.
      • This enables developers to visually and functionally verify the keyboard shortcut system in isolation.

      5. Removal of react-hotkeys

      • The dependency on react-hotkeys is removed from package.json and corresponding imports are replaced by the new local implementation throughout the codebase.

      6. Architectural Improvements

      • Matching Logic:
      • Hotkey matching logic now supports single-key, combo (modifier + key), and sequence (multi-key) shortcuts, with prioritization for conflict resolution.
      • Global vs. Local Scoping:
      • The custom component allows for both globally-scoped shortcuts (listening on the document) and locally-scoped ones (within a React component subtree).
      • Accessibility:
      • The new approach explicitly avoids triggering shortcuts in <input>, <textarea>, and <select> elements, enhancing usability and accessibility.

      7. Modern React Patterns

      • The hook-based and functional component design aligns with modern React development best practices, reducing reliance on class components and third-party abstractions.

      Summary Table

      | Old Tech (Removed)         | New Tech (Introduced)                       |
      |----------------------------|---------------------------------------------|
      | react-hotkeys (NPM lib)    | Custom React hook/component (Hotkeys)       |
      | JavaScript (in places)     | TypeScript                                  |
      | Third-party event handling | Native DOM/React event handling & utilities |
      | External matching          | App-specific matcher logic (TypeScript)     |
      | No isolated test UI        | Storybook-based playground                  |



      In summary: The Mastodon project at this commit has migrated from a third-party keyboard shortcut library (react-hotkeys) to a modern, TypeScript-based, locally developed React solution for keyboard shortcut management, offering improved flexibility, maintainability, and testability.

  5. Jul 2025
    1. Automating oral argument

      A Harvard Law graduate who argued before the Supreme Court fed his case briefs into Claude 4 Opus and had it answer the same questions the Justices posed to him. The AI delivered what he called an "outstanding oral argument" with coherent answers and clever responses he hadn't considered, leading him to conclude that AI lawyers could soon outperform even top human advocates at oral argument.

  6. Apr 2025
    1. By writing a paper, you’re going to have to take all these bits of evidence into account, weigh them and figure out how to  articulate them correctly. That’s a  process of character building

      Why chatGPT can't replace writing

  7. Mar 2025
    1. AI adoption is rapidly increasing in all industries for several use cases. In terms of natural language technologies, the question generally is – is it better to use NLP approaches or invest in LLM technologies? LLM vs NLP is an important discussion to identify which technology is most ideal for your specific project requirements.

      Explore the key differences between NLP and LLM in this comprehensive comparison. Learn how these technologies shape AI-driven applications, their core functionalities, and their impact on industries like chatbots, sentiment analysis, and content generation.

    1. The analysis uncovered an average of 11 different types of data out of the 35 possible. As mentioned earlier, Google Gemini stands out as the most data-hungry service, collecting 22 of these data types, including highly sensitive data like precise location, user content, the device's contacts list, browsing history, and more. Among the analyzed applications, only Google Gemini, Copilot, and Perplexity were found to collect precise location data. The controversial DeepSeek chatbot stands right in the middle, collecting 11 unique types of data, such as user input like chat history.
    1. Hao-Ping (Hank) Lee, Advait Sarkar, Lev Tankelevitch, Ian Drosos, Sean Rintel, Richard Banks, and Nicholas Wilson. 2025. The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers. In CHI Conference on Human Factors in Computing Systems (CHI ’25), April 26–May 01, 2025, Yokohama, Japan. ACM, New York, NY, USA, 23 pages. https://doi.org/10.1145/3706598.3713778

      Abstract

      The rise of Generative AI (GenAI) in knowledge workflows raises questions about its impact on critical thinking skills and practices. We survey 319 knowledge workers to investigate 1) when and how they perceive the enaction of critical thinking when using GenAI, and 2) when and why GenAI affects their effort to do so. Participants shared 936 first-hand examples of using GenAI in work tasks. Quantitatively, when considering both task- and user-specific factors, a user’s task-specific self-confidence and confidence in GenAI are predictive of whether critical thinking is enacted and the effort of doing so in GenAI-assisted tasks. Specifically, higher confidence in GenAI is associated with less critical thinking, while higher self-confidence is associated with more critical thinking. Qualitatively, GenAI shifts the nature of critical thinking toward information verification, response integration, and task stewardship. Our insights reveal new design challenges and opportunities for developing GenAI tools for knowledge work

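    1. System - It is important to think of an agent not as a single component but as a system made up of multiple components. The basic building blocks of an AI agent are:

       • Environment - The defined space in which the AI agent operates. For a travel-booking AI agent, for example, the environment is the travel booking system the agent uses to complete its tasks.
       • Sensors - The environment holds information and provides feedback. The AI agent uses sensors to gather and interpret information about the current state of the environment. In the travel-booking example, the agent obtains information such as hotel availability and flight prices from the travel booking system.
       • Actuators - Once the AI agent has received the current state of the environment, it decides which action to take to change the environment for the task at hand. For the travel-booking agent, this would be booking an available room for the user.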
  8. Feb 2025
    1. Cursor noted Claude is once again best-in-class for real-world coding tasks, with significant improvements in areas ranging from handling complex codebases to advanced tool use. Cognition found it far better than any other model at planning code changes and handling full-stack updates. Vercel highlighted Claude’s exceptional precision for complex agent workflows, while Replit has successfully deployed Claude to build sophisticated web apps and dashboards from scratch, where other models stall. In Canva’s evaluations, Claude consistently produced production-ready code with superior design taste and drastically reduced errors.

      Claude 3.7 Sonnet again excels at coding, as verified by multiple teams

    1. Programming Language for LLMs.

      What would a programming language designed to make programming easier for LLMs look like?

      Suggested analogy: RISC assembly is not feasible for humans to write by hand, so compilers were created to write it for them.

      A RISC-like design for the language and its libraries (in a potential new programming language) would be so fundamental that a human could not keep track of everything, but an LLM could write programs for it.

      Reasons: Validating LLM output becomes easier, or possible at all. It makes things less brittle. It makes things more adaptive.

      Suggestion: A reduced-instruction-set programming language with small libraries (no syntactic sugar).

    1. If robust general-purpose reasoning abilities have emerged in LLMs, this bolsters the claim that such systems are an important step on the way to trustworthy general intelligence.
    2. The word “reasoning” is an umbrella term that includes abilities for deduction, induction, abduction, analogy, common sense, and other “rational” or systematic methods for solving problems. Reasoning is often a process that involves composing multiple steps of inference.
    3. LLMs are substantially better at solving problems that involve terms or concepts that appear more frequently in their training data, leading to the hypothesis that LLMs do not perform robust abstract reasoning to solve problems, but instead solve problems (at least in part) by identifying patterns in their training data that match, or are similar to, or are otherwise related to the text of the prompts they are given.
  9. Jan 2025
    1. Takeaways: AI will become cheaper and more efficient.

       - Closed-source models can cache responses and save computations for repetitive queries.
       - Closed-source models also have the possibility of iterative improvements using constant reinforcement learning.
       - Prioritizing capabilities and deliberate strategy in data selection, carefully designed training objectives.

    1. Critics of A.I. use by religious leaders have pointed to the issue of hallucinations — times when chatbots make stuff up. While harmless in certain situations, faith-based A.I. tools that fabricate religious scripture present a serious problem. In Rabbi Bot’s sermon, for instance, the A.I. invented a quote from the Jewish philosopher Maimonides that would have passed as authentic to the casual listener.

      LLM Confabulation of Religious Ideas

    1. Really good PMs and engineers will actually start to converge. With LLMs, coding won't be enough to differentiate as an engineer, you'll need to think about the product, business KPIs, strategy etc. You need to think about solutions to problems, not software tools. And PMs are going to be expected to get more technical.

      MLOps prediction for 2025

    2. We’ll also see a big surge in the use of buzzword-heavy AI concepts like Retrieval-Augmented Generation (RAG) systems, generative AI, and cloud-based AI products, all of which will become easier to use and, hopefully, cheaper, thereby driving further broad adoption.

      RAG will shine even more in 2025

    1. The concept of RAG is relatively straightforward. It involves two main components: a document retriever and a large language model (LLM). The document retriever is responsible for finding relevant information from a large corpus of documents based on the input question using semantic search. This information is then passed to the LLM, which generates a response. The unique aspect of RAG is the way it combines these two components. Instead of retrieving documents and then generating a response in two separate steps, RAG uses a joint process where the document retrieval and response generation steps are connected. This allows the model to consider multiple documents simultaneously when generating a response, leading to more accurate and contextually relevant outputs.

      Simple definition of RAG
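
      The two components named in the highlight can be sketched roughly as below. This is a minimal illustration, not any particular library's API: `retrieve`, `build_prompt`, and the toy `CORPUS` are hypothetical names, the "semantic search" is replaced by a plain bag-of-words cosine score, and the LLM call is left out (the sketch stops at the augmented prompt that would be sent to a model). It shows the simple retrieve-then-generate form, not the joint process the passage mentions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, corpus, k=2):
    """Hypothetical retriever: rank documents by word overlap with the question.

    A real system would use learned embeddings; word counts are a stand-in.
    """
    q = Counter(tokenize(question))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(q, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Assemble the augmented prompt that would be passed to an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy corpus, invented for illustration.
CORPUS = [
    "RAG pairs a document retriever with a large language model.",
    "HNSW is a graph index used for approximate nearest neighbor search.",
    "Fine-tuning further trains a pre-trained model on a smaller dataset.",
]

question = "What does RAG pair a retriever with?"
passages = retrieve(question, CORPUS, k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

      Swapping the word-count scorer for an embedding model and a vector index changes `retrieve` only; the prompt-assembly step stays the same.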

  10. Dec 2024
    1. In fourth generation warfare, seen here on the right, the direct control of the bots is replaced with AI. Using AI we can define artificial personalities with artificial social media behaviour, that will make it increasingly difficult to detect the bots.

      "Artificial Intelligent Personalities", fourth generation warfare

  11. Nov 2024
    1. Data scientists, MLOps engineers, or AI developers, can mount large language model weights or machine learning model weights in a pod alongside a model-server, so that they can efficiently serve them without including them in the model-server container image. They can package these in an OCI object to take advantage of OCI distribution and ensure efficient model deployment. This allows them to separate the model specifications/content from the executables that process them.

      The introduction of the Image Volume Source feature in Kubernetes 1.31 allows MLOps practitioners to mount OCI-compatible artifacts, such as large language model weights or machine learning models, directly into pods without embedding them in container images. This streamlines model deployment, enhances efficiency, and leverages OCI distribution mechanisms for effective model management.

  12. Sep 2024
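    1. The model's predictions can be improved by building a larger probability matrix. To use a context window of two tokens, additional table rows are needed, representing all possible two-token sequences. The five tokens used in the example yield 25 new rows in the probability table, one per token pair, in addition to the 5 single-token rows that already exist.

      Now suppose I have trained on the five tokens ['I', 'you', 'like', 'apples', 'bananas'], so my probability table is a 5×5 table with a context window of 1 token. If I now set the context window to 2 tokens, why does increasing the context window add rows to the probability table?

      That is a good question. Let me explain in detail why increasing the context window increases the number of rows in the probability table.

      First, let's be clear about the two cases:

      1. Context window of 1 token: In this case your probability table is indeed a 5×5 table. Each row represents a possible previous token, and each column represents a possible next token.

      2. Context window of 2 tokens: In this case the structure of the probability table changes. Here is why:

      When the context window grows to 2 tokens, we are in effect treating every possible two-word combination as the "preceding context". For your 5 tokens, the number of possible two-word combinations is 5 × 5 = 25.

      This means the probability table now needs 25 rows instead of the original 5. Each row represents one possible two-word combination, while there are still 5 columns, one per possible next word.

      For example, the probability table might look like this:

      ```
      Previous two words | I   | you | like | apples | bananas
      -------------------|-----|-----|------|--------|--------
      I I                | ... | ... | ...  | ...    | ...
      I you              | ... | ... | ...  | ...    | ...
      I like             | ... | ... | ...  | ...    | ...
      ...
      bananas I          | ... | ... | ...  | ...    | ...
      bananas you        | ... | ... | ...  | ...    | ...
      ...
      ```

      This table now has 25 rows (all possible two-word combinations) and 5 columns (the possible next words).

      So when we say "increasing the context window adds rows to the probability table", we mean:

      - For n tokens and a context window of k tokens, the probability table has n^k rows.
      - The number of columns always stays n, because it represents the possible next words.

      This explains why enlarging the context window dramatically increases the size of the probability table, especially the number of rows. It is also why large language models need so much compute and such complex architectures to handle long contexts.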
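
      The n^k row-count rule can be checked by enumerating the contexts directly. The snippet below is a small sketch using the five example tokens; `context_rows` is a name made up for this illustration.

```python
from itertools import product

# The five example tokens from the note above.
VOCAB = ["I", "you", "like", "apples", "bananas"]


def context_rows(vocab, window):
    """Return every possible context of `window` tokens, one per table row."""
    return [" ".join(ctx) for ctx in product(vocab, repeat=window)]


rows_1 = context_rows(VOCAB, 1)  # one row per single token
rows_2 = context_rows(VOCAB, 2)  # one row per ordered token pair

print(len(rows_1), len(rows_2))
print(rows_2[:3])

# Rows grow as n**k, while the number of columns stays n.
for k in range(1, 5):
    print(f"window={k}: rows={len(VOCAB) ** k}, columns={len(VOCAB)}")
```

      Even with a five-token vocabulary, a window of four tokens already needs 625 rows; with a realistic vocabulary the table becomes impractical very quickly, which is the point the explanation makes.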

  13. Jun 2024
  14. May 2024
    1. three different issues that are being implicated by artificial intelligence. And this is true with, you know, all artificial intelligence, not just a generative but particularly generative.

      Three issues implicated by Generative AI

      1. Does ingestion for training AI constitute infringement?
      2. Does the output infringe?
      3. Is the output copyrightable?

      The answer is different in different jurisdictions.

    2. Handling Academic Copyright and Artificial Intelligence Research Questions as the Law Develops

      Spring 2024 Member Meeting: CNI website, YouTube

      Jonathan Band Copyright Attorney Counsel to the Library Copyright Alliance

      Timothy Vollmer Scholarly Communication & Copyright Librarian University of California, Berkeley

      The United States Copyright Office and courts in many United States jurisdictions are struggling to address complex copyright issues related to the use of generative artificial intelligence (AI). Meanwhile, academic research using generative AI is proliferating at a fast pace and researchers still require legal guidance on which sources they may use, how they can train AI legally, and whether the reproduction of source material will be considered infringing. The session will include discussion of current perspectives on copyright and generative AI in academic research.

    1. So how does this work? I wanted to give this picture of what's actually happening behind the scenes, especially with this question and answer. So first, I will say that we're using a combination of OpenAI's GPT 3.5 to do this as well as some open source, smaller open source models to generate the vectors for the semantic search.

      JSTOR implements a RAG

      RAG == Retrieval Augmented Generation

    1. Our core assumption is that foundational models, having been extensively trained in English texts, possess a substantial level of understanding and reasoning capabilities. Transferring these capabilities from English to another language, such as Korean, could be more efficient than developing performance from standalone Korean pre-training.

      Hypothesis: Transfer of knowledge from English to a new language

  15. Apr 2024
    1. https://web.archive.org/web/20240430105622/https://garymarcus.substack.com/p/evidence-that-llms-are-reaching-a

      Author suggests the improvement of LLMs is flattening. E.g. points to the closing gap between proprietary and open source models out there, while improvement of proprietary stuff is diminishing or no longer happening (OpenAI progress flatlined 13 months ago it seems). In comment someone points to https://arxiv.org/abs/2404.04125 which implies a hard upper limit in improvement

    1. The same LM can be a much more or less capable agent depending on the enhancements added. The researchers created and tested four different agents built on top of GPT-4 and Anthropic’s Claude:

      While today’s LMs agents don't pose a serious risk, we should be on the lookout for improved autonomous capabilities as LMs get more capable and reliable.

    1. On code-authoring tasks, students in the Codex group had a significantly higher correctness score (80%) than the Baseline (44%), and overall finished the tasks significantly faster. However, on the code-modifying tasks, both groups performed similarly in terms of correctness, with the Codex group performing slightly better (66%) than the Baseline (58%).

      In a study, students who learned to code with AI made more progress during training sessions, had significantly higher correctness scores, and retained more of what they learned compared to students who didn't learn with AI.

  16. Feb 2024
  17. Jan 2024
    1. Santosh Vempala, a computer science professor at Georgia Tech, has also studied hallucinations. “A language model is just a probabilistic model of the world,” he says, not a truthful mirror of reality. Vempala explains that an LLM’s answer strives for a general calibration with the real world—as represented in its training data—which is “a weak version of accuracy.” His research, published with OpenAI’s Adam Kalai, found that hallucinations are unavoidable for facts that can’t be verified using the information in a model’s training data.

      “A language model is just a probabilistic model of the world”

      Hallucinations are a result of an imperfect model, or attempting answers without the necessary data in the model.

    1. Moreover, Midjourney apparently sought to suppress our findings, banning Southen from its service (without even a refund of his subscription fee) after he reported his first results, and again after he created a new account from which additional results were reported. It then apparently changed its terms of service just before Christmas by inserting new language: “You may not use the Service to try to violate the intellectual property rights of others, including copyright, patent, or trademark rights. Doing so may subject you to penalties including legal action or a permanent ban from the Service.” This change might be interpreted as discouraging or even precluding the important and common practice of red-team investigations of the limits of generative AI—a practice that several major AI companies committed to as part of agreements with the White House announced in 2023. (Southen created two additional accounts in order to complete this project; these, too, were banned, with subscription fees not returned.)

      Midjourney bans researchers and changes terms of service

    2. One user on X pointed to the fact that Japan has allowed AI companies to train on copyright materials. While this observation is true, it is incomplete and oversimplified, as that training is constrained by limitations on unauthorized use drawn directly from relevant international law (including the Berne Convention and TRIPS agreement). In any event, the Japanese stance seems unlikely to carry any weight in American courts.

      Specifics in Japan for training LLMs on copyrighted material

    3. After a bit of experimentation (and in a discovery that led us to collaborate), Southen found that it was in fact easy to generate many plagiaristic outputs, with brief prompts related to commercial films (prompts are shown).

      Plagiaristic outputs from blockbuster films in Midjourney v6

      Was the LLM trained on copyrighted material?

  18. Dec 2023
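    1. Memories that are more recent, relevant, and important are more likely to be retrieved

      The more recent, relevant, and important a memory is, the more likely it is to be retrieved. So when taking notes, rewrite the material in your own words and connect it with existing knowledge or your own experience; that way it sticks and is easier to retrieve when needed.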

  19. Nov 2023
    1. This illustration shows four alternative ways to nudge an LLM to produce relevant responses:

       • Generic LLM - Use an off-the-shelf model with a basic prompt. The results can be highly variable, as you can experience when e.g. asking ChatGPT about niche topics. This is not surprising, because the model hasn’t been exposed to relevant data besides the small prompt.
       • Prompt engineering - Spend time structuring the prompt so that it packs more information about the desired topic, tone, and structure of the response. If you do this carefully, you can nudge the responses to be more relevant, but this can be quite tedious, and the amount of relevant data input to the model is limited.
       • Instruction-tuned LLM - Continue training the model with your own data, as described in our previous article. You can expose the model to arbitrary amounts of query-response pairs that help steer the model to more relevant responses. A downside is that training requires a few hours of GPU computation, as well as a custom dataset.
       • Fully custom LLM - Train an LLM from scratch. In this case, the LLM can be exposed to only relevant data, so the responses can be arbitrarily relevant. However, training an LLM from scratch takes an enormous amount of compute power and a huge dataset, making this approach practically infeasible for most use cases today.

      • RAG with a generic LLM - Insert your dataset in a (vector) database, possibly updating it in real time. At the query time, augment the prompt with additional relevant context from the database, which exposes the model to a much larger amount of relevant data, hopefully nudging the model to give a much more relevant response.
      • RAG with an instruction-tuned LLM - Instead of using a generic LLM as in the previous case, you can combine RAG with your custom fine-tuned model for improved relevancy.

    1. Yuen-Hsien Tseng: "During the pre-training phase, GPT predicts missing words in sentences based on the surrounding context." Learning contextual relationships by predicting the missing words in a sentence is BERT, not GPT.

      BERT?

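    1. Bidirectional Encoder Representations from Transformers (BERT) is a pre-training technique for natural language processing (NLP) proposed by Google.[1][2] Jacob Devlin and colleagues created and published BERT in 2018. Google is using BERT to better understand the meaning of users' search queries.[3] A 2020 literature survey concluded that "in a little over a year, BERT has become a ubiquitous baseline in NLP experiments", counting more than 150 research publications analyzing and improving the model.[4] The original English BERT was released with two types of pre-trained models[1]: (1) the BERT_BASE model, a neural network architecture with 12 layers, 768 dimensions, 12 self-attention heads, and 110M parameters; (2) the BERT_LARGE model, a neural network architecture with 24 layers, 1024 dimensions, 16 self-attention heads, and 340M parameters. Both were trained on the BooksCorpus[5] and English Wikipedia corpora, which contain 800 million and 2.5 billion words respectively.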

      BERT

      cf

    1. I am even more attuned to creative rights. We can address algorithms of exploitation by establishing creative rights that uphold the four C’s: consent, compensation, control, and credit. Artists should be paid fairly for their valuable content and control whether or how their work is used from the beginning, not as an afterthought.

      Consent, compensation, control, and credit for creators whose content is used in AI models

    1. Fine-tuning takes a pre-trained LLM and further trains the model on a smaller dataset, often with data not previously used to train the LLM, to improve the LLM’s performance for a particular task.

      LLMs can be extended with both RAG and fine-tuning. Fine-tuning is appropriate when you want to customize an LLM to perform well in a particular domain using private data. For example, you can fine-tune an LLM to become better at producing Python programs by further training the LLM on high-quality Python source code.

      In contrast, you should use RAG when you are able to augment your LLM prompt with data that was not known to your LLM at the time of training, such as real-time data, personal (user) data, or context information useful for the prompt.

    2. Vector databases are used to retrieve relevant documents using similarity search. Vector databases can be standalone or embedded with the LLM application (e.g., Chroma embedded vector database). When structured (tabular) data is needed, an operational data store, such as a feature store, is typically used. Popular vector databases and feature stores are Weaviate and Hopsworks that both provide time-unlimited free tiers.
    3. RAG LLMs can outperform LLMs without retrieval by a large margin with much fewer parameters, and they can update their knowledge by replacing their retrieval corpora, and provide citations for users to easily verify and evaluate the predictions.
    1. The key enablers of this solution are:

       • The embeddings generated with Vertex AI Embeddings for Text
       • Fast and scalable vector search by Vertex AI Vector Search

      Embeddings space is a map of the context of the meanings. Basically, values are assigned in n-dimensional space tied to the similar semantic inputs - tying meaning between concepts.

      Example of vectorized n-dimensional embedding

    2. With the embedding API, you can apply the innovation of embeddings, combined with the LLM capability, to various text processing tasks, such as:

       • LLM-enabled Semantic Search: text embeddings can be used to represent both the meaning and intent of a user's query and documents in the embedding space. Documents that have similar meaning to the user's query intent will be found fast with vector search technology. The model is capable of generating text embeddings that capture the subtle nuances of each sentence and paragraphs in the document.
       • LLM-enabled Text Classification: LLM text embeddings can be used for text classification with a deep understanding of different contexts without any training or fine-tuning (so-called zero-shot learning). This wasn't possible with the past language models without task-specific training.
       • LLM-enabled Recommendation: The text embedding can be used for recommendation systems as a strong feature for training recommendation models such as Two-Tower model. The model learns the relationship between the query and candidate embeddings, resulting in next-gen user experience with semantic product recommendation.
       • LLM-enabled Clustering, Anomaly Detection, Sentiment Analysis, and more, can be also handled with the LLM-level deep semantics understanding.
    3. Grounded to business facts: In this demo, we didn't try having the LLM to memorize the 8 million items with complex and lengthy prompt engineering. Instead, we attached the Stack Overflow dataset to the model as an external memory using vector search, and used no prompt engineering. This means, the outputs are all directly "grounded" (connected) to the business facts, not the artificial output from the LLM. So the demo is ready to be served today as a production service with mission critical business responsibility. It does not suffer from the limitation of LLM memory or unexpected behaviors of LLMs such as the hallucinations.
    1. Preparation Steps

       • Ingest data into a database. The destination may be an array or a JSON data type.
       • Harmonize data. This is a lightweight data transformation step.
       • Encode data. This step is used to convert the ingested data into embeddings. One option is to use an external API. For example, OpenAI’s ADA and sentence_transformer have many pre-trained models to convert unstructured data like images and audio into vectors.
       • Load embedding vectors. Data is moved to a table that mirrors the original table but has an additional column of type ‘vector’, JSON, or a blob that stores the vectors.
       • Performance tuning. SingleStoreDB provides JSON_ARRAY_PACK, and vectors can be indexed using HNSW as mentioned earlier. This allows parallel scans using SIMD.

    2. In the new AI model, you ingest the data in real time, apply your models by reaching to one or multiple GPT services and action on the data while your users are in the online experience. These GPT models may be used for recommendation, classification personalization, etc., services on real-time data. Recent developments, such as LangChain and AutoGPT, may further disrupt how modern applications are deployed and delivered.
    3. Let’s say, for example, you search for a very specific product on a retailer’s website, and the product is not available. An additional API call to an LLM with your request that returned zero results may result in a list of similar products. This is an example of a vector search, which is also known as a similarity or semantic search.
    4. Modes of Private Data consumption:

       1. Train Custom LLM - requires massive infrastructure, investment, and deep AI skills
       2. Tune the LLM - utilizes model weights to fine-tune an existing model; a new category of LLMOps; similar issue to #1
       3. Prompt general-purpose LLMs - uses modeled context input with Retrieval Augmented Generation (Facebook)

      For leveraging prompts, there are two options:

      * Short-term memory for LLMs that use APIs for model inputs
      * Long-term memory for LLMs that persist the model inputs

      Short-term memory is ephemeral, while long-term memory introduces persistence.
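      The difference between the two options can be sketched as follows. Both classes are hypothetical illustrations: the short-term variant keeps context only in a Python object, while the long-term variant writes it to a JSON file, standing in for whatever persistent store (such as a vector database) a real system would use.

```python
import json
import os
import tempfile

class ShortTermMemory:
    """Context that lives only as long as this object (one session)."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append(text)

class LongTermMemory:
    """Context persisted to a JSON file so it survives restarts."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def add(self, text):
        items = self.load()
        items.append(text)
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(items, f)

store_path = os.path.join(tempfile.mkdtemp(), "memory.json")

session = ShortTermMemory()
session.add("user prefers metric units")
LongTermMemory(store_path).add("user prefers metric units")

# Simulate a restart: new objects, same file on disk.
restarted_session = ShortTermMemory()
restarted_store = LongTermMemory(store_path)
```

      After the simulated restart, the short-term memory is empty again, while the long-term store still returns the saved note.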

    5. Conventional search works on keys. However, when the ask is a natural-language query, that sentence needs to be converted into a structure that can be compared with other text that has a similar representation. This structure is called an embedding. An embedding is a vector: an array of numbers that act as coordinates in a multi-dimensional space. An embedding is high dimensional because it uses many such coordinates to capture meaning for semantic search.

      When a search is made on a new text, the model calculates the “distance” between terms. For example, “king” is closer to “man” than to “woman.” This distance is calculated for the “nearest neighbors” using functions like cosine, dot product, and Euclidean. This is where “approximate nearest neighbors” (ANN) algorithms are used to reduce the vector search space. A very popular way to index the vector space is an algorithm called Hierarchical Navigable Small World (HNSW). Many vector databases and libraries like FAISS use HNSW to speed up vector search.
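      The three distance functions named above can be written out in a few lines. This is a sketch with made-up 2-dimensional embeddings chosen only so that "king" lands nearer to "man" than to "woman"; real embeddings have hundreds or thousands of dimensions. The `nearest` helper is a hypothetical brute-force scan that compares the query with every stored vector, which is exactly the cost that ANN indexes such as HNSW try to avoid.

```python
import math

def dot(a, b):
    """Dot product: larger means more aligned (and longer) vectors."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Dot product of the normalized vectors: 1.0 means same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up embeddings for illustration only.
EMBEDDINGS = {
    "king": [3.0, 4.0],
    "man": [3.0, 3.0],
    "woman": [0.0, 5.0],
}

def nearest(term, k=1):
    """Exact nearest neighbors by scanning every other vector."""
    query = EMBEDDINGS[term]
    others = [t for t in EMBEDDINGS if t != term]
    return sorted(others, key=lambda t: euclidean(query, EMBEDDINGS[t]))[:k]

closest_to_king = nearest("king")
```

      Note that Euclidean distance is minimized while cosine and dot product are maximized, so the sort direction depends on which function is used.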

    6. The different options for storing and querying vectors for long-term memory in AI search include:
      * Native vector databases - many non-relational DBMSs, such as Elastic, are adding vectors. Others are Pinecone, Qdrant, etc.
      * SingleStoreDB, which supports vector embeddings and semantic search
      * Apache Parquet or CSV columnar data - slow indices if used

    1. Retrieval Augmented Generation (RAG) is a method in natural language processing (NLP) that combines the power of both neural language models and information retrieval methods to generate responses or text that are informed by a large body of knowledge. The concept was introduced by Facebook AI researchers and represents a hybrid approach to incorporating external knowledge into generative models.

      RAG models effectively leverage a large corpus of text data without requiring it to be stored in the parameters of the model. This is achieved by utilizing a retriever-generator framework:

      1. The Retriever component is responsible for finding relevant documents or passages from a large dataset (like Wikipedia or a corpus of scientific articles) that are likely to contain helpful information for generating a response. This retrieval is typically based on vector similarity between the query and the documents in the dataset, often employing techniques like dense passage retrieval (DPR).

      2. The Generator component is a large pre-trained language model (like BART or GPT-2) that generates a response by conditioning on both the input query and the documents retrieved by the retriever. It integrates the information from the external texts to produce more informed, accurate, and contextually relevant text outputs.

      The RAG model performs this process in an end-to-end differentiable manner, meaning it can be trained in a way that updates both the retriever and generator components to minimize the difference between the generated text and the target text. The retriever is typically optimized to select documents that will lead to a correct generation, while the generator is optimized to produce accurate text given the input query and the retrieved documents.
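      The retriever-generator flow at inference time can be sketched as below. Everything here is a simplifying assumption: the corpus and its 2-dimensional embeddings are invented, the function names are hypothetical, and `generate` is a placeholder where a real system would call a language model. The sketch does not show the joint training of retriever and generator described above.

```python
import math

# Hypothetical corpus: (passage, made-up embedding) pairs.
CORPUS = [
    ("HNSW is a graph-based index for approximate nearest neighbor search.", [1.0, 0.0]),
    ("BART is a pre-trained sequence-to-sequence model.", [0.0, 1.0]),
    ("FAISS is a library for vector similarity search.", [0.9, 0.1]),
]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, k=2):
    """Retriever: return the k passages most similar to the query."""
    ranked = sorted(
        CORPUS,
        key=lambda doc: cosine_similarity(query_embedding, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

def build_prompt(question, passages):
    """Condition the generator on both the query and the retrieved text."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

def generate(prompt):
    """Placeholder generator; a real system would call a language model."""
    return f"[model output conditioned on {len(prompt)} prompt characters]"

question = "How can vector search be sped up?"
passages = retrieve([1.0, 0.05])  # made-up query embedding
prompt = build_prompt(question, passages)
answer = generate(prompt)
```

      The generator never sees the whole corpus, only the few passages the retriever selected, which is how the knowledge stays outside the model's parameters.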

      To summarize, RAG allows a generative model to:

      • Access vast amounts of structured or unstructured external data.
      • Answer questions or generate content that requires specific knowledge not contained within the model itself.
      • Benefit from up-to-date and expansive datasets, assuming the retriever's corpus is kept current.

      RAG addresses the limitation of standard language models that must rely solely on their internal parameters for generating text. By augmenting generation with on-the-fly retrieval of relevant context, RAG-equipped models can produce more detailed, accurate, and nuanced outputs, especially for tasks like question answering, fact-checking, and content creation where detailed world knowledge is crucial.

      This technique represents a significant advancement in generative AI, allowing models to provide high-quality outputs without memorizing all the facts internally, but rather by knowing (GPT4-0web)

  20. Oct 2023
    1. Plex is a scientific philosophy. Instead of claiming that science is so powerful that it can explain the understanding of understanding in question, we take understanding as the open question, and set about to determine what science results. [It turns out to be precisely the science we use every day, so nothing need be discarded or overturned - but many surprises result. Some very simple explanations for some very important scientific observations arise naturally in the course of Plex development. For example, from the First Definition, there are several Plex proofs that there was no beginning, contrary to Stephen Hawking's statement that "this idea that time and space should be finite without boundary is just a proposal: it cannot be deduced from some other principle." (A Brief History of Time, p. 136.) The very concept of a "big bang" is strictly an inherent artifact of our science's view of the nature of nature. There was no "initial instant" of time.] Axioms are assumptions. Plex has no axioms - only definitions. (Only) Nothing is assumed to be known without definition, and even that is "by definition" ,

      Plex doesn't claim that science can explain everything, but rather, it uses science to explore and understand our understanding of the world. The surprising part is that the science it uses is the same science we use daily, so nothing new needs to be learned and no old knowledge needs to be discarded.

      One example of a surprising discovery made through Plex is that, contrary to Stephen Hawking's theory, there was no beginning to time and space. This contradicts the popular "big bang" theory, which suggests there was an initial moment when time and space began. According to Plex, this idea of a "big bang" is just a result of how our current science views the nature of the universe.

      Plex also differs from other scientific approaches in that it doesn't rely on axioms, which are assumptions made without proof. Instead, Plex only uses definitions, meaning it only accepts as true what can be clearly defined and understood.

      Consider the concept of a "big bang". In traditional science, we might simply assume its existence, as if setting big_bang = True.

      But in Plex, we would only accept the "big bang" if we can define it:

      ```python
      def big_bang():
          # Define what a "big bang" is.
          # If we can't define it, then it doesn't exist in Plex.
          pass
      ```

      Let's not assume reality, but rather just try to define the elements we need to use.

  21. Sep 2023
  22. Aug 2023