5 Matching Annotations
  1. Apr 2026
    1. Cost (USD) to run the evaluation: GPT-5.4 (xhigh): $1,110, Claude Opus 4.6 (max): $1,055

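      A single run of the 452-task evaluation costs $1,110 for GPT-5.4 and $1,055 for Claude Opus 4.6, or roughly $2.3 to $2.5 per task. Gemini 3 Flash needs only $596 and scores 27.7% (versus 33.3% for the top model). This cost-effectiveness figure is critical for model selection: if a use case can accept a success rate of about 27% rather than 33%, Gemini 3 Flash cuts the cost nearly in half. In large-scale financial-services deployments, that difference would be magnified thousands of times.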
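
      The arithmetic behind this note can be checked with a minimal sketch. The dollar totals, the task count, and the 27.7% score come from the annotation; the helper names are hypothetical. Because the note does not say which model achieved the 33.3% score, cost per successful task is computed only for Gemini 3 Flash.

```python
# Figures quoted in the annotation; helper names are illustrative assumptions.
TASKS = 452

RUN_COST_USD = {
    "GPT-5.4 (xhigh)": 1110,
    "Claude Opus 4.6 (max)": 1055,
    "Gemini 3 Flash": 596,
}


def cost_per_task(total_usd, tasks=TASKS):
    """Average spend per attempted task."""
    return total_usd / tasks


def cost_per_success(total_usd, success_rate, tasks=TASKS):
    """Average spend per task that was actually solved."""
    return total_usd / (tasks * success_rate)


def saving(cheaper_usd, pricier_usd):
    """Fraction of the pricier run's cost saved by choosing the cheaper one."""
    return 1 - cheaper_usd / pricier_usd


if __name__ == "__main__":
    for model, usd in RUN_COST_USD.items():
        print(f"{model}: ${cost_per_task(usd):.2f} per task")
    flash = RUN_COST_USD["Gemini 3 Flash"]
    print(f"Gemini 3 Flash per solved task: ${cost_per_success(flash, 0.277):.2f}")
    print(f"Saving vs GPT-5.4: {saving(flash, RUN_COST_USD['GPT-5.4 (xhigh)']):.0%}")
    print(f"Saving vs Claude Opus 4.6: {saving(flash, RUN_COST_USD['Claude Opus 4.6 (max)']):.0%}")
```

      Cost per solved task is arguably the fairer basis for comparison, since a cheaper run that fails more often spreads its spend over fewer successes.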

  2. Dec 2023
    1. Interpreting accuracy is one of the most commonly used indicators of cognitive demands in experimental interpreting studies. One possibility to assess interpreting performance is to analyse interpreting accuracy based on meaning units. The methodological approaches used thus far, however, have some drawbacks: (a) they are limited to an assessment of sense consistency with no indication of the logical cohesion of the rendition, (b) they do not take into account the difference between unintended and strategic omissions or, more generally, the prioritization of source speech information as an interpreting strategy, and (c) they do not allow for the observation of fluctuations of cognitive load or effects of fatigue. In this article, we will present a refined approach to unit-based accuracy analysis that may contribute to solving the issues mentioned above.

      This piques my interest, especially (b).

      Omissions in interpreting: deliberate (an interpreting strategy) or unintended (a lack of capacity)?

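      Weighting of source-speech information: each meaning unit surely carries a different weight, and assigning those weights is highly subjective.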

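      The semantic coherence, logical cohesion, and transitions of the discourse as a whole are another major challenge: how should they be judged? Is a connective just another meaning unit to be given some weight, or is it a category of its own that needs a separately designed assessment method?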
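
      To make the weighting question concrete, here is a rough sketch of one possible weighted, unit-based score. It is not the method proposed in the article: the `MeaningUnit` class, the status labels, and the weights are all hypothetical, and both the weights and the strategic/unintended labels would have to come from a rater, which is exactly where the subjectivity enters.

```python
from dataclasses import dataclass


@dataclass
class MeaningUnit:
    text: str
    weight: float  # hypothetical importance weight assigned by a rater
    status: str    # "rendered", "strategic_omission", or "unintended_omission"


def weighted_accuracy(units, excuse_strategic=True):
    """Weighted share of source meaning that was rendered.

    When excuse_strategic is True, units the interpreter dropped on purpose
    are removed from the denominator instead of being counted as errors.
    """
    considered = [
        u for u in units
        if not (excuse_strategic and u.status == "strategic_omission")
    ]
    total = sum(u.weight for u in considered)
    if total == 0:
        return 0.0
    rendered = sum(u.weight for u in considered if u.status == "rendered")
    return rendered / total


# Made-up example: four meaning units with illustrative weights.
units = [
    MeaningUnit("main claim", 3, "rendered"),
    MeaningUnit("redundant aside", 1, "strategic_omission"),
    MeaningUnit("supporting figure", 2, "unintended_omission"),
    MeaningUnit("closing remark", 1, "rendered"),
]

lenient = weighted_accuracy(units)                         # strategic omission excused
strict = weighted_accuracy(units, excuse_strategic=False)  # every omission penalized
```

      A scheme like this still treats each unit in isolation, so it says nothing about cohesion. Connectives would need either their own weight as units or a separate measure, which is the open question raised above.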

  3. Feb 2021
  4. Jul 2020
  5. May 2020