Hypothesis

5 Matching Annotations

Apr 2026
artificialanalysis.ai

APEX-Agents-AA Benchmark Leaderboard | Artificial Analysis

1
1. fxp007 10 Apr 2026
  
  in Public
  
  Cost (USD) to run the evaluation: GPT-5.4 (xhigh): $1,110, Claude Opus 4.6 (max): $1,055
  
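  A single run of the 452-task evaluation costs $1,110 with GPT-5.4 and $1,055 with Claude Opus 4.6, which averages roughly $2.3 to $2.5 per task. Gemini 3 Flash needs only $596 and scores 27.7% (vs. 33.3% for the top models). This cost-performance figure matters a great deal for AI model selection: if the business scenario can accept a success rate of about 27% rather than 33%, Gemini 3 Flash saves nearly half the cost. In large-scale financial-services deployments, that difference is multiplied thousands of times over.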
  
  cost-analysis 2-dollars-per-task cost-performance model-selection
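
The arithmetic behind the annotation above can be checked with a short sketch. It uses only the figures quoted in the annotation (452 tasks; $1,110, $1,055, and $596 per run); the names `TASKS`, `RUN_COST_USD`, `cost_per_task`, and `savings_fraction` are hypothetical helpers introduced here for illustration and are not part of the leaderboard or any API.

```python
# Figures as quoted in the annotation; everything else is an illustrative assumption.
TASKS = 452
RUN_COST_USD = {
    "GPT-5.4 (xhigh)": 1110,
    "Claude Opus 4.6 (max)": 1055,
    "Gemini 3 Flash": 596,
}


def cost_per_task(total_usd: float, tasks: int = TASKS) -> float:
    """Average cost of one task for a full evaluation run."""
    return total_usd / tasks


def savings_fraction(cheaper_usd: float, pricier_usd: float) -> float:
    """Fraction of the pricier run's cost saved by choosing the cheaper run."""
    return 1 - cheaper_usd / pricier_usd


# Per-task cost for each model, rounded to cents.
per_task = {name: round(cost_per_task(usd), 2) for name, usd in RUN_COST_USD.items()}

# Savings from running Gemini 3 Flash instead of each of the two pricier models.
flash = RUN_COST_USD["Gemini 3 Flash"]
savings_vs_gpt = round(savings_fraction(flash, RUN_COST_USD["GPT-5.4 (xhigh)"]), 3)
savings_vs_claude = round(savings_fraction(flash, RUN_COST_USD["Claude Opus 4.6 (max)"]), 3)

if __name__ == "__main__":
    for name, usd in per_task.items():
        print(f"{name}: ${usd:.2f} per task")
    print(f"Savings vs GPT-5.4: {savings_vs_gpt:.1%}")
    print(f"Savings vs Claude Opus 4.6: {savings_vs_claude:.1%}")
```

Under these figures the per-task cost comes out between about $2.33 and $2.46 for the two pricier models, and the saving from the cheaper run is somewhat under one half, consistent with the annotation's "nearly half".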

Tags

2-dollars-per-task

model-selection

cost-analysis

cost-performance

Annotators

fxp007

URL

artificialanalysis.ai/evaluations/apex-agents-aa
Dec 2023
www.tandfonline.com

Interpreting accuracy revisited: a refined approach to interpreting performance analysis

1
1. ekliao 02 Dec 2023
  
  in Public
  
  Interpreting accuracy is one of the most commonly used indicators of cognitive demands in experimental interpreting studies. One possibility to assess interpreting performance is to analyse interpreting accuracy based on meaning units. The methodological approaches used thus far, however, have some drawbacks: (a) they are limited to an assessment of sense consistency with no indication of the logical cohesion of the rendition, (b) they do not take into account the difference between unintended and strategic omissions or, more generally, the prioritization of source speech information as an interpreting strategy, and (c) they do not allow for the observation of fluctuations of cognitive load or effects of fatigue. In this article, we will present a refined approach to unit-based accuracy analysis that may contribute to solving the issues mentioned above.
  
  This piques my interest, especially (b).
  
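  Omissions in interpreting: are they deliberate (an interpreting strategy being applied) or unintended (because the interpreter could not cope)?
  
  Weighting of source-speech information: each meaning unit surely carries a different weight, and assigning those weights is highly subjective.
  
  The semantic coherence, logical linkage, and transitions (cohesion) of the discourse as a whole are another major challenge: how should they be judged? Is a connective simply one more meaning unit to be given some weight, or is it a category of its own that requires a separately designed assessment method?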
  
  interpretating performance-analysis

Tags

performance-analysis

interpretating

Annotators

ekliao

URL

tandfonline.com/doi/full/10.1080/0907676X.2022.2088296
Feb 2021
www.sciencedirect.com

Toward a new economics of science

1
1. jasminehollingworth 27 Feb 2021
  
  in BehSci
  
  Dasgupta, P., & David, P. A. (1994). Toward a new economics of science. Research Policy, 23(5), 487–521. https://doi.org/10.1016/0048-7333(94)01002-1
  
  lang:en is:article economic science policy technology competitiveness security public policy public framework approach analysis open science performance

Tags

security

is:article

economic

technology

framework

science

public policy

competitiveness

policy

lang:en

performance

approach

public

analysis

open science

Annotators

jasminehollingworth

URL

sciencedirect.com/science/article/abs/pii/0048733394010021
Jul 2020
www.nber.org

Mutual Fund Performance and Flows During the COVID-19 Crisis

1
1. Marlene_Wulf 28 Jul 2020
  
  in BehSci
  
  Pastor, L., & Vorsatz, M. B. (2020). Mutual Fund Performance and Flows During the COVID-19 Crisis (Working Paper No. 27551; Working Paper Series). National Bureau of Economic Research. https://doi.org/10.3386/w27551
  
  is:other lang:en COVID-19 analysis performance flow USA funding crisis sustainability

Tags

USA

flow

lang:en

funding

performance

crisis

sustainability

analysis

COVID-19

is:other

Annotators

Marlene_Wulf

URL

nber.org/papers/w27551
May 2020
psycnet.apa.org

APA PsycNet

1
1. Marlene_Wulf 12 May 2020
  
  in BehSci
  
  Can we count on parents to help their children learn at home? (2020, May 8). Evidence for Action. https://blogs.unicef.org/evidence-for-action/can-we-count-on-parents-to-help-their-children-learn-at-home/
  
  lang:en is:article response item approach calibration confidence judgment performance overconfidence modeling miscalibration data analysis

Tags

is:article

modeling

lang:en

overconfidence

performance

item

approach

miscalibration

confidence

response

judgment

analysis

calibration

data

Annotators

Marlene_Wulf

URL

psycnet.apa.org/doiLanding
