Hypothesis

Cost (USD) to run the evaluation: GPT-5.4 (xhigh): $1,110, Claude Opus 4.6 (max): $1,055

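A single run of the 452-task evaluation costs $1,110 with GPT-5.4 and $1,055 with Claude Opus 4.6, which works out to roughly $2.46 and $2.33 per task respectively. Gemini 3 Flash, by contrast, needs only $596 (about $1.32 per task) and scores 27.7%, versus 33.3% for the top model. This cost-performance figure is critical for AI model selection: if the business scenario can accept a success rate of about 27% instead of 33%, Gemini 3 Flash saves nearly half the cost (about 44% to 46%). In large-scale financial-services deployments, where such runs are repeated thousands of times, the absolute difference grows proportionally.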
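The arithmetic behind these figures can be checked with a minimal Python sketch. The dollar totals, task count, and scores are taken from the note above; the function and variable names are hypothetical, chosen only for illustration, and the note does not say which model achieved the 33.3% top score, so it is kept as a bare number here.

```python
# Figures quoted in the note above; names are illustrative assumptions.
TASKS = 452
RUN_COST_USD = {
    "GPT-5.4 (xhigh)": 1110,
    "Claude Opus 4.6 (max)": 1055,
    "Gemini 3 Flash": 596,
}
TOP_SCORE = 0.333    # best reported score (model not named in the note)
FLASH_SCORE = 0.277  # Gemini 3 Flash score


def cost_per_task(total_usd: float, tasks: int = TASKS) -> float:
    """Average spend per task for one full evaluation run."""
    return total_usd / tasks


def savings_fraction(cheaper_usd: float, pricier_usd: float) -> float:
    """Fraction of cost saved by choosing the cheaper option."""
    return 1 - cheaper_usd / pricier_usd


per_task = {model: round(cost_per_task(cost), 2)
            for model, cost in RUN_COST_USD.items()}

flash = RUN_COST_USD["Gemini 3 Flash"]
savings = {model: round(savings_fraction(flash, cost), 3)
           for model, cost in RUN_COST_USD.items()
           if model != "Gemini 3 Flash"}

# Relative drop in success rate when moving from the top score to Flash.
relative_score_drop = round(1 - FLASH_SCORE / TOP_SCORE, 3)

if __name__ == "__main__":
    print(per_task)
    print(savings)
    print(relative_score_drop)
```

Under these numbers the trade is roughly a 44% to 46% lower bill in exchange for about a 17% relative drop in success rate; whether that is acceptable depends on what a failed task costs in the target deployment.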

cost-analysis 2-dollars-per-task cost-performance model-selection
