Hypothesis

2 Matching Annotations

Jun 2026
www.anthropic.com

When AI builds itself

1. fxp007 12 Jun 2026
  
  in Public
  
  our best model in November 2025 (Opus 4.5) beat the human choice 51% of the time; in April 2026 (Mythos Preview), this grew to 64%
  
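  The evolution of research judgment: from 51% (barely better than chance) to 64%, a gain of 13 percentage points in about five months (November 2025 to April 2026). The experimental design deserves careful scrutiny, though. It selected only moments where a human had made a suboptimal choice (n=129), so this is not an unbiased human-versus-model comparison. What it actually measures is how rarely the model repeats the mistake in situations where humans tend to err. Even so, the rise from 51% to 64% suggests that models are no longer ahead of humans only in execution; they are also starting to build an advantage in judgment, which is precisely where the article locates humans' "last comparative advantage."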
  
  Tags: 数据 (data) · 研究判断 (research judgment) · 非共识 (non-consensus)
2. fxp007 12 Jun 2026
  
  in Public
  
  our best model in November 2025 (Opus 4.5) beat the human choice 51% of the time; in April 2026 (Mythos Preview), this grew to 64%
  
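  The evolution of research judgment: from 51% (barely better than chance) to 64%, a gain of 13 percentage points in about five months (November 2025 to April 2026). The experimental design deserves careful scrutiny, though. It selected only moments where a human had made a suboptimal choice (n=129), so this is not an unbiased human-versus-model comparison. What it actually measures is how rarely the model repeats the mistake in situations where humans tend to err. Even so, the improvement from 51% to 64% suggests that models are no longer ahead of humans only in execution; they are also starting to build an advantage in judgment, which is precisely where the article locates humans' "last comparative advantage."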
  
  Tags: 数据 (data) · 研究判断 (research judgment) · 非共识 (non-consensus)

Tags

非共识 (non-consensus)

数据 (data)

研究判断 (research judgment)

Annotators

fxp007

URL

anthropic.com/institute/recursive-self-improvement
