6,559 Matching Annotations
  1. May 2026
    1. I don’t buy the “~2% of new prosumer signups” thing, since everyone I’ve talked to is seeing the new pricing grid and the Internet Archive has already [snapped a copy](https://web.archive.org/web/20260422001250/https://claude.com/pricing).

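      The author doubts Anthropic's claim that this is only a small-scale test limited to about 2% of new users, which suggests the change may reach a much wider audience.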

  2. Apr 2026
    1. existing TTS methods often discard the exploration trajectory and treat the final answer as the only useful outcome

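      In test-time scaling (TTS), the prevailing view is that only the final result has value and the exploration process is merely a means of reaching it. The authors argue instead that the overlooked exploration trajectories are a rich data source that can speed up an agent's ability to learn from experience. This view challenges how conventional TTS methods decide what is worth keeping.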

    1. the samples taken by Aug. Vogel in Munich, who remarked that the water from different pump wells showed only slight differences in quantitative respects. The sample

      Australia (https://example.org/vocab/iso3166-1-alpha-3/AUS) Confidence: 95%


      Peru (https://example.org/vocab/iso3166-1-alpha-3/PER) Confidence: 95%

      [...] showed that the water contained, per liter, an amount of organic substances that decomposed 10 milligrams of permanganic acid, whereas good well water decomposes only 1 to 2 milligrams of permanganic acid

    1. Over time, the risk grows that the document is no longer accessible at the location given as reference. Web servers that follow the HTTP protocol then give the notorious reply: ‘404 not found’. This resembles the situation of a book in a – very large – li

      info for annotation 1

      Over time, the risk grows that the document is no longer accessible at the location given as reference. Web servers that follow the HTTP protocol then give the notorious reply: ‘404 not found’. This resembles the situation of a book in a – very large – library that is not on the shelf at the position indicated in the catalogue. How is it to be found?

    1. the robustness of these reasoning behaviors remains underexplored

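      "The robustness of reasoning behaviors remains underexplored": this sentence names a collective blind spot across reasoning-model research. Over the past two years, test-time compute, long chain-of-thought (CoT), and o1/R1-style reasoning models have drawn enormous attention, yet nearly all evaluations have been run on isolated problems. In real agent deployments, the most basic reliability question, whether a model can sustain its depth of reasoning, only began to be studied systematically with this paper.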

  3. Mar 2026