4 Matching Annotations
  1. Last 7 days
    1. We also found evidence that models that have seen the problems during training are more likely to succeed, because they have additional information needed to pass the underspecified tests.

      大多数人认为AI模型的性能提升主要源于算法和架构的改进。但作者发现,模型在SWE-bench上的成功更多取决于它们是否在训练中见过这些问题,而非真正的编程能力提升。这一观点与行业普遍认为的'模型进步'叙事相悖,暗示当前AI发展评估可能存在严重偏差。

  2. Aug 2021
  3. Apr 2021
  4. Apr 2020