2 Matching Annotations
  1. Last 7 days
    1. We found weak evidence that Opus 4.0 and 4.1 had partially memorized cal, but no evidence Opus 4.6 had memorized it, despite performing best of all models considered.

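      This finding is surprising, because the best-performing model showed no sign of memorization. It may indicate that the latest AI models rely more on genuine understanding and reasoning than on simple recall when solving complex problems, which offers a new perspective for evaluating AI capabilities.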

  2. Jul 2022