We found weak evidence that Opus 4.0 and 4.1 had partially memorized cal, but no evidence that Opus 4.6 had memorized it, even though Opus 4.6 performed best of all the models considered.
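This finding is surprising, because the best-performing model showed no sign of memorization. It may suggest that the latest AI models rely more on genuine understanding and reasoning than on simple recall when solving complex problems, which offers a new perspective for evaluating AI capabilities.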