5 Matching Annotations
  1. May 2026
  2. Apr 2026
    1. you can run an inferior model for an infinite number of tokens, and it will never realize(*) that the lack of validation of the start window, if put together with the integer overflow, then put together with the fact the branch where the node should never be NULL is entered regardless, will produce the bug.

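      Using the OpenBSD SACK bug as an example, the author reports a surprising finding: a weak model, no matter how long it runs, cannot grasp the causal chain behind a complex vulnerability. This points to a fundamental limitation of AI in understanding interactions within complex systems, and it challenges the assumption that "unlimited compute can solve any problem."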

    1. We found weak evidence that Opus 4.0 and 4.1 had partially memorized cal, but no evidence Opus 4.6 had memorized it, despite performing best of all models considered.

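      This finding is unexpected, because the best-performing model showed no sign of memorization. It may indicate that the newest AI models rely more on genuine understanding and reasoning than on simple recall when solving complex problems, which offers a new perspective on evaluating AI capabilities.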

  3. Jul 2022