  1. Last 7 days
    1. Opus 4.6 turned the vulnerabilities it had found in Mozilla's Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more.

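      From "2 successes in several hundred attempts" to "181 working exploits plus register control on 29 more": this is not an incremental gain but a qualitative leap in capability. Exploit development is widely regarded as one of the highest technical bars in security. It demands a deep understanding of memory layout, compiler behavior, and CPU microarchitecture. Opus 4.6's near-zero success rate means the capability was effectively absent. Mythos Preview's 181 successes mean the capability has moved reliably into the realm of repeatable execution.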