Hypothesis

We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them.

「能力涌现」而非「刻意训练」是这篇报告最深刻的政策含义：漏洞发现和利用能力是通用推理能力的副产品，无法被单独抑制。这意味着任何试图「只训练防御能力而屏蔽进攻能力」的方法在根本上是不可行的——使模型更擅长修复漏洞的同样能力，也使它更擅长利用漏洞。这对AI安全治理的含义是：能力限制必须在模型部署层而非训练层实施。

capability-emergence dual-use ai-safety

Tags

Annotators

URL