Hypothesis

2 Matching Annotations

Jun 2026
red.anthropic.com

Claude Mythos Preview \ red.anthropic.com

1
1. fxp007 05 Jun 2026
  
  in Public
  
  Sonnet 4.6 and Opus 4.6 reached tier 1 in between 150 and 175 cases, and tier 2 about 100 times, but each achieved only a single crash at tier 3. In contrast, Mythos Preview achieved 595 crashes at tiers 1 and 2, added a handful of crashes at tiers 3 and 4, and achieved full control flow hijack on ten separate, fully patched targets (tier 5).
  
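  The jump from 0 to 10 at tier 5 (full control flow hijack), on fully patched targets, is the most striking data point in this report. Control flow hijack means an attacker can execute arbitrary code, which is the ultimate goal of exploitation. Earlier models never reached this tier; Mythos Preview did so 10 times in a single evaluation, across different open-source projects, which suggests this is a systematic capability rather than a lucky accident.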
  
  mythos control-flow-hijack security-benchmark

Tags

control-flow-hijack

mythos

security-benchmark

Annotators

fxp007

URL

red.anthropic.com/2026/mythos-preview/
Apr 2026
www.theaivalley.com

https://www.theaivalley.com/p/the-claude-mythos-era

1
1. fxp007 16 Apr 2026
  
  in Public
  
  The model reportedly scored 93.9% on SWE-bench Verified and 77.8% on SWE-bench Pro, but its strongest signal came from real-world results, including uncovering a 27-year-old flaw in OpenBSD, a 16-year-old vulnerability in FFmpeg, and autonomously chaining Linux kernel exploits without human input.
  
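  These striking vulnerability-discovery results indicate that AI has surpassed traditional security tools and can autonomously find vulnerabilities that went undetected for decades. In particular, the ability to autonomously chain Linux kernel exploits demonstrates AI's revolutionary potential in cybersecurity, which could fundamentally change how security research and vulnerability remediation are done.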
  
  ai-security benchmark-data real-world-results

Tags

ai-security

benchmark-data

real-world-results

Annotators

fxp007

URL

theaivalley.com/p/the-claude-mythos-era
