Hypothesis

2 Matching Annotations

May 2026
jack-clark.net

Import AI 455: Automating AI Research

1
1. fxp007 15 May 2026
  
  in Public
  
  In 2022, GPT 3.5 could do tasks that might take a person about ~30 seconds. In 2023, this rose to 4 minutes with GPT-4. In 2024, this rose to 40 minutes (o1). In 2025, it reached ~6 hours (GPT 5.2 (High)). In 2026, it has already risen to ~12 hours (Opus 4.6).
  
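  The length of task that AI systems can complete independently grew from about 30 seconds in 2022 to about 12 hours in 2026, showing exponential growth in AI's capacity for autonomous work.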
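
  As a rough check on the "exponential" reading, the figures in the highlighted passage can be turned into year-over-year multipliers and an implied doubling time. This is only a back-of-the-envelope sketch: it assumes the five data points are spaced exactly 12 months apart, which the passage does not state, and the 2026 figure is a partial-year value ("already"), so the last multiplier is likely understated. All variable names are illustrative.

```python
import math

# Task lengths quoted in the highlighted passage, converted to seconds.
horizons = {
    2022: 30,         # GPT 3.5, ~30 seconds
    2023: 4 * 60,     # GPT-4, 4 minutes
    2024: 40 * 60,    # o1, 40 minutes
    2025: 6 * 3600,   # GPT 5.2 (High), ~6 hours
    2026: 12 * 3600,  # Opus 4.6, ~12 hours (year still in progress)
}

years = sorted(horizons)

# Multiplier from each year to the next.
ratios = [horizons[b] / horizons[a] for a, b in zip(years, years[1:])]

# Implied doubling time over the whole span, ASSUMING 12-month spacing.
total_growth = horizons[years[-1]] / horizons[years[0]]
doublings = math.log2(total_growth)
months = 12 * (years[-1] - years[0])
doubling_time_months = months / doublings

print("year-over-year multipliers:", ratios)
print("total growth factor:", total_growth)
print("implied doubling time (months):", round(doubling_time_months, 1))
```

  Under that spacing assumption, the first three steps are each roughly an order of magnitude, and the overall span works out to a doubling time of a few months.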
  
  capability-scaling time-horizon

Tags

capability-scaling

time-horizon

Annotators

fxp007

URL

jack-clark.net/2026/05/04/import-ai-455-automating-ai-research/
Apr 2026
aisle.com

https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier

1
1. fxp007 17 Apr 2026
  
  in Public
  
  The capability rankings reshuffled completely across tasks. There is no stable best model across cybersecurity tasks. The capability frontier is jagged.
  
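  This finding reveals a "jagged frontier" in AI security capabilities: different models perform very differently across security tasks. There is no one-size-fits-all best security model; a suitable model has to be chosen for each specific task, which has important implications for the design of AI security systems.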
  
  model-evaluation security-tasks capability-scaling

Tags

capability-scaling

model-evaluation

security-tasks

Annotators

fxp007

URL

aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier
