2 Matching Annotations
  1. May 2026
    1. In 2022, GPT 3.5 could do tasks that might take a person about ~30 seconds. In 2023, this rose to 4 minutes with GPT-4. In 2024, this rose to 40 minutes (o1). In 2025, it reached ~6 hours (GPT 5.2 (High)). In 2026, it has already risen to ~12 hours (Opus 4.6).

      AI系统能独立完成任务的时间从2022年的30秒大幅增加到2026年的12小时,展示了AI自主工作能力的指数级增长。

  2. Apr 2026
    1. The capability rankings reshuffled completely across tasks. There is no stable best model across cybersecurity tasks. The capability frontier is jagged.

      这一发现揭示了AI安全能力的'锯齿状前沿'现象,不同模型在不同安全任务上的表现差异巨大。这表明不存在'一刀切'的最佳安全模型,而是需要根据具体任务选择合适的模型,这对AI安全系统的设计有重要启示。