9 Matching Annotations
  1. May 2026
    1. In 2022, GPT 3.5 could do tasks that might take a person about ~30 seconds. In 2023, this rose to 4 minutes with GPT-4. In 2024, this rose to 40 minutes (o1). In 2025, it reached ~6 hours (GPT 5.2 (High)). In 2026, it has already risen to ~12 hours (Opus 4.6).

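      The length of tasks that AI systems can complete on their own grew sharply, from about 30 seconds in 2022 to about 12 hours in 2026, which shows exponential growth in AI's capacity for autonomous work.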
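
The growth rate implied by the figures quoted above can be checked with a few lines of arithmetic. This is a rough sketch, not the source's own method: it assumes each figure sits exactly one year apart (the quote gives only calendar years), and the names `horizons_min` and `doublings` are illustrative.

```python
import math

# Human-time task lengths quoted in the annotation, converted to minutes.
# Assumption: each figure is treated as one calendar year apart.
horizons_min = {2022: 0.5, 2023: 4, 2024: 40, 2025: 6 * 60, 2026: 12 * 60}

def doublings(start_min: float, end_min: float) -> float:
    """Number of times start_min must double to reach end_min."""
    return math.log2(end_min / start_min)

total_doublings = doublings(horizons_min[2022], horizons_min[2026])
years = 2026 - 2022
doublings_per_year = total_doublings / years
doubling_time_months = 12 / doublings_per_year

print(f"{total_doublings:.1f} doublings over {years} years")
print(f"{doublings_per_year:.2f} doublings per year")
print(f"doubling time of about {doubling_time_months:.1f} months")
```

Under that assumption the endpoints alone imply between two and three doublings per year.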

    1. For example, this could bring a five hour (300 minute) time horizon down to a three minute time horizon. But while the time horizons are much shorter, the growth rate is about the same as the METR's main results, with roughly two doublings each year.

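      The author notes that time horizons for visual computer-use tasks may be 40 to 100 times shorter than the main results, yet the growth rate is similar, at roughly two doublings per year. This data point shows how much AI capability varies across task domains and how distinctively hard computer-use tasks are, which matters for understanding how uneven the path to AI automation is.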

    2. By the end of the year, we expect AI to be able to do tasks roughly one day long with a 50% success rate. In comparison, I'd guess that this task would take several days for a person familiar with the paper and is able to play around with the web interface.

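      The author cites METR's forecast that by the end of 2026 AI will complete roughly day-long tasks with about a 50% success rate. This gives AI capability forecasts a quantitative anchor. It also shows the gap between AI and humans in the time needed for complex tasks, which implies that AI still has significant room to improve in some areas.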

  2. Apr 2026
    1. The AI carries out research autonomously for nearly 8 hours and delivers a structured summary slide deck and a comprehensive research report dozens of pages long.

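      Eight hours of autonomous research, ending in a structured slide deck plus a complete report dozens of pages long: this task length closely matches METR's "time horizon" framework, since 8 hours is right at the upper limit of tasks that today's top AI agents can complete reliably. Sakana's choice of this duration is no accident. It is a capability-calibrated product design decision: they are building a product that sits just inside the current boundary of AI capability.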

    1. three METR researchers played themselves, with their current priorities, but pretending they had access to ~200-hour time horizon AIs – roughly what we expect 12–18 months from now.

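      A startling forecast: METR expects AI with a 200-hour time horizon to appear within 12 to 18 months, that is, by the end of 2027. The strongest current models (early 2026) have a time horizon of about 12 hours, which means that in under two years the complexity of tasks AI can complete independently will rise about 17-fold. This is not science fiction but METR's exponential extrapolation from measured data, and they are already preparing their organization for that future.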
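
The extrapolation in this note can be worked through as a short calculation. It is a sketch under stated assumptions: the 12-hour and 200-hour figures come from the annotations on this page, the "two doublings each year" rate comes from the May 2026 quote, and the variable names are illustrative.

```python
import math

current_horizon_h = 12    # roughly the strongest model in early 2026, per the note
target_horizon_h = 200    # the horizon METR researchers role-played with

growth_factor = target_horizon_h / current_horizon_h
doublings_needed = math.log2(growth_factor)

# Doubling time that a 12- or 18-month arrival would require.
implied_doubling_months = {m: m / doublings_needed for m in (12, 18)}

# Time needed at the "two doublings each year" rate quoted elsewhere on this page.
months_at_two_per_year = doublings_needed / 2 * 12

print(f"growth factor: {growth_factor:.1f}x, doublings needed: {doublings_needed:.2f}")
for months, doubling in implied_doubling_months.items():
    print(f"arrival in {months} months needs a doubling time of {doubling:.1f} months")
print(f"at two doublings per year: about {months_at_two_per_year:.0f} months")
```

On these numbers, a 12 to 18 month arrival implies growth somewhat faster than two doublings per year.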

    1. The task-completion time horizon is the task duration (measured by human expert completion time) at which an AI agent is predicted to succeed with a given level of reliability.

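      Surprisingly, the "time horizon" does not measure how long the AI takes. It measures how long a human would need to complete the same task. This design decision reflects a deep choice in evaluation philosophy: human labor time, not the AI's actual running time, is the yardstick for task difficulty. A "2-hour time horizon" is therefore a statement about task complexity, not about AI speed. The two are often conflated, and that confusion is one of the roots of public misunderstanding of AI capability.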
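
The definition quoted above can be made concrete with a small sketch. It assumes, for illustration only, that predicted success follows a logistic curve in the log of human completion time. The quote says only that success is "predicted", so the model form, the coefficients, and the function names here are all hypothetical.

```python
import math

def success_probability(task_minutes: float, intercept: float, slope: float) -> float:
    """Assumed logistic model: predicted success falls as log2(human minutes) grows."""
    z = intercept - slope * math.log2(task_minutes)
    return 1.0 / (1.0 + math.exp(-z))

def time_horizon(reliability: float, intercept: float, slope: float) -> float:
    """Human task length (minutes) at which predicted success equals `reliability`."""
    logit = math.log(reliability / (1.0 - reliability))
    return 2.0 ** ((intercept - logit) / slope)

# Hypothetical coefficients, chosen so that the 50% horizon is 120 human-minutes.
SLOPE = 0.6
INTERCEPT = SLOPE * math.log2(120)

h50 = time_horizon(0.5, INTERCEPT, SLOPE)
h80 = time_horizon(0.8, INTERCEPT, SLOPE)

print(f"50% horizon: {h50:.0f} human-minutes")
print(f"80% horizon: {h80:.0f} human-minutes")
```

Every quantity in the sketch is measured in human minutes. Nothing in it records how long the AI itself ran, which is the distinction the note draws. Under this model, demanding higher reliability also shortens the horizon.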

  3. Jul 2018
    1. Timing as a

      Could the multiple temporalities that symbolize importance account for a source of tension between always-online volunteers and those who show up for random periods of time?

      Deployments have fixed time periods for data collection but no scheduling mechanisms for volunteers. Does this create a source of friction when there is no mechanism to signal social intent or meaning?

      How does this problem get reflected in Reddy's TRH model or Mazmanian's porous time idea?

      How can you manage social coordination of rhythms/horizons when there is no signal to convey intent/commitment?

      What part of the SBTF social coordination is spectral, mosaic, rhythmic and/or obligated? And when is it not?