Hypothesis

The task-completion time horizon is the task duration (measured by human expert completion time) at which an AI agent is predicted to succeed with a given level of reliability.

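Surprisingly, the "time horizon" does not measure how long the AI takes; it measures how long a human needs to complete the same task. This design decision reflects a deeper choice in evaluation philosophy: human labor time, not the AI's actual runtime, serves as the yardstick for task difficulty. A "2-hour time horizon" is therefore a statement about task complexity, not about AI speed. The two are often conflated, and that conflation is one of the roots of public misunderstanding of AI capabilities.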
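A minimal sketch of how such a horizon could be read off a fitted curve. It assumes (this is not stated in the quoted passage) that predicted success follows a logistic function of the log of human completion time. The coefficients `INTERCEPT` and `SLOPE` and the function names are hypothetical and chosen only for illustration.

```python
import math


def success_probability(human_minutes: float, intercept: float, slope: float) -> float:
    """Predicted success rate on a task a human expert needs `human_minutes` for.

    Assumed model: logistic in log2(human completion time).
    """
    z = intercept + slope * math.log2(human_minutes)
    return 1.0 / (1.0 + math.exp(-z))


def time_horizon(reliability: float, intercept: float, slope: float) -> float:
    """Human task duration (minutes) at which predicted success equals `reliability`."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be strictly between 0 and 1")
    if slope >= 0.0:
        raise ValueError("sketch assumes success falls as tasks get longer (slope < 0)")
    logit = math.log(reliability / (1.0 - reliability))
    # Invert the logistic: intercept + slope * log2(t) = logit(reliability)
    return 2.0 ** ((logit - intercept) / slope)


# Hypothetical fitted coefficients, not taken from any real evaluation.
INTERCEPT = 4.0
SLOPE = -0.6

h50 = time_horizon(0.5, INTERCEPT, SLOPE)  # horizon at 50% reliability
h80 = time_horizon(0.8, INTERCEPT, SLOPE)  # horizon at 80% reliability
print(f"50% horizon: {h50:.1f} human-minutes")
print(f"80% horizon: {h80:.1f} human-minutes")
```

The only time variable in the sketch is the human expert's completion time; the AI's own wall-clock time never enters the calculation. Under the assumed model, asking for higher reliability yields a shorter horizon.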

time-horizon definition measurement-philosophy surprising

Tags
