Hypothesis

two participants gave it 9/10 and one "11/10"

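A two-hour tabletop-style exercise drew ratings of 9 to 11 out of 10 from three top AI safety researchers, which is itself a signal: serious AI research organizations are using role-play to prepare for the future. The methodology (rehearsing workflows under future capability levels) has precedents in other fields, such as military wargames, disaster drills, and scenario planning, but applying it to the evolution of AI capabilities reflects METR's distinctive research taste.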

tabletop-exercise future-preparation research-methodology surprising

Tags
