16 Matching Annotations
  1. Last 7 days
    1. two participants gave it 9/10 and one "11/10"

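      A two-hour tabletop-style exercise that three top AI safety researchers rated 9 to 11 out of 10: that in itself is a signal that serious AI research organizations are using role-play to prepare for the future. The methodology (rehearsing workflows under future capabilities) has precedents in other fields, such as military wargames, disaster drills, and scenario planning, but applying it to the progression of AI capabilities reflects METR's distinctive research taste.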

    1. Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior.

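      [Insight] This sentence points to a new paradigm for AI research: rather than asking "what can the model do," ask "why does the model behave this way." Using emotion as an entry point for understanding model behavior essentially brings the methodology of psychology into AI interpretability research. The takeaway for practitioners: the most valuable AI research of the future may lie not in algorithmic innovation but in "finding mechanistic explanations for known phenomena," as this paper does.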

  2. Aug 2025
  3. Apr 2024
  4. Feb 2024
  5. Dec 2021
  6. Nov 2021
    1. (the VTA is also part of this system, but is too small to image with standard fMRI methods, but see [35] for successful imaging methods).

      All imaging studies face questions of validity and should (and many do) link to comprehensive details on instrumentation, methodology, and interpretation. Apparently, the professional consensus remains that, properly executed and interpreted, fMRI and other functional imaging techniques based on detection of oxygenation can lead to highly valid conclusions. (See Nautil.us article.)

  7. Jul 2021
  8. Jun 2021
  9. Oct 2020
  10. Sep 2020
  11. Jun 2020