Hypothesis

2 Matching Annotations

Jun 2026
deepmind.google

Untitled document

1
1. fxp007 12 Jun 2026
  
  in Public
  
  Most safety evaluations analyze models in isolation
  
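  This is a structural blind spot in current AI safety research. We know how to evaluate the safety of a single model, but we have almost no tools for evaluating the collective behavior of populations of agents. An analogy: you can test how rational each individual person is, but you cannot predict a market crash or the spread of a rumor from those individual tests. The emergent behavior of complex systems fundamentally cannot be predicted by reductionist methods, and that is the rationale for this $10M grant.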
  
  emergent behavior, safety evaluation blind spot, complex systems

Tags

emergent behavior

complex systems

safety evaluation blind spot

Annotators

fxp007

URL

deepmind.google/blog/investing-in-multi-agent-ai-safety-research/
sakana.ai

Untitled document

1
1. fxp007 12 Jun 2026
  
  in Public
  
  this dynamic adversarial process leads to the emergence of increasingly general strategies and reveals an intriguing form of convergent evolution, where different code implementations settle into similar high-performing behaviors
  
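  This is the most important experimental result in the piece: independent evolutionary paths that start from different initial conditions end up converging on similar behavioral strategies. It mirrors how birds and bats each evolved wings independently. The takeaway for AI researchers: there is a kind of "basin of attraction for optimal strategies," where adversarial pressure pushes the system toward the same solution regardless of the starting point. This suggests that the emergence of complex capabilities may be more inevitable than we tend to assume.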
  
  convergent evolution, emergence, non-consensus

Tags

emergence

convergent evolution

non-consensus

Annotators

fxp007

URL

sakana.ai/drq/
