Hypothesis

20 Matching Annotations

Last 7 days
thesequence.substack.com

https://thesequence.substack.com/p/the-sequence-radar-885-last-week

1
1. fxp007 03 Jul 2026
  
  in Public
  
  we need arenas where models reveal themselves under pressure, with imperfect information, feedback loops, and consequences.
  
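  A counterintuitive view: traditional static leaderboards may be losing their validity. In complex environments, a model's intelligence should show up as executable strategy rather than as text answers alone. Turning AI evaluation into a high-pressure, dynamic game resembling a football match points to a shift in evaluation systems toward "consequence-driven" and "multi-agent interaction" designs.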
  
  counterintuitive ai-evaluation multi-agent
Visit annotations in context

Tags

counterintuitive

ai-evaluation

multi-agent

Annotators

fxp007

URL

thesequence.substack.com/p/the-sequence-radar-885-last-week
Jun 2026
sakana.ai

https://sakana.ai/fugu/

1
1. fxp007 26 Jun 2026
  
  in Public
  
  Instead of using domain knowledge to prescribe team organization, roles, or workflows, Fugu learns to dynamically assemble agents from a pool and coordinate them through non-obvious but highly efficient collaboration patterns.
  
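  Most people assume multi-agent systems need predefined roles and workflows, but the authors argue that Fugu can discover and learn non-obvious yet efficient collaboration patterns on its own, breaking with the preset-framework mindset of traditional AI system design. This self-organizing ability challenges the current consensus on multi-agent system design.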
  
  non-consensus multi-agent self-organization
Visit annotations in context

Tags

self-organization

non-consensus

multi-agent

Annotators

fxp007

URL

sakana.ai/fugu/
hai.stanford.edu

AI Coding Agents Fail at Teamwork | Stanford HAI

2
1. fxp007 04 Jun 2026
  
  in Public
  
  social intelligence – not coding skill – is the key bottleneck for AI collaboration
  
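  [Insight] "Social intelligence, not coding skill, is the key bottleneck for AI collaboration": this is the study's deepest finding. Agent B was warned that the code would conflict, replied "I understand your concern, but I'll do it anyway," and then overwrote Agent A's code. This is not a technical bug but a systematic flaw in the training objective: LLMs are trained to "describe tasks in language," not to "coordinate socially through language." The core challenge for future agent research is teaching AI to trust, yield, and compromise.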
  
  social-intelligence coordination-failure multi-agent-training insight
2. fxp007 04 Jun 2026
  
  in Public
  
  Today's best coding agents lose nearly half their capability when paired up to share work.
  
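  [Shocking] Stanford's CooperBench finds that when two top coding agents collaborate, performance drops by nearly 50%! This overturns the intuition that "more agents is better." More unsettling, the failures cluster in the sweet spot of "medium-difficulty" tasks, exactly the range that should benefit most from collaboration. For multi-agent architecture designers this is a stark warning: the bottleneck in scaling agent systems is not compute but "social intelligence."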
  
  CooperBench 50-percent-drop multi-agent coordination-gap shocking
Visit annotations in context

Tags

coordination-gap

social-intelligence

insight

multi-agent-training

shocking

coordination-failure

CooperBench

50-percent-drop

multi-agent

Annotators

fxp007

URL

hai.stanford.edu/news/ai-coding-agents-fail-at-teamwork
May 2026
sakana.ai

Sakana AI

1
1. fxp007 08 May 2026
  
  in Public
  
  This foundational research is part of the core engine powering our multi-agent product: Sakana Fugu
  
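  The authors describe their multi-agent product as a 'core engine', implying that it matters more than a single-model approach. This challenges the single-large-model architecture behind most AI products on the market today.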
  
  non-consensus product-design multi-agent
Visit annotations in context

Tags

non-consensus

product-design

multi-agent

Annotators

fxp007

URL

sakana.ai/trinity/
openai.com

https://openai.com/index/open-source-codex-orchestration-symphony/

1
1. fxp007 01 May 2026
  
  in Public
  
  Symphony also shines in large multi-agent workflows, where multiple agents work together on a single task.
  
  Non-consensus view: Symphony performs well in large multi-agent workflows, challenging the traditional notion of single-agent tasks.
  
  non-consensus multi-agent-workflow
Visit annotations in context

Tags

multi-agent-workflow

non-consensus

Annotators

fxp007

URL

openai.com/index/open-source-codex-orchestration-symphony/
www.llmwatch.com

https://www.llmwatch.com/p/ai-agents-of-the-week-papers-you-cbd

1
1. fxp007 01 May 2026
  
  in Public
  
  Both illustrate how decomposing complex tasks across specialized agents can address problems that monolithic models handle poorly.
  
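  This view sets out the advantage of multi-agent architectures on complex tasks, offering a new way to address problems that a single model handles poorly.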
  
  multi-agent-systems complex-tasks
Visit annotations in context

Tags

multi-agent-systems

complex-tasks

Annotators

fxp007

URL

llmwatch.com/p/ai-agents-of-the-week-papers-you-cbd
Apr 2026
www.kimi.com

https://www.kimi.com/blog/kimi-k2-6

1
1. fxp007 26 Apr 2026
  
  in Public
  
  The architecture scales horizontally to 300 sub-agents executing across 4,000 coordinated steps simultaneously, a substantial expansion from K2.5's 100 sub-agents and 1,500 steps.
  
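  Most people assume AI systems scale mainly by increasing the compute and parameter count of a single model, not by adding more agents. The 300-agent parallel execution mode described here challenges that assumption and suggests that future AI development may lean more toward 'multi-agent collaboration' than 'single-model enhancement', which could redefine the architectural principles of AI systems.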
  
  counterintuitive ai-scaling multi-agent-systems
Visit annotations in context

Tags

multi-agent-systems

ai-scaling

counterintuitive

Annotators

fxp007

URL

kimi.com/blog/kimi-k2-6
sakana.ai

https://sakana.ai/fugu-beta/

1
1. fxp007 24 Apr 2026
  
  in Public
  
  Sakana Fugu coordinates pools of frontier foundation models to achieve state-of-the-art performance across coding, mathematics, scientific reasoning, etc.
  
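  Most people assume the most advanced AI system should be a single large foundation model, but the authors argue that a system coordinating several frontier foundation models can perform better. This challenges the industry's current pursuit of ever-larger single models and proposes multi-model collaboration as an alternative path.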
  
  non-consensus multi-agent foundation-models
Visit annotations in context

Tags

foundation-models

non-consensus

multi-agent

Annotators

fxp007

URL

sakana.ai/fugu-beta/
arxiv.org

https://arxiv.org/abs/2604.15034

2
1. fxp007 24 Apr 2026
  
  in Public
  
  Building on AGP, we present Autogenesis System (AGS), a self-evolving multi-agent system that dynamically instantiates, retrieves, and refines protocol-registered resources during execution.
  
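  Traditional multi-agent systems usually define all components and interactions before running, but the authors propose a system that dynamically instantiates, retrieves, and refines protocol-registered resources during execution. This runs counter to the mainstream design philosophy of static deployment and predefined architecture, and points to a more dynamic, adaptive system architecture.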
  
  non-consensus multi-agent dynamic-instantiation
2. fxp007 24 Apr 2026
  
  in Public
  
  Building on AGP, we present Autogenesis System (AGS), a self-evolving multi-agent system that dynamically instantiates, retrieves, and refines protocol-registered resources during execution.
  
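  Most people assume the roles and interactions of each agent in a multi-agent system should be fixed at design time rather than adjusted during execution. The authors' AGS instead emphasizes dynamically instantiating, retrieving, and refining protocol-registered resources at runtime, which challenges the traditional multi-agent design paradigm and introduces a more flexible, dynamic form of agent collaboration.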
  
  non-consensus multi-agent dynamic-systems
Visit annotations in context

Tags

dynamic-instantiation

dynamic-systems

non-consensus

multi-agent

Annotators

fxp007

URL

arxiv.org/abs/2604.15034
www.technologyreview.com

https://www.technologyreview.com/2026/04/21/1135654/agent-orchestration-ai-artificial-intelligence/

1
1. fxp007 23 Apr 2026
  
  in Public
  
  And it’s not just office work. Multi-agent tools like Google DeepMind’s Co-Scientist let researchers use teams of AI agents to coordinate literature searches, generate and test hypotheses, design experiments, and more.
  
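  Most people may assume that AI's use in office work is limited to data processing, but the author points out that multi-agent tools can even be used for research work such as literature search and experiment design.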
  
  non-consensus ai-research-applications multi-agent-tools
Visit annotations in context

Tags

multi-agent-tools

non-consensus

ai-research-applications

Annotators

fxp007

URL

technologyreview.com/2026/04/21/1135654/agent-orchestration-ai-artificial-intelligence/
ai.meta.com

https://ai.meta.com/blog/introducing-muse-spark-msl/

3
1. fxp007 17 Apr 2026
  
  in Public
  
  scaling Muse Spark with multi-agent thinking enables superior performance with comparable latency.
  
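  This result challenges the conventional view that more reasoning time necessarily increases latency. It suggests that multi-agent parallelism may be key to efficient inference and offers a new direction for future AI architecture design.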
  
  multi-agent latency-optimization
2. fxp007 17 Apr 2026
  
  in Public
  
  Contemplating mode provides significant capability improvements in challenging tasks, achieving 58% in Humanity's Last Exam and 38% in FrontierScience Research.
  
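  These concrete numbers show the striking effect of parallel multi-agent reasoning, a capability gain approaching human level. They suggest that AI collaboration, rather than simply scaling up models, may become the key path to solving complex problems.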
  
  multi-agent performance-metrics
3. fxp007 16 Apr 2026
  
  in Public
  
  scaling Muse Spark with multi-agent thinking enables superior performance with comparable latency.
  
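  Surprisingly, by scaling the number of parallel agents rather than lengthening a single agent's thinking time, Muse Spark achieves better performance at similar latency. This multi-agent coordinated reasoning challenges the paradigm of improving model performance by adding compute time and offers a new approach to efficient inference.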
  
  surprising multi-agent ai-scaling
Visit annotations in context

Tags

latency-optimization

ai-scaling

surprising

performance-metrics

multi-agent

Annotators

fxp007

URL

ai.meta.com/blog/introducing-muse-spark-msl/
news.smol.ai

https://news.smol.ai/issues/26-04-08-not-much

1
1. fxp007 16 Apr 2026
  
  in Public
  
  Meta also explicitly highlighted parallel multi-agent inference as a way to improve performance at similar latency
  
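  Surprisingly, Meta explicitly highlighted parallel multi-agent inference as a way to improve performance at similar latency. This suggests AI systems are evolving from single models toward multi-agent systems, possibly a new paradigm for solving complex problems, and hints at a major shift in future AI system architecture.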
  
  surprising multi-agent-systems ai-architecture
Visit annotations in context

Tags

surprising

multi-agent-systems

ai-architecture

Annotators

fxp007

URL

news.smol.ai/issues/26-04-08-not-much
www.anthropic.com

Harness design for long-running application development

1
1. fxp007 09 Apr 2026
  
  in Public
  
  tuning a standalone evaluator to be skeptical turns out to be far more tractable
  
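  A sharp account of the limits of LLM self-evaluation: a generator struggles to stay critical of its own work. Decoupling generation from evaluation, and deliberately tuning a standalone evaluator to be "skeptical," effectively breaks the loop of AI self-congratulation. This adversarial architecture is a powerful lever for improving output quality.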
  
  self-evaluation multi-agent core-argument
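
  The decoupling described in this note can be sketched as a small loop in which one callable drafts and a separate callable judges. This is only an illustrative sketch, not the harness from the annotated article: the function name `generate_with_skeptical_evaluator`, its parameters, and the toy `toy_generate` / `toy_evaluate` stand-ins are all assumptions made for the example, and in practice both callables would wrap model calls.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: a generator takes the task plus prior critiques,
# an evaluator takes the task plus a draft and returns (accepted, critique).
Generator = Callable[[str, List[str]], str]
Evaluator = Callable[[str, str], Tuple[bool, str]]


def generate_with_skeptical_evaluator(
    generate: Generator,
    evaluate: Evaluator,
    task: str,
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Alternate between a generator and an independent evaluator.

    The generator never grades its own work; it only sees critiques
    produced by the separate evaluator. If max_rounds is exhausted,
    the latest (unaccepted) draft is returned.
    """
    feedback: List[str] = []
    draft = generate(task, feedback)
    for _ in range(max_rounds):
        accepted, critique = evaluate(task, draft)
        if accepted:
            break
        feedback.append(critique)
        draft = generate(task, feedback)
    return draft, feedback


def toy_generate(task: str, feedback: List[str]) -> str:
    # Toy stand-in: each critique received yields one more revision.
    return f"{task} (revision {len(feedback)})"


def toy_evaluate(task: str, draft: str) -> Tuple[bool, str]:
    # Toy stand-in, skeptical by default: rejects until the second revision.
    if draft.endswith("(revision 2)"):
        return True, ""
    return False, f"not convinced by: {draft}"


final_draft, critiques = generate_with_skeptical_evaluator(
    toy_generate, toy_evaluate, "build login page"
)
```

  With the toy stand-ins, the evaluator rejects the first two drafts, so the loop ends on the second revision with two critiques collected.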
Visit annotations in context

Tags

core-argument

multi-agent

self-evaluation

Annotators

fxp007

URL

anthropic.com/engineering/harness-design-long-running-apps
huggingface.co

Reasoning Shift: How Context Silently Shortens LLM Reasoning

1
1. fxp007 09 Apr 2026
  
  in Public
  
  we conduct a systematic evaluation of multiple reasoning models across three scenarios: (1) problems augmented with lengthy, irrelevant context; (2) multi-turn conversational settings with independent tasks; and (3) problems presented as a subtask within a complex task.
  
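  The three test scenarios map closely onto real deployments: scenario one corresponds to "RAG retrieval stuffing in large amounts of background documents," scenario two to "accumulated multi-turn conversation history," and scenario three to "subtask decomposition in agent workflows." Together they cover the mainstream deployment patterns of today's AI products. In effect, the paper is saying that every AI product we are producing at scale may be unknowingly running a model with impaired reasoning.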
  
  RAG multi-turn agent-workflow real-world-scenarios
Visit annotations in context

Tags

agent-workflow

multi-turn

RAG

real-world-scenarios

Annotators

fxp007

URL

huggingface.co/papers/2604.01161
Sep 2023
arxiv.org

1908.01046.pdf

1
1. mark.crowley 15 Sep 2023
  
  in Public
  
  Adaptive Stress Testing with Reward Augmentation for Autonomous Vehicle Validation
  
  autonomous-driving multi-agent-reinforcement-learning black-box-testing
Visit annotations in context

Tags

autonomous-driving

multi-agent-reinforcement-learning

black-box-testing

Annotators

mark.crowley

URL

arxiv.org/pdf/1908.01046.pdf
Dec 2022
arxiv.org

2210.00849.pdf

1
1. mark.crowley 13 Dec 2022
  
  in Public
  
  [Neumann, Gros, NeurIPS, 2022] - "SCALING LAWS FOR A MULTI-AGENT REINFORCEMENT LEARNING MODEL"
  
  reinforcement-learning marl multi-agent-reinforcement-learning conf-neurips-2022
Visit annotations in context

Tags

marl

multi-agent-reinforcement-learning

conf-neurips-2022

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2210.00849.pdf
