6 Matching Annotations
  1. May 2026
  2. Apr 2026
    1. But the real power of agents comes when they can work as a team. Instead of lone-wolf bots carrying out single tasks, such as using a browser to make a restaurant reservation or sending you a summary of your inbox, new tools can yoke together multiple agents, give each of them a different job, and orchestrate their behaviors so that they all pull together to complete more complex tasks than an individual agent could do by itself.

      主流观点可能认为人工智能代理将独立完成工作,但作者指出,它们的真正力量在于团队合作,通过协同工作完成比单个代理更复杂的任务。

    1. It maintains 97% skill compliance across 40 complex skills on MM Claw, each skill exceeding 2,000 tokens.

      97%的技能合规率是一个非常高的指标,特别是在处理超过2000个token的复杂技能时。这表明M2.7不仅能够理解复杂指令,还能在长时间任务中保持一致性和可靠性。对于需要构建复杂代理工作流的开发者来说,这一数据点特别有价值,因为它意味着模型可以可靠地执行多步骤、高复杂度的任务。

    1. we may see a growing divergence between the capabilities we can measure and the capabilities we actually care about.

      「可测量的能力」与「真正关心的能力」之间的分歧正在扩大——这是整篇文章最深刻的洞见。所有当前 benchmark 都偏向「干净、自包含、可自动评分」的任务,而真实工作是「混乱、跨系统、需人类判断」的。随着 AI 向长任务延伸,这个测量-现实之间的鸿沟不会缩小,只会加速扩大。这意味着未来关于「AI 能否替代某类工作」的争论,将越来越难以用数据解决——因为数据本身无法捕捉真实工作的本质。

  3. Aug 2020