Hypothesis

7 Matching Annotations

Jun 2026
www.anthropic.com

https://www.anthropic.com/news/claude-fable-5-mythos-5

1
1. fxp007 09 Jun 2026
  
  in Public
  
  The longer and more complex the task, the larger Fable 5's lead over our other models. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.
  
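  Most people assume AI models perform better on simple tasks than on complex ones, but the author argues that Fable 5 actually does better on longer, more complex tasks, compressing months of work into days. This challenges the common expectation that AI capability declines as task complexity grows, and suggests that advanced AI may show disproportionate capability gains on complex tasks.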
  
  non-consensus ai-capabilities complex-tasks

Tags

ai-capabilities

complex-tasks

non-consensus

Annotators

fxp007

URL

anthropic.com/news/claude-fable-5-mythos-5
May 2026
www.llmwatch.com

https://www.llmwatch.com/p/ai-agents-of-the-week-papers-you-cbd

1
1. fxp007 01 May 2026
  
  in Public
  
  Both illustrate how decomposing complex tasks across specialized agents can address problems that monolithic models handle poorly.
  
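  This view highlights the advantage of multi-agent architectures in handling complex tasks, offering a new approach to problems that a single model struggles with.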
  
  multi-agent-systems complex-tasks

Tags

multi-agent-systems

complex-tasks

Annotators

fxp007

URL

llmwatch.com/p/ai-agents-of-the-week-papers-you-cbd
Apr 2026
openai.com

Introducing workspace agents in ChatGPT

1
1. fxp007 23 Apr 2026
  
  in Public
  
  Workspace agents can gather context from the right systems, follow team processes, ask for approval when needed, and keep work moving across tools.
  
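  Many people may assume that AI tools struggle to understand and carry out complex team processes, but the author stresses that workspace agents can understand and follow them, challenging the assumed limits of AI on complex tasks.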
  
  counterintuitive complex-tasks ai-process-automation

Tags

ai-process-automation

complex-tasks

counterintuitive

Annotators

fxp007

URL

openai.com/index/introducing-workspace-agents-in-chatgpt/
www.technologyreview.com

https://www.technologyreview.com/2026/04/21/1135654/agent-orchestration-ai-artificial-intelligence/

1
1. fxp007 23 Apr 2026
  
  in Public
  
  But the real power of agents comes when they can work as a team. Instead of lone-wolf bots carrying out single tasks, such as using a browser to make a restaurant reservation or sending you a summary of your inbox, new tools can yoke together multiple agents, give each of them a different job, and orchestrate their behaviors so that they all pull together to complete more complex tasks than an individual agent could do by itself.
  
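  The mainstream view may be that AI agents will complete work independently, but the author points out that their real power lies in teamwork: by working together, they complete tasks more complex than a single agent could handle.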
  
  counterintuitive team-ai-agents complex-tasks

Tags

team-ai-agents

counterintuitive

complex-tasks

Annotators

fxp007

URL

technologyreview.com/2026/04/21/1135654/agent-orchestration-ai-artificial-intelligence/
firethering.com

https://firethering.com/minimax-m2-7-agentic-model/

1
1. fxp007 17 Apr 2026
  
  in Public
  
  It maintains 97% skill compliance across 40 complex skills on MM Claw, each skill exceeding 2,000 tokens.
  
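  A 97% skill compliance rate is a very high figure, especially for complex skills exceeding 2,000 tokens. It suggests that M2.7 can not only understand complex instructions but also stay consistent and reliable over long tasks. For developers building complex agent workflows, this data point is particularly valuable because it implies the model can reliably execute multi-step, high-complexity tasks.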
  
  skill-compliance complex-tasks reliability

Tags

skill-compliance

reliability

complex-tasks

Annotators

fxp007

URL

firethering.com/minimax-m2-7-agentic-model/
www.understandingai.org

Why it's getting harder to measure AI performance - Understanding AI

1
1. fxp007 09 Apr 2026
  
  in Public
  
  we may see a growing divergence between the capabilities we can measure and the capabilities we actually care about.
  
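  The widening divergence between "measurable capabilities" and "capabilities we actually care about" is the article's deepest insight. All current benchmarks favor tasks that are "clean, self-contained, and automatically gradable," whereas real work is "messy, cross-system, and dependent on human judgment." As AI extends to longer tasks, this gap between measurement and reality will not shrink; it will only widen faster. This means that future debates over whether AI can replace a given kind of work will become increasingly hard to settle with data, because the data itself cannot capture the nature of real work.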
  
  measurement-reality-gap benchmark-limitation complex-tasks key-insight

Tags

measurement-reality-gap

key-insight

benchmark-limitation

complex-tasks

Annotators

fxp007

URL

understandingai.org/p/why-its-getting-harder-to-measure
Aug 2020
covid-19.iza.org

COVID-19 and the Labor Market

1
1. Grace1999 04 Aug 2020
  
  in BehSci
  
  Grözinger, N., Irlenbusch, B., Laske, K., & Schröder, M. (2020). Innovation and Communication Media in Virtual Teams: An Experimental Study. Institute of Labor Economics. Retrieved from: https://covid-19.iza.org/publications/innovation-and-communication-media-in-virtual-teams-an-experimental-study/
  
  is:article lang:en COVID-19 Communication Complex problem solving Creativity Innovation Laboratory experiment Real-effort Communication media Collobarative tasks Behavior Science

Tags

Real-effort

COVID-19

is:article

Communication

Innovation

Laboratory experiment

Collobarative tasks

Communication media

Behavior Science

lang:en

Creativity

Complex problem solving

Annotators

Grace1999

URL

covid-19.iza.org/publications/dp13218/
