Hypothesis

A small but directionally consistent improvement on strict instruction following. Loose evaluation is flat. Both models already follow the high-level instructions — the strict-mode gap comes down to 4.6 occasionally mishandling exact formatting where 4.7 doesn't.

这一发现揭示了AI模型能力提升的一个微妙现象：微小但精确的改进可能比重大但模糊的改进更有价值。Claude 4.7只在严格指令遵循上有微小提升，但这种提升针对的是实际开发中常见的精确格式化问题，这挑战了人们对'重大突破'的执念，强调了'精准解决特定问题'的价值。

precision-vs-breadth ai-capability

Tags

Annotators

URL