A small but directionally consistent improvement on strict instruction following. Loose evaluation is flat. Both models already follow the high-level instructions — the strict-mode gap comes down to 4.6 occasionally mishandling exact formatting where 4.7 doesn't.
这一发现揭示了AI模型能力提升的一个微妙现象:微小但精确的改进可能比重大但模糊的改进更有价值。Claude 4.7只在严格指令遵循上有微小提升,但这种提升针对的是实际开发中常见的精确格式化问题,这挑战了人们对'重大突破'的执念,强调了'精准解决特定问题'的价值。