As part of this investigation, we ran more ablations (removing lines from the system prompt to understand the impact of each line) using a broader set of evaluations. One of these evaluations showed a 3% drop for both Opus 4.6 and 4.7.
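The ablation procedure described above (remove one line of the system prompt at a time and re-run the evaluation) can be sketched roughly as follows. This is a minimal illustration, not the authors' actual harness: `ablate_lines`, `toy_score`, the example prompt lines, and the scores of 80.0 and 77.0 are all hypothetical stand-ins. A real `score` function would run an evaluation suite against the model with the given system prompt.

```python
from typing import Callable, List, Tuple


def ablate_lines(
    prompt_lines: List[str],
    score: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Leave-one-out ablation over system prompt lines.

    For each line, returns (line, delta), where delta is the score with
    that line removed minus the score of the full prompt. A positive
    delta means the line was hurting the evaluation score.
    """
    baseline = score("\n".join(prompt_lines))
    results = []
    for i, line in enumerate(prompt_lines):
        # Rebuild the prompt with only line i removed.
        ablated = "\n".join(prompt_lines[:i] + prompt_lines[i + 1:])
        results.append((line, score(ablated) - baseline))
    return results


# Hypothetical scorer with made-up numbers, used only to exercise the loop.
# It pretends that a length-limit line costs 3 points on some evaluation.
LENGTH_LIMIT = "Keep responses under 100 words."


def toy_score(prompt: str) -> float:
    return 77.0 if LENGTH_LIMIT in prompt else 80.0


example_lines = ["You are a helpful assistant.", LENGTH_LIMIT]
for line, delta in ablate_lines(example_lines, toy_score):
    print(f"{delta:+.1f}  {line}")
```

Leave-one-out ablation measures each line in isolation, so it would not capture interactions between lines; running it across a broad set of evaluations, as the text describes, is what makes a regression confined to one evaluation visible.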
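Most people assume that a small change to a system prompt will have only a negligible effect, but the authors show that a seemingly trivial change (limiting the word count) led to a 3% drop in performance. This challenges the intuition that "small changes have small effects" and shows that minor changes in AI systems can have nonlinear consequences.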