Reasoning models are generally assumed to be better at coding tasks, and they do score higher on Pass@1.
大多数人认为推理模型在编码任务上表现更好,但作者发现推理模型在最小化编辑任务上往往比非推理模型更倾向于过度编辑。
Reasoning models are generally assumed to be better at coding tasks, and they do score higher on Pass@1.
大多数人认为推理模型在编码任务上表现更好,但作者发现推理模型在最小化编辑任务上往往比非推理模型更倾向于过度编辑。
While the output passes the tests and is functionally correct, the diff is enormous, and none of those additions were asked for or even necessary.
大多数人认为代码修改应该越多越好,但作者认为过度修改(over-editing)虽然功能正确,但引入了不必要的改动。