1. The depth of recursion becomes a tunable compute axis at inference time, requiring no retraining. A small model, by reading itself, can iterate toward answers that neither it nor any of its workers could reach in a single pass.

   Most people assume that improving a model's performance requires a larger parameter count or retraining, but the author proposes a counterintuitive alternative: by recursively calling itself, a small model can iterate at inference time toward answer quality that no single pass could reach. This challenges the conventional view of the relationship between model scale and capability. (A minimal sketch of the recursion loop follows this note.)
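Read as a loop, the idea is easy to sketch. Below is a minimal Python illustration of recursion depth as an inference-time compute knob; `call_model`, `recursive_answer`, and the prompt format are assumptions made for illustration, not the author's implementation or any particular library's API.

```python
def call_model(prompt: str) -> str:
    """Hypothetical single-pass inference call; wire this to any LLM client."""
    raise NotImplementedError("replace with a real model API call")


def recursive_answer(question: str, depth: int) -> str:
    """Refine an answer iteratively; `depth` is the tunable compute axis."""
    answer = call_model(question)  # first single-pass attempt
    for _ in range(depth):
        # The model reads its own previous output and revises it --
        # extra compute at inference time, no retraining involved.
        revision_prompt = (
            f"Question: {question}\n"
            f"Your previous answer: {answer}\n"
            "Identify any errors or gaps, then write an improved answer."
        )
        answer = call_model(revision_prompt)
    return answer
```

Raising `depth` buys more refinement passes at inference time while the weights stay fixed, which is what makes it a compute axis rather than a training choice.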