Hypothesis

Almost every error is a copy error. The model is 100% accurate on the positions that actually change, so it has learned the SUBLEQ update itself perfectly; it just occasionally drops a value when routing the ~30 unchanged memory cells through attention.
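The sketch below shows why so much of each step is pure copying. It assumes the standard SUBLEQ semantics (`a, b, c`: `mem[b] -= mem[a]`, then jump to `c` if the result is <= 0, else fall through to `pc + 3`) and a flat integer memory; the helper names `subleq_step` and `changed_positions` and the 32-cell memory in the usage example are hypothetical, since the note does not describe the model's actual state encoding. One instruction writes exactly one cell, so every other cell must be reproduced unchanged.

```python
from typing import List, Tuple


def subleq_step(mem: List[int], pc: int) -> Tuple[List[int], int]:
    """Execute one SUBLEQ instruction at pc; return (new_mem, new_pc).

    SUBLEQ a, b, c: mem[b] -= mem[a]; if the result is <= 0, jump to c,
    otherwise fall through to pc + 3. Only mem[b] is ever written.
    """
    a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
    new_mem = list(mem)  # every cell except mem[b] is carried over as-is
    new_mem[b] = mem[b] - mem[a]
    new_pc = c if new_mem[b] <= 0 else pc + 3
    return new_mem, new_pc


def changed_positions(before: List[int], after: List[int]) -> List[int]:
    """Indices whose value differs between two memory snapshots."""
    return [i for i, (x, y) in enumerate(zip(before, after)) if x != y]


# Hypothetical 32-cell memory: instruction (30, 31, 6) at address 0.
mem = [0] * 32
mem[0:3] = [30, 31, 6]
mem[30], mem[31] = 5, 3
new_mem, pc = subleq_step(mem, 0)
print(changed_positions(mem, new_mem), pc)  # [31] 6
```

In this example one cell changes and the other 31 are copies, which is the regime the hypothesis describes: the arithmetic is a tiny fraction of the output, and the copying is most of it.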

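Most people assume that model errors reflect a gap in conceptual understanding, but here the model understood the SUBLEQ instruction perfectly, and errors occurred only when copying unchanged memory values. This challenges how we analyze model errors: some "errors" may be mechanical rather than conceptual.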
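One way to separate mechanical copy errors from conceptual ones is to score every output position by whether its target value changed during the step. The sketch below is a hypothetical tally (the names `split_errors` and `accuracy` and the `(before, target, predicted)` triple format are assumptions, not the author's evaluation code): a wrong prediction at an unchanged position counts as a copy error, while a wrong prediction at a changed position would point to a failure of the SUBLEQ update itself.

```python
from typing import Dict, List, Sequence, Tuple

# (memory before the step, true memory after, model's predicted memory after)
Example = Tuple[Sequence[int], Sequence[int], Sequence[int]]


def split_errors(examples: List[Example]) -> Dict[str, Dict[str, int]]:
    """Tally per-position correctness, split by whether the cell changed."""
    stats = {
        "changed": {"correct": 0, "total": 0},
        "unchanged": {"correct": 0, "total": 0},
    }
    for before, target, predicted in examples:
        for b, t, p in zip(before, target, predicted):
            kind = "changed" if b != t else "unchanged"
            stats[kind]["total"] += 1
            stats[kind]["correct"] += int(p == t)
    return stats


def accuracy(counts: Dict[str, int]) -> float:
    """Fraction correct; NaN when a category has no positions."""
    return counts["correct"] / counts["total"] if counts["total"] else float("nan")


# Toy example: cell 1 changes and is predicted correctly,
# but the unchanged cell 3 is dropped (a copy error).
stats = split_errors([([5, 3, 7, 1], [5, -2, 7, 1], [5, -2, 7, 0])])
print(accuracy(stats["changed"]), accuracy(stats["unchanged"]))
```

Under the hypothesis above, running this over a real evaluation set would show changed-position accuracy at 100% with all remaining errors in the unchanged bucket.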

Tags: non-consensus, model-errors
