2 Matching Annotations
  1. Last 7 days
    1. Training looped models is notoriously unstable. Two failure modes dominate: Residual explosion — the hidden state h_t grows unboundedly across loops; Loss spikes — training diverges suddenly due to large spectral norms in injection parameters.

      循环模型的训练稳定性问题是一个常被忽视的挑战。这一发现揭示了循环架构在实现时面临的关键技术难题,解释了为什么尽管理论上优越,但循环模型在实际应用中相对罕见。这种不稳定性可能是许多研究者放弃循环架构的重要原因。

  2. Apr 2020