3 Matching Annotations
  1. Last 7 days
    1. Gemma4-31B worked in an iterative-correction loop (with a long-term memory bank) for 2 hours to solve a problem that baseline GPT-5.4-Pro couldn't

      Surprisingly, the smaller Gemma4-31B model, working for 2 hours in an iterative-correction loop with a long-term memory bank, solved a problem that GPT-5.4-Pro could not. This suggests that architectural innovation and reasoning ability may matter more than raw scale, pointing to a new direction for AI development.
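
      The annotated result can be pictured as a simple control loop. The sketch below is purely illustrative: `MemoryBank`, `iterative_correction`, and the toy model/checker are hypothetical names, not the actual system's API.

```python
# Minimal sketch of an iterative-correction loop backed by a long-term
# memory bank. Each failed attempt deposits a "lesson" that is replayed
# into the next prompt. All names here are illustrative assumptions.

class MemoryBank:
    """Accumulates lessons from failed attempts across iterations."""
    def __init__(self):
        self.entries = []

    def add(self, note):
        self.entries.append(note)

    def recall(self):
        return "\n".join(self.entries)


def iterative_correction(problem, run_model, check_solution, max_iters=10):
    """Repeatedly attempt the problem, feeding failure notes back in."""
    memory = MemoryBank()
    for i in range(max_iters):
        prompt = f"{problem}\nLessons so far:\n{memory.recall()}"
        answer = run_model(prompt)
        ok, feedback = check_solution(answer)
        if ok:
            return answer, i + 1
        memory.add(f"Attempt {i + 1} failed: {feedback}")
    return None, max_iters


# Toy usage: this stand-in "model" only succeeds after the memory bank
# has accumulated two failure notes.
def toy_model(prompt):
    return "correct" if prompt.count("failed") >= 2 else "wrong"

def toy_checker(answer):
    return (answer == "correct", "answer did not verify")

answer, iters = iterative_correction("toy problem", toy_model, toy_checker)
print(answer, iters)  # → correct 3
```

      The point of the loop is that progress compounds across attempts, which is how a smaller model can eventually pass a problem that a stronger model fails in a single shot.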

    2. We replace persistent autograd graphs with stateless layer templates, binding weights dynamically as they stream in, eliminating persistent graph metadata while providing flexibility in scheduling.

      Surprisingly, the research team replaced the traditional persistent autograd graph with stateless layer templates and dynamic weight binding, which not only eliminates graph-metadata overhead but also adds scheduling flexibility. This architectural innovation may be the key breakthrough that enables training ten-billion-parameter models on a single GPU.
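
      The idea of a stateless layer template can be sketched in a few lines: the layer is a pure function with no stored parameters, and weights are bound only at call time as they stream in. This is a hedged illustration of the concept, not the paper's implementation; `linear_template` and `stream_weights` are invented names.

```python
# Stateless layer template sketch: parameters are arguments, not module
# state, so no persistent graph/module metadata needs to be kept and each
# weight chunk can be freed right after it is consumed.
import numpy as np

def linear_template(x, weights):
    """A stateless linear+ReLU layer; weights are bound per call."""
    return np.maximum(x @ weights["W"] + weights["b"], 0.0)

def stream_weights(layer_shapes, rng):
    """Simulate weights arriving one layer at a time (e.g. from storage)."""
    for d_in, d_out in layer_shapes:
        yield {"W": rng.standard_normal((d_in, d_out)) * 0.1,
               "b": np.zeros(d_out)}

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
# One shared template serves every layer; each weight dict is bound
# dynamically as it streams in, then goes out of scope.
for w in stream_weights([(8, 16), (16, 16), (16, 2)], rng):
    x = linear_template(x, w)
print(x.shape)  # → (4, 2)
```

      Because the template never owns its weights, the scheduler is free to decide when each layer's parameters are fetched, used, and discarded.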

  2. Apr 2026
    1. our DFC is architecturally designed with three distinct sections: a shared dictionary, a "French-only" section, and an "English-only" section

      The three-section architecture of the Dedicated Feature Crosscoder (DFC) is the core technical breakthrough of this work: by building a "shared dictionary" and two "exclusive dictionaries" separately, it forces model-difference features into their own representation space rather than letting them blend into the shared features. Surprisingly, the design of such a far-reaching safety tool turns out to be highly isomorphic to lexicography.
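
      The partitioned dictionary can be illustrated with a toy decoder. This sketch only shows the three-section layout from the quote; the dimensions, names, and the training objective of the real DFC are all assumptions for illustration.

```python
# Toy crosscoder decoder with a three-section dictionary: a shared
# section plus "French-only" and "English-only" exclusive sections.
# Zeroing the opposite exclusive section at decode time is what forces
# cross-language differences into the dedicated features.
import numpy as np

d_model, n_shared, n_fr, n_en = 16, 32, 8, 8
rng = np.random.default_rng(0)

shared = rng.standard_normal((n_shared, d_model))   # shared dictionary
fr_only = rng.standard_normal((n_fr, d_model))      # "French-only" section
en_only = rng.standard_normal((n_en, d_model))      # "English-only" section

def decode(latents, language):
    """Reconstruct an activation from sparse latents.

    Shared latents decode for both languages; each exclusive section
    contributes only to its own language's reconstruction.
    """
    z_shared = latents[:n_shared]
    z_fr = latents[n_shared:n_shared + n_fr]
    z_en = latents[n_shared + n_fr:]
    out = z_shared @ shared
    if language == "fr":
        out += z_fr @ fr_only
    elif language == "en":
        out += z_en @ en_only
    return out

z = np.zeros(n_shared + n_fr + n_en)
z[0] = 1.0            # a shared feature: visible to both sides
z[n_shared] = 1.0     # a French-only feature
fr_recon = decode(z, "fr")
en_recon = decode(z, "en")
# The two reconstructions differ exactly by the French-only contribution.
print(np.allclose(fr_recon - en_recon, fr_only[0]))  # → True
```

      The lexicography analogy is visible in the code: the shared section is the common vocabulary, and each exclusive section is a dedicated appendix for entries that exist on only one side.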