12 Matching Annotations
  1. Apr 2026
    1. For higher-interactivity scenarios, execution time for MoE models is bound by expert weight load time. By splitting, or sharding, the experts across multiple GPUs across NVL72 nodes, this bottleneck is reduced, improving end-to-end performance.

      Most people assume the main bottleneck of MoE models is compute, but the author points out that expert weight load time is the real bottleneck, and proposes sharding expert weights across GPUs to address it. This challenges conventional thinking about AI model optimization, hinting that I/O may matter more than compute.
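
      The sharding claim is concrete enough to sketch. Below is a minimal, illustrative Python model of expert-parallel sharding, not NVIDIA's implementation: the expert count, GPU count, layer dimensions, and the round-robin `owner()` placement are all assumptions chosen to make the arithmetic visible.

      ```python
      # Toy sketch of expert-parallel sharding for one MoE layer.
      # Each GPU materializes only num_experts / num_gpus expert weight
      # matrices, so per-GPU expert-weight load shrinks roughly linearly
      # with the GPU count. All sizes below are assumed, not measured.
      import numpy as np

      NUM_EXPERTS = 64            # experts in one MoE layer (assumed)
      NUM_GPUS = 8                # GPUs the layer is sharded across (assumed)
      D_MODEL, D_FF = 1024, 4096  # toy hidden / feed-forward dimensions

      def owner(expert_id: int) -> int:
          # One simple placement policy: expert e lives on GPU e % NUM_GPUS.
          return expert_id % NUM_GPUS

      # Build per-GPU shards: each GPU holds only its own experts' weights.
      shards = {g: {} for g in range(NUM_GPUS)}
      for e in range(NUM_EXPERTS):
          shards[owner(e)][e] = np.zeros((D_MODEL, D_FF), dtype=np.float16)

      bytes_per_expert = D_MODEL * D_FF * 2  # fp16 = 2 bytes per weight
      full_copy = NUM_EXPERTS * bytes_per_expert   # unsharded: all experts
      per_gpu = len(shards[0]) * bytes_per_expert  # sharded: this GPU's slice
      print(f"unsharded expert weights per GPU: {full_copy / 2**20:.0f} MiB")
      print(f"sharded   expert weights per GPU: {per_gpu / 2**20:.0f} MiB "
            f"({NUM_GPUS}x reduction)")

      # At inference, a token routed to expert e is sent (all-to-all) to
      # owner(e), the expert runs there, and the activation comes back;
      # the expert weights themselves never move between GPUs.
      ```

      With 64 experts over 8 GPUs, each GPU loads 64 MiB of expert weights instead of 512 MiB, an 8x cut in exactly the weight-load volume the annotation identifies as the latency bottleneck; moving activations over NVLink instead of reloading weights is the design trade being made.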

  2. Jan 2024
  3. Oct 2019
  4. Jun 2019
    1. It is backed by a reserve of assets designed to give it intrinsic value;

      Libra is backed by a basket of assets whose intrinsic value can also be questioned. Libra will re-trigger the "Intrinsic Value of MoE" discussions.

  5. Jul 2016
    1. Massive Open Online Courses (MOOCs) have been the subject of much hyperbole in the educational/eLearning world for a few years now, under the guise of spreading university-quality education to the masses for free (the hyperbole is dwindling down, but not completely).

      Cue Rolin Moe, who has investigated the MOOC hype thoroughly. We may still be following the Gartner Hype Cycle (Merton did warn us about self-fulfilling prophecies), but many of those phases have already been documented.