7 Matching Annotations
  1. Last 7 days
    1. On a single H200 GPU with 1.5TB host memory, MegaTrain reliably trains models up to 120B parameters.

Surprisingly, a single H200 GPU with 1.5TB of host memory is enough to train a 120B-parameter model, overturning the common assumption that training at this scale requires a multi-GPU cluster. This breakthrough could make very-large-model training far more accessible and economical.

  2. Apr 2026
    1. SOTA models of different architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training data rather than architecture itself.

Most people expect models with different architectures to have different failure modes and weaknesses, but the author finds that regardless of architecture or parameter scale, SOTA models fail in highly consistent ways on the same hard samples. This suggests the performance bottleneck comes from shared deficiencies in the training data rather than from architectural differences, challenging the conventional view that model diversity protects against correlated failures.
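The consistency claim above can be made concrete with a simple overlap metric. A minimal sketch, assuming we have per-model sets of failed sample IDs on a shared hard-sample benchmark (the model names and failure sets here are purely illustrative, not from the annotation's source):

```python
# Hypothetical failure sets: IDs of benchmark samples each model got wrong.
# Illustrative data only, standing in for real evaluation results.
failures = {
    "model_a": {3, 7, 11, 19, 23},
    "model_b": {3, 7, 11, 19, 42},
    "model_c": {3, 7, 11, 23, 42},
}

def jaccard(s1, s2):
    """Overlap of two failure sets: |intersection| / |union|."""
    return len(s1 & s2) / len(s1 | s2)

# Pairwise overlap: values near 1.0 mean models fail on the same samples.
names = sorted(failures)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(jaccard(failures[a], failures[b]), 2))

# Samples that every model fails: high counts here point to a shared
# (data-driven) bottleneck rather than architecture-specific weaknesses.
common = set.intersection(*failures.values())
print(sorted(common))
```

High pairwise Jaccard scores plus a large universally-failed set would support the annotation's interpretation that the deficiency lives in the training data rather than in any one architecture.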

  3. Oct 2023
    1. this other sort of development also happened in the last couple years: CLIP models. This enables us to do predictive modeling across domains. What do I mean by that? It means that you can provide the model information in one modality and it can essentially translate it into another
      • for: definition - CLIP models

      • definition: CLIP model

        • contrastive language-image pre-training (CLIP) model embeds information from one modality into a shared space, allowing predictive modeling in one domain to be translated into another domain
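The cross-modal translation described in the definition can be sketched with toy embeddings. A minimal illustration, assuming the arrays below stand in for the output of a real CLIP model's separate image and text encoders (which are trained so matching pairs land close together in the shared space):

```python
import numpy as np

def normalize(v):
    """L2-normalize rows so dot products become cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical toy embeddings: row i of image_emb is paired with row i of
# text_emb, mimicking what trained CLIP encoders would produce.
image_emb = normalize(np.array([[1.0, 0.1, 0.0],
                                [0.0, 1.0, 0.1],
                                [0.1, 0.0, 1.0]]))
text_emb = normalize(np.array([[0.9, 0.2, 0.0],
                               [0.1, 1.0, 0.0],
                               [0.0, 0.1, 0.9]]))

# Cross-modal similarity matrix: entry (i, j) scores image i against text j.
sims = image_emb @ text_emb.T

# "Translating" between modalities: each image retrieves the text whose
# embedding is nearest in the shared space.
best_text = sims.argmax(axis=1)
print(best_text)
```

Because the paired rows were constructed to be closest, each image retrieves its own caption; in a real CLIP model the same retrieval works zero-shot across unseen image-text pairs.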
  4. Mar 2021
  5. Oct 2020
  6. May 2020