5 Matching Annotations
  1. Last 7 days
    1. The Apertus models also expand multilingual coverage, training on 15T tokens from over 1800 languages, with ~40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivalling or surpassing open-weight counterparts

      Apertus is trained on over 1800 languages (!?) with 40% non-English content, meaning many of them can only have had 1/100 or 1/1000 of a percent of the data (or even 1/10k, 1/100k): 40% spread evenly across 1799 non-English languages is only ~0.022% per language, and the real distribution is surely far more skewed.
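
      A quick back-of-the-envelope check of that figure (a rough sketch only; it assumes an even split across languages, which the real long-tailed distribution certainly is not):

      ```python
      # Rough average share per non-English language in Apertus pretraining,
      # assuming (unrealistically) an even split across languages.
      total_tokens = 15e12        # 15T pretraining tokens
      non_english_share = 0.40    # ~40% of the data is non-English
      non_english_langs = 1799    # ~1800 languages minus English

      tokens_per_lang = total_tokens * non_english_share / non_english_langs
      share_per_lang = non_english_share / non_english_langs

      print(f"~{tokens_per_lang / 1e9:.1f}B tokens per language")   # ~3.3B
      print(f"~{share_per_lang:.4%} of all pretraining data")       # ~0.0222%
      ```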

    1. The model is named Apertus – Latin for “open” – highlighting its distinctive feature: the entire development process, including its architecture, model weights, and training data and recipes, is openly accessible and fully documented.

      Apertus is committed to openness with respect to all aspects of its development. Is it in the overview yet?