12 Matching Annotations
  1. Apr 2026
    1. 🔹 **DeepSeek-V4-Flash:** 284B total / 13B active params. Your fast, efficient, and economical choice.

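      DeepSeek-V4-Flash is markedly smaller than the Pro version: 284 billion total parameters, 13 billion active. The ratio of active to total parameters is about 4.6%, slightly higher than the Pro version's. This design keeps performance while delivering faster responses and lower cost, which suits applications that need quick responses.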

    1. On a 150-class benchmark, the surrogate fully replaces the teacher

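      Most people assume that complex classification tasks require large models and that small surrogate models can handle only simple ones. The authors show that on a complex 150-class task a small surrogate model can fully replace the teacher model. This challenges the prevailing "bigger is better" view and demonstrates the potential of efficient routing.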

    1. One estimate puts Llama 3's information compression at just 0.07 bits per token meaning the model has only a hazy recollection of most of what it trained on

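      This striking data point reveals how inefficiently large language models process information, and it challenges our understanding of what it means for an AI model to "learn". If a model retains only a hazy memory of its training data, do we really need such enormous parameter counts? This deserves further study.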

    1. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens.

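      A surprising finding: even small, cheap models can match expensive proprietary models at detecting security vulnerabilities. It challenges the assumption that AI security work requires frontier models and points to the possibility of cost-effective AI security solutions.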

    1. We present Attention Editing, a practical framework for converting already-trained large language models (LLMs) with new attention architectures without re-pretraining from scratch.

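      A surprising innovation, because it addresses a key challenge in deep learning: how to change the architecture of an already-trained model while preserving its performance. The conventional approach is to retrain from scratch, which is extremely costly and impractical. The Attention Editing framework converts existing LLMs to more efficient attention architectures without re-pretraining, which could fundamentally change how models are deployed and optimized.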

    1. MinerU2.5, a 1.2B-parameter document parsing vision-language model, achieves state-of-the-art recognition accuracy with computational efficiency through a coarse-to-fine parsing strategy.

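      Surprisingly, MinerU2.5, with only 1.2 billion parameters, reaches state-of-the-art document recognition accuracy through a coarse-to-fine parsing strategy while staying computationally efficient. This challenges the "bigger is better" view of model scale and shows the importance of efficient architecture design.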

    1. the trained 4B model exceeding GPT-4.1 (49.4 percent) and GPT-4o (42.8 percent) despite being 50 times smaller

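      Most people assume that GPT-4-level performance requires a model of the same size or larger, but the authors show that their 4B model not only surpasses GPT-4.1 and GPT-4o but is also just 1/50 their size. This finding challenges the field's reliance on model scale and suggests that algorithmic innovation may be more effective than simply scaling up.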

    1. In 23 months, the same capability that needed 1.8 trillion parameters now fits in 4 billion parameters. A 450x compression.

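      Most people assume that performance gains in AI models come mainly from adding parameters, but the author argues that, through algorithmic optimization and a concentration of talent, AI models can achieve a 450x parameter compression. This challenges the industry consensus that "more parameters means better performance".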

    1. Gemma 4 outcompetes models 20x its size

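      Most people assume that an AI model's performance tracks its parameter count directly and that a larger model is necessarily stronger. The author points out that Gemma 4 can outperform models 20 times its size, which challenges the prevailing "bigger is better" view and suggests that efficiency optimization may matter more than raw scale.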
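
Two of the figures quoted in the Apr 2026 annotations above are plain ratios and can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the numbers as quoted (13B active of 284B total; 1.8 trillion versus 4 billion parameters). The helper name `ratio` and the variable names are illustrative and do not come from any of the annotated sources.

```python
# Sanity check on two ratios quoted in the annotations.
# All inputs are the figures as quoted; names here are illustrative only.

def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator as a plain float."""
    return numerator / denominator

# DeepSeek-V4-Flash: 13B active parameters out of 284B total.
flash_active_share = ratio(13e9, 284e9)

# "450x compression": 1.8 trillion parameters versus 4 billion.
compression = ratio(1.8e12, 4e9)

print(f"V4-Flash active share: {flash_active_share:.1%}")  # 4.6%
print(f"Parameter compression: {compression:.0f}x")        # 450x
```

Both results agree with the figures stated in the notes: an active share of about 4.6% and a 450x reduction in parameter count.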

  2. Jan 2023
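    1. The claim that individual learning may depend on the behavior of others highlights the importance of viewing the learning environment as a system involving multiple interacting participants.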
  3. Dec 2020
    1. Better contribution workflow: We will be using GitHub’s contribution tools and features, essentially moving MDN from a Wiki model to a pull request (PR) model. This is so much better for contribution, allowing for intelligent linting, mass edits, and inclusion of MDN docs in whatever workflows you want to add it to (you can edit MDN source files directly in your favorite code editor).
  4. Nov 2018
    1. “My feeling at the time was this was a good idea,” Dr. Wachter says. “The trend toward our system being pushed to deliver better, more efficient care was going to be enduring, and the old model of the primary-care doc being your hospital doc … couldn’t possibly achieve the goal of producing the highest value.”

      How can care be made more efficient? E.g., integration, cost-sharing, payment-sharing, parent partners, nurse partners.