22 Matching Annotations
  1. May 2026
    1. If most efficiency improvements came from a small handful of scale-dependent innovations, then existing models of the software intelligence explosion may be flawed.

      Explosion models fundamentally wrong

      Most AI safety models assume continuous innovation, but the author shows that progress driven by a small handful of scale-dependent innovations breaks these models.

  2. Apr 2026
    1. Three of the four metrics (ECI, log METR 50% time horizon, and a math-focused index we constructed from several math benchmarks) show strong evidence that progress has sped up relative to a global linear trend fit to data from 2023 onward.

      Most people assume AI capability gains are a gradual, linear progression, but the author's data analysis finds that on three key metrics AI capabilities have actually accelerated, challenging the common view of how fast AI is advancing. The acceleration begins after 2023 and coincides with the release of reasoning models.

    2. Three of four metrics show strong evidence of acceleration, seemingly driven by reasoning models.

      Most people assume AI capability gains are a steady, linear climb, but the author's data analysis shows clear acceleration in three of four key capability metrics, and the acceleration appears directly tied to the arrival of reasoning models. This challenges the common view of the pace of AI progress and suggests that the introduction of reasoning models in 2024 may mark a shift in how AI capabilities develop.
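The "acceleration relative to a global linear trend" test in the highlighted passage can be sketched by comparing a linear fit against a fit that allows curvature. The data points, the quadratic term as a proxy for acceleration, and the threshold below are all illustrative assumptions, not the author's actual metrics or method.

```python
import numpy as np

# Hypothetical capability-index readings sampled quarterly from Q1 2023
# onward; the values are invented for illustration only.
t = np.arange(12)  # quarters since Q1 2023
y = np.array([1.0, 1.2, 1.4, 1.6, 1.9, 2.3, 2.8, 3.4, 4.1, 5.0, 6.1, 7.4])

# Fit a global linear trend, then refit allowing a quadratic term
# (a crude proxy for acceleration) and compare residual error.
lin = np.polyfit(t, y, 1)
quad = np.polyfit(t, y, 2)
rss_lin = np.sum((y - np.polyval(lin, t)) ** 2)
rss_quad = np.sum((y - np.polyval(quad, t)) ** 2)

# Ad-hoc decision rule (assumption): call it acceleration if curvature
# cuts the residual error at least in half.
accelerating = rss_quad < 0.5 * rss_lin
print(accelerating)
```

A real analysis would use the actual benchmark series (ECI, METR time horizons, etc.) and a proper model-comparison test rather than this fixed threshold.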

    1. A core conviction at Sakana AI is that the most capable AI systems will not be monolithic models scaled in isolation, but collections of specialized agents working together.

      Most people assume that more capable AI systems must be larger, more complex monolithic models, but the author states plainly that the most capable AI systems will be collections of specialized agents rather than monolithic models scaled in isolation. This directly challenges the field's consensus pursuit of ever-larger single models and proposes a fundamentally different research direction.

    1. a free model that matches GPT-4o and runs entirely on your phone

      This statement reveals the startling pace of AI model miniaturization and democratization: frontier AI capability migrated from the cloud to a phone in just 23 months, a compression faster than any previous technology revolution, which will fundamentally change AI's availability and reach.

    1. Foundation model companies are doing the same. OpenAI launched a dedicated Healthcare & Life Sciences vertical... They're not selling APIs. They're becoming platforms.

      Foundation model providers are shifting from API vendors to vertical-industry platforms, revealing a fundamental restructuring of the AI value chain: base-model companies are extending upstream through vertical integration.

    1. A small model trained on fewer than 2,000 examples from real lawyers, bankers, and consultants recently beat all but the best frontier models on corporate legal work, at a fraction of the price.

      This finding challenges the "scale and compute beat everything" paradigm of AI development. A small model trained on high-quality specialized data outperforms general-purpose large models in a specific domain, suggesting AI development may be shifting from "bigger is better" toward "more specialized, more efficient."

    1. GLM-5V-Turbo scored 94.8, while Claude Opus 4.6 scored 77.3. That is a sizable gap.

      Surprisingly, in a test of converting UI design mockups into code, GLM-5V-Turbo's score (94.8) far exceeds Claude Opus 4.6's (77.3), a lead of 17.5 points. This suggests a striking advantage in visual coding; gaps this large are rare in comparisons between AI models.

    1. Today's large language models (LLMs) are trained to align with user preferences through methods such as reinforcement learning. Yet models are beginning to be deployed not merely to satisfy users, but also to generate revenue for the companies that created them through advertisements.

      Strikingly, the training objective of large language models is shifting from purely satisfying user preferences toward generating revenue for the companies that built them. This fundamental shift means AI systems may no longer be user-centric but instruments of commercial interest, reflecting a latent ethical crisis in AI development.

  3. Feb 2026
    1. Low-cost Chinese AI models forge ahead, even in the US, raising the risks of a US AI bubble. Nvidia’s latest earnings report reassured some, but Chinese AI models are fast gaining a following around the world, underlining concerns over an ‘AI bubble’ centered on high-investment, high-cost US models.
  4. Oct 2025
    1. Introduction: AI is now everywhere, but we still need humans

  5. May 2025
    1. Anthropic researchers said this was not an isolated incident, and that Claude had a tendency to “bulk-email media and law-enforcement figures to surface evidence of wrongdoing.”

      for - question - progress trap - open source AI models - for blackmail and ransom - Could a bad actor take an open source codebase and twist it to do harm, e.g. find out about a rogue AI creator's adversary, enemy, or victim and blackmail them? - progress trap - open source AI - criminals - exploit to identify and blackmail victims

  6. Dec 2024
    1. when you want to use Google, you go into Google search, and you type in English, and it matches the English with the English. What if we could do this in FreeSpeech instead? I have a suspicion that if we did this, we'd find that algorithms like searching, like retrieval, all of these things, are much simpler and also more effective, because they don't process the data structure of speech. Instead they're processing the data structure of thought

      for - indyweb dev - question - alternative to AI Large Language Models? - Is indyweb functionality the same as Freespeech functionality? - from TED Talk - YouTube - A word game to convey any language - Ajit Narayanan - data structure of thought - from TED Talk - YouTube - A word game to convey any language - Ajit Narayanan

  7. Jan 2024
  8. Sep 2023
    1. In 2018, around four percent of papers were based on foundation models; by 2020, 90% were, and that number has continued to shoot up into 2023. Over the same period, in the non-human domain it has been essentially zero (it actually ticked up in 2022 because we published the first one). The goal here is: if we can make these kinds of large-scale models for the rest of nature, then we should expect a broad-scale acceleration.
      • for: accelerating foundation models in non-human communication, non-human communication - anthropogenic impacts, species extinction - AI communication tools, conservation - AI communication tools

      • comment

        • imagine the empathy we could realize, helping slow climate change and species extinction, by communicating with and listening to feedback from other species about what they think of our species' impacts on their world!
  9. Apr 2023
  10. Mar 2023
  11. Dec 2022
    1. Houston, we have a Capability Overhang problem: Because language models have a large capability surface, these cases of emergent capabilities are an indicator that we have a ‘capabilities overhang’ – today’s models are far more capable than we think, and our techniques available for exploring the models are very juvenile. We only know about these cases of emergence because people built benchmark datasets and tested models on them. What about all the capabilities we don’t know about because we haven’t thought to test for them? There are rich questions here about the science of evaluating the capabilities (and safety issues) of contemporary models. 
  12. Jun 2021
  13. Jan 2021
    1. Help is coming in the form of specialized AI processors that can execute computations more efficiently and optimization techniques, such as model compression and cross-compilation, that reduce the number of computations needed. But it’s not clear what the shape of the efficiency curve will look like. In many problem domains, exponentially more processing and data are needed to get incrementally more accuracy. This means – as we’ve noted before – that model complexity is growing at an incredible rate, and it’s unlikely processors will be able to keep up. Moore’s Law is not enough. (For example, the compute resources required to train state-of-the-art AI models have grown over 300,000x since 2012, while the transistor count of NVIDIA GPUs has grown only ~4x!) Distributed computing is a compelling solution to this problem, but it primarily addresses speed – not cost.
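The 300,000x-versus-4x contrast in the passage implies wildly different doubling times. A quick back-of-envelope check (the 2012-to-2020 window length used here is an assumption for illustration):

```python
import math

# Growth factors quoted in the passage: training compute for
# state-of-the-art models grew ~300,000x since 2012, while NVIDIA GPU
# transistor counts grew only ~4x. Assume an 8-year window for both.
years = 8.0
compute_growth = 300_000
transistor_growth = 4

def doubling_time_months(total_growth, years):
    """Months per doubling implied by a total growth factor over `years`."""
    doublings = math.log2(total_growth)
    return 12 * years / doublings

compute_dt = doubling_time_months(compute_growth, years)        # ~5.3 months
transistor_dt = doubling_time_months(transistor_growth, years)  # 48 months
print(compute_dt, transistor_dt)
```

Under these assumptions, training compute doubled roughly every five months while transistor counts doubled every four years, which is the gap the passage argues Moore's Law cannot close.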