23 Matching Annotations
  1. May 2026
  2. Apr 2026
    1. this means that existing estimates overstate the returns to software R&D, and makes the software intelligence explosion seem much less likely.

      R&D Returns Overstated

      Accounting for compute bottlenecks suggests that returns to software R&D may be lower than previously estimated, reducing explosion likelihood.

    2. But I think we have enough evidence to think that software progress might really be several times a year, and to make a best guess contextualized with a lot of uncertainty.

      Progress Estimation

      Despite the uncertainty, the evidence suggests software progress of several times per year, with estimates ranging from 2x to 50x annually.

    3. gpt-oss-20b does substantially better than GPT-3 on MMLU, despite using the same amount of training compute.

      Real-World Progress Example

      Comparing models trained with the same compute but showing different performance (such as GPT-3 vs gpt-oss-20b) provides concrete evidence of software progress.

    4. This means that almost all existing estimates of software progress were misleading.

      Measurement Problems

      Existing software progress estimates are misleading due to data quality improvements and scale-dependence factors not properly accounted for.

    5. these estimates rely on an overly conservative estimate of software progress of 3× per year

      Progress Underestimation

      Existing software intelligence explosion models may use conservative progress estimates, potentially underestimating explosion likelihood.

    6. Synthetic data can help push beyond this — a good example that Millidge raises is the Phi series of models.

      Synthetic Data Impact

      Synthetic data generation, as used for the Phi series of models, can dramatically improve efficiency beyond traditional distillation methods.

    7. If doubling cumulative research effort also doubles compute efficiency, then the returns to R&D are 1. If it quadruples, then the returns are 2.

      R&D Returns Measurement

      Returns to AI software R&D measure how research effort translates into compute efficiency gains, with returns above 1 as the threshold for a potential explosion.

    8. Almost all the evidence points to very fast software progress: each year, the training compute needed to get to the same capability declines several times — possibly even ten times or more.

      Rapid Efficiency Gains

      Software progress enables 2-10x annual compute efficiency gains, though estimates have wide confidence intervals due to data limitations.

    9. AI software progress is about reducing the training compute you need to get to the same level of capability, through better algorithms or data.

      Software Progress Definition

      Software progress means achieving the same AI capabilities with less training compute through algorithmic or data improvements, a key efficiency driver.
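
The returns definition quoted in item 7 above (doubling effort doubles efficiency gives returns of 1, quadrupling gives 2) can be sketched as a ratio of logarithms. This is only an illustration: the function name `returns_to_rd` is hypothetical, and the log-ratio form is an assumption that matches the two quoted cases, not a formula taken from the annotated article.

```python
import math


def returns_to_rd(effort_multiplier: float, efficiency_multiplier: float) -> float:
    """Efficiency doublings gained per doubling of cumulative research effort.

    Assumes a power-law relationship between effort and efficiency
    (an assumption for illustration, not stated in the source).
    """
    if effort_multiplier <= 1:
        raise ValueError("effort_multiplier must be greater than 1")
    if efficiency_multiplier <= 0:
        raise ValueError("efficiency_multiplier must be positive")
    return math.log2(efficiency_multiplier) / math.log2(effort_multiplier)


# The two cases quoted in the annotation:
r_double = returns_to_rd(2, 2)  # doubling effort doubles efficiency
r_quad = returns_to_rd(2, 4)    # doubling effort quadruples efficiency

print(r_double, r_quad)
```

Under this reading, values above 1 correspond to the threshold the note mentions for a potential explosion.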

    1. context management plus engineering improvements may well push the task horizon to weeks or even months.

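      Action recommendation: Combine context management with engineering improvements to extend the task horizon. This approach can significantly improve a model's ability to handle long-running tasks.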

    2. if a model cannot learn new things while performing a task, it will struggle when the task horizon grows very long.

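      Action recommendation: When evaluating continual learning techniques, focus on the model's ability to learn new things over long task sequences. This evaluation criterion is closer to real-world application needs.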

    3. new techniques may initially underperform existing ones but eventually surpass them — a pattern we've seen repeatedly, most recently in the wave of agentic coding progress

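      Action recommendation: Accept the pattern that new techniques underperform at first but eventually surpass existing ones. Managing expectations this way helps R&D decisions and resource allocation for continual learning techniques.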

    4. We can treat the task horizon that an LLM can reliably handle as a north-star metric for model progress, analogous to transistor density in Moore's Law

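      Action recommendation: Adopt the task horizon as the north-star metric for measuring model progress. This quantitative approach helps assess the real-world effectiveness and progress of continual learning techniques.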

    5. The key reason for the confusion is that people think in terms of methods that each contribute a discrete piece to the system — pretraining, SFT, RL.

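      Action recommendation: Avoid treating continual learning as a collection of independent methods; focus instead on its unifying goal. This methodological shift can reduce conceptual confusion and improve research efficiency.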

    6. I'd view continual learning more as an "arrow" than a "line" — it's the collective effort to push the task horizon that an LLM can reliably handle.

      Arrow vs Line Perspective

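      Action recommendation: Treat continual learning as a collective effort to push the task horizon rather than a collection of discrete methods. This perspective helps clarify its directional and systemic nature.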

  3. Oct 2025
    1. I've been thinking about this stuff for decades, and I had not broached the topic of platonic patterns until this year. And that's because I think it is now actionable.

      for - quote - platonic patterns are now actionable - Michael Levin - I've been thinking about this stuff for decades, and I had not broached the topic of platonic patterns until this year. - And that's because I think it is now actionable. - question - progress trap - moral questions and alarm bells? playing God? - Michael Levin

  4. Jan 2021
    1. When there are imperfections, we rely on users and our active community to tell us how the software is not working correctly, so we can fix it. The way we do that, and have done for 15 years now, is via bug reports. Discussion is great, but detailed bug reports are better for letting developers know what’s wrong.
  5. Apr 2020
  6. Feb 2020