A hearing is scheduled for May 19
Actionable item: a hearing is set for May 19, which gives parties following the case a concrete date to act on.
Claude skews high-income; Meta AI skews low-income
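This headline captures the article's core claim: different AI models differ markedly in the income distribution of their users, a finding that may matter for the fairness and accessibility of AI services.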
this means that existing estimates overstate the returns to software R&D, and makes the software intelligence explosion seem much less likely.
R&D Returns Overstated
Accounting for compute bottlenecks suggests that returns to software R&D may be lower than previously estimated, which makes a software intelligence explosion less likely.
But I think we have enough evidence to think that software progress might really be several times a year, and to make a best guess contextualized with a lot of uncertainty.
Progress Estimation
Despite the uncertainty, the evidence suggests that software progress runs at several times per year, with estimates ranging from 2x to 50x annually.
gpt-oss-20b does substantially better than GPT-3 on MMLU, despite using the same amount of training compute.
Real-World Progress Example
Comparing models trained with the same compute but showing different performance (like GPT-3 vs gpt-oss-20b) provides concrete evidence of software progress.
This means that almost all existing estimates of software progress were misleading.
Measurement Problems
Existing software progress estimates are misleading because they do not properly account for data quality improvements and scale dependence.
these estimates rely on an overly conservative estimate of software progress of 3× per year
Progress Underestimation
Existing software intelligence explosion models may use conservative progress estimates, potentially underestimating explosion likelihood.
Synthetic data can help push beyond this — a good example that Millidge raises is the Phi series of models.
Synthetic Data Impact
Synthetic data generation, as used for the Phi series of models, can dramatically improve efficiency beyond what traditional distillation methods achieve.
If doubling cumulative research effort also doubles compute efficiency, then the returns to R&D are 1. If it quadruples, then the returns are 2.
R&D Returns Measurement
Returns to AI software R&D measure how research effort translates into compute efficiency gains; returns above 1 are the threshold for a potential explosion.
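The doubling arithmetic in the quote above can be written out as a short calculation. This is a minimal sketch, assuming that efficiency scales as a power of cumulative research effort, so that the returns are the exponent r in efficiency_multiplier = effort_multiplier ** r. The function name returns_to_rnd and the power-law form are illustrative assumptions, not something the source spells out.

```python
import math


def returns_to_rnd(effort_multiplier: float, efficiency_multiplier: float) -> float:
    """Exponent r such that efficiency_multiplier == effort_multiplier ** r.

    Assumes a power-law link between cumulative research effort and
    compute efficiency (an illustrative assumption, not from the source).
    """
    if effort_multiplier <= 1:
        raise ValueError("effort_multiplier must be greater than 1")
    if efficiency_multiplier <= 0:
        raise ValueError("efficiency_multiplier must be positive")
    return math.log2(efficiency_multiplier) / math.log2(effort_multiplier)


# Doubling effort doubles efficiency: returns of 1.
print(returns_to_rnd(2, 2))
# Doubling effort quadruples efficiency: returns of 2.
print(returns_to_rnd(2, 4))
```

Under this reading, any measured pair of multipliers with efficiency growing faster than effort gives returns above 1, which is the threshold the note mentions.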
Almost all the evidence points to very fast software progress: each year, the training compute needed to get to the same capability declines several times — possibly even ten times or more.
Rapid Efficiency Gains
Software progress enables 2-10x annual compute efficiency gains, though estimates have wide confidence intervals due to data limitations.
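To make the compounding concrete, here is a minimal sketch of how the training compute needed for a fixed capability would shrink if the annual efficiency gain stayed constant. The function name compute_needed, the constant-rate assumption, and the starting figure of 1e24 FLOP are all hypothetical and chosen only for illustration; the 2x and 10x rates are the endpoints of the range in the note above.

```python
def compute_needed(initial_compute: float, annual_gain: float, years: float) -> float:
    """Training compute needed for the same capability after `years`.

    Assumes the efficiency gain compounds at a constant `annual_gain`
    per year (an illustrative simplification).
    """
    if annual_gain <= 0:
        raise ValueError("annual_gain must be positive")
    return initial_compute / (annual_gain ** years)


# Hypothetical starting point: 1e24 FLOP to reach some capability today.
start = 1e24
for gain in (2.0, 10.0):
    after_three_years = compute_needed(start, gain, 3)
    print(f"{gain:g}x per year -> {after_three_years:.3g} FLOP after 3 years")
```

The gap between the two rates widens quickly with time, which is one reason wide confidence intervals on the annual rate matter so much for longer-range forecasts.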
AI software progress is about reducing the training compute you need to get to the same level of capability, through better algorithms or data.
Software Progress Definition
Software progress means reaching the same AI capabilities with less compute through algorithmic or data improvements, a key driver of efficiency.
context management plus engineering improvements may well push the task horizon to weeks or even months.
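Action suggestion: combine context management with engineering improvements to extend the task horizon. This approach can significantly improve a model's ability to handle long-running tasks.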
if a model cannot learn new things while performing a task, it will struggle when the task horizon grows very long.
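Action suggestion: when evaluating continual learning techniques, focus on the model's ability to learn new things over long task sequences. This evaluation criterion is closer to real-world application needs.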
"The set of efforts aimed at breaking past the feasible horizon of current techniques."
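Action suggestion: define continual learning explicitly as the set of efforts to break past the feasible horizon of current techniques. This definition helps set research directions and assess progress.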
new techniques may initially underperform existing ones but eventually surpass them — a pattern we've seen repeatedly, most recently in the wave of agentic coding progress
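Action suggestion: accept the pattern that new techniques underperform at first but eventually surpass existing ones. Managing expectations this way helps with R&D decisions and resource allocation for continual learning techniques.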
We can treat the task horizon that an LLM can reliably handle as a north-star metric for model progress, analogous to transistor density in Moore's Law
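Action suggestion: adopt the task horizon as the north-star metric for model progress. This quantitative approach helps assess the real-world effect and progress of continual learning techniques.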
The key reason for the confusion is that people think in terms of methods that each contribute a discrete piece to the system — pretraining, SFT, RL.
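Action suggestion: avoid treating continual learning as a collection of independent methods; focus on its unifying goal instead. This shift in methodology reduces conceptual confusion and improves research efficiency.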
I'd view continual learning more as an "arrow" than a "line" — it's the collective effort to push the task horizon that an LLM can reliably handle.
Arrow vs Line Perspective
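Action suggestion: view continual learning as a collective effort to push the task horizon, not as a set of discrete methods. This perspective helps in understanding its directional and systemic nature.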
I've been thinking about this stuff for decades, and I had not broached the topic of platonic patterns until this year. And that's because I think it is now actionable.
for - quote - platonic patterns are now actionable - Michael Levin - I've been thinking about this stuff for decades, and I had not broached the topic of platonic patterns until this year. - And that's because I think it is now actionable. - question - progress trap - moral questions and alarm bells? playing God? - Michael Levin
When there are imperfections, we rely on users and our active community to tell us how the software is not working correctly, so we can fix it. The way we do that, and have done for 15 years now, is via bug reports. Discussion is great, but detailed bug reports are better for letting developers know what’s wrong.
Alerts are actionable, not informational: We believe that an alert should provide concise and accurate security advice. For an unsafe account, that means resetting your password.
it is worth opening a merge request with the minimal viable change instead of opening an issue encouraging open feedback on the problem without proposing any specific change directly.
The nature of MRs facilitates discussions around a proposed solution to a problem that is actionable. An MR is actionable, while an issue will take longer to take action on.