Upgrading from a video generator to a director's tool suite
This phrasing carries an important implicit assumption: that AI is already capable of understanding and executing complex creative workflows. The author assumes AI tools have moved beyond simple content generation and can grasp the full workflow and decision logic of a director's work, which is a fairly bold assumption about technical capability.
Upgrading from a video generator to a director's tool suite
This phrasing points to a surprising shift: AI tools are moving from "executing a single task" toward "understanding complex creative workflows." It suggests AI is no longer merely a content-generation tool but is beginning to develop a systematic understanding of the entire creative process, a significant milestone in the evolution of AI's creative capabilities.
Andon Labs started by giving an AI control of a vending machine at Anthropic's office.
This opening illustrates the incremental path of AI capability development, and the surprising speed of the jump from simple control to complex decision-making. An AI that began by managing a vending machine progressed, in a short time, to autonomously running a physical business, showing the potential for exponential growth in AI capabilities.
Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens.
This is a surprising finding: even small, cheap models can match expensive proprietary models at detecting security vulnerabilities. It challenges the assumption that AI security work requires frontier models, and points to the possibility of cost-effective AI security solutions.
M2.7 demonstrates excellent performance in real-world software engineering, including end-to-end project delivery, log analysis for bug hunting, code security, and machine learning tasks.
This claim implies that AI models have moved beyond simple code generation and can now handle the complete software development lifecycle, a major breakthrough for AI in engineering that could redefine how software is built.
AI capability is not plateauing. It is accelerating and reaching more people than ever.
This claim challenges the common expectation that AI progress might be leveling off, arguing that the technology is in fact accelerating. The acceleration shows up not only in performance metrics but also in striking adoption growth, suggesting AI is in a phase of exponential growth that could bring unprecedented social change.
Tang Jie (CEO of Zhipu AI) even recently said: "The truth may be that the gap [between US and Chinese AI] is actually widening."
Zhipu AI's CEO Tang Jie admitting, in his own words, that the gap may be widening carries enormous weight. In an environment where Chinese AI companies generally tell the public that "the gap with the US is small," for a leading figure to say this openly is a rare display of clear-eyed candor. It aligns exactly with this article's core argument: under the twin pressures of export controls and lagging domestic chips, the compute gap will be hard to close in the short term. For strategy-making inside Zhipu, both the cost and the courage of this statement deserve reflection.
Emergent abilities are not present in small models but can be observed in large models.
Here's a lovely blog by Jason Wei that pulls together 137 examples of "emergent abilities of large language models." Emergence is a phenomenon seen in contemporary AI research, where a model will be really bad at a task at smaller scales, then go through some discontinuous change which leads to significantly improved performance.
Houston, we have a Capability Overhang problem: Because language models have a large capability surface, these cases of emergent capabilities are an indicator that we have a ‘capabilities overhang’ – today’s models are far more capable than we think, and our techniques available for exploring the models are very juvenile. We only know about these cases of emergence because people built benchmark datasets and tested models on them. What about all the capabilities we don’t know about because we haven’t thought to test for them? There are rich questions here about the science of evaluating the capabilities (and safety issues) of contemporary models.
As the metaphor suggests, though, the prospect of a capability overhang isn’t necessarily good news. As well as hidden and emerging capabilities, there are hidden and emerging threats. And these dangers, like our new skills, are almost too numerous to name.
There’s a concept in AI that I’m particularly fond of that I think helps explain what’s happening. It’s called “capability overhang” and refers to the hidden capacities of AI: skills and aptitudes latent within systems that researchers haven’t even begun to investigate yet. You might have heard before that AI models are “black boxes” — that they’re so huge and complex that we don’t fully understand how they operate or come to specific conclusions. This is broadly true and is what creates this overhang.