9 Matching Annotations
  1. Last 7 days
    1. On our 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve.

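      A 13% gain is a significant leap for AI, especially since the new model solved four tasks that neither earlier model could handle at all. This suggests AI capability may be starting to advance nonlinearly rather than through simple linear progress.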

    1. Claude Opus 4.6 autonomously reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks.

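      A striking finding: it suggests AI can already complete complex programming tasks that would normally take a human engineer weeks (by the authors' estimate). This challenges how we understand AI's current capabilities and hints that software engineering may be on the verge of major change. This level of autonomous programming far exceeds what mainstream AI coding assistants currently deliver.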

    1. GLM-5V-Turbo scored 94.8; Claude Opus 4.6 scored 77.3. That's a sizable gap.

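      Surprisingly, on a test of turning UI design mockups into code, GLM-5V-Turbo (94.8) is well ahead of Claude Opus 4.6 (77.3). That is a lead of 17.5 points, suggesting a strong advantage in visual coding; a gap this large is very rare in AI model comparisons.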

    1. Coding is the dominant use case for AI by nearly an order of magnitude. It's abundantly clear in the reported explosive growth of companies like Cursor, as well as the hyper growth of tools like Claude Code and Codex.

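      Surprisingly, coding has become the dominant AI use case in enterprises, exceeding other use cases by nearly an order of magnitude. Engineers using AI tools can raise their productivity 10 to 20 times; this striking efficiency gain explains why companies are adopting AI coding tools so quickly, and it upends conventional assumptions about software development workflows.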

  2. Jan 2026
    1. Blogger Fabrizio Ferri Benedetti on their four modes of using AI in technical writing:
      - Watercooler conversations, to get code explained.
      - Text suggestions while writing or coding (especially for repeating patterns in your work).
      - Providing context, constraints, and intent to generate first drafts, restructure content, or write boilerplate commentary.
      - A robotic assembly line, to run checks, tests, and rewrites (MCP/skills involved).

      Not either/or, but switching between modes.

    1. I love the asynchronous coding agent category. They’re a great answer to the security challenges of running arbitrary code execution on a personal laptop and it’s really fun being able to fire off multiple tasks at once—often from my phone—and get decent results a few minutes later.

      async coding agents: prompt and forget

  3. Apr 2015
  4. May 2014