12 Matching Annotations
  1. May 2026
    1. Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work.

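      Most people assume that a paper readable by humans is equally suitable for AI comprehension. The author argues instead that the costs of the traditional paper format, while tolerable for human readers, impose an "engineering tax" on AI agents trying to understand the research process, which reflects how poorly the current academic publishing system fits the AI era.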

    1. Our early versions of agentic work was only asking Codex to implement the task. That approach proved too limiting. Codex is perfectly capable of creating multiple PRs as well as reading review feedback and addressing it.

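      Most people assume AI can only carry out simple, single-step tasks. The author argues that AI can already handle complex, multi-step workflows, including creating multiple PRs and responding to code review. This challenges the conventional view of AI capability and suggests that AI has evolved to the point where it can understand and execute complex software engineering tasks.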

  2. Apr 2026
    1. Kimi K2.6 autonomously overhauled exchange-core, an 8-year-old open-source financial matching engine. Over a 13-hour execution, the model iterated through 12 optimization strategies, initiating over 1,000 tool calls to precisely modify more than 4,000 lines of code.

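      Most people assume that AI still needs guidance and supervision from human experts on complex engineering tasks and cannot complete a large-scale system refactoring on its own. The author shows an AI autonomously analyzing, optimizing, and refactoring a financial system that has been running for 8 years. This challenges the conventional view of AI engineering capability and hints that AI may already be capable of system-level architecture design and optimization.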

    1. NEC aims to build one of Japan's largest AI-native engineering teams, who will use Claude Code in their work.

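      Most people assume AI will replace a large number of engineering jobs. The author argues that AI is actually creating new engineering roles and skill requirements: NEC is actively building a large AI-native engineering team, which suggests that AI tools are augmenting rather than replacing engineering capability and are creating new employment opportunities.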

    1. AI writes the code. Tests verify correctness. More code enables more features.

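      This concise description reveals a complete closed loop for AI in software development: AI generates code, tests verify correctness, and more code creates more features. This self-reinforcing loop may make software development the most disruptive application area for AI.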

    1. M2.7 demonstrates excellent performance in real-world software engineering, including end-to-end project delivery, log analysis for bug hunting, code security, and machine learning tasks.

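      This statement implies that AI models have moved beyond simple code generation and can complete the full software development lifecycle. That would represent a major breakthrough for AI in engineering and could redefine how software is developed in the future.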

    1. Claude Opus 4.6 autonomously reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks.

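      This is a striking finding: AI can already complete complex programming tasks that would normally take a human engineer weeks. It challenges our understanding of current AI capability and hints that major change may be coming to software engineering. This level of autonomous programming far exceeds what today's mainstream AI coding assistants deliver.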

    1. Same clinical question, two framings. One as a patient, one as a doctor.

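      Surprisingly, the exact same medical question draws a completely different answer from the AI simply because the asker's identity changes from "patient" to "doctor." That such a simple change in wording can trigger or bypass safety restrictions suggests that AI safety mechanisms are extremely fragile and easy to circumvent.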

    1. GLM-5.1 achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

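      Surprisingly, GLM-5.1 achieves state-of-the-art performance on agentic software engineering tasks and leads its predecessor by a wide margin on repository generation and real-world terminal tasks in particular. This suggests a qualitative leap in AI's ability to understand and execute complex software engineering tasks.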

    1. There's an old saying that content is king. With agents, context is.

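      In the LLM era, this is the most incisive comment on the importance of the "context window." Agents lack the tacit knowledge and environmental awareness that humans have, so explicit context (such as context.json) becomes the foundation for their actions. It is a reminder that, when designing AI-assisted systems, building a high-quality context generation mechanism is often more important than optimizing the model itself.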

  3. Feb 2026
    1. Owning a $5M data center
      • comma.ai operates its own $5M data center in-office to handle model training, metrics, and data storage, avoiding the "cloud tax."
      • The facility consumes approximately 450kW at peak; power costs in San Diego (over 40c/kWh) totaled over $540,000 in 2025.
      • Cooling is achieved using pure outside air with dual 48" intake and exhaust fans, utilizing a PID loop to manage temperature and humidity.
      • The compute cluster consists primarily of 600 GPUs across 75 "TinyBox Pro" machines built in-house for cost efficiency and easier repairability.
      • Storage is handled by several racks of Dell R630/R730 servers with ~4PB of total SSD storage, favoring speed and random access over redundancy.
      • The software stack is kept simple to ensure 99% uptime, utilizing Ubuntu (pxeboot), Salt for management, and "minikeyvalue" for distributed storage.
      • By owning their hardware, comma.ai estimates they saved $20M+ compared to equivalent compute costs in a public cloud environment.
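
The power figures above can be cross-checked with a little arithmetic. The sketch below is a rough illustration, not comma.ai's own accounting: it treats the stated "over 40c/kWh" and "over $540,000" as exact values and assumes a flat rate all year, so the results are order-of-magnitude estimates only. All variable names are made up for this example.

```python
# Back-of-the-envelope check of the power figures in the summary.
# Assumption: a flat electricity rate and the stated floors taken as exact.

HOURS_PER_YEAR = 365 * 24    # 8,760 hours (2025 is not a leap year)

annual_cost_usd = 540_000    # summary says "over $540,000" for 2025
rate_usd_per_kwh = 0.40      # summary says "over 40c/kWh"
peak_kw = 450                # summary says "approximately 450kW at peak"

# Energy implied by the bill, and the average draw that energy corresponds to.
energy_kwh = annual_cost_usd / rate_usd_per_kwh
average_kw = energy_kwh / HOURS_PER_YEAR

# Fraction of peak power drawn on average over the year.
load_factor = average_kw / peak_kw

# Hypothetical upper bound: the bill if the facility ran at peak all year.
cost_at_constant_peak_usd = peak_kw * HOURS_PER_YEAR * rate_usd_per_kwh

print(f"Implied energy use:    {energy_kwh:,.0f} kWh")
print(f"Implied average draw:  {average_kw:.1f} kW")
print(f"Implied load factor:   {load_factor:.0%}")
print(f"Cost at constant peak: ${cost_at_constant_peak_usd:,.0f}")
```

Under these assumptions the bill corresponds to about 1.35 million kWh, an average draw of roughly 154 kW, or about a third of the stated peak. Running at 450kW continuously would cost roughly $1.58M per year at the same rate, so the reported figure is consistent with a facility that spends most of its time well below peak.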

      Hacker News Discussion

      • Users discussed the spectrum of infrastructure, ranging from pure Cloud (low cap-ex, high op-ex) to colocation and on-prem (high cap-ex, high skill requirement).
      • A primary concern raised was "brain drain"—on-prem setups can become "legacy debt" if the senior engineers who built the custom systems leave without documenting unwritten knowledge.
      • Commenters noted that AWS and other cloud providers are incentivized to keep architectures complex (microservices, serverless) to increase billing, whereas on-prem encourages efficiency.
      • There was a debate regarding "software freedom" and the "WhatsApp effect," where small, highly motivated teams can outperform massive corporations by using lean, self-hosted stacks.
      • Some users highlighted that while AWS pricing is expected to rise due to hardware costs, the "Quality of Life" and managed services still justify the cost for many startups without comma's scale.

      #comma-ai #self-hosting #datacenter #hardware-engineering

  4. Nov 2021