13 Matching Annotations
  1. Last 7 days
    1. LLMs accelerate the wrong part

      [Insight] "LLMs accelerate the wrong part": this line pinpoints the fundamental problem with AI coding tools. They accelerate code generation (which was never the bottleneck) while doing nothing for code comprehension, review, and maintenance (the real bottleneck). Set against the a16z report's "10-20x productivity uplift" figure: the productivity gains are real, but whether the dimension being improved is the dimension that most needs improving is an entirely different question.

  2. Apr 2026
    1. The real bottleneck in AI right now is not compute but rather data quality

      This argument overturns the AI industry's current fixation on compute investment and offers a surprising perspective: we may have been solving the wrong problem all along. If data quality is the real bottleneck, then the priorities of AI R&D as a whole need to be re-evaluated.

    1. humans became the bottleneck, and how Ryan's team shifted from reviewing code directly to building systems, observability, and context that let agents review, fix, and merge work autonomously

      This insight reveals a key shift in AI-assisted development: humans are no longer code producers but system architects and observers, which redefines where value is created in software engineering.

    1. A 606 MiB model at ~49 tokens/s consumes ~30 GB/s of memory bandwidth, close to the c6i.2xlarge's DRAM limit. No amount of SIMD tricks will help when the CPU is stalled waiting for model weights to arrive from DRAM.

      This data point exposes the key bottleneck in modern CPU inference: memory bandwidth. The SIMD micro-optimizations the agent tried first could not break through this fundamental limit, which shows that understanding hardware characteristics and system bottlenecks is essential for effective optimization. The finding challenges the conventional assumption that compute is the main bottleneck and underscores the central role of memory efficiency in AI inference.
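      A quick check of the highlight's arithmetic (a minimal sketch; the assumption that every generated token streams the full set of model weights once from DRAM is mine, not stated in the highlight):

      ```python
      # Bandwidth needed to decode at ~49 tokens/s with a 606 MiB model,
      # assuming all weights are read from DRAM once per generated token.
      model_bytes = 606 * 1024**2   # 606 MiB of weights
      tokens_per_s = 49             # observed decode speed
      bandwidth_gb_s = model_bytes * tokens_per_s / 1e9
      print(f"required DRAM bandwidth ≈ {bandwidth_gb_s:.1f} GB/s")  # ≈ 31.1 GB/s
      ```

      The result (~31 GB/s) matches the highlight's ~30 GB/s figure, which is why SIMD tricks cannot help: the CPU is stalled on DRAM, not on arithmetic.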

    1. if AI can do only 50 percent of a human's tasks, the importance of the non-automatable tasks likely goes up since they become the bottlenecks, increasing their relative value.

      The "partial-automation paradox": when AI does half the work, the remaining non-automatable work becomes more important and more valuable, precisely because it becomes the production bottleneck. This means AI's partial progress may not distribute gains evenly; instead, they concentrate on holders of the rare skills that happen to resist automation. It is an elegant rebuttal of the "AI will replace everyone" narrative, and the right framework for thinking about which skills gain value in the AI era.

    1. a future project might take ~42 days of wall-clock time, with ~8 hours of agent work (not counting running the evals) and 1000 serial hours of human IC work, evals execution, and review.

      A "bottleneck-to-execution ratio" above 100:1 is the most striking number in this piece. In a 42-day project, AI execution accounts for only 8 hours; the remaining 1000 hours are serial human bottlenecks (review, waiting on experiments, collecting feedback). This means that even with unlimited AI execution capacity, the real limit on project speed is still the human approval chain: organizational structure, not technical capability, becomes the core competitive advantage in the AI era.
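      The ratio follows directly from the numbers in the highlight (a minimal sketch; the variable names are mine):

      ```python
      # "Bottleneck-to-execution ratio" implied by the projected project:
      # ~8 hours of agent work vs ~1000 serial hours of human IC work,
      # evals execution, and review.
      agent_hours = 8
      human_serial_hours = 1000
      ratio = human_serial_hours / agent_hours
      print(f"human-to-agent serial time ≈ {ratio:.0f}:1")  # 125:1
      ```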

    2. Most people estimated around 3-5x uplift compared to Feb 2026 (i.e. doing 1-2 weeks of work during this 2-day period).

      A 3-5x organizational efficiency uplift, but delivered by an AI with a 17x longer time horizon. The conversion rate between capability gains and efficiency gains works out to roughly TH^0.39, meaning most of the benefit of improved AI capability is absorbed by organizational bottlenecks. Strikingly, as execution speed approaches infinity, the coordination friction of human organizations (review processes, waiting on experiments) becomes the main speed limit, rather than the AI's own capability.
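      Backing out the implied exponent (a minimal sketch; the power-law form uplift ≈ TH^alpha is my assumption for reproducing the note's figure):

      ```python
      import math

      # Implied exponent alpha if uplift ≈ TH**alpha, with TH = 17x
      # time-horizon gain and an observed 3-5x organizational uplift.
      th_gain = 17
      for uplift in (3, 5):
          alpha = math.log(uplift) / math.log(th_gain)
          print(f"{uplift}x uplift -> alpha ≈ {alpha:.2f}")
      ```

      The lower end of the range (3x) reproduces the note's ~TH^0.39; the upper end (5x) gives ~TH^0.57.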

    1. For higher-interactivity scenarios, execution time for MoE models is bound by expert weight load time. By splitting, or sharding, the experts across multiple GPUs across NVL72 nodes, this bottleneck is reduced, improving end-to-end performance.

      Most people assume the main bottleneck for MoE models is compute, but the author points out that expert weight load time is the real bottleneck and proposes splitting, or sharding, the experts across GPUs to address it. This challenges conventional thinking about AI model optimization and suggests that I/O can matter more than compute.

  3. May 2024
  4. Jul 2022
    1. I bet with the advent of computers and the digitizing of reference material there was a spike in the amount of verbatim quotes that are used instead of summarizing the thought into your own words.

      It's a reasonable assumption that with the rise of digital contexts and the ease of cut and paste, people excerpting or quoting material are more likely to excerpt and quote longer passages, because it is now easier to do so.


      Has anyone done research on showing that this is the case?

  5. Aug 2020
  6. May 2020