AI is no longer limited by ideas. It's limited by money, compute, and who can scale fastest.
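Most people see technical innovation and algorithmic breakthroughs as the main bottleneck in AI progress. The author argues that the real constraints today are capital, compute, and speed of scaling rather than ideas or the technology itself, which challenges the conventional view of what drives AI development.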
Sam Altman has reportedly told staff that Spud could "really accelerate the economy"
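Most people see AI as a tool that will change the economy gradually. The author implies that OpenAI's Spud model may be disruptive enough to materially accelerate the whole economy, well beyond what most people believe AI can do today, and that AI could become a primary driver of economic growth sooner than expected.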
both companies are hinting that these models are a real step forward, not just small upgrades.
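Most people assume AI models improve incrementally, with small gains per iteration. The author argues that the upcoming models from OpenAI and Anthropic (Spud and Claude Mythos) are genuine breakthroughs rather than routine upgrades, suggesting that AI development may be about to enter a period of acceleration.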
Gemma points in the opposite direction: smaller models, local compute, more ownership.
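Most people assume AI must move toward ever larger, more centralized models. The author argues that Google's Gemma 4 represents the opposite trend. This challenges the mainstream narrative and suggests AI may spread onto personal devices and depend less on large infrastructure, in sharp contrast to the industry consensus.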
A founder in LA reportedly scaled Medvi toward $1.8B in annual sales with basically one full-time employee.
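Most people assume a billion-dollar company needs a large team and a complex management structure. The author argues that AI has made the 'one-person unicorn' possible. This challenges conventional startup thinking and suggests AI may fundamentally change the relationship between company size and headcount.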
Employees still own a surprisingly large 19.35%. SoftBank comes in at 11.66%, followed by VC and institutional investors at 7.83%, Amazon at 4.66%, NVIDIA at 3.47%
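Most people assume OpenAI's ownership is relatively simple, controlled mainly by Microsoft and the nonprofit foundation. The author shows that employees hold as much as 19.35% and that several technology companies hold significant stakes, which challenges the common picture of OpenAI's governance structure.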
And once models get good at that, the question stops being whether they can make beautiful images. It becomes whether people still notice when something was never real to begin with.
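Most people focus on how realistic AI image models can make their output. The author makes a counterintuitive point: the real challenge is not creating realism but whether people can still tell what is real, which challenges assumptions about where image-model progress is heading.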
Most people talk about OpenAI like it's basically 'owned by Microsoft,' but the actual cap table is much more spread out.
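Most people assume OpenAI is mainly controlled by Microsoft. The author shows that ownership is in fact widely dispersed, with Microsoft holding only 26.79%. This challenges the public's picture of OpenAI's ownership and explains why the company's decisions often appear to pull in different directions.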
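The stakes quoted in these notes can be totaled to see how much of the cap table they cover. This is a small arithmetic sketch using only the percentages given above; who holds the remainder is not stated in the quotes, so the code leaves it unnamed.

```python
# Stakes quoted in the notes, in percent of OpenAI equity.
stakes = {
    "Microsoft": 26.79,
    "Employees": 19.35,
    "SoftBank": 11.66,
    "VC and institutional investors": 7.83,
    "Amazon": 4.66,
    "NVIDIA": 3.47,
}

listed_total = round(sum(stakes.values()), 2)
unlisted = round(100 - listed_total, 2)  # holders not named in the quotes

largest = max(stakes, key=stakes.get)
print(f"listed: {listed_total}%  unlisted: {unlisted}%  largest listed: {largest}")
```

Even the largest listed holder is well short of a majority, which is the point the quote is making.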
The first wave of image models was mostly about making cool-looking images. This next phase is about making ordinary things look real.
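Most people assume image models are chiefly advancing toward ever more realistic fantasy art or creative content. The author argues that the next phase is about making ordinary, everyday things look real, which challenges the common view of where AI imagery is heading.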
We are building a world where machines write the code, machines choose the dependencies, and machines ship the updates. The AI agents are building the software. If we don't secure the supply chain they rely on, the AI agents are cooked.
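Most people assume AI will make software development both more efficient and more secure. The author warns that if we do not secure the supply chain AI agents rely on, the agents themselves become attack targets. This counterintuitive warning challenges the mainstream view that AI progress necessarily improves security.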
Socket, an a16z portfolio company, detected the malicious dependency in the Axios attack within 6 minutes of its publication. That's roughly 63,000 times faster than the industry average.
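Most people assume supply chain attacks take months or even years to detect. The author shows that newer security tools can detect an attack within minutes, 63,000 times faster than the industry average. This suggests detection is shifting from static, CVE-based checks to real-time behavioral analysis.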
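The two figures in the quote imply an industry-average detection time, which the quote itself does not state. A quick check, taking "6 minutes" and "63,000 times" at face value:

```python
detection_minutes = 6   # from the quote
speedup = 63_000        # from the quote

# Work backward to the industry average the comparison implies.
implied_average_minutes = detection_minutes * speedup
implied_average_days = implied_average_minutes / (60 * 24)
print(f"implied industry average: {implied_average_days:.1f} days")
```

The result lands in the range of several months, consistent with the commentary's "months" framing.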
The autonomous coding agents now entering production can install dependencies, execute builds, and open pull requests without a human ever touching the keyboard. They optimize for 'does this work?' not 'is this safe?'
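Most people assume AI coding assistants improve both developer efficiency and security. The author points out that these autonomous agents prioritize function over safety and act so quickly that the window for security review shrinks to almost nothing. This challenges the widespread optimism about AI-assisted development.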
Hallucinated packages are the sleeper threat. LLMs regularly invent package names that don't exist. One study found that nearly 20% of AI-recommended packages were fabrications, and 43% of those hallucinated names appeared consistently across queries.
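Most people assume the packages an AI recommends actually exist. The author shows that AI regularly recommends packages that do not exist, and that this has become a new attack vector. Attackers register these 'hallucinated packages' and plant malicious code in them, a technique known as 'slopsquatting' that turns the AI itself into an amplifier for supply chain attacks.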
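One simple mitigation, not described in the source, is to hold any AI-suggested package that is not already on a vetted list instead of installing it automatically. The sketch below illustrates the idea; the allowlist contents and the name `fast-json-utilz` are hypothetical, and a real setup would more likely check a lockfile or an internal registry.

```python
# Hypothetical allowlist of dependencies a team has already vetted.
VETTED_PACKAGES = {"requests", "numpy", "pandas"}

def partition_suggestions(suggested, vetted=VETTED_PACKAGES):
    """Split AI-suggested package names into vetted and unvetted lists."""
    approved, needs_review = [], []
    for name in suggested:
        normalized = name.strip().lower()
        if normalized in vetted:
            approved.append(normalized)
        else:
            # Unknown names may be hallucinated or squatted: never auto-install.
            needs_review.append(normalized)
    return approved, needs_review

approved, needs_review = partition_suggestions(
    ["Requests", "fast-json-utilz", "numpy"]  # second name is made up
)
print("install:", approved)
print("hold for human review:", needs_review)
```

The design choice is to fail closed: a name that merely exists on a public registry is not treated as safe, since registering the hallucinated name is exactly what the attacker does.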
AI agents select known-vulnerable dependency versions 50% more often than humans. Worse, the vulnerable versions they pick are harder to fix, requiring major-version upgrades far more frequently.
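Most people assume AI coding assistants choose dependencies more safely than humans do. The author finds that AI picks known-vulnerable versions 50% more often than humans, and that those vulnerabilities are harder to fix. The cause is that AI optimizes for 'does this work?' rather than 'is this safe?', which challenges the security assumptions behind AI-assisted development.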
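Talent density: the biggest prizes in capitalism attract the best minds in the field. These are the fastest growing software companies in history.
Most people assume AI progress depends mainly on algorithmic breakthroughs and compute. The author stresses talent density as a key driver of AI compression and implies that competition for talent matters more than capital or algorithms, which runs against the industry's usual emphasis on technical investment.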
At this rate, the phone in your pocket will run today's frontier models before you upgrade it.
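Most people assume phone hardware must keep being upgraded to run the latest AI features. The author argues that compression is moving so fast that an existing phone will run what were once frontier models before its owner upgrades it, which overturns assumptions about hardware refresh cycles.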
In 23 months, the same capability that needed 1.8 trillion parameters now fits in 4 billion parameters. A 450x compression.
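Most people assume model performance improves mainly by adding parameters. The author argues that, through algorithmic optimization and concentrated talent, models can achieve a 450x reduction in parameter count, which challenges the industry consensus that more parameters mean better performance.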
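The 450x figure follows directly from the two parameter counts in the quote. The halving time computed below is an extra step that is not in the source: it assumes, purely for illustration, that the shrinkage happened at a steady exponential rate over the 23 months.

```python
import math

frontier_params = 1.8e12  # 1.8 trillion, from the quote
small_params = 4e9        # 4 billion, from the quote
months = 23               # from the quote

ratio = frontier_params / small_params

# Assumption (not in the source): a constant exponential rate of compression.
halvings = math.log2(ratio)
months_per_halving = months / halvings

print(f"compression: {ratio:.0f}x")
print(f"implied halving time: {months_per_halving:.1f} months")
```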
Within three to four months, you can run a model with similar performance on your laptop; 23 months later, you can run the same model on your phone.
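Most people assume frontier AI takes a long time to reach consumer devices. The author argues that a frontier model's capability can run on a laptop within three to four months and on a phone within 23 months, a rate of diffusion far faster than the industry generally expects.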
a free model that matches GPT-4o and runs entirely on your phone
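Most people assume top-tier AI models need massive compute and cloud support. The author argues that the free model Gemma 4 E4B already runs entirely on a phone and matches GPT-4o, which breaks established assumptions about model size and resource requirements.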
Someone who builds premium dating apps, let's say, might use AI coding tools to create in one day what used to take three days. That means the worker is more productive. The worker's employer, spending the same amount of money, can now get more output. So then will the employer want more employees or fewer?
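Most people assume that AI-driven productivity gains necessarily lead to job growth. The author poses a counterintuitive question: when workers become more efficient, employers may choose to hire fewer people rather than more. This challenges the assumed linear link between technological progress and employment growth.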
We need, like, a Manhattan Project to collect this... Fields that are not exposed now will become exposed in the future, so you just want to track these statistics across the entire economy.
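Most people assume the response to AI's employment impact should focus on the industries most threatened today. The author argues that we need a Manhattan Project-scale effort to collect price elasticity data across every industry, including fields not yet affected by AI. This forward-looking view challenges the usual crisis-response mindset.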
Exposure alone is a completely meaningless tool for predicting displacement
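Most people assume that analyzing how exposed a job's tasks are to AI can predict which jobs will be displaced. The author argues that this single metric is completely meaningless because it ignores key factors such as price elasticity and changes in demand. This challenges the mainstream method in current research on AI's employment impact.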
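Why elasticity matters more than exposure can be shown with a toy model. The sketch below is an illustration, not the author's method: it assumes that cost savings pass fully into price and that demand has constant price elasticity. The 3x productivity gain mirrors the "one day instead of three" example quoted earlier in these notes.

```python
def labor_demand_multiplier(productivity_gain, demand_elasticity):
    """Relative change in workers needed after a productivity gain.

    Assumes (hypothetically) full pass-through of cost savings into
    price and a constant price elasticity of demand.
    """
    price_multiplier = 1 / productivity_gain                       # price falls
    output_multiplier = price_multiplier ** (-demand_elasticity)   # demand responds
    return output_multiplier / productivity_gain                   # workers = output / productivity

for elasticity in (0.5, 1.0, 2.0):
    workers = labor_demand_multiplier(3, elasticity)
    print(f"elasticity {elasticity}: workers needed x{workers:.2f}")
```

Under these assumptions the same exposure (a 3x gain) shrinks the workforce when elasticity is below 1 and grows it when elasticity is above 1, which is why exposure alone cannot predict displacement.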
in the past year Huawei has overtaken Nvidia as the leading source of AI computing power in China, at least in terms of rated FLOP/s
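Most people probably assume Nvidia still dominates the Chinese market. The author argues that Huawei has overtaken Nvidia as the leading source of AI computing power in China. This challenges the belief that Nvidia's position in China is unshakable and suggests that domestic alternatives may be gaining share faster than expected.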
We estimate that as of the end of 2025, Chinese companies collectively own just over 5% of the cumulative computing power of the leading AI chips sold in recent years
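Given the rapid growth of China's AI industry and heavy government investment, most people probably assume China holds a larger share of global AI compute. The author estimates that Chinese companies own only about 5%. That figure is far below expectations and challenges common perceptions of China's AI strength.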
Many frontier AI developers, including Anthropic and OpenAI, acquire almost all of their compute from hyperscalers and other cloud providers.
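Most people probably assume leading AI companies own their compute infrastructure to keep a competitive edge. The author argues that frontier developers such as OpenAI and Anthropic rely almost entirely on hyperscalers and other cloud providers for compute. This suggests AI innovation leans on big technology companies' infrastructure, rather than on independently owned compute, more heavily than many imagine.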
Google holds the equivalent of around 5 million Nvidia H100 GPUs in compute capacity, roughly 25% of the world's total!
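Most people probably assume Nvidia is the largest holder of AI compute because its chips are so widely used. The author argues that Google, through its in-house TPUs, holds compute equivalent to 5 million H100 GPUs, about 25% of the world total. This suggests that an in-house chip strategy may build a compute advantage more effectively than buying commercial chips.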
We estimate that over 60% of global AI compute (in terms of total computing power) is owned by the five US hyperscalers, led by Google.
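Most people assume AI chips are more widely distributed, or dominated by dedicated AI companies such as OpenAI and Anthropic. The author argues that most of the world's AI compute is controlled by a handful of US hyperscalers, which challenges common perceptions of the industry's structure. This concentration gives a few companies disproportionate influence over the direction of AI development.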
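Complex research is not an accumulation of answers to individual queries. It requires completing an entire process: generating ideas, searching for supporting evidence, resolving contradictions, and finally structuring the result as a report.
Most people assume AI research assistants should focus on fast, direct answers. The author stresses that complex research requires the complete process from idea to structured report. This runs against the mainstream focus on instant answers and implies that quality matters more than speed, a non-consensus view of how AI should be applied.
Having the model think longer and more deeply at inference time draws out better output. This is the essence of inference scaling.
Most people assume AI should pursue faster responses and higher efficiency. The author argues that AI should 'think long and deeply' to produce better output. This runs against the industry's focus on instant responses and makes a counterintuitive point: gains in compute efficiency should go toward deeper thinking rather than more speed.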
For higher-interactivity scenarios, execution time for MoE models is bound by expert weight load time. By splitting, or sharding, the experts across multiple GPUs across NVL72 nodes, this bottleneck is reduced, improving end-to-end performance.
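Most people assume the main bottleneck for MoE models is compute. The author points out that expert weight load time is the real bottleneck and proposes sharding expert weights across GPUs to address it. This challenges conventional thinking about model optimization and suggests that I/O may matter more than compute.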
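The effect of sharding on weight load time can be sketched with back-of-the-envelope arithmetic. Everything numeric here is hypothetical and chosen only to make the scaling visible; the model also assumes experts are split evenly, that every expert is read each step (a worst case), and it ignores the extra communication that sharding introduces.

```python
def expert_load_time_ms(expert_bytes_total, num_gpus, bandwidth_bytes_per_s):
    """Time for each GPU to stream its shard of the expert weights from memory."""
    bytes_per_gpu = expert_bytes_total / num_gpus
    return bytes_per_gpu / bandwidth_bytes_per_s * 1e3

# Hypothetical figures, not taken from the source.
EXPERT_BYTES = 600e9   # total size of all expert weights
BANDWIDTH = 8e12       # per-GPU memory bandwidth in bytes per second

for gpus in (8, 32, 72):
    ms = expert_load_time_ms(EXPERT_BYTES, gpus, BANDWIDTH)
    print(f"{gpus:>2} GPUs: {ms:.2f} ms per step to load expert weights")
```

Because each GPU reads only its own shard, the per-step load time falls in proportion to the number of GPUs the experts are spread over, which is the bottleneck reduction the quote describes.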
NVIDIA yields unmatched inference throughput across the broadest range of workloads, from massive LLMs to advanced vision language models, to generative recommender systems and more, on industry-standard benchmarks.
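Most people assume several competing AI platforms each excel in different areas. The author claims NVIDIA performs strongly across all workloads, which challenges the consensus view of diversified competition and suggests NVIDIA may be more dominant than commonly believed.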
Co-designed hardware, software, and models are key to delivering the highest AI factory throughput and lowest token cost. Measuring this goes far beyond peak chip specifications.
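Most people assume AI performance is determined mainly by chip specifications. The author stresses that co-design of hardware, software, and models is the key, which challenges the chip-centric view and suggests that full-stack optimization matters more than chip performance alone.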
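By applying compute that otherwise goes unutilized to predict and verify additional tokens in parallel (up to three in this implementation), throughput at high interactivity is increased.
Most people assume compute should be spent only on the task at hand. The author describes using otherwise idle compute to predict additional tokens in parallel, which challenges conventional thinking about compute allocation and points to new room for efficiency in AI inference.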
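The "predict and verify additional tokens" idea can be illustrated with a toy draft-and-verify loop, often called speculative decoding. The source does not say which drafting mechanism is used, so this is a generic greedy sketch: `draft_next` and `target_next` are hypothetical stand-ins for a cheap predictor and the full model.

```python
def speculative_step(context, draft_next, target_next, max_draft=3):
    """One greedy draft-and-verify step; returns the tokens accepted."""
    # 1. The cheap draft predictor proposes up to max_draft tokens.
    proposal, ctx = [], list(context)
    for _ in range(max_draft):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position. Real systems do
    #    this in one parallel pass; it is sequential here for clarity.
    accepted, ctx = [], list(context)
    for tok in proposal:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # correct the first mismatch and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # bonus token when every draft matched
    return accepted

# Toy models over integer tokens: the target counts upward, and the draft
# agrees except when the last token is a multiple of 5.
target_next = lambda ctx: ctx[-1] + 1
draft_next = lambda ctx: ctx[-1] + (2 if ctx[-1] % 5 == 0 else 1)

print(speculative_step([1], draft_next, target_next))  # all three drafts accepted
print(speculative_step([4], draft_next, target_next))  # second draft rejected
```

The output always matches what the target model alone would have produced; the gain is that several tokens can be confirmed per target pass when the drafts are right.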
NVIDIA was the first and only platform to submit DeepSeek-R1 results on MLPerf Inference when the benchmark debuted last year.
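Most people assume AI benchmarks attract submissions from many competing platforms. The author stresses that NVIDIA was the only platform to submit DeepSeek-R1 results, which hints at a monopoly position in AI benchmarking and runs against the common picture of diversified competition.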
This means 2.7x more tokens from the same GB300 NVL72-based infrastructure and power footprint, reducing the cost to manufacture each token by more than 60%.
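Most people assume hardware upgrades are the main route to better AI performance. The author argues that software optimization can deliver a 2.7x throughput gain and a cost reduction of more than 60% on the same hardware, which challenges the industry's reliance on hardware upgrades and suggests software optimization may be more cost-effective.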
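The "more than 60%" cost figure follows from the 2.7x throughput gain if total infrastructure and power cost stay fixed, as the quote's "same infrastructure and power footprint" suggests. A one-line check under that assumption:

```python
throughput_gain = 2.7  # from the quote

# With fixed total cost, cost per token scales as 1 / throughput.
cost_reduction = 1 - 1 / throughput_gain
print(f"cost per token falls by {cost_reduction:.1%}")
```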
Using vLLM high-throughput LLM serving on DGX Spark provides a high-performance platform for the largest Gemma 4 models
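Most people assume running the largest Gemma 4 models requires specialized hardware and a complex deployment process. The author claims vLLM can serve these large models efficiently on DGX Spark, suggesting inference optimization may have reached a tipping point that makes deploying complex models simpler and more efficient.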
The E4B and E2B are the newest edition of on-device and mobile designed models first launched with Gemma 3n.
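Most people assume AI models on mobile devices must be heavily stripped down to run efficiently. The author implies that the Gemma 4 E4B and E2B variants retain multimodal capabilities on mobile, including text, audio, vision, and video processing, which challenges conventional assumptions about mobile AI.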
The bundle includes four models, including Gemma's first MoE model, which can all fit on a single NVIDIA H100 GPU and supports over 140 languages.
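Most people assume a multimodal model supporting more than 140 languages needs far too much compute to run on a single GPU. The author claims these models all fit on a single H100 GPU, which challenges assumptions about the resource needs of large multilingual models and suggests a substantial gain in model efficiency.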
Modern physical AI agents are evolving rapidly with Gemma 4 models that integrate audio, multimodal perception, and deep reasoning capabilities.
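Most people assume physical AI agents are still at an early stage and mostly perform simple tasks. The author implies that Gemma 4 already lets physical AI agents understand speech, interpret visual context, and reason intelligently. This would be a major step up from current robotics capabilities and could accelerate embodied AI.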
The 31B and 26B A4B variants are high-performing reasoning models suitable for both local and data center environments.
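Most people assume large language models (31B parameters) can only run in data centers. The author claims these models can run efficiently in local environments. This runs against the industry consensus, suggests edge compute may be more capable than we think, and could change how AI is deployed.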
NVFP4 enables 4-bit precision while maintaining nearly identical accuracy to 8-bit precision, increasing performance per watt and lowering cost per token.
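Most people assume lowering model precision significantly hurts performance. The author claims that with NVFP4 quantization, Gemma 4 achieves nearly the same accuracy at 4-bit precision as at 8-bit. This counterintuitive result challenges the belief that quantization sharply degrades models and suggests NVIDIA may have made a breakthrough in quantization.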
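To make the idea of 4-bit floating-point quantization concrete, here is a minimal sketch of block-scaled quantization onto the E2M1 value grid (one sign bit, two exponent bits, one mantissa bit). It is a simplification and not NVIDIA's implementation: the block size of 8, the plain Python float used for the scale, and the sample weights are all assumptions made for illustration.

```python
# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one block to E2M1 values that share a single scale factor."""
    peak = max(abs(x) for x in block)
    scale = peak / E2M1_VALUES[-1] if peak else 1.0  # map the peak onto 6.0
    codes = []
    for x in block:
        # Snap the scaled magnitude to the nearest representable value.
        magnitude = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        codes.append(magnitude if x >= 0 else -magnitude)
    return scale, codes

def dequantize_block(scale, codes):
    """Recover approximate weights from the shared scale and 4-bit codes."""
    return [scale * c for c in codes]

weights = [0.02, -0.11, 0.07, 0.30, -0.22, 0.01, 0.18, -0.05]  # made-up block
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("codes:", codes)
print("max absolute error:", round(max_error, 4))
```

Sharing one scale across a small block is what keeps the error low: each block's largest weight lands exactly on the top of the grid, so the coarse 4-bit steps are spent on that block's actual range.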
By using SAM, the Alta team has been able to process more than 20 million images without incurring exorbitant costs, allowing them to focus on building the best possible product for their users.
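Most people probably assume startups must rely on expensive third-party APIs to process images at volume. The author describes how, by using the open-source SAM model, the team processed images at scale without incurring huge costs. This challenges the industry consensus that high-quality AI services must be expensive and shows the cost-effectiveness of open-source models.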
If we knew that every image uploaded was a beautiful model shot, segmentation would be far easier, but because of the nature of user-uploaded content, we need the best possible segmentation.
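Most people probably assume high-quality professional photos are the ideal input for AI image processing. The author implies that 'perfect' model shots are in fact far easier to handle than real user-uploaded content. This challenges assumptions about 'ideal training data' and suggests that the imperfection of real-world data poses the tougher technical challenge.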
Fashion in particular has one of the most complex image datasets, especially because of the inconsistent nature of user-uploaded content.
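Most people probably assume fashion image processing is relatively simple because the industry usually aims for polished presentation. The author argues that fashion actually has one of the most complex image datasets because user-uploaded content is so inconsistent. This counterintuitive view highlights the distinct challenges fashion AI faces and challenges common assumptions about how hard fashion imagery is to process.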
Built from the same world-class research and technology as Gemini 3
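Most people assume Google keeps its most advanced technology in the proprietary Gemini models and that the open versions are downgraded. The author claims Gemma 4 is built from 'the same world-class research and technology' as Gemini 3, which challenges the common belief that open releases are second-tier products.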
Engineered from the ground up for maximum compute and memory efficiency
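Most people assume high-performance AI models inevitably need large amounts of compute and memory. The author stresses that Gemma 4's edge models are 'engineered from the ground up for maximum compute and memory efficiency', implying that advanced AI features are achievable even in resource-constrained environments, contrary to common assumptions about AI's resource needs.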
The edge models feature a 128K context window, while the larger models offer up to 256K
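Most people assume AI models on edge and mobile devices are limited, especially in handling long context. The author claims that Gemma 4 offers a 128K context window even on mobile devices, which challenges the belief that edge AI is inherently limited.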
Gemma 4 outcompetes models 20x its size
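Most people assume model performance tracks parameter count directly and that bigger models are always stronger. The author points out that Gemma 4 outperforms models 20 times its size, which challenges the 'bigger is better' view and suggests efficiency may matter more than raw scale.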
Byte for byte, the most capable open models
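Most people assume open models cannot match closed, proprietary models on performance. The author claims the Gemma 4 models are, 'byte for byte, the most capable open models', which challenges that consensus. This implies open models have overtaken commercial closed models on certain metrics, a non-traditional view.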
Teams at companies like Notion, Ramp, Braintrust, and Wasmer are already using Codex to accelerate their engineering workflows.
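Most people probably assume AI coding tools are adopted mainly by large technology companies. The author notes that even companies outside traditional big tech, such as Notion and Ramp, are integrating Codex into their core engineering workflows, which challenges assumptions about who adopts these tools and shows they apply more broadly than expected.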
Within ChatGPT Business and Enterprise, the number of Codex users has grown 6x since January.
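Most people probably assume enterprise adoption of AI tools is gradual. The author reports explosive growth in Codex adoption within enterprises (6x), suggesting AI coding assistants may be moving from experimental tools to a core part of productivity faster than expected, which challenges the usual view of how quickly enterprises adopt AI.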
Codex-only seats have no rate limits, and usage is billed on token consumption.
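Most people assume AI services set usage limits to control cost. The author presents Codex's model of no rate limits with per-token billing as viable because it offers a more transparent cost structure and more flexible usage, which may reflect OpenAI's confidence in its own efficiency and in user demand.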
Priority areas include safety evaluation, ethics, robustness, scalable mitigations, privacy-preserving safety methods, agentic oversight, and high-severity misuse domains.
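Most people assume AI safety research centers on preventing malicious use and aligning systems with human values. The author lists privacy-preserving methods as a priority area, which suggests OpenAI treats privacy as a core part of safety rather than a separate concern, contrary to the traditional view of privacy and safety as distinct fields.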
Fellows will receive API credits and other resources as appropriate, but will not have internal system access.
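In AI safety, many believe that studying system safety properly requires full access to internal systems. The author states plainly that fellows will not have internal system access. This challenges that assumption and implies OpenAI believes safety research can proceed without full access, or that it has other ways to assess safety.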
Fellows will work closely with OpenAI mentors and engage with a cohort of peers.
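Most people assume AI safety research is highly secretive and isolated, especially for advanced systems. The author emphasizes close work with OpenAI mentors and engagement with peers, which suggests OpenAI is taking an open, collaborative approach that contrasts sharply with the industry's usual closed research model.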
We prioritize research ability, technical judgment, and execution over specific credentials.
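In academia and the technology industry, degrees and traditional credentials are usually treated as the most important screening criteria. The author states that demonstrated ability takes priority over specific credentials, which challenges the prevailing approach to evaluating talent and implies OpenAI is looking for innovators from non-traditional paths rather than only elites from top schools.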
We are especially interested in work that is empirically grounded, technically strong, and relevant to the broader research community.
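Most people assume AI safety research is highly theoretical and abstract. The author emphasizes empirical grounding and technical strength, which suggests OpenAI is steering safety research away from pure theory toward practical application and verifiable results, in contrast to the elitist tendencies of traditional AI safety research.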
The vast majority of the new compute will be sited in the United States, making this partnership a major expansion of our November 2025 commitment to invest $50 billion in strengthening American computing infrastructure.
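Most people assume AI compute infrastructure will be distributed globally. Anthropic is siting the vast majority of its compute in the United States, which runs against the usual trend of global technology deployment, challenges mainstream assumptions about where AI infrastructure will be located, and reflects the deep influence of geopolitics on technology deployment.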
Claude remains the only frontier AI model available to customers on all three of the world's largest cloud platforms: Amazon Web Services (Bedrock), Google Cloud (Vertex AI), and Microsoft Azure (Foundry).
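Most industry observers assume top AI models will be locked to a single cloud platform through exclusive partnerships. Anthropic has chosen full coverage instead, which challenges the common platform lock-in business model and suggests the AI infrastructure market may be more open and competitive than expected.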
We train and run Claude on a range of AI hardware—AWS Trainium, Google TPUs, and NVIDIA GPUs—which means we can match workloads to the chips best suited for them.
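Most people assume AI companies rely on a single hardware vendor to get the best performance. Anthropic's multi-platform strategy challenges that consensus. The diversified approach adds complexity but offers better performance and resilience, suggesting the future of AI compute may be more distributed than centralized.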
over 500 business customers were each spending over $1 million on an annualized basis. Today that number exceeds 1,000, doubling in less than two months.
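Most people are conservative about how fast enterprises adopt AI. The number of Anthropic's high-value customers doubled in under two months, showing that enterprise adoption and spending are running well ahead of industry expectations and challenging the belief that the enterprise AI market develops slowly.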
Demand from Claude customers has accelerated in 2026. Our run-rate revenue has now surpassed $30 billion—up from approximately $9 billion at the end of 2025.
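Most people assume AI companies are still in a cash-burning phase. Anthropic's revenue is growing strikingly fast, from about $9 billion in run-rate revenue at the end of 2025 to more than $30 billion in 2026, which shows commercialization is outpacing market expectations and challenges the consensus that AI companies will lose money for a long time.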
Figure 2. Four mechanisms support concurrent task execution in CORPGEN: hierarchical planning, isolated subagents, tiered memory, and adaptive summarization.
Microsoft as a special case
Clean Room as a Service
Good news for Xinchuang (China's domestic IT substitution initiative)?
Some retailers / brands offer this on their website already, but it’s limited to their SKUs. We see an opportunity for AI consultants that have deep knowledge on a product category across different brands, and that learn more context on each user and their preferences over time (for example, if it helps you buy a sofa, it can later tailor chair recommendations to things that match).
A storage box that fits the requirements
DP Technology (深势科技)
https://www.dp.tech/knowledge
OJBench
https://arxiv.org/pdf/2506.16395v1 Comparison between CPP and Python
Multi Hop
PWA-enabled dashboard
An opportunity for a Meteor-like framework?
A managed runtime environment
BaaS is back
consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute)
RL for post-train, time spent thinking for inference? How?