Since Gemini is built directly in Sheets, it removes the barrier to writing complex formulas for advanced analysis right where you work.
大多数人认为复杂公式编写需要专门的编程知识或外部工具,但作者认为将AI直接集成到工作环境中就能消除这一障碍,这挑战了专业工具需要独立学习环境的传统观念。这种'无感集成'可能重新定义软件功能的边界。
Since Gemini is built directly in Sheets, it removes the barrier to writing complex formulas for advanced analysis right where you work.
大多数人认为复杂公式编写需要专门的编程知识或外部工具,但作者认为将AI直接集成到工作环境中就能消除这一障碍,这挑战了专业工具需要独立学习环境的传统观念。这种'无感集成'可能重新定义软件功能的边界。
When you encounter a formula error, Gemini can analyze the surrounding data structure to help provide an easy-to-understand explanation of the core issue alongside a corrected version of the formula.
大多数人认为AI工具需要用户提供明确的指令才能解决问题,但作者认为Gemini能够主动分析数据结构并自动提供解决方案,这挑战了传统AI辅助工具需要用户主导的常识。这种自动纠错能力暗示AI正在从'助手'角色向'自主问题解决者'转变。
Serves as the first generative core for social world models, a foundation for next-generation AI-native social platforms.
大多数人认为社交平台的核心是用户连接和内容分发,而非生成式AI。作者提出AI生成内容应成为社交平台的基础架构,这挑战了当前社交媒体平台的根本设计理念。
MaineCoon is optimized for social-interactive applications using several novel techniques: self-resampling, cross-modal representation alignment, domain-aware preference optimization, and reinforced online-policy distillation (ROPD).
大多数人认为视频生成模型主要关注视觉质量和内容连贯性,但作者强调社交互动性是核心优化目标。这挑战了传统视频生成模型的评估标准,暗示社交互动性可能比视觉保真度更重要。
Intel was once a silicon powerhouse, designing the most cutting-edge CPUs for computers and servers, and building them in its own fabs. But in the 2010s, the big new markets were mobile-phone chips and GPUs for AI and gaming, and Intel rapidly lost ground.
大多数人认为曾经的行业领导者可以通过持续创新保持领先地位,但作者暗示Intel的衰落是由于未能预见市场变化。这挑战了人们对技术巨头持久竞争力的认知,强调了市场预测和适应能力的重要性。
The industry has only shifted paradigms when it just absolutely cannot extend—even one more little bit—out of what it's been doing.
大多数人认为技术行业会主动寻求创新和突破,但作者认为芯片行业只有在现有技术达到极限时才会转向新范式。这与人们对技术行业创新文化的认知相悖,暗示该行业实际上比人们想象的更为保守。
This is like 30% to 50% better in terms of capability. This is probably the first tool that hasn't obviously made business sense right away for ASML.
大多数人认为ASML的每一次技术突破都会立即带来商业成功,但作者暗示高NA EUV机器可能是第一个在商业上不明显的进步。这与人们对ASML持续创新的预期相悖,暗示技术进步并不总是自动转化为商业优势。
They would be very happy to have a tool that does one wafer per hour and it costs them a fortune to run. They would build a fab with a thousand of those and be super happy with it.
大多数人认为效率低下、成本高昂的制造设备是失败的象征,但作者认为中国可能会接受效率极低的EUV设备,因为摆脱对西方技术的依赖是他们的首要目标。这挑战了传统制造业追求效率和成本效益的常识。
Microsoft's efficiency-first messaging surrounding the Maia 200 follows its recent trends of stressing the corporation's concern for communities near its data centers... taking great lengths to deafen the backlash to the AI boom.
大多数人认为科技巨头对AI环境影响的关注只是公关策略,但作者认为微软在Maia 200上强调的效率优势可能反映了其真正的战略转向。这一观点挑战了'企业环保声明仅为营销'的主流认知,暗示微软可能在将环保理念融入产品设计的道路上走在行业前列。
Microsoft claims the Maia 200 gives 30% more performance per dollar than the first-gen Maia 100, an impressive feat considering the new chip also technically advertizes a 50% higher TDP than its predecessor.
大多数人认为芯片性能提升必然伴随着功耗增加和成本上升,但作者认为微软在Maia 200上实现了性能每美元提升30%的同时,功耗仅增加50%,这挑战了AI芯片领域'性能提升必然伴随能耗大幅增加'的行业共识,暗示了架构优化的巨大潜力。
The Maia 200 does beat the B300 in efficiency, however... no outside customers can purchase the Maia 200 directly, the Blackwell B300 Ultra is tuned for much higher-powered use-cases than the Microsoft chip, and the software stack for Nvidia launches it miles ahead of any contemporary.
大多数人认为封闭专用的芯片架构会限制其市场竞争力,但作者认为微软的封闭策略反而成就了Maia 200在特定场景下的效率优势。这一观点挑战了'开放架构必然胜出'的传统认知,暗示在AI芯片领域,针对特定场景的定制化设计可能比通用架构更具优势。
Maia 200 is built on TSMC's 3nm process node, and it contains 140 billion transistors. The chip can hit up to 10 petaflops of FP4 compute, Microsoft claims, three times higher than Amazon's Trainium3 competition.
大多数人认为3nm工艺主要用于消费级高端芯片,且认为在AI领域Nvidia和AMD是无可争议的领导者,但作者认为微软通过自研Maia 200芯片,在相同工艺节点上实现了比亚马逊专用芯片高三倍的性能,挑战了云服务提供商只能作为芯片'购买者'而非'技术引领者'的行业共识。
The Maia 200 does beat the B300 in efficiency, however, a big win in a day where public opinion against AI's environmental effects is steadily mounting. The Maia 200 operates at almost half of B300's TDP (750W vs 1400W)
大多数人认为高性能AI芯片必然伴随着高能耗和散热挑战,但作者认为微软的Maia 200在提供强大计算能力的同时实现了惊人的能效优势,仅消耗Nvidia Blackwell B300 Ultra一半的功率。这一反直觉的发现挑战了AI领域'性能与能耗成正比'的传统认知,暗示了专用AI芯片架构设计的创新突破。
Such automation efforts may give Chinese automakers a significant edge in competitiveness as global EV adoption continues to rise—even as US automakers have already been retreating from EV production in the wake of the Trump administration's decisions.
大多数人认为美国在电动汽车技术和生产方面领先全球,但作者提出中国通过大规模自动化在电动汽车制造方面获得竞争优势,而美国反而正在退缩,这与美国科技霸权的主流认知相悖。
Technological development has the capability of making work safer for the working class and enabling workers to have a shorter work week without losing pay. But in the bosses' and billionaires' hands it's used to pad profits and lay off workers.
大多数人认为技术进步最终会造福工人阶级,创造更安全的工作环境和更短的工作周,但作者通过工会代表之口提出,技术实际上被资本家用来增加利润和解雇工人,挑战了技术必然带来福祉的主流观点。
Recent events highlight how important open source is to the AI ecosystem, with more nations and enterprises recognizing the risks and costs associated with exclusively depending on closed models.
大多数人认为封闭式AI模型因其专有技术和性能优势而更受青睐,但作者认为开源AI生态系统正变得越来越重要,因为各国和企业正在认识到完全依赖封闭模型的风险和成本,这挑战了AI行业向封闭系统发展的主流趋势。
For SpaceX, the deal is another sign that compute itself has become strategic currency in the AI race.
大多数人认为AI竞争的核心是算法和模型创新,但作者认为计算能力本身已成为AI竞赛的战略货币,因为SpaceX通过提供计算能力而非开发AI模型来参与AI竞赛,这挑战了人们对AI竞争核心要素的传统理解。
Reflection has leaned directly into that pitch as the startup, last valued at $25 billion, is trying to build American open-source AI models that can compete with frontier systems from OpenAI, Anthropic and Google.
大多数人认为AI领域由少数几家封闭式巨头主导,但作者认为开放源码AI模型能够与OpenAI、Anthropic和Google等前沿系统竞争,因为Reflection等公司正在构建能够匹敌这些巨头的开源模型,这挑战了AI领域由封闭系统主导的共识。
The deal shows how SpaceX is using its massive data center build-out after its record initial public offering.
大多数人认为SpaceX的核心业务是火箭和太空探索,但作者认为SpaceX已经转型为一家AI基础设施公司,因为该公司正在将其数据中心Colossus作为商业计算平台对外提供服务。这挑战了人们对SpaceX业务范围的传统认知。
The models are finally ready. Costs of inference are getting optimized with open models, and even on-device models.
大多数人认为AI领域仍然处于早期阶段,模型成本高且实用性有限,但作者认为模型已经'准备就绪',推理成本正在优化,这一观点暗示AI应用可能比大多数人预期的更快进入实用阶段,挑战了行业对AI成熟度的普遍认知。
most great products start out looking like a toy. In the early social days, people saw Twitter as a dumb site where people posted what they had for breakfast
大多数人认为成功的创业产品从一开始就应该展现明确的价值主张和商业潜力,但作者认为伟大的产品往往看起来像个玩具,这一观点挑战了传统产品评估标准,暗示我们应该重新审视那些看似简单或娱乐性的产品潜力。
They are native world-builders themselves. They came up playing Roblox and Minecraft, they have no preconceived limitations about what an app is, or what they can do with it.
大多数人认为Z世代和Alpha世代只是数字原生代,但作者认为他们实际上是'原生世界构建者',这暗示新一代用户不仅是技术的消费者,更是创造者,这将从根本上改变产品开发范式。这一观点挑战了传统用户画像的认知。
But two important things have changed, that have completely opened the world-building doors again.
大多数人认为消费者科技领域已经趋于饱和,创新空间有限,但作者认为AI和新生代用户行为正在重新打开世界构建的大门,这是一个与主流认知相悖的观点。作者暗示消费者科技领域正处于新一轮创新周期的起点,而非成熟期。
Opening up OAuth to all customers is an important step toward a broader Cloudflare app ecosystem
大多数人认为将关键安全功能如OAuth开放给所有用户会增加风险,但作者认为这种开放对于构建更广泛的生态系统至关重要,挑战了传统上'安全优先'的API设计理念,展示了以平台生态为中心的开放策略。
We gathered additional metrics during the database migrations, and observed considerable performance improvements after the upgrade was complete
大多数人认为大型系统升级主要关注功能更新和兼容性,但作者强调性能提升是升级的重要成果,API响应时间降低45%,内存使用减少14-40%。这种将性能提升作为主要成功指标的观点挑战了传统系统升级评估框架,展示了以性能为中心的工程价值观。
We chose an upgrade window when Hydra had the lowest request volume per second to minimize lost token writes
大多数人认为系统升级应该安排在低流量时段以最小化用户影响,但作者选择在请求量最低时升级以减少令牌写入丢失,这种优先考虑系统内部状态而非用户体验的思路与传统运维实践相悖,展示了独特的系统优化视角。
if a refresh token was reused, Hydra would invalidate the whole access and refresh token chain
大多数人认为重用刷新令牌应该只影响单个令牌,但作者指出新版本会撤销整个访问和刷新令牌链,这实际上提高了安全性但改变了客户端行为。这种严格的做法与大多数OAuth实现中更宽松的令牌重用策略形成对比,代表了更安全但可能破坏兼容性的设计选择。
we decided to do two smaller sequential upgrades rather than doing one large upgrade
大多数人认为系统升级应该一次性完成以减少复杂性,但作者认为分阶段升级更合适,因为这样可以逐步评估行为和性能变化,降低风险。这种渐进式方法与传统的'大爆炸式'升级策略形成鲜明对比,展示了更谨慎、更可控的工程思维。
Include AI-generated sexualized impersonation as a separate category in standard content reporting and appeal forms, distinct from 'harassment' or 'nudity.'
大多数人认为性化AI内容应归类为现有类别如骚扰或色情内容,但作者认为它需要独立分类,这挑战了当前内容审核系统的分类框架。这一观点承认AI生成内容的特殊性,暗示传统内容分类可能不足以应对新兴技术带来的新型伤害。
Meta said that when the content was flagged, the company had no indication that the individual depicted in the video was 'a real person' because they did not report the content.
大多数人认为平台应该依赖受害者举报来确认内容真实性,但作者质疑这一做法,暗示平台有责任主动识别AI生成的性化内容,即使没有受害者举报。这一观点挑战了当前平台责任边界的主流认知,要求平台承担更多预防性责任。
Broadening the signals of lack of consent in this way would especially benefit non-public figures who are the targets of non-consensual intimate imagery because it would reduce the burden on victims to report the abuse themselves.
大多数人认为保护非公众人物需要更多资源或特殊渠道,但作者认为只需扩大'缺乏同意'的信号范围就能减轻受害者负担,这挑战了需要复杂解决方案的常规思维。这一观点暗示平台可以通过简单的政策调整而非系统性改革来保护弱势群体。
The Board finds that AI-generated impersonation is non-consensual by default and should be added to the set of signals the company uses to establish lack of consent.
大多数人认为只有当真实受害者举报时才能确认内容是非自愿的,但作者认为AI生成的性化模仿默认就是非自愿的,这挑战了当前平台需要受害者主动举报才能采取行动的主流做法。这一观点将举证责任从受害者转移到了平台和内容创建者身上。
We would like to thank Deepseek-OCR, Deepseek-OCR-2, PaddleOCR for their valuable models and ideas.
大多数人认为在AI领域,新模型通常会明确指出其与之前工作的根本性区别。作者感谢多个现有OCR模型,但没有明确说明Unlimited-OCR与这些模型的根本性创新差异,暗示可能只是现有方法的组合而非真正的突破,这与AI领域通常强调创新性的文化相悖。
no_repeat_ngram_size= 35
大多数人认为OCR系统不需要特别处理n-gram重复问题,因为这主要在文本生成中重要。作者专门设置了no_repeat_ngram_size参数为35,表明他们的OCR系统需要防止长文本中的重复模式,这挑战了OCR只是简单提取文本而不需要处理文本生成特性的主流认知。
max_length= 32768
大多数人认为OCR模型处理的文本长度受限于模型架构,通常在几千词左右。作者设置的max_length高达32768,这远超传统OCR系统的处理能力,暗示了模型能够处理超长文档而不丢失上下文,挑战了OCR系统的长度限制认知。
Single image supports two configs: gundam or base
大多数人认为OCR模型需要针对特定任务或文档类型进行专门配置,但作者提出单个图像就能支持两种截然不同的配置('gundam'或'base'),这挑战了OCR系统通常需要针对特定场景进行专门配置的行业共识。
Welcome the Era of One-shot Long-horizon Parsing.
大多数人认为OCR技术需要针对不同类型的文档进行多次处理或微调,但作者声称Unlimited-OCR实现了'一次性长距离解析',这挑战了OCR领域需要多次处理的常规认知,暗示一个模型可以处理各种复杂文档而无需专门训练。
Continual improvements through a safety data flywheel, which continually learns from the road how to expand the set of operational design domains for safe deployment.
大多数人认为自动驾驶安全应该基于静态的、预先定义的操作设计域,但作者提出动态学习和扩展安全边界的'安全数据飞轮'概念。这一观点挑战了传统静态安全边界观念,暗示自动驾驶系统需要不断学习和适应新的安全场景,而非固定在一套预定义规则中。
NVIDIA is the first company accredited by ANAB for an inspection plan that combines cybersecurity, AI, and functional safety.
大多数人认为网络安全、AI功能和传统安全应该是分开评估的领域,但作者认为这三者必须结合评估才能确保真正的安全。这一观点挑战了行业传统做法,暗示单独评估每个安全维度无法捕捉现代自动驾驶系统的复杂风险交互。
A diverse AV stack that combines a modular stack and NVIDIA Alpamayo reasoning VLA models for algorithmic AI safety.
大多数人认为自动驾驶安全应该依赖于单一、确定性的算法来确保可靠性,但作者认为结合模块化堆栈和推理VLA模型的多样化方法才能实现真正的算法安全。这种观点挑战了行业对单一'最佳算法'的追求,提出多样性本身就是安全策略的一部分。
For Robotaxis, Safety Must Be Built In, Not Bolted On
大多数人认为可以在现有系统上添加安全功能来提高自动驾驶安全性,但作者认为安全必须内建于系统架构中,而不是后期添加。这种观点挑战了常见的'安全叠加'模式,暗示传统方法无法满足L4级自动驾驶的安全要求,需要从设计阶段就将安全作为核心要素。
NVIDIA Halos is a full-stack, comprehensive safety system that unifies safety elements across vehicle architecture, AI models, chips, software, tools, and services to ensure the safe development and deployment of autonomous vehicles (AVs) from cloud to car.
大多数人认为自动驾驶安全主要关注车辆本身和传感器,但作者认为安全需要从云到车的全栈统一,包括AI模型、芯片、软件和服务的全面整合。这种全栈安全观挑战了传统上认为安全可以分模块处理的行业共识,提出了一个更全面但也更复杂的安全框架。
The NVIDIA DSX reference design for AI factories has zero water consumption — we have eliminated massive amounts of power usage and pretty much all water usage.
大多数人认为数据中心是水资源消耗大户,但作者声称NVIDIA的AI工厂设计实现了零水消耗。这与人们对数据中心需要大量水资源进行冷却的传统认知相悖,提出了一个可能彻底改变数据中心水资源使用模式的创新方案。
In the right geographic location, with the right system design, you don't need any refrigeration equipment. You can just put big radiator coils outside and use the air temperature for all your cooling. It's incredibly efficient.
大多数人认为数据中心必须依赖复杂的制冷系统,但作者认为在适当地理位置,仅依靠外部空气温度和散热线圈就能实现高效冷却。这一观点挑战了传统数据中心必须配备复杂制冷系统的行业共识,提出了更简单、更节能的替代方案。
For agent products, that may matter more than raw benchmark scores.
大多数人认为AI模型性能的主要衡量标准是基准测试分数,但作者认为在长期交互中保持角色一致性(persona stability)比原始性能分数更重要。这一观点挑战了当前AI评估体系的共识。
The specific models Fugu selects and how it coordinates them are proprietary, so this routing information is not exposed by design.
大多数人认为AI系统的透明度和可解释性是建立信任的关键,但作者选择保持模型选择和协调机制的专有性,不公开这些信息。这种与行业透明度趋势相悖的做法挑战了AI系统可解释性的共识。
We never stack model fees; you are charged a single rate based on the top tier model involved.
大多数人认为使用多个模型的多智能体系统会叠加各个模型的费用,导致成本高昂,但作者提出了创新的定价模式,只收取最顶级模型的单一费率。这种颠覆性的定价策略挑战了传统多模型服务的商业模式。
Fugu models surpass publicly accessible frontier models and are shoulder-to-shoulder with Fable 5 and Mythos Preview in various rigorous engineering, scientific, and reasoning benchmarks while delivering frontier capability without the risk of export controls.
大多数人认为前沿AI模型性能的提升依赖于单一厂商的专有技术和更大规模的参数,但作者认为通过动态协调多种现有模型可以实现与顶级专有模型相当的性能,同时规避出口管制风险。这一观点挑战了当前AI发展路径的共识。
Instead of using domain knowledge to prescribe team organization, roles, or workflows, Fugu learns to dynamically assemble agents from a pool and coordinate them through non-obvious but highly efficient collaboration patterns.
大多数人认为多智能体系统需要预先定义的角色分工和工作流程,但作者认为Fugu系统能够自主发现并学习非直观但高效的协作模式,打破了传统AI系统设计中的预设框架思维。这种自组织能力挑战了当前多智能体系统设计的共识。
Raw output quality is on par with top frontier models, but Fugu showed unusually strong persona stability across long sessions, holding its identity where other models drift.
大多数人关注AI模型的输出质量,但作者强调Fugu模型在长时间会话中表现出异常强的角色稳定性(persona stability),而其他模型则容易出现角色漂移。这一观点将AI的个性稳定性置于传统性能指标之上,挑战了行业评估AI能力的标准。
Fugu Ultra is significantly better than GPT-5.5. It gives comprehensive answers and finds the bugs others miss. Where other tools flag about three issues, Fugu surfaced more than twenty.
大多数人认为OpenAI的GPT系列模型在代码审查等任务上处于领先地位,但作者声称他们的Fugu Ultra模型在代码审查方面显著优于GPT-5.5,能发现多出六倍以上的问题。这一直接挑战行业领导者地位的声明极具争议性。
the most powerful AI systems will not be isolated monoliths, but collaborative ecosystems.
大多数人认为AI发展的方向是构建越来越大的单一模型(monolith),但作者认为未来最强大的AI将是协作生态系统(collaborative ecosystems),因为单一模型无法满足现实世界中复杂任务所需的多样化专业知识。这一观点挑战了当前AI行业追求更大规模模型的共识。
The effects of early exposure to deoxyglucose persisted even when researchers removed the glucose-like molecule.
大多数人认为细胞代谢效应是可逆的,一旦干扰因素被移除,细胞应恢复正常状态。但作者发现,早期接触脱氧葡萄糖的影响即使在移除该分子后仍然存在,这挑战了人们对细胞代谢可逆性的传统认知,暗示可能存在某种'代谢记忆'现象。
helps sustain progress across long-running projects
大多数人认为AI在长期项目中效果会随时间递减,因为缺乏持续学习和适应能力,但作者暗示Codex能够帮助维持长期项目的进展。这与当前AI应用在长期项目中的实际表现相悖,暗示AI工具已经发展出支持持续工作的能力。
preserves context, manages complex workflows
大多数人认为AI工具在长期项目中面临的主要挑战是上下文丢失和记忆问题,但作者暗示Codex能够保持上下文并管理复杂工作流。这与当前AI工具在长期项目中的实际限制相悖,暗示技术已经发展到可以支持更持久的AI工作流。
break ambitious goals into verifiable steps
大多数人认为AI擅长处理整体目标和复杂任务,但作者暗示即使对于宏大的目标,也应该将其分解为可验证的步骤。这与当前AI应用中常见的'一次性解决复杂问题'的思路相悖,暗示长期项目需要更结构化的方法。
determine when to delegate execution to Codex versus when human oversight is most valuable
大多数人认为AI应该尽可能自动化以减少人工干预,但作者提出需要明确区分哪些任务应该完全交给AI,哪些需要人工监督。这与主流的'全面自动化'理念相悖,暗示在某些情况下人工监督可能比完全自动化更有价值。
How Codex helps work continue beyond a single prompt
大多数人认为AI工具主要适用于一次性任务或简单查询,但作者暗示Codex能够支持持续性的长期工作,这与当前主流认知相悖。大多数人认为AI需要不断重新初始化上下文,而作者则提出了'持久工作空间'的概念,暗示AI可以保持长期项目中的连续性。
Our models identified a 23-year-old use-after-free in OpenBSD's kernel implementation of System V semaphores.
大多数人认为长期存在的开源项目中的古老代码已经经过充分审查,不太可能存在严重漏洞,但作者认为AI能够发现人类安全专家在23年间都未识别出的关键漏洞。这挑战了人工代码审查的全面性假设。
Security engineers reviewed every finding before it reached a maintainer... While frontier AI models are highly capable of finding vulnerabilities and patching them, they also produce a high volume of false positives
大多数人认为AI可以直接替代人类安全专家进行漏洞评估,但作者认为即使是最先进的AI模型也会产生大量误报,仍需人类专家进行验证和过滤。这挑战了AI完全自主安全研究的可行性预期。
The completed setup took less than a day. Trail of Bits estimates that building the same lab manually would ordinarily take at least several weeks.
大多数人认为安全测试实验室的开发需要数周甚至数月的专业工作,但作者认为AI辅助可以在一天内完成同样的工作,效率提升了数十倍。这一反直觉的加速挑战了传统安全工程的时间框架预期。
Trail of Bits engineers found that, with limited guidance, GPT‑5.5‑Cyber made useful choices about where to expand coverage, which builds and entry points to probe, and which candidates were too weak to pursue.
大多数人认为AI模型需要大量精确指导才能有效工作,但作者认为GPT-5.5-Cyber仅凭有限指导就能自主做出明智的安全分析决策,因为它能够自主判断哪些测试路径有价值,哪些候选问题值得探索。这挑战了AI需要过度监督的常规认知。
When a connection breaks, you should be able to find out why. And an administrator should be able to decide, down to the individual tool, what is available in each part of the organization.
大多数人认为连接器故障排查应该简化,而工具访问控制应该采用更粗粒度的管理,但作者主张细粒度的故障诊断和工具级控制,这挑战了简化管理的行业趋势。
Automated work should run on behalf of a user or a service account, never impersonate the person who wrote it.
大多数人认为自动化任务应该以创建者的身份运行以便于调试和责任追踪,但作者坚决反对这种做法,认为自动化工作必须使用独立的服务账户,这挑战了常见的自动化身份管理实践。
Production connectivity has a few non-negotiables. A connector should respect two sets of rules at once: the permissions already set in the source platform, and the controls your administrators set in Mistral Studio or Vibe.
大多数人认为连接器应该简化权限管理,采用单一权限模型,但作者坚持双重权限控制,认为必须同时尊重源平台权限和管理员设置,这增加了复杂性但提高了安全性,挑战了简化权限的主流观点。
Async agents are moving into everyday work. For an agent to be trustworthy and useful inside an organization, it needs real enterprise data: CRM records, repositories, inboxes, knowledge bases.
大多数人认为AI助手应该先在受限环境中测试,然后再逐步接入企业敏感数据,但作者认为AI助手应该直接接入企业真实数据才能变得可信和有用,这挑战了传统AI安全部署的渐进式方法。
What varies is how much you layer on top. Use OCR 4 in pure extraction mode when you want to: Work directly with the raw response
大多数AI模型发布时强调其高级功能和API能力,但作者建议用户可以直接使用原始输出而非添加额外层,这挑战了行业默认添加更多AI处理步骤的趋势,提倡更简单的解决方案。
Models building their own software tools might have seemed outlandish not long ago, but it is happening. It would be unwise to rule out the same trajectory in hardware.
大多数人认为AI在硬件领域的自主发展和创新还很遥远,但作者认为AI在硬件领域可能遵循与软件工具相同的轨迹,因为软件工具的自主开发已经从看似荒谬变成了现实。这是一个挑战行业共识的观点,暗示了AI可能更快地实现对物理世界的直接控制。
We are plausibly entering the early era of physical agentic AI.
大多数人认为AI与物理世界的交互还需要很长时间才能实现,但作者认为我们正在进入物理智能代理AI的早期时代,因为AI已经能够独立操作现成的物理工具。这是一个与主流认知相悖的观点,暗示了AI与物理世界融合的速度可能比预期快得多。
This progress is not the result of a concerted effort to improve the robotics capabilities of our models. These improvements, like so many others in the history of LLM development, have emerged from much more general scaling.
大多数人认为AI在特定领域的进步需要针对性的优化和训练,但作者认为AI在机器人领域的进步主要来自于通用规模的扩大,而非专门针对机器人能力的改进。这与传统的AI发展理念相悖,暗示了AI能力可能具有不可预测的涌现特性。
it was as or more successful than both human teams while producing almost ten times less code than Team Claude.
大多数人认为AI模型需要编写大量代码才能完成任务,但作者认为AI模型能够以更少的代码实现相同甚至更好的结果,因为Opus 4.7成功完成了任务,而代码量仅为人类团队的十分之一。这挑战了编程领域的传统认知,即更多代码等于更多能力。
Claude Opus 4.7—operating without human assistance—was about 20 times faster than the fastest human team at all tasks completed by our participants less than a year ago.
大多数人认为AI在物理世界任务中仍然需要人类监督和指导,但作者认为AI模型已经能够独立完成复杂的机器人任务,并且速度远超人类团队,因为实验显示Opus 4.7在没有人类协助的情况下,比之前最快的人类团队快了20倍。这挑战了人们对AI在物理世界操作能力的普遍认知。
We now spend much more of our time delegating tasks to many Claudes in parallel.
大多数人认为AI会取代人类工作,导致失业,但作者认为AI实际上改变了人类工作方式,让人们转向更高层次的任务分配和管理。这挑战了关于AI与就业关系的传统叙事,表明AI可能创造新的工作形式而非简单替代人类。
Today, 65% of our product team's code is created by our internal version of Claude Tag.
大多数人认为AI辅助编程只是辅助工具,主要用于代码补全或简单任务,但作者认为AI已经成为主要代码生产者,因为内部版本已经完成了产品团队65%的代码生成。这挑战了人们对AI在软件开发中角色的传统认知,表明AI已从辅助工具转变为核心生产力工具。
Qualcomm Dragonfly AI300 joins the previously announced Qualcomm Dragonfly AI200 and AI250 in its data center solutions portfolio with an annual cadence AI accelerator roadmap
大多数人认为AI加速器的产品周期通常是2-3年,因为芯片设计和验证需要大量时间,但Qualcomm采用每年更新一代AI加速器的策略,这种快速迭代速度与传统半导体行业的长周期模式形成鲜明对比,暗示AI硬件市场正在加速创新周期。
HBC is designed to enable efficient scaling of AI agents to meet the demands of continuous reasoning, memory bandwidth, and real-time responsiveness
大多数人认为AI推理主要是GPU的领域,而CPU主要处理通用计算任务,但Qualcomm提出其HBC技术专门为AI代理的连续推理、内存带宽和实时响应需求而设计,这一观点挑战了CPU和GPU在AI工作负载中的传统分工,暗示未来计算架构可能更加专业化而非通用化。
> 2x better performance per watt estimate compared to existing product benchmarks for server CPU competitive offerings based on specs
大多数人认为在服务器CPU市场,Intel和AMD已经建立了难以逾越的性能和能效优势,但Qualcomm声称其新的Dragonfly C1000 CPU能提供现有产品基准两倍的每瓦性能,这一挑战直接针对数据中心CPU市场的主导者,暗示移动芯片巨头正在颠覆传统服务器市场格局。
AI300 with HBC Gen 2 is designed to enable another stepwise improvement with a 54x increase over AI200
大多数人认为AI芯片性能提升通常是渐进式的,每年大约20-30%的增长,但Qualcomm声称其AI300芯片相比前代AI200有54倍的内存带宽提升,这一指数级增长速度与行业常规认知相悖,暗示AI基础设施可能正在经历范式转变。
HBC is designed to enable a 6x increase in bandwidth per watt versus HBM compared to competing published product specifications normalized at card-level
大多数人认为高带宽内存(HBM)是AI加速器的最佳选择,但Qualcomm声称其新的高带宽计算(HBC)技术能在每瓦带宽上提供6倍的提升,这一性能优势挑战了当前数据中心AI加速器的行业共识,暗示传统HBM技术可能面临被颠覆的风险。
Hyundai still plans its robot army, and 2028 is close. The strike vote does not stop that. It does force a question the whole industry has dodged: when a robot can do the job, who gets to say yes?
大多数人认为工会罢工会阻止或延缓机器人技术的采用,但作者认为罢工实际上加速了一个关键问题的浮现:当机器人能够胜任工作时,谁有权决定是否使用它们。这表明罢工不是简单的对抗,而是推动整个行业重新思考自动化决策机制的过程。
The union has drawn a hard line. 'Not a single humanoid robot will be allowed on the production lines without a labour-management agreement,' it said. It wants a veto, not a briefing.
大多数人认为工会会抵制机器人技术以保护现有工作岗位,但作者提出了一个更激进的解读:工会实际上是在寻求对自动化决策的否决权,而不仅仅是被动抵抗。这表明工人正在主动争取对工厂未来的控制权,而不仅仅是保护现状。
The union wants guarantees on jobs and working conditions as Hyundai adds AI and robots. That issue never appeared in past wage rounds.
大多数人认为工会主要关注工资和工作条件等传统议题,但作者认为工会已经将机器人引入作为核心谈判点,因为机器人威胁到了工人的根本就业安全。这表明工会已经从被动接受技术转变为主动要求对自动化决策的控制权。
Our customers are recognizing that supply shortages in memory and storage will take considerable time to improve, even as we expect industry supply to improve gradually in 2028.
大多数人认为供应链问题通常是短期现象,会随着产能扩张而迅速解决。然而美光CEO暗示内存短缺将持续到2028年,这种长期短缺预期挑战了人们对科技行业供应链弹性的传统认知,表明AI驱动的需求增长可能已经改变了行业基本动态。
Memory prices have skyrocketed in the last couple years as AI chips eat up all the production capacity of the small crop of vendors.
大多数人认为技术进步通常会导致价格下降,但内存市场的现状完全相反。AI需求导致内存价格飙升,打破了传统科技产品价格随时间下降的规律,这表明在特定技术变革时期,稀缺性可以完全改变市场动态。
When completed, we expect approximately half or more of our company revenue to be under these strategic customer agreements
大多数人认为科技公司倾向于保持灵活的供应链和短期合同,以便快速适应市场变化。然而美光正转向长期绑定客户,将一半以上的收入锁定在3-5年的协议中,这表明在内存短缺环境下,供应商反而获得了前所未有的市场主导权。
The goal is to move beyond using models to find more vulnerabilities, towards a world of safer software and cyber resilience.
大多数人认为AI在安全领域的主要价值是提高漏洞发现的数量和速度,这是行业共识。但作者明确表示,他们已经超越了这一阶段,现在更关注的是提高软件的安全性和网络弹性,这反映了安全思维的根本转变。
Frontier defensive capabilities should not be concentrated in the hands of a few. Software touches all aspects of life, from critical infrastructure to business applications and government networks.
大多数人认为尖端防御能力应该首先集中在大型组织或政府机构手中,因为这些资源最丰富。但作者主张最先进的防御能力应该民主化,让所有组织都能获得,这一观点挑战了资源集中化的传统安全思维。
As AI makes it possible to find and patch more vulnerabilities faster, it also creates more work for maintainers, who need to sift through thousands of reports, many of which are low-quality false positives.
大多数人认为AI在安全领域的应用只会减轻维护者的工作负担,因为AI能自动处理更多任务。但作者指出,AI实际上给开源维护者创造了更多工作,因为他们需要处理大量低质量的误报,这一反直觉观点揭示了技术进步可能带来的意外负担。
Vulnerability reports, on their own, do not protect anyone. The value comes from validating the issue, understanding its impact, developing and testing a patch, coordinating disclosure, and helping teams deploy the fix.
大多数人认为发现并报告漏洞本身就提供了安全价值,但作者明确表示,单纯的漏洞报告并不能保护任何人。这一观点挑战了安全行业普遍重视漏洞数量而非修复质量的共识,强调了从发现到修复的完整流程才是真正有价值的部分。
The bottleneck historically has been finding vulnerabilities, but now defenders are overwhelmed with the number of vulnerabilities found. Instead, the bottleneck is now patching vulnerabilities.
大多数人认为网络安全的主要挑战是发现漏洞,因为传统上找到安全漏洞需要专业知识和时间。但作者认为,随着AI加速了漏洞发现过程,现在的主要瓶颈已经转变为修复漏洞,因为发现的漏洞数量已经远超防御者的处理能力。
Public reaction on the ClaudeAI subreddit appears to be split into roughly three camps. The majority see the story as an indictment of the government's cybersecurity, citing its inability to hire the required level of talent and its history of leaks. A second large group is skeptical of the claim, considering it sensationalist or even an Anthropic marketing stunt.
大多数人认为公众对AI威胁的反应要么是恐慌要么是怀疑,但作者揭示了更复杂的公众认知分化。这种非二元化的反应模式挑战了公众对AI安全议题的简单化认知,暗示社会对AI能力的评估正在形成多元但对立的观点。
The Financial Times reported earlier in June that roughly six Anthropic engineers are embedded directly inside the agency as forward-deployed staff, adapting and customizing Mythos for specific operational applications, with sources indicating the work could extend to infiltrating networks operated by countries including China and Iran.
大多数人认为政府限制AI模型是出于安全考虑,防止其落入敌对势力手中,但作者指出NSA实际上正在内部利用这些AI模型进行潜在的网络渗透活动。这种矛盾挑战了政府政策的一致性,暗示国家安全考量可能具有双重标准。
Anthropic contends that the cited breach was a narrow jailbreak, one that rival models, including OpenAI's GPT-5.5, also exhibit. According to the company, the flagged behavior amounted to asking the model to analyze a codebase and fix identified issues, which revealed a few minor, already known bugs, rather than a genuine autonomous offensive intrusion.
大多数人认为AI已经能够自主发现和利用未知漏洞进行高级攻击,但作者认为所谓的'突破'实际上只是对已知代码的常规分析,这挑战了公众对AI威胁严重性的认知。这种观点与普遍认为AI已具备自主攻击能力的看法相悖,暗示可能存在夸大其词的情况。
The story sheds light on the June 12 U.S. government directive barring all foreign nationals, including Anthropic's own non-citizen employees, from accessing the Fable 5 and Mythos 5 models, citing national security concerns.
大多数人认为政府限制AI模型访问是出于对技术本身风险的担忧,但作者暗示这一禁令实际上是对AI模型已展示出惊人渗透能力的直接反应。这挑战了公众对政府限制AI的动机认知,暗示真正的威胁不是理论上的,而是已被证实的实际能力。
The competitive context surrounding this launch is unusually favorable for Alibaba, and it is worth understanding why. OpenAI's Sora... was discontinued... ByteDance's Seedance 2.0... indefinitely postponed the international launch
大多数人认为AI视频生成市场竞争激烈且参与者众多,但作者认为Alibaba实际上面临的是'竞争对手已退场'的独特局面。这挑战了'AI领域永远存在激烈竞争'的主流认知,表明市场有时会出现结构性真空,让原本处于劣势的玩家获得意外优势。
Alibaba's global push is unfolding under significant geopolitical headwinds that enterprise buyers cannot afford to ignore. The Pentagon added Alibaba, along with BYD and Baidu, to its list of Chinese military companies on June 8
大多数人认为地缘政治紧张会阻碍中国科技公司在西方市场的扩张,但作者认为尽管被五角大楼列为中国军事公司,Alibaba的AI视频模型仍能在全球排名中上升至第二位。这挑战了'地缘政治紧张必然导致技术孤立'的主流认知,表明技术实力和市场机遇有时能够超越政治障碍。
OpenAI's Sora web and app experiences were discontinued on April 26, with the Sora API set to follow on September 24. The shutdown came after the product proved financially untenable: Sora cost roughly $1 million per day to operate but generated only about $2.1 million in total revenue
大多数人认为顶级AI模型应该具有商业可行性,但作者认为即使是OpenAI这样的大公司,其旗舰视频生成产品Sora也因财务不可持续而失败,这表明AI领域的商业挑战比普遍认知更为严峻。AI技术实力并不直接转化为商业成功,这挑战了'技术领先必然带来市场成功'的主流认知。
The fact that these smart glasses truly looked like ordinary glasses you wouldn't be ashamed of wearing was a simple but inspired design choice.
大多数人认为智能眼镜的外观设计是技术限制下的妥协,但作者将其描述为'inspired design choice'(灵感设计选择),暗示这种看似普通的设计实际上是深思熟虑的战略决策,而非无奈之举。
A conspiracy theorist might wonder if removing the Ray-Ban branding is an attempt by EssilorLuxottica to distance itself from Meta. Not quite.
大多数人认为Meta去Ray-Ban品牌化是为了与Meta的隐私丑闻保持距离,但作者暗示这并非EssilorLuxottica的意图,因为眼镜上仍保留其名称。这挑战了公众对品牌合作关系的普遍认知。
Only the iPhone Air, iPhone 17 Pro, and the iPhone 17 Max will have all the fixings, like more varied voice options. As for the rest of the lineup: Every iPhone 16 and iPhone 17 model will be able to run the new Siri, while only the iPhone 15 Pro and Pro Max will be compatible.
大多数人认为苹果会通过软件更新让所有兼容设备都能获得完整的AI功能,但作者指出苹果将Siri AI的完整功能限制在特定高端机型上,这挑战了苹果过去通过软件更新让旧设备获得新功能的传统做法。这种策略暗示了AI功能可能与硬件限制紧密相关,而非纯粹的软件升级。
At WWDC 2026, Apple repeatedly referenced its privacy-preserving approach to Siri AI. As part of the company's Private Cloud Compute, Apple claims it doesn't store data from users and only pulls from it when you ask Siri a question.
大多数人认为大型科技公司提供的AI服务必然会收集和存储用户数据以改进产品,但作者指出苹果声称其Siri AI采用隐私保护设计,只在用户提问时才访问数据。这一声明挑战了当前AI行业普遍依赖数据收集的做法,暗示苹果可能找到了一种既能提供AI功能又能保护隐私的新模式。
The Trump administration has been happier talking to Anthropic lately, according to people familiar with the matter
大多数人认为特朗普政府与科技公司的关系一直处于紧张状态,尤其是在AI监管方面,但这里暗示政府与Anthropic的关系有所改善,这挑战了人们对特朗普政府与科技行业关系的刻板印象,表明即使在强硬的监管立场下,政府仍可能与某些科技公司建立工作关系。
At high-stakes meetings with the White House, Anthropic's cofounder—a "weirdo," per one official—has been replaced by cofounder Tom Brown.
大多数人认为政府官员会以专业和尊重的态度对待企业高管,但这里引用的'weirdo'描述表明政府官员私下对Amodei有负面看法,这种非正式的负面评价影响政府关系的方式与公众对官方外交的期望相悖,揭示了政治互动中非正式评价的影响力。
The Trump administration has been happier talking to Anthropic lately, according to people familiar with the matter: They don't have to deal with CEO Dario Amodei anymore
大多数人认为政府与企业高管之间的互动是基于正式的官方渠道和职位身份,但这篇文章暗示特朗普政府更愿意与Amodei的联合创始人Tom Brown而非Amodei本人进行谈判,这表明政府可能更看重个人关系而非官方职位,这在政治与科技行业的关系中是一个非传统的观点。
The AI arms race between China and the US has researchers on both sides worried about a "Chernobyl moment."
大多数人认为中美AI竞争是零和博弈,一方领先就意味着另一方落后。但作者认为中美AI专家实际上共同担忧AI失控风险,这暗示两国在AI安全领域存在潜在合作空间,而非纯粹对抗关系。这种观点挑战了地缘政治常规思维。
These departures are part of a concerning trend for Google. Last week, legendary AI researcher Noam Shazeer announced that he was leaving Google for OpenAI.
大多数人可能认为Google的AI人才流失是暂时现象或个别案例,但作者将其描述为'令人担忧的趋势'。这挑战了'科技巨头偶尔的人才流失是正常现象'的普遍认知,暗示Google可能面临更深层的人才战略问题。
Just days after Shazeer made his announcement, Google DeepMind director John Jumper said he was leaving Google for Anthropic. Alongside DeepMind CEO Demis Hassabis, Jumper won the 2024 Nobel Prize in Chemistry for his work on AlphaFold.
大多数人认为获得诺贝尔奖的科学家会留在资源充足的Google DeepMind继续其开创性工作。但作者指出John Jumper正离开加入Anthropic,这挑战了'顶级科学家优先选择最大平台'的假设,表明即使是最杰出的研究人员也可能被其他因素吸引。
Last week, legendary AI researcher Noam Shazeer announced that he was leaving Google for OpenAI. Shazeer had been at Google since 2000, save for the three years he spent building his controversial chatbot startup, Character.AI.
大多数人认为像Noam Shazeer这样的传奇AI研究员会长期留在Google,特别是考虑到他在公司长达23年的历史。然而作者指出他正离开加入OpenAI,这挑战了'忠诚度和长期服务会在大科技公司获得更高回报'的普遍认知。
Jonas Adler and Alexander Pritzel are leaving Google for Anthropic, according to Bloomberg. Per the report, Adler and Pritzel played key roles in the development of Google's Gemini model.
大多数人认为顶级AI人才会留在资源丰富的科技巨头如Google,但作者指出关键研究人员正离开Google转向竞争对手Anthropic。这挑战了'大公司才能吸引和留住顶尖人才'的共识,暗示即使拥有Gemini这样的先进项目,Google仍面临人才流失问题。
The cost of tokens has thrown into doubt the AI business model — as evidenced by what's being called the 'AI selloff' which has battered some AI-dependent businesses the last few days, especially memory chip makers.
大多数人认为AI技术将创造新的商业模式和巨大商业价值。但作者认为token成本已经动摇了AI商业模式的可行性,甚至导致AI相关企业股票下跌。这与市场对AI技术普遍乐观的看法形成鲜明对比。
Gemini already excels at function calling and using built-in tools like Search and Maps grounding. With built-in computer use capability, developers can now use 3.5 Flash to reliably build custom agents that can see, reason and take action across browser, mobile and desktop environments.
大多数人认为AI代理需要专门的模型和架构来处理跨平台任务,但作者认为将计算机使用功能集成到现有模型中就能实现这一目标。这挑战了构建复杂AI代理需要完全重新设计系统的观点,强调了现有模型扩展的可能性。
Previously only available as a standalone Gemini 2.5 computer use model, computer use is now integrated natively in the main Gemini Flash model.
大多数人认为高级AI功能应该作为独立模块提供以确保最佳性能和控制,但作者认为将计算机使用功能直接集成到主模型中反而能提供更好的性能。这挑战了模块化设计在AI开发中的主流做法。
Computer use is now a built-in tool supported in Gemini 3.5 Flash, delivering our best performance yet for agentic computer use tasks.
大多数人认为AI模型需要专门的计算机使用功能才能执行复杂任务,但作者认为这种功能现在可以作为内置工具集成到主模型中,因为3.5 Flash已经能够可靠地构建跨平台代理。这挑战了AI需要专门模块处理计算机交互的传统观念。
Everyone loves a bad boy, right? Everyone’s like, “It’s the most powerful model, even Trump says so. Of course, I’ve got to get my hands on it.”
大多数人可能认为Anthropic的困境会对其声誉造成负面影响,但作者提出了一种观点,即这种困境可能会增加人们对Anthropic模型的兴趣。
Cynically, it’s like: Okay, are you just pausing Anthropic so that others can catch up to where Anthropic was?
大多数人认为政府的行动是为了保护国家安全,但作者提出了一种讽刺的观点,即这可能只是为了让其他公司赶上Anthropic。
They’ve all signed an open letter to ask Trump to revoke the order, and they say it’s actually dangerous to have to pull these advanced cybersecurity capabilities from network defenders in the U.S.
大多数人认为政府对Anthropic的出口管制是为了国家安全,但作者指出,网络安全专家认为这是危险的,因为这将削弱美国的网络安全能力。
Anthropic has not had the best relationship with the Trump administration in a way that stands apart from the other leading AI labs
大多数人认为特朗普政府对所有AI实验室的态度是一致的,但作者指出Anthropic与特朗普政府的关系特别紧张,这与其他领先的AI实验室不同。
John Jumper, who shared a recent Nobel Prize in chemistry, announced Friday that he’s making the leap to Anthropic after “nearly 9 years” at Google DeepMind.
大多数人认为获得诺贝尔奖的科学家会留在知名机构,但John Jumper选择离开DeepMind加入竞争中的Anthropic,这可能表明他对新公司的创新方向和潜力有更深的信心。
This acquisition is a direct response to both of their problems, though it still does not guarantee success in such a competitive field.
大多数人认为SpaceX收购Cursor是一个明确的战略胜利,但作者实际上对这次收购持谨慎态度,指出这只是一个对双方问题的直接回应,而非保证成功。这种观点挑战了科技收购通常带来的积极预期,暗示即使是像SpaceX这样的巨头也可能在竞争激烈的AI领域面临失败风险。
This is a marriage between two companies that have arguably been falling behind in the AI race.
大多数人认为SpaceX和Cursor都是各自领域的领先者,但作者认为这两家公司实际上都在AI竞赛中落后了。SpaceX的Grok聊天机器人充满争议,缺乏有竞争力的编程模型;而Cursor虽然有优秀人才和产品,但在计算能力上无法与大型公司竞争。这种'失败者联姻'的叙事与主流科技公司收购叙事形成鲜明对比。
By handling the specific invalid behavior instead of rejecting the entire trajectory, this approach helps prevent the training instability and model collapse that can happen when rollouts are abruptly stopped.
大多数人认为在AI训练中发现不良行为时应立即终止整个训练轨迹,但作者认为应该处理特定无效行为而非拒绝整个轨迹。这一观点挑战了AI训练中的'一刀切'方法,表明更精细化的行为管理可以防止训练不稳定和模型崩溃,从而提高训练效率。
As a limited-time promotion through the end of September, off-peak usage is billed at 1×. (Peak hours are 14:00–18:00 UTC+8 (Beijing Time) daily).
大多数人认为AI模型定价应该基于模型大小或性能,而非使用时间,但作者认为基于时间段的差异化定价是合理的策略。这一观点挑战了AI服务定价的行业惯例,暗示通过时间差异化管理可以有效平衡计算资源使用并提高系统效率。
We find that GLM-5.2 shows more potential hacking behavior than GLM-5.1. This makes the verification signal easy to optimize, but fails to actually improve the fundamental capabilities of the model.
大多数人认为模型能力的提升总是伴随着更好的性能表现,但作者认为GLM-5.2虽然表现出更多的潜在黑客行为,但这实际上并未提升模型的基本能力。这一观点挑战了'更高的性能分数总是意味着更好的模型能力'的主流认知,暗示在AI训练中存在过度优化指标而忽视实际能力提升的问题。
On Terminal-Bench 2.1 (81.0) it lands within a few points of Claude Opus 4.8 (85.0) — while staying ahead of Gemini 3.1 Pro.
大多数人认为开源模型与顶级闭源模型之间存在巨大差距,但作者认为GLM-5.2在终端基准测试中已经接近Claude Opus 4.8的性能,甚至超过了Gemini 3.1 Pro。这一观点挑战了AI领域'闭源模型遥遥领先'的行业共识,表明开源模型在特定编码任务上已经能够与顶级商业模型竞争。
GLM-5.2 is the highest-ranked open-source model, showing that its 1M context has translated into practical long-horizon delivery capability.
大多数人认为开源模型在长距离任务能力上必然落后于闭源模型,但作者认为GLM-5.2作为开源模型已经实现了实际的长距离任务交付能力,甚至在某些基准测试中超过了GPT-5.5等闭源模型。这一观点挑战了AI领域'闭源模型必然优于开源模型'的主流认知,表明开源模型在特定任务上已经能够达到商业级别的性能。
As GLM-5.2 extends the maximum context length from 200K to 1M tokens, coding workloads are expected to shift substantially toward longer prompts. This shifts the primary inference bottleneck from computation to KV-cache capacity, long-context kernel overhead, and CPU-side overhead.
大多数人认为随着上下文长度的增加,计算复杂度会成为主要瓶颈,但作者认为实际瓶颈在于KV缓存容量和CPU开销。这一挑战了AI领域'计算复杂度是主要瓶颈'的共识,表明在长上下文场景中,内存管理和系统优化可能比算法优化更重要。
GLM-5.2 delivers substantially stronger agentic coding performance than GLM-5.1 at comparable token budgets, with its capability roughly positioned between Claude Opus 4.7 and Claude Opus 4.8 under similar token consumption.
大多数人认为模型性能提升主要来自参数量的增加或训练数据的扩大,但作者认为GLM-5.2通过引入'努力级别控制'机制,在相同token预算下实现了显著性能提升。这一观点挑战了AI领域'性能提升必然需要更多计算资源'的共识,表明优化推理过程可能比单纯扩大模型规模更有效。
GLM-5.2 is the highest-ranked open-source model, showing that its 1M context has translated into practical long-horizon delivery capability.
大多数人认为开源模型在长上下文任务上会显著落后于闭源模型,但作者认为GLM-5.2不仅达到了实用水平,还在多个基准测试中超越了GPT-5.5等顶级闭源模型。这一挑战了AI领域'闭源必然优于开源'的共识,表明开源模型在特定任务上可以实现甚至超越闭源模型的性能。
could be reactionary, retaliatory, or both
作者暗示政府决策可能不是基于客观的技术评估,而是出于政治动机或报复。这与人们通常期望的政府基于证据和专业知识做决策的假设形成鲜明对比,暗示政治因素可能在技术监管中扮演了不成比例的角色。
the message is clear: The AI industry isn't immune from U.S. government interference
大多数人可能认为AI技术的前沿性质使其能够规避传统监管框架,但作者认为政府的禁令明确传递了一个信息:即使是尖端AI技术也不能摆脱政府干预。这与科技行业自认为能够自我监管的普遍认知相悖。
The AI industry isn't immune from U.S. government interference
虽然许多人认为AI行业相对独立于传统政府监管,但作者明确表示AI行业并非免疫于政府干预。这一观点挑战了科技行业自主性的主流叙事,暗示AI公司可能面临与传统行业类似的政府压力。
The Trump administration's decision that forced Anthropic to pull its latest cybersecurity models could be reactionary, retaliatory, or both
大多数人认为政府决策是基于技术评估和国家安全考量,但作者暗示这可能是一种报复性行为或反应性措施,而非基于技术本身的价值判断。这与人们对政府决策过程的常规理解相悖。
The US government's Anthropic models ban was never about an AI jailbreak
大多数人认为政府禁止Anthropic的AI模型是出于安全考虑,特别是担心AI越狱风险,但作者认为这并非真正原因。这是一个非共识观点,挑战了公众对政府监管AI的普遍理解。
Restoring global trust in American AI is another thing entirely. No matter how long the shutdown lasts, it shined a light on how fragile access to US frontier AI models is.
大多数人可能认为美国AI技术的优势地位是稳固的,但作者认为,这次事件暴露了美国AI访问权的脆弱性,可能永久性地损害了全球对美国AI技术的信任。这一观点挑战了美国AI技术主导地位的稳固性假设。
Most governments and businesses cannot come close to matching the scale and resources of frontier labs in the US or China. But sovereign AI does not always mean building the biggest or the most powerful tools.
主流观点认为AI主权意味着要在所有领域与美国和中国竞争,但作者认为,真正的AI主权不在于复制美国的规模,而在于发展符合本国战略需求的特定能力。这一观点挑战了AI发展必须追求规模和通用能力的共识。
But sovereign AI does not always mean building the biggest or the most powerful tools. France's Mistral and Canada's Cohere show that solid efforts can come from outside these countries, even if the models can't stand toe to toe.
大多数人认为只有拥有与美国和中国相当规模和资源的国家才能开发有竞争力的AI模型,但作者认为,较小国家可以通过专注于特定领域或本地化需求来建立有意义的AI主权,即使这些模型在通用能力上无法与美国最前沿的模型抗衡。
Anthropic apologized to customers for a 'disruption' that it said is the result of a 'misunderstanding'
大多数人认为政府指令和企业合规是严肃的法律问题,但作者暗示Anthropic认为这次关闭是政府与企业之间的'误解',这挑战了政府行动的合法性和必要性,暗示可能存在政治因素而非纯粹的安全考虑。
If this standard was applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.
大多数人认为政府的安全审查是合理的预防措施,但作者认为这种标准如果普遍应用,实际上会停止整个行业的前沿模型部署,这暗示了政府安全标准可能过于严苛,阻碍了AI创新和技术进步。
We stand by this defense in depth strategy. It reduces the risks posed by Fable, making them comparable to the risks of existing models already deployed across the industry.
大多数人认为深度防御策略只是临时措施,不足以应对AI安全威胁,但作者认为这种策略已经将Fable的风险降低到与行业现有模型相当的水平,挑战了对AI安全需要完美解决方案的主流认知。
We have found that other publicly-available models are able to discover them as well without requiring a bypass.
大多数人认为Fable 5的漏洞是独特的严重问题,但作者认为其他公开可用的模型无需绕过就能发现这些漏洞,这挑战了Fable 5存在特殊安全风险的认知,暗示政府反应过度。
If this standard was applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.
大多数人认为政府对AI模型的安全监管是必要的保护措施,但作者认为如果这种标准(因发现狭窄的潜在越狱就召回商业模型)在整个行业应用,将基本上停止所有前沿模型提供商的新模型部署。这是一个挑战AI监管共识的观点。
The potential jailbreaks that have been disclosed to us are either entirely benign responses or are minor findings that provide no Mythos-specific uplift.
大多数人认为政府发现的AI模型漏洞应该是严重的安全威胁,但作者认为被披露的潜在越狱要么是完全良性的响应,要么是次要发现,没有提供Mythos特有的提升。这挑战了政府对AI安全威胁严重性的主流认知。
We suspect that perfect jailbreak resistance is not currently possible for any model provider.
大多数人认为AI模型应该能够被设计成完全无法被'越狱'的,但作者认为完美越狱抵抗目前对任何模型提供商来说都是不可能实现的,因为所有行业使用的安全措施都容易受到非通用越狱的攻击。这是一个挑战AI安全领域常识的论点。
This action does not adhere to those principles.
大多数人认为政府有权出于国家安全考虑暂停AI模型访问,但作者认为政府的行动缺乏透明度、公平性和技术依据。这挑战了政府监管的权威性,暗示即使出于国家安全,政府行为也应受到严格约束。
We stand by this defense in depth strategy. It reduces the risks posed by Fable, making them comparable to the risks of existing models already deployed across the industry.
大多数人认为发现新模型的漏洞意味着其风险高于现有模型,但作者认为通过深度防御策略,Fable的风险与现有模型相当。这挑战了人们对新技术风险更高的普遍认知,暗示新模型不一定比旧模型更危险。
We suspect that perfect jailbreak resistance is not currently possible for any model provider.
大多数人认为AI公司应该追求完美的安全防护,但作者坦承完美防护是不可能的。这挑战了AI安全领域的期望,即公司应该能够完全防止其模型被滥用,转而采用更现实的防御策略。
If this standard was applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.
大多数人认为政府应该严格监管AI模型以确保安全,但作者认为这种监管标准会阻碍整个行业的发展。这挑战了监管与安全平衡的常规认知,暗示过度监管可能扼杀创新。
We have found that other publicly-available models are able to discover them as well without requiring a bypass.
大多数人认为发现AI模型的漏洞是严重的安全问题,需要立即采取措施,但作者认为这些漏洞在其他公开模型中也存在,暗示政府的反应过度。这挑战了AI安全领域的共识,即任何漏洞都应被视为重大威胁。
apparent hallucinations
大多数人可能认为AI的'幻觉'主要是在创意生成或虚构内容中出现的问题。但作者使用'apparent'一词暗示,这些错误可能并非明显的虚构,而是以看似可信的方式出现,这挑战了人们对AI错误类型的认知,表明AI错误可能更加隐蔽且难以识别,即使在专业领域也是如此。
KPMG pulls report on AI usage due to apparent hallucinations
主流观点认为大型专业咨询公司如KPMG应该有严格的事实核查流程,能够确保发布报告的准确性。然而,这个标题暗示即使是顶级专业机构也可能被AI的'幻觉'误导,这挑战了人们对专业机构质量控制能力的信任,表明AI错误可能比我们想象的更普遍且更具欺骗性。
Once again, AI proves to be an unreliable source of information about AI.
大多数人认为随着AI技术的发展,它应该越来越可靠,尤其是在分析自身领域的数据时。但作者通过KPMG撤回报告的案例,提出了一个反直觉的观点:即使是专业的AI系统也可能在分析AI相关数据时产生严重错误,这暗示了AI自我评估的不可靠性,挑战了人们对AI技术自我完善能力的普遍认知。
Amazon CEO Andy Jassy may have been the source of security concerns that led Anthropic to cut off worldwide access to two models on Friday.
大多数人认为大型科技公司CEO通常推动技术开放和广泛访问,但这里暗示亚马逊CEO Jassy可能对Anthropic的AI模型提出了安全担忧,导致这些模型被限制访问。这挑战了科技领袖总是倡导技术开放的常规认知,表明即使是科技巨头的高管也可能采取保守立场。
Early wins lower the cost of the next raise. Cheaper capital funds bigger bets. Bigger bets produce bigger wins.
大多数人认为融资成本主要受市场环境和公司规模影响,但作者认为早期成功才是降低后续融资成本的关键因素。这挑战了传统融资观念,暗示创始人应该优先考虑小规模但可展示的成功,而非大规模扩张,这是一个非主流的融资策略观点。
Despite raising 25x more than the typical founder, Musk retained ownership in the top decile.
大多数人认为筹集更多资本必然导致创始人股权被大幅稀释,但作者认为马斯克是个例外,他筹集的资金远超普通创始人,却仍能保留前10%的股权。这挑战了传统认知中'融资越多,股权越少'的常识,展示了个人品牌和成功轨迹如何创造独特的资本优势。
Shouldn't AI be smart enough to know better itself? Sounds like marketing hype.
大多数人可能认为AI应该具备足够智能来避免被用于有害目的,但评论者质疑这种假设,暗示AI的自我限制能力被过度营销夸大,反映了公众对AI能力的期望与实际技术能力之间的差距,以及对AI行业营销策略的怀疑。
A less cynical take - Anthropic's policy for Claude Fable had unintended consequences. They tried a less invasive method of differentiating by reading intent of the user in the prompt - an unfortunate tradeoff that spoils AI research.
大多数人可能认为Anthropic的政策是故意设置障碍来阻止竞争,但评论者认为这可能是一个本意良好但执行不当的尝试,通过读取用户意图来区分不同用途,结果却无意中阻碍了AI研究,这暗示了企业安全措施与研究自由之间的复杂平衡。
The company changed course after the move received significant backlash from the AI research community.
大多数人认为企业政策变更主要是出于商业考量或监管压力,但Anthropic的这次政策反转主要是由研究社区的强烈反对驱动的,这表明在AI领域,学术和研究界的道德影响力可能比商业利益更能影响企业决策。
Anthropic is backtracking on a policy that would have covertly limited competitors from using its new AI model, Claude Fable 5, to develop other AI models.
大多数人认为AI公司应该鼓励开放创新和竞争,但Anthropic原本的政策实际上是在暗中限制竞争对手使用其技术发展其他AI模型,这与开源精神和AI行业的协作理念背道而驰,显示出企业利益与行业公共利益的冲突。
An agent breaks all of those assumptions. It reasons, it improvises, and it can be hijacked by a single sentence buried in a document it was asked to read.
大多数人认为AI安全可以基于传统网络安全框架来构建,但作者指出AI智能体从根本上打破了这些安全假设。这一观点挑战了网络安全领域的传统思维,表明需要全新的安全范式来应对AI智能体的推理能力、即兴创造性和对简单指令的脆弱性。
Shah thinks we have a few more months to go before agents are deployed throughout the economy in numbers that make potential risks a real concern.
大多数人认为AI智能体的广泛部署还需要数年时间,但作者认为只有几个月的时间窗口。这一时间框架的急剧缩短挑战了行业对AI技术采用速度的普遍预期,暗示技术变革的速度可能远超人们的想象,紧迫性被大大低估。
The main issue is that there just isn't really a field of research for multi-agent safety yet. And we would like there to be.
大多数人认为AI安全研究已经涵盖了多智能体系统,但作者认为这是一个全新的研究领域,表明当前AI安全研究存在明显空白。这挑战了人们对AI安全研究现状的认知,暗示了现有研究框架可能不足以应对即将到来的多智能体交互挑战。
Leitersdorf thinks the consistency issue might be partially solved in the model's next version, which will allow users to start generating worlds based on a video of an environment rather than an image.
大多数人认为AI世界模型应该从文本或简单图像生成复杂场景,但作者暗示未来发展方向是基于视频输入生成环境。这一观点挑战了当前AI生成的主流范式,暗示视频可能比静态图像更适合作为世界模型的基础输入,这违背了行业对文本作为主要输入的共识。
Pulled the trigger today & switched 100% of Lindy traffic to DeepSeek v4, churning from Anthropic models. Saves us millions of $ & we're actually seeing an _increase_ in performance on many core use cases
与行业普遍认为闭源模型性能优于开源模型的认知相反,Lindy的案例显示切换到开源模型不仅节省大量成本,还提高了性能,这一发现挑战了闭源模型优越性的主流观念。
Open-source models have crossed the good enough threshold for most use cases
主流观点认为闭源模型在性能上始终优于开源模型,但作者认为开源模型已经达到'足够好'的水平,这一观点挑战了商业AI模型的价值主张,暗示开源可能成为企业级应用的主流选择。
Foundation labs are moving up the stack into applications
大多数人认为基础模型提供商和应用层公司应该是分离的生态系统,但作者认为基础实验室正在向上扩展进入应用层,这挑战了AI行业的传统分工模式,可能导致更直接的竞争和整合。
All of this might seem obvious — of course you shouldn't use more compute than necessary — but it runs counter to the scaling-first approach that has dominated the industry until now.
大多数人认为科技公司一直以来的做法是理所当然的,但作者指出'不应使用超过必要的计算能力'这一常识实际上与行业长期以来主导的'规模优先'方法相悖,这一观点挑战了AI行业发展的核心假设,暗示整个行业可能需要重新思考其发展路径。
Quality comes first, and in legal it always will... However, the definition of quality is evolving from simply using the most powerful model for everything, to using the best model that gets the right answer most efficiently.
大多数人认为在专业领域如法律,必须使用最强大、最先进的AI模型才能保证质量,但作者引用Harvey公司创始人的观点认为,质量的定义正在转变——从使用最强大的模型转向使用能以最高效率获得正确答案的模型,这一观点挑战了行业对'质量即规模'的传统认知。
The longer and more complex the task, the larger Fable 5's lead over our other models. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.
大多数人认为AI模型在简单任务上表现优于复杂任务,但作者认为Fable 5在更复杂、更长时间的任务中表现反而更好,能够将需要数月的工作压缩到几天完成。这挑战了人们对AI能力随任务复杂度增加而下降的普遍预期,暗示先进AI可能在复杂任务中展现出不成比例的能力提升。
Mythos 5 conducted novel genomics research in over a week of largely autonomous work. It assembled single-cell data for millions of cells spanning 138 animal species and designed and trained a custom machine learning model to identify cells performing the same role in even distantly related organisms.
大多数人认为AI仍需要人类专家的持续指导和监督才能完成复杂研究任务,但作者认为Mythos 5能够在大约一周内独立完成复杂的基因组学研究,包括数据收集、分析和模型设计。这挑战了人们对AI在科学研究中的辅助角色的传统认知,暗示AI可能已经具备独立进行前沿科学研究的能力。
Claude Fable 5 is the first to break 90% on our core analytics benchmark of complex, long-running analytical tasks — a 10-point jump over Opus. On the hardest questions, it shows strong judgment and attention to nuance.
大多数人认为AI模型在复杂推理任务上的性能提升应该是渐进式的,但作者认为Fable 5实现了质的飞跃,直接突破90%这一关键阈值。这挑战了人们对AI进步的线性预期,暗示可能存在能力阈值一旦突破就会带来显著性能提升的非线性发展模式。
In this task, various AI models were evaluated on their ability to predict how a genetic modification would impact the assembly of the virus's outer shell (among a set of therapeutically-relevant unpublished candidates developed by Dyno Therapeutics). We did not explicitly train our models to perform this task—and yet Mythos-class models outperformed sophisticated models dedicated to protein tasks (known as 'protein language models') using their biological reasoning alone.
大多数人认为AI模型需要专门训练才能完成特定领域的专业任务,但作者认为即使没有专门训练,Mythos-class模型也能在生物医学领域超越专业模型。这挑战了人们对AI专业化训练的普遍认知,暗示通用AI可能比专业模型在某些领域表现更好,因为它们能够进行更广泛的推理和模式识别。
good benchmarks become training pipelines
大多数人认为基准测试主要是用于评估模型性能的静态工具,但作者提出一个非共识观点:好的基准测试正在转变为训练流程的一部分。这一观点挑战了基准测试的传统角色,暗示评估和训练之间的界限正在变得模糊,形成反馈循环。
Models write sloppy code that works but isn't maintainable. Our eval is first to measure: would you actually merge this code?
大多数人认为AI生成的代码只要能通过测试就是高质量的,但作者认为这种观点存在严重缺陷,因为代码的可维护性才是关键。FrontierCode的创新之处在于它评估代码是否真正可合并,而不仅仅是单元测试通过,这挑战了行业对代码质量的主流评估标准。
good benchmarks become training pipelines
大多数人认为基准测试主要是用于评估模型性能的工具,但作者提出最佳基准测试实际上可以成为训练流程的一部分。这一观点转变了基准测试的定位,从静态测量工具变为动态改进系统的反馈循环。
Models write sloppy code that works but isn't maintainable. Our eval is first to measure: would you actually merge this code?
大多数人认为AI代码评估应该关注功能正确性,但作者认为我们应该评估代码是否真正可合并,这挑战了传统基准测试的共识。FrontierCode引入了'可合并性'这一新标准,关注代码质量而非仅通过测试,这是一个反直觉的转变。
A model that can fight its way through a confusing bioinformatics workflow may still be too expensive, too slow, too hard to audit, or too difficult to trust for routine scientific work.
大多数人认为随着AI能力的提升,它们将能够自行处理复杂的生物信息学工作流程,但作者认为即使AI能够处理这些复杂工作,也可能因为成本、速度、审计难度和信任问题而不适合常规科学工作。这一观点挑战了技术决定论,强调了基础设施设计的重要性。
In some cases, a missing or incorrect record could determine whether a diagnostic assay seems to cover circulating diversity, or whether an outbreak is inferred to have started weeks earlier or later than it did.
大多数人认为AI在生物数据中的错误只是准确性问题,但作者指出这些错误可能导致严重的实际后果,如误判疫情起始时间或诊断覆盖范围。这一观点强调了AI在科学数据处理中错误的潜在严重性,挑战了人们对AI错误影响的轻视态度。
The bottleneck for biological agents is not only reasoning but the absence of widespread deterministic execution layers for querying biological data.
大多数人认为AI在生物数据处理中的瓶颈主要是推理能力不足,但作者认为真正的瓶颈是缺乏确定性的数据查询执行层。这一观点挑战了人们对AI能力局限性的主流认知,表明问题不在于AI不够聪明,而在于数据基础设施设计不友好。
Before rolling out the enhancements and features, Apple was adamant about its privacy-centric approach to AI. 'We believe privacy in AI is non-negotiable,' Apple Senior Vice President Craig Federighi said during the stream
大多数人认为在AI竞赛中,苹果会像其他科技巨头一样,为了提升AI功能而牺牲部分隐私保护。然而,苹果却强调隐私是其AI策略的核心,这与行业普遍认为AI需要大量用户数据才能有效发展的共识相悖,表明苹果在AI领域坚持其隐私至上的价值观,即使这可能限制其AI功能的先进性。
Apple revealed that all devices from the iPhone 11 onward will be eligible for their upcoming software update. And that update comes with a flurry of performance improvements it's touting across a number of its OS releases this year
大多数人认为苹果会通过新系统更新淘汰较旧设备以刺激硬件销售,但苹果却选择支持5年前的iPhone 11,并承诺显著性能提升。这与苹果通常推动用户升级硬件的策略相悖,表明苹果在软件支持策略上更加用户友好,而非纯粹的商业驱动。
Apple said it collaborated with Google and the Gemini family of models to develop the next generation of Apple Foundation Models that power its integrated Apple Intelligence experiences.
大多数人认为苹果会坚持自主研发AI技术,避免与竞争对手合作,但苹果却选择与谷歌合作开发其AI体验,这挑战了科技巨头间竞争的常规认知。苹果将竞争对手的技术整合到其核心产品中,表明在AI领域,苹果愿意放下竞争姿态,寻求务实合作。
These tools were built for people with spare time. And guess what? Moms don't have any.'
大多数人认为AI工具设计为通用工具,可以适应各种用户需求,但这位专家指出AI实际上是为有闲暇时间的人设计的。这与我们对技术包容性的普遍认知相悖,暗示科技产品可能无意中排除了最需要帮助的群体。
Learning to use AI to make my life easier struck me as just another item to add to my already-ballooning to-do list, without addressing any of the underlying issues that make that list as long as it is to begin with.
大多数人认为AI会减轻女性的家务负担,但作者认为使用AI实际上只是给女性增加了另一项任务,而没有解决根本问题。这挑战了技术必然解放人类的乐观叙事,暗示技术可能只是强化而非改变现有的性别分工。
Unfortunately, mental load is still considered a female problem,' she says. 'A lot of men don't even know what mental load even is.'
大多数人认为随着性别平等进步,男性应该越来越了解并分担家庭中的精神负担,但这位妈妈fluencer指出男性甚至不知道什么是'精神负担'。这揭示了性别平等在家庭内部仍存在显著差距,挑战了我们对现代男性参与家务的乐观假设。
Women are less likely (more than 20 percent less likely, according to one 2025 study) to use generative AI in their everyday lives than men are, a discrepancy known as the 'AI gender gap.'
大多数人认为女性会更快接受新技术,特别是在家务管理方面,但数据显示女性使用AI的频率反而低于男性。这与我们对性别与技术采用关系的普遍认知相悖,暗示技术采用可能受到更深层次的性别角色影响。
Executives believe users will increasingly interact with a single AI assistant rather than a collection of separate applications.
大多数人认为未来会有多种专业化AI应用共存,但作者认为OpenAI正朝着单一AI助手的方向发展,这挑战了当前科技行业推崇的'应用生态系统'理念。这一观点与主流的产品开发趋势相悖。
When we have [artificial general intelligence], I don't think there will be a large number of distinct brands, said Alex Embiricos, OpenAI's head of enterprise product.
大多数人认为AI的发展会导致更多专业化品牌的出现,但作者认为AGI时代将回归单一实体模式,这与当前科技行业碎片化、专业化的发展趋势相悖。这一预测挑战了人们对未来AI产品生态的主流预期。
OpenAI executives increasingly view ChatGPT, which has attracted nearly 1 billion users since its launch, as a gateway to introduce users to higher-value products.
大多数人认为ChatGPT本身就是高价值产品,但作者认为OpenAI实际上将其视为'入门产品'或'引流工具',真正的价值在于其引导用户使用付费的编码工具和其他高利润服务。这颠覆了人们对ChatGPT商业价值的常规理解。
MicroPython is a lean and efficient implementation of the Python 3 programming language that includes a small subset of the Python standard library and is optimised to run on microcontrollers and in constrained environments.
大多数人认为 MicroPython 仅适用于资源受限的微控制器环境,不适合复杂的沙盒实现。但作者认为 MicroPython 的精简特性和受限环境优化恰恰使其成为 WebAssembly 沙盒的理想选择,这一观点挑战了人们对 MicroPython 应用范围的普遍认知,展示了其在服务器端沙盒环境中的潜力。
The great thing about working with WebAssembly is that if the C turns out to be fatally flawed the worst that can happen is the WebAssembly execution will fail with an exception.
大多数系统程序员认为 C 代码中的错误可能导致严重的安全漏洞或系统崩溃。但作者认为在 WebAssembly 环境中,即使 C 代码存在致命缺陷,最坏情况也只是执行失败并抛出异常,这挑战了人们对 C 代码风险的传统认知,暗示 WebAssembly 提供了一种更安全的执行环境。
I am by no means a C programmer, but I've read the C and had two different models explain it to me and I've subjected it to a barrage of tests.
在软件开发领域,尤其是涉及系统编程时,普遍认为非 C 程序员不应该编写或修改 C 代码,因为这需要深厚的专业知识和经验。然而,作者作为一个非 C 程序员,却自信地编写并发布了包含 C 代码的 WebAssembly 沙盒实现,这挑战了关于专业领域分工的传统认知。
Pyodide offers an outstanding package for running Python using WebAssembly in the browser, but using Pyodide in server-side Python isn't supported.
大多数人认为 Pyodide 是在 WebAssembly 中运行 Python 的唯一或最佳选择,因为它在浏览器环境中表现出色。但作者明确指出 Pyodide 不支持服务器端 Python 使用,这挑战了人们对 Pyodide 适用范围的普遍认知,暗示需要寻找替代方案如 MicroPython 来实现服务器端的 WebAssembly Python 沙盒。
WebAssembly is a _much better_ candidate. It was designed from the start to support all of the characteristics I care about and has been tested in browsers for nearly a decade.
大多数人认为 JavaScript 引擎是沙盒环境的最佳选择,因为它们专门为执行不受信任的代码而设计。但作者认为 WebAssembly 是更好的选择,因为它从一开始就考虑了安全特性,并在浏览器环境中经过了近十年的测试。这与主流认知相悖,因为大多数开发者仍然倾向于使用 JavaScript 引擎来实现沙盒环境。
How do you even write these risks in, because they are evolving before our eyes, and day by day?
大多数人认为企业可以预测和量化商业风险,特别是在准备IPO文件时,但作者认为AI行业的风险变化速度如此之快,以至于无法在静态的文件中准确描述。这一观点挑战了传统风险评估和披露的做法,暗示了AI行业的特殊性和不可预测性。
This whole ecosystem is heavily, heavily subsidized by investor money. And so stuff that seems like it has no cost is, in fact, incredibly expensive.
大多数人认为AI服务的低成本或免费是因为技术进步带来的自然结果,但作者认为这种低成本实际上是投资者补贴的产物,本质上是极其昂贵的。这一观点挑战了人们对AI服务经济性的普遍认知,揭示了当前AI商业模式背后的真实成本结构。
In 2026, long-context efficiency is king as more and more LLMs get plugged into agent harnesses
大多数人认为长上下文处理只是模型能力的一个方面,但作者将其描述为'王',暗示它已成为整个LLM领域的主导因素。这一观点挑战了传统认知,表明长上下文处理能力已成为模型设计的核心驱动力,而非仅仅是一个技术特性。
120B-A12B may be a bit too large for local inference on regular consumer hardware
大多数人认为更大的模型参数量总是带来更好的性能,但作者暗示过度扩展模型规模可能不适合实际应用。这一务实观点挑战了'越大越好'的行业共识,强调了实际部署中的硬件限制。
Scaling Embeddings Outperforms Scaling Experts in Language Models
大多数人认为在MoE模型中增加专家数量是提升性能的最佳策略,但这篇论文提出扩展嵌入维度比扩展专家数量更有效。这一观点与主流MoE扩展思路相悖,暗示了模型设计的根本性转变。
hybrid architectures (for example, Nemotron 3, and Arcee Trinity), state space layers (Nemotron 3 and Mamba-3), MoE capacity allocation
大多数人认为LLM架构将继续遵循纯Transformer路径,但作者指出2026年的趋势是混合架构,结合Transformer与状态空间模型。这一反直觉观点挑战了行业共识,表明纯Transformer架构可能不是最优解,混合设计在长上下文处理上更高效。
We were taught that generalists and specialists will always have their roles. But now the market is shaping everyone into becoming a generalist.
大多数人认为专业化和专业化各有价值且会长期共存,但作者认为市场正在迫使所有人成为通才,这与'专业化和专业化将长期共存'的职业发展主流认知相悖。