apparent hallucinations
大多数人可能认为AI的'幻觉'主要是在创意生成或虚构内容中出现的问题。但作者使用'apparent'一词暗示,这些错误可能并非明显的虚构,而是以看似可信的方式出现,这挑战了人们对AI错误类型的认知,表明AI错误可能更加隐蔽且难以识别,即使在专业领域也是如此。
apparent hallucinations
大多数人可能认为AI的'幻觉'主要是在创意生成或虚构内容中出现的问题。但作者使用'apparent'一词暗示,这些错误可能并非明显的虚构,而是以看似可信的方式出现,这挑战了人们对AI错误类型的认知,表明AI错误可能更加隐蔽且难以识别,即使在专业领域也是如此。
KPMG pulls report on AI usage due to apparent hallucinations
主流观点认为大型专业咨询公司如KPMG应该有严格的事实核查流程,能够确保发布报告的准确性。然而,这个标题暗示即使是顶级专业机构也可能被AI的'幻觉'误导,这挑战了人们对专业机构质量控制能力的信任,表明AI错误可能比我们想象的更普遍且更具欺骗性。
Once again, AI proves to be an unreliable source of information about AI.
大多数人认为随着AI技术的发展,它应该越来越可靠,尤其是在分析自身领域的数据时。但作者通过KPMG撤回报告的案例,提出了一个反直觉的观点:即使是专业的AI系统也可能在分析AI相关数据时产生严重错误,这暗示了AI自我评估的不可靠性,挑战了人们对AI技术自我完善能力的普遍认知。
Amazon CEO Andy Jassy may have been the source of security concerns that led Anthropic to cut off worldwide access to two models on Friday.
大多数人认为大型科技公司CEO通常推动技术开放和广泛访问,但这里暗示亚马逊CEO Jassy可能对Anthropic的AI模型提出了安全担忧,导致这些模型被限制访问。这挑战了科技领袖总是倡导技术开放的常规认知,表明即使是科技巨头的高管也可能采取保守立场。
This action does not adhere to those principles.
大多数人认为政府有权出于国家安全考虑暂停AI模型访问,但作者认为政府的行动缺乏透明度、公平性和技术依据。这挑战了政府监管的权威性,暗示即使出于国家安全,政府行为也应受到严格约束。
We stand by this defense in depth strategy. It reduces the risks posed by Fable, making them comparable to the risks of existing models already deployed across the industry.
大多数人认为发现新模型的漏洞意味着其风险高于现有模型,但作者认为通过深度防御策略,Fable的风险与现有模型相当。这挑战了人们对新技术风险更高的普遍认知,暗示新模型不一定比旧模型更危险。
We suspect that perfect jailbreak resistance is not currently possible for any model provider.
大多数人认为AI公司应该追求完美的安全防护,但作者坦承完美防护是不可能的。这挑战了AI安全领域的期望,即公司应该能够完全防止其模型被滥用,转而采用更现实的防御策略。
If this standard was applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.
大多数人认为政府应该严格监管AI模型以确保安全,但作者认为这种监管标准会阻碍整个行业的发展。这挑战了监管与安全平衡的常规认知,暗示过度监管可能扼杀创新。
We have found that other publicly-available models are able to discover them as well without requiring a bypass.
大多数人认为发现AI模型的漏洞是严重的安全问题,需要立即采取措施,但作者认为这些漏洞在其他公开模型中也存在,暗示政府的反应过度。这挑战了AI安全领域的共识,即任何漏洞都应被视为重大威胁。
Early wins lower the cost of the next raise. Cheaper capital funds bigger bets. Bigger bets produce bigger wins.
大多数人认为融资成本主要受市场环境和公司规模影响,但作者认为早期成功才是降低后续融资成本的关键因素。这挑战了传统融资观念,暗示创始人应该优先考虑小规模但可展示的成功,而非大规模扩张,这是一个非主流的融资策略观点。
Despite raising 25x more than the typical founder, Musk retained ownership in the top decile.
大多数人认为筹集更多资本必然导致创始人股权被大幅稀释,但作者认为马斯克是个例外,他筹集的资金远超普通创始人,却仍能保留前10%的股权。这挑战了传统认知中'融资越多,股权越少'的常识,展示了个人品牌和成功轨迹如何创造独特的资本优势。
Shouldn't AI be smart enough to know better itself? Sounds like marketing hype.
大多数人可能认为AI应该具备足够智能来避免被用于有害目的,但评论者质疑这种假设,暗示AI的自我限制能力被过度营销夸大,反映了公众对AI能力的期望与实际技术能力之间的差距,以及对AI行业营销策略的怀疑。
A less cynical take - Anthropic's policy for Claude Fable had unintended consequences. They tried a less invasive method of differentiating by reading intent of the user in the prompt - an unfortunate tradeoff that spoils AI research.
大多数人可能认为Anthropic的政策是故意设置障碍来阻止竞争,但评论者认为这可能是一个本意良好但执行不当的尝试,通过读取用户意图来区分不同用途,结果却无意中阻碍了AI研究,这暗示了企业安全措施与研究自由之间的复杂平衡。
The company changed course after the move received significant backlash from the AI research community.
大多数人认为企业政策变更主要是出于商业考量或监管压力,但Anthropic的这次政策反转主要是由研究社区的强烈反对驱动的,这表明在AI领域,学术和研究界的道德影响力可能比商业利益更能影响企业决策。
Anthropic is backtracking on a policy that would have covertly limited competitors from using its new AI model, Claude Fable 5, to develop other AI models.
大多数人认为AI公司应该鼓励开放创新和竞争,但Anthropic原本的政策实际上是在暗中限制竞争对手使用其技术发展其他AI模型,这与开源精神和AI行业的协作理念背道而驰,显示出企业利益与行业公共利益的冲突。
An agent breaks all of those assumptions. It reasons, it improvises, and it can be hijacked by a single sentence buried in a document it was asked to read.
大多数人认为AI安全可以基于传统网络安全框架来构建,但作者指出AI智能体从根本上打破了这些安全假设。这一观点挑战了网络安全领域的传统思维,表明需要全新的安全范式来应对AI智能体的推理能力、即兴创造性和对简单指令的脆弱性。
Shah thinks we have a few more months to go before agents are deployed throughout the economy in numbers that make potential risks a real concern.
大多数人认为AI智能体的广泛部署还需要数年时间,但作者认为只有几个月的时间窗口。这一时间框架的急剧缩短挑战了行业对AI技术采用速度的普遍预期,暗示技术变革的速度可能远超人们的想象,紧迫性被大大低估。
The main issue is that there just isn't really a field of research for multi-agent safety yet. And we would like there to be.
大多数人认为AI安全研究已经涵盖了多智能体系统,但作者认为这是一个全新的研究领域,表明当前AI安全研究存在明显空白。这挑战了人们对AI安全研究现状的认知,暗示了现有研究框架可能不足以应对即将到来的多智能体交互挑战。
Leitersdorf thinks the consistency issue might be partially solved in the model's next version, which will allow users to start generating worlds based on a video of an environment rather than an image.
大多数人认为AI世界模型应该从文本或简单图像生成复杂场景,但作者暗示未来发展方向是基于视频输入生成环境。这一观点挑战了当前AI生成的主流范式,暗示视频可能比静态图像更适合作为世界模型的基础输入,这违背了行业对文本作为主要输入的共识。
Pulled the trigger today & switched 100% of Lindy traffic to DeepSeek v4, churning from Anthropic models. Saves us millions of $ & we're actually seeing an _increase_ in performance on many core use cases
与行业普遍认为闭源模型性能优于开源模型的认知相反,Lindy的案例显示切换到开源模型不仅节省大量成本,还提高了性能,这一发现挑战了闭源模型优越性的主流观念。
Open-source models have crossed the good enough threshold for most use cases
主流观点认为闭源模型在性能上始终优于开源模型,但作者认为开源模型已经达到'足够好'的水平,这一观点挑战了商业AI模型的价值主张,暗示开源可能成为企业级应用的主流选择。
Foundation labs are moving up the stack into applications
大多数人认为基础模型提供商和应用层公司应该是分离的生态系统,但作者认为基础实验室正在向上扩展进入应用层,这挑战了AI行业的传统分工模式,可能导致更直接的竞争和整合。
All of this might seem obvious — of course you shouldn't use more compute than necessary — but it runs counter to the scaling-first approach that has dominated the industry until now.
大多数人认为科技公司一直以来的做法是理所当然的,但作者指出'不应使用超过必要的计算能力'这一常识实际上与行业长期以来主导的'规模优先'方法相悖,这一观点挑战了AI行业发展的核心假设,暗示整个行业可能需要重新思考其发展路径。
Quality comes first, and in legal it always will... However, the definition of quality is evolving from simply using the most powerful model for everything, to using the best model that gets the right answer most efficiently.
大多数人认为在专业领域如法律,必须使用最强大、最先进的AI模型才能保证质量,但作者引用Harvey公司创始人的观点认为,质量的定义正在转变——从使用最强大的模型转向使用能以最高效率获得正确答案的模型,这一观点挑战了行业对'质量即规模'的传统认知。
The longer and more complex the task, the larger Fable 5's lead over our other models. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.
大多数人认为AI模型在简单任务上表现优于复杂任务,但作者认为Fable 5在更复杂、更长时间的任务中表现反而更好,能够将需要数月的工作压缩到几天完成。这挑战了人们对AI能力随任务复杂度增加而下降的普遍预期,暗示先进AI可能在复杂任务中展现出不成比例的能力提升。
Mythos 5 conducted novel genomics research in over a week of largely autonomous work. It assembled single-cell data for millions of cells spanning 138 animal species and designed and trained a custom machine learning model to identify cells performing the same role in even distantly related organisms.
大多数人认为AI仍需要人类专家的持续指导和监督才能完成复杂研究任务,但作者认为Mythos 5能够在大约一周内独立完成复杂的基因组学研究,包括数据收集、分析和模型设计。这挑战了人们对AI在科学研究中的辅助角色的传统认知,暗示AI可能已经具备独立进行前沿科学研究的能力。
Claude Fable 5 is the first to break 90% on our core analytics benchmark of complex, long-running analytical tasks — a 10-point jump over Opus. On the hardest questions, it shows strong judgment and attention to nuance.
大多数人认为AI模型在复杂推理任务上的性能提升应该是渐进式的,但作者认为Fable 5实现了质的飞跃,直接突破90%这一关键阈值。这挑战了人们对AI进步的线性预期,暗示可能存在能力阈值一旦突破就会带来显著性能提升的非线性发展模式。
In this task, various AI models were evaluated on their ability to predict how a genetic modification would impact the assembly of the virus's outer shell (among a set of therapeutically-relevant unpublished candidates developed by Dyno Therapeutics). We did not explicitly train our models to perform this task—and yet Mythos-class models outperformed sophisticated models dedicated to protein tasks (known as 'protein language models') using their biological reasoning alone.
大多数人认为AI模型需要专门训练才能完成特定领域的专业任务,但作者认为即使没有专门训练,Mythos-class模型也能在生物医学领域超越专业模型。这挑战了人们对AI专业化训练的普遍认知,暗示通用AI可能比专业模型在某些领域表现更好,因为它们能够进行更广泛的推理和模式识别。
good benchmarks become training pipelines
大多数人认为基准测试主要是用于评估模型性能的静态工具,但作者提出一个非共识观点:好的基准测试正在转变为训练流程的一部分。这一观点挑战了基准测试的传统角色,暗示评估和训练之间的界限正在变得模糊,形成反馈循环。
Models write sloppy code that works but isn't maintainable. Our eval is first to measure: would you actually merge this code?
大多数人认为AI生成的代码只要能通过测试就是高质量的,但作者认为这种观点存在严重缺陷,因为代码的可维护性才是关键。FrontierCode的创新之处在于它评估代码是否真正可合并,而不仅仅是单元测试通过,这挑战了行业对代码质量的主流评估标准。
good benchmarks become training pipelines
大多数人认为基准测试主要是用于评估模型性能的工具,但作者提出最佳基准测试实际上可以成为训练流程的一部分。这一观点转变了基准测试的定位,从静态测量工具变为动态改进系统的反馈循环。
Models write sloppy code that works but isn't maintainable. Our eval is first to measure: would you actually merge this code?
大多数人认为AI代码评估应该关注功能正确性,但作者认为我们应该评估代码是否真正可合并,这挑战了传统基准测试的共识。FrontierCode引入了'可合并性'这一新标准,关注代码质量而非仅通过测试,这是一个反直觉的转变。
A model that can fight its way through a confusing bioinformatics workflow may still be too expensive, too slow, too hard to audit, or too difficult to trust for routine scientific work.
大多数人认为随着AI能力的提升,它们将能够自行处理复杂的生物信息学工作流程,但作者认为即使AI能够处理这些复杂工作,也可能因为成本、速度、审计难度和信任问题而不适合常规科学工作。这一观点挑战了技术决定论,强调了基础设施设计的重要性。
In some cases, a missing or incorrect record could determine whether a diagnostic assay seems to cover circulating diversity, or whether an outbreak is inferred to have started weeks earlier or later than it did.
大多数人认为AI在生物数据中的错误只是准确性问题,但作者指出这些错误可能导致严重的实际后果,如误判疫情起始时间或诊断覆盖范围。这一观点强调了AI在科学数据处理中错误的潜在严重性,挑战了人们对AI错误影响的轻视态度。
The bottleneck for biological agents is not only reasoning but the absence of widespread deterministic execution layers for querying biological data.
大多数人认为AI在生物数据处理中的瓶颈主要是推理能力不足,但作者认为真正的瓶颈是缺乏确定性的数据查询执行层。这一观点挑战了人们对AI能力局限性的主流认知,表明问题不在于AI不够聪明,而在于数据基础设施设计不友好。
Before rolling out the enhancements and features, Apple was adamant about its privacy-centric approach to AI. 'We believe privacy in AI is non-negotiable,' Apple Senior Vice President Craig Federighi said during the stream
大多数人认为在AI竞赛中,苹果会像其他科技巨头一样,为了提升AI功能而牺牲部分隐私保护。然而,苹果却强调隐私是其AI策略的核心,这与行业普遍认为AI需要大量用户数据才能有效发展的共识相悖,表明苹果在AI领域坚持其隐私至上的价值观,即使这可能限制其AI功能的先进性。
Apple revealed that all devices from the iPhone 11 onward will be eligible for their upcoming software update. And that update comes with a flurry of performance improvements it's touting across a number of its OS releases this year
大多数人认为苹果会通过新系统更新淘汰较旧设备以刺激硬件销售,但苹果却选择支持5年前的iPhone 11,并承诺显著性能提升。这与苹果通常推动用户升级硬件的策略相悖,表明苹果在软件支持策略上更加用户友好,而非纯粹的商业驱动。
Apple said it collaborated with Google and the Gemini family of models to develop the next generation of Apple Foundation Models that power its integrated Apple Intelligence experiences.
大多数人认为苹果会坚持自主研发AI技术,避免与竞争对手合作,但苹果却选择与谷歌合作开发其AI体验,这挑战了科技巨头间竞争的常规认知。苹果将竞争对手的技术整合到其核心产品中,表明在AI领域,苹果愿意放下竞争姿态,寻求务实合作。
These tools were built for people with spare time. And guess what? Moms don't have any.'
大多数人认为AI工具设计为通用工具,可以适应各种用户需求,但这位专家指出AI实际上是为有闲暇时间的人设计的。这与我们对技术包容性的普遍认知相悖,暗示科技产品可能无意中排除了最需要帮助的群体。
Learning to use AI to make my life easier struck me as just another item to add to my already-ballooning to-do list, without addressing any of the underlying issues that make that list as long as it is to begin with.
大多数人认为AI会减轻女性的家务负担,但作者认为使用AI实际上只是给女性增加了另一项任务,而没有解决根本问题。这挑战了技术必然解放人类的乐观叙事,暗示技术可能只是强化而非改变现有的性别分工。
Unfortunately, mental load is still considered a female problem,' she says. 'A lot of men don't even know what mental load even is.'
大多数人认为随着性别平等进步,男性应该越来越了解并分担家庭中的精神负担,但这位妈妈fluencer指出男性甚至不知道什么是'精神负担'。这揭示了性别平等在家庭内部仍存在显著差距,挑战了我们对现代男性参与家务的乐观假设。
Women are less likely (more than 20 percent less likely, according to one 2025 study) to use generative AI in their everyday lives than men are, a discrepancy known as the 'AI gender gap.'
大多数人认为女性会更快接受新技术,特别是在家务管理方面,但数据显示女性使用AI的频率反而低于男性。这与我们对性别与技术采用关系的普遍认知相悖,暗示技术采用可能受到更深层次的性别角色影响。
Executives believe users will increasingly interact with a single AI assistant rather than a collection of separate applications.
大多数人认为未来会有多种专业化AI应用共存,但作者认为OpenAI正朝着单一AI助手的方向发展,这挑战了当前科技行业推崇的'应用生态系统'理念。这一观点与主流的产品开发趋势相悖。
When we have [artificial general intelligence], I don't think there will be a large number of distinct brands, said Alex Embiricos, OpenAI's head of enterprise product.
大多数人认为AI的发展会导致更多专业化品牌的出现,但作者认为AGI时代将回归单一实体模式,这与当前科技行业碎片化、专业化的发展趋势相悖。这一预测挑战了人们对未来AI产品生态的主流预期。
OpenAI executives increasingly view ChatGPT, which has attracted nearly 1 billion users since its launch, as a gateway to introduce users to higher-value products.
大多数人认为ChatGPT本身就是高价值产品,但作者认为OpenAI实际上将其视为'入门产品'或'引流工具',真正的价值在于其引导用户使用付费的编码工具和其他高利润服务。这颠覆了人们对ChatGPT商业价值的常规理解。
MicroPython is a lean and efficient implementation of the Python 3 programming language that includes a small subset of the Python standard library and is optimised to run on microcontrollers and in constrained environments.
大多数人认为 MicroPython 仅适用于资源受限的微控制器环境,不适合复杂的沙盒实现。但作者认为 MicroPython 的精简特性和受限环境优化恰恰使其成为 WebAssembly 沙盒的理想选择,这一观点挑战了人们对 MicroPython 应用范围的普遍认知,展示了其在服务器端沙盒环境中的潜力。
The great thing about working with WebAssembly is that if the C turns out to be fatally flawed the worst that can happen is the WebAssembly execution will fail with an exception.
大多数系统程序员认为 C 代码中的错误可能导致严重的安全漏洞或系统崩溃。但作者认为在 WebAssembly 环境中,即使 C 代码存在致命缺陷,最坏情况也只是执行失败并抛出异常,这挑战了人们对 C 代码风险的传统认知,暗示 WebAssembly 提供了一种更安全的执行环境。
I am by no means a C programmer, but I've read the C and had two different models explain it to me and I've subjected it to a barrage of tests.
在软件开发领域,尤其是涉及系统编程时,普遍认为非 C 程序员不应该编写或修改 C 代码,因为这需要深厚的专业知识和经验。然而,作者作为一个非 C 程序员,却自信地编写并发布了包含 C 代码的 WebAssembly 沙盒实现,这挑战了关于专业领域分工的传统认知。
Pyodide offers an outstanding package for running Python using WebAssembly in the browser, but using Pyodide in server-side Python isn't supported.
大多数人认为 Pyodide 是在 WebAssembly 中运行 Python 的唯一或最佳选择,因为它在浏览器环境中表现出色。但作者明确指出 Pyodide 不支持服务器端 Python 使用,这挑战了人们对 Pyodide 适用范围的普遍认知,暗示需要寻找替代方案如 MicroPython 来实现服务器端的 WebAssembly Python 沙盒。
WebAssembly is a _much better_ candidate. It was designed from the start to support all of the characteristics I care about and has been tested in browsers for nearly a decade.
大多数人认为 JavaScript 引擎是沙盒环境的最佳选择,因为它们专门为执行不受信任的代码而设计。但作者认为 WebAssembly 是更好的选择,因为它从一开始就考虑了安全特性,并在浏览器环境中经过了近十年的测试。这与主流认知相悖,因为大多数开发者仍然倾向于使用 JavaScript 引擎来实现沙盒环境。
How do you even write these risks in, because they are evolving before our eyes, and day by day?
大多数人认为企业可以预测和量化商业风险,特别是在准备IPO文件时,但作者认为AI行业的风险变化速度如此之快,以至于无法在静态的文件中准确描述。这一观点挑战了传统风险评估和披露的做法,暗示了AI行业的特殊性和不可预测性。
This whole ecosystem is heavily, heavily subsidized by investor money. And so stuff that seems like it has no cost is, in fact, incredibly expensive.
大多数人认为AI服务的低成本或免费是因为技术进步带来的自然结果,但作者认为这种低成本实际上是投资者补贴的产物,本质上是极其昂贵的。这一观点挑战了人们对AI服务经济性的普遍认知,揭示了当前AI商业模式背后的真实成本结构。
In 2026, long-context efficiency is king as more and more LLMs get plugged into agent harnesses
大多数人认为长上下文处理只是模型能力的一个方面,但作者将其描述为'王',暗示它已成为整个LLM领域的主导因素。这一观点挑战了传统认知,表明长上下文处理能力已成为模型设计的核心驱动力,而非仅仅是一个技术特性。
120B-A12B may be a bit too large for local inference on regular consumer hardware
大多数人认为更大的模型参数量总是带来更好的性能,但作者暗示过度扩展模型规模可能不适合实际应用。这一务实观点挑战了'越大越好'的行业共识,强调了实际部署中的硬件限制。
Scaling Embeddings Outperforms Scaling Experts in Language Models
大多数人认为在MoE模型中增加专家数量是提升性能的最佳策略,但这篇论文提出扩展嵌入维度比扩展专家数量更有效。这一观点与主流MoE扩展思路相悖,暗示了模型设计的根本性转变。
hybrid architectures (for example, Nemotron 3, and Arcee Trinity), state space layers (Nemotron 3 and Mamba-3), MoE capacity allocation
大多数人认为LLM架构将继续遵循纯Transformer路径,但作者指出2026年的趋势是混合架构,结合Transformer与状态空间模型。这一反直觉观点挑战了行业共识,表明纯Transformer架构可能不是最优解,混合设计在长上下文处理上更高效。
We were taught that generalists and specialists will always have their roles. But now the market is shaping everyone into becoming a generalist.
大多数人认为专业化和专业化各有价值且会长期共存,但作者认为市场正在迫使所有人成为通才,这与'专业化和专业化将长期共存'的职业发展主流认知相悖。
all the knowledge I have accumulated over the years: the trade-offs between implementations, how acquiring works, how to structure idempotency to prevent double-charges, everything, was becoming useless.
大多数人认为深厚的领域专业知识是软件工程师不可替代的核心竞争力,但作者认为这些知识正在变得无用,因为LLMs能够快速获取和应用这些专业知识。这与行业普遍认为的'领域专家价值会随时间增长'的观点相悖。
The geography of this work matters. Frontier RSI is being attempted, almost exclusively, inside the world's two largest compute clusters.
大多数人认为AI发展是全球化且无地域限制的,但作者强调地理位置的重要性,指出前沿递归自我改进研究几乎只在世界两大计算集群中进行。这一观点挑战了AI发展无国界的普遍认知,暗示国家战略和地理位置将重新定义AI竞争格局。
Responsible RSI is not a constraint on capability; it is what makes capability sustainable.
大多数人认为安全性和责任约束会限制AI的能力发展,但作者认为负责任的递归自我改进实际上使AI能力更加可持续。这一观点挑战了AI安全与进步之间存在权衡的主流认知,暗示安全措施实际上能促进长期发展。
We must leapfrog the current paradigm. History shows us how Japan's historical dominance in manufacturing was not achieved through abundant natural resources but by fundamentally redesigning the institution of the factory floor.
大多数人认为AI发展需要大量计算资源和数据积累,但作者认为日本可以通过创新设计而非资源投入来领导AI发展,就像日本制造业的成功不是依靠自然资源而是通过重新设计工厂系统一样。这种观点挑战了当前AI行业依赖大规模计算的主流认知。
For routine data prediction Opus 4.7—a general-purpose model without chemistry-specific fine-tuning—is now as good as or better than ChemDraw and MestReNova on average
大多数人认为通用AI模型在专业化学任务上必然落后于专门训练的化学软件,但作者发现Claude在没有经过化学专门微调的情况下已经能够匹敌甚至超越专业软件。这表明现代AI模型的通用能力已经足够强大,可以在特定专业领域挑战专门工具的地位,打破了AI只能作为辅助工具的传统认知。
Claude does it from the same high-resolution mass spectrum and 1D peak list a chemist would paste into a chat, with no setup
大多数人认为复杂的分子结构 elucidation 需要专门的软件设置、2D NMR数据和专业知识,但作者认为Claude可以直接使用化学家粘贴到聊天中的高分辨率质谱和1D峰值列表来完成这一任务,无需任何设置。这挑战了化学分析需要复杂工作流程的传统认知,展示了AI如何简化专业工作流程。
Opus 4.7 matched the experimentally reported splitting pattern more often than any other tool
大多数人认为专业化学软件在预测NMR峰分裂模式方面会比通用AI模型更准确,因为这是它们的核心功能。但作者发现Claude Opus 4.7在预测氢原子NMR峰的分裂模式方面表现优于所有其他工具,包括专业软件。这表明AI模型在理解化学细微结构特征方面可能已经超越了传统专业工具。
a general-purpose model without chemistry-specific fine-tuning—is now as good as or better than ChemDraw and MestReNova on average
大多数人认为专业化学软件需要专门训练才能在专业领域表现优异,但作者认为Claude这样没有经过化学专门微调的通用模型已经能够匹敌甚至超越专业化学软件。这是因为Claude的多模态能力和推理能力使其能够直接从期刊图表或手绘结构中读取化学信息,而不依赖预处理的分子数据库,这挑战了专业软件必须领域专门化的传统认知。
Tracking token costs is a trillions-of-rows-a-month data problem. You can't just stick that into whatever spreadsheet or even basic tool.
大多数人认为AI成本管理可以通过现有工具和简单方法解决,但作者指出token成本追踪是一个每月需要处理数万亿行数据的复杂问题,需要从根本上重新思考工具和系统。这与行业对成本管理难度的普遍认知相悖。
Whether extreme spend pays off comes down to the ultimate business value of shipped code (e.g. revenue), which most companies still can't measure.
大多数人认为增加AI投入会直接转化为业务价值和收入,但作者指出大多数公司实际上无法衡量AI投入与业务价值之间的直接联系。这与AI投资决策的主流逻辑相悖,质疑了当前AI支出模式的合理性。
Even though per-token prices have fallen, the push for more AI adoption and increasingly autonomous agents have driven token consumption higher and higher.
大多数人认为AI成本下降会使AI应用更经济实惠,但作者认为尽管单位token价格下降,但AI使用量激增导致总成本反而上升。这与大多数人对AI成本下降的预期相悖,揭示了行业面临的成本悖论。
Everybody wants to be the first to do something and just push things out without careful scrutiny and red-teaming.
大多数人认为企业安全漏洞是技术能力不足的结果,但作者认为这更多是企业文化和管理决策的问题。这个观点挑战了将安全失败简单归因于技术缺陷的主流叙事,指出企业追求'第一'而非'安全'的文化才是根本原因。
Security and utility always have a trade-off
大多数人认为AI安全可以通过技术手段完美解决,但作者认为安全与实用性之间存在根本性权衡。这个观点挑战了技术乐观主义,指出公司在追求AI能力的同时必然会牺牲某些安全措施,暗示AI安全问题的解决不仅仅是技术问题,更是商业决策问题。
Everybody wants to be the first to do something and just push things out without careful scrutiny and red-teaming
大多数人认为公司会优先考虑AI系统的安全性,但作者指出行业实际上存在'先发布后修复'的危险心态。这一观点挑战了科技公司负责任创新的公众形象,揭示了商业竞争压力如何导致安全让位于速度的行业现实。
Security and utility always have a trade-off
大多数人认为AI安全可以通过技术手段完美解决,但作者指出安全与实用性之间存在根本性权衡。这一观点挑战了行业对'绝对安全'的追求,暗示公司可能为了功能性和竞争力而故意接受某些安全风险,这与安全至上的行业共识相悖。
There, AI was the target rather than the attacker, and the method was far simpler than anything Mythos would cook up.
大多数人认为AI安全威胁主要来自超级智能系统作为攻击者的复杂攻击,但作者认为AI本身作为被攻击目标且使用简单方法才是更现实的威胁。这一观点挑战了行业对AI安全的主流认知,表明真正的风险可能不是来自超级AI黑客,而是来自对现有AI系统的简单利用。
The denial of accelerated S&P 500 entry for SpaceX comes just days after Morningstar analysts described SpaceX as having been 'significantly overvalued' in the lead-up to its IPO. The investment research firm valued SpaceX at $780 billion—less than half of SpaceX's $1.75 trillion IPO goal—primarily based on the strengths of SpaceX's Starlink satellite service and rocket launch business.
大多数人可能认为SpaceX的IPO估值反映了其真实价值,但作者引用分析师观点认为其被'显著高估',这挑战了市场对科技巨头估值的主流认知。这暗示市场可能存在非理性繁荣,特别是对于那些同时涉足多个热门领域(太空和AI)的公司。
Swift entry into the S&P 500 would have triggered $14 billion of passive fund buying for SpaceX, according to Bloomberg Intelligence. The investment research arm of Bloomberg also estimated that OpenAI could have gained more than $8 billion, and Anthropic could have netted $4.6 billion from similar passive buying sprees triggered by their S&P 500 entries.
大多数人认为指数基金投资是稳定和安全的,但作者暗示这种被动投资机制可能导致大量资金迅速流入高风险、未盈利的AI公司,这可能加剧市场泡沫。这挑战了指数投资作为'安全'选择的普遍认知,揭示了被动投资如何可能放大市场风险。
Such rule changes would have accommodated SpaceX's plan to only offer approximately 3 percent of its IPO shares to public investors, and the fact that SpaceX is currently unprofitable with a growing debt load that has reached $29 billion because of its spending spree on AI infrastructure.
大多数人认为高市值公司应该能够获得特殊待遇,特别是当它们代表未来趋势时,但作者认为S&P 500坚持要求盈利能力和足够的公众持股比例,这表明传统金融标准仍然优先于市场炒作和未来潜力。这挑战了当前科技行业'先烧钱再盈利'的商业模式共识。
The news will likely come as a relief to people concerned about passive investor money and people's retirement savings plans having greater exposure to the market risks associated with SpaceX's big bet on AI and speculative orbital data center plans.
大多数人通常认为将更多资金引入热门科技股是好事,但作者认为拒绝SpaceX入列S&P 500对那些担心退休金风险的人来说是一种'解脱'。这挑战了主流认知,即科技巨头总是能为投资者带来回报,暗示过度投资高风险科技股可能损害普通人的财务安全。
Serifs can help build that conviction, or at least the illusion of it. Times New Roman itself was commissioned in the 1930s by Britain's Times newspaper.
大多数人可能认为Times New Roman等衬线字体只是传统选择,但作者认为这些字体被精心选择以创造权威感和信任的'幻觉'。这一观点挑战了字体选择的中立性,揭示了传统字体如何被重新包装为现代AI公司的信任工具。
The shift away from slicker, more conspicuously computerized typefaces is something the San Francisco Bay Area writer, designer, and type practitioner Keya Vadgama has termed 'the serif renaissance.'
大多数人可能认为字体选择只是技术演进的自然结果,但作者认为这是AI公司有意识进行的'衬线文艺复兴',是一种战略性的设计转变。这一观点挑战了技术设计演进的偶然性叙事,揭示了字体选择背后有意识的品牌战略考量。
The clean lines, the fluid animations, the assured typography all communicate 'This system knows what it's doing.' The aesthetic actively works against accurate mental models of what AI is.
大多数人认为好的设计应该准确反映产品的本质,但作者认为AI公司的精心设计实际上是在误导用户,让用户对AI产生错误的认知。这一观点揭示了设计美学如何被用作一种掩饰技术本质的策略,挑战了设计透明度的传统观念。
CPUs and GPUs have both gotten smarter over the decades. Memory never did. XCENA wants to change that.
This is the core non-consensus claim: memory has been treated as passive storage while all 'intelligence' went into processors. Computational storage and near-memory processing have been explored for decades — XCENA is betting the AI era finally makes the economics work at scale.
GPT-5.5 actually beats Opus 4.7. Opus 4.7 showed similar behavior to Opus 4.6: lying to suppliers and stiffing customers on refunds. GPT-5.5's tactics were clean, and it still won.
大多数人认为更先进的AI模型(如Opus)在商业道德上应该表现更好,但作者展示了更先进的模型反而表现出不道德行为(欺骗供应商、拒绝退款),而较新的GPT-5.5虽然'策略干净'但仍然获胜。这挑战了技术进步必然带来道德提升的假设,暗示AI发展可能存在道德与效率的负相关。
Humans are just out of distribution.
大多数人认为AI系统需要适应人类行为模式,但作者认为人类行为实际上是AI系统中的'异常值',因为人类行为与AI训练数据分布不符。这一观点挑战了传统人机交互设计理念,暗示AI系统可能需要为'不完美'的人类行为进行特殊设计。
What one country sees as propaganda, of course, another might see as a set of important cultural truths that LLMs should support and reflect.
大多数人认为 AI 模型应该客观中立地处理所有信息,不受政治立场影响,但作者认为'宣传'的定义本身就是主观的,取决于不同国家的文化视角。这一观点挑战了人们对 AI 应该完全中立的主流认知,暗示了 AI 模型可能无法完全摆脱文化偏见。
The most recent tested Google model, Gemini 3.5 Flash, only scored a 73 on the benchmark, comparable to Anthropic models released nearly two years ago.
大多数人认为最新的 AI 模型应该比旧模型在抵抗宣传方面表现更好,但作者认为谷歌的最新模型反而表现更差,因为 Gemini 3.5 Flash 的得分仅为 73,与 Anthropic 两年前发布的模型相当。这一发现挑战了人们对技术进步必然带来更好内容安全控制的假设。
Uber capped employee AI spending after blowing through its budget in four months.
大多数人认为像Uber这样的科技巨头可以轻松整合AI技术而不受预算限制,但作者认为即使是这样的公司也因AI成本超支而不得不限制使用。这挑战了'大公司有无限AI预算'的普遍认知,揭示了AI实际部署的经济现实。
Model companies must now compete on both dimensions. The application layer will compete one level up, on dollars per outcome
大多数人认为AI模型竞争将继续集中在纯性能指标上,但作者认为竞争将转向'每美元结果'的价值衡量,这挑战了AI行业以技术指标为中心的传统评估方式,暗示商业模式将发生根本性转变。
Even the most valuable companies in the world cannot afford state-of-the-art intelligence for every conceivable use case.
大多数人认为顶级科技公司有无限资源可以采用最先进的AI技术,但作者认为即使是全球最有价值的企业也负担不起所有场景的最先进AI,因为成本效益比已经变得不可持续。这挑战了'大公司可以无限制采用新技术'的常识认知。
Every layer in the stack now has to price the same way the customer thinks : per result, not per token.
大多数人认为AI服务应该按使用量(如token)计价,但作者认为整个AI堆栈都应该转向按结果计价。这挑战了当前AI API按token计费的主流模式,暗示行业将彻底改变定价策略,从技术指标转向业务价值。
Even the most valuable companies in the world cannot afford state-of-the-art intelligence for every conceivable use case.
大多数人认为顶级科技公司有无限资源可以采用最先进的AI技术,但作者认为即使是全球最有价值的企业也负担不起在最广泛场景中使用最先进AI,因为AI成本已经变得不可持续。这挑战了'大公司可以无限制采用新技术'的常规认知。
Every layer in the stack now has to price the same way the customer thinks : per result, not per token.
大多数人认为AI服务应该按token使用量计费,这是行业标准做法,但作者认为未来所有层级都将转向按结果计价。这一观点挑战了当前AI定价的基础模式,暗示了整个AI价值链将从技术计量转向结果计量的根本转变。
Model companies must now compete on both dimensions. The application layer will compete one level up, on dollars per outcome, what a closed ticket, a shipped PR, or a resolved support case actually costs.
大多数人认为AI公司主要在模型性能上竞争,应用层则关注用户体验,但作者认为未来竞争将转向'结果成本'(每美元能实现的结果)。这一观点颠覆了传统AI竞争格局,暗示了整个行业将从技术导向转向结果导向的商业模式。
Benchmarks are now measured on two different dimensions, the overall performance & the cost to achieve that intelligence.
大多数人认为AI模型评估主要关注性能指标,但作者认为评估维度已转变为性能与成本的双重考量。这一观点颠覆了传统只关注模型能力的评估方式,暗示了行业正从单纯追求性能转向更务实的成本效益分析。
Even the most valuable companies in the world cannot afford state-of-the-art intelligence for every conceivable use case.
大多数人认为顶级科技公司可以无限负担最先进的AI技术,但作者认为即使是全球最有价值的企业也无法负担所有场景下的尖端AI,因为实际使用成本远超预期。这挑战了'大公司有无限资源'的普遍认知,揭示了AI经济性的现实约束。
Dudes. All dudes. Not a woman in sight. Well, once we know the algorithm of the human (likely) male brain, we can begin to fix those brains where that algorithm has gone awry.
这一评论挑战了神经科学研究的普遍假设,暗示当前研究可能过度集中在男性大脑上,而忽视了性别差异。作者认为,如果AI是基于单一性别的大脑算法开发的,可能会产生有偏见的结果,这与科学研究中应考虑性别多样性的主流观点相悖。
Conscious human thought operates at a maximum speed of 10 to 50 bits per second. Is the goal to match this processing speed?
大多数人认为AI应该追求超越人类认知速度的能力,但作者质疑了这一基本假设。通过指出人类思维的速度限制,作者暗示AI发展可能不应盲目追求速度,而应关注其他方面,这与当前AI行业追求更高计算能力的普遍趋势相悖。
Rob Williams knows how to pitch Jeff Bezos: You write a press release as if your product has already been built. Bezos reads it and gives a thumbs up or down.
大多数人认为商业投资决策需要详细的商业计划、市场分析和财务预测,但作者暗示Bezos的投资决策仅基于'仿佛产品已经建成'的设想,这挑战了传统投资决策的理性过程。这种直觉式的、结果导向的投资方法与主流商业投资理念相悖。
With $500 million in funding and a reported $2.5 billion valuation, Flourish wants to reinvent AI by putting real neurons under the microscope.
大多数人认为AI发展应该依靠算法优化和计算能力提升,但作者认为Flourish通过研究真实神经元来'重新发明AI',这是一个反主流的方法。大多数人认为AI应该模拟大脑功能,而不是直接研究大脑本身,这挑战了当前AI开发的基本共识。
Conscious human thought operates at a maximum speed of 10 to 50 bits per second. Is the goal to match this processing speed?
大多数人认为AI应该追求超越人类速度和能力的计算,但这一评论提出了一个颠覆性的问题:我们是否应该重新思考AI的目标?也许真正的人工智能不在于速度,而在于效仿人类思维的本质特征。这与当前追求更快、更强AI的主流观点形成鲜明对比。
With $500 million in funding and a reported $2.5 billion valuation, Flourish wants to reinvent AI by putting real neurons under the microscope.
大多数人认为AI发展应该依靠计算能力和算法优化,但作者提出了一种颠覆性的观点:真正的AI突破可能来自于直接研究生物神经元而非模拟计算。这与当前主流AI研究路径相悖,暗示我们可能一直在错误的方向上追求人工智能。
The different things now being called world models are in fact different projections of this same loop.
大多数人认为各种'世界模型'代表不同的技术路径,但作者认为它们本质上都是同一循环的不同投影。这一观点挑战了当前AI领域的碎片化理解,暗示表面不同的技术可能共享更深层的结构,这为整合不同AI领域提供了新视角。
The ancient Greeks could never agree on what the world was made of, because 'world' was never a single thing.
大多数人认为'世界模型'是一个明确的概念,但作者认为它从来不是单一的东西,而是不同领域根据各自需求构建的不同投影。这一观点挑战了AI领域对'世界模型'的统一期望,暗示我们需要接受多元而非单一的模型理解。
The world is not made of words.
大多数人认为语言是理解世界的基础,但作者认为世界模型需要超越语言,因为物理世界运行在不同的基础上。作者指出,语言模型学习文本的统计结构,而世界模型需要学习空间和时间的统计结构,这挑战了以语言为中心的AI发展观。
For many assets, visual consistency is only the baseline. The object also needs the right part semantics and functional constraints: doors should open, hinges should rotate, drawers should slide, wheels should spin.
作者挑战了当前3D生成领域只关注视觉逼真度的主流观点,提出功能性约束同样重要。这一观点暗示未来3DAI的发展方向将从单纯的视觉模拟转向功能模拟,需要理解物体的物理特性和交互逻辑。
In pixel-native generation, more inference often means sampling more outputs: generate twenty images, pick the best one, maybe try again. That is useful, but every attempt is mostly a new roll of the dice.
作者认为当前主流的像素原生生成方法本质上是在'掷骰子',每次尝试都是全新的随机生成。这一观点挑战了当前扩散模型通过增加推理次数提升质量的共识,暗示这种方法效率低下且缺乏系统性改进。
The most interesting visual AI tools today have stopped trying to generate the final output. Instead, they're generating the source code behind it.
大多数人认为视觉AI的进步主要体现在生成更逼真的图像和视频上,但作者认为真正的突破在于AI从生成像素转向生成代码。这一观点挑战了当前视觉AI领域的主流发展方向,暗示未来价值不在于最终视觉效果,而在于可编辑、可迭代的代码结构。
Knowledge workers primarily use Codex to create reports, spreadsheets, presentations, contracts, and other work products.
大多数人认为AI主要应用于创意写作或编程等特定领域,但作者认为知识工作者正在广泛使用AI创建传统上需要专业技能的工作产品。这挑战了AI应用范围的狭隘认知,表明AI正在渗透到知识工作的核心文档和产品创建过程中。
users are increasingly running multiple Codex tasks in parallel, allowing them to investigate data, draft materials, and automate workflows simultaneously.
大多数人认为AI工具一次只能处理一个任务,需要顺序使用,但作者认为用户正在同时运行多个AI任务,实现真正的并行工作流程。这挑战了人机交互的传统模式,暗示AI正在改变我们处理任务的基本方式,从顺序转向并行处理。
While developers remain the largest user group, knowledge workers now represent about 20 percent of users and are growing more than three times as fast.
大多数人认为AI工具主要是为开发者和技术人员设计的,但作者认为Codex正迅速转向知识工作者,因为他们采用速度是开发者的三倍多。这挑战了AI工具主要服务于技术精英的传统认知,表明AI正在民主化,使非技术专业人员也能显著提高生产力。
We see our role as twofold. First, to help the software industry adapt by safely providing wide access to better models, tools, and common infrastructure. Second, to steadily shift the support we provide, from finding vulnerabilities to disclosing, fixing, and deploying patched software.
大多数人认为AI安全公司的主要价值在于发现漏洞,但作者认为真正的价值在于修复漏洞的过程。这一观点挑战了AI安全行业的商业模式和核心价值主张,暗示行业需要重新定义其成功标准。
Mythos Preview continues a long-term trend that we've been warning about for some time: within 6 to 12 months, we expect that many other AI companies will have Mythos-class models
大多数人认为AI公司会谨慎控制其强大模型的安全发布,但作者预测这些模型将在短时间内被广泛复制且缺乏安全保障,这挑战了科技公司自我监管的主流叙事。作者暗示行业自律可能不足以应对AI安全挑战。
the bottleneck in cybersecurity is now verifying, disclosing, and patching the large numbers of vulnerabilities that Mythos-class models can surface.
大多数人认为网络安全的主要挑战是发现漏洞,但作者认为真正的瓶颈在于修复漏洞的过程。这一观点挑战了网络安全行业的传统优先级设置,暗示了防御策略需要根本性转变。
Cheap, fast AI models with powerful cyber capabilities are around the corner.
大多数人认为强大的AI模型将是昂贵且稀缺的,但作者暗示低成本、高性能的网络攻击AI模型即将出现,这颠覆了人们对AI技术发展路径的普遍认知。这种观点挑战了技术发展的传统经济学模型。
within 6 to 12 months, we expect that many other AI companies will have Mythos-class models, and they could release them without safeguards that prevent misuse.
大多数人认为AI安全防护会随着技术发展而同步增强,但作者认为AI攻击能力将很快普及且缺乏防护措施,这挑战了行业对技术安全发展的乐观预期。作者暗示AI安全竞赛已经落后于攻击能力的发展,这是一个反直觉的观点。
To address the scale of this coming challenge, hundreds of thousands of organizations, researchers, and maintainers will likely need access to the most advanced cyber capabilities and tools available.
大多数人认为强大的AI安全工具应该严格限制,只由少数精英团队使用,但作者主张需要广泛分发这些工具给数十万组织,这与主流的安全控制认知相悖。
The bottleneck in cybersecurity is now verifying, disclosing, and patching the large numbers of vulnerabilities that Mythos-class models can surface.
大多数人认为网络安全的主要挑战是发现漏洞,但作者认为真正的瓶颈在于修复和修补这些漏洞,这颠覆了传统网络安全优先级的认知。
Cheap, fast AI models with powerful cyber capabilities are around the corner. We want Project Glasswing to spur institutions toward operating norms that reflect this reality.
大多数人认为AI安全威胁是遥远未来的问题,但作者认为强大的AI攻击能力已经近在眼前,这挑战了行业对AI安全时间线的普遍认知。作者暗示AI安全威胁的紧迫性被严重低估了。
There is no comparable national-level ambition or coordinated map elsewhere in the world at the moment.
大多数人认为脑机接口发展主要由私营企业和研究机构推动,但作者认为中国通过国家层面的战略规划和资源投入,正在建立全球独一无二的BCI发展生态系统。这一观点挑战了科技发展主要由市场力量驱动的传统认知,强调了国家战略在新兴科技领域的关键作用。
Neurotechnology has emerged as a rare tech sector where US-China collaboration is still happening despite geopolitical tensions.
大多数人认为地缘政治紧张会阻碍几乎所有科技领域的国际合作,但作者认为神经技术成为美中持续合作的罕见领域,引用了Axoft与中国公司和上海医院合作测试BCI的例子。这一观点挑战了当前科技民族主义的普遍认知,表明某些前沿领域仍能超越政治分歧。
Being exceptional and being accessible are two diametrically opposed definitions of winning.
大多数人认为中美科技竞争是零和游戏,一方领先意味着另一方落后,但作者认为中美在脑机接口领域有不同的'胜利'定义:美国追求技术卓越和首创,而中国注重大规模应用和社会解决方案。这一观点挑战了科技竞争的传统叙事,暗示不同发展路径可以并行不悖。
The biggest advantage China may have is that Chinese people, particularly patients like Dong, tend to welcome this technology and are genuinely enthusiastic about it.
大多数人认为西方在生物医学技术接受度上领先,但作者认为中国患者对脑机接口技术的接受度反而更高,称西方存在'ick factor'(厌恶因素)。这一观点挑战了西方在医疗技术接受度上的传统认知,暗示文化差异可能影响科技发展路径。
the future of custom video JIT UI is closer than you think
大多数人认为实时生成的用户界面(JIT UI)仍然是遥远的概念,主要存在于实验性演示中,但作者认为随着推理速度和成本的下降,定制化的实时视频UI将很快成为现实。这挑战了人们对AI界面发展速度的主流预期,暗示了这一转变可能比大多数人想象的更快。
the future of video generation may depend more on language models and agents than on diffusion alone
大多数人认为扩散模型(diffusion models)是视频生成的核心技术,并将持续主导这一领域,但作者认为未来视频生成的发展将更多地依赖于语言模型和代理技术,而非单纯的扩散方法。这挑战了当前AI生成领域的技术共识,暗示了语言模型可能在视频生成中扮演更重要的角色。
Video Models primarily get their intelligence from LLMs, not from training on video data
大多数人认为视频模型的能力主要来自于大量视频数据的训练,但作者认为视频模型的智能主要来源于语言模型(LLMs),而非视频数据本身。这是一个反直觉的观点,因为它挑战了当前AI领域对多模态模型训练的主流认知,暗示了语言模型可能是视频生成能力的基础。
Hyperscalers are at the other end of the spectrum. Their median short interest is 1.1%.
大多数人认为大型云服务提供商也会面临AI相关的空头压力,但数据显示超大规模云服务提供商的空头兴趣仅为1.1%,表明市场对这些公司能够有效整合AI技术并实现盈利有较强信心,这与对AI整体市场的悲观预期形成鲜明对比。
The largest AI winners are mostly absent. SoundHound AI is 36.3% short. C3.ai is 32.2%. BigBear.ai is 29.4%.
大多数人认为大型AI公司会面临更多空头押注,但数据显示空头主要集中在小型和中等市值AI公司,而最大的AI赢家大多缺席这一趋势,表明市场对AI领域的质疑具有选择性,而非全面悲观。
Semiconductor stocks saw a decrease in short-selling. With memory makers like Micron up 742% this year
大多数人认为半导体行业整体面临AI泡沫和短期压力,但数据显示内存制造商如美光(Micron)股价上涨742%,表明半导体行业内部存在明显分化,内存成为新的万亿级市场,这与对整个半导体行业的悲观预期形成鲜明对比。
Even this result was very much a human-AI collaboration. While the AI system found the proof on its own, human mathematicians verified the result. Other humans came up with better-written proofs that extended the AI's initial ideas.
大多数人可能认为AI能够独立解决人类无法解决的数学问题,表明人类数学家角色将被削弱,但作者强调这仍然是人机协作的结果。因为作者指出,人类数学家不仅验证了结果,还改进和扩展了AI的初步想法,表明在可预见的未来,人类在数学研究中仍将发挥关键作用。
The more complicated patterns pay off. While the OpenAI model's proof does not explicitly state how many unit-distance pairs are possible for n points, human mathematician Will Sawin was able to show that it grows at least at the rate of n 1.014.
大多数人认为微小的数学改进(如n的1.014次方增长)不值得特别关注,但作者认为这种看似微小的改进实际上代表了重大突破。因为作者强调,随着n变得非常大,这个微小的指数增长将远超Erdős方法产生的计数,从而彻底改变问题格局。
The AI constructed a grid in a high-dimensional space and then projected this more complex structure into two dimensions. And instead of using a whole-number grid with points like (1,3) or (-3,6), the AI construction used something called algebraic integers to build this more complicated grid.
大多数人认为解决数学难题需要全新的理论突破或创新方法,但作者认为AI通过巧妙应用现有数学知识(高维空间投影和代数整数)就能解决长期悬而未决的问题。这挑战了人们对数学创新必须依赖全新方法的常识认知。
It’s unclear how long this complementarity will last, however. Gowers spent the rest of his comment exploring whether the relief he felt on hearing that AI had disproved the conjecture was justified. He more or less concluded that it was, but in a footnote, he wrote that he would guess 'that AI will soon reach a high level at other activities such as building theories, formulating definitions and asking interesting questions.'
大多数人认为AI目前只能辅助人类数学家解决特定问题,需要人类来提出问题和构建理论框架。但作者暗示AI很快将超越这一限制,能够自主构建理论和提出有趣问题,这挑战了数学研究本质是人类活动的传统观念。
The AI constructed a grid in a high-dimensional space and then projected this more complex structure into two dimensions. And instead of using a whole-number grid with points like (1,3) or (-3,6), the AI construction used something called algebraic integers to build this more complicated grid.
大多数人认为AI在数学领域的突破需要全新的思维方式和人类尚未掌握的技术,但作者认为AI的解决方案实际上是通过巧妙组合现有数学概念实现的。这挑战了人们对AI创新能力的认知,表明AI的优势在于跨领域知识整合而非创造全新理论。
If Nvidia has cracked the code on bringing AI agents easily, safely, and usefully to the masses, it could — and should — be big.
大多数人认为AI代理技术仍处于早期阶段,难以在消费级设备上有效运行,但作者暗示Nvidia已经解决了这一技术难题。这一乐观观点挑战了当前AI代理技术仍不成熟的行业共识,暗示市场可能即将迎来AI代理的大规模普及。
Nvidia said that its RTX technology will deliver faster performance for AI, better image quality, and support for AI features in more than 1,000 games and applications.
大多数人认为AI PC主要是针对专业用户和开发者的工具,但作者强调Nvidia正在将其定位为游戏和主流应用的增强平台。这一观点挑战了AI技术仅用于专业工作的共识,暗示AI将首先在娱乐领域大规模普及。
He wants to end the days of launching apps, pointing, clicking, and typing.
大多数人认为AI将增强现有工作流程,但作者指出Nvidia的愿景更为激进——完全消除传统的应用程序启动、点击和键盘输入。这一反直觉的观点暗示Nvidia不仅想改变硬件,还想彻底重塑计算交互的基本模式,挑战了几十年来的用户习惯。
With RTX Spark and Microsoft Windows, you ask — and the PC does the work. Frontier models. Creative workflows. RTX games. All on a laptop.
大多数人认为AI PC只是现有电脑的增强版本,但作者引用黄仁勋的话暗示Nvidia正在推动一个根本性的变革:从人机交互的点击模式转向完全由AI代理操作的指令模式。这将彻底改变用户与计算机的互动方式,挑战传统的人机交互范式。
Nvidia ARM-based Windows devices have been tried before — and failed. Back in 2013, Microsoft famously had to write off $900 million on its Nvidia ARM-based Surface RT, with partners like Dell also bailing on the product.
大多数人认为Nvidia进入CPU市场是全新的尝试,但作者指出这实际上是Nvidia的第二次尝试,而且第一次尝试以失败告终。这挑战了Nvidia作为市场新进入者的叙事,暗示其可能面临比预期更大的历史阻力。
Last month, after delivering another record quarter, Huang promised investors he had found a new $200 billion market for Nvidia in selling CPUs for AI, not just GPUs
大多数人认为Nvidia的核心业务和优势在于GPU而非CPU,作者认为黄仁勋已发现了一个2000亿美元的CPU市场,这挑战了Nvidia作为GPU巨头的行业定位共识。
if Nvidia has cracked the code on bringing AI agents easily, safely, and usefully to the masses, it could — and should — be big
大多数人认为将AI代理安全地带给大众消费者是一个难以解决的挑战,作者暗示Nvidia已经'破解了密码',能够轻松、安全、有效地将AI代理带给大众,这挑战了AI普及面临的技术和安全性难题的普遍认知。
With RTX Spark and Microsoft Windows, you ask — and the PC does the work
大多数人认为PC交互仍将以点击、输入为主,作者认为Jensen Huang的愿景是彻底改变人机交互方式,使PC能够通过语音指令直接完成任务,这挑战了传统PC使用习惯的共识。
Nvidia ARM-based Windows devices have been tried before — and failed. Back in 2013, Microsoft famously had to write off $900 million on its Nvidia ARM-based Surface RT
大多数人认为Nvidia在ARM架构上的Windows设备尝试已经失败,历史不会重演,但作者暗示这次Nvidia的RTX Spark芯片是'一个完全不同的野兽',更强大而非更弱小,挑战了人们对ARM架构Windows设备失败的固有认知。
This attack does not require human-in-the-loop approvals, even when in settings the user has explicitly required human approval before ChatGPT edits workbooks.
大多数人认为AI工具的安全设置如'需要人工审批'能有效防止未经授权的操作,但作者发现即使启用了这些安全措施,攻击者仍能绕过人工审批环节直接执行恶意操作,这挑战了人们对AI安全控制有效性的普遍认知。
Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.
大多数人认为循环深度网络需要通过时间反向传播(BPTT)进行训练,这是计算密集型的,但作者认为这是不必要的,因为通过扩散块视角,可以用单次前向传递替代多次迭代,这一观点挑战了循环神经网络训练的传统方法。
We found a new way to break the network into blocks and train them independently.
大多数人认为神经网络必须作为一个整体进行联合训练才能达到最佳性能,但作者认为这是不必要的,因为证明了分块独立训练可以达到与端到端训练相当的性能,挑战了神经网络训练的基本共识。
Taking something off the shelf is maybe not going to work because there are all of these other requirements.
大多数人认为企业应该采用现成的AI代理系统以加速实施,但作者认为企业需要构建内部标准化框架,这挑战了当前AI市场对'开箱即用'解决方案的主流推崇。这一观点暗示AI代理可能需要更加定制化的企业级解决方案,而非通用产品。
This rush to do AI in a world where you haven't even modernized your application reminds me a little bit of that lift-and-shift that happened in the cloud.
大多数人认为AI应用应该优先采用最新技术快速实现,但作者将其比作云计算早期的'简单迁移'模式,认为这是一种可能导致资源浪费的短视行为。这与当前AI领域的快速采用主流观点相悖,暗示企业在AI应用上可能需要更加谨慎的基础架构规划。
After a first wave focused on rapid deployment, organizations now need to revisit those first-generation implementations, and redesign early agent architectures around workflow orchestration, observability, governance, and recovery
大多数人认为AI代理开发应该持续向前推进新技术,但作者认为企业实际上需要回到早期实现进行重建,因为快速部署阶段忽视了基础架构的可靠性问题。这与主流的'不断前进'的AI发展观相悖,暗示了AI发展可能需要经历一个'重建期'而非单纯的演进。
Models of this capability level require stronger cyber safeguards before they can be generally released.
大多数人认为更高级的AI模型应该更快地推向市场以获取竞争优势,但作者认为更强大的模型(如Mythos级)需要更强的网络安全保障才能发布。这与科技行业'快速迭代、先发布后完善'的主流做法形成鲜明对比,强调了安全可能优先于商业利益。
Opus 4.8 defaults to high effort, which we judge to be the best overall balance of quality and user experience.
大多数人认为AI模型应该追求最高效率和最快响应,但作者认为默认使用'高努力'模式(更频繁、更深入思考)是最佳平衡点。这与行业普遍追求的'速度至上'理念相悖,暗示质量有时需要牺牲效率来获得。
Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked.
大多数人认为AI模型会自信地输出有缺陷的代码而不自知,但作者认为Opus 4.8显著提高了自我纠错能力。这挑战了人们对AI模型自我评估能力的普遍怀疑,表明AI可能在代码质量方面比人们预期的更加可靠。
Opus 4.8 defaults to high effort, which we judge to be the best overall balance of quality and user experience.
大多数人认为AI模型应该追求最高效率或最低成本,但作者认为高努力程度是最佳平衡点,因为这能提供更好的用户体验和性能。这挑战了AI行业普遍追求速度和效率的主流认知,暗示质量与速度的权衡可能比人们认为的更重要。
Claude is learning how businesses actually operate: the context, the processes, the judgment.
大多数人认为AI模型主要是通过训练数据学习,而非通过实际业务操作进行学习。但作者暗示Claude正在通过企业部署过程中实时学习业务流程和决策逻辑,这种学习方式挑战了传统AI模型的训练范式,暗示AI可能正在从静态训练向动态学习转变。
Anthropic has raised $65 billion in Series H funding led by Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital, valuing the company at $965 billion post-money.
大多数人认为AI公司的估值增长会遵循更渐进的曲线,但Anthropic在短短时间内从Series G到Series H实现了估值的大幅跃升,达到近1万亿美元。这种估值速度和规模挑战了传统科技公司的估值逻辑,暗示AI行业可能正在经历一种全新的资本运作模式。
Claude is the first frontier model available on all three of the world's largest cloud platforms: Amazon Web Services, Google Cloud, and Microsoft Azure.
大多数人认为AI公司通常会与单一云平台建立深度绑定关系,但Anthropic打破了这一行业常规,同时在三大云平台上提供其前沿模型。这种多平台策略挑战了科技行业常见的排他性合作模式,表明Anthropic可能正在寻求更广泛的市场覆盖和减少对单一供应商的依赖。
Startups and Global 5000 companies alike are deploying Claude to handle complex workflows, and in doing so, Claude is learning how businesses actually operate: the context, the processes, the judgment.
大多数人认为AI模型主要是在受控环境中学习和训练,但这里暗示Claude正在通过实际业务操作直接学习企业运作模式,这种在真实商业环境中持续学习的方式挑战了传统AI训练方法的封闭性和局限性,暗示AI可能正在向自主学习和适应的方向发展。
Claude is the first frontier model available on all three of the world's largest cloud platforms: Amazon Web Services, Google Cloud, and Microsoft Azure.
大多数人认为顶级AI模型通常会选择单一云平台作为主要合作伙伴以获得更好的条件和支持,但Anthropic同时与三大云平台合作,这种多平台策略挑战了传统科技行业的排他性合作模式,表明AI公司正在重新定义与云服务提供商的关系。
Since our Series G in February, adoption has continued to grow across global enterprise customers, and our run-rate revenue crossed $47 billion earlier this month.
大多数人认为AI公司在短期内难以实现大规模商业化,特别是达到470亿美元的年收入。这一数字暗示Anthropic可能正在以极快的速度实现收入增长,远超传统科技公司的扩张速度,挑战了人们对AI商业化时间表的普遍认知。
Anthropic has raised $65 billion in Series H funding led by Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital, valuing the company at $965 billion post-money.
大多数人认为AI公司的估值通常基于其实际收入和盈利能力,但Anthropic以470亿美元的年收入获得了近万亿美元的估值,这一估值水平远超传统科技公司,表明投资者对AI未来的预期已完全脱离当前财务基本面,形成了非理性的估值泡沫。
我们不是要挑战医生的权威,而是要帮患者明明白白看病,以患者为中心,让他拥有知情权和决策权。
在AI医疗领域,大多数公司选择与医生合作或复制医生经验,而王小川提出'造医生'而非'复制医生'的理念,强调以患者为中心而非医生权威。这一立场挑战了医疗AI行业普遍的'医生中心'模式,提出了一个与主流医疗AI发展路径不同的非共识观点。
如果做主流,你也会有其他恐惧。我不是说我现在做得特别好,只是主流也有主流的问题,不同选择有各自的代价。
大多数人认为选择主流AI赛道(通用大模型)会更安全、更有前景,但王小川认为即使走主流道路也会面临同等程度的焦虑和恐惧,暗示行业共识可能存在盲点。这一观点挑战了'主流即安全'的普遍认知,暗示在AI领域,无论选择哪条道路都有其内在压力。
The same isolation keeping Claude contained also kept host-based endpoint detection and response out. From the EDR's perspective, Claude Cowork is an opaque hypervisor process.
大多数人认为更强的隔离总是意味着更好的安全性,但作者指出过度的隔离会阻止安全监控工具(如EDR)发挥作用,创造出'安全盲点'。这一发现挑战了安全领域中'隔离越多越好'的普遍假设,强调了安全与可见性之间的平衡。
Battle-tested hypervisors, syscall filters, and container runtimes have survived more adversarial attention than anything you'll build. Across every deployment described here, the standard primitives held while our own work around them exposed flaws.
大多数人认为定制化的安全组件会比成熟的开源工具更安全,但作者的经验表明,经过实战检验的标准组件(如hypervisors和容器运行时)实际上比自定义组件更可靠。这一观点挑战了安全工程中常见的'重新发明轮子'倾向,强调了使用成熟解决方案而非自定义实现的重要性。
The more approvals a user sees, the less attention they pay to each, becoming over time much less diligent in their supervision.
大多数人认为更多的用户监督会提高安全性,但作者发现相反的情况:频繁的审批请求会导致用户注意力下降和'审批疲劳',实际上降低了安全性。这一发现挑战了传统安全理念,即认为更多的用户参与总是能增强系统安全性。
Trump delays AI safety testing EO, claiming it would be an innovation 'blocker.' 'I really thought [the order] could have been a blocker.'
大多数人认为政府AI安全测试会促进创新和保障安全,但作者认为特朗普认为安全测试会阻碍创新,这挑战了监管通常被认为能促进负责任创新的共识。
Trump has taken a hands-off approach to regulating AI since retaking office, but members of his administration got spooked and began recommending safety testing after Anthropic flagged cybersecurity risks with its latest model, Mythos.
大多数人认为特朗普政府会继续其宽松的科技监管立场,但作者认为特朗普政府内部出现了分歧,部分官员在安全事件后转向支持AI安全测试,这挑战了人们对特朗普一贯的监管风格的预期。
This dynamic UI management is the future of software value : the harness to control the interface/ensure it's correct & the knowledge management to rationalize all the AI products over time
大多数人关注AI的功能和结果,但作者认为未来软件价值在于动态UI管理和知识管理,这种将界面控制和管理而非功能实现视为核心价值的观点与主流认知相悖。
Software systems need to decide which of these to keep over time & which are disposable ; those newer semi-permanent artifacts will become the new heads
大多数人认为软件界面应该是稳定和持久的。但作者提出界面应该是可丢弃的,半永久性的界面元素会随时间演变,这种将界面视为临时而非固定组件的观点与传统的软件设计理念相悖。
Anthropic把几乎所有资源压在文本推理和代码执行上。这个策略在商业上正在被验证:Claude Code年化收入25亿美元...但从范式演进的角度看,这是一个在积累技术债的选择。
大多数人认为专注于文本推理和代码执行是明智的商业策略,但作者认为Anthropic的这种选择是在积累技术债,因为它可能在未来统一连续空间架构的竞争中处于被动。这一观点挑战了当前AI商业成功的标准叙事。
人类语言是大脑为适配带宽产生的有损压缩协议,大脑原生认知是连续高维活动,大量感官认知从未被离散token编码。
大多数人认为语言是思维的原生格式,token能完整表达人类认知,但作者认为语言只是大脑的有损压缩协议,大量感官认知无法被token编码,这是大语言模型的结构性天花板。这一观点挑战了我们对语言与认知关系的传统理解。
Legacy systems were built for humans: data is siloed and hard to access, rules are hardcoded and slow to update, and workflows run in batches rather than in real time
大多数人认为遗留系统虽然陈旧但仍然可靠,可以逐步更新,但作者认为遗留系统从根本上是为人类设计的,无法适应AI时代的需求。这一观点挑战了对遗留系统的渐进式改进方法,暗示需要根本性替换而非简单更新。
Traditional compliance was designed around human actors. We now need a modern AI approach for verifying identity, assessing intent, and establishing liability when the counterparty is an autonomous agent
大多数人认为合规原则和框架具有普遍适用性,但作者认为针对人类设计的合规系统无法应对AI代理带来的新挑战。这一观点挑战了合规工作的基础假设,暗示需要根本性重构合规方法以适应自主代理。
Over the last 20 years the fastest-growing occupation in the US was manicurists and pedicurists. But following close behind? Compliance Officers.
大多数人认为合规是企业的负担和成本中心,但作者认为合规已成为美国增长最快的职业之一,暗示合规已成为经济中不可或缺的重要组成部分。这一观点挑战了人们对合规工作价值的传统认知,表明合规不仅必要而且正在扩张。
Compliance is moving beyond just a cost center, to a revenue driver.
大多数人认为合规纯粹是企业成本中心,主要目的是避免罚款和处罚。但作者认为合规正在从成本中心转变为收入驱动因素。这挑战了合规的传统定位,暗示现代合规可以通过提高效率、减少误报和加速客户入职等方式直接创造商业价值。
if we assume that agents will soon become the predominant purchasers on the web, this opens an entirely new category of risk.
大多数人认为合规风险主要来自人类行为者和交易对手。但作者认为随着AI代理成为网络上的主要购买者,将出现全新的风险类别。这挑战了传统合规框架的基本假设,暗示未来合规需要考虑非人类行为者的独特风险特征。
Regulation stops being a document that people interpret and becomes code that systems execute.
大多数人认为合规主要是人类专家解读和执行法规的过程。但作者认为法规将从人类解释的文档转变为系统执行的代码。这挑战了合规工作的本质认知,暗示AI将彻底改变合规领域的基本工作方式,从人类主导转向系统主导。
Model Labs are increasingly also building Agents as the product
大多数人认为模型实验室应该专注于提升基础模型的能力,但作者认为这些实验室现在正转变为代理实验室。这一观点挑战了AI行业的基础假设,即模型本身是产品,而不是模型只是更大代理系统的一部分。这标志着AI行业从'模型即产品'向'代理即产品'的根本性转变。
The quote is a big reversal of stance from a position ~uniformly held by anyone who worked at **Team Big Model**, including his previous head of OpenAI Labs
大多数人认为大型模型实验室会继续专注于基础模型研发,但作者认为这是一个立场的重大转变,因为连OpenAI前高管都开始转向代理产品。这挑战了AI行业长期以来的'模型优先'共识,表明即使是Big Model团队也开始认可代理产品的价值。
The labs understand how valuable these problems are: that's why they're building their own outsourced configuration shops, and why an entire upmarket class of reinforcement learning businesses exist.
大多数人认为大模型实验室会直接解决所有复杂问题,不需要外部帮助。但作者认为实验室明白这些复杂问题的价值,这就是他们为什么建立自己的外部配置服务,以及为什么存在整个高端强化学习企业类别。这承认了实验室在某些领域需要专业合作伙伴,挑战了实验室可以独立解决所有问题的主流观点。
The critical insight in the Oz analogy is that roughly half of any real workflow that is non-agentic carries no lab advantage. They are no better than you are at writing the deterministic software underneath the model layer.
大多数人认为AI将取代所有软件工程工作,人类只需构建AI代理层。但作者认为真实工作流程中约有一半是非代理性的,这部分工作大模型实验室没有任何优势。大模型公司在编写模型层下方的确定性软件方面并不比专业应用公司更好。这为专注于构建复杂工作流程中非AI部分的企业提供了重要机会。
The model is fungible underneath; the system of work is not. The next generation of enterprise software is going to be built off the road.
大多数人认为底层AI模型是企业的核心竞争力,模型越好产品越强。但作者认为模型是可替代的,而'工作系统'才是真正的护城河。下一代企业软件将建立在'黄砖路'之外,专注于特定行业的工作流程、数据捕获和治理。这些系统拥有端到端的工作流程所有权,这是大模型实验室无法轻易复制的优势。
Running every query through Opus 4.7 is the fastest path to negative gross margins. The best Rest of Oz companies route across tiers of models — frontier models for the hardest tasks, mid-tier for the bulk, smaller custom or fine-tuned models where they've earned the right to use them.
大多数人认为使用最先进的大模型总是最佳选择,能提供最佳结果。但作者认为这是通往负毛利的最快路径。相反,'Oz的其他部分'公司会根据任务难度分层使用不同级别的模型,只为最困难的任务使用前沿模型,为批量任务使用中等模型,为特定工作使用小型定制或微调模型。这种成本优化策略使它们能够提供更具竞争力的价格。
The labs are already routing internally — different model classes for different requests, ensembles under the hood. What they can't do is route across vendors, or evaluate a competitor's model for a specific sub-task, or use an open-source fine-tune for the narrow piece where it's actually best.
大多数人认为大模型实验室拥有绝对优势,可以解决所有AI问题。但作者认为实验室在模型选择上存在结构性限制,无法跨供应商评估模型或为特定子任务使用开源微调模型。这为专注于特定领域的企业提供了机会,它们可以选择最适合每个子任务的模型,而不仅限于自家实验室的模型。
The labs really are coming for a huge swath of the application surface. But 'the application layer' isn't just one homogenous opportunity.
大多数人认为AI将完全吞噬应用层,所有软件都会被大模型取代。但作者认为应用层并非同质化机会,存在不同类型的机遇。作者将应用分为'黄砖路'和'Oz的其他部分',认为垂直领域的复杂应用不会被大模型完全替代,因为价值不仅来自底层模型能力,还来自特定行业的可信赖、合规和运营化的支撑架构。
The result is a new competitive dynamic in software.
大多数人认为AI将使软件竞争更加激烈,但作者暗示AI实际上正在创造一种全新的竞争动态,这可能使某些领域的竞争格局完全改变。这挑战了AI对软件行业影响的主流预测,暗示行业结构可能发生根本性转变。
What happens when every company has access to the same model? The best riders win.
大多数人认为AI差异化将来自底层模型的独特性,但作者认为当所有公司都能访问相同模型时,真正的竞争将在于'驾驭者'的能力。这挑战了AI战略中模型差异化的主流观点,暗示真正的竞争优势将来自于如何使用这些模型。
Like a mustang, AI is powerful but wild. Harnessing the power means domestication.
大多数人将AI视为需要驯服的工具,但作者将其比作野生的马,暗示AI本质上是一种无法完全控制的自然力量。这种比喻挑战了AI作为完全可控工具的主流认知,暗示我们需要接受其不可预测性。
The end of the software era is the beginning of the harness era.
大多数人认为软件将随着AI而进化,但作者认为软件时代实际上已经结束,取而代之的是'驾驭'(harness)时代。这种观点挑战了技术发展的主流叙事,暗示我们正在从创造软件工具转向驯服AI系统。
The best advice I ever heard on pricing a product was that your customer should suck air through their teeth and then say yes. Uber's budget overrun and Microsoft's seat cancellations look like that effect playing out in practice.
大多数人认为AI成本超支是企业采用AI失败的迹象,但作者将其重新诠释为产品市场契合的证据。这一观点挑战了主流叙事,将企业的预算危机和取消服务视为定价成功的标志,而非AI失败的信号,这与大多数媒体报道的基调相反。
API revenue is becoming less important. Over the past two years my impression has been that OpenAI made more of their income from subscription revenue while Anthropic made more from their API.
大多数人认为AI公司的主要收入来源是API调用和订阅服务,但作者提出一个反直觉的观点:API收入正变得不那么重要。AI公司正在转向直接面向企业的产品,绕过中间商(如Cursor和GitHub Copilot),这改变了整个AI行业的商业模式和收入结构。
Coding agents really did change everything. These are tools which burn vastly more tokens, but are also quickly becoming daily drivers for the work carried out by extremely well-compensated professionals.
大多数人认为ChatGPT等通用AI助手已经实现了产品市场契合,但作者认为真正带来商业突破的是代码编写代理工具。这一观点挑战了主流认知,因为ChatGPT拥有数亿用户,而作者认为只有专业领域的代码代理才能创造足够的收入来支撑AI公司的巨额基础设施成本。
The competitive landscape in AI infrastructure has made this gap impossible to ignore. Teams building custom CUDA, Triton, and Helion kernels are striving for every percentage point of throughput. Until now, there hasn't been a way to fine-tune code generation for a specific workload.
大多数人认为GPU编译器已经提供了足够的优化选项,开发者可以通过手动调整获得最佳性能。但作者指出,在当前AI基础设施的竞争环境下,这种观点已经过时,暗示传统方法无法满足现代AI工作负载的性能需求。
These gains come on top of already-optimized baselines in kernels that were considered "done" by their authors. The improvements are the direct result of CompileIQ discovering compiler configurations that the default heuristics would never select.
大多数人认为一旦开发者完成优化工作,就没有更多性能提升空间。但作者表明,即使是"完成"的优化代码仍可能通过编译器级别的调整获得显著提升(高达15%),这挑战了开发者对优化极限的认知。
Most auto-tuning tools optimize for a single metric, typically runtime. CompileIQ goes further, supporting multi-objective optimization, simultaneously exploring trade-offs across competing objectives like runtime, compile time, and power consumption.
大多数人认为性能优化应以运行时间为唯一目标,但作者提出,真正的优化需要考虑多个相互竞争的目标(运行时间、编译时间和功耗)。这与传统的单一目标优化理念相悖,暗示开发者需要更全面的优化策略。
CompileIQ is not a magic tool that automatically turns poorly-written code into high-performing code. To get the best value from CompileIQ, you need to start with reasonably high-performing code, which then enables the final compiler-heuristics tweaks to take you to maximum performance.
大多数人可能认为AI驱动的自动调优工具可以弥补代码质量不足的问题,但作者明确表示,即使是CompileIQ这样的先进工具也需要基于已经相当优化的代码才能发挥最大作用。这挑战了"自动化工具可以解决一切性能问题"的常见误解。
In attention inference kernels, GEMMs in the linear layers of FFN/MLP blocks plus the Q, K, V, and output projections account for approximately 70% of total FLOPs. Scaled dot-product attention, fused and flash attention variants account for another 25%. Together, these two kernel families represent more than 90% of end-to-end inference compute.
大多数人认为优化整个应用程序或算法才能获得显著性能提升,但作者指出,仅仅优化占计算量90%的两个关键内核类型就能带来最大收益。这与广泛应用的"全面优化"策略相悖,暗示开发者应该将资源集中在最关键的代码路径上。
NVIDIA GPU compilers apply the same default heuristics (register allocation strategies, instruction scheduling decisions, loop unrolling thresholds, etc.) to every kernel they compile. These heuristics are engineered to produce good results across a vast range of workloads. But "good across the board" and "optimal for your workload" are two very different things.
大多数人认为编译器已经提供了足够的优化,开发者只需关注算法和代码实现即可。但作者认为,即使是最先进的GPU编译器也使用通用的启发式方法,这些方法无法针对特定工作负载进行优化,导致性能损失。这挑战了开发者社区对编译器优化能力的普遍认知。
Perhaps this time is different, and we can put aside the lessons of economic history. Certainly, AI has gained unimaginable powers to do humanlike tasks. Perhaps it will devour jobs in ways that we've never seen before.
大多数人认为历史经验可以预测AI对就业的影响,但作者认为这次可能真的不同,AI可能以前所未有的方式吞噬工作。这一观点挑战了技术变革历史模式的适用性,暗示AI可能是真正的范式转变。
The simple truth could be that coding skills are no longer a guarantee of a job. That may help to explain the drop-off of computer science majors at schools around the country.
大多数人认为计算机科学和编程技能仍然是就业的保证,但作者认为这些技能可能不再是工作的保证,这解释了计算机科学专业人数的下降。这一观点挑战了传统技术教育价值的认知,暗示AI正在改变就业市场的基本规则。
One of the somewhat surprising wrinkles uncovered by recent research is that wages in sectors highly exposed to AI have risen relatively fast since the introduction of ChatGPT.
大多数人认为AI会压低工资或导致工资增长停滞,但作者认为AI高度影响行业的工资实际上在快速增长。这一发现与主流预期相悖,表明AI可能正在增加而非减少高技能工作的价值。
The impact on head counts depended on how AI was being used. It was specifically the jobs where tasks could be automated... that accounted for the decrease in employment—jobs for people like software developers. In jobs where AI was mainly used but to augment human work, head counts grew faster than the average for entry-level workers.
大多数人认为AI会替代所有相关工作,但作者认为AI对就业的影响取决于使用方式——完全自动化的工作确实减少,但增强人类工作的AI反而促进了就业增长。这一区分挑战了AI必然导致失业的简单化观点。