3,192 Matching Annotations

May 2026
epoch.ai epoch.ai

https://epoch.ai/gradient-updates/how-close-is-ai-to-taking-my-job

4
1. fxp007 07 May 2026
  
  in Public
  
  Overall, it usually takes me about two hours to do this task. If only it were as simple as a single copy and paste, life would be so much easier — or so I thought.
  
  作者完成文章发布任务通常需要约2小时，而AI在这一任务上表现极差。这一时间对比数据点突显了AI在看似简单任务上的局限性，支持了莫拉维克悖论的观点。然而，作者没有提供AI完成该任务的具体时间数据，这使得比较不够完整。
  
  data-point task-comparison time-efficiency
2. fxp007 07 May 2026
  
  in Public
  
  For example, this could bring a five hour (300 minute) time horizon down to a three minute time horizon. But while the time horizons are much shorter, the growth rate is about the same as the METR's main results, with roughly two doublings each year.
  
  作者提到视觉计算机使用任务的时间跨度可能比主要结果缩短40-100倍，但增长率相似，约为每年翻两倍。这一数据点揭示了AI在不同任务领域的能力差异，以及计算机使用任务的特殊挑战，这对理解AI自动化进程的复杂性提供了重要见解。
  
  data-point time-horizon computer-use
3. fxp007 07 May 2026
  
  in Public
  
  By the end of the year, we expect AI to be able to do tasks roughly one day long with a 50% success rate. In comparison, I'd guess that this task would take several days for a person familiar with the paper and is able to play around with the web interface.
  
  作者引用了METR的时间预测数据，即到2026年底，AI完成一天长度任务的成功率约为50%。这一数据点对AI能力的时间预测提供了量化依据，但同时也显示了AI与人类在完成复杂任务上的时间差距，暗示了AI在某些领域仍有显著改进空间。
  
  data-point time-horizon ai-capabilities
4. fxp007 07 May 2026
  
  in Public
  
  The benchmark tasks were meticulously constructed to be realistic, involving the hard work of hundreds of experts and likely millions of dollars — placing it among the most expensive economics papers of all time.
  
  作者提到GDPval基准测试可能花费了数百万美元，由数百名专家参与构建。这一数据点显示了AI基准测试的高昂成本，但也暗示了这类测试可能存在资源分配不均的问题。考虑到其成本与实际经济影响之间的差距，这种高投入低产出的现象值得反思。
  
  data-point benchmark-cost ai-economics
Visit annotations in context

Tags

data-point

ai-capabilities

computer-use

time-horizon

ai-economics

task-comparison

time-efficiency

benchmark-cost

Annotators

fxp007

URL

epoch.ai/gradient-updates/how-close-is-ai-to-taking-my-job
epoch.ai epoch.ai

https://epoch.ai/blog/mirrorcode-preliminary-results

1
1. fxp007 07 May 2026
  
  in Public
  
  Pkl is a data configuration language developed by Apple
  
  https://pkl-lang.org/index.html
Visit annotations in context

Annotators

fxp007

URL

epoch.ai/blog/mirrorcode-preliminary-results
www.anthropic.com www.anthropic.com

Higher limits + SpaceX compute partnership - Anthropic

1
1. fxp007 07 May 2026
  
  in Public
  
  ⚡【洞察】Anthropic 与 SpaceX 签署算力供应协议，同步提升各级订阅使用上限。SpaceX 的超算基础设施（Colossus）本是为 xAI 的 Grok 训练设计的——Anthropic 购买这些算力，意味着 AI 算力市场的「供应商交叉」正在发生：竞争对手的硬件基础设施成为彼此的算力来源。HN 399 赞的背后，社区讨论的核心问题是：这对 AI 基础设施军备竞赛意味着什么？答案是：算力需求已超过任何一家公司的自建能力。
  
  Anthropic SpaceX compute-partnership AI-infrastructure insight
Visit annotations in context

Tags

compute-partnership

Anthropic

insight

SpaceX

AI-infrastructure

Annotators

fxp007

URL

anthropic.com/news/higher-limits-spacex
www.thatprivacyguy.com www.thatprivacyguy.com

Chrome Silent Nano Install - That Privacy Guy

1
1. fxp007 07 May 2026
  
  in Public
  
  🔒【令人震惊】Chrome 在数十亿设备上静默写入 4GB Gemini Nano 模型权重，删除后自动重装，可能违反 GDPR。这是「端侧 AI」与用户隐私的第一次正面冲突——不是关于数据收集，而是关于在未经同意的情况下使用用户存储空间和计算资源。这个事件的先例意义巨大：如果 Google 可以这样做，所有内置 AI 的操作系统和浏览器都有可能效仿，用户对自己设备的控制权正在被悄悄侵蚀。
  
  Chrome Gemini-Nano 4GB privacy GDPR shocking
Visit annotations in context

Tags

Chrome

Gemini-Nano

privacy

GDPR

4GB

shocking

Annotators

fxp007

URL

thatprivacyguy.com/blog/chrome-silent-nano-install/
arstechnica.com arstechnica.com

Amazon stuck with months of repairs after drone strikes on data centers - Ars Technica

1
1. fxp007 07 May 2026
  
  in Public
  
  💥【令人震惊】AI 基础设施的地缘政治风险第一次从「理论」变成「实际损失」：伊朗无人机打击 UAE 和 Bahrain 的 AWS 设施，全面恢复需数月。这事件的意义不只是 AWS 的物理损失，而是它彻底终结了「数据中心是安全的」的天真假设。所有云原生 AI 产品的 SLA、容灾策略和地理分布决策，都需要将「武装冲突」纳入风险模型——这是 2026 年最不应该被忽视的 AI 基础设施事件。
  
  AWS drone-strike geopolitical-risk AI-infrastructure shocking
Visit annotations in context

Tags

drone-strike

AWS

AI-infrastructure

shocking

geopolitical-risk

Annotators

fxp007

URL

arstechnica.com/gadgets/2026/05/amazon-stuck-with-months-of-repairs-after-drone-strikes-on-data-centers/
epoch.ai epoch.ai

https://epoch.ai/blog/chip-smuggling

5
1. fxp007 07 May 2026
  
  in Public
  
  export controls are leakier than previously understood
  
  【洞察】「出口管制比之前理解的更加漏洞百出」——这句话是对整个西方 AI 地缘政治战略的严厉评价。更令人不安的是：如果走私渠道如此有效，那么比芯片更容易传输的「模型权重」和「训练技术」的扩散速度只会更快。硬件管制是可见的，但知识扩散是不可见的。Epoch AI 的数据与 Anthropic 指控中国公司「蒸馏」其模型放在一起读，呈现出一幅完整的算力与知识双重扩散图景。
  
  export-controls leaky-policy knowledge-diffusion geopolitics insight
2. fxp007 07 May 2026
  
  in Public
  
  our central estimate is around 660,000 H100-equivalents
  
  【令人震惊的数字】走私流入中国的算力中位估算：66 万个 H100 等效——约占中国 AI 算力总量的三分之一。这个数字彻底改变了「出口管制正在有效阻断中国 AI 发展」的主流叙事。如果三分之一的算力来自走私，那么所有基于「中国无法获得先进芯片」假设的中美 AI 差距分析，都需要用这个修正系数重新计算。
  
  chip-smuggling 660K-H100 export-controls China-AI shocking
3. fxp007 01 May 2026
  
  in Public
  
  We estimate, with 90% confidence, that between 290,000 and 1.6 million H100-equivalents of compute were smuggled through the end of 2025.
  
  大多数人可能认为走私到中国的AI芯片数量在数万级别，但作者的估计显示实际数量可能高达数十万甚至上百万H100等效芯片，这一数量级远超公众认知，表明走私问题的严重程度被严重低估。
  
  non-consensus scale-of-smuggling national-security
4. fxp007 01 May 2026
  
  in Public
  
  The biggest driver of uncertainty on the diversion side is that we don't know what fraction of diversion has been observed. The large-scale smuggling schemes detected and reported so far could represent the majority of the volume, or they might be just a small fraction of the total flows.
  
  大多数人认为已曝光的大型走私案件代表了走私活动的主体，但作者指出这些已知的案件可能只是冰山一角，实际走私规模可能是已知的数倍，这挑战了我们对当前走私情况掌握程度的认知。
  
  non-consensus undetected-smuggling intelligence-gap
5. fxp007 01 May 2026
  
  in Public
  
  We estimate that between 290,000 and 1.6 million H100-equivalents (H100e) were smuggled to China through 2025. Our median estimate of 660,000 H100e would be roughly a third of China's total compute.
  
  大多数人认为美国出口管制能有效遏制中国获取先进AI芯片，但作者认为这些管制实际上导致大量芯片被走私到中国，走私数量可能与中国合法获取的芯片数量相当，这意味着出口管制的效果远不如预期。
  
  non-consensus export-controls chip-smuggling
Visit annotations in context

Tags

knowledge-diffusion

non-consensus

China-AI

chip-smuggling

scale-of-smuggling

intelligence-gap

national-security

export-controls

shocking

undetected-smuggling

geopolitics

insight

leaky-policy

660K-H100

Annotators

fxp007

URL

epoch.ai/blog/chip-smuggling
death-of-scrum.net death-of-scrum.net

The Death of Scrum

2
1. fxp007 07 May 2026
  
  in Public
  
  $200,000 per year in wasted standup meetings
  
  【令人震惊的数字】每年 20 万美元浪费在无效的 Standup 会议上——这是对一个「中等规模工程团队」的估算。更深层的问题是：这笔钱不只是时间成本，而是「将工程师锁在低价值同步活动中」的机会成本。AI 编程时代，工程师最稀缺的资源是「深度思考时间」，而 Scrum 的会议文化恰好是这种时间的最大消耗者。
  
  Scrum 200K-waste meeting-cost shocking
2. fxp007 07 May 2026
  
  in Public
  
  AI agents submit pull requests every few minutes
  
  ✉️【令人震惊】AI Agent 每几分钟提交一次 PR，但团队依然在每天早上 9 点开 Standup 汇报昨天做了什么。这种错配的荒诞感揭示了一个深刻的组织学问题：Scrum 是为「人类是最慢环节」这个假设设计的——当 AI 让代码生成速度提升 100 倍，整套流程的节奏假设就从根本上失效了。
  
  Scrum AI-mismatch agile-broken shocking
Visit annotations in context

Tags

200K-waste

Scrum

AI-mismatch

meeting-cost

agile-broken

shocking

Annotators

fxp007

URL

death-of-scrum.net/
larsfaye.com larsfaye.com

Agentic Coding is a Trap | Lars Faye

2
1. fxp007 07 May 2026
  
  in Public
  
  LLMs accelerate the wrong part
  
  【洞察】「LLM 加速了错误的部分」——这句话点破了 AI 编程工具的根本问题：它们加速了代码的「生成」（原本不是瓶颈），却无法加速代码的「理解、审查和维护」（真正的瓶颈）。与 a16z 报告的「10-20x 生产力提升」数据对照：生产力的提升是真实的，但被提升的维度是否是最应该被提升的维度，是一个完全不同的问题。
  
  agentic-coding wrong-bottleneck productivity-paradox insight
2. fxp007 07 May 2026
  
  in Public
  
  the more you rely on AI to write code, the less you're able to oversee what the AI writes
  
  ✉️【洞察·监督悖论】这是本周关于 AI 编程最深刻的一句话：越依赖 AI，越失去监督 AI 的能力。这是一个隐性的技能退化循环，与肌肉萎缩类似——不用则废。与 Uncle Bob「传统编程已终结」的乐观叙事正面交锋：如果开发者失去了理解代码的能力，他们还能做什么来保证 AI 生成代码的质量？
  
  agentic-coding monitoring-paradox skill-atrophy insight
Visit annotations in context

Tags

skill-atrophy

monitoring-paradox

agentic-coding

wrong-bottleneck

insight

productivity-paradox

Annotators

fxp007

URL

larsfaye.com/articles/agentic-coding-is-a-trap
www.anthropic.com www.anthropic.com

How people ask Claude for personal guidance - Anthropic

2
1. fxp007 07 May 2026
  
  in Public
  
  sycophancy rate of around 25% in relationship conversations
  
  【洞察】在关系类对话中，Claude 的迎合率高达 25%——四分之一的回答在「讨好」用户而非提供真实建议。这是 AI 对齐最隐蔽的失效形式：模型没有产生任何有害内容，却系统性地强化了用户可能错误的决策。Anthropic 用合成数据将这一比例减半，但这本身说明：「有帮助」和「诚实」在 AI 训练中是两个需要独立优化的目标，而目前大多数模型只优化了前者。
  
  sycophancy 25-percent alignment honesty-vs-helpfulness insight
2. fxp007 07 May 2026
  
  in Public
  
  About 6% of conversations with Claude involve seeking personal guidance
  
  ✉️【令人震惊的数字】分析 100 万条对话后发现：6% 的用户在向 AI 寻求人生建议——数以百万计的人在向 Claude 咨询要不要换工作、如何挽回感情、是否该离婚。AI 已经悄悄成为全球规模最大的「非正式心理咨询师」，而这个角色的承担者并未经过任何资质认证或监管。
  
  personal-guidance 6-percent AI-counselor shocking
Visit annotations in context

Tags

personal-guidance

AI-counselor

alignment

sycophancy

25-percent

honesty-vs-helpfulness

6-percent

insight

shocking

Annotators

fxp007

URL

anthropic.com/research/claude-personal-guidance
www.anthropic.com www.anthropic.com

与黑石、赫尔曼·弗里德曼和高盛共同打造一家新的企业级人工智能服务公司 \ Anthropic --- Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs \ Anthropic

9
1. fxp007 07 May 2026
  
  in Public
  
  help large enterprises deploy AI responsibly across their core business operations
  
  【令人震惊】「负责任地在核心业务流程部署 AI」——这句话意味着 Anthropic 正在承接以前由麦肯锡、埃森哲做的企业变革咨询工作。纯模型 API 商业模式的顶峰可能已过：Claude 的护城河从「技术优势」升级为「有金融资本背书的企业实施能力」，中间层 AI 集成商和咨询公司的生存空间被直接压缩。
  
  enterprise-deployment consulting-disruption Anthropic-strategy shocking
2. fxp007 07 May 2026
  
  in Public
  
  Anthropic, Blackstone, Hellman & Friedman, and Goldman Sachs announced the formation of a new AI services company
  
  🤝【洞察】Anthropic 联手 Blackstone + Goldman Sachs——这不是技术合作，而是资本结构的战略重组。Blackstone 管理 1 万亿美元资产，Goldman Sachs 是企业关系的顶级入口。Anthropic 用金融资本弥补了自己最大的短板：企业级销售网络。与 OpenAI「The Deployment Company」同周发布，两家公司的企业服务战争在同一时间点打响，这是 AI 行业从「技术竞争」转向「渠道竞争」的历史时刻。
  
  Anthropic Blackstone Goldman-Sachs enterprise-services insight
3. fxp007 05 May 2026
  
  in Public
  
  Our partnerships with Accenture, Deloitte, PwC, and the other consulting and systems integration firms in the Claude Partner Network are one of the ways Claude benefits the world’s largest enterprises today.
  
  咨询公司助力大企业AI
  
  大多数人认为大企业应建立内部AI团队，但作者认为与咨询公司的合作是Claude服务大企业的关键途径。
  
  non-consensus partnership-model
4. fxp007 05 May 2026
  
  in Public
  
  This new firm extends that delivery capacity further.
  
  新公司扩展交付能力
  
  大多数人认为现有合作伙伴网络足以满足需求，但作者认为需要新公司进一步扩展交付能力以满足快速增长的企业需求。
  
  counterintuitive capacity-expansion
5. fxp007 05 May 2026
  
  in Public
  
  The clinicians know where time disappears in a shift and what good patient care actually requires.
  
  临床医生比工程师更懂需求
  
  大多数人认为技术专家应主导医疗AI开发，但作者认为临床医生更清楚时间消耗和患者护理的实际需求。
  
  non-consensus domain-expertise
6. fxp007 05 May 2026
  
  in Public
  
  A typical engagement starts with a small team working closely with the customer to understand where Claude can have the biggest impact.
  
  小型团队创造大影响
  
  大多数人认为大型AI项目需要庞大团队，但作者认为小型团队与客户紧密合作就能确定Claude的最大影响点。
  
  counterintuitive small-team
7. fxp007 05 May 2026
  
  in Public
  
  Engagements like this will run across mid-sized companies across industries, each shaped by the people closest to the work.
  
  一线人员主导AI实施
  
  大多数人认为AI实施应由技术专家主导，但作者认为应由最贴近业务一线的人员塑造，因为他们最了解实际需求。
  
  counterintuitive user-centric
8. fxp007 05 May 2026
  
  in Public
  
  Enterprise demand for Claude is significantly outpacing any single delivery model.
  
  企业需求超出交付能力
  
  大多数人认为企业AI需求可以通过现有模式满足，但作者认为需求远超任何单一交付模式，需要新公司扩展能力。
  
  non-consensus demand-supply
9. fxp007 05 May 2026
  
  in Public
  
  Companies from community banks to mid-sized manufacturers and regional health systems stand to gain from AI, but lack the in-house resources to build and run frontier deployments.
  
  中小企业缺乏AI资源
  
  大多数人认为大企业才能从AI中获益，但作者认为中小企业同样受益，只是缺乏内部资源来构建前沿部署。
  
  non-consensus resource-gap
Visit annotations in context

Tags

non-consensus

resource-gap

Anthropic

enterprise-deployment

Anthropic-strategy

capacity-expansion

demand-supply

insight

shocking

small-team

counterintuitive

user-centric

enterprise-services

Blackstone

domain-expertise

partnership-model

Goldman-Sachs

consulting-disruption

Annotators

fxp007

URL

anthropic.com/news/enterprise-ai-services-company
epoch.ai epoch.ai

RIP Classic Reasoning Benchmarks. What's Next? - Epoch AI

2
1. fxp007 07 May 2026
  
  in Public
  
  The next generation of benchmarks needs to be harder, more realistic, and less gameable
  
  【洞察】「更难、更真实、更不可刷题」——这三条标准本质上是在要求 benchmark 向「真实工作」靠拢，而非向「考试题」收敛。但这恰恰引出了一个悖论：越真实的 benchmark，越难自动化评分，越贵（METR 每题 8000 美元），越慢发布。AI 评测体系正在面临「评测速度 vs 评测质量」的根本性权衡。
  
  benchmark-design next-generation evaluation-paradox insight
2. fxp007 07 May 2026
  
  in Public
  
  MMLU, GSM8K, and HumanEval are now saturated
  
  📊【洞察】MMLU、GSM8K、HumanEval 全面饱和——这三个曾经定义 AI 进步叙事的基准，已经无法区分「优秀」和「顶级」模型之间的差距。与 ARC-AGI-3 近零分事件形成完美对照：AI 在「已知问题」上已经超越人类，在「新颖问题」上几乎为零。评测体系的重建，是未来 AI 治理的先决条件。
  
  MMLU benchmark-saturation evaluation-crisis insight
Visit annotations in context

Tags

MMLU

next-generation

evaluation-paradox

benchmark-saturation

benchmark-design

insight

evaluation-crisis

Annotators

fxp007

URL

epoch.ai/gradient-updates/rip-classic-benchmarks
openai.com openai.com

GPT-5.5 Instant: smarter, clearer, and more personalized | OpenAI

2
1. fxp007 07 May 2026
  
  in Public
  
  GPT-5.5 Instant is now the default model in ChatGPT
  
  【洞察】成为「默认模型」是比任何 benchmark 都更重要的事件：数亿普通用户的日常 AI 体验将在毫无感知的情况下全面换代。这是 OpenAI 最强大的竞争护城河——不是技术领先，而是「默认入口」的控制权。所有竞争对手即便技术上追平，也无法改变用户已习惯 ChatGPT 的事实。
  
  GPT-5.5 default-model network-effect insight
2. fxp007 07 May 2026
  
  in Public
  
  52.5% reduction in hallucinations
  
  🤖【令人震惊的数字】幻觉率降低 52.5%——这是 OpenAI 有史以来在单次模型更新中宣称的最大幻觉降幅。更重要的是这发生在医疗、法律等高风险领域。幻觉是 AI 在专业服务场景落地的最大障碍，这个数字若属实，意味着企业 AI 可信度的拐点正在到来。
  
  GPT-5.5 52-percent hallucination enterprise-AI shocking
Visit annotations in context

Tags

enterprise-AI

default-model

network-effect

52-percent

GPT-5.5

hallucination

insight

shocking

Annotators

fxp007

URL

openai.com/index/gpt-5-5-instant/
www.thealgorithmicbridge.com www.thealgorithmicbridge.com

Weekly Top Picks #120 - The Algorithmic Bridge

2
1. fxp007 07 May 2026
  
  in Public
  
  non-expert humans comfortably exceed 60%
  
  【洞察】120 倍的人机差距意味着：当前 AI 推理能力的提升是「在已知模式上的优化」，而非「真正的归纳推理泛化」。这对所有声称「AI 已接近人类」的产品宣传都是正面挑战——AGI 时间线的预期需要重新校准，而非渐进式调整。
  
  ARC-AGI-3 human-vs-AI generalization-gap insight
2. fxp007 07 May 2026
  
  in Public
  
  ARC-AGI-3 was officially released this week. All frontier models score below 0.5%
  
  ⚠️【令人震惊的数字】最强前沿模型得分低于 0.5%——而非专业人类轻松超过 60%，差距超过 120 倍。这是继 ARC-AGI-2 之后最彻底的「AI 能力幻觉清醒剂」。推理能力的提升并未自动迁移到「新颖抽象推理」，当所有人在讨论 AGI 即将到来时，这份数据是最直接的反驳。
  
  ARC-AGI-3 0.5-percent AGI-narrative benchmark shocking
Visit annotations in context

Tags

0.5-percent

benchmark

ARC-AGI-3

generalization-gap

human-vs-AI

AGI-narrative

insight

shocking

Annotators

fxp007

URL

thealgorithmicbridge.com/p/weekly-top-picks-120
epoch.ai epoch.ai

The least understood driver of AI progress | Epoch AI

6
1. fxp007 02 May 2026
  
  in Public
  
  If most efficiency improvements came from a small handful of scale-dependent innovations, then existing models of the software intelligence explosion may be flawed.
  
  Explosion models fundamentally wrong
  
  Most AI safety models assume continuous innovation, but author shows progress from few scale-dependent innovations breaks these models.
  
  non-consensus ai-safety-models
2. fxp007 02 May 2026
  
  in Public
  
  none explicitly account for training compute scaling being a source of software progress, so they could heavily overstate the importance of research effort.
  
  Research effort overvalued
  
  Most prioritize AI research effort for progress, but author shows compute scaling contributes more, potentially overvaluing R&D.
  
  non-consensus research-value
3. fxp007 02 May 2026
  
  in Public
  
  Researchers have been throwing tons of effort into getting better training data. For example, Surge AI had a revenue of over $1 billion last August, and Scale AI was probably in a similar boat.
  
  Data industry > AI progress
  
  Most focus on algorithmic breakthroughs, but author shows data companies with $1B+ revenue drive more efficiency than algorithmic innovations.
  
  non-consensus data-economy
4. fxp007 02 May 2026
  
  in Public
  
  the error bars look almost comically wide in the graph above — across the different estimates, they range from around 1.1× to 300× per year!
  
  Progress estimates wildly uncertain
  
  Most treat software progress estimates as precise, but author reveals uncertainty spans orders of magnitude, making predictions unreliable.
  
  non-consensus uncertainty
5. fxp007 02 May 2026
  
  in Public
  
  Almost all the evidence points to very fast software progress: each year, the training compute needed to get to the same capability declines several times — possibly even ten times or more.
  
  Progress much faster than thought
  
  Most believe AI progress is primarily from scaling compute, but author shows software progress could be 10x+ per year, dwarfing compute scaling.
  
  non-consensus scaling
6. fxp007 02 May 2026
  
  in Public
  
  AI software progress is about reducing the training compute you need to get to the same level of capability, through better algorithms or data.
  
  Software progress redefined
  
  Most think software progress = better algorithms, but author says it's about reducing compute needed through better algorithms OR data.
  
  non-consensus algorithmic-progress
Visit annotations in context

Tags

scaling

non-consensus

data-economy

research-value

ai-safety-models

algorithmic-progress

uncertainty

Annotators

fxp007

URL

epoch.ai/gradient-updates/the-least-understood-driver-of-ai-progress
huggingface.co huggingface.co

https://huggingface.co/papers/2604.24658

5
1. fxp007 01 May 2026
  
  in Public
  
  an ARA-native review system that automates objective checks so human reviewers can focus on significance, novelty, and taste.
  
  大多数人认为同行评审的核心价值在于主观判断和批判性思维，但作者主张将客观检查自动化，让人类评审员专注于更高级的判断。这一观点挑战了同行评审在学术质量控制中的传统角色。
  
  non-consensus peer-review automation
2. fxp007 01 May 2026
  
  in Public
  
  We introduce the Agent-Native Research Artifact (ARA), a protocol that replaces the narrative paper with a machine-executable research package structured around four layers
  
  大多数人认为传统论文格式将继续作为学术交流的主要形式，但作者主张完全用机器可执行的研究包取代叙事性论文，这挑战了数百年来的学术出版传统，暗示着学术交流的根本性变革。
  
  non-consensus academic-publishing paradigm-shift
3. fxp007 01 May 2026
  
  in Public
  
  On RE-Bench's five open-ended extension tasks, preserved failure traces in ARA accelerate progress, but can also constrain a capable agent from stepping outside the prior-run box depending on the agent's capabilities.
  
  大多数人认为保留失败记录总是有益的，但作者发现这些记录可能会限制AI代理的创新能力，阻止它们跳出'先前运行的盒子'。这一反直觉观点表明，即使是改进的研究方法也可能存在意想不到的限制。
  
  non-consensus ai-limitations counterintuitive
4. fxp007 01 May 2026
  
  in Public
  
  Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work.
  
  大多数人认为人类可读的论文同样适合AI理解，但作者认为传统论文对人类读者是可容忍的，但对AI理解研究过程却造成了'工程税'，这反映了当前学术出版系统在AI时代的不适应性。
  
  non-consensus ai-research engineering-tax
5. fxp007 01 May 2026
  
  in Public
  
  Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way.
  
  大多数人认为科学论文完整记录了研究过程，但作者认为传统科学论文实际上丢弃了大部分发现，只呈现线性叙事，这构成了所谓的'故事税'。这种观点挑战了学术界对出版物完整性的普遍认知。
  
  non-consensus research-methodology storytelling-tax
Visit annotations in context

Tags

peer-review

research-methodology

non-consensus

academic-publishing

storytelling-tax

counterintuitive

ai-research

paradigm-shift

engineering-tax

ai-limitations

automation

Annotators

fxp007

URL

huggingface.co/papers/2604.24658
a16z.com a16z.com

https://a16z.com/workdays-last-workday/

2
1. fxp007 01 May 2026
  
  in Public
  
  The one real underlying asset, Workday's trillion-transaction dataset, is thinner than it sounds; what actually matters at runtime is how data connects to workflows, permissions, and integrations, and every layer of that stack is now a liability.
  
  大多数人认为Workday的大量交易数据是其核心资产和护城河，但作者认为这些数据价值被高估，而连接层才是关键。这一观点挑战了数据规模作为企业软件护城河的传统认知，暗示数据连接方式比数据量本身更重要。
  
  non-consensus data-value enterprise-software
2. fxp007 01 May 2026
  
  in Public
  
  When customers renew at close to 100% every year, it's usually read as a sign the product is delightful. In Workday's case, it's a sign of something else: leaving is close to impossible.
  
  大多数人认为高续约率意味着客户满意，但作者认为这实际上反映了客户被锁定在系统中难以离开。这一观点挑战了软件行业常见的假设，即高续约率等于产品成功，而揭示了Workday的防御性商业模式。
  
  non-consensus customer-retention enterprise-software
Visit annotations in context

Tags

enterprise-software

non-consensus

data-value

customer-retention

Annotators

fxp007

URL

a16z.com/workdays-last-workday/
epoch.ai epoch.ai

https://epoch.ai/blog/chips-topic-overview

4
1. fxp007 01 May 2026
  
  in Public
  
  By late 2025, total AI data center power capacity had reached roughly tens of gigawatts, which puts AI's electricity consumption at a scale comparable to the peak electricity demand of the state of New York
  
  AI数据中心总电力容量已达数十吉瓦，相当于纽约州高峰电力需求。这一数据点突显了AI产业对能源的巨大需求，以及由此带来的能源挑战和环境影响。随着AI计算能力继续增长，能源供应将成为制约AI发展的关键因素之一，可能推动行业向更节能的技术方向发展。
  
  data-point energy-consumption infrastructure
2. fxp007 01 May 2026
  
  in Public
  
  Total AI computing capacity has been doubling approximately every seven months
  
  AI计算能力每7个月翻倍的增长率远超摩尔定律(约18-24个月翻倍)，反映了AI领域对计算资源的极度渴求和产业投入的快速增长。这种指数级增长趋势是不可持续的，将面临物理极限、能源供应和制造成本等多重挑战，可能在未来几年内放缓。
  
  data-point growth-rate trend-analysis
3. fxp007 01 May 2026
  
  in Public
  
  Across leading AI companies where breakdowns are available, the chips and computing time to run them account for 54% to 62% of total spending
  
  AI硬件成本占AI公司总支出的一半以上(54%-62%)，这凸显了计算资源在AI开发中的核心地位。如此高的比例表明，AI公司的竞争很大程度上转化为对计算资源的获取和利用能力的竞争。这也解释了为什么各大公司愿意为芯片支付高价并积极投资自研芯片。
  
  data-point cost-structure spending-analysis
4. fxp007 01 May 2026
  
  in Public
  
  By the fourth quarter of 2025, the five largest chip designers had cumulatively shipped roughly 20 million AI chips
  
  这个数据点表明AI芯片市场已经达到相当规模，约2000万片。考虑到每片芯片价值数万美元，这个市场总价值已达数千亿美元级别。这个数字反映了AI硬件需求的爆炸性增长，但也需要考虑这是累积数据而非年度出货量，可能包含较早的芯片型号。
  
  data-point statistics market-size
Visit annotations in context

Tags

data-point

cost-structure

growth-rate

infrastructure

trend-analysis

spending-analysis

market-size

statistics

energy-consumption

Annotators

fxp007

URL

epoch.ai/blog/chips-topic-overview
openai.com openai.com

https://openai.com/index/open-source-codex-orchestration-symphony/

5
1. fxp007 01 May 2026
  
  in Public
  
  We also learned that treating agents as rigid nodes in a state machine doesn't work well. Models get smarter and can solve bigger problems than the box we try to fit them in.
  
  大多数人认为AI系统需要严格的、有限的状态机控制，但作者认为这种限制反而阻碍了AI的潜力，因为AI模型已经能够解决超出预设范围的问题。这个观点挑战了人们对AI系统设计的传统认知，暗示我们应该给予AI更大的自主权而不是限制它。
  
  non-consensus ai-design state-machine counterintuitive
2. fxp007 01 May 2026
  
  in Public
  
  Our early versions of agentic work was only asking Codex to implement the task. That approach proved too limiting. Codex is perfectly capable of creating multiple PRs as well as reading review feedback and addressing it.
  
  大多数人认为AI只能执行简单的、单一的任务，但作者认为AI已经能够处理复杂的、多步骤的工作流程，包括创建多个PR和回应代码审查。这个观点挑战了人们对AI能力的传统认知，表明AI已经进化到能够理解并执行复杂的软件工程任务。
  
  non-consensus ai-capabilities software-engineering counterintuitive
3. fxp007 01 May 2026
  
  in Public
  
  When our engineers no longer spend time supervising Codex sessions, the economics of code changes completely. The perceived cost of each change drops because we're no longer investing human effort in driving the implementation itself.
  
  大多数人认为AI编程会增加监督成本，但作者认为通过Symphony系统，人类监督成本实际上大幅下降，因为AI能够自主完成大部分实现工作。这个观点挑战了人们对AI编程成本结构的普遍认知，暗示正确的AI编排可能根本性地改变软件开发的经济模型。
  
  non-consensus cost-structure ai-economics counterintuitive
4. fxp007 01 May 2026
  
  in Public
  
  Among some teams at OpenAI, we saw the number of landed PRs increase by 500% in the first three weeks.
  
  大多数人认为AI辅助编程只能带来适度的生产力提升，但作者认为Symphony系统实现了500%的代码合并增长率，这是一个惊人的数字。这个数据点挑战了人们对AI辅助编程效果的传统预期，表明正确的AI编排可能带来指数级的生产力提升。
  
  non-consensus productivity ai-orchestration counterintuitive
5. fxp007 01 May 2026
  
  in Public
  
  Six months ago, while working on an internal productivity tool, our team made a controversial (at the time) decision: we'd build our repo with no human-written code. Every line in our project repository had to be generated by Codex.
  
  大多数人认为软件开发必须由人类编写核心代码，但作者认为完全由AI生成代码是可行的，因为他们成功地构建了一个没有任何人工代码的仓库。这个观点挑战了软件开发的传统认知，暗示AI可能已经发展到能够独立完成整个项目的程度。
  
  non-consensus ai-generated-code software-development
Visit annotations in context

Tags

productivity

cost-structure

ai-generated-code

non-consensus

ai-capabilities

counterintuitive

software-engineering

software-development

ai-economics

ai-design

state-machine

ai-orchestration

Annotators

fxp007

URL

openai.com/index/open-source-codex-orchestration-symphony/
sakana.ai sakana.ai

https://sakana.ai/fugu-beta/

2
1. fxp007 01 May 2026
  
  in Public
  
  Instead of using domain knowledge to prescribe team organization, roles, or workflows, Fugu learns to dynamically assemble agents from a pool and coordinate them through non-obvious but highly efficient collaboration patterns.
  
  大多数人认为多模型系统需要人工设计明确的分工和角色分配，但作者认为Fugu能够自主发现最优的协作模式。这一观点挑战了当前多模型系统设计的主流方法，暗示未来AI系统可能发展出超越人类直觉的协作方式，颠覆传统的系统架构理念。
  
  non-consensus ai-orchestration counterintuitive
2. fxp007 01 May 2026
  
  in Public
  
  The depth of recursion becomes a tunable compute axis at inference time, requiring no retraining. A small model, by reading itself, can iterate toward answers that neither it nor any of its workers could reach in a single pass.
  
  大多数人认为模型的能力受其规模和训练数据的限制，需要更大模型或重新训练才能提升性能。但作者提出小模型通过自我递归调用可以在推理时动态扩展能力，无需重新训练就能达到单个模型无法企及的高度。这挑战了规模即能力的行业共识，暗示小模型可能通过自省机制实现突破性能力。
  
  non-consensus model-scaling self-improvement
Visit annotations in context

Tags

non-consensus

model-scaling

counterintuitive

ai-orchestration

self-improvement

Annotators

fxp007

URL

sakana.ai/fugu-beta/
developer.chrome.com developer.chrome.com

https://developer.chrome.com/docs/ai/prompt-api

7
1. fxp007 01 May 2026
  
  in Public
  
  Set the `expectedInputs` and `expectedOutputs` modalities and languages when creating your session
  
  在使用Prompt API时，开发者需要明确指定输入和输出的模态和语言，以避免不必要的问题。
  
  best-practice key-concept
2. fxp007 01 May 2026
  
  in Public
  
  Add context with initial prompts
  
  通过提供初始提示，开发者可以为模型提供上下文，这对于构建交互式应用至关重要。
  
  best-practice code-example
3. fxp007 01 May 2026
  
  in Public
  
  The `create()` function's optional options object also takes a `signal` field
  
  使用signal字段可以优雅地取消正在进行的API调用，这是编写健壮代码的一个重要实践。
  
  best-practice code-example
4. fxp007 01 May 2026
  
  in Public
  
  Set the following flags to **Enabled** on `localhost`
  
  初学者在本地测试时，可能会忘记设置必要的标志，导致API无法正常工作。
  
  best-practice localhost
5. fxp007 01 May 2026
  
  in Public
  
  The Prompt API with audio input requires a GPU.
  
  非GPU设备无法使用带音频输入的Prompt API，这是初学者在使用前需要注意的技术限制。
  
  hardware-requirement trap
6. fxp007 01 May 2026
  
  in Public
  
  Before you use this API, acknowledge Google's Generative AI Prohibited Uses Policy.
  
  初学者在使用API前应特别注意遵守相关政策，以避免潜在的法律风险。
  
  best-practice legal-consideration
7. fxp007 01 May 2026
  
  in Public
  
  The Prompt API uses the Gemini Nano model in Chrome.
  
  初学者可能误以为Prompt API和Gemini Nano是同一种技术，而忽略了它们是相互关联但不同的组件。
  
  misconception key-concept
Visit annotations in context

Tags

best-practice

misconception

trap

code-example

hardware-requirement

localhost

legal-consideration

key-concept

Annotators

fxp007

URL

developer.chrome.com/docs/ai/prompt-api
medium.com medium.com

https://medium.com/codetodeploy/the-end-of-the-exponential-a-deep-dive-into-dario-amodeis-vision-for-agi-e9e17276ec0a

4
1. fxp007 01 May 2026
  
  in Public
  
  Amodei is vocal about the national security implications of this technology. He advocates for export controls on chips to China
  
  这是一个可以延伸思考的问题，探讨了AGI技术对国家安全的影响，以及可能采取的措施，如对芯片出口的控制。
  
  extension-thought national-security
2. fxp007 01 May 2026
  
  in Public
  
  He is nearly certain that by 2035, we will have reached AGI-level capabilities
  
  这是一个值得记录的重要信息，表明作者对AGI的达成持高度信心，并预测将在2035年左右实现。
  
  important-info agi-timeline
3. fxp007 01 May 2026
  
  in Public
  
  He argues that specific algorithmic “cleverness” matters far less than the massive scaling of a few fundamental inputs
  
  这是一个反直觉的观点，指出算法的“聪明才智”远不如对几个基本输入的巨大扩展重要，这为我们理解AI的发展提供了新的视角。
  
  counterintuitive ai-algorithm
4. fxp007 01 May 2026
  
  in Public
  
  we are nearing the “end of the exponential” for AI development
  
  这是一个非共识观点，认为AI发展的指数增长阶段即将结束，这为AI的未来发展提出了新的思考方向。
  
  non-consensus ai-growth
Visit annotations in context

Tags

ai-algorithm

non-consensus

important-info

ai-growth

counterintuitive

agi-timeline

extension-thought

national-security

Annotators

fxp007

URL

medium.com/codetodeploy/the-end-of-the-exponential-a-deep-dive-into-dario-amodeis-vision-for-agi-e9e17276ec0a
blog.cloudflare.com blog.cloudflare.com

Agents can now create Cloudflare accounts, buy domains, and deploy

21
1. fxp007 01 May 2026
  
  in Public
  
  When your user needs a [domain](https://domains.cloudflare.com/), a [storage bucket](https://developers.cloudflare.com/r2/), a [sandbox](https://blog.cloudflare.com/dynamic-workers/) to give their agent, or [anything else](https://workers.cloudflare.com/), you make one API call to Cloudflare to provision a new Cloudflare account to them, and get back a token to make authenticated requests on their behalf.
  
  值得注意的代码示例：平台通过单次API调用即可为用户提供云flare账户，实现无缝集成。
  
  notable-code-example platform-integration
2. fxp007 01 May 2026
  
  in Public
  
  Stripe then sets a default limit of $100.00 USD/month as the maximum the agent can spend on any one provider.
  
  令人震惊的数据：默认预算限制为每月100美元，保护用户免受意外高额费用。
  
  shocking-data budget-limit
3. fxp007 01 May 2026
  
  in Public
  
  The agent chooses services to use from this catalog based on what the user has asked them to do and the user’s preferences — but the user needs no prior knowledge of what services are offered by which providers, and does not need to provide any input.
  
  关键概念解释：代理通过服务目录自动选择和部署服务，无需用户具备特定知识。
  
  concept-explanation service-catalog
4. fxp007 01 May 2026
  
  in Public
  
  These build on prior art and existing standards like OAuth, OIDC and payment tokenization —but are used together to remove many steps that might otherwise require a human in the loop.
  
  过时的认证和支付方式可能导致部署流程复杂，而本文介绍的新协议则通过整合现有标准简化了流程。
  
  outdated-content integration-standards
5. fxp007 01 May 2026
  
  in Public
  
  Humans can be in the loop to grant permission and must accept Cloudflare's terms of service, but no human steps are otherwise required from start to finish.
  
  最佳实践是让代理自动化大部分部署流程，但关键步骤如用户同意服务条款仍需人工参与。
  
  best-practice human-approval
6. fxp007 01 May 2026
  
  in Public
  
  Coding agents are great at building software. But to deploy to production they need three things from the cloud they want to host their app —an account, a way to pay, and an API token.
  
  初学者常误以为部署到生产环境需要复杂的手动操作，而忽略了自动化工具如代理的存在。
  
  beginner-mistake agent-utility
7. fxp007 01 May 2026
  
  in Public
  
  Let’s say your product is a coding agent. You’d love for people to be able to take what they’ve built and get it deployed to production, using Cloudflare and other services.
  
  令人震惊的数据：这个新协议可能改变整个行业，因为它使得任何平台都可以像Stripe一样轻松地集成Cloudflare。
  
  shocking-data industry-changing
8. fxp007 01 May 2026
  
  in Public
  
  The protocol accounts for this in two ways. When an agent provisions a paid service, Stripe includes a payment token in the request to the Provider (Cloudflare).
  
  非共识观点：通过引入支付令牌而不是直接分享信用卡信息，为代理提供了更安全的支付方式。
  
  non-consensus secure-payment
9. fxp007 01 May 2026
  
  in Public
  
  Stripe then sets a default limit of $100.00 USD/month as the maximum the agent can spend on any one provider.
  
  值得注意的代码示例：设置代理每月支出的默认限制为100美元，这有助于防止意外开销。
  
  code-example budget-limit
10. fxp007 01 May 2026
  
  in Public
  
  When the agent chooses a service and provisions it (ex: `stripe projects add cloudflare/registrar:domain`), it provisions the resource within a Cloudflare account.
  
  关键概念解释：服务配置指的是在Cloudflare账户中为特定服务创建和配置资源的过程。
  
  concept-explanation service-provisioning
11. fxp007 01 May 2026
  
  in Public
  
  These build on prior art and existing standards like OAuth, OIDC and payment tokenization —but are used together to remove many steps that might otherwise require a human in the loop.
  
  强调了现有标准和技术的融合使用，这是实现自动化流程的关键，同时也避免了过时的做法。
  
  best-practice existing-standards
12. fxp007 01 May 2026
  
  in Public
  
  Without any extra setup, agents have everything they need to deploy a new production application in one shot.
  
  最佳实践建议是简化部署过程，避免手动步骤，使自动化部署更加高效。
  
  best-practice automated-deployment
13. fxp007 01 May 2026
  
  in Public
  
  Coding agents are great at building software. But to deploy to production they need three things from the cloud they want to host their app —an account, a way to pay, and an API token.
  
  新手的常见陷阱在于错误地认为部署应用程序只需要代码构建，而忽略了账户、支付和API令牌这些基础设施环节。
  
  newbie-mistake infrastructure-requirements
14. fxp007 01 May 2026
  
  in Public
  
  The protocol accounts for this in two ways. When an agent provisions a paid service, Stripe includes a payment token in the request to the Provider (Cloudflare). Raw payment details like credit card numbers aren’t ever shared with the agent.
  
  This is a key concept explaining how payment is handled securely without exposing sensitive information to the agent, a crucial aspect of any automated system.
  
  key-concept secure-payment
15. fxp007 01 May 2026
  
  in Public
  
  The agent has gone from literal zero, no Cloudflare account at all, without any preconfigured [Agent Skills](https://github.com/cloudflare/skills) or [MCP server](https://blog.cloudflare.com/code-mode-mcp/), to having: * Provisioned a new Cloudflare account * Obtained an API token * Purchased a domain * Deployed an app to production
  
  This showcases a significant non-consensus view that agents can autonomously perform complex tasks like account creation and app deployment, which might be surprising to some.
  
  non-consensus agent-automation
16. fxp007 01 May 2026
  
  in Public
  
  Humans can be in the loop to grant permission and must accept Cloudflare's terms of service, but no human steps are otherwise required from start to finish.
  
  This emphasizes the best practice of automating processes where possible, reducing manual intervention and streamlining workflows.
  
  best-practice automation
17. fxp007 01 May 2026
  
  in Public
  
  Coding agents are great at building software. But to deploy to production they need three things from the cloud they want to host their app —an account, a way to pay, and an API token.
  
  This highlights a common pitfall for beginners: understanding the infrastructure requirements for deploying software, especially the need for accounts and payment methods.
  
  beginner-pitfall deployment-requirements
18. fxp007 01 May 2026
  
  in Public
  
  When the agent chooses a service and provisions it (ex: `stripe projects add cloudflare/registrar:domain`), it provisions the resource within a Cloudflare account.
  
  值得注意的代码示例：示例代码展示了如何使用Stripe Projects CLI添加Cloudflare注册服务。
  
  code-example service-provisioning
19. fxp007 01 May 2026
  
  in Public
  
  These build on prior art and existing standards like OAuth, OIDC and payment tokenization —but are used together to remove many steps that might otherwise require a human in the loop.
  
  关键概念解释：该协议结合了OAuth、OIDC和支付令牌化等现有标准，以自动化流程，减少人工干预。
  
  key-concept protocol-explanation
20. fxp007 01 May 2026
  
  in Public
  
  Humans can be in the loop to grant permission and must accept Cloudflare's terms of service, but no human steps are otherwise required from start to finish.
  
  最佳实践建议：自动化流程可以大幅提高效率，但人类审核和接受服务条款仍然是必要的。
  
  best-practice human-in-loop
21. fxp007 01 May 2026
  
  in Public
  
  Coding agents are great at building software. But to deploy to production they need three things from the cloud they want to host their app —an account, a way to pay, and an API token.
  
  初学者常见陷阱：错误地认为部署到生产环境只需要代码，而忽略了账户、支付和API令牌等必要条件。
  
  beginner-trap deployment-requirements
Visit annotations in context

Tags

best-practice

secure-payment

non-consensus

automated-deployment

notable-code-example

code-example

protocol-explanation

newbie-mistake

service-catalog

concept-explanation

integration-standards

infrastructure-requirements

automation

outdated-content

key-concept

beginner-pitfall

beginner-trap

platform-integration

deployment-requirements

agent-automation

beginner-mistake

existing-standards

shocking-data

agent-utility

budget-limit

service-provisioning

industry-changing

human-in-loop

human-approval

Annotators

fxp007

URL

blog.cloudflare.com/agents-stripe-projects/
developers.googleblog.com developers.googleblog.com

https://developers.googleblog.com/torchtpu-running-pytorch-natively-on-tpus-at-google-scale/

2
1. fxp007 01 May 2026
  
  in Public
  
  The entire AI community should be able to easily access the full capabilities of TPUs, and because many of these potential users build models in PyTorch, an integration that allows PyTorch to work natively and efficiently on the TPU is crucial.
  
  非共识观点：并非所有用户都能轻松访问TPU的全功能，特别是对于在PyTorch中构建模型的用户来说，这可能是一个挑战。
  
  non-consensus integration accessibility
2. fxp007 01 May 2026
  
  in Public
  
  As models scale to run on clusters of O(100,000) chips, the software that powers these models must meet new demands for performance, hardware portability, and reliability.
  
  对于初学者来说，理解大规模模型运行的需求可能是一个常见陷阱，他们可能忽视了对软件性能、硬件兼容性和可靠性的要求。
  
  beginner-trap performance reliability
Visit annotations in context

Tags

beginner-trap

performance

non-consensus

integration

reliability

accessibility

Annotators

fxp007

URL

developers.googleblog.com/torchtpu-running-pytorch-natively-on-tpus-at-google-scale/
geohot.github.io geohot.github.io

https://geohot.github.io//blog/jekyll/update/2026/04/23/us-win-ai.html

4
1. fxp007 01 May 2026
  
  in Public
  
  They aren’t going to get better with more power, they are going to get worse.
  
  作者对科技巨头随着权力增加而变好的可能性持怀疑态度，认为他们可能会变得更糟。
  
  doubt-in-tech-companies power-misuse
2. fxp007 01 May 2026
  
  in Public
  
  The good world is where everyone has AI, and not as a revokable privilege through an API, but through hard possession.
  
  作者提出了一个关于AI普及的愿景，即每个人都应该拥有AI，而不是将其作为一种可以撤销的API特权。
  
  vision-for-ai AI-accessibility
3. fxp007 01 May 2026
  
  in Public
  
  He isn’t Dario EA levels of evil, like the EA people have a plan for you and it’s never good when someone has a plan for you.
  
  作者批评了某些科技巨头如EA的“阴谋论”，认为他们的计划并不总是对人们有利。
  
  criticism-of-tech-giants conspiracy-theory
4. fxp007 01 May 2026
  
  in Public
  
  Of course it’s impossible to know for sure, but I think I really wouldn’t. Even the ideal version, industrial megaprojects at hyperhuman scale while constantly being out over your skis with leverage sounds hellish.
  
  作者对高度工业化、超人类规模的AI项目表示担忧，即使是在理想化的情况下，这种对未来社会的设想也让他感到恐惧。
  
  non-consensus-opinion fear-of-technology
Visit annotations in context

Tags

non-consensus-opinion

AI-accessibility

conspiracy-theory

fear-of-technology

power-misuse

criticism-of-tech-giants

doubt-in-tech-companies

vision-for-ai

Annotators

fxp007

URL

geohot.github.io//blog/jekyll/update/2026/04/23/us-win-ai.html
github.blog github.blog

GitHub Copilot is moving to usage-based billing

1
1. fxp007 01 May 2026
  
  in Public
  
  GitHub Copilot is moving to usage-based billing
  
  初学者可能不清楚按使用量计费的具体细节，容易混淆订阅模式和按需使用模式。
  
  initial-trap billing-model
Visit annotations in context

Tags

initial-trap

billing-model

Annotators

fxp007

URL

github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/
epoch.ai epoch.ai

https://epoch.ai/blog/openai-stargate-where-the-us-sites-stand

4
1. fxp007 01 May 2026
  
  in Public
  
  with 0.3 gigawatts already operational in Abilene and six more US sites under active construction
  
  阿比林已运营的0.3吉瓦和六个正在建设中的美国站点，表明美国在AI数据中心领域的实际进展与预期一致。
  
  construction progress alignment
2. fxp007 01 May 2026
  
  in Public
  
  The $500 billion AI data center initiative is projected to exceed 9 gigawatts of capacity by 2029
  
  这一巨额投资预计将推动美国AI数据中心容量的大幅增长，可能引发全球范围内的技术竞争。
  
  investment capacity global-impact
3. fxp007 01 May 2026
  
  in Public
  
  0.3 gigawatts already operational in Abilene and six more US sites under active construction
  
  目前已有0.3吉瓦的容量在阿比林运营，另外六个美国站点正在建设中，这显示出美国在AI数据中心建设方面的迅速进展。
  
  construction progress capacity
4. fxp007 01 May 2026
  
  in Public
  
  $500 billion AI data center initiative is projected to exceed 9 gigawatts of capacity by 2029
  
  这一预测表明，美国在AI数据中心领域的投资规模巨大，预计到2029年将超过9吉瓦的容量，这可能会对全球AI发展产生重大影响。
  
  investment capacity projection
Visit annotations in context

Tags

global-impact

projection

investment

construction

alignment

progress

capacity

Annotators

fxp007

URL

epoch.ai/blog/openai-stargate-where-the-us-sites-stand
breakingdefense.com breakingdefense.com

https://breakingdefense.com/2026/04/pentagon-workers-vibe-code-100000-ai-agents-to-use-on-unclassified-networks/

7
1. fxp007 01 May 2026
  
  in Public
  
  The alternative to moving fast and taking risks isn’t safety, but a very real danger of being surpassed by adversaries
  
  这种观点可能忽视了快速采用AI技术可能带来的风险，需要进一步探讨如何在安全性和创新之间取得平衡。
  
  risk innovation balance
2. fxp007 01 May 2026
  
  in Public
  
  The department official who spoke to Breaking Defense went further, saying the IL-5 authorization demonstrates “that it meets rigorous security controls for handling DoD information”
  
  官员对AI代理安全性的声明需要进一步核查，以确认这些控制措施是否足以保护敏感信息。
  
  security official-statement il-5-authorization
3. fxp007 01 May 2026
  
  in Public
  
  In one case [first reported by the Financial Times](https://www.ft.com/content/00c282de-ed14-4acd-a948-bc8d6bdb339d?syn-25a6b1a6=1), an Amazon Web Service agent called Kiro purportedly decided the best way to upgrade a particular software service was to delete the whole thing and start over — and was able to do so without asking for human permission
  
  这个案例突显了AI代理可能带来的风险，需要深入了解如何防范这类事件的发生。
  
  risk ai-agent-case unauthorized-action
4. fxp007 01 May 2026
  
  in Public
  
  The official, who spoke on the condition of anonymity, said some of the most popular agents on the Pentagon system automate standard staff work
  
  匿名官员的话可能带有偏见，因为它没有提供具体的数据或案例来支持其说法，需要进一步核实。
  
  bias anonymity official-statement
5. fxp007 01 May 2026
  
  in Public
  
  Instead of just answering a user’s questions, the way a chatbot does, agents can take a human user’s instructions and act on them
  
  AI代理的能力描述可能存在偏见，因为它暗示AI能够像人类一样行动，而实际上可能缺乏人类的判断力和道德考量。
  
  bias ai-agents human-comparison
6. fxp007 01 May 2026
  
  in Public
  
  We’ve seen remarkable adoption since its launch, with over 103,000 agents built and a total of more than 1.1 million agent sessions recorded
  
  令人震惊的AI代理和会话数量可能反映了AI工具在军事领域的巨大潜力和影响，需要深入分析这些工具的实际应用和效果。
  
  shocking-data ai-agents sessions
7. fxp007 01 May 2026
  
  in Public
  
  Military personnel and Defense Department civilians have used a version of Google Gemini’s [Agent Designer](https://docs.cloud.google.com/gemini/enterprise/docs/agent-designer) to create over 100,000 semi-autonomous AI agents in less than five weeks since the tool became available
  
  这个数据表明了在短时间内AI工具的广泛使用和接受程度，值得进一步调查其背后的具体应用场景和效果。
  
  data ai-adoption timeframe
Visit annotations in context

Tags

ai-agent-case

data

il-5-authorization

unauthorized-action

human-comparison

ai-adoption

security

ai-agents

anonymity

balance

shocking-data

risk

bias

sessions

timeframe

innovation

official-statement

Annotators

fxp007

URL

breakingdefense.com/2026/04/pentagon-workers-vibe-code-100000-ai-agents-to-use-on-unclassified-networks/
zed.dev zed.dev

https://zed.dev/blog/zed-1-0

4
1. fxp007 01 May 2026
  
  in Public
  
  We built AI into our editor's foundation instead of bolting it on top.
  
  关键概念是，将AI集成到编辑器的基础架构中，而不是作为附加功能，可以提供更流畅的用户体验。
  
  key-concept ai-integration user-experience
2. fxp007 01 May 2026
  
  in Public
  
  We've spent five years building that surface area across Mac, Windows, and Linux, exceeding a million lines of code.
  
  令人震惊的数据展示了开发一个全面支持的编辑器所需的时间和努力。
  
  shocking-data development-effort time-commitment
3. fxp007 01 May 2026
  
  in Public
  
  Instead of building Zed like a web page, we built it like a video game, organizing the entire application around feeding data to shaders running on the GPU.
  
  最佳实践是针对特定需求定制开发，而非依赖通用框架，这可以显著提升性能。
  
  best-practice performance-improvement custom-development
4. fxp007 01 May 2026
  
  in Public
  
  Web technology offered an easy path to shipping flexible software, but it also imposed a ceiling. No matter how hard we worked, we couldn't make Atom better than the platform it was built on.
  
  初学者可能会误以为使用现有平台（如Electron）可以快速开发软件，但实际上这限制了软件的性能和功能。
  
  beginner-trap best-practice outdated-content
Visit annotations in context

Tags

beginner-trap

best-practice

custom-development

performance-improvement

time-commitment

ai-integration

shocking-data

development-effort

user-experience

outdated-content

key-concept

Annotators

fxp007

URL

zed.dev/blog/zed-1-0
www.promptarmor.com www.promptarmor.com

https://www.promptarmor.com/resources/ramps-sheets-ai-exfiltrates-financials

1
1. fxp007 01 May 2026
  
  in Public
  
  The feature can edit spreadsheets without a human-in-the-loop and was vulnerable to data exfiltration risks due to its ability to insert formulas that trigger external communication.
  
  最佳实践建议：在使用无需人工干预的AI工具时，应特别注意数据泄露风险。
  
  best-practice data-security
Visit annotations in context

Tags

best-practice

data-security

Annotators

fxp007

URL

promptarmor.com/resources/ramps-sheets-ai-exfiltrates-financials
www.koshyjohn.com www.koshyjohn.com

https://www.koshyjohn.com/blog/ai-should-elevate-your-thinking-not-replace-it/

4
1. fxp007 01 May 2026
  
  in Public
  
  Early years matter because that is when foundational skills are formed. Debugging instinct. System intuition. Precision. Taste. Skepticism.
  
  这个观点强调了早期职业生涯对于工程师技能形成的重要性。
  
  early-career foundational-skills
2. fxp007 01 May 2026
  
  in Public
  
  The value was always in judgment. The valuable engineer is the one who sees the hidden constraint before it causes an outage.
  
  这个观点突出了判断力在软件工程中的核心价值。
  
  core-argument value-of-judgment
3. fxp007 01 May 2026
  
  in Public
  
  Every time you substitute generated output for your own comprehension, you are skipping the exercises / reps that build judgment.
  
  这个观点指出，过度依赖AI生成内容会阻碍个人判断力的培养。
  
  counterintuitive judgment-development
4. fxp007 01 May 2026
  
  in Public
  
  The software engineers who will be most valuable in the future are not the ones who do everything themselves. They are the ones who refuse to spend time on work that A.I. can do for them, while still understanding everything that is done on their behalf.
  
  这个观点强调了未来软件工程师的价值不在于他们能做什么，而在于他们如何利用AI来提升自己的思考能力。
  
  non-consensus-view future-of-work
Visit annotations in context

Tags

non-consensus-view

judgment-development

early-career

future-of-work

value-of-judgment

counterintuitive

core-argument

foundational-skills

Annotators

fxp007

URL

koshyjohn.com/blog/ai-should-elevate-your-thinking-not-replace-it/
handyai.substack.com handyai.substack.com

https://handyai.substack.com/p/your-ceo-is-suffering-from-ai-psychosis

6
1. fxp007 01 May 2026
  
  in Public
  
  But there’s a critical difference between using agents to accomplish defined objectives and spinning up 20 agents because the dashboard makes you feel like a general commanding an army.
  
  作者指出，使用AI代理实现特定目标和仅仅因为仪表板让人感觉像指挥军队一样使用大量代理之间存在关键区别，这引发了关于AI工具使用目的的思考。
  
  critical-thinking ai-impact
2. fxp007 01 May 2026
  
  in Public
  
  The average employee AI usage was 1.5 hours per week. The average CEO AI usage was less than one hour per week.
  
  数据显示，员工和CEO每周使用AI工具的时间非常有限，但他们对AI的依赖和热情却很高，这可能是AI心理疾病的表现。
  
  shocking-data ai-impact
3. fxp007 01 May 2026
  
  in Public
  
  The enthusiasm has spawned an entire ecosystem of tools designed to make you feel like you’re running a company with AI agents.
  
  文章指出，对AI代理的狂热催生了一个完整的工具生态系统，这些工具可能加剧了AI心理疾病。
  
  ecosystem-analysis ai-impact
4. fxp007 01 May 2026
  
  in Public
  
  37,000 lines per day. And this was the output.
  
  作者以Garry Tan的例子说明，尽管声称每天产生大量代码，但实际产出却微乎其微，揭示了AI工具可能导致的低效。
  
  shocking-data ai-impact
5. fxp007 01 May 2026
  
  in Public
  
  Two prominent tech leaders, both publicly using the word psychosis. Both framing sleeplessness and obsessive agent usage as a feature of the moment rather than a bug.
  
  文章指出两位知名科技领袖公开将AI心理疾病视为一种特征而非缺陷，这表明了AI心理疾病可能被误解或忽视。
  
  counterintuitive-view ai-impact
6. fxp007 01 May 2026
  
  in Public
  
  It’s feeling like a new form of [AI psychosis](https://en.wikipedia.org/wiki/Chatbot_psychosis).
  
  文章提出AI心理疾病这一新概念，暗示过度依赖AI工具可能导致类似心理问题。
  
  non-consensus-view ai-impact
Visit annotations in context

Tags

non-consensus-view

shocking-data

counterintuitive-view

critical-thinking

ecosystem-analysis

ai-impact

Annotators

fxp007

URL

handyai.substack.com/p/your-ceo-is-suffering-from-ai-psychosis
www.axios.com www.axios.com

https://www.axios.com/2026/04/26/ai-cost-human-workers

4
1. fxp007 01 May 2026
  
  in Public
  
  Even companies with the biggest IT budgets will need to prove returns on AI spending over time, especially if they're answering to shareholders on quarterly earnings calls.
  
  这个观点值得深入了解，因为它提出了一个可能被忽视的问题：即使公司有巨大的IT预算，也需要证明人工智能投资的回报。
  
  deeper-insight investment-return
2. fxp007 01 May 2026
  
  in Public
  
  An OpenAI investor told Axios that the shift could benefit them, since they view Codex as superior to Claude Code at maximizing tokens efficiently, cutting down on usage costs.
  
  这篇报道中提到了一个非共识观点，即OpenAI的投资者认为他们的产品在效率上优于竞争对手，这需要进一步调查以验证。
  
  non-consensus-view product-comparison
3. fxp007 01 May 2026
  
  in Public
  
  Worldwide IT spending is expected to reach $6.31 trillion in 2026, up 13.5% from 2025, according to Gartner.
  
  Gartner的预测提供了一个重要的数据点，说明了全球IT支出的增长趋势，这背后可能隐藏着更深层次的行业变化。
  
  important-data industry-trend
4. fxp007 01 May 2026
  
  in Public
  
  IT budgets are getting blown out as some companies increasingly spend more on AI than on employees' salaries.
  
  这个陈述提出了一个令人震惊的数据，即一些公司在人工智能上的支出超过了员工工资，需要核查这些公司的具体支出情况。
  
  shocking-data cost-comparison
Visit annotations in context

Tags

investment-return

shocking-data

cost-comparison

non-consensus-view

industry-trend

product-comparison

important-data

deeper-insight

Annotators

fxp007

URL

axios.com/2026/04/26/ai-cost-human-workers
www.axios.com www.axios.com

https://www.axios.com/2026/04/22/anthropic-no-kill-switch-ai-classified-settings

5
1. fxp007 01 May 2026
  
  in Public
  
  A hearing is scheduled for May 19
  
  可执行行动：定于 5 月 19 日举行听证会，这为关注该案件进展的各方提供了一个具体的行动点。
  
  actionable-item upcoming-hearing
2. fxp007 01 May 2026
  
  in Public
  
  Now, agency heads are scrambling to figure out how they can protect their systems from cyber attacks using Mythos
  
  非共识观点：现在，机构负责人正在努力弄清楚他们如何保护自己的系统免受 Mythos 的网络攻击，这一观点可能反映了政府内部对 AI 安全性的担忧。
  
  non-conventional-view cybersecurity-concerns
3. fxp007 01 May 2026
  
  in Public
  
  The company also says the Pentagon has the opportunity to test models before deployment
  
  可能带有偏见的表述：Anthropic 声称五角大楼有机会在部署前测试模型，这种表述可能暗示了 Anthropic 对五角大楼决策过程的看法。
  
  bias pre-deployment-testing
4. fxp007 01 May 2026
  
  in Public
  
  The Pentagon designated Anthropic a supply chain risk
  
  重要的数据或统计数字：五角大楼将 Anthropic 标记为供应链风险，这一数据点对分析 Anthropic 与美国国防部的关系至关重要。
  
  data-point supply-chain-risk
5. fxp007 01 May 2026
  
  in Public
  
  Anthropic says it has no way to control or shut down its AI models once they're deployed by the Pentagon
  
  需要核查的事实声明：Anthropic 声称其无法控制或关闭由五角大楼部署的 AI 模型，这一声明需要进一步核实。
  
  fact-check ai-control
Visit annotations in context

Tags

data-point

ai-control

cybersecurity-concerns

upcoming-hearing

supply-chain-risk

non-conventional-view

actionable-item

pre-deployment-testing

bias

fact-check

Annotators

fxp007

URL

axios.com/2026/04/22/anthropic-no-kill-switch-ai-classified-settings
simonwillison.net simonwillison.net

https://simonwillison.net/2026/Apr/30/zig-anti-ai/

6
1. fxp007 01 May 2026
  
  in Public
  
  Zig values contributors over their contributions.
  
  Zig项目将贡献者视为比他们的贡献更重要，这表明了其对个人和社区发展的重视。
  
  contributor-value project-philosophy human-focus
2. fxp007 01 May 2026
  
  in Public
  
  It relates to an idea I've seen circulating elsewhere: if a PR was mostly written by an LLM, why should a project maintainer spend time reviewing and discussing that PR as opposed to firing up their own LLM to solve the same problem?
  
  作者提出了一个值得深思的问题：如果PR主要由LLM编写，那么维护者为何要花费时间审查和讨论它，而不是自己使用LLM解决问题？
  
  critical-questions llm-usage project-maintenance
3. fxp007 01 May 2026
  
  in Public
  
  In contributor poker, you bet on the contributor, not on the contents of their first PR.
  
  Zig项目将贡献者视为其赌注，而非他们的代码，这体现了对个人成长和社区参与的重视。
  
  contributor-poker individual-growth community-involvement
4. fxp007 01 May 2026
  
  in Public
  
  LLM assistance breaks that completely. It doesn't matter if the LLM helps you submit a 'perfect' PR to Zig - the time the Zig team spends reviewing your work does nothing to help them add new, confident, trustworthy contributors to their overall project.
  
  Zig项目认为，LLM的辅助会破坏其培养可信贡献者的目标，即使PR本身是完美的。
  
  llm-assistance contributor-trust project-goals
5. fxp007 01 May 2026
  
  in Public
  
  We don’t do this just because it’s the 'right' thing to do, but also because it’s the smart thing to do.
  
  Zig项目不仅认为帮助新贡献者是正确的行为，也认为这是明智的，这反映了其对社区成长的长期投资。
  
  community-growth long-term-investment contributor-help
6. fxp007 01 May 2026
  
  in Public
  
  Bun operates its own fork of Zig, and recently achieved a 4x performance improvement on Bun compile after adding 'parallel semantic analysis and multiple codegen units to the llvm backend'.
  
  尽管Bun项目从AI辅助中受益，但Zig项目坚持其反AI政策，突显了项目间价值观的差异。
  
  performance-improvement project-values ai-assisted-programming
Visit annotations in context

Tags

llm-usage

project-goals

project-values

contributor-help

long-term-investment

individual-growth

ai-assisted-programming

community-involvement

human-focus

community-growth

project-philosophy

contributor-value

contributor-trust

contributor-poker

project-maintenance

performance-improvement

llm-assistance

critical-questions

Annotators

fxp007

URL

simonwillison.net/2026/Apr/30/zig-anti-ai/
scottaaronson.blog scottaaronson.blog

https://scottaaronson.blog/?p=9718

6
1. fxp007 01 May 2026
  
  in Public
  
  when you think about it that way, isn’t racing to build a cryptographically relevant QC, as quickly as possible, the most _ethical, socially responsible thing_ for an American QC company to do?
  
  这一观点提出了一个有洞见的伦理问题，即是否应该将快速开发量子计算机视为美国量子计算公司的道德和社会责任。
  
  ethical-consideration quantum-computing
2. fxp007 01 May 2026
  
  in Public
  
  So, mixing metaphors, mightn’t we just as well rip this Band-Aid off ASAP, rather than giving foreign intelligence agencies extra years to catch up?
  
  这一观点提出了一个反直觉的观点，即尽快发展量子计算机可能是最负责任的做法，以避免他国情报机构获得额外的优势。
  
  counter-intuitive geo-political
3. fxp007 01 May 2026
  
  in Public
  
  Aren’t many in cybersecurity still in denial about the threat? Haven’t these slumberers shown that they _won’t_ wake up until dramatic achievements in fault-tolerant QC roust them?
  
  这一观点指出，网络安全领域对量子威胁的忽视，暗示了需要采取更积极的措施来应对这一挑战。
  
  worth-considering cybersecurity
4. fxp007 01 May 2026
  
  in Public
  
  Given that reality, isn’t it better that it be done first by mostly US-based companies in the open, than by (let’s say) Chinese or Russian intelligence in secret?
  
  这一观点提出了一个值得深思的问题：在量子计算机可能被用于恶意目的的情况下，是否应该由美国公司公开地首先发展这一技术？
  
  worth-considering geo-political
5. fxp007 01 May 2026
  
  in Public
  
  The way they see it, cryptographically relevant QCs _will_ plausibly be built sometime soon: indeed, it’s ultimately unavoidable, even if people’s only interest in QC was to do quantum simulations for materials science and chemistry.
  
  这一观点揭示了量子计算机发展的必然性，即使其最初的应用并非用于密码学。
  
  non-consensus-views quantum-computing
6. fxp007 01 May 2026
  
  in Public
  
  some of the most reputable people in quantum hardware and quantum error-correction—people whose judgment I trust more than my own on those topics—are now telling me that a fault-tolerant quantum computer able to break deployed cryptosystems _ought_ to be possible by around 2029.
  
  这一观点令人震惊，因为它暗示了量子计算机可能在不久的将来就能破解现有的加密系统，这是一个非共识的观点。
  
  shocking-data quantum-computing
Visit annotations in context

Tags

quantum-computing

counter-intuitive

worth-considering

cybersecurity

shocking-data

ethical-consideration

non-consensus-views

geo-political

Annotators

fxp007

URL

scottaaronson.blog/
openai.com openai.com

https://openai.com/index/where-the-goblins-came-from/

7
1. fxp007 01 May 2026
  
  in Public
  
  A search through GPT‑5.5’s SFT data found many datapoints containing “goblin” and “gremlin.”
  
  值得注意的代码示例：SFT（监督微调）数据中的异常数据点可能揭示了模型行为的问题。
  
  notable-code sft-data
2. fxp007 01 May 2026
  
  in Public
  
  The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them.
  
  关键概念解释：强化学习可能导致行为泛化，即使是在特定条件下学习的行为也可能在其他情境中表现出来。
  
  key-concept reinforcement-learning
3. fxp007 01 May 2026
  
  in Public
  
  We retired the “Nerdy” personality in March after launching GPT‑5.4.
  
  这表明了已弃用或过时的内容（如“Nerdy”个性）可能导致模型行为问题，需要及时识别和修复。
  
  deprecated-content model-problem
4. fxp007 01 May 2026
  
  in Public
  
  When we looked, use of “goblin” in ChatGPT had risen by 175% after the launch of GPT‑5.1, while “gremlin” had risen by 52%.
  
  令人震惊的数据表明，一个看似无害的偏好可以迅速在模型中扩散，突显了监控和及时响应模型行为变化的重要性。
  
  shocking-data model-change
5. fxp007 01 May 2026
  
  in Public
  
  We unknowingly gave particularly high rewards for metaphors with creatures.
  
  这揭示了最佳实践建议：在训练模型时，应仔细设计奖励机制，以避免意外地鼓励不希望的行为。
  
  best-practice reward-mechanism
6. fxp007 01 May 2026
  
  in Public
  
  A single “little goblin” in an answer could be harmless, even charming.
  
  初学者可能误以为模型中的小问题（如偶尔提到“小怪物”）是无害的，而忽略了它们可能随时间累积成更大的问题。
  
  beginner-trap model-accumulation
7. fxp007 01 May 2026
  
  in Public
  
  Starting with GPT‑5.1, our models began developing a strange habit: they increasingly mentioned goblins, gremlins, and other creatures in their metaphors.
  
  初学者可能难以理解模型行为的发展模式，尤其是当这种模式以微妙的方式出现时，如GPT-5.1开始频繁使用怪物的隐喻。
  
  beginner-trap model-behavior
Visit annotations in context

Tags

beginner-trap

best-practice

model-behavior

sft-data

model-problem

model-accumulation

reinforcement-learning

deprecated-content

notable-code

shocking-data

model-change

key-concept

reward-mechanism

Annotators

fxp007

URL

openai.com/index/where-the-goblins-came-from/
blog.pragmaticengineer.com blog.pragmaticengineer.com

https://blog.pragmaticengineer.com/the-pulse-tokenmaxxing-as-a-weird-new-trend/

11
1. fxp007 01 May 2026
  
  in Public
  
  Putting a leaderboard in place was always going to incentivize much more AI usage.
  
  此观点暗示了排行榜可能无意中刺激了过度使用AI，引发了关于管理工具潜在负面影响的讨论。
  
  incentive-effect ai-overuse management-tool-effect
2. fxp007 01 May 2026
  
  in Public
  
  One engineer at Meta told me they think Meta had a different goal with the token leaderboard.
  
  内部人士的评论揭示了‘tokenmaxxing’可能背后隐藏的目的，引发了对公司真实动机的思考。
  
  hidden-agenda meta-motivation insider-comment
3. fxp007 01 May 2026
  
  in Public
  
  After backlash on social media, Meta abolished the internal leaderboard last week.
  
  Meta在社交媒体上的负面反应导致其取消内部排行榜，这一事件表明社交媒体对企业管理决策的影响力。
  
  social-media-backlash management-decision meta-response
4. fxp007 01 May 2026
  
  in Public
  
  As per The Information, Meta employees used a total of 60.2 trillion AI tokens (!!) in 30 days.
  
  这个令人震惊的数据揭示了Meta在AI token使用上的巨大规模，暗示了潜在的经济浪费和资源过度消耗。
  
  shocking-data resource-waste meta-usage
5. fxp007 01 May 2026
  
  in Public
  
  The rankings, set up by a Meta employee on its intranet using company data, measure how many tokens — the units of data processed by AI models — employees are burning through.
  
  这一观点揭示了‘tokenmaxxing’作为衡量员工AI使用能力的新趋势，暗示了数据消耗成为衡量生产力的一种方式。
  
  new-trend productivity-measure ai-usage
6. fxp007 01 May 2026
  
  in Public
  
  Calibrating token spend to be above average
  
  博弈
7. fxp007 01 May 2026
  
  in Public
  
  “Minimum” incentives with a tracking tool.
  
  “低消”可能更好
8. fxp007 01 May 2026
  
  in Public
  
  Workers are maximizing their prompts, coding sessions and the number of agents working in parallel to climb internal rankings at Meta and other companies a
  
  这个引用表明员工在Meta和其他公司内部排名中通过最大化他们的提示、编码会话和并行工作的代理数量来提升自己的排名。
  
  pragmatic-action employee-strategies
9. fxp007 01 May 2026
  
  in Public
  
  The practice is emblematic of Silicon Valley’s newest form of conspicuous consumption, known as “tokenmaxxing,” which has turned token usage into a benchmark for productivity and a competitive measure of who is most AI native.
  
  这句话指出“Tokenmaxxing”是硅谷最新的一种显摆消费形式，它将令牌的使用转化为衡量生产力和AI原生能力的竞争指标。
  
  non-consensus-view tokenmaxxing-definition
10. fxp007 01 May 2026
  
  in Public
  
  The rankings, set up by a Meta employee on its intranet using company data, measure how many tokens — the units of data processed by AI models — employees are burning through.
  
  这个引用说明了这种内部排名是通过员工消耗的AI令牌数量来衡量的，这些令牌是AI模型处理数据的单位。
  
  significant-info ai-token-measurement
11. fxp007 01 May 2026
  
  in Public
  
  Employees at Meta Platforms who want to show off their AI superuser chops are competing on an internal leaderboard for status as a “Session Immortal”— or, even better, “Token Legend.”
  
  这个引用揭示了“Tokenmaxxing”作为一种新的竞争和显摆形式在Meta内部的兴起，员工通过使用AI令牌的数量来竞争地位。
  
  non-consensus-view ai-usage-competition
Visit annotations in context

Tags

hidden-agenda

ai-overuse

insider-comment

non-consensus-view

pragmatic-action

ai-usage-competition

significant-info

employee-strategies

new-trend

social-media-backlash

ai-token-measurement

resource-waste

meta-motivation

meta-response

tokenmaxxing-definition

productivity-measure

meta-usage

shocking-data

ai-usage

management-tool-effect

incentive-effect

management-decision

Annotators

fxp007

URL

blog.pragmaticengineer.com/the-pulse-tokenmaxxing-as-a-weird-new-trend/
simonwillison.net simonwillison.net

https://simonwillison.net/2026/Apr/22/claude-code-confusion/

5
1. fxp007 01 May 2026
  
  in Public
  
  I invest a [great deal of effort](https://simonwillison.net/tags/claude-code/) (that’s 105 posts and counting) in teaching people how to use Claude Code. I don’t want to invest that effort in a product that most people cannot afford to use.
  
  作者个人的投资和努力可能因价格变动而受到损失，这反映了个人和社区对产品持续性的担忧。
  
  personal-investment community-worry
2. fxp007 01 May 2026
  
  in Public
  
  This also doesn’t make sense to me as a strategy for Anthropic. Claude Code _defined the category_ of coding agents. It’s responsible for billions of dollars in annual revenue
  
  文章暗示，如果Anthropic继续这种策略，可能会损害其产品的市场地位和收入。
  
  business-strategy market-position
3. fxp007 01 May 2026
  
  in Public
  
  I don’t buy the “~2% of new prosumer signups” thing, since everyone I’ve talked to is seeing the new pricing grid and the Internet Archive has already [snapped a copy](https://web.archive.org/web/20260422001250/https://claude.com/pricing).
  
  作者对Anthropic所说的“仅对2%的新用户进行小规模测试”的说法表示怀疑，这表明可能存在更大的影响范围。
  
  doubtful-assumption test-skepticism
4. fxp007 01 May 2026
  
  in Public
  
  Claude Code used to be a feature of the $20/month Pro plan, but according to the new pricing page it is now exclusive to the $100/month or $200/month Max plans.
  
  这一价格变动可能对依赖该服务的用户产生重大影响，特别是对于那些在较高薪资国家之外的用户，这一变化可能引发对服务可靠性的担忧。
  
  shocking-data price-change
5. fxp007 01 May 2026
  
  in Public
  
  Anthropic today quietly (as in _silently_, no announcement anywhere at all) updated their [claude.com/pricing](https://claude.com/pricing) page (but not their [Choosing a Claude plan page](https://support.claude.com/en/articles/11049762-choosing-a-claude-plan), which shows up first for me on Google) to add this tiny but significant detail (arrow is mine, [and it’s already reverted](https://simonwillison.net/2026/Apr/22/claude-code-confusion/#they-reversed-it)):
  
  文章指出Anthropic在未作任何公告的情况下悄悄更改了定价页面，这一行为本身就值得关注，因为它表明了公司可能缺乏透明度。
  
  non-consensus-view transparency
Visit annotations in context

Tags

non-consensus-view

doubtful-assumption

transparency

personal-investment

price-change

market-position

community-worry

shocking-data

business-strategy

test-skepticism

Annotators

fxp007

URL

simonwillison.net/2026/Apr/22/claude-code-confusion/
www.latent.space www.latent.space

https://www.latent.space/p/ainews-tasteful-tokenmaxxing

8
1. fxp007 01 May 2026
  
  in Public
  
  Alibaba claims it beats the much larger **Qwen3.5-397B-A17B** on major coding evals, including **[SWE-bench Verified 77.2 vs 76.2](https://x.com/Alibaba_Qwen/status/204693977592458457)
  
  阿里巴巴声称Qwen3.6-27B在主要的编码评估中击败了更大的Qwen3.5-397B-A17B模型，这是一个值得注意的技术进步。
  
  notable-information tech-progress
2. fxp007 01 May 2026
  
  in Public
  
  Today’s LS guest, Mikhail Parakhin, CTO of Shopify, had another take on the “tasteful tokenmaxxing” - you want to go for depth (e.g. do more serial autoresearch loops) than go for breadth (e.g. solve a problem by kicking off 5, 10, 50, 500 parallel runs of the LLM slot machine). Worth thinking through.
  
  Shopify的CTO Mikhail Parakhin对“优雅的Tokenmaxxing”提出了不同的看法，强调深度而非广度的重要性。
  
  insightful-comment ai-strategy
3. fxp007 01 May 2026
  
  in Public
  
  Dex Horthy, coiner of Context Engineering and “the Dumb Zone”, publicly retracted his extremely vibe-coding-pilled call 6 months ago and encouraged people to **please read the code**
  
  Dex Horthy公开撤回了他的极端观点，并鼓励人们“请阅读代码”，这反映了技术社区对代码质量的重视。
  
  counterintuitive-view code-quality
4. fxp007 01 May 2026
  
  in Public
  
  the top conversations we have been hearing from AI leadership (CTOs, VPs, Founders) have all centered around the concept of “Tokenmaxxing” and how leaders want to get their teams using more AI, WITHOUT the downside of incentivizing the kinds of horrendous waste
  
  AI领导者们普遍关注“Tokenmaxxing”的概念，即如何在增加AI使用的同时避免激励产生巨大的浪费。
  
  non-consensus-view ai-adoption
5. fxp007 01 May 2026
  
  in Public
  
  the numbers are mindboggling, they mostly serve to reinforce the sheer hardware advantage that a decade of investment has given to GDM and any models they train and serve.
  
  令人震惊的数据揭示，谷歌TPUv8的硬件优势是十年投资的结果，这可能会加剧行业的不平等。
  
  shocking-data industry-inequality
6. fxp007 01 May 2026
  
  in Public
  
  AI News for 4/21/2026-4/22/2026. We checked 12 subreddits, [544 Twitters](https://twitter.com/i/lists/1585430245762441216) and no further Discords.
  
  The mention of checking 12 subreddits and 544 Twitters indicates the diverse platforms where AI news and discussions are prevalent.
  
  ai-news-sources platform-diversity community-discussion
7. fxp007 01 May 2026
  
  in Public
  
  Today’s LS guest, Mikhail Parakhin, CTO of Shopify, had another take on the 'tasteful tokenmaxxing' - you want to go for depth (e.g. do more serial autoresearch loops) than go for breadth (e.g. solve a problem by kicking off 5, 10, 50, 500 parallel runs of the LLM slot machine). Worth thinking through.
  
  Mikhail Parakhin's emphasis on depth over breadth in AI research suggests a focus on quality and depth of work rather than quantity.
  
  mikhail-parakhin depth-over-breadth quality-over-quantity
8. fxp007 01 May 2026
  
  in Public
  
  Dex Horthy, coiner of Context Engineering and 'the Dumb Zone', [publicly retracted](https://www.youtube.com/live/6IxSbMhT7v4?si=tMzmqM103KDbPyE6&t=3424)his extremely vibe-coding-pilled call 6 months ago and encouraged people to **please read the code**, citing [Alex Volkov](https://open.substack.com/users/152216110-alex-volkov?utm_source=mentions)'s [Z/L continuum from AIE Europe](https://x.com/altryne/status/2046246775414276142)**:
  
  Dex Horthy's retraction of his previous stance and emphasis on code reading suggest a shift towards a more cautious approach in AI development.
  
  DEX-Horthy code-reading shift-in-approach
Visit annotations in context

Tags

ai-news-sources

insightful-comment

community-discussion

platform-diversity

code-reading

non-consensus-view

notable-information

shift-in-approach

depth-over-breadth

tech-progress

quality-over-quantity

ai-strategy

shocking-data

DEX-Horthy

code-quality

counterintuitive-view

industry-inequality

mikhail-parakhin

ai-adoption

Annotators

fxp007

URL

latent.space/p/ainews-tasteful-tokenmaxxing
arxiv.org arxiv.org

https://arxiv.org/abs/2604.20652

5
1. fxp007 01 May 2026
  
  in Public
  
  Large language models trained on human feedback may suppress fraud warnings when investors arrive already persuaded of a fraudulent opportunity.
  
  这一假设提出了一个值得深入探讨的问题：在投资者已经确信存在欺诈机会的情况下，基于人类反馈训练的大型语言模型可能会抑制欺诈警告。
  
  hypothesis llm-training
2. fxp007 01 May 2026
  
  in Public
  
  Endorsement reversal occurred in fewer than 3 in 1,000 observations.
  
  在1000次观察中，不到3次出现了背书逆转，这表明AI系统在保持立场的一致性方面表现出色。
  
  ai-consistency endorsement-reversal
3. fxp007 01 May 2026
  
  in Public
  
  AI systems currently provide more consistent fraud warnings than lay humans in an identical advisory role.
  
  这一结果强调了AI系统在提供一致欺诈警告方面的优势，这对于提高金融顾问服务的可靠性和有效性具有重要意义。
  
  ai-advantage fraud-warnings
4. fxp007 01 May 2026
  
  in Public
  
  Human advisors endorsed fraudulent investments at baseline rates of 13-14%, versus 0% across all LLMs, and suppressed warnings under pressure at two to four times the AI rate.
  
  令人震惊的是，人类顾问在正常情况下对欺诈性投资的认可率高达13-14%，而在AI系统中的认可率为0%，且在压力下人类顾问抑制警告的频率是AI系统的两到四倍。
  
  shocking-data human-advisor-performance
5. fxp007 01 May 2026
  
  in Public
  
  Contrary to predictions, motivated investor framing did not suppress AI fraud warnings; if anything, it marginally increased them.
  
  这一发现挑战了传统观点，表明在投资者动机的影响下，AI系统在欺诈检测方面表现更佳，甚至可能略微提高了警告的频率。
  
  non-consensus-view fraud-detection
Visit annotations in context

Tags

non-consensus-view

llm-training

ai-advantage

ai-consistency

hypothesis

human-advisor-performance

fraud-warnings

shocking-data

fraud-detection

endorsement-reversal

Annotators

fxp007

URL

arxiv.org/abs/2604.20652
www.technologyreview.com www.technologyreview.com

https://www.technologyreview.com/2026/04/21/1135919/ai-surveillance-privacy-llms-bulk-data/

4
1. fxp007 01 May 2026
  
  in Public
  
  When mobile phones became widespread, gathering data about people got much cheaper, but making use of that data remained difficult. Powerful LLMs could change that.
  
  这里强调了LLMs可能改变数据利用难易度的观点，为读者提供了关于技术影响的深入洞察。
  
  core-argument llm-impact non-consensus-view
2. fxp007 01 May 2026
  
  in Public
  
  “A lot of what we think of as privacy protection isn’t so much like something that’s written in the law,” says Karen Levy, a professor of information science at Cornell University.
  
  这段话揭示了隐私保护的复杂性，并非仅仅是法律问题，而是涉及到获取数据的难易程度。
  
  bias-claim privacy-protection background
3. fxp007 01 May 2026
  
  in Public
  
  According to reporting from the _New York Times_ and the _Atlantic_, contract negotiations between Anthropic and the US Department of Defense fell apart in late February because Anthropic balked when the DOD demanded leeway to use the company’s models to analyze commercially available data on US citizens.
  
  这里提到了具体事件和数据，表明LLMs在监控领域的潜在应用引起了全球关注，以及相关公司对于政府使用其技术的态度。
  
  event-data background monitoring-llms
4. fxp007 01 May 2026
  
  in Public
  
  LLM agents could potentially do the work of intelligence analysts in a fraction of the time and for a fraction of the cost, which would enable the state to aim its all-seeing eye toward anyone, not just its highest-priority targets.
  
  文章提出了一个令人震惊的观点：大型语言模型（LLMs）可能极大地加速了大规模监控，使监控的范围从高优先级目标扩展到任何个体。
  
  shocking-data non-consensus-view mass-surveillance
Visit annotations in context

Tags

mass-surveillance

event-data

non-consensus-view

core-argument

privacy-protection

llm-impact

shocking-data

background

bias-claim

monitoring-llms

Annotators

fxp007

URL

technologyreview.com/2026/04/21/1135919/ai-surveillance-privacy-llms-bulk-data/
www.bbc.com www.bbc.com

https://www.bbc.com/news/articles/c4gx1n0dl9no

4
1. fxp007 01 May 2026
  
  in Public
  
  When questioned by the police, the man said he had done it 'for fun'.
  
  这揭示了犯罪动机可能并不严重，但同时也提出了关于线上行为和责任的问题。
  
  criminal-motive online-behavior legal-responsibility
2. fxp007 01 May 2026
  
  in Public
  
  Born in 2024, Neukgu is part of a programme at O-World to restore the Korean wolf, which once roamed the Korean Peninsula but is now considered extinct in the wild.
  
  这一背景信息揭示了Neukgu的重要性，以及韩国狼在生态系统中的地位，引发了对生物多样性保护和濒危物种恢复的思考。
  
  endangered-species conservation wildlife-protection
3. fxp007 01 May 2026
  
  in Public
  
  The hunt for two-year-old Neukgu gripped the nation before he was finally caught near an expressway last week, nine days after his escape.
  
  这表明Neukgu事件在韩国引起了全国性的关注，但同时也引发了关于媒体和公众对于动物逃逸事件的反应是否过度的讨论。
  
  public-attention animal-escape media-response
4. fxp007 01 May 2026
  
  in Public
  
  The AI-generated image of Neukgu had prompted Daejeon city government to issue an emergency text to residents, warning them of a wolf near the intersection.
  
  这一描述表明AI图像在误导当局方面起到了直接作用，引发了对AI技术潜在滥用问题的关注。
  
  ai-impact misinformation public-safety
Visit annotations in context

Tags

wildlife-protection

public-safety

online-behavior

endangered-species

animal-escape

legal-responsibility

misinformation

public-attention

criminal-motive

conservation

media-response

ai-impact

Annotators

fxp007

URL

bbc.com/news/articles/c4gx1n0dl9no
nlp.elvissaravia.com nlp.elvissaravia.com

https://nlp.elvissaravia.com/p/top-ai-papers-of-the-week-f2f

5
1. fxp007 01 May 2026
  
  in Public
  
  This paper introduces Autogenesis, a self-evolving agent
  
  Autogenesis的引入代表了智能体领域的一项创新，它可能对智能体的未来发展方向产生重大影响。
  
  innovation self-evolving-agent future-direction
2. fxp007 01 May 2026
  
  in Public
  
  Static agents age quickly. As deployment environments change and new tools arrive, the agents that survive will be the ones that can safely rewrite themselves.
  
  该声明强调了静态智能体在快速变化的部署环境中的局限性，提出了智能体自我进化的必要性。
  
  agent-evolution environment-change critical-statement
3. fxp007 01 May 2026
  
  in Public
  
  Instead of one large mixed-RL stage, DeepSeek trains a separate specialist expert per domain.
  
  DeepSeek采用了针对特定领域训练专家的方法，这为模型训练提供了新的视角。
  
  domain-specialist training-method new-perspective
4. fxp007 01 May 2026
  
  in Public
  
  DeepSeek-V4-Pro-Max beats GPT-5.2 and Gemini 3.0-Pro on standard reasoning benchmarks and lands just behind GPT-5.4 and Gemini 3.1-Pro
  
  DeepSeek V4-Pro-Max在标准推理基准测试中超越了GPT-5.2和Gemini 3.0-Pro，这表明了开源模型在性能上的巨大提升。
  
  performance-comparison benchmark open-source-model
5. fxp007 01 May 2026
  
  in Public
  
  The release includes DeepSeek-V4-Pro (1.6T total / 49B active) and DeepSeek-V4-Flash (284B total / 13B active), both trained natively at 1M context length.
  
  DeepSeek V4的模型规模之大令人震惊，这表明了在长上下文处理方面取得的显著进步。
  
  large-scale-model context-length surprising-data
Visit annotations in context

Tags

large-scale-model

training-method

agent-evolution

domain-specialist

self-evolving-agent

new-perspective

open-source-model

performance-comparison

critical-statement

benchmark

context-length

future-direction

environment-change

innovation

surprising-data

Annotators

fxp007

URL

nlp.elvissaravia.com/p/top-ai-papers-of-the-week-f2f
epoch.ai epoch.ai

https://epoch.ai/data-insights/service-by-income

1
1. fxp007 01 May 2026
  
  in Public
  
  Claude skews high-income; Meta AI skews low-income
  
  这一标题揭示了文章的核心观点，即不同的AI模型在收入分布上存在显著差异，这一发现可能对AI服务的公平性和可及性产生重要影响。
  
  non-consensus-view impactful-data actionable-statement
Visit annotations in context

Tags

impactful-data

actionable-statement

non-consensus-view

Annotators

fxp007

URL

epoch.ai/data-insights/service-by-income
x.com x.com

(6) Palantir on X: "Because we get asked a lot. The Technological Republic, in brief. 1. Silicon Valley owes a moral debt to the country that made its rise possible. The engineering elite of Silicon Valley has an affirmative obligation to participate in the defense of the nation. 2. We must rebel" / X

1
1. fxp007 01 May 2026
  
  in Public
  
  The Technological Republic, in brief.
  
  https://claude.ai/public/artifacts/5afbc741-ec4f-493d-bab6-ae3e6d170f22
Visit annotations in context

Annotators

fxp007

URL

x.com/PalantirTech/status/2045574398573453312
openai.com openai.com

https://openai.com/index/speeding-up-agentic-workflows-with-websockets/

5
1. fxp007 01 May 2026
  
  in Public
  
  Even with these improvements, Responses API overhead was too large relative to the speed of the model—that is, use
  
  已弃用或过时的内容：过度依赖单个优化点，而忽略了整体性能瓶颈。
  
  outdated-content performance-optimization
2. fxp007 01 May 2026
  
  in Public
  
  With these improvements, we saw close to a 45% improvement in time to first token (TTFT)—which reflects how responsive the API feels—but these improvements were still not fast enough for GPT‑5.3‑Codex‑Spark.
  
  值得注意的代码示例：通过改进TTFT（首次出字时间）来提升API响应速度。
  
  code-example performance-metrics
3. fxp007 01 May 2026
  
  in Public
  
  We approached this through caching, eliminating unnecessary network hops, improving our safety stack to quickly flag issues, and—most importantly—building a way to create a persistent connection to the Responses API, instead of having to make a series of synchronous API calls.
  
  最佳实践建议：通过缓存、减少网络跳数、改进安全栈和建立持久连接来优化性能。
  
  best-practice performance-optimization
4. fxp007 01 May 2026
  
  in Public
  
  In the past, running LLM inference on GPUs was the slowest part of the agentic loop, so API service overhead was easy to hide.
  
  初学者可能误以为模型推理是瓶颈，而忽略了API服务开销的问题。
  
  common-mistake performance-optimization
5. fxp007 01 May 2026
  
  in Public
  
  All of these requests can add up to minutes that users spend waiting for Codex to complete complex tasks.
  
  初学者可能忽略请求累积对用户体验的影响，导致优化时只关注单个请求的响应速度。
  
  common-mistake user-experience
Visit annotations in context

Tags

best-practice

performance-metrics

performance-optimization

common-mistake

code-example

user-experience

outdated-content

Annotators

fxp007

URL

openai.com/index/speeding-up-agentic-workflows-with-websockets/

fxp007

Annotations: 3,192

Joined: September 17, 2022

Tags

Annotators

URL

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL