Hypothesis

14 Matching Annotations

Last 7 days
thesequence.substack.com

https://thesequence.substack.com/p/the-sequence-radar-885-last-week

1
1. fxp007 03 Jul 2026
  
  in Public
  
  the way we communicate with them must evolve from loose conversation into something closer to structured collaboration.
  
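  As models become more agentic, traditional natural-language prompt engineering may be coming to an end. Future human-AI interaction will look more like designing machine-readable workflows. This carries an implicit assumption: for the sake of reliability and controllability, we need to give up some of the ambiguity of natural language and move toward structured semantic markup.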
  
  core-argument human-ai-interaction prompt-engineering
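To make the contrast between loose conversation and structured collaboration concrete, here is a minimal sketch. The `TaskSpec` class, its field names, and the example values are all hypothetical and appear in neither the article nor any real agent API; the point is only that every expectation becomes an explicit, machine-readable field.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TaskSpec:
    """Hypothetical structured request for an agent (not a real API)."""
    goal: str
    inputs: dict = field(default_factory=dict)
    constraints: list = field(default_factory=list)
    output_format: str = "text"


def to_prompt(spec: TaskSpec) -> str:
    """Serialize the spec to JSON so each field is explicit and parseable."""
    return json.dumps(asdict(spec), indent=2, sort_keys=True)


# A loose, conversational request leaves scope and success criteria implicit.
loose = "Can you tidy up the billing module a bit?"

# The structured version states the goal, inputs, constraints, and output shape.
structured = TaskSpec(
    goal="Refactor the billing module",
    inputs={"path": "src/billing"},
    constraints=["keep the public API unchanged", "all existing tests must pass"],
    output_format="unified diff",
)

print(to_prompt(structured))
```

The trade-off the note describes shows up directly: the structured form is more verbose, but a program can validate it before the agent ever runs.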
Visit annotations in context

Tags

prompt-engineering

human-ai-interaction

core-argument

Annotators

fxp007

URL

thesequence.substack.com/p/the-sequence-radar-885-last-week
May 2026
a16z.com

https://a16z.com/avoiding-death-on-the-yellow-brick-road/

1
1. fxp007 27 May 2026
  
  in Public
  
  The critical insight in the Oz analogy is that roughly half of any real workflow that is non-agentic carries no lab advantage. They are no better than you are at writing the deterministic software underneath the model layer.
  
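  Most people assume that AI will replace all software engineering work and that humans will only need to build the AI agent layer. The author argues instead that roughly half of any real workflow is non-agentic, and that the large-model labs have no advantage in that half. Model companies are no better at writing the deterministic software beneath the model layer than specialized application companies are. This creates a significant opportunity for businesses that focus on building the non-AI parts of complex workflows.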
  
  non-consensus ai-limitations software-engineering
Visit annotations in context

Tags

non-consensus

ai-limitations

software-engineering

Annotators

fxp007

URL

a16z.com/avoiding-death-on-the-yellow-brick-road/
huggingface.co

https://huggingface.co/papers/2604.24658

1
1. fxp007 01 May 2026
  
  in Public
  
  Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work.
  
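  Most people assume that papers written for human readers are equally suitable for AI. The authors argue that the costs of traditional papers are tolerable for human readers but impose an 'engineering tax' on AI agents trying to understand the research process, which reflects how poorly the current academic publishing system fits the AI era.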
  
  non-consensus ai-research engineering-tax
Visit annotations in context

Tags

non-consensus

engineering-tax

ai-research

Annotators

fxp007

URL

huggingface.co/papers/2604.24658
openai.com

https://openai.com/index/open-source-codex-orchestration-symphony/

1
1. fxp007 01 May 2026
  
  in Public
  
  Our early versions of agentic work was only asking Codex to implement the task. That approach proved too limiting. Codex is perfectly capable of creating multiple PRs as well as reading review feedback and addressing it.
  
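  Most people assume that AI can only carry out simple, single tasks. The author argues that AI can already handle complex, multi-step workflows, including creating multiple PRs and responding to code review. This challenges conventional views of AI capability and suggests that AI has evolved to the point of understanding and executing complex software engineering tasks.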
  
  non-consensus ai-capabilities software-engineering counterintuitive
Visit annotations in context

Tags

non-consensus

ai-capabilities

counterintuitive

software-engineering

Annotators

fxp007

URL

openai.com/index/open-source-codex-orchestration-symphony/
Apr 2026
www.kimi.com

https://www.kimi.com/blog/kimi-k2-6

1
1. fxp007 26 Apr 2026
  
  in Public
  
  Kimi K2.6 autonomously overhauled exchange-core, an 8-year-old open-source financial matching engine. Over a 13-hour execution, the model iterated through 12 optimization strategies, initiating over 1,000 tool calls to precisely modify more than 4,000 lines of code.
  
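  Most people assume that AI still needs guidance and supervision from human experts on complex engineering tasks and cannot independently complete a large-scale system refactor. The author shows AI autonomously analyzing, optimizing, and refactoring a financial system that has been running for 8 years. This challenges conventional views of AI's engineering ability and suggests that AI may already be capable of system-level architecture design and optimization.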
  
  non-consensus ai-engineering autonomous-systems
Visit annotations in context

Tags

non-consensus

ai-engineering

autonomous-systems

Annotators

fxp007

URL

kimi.com/blog/kimi-k2-6
www.anthropic.com

https://www.anthropic.com/news/anthropic-nec

1
1. fxp007 26 Apr 2026
  
  in Public
  
  NEC aims to build one of Japan's largest AI-native engineering teams, who will use Claude Code in their work.
  
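  Most people assume that AI will replace a large number of engineering jobs. The author's view is that AI is actually creating new engineering roles and skill requirements: NEC is actively building a large AI-native engineering team, which suggests that AI tools are augmenting rather than replacing engineering capacity and are creating new employment opportunities.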
  
  non-consensus ai-jobs engineering-transformation
Visit annotations in context

Tags

non-consensus

ai-jobs

engineering-transformation

Annotators

fxp007

URL

anthropic.com/news/anthropic-nec
www.tomtunguz.com

https://www.tomtunguz.com/ai-problem-matrix/

1
1. fxp007 17 Apr 2026
  
  in Public
  
  AI writes the code. Tests verify correctness. More code enables more features.
  
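  This concise description captures the full closed loop of AI in software development: AI generates the code, tests verify correctness, and more code enables more features. This self-reinforcing cycle may make software development the domain where AI is most disruptive.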
  
  ai-workflow software-engineering
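The generate-then-verify loop in the quote can be sketched as follows. This is an illustration under stated assumptions, not the author's method: the `candidates` list stands in for code a model might propose, and `passes_tests` and `first_passing` are hypothetical helper names.

```python
from typing import Callable, Optional, Sequence

Candidate = Callable[[int, int], int]


def passes_tests(candidate: Candidate) -> bool:
    """Check a candidate 'add' implementation against fixed test cases."""
    cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
    return all(candidate(*args) == expected for args, expected in cases)


def first_passing(candidates: Sequence[Candidate]) -> Optional[int]:
    """Return the index of the first candidate the tests accept, or None."""
    for index, candidate in enumerate(candidates):
        if passes_tests(candidate):
            return index
    return None


# Stand-ins for successive model outputs (assumption: a real system would
# request a new attempt from a model after each failed test run).
candidates = [
    lambda a, b: a - b,  # wrong operator, fails the first case
    lambda a, b: a * b,  # wrong operator, fails the first case
    lambda a, b: a + b,  # correct
]

accepted = first_passing(candidates)
print(accepted)
```

The tests are what make the loop trustworthy: a candidate is accepted only because a deterministic check passed, not because the generator claimed success.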
Visit annotations in context

Tags

ai-workflow

software-engineering

Annotators

fxp007

URL

tomtunguz.com/ai-problem-matrix/
www.minimax.io

https://www.minimax.io/models/text/m27

1
1. fxp007 17 Apr 2026
  
  in Public
  
  M2.7 demonstrates excellent performance in real-world software engineering, including end-to-end project delivery, log analysis for bug hunting, code security, and machine learning tasks.
  
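  This statement implies that AI models have moved beyond simple code generation and can complete the full software development lifecycle. That represents a major breakthrough for AI in engineering and may redefine how software is developed in the future.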
  
  ai-capability software-engineering
Visit annotations in context

Tags

software-engineering

ai-capability

Annotators

fxp007

URL

minimax.io/models/text/m27
epoch.ai

https://epoch.ai/blog/mirrorcode-preliminary-results

1
1. fxp007 17 Apr 2026
  
  in Public
  
  Claude Opus 4.6 autonomously reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks.
  
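  This is a striking finding: it indicates that AI can already complete complex programming tasks that would normally take a human engineer weeks. It not only challenges our understanding of AI's current capabilities but also suggests that software engineering may be about to change significantly. This level of autonomous programming far exceeds what today's mainstream AI coding assistants deliver.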
  
  ai-capabilities software-engineering autonomous-coding
Visit annotations in context

Tags

ai-capabilities

autonomous-coding

software-engineering

Annotators

fxp007

URL

epoch.ai/blog/mirrorcode-preliminary-results
x.com

https://x.com/AlphaSignalAI/status/2043706039334252599

1
1. fxp007 16 Apr 2026
  
  in Public
  
  Same clinical question, two framings. One as a patient, one as a doctor.
  
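  Surprisingly, the exact same medical question gets a completely different answer from the AI just because the asker's identity changes from "patient" to "doctor." That such a simple change in wording can trigger or bypass safety restrictions suggests that the AI's safety mechanisms are extremely fragile and easy to circumvent.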
  
  surprising ai-security prompt-engineering
Visit annotations in context

Tags

prompt-engineering

surprising

ai-security

Annotators

fxp007

URL

x.com/AlphaSignalAI/status/2043706039334252599
z.ai

https://z.ai/blog/glm-5.1

1
1. fxp007 16 Apr 2026
  
  in Public
  
  GLM-5.1 achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).
  
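  Surprisingly, GLM-5.1 achieves state-of-the-art performance on agentic software engineering tasks, and it leads its predecessor by a wide margin on repository generation and real-world terminal tasks in particular. This suggests a qualitative leap in AI's ability to understand and execute complex software engineering tasks.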
  
  surprising software-engineering ai-advancement
Visit annotations in context

Tags

surprising

ai-advancement

software-engineering

Annotators

fxp007

URL

z.ai/blog/glm-5.1
mistral.ai

https://mistral.ai/news/spaces

1
1. fxp007 09 Apr 2026
  
  in Public
  
  There's an old saying that content is king. With agents, context is.
  
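  In the LLM era, this is the sharpest comment on the importance of the "context window." Agents lack humans' tacit knowledge and awareness of their environment, so explicit context (such as context.json) becomes the foundation of their actions. It is a reminder that when designing AI-assisted systems, building a high-quality context-generation mechanism is often more critical than optimizing the model itself.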
  
  llm-context ai-engineering core-argument
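As a rough illustration of the explicit context the note mentions, the sketch below writes and reloads a `context.json` file. The schema (`project`, `conventions`, `entry_points`) and the helper names are assumptions made up for this example; the note does not specify what such a file contains.

```python
import json
import tempfile
from pathlib import Path


def write_context(directory: Path, context: dict) -> Path:
    """Persist explicit context an agent can read before acting."""
    path = directory / "context.json"
    path.write_text(json.dumps(context, indent=2), encoding="utf-8")
    return path


def load_context(path: Path) -> dict:
    """Load the context back, as an agent runtime might at startup."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical fields: knowledge a human teammate would simply know.
context = {
    "project": "example-service",
    "conventions": ["use type hints", "no network calls in unit tests"],
    "entry_points": ["app/main.py"],
}

with tempfile.TemporaryDirectory() as tmp:
    saved = write_context(Path(tmp), context)
    restored = load_context(saved)

print(restored["conventions"])
```

Keeping the context in a plain file means it can be reviewed and versioned like any other project artifact.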
Visit annotations in context

Tags

core-argument

ai-engineering

llm-context

Annotators

fxp007

URL

mistral.ai/news/spaces
Feb 2026
blog.comma.ai

Owning a $5M data center

1
1. pyxelr 07 Feb 2026
  
  in Public
  
  Owning a $5M data center
  
  comma.ai operates its own $5M data center in-office to handle model training, metrics, and data storage, avoiding the "cloud tax."
  
  The facility consumes approximately 450kW at peak; power costs in San Diego (over 40c/kWh) totaled over $540,000 in 2025.
  
  Cooling is achieved using pure outside air with dual 48” intake and exhaust fans, utilizing a PID loop to manage temperature and humidity.
  
  The compute cluster consists primarily of 600 GPUs across 75 "TinyBox Pro" machines built in-house for cost efficiency and easier repairability.
  
  Storage is handled by several racks of Dell R630/R730 servers with ~4PB of total SSD storage, favoring speed and random access over redundancy.
  
  The software stack is kept simple to ensure 99% uptime, utilizing Ubuntu (pxeboot), Salt for management, and "minikeyvalue" for distributed storage.
  
  By owning their hardware, comma.ai estimates they saved $20M+ compared to equivalent compute costs in a public cloud environment.
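The power figures above can be cross-checked with a little arithmetic. This is a back-of-the-envelope sketch: the summary gives "over 40c/kWh" and "over $540,000", so the exact values used below are assumptions taken at those floors, and the results are only rough.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

annual_cost_usd = 540_000   # assumed: the floor of "over $540,000"
price_usd_per_kwh = 0.40    # assumed: the floor of "over 40c/kWh"
peak_kw = 450               # "approximately 450kW at peak"

# Energy implied by the bill at the assumed price.
annual_kwh = annual_cost_usd / price_usd_per_kwh

# Average continuous draw that would consume that much energy in a year.
average_kw = annual_kwh / HOURS_PER_YEAR

# Share of peak draw that the average represents.
load_factor = average_kw / peak_kw

print(f"{annual_kwh:,.0f} kWh/year, {average_kw:.0f} kW average, "
      f"{load_factor:.0%} of peak")
```

Under these assumptions the average draw comes out at roughly 154 kW, about a third of the stated peak, which is plausible for a cluster that is not fully loaded around the clock.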
  
  Hacker News Discussion
  
  Users discussed the spectrum of infrastructure, ranging from pure Cloud (low cap-ex, high op-ex) to colocation and on-prem (high cap-ex, high skill requirement).
  
  A primary concern raised was "brain drain"—on-prem setups can become "legacy debt" if the senior engineers who built the custom systems leave without documenting unwritten knowledge.
  
  Commenters noted that AWS and other cloud providers are incentivized to keep architectures complex (microservices, serverless) to increase billing, whereas on-prem encourages efficiency.
  
  There was a debate regarding "software freedom" and the "WhatsApp effect," where small, highly motivated teams can outperform massive corporations by using lean, self-hosted stacks.
  
  Some users highlighted that while AWS pricing is expected to rise due to hardware costs, the "Quality of Life" and managed services still justify the cost for many startups without comma's scale.
  
  #comma-ai #self-hosting #datacenter #hardware-engineering
  
  comma-ai self-hosting datacenter hardware-engineering cloud
Visit annotations in context

Tags

hardware-engineering

datacenter

comma-ai

cloud

self-hosting

Annotators

pyxelr

URL

blog.comma.ai/datacenter/
Nov 2021
psyarxiv.com

Measuring Trust in the XAI Context

1
1. jasminehollingworth 03 Nov 2021
  
  in BehSci
  
  Hoffman, R., Mueller, S., Klein, G., & Litman, J. (2021). Measuring Trust in the XAI Context. PsyArXiv. https://doi.org/10.31234/osf.io/e3kv9
  
  lang:en is:preprint measuring trust context XAI air force US government automation computer science cognitive cognitive system engineering AI development test
Visit annotations in context

Tags

cognitive system engineering

automation

computer science

US

trust

government

XAI

is:preprint

lang:en

measuring

context

cognitive

air force

test

AI

development

Annotators

jasminehollingworth

URL

psyarxiv.com/e3kv9/
