Hypothesis

17 Matching Annotations

Jun 2026
huggingface.co

https://huggingface.co/blog/zai-org/glm-52-blog

1
1. fxp007 17 Jun 2026
  
  in Public
  
  On Terminal-Bench 2.1 (81.0) it lands within a few points of Claude Opus 4.8 (85.0) — while staying ahead of Gemini 3.1 Pro.
  
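  Most people assume a wide gap separates open-source models from the top closed-source models, but the author argues that GLM-5.2 already comes close to Claude Opus 4.8 on the terminal benchmark and even beats Gemini 3.1 Pro. This challenges the industry consensus that closed-source models are far ahead, and suggests that open-source models can now compete with top commercial models on specific coding tasks.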
  
  non-consensus ai-performance coding-benchmarks

Tags

coding-benchmarks

ai-performance

non-consensus

Annotators

fxp007

URL

huggingface.co/blog/zai-org/glm-52-blog
www.latent.space

https://www.latent.space/p/ainews-frontiercode-benchmarking

1
1. fxp007 09 Jun 2026
  
  in Public
  
  The headline result is that the best model, Opus 4.8, scores only about 13% on the hardest subset—far below the 50%+ regime common on SWE-Bench-style evals
  
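  Most people believe AI coding ability is already near or beyond human level, but the author points out that even the most advanced model scores far lower on this code-quality evaluation than on traditional benchmarks, implying that programming is far from solved. This finding challenges the common view that AI coding ability has matured.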
  
  counterintuitive ai-capabilities coding-performance

Tags

coding-performance

ai-capabilities

counterintuitive

Annotators

fxp007

URL

latent.space/p/ainews-frontiercode-benchmarking
www.latent.space

The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

2
1. fxp007 05 Jun 2026
  
  in Public
  
  the real failure mode of uncontrolled vibe coding: your codebase regressing to your worst engineer.
  
  This is the sharpest critique of naive AI coding adoption in the article. Without proper agent oversight, code review loops, and quality gates, AI doesn't raise the floor — it lowers it by enabling low-quality code to ship at machine speed. The 'worst engineer' framing implies that unconstrained agents optimize for task completion, not codebase health.
  
  vibe-coding ai-code-quality failure-modes
2. fxp007 05 Jun 2026
  
  in Public
  
  The first wave of AI coding tools made the developer faster but remain heavily in the loop. Copilot and Cursor's tab autocomplete are prime examples. However, the workflow was still heavily centered around and bottlenecked by the developer's local workflow: a developer in an IDE, watching the model, accepting or rejecting changes, and pushing code one interaction at a time.
  
  Framing Copilot and Cursor's autocomplete as 'wave 1' that merely accelerated the existing bottleneck reframes the narrative: these tools didn't change the fundamental unit of work (developer attention), they just made it faster. The real disruption is removing developer attention as the rate-limiting step entirely.
  
  ai-coding-waves copilot cursor

Tags

ai-code-quality

copilot

failure-modes

cursor

vibe-coding

ai-coding-waves

Annotators

fxp007

URL

latent.space/p/cognition
May 2026
github.com

actual/.claude/skills at master · actualbudget/actual

1
1. TylerRick 29 May 2026
  
  in Public
  
  AI AI: coding Claude

Tags

Claude

AI: coding

AI

Annotators

TylerRick

URL

github.com/actualbudget/actual/blob/master/.claude/skills/committing-actual-changes/SKILL.md
www.technologyreview.com

https://www.technologyreview.com/2026/05/26/1137855/a-reality-check-on-the-ai-jobs-hysteria/

1
1. fxp007 29 May 2026
  
  in Public
  
  annual employment growth for coders has slowed significantly—by about 3%—since the introduction of ChatGPT
  
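  The employment growth rate for programmers has fallen by about 3% since ChatGPT launched, which is a notable drop. However, the article also notes that 'total programmer employment is still growing', only more slowly. This suggests AI is changing the nature of specific occupations rather than eliminating them outright. The 3% slowdown reflects AI's effect on programming, but the effect is relatively mild.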
  
  data-point coding-jobs ai-automation

Tags

data-point

ai-automation

coding-jobs

Annotators

fxp007

URL

technologyreview.com/2026/05/26/1137855/a-reality-check-on-the-ai-jobs-hysteria/
blog.k10s.dev

https://blog.k10s.dev/im-going-back-to-writing-code-by-hand/

1
1. fxp007 19 May 2026
  
  in Public
  
  The tl;dr of this dev log is that I still need to be in the loop to make anything meaningful.
  
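  Most people believe AI can develop software fully autonomously, but the author argues that human intervention remains essential: AI is good at implementing features but does not understand architectural design, so a human has to steer the overall direction.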
  
  non-consensus ai-coding human-intervention

Tags

ai-coding

human-intervention

non-consensus

Annotators

fxp007

URL

blog.k10s.dev/im-going-back-to-writing-code-by-hand/
Apr 2026
openai.com

https://openai.com/index/introducing-gpt-5-5/

1
1. fxp007 24 Apr 2026
  
  in Public
  
  GPT‑5.5 is our strongest agentic coding model to date. On **Terminal-Bench 2.0,** which tests complex command-line workflows requiring planning, iteration, and tool coordination, it achieves a state-of-the-art accuracy of 82.7%.
  
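  Most people believe AI still needs human supervision and intervention on complex programming tasks, but the author claims GPT-5.5 already reaches 82.7% accuracy on complex command-line workflows. This challenges the consensus that 'AI coding assistants are still at the assistive stage' and hints that AI may already be approaching or matching professional human level in some areas of programming.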
  
  non-consensus coding-ai counterintuitive

Tags

coding-ai

counterintuitive

non-consensus

Annotators

fxp007

URL

openai.com/index/introducing-gpt-5-5/
www.anthropic.com

Introducing Claude Opus 4.7

1
1. fxp007 17 Apr 2026
  
  in Public
  
  On our 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve.
  
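  A 13% performance gain is a significant leap in AI, especially because it includes solving tasks the previous generation could not handle at all. This suggests that nonlinear growth in AI capability may have arrived, rather than simple linear progress.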
  
  performance-leap coding-ai

Tags

performance-leap

coding-ai

Annotators

fxp007

URL

anthropic.com/news/claude-opus-4-7
epoch.ai

https://epoch.ai/blog/mirrorcode-preliminary-results

1
1. fxp007 17 Apr 2026
  
  in Public
  
  Claude Opus 4.6 autonomously reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks.
  
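  This is a striking finding: it shows AI can already complete complex programming tasks that would normally take a human engineer weeks. It not only challenges our understanding of AI's current capabilities but also hints that software engineering may be about to change substantially. This level of autonomous coding goes far beyond what today's mainstream AI coding assistants deliver.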
  
  ai-capabilities software-engineering autonomous-coding

Tags

autonomous-coding

ai-capabilities

software-engineering

Annotators

fxp007

URL

epoch.ai/blog/mirrorcode-preliminary-results
www.xiaohu.ai

https://www.xiaohu.ai/c/xiaohu-ai/glm-5v-turbo

1
1. fxp007 16 Apr 2026
  
  in Public
  
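  GLM-5V-Turbo scored 94.8; Claude Opus 4.6 scored 77.3. That is not a small gap.
  
  Surprisingly, on a test of turning UI design mockups into code, GLM-5V-Turbo's score (94.8) is well ahead of Claude Opus 4.6's (77.3), which points to a striking advantage in visual coding. The lead is 17.5 points, a gap rarely seen in AI model comparisons.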
  
  surprising ai-performance coding-models

Tags

surprising

ai-performance

coding-models

Annotators

fxp007

URL

xiaohu.ai/c/xiaohu-ai/glm-5v-turbo
a16z.com

Where Enterprises are Actually Adopting AI - a16z

1
1. fxp007 16 Apr 2026
  
  in Public
  
  Coding is the dominant use case for AI by nearly an order of magnitude. It's abundantly clear in the reported explosive growth of companies like Cursor, as well as the hyper growth of tools like Claude Code and Codex.
  
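  Surprisingly, coding has become the leading enterprise use case for AI, nearly an order of magnitude larger than other use cases. Engineers using AI tools can raise their productivity 10-20x; this striking efficiency gain explains why enterprises are adopting AI coding tools so quickly, and it upends conventional views of the software development workflow.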
  
  surprising coding-ai productivity fun-fact

Tags

surprising

fun-fact

coding-ai

productivity

Annotators

fxp007

URL

a16z.com/where-enterprises-are-actually-adopting-ai/
Jan 2026
passo.uno

The four modes of AI-augmented technical writing

1
1. tonz 26 Jan 2026
  
  in Public
  
  Blogger Fabrizio Ferri Benedetti on their 4 modes of using AI in technical writing:
  - watercooler conversations, to get code explained
  - text suggestions while writing/coding (esp. for repeating patterns in your work)
  - providing context / constraints / intent to generate first drafts, restructure content, or boilerplate commentary etc.
  - a robotic assembly line, to do checks, tests and rewrites. MCP/skills involved.
  
  Not either/or but switching between modes
  
  ai-agents algogens coding

Tags

coding

algogens

ai-agents

Annotators

tonz

URL

passo.uno/four-modes-ai-augmented-tech-writing/
simonwillison.net

2025: The year in LLMs

2
1. tonz 01 Jan 2026
  
  in Public
  
  I love the asynchronous coding agent category. They’re a great answer to the security challenges of running arbitrary code execution on a personal laptop and it’s really fun being able to fire off multiple tasks at once—often from my phone—and get decent results a few minutes later.
  
  async coding agents: prompt and forget
  
  coding vibecoding ai-agents
2. tonz 01 Jan 2026
  
  in Public
  
  coding agents—LLM systems that can write code, execute that code, inspect the results and then iterate further.
  
  author def of coding agents
  
  definition ai-agents coding vib

Tags

coding

definition

vibecoding

vib

ai-agents

Annotators

tonz

URL

simonwillison.net/2025/Dec/31/the-year-in-llms/
Apr 2015
thegrid.io

Frequently Asked Questions

1
1. quoudten 11 Apr 2015
  
  in Public
  
  Do you need to learn code to use The Grid? No coding is required to use The Grid. Just do what you're already doing on Facebook, Twitter, Instagram, etc. Post images, video, and content to your site and our AI Designer will make it beautiful. If you know code, you can extend functionality using our platform tools and API.
  
  Coding skills are a plus but not necessary. Accessibility!...
  
  coding programming ai grid.io website design api tools

Tags

coding

tools

programming

grid.io

website

api

design

ai

Annotators

quoudten

URL

thegrid.io/faq/
May 2014
www.npr.org

Untitled document

1
1. Spence 06 May 2014
  
  in Public
  
  coding code tumors software AI MachineLearning patterns biocompute cancer HPC

Tags

coding

cancer

HPC

AI

patterns

tumors

code

biocompute

MachineLearning

software

Annotators

Spence

URL

npr.org/blogs/health/2014/05/06/309003098/chemist-turns-software-developer-after-sons-cancer-diagnosis
