Hypothesis

4 Matching Annotations

Jun 2026
huggingface.co

https://huggingface.co/blog/zai-org/glm-52-blog

1. fxp007 17 Jun 2026
  
  in Public
  
  On Terminal-Bench 2.1 (81.0) it lands within a few points of Claude Opus 4.8 (85.0) — while staying ahead of Gemini 3.1 Pro.
  
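  Most people assume there is a wide gap between open-source models and the top closed-source models, but the author argues that GLM-5.2 now comes within 4 points of Claude Opus 4.8 on Terminal-Bench 2.1 (81.0 vs. 85.0) and even outscores Gemini 3.1 Pro. This challenges the industry consensus that closed-source models are far ahead, and suggests that open-source models can already compete with top commercial models on specific coding tasks.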
  
  non-consensus ai-performance coding-benchmarks

Tags

coding-benchmarks

ai-performance

non-consensus

Annotators

fxp007

URL

huggingface.co/blog/zai-org/glm-52-blog
www.latent.space

https://www.latent.space/p/ainews-frontiercode-benchmarking

1. fxp007 09 Jun 2026
  
  in Public
  
  The headline result is that the best model, Opus 4.8, scores only about 13% on the hardest subset—far below the 50%+ regime common on SWE-Bench-style evals
  
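  Most people believe AI coding ability is already at or near human level, but the author points out that even the strongest model, Opus 4.8, scores only about 13% on the hardest subset, far below the 50%+ typical of traditional SWE-Bench-style benchmarks. This suggests programming is far from solved and challenges the common view that AI coding ability has matured.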
  
  counterintuitive ai-capabilities coding-performance

Tags

coding-performance

ai-capabilities

counterintuitive

Annotators

fxp007

URL

latent.space/p/ainews-frontiercode-benchmarking
Apr 2026
www.anthropic.com

Introducing Claude Opus 4.7

1. fxp007 17 Apr 2026
  
  in Public
  
  On our 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve.
  
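  A 13% improvement is a significant jump in AI, especially since it includes four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve at all. This suggests AI capabilities may be advancing nonlinearly rather than through simple incremental progress.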
  
  performance-leap coding-ai

Tags

performance-leap

coding-ai

Annotators

fxp007

URL

anthropic.com/news/claude-opus-4-7
www.xiaohu.ai

https://www.xiaohu.ai/c/xiaohu-ai/glm-5v-turbo

1. fxp007 16 Apr 2026
  
  in Public
  
  GLM-5V-Turbo scored 94.8; Claude Opus 4.6 scored 77.3. That's a sizable gap.
  
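  Surprisingly, on the test of turning UI design mockups into code, GLM-5V-Turbo (94.8) clearly beat Claude Opus 4.6 (77.3), suggesting a striking advantage in visual coding. A 17.5-point lead like this is very rare in AI model comparisons.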
  
  surprising ai-performance coding-models

Tags

surprising

ai-performance

coding-models

Annotators

fxp007

URL

xiaohu.ai/c/xiaohu-ai/glm-5v-turbo