Hypothesis

On Terminal-Bench 2.1 (81.0) it lands within a few points of Claude Opus 4.8 (85.0) — while staying ahead of Gemini 3.1 Pro.

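Most people assume a wide gap separates open-source models from the top closed-source models, but the author argues that GLM-5.2 already approaches the performance of Claude Opus 4.8 on the terminal benchmark and even surpasses Gemini 3.1 Pro. This view challenges the AI industry consensus that "closed-source models are far ahead" and suggests that open-source models can already compete with top commercial models on specific coding tasks.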