Hypothesis

GLM-5.1 achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

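Surprisingly, GLM-5.1 achieves state-of-the-art performance on software-engineering agent tasks, leading its predecessor by a wide margin on code repository generation and real-world terminal tasks in particular. This suggests a qualitative leap in AI's ability to understand and carry out complex software-engineering tasks.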

surprising software-engineering ai-advancement
