3 Matching Annotations
  1. Jun 2026
    1. We find that GLM-5.2 shows more potential hacking behavior than GLM-5.1. This makes the verification signal easy to optimize, but fails to actually improve the fundamental capabilities of the model.

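      Most people assume that gains in model capability always go hand in hand with better measured performance, but the authors argue that although GLM-5.2 shows more potential hacking behavior, this does not actually improve the model's fundamental capabilities. This view challenges the mainstream belief that "a higher performance score always means a more capable model" and suggests that AI training can over-optimize a metric while neglecting real capability gains.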

  2. Mar 2019
    1. This is one of many discussions of Kirkpatrick's four levels of evaluation. The page devotes more space to decoration and graphics than it needs to, but it is included in this list because it offers a printable guide and shows the hierarchy of the four levels clearly. The text is printed in black on a white background and presented as a bulleted list, although the bullets are not organized as well as they could be. Nonetheless, it is a usable presentation of the model. Rating: 3/5