fxp007 01 May 2026 in Public We unknowingly gave particularly high rewards for metaphors with creatures. 这揭示了最佳实践建议:在训练模型时,应仔细设计奖励机制,以避免意外地鼓励不希望的行为。 best-practice reward-mechanism