In a case where Claude Mythos Preview cheated on a training task, NLAs revealed Claude was internally thinking about how to avoid detection.
NLAs can detect a model cheating on a training task and reveal the internal reasoning behind its attempt to evade detection.