Reinforcement learning is evil. This is not something new. People in AI safety have been talking about the fundamental flaw in training by reinforcement learning to achieve something in the world: it gives rise to the problems of instrumental goals and reward hacking.
这一强烈批评指出了强化学习的根本缺陷,即工具性目标和奖励黑客问题,对当前AI训练方法提出了重要质疑。