s RLHF, which is reinforcement learning from human feedback.
Gemini said: Reinforcement Learning (RL) is a branch of machine learning in which an agent learns to make decisions by taking actions in an environment so as to maximize the total reward it collects over time.
Unlike Supervised Learning (where the model is given the "right answer"), RL is based on trial and error. It is very similar to training a dog: you don't tell the dog exactly which muscles to move; you just give it a treat when it does something good and no treat when it doesn't.
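To make the trial-and-error idea concrete, here is a minimal sketch in Python. It is not RLHF itself, only a toy "multi-armed bandit" in which an agent picks among a few tricks and learns from treats alone which one pays off most often. The function name `run_bandit`, the treat probabilities, and the epsilon-greedy exploration rule are all assumptions chosen for illustration.

```python
import random


def run_bandit(treat_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit (hypothetical setup).

    treat_probs[i] is the chance that action i earns a treat (reward 1).
    The agent never sees these numbers; it only sees the rewards.
    """
    rng = random.Random(seed)
    n = len(treat_probs)
    counts = [0] * n    # how many times each action has been tried
    values = [0.0] * n  # running average reward observed per action

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(n)
        else:
            # Exploit: pick the action that has looked best so far.
            action = max(range(n), key=lambda a: values[a])

        # The environment responds with a treat (1.0) or nothing (0.0).
        reward = 1.0 if rng.random() < treat_probs[action] else 0.0

        # Incrementally update the average reward for the chosen action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


if __name__ == "__main__":
    values, counts = run_bandit([0.1, 0.5, 0.9])
    for i, (v, c) in enumerate(zip(values, counts)):
        print(f"action {i}: estimated reward {v:.2f}, tried {c} times")
```

Nobody tells the agent which action is correct. It simply tries things, keeps a running average of the treats each action earned, and gradually favors the action with the highest average. The small `epsilon` keeps it experimenting now and then so it does not settle too early on a mediocre choice.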