1 Matching Annotations
- Jul 2023
-
-
This paper introduces the DDPG algorithm which builds on the existing DPG algorithm from classic RL theory. The main idea is to define a deterministic policy, or nearly deterministic, for situations where the environment is very sensitive to suboptimal actions, and one action setting usually dominates in each state. This showed good performance, but could not beat algorithms such as PPO until the additions of SAC were added. SAC adds an entropy penalty which essentially penalizes uncertainty in any states. Using this, the deterministic policy gradient approach performs well.
Tags
Annotators
URL
-