5 Matching Annotations
  1. Nov 2020
    1. we designed a reward function that isbased on a game-balancing constant and introduce itinto the Proximal-Policy-Opmitization (PPO) (Schul-man et al., 2017) algorithm, a reinforcement learn-ing method that directly optimizes the policy usinggradient-based learning.

      *핵심 reward function + PPO