5 Matching Annotations
- Nov 2020
-
docdrop.org
-
we designed a reward function that is based on a game-balancing constant and introduce it into the Proximal-Policy-Optimization (PPO) (Schulman et al., 2017) algorithm, a reinforcement learning method that directly optimizes the policy using gradient-based learning.
*Key point: reward function + PPO
-
still because the player may have a wrong vision of its own abilities (Missura and Gärtner, 2009)
Figure out what this means??
-
remains inside a range around this constant during the training
Work out what this sentence means
-
a reward function based on a balancing constan
Look into the reward function
-
how to act while still maintaining the balancing
What "while maintaining the balance" means..
-
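The highlights above mention a reward function built around a balancing constant, a value that stays inside a range around that constant, and PPO. The quoted passages do not give the paper's actual formula, so the following is only a minimal sketch under stated assumptions: `balance_reward`, `observed`, and `tolerance` are hypothetical names, and the linear shape of the reward is an illustrative choice, not the authors' design. `ppo_clipped_term` is the standard per-sample clipped surrogate from Schulman et al. (2017).

```python
def balance_reward(observed: float, balancing_constant: float, tolerance: float) -> float:
    """Hypothetical reward: 1.0 when `observed` equals the balancing constant,
    0.0 at the edge of the range [constant - tolerance, constant + tolerance],
    and negative (floored at -1.0) outside that range."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    deviation = abs(observed - balancing_constant)
    return max(-1.0, 1.0 - deviation / tolerance)


def ppo_clipped_term(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the new-to-old policy probability ratio and A the advantage estimate."""
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Assumed example: the quantity being balanced is a win rate with target 0.5.
    for win_rate in (0.5, 0.55, 0.9):
        print(win_rate, balance_reward(win_rate, balancing_constant=0.5, tolerance=0.1))
    # A large policy change with a positive advantage is capped by the clip.
    print(ppo_clipped_term(ratio=1.5, advantage=1.0))
```

Under this assumed shape, "maintaining the balance" would mean the agent earns positive reward only while the observed value stays within `tolerance` of the constant; the reward would feed into the advantage estimate that PPO's clipped objective then uses.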