Hypothesis

4 Matching Annotations

Nov 2020
docdrop.org

Annotate PDF: 1708.04782-Tzjxj.pdf

1. changhoonpark 07 Nov 2020
  
  in Public
  
  We define two different reward structures: ternary 1 (win) / 0 (tie) / −1 (loss) received at the end of a game (with all-zero rewards during the game), and Blizzard score. The ternary win/tie/loss score is the real reward that we care about. The Blizzard score is the score seen by players on the victory screen at the end of the game. While players can only see this score at the end of the game, we provide access to the running Blizzard score at every step during the game so that the change in score can be used as a reward for reinforcement learning. It is computed as the sum of current resources and upgrades researched, as well as units and buildings currently alive and being built. This means that the player’s cumulative reward increases with more mined resources, decreases when losing units/buildings, and all other actions (training units, building buildings, and researching) do not affect it. The Blizzard score is not zero-sum since it is player-centric, it is far less sparse than the ternary reward signal, and it correlates to some extent with winning or losing.
  
  Organize in Dynalist
  
  형찬
2. changhoonpark 07 Nov 2020
  
  in Public
  
  To win a game, a player must: 1. Accumulate resources (minerals and vespene gas), 2. Construct production buildings, 3. Amass an army, and 4. Eliminate all of the opponent’s buildings. A game typically lasts from a few minutes to one hour, and early actions taken in the game (e.g., which buildings and units are built) have long term consequences. Players have imperfect information since they can typically only see the portion of the map where they have units. If they want to understand and react to their opponent’s strategy they must send units to scout. As we describe later in this section, the action space is also quite unique and challenging.
  
  형찬
3. changhoonpark 07 Nov 2020
  
  in Public
  
  with other researchers; 5. In some cases a pool of avid human players exists, making it possible to benchmark against highly skilled individuals. 6. Since games are simulations, they can be controlled precisely, and run at scale.
  
  Organize in Dynalist
  
  형찬
4. changhoonpark 07 Nov 2020
  
  in Public
  
  These games offer multiple advantages: 1. They have clear objective measures of success; 2. Computer games typically output rich streams of observational data, which are ideal inputs for deep networks; 3. They are externally defined to be difficult and interesting for a human to play. This ensures that the challenge itself is not tuned by the researcher to make the problem easier for the algorithms being developed; 4. Games are designed to be run anywhere with the same interface and game dynamics, making it easy to share a challenge precisely
  
  Organize in Dynalist
  
  형찬
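
The two reward structures quoted in annotation 1 can be illustrated with a small sketch. This is not code from the paper or from any StarCraft II library: `ScoreSnapshot`, `score_delta_rewards`, and `ternary_rewards` are hypothetical names, and the sketch assumes each unit or building counts toward the score at its resource cost, which is one way to make the passage's claim hold that training or building leaves the score unchanged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoreSnapshot:
    """Hypothetical per-step breakdown of a running score.

    Assumption: units and buildings are valued at their resource cost,
    so spending resources on them leaves the total unchanged.
    """
    resources: int   # resources currently held
    upgrades: int    # value of upgrades researched
    units: int       # value of units alive or being built
    buildings: int   # value of buildings alive or being built

    def total(self) -> int:
        return self.resources + self.upgrades + self.units + self.buildings


def score_delta_rewards(snapshots: List[ScoreSnapshot]) -> List[int]:
    """Per-step reward as the change in score between consecutive steps."""
    totals = [s.total() for s in snapshots]
    return [after - before for before, after in zip(totals, totals[1:])]


def ternary_rewards(num_steps: int, outcome: int) -> List[int]:
    """All-zero rewards during the game, then 1 / 0 / -1 at the final step."""
    if outcome not in (1, 0, -1):
        raise ValueError("outcome must be 1 (win), 0 (tie), or -1 (loss)")
    if num_steps <= 0:
        return []
    return [0] * (num_steps - 1) + [outcome]


if __name__ == "__main__":
    game = [
        ScoreSnapshot(resources=50, upgrades=0, units=600, buildings=400),
        ScoreSnapshot(resources=75, upgrades=0, units=600, buildings=400),  # mined 25
        ScoreSnapshot(resources=25, upgrades=0, units=650, buildings=400),  # trained a unit costing 50
        ScoreSnapshot(resources=25, upgrades=0, units=600, buildings=400),  # lost that unit
    ]
    print(score_delta_rewards(game))   # [25, 0, -50]
    print(ternary_rewards(4, outcome=1))  # [0, 0, 0, 1]
```

In this toy game the score-based signal is nonzero on two of three steps, while the ternary signal is nonzero only at the end, which is the sparsity contrast the quoted passage describes.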
Tags

형찬

Annotators

changhoonpark

URL

docdrop.org/static/drop-pdf/1708.04782-Tzjxj.pdf
