4 Matching Annotations
  1. Nov 2020
    1. We define two different reward structures: ternary 1 (win) / 0 (tie) / −1 (loss) received at the end of a game (with all-zero rewards during the game), and Blizzard score. The ternary win/tie/loss score is the real reward that we care about. The Blizzard score is the score seen by players on the victory screen at the end of the game. While players can only see this score at the end of the game, we provide access to the running Blizzard score at every step during the game so that the change in score can be used as a reward for reinforcement learning. It is computed as the sum of current resources and upgrades researched, as well as units and buildings currently alive and being built. This means that the player’s cumulative reward increases with more mined resources, decreases when losing units/buildings, and all other actions (training units, building buildings, and researching) do not affect it. The Blizzard score is not zero-sum since it is player-centric, it is far less sparse than the ternary reward signal, and it correlates to some extent with winning or losing.

      Organize in Dynalist
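The two reward structures in the quote above can be sketched in a few lines. This is a minimal illustration and not code from the paper or from any StarCraft II API: the function names `score_delta_rewards` and `ternary_rewards` and the example score values are hypothetical, and the running score is assumed to be available as a plain list with one value per step.

```python
from typing import List


def score_delta_rewards(running_scores: List[float]) -> List[float]:
    """Per-step reward as the change in the running score.

    Assumes running_scores[t] is the score observed at step t
    (hypothetical input format, not a real API).
    """
    return [curr - prev for prev, curr in zip(running_scores, running_scores[1:])]


def ternary_rewards(num_steps: int, outcome: int) -> List[int]:
    """All-zero rewards during the game, then 1 / 0 / -1 at the final step."""
    if outcome not in (1, 0, -1):
        raise ValueError("outcome must be 1 (win), 0 (tie), or -1 (loss)")
    if num_steps <= 0:
        return []
    return [0] * (num_steps - 1) + [outcome]


# Made-up running scores: resources mined, then a unit lost, then more mined.
scores = [50, 55, 55, 45, 70]
dense = score_delta_rewards(scores)   # one reward per transition
sparse = ternary_rewards(len(dense), outcome=1)
```

In this sketch the score-delta rewards sum to the final score minus the initial score, which is why the cumulative reward tracks the running score, while the ternary signal is non-zero only at the last step.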

    2. To win a game, a player must: 1. Accumulate resources (minerals and vespene gas), 2. Construct production buildings, 3. Amass an army, and 4. Eliminate all of the opponent’s buildings. A game typically lasts from a few minutes to one hour, and early actions taken in the game (e.g., which buildings and units are built) have long term consequences. Players have imperfect information since they can typically only see the portion of the map where they have units. If they want to understand and react to their opponent’s strategy they must send units to scout. As we describe later in this section, the action space is also quite unique and challenging.
    3. with other researchers; 5. In some cases a pool of avid human players exists, making it possible to benchmark against highly skilled individuals. 6. Since games are simulations, they can be controlled precisely, and run at scale.

      Organize in Dynalist

    4. These games offer multiple advantages: 1. They have clear objective measures of success; 2. Computer games typically output rich streams of observational data, which are ideal inputs for deep networks; 3. They are externally defined to be difficult and interesting for a human to play. This ensures that the challenge itself is not tuned by the researcher to make the problem easier for the algorithms being developed; 4. Games are designed to be run anywhere with the same interface and game dynamics, making it easy to share a challenge precisely

      Organize in Dynalist