- Feb 2023
-
arxiv.org
-
Definition 3.2 (simple reward machine).
The MDP does not change; its dynamics are the same with or without the RM, just as they are with or without a standard reward function. Additionally, the rewards from the RM can be non-Markovian with respect to the MDP because they inherently carry a kind of memory of where you've been, limited to the agent's "movement" (almost "in its mind") along the goals for this task.
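A minimal Python sketch of that memory (my own illustration; the "get coffee, then go to the office" task and all names are invented, not from the paper). The RM's current state u is the only memory, so the same environment event can yield different rewards depending on where you are along the task:

```python
# Minimal simple reward machine: finite states, transitions driven by
# high-level propositions (labels), a constant reward per RM transition.
class SimpleRewardMachine:
    def __init__(self, transitions, u0, terminal):
        # transitions: {(u, frozenset(labels)): (u_next, reward)}
        self.transitions = transitions
        self.u = u0
        self.terminal = terminal

    def step(self, labels):
        """Advance the RM on the labels true at this env transition."""
        key = (self.u, frozenset(labels))
        if key in self.transitions:
            self.u, reward = self.transitions[key]
        else:
            reward = 0.0  # no matching event: RM state unchanged
        return reward, self.u in self.terminal

# "Get coffee, then deliver it to the office": reward 1 only on
# reaching the office *after* having picked up coffee.
rm = SimpleRewardMachine(
    transitions={
        ("u0", frozenset({"coffee"})): ("u1", 0.0),
        ("u1", frozenset({"office"})): ("done", 1.0),
    },
    u0="u0",
    terminal={"done"},
)
rm.step({"office"})  # (0.0, False): office before coffee is worthless
rm.step({"coffee"})  # (0.0, False): the RM now remembers the coffee
rm.step({"office"})  # (1.0, True): same env event, different reward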
-
We then show that an RM can be interpreted as specifying a single reward function over a larger state space, and consider types of reward functions that can be expressed using RMs
So by specifying a reward machine, you are augmenting the state space of the MDP with higher-level goals/subgoals/concepts that provide structure about what is good and what isn't.
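Concretely, that's the cross-product construction: pair the environment state with the RM state, and the reward becomes Markovian over the augmented state space. A sketch reusing the class above (the env and labeler APIs are assumptions for illustration, not the paper's code):

```python
def product_step(env, rm, labeler, action):
    """One step of the cross-product MDP over augmented states (s, u).
    `env.step` and `labeler` are illustrative assumptions."""
    s_next = env.step(action)       # environment dynamics are untouched
    labels = labeler(s_next)        # which high-level propositions hold now
    reward, done = rm.step(labels)  # RM transition supplies the reward
    return (s_next, rm.u), reward, done
```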
-
However, an agent that had access to the specification of the reward function might be able to use such information to learn optimal policies faster.
Fascinating idea, and why not? Why are we really hiding the reward function from the agent?
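If I recall correctly, one of the paper's answers to "use the spec" is counterfactual experiences for reward machines (CRM): since the agent holds the RM itself, a single real environment transition can be replayed through every RM state to synthesize extra off-policy experiences. A sketch in the same illustrative format as above:

```python
def counterfactual_experiences(rm_states, transitions, terminal,
                               s, a, s_next, labels):
    """For one real transition (s, a, s') with labels, synthesize an
    experience from every RM state u (CRM-style; same transition
    dict format as the SimpleRewardMachine sketch above)."""
    experiences = []
    for u in rm_states:
        key = (u, frozenset(labels))
        # The RM spec tells us what state/reward u would have produced.
        u_next, reward = transitions.get(key, (u, 0.0))
        done = u_next in terminal
        # Experience over the augmented state space (s, u).
        experiences.append(((s, u), a, reward, (s_next, u_next), done))
    return experiences
```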
-
Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
[Icarte, JAIR, 2022] "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning"
-
-
proceedings.mlr.press
-
Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
[Icarte, PMLR, 2018] "Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning"
-