A cardinal value of Cloud Orthodoxy is convenience.
See also the "tyranny of convenience" in the excellent book "Hyperconnectivity and Its Discontents" (Brubaker).
They simply operate on the outputs left by others (stigmergy).
Meanwhile, the Web – if we can anthropomorphise for a moment – is disappointed by the distracted academics’ practices.
Indeed :(
Reinforcement Learning
From Shai's paper: There are several key differences between the fully general RL model and the specific case of SL. These differences make the general RL problem much harder.
(1) In SL, \(a_t\) and \(s_{t+1}\) are independent: the learner's prediction \(a_t\) has no effect on which example \(s_{t+1}\) comes next, whereas in RL the action influences the next state.
(2) In SL the problem definition is such that we have knowledge of the reward for every action (this is the loss function; you can say that \( r_{t}=-l\left(a_{t},s_{t}\right) \)). This allows us to calculate the derivative of the reward/loss w.r.t. any chosen action. In RL we only get to see the reward for the specific action taken. This is called bandit feedback, and it is one of the main reasons exploration is needed: we don't know whether the actions we took were the best ones.
(3) (Not from Shai's paper; based on Karpathy's blog.) Delayed reward: we may not see any reward at all until the end of the episode, and we don't know its exact effect on individual actions. A small numpy sketch contrasting these feedback regimes follows below.
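A minimal numpy sketch of points (2) and (3), assuming a softmax policy over a few discrete actions (illustrative code in the spirit of Karpathy's post, not taken from either source): in SL the loss gradient is defined for every output, under bandit feedback only the sampled action gets scored, and with delayed reward the per-step credit has to be reconstructed from discounted returns.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3

def softmax(z):
    z = z - z.max()            # for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = rng.normal(size=n_actions)
probs = softmax(logits)

# --- SL: full feedback. The label is known, so the cross-entropy
# gradient w.r.t. the logits is exact for *every* action.
label = 2
grad_sl = probs.copy()
grad_sl[label] -= 1.0          # probs - one_hot(label)

# --- RL: bandit feedback. We sample one action and observe only its
# reward; the score-function (REINFORCE) estimator scores that action
# alone and says nothing about the counterfactual ones.
action = rng.choice(n_actions, p=probs)
reward = 1.0 if action == 2 else 0.0      # revealed only for `action`
grad_logp = -probs
grad_logp[action] += 1.0       # grad of log pi(action) w.r.t. logits
grad_ascent = reward * grad_logp          # reinforce the sampled action only

# --- Delayed reward: a single reward at the end of the episode is
# spread back over the steps as a discounted return.
episode_rewards = [0.0, 0.0, 0.0, 1.0]    # nothing until the final step
gamma = 0.99
returns, running = [], 0.0
for r in reversed(episode_rewards):
    running = r + gamma * running
    returns.append(running)
returns.reverse()
print("per-step credit:", np.round(returns, 3))  # earlier steps get geometrically discounted credit
```

Note how `grad_sl` is informative in every coordinate, while `grad_ascent` is all zeros whenever the sampled action earned no reward – which is exactly why exploration matters.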