- May 2023
-
jon-e.net
-
A cardinal value of Cloud Orthodoxy is convenience.
See also the "tyranny of convenience" discussed in the excellent book "Hyperconnectivity and Its Discontents" (Brubaker)
-
- Jul 2022
-
s3.us-west-2.amazonaws.com
-
They simply operate on the outputs left by others.
stigmergy
-
- Jan 2022
-
s3.us-west-2.amazonaws.com
-
Meanwhile, the Web – if we can anthropomorphise for a moment – is disappointed by the distracted academics’ practices.
Indeed :(
-
- Apr 2019
-
Local file
-
Reinforcement Learning
From Shai's paper: There are several key differences between the fully general RL model and the specific case of supervised learning (SL). These differences make the general RL problem much harder.
(1) In SL, \( a_t \) and \( s_{t+1} \) are independent, whereas in RL the action influences what we see next:
- In SL we can take our sample set and search for a good predictor over it (a predictor plays the role a policy plays in RL). In RL the sample set depends on the policy, so generating training data is tied to the policy-learning problem itself.
- In RL an action at round \( t \) may have effects far into the future, whereas in SL the effect is only local: the current reward.
(2) In SL the problem definition gives us the reward of every possible action (this is just the loss function; you can say that \( r_t = -\ell(a_t, s_t) \)). This lets us compute the derivative of the reward/loss with respect to the chosen action. In RL we only see the reward of the specific action we took. This is called bandit feedback, and it is one of the main reasons exploration is needed: we don't know whether the actions we took were the best ones (first sketch below).
(3) Not from Shai's paper, but based on Karpathy's blog: delayed reward. We may see no reward at all until the end of the episode, and we don't know its exact effect on individual actions, i.e. the credit-assignment problem (second sketch below).
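A minimal toy sketch of the bandit-feedback point in (2) (my own illustration, not from Shai's paper or Karpathy's blog): with SL-style full-information feedback the best action can be read off directly, while under bandit feedback an epsilon-greedy learner must explore. The five arms, their reward means, the noise scale, and epsilon are all arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: 5 arms with mean rewards unknown to the learner.
true_means = np.array([0.1, 0.5, 0.3, 0.9, 0.2])
n_actions = len(true_means)

def pull(action):
    """Bandit feedback: a noisy reward for the chosen action only."""
    return true_means[action] + rng.normal(0, 0.1)

# SL-style full-information feedback: if every action's reward were
# revealed each round, estimating the best action would be trivial.
full_info = np.array([
    np.mean([pull(a) for _ in range(100)]) for a in range(n_actions)
])

# Bandit feedback: only r_t for the chosen a_t is observed, so an
# epsilon-greedy learner must occasionally try actions that look bad.
epsilon = 0.1
counts = np.zeros(n_actions)
estimates = np.zeros(n_actions)

for t in range(2000):
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))   # explore
    else:
        a = int(np.argmax(estimates))      # exploit current estimate
    r = pull(a)                            # reward of the chosen action only
    counts[a] += 1
    estimates[a] += (r - estimates[a]) / counts[a]  # incremental mean

print("full-information best arm:", int(np.argmax(full_info)))
print("bandit (eps-greedy) best arm:", int(np.argmax(estimates)))
```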
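And a small sketch of the delayed-reward point in (3): a single scalar reward arrives only at episode end, and the standard trick is to propagate it backward as a discounted return for each step. The episode length, gamma, and reward values are made up for illustration.

```python
# Delayed reward: only the final step pays off; discounted returns
# spread that signal back over the whole trajectory.
gamma = 0.99
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]  # reward seen only at episode end

returns = []
G = 0.0
for r in reversed(rewards):
    G = r + gamma * G   # G_t = r_t + gamma * G_{t+1}
    returns.append(G)
returns.reverse()

print(returns)  # earlier steps receive exponentially discounted credit
```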
-