4 Matching Annotations
  1. May 2023
    1. A cardinal value of Cloud Orthodoxy is convenience.

      See also the "tyranny of convenience" in the excellent book "Hyperconnectivity and Its Discontents" (Brubaker)

  2. Jul 2022
  3. Jan 2022
    1. Meanwhile, the Web – if we can anthropomorphise for a moment – is disappointed by the distracted academics’ practices.

      Indeed :(

  4. Apr 2019
    1. Reinforcement Learning

      From Shai's paper: There are several key differences between the fully general RL model and the specific case of supervised learning (SL). These differences make the general RL problem much harder.

      (1) In SL \(a_t\) and \(s_{t+1}\) are independent:

      • In SL we can take our sample set and look for a good predictor over it (the analogue of a policy in RL). In RL, the sample set depends on the policy, so generating training data is entangled with the policy-learning problem itself (see the sketch after this list).
      • An action at round \( t \) may affect states and rewards far into the future, whereas in SL its effect is only local: the current reward.
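
      A minimal sketch of point (1), my own toy illustration rather than anything from Shai's paper: in the hypothetical chain environment below, the states the learner visits – and therefore its "training data" – depend entirely on which policy generates the rollouts. The environment, the two policies, and the `visit_counts` helper are all made up for this example.

      ```python
      from collections import Counter
      import random

      random.seed(0)

      # Hypothetical toy chain: states 0..4, action "right" moves forward, "stay" doesn't.
      def step(state, action):
          return min(state + 1, 4) if action == "right" else state

      def visit_counts(policy, episodes=1000, horizon=5):
          """Roll out a policy many times and count which states appear in its data."""
          counts = Counter()
          for _ in range(episodes):
              state = 0
              for _ in range(horizon):
                  state = step(state, policy(state))
                  counts[state] += 1
          return counts

      always_right = lambda s: "right"
      mostly_stay = lambda s: "right" if random.random() < 0.1 else "stay"

      # The two policies generate very different "sample sets" of states.
      print(visit_counts(always_right))  # reaches state 4 every episode
      print(visit_counts(mostly_stay))   # almost all of its data sits at states 0 and 1
      ```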

      (2) In SL the problem definition gives us the reward of every action: the loss function is known, so you can say that \( r_{t}=-l\left(a_{t},s_{t}\right) \). This lets us compute the derivative of the reward/loss with respect to the chosen action. In RL we only get to see the reward of the specific action we took. This is called bandit feedback, and it is one of the main reasons we need exploration: we don't know whether the actions we took were the best ones (see the sketch below).
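
      A minimal sketch of the feedback difference in (2), again my own illustration: under SL-style full-information feedback we can evaluate the reward of every candidate action, whereas under bandit feedback we observe only the reward of the action actually chosen. The `REWARDS` table and its values are hypothetical.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      n_actions = 3

      # Hypothetical reward table r(s, a); the learner does not get to see it directly.
      REWARDS = np.array([[1.0, 0.2, 0.0],
                          [0.0, 1.0, 0.3],
                          [0.1, 0.0, 1.0]])

      state = int(rng.integers(0, 3))

      # SL-style (full-information) feedback: the loss is a known function, so
      # r_t = -l(a_t, s_t) can be evaluated for *every* candidate action.
      full_information = {a: REWARDS[state, a] for a in range(n_actions)}

      # Bandit feedback: we commit to one action and observe only its reward.
      chosen = int(rng.integers(0, n_actions))
      bandit = {chosen: REWARDS[state, chosen]}

      print("full information:", full_information)  # rewards of all 3 actions
      print("bandit feedback: ", bandit)            # reward of the chosen action only
      ```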

      Not from Shai's paper, but based on Karpathy's blog: (3) Delayed reward. We may not see any reward at all until the end of the episode, and we don't know its exact effect on individual actions.
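
      A small sketch of point (3), loosely in the spirit of the reward discounting in Karpathy's post but with made-up numbers: the only nonzero reward arrives at the end of the episode, and the discounted return \( G_t = \sum_{k \ge 0} \gamma^{k} r_{t+k} \) is one common way to spread that credit back to earlier actions.

      ```python
      def discounted_returns(rewards, gamma=0.99):
          """G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..., computed backwards."""
          returns = [0.0] * len(rewards)
          running = 0.0
          for t in reversed(range(len(rewards))):
              running = rewards[t] + gamma * running
              returns[t] = running
          return returns

      # Episode in which every intermediate reward is 0 and only the final step pays off:
      episode_rewards = [0, 0, 0, 0, 1]
      print(discounted_returns(episode_rewards))
      # -> [0.9606, 0.9703, 0.9801, 0.99, 1.0] (rounded): earlier actions receive credit
      #    only through the discounted final reward, not from any local signal.
      ```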
