- May 2023
-
jon-e.net
-
A cardinal value of Cloud Orthodoxy is convenience.
See also the "tyranny of convenience" discussed in the excellent book "Hyperconnectivity and Its Discontents" (Brubaker)
-
- Jul 2022
-
s3.us-west-2.amazonaws.com
-
They simply operate on the outputs left by others.
stigmergy
-
- Jan 2022
-
s3.us-west-2.amazonaws.com
-
Meanwhile, the Web – if we can anthropomorphise for a moment – is disappointed by the distracted academics’ practices.
Indeed :(
-
- Apr 2019
-
Local file
-
Reinforcement Learning
From Shai's paper: There are several key differences between the fully general RL model and the specific case of supervised learning (SL). These differences make the general RL problem much harder.
(1) In SL, \( a_t \) and \( s_{t+1} \) are independent, whereas in RL the action influences what we see next:
- In SL we can take our sample set and search for a good predictor over it (a predictor plays the role a policy plays in RL). In RL the sample set depends on the policy, so generating training data is tied to the policy-learning problem itself.
- In RL an action at round \( t \) may have effects far into the future, whereas in SL the effect is only local: the current reward.
(2) In SL the problem definition gives us the reward of every possible action (this is just the loss function; you can say that \( r_t = -\ell(a_t, s_t) \)). This lets us compute the derivative of the reward/loss with respect to the chosen action. In RL we only see the reward of the specific action we took. This is called bandit feedback, and it is one of the main reasons exploration is needed: we don't know whether the actions we took were the best ones (first sketch below).
(3) Not from Shai's paper, but based on Karpathy's blog: delayed reward. We may see no reward at all until the end of the episode, and we don't know its exact effect on individual actions, i.e. the credit-assignment problem (second sketch below).
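A minimal toy sketch of the bandit-feedback point in (2) (my own illustration, not from Shai's paper or Karpathy's blog): with SL-style full-information feedback the best action can be read off directly, while under bandit feedback an epsilon-greedy learner must explore. The five arms, their reward means, the noise scale, and epsilon are all arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: 5 arms with mean rewards unknown to the learner.
true_means = np.array([0.1, 0.5, 0.3, 0.9, 0.2])
n_actions = len(true_means)

def pull(action):
    """Bandit feedback: a noisy reward for the chosen action only."""
    return true_means[action] + rng.normal(0, 0.1)

# SL-style full-information feedback: if every action's reward were
# revealed each round, estimating the best action would be trivial.
full_info = np.array([
    np.mean([pull(a) for _ in range(100)]) for a in range(n_actions)
])

# Bandit feedback: only r_t for the chosen a_t is observed, so an
# epsilon-greedy learner must occasionally try actions that look bad.
epsilon = 0.1
counts = np.zeros(n_actions)
estimates = np.zeros(n_actions)

for t in range(2000):
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))   # explore
    else:
        a = int(np.argmax(estimates))      # exploit current estimate
    r = pull(a)                            # reward of the chosen action only
    counts[a] += 1
    estimates[a] += (r - estimates[a]) / counts[a]  # incremental mean

print("full-information best arm:", int(np.argmax(full_info)))
print("bandit (eps-greedy) best arm:", int(np.argmax(estimates)))
```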
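And a small sketch of the delayed-reward point in (3): a single scalar reward arrives only at episode end, and the standard trick is to propagate it backward as a discounted return for each step. The episode length, gamma, and reward values are made up for illustration.

```python
# Delayed reward: only the final step pays off; discounted returns
# spread that signal back over the whole trajectory.
gamma = 0.99
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]  # reward seen only at episode end

returns = []
G = 0.0
for r in reversed(rewards):
    G = r + gamma * G   # G_t = r_t + gamma * G_{t+1}
    returns.append(G)
returns.reverse()

print(returns)  # earlier steps receive exponentially discounted credit
```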
-