33 Matching Annotations
  1. Apr 2024
  2. Mar 2024
  3. Aug 2023
    1. a DAG

      More specifically, "multiple inputs".

      A DAG by itself doesn't necessitate buffering; it's the need to reuse certain pieces of data that motivated the introduction of buffers in the previous paragraph.
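
To illustrate why reuse, not the DAG shape itself, is what forces buffering, here is a minimal sketch. It assumes a pull-based stream that two downstream consumers read at their own pace. `Splitter` is a hypothetical helper; the standard library's `itertools.tee` plays the same role. When both consumers stay in lockstep the buffer never holds more than one item. When one consumer runs ahead, everything it has already read must be kept for the other.

```python
from collections import deque
from typing import Iterable


class Splitter:
    """Hypothetical helper that lets two consumers read the same stream.

    An item pulled for one consumer must be kept until the other consumer
    has read it too; that queue is the buffer that data reuse forces on us.
    """

    def __init__(self, stream: Iterable[int]) -> None:
        self._it = iter(stream)
        self._pending = (deque(), deque())  # items each consumer hasn't read yet
        self.max_buffered = 0

    def read(self, who: int) -> int:
        mine, other = self._pending[who], self._pending[1 - who]
        if mine:
            return mine.popleft()  # already pulled earlier on behalf of the other consumer
        item = next(self._it)  # pull a fresh item from the upstream source
        other.append(item)  # the other consumer still needs it: buffer a copy
        self.max_buffered = max(self.max_buffered, len(other))
        return item


# Consumers that stay in lockstep need a buffer of at most one item...
lockstep = Splitter(range(5))
for _ in range(5):
    lockstep.read(0)
    lockstep.read(1)
print(lockstep.max_buffered)  # 1

# ...but if one consumer runs ahead, every item it has read is held for the other.
skewed = Splitter(range(5))
ahead = [skewed.read(0) for _ in range(5)]
behind = [skewed.read(1) for _ in range(5)]
print(skewed.max_buffered, behind)  # 5 [0, 1, 2, 3, 4]
```

A plain chain of stages (each output read exactly once) would never need `Splitter` at all; the buffer appears only because one output is consumed twice.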

  4. Apr 2023
    1. 1.3B InstructGPT model over outputs from a 175B GPT-3 model

      1.3B can outperform 175B on one particular task: instruction-following. Don't mistake this for "1.3B is magically always better than 175B". The 1.3B model manages this thanks to fine-tuning.

    2. The resulting InstructGPT models are much better at following instructions than GPT-3. They also make up facts less often, and show small decreases in toxic output generation.

      Really, that's all thanks to the human feedback provided in RLHF. Since it adds information to the model, this feedback should be considered an auxiliary dataset.

    1. The AI gradually builds a model of the goal of the task by finding the reward function that best explains the human’s judgments.

      So there's no need for humans to write the reward function itself.
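
One common way to make "the reward function that best explains the human's judgments" concrete is a Bradley–Terry (logistic) preference model fit by maximum likelihood: P(a preferred over b) = sigmoid(r(a) − r(b)). The sketch below assumes a linear reward over hypothetical trajectory features and made-up judgments; real systems typically use a neural network, so treat this only as an illustration of the idea.

```python
import math


def reward(w, x):
    """Linear reward over (hypothetical) trajectory features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_reward(comparisons, dim, lr=0.5, epochs=200):
    """Fit w so that P(a preferred over b) = sigmoid(reward(a) - reward(b))
    best explains the human judgments, via gradient ascent on the log-likelihood.

    comparisons: list of (preferred_features, rejected_features) pairs.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for a, b in comparisons:
            p = 1.0 / (1.0 + math.exp(-(reward(w, a) - reward(w, b))))
            # Gradient of log sigmoid(r(a) - r(b)) w.r.t. w is (1 - p) * (a - b).
            for i in range(dim):
                grad[i] += (1.0 - p) * (a[i] - b[i])
        w = [wi + lr * gi / len(comparisons) for wi, gi in zip(w, grad)]
    return w


# Hypothetical judgments: in each pair, the first trajectory was preferred.
judgments = [([1, 0], [0, 0]), ([1, 1], [0, 1]), ([2, 0], [1, 0]), ([0.5, 1], [0, 0])]
w = fit_reward(judgments, dim=2)
print(all(reward(w, a) > reward(w, b) for a, b in judgments))  # True
```

The human only ever says "this one is better"; the weights `w` are inferred from those comparisons.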

    2. ask for human feedback on trajectory pairs where it’s most uncertain about

      Note the emphasis on "most uncertain": it maximizes the productivity of human attention.
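
The quoted text doesn't say how uncertainty is measured. One way to operationalize it, sketched below, is disagreement within an ensemble of reward models: ask the human about the pair on which the members' predicted preferences vary most. The linear models, features, and function names are all hypothetical.

```python
import math
from statistics import pvariance


def pref_prob(w, a, b):
    """Predicted P(a preferred over b) under one linear reward model w."""
    diff = sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b))
    return 1.0 / (1.0 + math.exp(-diff))


def most_uncertain_pair(ensemble, candidate_pairs):
    """Pick the pair whose predicted preference varies most across ensemble members."""
    return max(
        candidate_pairs,
        key=lambda pair: pvariance([pref_prob(w, *pair) for w in ensemble]),
    )


# Hypothetical ensemble: members agree about feature 0, disagree about feature 1.
ensemble = [[1.0, 0.0], [1.0, 1.0], [1.0, -1.0]]
settled = ([1, 0], [0, 0])    # differs only in feature 0: every member predicts the same
contested = ([0, 1], [0, 0])  # differs only in feature 1: members disagree
print(most_uncertain_pair(ensemble, [settled, contested]) == contested)  # True
```

Asking about `settled` would teach the models nothing they don't already agree on; asking about `contested` resolves a real disagreement.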

  5. Sep 2022
    1. Some people just want to watch the world burn: the prevalence, psychology and politics of the ‘Need for Chaos’

      Takeaways:
      • People can be put into 4 categories: Low Chaos, Medium Chaos, Rebuild and High Chaos.
      • In the US, UK, CA, and AU, around 20% of people want some chaos, but a considerable fraction of this 20% want to rebuild society non-violently.
      • Right-wing views correlate with High Chaos, but the two groups share only their opposition to immigration.

  6. Dec 2021
    1. requires everyone to be equally thirsty; otherwise we’d still get bad outcomes when less-thirsty newcomers displace their thirstier counterparts

      This seems to be the answer to my confusion.

    2. so either system serves exactly the same number of drinkers.

      Yes, but don't you end up with angry clients rather than merely pitiable ones?

    3. Well, actually they’d leave the line and try to re-enter as newcomers, but let’s suppose for the moment that we can effectively prohibit that behavior

      Assume the total number of customers is limited. With a traditional queue, you end up with:

      • A group of customers who got served in a satisfactorily short time.
      • A group of customers who got served, but only after an unsatisfactorily long wait.

      With a Landsburg/reversed queue, you end up with:

      • A group of customers who got served in a satisfactorily short time.
      • A group of customers who didn't get served. They will be angry.

      Is it a good deal to trade customers' hatred of your business for queues that merely look shorter?
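
A toy simulation makes the trade-off above concrete. It assumes a single server, one arrival per tick, one service every `service_interval` ticks, and a fixed closing time after which anyone still waiting goes unserved; these parameters are made up for illustration. The reversed queue is modeled as LIFO (newest arrival served first).

```python
def simulate(policy, n_customers, service_interval, closing_time):
    """Toy single-server queue; returns (waits of served customers,
    arrival times of customers still waiting at closing time)."""
    queue = []  # arrival times of customers currently waiting
    waits = []
    for t in range(closing_time):
        if t < n_customers:
            queue.append(t)  # the customer arriving at tick t joins the line
        if t % service_interval == 0 and queue:
            # FIFO serves the oldest arrival; the reversed (LIFO) queue serves the newest.
            arrival = queue.pop(0) if policy == "fifo" else queue.pop()
            waits.append(t - arrival)
    return waits, queue


for policy in ("fifo", "lifo"):
    served, unserved = simulate(policy, n_customers=10, service_interval=2, closing_time=10)
    print(policy, served, unserved)
# fifo [0, 1, 2, 3, 4] [5, 6, 7, 8, 9]
# lifo [0, 0, 0, 0, 0] [1, 3, 5, 7, 9]
```

Both policies serve the same five customers' worth of drinks, because the server is never idle. FIFO spreads the waiting across everyone it serves; LIFO serves instantly but strands customers 1, 3, 5 and 7, who had been waiting the longest when the bar closed.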

  7. Jun 2021
    1. any Verifier program that succeeds in extracting information must also be able to extract information from a protocol run where rewinding is used and no information is available in the first place

      I still don't understand this part.
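
The rewinding argument may be easier to see with a concrete toy. This sketch assumes the graph 3-coloring protocol as the running example (the quoted passage doesn't name one), an honest verifier that picks a random edge, and SHA-256 over a random nonce as a stand-in commitment scheme; all function names are hypothetical. `simulator_run` never sees a coloring: it commits to junk, guesses which edge will be challenged, and whenever the guess is wrong it restores the verifier's random state (rewinds) and tries again. The accepting transcripts it outputs look like real ones (two distinct random colors on the challenged edge), so anything a verifier could "extract" from a real run it could also extract from these, where there was no information to extract in the first place.

```python
import hashlib
import random
import secrets


def commit(color):
    """Stand-in hash commitment: returns (public digest, secret nonce)."""
    nonce = secrets.token_bytes(16)
    return hashlib.sha256(nonce + bytes([color])).hexdigest(), nonce


def verifier_challenge(commitments, edges, rng):
    """Honest verifier: picks a uniformly random edge (ignores the commitments)."""
    return rng.choice(edges)


def check(transcript, edges):
    """Final check: openings match the commitments and endpoint colors differ."""
    commitments, (u, v), openings = transcript
    for vertex in (u, v):
        color, nonce = openings[vertex]
        if hashlib.sha256(nonce + bytes([color])).hexdigest() != commitments[vertex]:
            return False
    return (u, v) in edges and openings[u][0] != openings[v][0]


def real_run(edges, coloring, rng):
    """Prover who knows a valid 3-coloring."""
    perm = random.sample(range(3), 3)  # fresh random relabelling of the colors
    colors = [perm[c] for c in coloring]
    committed = [commit(c) for c in colors]
    commitments = [digest for digest, _ in committed]
    u, v = verifier_challenge(commitments, edges, rng)
    openings = {w: (colors[w], committed[w][1]) for w in (u, v)}
    return commitments, (u, v), openings


def simulator_run(n_vertices, edges, rng):
    """Produces an accepting transcript WITHOUT any coloring, by rewinding the verifier."""
    attempts = 0
    while True:
        attempts += 1
        snapshot = rng.getstate()  # verifier's state before it sees any commitment
        guess = random.choice(edges)  # guess which edge will be challenged
        colors = [0] * n_vertices  # junk colors everywhere...
        colors[guess[0]], colors[guess[1]] = random.sample(range(3), 2)  # ...except the guess
        committed = [commit(c) for c in colors]
        commitments = [digest for digest, _ in committed]
        challenge = verifier_challenge(commitments, edges, rng)
        if challenge == guess:
            openings = {w: (colors[w], committed[w][1]) for w in guess}
            return (commitments, challenge, openings), attempts
        rng.setstate(snapshot)  # wrong guess: rewind the verifier and try again


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(check(real_run(edges, [0, 1, 2, 0], random.Random(7)), edges))  # True
transcript, attempts = simulator_run(4, edges, random.Random(7))
print(check(transcript, edges), attempts)
```

The number of attempts varies from run to run, but the verifier only ever sees the final, successful attempt, which is exactly why rewinding lets the simulator fake a convincing transcript.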