2 Matching Annotations
  1. Aug 2023
    1. Outer alignment asks the question - "What should we aim our model at?" In other words, is the model optimizing for the correct reward such that there are no exploitable loopholes? It is also known as the reward misspecification problem.

      [!NOTE] What does Outer Alignment / the Reward Misspecification Problem refer to?

      flashcard

      Whether the model is optimizing toward the goal humans actually intend (see the sketch below).
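
      A minimal, hypothetical sketch (not from the annotated source) of the exploitable-loophole idea: the intended objective is a clean room, but the proxy reward only counts cleaning actions, so an optimizer can score highly while leaving the room dirty. The function names and numbers are illustrative assumptions.

      ```python
      # Hypothetical toy example of a misspecified (exploitable) proxy reward.

      def intended_objective(dirt_remaining: int) -> float:
          """What humans actually want: as little dirt left as possible."""
          return -float(dirt_remaining)

      def proxy_reward(cleaning_actions: int) -> float:
          """What the agent is actually rewarded for: the number of cleaning actions."""
          return float(cleaning_actions)

      # An aligned policy cleans the 5 existing units of dirt and stops.
      print(proxy_reward(5), intended_objective(0))      # 5.0 -0.0

      # A reward-hacking policy makes new messes and re-cleans them:
      # higher proxy reward, yet the room is left dirtier than under the aligned policy.
      print(proxy_reward(100), intended_objective(5))    # 100.0 -5.0
      ```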

    1. Inner alignment asks the question - "Is the model trying to do what humans want it to do?", or, in other words, can we robustly aim our AI optimizers at any objective function at all?

      [!NOTE] What is the basic idea of Inner Alignment?

      flashcard

      Making sure the model is actually optimizing toward the specified objective (see the sketch below).
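
      A hypothetical sketch of inner misalignment (closely related to goal misgeneralization): a policy that learned the proxy objective "move toward green" behaves identically to the intended objective "move toward the goal" on the training distribution, and only diverges once that correlation breaks. All names and observations below are made up for illustration.

      ```python
      # Hypothetical illustration: the learned (mesa) objective matches the
      # intended (base) objective in training, then diverges at deployment.

      def learned_policy(observation: dict) -> str:
          """Objective the model actually acquired: head toward the green tile."""
          return observation["direction_of_green"]

      def intended_action(observation: dict) -> str:
          """Objective humans intended: head toward the goal tile."""
          return observation["direction_of_goal"]

      # Training distribution: the goal tile is always green, so the two agree.
      train_obs = {"direction_of_green": "left", "direction_of_goal": "left"}
      assert learned_policy(train_obs) == intended_action(train_obs)

      # Deployment: the goal is no longer green; the policy still chases green.
      deploy_obs = {"direction_of_green": "right", "direction_of_goal": "left"}
      print(learned_policy(deploy_obs), "vs intended", intended_action(deploy_obs))
      # -> right vs intended left
      ```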