2 Matching Annotations
  1. May 2019
    1. Policy Change Index: machine learning on a corpus of text to identify and predict policy changes in China

  2. Mar 2019
    1. A potential drawback with such pre-training approach is that the model may suffer from the mismatch of dialogue state distributions between supervised training and interactive learning stages. While interacting with users, the agent’s response at each turn has a direct influence on the distribution of dialogue state that the agent will operate on in the upcoming dialogue turns.

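      Policy learning is also an important part of the dialogue process. Recent approaches to policy learning use supervised pre-training followed by online reinforcement learning to improve learning. But this scheme has a potential flaw: the offline data is limited in volume, so once the agent runs into an uncommon situation online, it can easily fail to recover. (Is this problem just a conjecture? Is there any empirical evidence for it?)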

      So what this paper is really proposing is a method to reduce the gap between the online and offline stages.
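
To make the mismatch concrete for myself, here is a toy sketch. It is not the paper's method or setup: the chain of "on-track" dialogue states, the single absorbing "off-track" state, and the fixed per-turn `error_rate` are all assumptions I made up for illustration. It compares the state distribution covered by supervised data (an error-free expert) with the one an imperfect agent actually visits when its own responses decide the next state.

```python
def make_chain(n_on_track, error_rate):
    """Build a transition matrix for a hypothetical toy dialogue.

    States 0..n_on_track-1 are on-track dialogue states; the last state is an
    off-track state that is never left (an assumption, mirroring the worry
    that the agent cannot recover from unseen situations).
    """
    n = n_on_track + 1
    off = n_on_track
    transition = [[0.0] * n for _ in range(n)]
    for s in range(n_on_track):
        nxt = min(s + 1, n_on_track - 1)  # last on-track state loops on itself
        transition[s][nxt] += 1.0 - error_rate
        transition[s][off] += error_rate
    transition[off][off] = 1.0
    return transition


def state_visitation(transition, start, turns):
    """Average state distribution over `turns` turns, computed exactly."""
    n = len(transition)
    dist = [0.0] * n
    dist[start] = 1.0
    total = [0.0] * n
    for _ in range(turns):
        for s in range(n):
            total[s] += dist[s]
        # Propagate the distribution one turn forward.
        dist = [sum(dist[s] * transition[s][t] for s in range(n)) for t in range(n)]
    return [x / turns for x in total]


def total_variation(p, q):
    """Total variation distance between two distributions on the same states."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Supervised data: expert never errs. Interactive stage: agent errs 10% per turn.
expert = state_visitation(make_chain(5, 0.0), start=0, turns=10)
agent = state_visitation(make_chain(5, 0.1), start=0, turns=10)
print("off-track mass (expert):", expert[-1])
print("off-track mass (agent): ", round(agent[-1], 4))
print("total variation:        ", round(total_variation(expert, agent), 4))
```

In this toy, a small per-turn error rate still leaves the agent spending a sizable share of its turns in a state the supervised data never covers, because each response shapes the states that follow. Whether real dialogue agents fail to recover in this way is exactly the empirical question raised above; the sketch only illustrates the mechanism.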