20 Matching Annotations
  1. Jul 2023
    1. Paper that introduced the PPO algorithm. PPO is, in a way, a response to TRPO: it keeps TRPO's core idea but implements it as a simpler, more efficient algorithm.

      TRPO frames each policy update as an explicit constrained optimization problem (a surrogate objective maximized under a KL trust-region constraint), rather than a plain gradient-descent learning step; PPO approximates the constraint with a simple clipped objective (sketched below).
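
      For concreteness, a minimal sketch of PPO's clipped surrogate loss; the objective itself is from the paper, but the PyTorch framing and the names here are my own:

      ```python
      import torch

      def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
          """Clipped surrogate objective L^CLIP from the PPO paper.

          The probability ratio r_t = pi_new(a|s) / pi_old(a|s) is clipped
          to [1 - eps, 1 + eps], which removes the incentive to move the
          policy far from the old one, replacing TRPO's explicit KL
          trust-region constraint.
          """
          ratio = torch.exp(logp_new - logp_old)  # r_t(theta)
          unclipped = ratio * advantages
          clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
          # Pessimistic lower bound: take the min, negate for gradient descent.
          return -torch.min(unclipped, clipped).mean()
      ```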

  2. Aug 2022
  3. Feb 2022
  4. Nov 2021
  5. Jun 2021
  6. Aug 2020
  7. Jul 2020
  8. Jun 2020
  9. May 2020
  10. Dec 2019
    1. The recommended practices in international ethical policy documents are not sufficiently disseminated or internalized, so gaps remain between actual data practices and recognized best practice. Addressing this requires not only disseminating and promoting these policies, but also adapting them, through training and capacity building, to the contexts and situations where they apply.

      Given that the article frames itself around policy diffusion and a policy-learning framework, I would have expected more detail here.

  11. Nov 2019
    1. Private post-secondary institutions that provide educational services in the State of New Mexico are subject to either the New Mexico Post-Secondary Educational Institution Act (Section 21-23-1 et seq. NMSA 1978) or the Interstate Distance Education Act (Section 21-23B-1 et seq. NMSA 1978) and can use this site to apply for State Authorization or submit other required applications to comply with State regulations. Students may request transcripts of closed schools where the New Mexico Higher Education Department is the designated custodian of records or may file complaints against any post-secondary institution that provides educational services in our State.

      The NMHED website provides academic, financial, and policy information to New Mexico's public higher education institutions and community.

  12. May 2019
    1. Policy Change Index: machine learning on a corpus of text to identify and predict policy changes in China (illustrative sketch below).
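
      Purely as an illustration of the general approach, not the index's actual method (the classifier, features, and data here are hypothetical): train a text model on documents from a base period, then read a performance drop on later periods as a signal that the underlying policy signal has shifted.

      ```python
      # Hypothetical sketch: read classifier drift as a policy-shift signal.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      def fit_period_model(docs, labels):
          """Fit a classifier on articles from a base time period."""
          model = make_pipeline(TfidfVectorizer(max_features=20000),
                                LogisticRegression(max_iter=1000))
          model.fit(docs, labels)
          return model

      def shift_score(model, docs, labels):
          """Accuracy on a later period; a marked drop suggests the
          relationship between text and labels has changed."""
          return model.score(docs, labels)
      ```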

  13. Mar 2019
    1. A potential drawback with such pre-training approach is that the model may suffer from the mismatch of dialogue state distributions between supervised training and interactive learning stages. While interacting with users, the agent's response at each turn has a direct influence on the distribution of dialogue state that the agent will operate on in the upcoming dialogue turns.

      Policy learning is also an important part of the dialogue pipeline. Recent work improves it with a scheme of supervised pre-training followed by online reinforcement-learning fine-tuning. But this scheme has a potential flaw: the offline data is limited in size, so once the agent runs into an uncommon situation online, it can easily fail to recover. (Is this problem just an inference, though? Is there any empirical evidence for it?)

      So what the paper really wants to do is reduce this gap between online and offline (a skeleton of the setup is sketched below).
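
      A minimal skeleton of the two-stage setup the quote describes, showing where the state-distribution mismatch enters; the agent/env interface here is hypothetical:

      ```python
      # Hypothetical two-stage training loop for a dialogue policy.
      def train(agent, supervised_dialogues, env, rl_episodes=1000):
          # Stage 1: supervised pre-training on logged dialogues.
          # The state distribution is fixed by the logged data.
          for state, gold_action in supervised_dialogues:
              agent.update_supervised(state, gold_action)

          # Stage 2: interactive RL fine-tuning.
          # Each action shifts which states the agent sees next, so states
          # rare or absent in the logs can now occur and compound.
          for _ in range(rl_episodes):
              state = env.reset()
              done = False
              while not done:
                  action = agent.act(state)
                  state, reward, done = env.step(action)
                  agent.update_rl(state, action, reward)
      ```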