26 Matching Annotations
  1. Nov 2022
    1. "Decision Transformer: Reinforcement Learning via Sequence Modeling" (Chen, NeurIPS, 2021)

      Quickly became a very influential paper, with a new idea for learning generative models of action prediction by supervised sequence modeling of (return, state, action) demonstration trajectories. There is no optimization of actions or rewards; instead, a target return is given as an input.

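      For concreteness, a minimal sketch of that idea under toy assumptions: actions are regressed from interleaved (return-to-go, state, action) tokens, with a GRU standing in for the paper's causal Transformer and all dimensions and names chosen purely for illustration.

      ```python
      import torch
      import torch.nn as nn

      # Sketch of the Decision Transformer idea: predict actions autoregressively
      # from (return-to-go, state, action) trajectory tokens; at test time a target
      # return is simply fed in as an input token. A GRU stands in for the causal
      # Transformer, and the dimensions below are illustrative assumptions.
      state_dim, act_dim, hidden = 4, 2, 64
      embed_rtg = nn.Linear(1, hidden)
      embed_state = nn.Linear(state_dim, hidden)
      embed_action = nn.Linear(act_dim, hidden)
      sequence_model = nn.GRU(hidden, hidden, batch_first=True)
      predict_action = nn.Linear(hidden, act_dim)

      def action_loss(rtg, states, actions):
          """Supervised training on demonstrations: regress each action from the
          tokens preceding it. rtg: (B,T,1), states: (B,T,state_dim), actions: (B,T,act_dim)."""
          B, T, _ = states.shape
          tokens = torch.stack(
              [embed_rtg(rtg), embed_state(states), embed_action(actions)], dim=2
          ).reshape(B, 3 * T, hidden)               # token order: R_1, s_1, a_1, R_2, s_2, a_2, ...
          out, _ = sequence_model(tokens)
          pred = predict_action(out[:, 1::3])       # hidden state at each state token -> next action
          return ((pred - actions) ** 2).mean()     # plain regression; no reward optimization

      # toy usage: a batch of random "demonstrations" with target returns as inputs
      rtg, states, actions = torch.rand(8, 10, 1), torch.randn(8, 10, 4), torch.randn(8, 10, 2)
      print(action_loss(rtg, states, actions).item())
      ```
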
  2. Sep 2022
    1. We study whether sequence modeling can perform policy optimization by evaluating Decision Transformer on offline RL benchmarks.
    1. AAAI 2022 Paper: Decentralized Mean Field Games. Happy to discuss online.

      S. Ganapathi Subramanian, M. Taylor, M. Crowley, and P. Poupart, “Decentralized mean field games,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022), vol. 36, pp. 9439–9447, February 2022.

  3. Jul 2022
  4. Jun 2022
  5. May 2022
    1. Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.

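      For context on the technique being discussed, a minimal sketch of tabular TD(λ) with accumulating eligibility traces follows; this is not code from the paper, and the env/policy interface and parameter names are assumptions.

      ```python
      import numpy as np

      # Tabular TD(lambda) with accumulating eligibility traces: the TD error at each
      # step updates not only the current state but all recently visited states,
      # weighted by traces that decay by gamma * lambda every step.
      def td_lambda(env, policy, n_states, episodes=100, alpha=0.1, gamma=0.99, lam=0.9):
          """Estimate state values V for `policy` using eligibility traces."""
          V = np.zeros(n_states)
          for _ in range(episodes):
              z = np.zeros(n_states)                  # eligibility trace per state
              s = env.reset()
              done = False
              while not done:
                  s_next, r, done = env.step(policy(s))
                  delta = r + gamma * V[s_next] * (not done) - V[s]   # TD error
                  z[s] += 1.0                         # accumulating trace for the visited state
                  V += alpha * delta * z              # credit flows to all recently visited states
                  z *= gamma * lam                    # traces decay every step
                  s = s_next
          return V
      ```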

  6. Mar 2022
  7. Jul 2021
  8. Jun 2021
  9. Mar 2021
    1. Using chemicals to improve our economy of attention and become emotionally "fitter" is an option that penetrated public consciousness some time ago.

      The same is true of reinforcement learning algorithms.

  10. Sep 2020
  11. May 2020
  12. Apr 2020
  13. Mar 2019
  14. Feb 2019
    1. We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.
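      To make the selection mechanism concrete, here is a minimal sketch of scoring candidate responses from an ensemble and picking one with an epsilon-greedy policy; the candidate strings, features, and weights are illustrative assumptions rather than MILABOT's actual components, and the RL training of the scorer is not shown.

      ```python
      import random

      def candidate_responses(user_utterance):
          """Stand-ins for the ensemble (template, retrieval, seq2seq, ... models)."""
          return [
              "Sure, tell me more about that.",
              "I love talking about movies. What have you seen lately?",
              "That's interesting! Why do you think so?",
          ]

      def score(user_utterance, response, weights):
          """Linear scorer over simple features; in the described system the selection
          policy is trained with RL on crowdsourced data and user interactions."""
          overlap = float(any(w in response.lower() for w in user_utterance.lower().split()))
          features = [len(response) / 100.0, float("?" in response), overlap]
          return sum(w * f for w, f in zip(weights, features))

      def select_response(user_utterance, weights, epsilon=0.1):
          """Pick the highest-scoring candidate, with epsilon-greedy exploration."""
          candidates = candidate_responses(user_utterance)
          if random.random() < epsilon:
              return random.choice(candidates)
          return max(candidates, key=lambda r: score(user_utterance, r, weights))

      print(select_response("any good movies lately?", weights=[0.2, 0.5, 1.0]))
      ```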