Hypothesis

65 Matching Annotations

Jul 2024
en.wikipedia.org en.wikipedia.org

Monte Carlo tree search - Wikipedia

2
1. mark.crowley 22 Jul 2024
  
  in Public
  
  Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT
  
  The UCB algorithm for bandits comes back again as UCT to form the basis for model estimation via MCTS
  
  reinforcement-learning ece457c
2. mark.crowley 22 Jul 2024
  
  in Public
  
  The main difficulty in selecting child nodes is maintaining some balance between the exploitation of deep variants after moves with high average win rate and the exploration of moves with few simulations.
  
  Tree search makes this tradeoff very clear: how many paths will you explore before you stop and use the knowledge you already have?
  
  ece457c reinforcement-learning
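The two notes above describe UCT as UCB applied at each tree node to balance exploitation and exploration. A minimal sketch of that selection rule follows, assuming the standard UCB1 form (average value plus an exploration bonus). The function names, the tuple layout for children, and the default constant `c = sqrt(2)` are illustrative assumptions, not taken from the annotated page.

```python
import math

def uct_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1-style score: average value (exploitation) plus a visit-count bonus (exploration)."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children, c=math.sqrt(2)):
    """Pick the child with the highest score.

    children: list of (name, total_value, visits) tuples for one node (hypothetical layout).
    """
    parent_visits = sum(visits for _, _, visits in children)
    best = max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits, c))
    return best[0]
```

With `c = 0` the rule is purely greedy on the average win rate; a larger `c` shifts selection toward moves with few simulations.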
Visit annotations in context

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

en.wikipedia.org/wiki/Monte_Carlo_tree_search
www.nature.com www.nature.com

Mastering the game of Go with deep neural networks and tree search

1
1. mark.crowley 22 Jul 2024
  
  in Public
  
  The summary paper for AlphaGo.
  
  ece457c reinforcement-learning
Visit annotations in context

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

nature.com/articles/nature16961.pdf
en.wikipedia.org en.wikipedia.org

AlphaZero - Wikipedia

1
1. mark.crowley 22 Jul 2024
  
  in Public
  
  Wikipedia: AlphaZero
  
  ece457c reinforcement-learning
Visit annotations in context

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

en.wikipedia.org/wiki/AlphaZero
arxiv.org arxiv.org

2403.07691.pdf

1
1. mark.crowley 16 Jul 2024
  
  in Public
  
  2024 paper arguing that other methods beyond PPO could be better for "value alignment" of LLMs
  
  reinforcement-learning ppo ece457c
Visit annotations in context

Tags

reinforcement-learning

ece457c

ppo

Annotators

mark.crowley

URL

arxiv.org/pdf/2403.07691
arxiv.org arxiv.org

Deep Reinforcement Learning that Matters

1
1. mark.crowley 15 Jul 2024
  
  in Public
  
  Paper "Deep Reinforcement Learning that Matters" on evaluating RL algorithms.
  
  reinforcement-learning ece457c
Visit annotations in context

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

arxiv.org/pdf/1709.06560
Feb 2024
arxiv.org arxiv.org

2205.08192.pdf

1
1. mark.crowley 18 Feb 2024
  
  in Public
  
  T. Herlau, "Moral Reinforcement Learning Using Actual Causation," 2022 2nd International Conference on Computer, Control and Robotics (ICCCR), Shanghai, China, 2022, pp. 179-185, doi: 10.1109/ICCCR54399.2022.9790262. keywords: {Digital control;Ethics;Costs;Philosophical considerations;Toy manufacturing industry;Reinforcement learning;Forestry;Causality;Reinforcement learning;Actual Causation;Ethical reinforcement learning}
  
  ai-ethics ai-morality reinforcement-learning
Visit annotations in context

Tags

reinforcement-learning

ai-ethics

ai-morality

Annotators

mark.crowley

URL

arxiv.org/pdf/2205.08192.pdf
pdf.sciencedirectassets.com pdf.sciencedirectassets.com

Can model-free reinforcement learning explain deontological moral judgments?

1
1. mark.crowley 18 Feb 2024
  
  in Public
  
  Can model-free reinforcement learning explain deontological moral judgments? Alisabeth Ayars, University of Arizona, Dept. of Psychology, Tucson, AZ, USA
  
  ai-morality ai-ethics reinforcement-learning
Visit annotations in context

Tags

reinforcement-learning

ai-ethics

ai-morality

Annotators

mark.crowley

URL

pdf.sciencedirectassets.com/271061/1-s2.0-S0010027716X00030/1-s2.0-S0010027716300300/am.pdf
Nov 2023
proceedings.mlr.press proceedings.mlr.press

janner22a.pdf

1
1. mark.crowley 24 Nov 2023
  
  in Public
  
  Reading this one on Nov 27, 2023 for the reading group.
  
  rdgrp-f23 reinforcement-learning transformers
Visit annotations in context

Tags

reinforcement-learning

rdgrp-f23

transformers

Annotators

mark.crowley

URL

proceedings.mlr.press/v162/janner22a/janner22a.pdf
proceedings.neurips.cc proceedings.neurips.cc

NeurIPS-2021-offline-reinforcement-learning-as-one-big-sequence-modeling-problem-Paper.pdf

1
1. mark.crowley 24 Nov 2023
  
  in Public
  
  Reading this one on Nov 27, 2023 for the reading group.
  
  rdgrp-f23 reinforcement-learning transformers
Visit annotations in context

Tags

reinforcement-learning

rdgrp-f23

transformers

Annotators

mark.crowley

URL

proceedings.neurips.cc/paper_files/paper/2021/file/099fe6b0b444c23836c4a5d07346082b-Paper.pdf
Oct 2023
arxiv.org arxiv.org

2106.01345.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  (Chen, NeurIPS, 2021) Chen, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, and Mordatch. "Decision Transformer: Reinforcement Learning via Sequence Modeling". arXiv preprint arXiv:2106.01345v2, June, 2021.
  
  This quickly became a very influential paper, with a new idea of how to learn generative models for action prediction by training on SARSA-style sequences from demonstration trajectories. There is no optimization of actions or rewards; instead, the target reward (return-to-go) is an input.
  
  reinforcement-learning transformers generative-models minecraft minerl rdgrp-f23 reading_group_crowley
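The note above says the target reward is an input rather than something optimized. A minimal sketch of how a demonstration trajectory could be turned into such an input sequence follows, assuming undiscounted returns-to-go and an interleaved (return-to-go, state, action) token order. The function names and the tagged-tuple representation are hypothetical, for illustration only.

```python
def returns_to_go(rewards):
    """Undiscounted sum of rewards from each timestep to the end of the trajectory."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg

def to_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) per timestep into one flat sequence."""
    seq = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("rtg", g), ("state", s), ("action", a)])
    return seq
```

At test time the first return-to-go would be set to the desired target return, and the model asked to predict the next action.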
Visit annotations in context

Tags

generative-models

rdgrp-f23

minecraft

transformers

reinforcement-learning

reading_group_crowley

minerl

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
arxiv.org arxiv.org

2305.15486.pdf

2
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Wu, Prabhumoye, Yeon Min, Bisk, Salakhutdinov, Azaria, Mitchell and Li. "SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning". arXiv preprint arXiv:2305.15486v2, May, 2023.
  
  reinforcement-learning nlp large-language-models chatgpt minecraft evaluation-methods rdgrp-f23
2. mark.crowley 25 Oct 2023
  
  in Public
  
  Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training.
  
  Them's fighten' words!
  
  I haven't read it yet, but we're putting it on the list for this fall's reading group. Seriously, a strong result with a very strong implied claim. They are careful to say it's from their empirical results; very worth a look. I suspect that the amount of implicit knowledge in the papers, text and DAG is helping to do this.
  
  The Big Question: is their comparison to RL baselines fair, given that the baselines are trained from scratch? What does a fair comparison of any from-scratch model (RL or supervised) mean when compared to an LLM approach (or any approach using a foundation model), when that model is not really from scratch?
  
  reinforcement-learning rdgrp-f23 reading_group_crowley nlp larg deep-learning self-supervised supervised-learning evaluation-methods
Visit annotations in context

Tags

rdgrp-f23

reinforcement-learning

evaluation-methods

chatgpt

minecraft

large-language-models

deep-learning

supervised-learning

reading_group_crowley

larg

nlp

self-supervised

Annotators

mark.crowley

URL

arxiv.org/pdf/2305.15486.pdf
arxiv.org arxiv.org

2203.02155.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Training language models to follow instructions with human feedback
  
  Original paper for discussion of the Reinforcement Learning from Human Feedback (RLHF) algorithm.
  
  large-language-models reinforcement-learning chatgpt
Visit annotations in context

Tags

large-language-models

chatgpt

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2203.02155
arxiv.org arxiv.org

2209.07550.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  [Kapturowski, DeepMind, Sep 2022] "Human-level Atari 200x Faster"
  
  Improves on the 2020 Agent57 agent's performance to be more efficient.
  
  Arxiv: https://arxiv.org/abs/2209.07550
  
  reinforcement-learning atari-games ece457c to-read
Visit annotations in context

Tags

reinforcement-learning

to-read

ece457c

atari-games

Annotators

mark.crowley

URL

arxiv.org/pdf/2209.07550.pdf
Sep 2023
arxiv.org arxiv.org

1908.01046.pdf

1
1. mark.crowley 15 Sep 2023
  
  in Public
  
  Adaptive Stress Testing with Reward Augmentation for Autonomous Vehicle Validation
  
  autonomous-driving multi-agent-reinforcement-learning black-box-testing
Visit annotations in context

Tags

multi-agent-reinforcement-learning

black-box-testing

autonomous-driving

Annotators

mark.crowley

URL

arxiv.org/pdf/1908.01046.pdf
Jul 2023
proceedings.mlr.press proceedings.mlr.press

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
  
  (Espeholt, ICML, 2018) "IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures"
  
  reinforcement-learning impala
Visit annotations in context

Tags

reinforcement-learning

impala

Annotators

mark.crowley

URL

proceedings.mlr.press/v80/espeholt18a/espeholt18a.pdf
proceedings.mlr.press proceedings.mlr.press

Deterministic Policy Gradient Algorithms

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  This paper introduced the DPG Algorithm
  
  DPG reinforcement-learning
Visit annotations in context

Tags

DPG

reinforcement-learning

Annotators

mark.crowley

URL

proceedings.mlr.press/v32/silver14.pdf
openreview.net openreview.net

babyai_a_platform_to_study_the.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Link to page with information about the paper: https://openreview.net/forum?id=rJeXCo0cYX
  
  reinforcement-learning curriculum-learning grid-world babyai
Visit annotations in context

Tags

reinforcement-learning

babyai

grid-world

curriculum-learning

Annotators

mark.crowley

URL

openreview.net/pdf
openreview.net openreview.net

a_path_towards_autonomous_mach.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Yann LeCun released his vision for the future of Artificial Intelligence research in 2022, and it sounds a lot like Reinforcement Learning.
  
  reinforcement-learning agi
Visit annotations in context

Tags

reinforcement-learning

agi

Annotators

mark.crowley

URL

openreview.net/pdf
www.cs.toronto.edu www.cs.toronto.edu

dqn.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  The paper that introduced the DQN algorithm for using Deep Learning with Reinforcement Learning to play Atari games.
  
  reinforcement-learning dqn atari-games deep-learning
Visit annotations in context

Tags

reinforcement-learning

dqn

deep-learning

atari-games

Annotators

mark.crowley

URL

cs.toronto.edu/~vmnih/docs/dqn.pdf
arxiv.org arxiv.org

1509.06461.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Paper that evaluated the existing Double Q-Learning algorithm on the new DQN approach and validated that it is very effective in the Deep RL realm.
  
  reinforcement-learning dqn deep-learning
Visit annotations in context

Tags

reinforcement-learning

dqn

deep-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1509.06461.pdf
arxiv.org arxiv.org

1509.02971.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  This paper introduces the DDPG algorithm, which builds on the existing DPG algorithm from classic RL theory. The main idea is to define a deterministic, or nearly deterministic, policy for situations where the environment is very sensitive to suboptimal actions and one action setting usually dominates in each state. This showed good performance, but could not beat algorithms such as PPO until the ideas from SAC were added. SAC adds an entropy bonus to the objective, which rewards the policy for staying stochastic and so encourages exploration. With this addition, the off-policy actor-critic approach performs well.
  
  ddpg reinforcement-learning SAC DPG PPO
Visit annotations in context

Tags

DPG

PPO

ddpg

SAC

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1509.02971.pdf
arxiv.org arxiv.org

1710.02298.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  This famous paper gives a great review of the DQN algorithm a couple years after it changed everything in Deep RL. It compares six different extensions to DQN for Deep Reinforcement Learning, many of which have now become standard additions to DQN and other Deep RL algorithms. It also combines all of them together to produce the "rainbow" algorithm, which outperformed many other models for a while.
  
  reinforcement-learning experimental-design
Visit annotations in context

Tags

reinforcement-learning

experimental-design

Annotators

mark.crowley

URL

arxiv.org/pdf/1710.02298
arxiv.org arxiv.org

2104.10986.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Arxiv paper from 2021 on reinforcement learning in a scenario where your aim is to learn a workable POMDP policy, but you start with a fully observable MDP and adjust it over time towards a POMDP.
  
  reinforcement-learning pomdp mdp
Visit annotations in context

Tags

reinforcement-learning

pomdp

mdp

Annotators

mark.crowley

URL

arxiv.org/pdf/2104.10986.pdf
arxiv.org arxiv.org

1707.06347.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Paper that introduced the PPO algorithm. PPO is, in a way, a response to the TRPO algorithm, trying to use the core idea but implement a more efficient and simpler algorithm.
  
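  TRPO frames each policy update as a straight constrained optimization problem (maximize a surrogate objective subject to a KL-divergence limit on how far the policy may move), which PPO replaces with a simpler clipped objective.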
  
  ppo reinforcement-learning policy-gradients trpo
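To make the "simpler algorithm" in the note above concrete, here is a minimal sketch of PPO's clipped surrogate objective in plain Python. It assumes per-sample log-probabilities and advantage estimates are already available; the function name and the list-based interface are illustrative assumptions, and `eps = 0.2` is just a commonly used clipping value.

```python
import math

def ppo_clip_objective(new_logps, old_logps, advantages, eps=0.2):
    """Mean clipped surrogate: min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Taking the minimum removes any incentive to push the ratio outside the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clip plays the role of TRPO's trust region: once the probability ratio leaves `[1 - eps, 1 + eps]` in the direction that would increase the objective, the gradient through that sample vanishes.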
Visit annotations in context

Tags

reinforcement-learning

trpo

policy-gradients

ppo

Annotators

mark.crowley

URL

arxiv.org/pdf/1707.06347
arxiv.org arxiv.org

2206.11795.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Bowen Baker et al. (OpenAI) "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos" arXiv, June 2022.
  
  Introduction of VPT: a new semi-supervised pre-trained model for sequential decision making on Minecraft. Data are from human video playthroughs but are unlabelled.
  
  reinforcement-learning foundation-models pretrained-models proj-minerl minecraft
Visit annotations in context

Tags

minecraft

reinforcement-learning

pretrained-models

proj-minerl

foundation-models

Annotators

mark.crowley

URL

arxiv.org/pdf/2206.11795.pdf
arxiv.org arxiv.org

Liang15.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"
  
  Response paper to DQN showing that well-designed value function approximations can also do well at these complex tasks without the use of Deep Learning.
  
  A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!
  
  dqn reinforcement-learning atari-games deep-learning shallow-learning
Visit annotations in context

Tags

shallow-learning

deep-learning

reinforcement-learning

dqn

atari-games

Annotators

mark.crowley

URL

arxiv.org/pdf/1512.01563.pdf
arxiv.org arxiv.org

1511.05952.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Tom Schaul, John Quan, Ioannis Antonoglou and David Silver. "PRIORITIZED EXPERIENCE REPLAY", ICLR, 2016.
  
  reinforcement-learning ppo deep-learning deep-rl policy-gradient direct-policy-search trust-region
Visit annotations in context

Tags

policy-gradient

deep-learning

deep-rl

reinforcement-learning

direct-policy-search

trust-region

ppo

Annotators

mark.crowley

URL

arxiv.org/pdf/1511.05952.pdf
Jun 2023
www.fandm.edu www.fandm.edu

617813975725918530-aamas2016-shallow-rl.pdf

1
1. mark.crowley 16 Jun 2023
  
  in Public
  
  Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"
  
  A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!
  
  reinforcement-learning dqn deep-learning shallow-learning atari-games uwece457C
Visit annotations in context

Tags

shallow-learning

deep-learning

uwece457C

reinforcement-learning

dqn

atari-games

Annotators

mark.crowley

URL

fandm.edu/uploads/files/617813975725918530-aamas2016-shallow-rl.pdf
assets.pubpub.org assets.pubpub.org

01621566588509.pdf

1
1. mark.crowley 09 Jun 2023
  
  in Public
  
  LeBlanc, D. G., & Lee, G. (2021). General Deep Reinforcement Learning in NES Games. Canadian AI 2021. Canadian Artificial Intelligence Association (CAIAC). https://doi.org/10.21428/594757db.8472938b
  
  canadian-ai reinforcement-learning video-games
Visit annotations in context

Tags

reinforcement-learning

video-games

canadian-ai

Annotators

mark.crowley

URL

assets.pubpub.org/uonw8d4k/01621566588509.pdf
Apr 2023
arxiv.org arxiv.org

2206.11795.pdf

1
1. mark.crowley 12 Apr 2023
  
  in Public
  
  Bowen Baker et al. (OpenAI) "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos" arXiv, June 2022.
  
  New supervised pre-trained model for sequential decision making on Minecraft. Data are from human video playthroughs but are unlabelled.
  
  reinforcement-learning foundation-models pretrained-models proj-minerl minecraft
  
Visit annotations in context

Tags

minecraft

reinforcement-learning

pretrained-models

proj-minerl

foundation-models

Annotators

mark.crowley

URL

arxiv.org/pdf/2206.11795.pdf
Mar 2023
arxiv.org arxiv.org

2010.03950.pdf

1
1. mark.crowley 07 Mar 2023
  
  in Public
  
  tasks for the Minecraft domain.
  
  They demonstrate the model on a "minecraft-like" domain (introduced earlier by someone else) where there are resources in the world and the agent has tasks.
  
  minecraft reinforcement-learning
Visit annotations in context

Tags

minecraft

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2010.03950.pdf
Feb 2023
arxiv.org arxiv.org

2010.03950.pdf

4
1. mark.crowley 16 Feb 2023
  
  in Public
  
  Definition 3.2 (simple reward machine).
  
  The MDP does not change: its dynamics are the same with or without the RM, just as they are with or without a standard reward model. Additionally, the rewards from the RM can be non-Markovian with respect to the MDP because the RM inherently has a kind of memory of where you've been, limited to the agent's "movement" (almost "in its mind") through the goals for this task.
  
  reinforcement-learning reward-machines
2. mark.crowley 16 Feb 2023
  
  in Public
  
  We then show that an RM can be interpreted as specifying a single reward function over a larger state space, and consider types of reward functions that can be expressed using RMs
  
  So by specifying a reward machine you are augmenting the state space of the MDP with higher level goals/subgoals/concepts that provide structure about what is good and what isn't.
  
  reinforcement-learning reward-machines
3. mark.crowley 16 Feb 2023
  
  in Public
  
  However, an agent that had access to the specification of the reward function might be able to use such information to learn optimal policies faster.
  
  Fascinating idea, why not? Why are we hiding the reward from the agent really?
  
  reinforcement-learning reward-machines
4. mark.crowley 02 Feb 2023
  
  in Public
  
  Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
  
  [Icarte, JAIR, 2022] "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning"
  
  reinforcement-learning reward-machines
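The notes above describe a reward machine as a small finite-state structure that sits beside an unchanged MDP and gives rewards a memory of past progress. A minimal sketch follows, assuming a dictionary-based transition table and string-valued events. The class, its method names, and the "wood then workbench" task are hypothetical illustrations, not the paper's formal definition.

```python
class RewardMachine:
    """Finite-state machine over high-level events that emits rewards."""

    def __init__(self, initial, transitions):
        # transitions: {(rm_state, event): (next_rm_state, reward)}
        self.initial = initial
        self.transitions = transitions
        self.state = initial

    def reset(self):
        self.state = self.initial

    def step(self, event):
        """Advance on an observed event; unknown events leave the state unchanged with reward 0."""
        next_state, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward

# Hypothetical task: collect wood first, then reach the workbench.
WOOD_THEN_BENCH = {
    ("u0", "wood"): ("u1", 0.0),
    ("u1", "workbench"): ("done", 1.0),
}
```

The same event ("workbench") yields a different reward depending on the machine's state, which is what makes the reward non-Markovian in the environment state alone. Pairing the environment state with `rm.state` gives the larger state space over which the reward is Markovian again.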
Visit annotations in context

Tags

reinforcement-learning

reward-machines

Annotators

mark.crowley

URL

arxiv.org/pdf/2010.03950.pdf
proceedings.mlr.press proceedings.mlr.press

Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning

1
1. mark.crowley 16 Feb 2023
  
  in Public
  
  Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
  
  [Icarte, PMLR, 2018] "Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning"
  
  reinforcement-learning reward-machines
Visit annotations in context

Tags

reinforcement-learning

reward-machines

Annotators

mark.crowley

URL

proceedings.mlr.press/v80/icarte18a/icarte18a.pdf
Dec 2022
arxiv.org arxiv.org

2205.15241.pdf

1
1. mark.crowley 13 Dec 2022
  
  in Public
  
  Lee et al. - NeurIPS 2022 "Multi-Game Decision Transformers"
  
  reinforcement-learning transformers transfer-learning conf-neurips-2022 proj-minerl
Visit annotations in context

Tags

conf-neurips-2022

transformers

reinforcement-learning

transfer-learning

proj-minerl

Annotators

mark.crowley

URL

arxiv.org/pdf/2205.15241.pdf
arxiv.org arxiv.org

2210.00849.pdf

1
1. mark.crowley 13 Dec 2022
  
  in Public
  
  [Neumann, Gros, NeurIPS, 2022] - "SCALING LAWS FOR A MULTI-AGENT REINFORCEMENT LEARNING MODEL"
  
  reinforcement-learning marl multi-agent-reinforcement-learning conf-neurips-2022
Visit annotations in context

Tags

reinforcement-learning

conf-neurips-2022

multi-agent-reinforcement-learning

marl

Annotators

mark.crowley

URL

arxiv.org/pdf/2210.00849.pdf
Sep 2022
arxiv.org arxiv.org

2106.01345.pdf

1
1. mark.crowley 27 Sep 2022
  
  in Public
  
  We study whether sequence modeling can perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
  
  transformers offline-learning reinforcement-learning
Visit annotations in context

Tags

offline-learning

transformers

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
arxiv.org arxiv.org

2112.09099.pdf

1
1. mark.crowley 12 Sep 2022
  
  in Public
  
  AAAI 2022 paper: Decentralized Mean Field Games. Happy to discuss online.
  
  S. Ganapathi Subramanian, M. Taylor, M. Crowley, and P. Poupart, “Decentralized mean field games,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022), vol. 36, pp. 9439–9447, February 2022.
  
  reinforcement-learning marl
Visit annotations in context

Tags

reinforcement-learning

marl

Annotators

mark.crowley

URL

arxiv.org/pdf/2112.09099.pdf
Jul 2022
ieeexplore.ieee.org ieeexplore.ieee.org

IEEE Xplore Full-Text PDF:

1
1. mark.crowley 26 Jul 2022
  
  in Public
  
  A recent overview of RL methods used for autonomous driving.
  
  reinforcement-learning autonomous-driving
Visit annotations in context

Tags

reinforcement-learning

autonomous-driving

Annotators

mark.crowley

URL

ieeexplore.ieee.org/stamp/stamp.jsp
Jun 2022
assets.pubpub.org assets.pubpub.org

01652987005906.pdf

1
1. mark.crowley 04 Jun 2022
  
  in Public
  
  Discussion on
  
  Bellinger C, Drozdyuk A, Crowley M, Tamblyn I. Balancing Information with Observation Costs in Deep Reinforcement Learning. Proceedings of the Canadian Conference on Artificial Intelligence [Internet]. 2022 May 27; Available from: https://caiac.pubpub.org/pub/0jmy7gpd
  
  reinforcement-learning artificial-intelligence proj-chemgymrl digital-chemistry material-design national-research-council-of-canada CanAI2022
Visit annotations in context

Tags

national-research-council-of-canada

digital-chemistry

artificial-intelligence

reinforcement-learning

material-design

proj-chemgymrl

CanAI2022

Annotators

mark.crowley

URL

assets.pubpub.org/99r5anzw/01652987005906.pdf
May 2022
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov

15756507305185 1..25

1
1. mark.crowley 28 May 2022
  
  in Public
  
  Another piece to the "what can we do with eligibility traces" puzzle for Deep RL.
  
  reinforcement-learning eligibility-traces rl-course
Visit annotations in context

Tags

reinforcement-learning

eligibility-traces

rl-course

Annotators

mark.crowley

URL

ncbi.nlm.nih.gov/pmc/articles/PMC6897511/pdf/elife-47463.pdf
arxiv.org arxiv.org

Adaptive and Multiple Time-scale Eligibility Traces for Online Deep Reinforcement Learning

1
1. mark.crowley 28 May 2022
  
  in Public
  
  Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
  
  reinforcement-learning rl-course eligibility-traces
Visit annotations in context

Tags

reinforcement-learning

eligibility-traces

rl-course

Annotators

mark.crowley

URL

arxiv.org/pdf/2008.10040.pdf
arxiv.org arxiv.org

1810.09967v1.pdf

1
1. mark.crowley 28 May 2022
  
  in Public
  
  Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
  
  reinforcement-learning rl-course eligibility-traces
Visit annotations in context

Tags

reinforcement-learning

eligibility-traces

rl-course

Annotators

mark.crowley

URL

arxiv.org/pdf/1810.09967v1.pdf
arxiv.org arxiv.org

2102.03406.pdf

1
1. mark.crowley 27 May 2022
  
  in Public
  
  Hypothesis page to discuss this high level description of DeepMind's new Gato framework.
  
  reinforcement-learning rl-course artificial-intelligence
Visit annotations in context

Tags

reinforcement-learning

artificial-intelligence

rl-course

Annotators

mark.crowley

URL

arxiv.org/pdf/2408.03314
Mar 2022
arxiv.org arxiv.org

1907.13440.pdf

1
1. mark.crowley 23 Mar 2022
  
  in Public
  
  The paper that introduced the MineRL challenge dataset.
  
  reinforcement-learning
Visit annotations in context

Tags

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1907.13440.pdf
Jul 2021
psyarxiv.com psyarxiv.com

Choice-confirmation bias and gradual perseveration in human reinforcement learning

1
1. lucyparfitt16 08 Jul 2021
  
  in BehSci
  
  Palminteri, S. (2021). Choice-confirmation bias and gradual perseveration in human reinforcement learning [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/dpqj6
  
  is:preprint lang:en computational model confirmation bias reinforcement learning repetition bias behavioral science psychology modeling bias
Visit annotations in context

Tags

confirmation bias

is:preprint

behavioral science

psychology

bias

modeling

reinforcement learning

lang:en

computational model

repetition bias

Annotators

lucyparfitt16

URL

psyarxiv.com/dpqj6/
Jun 2021
psyarxiv.com psyarxiv.com

Reinforcement Learning Based Decision Support Tool For Epidemic Control

1
1. lucyparfitt16 30 Jun 2021
  
  in BehSci
  
  Chadi, M.-A., & Mousannif, H. (2021). Reinforcement Learning Based Decision Support Tool For Epidemic Control [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/tcr8s
  
  is:preprint lang:en COVID-19 epidemics control reinforcement learning modeling simulation vaccine epidemiology transmission economy intervention public health policy
Visit annotations in context

Tags

economy

is:preprint

epidemiology

epidemics control

transmission

simulation

policy

COVID-19

public health

vaccine

modeling

intervention

reinforcement learning

lang:en

Annotators

lucyparfitt16

URL

psyarxiv.com/tcr8s/
Mar 2021
www.opendemocracy.net www.opendemocracy.net

Neurocapitalism

1
1. davidk01 13 Mar 2021
  
  in Public
  
  Using chemicals to improve our economy of attention and become emotionally "fitter" is an option that penetrated public consciousness some time ago.
  
  Same is true of reinforcement learning algorithms.
  
  ai reinforcement learning algorithms
Visit annotations in context

Tags

ai

reinforcement learning algorithms

Annotators

davidk01

URL

opendemocracy.net/en/neurocapitalism/
Sep 2020
arxiv.org arxiv.org

The emergence of segregation: from observable markers to group specific norms

1
1. ErikStuchly 15 Sep 2020
  
  in BehSci
  
  Ozaita, J., Baronchelli, A., & Sánchez, A. (2020). The emergence of segregation: From observable markers to group specific norms. ArXiv:2009.05354 [Physics, q-Bio]. http://arxiv.org/abs/2009.05354
  
  is:preprint lang:en social trait social norm observable marker segregation emergence modeling ethnicity strategy conformity greed game reinforcement learning
Visit annotations in context

Tags

is:preprint

social trait

emergence

conformity

social norm

game

greed

modeling

observable marker

ethnicity

strategy

reinforcement learning

lang:en

segregation

Annotators

ErikStuchly

URL

arxiv.org/abs/2009.05354
journals.sagepub.com journals.sagepub.com

Self-Regulation Without Force: Can Awareness Leverage Reward to Drive Behavior Change? - Vera U. Ludwig, Kirk Warren Brown, Judson A. Brewer, 2020

1
1. ErikStuchly 08 Sep 2020
  
  in BehSci
  
  Ludwig, V. U., Brown, K. W., & Brewer, J. A. (2020). Self-Regulation Without Force: Can Awareness Leverage Reward to Drive Behavior Change? Perspectives on Psychological Science, 1745691620931460. https://doi.org/10.1177/1745691620931460
  
  is:article lang:en self-regulation awareness reward behavior change motivation value satisfaction reinforcement learning valuation sustainability behavioral science
Visit annotations in context

Tags

value

behavioral science

behavior change

satisfaction

awareness

is:article

valuation

self-regulation

reward

sustainability

reinforcement learning

motivation

lang:en

Annotators

ErikStuchly

URL

journals.sagepub.com/doi/abs/10.1177/1745691620931460
May 2020
psyarxiv.com psyarxiv.com

On the convergent validity of risk sensitivity measures

1
1. Marlene_Wulf 29 May 2020
  
  in BehSci
  
  Radulescu, A., Holmes, K., & Niv, Y. (2020). On the convergent validity of risk sensitivity measures [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/qdhx4
  
  is:preprint lang:en decision-making reinforcement learning risk sensitivity risk behavior trait measure instrument risk paradigm
Visit annotations in context

Tags

decision-making

trait

is:preprint

risk sensitivity

instrument

behavior

risk

measure

reinforcement learning

risk paradigm

lang:en

Annotators

Marlene_Wulf

URL

psyarxiv.com/qdhx4/
psyarxiv.com psyarxiv.com

Protection from uncertainty in the exploration/exploitation trade-off

1
1. Marlene_Wulf 29 May 2020
  
  in BehSci
  
  https://twitter.com/SciBeh/status/1255403798463471616
  
  is:preprint lang:en uncertainty exploration exploitation tradeoff attention cognition decision-making reinforcement learning
Visit annotations in context

Tags

decision-making

is:preprint

exploration

attention

reinforcement

tradeoff

learning

cognition

lang:en

uncertainty

exploitation

Annotators

Marlene_Wulf

URL

psyarxiv.com/5y643/
psyarxiv.com psyarxiv.com

Cognitive learning processes account for asymmetries in adaptations to new social norms

1
1. Marlene_Wulf 29 May 2020
  
  in BehSci
  
  Hertz, U. (2020). Cognitive learning processes account for asymmetries in adaptations to new social norms [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/7thku
  
  is:preprint lang:en COVID-19 reinforcement learning social norm social cognition pandemic adaption behavior computational modeling
Visit annotations in context

Tags

is:preprint

adaption

social norm

social cognition

COVID-19

reinforcement

pandemic

behavior

learning

lang:en

computational modeling

Annotators

Marlene_Wulf

URL

psyarxiv.com/7thku/
Apr 2020
psyarxiv.com psyarxiv.com

The elusive effects of incidental anxiety on reinforcement-learning

1
1. edampf 23 Apr 2020
  
  in BehSci
  
  Ting, C., Palminteri, S., Lebreton, M., & Engelmann, J. B. (2020, March 25). The elusive effects of incidental anxiety on reinforcement-learning. https://doi.org/10.31234/osf.io/7d4tc
  
  is:preprint lang:en anxiety computation modeling threat shock valence-induced bias learning context-dependent reinforcement neuroscience decision making
Visit annotations in context

Tags

learning

is:preprint

decision making

context-dependent

valence-induced bias

neuroscience

anxiety

reinforcement

modeling

computation

shock

lang:en

threat

Annotators

edampf

URL

psyarxiv.com/7d4tc/
Mar 2019
cjc.ict.ac.cn cjc.ict.ac.cn

liuq-201811662728.pdf

1
1. haiy 08 Mar 2019
  
  in Public
  
  A Survey of Deep Reinforcement Learning
  
  reinforcement-learning review
Visit annotations in context

Tags

reinforcement-learning

review

Annotators

haiy

URL

cjc.ict.ac.cn/online/onlinepaper/liuq-201811662728.pdf
cjc.ict.ac.cn cjc.ict.ac.cn

lq-2017119103322.pdf

1
1. haiy 08 Mar 2019
  
  in Public
  
  A Survey of Deep Reinforcement Learning
  
  reinforcement-learning tutorial
Visit annotations in context

Tags

reinforcement-learning

tutorial

Annotators

haiy

URL

cjc.ict.ac.cn/online/cre/lq-2017119103322.pdf
github.com github.com

dennybritz/reinforcement-learning

1
1. haiy 08 Mar 2019
  
  in Public
  
  reinforcement-learning code and paper tutorials
  
  reinforcement-learning valuable tutorial
Visit annotations in context

Tags

reinforcement-learning

tutorial

valuable

Annotators

haiy

URL

github.com/dennybritz/reinforcement-learning
Feb 2019
gitee.com gitee.com

SuttonBartoIPRLBook2ndEd.pdf

1
1. haiy 21 Feb 2019
  
  in Public
  
  reinforcement-learning book
Visit annotations in context

Tags

reinforcement-learning

book

Annotators

haiy

URL

gitee.com/arthurhu/pdfs/raw/master/deeplearning/SuttonBartoIPRLBook2ndEd.pdf
gitee.com gitee.com

强化学习在阿里的技术演进与业务创新.pdf

1
1. haiy 20 Feb 2019
  
  in Public
  
  reinforcement-learning
Visit annotations in context

Tags

reinforcement-learning

Annotators

haiy

URL

gitee.com/arthurhu/pdfs/raw/master/deeplearning/nlp/强化学习在阿里的技术演进与业务创新.pdf
gitee.com gitee.com

nips_oral6

1
1. haiy 20 Feb 2019
  
  in Public
  
  reinforcement-learning ppt
Visit annotations in context

Tags

reinforcement-learning

ppt

Annotators

haiy

URL

gitee.com/arthurhu/pdfs/raw/master/deeplearning/nlp/2017NIPS大会Facebook人工智能研究院演讲.pdf
gitee.com gitee.com

1709.02349.pdf

1
1. haiy 19 Feb 2019
  
  in Public
  
  We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.
  
  chatbot reinforcement-learning
Visit annotations in context

Tags

reinforcement-learning

chatbot

Annotators

haiy

URL

gitee.com/arthurhu/pdfs/raw/master/deeplearning/nlp/1709.02349.pdf