Hypothesis

68 Matching Annotations

Oct 2025
www.youtube.com www.youtube.com

Richard Sutton – Father of RL thinks LLMs are a dead end

1
1. stopresetgo 07 Oct 2025
  
  in Public
  
  for - like - Michael Levin - Richard Sutton - YouTube interview
  
  Summary - interesting talk on learning - reminds me of Michael Levin's work - the priority is on goal directed activity
  
  Reinforcement Learning Richard Sutton adjacency - Richard Sutton - Michael Levin
Visit annotations in context

Tags

Richard Sutton

adjacency - Richard Sutton - Michael Levin

Reinforcement Learning

Annotators

stopresetgo

URL

youtube.com/watch
Jun 2025
www.cs.toronto.edu www.cs.toronto.edu

dqn.pdf

1
1. mark.crowley 09 Jun 2025
  
  in Public
  
  Playing Atari with Deep Reinforcement Learning 19 Dec 2013 · Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
  
  The 2013 paper that introduced the DQN algorithm, which combines Deep Learning with Reinforcement Learning to play Atari games.
  
  reinforcement-learning dqn atari-games deep-learning
Visit annotations in context

Tags

atari-games

dqn

reinforcement-learning

deep-learning

Annotators

mark.crowley

URL

cs.toronto.edu/~vmnih/docs/dqn.pdf
May 2025
storage.googleapis.com storage.googleapis.com

The%20Era%20of%20Experience%20Paper.pdf

1
1. mark.crowley 13 May 2025
  
  in Public
  
  Welcome to the Era of Experience David Silver, Richard S. Sutton
  
  "This is a preprint of a chapter that will appear in the book Designing an Intelligence, published by MIT Press"
  
  #reinforcement-learning
Visit annotations in context

Tags

#reinforcement-learning

Annotators

mark.crowley

URL

storage.googleapis.com/deepmind-media/Era-of-Experience /The Era of Experience Paper.pdf
Jan 2025
openreview.net openreview.net

74_Mapping_Social_Choice_Theor.pdf

1
1. mark.crowley 31 Jan 2025
  
  in Public
  
  MAPPING SOCIAL CHOICE THEORY TO RLHF. Jessica Dai and Eve Fleisig. ICLR Workshop on Reliable and Responsible Foundation Models, 2024.
  
  Nice overview of how social choice theory has been connected to RLHF and AI alignment ideas.
  
  #ai-morality align rlhf llm #reinforcement-learning
Visit annotations in context

Tags

align

rlhf

#ai-morality

llm

#reinforcement-learning

Annotators

mark.crowley

URL

openreview.net/pdf
Jul 2024
en.wikipedia.org en.wikipedia.org

Monte Carlo tree search - Wikipedia

2
1. mark.crowley 22 Jul 2024
  
  in Public
  
  Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT
  
  The UCB algorithm for bandits comes back again as UCT to form the basis for model estimation via MCTS
  
  reinforcement-learning ece457c
2. mark.crowley 22 Jul 2024
  
  in Public
  
  The main difficulty in selecting child nodes is maintaining some balance between the exploitation of deep variants after moves with high average win rate and the exploration of moves with few simulations.
  
  Tree search makes this tradeoff very clear: how many paths will you explore before you stop and use the knowledge you already have?
  
  ece457c reinforcement-learning
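
The two notes above (UCB reused as UCT, and the exploration/exploitation balance when selecting child nodes) can be illustrated with a minimal sketch of UCB1-style child selection. The function names, the tuple layout for children, and the default exploration constant `c = sqrt(2)` are assumptions made for this illustration; real MCTS implementations vary in all three.

```python
import math

def uct_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score for one child: average value plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = total_value / visits  # average result of simulations so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)  # shrinks as visits grow
    return exploit + explore

def select_child(children, c=math.sqrt(2)):
    """children: list of (name, total_value, visits) tuples (hypothetical layout)."""
    parent_visits = sum(visits for _, _, visits in children)
    best = max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits, c))
    return best[0]
```

With `c = 0` the rule is purely greedy on the average win rate; larger `c` shifts simulations toward children with few visits.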
Visit annotations in context

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

en.wikipedia.org/wiki/Monte_Carlo_tree_search
www.nature.com www.nature.com

Mastering the game of Go with deep neural networks and tree search

1
1. mark.crowley 22 Jul 2024
  
  in Public
  
  The summary paper for AlphaGo.
  
  ece457c reinforcement-learning
Visit annotations in context

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

nature.com/articles/nature16961.pdf
en.wikipedia.org en.wikipedia.org

AlphaZero - Wikipedia

1
1. mark.crowley 22 Jul 2024
  
  in Public
  
  Wikipedia: AlphaZero
  
  ece457c reinforcement-learning
Visit annotations in context

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

en.wikipedia.org/wiki/AlphaZero
arxiv.org arxiv.org

2403.07691.pdf

1
1. mark.crowley 16 Jul 2024
  
  in Public
  
  2024 paper arguing that other methods beyond PPO could be better for "value alignment" of LLMs
  
  reinforcement-learning ppo ece457c
Visit annotations in context

Tags

reinforcement-learning

ppo

ece457c

Annotators

mark.crowley

URL

arxiv.org/pdf/2403.07691
arxiv.org arxiv.org

Deep Reinforcement Learning that Matters

1
1. mark.crowley 15 Jul 2024
  
  in Public
  
  Paper "Deep Reinforcement Learning that Matters" on evaluating RL algorithms.
  
  reinforcement-learning ece457c
Visit annotations in context

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

arxiv.org/pdf/1709.06560
Feb 2024
arxiv.org arxiv.org

2205.08192.pdf

1
1. mark.crowley 18 Feb 2024
  
  in Public
  
  T. Herlau, "Moral Reinforcement Learning Using Actual Causation," 2022 2nd International Conference on Computer, Control and Robotics (ICCCR), Shanghai, China, 2022, pp. 179-185, doi: 10.1109/ICCCR54399.2022.9790262. keywords: {Digital control;Ethics;Costs;Philosophical considerations;Toy manufacturing industry;Reinforcement learning;Forestry;Causality;Reinforcement learning;Actual Causation;Ethical reinforcement learning}
  
  ai-ethics ai-morality reinforcement-learning
Visit annotations in context

Tags

ai-morality

ai-ethics

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2205.08192.pdf
pdf.sciencedirectassets.com pdf.sciencedirectassets.com

Can model-free reinforcement learning explain deontological moral judgments?

1
1. mark.crowley 18 Feb 2024
  
  in Public
  
  Can model-free reinforcement learning explain deontological moral judgments? Alisabeth Ayars, University of Arizona, Dept. of Psychology, Tucson, AZ, USA
  
  ai-morality ai-ethics reinforcement-learning
Visit annotations in context

Tags

ai-morality

ai-ethics

reinforcement-learning

Annotators

mark.crowley

URL

pdf.sciencedirectassets.com/271061/1-s2.0-S0010027716X00030/1-s2.0-S0010027716300300/am.pdf
Nov 2023
proceedings.mlr.press proceedings.mlr.press

janner22a.pdf

1
1. mark.crowley 24 Nov 2023
  
  in Public
  
  Reading this one on Nov 27, 2023 for the reading group.
  
  rdgrp-f23 reinforcement-learning transformers
Visit annotations in context

Tags

transformers

rdgrp-f23

reinforcement-learning

Annotators

mark.crowley

URL

proceedings.mlr.press/v162/janner22a/janner22a.pdf
proceedings.neurips.cc proceedings.neurips.cc

NeurIPS-2021-offline-reinforcement-learning-as-one-big-sequence-modeling-problem-Paper.pdf

1
1. mark.crowley 24 Nov 2023
  
  in Public
  
  Reading this one on Nov 27, 2023 for the reading group.
  
  rdgrp-f23 reinforcement-learning transformers
Visit annotations in context

Tags

transformers

rdgrp-f23

reinforcement-learning

Annotators

mark.crowley

URL

proceedings.neurips.cc/paper_files/paper/2021/file/099fe6b0b444c23836c4a5d07346082b-Paper.pdf
Oct 2023
arxiv.org arxiv.org

2106.01345.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  (Chen, NeurIPS, 2021) Chen, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, and Mordatch. "Decision Transformer: Reinforcement Learning via Sequence Modeling". arXiv preprint arXiv:2106.01345v2, June 2021.
  
  Quickly became a very influential paper, with a new idea of how to learn generative models for action prediction by training on state-action-reward sequences from demonstration trajectories. There is no explicit optimization of actions or rewards; instead, the target return is given as an input.
  
  reinforcement-learning transformers generative-models minecraft minerl rdgrp-f23 reading_group_crowley
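
To make "target reward is an input" concrete, here is a minimal sketch of how a demonstration trajectory can be turned into the (return-to-go, state, action) sequence that the Decision Transformer paper describes. The function names and the tagged-tuple token format are assumptions for illustration only; the actual model embeds each element and feeds it to a transformer.

```python
def returns_to_go(rewards):
    """Return-to-go at step t: undiscounted sum of rewards from t to the end."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

def to_token_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples in time order."""
    seq = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("rtg", g), ("state", s), ("action", a)])
    return seq
```

At test time the first return-to-go is set to the desired total return, and the model is asked to predict the action that follows each state.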
Visit annotations in context

Tags

minecraft

reinforcement-learning

rdgrp-f23

minerl

generative-models

transformers

reading_group_crowley

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
arxiv.org arxiv.org

2305.15486.pdf

2
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Wu, Prabhumoye, Yeon Min, Bisk, Salakhutdinov, Azaria, Mitchell and Li. "SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning". arXiv preprint arXiv:2305.15486v2, May 2023.
  
  reinforcement-learning nlp large-language-models chatgpt minecraft evaluation-methods rdgrp-f23
2. mark.crowley 25 Oct 2023
  
  in Public
  
  Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training.
  
  Them's fighten' words!
  
  I haven't read it yet, but we're putting it on the list for this fall's reading group. Seriously, a strong result with a very strong implied claim. They are careful to say it's from their empirical results, so it is very much worth a look. I suspect that the amount of implicit knowledge in the papers, text and DAG is helping to do this.
  
  The Big Question: is their comparison to RL baselines fair, given that the baselines are trained from scratch? What does a fair comparison of any from-scratch model (RL or supervised) mean when compared to an LLM approach (or any approach using a foundation model), when that model is not really from scratch?
  
  reinforcement-learning rdgrp-f23 reading_group_crowley nlp larg deep-learning self-supervised supervised-learning evaluation-methods
Visit annotations in context

Tags

self-supervised

nlp

larg

minecraft

rdgrp-f23

evaluation-methods

supervised-learning

reinforcement-learning

deep-learning

large-language-models

reading_group_crowley

chatgpt

Annotators

mark.crowley

URL

arxiv.org/pdf/2305.15486.pdf
arxiv.org arxiv.org

2203.02155.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Training language models to follow instructions with human feedback
  
  Original paper for discussion of the Reinforcement Learning from Human Feedback (RLHF) algorithm.
  
  large-language-models reinforcement-learning chatgpt
Visit annotations in context

Tags

chatgpt

large-language-models

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2203.02155
arxiv.org arxiv.org

2209.07550.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  [Kapturowski, DeepMind, Sep 2022] "Human-level Atari 200x Faster"
  
  Improving the 2020 Agent57 performance to be more efficient.
  
  Arxiv: https://arxiv.org/abs/2209.07550
  
  reinforcement-learning atari-games ece457c to-read
Visit annotations in context

Tags

atari-games

to-read

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

arxiv.org/pdf/2209.07550.pdf
Sep 2023
arxiv.org arxiv.org

1908.01046.pdf

1
1. mark.crowley 15 Sep 2023
  
  in Public
  
  Adaptive Stress Testing with Reward Augmentation for Autonomous Vehicle Validation
  
  autonomous-driving multi-agent-reinforcement-learning black-box-testing
Visit annotations in context

Tags

black-box-testing

autonomous-driving

multi-agent-reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1908.01046.pdf
Jul 2023
proceedings.mlr.press proceedings.mlr.press

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
  
  (Espeholt, ICML, 2018) "IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures"
  
  reinforcement-learning impala
Visit annotations in context

Tags

impala

reinforcement-learning

Annotators

mark.crowley

URL

proceedings.mlr.press/v80/espeholt18a/espeholt18a.pdf
proceedings.mlr.press proceedings.mlr.press

Deterministic Policy Gradient Algorithms

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  This paper introduced the DPG Algorithm
  
  DPG reinforcement-learning
Visit annotations in context

Tags

DPG

reinforcement-learning

Annotators

mark.crowley

URL

proceedings.mlr.press/v32/silver14.pdf
openreview.net openreview.net

babyai_a_platform_to_study_the.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Link to page with information about the paper: https://openreview.net/forum?id=rJeXCo0cYX
  
  reinforcement-learning curriculum-learning grid-world babyai
Visit annotations in context

Tags

grid-world

babyai

reinforcement-learning

curriculum-learning

Annotators

mark.crowley

URL

openreview.net/pdf
openreview.net openreview.net

a_path_towards_autonomous_mach.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Yann LeCun released his vision for the future of Artificial Intelligence research in 2022, and it sounds a lot like Reinforcement Learning.
  
  reinforcement-learning agi
Visit annotations in context

Tags

agi

reinforcement-learning

Annotators

mark.crowley

URL

openreview.net/pdf
arxiv.org arxiv.org

Deep Reinforcement Learning with Double Q-learning

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Paper that evaluated the existing Double Q-Learning algorithm on the new DQN approach and validated that it is very effective in the Deep RL realm.
  
  reinforcement-learning dqn deep-learning
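
As a rough illustration of what changes when Double Q-learning is applied to DQN, the sketch below computes the bootstrap target for a single transition: the online network picks the next action and the target network evaluates it. The function name and the plain-list representation of Q-values are assumptions for this example, not code from the paper.

```python
def double_q_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target for one transition.

    next_q_online / next_q_target: Q-values of the next state, one entry per
    action, from the online and target networks (plain lists in this sketch).
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]
```

Standard DQN would instead use `max(next_q_target)`, which tends to overestimate values because the same network both selects and evaluates the action.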
Visit annotations in context

Tags

dqn

reinforcement-learning

deep-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1509.06461v3
arxiv.org arxiv.org

Continuous control with deep reinforcement learning

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  This paper introduces the DDPG algorithm, which builds on the earlier DPG (Deterministic Policy Gradient) algorithm. The main idea is to learn a deterministic, or nearly deterministic, policy for situations where the environment is very sensitive to suboptimal actions and one action setting usually dominates in each state. DDPG showed good performance but was often outperformed by algorithms such as PPO. The later SAC algorithm keeps the off-policy actor-critic setup but adds an entropy bonus to the objective, which rewards the policy for staying stochastic and so encourages exploration rather than penalizing uncertainty; note that SAC's policy is stochastic, not deterministic.
  
  ddpg reinforcement-learning SAC DPG PPO
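
Two ingredients of DDPG can be sketched in a few lines: the critic's bootstrap target, which evaluates the action chosen by the deterministic target actor, and the soft update that slowly moves target parameters toward the online ones. The function names, the callables standing in for networks, and the flat parameter lists are assumptions for illustration; the default `tau` of 0.001 follows the value reported in the paper.

```python
def critic_target(reward, next_state, target_actor, target_critic,
                  gamma=0.99, done=False):
    """y = r + gamma * Q'(s', mu'(s')), using the target actor and target critic."""
    if done:
        return reward
    next_action = target_actor(next_state)  # deterministic action, no sampling
    return reward + gamma * target_critic(next_state, next_action)

def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]
```

Because the policy is deterministic, exploration has to come from noise added to the chosen action during data collection, which this sketch leaves out.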
Visit annotations in context

Tags

DPG

ddpg

SAC

PPO

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1509.02971
arxiv.org arxiv.org

1710.02298.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  This famous paper gives a great review of the DQN algorithm a couple years after it changed everything in Deep RL. It compares six different extensions to DQN for Deep Reinforcement Learning, many of which have now become standard additions to DQN and other Deep RL algorithms. It also combines all of them together to produce the "rainbow" algorithm, which outperformed many other models for a while.
  
  reinforcement-learning experimental-design
Visit annotations in context

Tags

experimental-design

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1710.02298
arxiv.org arxiv.org

2104.10986.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Arxiv paper from 2021 on reinforcement learning in a scenario where your aim is to learn a workable POMDP policy, but you start with a fully observable MDP and adjust it over time towards a POMDP.
  
  reinforcement-learning pomdp mdp
Visit annotations in context

Tags

pomdp

mdp

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2104.10986.pdf
arxiv.org arxiv.org

1707.06347.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Paper that introduced the PPO algorithm. PPO is, in a way, a response to the TRPO algorithm: it keeps the core idea but aims for a simpler and more efficient algorithm.
  
  TRPO frames each policy update as a constrained optimization problem: maximize a surrogate objective subject to a KL-divergence constraint (the trust region) between the old and new policies.
  
  ppo reinforcement-learning policy-gradients trpo
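
The "simpler" part of PPO is easiest to see in its clipped surrogate objective, which replaces TRPO's explicit trust-region constraint. Below is a minimal sketch over a batch of probability ratios and advantage estimates; the function name and list-based inputs are assumptions for this example, and `epsilon=0.2` is a commonly used setting rather than a required one.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    ratios: pi_new(a|s) / pi_old(a|s) for each sample.
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Taking the minimum removes any incentive to push r outside the clip range.
        total += min(r * adv, clipped_r * adv)
    return total / len(ratios)
```

The objective is maximized with ordinary stochastic gradient ascent, so no constrained optimizer is needed.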
Visit annotations in context

Tags

trpo

policy-gradients

reinforcement-learning

ppo

Annotators

mark.crowley

URL

arxiv.org/pdf/1707.06347
arxiv.org arxiv.org

2206.11795.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Bowen Baker et al. (OpenAI) "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos" arXiv, June 2022.
  
  Introduction of VPT: a new semi-supervised pre-trained model for sequential decision making in Minecraft. Data are from human video playthroughs but are unlabelled.
  
  reinforcement-learning foundation-models pretrained-models proj-minerl minecraft
Visit annotations in context

Tags

minecraft

proj-minerl

reinforcement-learning

foundation-models

pretrained-models

Annotators

mark.crowley

URL

arxiv.org/pdf/2206.11795.pdf
arxiv.org arxiv.org

Liang15.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"
  
  Response paper to DQN showing that well-designed value function approximations can also do well at these complex tasks without the use of Deep Learning.
  
  A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!
  
  dqn reinforcement-learning atari-games deep-learning shallow-learning
Visit annotations in context

Tags

dqn

deep-learning

atari-games

shallow-learning

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1512.01563
arxiv.org arxiv.org

1511.05952.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Tom Schaul, John Quan, Ioannis Antonoglou and David Silver. "PRIORITIZED EXPERIENCE REPLAY", ICLR, 2016.
  
  reinforcement-learning ppo deep-learning deep-rl policy-gradient direct-policy-search trust-region
Visit annotations in context

Tags

policy-gradient

deep-learning

deep-rl

trust-region

reinforcement-learning

ppo

direct-policy-search

Annotators

mark.crowley

URL

arxiv.org/pdf/1511.05952.pdf
Jun 2023
www.fandm.edu www.fandm.edu

617813975725918530-aamas2016-shallow-rl.pdf

1
1. mark.crowley 16 Jun 2023
  
  in Public
  
  Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"
  
  A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!
  
  reinforcement-learning dqn deep-learning shallow-learning atari-games uwece457C
Visit annotations in context

Tags

shallow-learning

dqn

deep-learning

atari-games

uwece457C

reinforcement-learning

Annotators

mark.crowley

URL

fandm.edu/uploads/files/617813975725918530-aamas2016-shallow-rl.pdf
assets.pubpub.org assets.pubpub.org

01621566588509.pdf

1
1. mark.crowley 09 Jun 2023
  
  in Public
  
  LeBlanc, D. G., & Lee, G. (2021). General Deep Reinforcement Learning in NES Games. Canadian AI 2021. Canadian Artificial Intelligence Association (CAIAC). https://doi.org/10.21428/594757db.8472938b
  
  canadian-ai reinforcement-learning video-games
Visit annotations in context

Tags

canadian-ai

reinforcement-learning

video-games

Annotators

mark.crowley

URL

assets.pubpub.org/uonw8d4k/01621566588509.pdf
Apr 2023
arxiv.org arxiv.org

2206.11795.pdf

1
1. mark.crowley 12 Apr 2023
  
  in Public
  
  Bowen Baker et al. (OpenAI) "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos" arXiv, June 2022.
  
  New supervised pre-trained model for sequential decision making on Minecraft. Data are from human video playthroughs but are unlabelled.
  
  reinforcement-learning foundation-models pretrained-models proj-minerl minecraft
Visit annotations in context

Tags

minecraft

proj-minerl

reinforcement-learning

foundation-models

pretrained-models

Annotators

mark.crowley

URL

arxiv.org/pdf/2206.11795.pdf
Mar 2023
arxiv.org arxiv.org

2010.03950.pdf

1
1. mark.crowley 07 Mar 2023
  
  in Public
  
  asks for the Minecraft domain.
  
  They demonstrate the model on a "minecraft-like" domain (introduced earlier by someone else) where there are resources in the world and the agent has tasks.
  
  minecraft reinforcement-learning
Visit annotations in context

Tags

minecraft

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2010.03950
Feb 2023
arxiv.org arxiv.org

2010.03950.pdf

4
1. mark.crowley 16 Feb 2023
  
  in Public
  
  Definition 3.2 (simple reward machine).
  
  The MDP does not change: its dynamics are the same with or without the RM, just as they are with or without a standard reward model. Additionally, the rewards from the RM can be non-Markovian with respect to the MDP, because the RM inherently has a kind of memory of where you've been, limited to the agent's "movement" (almost "in its mind") through the goals for this task.
  
  reinforcement-learning reward-machines
2. mark.crowley 16 Feb 2023
  
  in Public
  
  We then show that an RM can be interpreted as specifying a single reward function over a larger state space, and consider types of reward functions that can be expressed using RMs
  
  So by specifying a reward machine you are augmenting the state space of the MDP with higher level goals/subgoals/concepts that provide structure about what is good and what isn't.
  
  reinforcement-learning reward-machines
3. mark.crowley 16 Feb 2023
  
  in Public
  
  However, an agent that had access to the specification of the reward function might be able to use such information to learn optimal policies faster.
  
  Fascinating idea, why not? Why are we hiding the reward from the agent really?
  
  reinforcement-learning reward-machines
4. mark.crowley 02 Feb 2023
  
  in Public
  
  Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
  
  [Icarte, JAIR, 2022] "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning"
  
  reinforcement-learning reward-machines
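
The notes above on reward machines (memory over subgoals, and the augmented state space) can be illustrated with a toy finite-state machine. This is a minimal sketch, not the paper's formal definition: the class, the event names, and the "get coffee, then reach the office" task are all hypothetical choices for this example.

```python
class RewardMachine:
    """Toy reward machine: a finite-state machine over high-level events."""

    def __init__(self, transitions, initial_state):
        # transitions: {(rm_state, event): (next_rm_state, reward)}
        self.transitions = transitions
        self.initial_state = initial_state

    def step(self, rm_state, event):
        # Events with no listed transition leave the RM state unchanged, reward 0.
        return self.transitions.get((rm_state, event), (rm_state, 0.0))

# Hypothetical task: pick up coffee first, then deliver it to the office.
rm = RewardMachine(
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("u2", 1.0),
    },
    initial_state="u0",
)

def augmented_step(rm, env_state, rm_state, event):
    """The learner's state is the pair (environment state, RM state)."""
    next_rm_state, reward = rm.step(rm_state, event)
    return (env_state, next_rm_state), reward
```

Reaching the office pays 1.0 only when the machine is in `u1`, so the reward depends on history when viewed from the environment state alone but is Markovian over the augmented pair.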
Visit annotations in context

Tags

reward-machines

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2010.03950
proceedings.mlr.press proceedings.mlr.press

Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning

1
1. mark.crowley 16 Feb 2023
  
  in Public
  
  Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
  
  [Icarte, PMLR, 2018] "Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning"
  
  reinforcement-learning reward-machines
Visit annotations in context

Tags

reward-machines

reinforcement-learning

Annotators

mark.crowley

URL

proceedings.mlr.press/v80/icarte18a/icarte18a.pdf
Dec 2022
arxiv.org arxiv.org

2205.15241.pdf

1
1. mark.crowley 13 Dec 2022
  
  in Public
  
  Lee et al. - NeurIPS 2022 "Multi-Game Decision Transformers"
  
  reinforcement-learning transformers transfer-learning conf-neurips-2022 proj-minerl
Visit annotations in context

Tags

proj-minerl

conf-neurips-2022

transformers

reinforcement-learning

transfer-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2205.15241.pdf
arxiv.org arxiv.org

2210.00849.pdf

1
1. mark.crowley 13 Dec 2022
  
  in Public
  
  [Neumann, Gros, NeurIPS, 2022] - "SCALING LAWS FOR A MULTI-AGENT REINFORCEMENT LEARNING MODEL"
  
  reinforcement-learning marl multi-agent-reinforcement-learning conf-neurips-2022
Visit annotations in context

Tags

marl

reinforcement-learning

multi-agent-reinforcement-learning

conf-neurips-2022

Annotators

mark.crowley

URL

arxiv.org/pdf/2210.00849.pdf
Sep 2022
arxiv.org arxiv.org

2106.01345.pdf

1
1. mark.crowley 27 Sep 2022
  
  in Public
  
  We study whether sequence modeling can perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
  
  transformers offline-learning reinforcement-learning
Visit annotations in context

Tags

transformers

reinforcement-learning

offline-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
arxiv.org arxiv.org

2112.09099.pdf

1
1. mark.crowley 12 Sep 2022
  
  in Public
  
  AAAI 2022 paper: Decentralized Mean Field Games. Happy to discuss online.
  
  S. Ganapathi Subramanian, M. Taylor, M. Crowley, and P. Poupart, “Decentralized mean field games,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022), vol. 36, pp. 9439–9447, February 2022.
  
  reinforcement-learning marl
Visit annotations in context

Tags

marl

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2112.09099.pdf
Jul 2022
ieeexplore.ieee.org ieeexplore.ieee.org

IEEE Xplore Full-Text PDF:

1
1. mark.crowley 26 Jul 2022
  
  in Public
  
  A recent overview of RL methods used for autonomous driving.
  
  reinforcement-learning autonomous-driving
Visit annotations in context

Tags

reinforcement-learning

autonomous-driving

Annotators

mark.crowley

URL

ieeexplore.ieee.org/stamp/stamp.jsp
Jun 2022
assets.pubpub.org assets.pubpub.org

01652987005906.pdf

1
1. mark.crowley 04 Jun 2022
  
  in Public
  
  Discussion on
  
  Bellinger C, Drozdyuk A, Crowley M, Tamblyn I. Balancing Information with Observation Costs in Deep Reinforcement Learning. Proceedings of the Canadian Conference on Artificial Intelligence [Internet]. 2022 May 27; Available from: https://caiac.pubpub.org/pub/0jmy7gpd
  
  reinforcement-learning artificial-intelligence proj-chemgymrl digital-chemistry material-design national-research-council-of-canada CanAI2022
Visit annotations in context

Tags

digital-chemistry

proj-chemgymrl

artificial-intelligence

national-research-council-of-canada

CanAI2022

material-design

reinforcement-learning

Annotators

mark.crowley

URL

assets.pubpub.org/99r5anzw/01652987005906.pdf
May 2022
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov

15756507305185 1..25

1
1. mark.crowley 28 May 2022
  
  in Public
  
  Another piece to the "what can we do with eligibility traces" puzzle for Deep RL.
  
  reinforcement-learning eligibility-traces rl-course
Visit annotations in context

Tags

reinforcement-learning

eligibility-traces

rl-course

Annotators

mark.crowley

URL

ncbi.nlm.nih.gov/pmc/articles/PMC6897511/pdf/elife-47463.pdf
arxiv.org arxiv.org

Adaptive and Multiple Time-scale Eligibility Traces for Online Deep Reinforcement Learning

1
1. mark.crowley 28 May 2022
  
  in Public
  
  Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
  
  reinforcement-learning rl-course eligibility-traces
Visit annotations in context

Tags

reinforcement-learning

eligibility-traces

rl-course

Annotators

mark.crowley

URL

arxiv.org/pdf/2008.10040.pdf
arxiv.org arxiv.org

1810.09967v1.pdf

1
1. mark.crowley 28 May 2022
  
  in Public
  
  Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
  
  reinforcement-learning rl-course eligibility-traces
Visit annotations in context

Tags

reinforcement-learning

eligibility-traces

rl-course

Annotators

mark.crowley

URL

arxiv.org/pdf/1810.09967v1.pdf
arxiv.org arxiv.org

2102.03406.pdf

1
1. mark.crowley 27 May 2022
  
  in Public
  
  Hypothesis page to discuss this high level description of DeepMind's new Gato framework.
  
  reinforcement-learning rl-course artificial-intelligence
Visit annotations in context

Tags

reinforcement-learning

artificial-intelligence

rl-course

Annotators

mark.crowley

URL

arxiv.org/pdf/2509.25140
Mar 2022
arxiv.org arxiv.org

1907.13440.pdf

1
1. mark.crowley 23 Mar 2022
  
  in Public
  
  The paper that introduced the MineRL challenge dataset.
  
  reinforcement-learning
Visit annotations in context

Tags

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1907.13440.pdf
Jul 2021
psyarxiv.com psyarxiv.com

Choice-confirmation bias and gradual perseveration in human reinforcement learning

1
1. lucyparfitt16 08 Jul 2021
  
  in BehSci
  
  Palminteri, S. (2021). Choice-confirmation bias and gradual perseveration in human reinforcement learning [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/dpqj6
  
  is:preprint lang:en computational model confirmation bias reinforcement learning repetition bias behavioral science psychology modeling bias
Visit annotations in context

Tags

confirmation bias

reinforcement learning

computational model

psychology

modeling

lang:en

behavioral science

repetition bias

bias

is:preprint

Annotators

lucyparfitt16

URL

psyarxiv.com/dpqj6/
Jun 2021
psyarxiv.com psyarxiv.com

Reinforcement Learning Based Decision Support Tool For Epidemic Control

1
1. lucyparfitt16 30 Jun 2021
  
  in BehSci
  
  Chadi, M.-A., & Mousannif, H. (2021). Reinforcement Learning Based Decision Support Tool For Epidemic Control [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/tcr8s
  
  is:preprint lang:en COVID-19 epidemics control reinforcement learning modeling simulation vaccine epidemiology transmission economy intervention public health policy
Visit annotations in context

Tags

policy

reinforcement learning

transmission

economy

vaccine

modeling

intervention

public health

lang:en

COVID-19

epidemiology

is:preprint

simulation

epidemics control

Annotators

lucyparfitt16

URL

psyarxiv.com/tcr8s/
Mar 2021
www.opendemocracy.net www.opendemocracy.net

Neurocapitalism

1
1. davidk01 13 Mar 2021
  
  in Public
  
  Using chemicals to improve our economy of attention and become emotionally "fitter" is an option that penetrated public consciousness some time ago.
  
  Same is true of reinforcement learning algorithms.
  
  ai reinforcement learning algorithms
Visit annotations in context

Tags

reinforcement learning algorithms

ai

Annotators

davidk01

URL

opendemocracy.net/en/neurocapitalism/
Sep 2020
arxiv.org arxiv.org

The emergence of segregation: from observable markers to group specific norms

1
1. ErikStuchly 15 Sep 2020
  
  in BehSci
  
  Ozaita, J., Baronchelli, A., & Sánchez, A. (2020). The emergence of segregation: From observable markers to group specific norms. ArXiv:2009.05354 [Physics, q-Bio]. http://arxiv.org/abs/2009.05354
  
  is:preprint lang:en social trait social norm observable marker segregation emergence modeling ethnicity strategy conformity greed game reinforcement learning
Visit annotations in context

Tags

social trait

reinforcement learning

ethnicity

observable marker

modeling

conformity

social norm

lang:en

greed

game

is:preprint

segregation

emergence

strategy

Annotators

ErikStuchly

URL

arxiv.org/abs/2009.05354
journals.sagepub.com journals.sagepub.com

Self-Regulation Without Force: Can Awareness Leverage Reward to Drive Behavior Change? - Vera U. Ludwig, Kirk Warren Brown, Judson A. Brewer, 2020

1
1. ErikStuchly 08 Sep 2020
  
  in BehSci
  
  Ludwig, V. U., Brown, K. W., & Brewer, J. A. (2020). Self-Regulation Without Force: Can Awareness Leverage Reward to Drive Behavior Change? Perspectives on Psychological Science, 1745691620931460. https://doi.org/10.1177/1745691620931460
  
  is:article lang:en self-regulation awareness reward behavior change motivation value satisfaction reinforcement learning valuation sustainability behavioral science
Visit annotations in context

Tags

reinforcement learning

is:article

motivation

value

valuation

reward

awareness

satisfaction

lang:en

self-regulation

behavioral science

sustainability

behavior change

Annotators

ErikStuchly

URL

journals.sagepub.com/doi/abs/10.1177/1745691620931460
May 2020
psyarxiv.com psyarxiv.com

On the convergent validity of risk sensitivity measures

1
1. Marlene_Wulf 29 May 2020
  
  in BehSci
  
  Radulescu, A., Holmes, K., & Niv, Y. (2020). On the convergent validity of risk sensitivity measures [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/qdhx4
  
  is:preprint lang:en decision-making reinforcement learning risk sensitivity risk behavior trait measure instrument risk paradigm
Visit annotations in context

Tags

reinforcement learning

instrument

risk sensitivity

risk

decision-making

risk paradigm

trait

measure

lang:en

is:preprint

behavior

Annotators

Marlene_Wulf

URL

psyarxiv.com/qdhx4/
psyarxiv.com psyarxiv.com

Protection from uncertainty in the exploration/exploitation trade-off

1
1. Marlene_Wulf 29 May 2020
  
  in BehSci
  
  https://twitter.com/SciBeh/status/1255403798463471616
  
  is:preprint lang:en uncertainty exploration exploitation tradeoff attention cognition decision-making reinforcement learning
Visit annotations in context

Tags

learning

attention

exploration

tradeoff

decision-making

exploitation

uncertainty

cognition

lang:en

reinforcement

is:preprint

Annotators

Marlene_Wulf

URL

psyarxiv.com/5y643/
psyarxiv.com psyarxiv.com

Cognitive learning processes account for asymmetries in adaptations to new social norms

1
1. Marlene_Wulf 29 May 2020
  
  in BehSci
  
  Hertz, U. (2020). Cognitive learning processes account for asymmetries in adaptations to new social norms [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/7thku
  
  is:preprint lang:en COVID-19 reinforcement learning social norm social cognition pandemic adaption behavior computational modeling
Visit annotations in context

Tags

learning

computational modeling

adaption

social cognition

pandemic

social norm

lang:en

reinforcement

COVID-19

is:preprint

behavior

Annotators

Marlene_Wulf

URL

psyarxiv.com/7thku/
Apr 2020
psyarxiv.com psyarxiv.com

The elusive effects of incidental anxiety on reinforcement-learning

1
1. edampf 23 Apr 2020
  
  in BehSci
  
  Ting, C., Palminteri, S., Lebreton, M., & Engelmann, J. B. (2020, March 25). The elusive effects of incidental anxiety on reinforcement-learning. https://doi.org/10.31234/osf.io/7d4tc
  
  is:preprint lang:en anxiety computation modeling threat shock valence-induced bias learning context-dependent reinforcement neuroscience decision making
Visit annotations in context

Tags

learning

shock

decision making

computation

modeling

context-dependent

neuroscience

anxiety

lang:en

threat

valence-induced bias

reinforcement

is:preprint

Annotators

edampf

URL

psyarxiv.com/7d4tc/
Mar 2019
cjc.ict.ac.cn cjc.ict.ac.cn

liuq-201811662728.pdf

1
1. haiy 08 Mar 2019
  
  in Public
  
  A Survey of Deep Reinforcement Learning (深度强化学习综述)
  
  reinforcement-learning review
Visit annotations in context

Tags

review

reinforcement-learning

Annotators

haiy

URL

cjc.ict.ac.cn/online/onlinepaper/liuq-201811662728.pdf
cjc.ict.ac.cn cjc.ict.ac.cn

lq-2017119103322.pdf

1
1. haiy 08 Mar 2019
  
  in Public
  
  A Survey of Deep Reinforcement Learning (深度强化学习综述)
  
  reinforcement-learning tutorial
Visit annotations in context

Tags

tutorial

reinforcement-learning

Annotators

haiy

URL

cjc.ict.ac.cn/online/cre/lq-2017119103322.pdf
github.com github.com

dennybritz/reinforcement-learning

1
1. haiy 08 Mar 2019
  
  in Public
  
  reinforcement-learning code and paper tutorials
  
  reinforcement-learning valuable tutorial
Visit annotations in context

Tags

valuable

tutorial

reinforcement-learning

Annotators

haiy

URL

github.com/dennybritz/reinforcement-learning
Feb 2019
gitee.com gitee.com

SuttonBartoIPRLBook2ndEd.pdf

1
1. haiy 21 Feb 2019
  
  in Public
  
  reinforcement-learning book
Visit annotations in context

Tags

book

reinforcement-learning

Annotators

haiy

URL

gitee.com/arthurhu/pdfs/raw/master/deeplearning/SuttonBartoIPRLBook2ndEd.pdf
gitee.com gitee.com

强化学习在阿里的技术演进与业务创新.pdf (Technical Evolution and Business Innovation of Reinforcement Learning at Alibaba)

1
1. haiy 20 Feb 2019
  
  in Public
  
  reinforcement-learning
Visit annotations in context

Tags

reinforcement-learning

Annotators

haiy

URL

gitee.com/arthurhu/pdfs/raw/master/deeplearning/nlp/强化学习在阿里的技术演进与业务创新.pdf
gitee.com gitee.com

nips_oral6

1
1. haiy 20 Feb 2019
  
  in Public
  
  reinforcement-learning ppt
Visit annotations in context

Tags

reinforcement-learning

ppt

Annotators

haiy

URL

gitee.com/arthurhu/pdfs/raw/master/deeplearning/nlp/2017NIPS大会Facebook人工智能研究院演讲.pdf
gitee.com gitee.com

1709.02349.pdf

1
1. haiy 19 Feb 2019
  
  in Public
  
  We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.
  
  chatbot reinforcement-learning
Visit annotations in context

Tags

reinforcement-learning

chatbot

Annotators

haiy

URL

gitee.com/arthurhu/pdfs/raw/master/deeplearning/nlp/1709.02349.pdf