Hypothesis

79 Matching Annotations

May 2026
80000hours.org

Untitled document

1
1. fxp007 15 May 2026
  
  in Public
  
  Reinforcement learning is evil. This is not something new. People in AI safety have been talking about the fundamental flaw in training by reinforcement learning to achieve something in the world: it gives rise to the problems of instrumental goals and reward hacking.
  
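  This strong critique points to a fundamental flaw in reinforcement learning, namely the problems of instrumental goals and reward hacking, and raises serious questions about current AI training methods.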
  
  reinforcement learning reward hacking

Tags

reward hacking

reinforcement learning

Annotators

fxp007

URL

80000hours.org/podcast/episodes/yoshua-bengio-scientist-ai/
sakana.ai

Sakana AI

1
1. fxp007 08 May 2026
  
  in Public
  
  Traditional Reinforcement Learning (REINFORCE) failed because the gradients had a low signal-to-noise ratio due to binary rewards and weak parameter coupling.
  
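  Most people see reinforcement learning as the ideal approach to complex coordination problems, but the author states plainly that traditional RL methods fail completely on this kind of problem, challenging the mainstream use of RL for AI coordination.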
  
  non-consensus reinforcement-learning gradient-methods

Tags

gradient-methods

non-consensus

reinforcement-learning

Annotators

fxp007

URL

sakana.ai/trinity/
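The annotation blames REINFORCE's failure partly on binary rewards. A minimal sketch of that effect, assuming a hypothetical one-parameter Bernoulli policy (not Trinity's actual setup, which the annotation does not describe): the action is 1 with probability p = sigmoid(theta), and the reward is 1 only when the action is 1 and the task then succeeds with probability q. The single-episode estimator r * (a - p) has mean q * p(1 - p), and its signal-to-noise ratio simplifies to sqrt(pq / (1 - pq)), which shrinks toward zero as successes get rarer. The names `reinforce_grads` and `analytic_snr` are illustrative.

```python
import math
import random
import statistics


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_grads(theta: float, q: float, n: int, seed: int = 0) -> list[float]:
    """Per-episode REINFORCE gradient estimates for a toy Bernoulli policy.

    Hypothetical setup: action a ~ Bernoulli(sigmoid(theta)); the binary reward
    is 1 only if a == 1 and the task then succeeds (probability q), else 0.
    """
    rng = random.Random(seed)
    p = sigmoid(theta)
    grads = []
    for _ in range(n):
        a = 1 if rng.random() < p else 0
        r = 1.0 if (a == 1 and rng.random() < q) else 0.0
        # Score function of Bernoulli(sigmoid(theta)): d/dtheta log pi(a) = a - p
        grads.append(r * (a - p))
    return grads


def analytic_snr(theta: float, q: float) -> float:
    """Mean / std of the single-episode estimator r * (a - p); equals sqrt(pq / (1 - pq))."""
    p = sigmoid(theta)
    return math.sqrt(p * q / (1.0 - p * q))


if __name__ == "__main__":
    for q in (0.5, 0.02):
        g = reinforce_grads(theta=0.0, q=q, n=20_000)
        empirical = statistics.mean(g) / statistics.stdev(g)
        print(f"q={q}: analytic SNR={analytic_snr(0.0, q):.3f}, empirical={empirical:.3f}")
```

Averaging a batch of N independent episodes raises the SNR only by a factor of sqrt(N), so rare binary successes call for very large batches before the gradient direction becomes reliable.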
openai.com

https://openai.com/index/where-the-goblins-came-from/

1
1. fxp007 01 May 2026
  
  in Public
  
  The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them.
  
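  Key concept: reinforcement learning can cause behaviors to generalize, so a behavior learned under one specific condition may also show up in other contexts.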
  
  key-concept reinforcement-learning

Tags

reinforcement-learning

key-concept

Annotators

fxp007

URL

openai.com/index/where-the-goblins-came-from/
Apr 2026
nrehiew.github.io

https://nrehiew.github.io/blog/minimal_editing/

1
1. fxp007 23 Apr 2026
  
  in Public
  
  The fact that the RL model has larger improvements on Levenshtein Distance and Added Cognitive Complexity than on Pass@1 is further evidence that it is not just memorizing corruption reversals but has actually generalized to minimal editing.
  
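  Most people assume reinforcement learning models can only memorize specific cases, but the author finds that on the minimal-editing task the RL model does more than memorize: it generalizes to a broader range of scenarios.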
  
  counterintuitive reinforcement-learning generalization

Tags

generalization

counterintuitive

reinforcement-learning

Annotators

fxp007

URL

nrehiew.github.io/blog/minimal_editing/
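Levenshtein distance, one of the metrics in the highlight, is the minimum number of single-element insertions, deletions, and substitutions that turn one sequence into another, so a smaller value means a more minimal edit. A minimal dynamic-programming sketch: the annotation does not say whether the blog measures it over characters, tokens, or lines, so the function accepts any sequence, and the line-level code snippets in the example are made up.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between a[:i-1] and b[:j] (the previous DP row).
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


# Hypothetical fix to "return x+1": a minimal edit versus a broader rewrite, compared line by line.
original = "def f(x):\n    return x+1\n".splitlines()
minimal = "def f(x):\n    return x + 1\n".splitlines()
rewrite = "def f(value):\n    result = value + 1\n    return result\n".splitlines()
print(levenshtein(original, minimal), levenshtein(original, rewrite))  # 1 3
```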
arxiv.org

https://arxiv.org/abs/2604.02869

1
1. fxp007 08 Apr 2026
  
  in Public
  
  our GTPO hybrid advantage formulation eliminates the advantage misalignment problem
  
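  Most people assume that computing and optimizing the advantage function in reinforcement learning is relatively straightforward, but the author identifies an "advantage misalignment problem" and proposes the GTPO hybrid advantage formulation to solve it. This challenges a basic assumption in reinforcement learning, showing that even a core concept like the advantage function needs careful design to work well in multi-turn tasks.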
  
  non-consensus reinforcement-learning algorithm-design

Tags

non-consensus

reinforcement-learning

algorithm-design

Annotators

fxp007

URL

arxiv.org/abs/2604.02869
Nov 2025
www.mextesol.net

1-Bisai-Singh

1
1. thelivingnexus 02 Nov 2025
  
  in Public
  
  Hence, in a translanguaging classroom, one language is used to reinforce the performance in other languages and students learn many new words from each other which enrich their vocabulary.
  
  Finding: Reinforcement across languages enriches vocabulary and performance (p. 9). Why it matters: Ties CUP to observed learning gains: useful for my analysis.
  
  findings reinforcement vocabulary

Tags

reinforcement

findings

vocabulary

Annotators

thelivingnexus

URL

mextesol.net/journal/public/files/ae98620fbd4f9c4247095f5d97c2faf0.pdf
Oct 2025
www.youtube.com

YouTube

1
1. stopresetgo 07 Oct 2025
  
  in Public
  
  for - like - Michael Levin - Richard Sutton - YouTube interview
  
  Summary - interesting talk on learning - reminds me of Michael Levin's work - the priority is on goal-directed activity
  
  Reinforcement Learning Richard Sutton adjacency - Richard Sutton - Michael Levin

Tags

Reinforcement Learning

Richard Sutton

adjacency - Richard Sutton - Michael Levin

Annotators

stopresetgo

URL

youtube.com/@halfasinteresting
Jun 2025
www.cs.toronto.edu

dqn.pdf

1
1. mark.crowley 09 Jun 2025
  
  in Public
  
  Playing Atari with Deep Reinforcement Learning 19 Dec 2013 · Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
  
  The paper from 2013 that introduced the DQN algorithm for using Deep Learning with Reinforcement Learning to play Atari games.
  
  reinforcement-learning dqn atari-games deep-learning

Tags

atari-games

reinforcement-learning

deep-learning

dqn

Annotators

mark.crowley

URL

cs.toronto.edu/~vmnih/docs/dqn.pdf
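The core of DQN is a Q-learning regression target: y = r + gamma * max over a' of Q(s', a'), or just r when s' is terminal, computed with parameters held fixed during the update (the later Nature version moves this into a separate target network), and the loss is the squared gap between Q(s, a) and y. A minimal sketch over a made-up batch, assuming the Q-values are already available as plain lists; the network, replay memory, and the function names here are hypothetical or omitted.

```python
def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Q-learning targets y = r + gamma * max_a' Q(s', a'); no bootstrap past terminal states.

    next_q_values[i] lists Q(s'_i, a') for every action a', computed with the
    parameters held fixed for this update.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def td_loss(q_taken, targets):
    """Mean squared error between Q(s, a) for the actions actually taken and the targets."""
    return sum((y - q) ** 2 for q, y in zip(q_taken, targets)) / len(targets)


# Made-up minibatch of two transitions; the second one ends the episode.
targets = dqn_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.5)
print(targets, td_loss([2.0, 1.0], targets))  # [2.0, 0.0] 0.5
```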
May 2025
storage.googleapis.com

The%20Era%20of%20Experience%20Paper.pdf

1
1. mark.crowley 13 May 2025
  
  in Public
  
  Welcome to the Era of Experience David Silver, Richard S. Sutton
  
  Welcome to the Era of Experience David Silver, Richard S. Sutton
  
  "This is a preprint of a chapter that will appear in the book Designing an Intelligence, published by MIT Press"
  
  #reinforcement-learning

Tags

#reinforcement-learning

Annotators

mark.crowley

URL

storage.googleapis.com/deepmind-media/Era-of-Experience /The Era of Experience Paper.pdf
Jan 2025
openreview.net

74_Mapping_Social_Choice_Theor.pdf

1
1. mark.crowley 31 Jan 2025
  
  in Public
  
  MAPPING SOCIAL CHOICE THEORY TO RLHF Jessica Dai and Eve Fleisig ICLR Workshop on Reliable and Responsible Foundation Models 2024
  
  Nice overview of how social choice theory has been connected to RLHF and AI alignment ideas.
  
  #ai-morality align rlhf llm #reinforcement-learning

Tags

#reinforcement-learning

#ai-morality

llm

align

rlhf

Annotators

mark.crowley

URL

openreview.net/pdf
Jul 2024
en.wikipedia.org

Monte Carlo tree search - Wikipedia

2
1. mark.crowley 22 Jul 2024
  
  in Public
  
  Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT
  
  The UCB algorithm for bandits comes back again as UCT, forming the basis for node selection and value estimation in MCTS.
  
  reinforcement-learning ece457c
2. mark.crowley 22 Jul 2024
  
  in Public
  
  The main difficulty in selecting child nodes is maintaining some balance between the exploitation of deep variants after moves with high average win rate and the exploration of moves with few simulations.
  
  Tree search makes this tradeoff very clear: how many paths will you explore before you stop and use the knowledge you already have?
  
  ece457c reinforcement-learning

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

en.wikipedia.org/wiki/Monte_Carlo_tree_search
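UCT applies the UCB1 bandit rule at every tree node: pick the child maximizing its average value plus an exploration bonus c * sqrt(ln N / n), where N is the parent's visit count and n the child's. That bonus is exactly the exploitation/exploration balance described in the second highlight. A minimal sketch of the selection step only, assuming a hypothetical `Node` type with visit counts and summed simulation results (expansion, rollout, and backup are omitted) and c = sqrt(2), a common default:

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    visits: int = 0
    value_sum: float = 0.0  # sum of simulation results backed up through this node
    children: dict = field(default_factory=dict)  # move -> Node


def uct_score(parent: Node, child: Node, c: float = math.sqrt(2)) -> float:
    if child.visits == 0:
        return math.inf  # always try unvisited moves first
    exploit = child.value_sum / child.visits  # average win rate
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def select_move(parent: Node, c: float = math.sqrt(2)):
    return max(parent.children, key=lambda m: uct_score(parent, parent.children[m], c))


# Made-up statistics: "a" is well explored with 60% wins, "b" barely explored with 40% wins.
root = Node(visits=20, children={"a": Node(15, 9.0), "b": Node(5, 2.0)})
print(select_move(root), select_move(root, c=0.5))  # b a
```

With the default c the bonus on the rarely tried move "b" wins out; shrinking c to 0.5 tips the balance toward exploiting the higher average of "a".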
www.nature.com

Mastering the game of Go with deep neural networks and tree search

1
1. mark.crowley 22 Jul 2024
  
  in Public
  
  The summary paper for AlphaGo.
  
  ece457c reinforcement-learning

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

nature.com/articles/nature16961.pdf
en.wikipedia.org

AlphaZero - Wikipedia

1
1. mark.crowley 22 Jul 2024
  
  in Public
  
  Wikipedia: AlphaZero
  
  ece457c reinforcement-learning

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

en.wikipedia.org/wiki/AlphaZero
arxiv.org

2403.07691.pdf

1
1. mark.crowley 16 Jul 2024
  
  in Public
  
  2024 paper arguing that methods other than PPO could be better for "value alignment" of LLMs
  
  reinforcement-learning ppo ece457c

Tags

ppo

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

arxiv.org/pdf/2403.07691
arxiv.org

Deep Reinforcement Learning that Matters

1
1. mark.crowley 15 Jul 2024
  
  in Public
  
  Paper "Deep Reinforcement Learning that Matters" on evaluating RL algorithms.
  
  reinforcement-learning ece457c

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

arxiv.org/pdf/1709.06560
Feb 2024
arxiv.org

2205.08192.pdf

1
1. mark.crowley 18 Feb 2024
  
  in Public
  
  T. Herlau, "Moral Reinforcement Learning Using Actual Causation," 2022 2nd International Conference on Computer, Control and Robotics (ICCCR), Shanghai, China, 2022, pp. 179-185, doi: 10.1109/ICCCR54399.2022.9790262. keywords: {Digital control;Ethics;Costs;Philosophical considerations;Toy manufacturing industry;Reinforcement learning;Forestry;Causality;Reinforcement learning;Actual Causation;Ethical reinforcement learning}
  
  ai-ethics ai-morality reinforcement-learning

Tags

ai-morality

reinforcement-learning

ai-ethics

Annotators

mark.crowley

URL

arxiv.org/pdf/2205.08192.pdf
pdf.sciencedirectassets.com

Can model-free reinforcement learning explain deontological moral judgments?

1
1. mark.crowley 18 Feb 2024
  
  in Public
  
  Can model-free reinforcement learning explain deontological moral judgments? Alisabeth Ayars, University of Arizona, Dept. of Psychology, Tucson, AZ, USA
  
  ai-morality ai-ethics reinforcement-learning

Tags

ai-morality

reinforcement-learning

ai-ethics

Annotators

mark.crowley

URL

pdf.sciencedirectassets.com/271061/1-s2.0-S0010027716X00030/1-s2.0-S0010027716300300/am.pdf
Nov 2023
proceedings.mlr.press

janner22a.pdf

1
1. mark.crowley 24 Nov 2023
  
  in Public
  
  Reading this one on Nov 27, 2023 for the reading group.
  
  rdgrp-f23 reinforcement-learning transformers

Tags

transformers

rdgrp-f23

reinforcement-learning

Annotators

mark.crowley

URL

proceedings.mlr.press/v162/janner22a/janner22a.pdf
proceedings.neurips.cc

NeurIPS-2021-offline-reinforcement-learning-as-one-big-sequence-modeling-problem-Paper.pdf

1
1. mark.crowley 24 Nov 2023
  
  in Public
  
  Reading this one on Nov 27, 2023 for the reading group.
  
  rdgrp-f23 reinforcement-learning transformers

Tags

transformers

rdgrp-f23

reinforcement-learning

Annotators

mark.crowley

URL

proceedings.neurips.cc/paper_files/paper/2021/file/099fe6b0b444c23836c4a5d07346082b-Paper.pdf
Oct 2023
arxiv.org

2106.01345.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  (Chen, NeurIPS, 2021) Chen, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, and Mordatch. "Decision Transformer: Reinforcement Learning via Sequence Modeling". arXiv preprint arXiv:2106.01345v2, June, 2021.
  
  Quickly became a very influential paper, with a new idea for learning generative models of action prediction: supervised training on sequences of returns, states, and actions from demonstration trajectories. There is no optimization of actions or rewards; instead, the target return is an input.
  
  reinforcement-learning transformers generative-models minecraft minerl rdgrp-f23 reading_group_crowley

Tags

reading_group_crowley

minecraft

reinforcement-learning

minerl

transformers

rdgrp-f23

generative-models

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
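In Decision Transformer the "target reward" input is the return-to-go: the undiscounted sum of rewards from each timestep to the end of the trajectory, interleaved with states and actions as the model's input tokens. A minimal sketch of building that sequence from a logged trajectory; the token tuples and the made-up trajectory are illustrative only, not the paper's data format.

```python
from itertools import accumulate


def returns_to_go(rewards):
    """R_t = r_t + r_{t+1} + ... + r_T (undiscounted), via a reverse cumulative sum."""
    return list(accumulate(reversed(rewards)))[::-1]


def dt_tokens(states, actions, rewards):
    """Interleave (return-to-go, state, action) per timestep, in the order the model reads them."""
    seq = []
    for rtg, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("R", rtg), ("s", s), ("a", a)])
    return seq


print(returns_to_go([1, 0, 2]))  # [3, 2, 2]
print(dt_tokens(["s0", "s1"], ["a0", "a1"], [0.0, 1.0]))
```

At evaluation time the first return-to-go is set to the return you want the agent to achieve and is decreased by each reward actually received, which is how the target return acts as a conditioning input.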
arxiv.org

2305.15486.pdf

2
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Wu, Prabhumoye, Yeon Min, Bisk, Salakhutdinov, Azaria, Mitchell and Li. "SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning". Arxiv preprint arXiv:2305.15486v2, May, 2023.
  
  reinforcement-learning nlp large-language-models chatgpt minecraft evaluation-methods rdgrp-f23
2. mark.crowley 25 Oct 2023
  
  in Public
  
  Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training.
  
  Them's fightin' words!
  
  I haven't read it yet, but we're putting it on the list for this fall's reading group. Seriously, a strong result with a very strong implied claim. They are careful to say it comes from their empirical results; very much worth a look. I suspect the amount of implicit knowledge in the papers, text, and DAG is helping here.
  
  The Big Question: is their comparison to RL baselines fair? Are the baselines trained from scratch? What does a fair comparison mean between any from-scratch model (RL or supervised) and an LLM approach (or any approach using a foundation model), when that model is not really starting from scratch?
  
  reinforcement-learning rdgrp-f23 reading_group_crowley nlp larg deep-learning self-supervised supervised-learning evaluation-methods

Tags

minecraft

nlp

rdgrp-f23

supervised-learning

reading_group_crowley

large-language-models

evaluation-methods

deep-learning

chatgpt

reinforcement-learning

self-supervised

larg

Annotators

mark.crowley

URL

arxiv.org/pdf/2305.15486.pdf
arxiv.org

2203.02155.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Training language models to follow instructions with human feedback
  
  Original paper for discussion of the Reinforcement Learning from Human Feedback (RLHF) algorithm.
  
  large-language-models reinforcement-learning chatgpt

Tags

large-language-models

reinforcement-learning

chatgpt

Annotators

mark.crowley

URL

arxiv.org/pdf/2203.02155
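In this paper's RLHF pipeline, a reward model r(x, y) is first trained from human rankings of completions with a pairwise loss, -log sigmoid(r(x, y_w) - r(x, y_l)), where y_w is the preferred completion; the policy is then optimized against that learned reward with PPO. A minimal sketch of the pairwise loss only, with made-up scalar scores standing in for a reward model's outputs; the function names are hypothetical.

```python
import math


def pairwise_reward_loss(r_preferred: float, r_rejected: float) -> float:
    """-log sigmoid(r_w - r_l), computed as softplus(-(r_w - r_l)) for numerical stability."""
    margin = r_preferred - r_rejected
    # log(1 + exp(-margin)), rearranged so exp() never overflows for large |margin|
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def mean_loss(pairs):
    """Average over (preferred, rejected) score pairs, e.g. all pairs drawn from one ranking."""
    return sum(pairwise_reward_loss(w, l) for w, l in pairs) / len(pairs)


print(pairwise_reward_loss(2.0, 0.0), pairwise_reward_loss(0.0, 2.0))
```

The loss is small when the preferred completion already scores higher and grows roughly linearly with the margin when the ranking is violated, so only the difference between scores matters.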
arxiv.org

2209.07550.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  [Kapturowski, DeepMind, Sep 2022] "Human-level Atari 200x Faster"
  
  Improving on the 2020 Agent57 performance to be more efficient.
  
  Arxiv: https://arxiv.org/abs/2209.07550
  
  reinforcement-learning atari-games ece457c to-read

Tags

atari-games

reinforcement-learning

to-read

ece457c

Annotators

mark.crowley

URL

arxiv.org/pdf/2209.07550.pdf
Sep 2023
arxiv.org

1908.01046.pdf

1
1. mark.crowley 15 Sep 2023
  
  in Public
  
  Adaptive Stress Testing with Reward Augmentation for Autonomous Vehicle Validation
  
  autonomous-driving multi-agent-reinforcement-learning black-box-testing

Tags

autonomous-driving

black-box-testing

multi-agent-reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1908.01046.pdf
Jul 2023
proceedings.mlr.press

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
  
  (Espeholt, ICML, 2018) "IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures"
  
  reinforcement-learning impala

Tags

impala

reinforcement-learning

Annotators

mark.crowley

URL

proceedings.mlr.press/v80/espeholt18a/espeholt18a.pdf
proceedings.mlr.press

Deterministic Policy Gradient Algorithms

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  This paper introduced the DPG algorithm.
  
  DPG reinforcement-learning

Tags

DPG

reinforcement-learning

Annotators

mark.crowley

URL

proceedings.mlr.press/v32/silver14.pdf
openreview.net

babyai_a_platform_to_study_the.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Link to page with information about the paper: https://openreview.net/forum?id=rJeXCo0cYX
  
  reinforcement-learning curriculum-learning grid-world babyai

Tags

curriculum-learning

reinforcement-learning

grid-world

babyai

Annotators

mark.crowley

URL

openreview.net/pdf
openreview.net

a_path_towards_autonomous_mach.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Yann LeCun released his vision for the future of Artificial Intelligence research in 2022, and it sounds a lot like Reinforcement Learning.
  
  reinforcement-learning agi

Tags

agi

reinforcement-learning

Annotators

mark.crowley

URL

openreview.net/pdf
arxiv.org

Deep Reinforcement Learning with Double Q-learning

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Paper that applied the existing Double Q-learning algorithm to the new DQN approach, showed that DQN overestimates action values, and validated that the double estimator is very effective at reducing this in the deep RL realm.
  
  reinforcement-learning dqn deep-learning

Tags

reinforcement-learning

deep-learning

dqn

Annotators

mark.crowley

URL

arxiv.org/pdf/1509.06461v3
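The change Double DQN makes is confined to the target: the online network selects the next action and the target network evaluates it, y = r + gamma * Q_target(s', argmax over a' of Q_online(s', a')), instead of one network doing both through a max, which tends to pick up and amplify overestimated values. A minimal sketch contrasting the two targets for one non-terminal transition; the Q-values and function names are made up for illustration.

```python
def dqn_target(r, q_target_next, gamma=0.99):
    """Standard DQN: the target network both selects and evaluates the next action."""
    return r + gamma * max(q_target_next)


def double_dqn_target(r, q_online_next, q_target_next, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return r + gamma * q_target_next[best]


# Made-up Q(s', .) values where the target network overestimates action 1.
q_online_next = [1.0, 0.2]
q_target_next = [0.5, 2.0]
print(dqn_target(0.0, q_target_next, gamma=0.5),
      double_dqn_target(0.0, q_online_next, q_target_next, gamma=0.5))  # 1.0 0.25
```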
arxiv.org

Continuous control with deep reinforcement learning

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  This paper introduces the DDPG algorithm, which builds on the existing DPG algorithm (Silver et al., 2014) and brings it to deep RL for continuous action spaces. The main idea is to use a deterministic (or nearly deterministic) policy for situations where the environment is very sensitive to suboptimal actions and one action setting usually dominates in each state. This showed good performance but could not beat algorithms such as PPO until the ideas in SAC came along. SAC, a related off-policy actor-critic method with a stochastic policy, adds an entropy bonus to the objective, which rewards the policy for keeping its action distribution spread out (rather than penalizing uncertainty) and so encourages exploration. With this, the off-policy actor-critic approach performs well.
  
  ddpg reinforcement-learning SAC DPG PPO

Tags

SAC

DPG

ddpg

PPO

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1509.02971
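Two pieces of DDPG are easy to show in isolation: the critic's target uses the deterministic target actor instead of a max over actions, y = r + gamma * Q'(s', mu'(s')), and the target networks trail the learned ones through soft (Polyak) updates, theta' <- tau * theta + (1 - tau) * theta'. A minimal sketch with a hypothetical linear actor and critic over scalar states and actions; the parameter values are made up, tau = 0.005 is only an illustrative default, and the replay buffer and gradient steps are omitted.

```python
def actor(params, s):
    """Hypothetical linear deterministic policy: mu(s) = w * s + b."""
    w, b = params
    return w * s + b


def critic(params, s, a):
    """Hypothetical linear critic: Q(s, a) = u * s + v * a + c."""
    u, v, c = params
    return u * s + v * a + c


def ddpg_target(r, s_next, done, actor_targ, critic_targ, gamma=0.99):
    """y = r + gamma * Q'(s', mu'(s')): the target actor picks a', no max over actions."""
    if done:
        return r
    a_next = actor(actor_targ, s_next)
    return r + gamma * critic(critic_targ, s_next, a_next)


def soft_update(target, online, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target, elementwise."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


print(ddpg_target(1.0, 3.0, False, (2.0, 1.0), (1.0, 0.5, 0.0), gamma=0.5))  # 4.25
print(soft_update([0.0, 4.0], [1.0, 0.0], tau=0.25))  # [0.25, 3.0]
```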
arxiv.org

1710.02298.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  This famous paper gives a great review of the DQN algorithm a couple of years after it changed everything in Deep RL. It compares six different extensions to DQN for Deep Reinforcement Learning, many of which have now become standard additions to DQN and other Deep RL algorithms. It also combines all of them to produce the "Rainbow" algorithm, which outperformed many other models for a while.
  
  reinforcement-learning experimental-design

Tags

reinforcement-learning

experimental-design

Annotators

mark.crowley

URL

arxiv.org/pdf/1710.02298
arxiv.org

2104.10986.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Arxiv paper from 2021 on reinforcement learning in a scenario where your aim is to learn a workable POMDP policy, but you start with a fully observable MDP and adjust it over time towards a POMDP.
  
  reinforcement-learning pomdp mdp

Tags

reinforcement-learning

pomdp

mdp

Annotators

mark.crowley

URL

arxiv.org/pdf/2104.10986.pdf
arxiv.org

1707.06347.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Paper that introduced the PPO algorithm. PPO is, in a way, a response to the TRPO algorithm, trying to use the core idea but implement a more efficient and simpler algorithm.
  
  TRPO frames each policy update as a straight constrained optimization problem: maximize a surrogate objective subject to a limit on the KL divergence from the previous policy.
  
  ppo reinforcement-learning policy-gradients trpo

Tags

ppo

reinforcement-learning

trpo

policy-gradients

Annotators

mark.crowley

URL

arxiv.org/pdf/1707.06347
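PPO keeps TRPO's idea of not moving the policy too far in one update but enforces it with a clipped surrogate objective instead of a hard KL constraint: L = E[min(rho * A, clip(rho, 1 - eps, 1 + eps) * A)], where rho = pi_new(a|s) / pi_old(a|s) and A is an advantage estimate. A minimal per-sample sketch, assuming log-probabilities and advantages are already computed; eps = 0.2 is a commonly used value, and the function names are hypothetical.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Per-sample clipped surrogate min(rho * A, clip(rho, 1 - eps, 1 + eps) * A), to be maximized."""
    ratio = math.exp(logp_new - logp_old)  # rho = pi_new(a|s) / pi_old(a|s)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Taking the min makes the objective a pessimistic bound: no extra credit for
    # pushing rho beyond the clip range, but the full penalty when the change hurts.
    return min(ratio * advantage, clipped * advantage)


def batch_objective(samples, eps=0.2):
    """Average over (logp_new, logp_old, advantage) samples."""
    return sum(ppo_clip_objective(*s, eps=eps) for s in samples) / len(samples)


# rho = 1.5 with a positive advantage is capped at 1.2; with a negative one it is not.
print(ppo_clip_objective(math.log(0.6), math.log(0.4), 1.0),
      ppo_clip_objective(math.log(0.6), math.log(0.4), -1.0))
```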
arxiv.org

2206.11795.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Bowen Baker et al. (OpenAI) "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos", arXiv, June 2022.
  
  Introduction of VPT: a new semi-supervised pre-trained model for sequential decision making in Minecraft. The data come from human video playthroughs but are unlabelled.
  
  reinforcement-learning foundation-models pretrained-models proj-minerl minecraft

Tags

pretrained-models

minecraft

proj-minerl

reinforcement-learning

foundation-models

Annotators

mark.crowley

URL

arxiv.org/pdf/2206.11795.pdf
arxiv.org

Liang15.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"
  
  Response paper to DQN showing that well-designed value function approximations can also do well at these complex tasks without the use of deep learning.
  
  A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!
  
  dqn reinforcement-learning atari-games deep-learning shallow-learning

Tags

atari-games

dqn

shallow-learning

reinforcement-learning

deep-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1512.01563
arxiv.org

1511.05952.pdf

1
1. mark.crowley 10 Jul 2023
  
  in Public
  
  Tom Schaul, John Quan, Ioannis Antonoglou and David Silver. "PRIORITIZED EXPERIENCE REPLAY", ICLR, 2016.
  
  reinforcement-learning ppo deep-learning deep-rl policy-gradient direct-policy-search trust-region

Tags

ppo

reinforcement-learning

policy-gradient

direct-policy-search

trust-region

deep-rl

deep-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1511.05952.pdf
Jun 2023
www.fandm.edu

617813975725918530-aamas2016-shallow-rl.pdf

1
1. mark.crowley 16 Jun 2023
  
  in Public
  
  Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"
  
  A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!
  
  reinforcement-learning dqn deep-learning shallow-learning atari-games uwece457C

Tags

atari-games

dqn

shallow-learning

reinforcement-learning

uwece457C

deep-learning

Annotators

mark.crowley

URL

fandm.edu/uploads/files/617813975725918530-aamas2016-shallow-rl.pdf
assets.pubpub.org

01621566588509.pdf

1
1. mark.crowley 09 Jun 2023
  
  in Public
  
  LeBlanc, D. G., & Lee, G. (2021). General Deep Reinforcement Learning in NES Games. Canadian AI 2021. Canadian Artificial Intelligence Association (CAIAC). https://doi.org/10.21428/594757db.8472938b
  
  canadian-ai reinforcement-learning video-games

Tags

canadian-ai

reinforcement-learning

video-games

Annotators

mark.crowley

URL

assets.pubpub.org/uonw8d4k/01621566588509.pdf
Apr 2023
arxiv.org

2206.11795.pdf

1
1. mark.crowley 12 Apr 2023
  
  in Public
  
  Bowen Baker et al. (OpenAI) "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos", arXiv, June 2022.
  
  A new supervised pre-trained model for sequential decision making in Minecraft. The data come from human video playthroughs but are unlabelled.
  
  reinforcement-learning foundation-models pretrained-models proj-minerl minecraft
  

Tags

pretrained-models

minecraft

proj-minerl

reinforcement-learning

foundation-models

Annotators

mark.crowley

URL

arxiv.org/pdf/2206.11795.pdf
inflecthealth.medium.com

I’m an ER doctor: Here’s what I found when I asked ChatGPT to diagnose my patients

1
1. tonz 06 Apr 2023
  
  in Public
  
  If my patient notes don’t include a question I haven’t yet asked, ChatGPT’s output will encourage me to keep missing that question. Like with my young female patient who didn’t know she was pregnant. If a possible ectopic pregnancy had not immediately occurred to me, ChatGPT would have kept enforcing that omission, only reflecting back to me the things I thought were obvious — enthusiastically validating my bias like the world’s most dangerous yes-man.
  
  What is missing from a prompt will not show up in the output. This may reinforce one's own blind spots and omissions, lowering the probability of an intuitive leap to other possibilities. The machine helps you search under the light you switched on with your prompt, regardless of whether you're searching in the right place.
  
  generativeai chatgpt healthcare reinforcement blindspot confirmationbias

Tags

chatgpt

generativeai

reinforcement

confirmationbias

healthcare

blindspot

Annotators

tonz

URL

inflecthealth.medium.com/im-an-er-doctor-here-s-what-i-found-when-i-asked-chatgpt-to-diagnose-my-patients-7829c375a9da
Mar 2023
arxiv.org

2010.03950.pdf

1
1. mark.crowley 07 Mar 2023
  
  in Public
  
  asks for the Minecraft domain.
  
  They demonstrate the model on a "minecraft-like" domain (introduced earlier by someone else) where there are resources in the world and the agent has tasks.
  
  minecraft reinforcement-learning

Tags

minecraft

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2010.03950
Feb 2023
arxiv.org

2010.03950.pdf

4
1. mark.crowley 16 Feb 2023
  
  in Public
  
  Definition 3.2 (simple reward machine).
  
  The MDP does not change: its dynamics are the same with or without the RM, just as they are with or without a standard reward model. Additionally, the rewards from the RM can be non-Markovian with respect to the MDP because the RM inherently has a kind of memory of where you've been, limited to the agent's "movement" (almost "in its mind") through the goals for this task.
  
  reinforcement-learning reward-machines
2. mark.crowley 16 Feb 2023
  
  in Public
  
  [W]e then show that an RM can be interpreted as specifying a single reward function over a larger state space, and consider types of reward functions that can be expressed using RMs
  
  So by specifying a reward machine, you are augmenting the state space of the MDP with higher-level goals, subgoals, and concepts that provide structure about what is good and what isn't.
  
  reinforcement-learning reward-machines
3. mark.crowley 16 Feb 2023
  
  in Public
  
  However, an agent that had access to the specification of the reward function might be able to use such information to learn optimal policies faster.
  
  Fascinating idea: why not? Why are we really hiding the reward from the agent?
  
  reinforcement-learning reward-machines
4. mark.crowley 02 Feb 2023
  
  in Public
  
  Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
  
  [Icarte, JAIR, 2022] "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning"
  
  reinforcement-learning reward-machines

Tags

reward-machines

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2010.03950
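Following the annotations' reading of Definition 3.2, a simple reward machine is a finite-state machine whose transitions fire on high-level events observed in the environment and emit rewards; pairing each environment state with the current RM state gives the larger state space in which the reward becomes Markovian again. A minimal sketch, assuming a hypothetical "get coffee, then reach the office" task with one event label per step; the event names, the dictionary encoding, and the `RewardMachine` class are illustrative, not the paper's code.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    initial: str
    # (rm_state, event) -> (next_rm_state, reward); unlisted pairs self-loop with reward 0
    delta: dict
    terminal: frozenset = field(default_factory=frozenset)

    def step(self, u, event):
        return self.delta.get((u, event), (u, 0.0))


# Hypothetical task: pick up coffee, then reach the office; reward only on delivery.
coffee_rm = RewardMachine(
    initial="u0",
    delta={("u0", "coffee"): ("u1", 0.0), ("u1", "office"): ("done", 1.0)},
    terminal=frozenset({"done"}),
)


def run(rm, env_trace):
    """Follow the product state (s, u) along a trace of (env_state, event) pairs."""
    u, total, product_states = rm.initial, 0.0, []
    for s, event in env_trace:
        u, r = rm.step(u, event)
        total += r
        product_states.append((s, u))
        if u in rm.terminal:
            break
    return total, product_states


trace_a = [("c1", "coffee"), ("hall", None), ("o1", "office")]
trace_b = [("hall", None), ("o1", "office")]
print(run(coffee_rm, trace_a)[0], run(coffee_rm, trace_b)[0])  # 1.0 0.0
```

The same environment state "o1" earns reward 1.0 in one trace and 0.0 in the other, so the reward is non-Markovian in the environment state alone but Markovian in the product state (s, u).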
proceedings.mlr.press

Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning

1
1. mark.crowley 16 Feb 2023
  
  in Public
  
  Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
  
  [Icarte, PMLR, 2018] "Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning"
  
  reinforcement-learning reward-machines

Tags

reward-machines

reinforcement-learning

Annotators

mark.crowley

URL

proceedings.mlr.press/v80/icarte18a/icarte18a.pdf
Dec 2022
arxiv.org

2205.15241.pdf

1
1. mark.crowley 13 Dec 2022
  
  in Public
  
  Lee et al. - NeurIPS 2022 "Multi-Game Decision Transformers"
  
  reinforcement-learning transformers transfer-learning conf-neurips-2022 proj-minerl

Tags

proj-minerl

transfer-learning

transformers

reinforcement-learning

conf-neurips-2022

Annotators

mark.crowley

URL

arxiv.org/pdf/2205.15241.pdf
arxiv.org

2210.00849.pdf

1
1. mark.crowley 13 Dec 2022
  
  in Public
  
  [Neumann, Gros, NeurIPS, 2022] - "SCALING LAWS FOR A MULTI-AGENT REINFORCEMENT LEARNING MODEL"
  
  reinforcement-learning marl multi-agent-reinforcement-learning conf-neurips-2022

Tags

marl

reinforcement-learning

conf-neurips-2022

multi-agent-reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2210.00849.pdf
Sep 2022
arxiv.org

2106.01345.pdf

1
1. mark.crowley 27 Sep 2022
  
  in Public
  
  We study whether sequence modeling can perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
  
  transformers offline-learning reinforcement-learning

Tags

transformers

reinforcement-learning

offline-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
arxiv.org

2112.09099.pdf

1
1. mark.crowley 12 Sep 2022
  
  in Public
  
  AAAI 2022 paper: Decentralized Mean Field Games. Happy to discuss online.
  
  S. Ganapathi Subramanian, M. Taylor, M. Crowley, and P. Poupart., “Decentralized mean field games,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022), vol. 36, pp. 9439–9447, February 2022.
  
  reinforcement-learning marl

Tags

marl

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2112.09099.pdf
Jul 2022
ieeexplore.ieee.org

IEEE Xplore Full-Text PDF:

1
1. mark.crowley 26 Jul 2022
  
  in Public
  
  A recent overview of RL methods used for autonomous driving.
  
  reinforcement-learning autonomous-driving

Tags

autonomous-driving

reinforcement-learning

Annotators

mark.crowley

URL

ieeexplore.ieee.org/stamp/stamp.jsp
Jun 2022
assets.pubpub.org

01652987005906.pdf

1
1. mark.crowley 04 Jun 2022
  
  in Public
  
  Discussion on
  
  Bellinger C, Drozdyuk A, Crowley M, Tamblyn I. Balancing Information with Observation Costs in Deep Reinforcement Learning. Proceedings of the Canadian Conference on Artificial Intelligence [Internet]. 2022 May 27; Available from: https://caiac.pubpub.org/pub/0jmy7gpd
  
  reinforcement-learning artificial-intelligence proj-chemgymrl digital-chemistry material-design national-research-council-of-canada CanAI2022

Tags

digital-chemistry

artificial-intelligence

proj-chemgymrl

CanAI2022

reinforcement-learning

material-design

national-research-council-of-canada

Annotators

mark.crowley

URL

assets.pubpub.org/99r5anzw/01652987005906.pdf
May 2022
www.ncbi.nlm.nih.gov

15756507305185 1..25

1
1. mark.crowley 28 May 2022
  
  in Public
  
  Another piece of the "what can we do with eligibility traces" puzzle for Deep RL.
  
  reinforcement-learning eligibility-traces rl-course

Tags

eligibility-traces

reinforcement-learning

rl-course

Annotators

mark.crowley

URL

ncbi.nlm.nih.gov/pmc/articles/PMC6897511/pdf/elife-47463.pdf
arxiv.org

Adaptive and Multiple Time-scale Eligibility Traces for Online Deep Reinforcement Learning

1
1. mark.crowley 28 May 2022
  
  in Public
  
  Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
  
  reinforcement-learning rl-course eligibility-traces

Tags

eligibility-traces

reinforcement-learning

rl-course

Annotators

mark.crowley

URL

arxiv.org/pdf/2008.10040.pdf
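For reference on what eligibility traces do in the classic tabular setting: every visited state keeps a trace that decays by gamma * lambda per step, and each TD error updates all states in proportion to their traces, so credit flows back along the trajectory. A minimal sketch of TD(lambda) prediction with accumulating traces on one made-up episode; note that the updates depend on processing transitions in order within an episode. The function name and step sizes are illustrative.

```python
from collections import defaultdict


def td_lambda_episode(V, episode, alpha=0.1, gamma=0.9, lam=0.8):
    """One episode of tabular TD(lambda) prediction with accumulating traces.

    episode: list of (state, reward, next_state, done) transitions, in order.
    V: defaultdict(float) of state values, updated in place and returned.
    """
    e = defaultdict(float)  # eligibility trace per state
    for s, r, s_next, done in episode:
        target = r if done else r + gamma * V[s_next]
        delta = target - V[s]
        e[s] += 1.0  # accumulating trace for the state just visited
        for x in list(e):
            V[x] += alpha * delta * e[x]  # credit every recently visited state
            e[x] *= gamma * lam           # then decay its trace
    return V


episode = [("A", 0.0, "B", False), ("B", 1.0, None, True)]
V1 = td_lambda_episode(defaultdict(float), episode, alpha=0.5, gamma=1.0, lam=1.0)
V0 = td_lambda_episode(defaultdict(float), episode, alpha=0.5, gamma=1.0, lam=0.0)
print(V1["A"], V1["B"], V0["A"], V0["B"])  # 0.5 0.5 0.0 0.5
```

With lambda = 1 the final reward reaches state A in the same episode; with lambda = 0 (plain one-step TD) it does not.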
arxiv.org

1810.09967v1.pdf

1
1. mark.crowley 28 May 2022
  
  in Public
  
  Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
  
  reinforcement-learning rl-course eligibility-traces

Tags

eligibility-traces

reinforcement-learning

rl-course

Annotators

mark.crowley

URL

arxiv.org/pdf/1810.09967v1.pdf
arxiv.org

2102.03406.pdf

1
1. mark.crowley 27 May 2022
  
  in Public
  
  Hypothesis page to discuss this high-level description of DeepMind's new Gato framework.
  
  reinforcement-learning rl-course artificial-intelligence

Tags

reinforcement-learning

rl-course

artificial-intelligence

Annotators

mark.crowley

URL

arxiv.org/pdf/2503.18813
Mar 2022
arxiv.org

1907.13440.pdf

1
1. mark.crowley 23 Mar 2022
  
  in Public
  
  The paper that introduced the MineRL challenge dataset.
  
  reinforcement-learning

Tags

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1907.13440.pdf
Jan 2022
www.grandin.com

A stroke convinced B.F. Skinner that the brain and biology could no longer be ignored

1
1. bilalali 11 Jan 2022
  
  in Public
  
  reinforcement
  
  "Reinforcement means the act of reinforcing."
  
  https://www.dictionary.com/browse/reinforcement

Tags

https://www.dictionary.com/browse/reinforcement

Annotators

bilalali

URL

grandin.com/inc/animals.in.translation.ch1.html
Jul 2021
psyarxiv.com

Choice-confirmation bias and gradual perseveration in human reinforcement learning

1
1. lucyparfitt16 08 Jul 2021
  
  in BehSci
  
  Palminteri, S. (2021). Choice-confirmation bias and gradual perseveration in human reinforcement learning [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/dpqj6
  
  is:preprint lang:en computational model confirmation bias reinforcement learning repetition bias behavioral science psychology modeling bias

Tags

modeling

bias

reinforcement learning

lang:en

repetition bias

confirmation bias

psychology

behavioral science

computational model

is:preprint

Annotators

lucyparfitt16

URL

psyarxiv.com/dpqj6/
Jun 2021
psyarxiv.com

Reinforcement Learning Based Decision Support Tool For Epidemic Control

1
1. lucyparfitt16 30 Jun 2021
  
  in BehSci
  
  Chadi, M.-A., & Mousannif, H. (2021). Reinforcement Learning Based Decision Support Tool For Epidemic Control [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/tcr8s
  
  is:preprint lang:en COVID-19 epidemics control reinforcement learning modeling simulation vaccine epidemiology transmission economy intervention public health policy

Tags

COVID-19

modeling

transmission

reinforcement learning

lang:en

simulation

economy

public health

policy

epidemiology

vaccine

is:preprint

epidemics control

intervention

Annotators

lucyparfitt16

URL

psyarxiv.com/tcr8s/
Mar 2021
www.opendemocracy.net

Neurocapitalism

1
1. davidk01 13 Mar 2021
  
  in Public
  
  Using chemicals to improve our economy of attention and become emotionally "fitter" is an option that penetrated public consciousness some time ago.
  
  The same is true of reinforcement learning algorithms.
  
  ai reinforcement learning algorithms

Tags

reinforcement learning algorithms

ai

Annotators

davidk01

URL

opendemocracy.net/en/neurocapitalism/
Sep 2020
arxiv.org

The emergence of segregation: from observable markers to group specific norms

1
1. ErikStuchly 15 Sep 2020
  
  in BehSci
  
  Ozaita, J., Baronchelli, A., & Sánchez, A. (2020). The emergence of segregation: From observable markers to group specific norms. ArXiv:2009.05354 [Physics, q-Bio]. http://arxiv.org/abs/2009.05354
  
  is:preprint lang:en social trait social norm observable marker segregation emergence modeling ethnicity strategy conformity greed game reinforcement learning

Tags

modeling

strategy

emergence

reinforcement learning

lang:en

game

social norm

conformity

greed

is:preprint

observable marker

segregation

ethnicity

social trait

Annotators

ErikStuchly

URL

arxiv.org/abs/2009.05354
journals.sagepub.com

Self-Regulation Without Force: Can Awareness Leverage Reward to Drive Behavior Change? - Vera U. Ludwig, Kirk Warren Brown, Judson A. Brewer, 2020

1
1. ErikStuchly 08 Sep 2020
  
  in BehSci
  
  Ludwig, V. U., Brown, K. W., & Brewer, J. A. (2020). Self-Regulation Without Force: Can Awareness Leverage Reward to Drive Behavior Change? Perspectives on Psychological Science, 1745691620931460. https://doi.org/10.1177/1745691620931460
  
  is:article lang:en self-regulation awareness reward behavior change motivation value satisfaction reinforcement learning valuation sustainability behavioral science

Tags

value

reinforcement learning

valuation

lang:en

sustainability

self-regulation

satisfaction

is:article

behavioral science

motivation

behavior change

awareness

reward

Annotators

ErikStuchly

URL

journals.sagepub.com/doi/abs/10.1177/1745691620931460
Jul 2020
psyarxiv.com

COVID-19 Prevention via the Science of Habit Formation

1
1. Marlene_Wulf 14 Jul 2020
  
  in BehSci
  
  Harvey, A., Armstrong, C. C., Callaway, C. A., Gumport, N. B., & Gasperetti, C. E. (2020). COVID-19 Prevention via the Science of Habit Formation [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/57jyg
  
  is:preprint lang:en COVID-19 habit elimination habit formation behavioral intervention treatment guideline lifesaving adherence habit formation process intervention reinforcement

Tags

COVID-19

behavioral intervention

adherence

lang:en

guideline

habit formation process

is:preprint

habit formation

lifesaving

treatment

reinforcement

habit elimination

intervention

Annotators

Marlene_Wulf

URL

psyarxiv.com/57jyg/
May 2020
psyarxiv.com

On the convergent validity of risk sensitivity measures

1
1. Marlene_Wulf 29 May 2020
  
  in BehSci
  
  Radulescu, A., Holmes, K., & Niv, Y. (2020). On the convergent validity of risk sensitivity measures [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/qdhx4
  
  is:preprint lang:en decision-making reinforcement learning risk sensitivity risk behavior trait measure instrument risk paradigm

Tags

reinforcement learning

measure

behavior

lang:en

decision-making

instrument

trait

risk

risk paradigm

is:preprint

risk sensitivity

Annotators

Marlene_Wulf

URL

psyarxiv.com/qdhx4/
psyarxiv.com

Protection from uncertainty in the exploration/exploitation trade-off

1
1. Marlene_Wulf 29 May 2020
  
  in BehSci
  
  https://twitter.com/SciBeh/status/1255403798463471616
  
  is:preprint lang:en uncertainty exploration exploitation tradeoff attention cognition decision-making reinforcement learning

Tags

lang:en

exploitation

uncertainty

attention

decision-making

exploration

learning

is:preprint

reinforcement

cognition

tradeoff

Annotators

Marlene_Wulf

URL

psyarxiv.com/5y643/
psyarxiv.com

Cognitive learning processes account for asymmetries in adaptations to new social norms

1
1. Marlene_Wulf 29 May 2020
  
  in BehSci
  
  Hertz, U. (2020). Cognitive learning processes account for asymmetries in adaptations to new social norms [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/7thku
  
  is:preprint lang:en COVID-19 reinforcement learning social norm social cognition pandemic adaption behavior computational modeling

Tags

COVID-19

adaption

behavior

lang:en

social norm

learning

computational modeling

pandemic

is:preprint

reinforcement

social cognition

Annotators

Marlene_Wulf

URL

psyarxiv.com/7thku/
arxiv.org

Complex social contagion induces bistability on multiplex networks

1
1. edampf 11 May 2020
  
  in BehSci
  
  Liu, L., Wang, X., Tang, S., & Zheng, Z. (2020). Complex social contagion induces bistability on multiplex networks. ArXiv:2005.00664 [Physics]. http://arxiv.org/abs/2005.00664
  
  is:preprint lang:en complex social contagion social reinforcement dynamics collective behavior network multilayer social circle ignorant-spreader-ignorant modeling transmissibility digital internet online behavior

Tags

transmissibility

modeling

ignorant-spreader-ignorant

internet

online behavior

digital

lang:en

social circle

collective behavior

dynamics

social reinforcement

complex social contagion

multilayer

network

is:preprint

Annotators

edampf

URL

arxiv.org/abs/2005.00664
Apr 2020
psyarxiv.com

The elusive effects of incidental anxiety on reinforcement-learning

1
1. edampf 23 Apr 2020
  
  in BehSci
  
  Ting, C., Palminteri, S., Lebreton, M., & Engelmann, J. B. (2020, March 25). The elusive effects of incidental anxiety on reinforcement-learning [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/7d4tc
  
  is:preprint lang:en anxiety computation modeling threat shock valence-induced bias learning context-dependent reinforcement neuroscience decision making

Tags

modeling

context-dependent

decision making

lang:en

valence-induced bias

threat

learning

anxiety

shock

computation

reinforcement

is:preprint

neuroscience

Annotators

edampf

URL

psyarxiv.com/7d4tc/
Mar 2019
cjc.ict.ac.cn

liuq-201811662728.pdf

1
1. haiy 08 Mar 2019
  
  in Public
  
  A survey of deep reinforcement learning
  
  reinforcement-learning review

Tags

reinforcement-learning

review

Annotators

haiy

URL

cjc.ict.ac.cn/online/onlinepaper/liuq-201811662728.pdf
cjc.ict.ac.cn

lq-2017119103322.pdf

1
1. haiy 08 Mar 2019
  
  in Public
  
  A survey of deep reinforcement learning
  
  reinforcement-learning tutorial

Tags

tutorial

reinforcement-learning

Annotators

haiy

URL

cjc.ict.ac.cn/online/cre/lq-2017119103322.pdf
github.com

dennybritz/reinforcement-learning

1
1. haiy 08 Mar 2019
  
  in Public
  
  reinforcement-learning code and paper tutorials
  
  reinforcement-learning valuable tutorial

Tags

tutorial

reinforcement-learning

valuable

Annotators

haiy

URL

github.com/dennybritz/reinforcement-learning
Feb 2019
gitee.com

SuttonBartoIPRLBook2ndEd.pdf

1
1. haiy 21 Feb 2019
  
  in Public
  
  reinforcement-learning book

Tags

reinforcement-learning

book

Annotators

haiy

URL

gitee.com/arthurhu/pdfs/raw/master/deeplearning/SuttonBartoIPRLBook2ndEd.pdf
gitee.com

强化学习在阿里的技术演进与业务创新.pdf (Technical evolution and business innovation of reinforcement learning at Alibaba)

1
1. haiy 20 Feb 2019
  
  in Public
  
  reinforcement-learning

Tags

reinforcement-learning

Annotators

haiy

URL

gitee.com/arthurhu/pdfs/raw/master/deeplearning/nlp/强化学习在阿里的技术演进与业务创新.pdf
gitee.com

nips_oral6

1
1. haiy 20 Feb 2019
  
  in Public
  
  reinforcement-learning ppt

Tags

ppt

reinforcement-learning

Annotators

haiy

URL

gitee.com/arthurhu/pdfs/raw/master/deeplearning/nlp/2017NIPS大会Facebook人工智能研究院演讲.pdf
gitee.com

1709.02349.pdf

1
1. haiy 19 Feb 2019
  
  in Public
  
  We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.
  
  chatbot reinforcement-learning

Tags

reinforcement-learning

chatbot

Annotators

haiy

URL

gitee.com/arthurhu/pdfs/raw/master/deeplearning/nlp/1709.02349.pdf
Jul 2016
thesocialwrite.com

What Is Confidence?

1
1. CaseyTheColeman 06 Jul 2016
  
  in Public
  
  Think of all the hard work and the sweat you put into the things that you're proudest of.
  
  It always feels good to say, "I worked out today!"
  
  Get There Self-Acceptance Positive Reinforcement

Tags

Get There

Self-Acceptance

Positive Reinforcement

Annotators

CaseyTheColeman

URL

thesocialwrite.com/2015/08/11/what-is-confidence-2/
