440 Matching Annotations

Mar 2023
arxiv.org

2010.03950.pdf

1
1. mark.crowley 07 Mar 2023
  
  in Public
  
  asks for the Minecraft domain.
  
  They demonstrate the model on a "Minecraft-like" domain (introduced earlier by someone else) where there are resources in the world and the agent has tasks.
  
  minecraft reinforcement-learning

Tags

reinforcement-learning

minecraft

Annotators

mark.crowley

URL

arxiv.org/pdf/2010.03950
Feb 2023
arxiv.org

2010.03950.pdf

6
1. mark.crowley 16 Feb 2023
  
  in Public
  
  Definition 3.2 (simple reward machine).
  
  The MDP does not change; its dynamics are the same with or without the RM, just as they are with or without a standard reward model. Additionally, the rewards from the RM can be non-Markovian with respect to the MDP, because the RM inherently carries a kind of memory of where you've been, limited to the agent's "movement" (almost "in its mind") through the goals for this task.
  
  reinforcement-learning reward-machines
2. mark.crowley 16 Feb 2023
  
  in Public
  
  We then show that an RM can be interpreted as specifying a single reward function over a larger state space, and consider types of reward functions that can be expressed using RMs
  
  So by specifying a reward machine you are augmenting the state space of the MDP with higher-level goals/subgoals/concepts that provide structure about what is good and what isn't.
  
  reinforcement-learning reward-machines
3. mark.crowley 16 Feb 2023
  
  in Public
  
  However, an agent that had access to the specification of the reward function might be able to use such information to learn optimal policies faster.
  
  Fascinating idea, why not? Why are we hiding the reward from the agent really?
  
  reinforcement-learning reward-machines
4. mark.crowley 02 Feb 2023
  
  in Public
  
  U is a finite set of states,
  
  Apply a set of logical rules to the state space to obtain a finite set of states.
5. mark.crowley 02 Feb 2023
  
  in Public
  
  state-reward function,
  
  reward is a constant number assigned to each set of states
6. mark.crowley 02 Feb 2023
  
  in Public
  
  Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
  
  [Icarte, JAIR, 2022] "Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning"
  
  reinforcement-learning reward-machines
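The reward-machine notes above (Definition 3.2, the non-Markovian rewards, and the augmented state space) can be illustrated with a minimal sketch. It assumes a simple RM whose rewards are constants attached to RM transitions. The task, the event names, the corridor environment and the labelling function are all hypothetical and are not taken from the paper's code or domains. The corridor dynamics never change. The reward for reaching the workbench depends on the RM state, so it is non-Markovian in s alone but Markovian in the pair (s, u).

```python
class SimpleRewardMachine:
    """Finite-state machine over high-level events with constant transition rewards.

    transitions maps (rm_state, event) -> (next_rm_state, reward). An event with no
    entry leaves the RM state unchanged and gives reward 0 (an assumption of this sketch).
    """

    def __init__(self, initial_state, transitions, terminal_states=()):
        self.initial_state = initial_state
        self.transitions = transitions
        self.terminal_states = frozenset(terminal_states)
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state

    def delta(self, u, event):
        """Pure transition lookup: (u, event) -> (u', reward)."""
        return self.transitions.get((u, event), (u, 0.0))

    def step(self, event):
        """Advance on one observed event and return the reward for that transition."""
        self.state, reward = self.delta(self.state, event)
        return reward

    def done(self):
        return self.state in self.terminal_states


# Hypothetical task: collect wood, then use the workbench.
RM = SimpleRewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "got_wood"): ("u1", 0.0),
        ("u1", "at_workbench"): ("u_done", 1.0),
    },
    terminal_states={"u_done"},
)

# Hypothetical corridor MDP: positions 0..4, wood at 0, workbench at 4.
WOOD, BENCH = 0, 4


def env_step(s, a):
    """Deterministic corridor dynamics (a is -1 or +1); the RM never changes these."""
    return min(max(s + a, WOOD), BENCH)


def label(s):
    """Labelling function: the high-level event true in state s, or None."""
    return {WOOD: "got_wood", BENCH: "at_workbench"}.get(s)


def product_step(state, a):
    """One step of the product MDP over (s, u); the reward depends only on this pair and a."""
    s, u = state
    s_next = env_step(s, a)
    u_next, reward = RM.delta(u, label(s_next))
    return (s_next, u_next), reward
```

For example, `product_step((3, "u0"), +1)` and `product_step((3, "u1"), +1)` reach the same corridor cell 4. Only the second one pays 1.0, because only then has the wood already been collected. A tabular learner would index its Q-table by (s, u) instead of s.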

Tags

reward-machines

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2010.03950
proceedings.mlr.press

Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning

1
1. mark.crowley 16 Feb 2023
  
  in Public
  
  Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
  
  [Icarte, PMLR, 2018] "Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning"
  
  reinforcement-learning reward-machines

Tags

reward-machines

reinforcement-learning

Annotators

mark.crowley

URL

proceedings.mlr.press/v80/icarte18a/icarte18a.pdf
royalsocietypublishing.org

Untitled document

1
1. mark.crowley 07 Feb 2023
  
  in Public
  
  Bell’s theorem is about correlations (joint probabilities) of stochastic real variables and therefore does not apply to quantum theory, which neither describes stochastic motion nor uses real-valued observables
  
  Strong statement. What do people think about this? Is it accepted by anyone, or dismissed?
  
  bells-theorem quantum-physics

Tags

bells-theorem

quantum-physics

Annotators

mark.crowley

URL

royalsocietypublishing.org/doi/pdf/10.1098/rspa.2011.0420
Jan 2023
www.cs.princeton.edu

rubik.dvi

1
1. mark.crowley 13 Jan 2023
  
  in Public
  
  "Finding Optimal Solutions to Rubik's Cube Using Pattern Databases" by Richard E. Korf, AAAI 1997.
  
  The famous "Korf Algorithm" for finding the optimal solution to any Rubik's Cube state.
  
  algorithms ece406 rubiks-cube path-search a-star iterative-deepening
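Korf's method runs iterative-deepening A* (IDA*) guided by pattern-database heuristics. Each database stores exact distances to the goal in an abstraction of the puzzle that tracks only some of the pieces. The heuristic is the maximum over the databases, which keeps it admissible. A real Rubik's Cube database is far too large for a note. This minimal sketch therefore swaps in a hypothetical toy, a 6-element pancake-flip puzzle, that has the same structure. It is not Korf's code.

```python
from collections import deque

N = 6
GOAL = tuple(range(N))


def flips(state):
    """Yield every state reachable by reversing a prefix of length k >= 2."""
    for k in range(2, len(state) + 1):
        yield state[:k][::-1] + state[k:]


def abstract(state, pattern):
    """Keep only the pieces in `pattern`; every other piece becomes a blank (-1)."""
    return tuple(p if p in pattern else -1 for p in state)


def build_pdb(pattern):
    """Breadth-first search from the abstracted goal over the abstract state space.

    A flip is its own inverse, so distance from the goal equals distance to it.
    """
    start = abstract(GOAL, pattern)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in flips(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist


# Two disjoint patterns, loosely analogous to Korf's separate corner and edge databases.
PDBS = [(p, build_pdb(p)) for p in (frozenset({0, 1, 2}), frozenset({3, 4, 5}))]


def heuristic(state):
    """Maximum of admissible database lookups, which is still admissible."""
    return max(pdb[abstract(state, p)] for p, pdb in PDBS)


def ida_star(start, h):
    """Depth-first search bounded by f = g + h; the bound rises to the smallest overflow."""
    path = [start]

    def search(g, bound):
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return f, False
        if node == GOAL:
            return f, True
        smallest = float("inf")
        for child in flips(node):
            if child in path:  # skip states already on the current path
                continue
            path.append(child)
            t, found = search(g + 1, bound)
            if found:
                return t, True
            smallest = min(smallest, t)
            path.pop()
        return smallest, False

    bound = h(start)
    while True:
        t, found = search(0, bound)
        if found:
            return path
        bound = t
```

For example, `ida_star((3, 5, 0, 2, 4, 1), heuristic)` returns a shortest list of states from that start to `GOAL`. Like Korf's searcher, it uses memory linear in the solution depth; only the databases are large.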

Tags

algorithms

ece406

path-search

iterative-deepening

a-star

rubiks-cube

Annotators

mark.crowley

URL

cs.princeton.edu/courses/archive/fall06/cos402/papers/korfrubik.pdf
openai.com

Aligning Language Models to Follow Instructions

3
1. mark.crowley 12 Jan 2023
  
  in Public
  
  make up facts less often
  
  but not "never"
2. mark.crowley 12 Jan 2023
  
  in Public
  
  On prompts submitted by our customers to the API,[1
  
  Really? So that's how they make money.
  
  Question: what kind of bias does this introduce into the model?
  
  Which topics and questions get trained on?
  
  What is the goal of training? Truth? Clickability?
3. mark.crowley 12 Jan 2023
  
  in Public
  
  Blog post from OpenAI in Jan 2022 explaining some of the approaches they use to train, reduce and tune their LLM for particular tasks. This was all a precursor to the ChatGPT system we now see.
  
  nlp, llm, chagpt

Tags

nlp, llm, chagpt

Annotators

mark.crowley

URL

openai.com/blog/instruction-following/
inst-fs-iad-prod.inscloudgate.net

Untitled document

1
1. mark.crowley 10 Jan 2023
  
  in Public
  
  "Talking About Large Language Models" by Murray Shanahan
  
  nlp large-language-models deep-learning transformers

Tags

transformers

large-language-models

nlp

deep-learning

Annotators

mark.crowley

URL

inst-fs-iad-prod.inscloudgate.net/files/ada31e51-be16-45cc-8ec7-53e2dc795590/Talking About Large Language Models.pdf
Dec 2022
proceedings.neurips.cc

NeurIPS-2021-reinforcement-learning-with-state-observation-costs-in-action-contingent-noiselessly-observable-markov-decision-processes-Paper.pdf

1
1. mark.crowley 19 Dec 2022
  
  in Public
  
  [Nam, NeurIPS, 2021]. "Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes"
  
  proj-chemgymrl digital-chemistry ai-for-science

Tags

ai-for-science

proj-chemgymrl

digital-chemistry

Annotators

mark.crowley

URL

proceedings.neurips.cc/paper/2021/file/83e8fe6279ad25f15b23c6298c6a3584-Paper.pdf
arxiv.org

2205.15241.pdf

1
1. mark.crowley 13 Dec 2022
  
  in Public
  
  Lee et al. - NeurIPS 2022 "Multi-Game Decision Transformers"
  
  reinforcement-learning transformers transfer-learning conf-neurips-2022 proj-minerl

Tags

reinforcement-learning

proj-minerl

conf-neurips-2022

transfer-learning

transformers

Annotators

mark.crowley

URL

arxiv.org/pdf/2205.15241.pdf
arxiv.org

2210.00849.pdf

1
1. mark.crowley 13 Dec 2022
  
  in Public
  
  [Neumann, Gros, NeurIPS, 2022] - "Scaling Laws for a Multi-Agent Reinforcement Learning Model"
  
  reinforcement-learning marl multi-agent-reinforcement-learning conf-neurips-2022

Tags

reinforcement-learning

conf-neurips-2022

marl

multi-agent-reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2210.00849.pdf
Nov 2022
arxiv.org

2106.01345.pdf

2
1. mark.crowley 22 Nov 2022
  
  in Public
  
  10K
  
  Kind of ambiguous to use 10K when one of the most important variables is K.
2. mark.crowley 08 Nov 2022
  
  in Public
  
  An embedding for each timestep is learned and added to each token – note this is different than the standard positional embedding used by transformers, as one timestep corresponds to three tokens
  
  one timestep corresponds to three tokens
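The timestep-embedding point can be made concrete with a minimal sketch. It assumes NumPy, linear embeddings without biases, random untrained weights, and the hypothetical parameter names `W_r`, `W_s`, `W_a` and `E_t`. Each timestep contributes three tokens (return-to-go, state, action). The single learned embedding for that timestep is added to all three, unlike a standard positional embedding, which would give each token its own position.

```python
import numpy as np


def init_params(state_dim, act_dim, d_model, max_timestep, rng):
    """Random (untrained) embedding weights; biases are omitted for brevity."""
    return {
        "W_r": rng.normal(size=(1, d_model)),             # return-to-go -> token
        "W_s": rng.normal(size=(state_dim, d_model)),     # state -> token
        "W_a": rng.normal(size=(act_dim, d_model)),       # action -> token
        "E_t": rng.normal(size=(max_timestep, d_model)),  # learned timestep table
    }


def embed_trajectory(returns_to_go, states, actions, timesteps, params):
    """Return a (3K, d_model) sequence ordered R_1, s_1, a_1, R_2, s_2, a_2, ...

    returns_to_go: (K,), states: (K, state_dim), actions: (K, act_dim),
    timesteps: (K,) integer indices into the timestep table.
    """
    r_tok = returns_to_go[:, None] @ params["W_r"]  # (K, d_model)
    s_tok = states @ params["W_s"]                  # (K, d_model)
    a_tok = actions @ params["W_a"]                 # (K, d_model)
    t_emb = params["E_t"][timesteps]                # (K, d_model), one row per timestep
    # Stack the three modalities per timestep and add the SAME timestep embedding to each.
    tokens = np.stack([r_tok, s_tok, a_tok], axis=1) + t_emb[:, None, :]  # (K, 3, d_model)
    return tokens.reshape(-1, tokens.shape[-1])     # interleave into (3K, d_model)
```

This also bears on the "10K" remark above: a context of K timesteps becomes 3K tokens for the transformer.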

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
arxiv.org

1706.03762.pdf

1
1. mark.crowley 22 Nov 2022
  
  in Public
  
  we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization a
  
  Using the attention mechanism to determine global dependencies between input and output instead of using recurrent links to past states. This is the essence of their new idea.
  
  transformers attention-mechanism
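The paper's scaled dot-product attention, softmax(QKᵀ/√d_k)V, can be written as a minimal single-head sketch. It assumes NumPy and omits the learned projections and multiple heads. Every query position is compared with every key position in one matrix product, with no step-by-step recurrence, and this is where the extra parallelism comes from. The optional causal mask is the decoder-style restriction to earlier positions.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V for a single head.

    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v).
    mask (optional): boolean (n_q, n_k), True where attention is allowed;
    every row must allow at least one position.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # all pairs at once, no recurrence
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)          # blocked pairs get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V, weights
```

With a lower-triangular mask, the first output row can attend only to position 0, so it equals `V[0]` exactly.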

Tags

transformers

attention-mechanism

Annotators

mark.crowley

URL

arxiv.org/pdf/1706.03762
agupubs.onlinelibrary.wiley.com

Burn Severity in Canada's Mountain National Parks: Patterns, Drivers, and Predictions

1
1. mark.crowley 08 Nov 2022
  
  in Public
  
  "Burn Severity in Canada's Mountain National Parks: Patterns, Drivers, and Predictions" by Weiwei Wang, Xianli Wang, et al., Geophysical Research Letters
  
  forest-fire-management burn-severity-prediction

Tags

forest-fire-management

burn-severity-prediction

Annotators

mark.crowley

URL

agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2022GL097945
arxiv.org

On the Opportunities and Risks of Foundation Models

1
1. mark.crowley 08 Nov 2022
  
  in Public
  
  "On the Opportunities and Risks of Foundation Models" This is a large report by the Center for Research on Foundation Models at Stanford. They are creating and promoting the use of these models and trying to coin this name for them; they are also simply called large pre-trained models. So take it with a grain of salt, but it also has a lot of information about what they are, why they work so well in some domains, and how they are changing the nature of ML research and application.
  
  foundation-models machine-learning pretrained-models

Tags

foundation-models

pretrained-models

machine-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2108.07258
Sep 2022
arxiv.org

2106.01345.pdf

1
1. mark.crowley 27 Sep 2022
  
  in Public
  
  We study whether sequence modelingcan perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
  
  transformers offline-learning reinforcement-learning
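Decision Transformer turns policy optimization into sequence modeling by conditioning each action on the return-to-go, the sum of rewards from the current step to the end of the trajectory. Below is a minimal sketch of that computation. The `gamma` parameter is an addition for illustration only; the paper uses undiscounted returns, which corresponds to `gamma = 1.0`.

```python
def returns_to_go(rewards, gamma=1.0):
    """R_hat_t = sum over t' >= t of gamma**(t' - t) * r_t', computed right to left."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running  # reward now plus (discounted) future total
        out[t] = running
    return out
```

During offline training, these values label each timestep of a logged trajectory. At evaluation time, the model is instead given a desired target return, which is reduced by each reward as it arrives.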

Tags

reinforcement-learning

offline-learning

transformers

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
arxiv.org

2112.09099.pdf

1
1. mark.crowley 12 Sep 2022
  
  in Public
  
  AAAI 2022 paper: "Decentralized Mean Field Games". Happy to discuss online.
  
  S. Ganapathi Subramanian, M. Taylor, M. Crowley, and P. Poupart, “Decentralized mean field games,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022), vol. 36, pp. 9439–9447, February 2022.
  
  reinforcement-learning marl

Tags

reinforcement-learning

marl

Annotators

mark.crowley

URL

arxiv.org/pdf/2112.09099.pdf
Jul 2022
ieeexplore.ieee.org

IEEE Xplore Full-Text PDF:

1
1. mark.crowley 26 Jul 2022
  
  in Public
  
  A recent overview of RL methods used for autonomous driving.
  
  reinforcement-learning autonomous-driving

Tags

reinforcement-learning

autonomous-driving

Annotators

mark.crowley

URL

ieeexplore.ieee.org/stamp/stamp.jsp
minerl.bhagat.io

Abstract

1
1. mark.crowley 18 Jul 2022
  
  in Public
  
  As a baseline model we took the feature representation from a large pre-trained CNN such as ResNet50, by using the model and excluding the final dense layer, and using this in place of our convolution layers. We had predicted that this would likely get us some performance, but would inherently be worse, since we had fixed some of our trainable parameters.
  
  They didn't try to train the CNN from scratch.

Annotators

mark.crowley

URL

minerl.bhagat.io/
Jun 2022
assets.pubpub.org

01652987005906.pdf

1
1. mark.crowley 04 Jun 2022
  
  in Public
  
  Discussion on
  
  Bellinger C, Drozdyuk A, Crowley M, Tamblyn I. Balancing Information with Observation Costs in Deep Reinforcement Learning. Proceedings of the Canadian Conference on Artificial Intelligence [Internet]. 2022 May 27; Available from: https://caiac.pubpub.org/pub/0jmy7gpd
  
  reinforcement-learning artificial-intelligence proj-chemgymrl digital-chemistry material-design national-research-council-of-canada CanAI2022

Tags

national-research-council-of-canada

reinforcement-learning

artificial-intelligence

proj-chemgymrl

digital-chemistry

CanAI2022

material-design

Annotators

mark.crowley

URL

assets.pubpub.org/99r5anzw/01652987005906.pdf
May 2022
link.springer.com

Intelligence - Consider This and Respond!

1
1. mark.crowley 29 May 2022
  
  in Public
  
  Interesting-sounding high-level paper about the limits and constraints on general intelligence and how these might relate to the struggles AI/ML research has had historically.
  
  artificial-intelligence artificial-general-intelligence agi aiml cognitive-science

Tags

artificial-intelligence

cognitive-science

agi

artificial-general-intelligence

aiml

Annotators

mark.crowley

URL

link.springer.com/chapter/10.1007/978-3-030-65596-9_48
www.ncbi.nlm.nih.gov

15756507305185 1..25

1
1. mark.crowley 28 May 2022
  
  in Public
  
  Another piece to the "what can we do with eligibility traces" puzzle for Deep RL.
  
  reinforcement-learning eligibility-traces rl-course

Tags

reinforcement-learning

rl-course

eligibility-traces

Annotators

mark.crowley

URL

ncbi.nlm.nih.gov/pmc/articles/PMC6897511/pdf/elife-47463.pdf
arxiv.org

Adaptive and Multiple Time-scale Eligibility Traces for Online Deep Reinforcement Learning

1
1. mark.crowley 28 May 2022
  
  in Public
  
  Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
  
  reinforcement-learning rl-course eligibility-traces
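For context on this question, here is a minimal sketch of eligibility traces in their original tabular setting: online TD(λ) with accumulating traces on the classic 5-state random walk. The walk pays reward 1 only for exiting on the right, so the true values are 1/6 through 5/6. The step size, λ, episode count and seed are arbitrary illustrative choices. Each trace records how recently and how often a state was visited, so a single TD error updates many earlier states at once. That per-state bookkeeping is easy in a table. It has no direct counterpart when experience is sampled out of order from a replay buffer, which is one commonly cited obstacle for traces in deep RL.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=5000, alpha=0.01, lam=0.5, gamma=1.0, seed=0):
    """Online tabular TD(lambda) with accumulating traces on a random walk.

    States 1..n_states are non-terminal; 0 and n_states + 1 are terminal with value 0.
    Exiting on the right gives reward 1; every other transition gives 0.
    """
    rng = random.Random(seed)
    V = [0.0] * (n_states + 2)
    right_end = n_states + 1
    for _ in range(episodes):
        e = [0.0] * (n_states + 2)  # one eligibility trace per state, reset each episode
        s = (n_states + 1) // 2     # start in the middle state
        while 0 < s < right_end:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == right_end else 0.0
            delta = r + gamma * V[s_next] - V[s]  # one-step TD error
            e[s] += 1.0                           # accumulating trace for the visited state
            for i in range(1, right_end):
                V[i] += alpha * delta * e[i]      # every traced state shares the TD error
                e[i] *= gamma * lam               # traces fade by gamma * lambda per step
            s = s_next
    return V[1:right_end]
```

Setting `lam=0.0` recovers one-step TD(0); values near 1.0 approach Monte Carlo updates.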

Tags

reinforcement-learning

rl-course

eligibility-traces

Annotators

mark.crowley

URL

arxiv.org/pdf/2008.10040.pdf
arxiv.org

1810.09967v1.pdf

1
1. mark.crowley 28 May 2022
  
  in Public
  
  Question: What happened to Eligibility Traces in the Deep RL era? This paper highlights some of the reasons they are not used widely and proposes a way they could still be effective.
  
  reinforcement-learning rl-course eligibility-traces

Tags

reinforcement-learning

rl-course

eligibility-traces

Annotators

mark.crowley

URL

arxiv.org/pdf/1810.09967v1.pdf
arxiv.org

2102.03406.pdf

1
1. mark.crowley 27 May 2022
  
  in Public
  
  Hypothesis page to discuss this high-level description of DeepMind's new Gato framework.
  
  reinforcement-learning rl-course artificial-intelligence

Tags

artificial-intelligence

reinforcement-learning

rl-course

Annotators

mark.crowley

URL

arxiv.org/pdf/2511.20639
Mar 2022
arxiv.org

1907.13440.pdf

1
1. mark.crowley 23 Mar 2022
  
  in Public
  
  The paper that introduced the MineRL challenge dataset.
  
  reinforcement-learning

Tags

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/1907.13440.pdf
arxiv.org

2004.09666.pdf

1
1. mark.crowley 22 Mar 2022
  
  in Public
  
  Weak supervision also objectively identifies relevant morphological features from the tissue microenvironment without any a priori knowledge or subjective annotation. In three separate analyses, we showed that our models can identify well-known morphological features and accordingly, has the capability of identifying new morphological features of diagnostic, prognostic, and therapeutic relevance.
  
  Their target images are very large, and there is a known (supervised) label for the entire image but no labels for parts of an image (e.g. where exactly is the tumor?). So the powerful property of their method is its ability to learn on its own which parts of the image relate to the label.
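The property described here can be illustrated with attention pooling in the style of attention-based multiple-instance learning. Only the whole slide carries a label, yet the model learns a weight for each patch, and those weights indicate which regions drove the prediction. The following is a minimal NumPy sketch with hypothetical weight names (`V`, `w`, `c`) and random, untrained weights. It is not the paper's exact architecture, and training is omitted.

```python
import numpy as np


def attention_mil_pool(patch_features, V, w):
    """Pool a bag of patch features (n_patches, d) into one (d,) slide vector.

    V (d, h) and w (h,) are hypothetical learned weights. Returns the pooled
    vector and the (n_patches,) attention weights, which sum to 1.
    """
    scores = np.tanh(patch_features @ V) @ w  # one unnormalized score per patch
    scores = scores - scores.max()            # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum()                  # softmax over the patches of this slide
    return attn @ patch_features, attn


def slide_probability(patch_features, V, w, c):
    """Slide-level probability from the pooled vector; c (d,) is a hypothetical classifier."""
    pooled, attn = attention_mil_pool(patch_features, V, w)
    return 1.0 / (1.0 + np.exp(-(pooled @ c))), attn
```

The loss would compare only the slide-level probability with the slide label. After training, sorting patches by `attn` is one way to surface candidate regions, such as where the tumor is.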

Annotators

mark.crowley

URL

arxiv.org/pdf/2004.09666
Jan 2022
www.theglobeandmail.com

Opinion: 2022 is the year America falls off a cliff. How will Canada hang on?

4
1. mark.crowley 04 Jan 2022
  
  in Public
  
  The Canadian experiment has been built, in large part, around the American experiment: They have the melting pot, we have the cultural mosaic; they have the free market, we have sensible regulation; they have “life, liberty and the pursuit of happiness,” we have “peace, order and good government.”
  
  I agree with this.
2. mark.crowley 04 Jan 2022
  
  in Public
  
  Northrop Frye once defined a Canadian as “an American who rejects the Revolution.”
  
  I see what he means, but I wouldn't go this far. Canadians do have a separate cultural identity. It is defined by its lack of definition and certainty, in contrast to American certainty. This is why it is more resilient. It cannot have certainty because our nation was founded on the "two solitudes" of French and English, Catholic and Protestant, and also on the very different, though equally destructive, relationship of the European colonizers with the Indigenous Peoples of Canada.
3. mark.crowley 04 Jan 2022
  
  in Public
  
  A flaw lurked right at the core of the experiment, as flaws so often do in works of ambitious genius.
  
  The flaw was an assumption that everyone had the nation's best interests at heart, that they all wanted the same thing deep down.
4. mark.crowley 04 Jan 2022
  
  in Public
  
  Difference is the core of the American experience. Difference is its genius. There has never been a country so comfortable with difference, so full of difference.
  
  Diversity is Strength. This is really one of their founding principles, even in its hypocrisy. For them the diversity was in religious faith and ways of thinking but did not include gender, ethnicity or anything else. In time this changed and it is the only reason America has done so well.

Annotators

mark.crowley

URL

theglobeandmail.com/opinion/article-2022-is-the-year-america-falls-off-a-cliff-how-will-canada-hang-on/
Jul 2021
golem.ph.utexas.edu

The n-Category Café

2
1. mark.crowley 26 Jul 2021
  
  in Public
  
  Such a map, plus the universal property of $A$, is in fact enough to reconstruct the entire Turing structure of $\mathsf{C}$.
  
  The minimal structure needed to construct a Turing machine.
2. mark.crowley 26 Jul 2021
  
  in Public
  
  not necessarily extensional, only intensional)
  
  What's the difference?
  
  Question

Tags

Question

Annotators

mark.crowley

URL

golem.ph.utexas.edu/category/2019/08/turing_categories.html

Mark Crowley

Associate Professor at the University of Waterloo.

Research and teaching on topics in Artificial Intelligence, Machine Learning and Reinforcement Learning.

Reading group links: https://markcrowley.ca/reading-groups/

Annotations: 440

Joined: April 4, 2020

Location: Waterloo, Canada

Link: markcrowley.ca

ORCID: 0000-0003-3921-4762
