Zecevic, Willig, Singh Dhami and Kersting. "Causal Parrots: Large Language Models May Talk Causality But Are Not Causal". In Transactions on Machine Learning Research, Aug, 2023.
- Oct 2023
-
arxiv.org arxiv.org
-
-
link.springer.com link.springer.com
-
Chapter 21 "Adversarial Autonencoders" from our book "Elements of Dimensionality Reduction and Manifold Learning", Springer 2023.
-
-
assets.pubpub.org assets.pubpub.org
-
Discussion of the paper:
Ghojogh B, Ghodsi A, Karray F, Crowley M. Theoretical Connection between Locally Linear Embedding, Factor Analysis, and Probabilistic PCA. Proceedings of the Canadian Conference on Artificial Intelligence [Internet]. 2022 May 27; Available from: https://caiac.pubpub.org/pub/7eqtuyyc
-
-
arxiv.org arxiv.org
-
Bias-variance trade-off
The Bias-Variance Tradeoff!
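For my own reference, the standard decomposition of the expected squared error at a point x, assuming y = f(x) + ε with noise variance σ² (a textbook identity, not something from the linked page):
\[ \mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible noise}} \]
More flexible models tend to lower the bias term while raising the variance term; the trade-off is over their sum.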
-
-
www.gatesnotes.com www.gatesnotes.com
-
"The Age of AI has begun : Artificial intelligence is as revolutionary as mobile phones and the Internet." Bill Gates, March 21, 2023. GatesNotes
-
-
www.inc.com www.inc.com
-
Minda Zetlin. "Bill Gates Says We're Witnessing a 'Stunning' New Technology Age. 5 Ways You Must Prepare Now". Inc.com, March 2023.
-
-
-
It should not be used as a primary decision-making tool, but instead as a complement to other methods of determining the source of a piece of text.
This is actually true of any of these LLM models, for any task.
-
-
arxiv.org arxiv.org
-
Feng, 2022. "Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis"
Shared and found via Gowthami Somepalli (@gowthami@sigmoid.social) on Mastodon: "StructureDiffusion: Improve the compositional generation capabilities of text-to-image #diffusion models by modifying the text guidance using a constituency tree or a scene graph."
-
-
arxiv.org arxiv.org
-
Training language models to follow instructions with human feedback
Original paper for discussion of the Reinforcement Learning from Human Feedback (RLHF) algorithm.
-
-
arxiv.org arxiv.org
-
[Kapturowski, DeepMind, Sep 2022] "Human-level Atari 200x Faster"
Improves on the 2020 Agent57 agent to make it more efficient.
-
-
www.nature.com www.nature.com
-
Reinforcement learning uses neural networks to generate a mathematical expression sequentially by adding mathematical symbols from a predefined vocabulary and using the learned policy to decide which notation symbol to be added next [140]. The mathematical formula is represented as a parse tree. The learned policy takes the parse tree as input to determine what leaf node to expand and what notation (from the vocabulary) to add
very interesting approach
-
In chemistry, models such as simplified molecular-input line-entry system (SMILES)-VAE [155] can transform SMILES strings, which are molecular notations of chemical structures in the form of a discrete series of symbols that computers can easily understand, into a differentiable latent space that can be optimized using Bayesian optimization techniques (Fig. 3c).
This could be useful for chemistry research for robotic labs.
-
Neural operators are guaranteed to be discretization invariant, meaning that they can work on any discretization of inputs and converge to a limit upon mesh refinement. Once neural operators are trained, they can be evaluated at any resolution without the need for re-training. In contrast, the performance of standard neural networks can degrade when data resolution during deployment changes from model training.
Look this up: is anyone familiar with this? It sounds complicated but very promising for domains with a wide range of resolutions (medical imaging, wildfire management).
-
Standard neural network models can be inadequate for scientific applications as they assume a fixed data discretization. This approach is unsuitable for many scientific datasets collected at varying resolutions and grids.
Is discretized resolution of neural networks an issue for science?
-
generating hypotheses
Are any of the "generated hypotheses" more general than a molecular shape? Are they full hypothetical explanations for a problem? (yes)
-
Applications of symbolic regression in physics use grammar VAEs [150]. These models represent discrete symbolic expressions as parse trees using context-free grammar and map the trees into a differentiable latent space. Bayesian optimization is then employed to optimize the latent space for symbolic laws while ensuring that the expressions are syntactically valid. In a related study, Brunton and colleagues [151] introduced a method for differentiating symbolic rules by assigning trainable weights to predefined basis functions. Sparse regression was used to select a linear combination of the basis functions that accurately represented the dynamic system while maintaining compactness. Unlike equivariant neural networks, which use a predefined inductive bias to enforce symmetry, symmetry can be discovered as the characteristic behaviour of a domain. For instance, Liu and Tegmark [152] described asymmetry as a smooth loss function and minimized the loss function to extract previously unknown symmetries. This approach was applied to uncover hidden symmetries in black-hole waveform datasets, revealing unexpected space–time structures that were historically challenging to find.
This seems very important, even though I only understand half of it. My question is, can similar approaches be applied to planning in complex domains, or to meaning and truth in language?
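To make the Brunton-style sparse-regression idea concrete for myself, a minimal toy sketch (my own example, not code from the paper; the basis library, threshold, and toy system are made up):

```python
import numpy as np

# Toy dynamical system: dx/dt = -2*x + 0.5*x^3, observed with a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
dxdt = -2.0 * x + 0.5 * x**3 + 0.01 * rng.standard_normal(200)

# Predefined library of candidate basis functions (the trainable weights sit on these).
library = np.column_stack([np.ones_like(x), x, x**2, x**3, np.sin(x)])
names = ["1", "x", "x^2", "x^3", "sin(x)"]

# Sequentially thresholded least squares: fit, zero out small weights, refit on the rest.
w = np.linalg.lstsq(library, dxdt, rcond=None)[0]
for _ in range(10):
    small = np.abs(w) < 0.1          # sparsity threshold, chosen arbitrarily here
    w[small] = 0.0
    w[~small] = np.linalg.lstsq(library[:, ~small], dxdt, rcond=None)[0]

print({n: round(c, 3) for n, c in zip(names, w) if c != 0.0})
# Should recover roughly {"x": -2.0, "x^3": 0.5}, i.e. a compact symbolic law.
```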
-
to address the difficulties that scientists care about, the development and evaluation of AI methods must be done in real-world scenarios, such as plausibly realizable synthesis paths in drug design [217, 218], and include well calibrated uncertainty estimators to assess the model's reliability before transitioning it to real-world implementation
It's important to move beyond toy models.
-
However, current transfer-learning schemes can be ad hoc, lack theoretical guidance [213] and are vulnerable to shifts in underlying distributions [214]. Although preliminary attempts have addressed this challenge [215, 216], more exploration is needed to systematically measure transferability across domains and prevent negative transfer.
There is still a lot of work to do to know how to best use human knowledge to guide learning systems and how to reuse models in different domains.
-
Another approach for using neural networks to solve mathematical problems is transforming a mathematical formula into a binary sequence of symbols. A neural network policy can then probabilistically and sequentially grow the sequence one binary character at a time6. By designing a reward that measures the ability to refute the conjecture, this approach can find a refutation to a mathematical conjecture without prior knowledge about the mathematical problem.
A nice idea: learn a formula of symbols which can be evaluated logically for truth. But do they mention more general approaches, such as using SAT solvers for this task? See Vijay Ganesh's work.
-
foresighted
is "foresighted" a word?
-
AI methods have become invaluable when hypotheses involve complex objects such as molecules. For instance, in protein folding, AlphaFold210 can predict the 3D atom coordinates of proteins from amino acid sequences with atomic accuracy, even for proteins whose structure is unlike any of the proteins in the training dataset.
This is an important category, but it can't apply to all fields and will have a limit to what it can do to move science forward. It's also very dependent on vast computing resources.
-
Transformer architectures
Question: what is the inductive bias of Transformers for NLP? Can we define the symmetries that are implicitly leveraged in the architecture?
-
Such pretrained models [96, 97, 98] with a broad understanding of a scientific domain are general-purpose predictors that can be adapted for various tasks, thereby improving label efficiency and surpassing purely supervised methods [8].
Pre-trained models: these are obviously important and powerful; they almost always work better than training from scratch.
General-purpose predictors: however, we should be suspicious of accepting the claim that they are general-purpose predictors. Why?
- Have all of the scenarios been tested?
- Does the system have a general underlying model?
- Is there some bias in the training and testing data?
Example:
- You pretrain a model on the motion of objects on a plane, such as a pool table, and learn a very good model to predict movement.
- Now, does it work if the table is curved, or even has bumps and imperfections?
- Now train it on 3D Newtonian examples: will it predict relativistic effects? (No)
-
In the analysis of scientific images, objects do not change when translated in the image, meaning that image segmentation masks are translationally equivariant as they change equivalently when input pixels are translated.
an example of symmetry
-
Symmetry is a widely studied concept in geometry69. It can be described in terms of invariance and equivariance (Box 1) to represent the behaviour of a mathematical function, such as a neural feature encoder, under a group of transformations, such as the SE(3) group in rigid body dynamics.
Symmetry is a very broad concept even beyond geometry, although that is the easiest area to think about. If you are interested, it is worth looking into category theory and symmetry more generally. If you can find a type of symmetry that no one has exploited yet, for a meaningful categorical/geometric pattern that relates to a real type of data, task, or domain, then you might be able to start the next architecture revolution.
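A tiny sanity check of the equivariance idea, purely my own illustration (circular convolution is used so the shift is an exact group action):

```python
import numpy as np

rng = np.random.default_rng(1)
signal = rng.standard_normal(32)
kernel = np.array([0.25, 0.5, 0.25])

def circ_conv(x, k):
    # Circular convolution via the FFT; translation is then an exact symmetry of the operation.
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, n=len(x))))

def shift(x, s):
    return np.roll(x, s)

# Equivariance: convolving a shifted input equals shifting the convolved output.
print(np.allclose(circ_conv(shift(signal, 5), kernel),
                  shift(circ_conv(signal, kernel), 5)))   # True
```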
-
Another strategy for data labelling leverages surrogate models trained on manually labelled data to annotate unlabelled samples and uses these predicted pseudo-labels to supervise downstream predictive models.
This kind of bootstrapping of human labelling is what made ChatGPT (v3) break through the level of coherence that caused so much excitement in Nov 2022 and afterwards.
It is also becoming a very common strategy, seemingly replacing an entire industry of full human labelling with a more focussed process of label-learn-pseudolabel-refine-repeat.
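A minimal sketch of that label-learn-pseudolabel-refine loop as I understand it (my own sketch; the model choice, confidence threshold, and number of rounds are placeholders):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_loop(X_lab, y_lab, X_unlab, threshold=0.95, rounds=3):
    X, y = X_lab.copy(), y_lab.copy()
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        model.fit(X, y)                                   # learn on labelled + pseudo-labelled data
        proba = model.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold        # only keep high-confidence predictions
        if not confident.any():
            break
        X = np.vstack([X, X_unlab[confident]])
        y = np.concatenate([y, model.classes_[proba[confident].argmax(axis=1)]])
        X_unlab = X_unlab[~confident]                     # the unlabelled pool shrinks each round
    return model
```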
-
To identify rare events for future scientific enquiry, deep-learning methods [18] replace pre-programmed hardware event triggers with algorithms that search for outlying signals to detect unforeseen or rare phenomena
The importance of filtering out irrelevant data.
-
Recent findings demonstrate the potential for unsupervised language AI models to capture complex scientific concepts [15], such as the periodic table, and predict applications of functional materials years before their discovery, suggesting that latent knowledge regarding future discoveries may be embedded in past publications.
This is one I often point to, and it wasn't even using the latest transformer approach to language modelling.
-
inductive biases (Box 1), which are assumptions representing structure, symmetry, constraints and prior knowledge as compact mathematical statements. However, applying these laws can lead to equations that are too complex for humans to solve, even with traditional numerical methods [9]. An emerging approach is incorporating scientific knowledge into AI models by including information about fundamental equations, such as the laws of physics or principles of molecular structure and binding in protein folding. Such inductive biases can enhance AI models by reducing the number of training examples needed to achieve the same level of accuracy [10] and scaling analyses to a vast space of unexplored scientific hypotheses [11].
Inductive biases: these are becoming more and more critical to understand, and are a good place for academic researchers to focus for new advances, since they don't generally depend on scale or vast amounts of data. These are fundamental insights into the symmetries and structure of a domain, task or architecture.
-
Box 1 Glossary
A good set of definitions of various terms.
-
and coupled with new algorithms
Almost an afterthought here; I would cast it differently: the new algorithms are a major part of it as well.
Listed algorithm types:
- geometric deep learning
- self-supervised learning of foundation models
- generative models
- reinforcement learning
-
geometric deep learning (Box 1) has proved to be helpful in integrating scientific knowledge, presented as compact mathematical statements of physical relationships, prior distributions, constraints and other complex descriptors, such as the geometry of atoms in molecules
Geometric deep learning: an interesting broad category for graph learning and other methods. Is this a common way to refer to this subfield?
-
-
cdn.openai.com cdn.openai.com
-
GPT-2 Introduction paper
Language Models are Unsupervised Multitask Learners A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, (2019).
-
-
arxiv.org arxiv.org
-
"Attention is All You Need" Foundational paper introducing the Transformer Architecture.
-
-
-
GPT-3 introduction paper
-
-
arxiv.org arxiv.org
-
"Are Pre-trained Convolutions Better than Pre-trained Transformers?"
-
-
arxiv.org arxiv.org
-
LaMDA: Language Models for Dialog Applications
"LaMDA: Language Models for Dialog Applications" Google's introduction of the LaMDA v1 large language model.
-
-
arxiv.org arxiv.org
-
Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training.
Them's fighten' words!
I haven't read it yet, but we're putting it on the list for this fall's reading group. Seriously, a strong result with a very strong implied claim; they are careful to say it comes from their empirical results. Very worth a look. I suspect the amount of implicit knowledge in the papers, text, and DAG is helping to do this.
The Big Question: is their comparison to RL baselines fair? Are the baselines trained from scratch? What does a fair comparison of any from-scratch model (RL or supervised) even mean when compared to an LLM approach (or any approach using a foundation model), when that model is not really from scratch?
-
-
-
Benyamin Ghojogh, Ali Ghodsi. "Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey"
-
- Sep 2023
-
arxiv.org arxiv.org
-
Adaptive Stress Testing with Reward Augmentation for Autonomous Vehicle Validation
-
- Aug 2023
-
arxiv.org arxiv.org
-
Title: "Delays, Detours, and Forks in the Road: Latent State Models of Training Dynamics"
Authors: Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho
Note: This paper seems cool, using older interpretable machine learning models (graphical models) to understand what is going on inside a deep neural network.
-
- Jul 2023
-
arxiv.org arxiv.org
-
“Rung 1.5” Pearl’s ladder of causation [1, 10] ranks structures in a similar way as we do, i.e., increasing a model’s causal knowledge will yield a higher place upon his ladder. Like Pearl, we have three different levels in our scale. However, they do not correspond one-to-one.
They rescale Pearl's ladder levels downwards and define a new scale, arguing that the original definition of counterfactuals as a separate level on its own actually combines multiple types of added reasoning complexity.
-
-
proceedings.mlr.press proceedings.mlr.press
-
They think BoN moves reward mass from low-reward samples to high-reward samples.
-
We find empirically that for best-of-n (BoN) sampling
They found this relationship surprising, but it does seem to fit better than other functions which mimic the general shape.
Question: is there a good reason why?
-
d
They use the square root since the KL divergence scales quadratically, so the square root removes the power of 2.
-
RL
"for ... we don't see any overoptimization, we just see the .. monotonically improves"
For which case? I don't see why the growth shown here might not still bend down later.
-
-
arxiv.org arxiv.org
-
The MuZero paper: model-based learning when the model is not directly available.
-
-
arxiv.org arxiv.org
-
Llama 2 release paper
-
-
arxiv.org arxiv.org
-
Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, and Quoc V. Le. "Towards a Human-like Open-Domain Chatbot". Google Research, Brain Team.
Defines the SSI metric for chatbots, used in Google's LaMDA paper.
-
-
arxiv.org arxiv.org
-
LaMDA pre-training as a language model.
Does this figure really mean anything? There is no 3 in the paper at all.
-
Safety does not seem to benefit much from model scaling without fine-tuning.
Safety does not seem to be improved by larger models.
-
How LaMDA handles groundedness through interactions with an external information retrieval system
Does LaMDA always ask these questions? How far down the chain does it go?
-
Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, and Quoc V. Le. Towards a human-like open-domain chatbot. arXiv preprint arXiv:2001.09977, 2020.
SSI metric definitions.
-
Using one model for both generation and discrimination enables an efficient combined generate-and-discriminate procedure.
bidirectional model benefits
-
LaMDA Mount Everest provides facts that could not be attributed to known sources in about 30% of responses
Even with all this work, it will hallucinate about 30% of the time
-
-
arxiv.org arxiv.org
-
linear
W_\theta
-
-
-
Because DDPG is an off-policy algorithm, the replay buffer can be large, allowing the algorithm to benefit from learning across a set of uncorrelated transitions.
Off-policy algorithms can have a larger replay buffer.
-
One challenge when using neural networks for reinforcement learning is that most optimization algorithms assume that the samples are independently and identically distributed. Obviously, when the samples are generated from exploring sequentially in an environment this assumption no longer holds. Additionally, to make efficient use of hardware optimizations, it is essential to learn in minibatches, rather than online. As in DQN, we used a replay buffer to address these issues
Motivation for mini-batches of training experiences and for the use of replay buffers for Deep RL.
-
The DPG algorithm maintains a parameterized actor function μ(s|θ^μ) which specifies the current policy by deterministically mapping states to a specific action. The critic Q(s, a) is learned using the Bellman equation as in Q-learning. The actor is updated by applying the chain rule to the expected return from the start distribution J with respect to the actor parameters:
\[ \nabla_{\theta^\mu} J \approx \mathbb{E}_{s_t \sim \rho^\beta}\!\left[ \nabla_{\theta^\mu} Q(s, a \mid \theta^Q)\big|_{s=s_t,\, a=\mu(s_t \mid \theta^\mu)} \right] = \mathbb{E}_{s_t \sim \rho^\beta}\!\left[ \nabla_a Q(s, a \mid \theta^Q)\big|_{s=s_t,\, a=\mu(s_t)}\, \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\big|_{s=s_t} \right] \quad (6) \]
Silver et al. (2014) proved that this is the policy gradient, the gradient of the policy's performance
The original DPG algorithm (non-deep) takes the Actor-Critic idea and makes the Actor deterministic.
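Equation (6) in code form, roughly: the actor loss is just the negative critic value at the actor's own action, and autograd applies the chain rule through Q into μ. A sketch only; the network sizes, dimensions, and learning rate are mine, not the paper's.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())       # mu(s | theta_mu)
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))                          # Q(s, a | theta_Q)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)         # only actor weights are updated here

states = torch.randn(32, state_dim)                               # stand-in for a replay-buffer minibatch
actions = actor(states)
actor_loss = -critic(torch.cat([states, actions], dim=1)).mean()  # ascend Q(s, mu(s))
actor_opt.zero_grad()
actor_loss.backward()                                             # chain rule: dQ/da * dmu/dtheta_mu
actor_opt.step()
```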
-
Interestingly, all of our experiments used substantially fewer steps of experience than was used byDQN learning to find solutions in the Atari domain.
Training with DDPG seems to require fewer steps/examples than DQN.
-
The original DPG paper evaluated the algorithm with toy problems using tile-coding and linear function approximators. It demonstrated data efficiency advantages for off-policy DPG over both on- and off-policy stochastic actor critic.
(non-deep) DPG used tile-coding and linear VFAs.
-
It can be challenging to learn accurate value estimates. Q-learning, for example, is prone to over-estimating values (Hasselt, 2010). We examined DDPG's estimates empirically by comparing the values estimated by Q after training with the true returns seen on test episodes. Figure 3 shows that in simple tasks DDPG estimates returns accurately without systematic biases. For harder tasks the Q estimates are worse, but DDPG is still able to learn good policies.
In simple tasks, DDPG avoids the over-estimation problem that Q-learning has, without using Double Q-learning.
-
It is not possible to straightforwardly apply Q-learning to continuous action spaces, because in continuous spaces finding the greedy policy requires an optimization of a_t at every timestep; this optimization is too slow to be practical with large, unconstrained function approximators and nontrivial action spaces
Why it is not possible for pure Q-learning to handle continuous action spaces.
-
Our contribution here is to provide modifications to DPG, inspired by the success of DQN, which allow it to use neural network function approximators to learn in large state and action spaces online
contribution of this paper.
-
Directly implementing Q learning (equation 4) with neural networks proved to be unstable in many environments.
-
As with Q learning, introducing non-linear function approximators means that convergence is no longer guaranteed. However, such approximators appear essential in order to learn and generalize on large state spaces.
Why Q-learning can't have guaranteed convergence.
-
We refer to our algorithm as Deep DPG (DDPG, Algorithm 1).
-
This means that the target values are constrained to change slowly, greatly improving the stability of learning.
-
A major challenge of learning in continuous action spaces is exploration. An advantage of off-policy algorithms such as DDPG is that we can treat the problem of exploration independently from the learning algorithm.
Learning and exploration are handled separately.
-
but modified for actor-critic and using "soft" target updates, rather than directly copying the weights.
-
This simple change moves the relatively unstable problem of learning the action-value function closer to the case of supervised learning, a problem for which robust solutions exist.
-
One approach to this problem is to manually scale the features so they are in similar ranges across environments and units. We address this issue by adapting a recent technique from deep learning called batch normalization
-
minimize covariance shift
-
This paper introduces the DDPG algorithm, which builds on the existing DPG algorithm from classic RL theory. The main idea is to define a deterministic (or nearly deterministic) policy for situations where the environment is very sensitive to suboptimal actions and one action setting usually dominates in each state. This showed good performance, but could not beat algorithms such as PPO until the additions introduced by SAC. SAC adds an entropy term to the objective, which essentially rewards keeping the policy stochastic (uncertain) where possible and so improves exploration. With that addition, this family of off-policy actor-critic methods performs well.
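For my own reference, the maximum-entropy objective SAC optimizes (standard form, my restatement rather than a quote from this paper), where α weights the entropy bonus:
\[ J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right] \]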
-
normalizes each dimension across the samples in a minibatch to have unit mean and variance
-
-
proceedings.mlr.press proceedings.mlr.press
-
IMPALA: Scalable Distributed Deep-RL with Importance WeightedActor-Learner Architectures
(Espeholt, ICML, 2018) "IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures"
-
We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace.
-
we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters
-
the progress has been primarily in single-task performance
-
multi-task reinforcement learning
Task: Multi-task Reinforcement Learning
-
IMPALA (Figure 1) uses an actor-critic setup to learn a policy π and a baseline function V^π. The process of generating experiences is decoupled from learning the parameters of π and V^π. The architecture consists of a set of actors, repeatedly generating trajectories of experience, and one or more learners that use the experiences sent from actors to learn π off-policy.
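For my notes, the V-trace target as I recall it from the paper (worth re-checking the exact equation and truncation constants there):
\[ v_s = V(x_s) + \sum_{t=s}^{s+n-1} \gamma^{\,t-s} \Big( \prod_{i=s}^{t-1} c_i \Big) \delta_t V, \qquad \delta_t V = \rho_t \big( r_t + \gamma V(x_{t+1}) - V(x_t) \big), \]
with truncated importance weights \(\rho_t = \min(\bar{\rho}, \pi(a_t|x_t)/\mu(a_t|x_t))\) and \(c_i = \min(\bar{c}, \pi(a_i|x_i)/\mu(a_i|x_i))\), correcting for the lag between the actors' behaviour policy μ and the learner's policy π.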
-
an agent is trained on each task
-
scalability
-
separately
-
We are interested in developing new methods capable of mastering a diverse set of tasks simultaneously as well as environments suitable for evaluating such methods.
Task: train agents that can do more than one thing.
-
IMPALA actors communicate trajectories of experience (sequences of states, actions, and rewards) to a centralised learner
-
full trajectories of experience
-
aggressively parallelising all time independent operations
-
learning becomes off-policy
-
IMPALA achieves exceptionally high data throughput rates of 250,000 frames per second, making it over 30 times faster than single-machine A3C
-
With the introduction of very deep model architectures, the speed of a single GPU is often the limiting factor during training.
-
IMPALA is also more data efficient than A3C based agents and more robust to hyperparameter values and network architectures
-
IMPALA uses synchronised parameter update which is vital to maintain data efficiency when scaling to many machines
-
A3C
-
-
proceedings.mlr.press proceedings.mlr.press
-
This paper introduced the DPG Algorithm
-
-
openreview.net openreview.net
-
Link to page with information about the paper: https://openreview.net/forum?id=rJeXCo0cYX
-
-
openreview.net openreview.net
-
Yann LeCun released his vision for the future of Artificial Intelligence research in 2022, and it sounds a lot like Reinforcement Learning.
-
-
-
Paper that evaluated the existing Double Q-Learning algorithm on the new DQN approach and validated that it is very effective in the Deep RL realm.
-
Q-learning (Watkins, 1989) is one of the most popular reinforcement learning algorithms, but it is known to sometimes learn unrealistically high action values because it includes a maximization step over estimated action values, which tends to prefer overestimated to underestimated values
Q-learning tends to overestimate the value of an action.
-
noise
-
unify these views
-
we can learn a parameterized value function
-
insufficiently flexible function approximation
-
Both the target network and the experience replay dramatically improve the performance of the algorithm
-
The target used by DQN is then
-
show overestimations can occur when the action values are inaccurate, irrespective of the source of approximation error
They show overestimations occur when there is approximation error in the value function approximation for Q(s,a).
-
θt
-
upward bias
-
In the original Double Q-learning algorithm, two value functions are learned by assigning each experience randomly to update one of the two value functions, such that there are two sets of weights, θ and θ′
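Written out (standard forms from the Double Q-learning / Double DQN literature, my restatement rather than a quote):
\[ Y_t^{\text{Q}} = R_{t+1} + \gamma \max_a Q(S_{t+1}, a; \theta_t), \qquad Y_t^{\text{DoubleQ}} = R_{t+1} + \gamma\, Q\big(S_{t+1}, \operatorname{arg\,max}_a Q(S_{t+1}, a; \theta_t);\, \theta'_t\big) \]
Double DQN keeps this decoupling of action selection and evaluation but reuses DQN's target network as the second set of weights θ′.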
-
θ′t
-
while Double Q-learning is unbiased.
-
The orange bars show the bias in a single Q-learning update when the action values are Q(s, a) = V∗(s) + ε_a and the errors {ε_a}, a = 1, …, m, are independent standard normal random variables. The second set of action values Q′, used for the blue bars, was generated identically and independently. All bars are the average of 100 repetitions.
-
-
arxiv.org arxiv.org
-
DDPG
-
multiplying the rewards generated from an environment by some scalar
-
ELU
-
This is akin to clipping the rewards to [0, 1]
-
network structure of
Different activation functions tried.
-
Hyperparameters
Hyperparameters: α, dropout probability, number of layers in your network, width of network layers, activation function (ReLU, ELU, tanh, ...), CNN?, RNN?, ..., ε (for the ε-greedy policy).
Parameters: specific to the problem, i.e. the parameters of Q(s, a) and the policy π (θ, w), and γ (how important is the future?).
-
PPO
-
-
arxiv.org arxiv.org
-
TRPO uses a hard constraint rather than a penalty because it is hard to choose a single value of β that performs well across different problems
-
gradient estimator
-
we only ignore the change in probability ratio when it would make the objective improve, and we include it when it makes the objective worse.
-
not sufficient to simply choose a fixed penalty coefficient β and optimize the penalized objective Equation (5) with SGD
-
objective function (the “surrogate” objective) is maximized
PPO is a response to the TRPO algorithm, trying to use the core idea but implement a more efficient and simpler algorithm.
TRPO defines the problem as a straight optimization problem; no learning is actually involved.
-
Generalizing this choice, we can use a truncated version of generalized advantage estimation
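The truncated generalized advantage estimator being referenced (standard form, my restatement):
\[ \hat{A}_t = \sum_{l=0}^{T-t-1} (\gamma\lambda)^l\, \delta_{t+l}, \qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t) \]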
-
Without a constraint, maximization of L^CPI would lead to an excessively large policy update; hence, we now consider how to modify the objective, to penalize changes to the policy that move r_t(θ) away from 1
The policy iteration objective proposes steps which are too large. It uses a likelihood ratio of the current policy against an older version of the policy, multiplied by the advantage function. So, it uses the change in the policy probability for an action to weight the advantage function.
-
our goalof a first-order algorithm that emulates the monotonic improvement of TRPO,
-
A proximal policy optimization (PPO) algorithm that uses fixed-length trajectory segments is shown below. Each iteration, each of N (parallel) actors collect T timesteps of data. Then we construct the surrogate loss on these NT timesteps of data, and optimize it with minibatch SGD
-
The first term inside the min is L^CPI. The second term, clip(r_t(θ), 1 − ε, 1 + ε) Â_t, modifies the surrogate objective by clipping the probability ratio, which removes the incentive for moving r_t outside of the interval [1 − ε, 1 + ε]. Finally, we take the minimum of the clipped and unclipped objective, so the final objective is a lower bound (i.e., a pessimistic bound) on the unclipped objective
The "clip" function cuts off the probability ratio so that, beyond the clip range, further changes in the policy no longer improve the objective.
-
Clipped Surrogate Objective
-
We can see that L^CLIP is a lower bound on L^CPI, with a penalty for having too large of a policy update
The clipped loss is a lower bound on the surrogate objective used in TRPO. So it is simpler to compute and will provide some guidance at least; it will never overestimate that objective.
-
These methods have the stability and reliability of trust-region methods but are much simpler to implement, requiring only a few lines of code change to a vanilla policy gradient implementation, applicable in more general settings
-
shows how several objectives vary as we interpolate along the policy update direction
-
Surrogate objectives, as we interpolate between the initial policy parameter θ_old, and the updated policy parameter, which we compute after one iteration of PPO.
Another figure to show intuition for the approach by showing how each component changes with respect to following the policy update along the gradient direction.
-
lower bound (i.e., a pessimistic bound
-
Paper that introduced the PPO algorithm. PPO is, in a way, a response to the TRPO algorithm, trying to use the core idea but implement a more efficient and simpler algorithm.
TRPO defines the problem as a straight optimization problem; no learning is actually involved.
-
-
arxiv.org arxiv.org
-
samples transitions with probability p_t relative to the last encountered absolute TD error
-
RMSprop
-
This means that in the loss above, the time index t will be a random time index from the last million transitions, rather than the current time.
-
Multi-step learning.
-
Prioritized replay.
-
Prioritized
-
parameters θ of the online network (which is also used to select actions
-
ablation
-
θ represents the parameters of a target network
-
a periodic copy of the online network which is not directly optimized.
-
Noisy Nets. The limitations of exploring using ε-greedy policies are clear in games such as Montezuma's Revenge, where many actions must be executed to collect the first reward
-
t is a time step randomly picked from the replay memory
-
DDQN
-
This famous paper gives a great review of the DQN algorithm a couple of years after it changed everything in Deep RL. It compares six different extensions to DQN for Deep Reinforcement Learning, many of which have now become standard additions to DQN and other Deep RL algorithms. It also combines all of them to produce the "Rainbow" algorithm, which outperformed many other models for a while.
-
-
arxiv.org arxiv.org
-
For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish
-
Bowen Baker et al. (OpenAI). "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos". arXiv, June 2022.
Introduction of VPT: new semi-supervised pre-trained model for sequential decision making on Minecraft. Data are from human video playthroughs but are unlabelled.
-
We extend the internet-scale pretraining paradigm to sequential decision domains through semi-supervised imitation learning wherein agents learn to act by watching online unlabeled videos.
-
-
arxiv.org arxiv.org
-
an agent is instructed to obtain a desired goal item
Problem: the agent must complete the instructed task in Minecraft.
-
curriculum learning (at least using current RL methods) is that the agent achieves a small success probability (within available/reasonable compute) on a new task after mastering a previous task.
Curriculum Learning
-
We study curriculum learning on a set of goal-conditioned Minecraft tasks, in which the agent is tasked to collect one out of a set of 107 items from the Minecraft tech tree
-
Results
Experiments: they compared a variety of policies and training approaches.
-
It has 5 minutes (1500 time steps) to complete the task and obtains a reward of +1 upon success. After each success or failure a new task is selected without resetting the world or respawning the agent
- Agent has 5 min to find item
- Next item chosen without resetting world
-
"Simon Says"
-
Learning progress curriculum
Approach: Curriculum Learning
-
-
arxiv.org arxiv.org
-
Arxiv paper from 2021 on reinforcement learning in a scenario where your aim is to learn a workable POMDP policy, but you start with a fully observable MDP and adjust it over time towards a POMDP.
-
-
arxiv.org arxiv.org
-
Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"
Response paper to DQN showing that well designed Value Function Approximations can also do well at these complex tasks without the use of Deep Learning
A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!
-
-
arxiv.org arxiv.org
-
Tom Schaul, John Quan, Ioannis Antonoglou and David Silver. "PRIORITIZED EXPERIENCE REPLAY", ICLR, 2016.
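A rough sketch of the proportional prioritization from this paper, as I understand it (toy flat buffer with no sum-tree; α, β, and the batch size are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
td_errors = np.abs(rng.standard_normal(1000)) + 1e-6   # |TD error| plus a small constant
alpha, beta = 0.6, 0.4

priorities = td_errors ** alpha
probs = priorities / priorities.sum()                  # P(i) = p_i^alpha / sum_k p_k^alpha

batch = rng.choice(len(probs), size=32, p=probs)       # sample transitions by priority

# Importance-sampling weights correct the bias from non-uniform sampling.
weights = (len(probs) * probs[batch]) ** (-beta)
weights /= weights.max()                               # normalized for stability, as in the paper
```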
-
-
openaccess.thecvf.com openaccess.thecvf.com
-
Xu, ICCV, 2019 "Temporal Recurrent Networks for Online Action Detection"
arxiv: https://arxiv.org/abs/1811.07391 hypothesis: https://hyp.is/go?url=https%3A%2F%2Fopenaccess.thecvf.com%2Fcontent_ICCV_2019%2Fpapers%2FXu_Temporal_Recurrent_Networks_for_Online_Action_Detection_ICCV_2019_paper.pdf&group=world
-
-
-
Few-Shot (FS) - the model is given a few demonstrations of the task at inference time as conditioning [RWC+19], but no weights are updated
hints are given but the model is not updated
-
- Jun 2023
-
-
fuzzy
fuzzy!
-
[KMH+20] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess,Rewon Child, Scott Gray, Alec Ra
Justification for low learning rate in large language models.
-
As found in [KMH+20, MKAT18], larger models can typically use a larger batch size, but require a smaller learning rate. We measure the gradient noise scale during training and use it to guide our choice of batch size [MKAT18]. Table A.1 shows the parameter settings we used. To train the larger models without running out of memory, we use a mixture of model parallelism within each matrix multiply and model parallelism across the layers of the network. All models were trained on V100 GPU's on part of a high-bandwidth cluster. Details of the training process and hyperparameter settings are described in the appendix.
Why is this?
-
We use the same model and architecture as GPT-2
What do they mean by "model" here? If they have retrained on more data, with a slightly different architecture, then the model weights after training must be different.
-
-
cdn.openai.com cdn.openai.com
-
While zero-shot performance establishes a baseline of the potential performance of GPT-2 on many tasks, it is not clear where the ceiling is with finetuning.
So finetuning could lead to better models.
-
13.19%
that's a lot!
-
The Bloom filters were constructed such that the false positive rate is upper bounded by 1/10^8. We further verified the low false positive rate by generating 1M strings, of which zero were found by the filter
Bloom filters used to determine how much overlap there is between train and test set, to be more sure of their results.
-
Bloom filters
Bloom Filter:
The high level idea is to map elements x∈X to values y=h(x)∈Y using a hash function h, and then test for membership of x' in X by checking whether y'=h(x')∈Y, and do that using multiple hash functions h.
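A toy version to make the idea concrete (my own sketch; real implementations size the bit array and the number of hash functions for a target false-positive rate):

```python
import hashlib

class BloomFilter:
    def __init__(self, size=10_000, num_hashes=5):
        self.size, self.num_hashes = size, num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive several hash positions by salting one cryptographic hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # Can return a false positive, but never a false negative for items that were added.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("a training document n-gram")
print("a training document n-gram" in bf)   # True
print("an unseen test n-gram" in bf)        # almost certainly False
```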
-
Recent work in computer vision has shown that common image datasets contain a non-trivial amount of near-duplicate images. For instance CIFAR-10 has 3.3% overlap between train and test images (Barz & Denzler, 2019). This results in an over-reporting of the generalization performance of machine learning systems.
CIFAR-10 performance results are overestimates since some of the training data is essentially in the test set.
-
-
www.fandm.edu www.fandm.edu
-
Liang, Machado, Talvitie, Bowling - AAMAS 2016 "State of the Art Control of Atari Games Using Shallow Reinforcement Learning"
A great paper showing how to think differently about the latest advances in Deep RL. All is not always what it seems!
-
-
arxiv.org arxiv.org
-
We also add 8,000 text-instructions generated by the OpenAI API gpt-3.5-turbo model [38],
How does this work? Does it take images as input as well?
-
-
www.cs.mcgill.ca www.cs.mcgill.ca
-
Paper from 2016, soon after the DQN paper, about how to use eligibility traces to improve performance further.
-
-
assets.pubpub.org assets.pubpub.org
-
LeBlanc, D. G., & Lee, G. (2021). General Deep Reinforcement Learning in NES Games. Canadian AI 2021. Canadian Artificial Intelligence Association (CAIAC). https://doi.org/10.21428/594757db.8472938b
-
-
assets.pubpub.org assets.pubpub.org
-
hypothesis test for CANAI23 paper
-
-
jmlr.org jmlr.org
-
introducing a unified framework that converts all text-based language problems into a text-to-text format
This is their goal: to have a single model, including hyperparameters and setup, that can be used for any NLP task.
-
Paper introducing the T5 Text-to-Text transformer model from Google (Raffel, JMLR, 2020).
-
- Apr 2023
-
srush.github.io srush.github.io
-
"The Annotated S4": Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, and Christopher Ré.
A new approach to transformers
-
-
-
Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, and Christopher Ré. Department of Computer Science, Stanford University.
-
-
arxiv.org arxiv.org
-
Bowen Baker et al. (OpenAI). "Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos". arXiv, June 2022.
New semi-supervised pre-trained model for sequential decision making on Minecraft. Data are from human video playthroughs but are unlabelled.
reinforcement-learning foundation-models pretrained-models proj-minerl minecraft
-
-
-
OpenAI page describing their Video PreTraining (VPT) work for Minecraft.
-
-
en.wikipedia.org en.wikipedia.org
-
Wikipedia article for the Travelling Salesman Problem (TSP)
-
-
en.wikipedia.org en.wikipedia.org
-
In this way, an NTM can be thought of as simultaneously exploring all computational possibilities in parallel and selecting an accepting branch
Non-deterministic Turing Machines are able to get lucky and choose the single path to the answer in polynomial time, or be given a "hint" or "proof" or "certificate" for that path. This isn't realistic, but it separates the difficulty of the problem of verifying a solution and finding one into two different tasks.
-
Computational problems: Intuitively, a computational problem is just a question that can be solved by an algorithm. For example, "is the natural number n prime?" is a computational problem. A computational problem is mathematically represented as the set of answers to the problem. In the primality example, the problem (call it PRIME) is represented by the set of all natural numbers that are prime: PRIME = { n ∈ ℕ | n is prime }. In the theory of computation, these answers are represented as strings; for example, in the primality example the natural numbers could be represented as strings of bits that represent binary numbers. For this reason, computational problems are often synonymously referred to as languages, since strings of bits represent formal languages (a concept borrowed from linguistics); for example, saying that the PRIME problem is in the complexity class NP is equivalent to saying that the language PRIME is in NP.
Explanation of why computational complexity class proofs with Turing Machines use "strings" instead of algorithms or programs.
-
-
en.wikipedia.org en.wikipedia.org
-
Presburger arithmetic is much weaker than Peano arithmetic, which includes both addition and multiplication operations. Unlike Peano arithmetic, Presburger arithmetic is a decidable theory. This means it is possible to algorithmically determine, for any sentence in the language of Presburger arithmetic, whether that sentence is provable from the axioms of Presburger arithmetic. The asymptotic running-time computational complexity of this algorithm is at least doubly exponential, however, as shown by Fischer & Rabin (1974).
This is an example of a decision problem that is at least doubly exponential \(2^{2^n}\). It is a simpler form of arithmetic where the Halting problem/incompleteness theorem does not apply.
-
-
en.wikipedia.org en.wikipedia.org
-
NP-hard if everything in NP can be transformed in polynomial time into it even though it may not be in NP
Definition of NP-hard problem
-
At present, all known algorithms for NP-complete problems require time that is superpolynomial in the input size, in fact exponential in O(n^k) for some k > 0, and it is unknown whether there are any faster algorithms.
So how hard are NP-complete problems?
-
The Subgraph Isomorphism problem is NP-complete. The graph isomorphism problem is suspected to be neither in P nor NP-complete, though it is in NP. This is an example of a problem that is thought to be hard, but is not thought to be NP-complete. This class is called NP-Intermediate problems and exists if and only if P≠NP.
There might even be some problems in NP but not in P and that are not NP-complete.
-
NP-complete problems are often addressed by using heuristic methods and approximation algorithms.
usually solved with approximation algorithms
-
-
openai.com openai.com GPT-4
-
my annotations for the OpenAI GPT4 info page.
-
GPT-4 outperforms ChatGPT by scoring in higher approximate percentiles among test-takers.
oh, great.
-
40% more likely to produce factual responses than GPT-3.5
great, 40% more than what though?
-
We used GPT-4 to help create training data for model fine-tuning and iterate on classifiers across training, evaluations, and monitoring.
Interesting. You need to consider: is this like data augmentation, like bootstrapping, like adversarial training, or is it like overfitting to your own data?
-
- Mar 2023
-
www.inc.com www.inc.com
-
"There is a robust debate going on in the computing industry about how to create it, and whether it can even be created at all."
Is there? By whom? Why industry only and not government, academia and civil society?
-
-
arxiv.org arxiv.org
-
tasks for the Minecraft domain.
They demonstrate the model on a "minecraft-like" domain (introduced earlier by someone else) where there are resources in the world and the agent has tasks.
-
- Feb 2023
-
arxiv.org arxiv.org
-
Definition 3.2 (simple reward machine).
The MDP does not change; its dynamics are the same with or without the RM, just as they are with or without a standard reward model. Additionally, the rewards from the RM can be non-Markovian with respect to the MDP because they inherently have a kind of memory of where you've been, limited to the agent's "movement" (almost "in its mind") about where it is along the goals for this task.
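To check my own reading of Definition 3.2, a minimal sketch of a simple reward machine: a small automaton over high-level events whose internal state is exactly the memory that makes the reward non-Markovian in the environment state alone (the events and rewards here are made up, not from the paper):

```python
class SimpleRewardMachine:
    """RM state u tracks task progress; transitions fire on labelled events, not raw MDP states."""
    def __init__(self):
        self.u = "u0"
        # (rm_state, event) -> (next_rm_state, reward)
        self.delta = {
            ("u0", "got_wood"):   ("u1", 0.0),
            ("u1", "at_factory"): ("u2", 1.0),   # reward only after wood, then factory, in that order
        }

    def step(self, event):
        self.u, reward = self.delta.get((self.u, event), (self.u, 0.0))
        return reward

rm = SimpleRewardMachine()
print(rm.step("at_factory"))  # 0.0 -- factory first earns nothing; the RM remembers the order
print(rm.step("got_wood"))    # 0.0
print(rm.step("at_factory"))  # 1.0 -- non-Markovian in s alone, Markovian in the pair (s, u)
```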
-