- Nov 2023
-
proceedings.mlr.press
-
Reading this one on Nov 27, 2023 for the reading group.
-
-
proceedings.neurips.cc
-
Reading this one on Nov 27, 2023 for the reading group.
-
- Oct 2023
-
arxiv.org
-
(Chen, NeurIPS, 2021) Chen, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, and Mordatch. "Decision Transformer: Reinforcement Learning via Sequence Modeling". arXiv preprint arXiv:2106.01345v2, June 2021.
Quickly became a very influential paper, with a new idea for learning generative models of action prediction: train on (return-to-go, state, action) sequences from demonstration trajectories. There is no optimization of actions or rewards; instead, the target return is given as an input.
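A minimal sketch of that training signal as I understand it (not the authors' code): trajectories are flattened into (return-to-go, state, action) tokens and the model is trained with a purely supervised loss on actions. The tiny per-step MLP here is a stand-in for the GPT backbone, which attends over the whole token history.

```python
import torch
import torch.nn as nn

def returns_to_go(rewards):
    # R_t = r_t + r_{t+1} + ... + r_T: the return still to be collected.
    return torch.flip(torch.cumsum(torch.flip(rewards, [0]), 0), [0])

class TinyDecisionModel(nn.Module):
    """Stand-in for the GPT backbone: maps one (return-to-go, state)
    pair to an action. The real model attends over the whole history."""
    def __init__(self, state_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1 + state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, rtg, states):
        return self.net(torch.cat([rtg.unsqueeze(-1), states], dim=-1))

# A fake demonstration trajectory: T steps, 4-dim states, 2-dim actions.
T, state_dim, act_dim = 20, 4, 2
states, actions = torch.randn(T, state_dim), torch.randn(T, act_dim)
rewards = torch.randn(T)

model = TinyDecisionModel(state_dim, act_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Purely supervised: predict the demonstrated action given state and
# target return. No reward is maximized; the return is just an input.
pred = model(returns_to_go(rewards), states)
loss = nn.functional.mse_loss(pred, actions)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```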
-
-
arxiv.org
-
Wu, Prabhumoye, Min, Bisk, Salakhutdinov, Azaria, Mitchell, and Li. "SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning". arXiv preprint arXiv:2305.15486v2, May 2023.
-
"Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training."
Them's fightin' words!
I haven't read it yet, but we're putting it on the list for this fall's reading group. Seriously, a strong result with a very strong implied claim; they are careful to say it comes from their empirical results. Very worth a look. I suspect the amount of implicit knowledge in the paper's text and in the DAG is helping to do this (see the sketch below).
The Big Question: is their comparison to RL baselines fair, given that the baselines are trained from scratch? What does a fair comparison mean between any from-scratch model (RL or supervised) and an LLM approach (or any approach built on a foundation model), when that model is not really starting from scratch?
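Since I haven't read the paper, this is only a guess at the mechanics implied by the title and abstract: a DAG of questions answered in topological order, each prompt conditioned on its parents' answers, ending in an action choice. The example questions and `ask_llm` are placeholders, not SPRING's actual prompts or API.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def ask_llm(prompt: str) -> str:
    # Placeholder for a GPT-4 API call; canned answer so the sketch runs.
    return f"(model answer to: {prompt.splitlines()[-1]})"

# Maps each question to the questions it depends on, so a question is
# asked only after its parents are answered. These example questions are
# made up for illustration.
dag = {
    "What objects are visible?": [],
    "What does the paper say about crafting?": [],
    "What should the agent do next?": [
        "What objects are visible?",
        "What does the paper say about crafting?",
    ],
}

def answer_dag(dag, context):
    answers = {}
    # TopologicalSorter takes {node: predecessors}, yields parents first.
    for q in TopologicalSorter(dag).static_order():
        notes = "\n".join(f"- {p}: {answers[p]}" for p in dag[q])
        answers[q] = ask_llm(f"{context}\n{notes}\nQuestion: {q}")
    return answers

print(answer_dag(dag, "Context: game observation plus paper excerpts.")
      ["What should the agent do next?"])
```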
-
-
www.nature.com
-
Wang et al. "Scientific discovery in the age of artificial intelligence". Nature, 2023.
A paper about the current state of using AI/ML for scientific discovery, connected with the AI4Science workshops at major conferences.
(NOTE: since Springer/Nature don't allow public PDFs to be linked without a paywall, we can't use Hypothesis directly on the PDF of the paper; this link is to the website version, which is what we'll use to guide discussion during the reading group.)
-
Petersen, B. K. et al. Deep symbolic regression: recovering mathematical expressions from data via risk-seeking policy gradients. In International Conference on Learning Representations (2020).
Description: Reinforcement learning uses a neural network to generate a mathematical expression sequentially, adding symbols from a predefined vocabulary, with the learned policy deciding which symbol to add next. The formula is represented as a parse tree; the policy takes the partial parse tree as input to determine which leaf node to expand and which symbol (from the vocabulary) to add.
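A minimal sketch of that generation loop (not Petersen et al.'s implementation), under the assumption that symbols are emitted in prefix order so the parse tree is implicit; the uniform random `policy` and the length cap stand in for the learned policy and the paper's expression constraints:

```python
import random

# Vocabulary: symbol -> arity (0 = a leaf, i.e. a variable or constant).
VOCAB = {"+": 2, "*": 2, "sin": 1, "x": 0, "1.0": 0}
LEAVES = [s for s, a in VOCAB.items() if a == 0]

def sample_expression(policy=lambda partial: random.choice(list(VOCAB)),
                      max_len=32):
    # An arity-k symbol fills one open slot in the parse tree and opens
    # k new ones for its arguments; generation ends when no slot is open.
    tokens, open_slots = [], 1
    while open_slots > 0 and len(tokens) < max_len:
        sym = policy(tokens)          # the policy sees the partial tree
        tokens.append(sym)
        open_slots += VOCAB[sym] - 1
    while open_slots > 0:             # cap reached: close with leaves
        tokens.append(random.choice(LEAVES))
        open_slots -= 1
    return tokens                     # e.g. ['+', 'x', '1.0'] == x + 1.0

print(sample_expression())
```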
-
-
arxiv.org
-
Zečević, Willig, Singh Dhami, and Kersting. "Causal Parrots: Large Language Models May Talk Causality But Are Not Causal". Transactions on Machine Learning Research, August 2023.
-