- Aug 2024
-
proceedings.neurips.cc
-
MvP: "Direct Multi-view Multi-person 3D Pose Estimation". Tao Wang, Jianfeng Zhang, Yujun Cai, Shuicheng Yan, and Jiashi Feng.
Influential paper on learning consistent skeletal models of human pose from multiview images.
-
-
openaccess.thecvf.com
-
Really interesting and innovative method that uses multiview perspective data for human pose estimation and pedestrian detection.
-
- Jan 2024
-
arxiv.org
-
Hubinger et al. "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training". arXiv:2401.05566v3, Jan 17, 2024.
Very disturbing and interesting results from a team of researchers from Anthropic and elsewhere.
-
-
cdn.openai.com
-
GPT-4 System Card. OpenAI. March 23, 2023.
-
- Nov 2023
-
proceedings.mlr.press
-
Reading this one on Nov 27, 2023 for the reading group.
-
-
proceedings.neurips.cc
-
Reading this one on Nov 27, 2023 for the reading group.
-
- Oct 2023
-
-
Introduction of RoBERTa, an improved analysis and training approach for BERT NLP models.
-
-
arxiv.org
-
(Chen, NeurIPS, 2021) Chen, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, and Mordatch. "Decision Transformer: Reinforcement Learning via Sequence Modeling". arXiv preprint arXiv:2106.01345v2, June 2021.
Quickly became a very influential paper, introducing a new idea for learning generative models of action prediction from demonstration trajectories, treated as (return-to-go, state, action) sequences. Actions and rewards are not directly optimized; instead, a target reward is given as an input.
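The core trick is simple enough to sketch. Below is a minimal, illustrative packing of a demonstration trajectory into (return-to-go, state, action) tokens; the function names are mine, not the authors', and the real model feeds these tokens through learned embeddings into a causal transformer.

```python
# A minimal sketch (not the authors' code) of Decision Transformer's input
# packing: compute returns-to-go, then interleave (return-to-go, state,
# action) tokens so a causal transformer can predict each action
# conditioned on a target return.
import numpy as np

def returns_to_go(rewards):
    """Suffix sums of the reward sequence: R_t = sum over t' >= t of r_t'."""
    return np.cumsum(rewards[::-1])[::-1]

def pack_trajectory(states, actions, rewards):
    """Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...)."""
    rtg = returns_to_go(rewards)
    tokens = []
    for R, s, a in zip(rtg, states, actions):
        tokens.extend([("return", R), ("state", s), ("action", a)])
    return tokens

# Toy 3-step trajectory.
states  = [0, 1, 2]
actions = [1, 0, 1]
rewards = [0.0, 0.0, 1.0]

for tok in pack_trajectory(states, actions, rewards):
    print(tok)
# At test time the first return token is replaced with the *target* return
# you want the policy to achieve, and actions are decoded autoregressively.
```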
-
-
arxiv.org
-
Zecevic, Willig, Singh Dhami, and Kersting. "Causal Parrots: Large Language Models May Talk Causality But Are Not Causal". Transactions on Machine Learning Research, August 2023.
-
-
arxiv.org
-
Feng et al., 2022. "Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis".
Shared and found via Gowthami Somepalli (@gowthami@sigmoid.social) on Mastodon: "StructureDiffusion: Improve the compositional generation capabilities of text-to-image #diffusion models by modifying the text guidance by using a constituency tree or a scene graph."
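As I read the idea, the guidance tweak amounts to encoding each constituent of the prompt separately and mixing those encodings into the text conditioning so each object's attributes stay bound to it. A rough, illustrative sketch only (my reading, not the authors' code; `encode_text` is a hypothetical stand-in for the diffusion model's frozen text encoder, and the constituents are supplied by hand rather than parsed from a constituency tree):

```python
import numpy as np

def encode_text(prompt, dim=8, seq_len=6):
    """Hypothetical text encoder: returns a (seq_len, dim) embedding."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal((seq_len, dim))

prompt = "a red car and a white sheep"
constituents = ["a red car", "a white sheep"]  # would come from a parse tree

full = encode_text(prompt)
parts = [encode_text(c) for c in constituents]

# Combine: keep the full-prompt encoding for the attention maps, but let the
# values fed to cross-attention be an average over the per-constituent
# encodings, so each phrase contributes its own attribute binding.
values = np.mean([full] + parts, axis=0)
print(values.shape)  # (6, 8) -- same shape, guided by per-constituent encodings
```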
-
-
cdn.openai.com
-
GPT-2 introduction paper.
"Language Models are Unsupervised Multitask Learners". A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever (2019).
-
-
www.semanticscholar.org
-
"Attention is All You Need" Foundational paper introducing the Transformer Architecture.
-
-
-
GPT-3 introduction paper: "Language Models are Few-Shot Learners" (Brown et al., 2020).
-
-
arxiv.org
-
"Are Pre-trained Convolutions Better than Pre-trained Transformers?"
-
-
arxiv.org
-
LaMDA: Language Models for Dialog Applications
"LaMDA: Language Models for Dialog Applications". Google's introduction of the LaMDA v1 Large Language Model.
-
-
-
Benyamin Ghojogh and Ali Ghodsi. "Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey".
-
-
typeshare.co
-
- Jul 2023
-
arxiv.org
-
Llama 2 release paper: "Llama 2: Open Foundation and Fine-Tuned Chat Models".
-
- Jun 2023
-
-
We use the same model and architecture as GPT-2
What do they mean by "model" here? If they have retrained on more data, with a slightly different architecture, then the model weights after training must be different. Presumably "model" refers to the architecture family and setup, not the learned weights.
-
-
jmlr.org
-
introducing a unified framework that converts all text-based language problems into a text-to-text format
This is their goal: a single model, including hyperparameters and setup, that can be used for any NLP task.
-
Paper introducing the T5 Text-to-Text transformer model from Google. (Raffel, JMLR, 2020)
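A quick illustration of what the text-to-text framing looks like in practice. The task prefixes below follow the convention described in the paper (as in its Figure 1), though the exact strings here are from memory:

```python
# Every task -- translation, classification, regression, summarization --
# becomes a (text in, text out) pair distinguished only by a task prefix.
examples = {
    "translation":    ("translate English to German: That is good.",
                       "Das ist gut."),
    "classification": ("cola sentence: The course is jumping well.",
                       "not acceptable"),
    "similarity":     ("stsb sentence1: The rhino grazed. sentence2: A rhino is grazing.",
                       "3.8"),   # even regression targets are emitted as text
    "summarization":  ("summarize: state authorities dispatched emergency crews ...",
                       "six people hospitalized ..."),
}
for task, (inp, out) in examples.items():
    print(f"{task:>14}: {inp!r} -> {out!r}")
```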
-
- Apr 2023
-
srush.github.io
-
The Annotated S4. "Efficiently Modeling Long Sequences with Structured State Spaces". Albert Gu, Karan Goel, and Christopher Ré.
A new approach to sequence modeling that replaces attention with structured state-space layers; an alternative to transformers.
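The computation at the heart of S4 is the discretized linear state-space recurrence. A naive toy version for intuition (S4's actual contribution is a structured A matrix with HiPPO initialization and computing this recurrence as a convolution for efficiency; none of that is shown here):

```python
import numpy as np

def run_ssm(A, B, C, u):
    """x_{k+1} = A x_k + B u_k;  y_k = C x_k  (single-input, single-output)."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k   # update hidden state with the new input
        ys.append(C @ x)      # read out a scalar output
    return np.array(ys)

N = 4                         # state size
A = np.eye(N) * 0.9           # toy stable dynamics (S4 uses a HiPPO-based A)
B = np.ones(N) * 0.1
C = np.ones(N)
u = np.sin(np.linspace(0.0, 6.28, 50))   # input sequence
y = run_ssm(A, B, C, u)
print(y[:5])
```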
-
-
-
"Efficiently Modeling Long Sequences with Structured State Spaces". Albert Gu, Karan Goel, and Christopher Ré. Department of Computer Science, Stanford University.
-
- Jan 2023
-
inst-fs-iad-prod.inscloudgate.net
-
"Talking About Large Language Models" by Murray Shanahan
-
- Dec 2022
-
arxiv.org
-
Lee et al. - NeurIPS 2022. "Multi-Game Decision Transformers".
-
- Nov 2022
-
www.semanticscholar.org
-
we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization
Using the attention mechanism to determine global dependencies between input and output instead of using recurrent links to past states. This is the essence of their new idea.
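For reference, the mechanism the quote refers to is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A tiny NumPy version: every position attends to every other position in a single step, which is why no recurrence is needed and everything parallelizes.

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                        # weighted mix of value vectors

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))   # 5 positions, d_k = 8
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 8))
print(attention(Q, K, V).shape)   # (5, 8)
```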
-
- Sep 2022
-
arxiv.org
-
We study whether sequence modeling can perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
-
- Feb 2022
-
www.supercoloring.com
-
Paper Transformer Toys Templates
-