- Aug 2024
-
proceedings.neurips.cc
-
MvP: "Direct Multi-view Multi-person 3D Pose Estimation". Tao Wang, Jianfeng Zhang, Yujun Cai, Shuicheng Yan, Jiashi Feng.
Influential paper on learning consistent skeletal models of human pose from multiview images
-
-
openaccess.thecvf.com
-
Really interesting and innovative method that uses multiview perspective data for human pose estimation and pedestrian detection.
-
- Jan 2024
-
arxiv.org
-
Hubinger et al. "Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training". arXiv:2401.05566v3, Jan 17, 2024.
Very disturbing and interesting results from a team of researchers from Anthropic and elsewhere.
-
-
cdn.openai.com
-
GPT-4 System Card. OpenAI, March 23, 2023.
-
- Nov 2023
-
proceedings.mlr.press
-
Reading this one on Nov 27, 2023 for the reading group.
-
-
proceedings.neurips.cc
-
Reading this one on Nov 27, 2023 for the reading group.
-
- Oct 2023
-
-
Introduction of RoBERTa, an improved analysis and training approach for BERT NLP models.
-
-
arxiv.org
-
(Chen, NeurIPS, 2021) Chen, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, and Mordatch. "Decision Transformer: Reinforcement Learning via Sequence Modeling". arXiv preprint arXiv:2106.01345v2, June 2021.
Quickly became a very influential paper, with a new idea for learning generative models of action prediction: autoregressive sequence modeling over (return-to-go, state, action) tokens from demonstration trajectories. There is no explicit optimization of actions or rewards; instead, the target return is supplied as an input.
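A minimal sketch (in Python, with toy data and function names of my own, not the authors' code) of how a demonstration trajectory can be turned into the (return-to-go, state, action) tokens the model is trained on:

```python
import numpy as np

def returns_to_go(rewards):
    """Suffix sums of rewards: R_t = sum over t' >= t of r_t'."""
    return np.cumsum(rewards[::-1])[::-1]

# A toy demonstration trajectory of 4 steps (illustrative values only).
states  = np.array([[0.1], [0.2], [0.3], [0.4]])   # s_t
actions = np.array([0, 1, 1, 0])                   # a_t
rewards = np.array([0.0, 1.0, 0.0, 2.0])           # r_t

rtg = returns_to_go(rewards)                       # [3.0, 3.0, 2.0, 2.0]

# The model sees interleaved tokens (R_t, s_t, a_t) and is trained to
# predict a_t from the preceding tokens; at test time the initial
# return-to-go is set to the desired target return rather than optimized.
sequence = [(float(r), s.tolist(), int(a)) for r, s, a in zip(rtg, states, actions)]
print(sequence)
```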
-
-
arxiv.org
-
Zecevic, Willig, Singh Dhami, and Kersting. "Causal Parrots: Large Language Models May Talk Causality But Are Not Causal". Transactions on Machine Learning Research, Aug 2023.
-
-
arxiv.org
-
Feng et al., 2022. "Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis"
Found via Gowthami Somepalli (@gowthami@sigmoid.social on Mastodon): "StructureDiffusion: Improve the compositional generation capabilities of text-to-image #diffusion models by modifying the text guidance using a constituency tree or a scene graph."
-
-
cdn.openai.com
-
GPT-2 Introduction paper
"Language Models are Unsupervised Multitask Learners". A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever (2019).
-
-
arxiv.org
-
"Attention is All You Need" Foundational paper introducing the Transformer Architecture.
-
-
-
GPT-3 introduction paper
-
-
arxiv.org
-
"Are Pre-trained Convolutions Better than Pre-trained Transformers?"
-
-
arxiv.org
-
LaMDA: Language Models for Dialog Applications
"LaMDA: Language Models for Dialog Applications" Google's introduction of the LaMDA v1 large language model.
-
-
-
Benyamin Ghojogh and Ali Ghodsi. "Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey"
-
-
typeshare.co
-
- Jul 2023
-
arxiv.org
-
Llama 2 release paper
-
- Jun 2023
-
-
We use the same model and architecture as GPT-2
What do they mean by "model" here? If they have retrained on more data, with a slightly different architecture, then the model weights after training must be different.
-
-
jmlr.org
-
introducing a unified framework that converts all text-based language problems into a text-to-text format
This is their goal: a single model, including hyperparameters and setup, that can be used for any NLP task.
-
Paper introducing the T5 text-to-text Transformer model from Google. (Raffel, JMLR, 2020)
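A minimal sketch of what "text-to-text" means in practice; the task-prefix strings below follow the convention described in the paper, but the snippet itself is only an illustration, not the authors' code:

```python
# Every task is framed as "input string -> output string", with the task
# named by a prefix, so one encoder-decoder model and one training setup
# can serve translation, classification, summarization, etc.
examples = [
    ("translate English to German: That is good.", "Das ist gut."),
    ("cola sentence: The course is jumping well.", "not acceptable"),
    ("summarize: state authorities dispatched emergency crews ...", "six people hospitalized ..."),
]
for source, target in examples:
    # The same weights map source -> target for all tasks.
    print(f"{source!r} -> {target!r}")
```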
-
- Apr 2023
-
srush.github.io
-
The Annotated S4: "Efficiently Modeling Long Sequences with Structured State Spaces" by Albert Gu, Karan Goel, and Christopher Ré.
An alternative to Transformers for long-sequence modeling, built on structured state-space models rather than attention.
-
-
-
"Efficiently Modeling Long Sequences with Structured State Spaces". Albert Gu, Karan Goel, and Christopher Ré. Department of Computer Science, Stanford University.
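A minimal sketch of the discretized linear state-space recurrence that S4 builds on; the matrices here are random placeholders (my own illustration), whereas S4 uses a structured, HiPPO-initialized A and evaluates the recurrence efficiently as a convolution:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 8                       # state size, sequence length
A_bar = 0.9 * np.eye(N)           # placeholder discrete state matrix
B_bar = rng.normal(size=(N, 1))
C     = rng.normal(size=(1, N))

u = rng.normal(size=L)            # input sequence u_0 ... u_{L-1}
x = np.zeros((N, 1))
y = []
for k in range(L):
    x = A_bar @ x + B_bar * u[k]  # x_k = A_bar x_{k-1} + B_bar u_k
    y.append((C @ x).item())      # y_k = C x_k
print(y)
```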
-
- Jan 2023
-
inst-fs-iad-prod.inscloudgate.net
-
"Talking About Large Language Models" by Murray Shanahan
-
- Dec 2022
-
arxiv.org
-
Lee et al. "Multi-Game Decision Transformers". NeurIPS 2022.
-
- Nov 2022
-
arxiv.org
-
we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization ...
Using the attention mechanism to determine global dependencies between input and output instead of using recurrent links to past states. This is the essence of their new idea.
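A minimal sketch of the scaled dot-product attention at the core of this idea (NumPy, illustrative only; the full model adds multi-head projections, masking, and positional encodings):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # pairwise (global) dependencies
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_k = 5, 8
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)
```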
-
- Sep 2022
-
arxiv.org
-
We study whether sequence modeling can perform policy optimization by evaluating Decision Transformer on offline RL benchmarks
-
- Feb 2022
-
www.supercoloring.com
-
Paper Transformer Toys Templates
-