32 Matching Annotations
  1. Sep 2025
    1. Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability.

      Transformers' memory and compute scale quadratically with sequence length. RNNs' memory and compute scale linearly, but their performance does not match that of Transformers because of limits on parallelization and scalability.
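
      A minimal NumPy sketch of the scaling contrast (toy sizes, not from the paper): the attention score matrix is n × n, so activation memory grows quadratically with sequence length, while an RNN carries only a fixed-size hidden state but must be unrolled sequentially.

      ```python
      import numpy as np

      n, d = 1024, 64                          # sequence length and width (illustrative)
      x = np.random.randn(n, d)

      # Self-attention: the score matrix is (n, n) -> O(n^2) memory and compute.
      scores = x @ x.T / np.sqrt(d)
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)
      attn_out = weights @ x                   # (n, d)

      # RNN: a single fixed-size state updated step by step -> O(n) compute,
      # O(1) state memory, but the loop is inherently sequential.
      W, U = 0.01 * np.random.randn(d, d), 0.01 * np.random.randn(d, d)
      h = np.zeros(d)
      for t in range(n):
          h = np.tanh(x[t] @ W + h @ U)

      print(weights.nbytes, h.nbytes)          # ~8 MB vs 512 bytes of activations
      ```

      In this toy setting the attention matrix already costs about 8 MB at n = 1024; doubling n quadruples it, while the RNN state stays at 512 bytes.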

  2. Jun 2025
  3. Mar 2025
  4. Feb 2025
  5. Aug 2024
  6. Jan 2024
  7. Nov 2023
  8. Oct 2023
    1. (Chen, NeurIPS, 2021) Chen, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, and Mordatch. "Decision Transformer: Reinforcement Learning via Sequence Modeling". arXiv preprint arXiv:2106.01345v2, June 2021.

      Quickly became a very influential paper, with a new idea: learn a generative model of action prediction by sequence modeling over demonstration trajectories of returns-to-go, states, and actions. There is no optimization of actions or rewards; instead the target return is given as an input (see the sketch below).
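
      A minimal sketch of that trajectory layout (hypothetical toy data, NumPy only, not the authors' code): each timestep contributes a (return-to-go, state, action) triple, the model is trained to predict the action from the preceding tokens, and at evaluation time a desired target return is fed in place of the demonstrated return-to-go.

      ```python
      import numpy as np

      states = np.random.randn(5, 3)                 # 5 steps, 3-dim states (toy)
      actions = np.random.randint(0, 4, size=5)      # discrete actions (toy)
      rewards = np.array([0.0, 1.0, 0.0, 0.0, 1.0])  # per-step rewards (toy)

      # Return-to-go at step t = sum of rewards from t to the end of the episode.
      returns_to_go = np.cumsum(rewards[::-1])[::-1]

      # Interleave (R_t, s_t, a_t) tokens; training is plain supervised prediction
      # of a_t from the prefix -- no explicit reward maximization.
      trajectory = [(returns_to_go[t], states[t], actions[t]) for t in range(5)]

      for R, s, a in trajectory:
          print(f"target-return={R:.1f}  state={np.round(s, 2)}  action={a}")
      ```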

  9. Jul 2023
  10. Jun 2023
  11. Apr 2023
  12. Jan 2023
  13. Dec 2022
  14. Nov 2022
    1. we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization

      Using the attention mechanism to determine global dependencies between input and output instead of using recurrent links to past states. This is the essence of their new idea.
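
      A minimal sketch of scaled dot-product attention (NumPy, toy sizes; not the authors' code): every output position attends to every input position in a single step, which is what "global dependencies" means here, in contrast to recurrent links that reach the past only state by state.

      ```python
      import numpy as np

      def attention(Q, K, V):
          # Similarity of every query with every key -> an (n_q, n_k) weight matrix.
          scores = Q @ K.T / np.sqrt(K.shape[-1])
          weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
          weights /= weights.sum(axis=-1, keepdims=True)
          return weights @ V                   # each output mixes all value vectors

      n, d = 8, 16
      Q, K, V = (np.random.randn(n, d) for _ in range(3))
      print(attention(Q, K, V).shape)          # (8, 16)
      ```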

  15. Sep 2022
  16. Feb 2022