Hypothesis

4 Matching Annotations

Apr 2023
ar5iv.labs.arxiv.org

What learning algorithm is in-context learning? Investigations with linear models

1. mshook 22 Apr 2023
  
  in Public
  
  While past work has characterized what kinds of functions ICL can learn (Garg et al., 2022; Laskin et al., 2022) and the distributional properties of pretraining that can elicit in-context learning (Xie et al., 2021; Chan et al., 2022), how ICL learns these functions has remained unclear. What learning algorithms (if any) are implementable by deep network models? Which algorithms are actually discovered in the course of training? This paper takes first steps toward answering these questions, focusing on a widely used model architecture (the transformer) and an extremely well-understood class of learning problems (linear regression).
  
  icl how algorithm transformer stanford mit linear regression
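The question in the highlighted passage is which learning algorithm a transformer runs when it sees linear-regression examples in its prompt. One reference learner that such a model can be compared against is ordinary least squares fitted to the in-context pairs. The sketch below assumes noiseless data, y_i = w . x_i, and uses pure Python; the names `solve` and `least_squares_icl` are hypothetical and this is not the paper's code.

```python
import random

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares_icl(xs, ys, x_query):
    """Fit w_hat from the normal equations (X^T X) w = X^T y, then predict w_hat . x_query."""
    d = len(x_query)
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    w_hat = solve(xtx, xty)
    return sum(wi * qi for wi, qi in zip(w_hat, x_query))

# Hypothetical noiseless prompt: k in-context pairs with y_i = w . x_i
random.seed(0)
d, k = 3, 8
w = [random.gauss(0, 1) for _ in range(d)]
xs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(k)]
ys = [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]
x_query = [random.gauss(0, 1) for _ in range(d)]

prediction = least_squares_icl(xs, ys, x_query)
target = sum(wi * qi for wi, qi in zip(w, x_query))
print(prediction, target)
```

With more examples than dimensions (k > d) and no noise, the least-squares prediction recovers the true target up to floating-point error; a transformer that "implements" this learner would be expected to behave the same way on such prompts.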

Tags

icl

how

regression

mit

algorithm

linear

transformer

stanford

Annotators

mshook

URL

ar5iv.labs.arxiv.org/html/2211.15661
www.semanticscholar.org

[PDF] What Can Transformers Learn In-Context? A Case Study of Simple Function Classes | Semantic Scholar

1. mshook 01 Apr 2023
  
  in Public
  
  a random function f
  
  a single random function, not many or several
  
  icl transformer gpt2 gpt function
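The note stresses that each prompt is built from one randomly drawn function f, shared by every example in that prompt. A rough sketch of such a prompt, (x_1, f(x_1), ..., x_k, f(x_k), x_query), assuming for illustration that f is linear with Gaussian weights and Gaussian inputs; the helper names are hypothetical.

```python
import random

def sample_linear_function(d, rng):
    """Draw ONE random linear function f(x) = w . x with w ~ N(0, I_d)."""
    w = [rng.gauss(0, 1) for _ in range(d)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))

def build_prompt(d, k, rng):
    """Interleave (x_1, f(x_1), ..., x_k, f(x_k), x_query) using a single shared f."""
    f = sample_linear_function(d, rng)  # one function per prompt, not one per example
    xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k + 1)]
    prompt = []
    for x in xs[:k]:
        prompt.extend([x, f(x)])
    prompt.append(xs[k])  # query input; the model should output f(xs[k])
    return prompt, f(xs[k])

rng = random.Random(0)
prompt, answer = build_prompt(d=2, k=4, rng=rng)
print(len(prompt))  # 2 * 4 + 1 = 9 entries
```

A fresh f is drawn for every new prompt, so the model cannot memorize one function and must infer it from the examples in context.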

Tags

gpt

icl

gpt2

function

transformer

Annotators

mshook

URL

semanticscholar.org/reader/de32da8f5c6a50a6c311e9357ba16aa7d05a1bc9
Mar 2023
www.lesswrong.com

interpreting GPT: the logit lens - LessWrong

1. mshook 14 Mar 2023
  
  in Public
  
  copying a rare token
  
  This is an induction head at work, yes?
  
  induction head transformer icl
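An induction head is usually described as completing the pattern [A][B] ... [A] → [B]: it looks back for an earlier occurrence of the current token and copies whatever followed it. That is why it can reproduce a rare token seen only once earlier in the context. The sketch below captures only this behavioral rule, not the attention mechanics that implement it; `induction_prediction` is a hypothetical name.

```python
def induction_prediction(tokens):
    """Return the token that followed the most recent earlier occurrence
    of the last token, or None if the last token has not appeared before."""
    if not tokens:
        return None
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):  # scan earlier positions, newest first
        if tokens[i] == current:
            return tokens[i + 1]  # copy the successor of the earlier match
    return None

tokens = ["Mr", "Dursley", "said", "to", "Mr"]
print(induction_prediction(tokens))  # Dursley
```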

Tags

induction

icl

head

transformer

Annotators

mshook

URL

lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens
Feb 2023
www.lesswrong.com

Induction heads - illustrated

1. mshook 11 Feb 2023
  
  in Public
  
  The central object in the transformer is the residual stream.
  
  transformer architecture nlp residual icl induction
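Calling the residual stream the central object means that each attention and MLP sublayer reads the current stream and adds its output back into it, so the stream is a running sum of every layer's writes. A minimal sketch of that additive structure, assuming toy sublayers (hypothetical `toy_attn` and `toy_mlp`, not real attention or MLP blocks) and omitting layer normalization:

```python
def residual_forward(x, layers):
    """Run (attn, mlp) pairs; each sublayer reads the stream and ADDS its output back."""
    stream = list(x)
    for attn, mlp in layers:
        stream = [s + a for s, a in zip(stream, attn(stream))]
        stream = [s + m for s, m in zip(stream, mlp(stream))]
    return stream

def toy_attn(v):
    """Hypothetical stand-in: writes half of dimension 1 into dimension 0."""
    return [0.5 * v[1], 0.0]

def toy_mlp(v):
    """Hypothetical stand-in: writes ReLU(dimension 0) into dimension 1."""
    return [0.0, max(0.0, v[0])]

print(residual_forward([1.0, 2.0], [(toy_attn, toy_mlp)]))  # [2.0, 4.0]
```

Because every write is additive, a later component (such as the second head of an induction circuit) can read information that any earlier component placed in the stream.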

Tags

nlp

icl

induction

architecture

transformer

residual

Annotators

mshook

URL

lesswrong.com/posts/TvrfY4c9eaGLeyDkE/induction-heads-illustrated
