4 Matching Annotations
- Apr 2023
-
ar5iv.labs.arxiv.org
-
While past work has characterized what kinds of functions ICL can learn (Garg et al., 2022; Laskin et al., 2022) and the distributional properties of pretraining that can elicit in-context learning (Xie et al., 2021; Chan et al., 2022), how ICL learns these functions has remained unclear. What learning algorithms (if any) are implementable by deep network models? Which algorithms are actually discovered in the course of training? This paper takes first steps toward answering these questions, focusing on a widely used model architecture (the transformer) and an extremely well-understood class of learning problems (linear regression).
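A minimal sketch of the linear-regression ICL setup the quote refers to, assuming the standard construction from Garg et al. (2022): Gaussian inputs, noiseless targets y = w·x, and a prompt of in-context examples from which the model must infer w. Variable names are illustrative, not from the paper.

```python
import numpy as np

# Sample one in-context linear-regression prompt.
rng = np.random.default_rng(0)
d, n_examples = 8, 16

w = rng.normal(size=d)                 # the random function is f(x) = w.x
xs = rng.normal(size=(n_examples, d))  # in-context inputs
ys = xs @ w                            # noiseless targets

# The prompt interleaves (x1, y1, ..., xn, yn) and then asks for y at a
# query point; the transformer must infer w from the context alone.
x_query = rng.normal(size=d)
y_query = x_query @ w

# A least-squares solution an in-context learner could in principle implement:
w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
print(abs(x_query @ w_hat - y_query))  # ~0 when n_examples >= d
```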
-
www.semanticscholar.org
-
a random function f
A single random function, not many or several.
-
- Mar 2023
-
www.lesswrong.com
-
copying a rare token
This is an induction head at work, yes?
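A toy sketch of the induction-head pattern the annotation alludes to ([A][B] … [A] → [B]): find the previous occurrence of the current token and copy the token that followed it. Hypothetical illustration code, not from the linked post.

```python
def induction_predict(tokens):
    # Predict the next token by prefix matching: scan backwards for an
    # earlier occurrence of the current token, then copy its successor.
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None  # no earlier occurrence: nothing to copy

# Rare-token example: 'X' appeared once before, followed by 'Y'.
print(induction_predict(list("abXYcdX")))  # -> 'Y'
```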
-
- Feb 2023
-
www.lesswrong.com
-
The central object in the transformer is the residual stream.
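A schematic sketch of what "residual stream" means: each attention and MLP block reads the per-position vectors, and writes its output back into them additively. The components here (uniform attention, random-weight MLP) are toy stand-ins, not trained transformer parts; all names are illustrative.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's vector to zero mean, unit variance.
    mu = x.mean(-1, keepdims=True)
    sigma = x.std(-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def toy_attention(x):
    # Stand-in for self-attention: uniform averaging over positions.
    return x.mean(0, keepdims=True).repeat(len(x), axis=0)

def toy_mlp(x, W1, W2):
    # Stand-in for the feed-forward block.
    return np.maximum(x @ W1, 0) @ W2

rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 4, 8, 16
W1 = rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model)
W2 = rng.normal(size=(d_ff, d_model)) / np.sqrt(d_ff)

# The residual stream: one vector per position, carried through all blocks.
x = rng.normal(size=(seq_len, d_model))
for _ in range(2):  # two transformer blocks
    x = x + toy_attention(layer_norm(x))   # attention reads, then writes back additively
    x = x + toy_mlp(layer_norm(x), W1, W2)  # the MLP does the same
```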
-