Hypothesis

8 Matching Annotations

Last 7 days
mikexcohen.substack.com

LLM breakdown 1/6: Tokenization (words to integers)

3
1. J3ss.M1c 20 Apr 2026
  
  in Public
  
  Through training, the models eventually learn that tokens 2339 and 588 can be conceptually identical, or they can have distinct meanings if “like” is a subword (e.g., unlike vs. alike vs. businesslike).
  
  That's cool how tokens can differ depending on the context the word is used in.
2. J3ss.M1c 20 Apr 2026
  
  in Public
  
  Here’s the thing: ChatGPT doesn’t know how many “r”s are in strawberry, because what we see as the word “ strawberry” GPT sees as token 41236. The letter “r” is token 81.
  
  That's interesting. I thought ChatGPT was actually reading and processing the words we type.
3. J3ss.M1c 20 Apr 2026
  
  in Public
  
  But on the other hand, English is by far the most well-represented language on the web, which is mostly what LLMs are trained on. There is simply more English in the training data than any other language. This means that tokenizers have a lot more data to find optimal chunking patterns in English compared to any other language.
  
  I feel like they should try to build LLMs in other languages so people around the world can use them.
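
A minimal sketch of the point in the second highlight, assuming a toy vocabulary and a greedy longest-match lookup (real GPT tokenizers use byte-pair encoding, which this does not reproduce). Only the IDs 41236 for " strawberry" and 81 for "r" come from the quoted article; `TOY_VOCAB`, `encode`, and every other ID are hypothetical and exist only for illustration.

```python
# Toy vocabulary. The IDs for " strawberry" and "r" are the ones quoted in the
# highlight above; all other IDs are made up for this illustration.
TOY_VOCAB = {
    " strawberry": 41236,
    "r": 81,
    "s": 9001, "t": 9002, "a": 9003, "w": 9004,
    "b": 9005, "e": 9006, "y": 9007,
}


def encode(text, vocab=TOY_VOCAB):
    """Greedy longest-match lookup: a stand-in for a real tokenizer, not BPE."""
    ids = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate piece first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return ids


word_ids = encode(" strawberry")   # the whole word becomes a single integer
letter_ids = encode("strawberry")  # no leading space, so it falls back to letters

print(word_ids, word_ids.count(81))
print(letter_ids, letter_ids.count(81))
```

In this toy setup the single-token encoding contains no copy of the "r" token at all, while the letter-by-letter encoding contains it three times. That is the sense in which the model "sees" an integer rather than the letters we type.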

Annotators

J3ss.M1c

URL

mikexcohen.substack.com/p/llm-breakdown-16-tokenization-words
Jan 2025
othone.org

The Digital Humanities Manifesto 2

5
1. J3ss.M1c 17 Jan 2025
  
  in Public
  
  The ant colony
  
  I saw this and thought it was an actual ant colony,
  
  but I think it's The Ant Colony by Jenny Valentine.
2. J3ss.M1c 17 Jan 2025
  
  in Public
  
  If a bit of fun is had along the way, so much the better. Time is short; this is a genre in a hurry.
  
  Media is becoming more advanced; it's like every day there is something new.
3. J3ss.M1c 16 Jan 2025
  
  in Public
  
  uratio
  
  I like the picture above; it's cool and creative.
4. J3ss.M1c 16 Jan 2025
  
  in Public
  
  the Stephen James Joyce’s
  
  Known for strictly controlling his grandfather's (James Joyce's) work.
5. J3ss.M1c 16 Jan 2025
  
  in Public
  
  The
  
  The newsletter says someone will be coming to speak on Nov 31st, but there are only 30 days in November.

Annotators

J3ss.M1c

URL

othone.org/Manifesto_V2.pdf
