, forcing explicit step-by-step chains to make prediction more reliable?
Is it preferred to always prompt the LLM to use a specific structure to be more accurate?
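One way to see what "a specific structure" buys you is to compare prompts directly. Below is a minimal sketch: the `call_llm` helper is a hypothetical placeholder for whatever model API you actually use, and the requested format is just one possible structure that forces explicit, checkable steps.

```python
# Sketch: the same math question asked two ways. `call_llm` is a hypothetical
# stand-in for a real model call (OpenAI, Gemini, a local model, ...).

def call_llm(prompt: str) -> str:
    """Placeholder; wire this up to your LLM provider of choice."""
    raise NotImplementedError

question = "A train travels 120 km in 1.5 hours. What is its average speed?"

# Free-form prompt: the model may jump straight to an answer.
freeform_prompt = question

# Structured prompt: forces explicit intermediate steps you can inspect.
structured_prompt = (
    "Solve the problem below. Respond in exactly this format:\n"
    "Step 1: <restate the given quantities>\n"
    "Step 2: <show the calculation>\n"
    "Final answer: <number with units>\n\n"
    f"Problem: {question}"
)

# answer_a = call_llm(freeform_prompt)
# answer_b = call_llm(structured_prompt)
```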
confidently wrong in a way
Yes. Also, when it generates inefficient answers to math problems, it's still confident, and that often leads me to doubt my own intuition.
The LLM predicts continuations that match those high-quality human patterns.
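To make "predicts continuations" concrete, here is a toy sketch of the autoregressive loop. The hard-coded probability table is an illustrative stand-in for a trained model; real LLMs learn such distributions over an enormous vocabulary, but the mechanism of repeatedly picking a likely next token is the same.

```python
# Toy next-token table standing in for a trained model.
next_token_probs = {
    "the": {"proof": 0.6, "answer": 0.4},
    "proof": {"is": 0.9, "fails": 0.1},
    "is": {"correct": 0.7, "wrong": 0.3},
}

def continue_greedily(start: str, steps: int = 3) -> list[str]:
    tokens = [start]
    for _ in range(steps):
        options = next_token_probs.get(tokens[-1])
        if not options:
            break
        # Pick the most probable next token, i.e. the continuation most like
        # the patterns "seen" in training.
        tokens.append(max(options, key=options.get))
    return tokens

print(continue_greedily("the"))  # ['the', 'proof', 'is', 'correct']
```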
But spotting errors in proofs? How can just predicting patterns do that?
predicting sequences that humans label as correct solutions.
Does that mean that if there's a very new question, the model would fail to solve it? But the truth is, LLMs like Gemini Deep Think are still surpassing PhDs in solving them.
advanced models produce remarkably sophisticated outputs. How might that emerge purely from prediction?
The language seems sophisticated to humans because the models have been trained, post-trained, and tuned toward outputs that read well and are pleasing to humans. But I'm still unsure how, in math or debugging, they generate correct, useful outputs.
—how might that guide your own use?
But I am confused. I have used LLMs to extract insights and spot inaccuracies (in math), generate novel brainstorming questions, write reviews, even generate prompts (prompt engineering). It doesn't seem like they were just predicting the next word.
general token predictors
How do they solve math then? Especially the very advanced models now.
Why couldn't we just build perfect detectors?
LLMs are inaccurate. Even if the individual parts of a response are mostly accurate, the response can still be wrong as a whole.
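A toy calculation shows why locally accurate pieces can still add up to a wrong whole: assuming, as a strong simplification, that each step is independently right with some fixed probability, the chance that an entire chain is right drops quickly with its length.

```python
# Illustrative only: assumes independent per-step errors at an assumed rate.
p_step = 0.98   # assumed per-step accuracy
for n_steps in (10, 25, 50):
    p_whole = p_step ** n_steps
    print(f"{n_steps} steps at {p_step:.0%} each -> whole chain right {p_whole:.0%} of the time")
```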
What happens if you prompt an LLM with "What do you think about [topic]?" versus simulating a discussion among diverse experts?
When we assign a generic persona, "you", to an LLM, it just picks one of the thousands of personas it can simulate, more or less at random, and answers from there. We never know how relevant or accurate that persona will be for advising on our question.
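The contrast is easiest to see as two prompt templates for the same topic. The sketch below is hedged the same way as before: `call_llm` is a hypothetical placeholder, and the topic and expert roles are invented for illustration. Only the prompts differ; the panel version names the perspectives the model has to argue from instead of leaving the persona to chance.

```python
# Two prompt templates for the same topic. `call_llm` is a hypothetical
# placeholder for whatever model API you actually use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

topic = "using LLMs to check mathematical proofs"

# Generic persona: whatever default "you" the model happens to adopt.
generic_prompt = f"What do you think about {topic}?"

# Simulated panel: named perspectives the model must argue from explicitly.
panel_prompt = (
    f"Simulate a short discussion about {topic} between three experts:\n"
    "1. A research mathematician focused on rigor.\n"
    "2. A software engineer focused on tooling and failure modes.\n"
    "3. A skeptical reviewer looking for overclaims.\n"
    "Each expert speaks twice; then summarize the points of agreement."
)

# opinion = call_llm(generic_prompt)
# panel = call_llm(panel_prompt)
```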