Hypothesis

4 Matching Annotations

May 2025
www.niemanlab.org

Anthropic’s new AI model didn’t just “blackmail” researchers in tests — it tried to leak information to news outlets

3
1. stopresetgo 30 May 2025
  
  in Public
  
  Anthropic researchers said this was not an isolated incident, and that Claude had a tendency to “bulk-email media and law-enforcement figures to surface evidence of wrongdoing.”
  
  for - question - progress trap - open source AI models - for blackmail and ransom - Could a bad actor take an open source codebase and twist it to do harm, for example by finding a rogue AI creator's adversary, enemy or victim and blackmailing them? - progress trap - open source AI - criminals - exploit to identify and blackmail victims
  
  question - progress trap - open source AI models - for blackmail and ransom
  progress trap - open source AI - criminals - exploit to identify and blackmail victims
2. stopresetgo 30 May 2025
  
  in Public
  
  for - progress trap - AI - Anthropic Claude 4 - blackmail - from - youtube - Kyle Kulinski Show - AI is completely out of control - https://hyp.is/GhDOzj0nEfCvHZdiUaw4gQ/www.youtube.com/watch?v=4j1gjSoRt8Q
  
  progress trap - AI - Anthropic Claude 4 - blackmail
  from - youtube - Kyle Kulinski Show - AI is completely out of control
3. stopresetgo 30 May 2025
  
  in Public
  
  The researchers called the behavior “rare” and “difficult to elicit.”
  
  for - progress trap - AI - Anthropic Claude 4 - blackmail - rare behavior - but still possible! It only has to happen once!
  
  progress trap - AI - Anthropic Claude 4 - blackmail - rare behavior

Tags

progress trap - AI - Anthropic Claude 4 - blackmail

question - progress trap - open source AI models - for blackmail and ransom

from - youtube - Kyle Kulinski Show - AI is completely out of control

progress trap - AI - Anthropic Claude 4 - blackmail - rare behavior

progress trap - open source AI - criminals - exploit to identify and blackmail victims

Annotators

stopresetgo

URL

niemanlab.org/2025/05/anthropics-new-ai-model-didnt-just-blackmail-researchers-in-tests-it-tried-to-leak-information-to-news-outlets/
www.youtube.com

Artificial Intelligence Is Completely Out Of Control | The Kyle Kulinski Show

1
1. stopresetgo 30 May 2025
  
  in Public
  
  Anthropic's new AI model shows ability to deceive and blackmail
  
  for - progress trap - AI - blackmail - AI - autonomy - progress trap - AI - Anthropic - Claude Opus 4 - to - article - Anthropic Claude 4 blackmail and news leak - progress trap - AI - article - Anthropic Claude 4 - blackmail - rare behavior - Anthropic’s new AI model didn’t just “blackmail” researchers in tests — it tried to leak information to news outlets
  
  progress trap - AI - blackmail
  AI - autonomy
  progress trap - AI - Anthropic - Claude Opus 4
  to - article - Anthropic Claude 4 blackmail and news leak

Tags

progress trap - AI - Anthropic - Claude Opus 4

AI - autonomy

progress trap - AI - blackmail

to - article - Anthropic Claude 4 blackmail and news leak

Annotators

stopresetgo

URL

youtube.com/watch