Hypothesis

1 Matching Annotations

May 2025
www.niemanlab.org www.niemanlab.org

Anthropic’s new AI model didn’t just “blackmail” researchers in tests — it tried to leak information to news outlets

1
1. stopresetgo 30 May 2025
  
  in Public
  
  The researchers called the behavior “rare” and “difficult to elicit.
  
  for - progress trap - AI - Anthropic Claude 4 - blackmail - rare behavior - but still possible! It only has to happen once!
  
  progress trap - AI - Anthropic Claude 4 - blackmail - rare behavior
Visit annotations in context

Tags

progress trap - AI - Anthropic Claude 4 - blackmail - rare behavior

Annotators

stopresetgo

URL

niemanlab.org/2025/05/anthropics-new-ai-model-didnt-just-blackmail-researchers-in-tests-it-tried-to-leak-information-to-news-outlets/