- Jan 2024
arxiv.org
Hubinger et al. "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training". arXiv: 2401.05566v3. Jan 17, 2024.
Very disturbing and interesting results from a team of researchers at Anthropic and elsewhere.