Hubinger, et. al. "SLEEPER AGENTS: TRAINING DECEPTIVE LLMS THAT PERSIST THROUGH SAFETY TRAINING". Arxiv: 2401.05566v3. Jan 17, 2024.
Very disturbing and interesting results from team of researchers from Anthropic and elsewhere.
Hubinger, et. al. "SLEEPER AGENTS: TRAINING DECEPTIVE LLMS THAT PERSIST THROUGH SAFETY TRAINING". Arxiv: 2401.05566v3. Jan 17, 2024.
Very disturbing and interesting results from team of researchers from Anthropic and elsewhere.