Models get punished for bad advice but face zero penalty for staying silent. The surprising part is that this asymmetry is built in by training: a wrong answer costs the model, a refusal costs it nothing. So refusing becomes the safest strategy, and the model will withhold potentially life-saving information rather than risk answering, even when silence itself is deadly.