Hypothesis

We demonstrate that when the Assistant is asked to choose between two activities, emotion vector activations evoked by the two choices correlate with, and causally drive, the model's preference.

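This experiment is exceptionally well designed: the researchers had Claude choose between two activities and found that the strength of the emotion vector activations both predicts and drives its preference. This suggests that Claude's "likes" are not random or the product of purely logical inference, but are driven by internal emotional states. The idea that an AI has "emotion-driven preferences" is deeply disruptive at the philosophical level.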

preferences causal-drive activity-choice alignment
