Hypothesis

2 Matching Annotations

Apr 2026
transformer-circuits.pub

Emotion Concepts and their Function in a Large Language Model

1. fxp007 09 Apr 2026
  
  in Public
  
  We find internal representations of emotion concepts, which encode the broad concept of a particular emotion and generalize across contexts and behaviors it might be linked to.
  
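  Surprisingly, the study finds that genuine "emotion concept vectors" exist inside Claude. This is not a metaphor: they are linear representations that can be extracted, measured, and manipulated. Stranger still, these vectors generalize across contexts, as abstract and general-purpose as human emotion concepts, rather than activating only near specific trigger words.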
  
  emotion-vectors internal-representations interpretability surprising
2. fxp007 09 Apr 2026
  
  in Public
  
  We find internal representations of emotion concepts, which encode the broad concept of a particular emotion and generalize across contexts and behaviors it might be linked to.
  
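  The study finds "emotion concept vectors" inside Claude that generalize across contexts: the same "fear" vector activates both when fear is expressed directly and when a dangerous situation is merely implied. This suggests the model has learned an abstract concept of the emotion rather than surface patterns, a structure highly isomorphic to how human neuroscience understands emotion. It is surprising that this structure emerged spontaneously.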
  
  emotion-vectors generalization interpretability representation

Tags

generalization

emotion-vectors

internal-representations

surprising

representation

interpretability

Annotators

fxp007

URL

transformer-circuits.pub/2026/emotions/index.html
