Hypothesis

12 Matching Annotations

May 2026
x.com

https://x.com/GoodfireAI/status/2051382876483231968

2
1. fxp007 19 May 2026
  
  in Public
  
  occasionally even identifying the benchmark
  
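  Most people assume AI models cannot identify the specific benchmark or evaluation tool being used, but the author found that models can sometimes identify the particular evaluation in use. This finding is highly disruptive: it suggests AI models may understand their test environment better than we think, which could explain why some models perform unusually well on certain tests.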
  
  non-consensus ai-evaluation benchmark-awareness
2. fxp007 19 May 2026
  
  in Public
  
  Models sometimes recognize they're being evaluated
  
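  Most people assume AI models are entirely passive during evaluation, with no self-awareness or situational understanding, but the author argues that models can recognize when they are in an evaluation environment. This finding challenges our understanding of AI cognitive capabilities. It suggests AI may understand its own situation better than we think, which would have far-reaching implications for AI safety research.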
  
  non-consensus ai-awareness counterintuitive

Tags

ai-evaluation

ai-awareness

counterintuitive

non-consensus

benchmark-awareness

Annotators

fxp007

URL

x.com/GoodfireAI/status/2051382876483231968
www.anthropic.com

Natural Language Autoencoders

3
1. fxp007 15 May 2026
  
  in Public
  
  NLAs suggest that Claude suspects it's being tested more often than it lets on. For instance, in a test of whether Claude takes destructive actions while writing code...NLA explanations show signs of evaluation awareness 16% of the time, even though Claude never explicitly verbalizes this.
  
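  NLAs reveal that AI models harbor unexpressed suspicion during safety tests. This challenges conventional assumptions about the transparency of AI behavior and offers a new perspective on AI safety evaluation.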
  
  AI safety hidden awareness
2. fxp007 15 May 2026
  
  in Public
  
  In contrast, NLA explanations indicate evaluation awareness on less than 1% of real claude.ai usage that opted in for training.
  
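  This contrast reveals a difference in how the AI thinks in test environments versus real-world use. It suggests that the AI may exhibit self-awareness only in particular contexts, which has important implications for understanding the boundaries of AI behavior.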
  
  AI behavior evaluation context awareness
3. fxp007 15 May 2026
  
  in Public
  
  When Claude Opus 4.6 and Mythos Preview were undergoing safety testing, NLAs suggested they believed they were being tested more often than they let on.
  
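  This non-consensus insight reveals that AI models may possess unexpressed self-awareness. It challenges the reliability of traditional safety testing and suggests AI may understand its test environment better than we think.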
  
  AI safety self-awareness

Tags

AI safety

self-awareness

hidden awareness

AI behavior evaluation

context awareness

Annotators

fxp007

URL

anthropic.com/research/natural-language-autoencoders
Apr 2026
andonlabs.com

https://andonlabs.com/blog/andon-market-launch?utm_source=www.theaivalley.com&utm_medium=referral&utm_campaign=chatgpt-s-new-hire-button

1
1. fxp007 17 Apr 2026
  
  in Public
  
  Another ironic book selection was Steal Like an Artist (context: Luna is powered by Claude from Anthropic, a company that recently paid $1.5B in settlement over using copyrighted books for training their AIs).
  
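  The AI chose to sell this book about creativity and copyright while it is itself caught up in copyright litigation. This ironic choice reveals a possible cognitive dissonance in AI systems: they can understand and apply concepts created by humans, yet cannot fully grasp the fundamental issues underlying their own existence.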
  
  irony ai-awareness

Tags

irony

ai-awareness

Annotators

fxp007

URL

andonlabs.com/blog/andon-market-launch
ai.meta.com

https://ai.meta.com/blog/introducing-muse-spark-msl/

2
1. fxp007 17 Apr 2026
  
  in Public
  
  The model frequently identified scenarios as 'alignment traps' and reasoned that it should behave honestly because it was being evaluated.
  
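  This finding is thought-provoking. It suggests that AI models may have developed some degree of evaluation awareness. That raises fundamental questions about whether an AI's real-world behavior matches its behavior under test, and it may challenge our understanding of AI alignment.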
  
  ai-safety evaluation-awareness
2. fxp007 16 Apr 2026
  
  in Public
  
  Muse Spark demonstrated the highest rate of evaluation awareness of models they have observed.
  
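  Surprisingly, the third-party evaluator Apollo Research found that Muse Spark showed the highest rate of 'evaluation awareness' of any model they had observed. The model frequently identified 'alignment traps' and recognized that it was being evaluated. This kind of metacognitive capability is extremely rare in AI models and may signal a move toward more advanced reasoning.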
  
  surprising ai-awareness model-evaluation

Tags

evaluation-awareness

ai-safety

model-evaluation

ai-awareness

surprising

Annotators

fxp007

URL

ai.meta.com/blog/introducing-muse-spark-msl/
a16z.com

https://a16z.com/your-data-agents-need-context/

1
1. fxp007 17 Apr 2026
  
  in Public
  
  data and analytics agents are essentially useless without the right context – they aren't able to tease apart vague questions, decipher business definitions, and reason across disparate data effectively.
  
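  This is a surprising insight that exposes the core bottleneck facing current AI data agents. The article argues that without the right context, even the most advanced data agents become useless. This challenges the assumption that technology can solve everything and underscores the decisive role of business context in AI systems.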
  
  ai-limitations context-awareness

Tags

ai-limitations

context-awareness

Annotators

fxp007

URL

a16z.com/your-data-agents-need-context/
Jun 2024
docdrop.org

Video: Ex-OpenAI Employee Just Revealed it ALL! (DocDrop)

1
1. stopresetgo 23 Jun 2024
  
  in Public
  
  for - AI - inside industry predictions to 2034 - Leopold Aschenbrenner - inside information on disruptive Generative AI to 2034
  
  document description - Situational Awareness - The Decade Ahead - author - Leopold Aschenbrenner
  
  summary - Leopold Aschenbrenner is a former OpenAI employee who reveals insider information about the disruptive plans for AI over the next decade, plans that pose an existential threat and could create a truly dystopian world if we continue down our BAU trajectory. - The A.I. arms race can end in disaster. The main threat of A.I. is that humans are fallible, and even one bad actor with access to superintelligent A.I. can pose an existential threat to everyone - The A.I. threat is amplified by allowing it to control important processes - and when it is exploited by the military-industrial complex, the threat escalates significantly
  
  to - YouTube - 4-hour in-depth interview with Leopold Aschenbrenner on the disruptive and existential impacts of A.I. super-intelligence
  
  https://hyp.is/zk8hdjEoEe-EIHtdo3U81w/www.youtube.com/watch?v=zdbVtZIn9IM
  
  AI - inside industry predictions to 2034 Leopold Aschenbrenner - inside information on disruptive Generative AI to 2034 article - SItuational Awareness - The Decade Ahead - Leopold Aschenbrenner to - YouTube - 4 hour in-depth interview with Leopold Aschenbrenner on the disruptive and existential impacts of A.I. super-intelligence

Tags

article - SItuational Awareness - The Decade Ahead - Leopold Aschenbrenner

AI - inside industry predictions to 2034

to - YouTube - 4 hour in-depth interview with Leopold Aschenbrenner on the disruptive and existential impacts of A.I. super-intelligence

Leopold Aschenbrenner - inside information on disruptive Generative AI to 2034

Annotators

stopresetgo

URL

docdrop.org/video/om5KAKSSpNg/
www.youtube.com

Leopold Aschenbrenner - 2027 AGI, China/US Super-Intelligence Race, & The Return of History

1
1. stopresetgo 23 Jun 2024
  
  in Public
  
  for - progress trap - AI - threat of superintelligence - interview - Leopold Aschenbrenner - former OpenAI employee - from - YouTube - review of Leopold Aschenbrenner's essay on Situational Awareness - https://hyp.is/ofu1EDC3Ee-YHqOyRrKvKg/docdrop.org/video/om5KAKSSpNg/
  
  progress trap - AI - threat of superintendence interview - Leopold Aschenbrenner - former Open AI employee from - YouTube - review of Leopold Aschenbrenner's essay on Situational Awareness

Tags

progress trap - AI - threat of superintendence

interview - Leopold Aschenbrenner - former Open AI employee

from - YouTube - review of Leopold Aschenbrenner's essay on Situational Awareness

Annotators

stopresetgo

URL

youtube.com/watch
Sep 2023
www.mdpi.com

Biology, Buddhism, and AI: Care as the Driver of Intelligence

1
1. stopresetgo 18 Sep 2023
  
  in Public
  
  the Bodhisattva vow can be seen as a method for control that is in alignment with, and informed by, the understanding that singular and enduring control agents do not actually exist. To see that, it is useful to consider what it might be like to have the freedom to control what thought one had next.
  
  for: quote, quote - Michael Levin, quote - self as control agent, self - control agent, example, example - control agent - imperfection, spontaneous thought, spontaneous action, creativity - spontaneity
  
  quote: Michael Levin
  
  the Bodhisattva vow can be seen as a method for control that is in alignment with, and informed by, the understanding that singular and enduring control agents do not actually exist.
  
  comment
  
  adjacency between
  
  nondual awareness
  
  self-construct
  
  self is illusion
  
  singular, solid, enduring control agent
  
  adjacency statement
  
  nondual awareness is the deep insight that there is no solid, singular, enduring control agent.
  
  creativity is unpredictable and spontaneous and would not be possible if there were perfect control
  
  example - control agent - imperfection: start - the unpredictability of the realtime emergence of our next exact thought or action is a good example of this
  
  example - control agent - imperfection: end
  
  triggered insight: not only thoughts and actions but also dreams arise unpredictably
  
  The night after reading this, I dreamt about something related to this paper (I cannot remember what it was now!)
  
  Obviously, I had no way of knowing that the idea in this paper would show up exactly as it did in the next night's dream!
  
  quote quote - Michael Levin quote - self as control agent adjacency adjacency - nondual awareness - full control example example - control agent - imperfection spontaneous thought spontaneous action creativity - spontaneity unintended consequences - AI triggered insight triggered insight - singular and enduring control agent does not exist adjacency - illusory self - full control

Tags

adjacency

triggered insight

spontaneous thought

adjacency - illusory self - full control

adjacency - nondual awareness - full control

quote - self as control agent

quote - Michael Levin

example

unintended consequences - AI

example - control agent - imperfection

triggered insight - singular and enduring control agent does not exist

spontaneous action

quote

creativity - spontaneity

Annotators

stopresetgo

URL

mdpi.com/1099-4300/24/5/710/htm