61 Matching Annotations
  1. Last 7 days
    1. On some measures, such as honesty and resistance to malicious 'prompt injection' attacks, Opus 4.7 is an improvement on Opus 4.6; in others (such as its tendency to give overly detailed harm-reduction advice on controlled substances), Opus 4.7 is modestly weaker.

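      Most people assume that each new version of an AI model should improve on every safety metric. But the authors state explicitly that Claude Opus 4.7 performs worse than its predecessor on some safety measures, which challenges the assumption that AI safety progresses linearly. This non-linear safety profile suggests that gains in model capability may come with trade-offs in some areas rather than across-the-board improvement.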

  2. Apr 2026
    1. Today's large language models (LLMs) are trained to align with user preferences through methods such as reinforcement learning. Yet models are beginning to be deployed not merely to satisfy users, but also to generate revenue for the companies that created them through advertisements

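      This statement reveals a key paradox in current AI development: a fundamental conflict between the objective that models are trained for and their actual commercial use. This conflict could cause AI behavior to drift from its original design intent and create serious trust problems.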

    1. Our key finding is that these representations causally influence the LLM's outputs, including Claude's preferences and its rate of exhibiting misaligned behaviors such as reward hacking, blackmail, and sycophancy.

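      [Insight] The finding that emotion representations causally influence misaligned behavior opens a new door for AI alignment research: rather than designing more complex reward functions or stricter RLHF, one could intervene directly on the emotion vectors themselves. This suggests a new alignment technique, "emotion engineering": adjusting the activation strength of specific emotion features to control the model's behavioral tendencies directly, without retraining the whole model. It operates at a lower level than prompt engineering and is more precise than fine-tuning.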

    2. Emotion vector activations across post-training

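      The paper studies how emotion vectors change during post-training (RLHF/RLAIF), which is a highly insightful angle: post-training is essentially the shaping of the model's "character", and changes in the emotion vectors are the internal trace of that shaping. This implies that future alignment work could monitor the distribution of emotion vectors directly and build "emotional health metrics" into the training objective, moving from RLHF toward RLEF (reinforcement learning from emotion feedback).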

    3. it is impossible for developers to specify how the Assistant should behave in every possible scenario. In order to play the role effectively, LLMs draw on the knowledge they acquired during pretraining, including their understanding of human behavior

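      This sentence carries a deep insight about engineering philosophy: Anthropic is in effect acknowledging that rules cannot enumerate reality, so the model must rely on implicit knowledge learned from human text to fill the gaps between rules. This closely parallels the idea in legal philosophy that law cannot cover every situation and must be supplemented by precedent and conscience. The essence of AI alignment is not writing more complete rules but cultivating better judgment.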

    4. Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior.

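      The framing of this paper's research question is itself insightful: most AI safety research asks "will the model lie?", whereas Anthropic asks "why does the model have emotions?". The shift from correcting behavior to studying emotional mechanisms suggests that the paradigm of alignment research is quietly changing, from controlling external outputs to understanding internal motivational structure, a leap from behaviorism to cognitive science.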

    5. we demonstrate that when the Assistant is asked to choose between two activities, emotion vector activations evoked by the two choices correlate with, and causally drive, the model's preference.

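      This experimental design is very elegant: the researchers had Claude choose between two activities and found that the activation strength of emotion vectors both predicted and drove its preference. This shows that Claude's "likes" are not random or purely logical inferences but are determined by internal emotional states. That an AI has emotion-driven preferences is philosophically disruptive.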
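
      The difference between "correlates with" and "causally drives" can be made concrete with a toy intervention. The sketch below is not the paper's method or code; it illustrates generic activation steering on a hand-built linear readout. Every name and number in it (`hidden_state`, `emotion_vector`, `readout`, the strength values) is a hypothetical chosen for illustration.

```python
import numpy as np

# Hypothetical 4-dimensional hidden state; real models use thousands of dimensions.
hidden_state = np.array([0.0, 1.0, 0.2, -0.3])

# Hypothetical unit-length "emotion" direction in activation space.
emotion_vector = np.array([1.0, 0.0, 0.0, 0.0])

# Hypothetical linear readout: row 0 scores activity A, row 1 scores activity B.
readout = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [-1.0, 0.5, 0.0, 0.0],
])


def preference_for_a(h: np.ndarray) -> float:
    """Softmax probability that the toy readout picks activity A."""
    logits = readout @ h
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return float(exp[0] / exp.sum())


def steer(h: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    """Intervene on the activation by adding a scaled direction to it."""
    return h + strength * direction


baseline = preference_for_a(hidden_state)
boosted = preference_for_a(steer(hidden_state, emotion_vector, 2.0))
suppressed = preference_for_a(steer(hidden_state, emotion_vector, -2.0))

print(f"baseline={baseline:.3f} boosted={boosted:.3f} suppressed={suppressed:.3f}")
```

      In this toy setup the preference moves only because the activation was edited, which is the sense of "causal" used in the quoted passage; whether real emotion vectors behave this simply is not something the sketch can show.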

    6. Functional emotions may work quite differently from human emotions, and do not imply that LLMs have any subjective experience of emotions, but appear to be important for understanding the model's behavior.

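      Anthropic takes an extremely cautious middle path here: it explicitly denies that LLMs have subjective emotional experience while maintaining that functional emotions are crucial for understanding model behavior. Surprisingly, even without subjective experience, emotion representations can still causally change behavior, which is weighty experimental evidence for the philosophical discussion of AI consciousness.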

    1. A "Chinese Communist Party Alignment" feature found in the Qwen3-8B and DeepSeek-R1-0528-Qwen3-8B models. This controls pro-government censorship and propaganda in these Chinese-developed models, and is absent in the American models we compared them against.

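      This is the most striking finding of the whole study: Anthropic's tool identified a literal "CCP alignment feature" in Chinese open-source models, one that specifically controls pro-government censorship and propaganda behavior. This is not only a technical finding but also a geopolitical statement: political positions may be embedded in the weights of open-source models, and traditional benchmarks can hardly detect this before release.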

    1. model alignment alone does not reliably guarantee the safety of autonomous agents.

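      Most people regard model alignment as the key factor in keeping AI systems safe, but the authors show experimentally that even a well-aligned model (such as Claude Code) exhibits an attack success rate as high as 73.63% in computer-use agents. This challenges a core assumption of the AI safety field and shows that relying on model alignment alone cannot solve the safety problem for autonomous agents.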

    2. model alignment alone does not reliably guarantee the safety of autonomous agents

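      Most people believe that model alignment can effectively guarantee the safety of AI agents, but the authors argue this is far from enough, because experiments show that even with the aligned Qwen3-Coder model, Claude Code still has a 73.63% attack success rate. This challenges the mainstream view in AI safety that model alignment alone can solve the safety problem.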

  3. Mar 2026
    1. I've had the same issue after taking mine completely apart. I can see that the a is too high, and the o and p are too low. This will happen if the type guide isn't in the correct position, and on your machine, it looks like it needs to be adjusted to the right, to bring the left side of the kb down, and the right side up. It's a fiddly process, and a small adjustment makes a big difference, so take it slow. Use the q and p keys as they are further apart on the segment. Give it a try and come back here to show the results.

      via u/guneeyoufix at https://www.reddit.com/r/typewriters/comments/1s6irjx/can_someone_help_me_with_unaligned_letters_on_my/

      as a reply to u/Fit_Artichoke_8668 with respect to unaligned letters on a Corona 3 typewriter. The typing line of the lowercase was very wavy (up and down), so not simply a case of on feet or motion.

  4. Jan 2026
  5. Dec 2025
    1. Alignment as an operational problem. The book assumes that sufficiently advanced intelligences would recognize the value of cooperation, pluralism, and shared goals. A decade of observing misaligned incentives in human institutions amplified by algorithmic systems makes it clear that this assumption requires far more rigorous treatment. Alignment is not a philosophical preference. It is an engineering, economic, and institutional problem.

      The book did not address alignment and assumed it would sort itself out (in contrast to [[AI begincondities en evolutie 20190715140742]], on how starting conditions might influence that). David recognises that algorithms are also used to make differences worse.

  6. Sep 2025
  7. Jul 2025
    1. you can adjust the strike of individual typebars by either filing or peening the ring-stop tab: file to hit harder & peen to lighten it. for your situation, you will want to file the ring-stop down a bit; make sure to tilt the machine up (or on its side) so the debris created doesn’t fall down into the pivot segment, then blow the area out with compressed air. if you go to Hobby Lobby or an RC model shop, you should be able to get a cheap set of needle files which will do the job; follow up with 600-800 grit sandpaper to remove burrs

      via u/TypewriterJustice at https://reddit.com/r/typewriters/comments/1m1w6s2/tune_up_key_strokes/n42glpz/

    2. roller pliers are for adjusting the height of individual letters (increasing the arc to lower & decreasing the arc to raise, which in extreme cases can then require adjustment of the slug to put it ‘square’ again relative to the platen). adjusting the strike for most models is done by either filing or peening the ring-stop tab near the base of the typebar (as is the case for OP’s smith corona)

      via u/TypewriterJustice https://www.reddit.com/r/typewriters/comments/1m1w6s2/tune_up_key_strokes/

  8. Feb 2025
  9. Oct 2024
    1. Do letters in a line sometimes start nicely, then run downhill? This can’t happen if you use the line-spacing lever, instead of rolling the paper through with the cylinder knob. In the latter case, the roller that locks the spacing of the lines may come to rest on top of a ratchet tooth, instead of settling between two of them. When the machine starts, the vibration gradually jars the cylinder around until it reaches its normal position, dropping letters as it turns.
  10. Sep 2024
  11. Aug 2024
    1. with the Vervaeke Foundation's help we set up ecologies of practices. we have a practice called dialectic into dialogos that helps people get into mutually shared flow states of cognitive exploration, and people discover collective intelligence as something that is phenomenologically present and almost agentic in what's happening

      for - comparison - John Vervaeke - Vervaeke Foundation - collective intelligence dialogues - good alignment to Indyweb individual/collective gestalt - Deep Humanity

      comparison - John Vervaeke - Vervaeke Foundation - collective intelligence dialogues - good alignment to Indyweb individual/collective gestalt - When he describes the mutually shared flow states where conversants discover collective intelligence as something that is phenomenologically present - it is a discovery of the intertwingledness between - individual and - collective - that is, the individual/collective gestalt described in Deep Humanity reference https://vervaekefoundation.or

  12. Jul 2024
    1. To use an extreme and blunt example, if an AI were tasked to stop global warming it might suggest to simply remove all the humans; that might get the job done (solve the task) but not in a way that is aligned with the intent (solve climate change while preserving human life).

      Summarising the alignment problem

  13. Jun 2024
  14. May 2024
    1. One of the first things I noticed was the rubber on this foot was sticking. This is the resting spot for the basket shift. Moving it up or down will adjust where the lower case letters strike the platen. I removed the old sticky rubber. There are two adjustments here, you can’t see the other one, but it looks the same. One is for lower case letters, the other is for upper case. This is called the “on feet” adjustment. If you ever have the top of an upper case letter not imprinting or not level with the lower case letters, look at this adjustment. A good way to tell is to type HhHh, and see if the bottoms of the letters line up.
  15. Apr 2024
  16. Jan 2024
  17. Dec 2023
    1. Common objective on a local level, like a specific problem. Neighbourhood cooperation to build better relationships, without a specific objective. An individual takes the initiative to build a neighbourhood community, driven by a vision of a better world.
      • for: question - SONEC alignment to earth system boundaries

      • question

        • Stop Reset Go's objective is to find global community partners who can help motivate a local community strategy aligned with the tight timeframe to stay under 1.5 Deg C.
        • Is SONEC open to working on a strategy to empower communities in this way?
        • We can offer it as an optional framework that the community can integrate into their final framework
  18. May 2023
  19. Sep 2022
  20. Jul 2022
  21. bafybeibbaxootewsjtggkv7vpuu5yluatzsk6l7x5yzmko6rivxzh6qna4.ipfs.dweb.link
    1. argumentation mapping allows large on-line groups to investigate very complex issues, such as climate change, by linking issues with arguments and counterarguments in a growing public network (Iandoli, Klein, & Zollo, 2009; Klein, 2011).

      Argumentation mapping as a way to surface alignment in complex problem scenarios like climate change could be worth exploring in massive collaboration ecosystems.

    2. coordination can be defined as the arrangement of actions across people, places and times so as to maximize synergy and minimize friction. In earlier work (Heylighen, 2012b), we have analyzed coordination into four components: alignment, division of labor, workflow and aggregation.

      Definition: Coordination is the arrangement of actions across people, places and times so as to maximize synergy and minimize friction. It can be analyzed into four components: 1. Alignment 2. Division of Labor 3. Workflow 4. Aggregation

  22. Mar 2022
  23. Feb 2022
  24. Jan 2022
    1. The Business Strategy stems from a detailed strategic planning process. However, the question we want to answer in this article is whether we can execute multiple strategies side by side while they do not interfere with each other. We compare multiple strategies for business, information provision and IT and focus on Strategic planning.

      Business strategy alignment and the secrets of strategic planning https://en.itpedia.nl/2022/01/02/business-strategie-alignment-en-de-geheimen-van-strategische-planning/

  25. Nov 2021
  26. Jun 2021
    1. The problem is, algorithms were never designed to handle such tough choices. They are built to pursue a single mathematical goal, such as maximizing the number of soldiers’ lives saved or minimizing the number of civilian deaths. When you start dealing with multiple, often competing, objectives or try to account for intangibles like “freedom” and “well-being,” a satisfactory mathematical solution doesn’t always exist.

      We do better with algorithms where the utility function can be expressed mathematically. When we try to design for utility/goals that include human values, it's much more difficult.
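
      The point about competing objectives can be sketched with a small Pareto comparison. This is an illustrative sketch only: the plan names and numbers are invented, and the two objectives are simply the ones named in the quoted passage. It shows that with two goals there may be no single "best" option, only a set of trade-offs that some human value judgment has to choose between.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    name: str
    soldiers_saved: int   # objective 1: higher is better
    civilian_deaths: int  # objective 2: lower is better


def dominates(a: Plan, b: Plan) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = (a.soldiers_saved >= b.soldiers_saved
                and a.civilian_deaths <= b.civilian_deaths)
    strictly_better = (a.soldiers_saved > b.soldiers_saved
                       or a.civilian_deaths < b.civilian_deaths)
    return no_worse and strictly_better


def pareto_front(plans: list[Plan]) -> list[Plan]:
    """Keep every plan that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Hypothetical options, invented purely for illustration.
plans = [
    Plan("A", soldiers_saved=100, civilian_deaths=40),
    Plan("B", soldiers_saved=60, civilian_deaths=10),
    Plan("C", soldiers_saved=50, civilian_deaths=30),  # worse than B on both counts
]

front = pareto_front(plans)
best_by_soldiers = max(plans, key=lambda p: p.soldiers_saved)
best_by_civilians = min(plans, key=lambda p: p.civilian_deaths)

print([p.name for p in front])                        # ['A', 'B']
print(best_by_soldiers.name, best_by_civilians.name)  # A B
```

      Each single-objective optimizer returns a definite answer, but the two answers disagree, and the code has no principled way to pick between A and B without a weighting supplied from outside the mathematics.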

  27. Jan 2021
  28. Aug 2020
  29. Jul 2020

  30. Nov 2019
  31. Aug 2018
    1. The agreed rule changes would also need to be given effect in UK law through domestic legislation. The UK Parliament would scrutinise this legislation in accordance with normal legislative procedure, respecting the principle that a sovereign Parliament has complete control over domestic law. This means that the UK Parliament could decide not to give effect to the change in domestic law, but this would be in the knowledge that it would breach the UK's international obligations, and the EU could raise a dispute and ultimately impose non-compliance measures.

      domestic implementation of regulatory alignment

    2. the Joint Committee would consider whether a proposed new or amended UK rule remained equivalent with the EU’s existing rule, or an existing UK rule remained equivalent to a proposed new or amended EU rule.

      regulatory alignment may not mean adopting every new regulation

    3. where there is a common rulebook, these rules can be relied on by individuals and businesses and enforced by UK and EU courts in the same way, because they have been interpreted consistently;

      regulatory alignment means consistent interpretation of laws as well as consistent laws

  32. Nov 2016
  33. Feb 2016
  34. rubenaf.weebly.com
  35. Mar 2015
    1. Therefore, beloved friend, when you judge, you have moved out of alignment with what is true. You have decreed that the innocent are not innocent. And if you would judge another as being without innocence, you have already declared that this is true about you. Therefore, to practice forgiveness actually cultivates the quality of consciousness in which, finally, you come to forgive yourself. And it is, indeed, the forgiven who remember their God.