80 Matching Annotations
  1. Last 7 days
    1. Agents should work through the same patterns and actions that humans use.

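      Agents should not invent a separate interaction language; they should do as the locals do. Having agents use the same UI patterns and operation paths as humans greatly reduces cognitive load. This native design makes agent behavior "readable" to humans, who can follow the agent's action trail without learning a new mental model.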

    1. A fourth built the presentation using a JavaScript library. A fifth critiqued the overall flow & content.

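      Note the fifth agent's role: critique and review. A multi-agent parallel architecture needs not only workers that carry out specific tasks but also mechanisms for self-correction and metacognition. This design, in which one part of the system spars against another, greatly reduces the risk of errors accumulating across parallel work and is a key insight for raising overall output quality.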

    2. The secret is parallelization. Structure a plan at the start of the day that allows multiple agents to work simultaneously.

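      This names the core method of tokenmaxxing: parallelization. Single-threaded AI interaction can no longer reach the productivity ceiling; the real leap comes from the human acting as an "orchestrator" who plans several mutually independent AI workflows each morning. It marks an evolution in human-AI collaboration, from "operator" to "multi-threaded scheduler".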
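
      The orchestration idea in this note can be sketched as a plan of independent tasks dispatched at the same time. This is a minimal sketch, not the author's setup: `dispatchAgent`, the task names, and the `shouldFail` switch are hypothetical stand-ins for real agent calls. `Promise.allSettled` is used so that one failed workflow does not discard the results of the others.

      ```javascript
      // Hypothetical stand-in for handing a task to an agent and awaiting its result.
      function dispatchAgent(task) {
        return new Promise((resolve, reject) => {
          setTimeout(() => {
            if (task.shouldFail) reject(new Error(`${task.name} failed`));
            else resolve(`${task.name}: done`);
          }, 10);
        });
      }

      // Start every task at once and collect one outcome per task,
      // whether it succeeded or failed.
      async function runPlan(tasks) {
        const settled = await Promise.allSettled(tasks.map(dispatchAgent));
        return settled.map((outcome, i) => ({
          task: tasks[i].name,
          ok: outcome.status === "fulfilled",
          detail: outcome.status === "fulfilled" ? outcome.value : outcome.reason.message,
        }));
      }

      // The plan only parallelizes cleanly if the tasks do not depend on each other.
      const plan = [
        { name: "research" },
        { name: "draft-slides" },
        { name: "critique", shouldFail: true },
      ];

      runPlan(plan).then((report) => console.log(report));
      ```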

    1. what makes the LLM a disciplined wiki maintainer rather than a generic chatbot.

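      The Schema layer in the architecture is the anchor that constrains the LLM's emergent behavior. Without structured instructions an LLM is just a chatbot; the Schema disciplines it into a rigorous "librarian". This shows that in agent architectures, explicit rule constraints matter more than reliance on implicit capability.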

    1. but would fail to recognize that the feature didn't work end-to-end

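      This reveals a cognitive blind spot in agents: they easily fall into a self-confirming "code perspective", assuming that passing unit tests means the feature is complete. Introducing end-to-end browser automation tests forces the agent to verify from the "user perspective", which is the key step from developer thinking to product thinking.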

    2. each new engineer arrives with no memory of what happened on the previous shift

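      This metaphor captures the core predicament of long-running agents with great precision. Context window limits make the agent like a shift engineer with amnesia. Designing an agent system is therefore, in essence, designing an efficient shift-handoff mechanism that makes tacit experience explicit.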

    1. tuning a standalone evaluator to be skeptical turns out to be far more tractable

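      This reveals the limits of LLM self-evaluation: a generator struggles to stay critical of its own work. Decoupling generation from evaluation, and deliberately tuning an independent evaluator toward skepticism, effectively breaks the closed loop of AI self-congratulation. This adversarial architecture is a powerful lever for improving output quality.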
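
      The decoupling described in this note can be sketched as a loop in which a separate evaluator must approve each draft. This is a minimal sketch with stubs in place of model calls: `generate`, `evaluate`, `refine`, and the checklist items are all hypothetical, and a real evaluator would be a separately prompted model rather than a string check.

      ```javascript
      // Stub generator: produces a draft and folds in whatever the evaluator asked for.
      function generate(task, feedback) {
        return [`draft for: ${task}`, ...feedback].join("; ");
      }

      // Stub evaluator: skeptical by default, it rejects any draft that does not
      // explicitly cover every item on its checklist.
      function evaluate(draft) {
        const required = ["handles empty input", "tested end-to-end"];
        const missing = required.filter((item) => !draft.includes(item));
        return { pass: missing.length === 0, missing };
      }

      // The generator never grades its own work; only the evaluator can accept it.
      function refine(task, maxRounds = 3) {
        let feedback = [];
        for (let round = 1; round <= maxRounds; round++) {
          const draft = generate(task, feedback);
          const verdict = evaluate(draft);
          if (verdict.pass) return { accepted: true, rounds: round, draft };
          feedback = verdict.missing;
        }
        return { accepted: false, rounds: maxRounds, draft: null };
      }

      const outcome = refine("export feature");
      console.log(outcome);
      ```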

    2. exhibit "context anxiety," in which they begin wrapping up work prematurely

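      This reveals a deeper psychological mechanism in long-task agents: "context anxiety". The model does not merely forget; it "wraps up hastily" as it nears the context limit. Context compaction alone cannot solve this; it takes a full context reset plus a structured handoff, which is a key insight for designing long-horizon agents.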
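
      A context reset with a structured handoff might look like the sketch below. The field names (`goal`, `completed`, `nextSteps`, `knownIssues`) and both helper functions are illustrative assumptions, not a format given in the source. The point it shows is that the new session starts from the note alone rather than from a compacted copy of the old transcript.

      ```javascript
      // Write down only what the next session needs to know.
      // Field names are illustrative assumptions.
      function buildHandoff(session) {
        return JSON.stringify(
          {
            goal: session.goal,
            completed: session.completed,
            nextSteps: session.nextSteps,
            knownIssues: session.knownIssues,
          },
          null,
          2
        );
      }

      // Context reset: the fresh session is seeded with the handoff note only.
      function startFreshSession(handoffJson) {
        const note = JSON.parse(handoffJson);
        return {
          messages: [
            { role: "system", content: `Handoff from the previous session:\n${handoffJson}` },
          ],
          todo: [...note.nextSteps],
        };
      }

      const previous = {
        goal: "Ship the export feature",
        completed: ["CSV writer", "unit tests"],
        nextSteps: ["verify the download in a browser"],
        knownIssues: ["large files time out"],
        transcript: ["...many turns of history..."], // deliberately not carried over
      };

      const next = startFreshSession(buildHandoff(previous));
      console.log(next.todo);
      ```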

    1. Designing for agents forced us to build a better tool for everyone.

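      This is a dialectical conclusion. The determinism, non-interactivity, and explicit declaration that agents need match the Unix philosophy principle of being easy to compose with other programs. Interfaces optimized for agent constraints remove pain points that humans face when writing and testing automation scripts, unifying the human and machine experience to the benefit of both and demonstrating the universal value of good abstractions.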

    2. Every prompt is a flag in disguise

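      This sentence neatly summarizes the core principle of modernizing CLI tools. Interactive prompts are friendly to humans but are an insurmountable barrier for automation scripts and AI agents. Turning them into flags not only opens the door to agents but also forces developers to clarify the boundary of "required information", which leads to more robust interfaces.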
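
      One way to read "every prompt is a flag" is sketched below with Node's built-in `util.parseArgs` (available in Node 18.3 and later). The flag names and the `resolveConfig` helper are hypothetical and do not come from the tool the article describes. Every value a human would be prompted for has a flag, and a non-interactive caller gets an explicit error naming the missing flags instead of a prompt that hangs.

      ```javascript
      const { parseArgs } = require("node:util");

      // Each value a human might be prompted for is also declared as a flag.
      // These flag names are hypothetical.
      const options = {
        environment: { type: "string", short: "e" },
        yes: { type: "boolean", short: "y" }, // replaces an "Are you sure?" prompt
      };

      function resolveConfig(argv, isInteractive) {
        const { values } = parseArgs({ args: argv, options });
        const missing = [];
        if (values.environment === undefined) missing.push("--environment");
        if (!values.yes) missing.push("--yes");

        // Scripts and agents cannot answer prompts, so fail fast and say what is needed.
        if (missing.length > 0 && !isInteractive) {
          throw new Error(`missing required flags: ${missing.join(", ")}`);
        }
        // In an interactive terminal the caller may still prompt for `needsPrompt`.
        return {
          environment: values.environment,
          confirmed: Boolean(values.yes),
          needsPrompt: missing,
        };
      }

      const config = resolveConfig(["--environment", "staging", "--yes"], false);
      console.log(config);
      ```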

    1. Contextual Drag: How Errors in the Context Affect LLM Reasoning

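      The existence of the related work "Contextual Drag" shows that this research direction is forming quickly: not only "irrelevant context shortens reasoning" but also "erroneous context drags the direction of reasoning". Together the two papers suggest a new research area: "the systematic effects of context pollution on reasoning models". For practitioners engineering AI agents, this means context management strategies (truncation, summarization, filtering) will become a core engineering capability for safeguarding reasoning quality, not merely a way to save tokens.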

    2. we conduct a systematic evaluation of multiple reasoning models across three scenarios: (1) problems augmented with lengthy, irrelevant context; (2) multi-turn conversational settings with independent tasks; and (3) problems presented as a subtask within a complex task.

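      The three test scenarios map closely onto real deployments: scenario one corresponds to "RAG retrieval stuffing in large amounts of background documents", scenario two to "accumulated multi-turn conversation history", and scenario three to "subtask decomposition in agent workflows". These three scenarios cover the mainstream deployment modes of current AI products. In effect the paper is saying that all the AI products we are producing at scale may, without our knowing, be running models with impaired reasoning.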

    3. this behavioral shift does not compromise performance on straightforward problems, it might affect performance on more challenging tasks.

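      "Easy problems are unaffected, hard problems may get worse": this asymmetry is extremely dangerous. It means that when we validate agent reliability on simple tasks, we gain false confidence. When the agent then faces a genuinely high-stakes, high-complexity task, accumulated context has already quietly switched off its self-verification mode, and it degrades to shallow reasoning without any warning. This is a "silent capability decay", more dangerous than an obvious failure.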

    4. this compression is associated with a decrease in self-verification and uncertainty management behaviors, such as double-checking.

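      The shortening of the reasoning chain is not random pruning; it specifically cuts two high-value behaviors, "self-verification" and "uncertainty management". This suggests that when the model senses context pressure, the first thing it cuts is its most important quality-assurance mechanism, like a tired auditor who skips the "review step" first when the workload surges. For AI agent reliability design this is a stern warning: the longer and more complex the context, the more likely the model is to skip self-checks.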

    1. Overnight, agents can do maybe 200 human hours of work, but only for very agent-shaped tasks, so researchers need to deliberately sequence projects such that very long tasks suitable for agents happen overnight.

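      The idea of "feeding the agent overnight" is striking: future researchers will need to act like farmers "sowing seeds", carefully designing sufficiently "agent-shaped" long tasks before leaving work, so that AI completes the equivalent of 200 human hours during the 8 hours humans sleep, and then coming in the next morning to "harvest the results". This means the rhythm of human work will be thoroughly reorganized: no longer "I execute the task" but "I prepare for the task to be executed".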

    1. Build autonomous agents that plan, navigate apps, and complete tasks on your behalf, with native support for function calling.

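      A 2B model that can run offline on a phone, with native support for function calling and multi-step agent planning: this means fully local AI agents are now a reality on consumer hardware. Combined with Agent Mode support in Android Studio, the moment when AI agents move from the cloud to the device may come earlier than anyone expected.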

    1. Rather than treating a complex document as a single monolithic task, Deep Extract deploys sub-agents to break it down and conquer each piece, which is what allows it to remain accurate even on documents with thousands of rows across hundreds of pages.

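      Most people might assume the best way to process a complex document is to handle it as a whole, preserving the full context. The author instead argues that breaking a complex document into subtasks handled by separate sub-agents is more effective. This challenges the conventional view in document processing that "the whole beats the parts" and suggests that decomposition may be better suited to very long documents.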

    1. computer-use agents extend language models from text generation to persistent action over tools, files, and execution environments

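      The author implies that extending from text generation to persistent tool use is a fundamental shift in the AI safety paradigm, and that current research underestimates the safety challenges it brings. This challenges the mainstream practice of applying language-model safety methods directly to agent systems and calls for safety evaluation frameworks designed specifically for agent behavior.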

    2. harmful behavior may emerge through sequences of individually plausible steps

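      The mainstream view usually focuses on single harmful instructions or directly dangerous behavior, but the author points out that dangerous behavior in computer-use agents often accumulates through a series of seemingly reasonable steps. This challenges traditional safety evaluation methods and implies that we need to examine an agent's sequence of actions rather than single operations.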

  2. Apr 2026
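    1. AI agents can read from and act on the 𝕏 platform directly through the standard MCP protocol: searching tweets, posting, viewing user information, managing bookmarks, sending and receiving direct messages, and more.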

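      Most people assume social media platforms strictly limit third-party automation to prevent abuse, but the author points out that xAI has fully opened up MCP protocol support, letting AI agents perform a wide range of operations directly, in sharp contrast to the trend toward closed platforms.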

    1. An agent cannot be held accountable. I think about this principle most. The instinct to put a human in the loop is understandable, but taken literally, it can mean a person approving every step before anything moves forward. The human becomes a bottleneck, rubber-stamping work rather than directing it, and you lose much of what makes agents valuable in the first place.

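      Most people see adding human approval steps to AI systems as necessary for accountability, but the author argues that this turns the human into a bottleneck and weakens the value of agents. This challenges mainstream thinking about AI safety and accountability and proposes an unconventional model for assigning responsibility.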

    1. Cephalosporins or extended-spectrum penicillins are commonly used (eg, cephalexin, 0.5 g orally four times daily for 7–10 days; see Table 35–6). Trimethoprim-sulfamethoxazole (two double-strength tablets orally twice daily for 7–10 days) should be considered when there is concern that the pathogen is MRSA (see Tables 35–5 and 35–6). Vancomycin, 15 mg/kg intravenously every 12 hours, is used for patients with signs of a systemic inflammatory response.

      cephalexin, dicloxacillin, penicillin VK, amoxicillin/clavulanate, or clindamycin (for penicillin-allergic patients). [1-2] These beta-lactam antibiotics provide excellent coverage against streptococci and methicillin-susceptible S. aureus (MSSA

  3. Feb 2026
    1. According to agent-centered theories, we each have both permissions and obligations that give us agent-relative reasons for action. An agent-relative reason is an objective reason, just as are agent neutral reasons; neither is to be confused with either the relativistic reasons of a relativist meta-ethics, nor with the subjective reasons that form the nerve of psychological explanations of human action
  4. Dec 2025
    1. Tools give agents the ability to take actions. Agents go beyond simple model-only tool binding by facilitating: multiple tool calls in sequence (triggered by a single prompt); parallel tool calls when appropriate; dynamic tool selection based on previous results; tool retry logic and error handling; and state persistence across tool calls.

      When you bind tools directly to a Model, the model makes a single, stateless decision. It suggests the best tool for the immediate prompt and then stops.

      The Agent, however, uses its loop (often ReAct: Reason, Act, Observe) to execute complex strategies

    2. An LLM Agent runs tools in a loop to achieve a goal. An agent runs until a stop condition is met - i.e., when the model emits a final output or an iteration limit is reached.

      The difference lies in autonomy and execution flow: A Model with Tools (via direct binding/function calling) is a single, stateless step where the LLM merely suggests the best tool and its arguments, requiring the developer to manually execute the tool and initiate any subsequent calls. In contrast, an Agent with Tools leverages an Agent Executor to manage a dynamic, multi-step loop (e.g., ReAct), where the LLM acts as the planner, deciding which tool to call next, and the Executor automatically runs the tool, feeds the observation back to the model, and repeats the cycle until the complex, multi-step goal is autonomously achieved.
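
      The loop described in these two notes can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the API of any agent framework: `tools`, `fakeModel`, and `runAgent` are hypothetical names, and `fakeModel` is a scripted stand-in for an LLM. It shows the two stop conditions from the highlight (a final output or an iteration limit) and the executor feeding each observation back to the planner.

      ```javascript
      // Hypothetical tool registry; names and behaviors are illustrative only.
      const tools = {
        add: ({ a, b }) => a + b,
        multiply: ({ a, b }) => a * b,
      };

      // Scripted stand-in for an LLM: picks the next step from the observations so far.
      // A real model call would replace this function.
      function fakeModel(goal, observations) {
        if (observations.length === 0) {
          return { type: "tool", name: "add", args: { a: 2, b: 3 } };
        }
        if (observations.length === 1) {
          return { type: "tool", name: "multiply", args: { a: observations[0], b: 4 } };
        }
        return { type: "final", output: `Result: ${observations[1]}` };
      }

      // Executor: run the chosen tool, feed the observation back, repeat until a stop condition.
      function runAgent(goal, model, maxIterations = 5) {
        const observations = [];
        for (let i = 0; i < maxIterations; i++) {
          const step = model(goal, observations);
          if (step.type === "final") {
            return { output: step.output, iterations: i + 1, stoppedBy: "final_output" };
          }
          let observation;
          try {
            const tool = tools[step.name];
            if (!tool) throw new Error(`unknown tool: ${step.name}`);
            observation = tool(step.args);
          } catch (err) {
            // Errors become observations so the model can react on the next turn.
            observation = `error: ${err.message}`;
          }
          observations.push(observation);
        }
        return { output: null, iterations: maxIterations, stoppedBy: "iteration_limit" };
      }

      const result = runAgent("compute (2 + 3) * 4", fakeModel);
      console.log(result);
      ```

      By contrast, direct tool binding would correspond to calling `fakeModel` once and leaving the caller to run the suggested tool and decide what to do next.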

  5. Oct 2025
  6. Jul 2025
  7. Jun 2025
  8. Jan 2025
    1. if you go to another culture and you don't go through the participatory transformation, right? If you don't, and you're just experiencing culture shock - domicide - the agent arena relationship isn't in place! Then none of those other meaning systems can work for you. They'll be absurd. They won't make sense. That's what he means by it being a Meta-Meaning system.

      for - adjacency - culture shock - example of domicide - when the agent-arena relationship is not in place - participatory knowing - meta-meaning system - source - Meaning crisis - episode 33 - The Spirituality of Relevance Realization - Wonder/Awe/Mystery/Sacredness - John Vervaeke

  9. Dec 2024
    1. Historically, AI was a tool

      for - quote - AI: from tool to agent - Roman Yampolskiy

      quote - AI: from tool to agent - Roman Yampolskiy - (see below)

      • Historically, AI was a tool, like any other technology. Whether it was good or bad was up to the user of that tool.
      • You can use a hammer to build a house or kill someone.
      • The hammer is not in any way making decisions about it.
      • With advanced AI, we are switching the paradigm
        • **from tools**
        • **to agents**.
      • The software becomes capable of making its own decisions, working independently, learning, self-improving, modifying.
      • How do we stay in control?
      • How do we make sure the tool doesn’t become an agent that does something we don’t agree with or don’t support?
      • Maybe something against us
  10. Jun 2024
  11. Apr 2024
  12. Sep 2023
    1. the Bodhisattva vow can be seen as a method for control that is in alignment with, and informed by, the understanding that singular and enduring control agents do not actually exist. To see that, it is useful to consider what it might be like to have the freedom to control what thought one had next.
      • for: quote, quote - Michael Levin, quote - self as control agent, self - control agent, example, example - control agent - imperfection, spontaneous thought, spontaneous action, creativity - spontaneity
      • quote: Michael Levin

        • the Bodhisattva vow can be seen as a method for control that is in alignment with, and informed by, the understanding that singular and enduring control agents do not actually exist.
      • comment

        • adjacency between
          • nondual awareness
          • self-construct
          • self is illusion
          • singular, solid, enduring control agent
        • adjacency statement
          • nondual awareness is the deep insight that there is no solid, singular, enduring control agent.
          • creativity is unpredictable and spontaneous and would not be possible if there were perfect control
      • example - control agent - imperfection: start - the unpredictability of the realtime emergence of our next exact thought or action is a good example of this
      • example - control agent - imperfection: end

      • triggered insight: not only are thoughts and actions random, but dreams as well

        • I dreamt the night after this about something related to this paper (cannot remember what it is now!)
        • Obviously, I had no clue the idea in this paper would end up exactly as it did in next night's dream!
    2. According to the Bodhisattva model of intelligence, such deconstruction of the apparent foundations of cognition elicits a transformation of both the scope and acuity of the cognitive system that performs it.
      • for: deconstructing self, self - deconstruction, object agent action triplet, deconstructing cognition
      • comment
        • this is a necessary outcome of the self-reflective nature of human cognition.
        • English, and many other languages, bake the (object, agent, action) triplet into their very structure, making it problematic to use language in the same way after the foundations of cognition have been so deconstructed.
        • Even though strictly speaking the self can be better interpreted as a psycho-social construct and an epiphenomenon, it is still very compelling and practical in day-to-day living, including the use of languages which structurally embed the (object, agent, action) triplet.
  13. Jul 2023
    1. ```js
      // Log the full user-agent data
      navigator.userAgentData
        .getHighEntropyValues(["architecture", "model", "bitness", "platformVersion", "fullVersionList"])
        .then(ua => { console.log(ua) });

      // output
      {
        "architecture": "x86",
        "bitness": "64",
        "brands": [
          { "brand": " Not A;Brand", "version": "99" },
          { "brand": "Chromium", "version": "98" },
          { "brand": "Google Chrome", "version": "98" }
        ],
        "fullVersionList": [
          { "brand": " Not A;Brand", "version": "99.0.0.0" },
          { "brand": "Chromium", "version": "98.0.4738.0" },
          { "brand": "Google Chrome", "version": "98.0.4738.0" }
        ],
        "mobile": false,
        "model": "",
        "platformVersion": "12.0.1"
      }
      ```

    1. ```idl
      dictionary NavigatorUABrandVersion {
        DOMString brand;
        DOMString version;
      };

      dictionary UADataValues {
        DOMString architecture;
        DOMString bitness;
        sequence<NavigatorUABrandVersion> brands;
        DOMString formFactor;
        sequence<NavigatorUABrandVersion> fullVersionList;
        DOMString model;
        boolean mobile;
        DOMString platform;
        DOMString platformVersion;
        DOMString uaFullVersion; // deprecated in favor of fullVersionList
        boolean wow64;
      };

      dictionary UALowEntropyJSON {
        sequence<NavigatorUABrandVersion> brands;
        boolean mobile;
        DOMString platform;
      };

      [Exposed=(Window,Worker)]
      interface NavigatorUAData {
        readonly attribute FrozenArray<NavigatorUABrandVersion> brands;
        readonly attribute boolean mobile;
        readonly attribute DOMString platform;
        Promise<UADataValues> getHighEntropyValues(sequence<DOMString> hints);
        UALowEntropyJSON toJSON();
      };

      interface mixin NavigatorUA {
        [SecureContext] readonly attribute NavigatorUAData userAgentData;
      };

      Navigator includes NavigatorUA;
      WorkerNavigator includes NavigatorUA;
      ```
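
      Because `userAgentData` is only exposed in secure contexts and is not implemented by every browser, code that reads it usually feature-detects first. The sketch below uses only the low-entropy attributes from the interface above (`brands`, `mobile`, `platform`) and falls back to the classic `userAgent` string. `describeClient` is a hypothetical helper, and the regular expression that skips placeholder brands such as " Not A;Brand" is a heuristic assumption, not something the specification defines.

      ```javascript
      // `nav` is any navigator-like object; passing it in keeps the function testable.
      function describeClient(nav) {
        const uaData = nav.userAgentData;
        if (uaData && Array.isArray(uaData.brands)) {
          // Heuristic: drop intentionally meaningless placeholder brands.
          const brands = uaData.brands
            .filter(({ brand }) => !/not.?a.?brand/i.test(brand))
            .map(({ brand, version }) => `${brand} ${version}`);
          return {
            source: "userAgentData",
            brands,
            mobile: uaData.mobile,
            platform: uaData.platform,
          };
        }
        // Fallback for browsers or contexts without User-Agent Client Hints.
        return { source: "userAgent", raw: nav.userAgent ?? "" };
      }

      // Mock values modeled on the sample output above; the platform value is assumed.
      const mockNavigator = {
        userAgentData: {
          brands: [
            { brand: " Not A;Brand", version: "99" },
            { brand: "Chromium", version: "98" },
            { brand: "Google Chrome", version: "98" },
          ],
          mobile: false,
          platform: "macOS",
        },
      };

      console.log(describeClient(mockNavigator));
      ```

      In a page, the call would be `describeClient(navigator)`.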

  14. May 2023
  15. Apr 2023
  16. Mar 2023
  17. Dec 2022
  18. Nov 2022
    1. Page recommended by @wfinck. Seems @karlicoss is the author. This project seems similar to what I've been trying to do with Hypothes.is, Obsidian, Anki, Zotero, and PowerToys Run but goes beyond the scope of my endeavors to just quickly access whatever resource comes to mind (without creating duplicates). The things that Promnesia adds beyond my PKM stack are the following: (1) prioritizing new info; (2) keeping track of which device things were read on and for how long.

  19. Sep 2022
    1. Right? You said... No, no, bullshit. Let's write it all down and we can go check it. Let's not argue about what was said. We've got this thing called writing. And once we do that, that means we can make an argument out of a much larger body of evidence than you can ever do in an oral society. It starts killing off stories, because stories don't refer back that much. And so anyway, a key book for people who are wary of McLuhan, to understand this, or one of the key books is by Elizabeth Eisenstein. It's a mighty tome. It's a two volume tome, called the "Printing Press as an Agent of Change." And this is kind of the way to think about it as a kind of catalyst. Because it happened. The printing press did not make the Renaissance happen. The Renaissance was already starting to happen, but it was a huge accelerant for what had already started happening and what Kenneth Clark called Big Thaw.

      !- for : difference between oral and written tradition - writing is an external memory, much larger than the small one humans are endowed with. Hence, it allowed for orders of magnitude more reasoning.

  20. Jul 2022
  21. Apr 2022
  22. Mar 2022
  23. Feb 2022
  24. www.geoffreylitt.com
    1. browser extension

      I've spent a lot of time in frustrated conversations arguing the case for browser extensions being treated as a first class concern by browser makers (well, one browser maker). But more and more, I've come to settle on the conclusion that any browser extension of the sort that Wildcard is should also come with the option of using it (or possibly a stripped down version) as a bookmarklet, or a separate tool that can process offline data—no special permissions needed.

      (This isn't because I was wrong about browser extensions; it's precisely because extension APIs were drastically limited that this becomes a rational approach.)

  25. Jan 2022
  26. Nov 2021
    1. To be clear, I am not advocating overthrowing the state or any of the other fear-mongering mischaracterizations of anarchism. I am advocating ceasing to spend all of our resources focusing on the state as the agent of the change we seek. We no longer have the time to waste.
  27. Mar 2021
  28. Feb 2021
    1. Gordon, D. E., Hiatt, J., Bouhaddou, M., Rezelj, V. V., Ulferts, S., Braberg, H., Jureka, A. S., Obernier, K., Guo, J. Z., Batra, J., Kaake, R. M., Weckstein, A. R., Owens, T. W., Gupta, M., Pourmal, S., Titus, E. W., Cakir, M., Soucheray, M., McGregor, M., … Krogan, N. J. (2020). Comparative host-coronavirus protein interaction networks reveal pan-viral disease mechanisms. Science, 370(6521). https://doi.org/10.1126/science.abe9403

  29. Oct 2020
    1. Identify your user agents. When deploying software that makes requests to other sites, you should set a custom User-Agent header to identify the software and provide a means to contact its maintainers. Many of the automated requests we receive have generic user-agent headers such as Java/1.6.0 or Python-urllib/2.1 which provide no information on the actual software responsible for making the requests.
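
      A custom header of the kind this passage asks for might be built as in the sketch below. The product name, URL, and email address are placeholders, `buildUserAgent` is a hypothetical helper, and the `product/version (comment)` shape is a common convention rather than a requirement stated in the source. The sketch assumes Node 18 or later, where `Headers` is a global; some browsers restrict or ignore a custom `User-Agent` on `fetch`.

      ```javascript
      // All values below are placeholders for illustration.
      function buildUserAgent({ name, version, url, email }) {
        // Conventional shape: product/version followed by a parenthesized comment
        // that tells site operators where to find and contact the maintainers.
        return `${name}/${version} (+${url}; ${email})`;
      }

      const userAgent = buildUserAgent({
        name: "ExampleHarvester",
        version: "1.2.0",
        url: "https://example.org/harvester",
        email: "maintainers@example.org",
      });

      const headers = new Headers({ "User-Agent": userAgent });

      // A request would then pass these headers along, for example:
      //   fetch("https://example.org/api", { headers })
      console.log(headers.get("User-Agent"));
      ```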
  30. Jul 2020
  31. Jun 2020
  32. May 2020
  33. Apr 2020
  34. Feb 2020
  35. Jan 2020
  36. Dec 2019
  37. Mar 2019
  38. Oct 2018
  39. Apr 2018
  40. Jan 2018
  41. Oct 2017
  42. Apr 2017
    1. (like “nature itself,” not merely our representations of it!) has a history

      RE: "Nature itself" having a history

      Nathaniel's in-class comments last week were very helpful to hear prior to the readings this week. It is particularly helpful to consider the paradox that some folks want to protect the Earth from humans and somehow return it to a point "before humans," as though the Earth exists outside of humans and we are pure agents acting upon it. (Which is where we get things like this video that has been going around Facebook because of Earth Day:) https://www.youtube.com/watch?v=49w7GHVYoI0

      This video pretends the Earth is an agent, but it is actually only reflecting human actions back on humans. There is an underlying argument that our relationship to the planet is only the things we do to it and not all the other relationships and existences on and in it.

  43. Feb 2014