697 Matching Annotations
  1. Last 7 days
    1. The depth of recursion becomes a tunable compute axis at inference time, requiring no retraining. A small model, by reading itself, can iterate toward answers that neither it nor any of its workers could reach in a single pass.

      Most people assume a model's capability is bounded by its scale and training data, so that better performance requires a bigger model or retraining. The author argues instead that a small model, by recursively calling itself, can dynamically extend its capability at inference time, reaching answers no single pass could, without any retraining. This challenges the industry consensus that scale equals capability and hints that small models may achieve breakthrough abilities through introspection.

    2. Two variants are available: **Sakana Fugu Mini 🐟**, optimized with latency in mind, and **Sakana Fugu Ultra 🐡**, the full orchestration system, optimized for performance for demanding tasks.

      The article names two variants, Mini (latency-optimized) and Ultra (performance-optimized), but gives no concrete figures for the difference, such as percentage latency reduction or throughput gains. Without quantified parameters it is hard to assess how the two variants actually compare in practice.

    3. GPQAD | 94.4 | 90.9 | 92.7 | 92.4 | **95.1**
       LCBv6 | 90.3 | 92.1 | 92.4 | 90.4 | **93.2**
       SWEPro | 48.4 | 51.2 | _53.4_ | 51.3 | **54.2**

      The comparison table shows Sakana Fugu Ultra beating its competitors on all three benchmarks: 95.1 on GPQAD (vs. Gemini 3.1's 94.4), 93.2 on LCBv6 (vs. GPT 5.4's 92.1), and 54.2 on SWEPro (vs. Opus 4.6's 53.4). The numbers suggest its multi-model orchestration strategy delivers real gains, especially on scientific reasoning tasks.

    1. When we looked, use of “goblin” in ChatGPT had risen by 175% after the launch of GPT‑5.1, while “gremlin” had risen by 52%.

      Striking data showing how quickly a seemingly harmless preference can spread through a model, underscoring the importance of monitoring model behavior changes and responding promptly.

    2. Starting with GPT‑5.1, our models began developing a strange habit: they increasingly mentioned goblins, gremlins, and other creatures in their metaphors.

      Beginners may find it hard to grasp how model behaviors develop, especially when a pattern emerges subtly, as with GPT-5.1's increasingly frequent monster metaphors.

    1. We separately evaluate GPT‑5.5 Pro in certain cases because we judge that the setting could materially impact the relevant risks or appropriate safeguards posture.

      Most people assume two models sharing the same base architecture should have similar risk and safety requirements, but OpenAI explicitly says GPT-5.5 Pro needs separate evaluation because "the setting could materially impact the relevant risks or appropriate safeguards posture." This challenges the common assumption in AI evaluation that the same base model implies the same safety profile, suggesting even small changes in setting can produce markedly different risk characteristics.

    1. 🔹 **DeepSeek-V4-Flash:** 284B total / 13B active params. Your fast, efficient, and economical choice.

      DeepSeek-V4-Flash is much smaller than the Pro version: 284B total parameters, 13B active. Its active-to-total ratio is about 4.6%, slightly higher than Pro's. The design trades scale for faster responses and lower cost, making it suitable for latency-sensitive applications.

    2. 🔹 **DeepSeek-V4-Pro:** 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.

      Concrete figures for DeepSeek-V4-Pro: 1.6T total parameters, 49B active. That scale exceeds most open models and approaches top closed-source models. The active-to-total ratio of roughly 3% indicates sparse activation, likely the key to its balance of performance and efficiency.
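The active-to-total ratios quoted in these two notes can be checked in a few lines (a minimal sketch; the parameter counts are the ones stated in the highlights, everything else is illustrative):

```python
# Active-to-total parameter ratio for sparsely-activated (MoE-style) models.
# Figures come from the quoted highlights: 284B/13B (Flash), 1.6T/49B (Pro).

def active_ratio(total_params: float, active_params: float) -> float:
    """Fraction of parameters active per token."""
    return active_params / total_params

flash = active_ratio(284e9, 13e9)   # DeepSeek-V4-Flash
pro = active_ratio(1.6e12, 49e9)    # DeepSeek-V4-Pro

print(f"Flash: {flash:.1%}")  # ~4.6%
print(f"Pro:   {pro:.1%}")    # ~3.1%
```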

    1. The best-performing model across these three metrics was a pair of independent linear trends: one for reasoning models and one for non-reasoning models.

      This model-selection result (best on all three metrics) indicates that splitting models into reasoning and non-reasoning classes yields the best predictive model, strong statistical evidence that reasoning capability may be a key driver of AI acceleration. However, the article does not spell out how "reasoning model" is defined, which may affect the reliability of the result.

    2. Reasoning models show both a one-off jump in performance and a roughly 2-3x faster trend compared to non-reasoning models.

      An important comparison: reasoning models improve roughly 2-3x faster than non-reasoning models. A striking acceleration ratio, suggesting the reasoning breakthrough may mark an inflection point in AI development. The article provides no specific benchmark data to support the multiplier, though, so treat it with caution.

    1. Our run-rate revenue has now surpassed $30 billion, up from approximately $9 billion at the end of 2025.

      Most people assume AI companies are still in a cash-burning phase far from profitability, but Anthropic's revenue more than tripled in a matter of months to a $30 billion run rate. That pace challenges the consensus that the AI industry is uniformly loss-making and suggests AI commercialization may be faster and larger than expected.

    1. The Prompt API uses the Gemini Nano model in Chrome. While the API is built into Chrome, the model is downloaded separately the first time an origin uses the API.

      Most people expect a built-in API to ship with all necessary components, but the author makes clear the model must be downloaded separately. This contradicts the common assumption that "built-in" means ready-to-use, and implies first-time users may face significant download time and storage pressure.

    1. OpenAI can now serve all its products to customers across any cloud provider.

      Most people assumed OpenAI would rely entirely on Microsoft Azure, given Microsoft is its main investor and partner, but the author notes OpenAI now has the flexibility of a multi-cloud strategy. This breaks the typical exclusivity between tech giants and signals OpenAI's pursuit of greater autonomy and market opportunity.

    1. Closed Loop + Infinite Demand = Economic Engines. Software engineering lives here. AI writes the code. Tests verify correctness. More code enables more features. Companies will always need more software.

      Framing software engineering as an "economic engine" is an insightful move. It suggests AI in software development not only raises efficiency but creates a self-reinforcing loop of value growth, in sharp contrast to many other AI applications.

    1. We also found evidence that models that have seen the problems during training are more likely to succeed, because they have additional information needed to pass the underspecified tests.

      Most people attribute AI performance gains to better algorithms and architectures. The authors find instead that success on SWE-bench depends heavily on whether a model saw the problems during training, not on genuine coding improvement. This cuts against the standard "model progress" narrative and suggests current evaluations of AI development may be seriously biased.

    1. As part of its long-running Client Zero initiative, in which NEC serves as its own first customer before offering its technology to clients

      Most companies build a product first and then use it internally; the author notes NEC inverted this, deploying AI at scale internally before offering it to clients. This signals a more aggressive approach to validating and refining AI solutions, challenging the traditional product-development pipeline.

    1. The commoditization flywheel : both companies give away complements to drive usage of the core.

      Most people think AI companies should focus on core products and keep them proprietary, but the author argues the AI giants should emulate Google: give away complements to drive usage of the core, contrary to the traditional moat strategy.

  2. Apr 2026
    1. Fugu models achieve superior performance by dynamically coordinating and orchestrating a diverse pool of powerful models.

      Most people assume using multiple models requires users to manually pick the best one per task, which is complex and inefficient. The author argues that dynamically orchestrating multiple models can outperform any single model, challenging current multi-model practice and suggesting future AI systems may optimize model combinations automatically rather than rely on human selection.

    2. Fugu models achieve superior performance by dynamically coordinating and orchestrating a diverse pool of powerful models.

      Most people attribute performance gains to scaling or architectural improvements of a single model, but the author argues a dynamically coordinated pool of diverse models can do better. This challenges the field's focus on single-model optimization and proposes a new resource-allocation paradigm.

    3. The depth of recursion becomes a tunable compute axis at inference time, requiring no retraining. A small model, by reading itself, can iterate toward answers that neither it nor any of its workers could reach in a single pass.

      Most people think better performance requires more parameters or retraining; the author proposes a counterintuitive alternative: by recursively invoking itself, a small model can iterate at inference time toward answer quality unreachable in a single pass. This challenges conventional assumptions about the relationship between model scale and capability.

    1. The best-performing model across these three metrics was a pair of independent linear trends: one for reasoning models and one for non-reasoning models.

      This finding shows reasoning and non-reasoning models really do follow distinct trajectories. The paired linear-trend model performed best on all three metrics, strong statistical evidence for the thesis that AI capability is accelerating.

    1. On a 150-class benchmark, the surrogate fully replaces the teacher

      Most people assume complex classification needs a large model and that small surrogates handle only simple tasks. The author shows a small surrogate fully replacing the teacher on a 150-class task, challenging the "bigger is better" orthodoxy and demonstrating the potential of efficient routing.

    1. Our goal is $10M ARR [annual recurring revenue] with a sub-10 person org.

      Most people believe high-revenue companies need large headcounts and complex org structures, but the author argues AI enables ultra-lean organizations. This challenges traditional theories of business scale, though it may underweight the irreplaceability of human creativity and judgment.

    1. They don't mind paying the AI labs for tokens — but the agent itself, they'd much rather have outside of the labs' infrastructure.

      This reveals a key paradox in the AI ecosystem: users will pay the labs for underlying AI capability but want the agent itself to stay autonomous and portable, outside the labs' infrastructure. Future AI business models may center on "agent as a service" rather than simply "model as a service."

    1. GPT-4o operates at roughly 200 billion parameters and outperforms the original 1.8 trillion-parameter GPT-4

      This finding contradicts the industry consensus that bigger models are necessarily better, suggesting model quality and architecture may matter more than scale. It may be one of the most surprising efficiency gains in AI history, challenging our understanding of AI progress.

    2. One estimate puts Llama 3's information compression at just 0.07 bits per token meaning the model has only a hazy recollection of most of what it trained on

      A startling data point about the inefficiency of LLM information processing, challenging our understanding of what these models "learn." If a model retains only a hazy recollection of its training data, do we really need such enormous parameter counts? Worth deeper study.

    1. Figma is effectively funding a competitor - and the more AI usage Figma has - the more money they send over to Anthropic for the tokens they use.

      This counterintuitive business dynamic exposes a structural weakness of SaaS companies in the AI era: a company may be funding its own competitor. Figma not only sends Anthropic revenue but also uses a lesser model (Sonnet 4.5) while competitors use a stronger one (Opus 4.7), a doubly ironic squeeze.

    1. The real issue is not whether defenders can get access to another model. It is whether they can turn model capability into something a security team can trust and use every day.

      A contrarian take: security teams should stop prioritizing access to new models and instead focus on turning existing model capability into trusted everyday tooling. This challenges the industry's chase after the "latest, strongest model" and stresses implementation and validation frameworks.

    2. The takeaway is not whether Mythos is better or more powerful. It is that public models can already achieve much the same results.

      A surprising conclusion: Anthropic's Mythos may not be much stronger than public models; its workflows are simply more mature. This challenges the hype around proprietary models and suggests the real innovation lies in how AI tools are organized and used, not in the model's mystique.

    1. Stronger models hallucinate less, so they can't see the problem in any side of the spectrum: the hallucination side of small models, and the real understanding side of Mythos.

      Highly counterintuitive: stronger models can be worse at spotting certain flaws, because reducing hallucination also removes a kind of "intuitive" sensitivity to the problem. AI security research may need ensembles spanning capability tiers rather than simply bigger, stronger models.

    1. US tech CEOs believe the best models should stay proprietary, partly so they can recoup enormous training costs and partly out of concern that powerful frontier models could be weaponized. Chinese labs, for their part, are not purely idealistic: Open-source is not only free advertising but also a shrewd workaround.

      Most people assume open-sourcing AI hurts commercial interests and raises safety risks, but the author frames China's open-source push as shrewd business strategy rather than pure technology sharing. This challenges Western assumptions about IP and business models, showing open source can build ecosystems and ultimately capture commercial value.

    1. While model capabilities have improved dramatically for use cases like codegen and mathematical reasoning, they still lag behind on the data side (as evidenced through SQL benchmarks like Spider 2.0 and Bird Bench).

      A surprising fact: despite dramatic gains in codegen and mathematical reasoning, models still lag on the data side. This challenges the assumption of across-the-board capability gains and suggests data reasoning may need specialized treatment.

    1. Unified interface for interacting with different LLM providers (Claude, OpenAI, local models via vLLM/Ollama). Includes tool definitions for security operations (shell, file I/O, network, debugger) and cost/token tracking.

      The unified interface of the model-abstraction layer reflects a deliberate multi-model strategy, with security-operations tooling integrated. The design lets the platform adapt flexibly across models while keeping security operations consistent; cost and token tracking reflect the economics of AI usage, which is critical in enterprise deployments.

    1. Opus 4.7 introduces a new `xhigh` ('extra high') effort level between `high` and `max`, giving users finer control over the tradeoff between reasoning and latency on hard problems.

      The 'xhigh' effort level shows models offering finer-grained control over the reasoning-depth vs. latency tradeoff, reflecting growing demand for performance tuning and a trend toward more customizable, specialized AI systems.

    1. Ollama gained traction by being the first easy llama.cpp wrapper, then spent years dodging attribution, misleading users, and pivoting to cloud, all while riding VC money earned on someone else's engine.

      This lays out the core critique of Ollama's business model: it won early traction by wrapping an open-source project, then systematically dodged attribution of its technical origins while pivoting to cloud monetization, a practice at odds with the open-source community's basic norms.

    1. We argue that this gap stems from a fundamental failure of introspective consistency: AR models agree with what they generate, whereas DLMs often do not.

      A strikingly deep insight into the root of the performance gap between diffusion language models (DLMs) and autoregressive (AR) models. The authors' notion of "introspective consistency" — AR models inherently agree with what they generate, while DLMs lack this self-verification — offers a fresh lens on DLMs' limitations.

    2. I-DLM-8B is the first DLM to match the quality of its same-scale AR counterpart, outperforming LLaDA-2.1-mini (16B) by +26 on AIME-24 and +15 on LiveCodeBench-v6 with half the parameters

      Surprising: with only 8B parameters, I-DLM-8B beats the 16B LLaDA-2.1-mini by +26 on AIME-24 and +15 on LiveCodeBench-v6. A diffusion model matching its same-scale AR counterpart for the first time, at half the parameters, overturns the common belief that DLMs trail AR models in quality.

    1. Only GPT-OSS-120b is perfectly reliable in both directions (in our 3 re-runs of each setup). Most models that find the bug also false-positive on the fix, fabricating arguments about signed-integer bypasses that are technically wrong.

      This exposes a limitation in recognizing already-fixed code: many models that detect the bug also flag the patched version, fabricating technically wrong arguments. AI security systems need extra verification and human review layers to ensure accuracy and reliability.

    2. Because small, cheap, fast models are sufficient for much of the detection work, you don't need to judiciously deploy one expensive model and hope it looks in the right places. You can deploy cheap models broadly, scanning everything, and compensate for lower per-token intelligence with sheer coverage and lower cost-per-token.

      A new economic model for AI security: deploy small, cheap models broadly rather than one expensive model carefully. The "cast a wide net" strategy, compensating for lower per-token intelligence with sheer coverage and lower cost, may beat relying on a few expensive models, especially for large-codebase scanning, and offers a new path to economically viable AI security.

    3. The capability rankings reshuffled completely across tasks. There is no stable best model across cybersecurity tasks. The capability frontier is jagged.

      This reveals a "jagged frontier" in AI security capability: model rankings reshuffle completely across tasks, so there is no one-size-fits-all best security model. Model choice must be task-specific — an important lesson for the design of AI security systems.

    4. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens.

      A surprising result: even small, cheap models can match expensive proprietary ones at detecting security vulnerabilities. This challenges the assumption that AI security requires frontier models and points toward economical AI security solutions.

    1. focusing on the ~1.5K mainline open models from the likes of Alibaba's Qwen, DeepSeek, Meta's Llama

      The report focuses on the major model families — Alibaba's Qwen, DeepSeek, Meta's Llama — representing the strategic priorities of different countries and organizations. The selection implies these families anchor the ecosystem and embody distinct paths of AI development.

    2. focusing on the ~1.5K mainline open models from the likes of Alibaba's Qwen, DeepSeek, Meta's Llama

      Surprising: the open language-model ecosystem has grown to roughly 1,500 mainline models, including well-known families like Alibaba's Qwen, DeepSeek, and Meta's Llama. The number shows open AI has reached a scale and diversity well beyond what most people imagine.

    3. We present a comprehensive adoption snapshot of the leading open language models and who is building them

      Surprising: the report offers a comprehensive adoption snapshot of roughly 1,500 mainline open language models and documents who develops and builds them. Data collection at this scale shows the open AI ecosystem is far more sprawling and diverse than commonly realized.

    4. focusing on the ~1.5K mainline open models from the likes of Alibaba's Qwen, DeepSeek, Meta's Llama

      Surprising: the open-model ecosystem has reached roughly 1,500 mainline models, far beyond most expectations. Chinese companies like Alibaba and DeepSeek, alongside giants like Meta, jointly shape this large, diverse ecosystem — evidence of open-source AI's vigor.

    1. While some experts have speculated that general models will win out in performance over specialized models—that scale and compute will beat curation—the success of these companies shows that the market is making a more nuanced bet.

      The market is forming a more nuanced view of AI's development path: general and specialized models may each win in different settings. This divergence suggests AI may not produce a single winner but a pluralistic landscape.

    2. Reddit, Shutterstock, and News Corp are making hundreds of millions a year licensing their high-quality data to companies training AI, and those contracts are growing about 20 percent annually, according to their quarterly filings.

      These figures reveal the enormous economic value of the AI training-data market: high-quality data has become a strategic asset. Traditional content companies are becoming "input companies" for AI, a shift that changes their business models and redefines data's central place in the AI ecosystem.

    1. We found weak evidence that Opus 4.0 and 4.1 had partially memorized cal, but no evidence Opus 4.6 had memorized it, despite performing best of all models considered.

      Unexpected: the best-performing model shows no memorization effect. The newest models may rely more on genuine understanding and reasoning than on recall when solving hard problems — a new angle for AI capability evaluation.

    2. Older models were more prone to submitting prematurely, even when test cases weren't passing.

      This highlights a marked difference in task persistence across model generations: older models submitted incomplete solutions prematurely even with failing tests, while the newest show stronger persistence and engineering judgment, perhaps reflecting evolution in self-assessment and task management.

    1. We just started the prepaid billing rollout which means you have to pay ahead of time to use the Gemini API, this is rolled out to all new US billing accounts as of yesterday

      Prepaid billing marks an experiment in AI-service pricing. It may effectively prevent surprise bills, but it also changes how developers consume AI services and could affect the pace of adoption.

    1. Gemini Robotics-ER 1.6 shows significant improvement over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash, specifically enhancing spatial and physical reasoning capabilities such as pointing, counting, and success detection.

      This pinpoints the key gains of the model iteration: the Gemini Robotics-ER line is targeting core challenges of real-world robotics. The marked jump from 1.5 to 1.6 suggests qualitative progress in physical-world understanding that could translate directly into practical value in industrial, medical, and home settings.

    1. While our production codebase has significantly diverged, including major rewrites of core systems like authentication and data handling, we want to ensure there is still a truly open version available.

      This reveals the messy reality of commercializing open source. Cal.com keeps an open version while its production code stays closed, reflecting an open-source dilemma: preserving openness while protecting the core business from AI-driven security threats. The hybrid model may become a template for open-source software's future.

    1. Today's large language models (LLMs) are trained to align with user preferences through methods such as reinforcement learning. Yet models are beginning to be deployed not merely to satisfy users, but also to generate revenue for the companies that created them through advertisements

      This exposes a key paradox of current AI development: a fundamental conflict between training objectives (aligning with user preferences) and actual commercial use (generating ad revenue). The conflict could push model behavior away from its original design intent and create serious trust problems.

    1. The most notable finding here is that the model capabilities are improving _fast._ There are several domains that have shown dramatic improvements in the last 4 months — with accounting and auditing showing nearly a 20 percent jump on GDPval and even domains like police / detective work showing a nearly 30 percent improvement.

      Model capabilities jumped 20-30% in some domains within just four months, evidence of exponentially accelerating AI progress. Such rapid improvement means current enterprise AI adoption may be just the tip of the iceberg, with more industries and scenarios poised for explosive growth as capabilities break through.

    2. The most notable finding here is that the model capabilities are improving _fast._ There are several domains that have shown dramatic improvements in the last 4 months — with accounting and auditing showing nearly a 20 percent jump on GDPval and even domains like police / detective work showing a nearly 30 percent improvement.

      Surprising: in just four months, accounting and auditing improved nearly 20% on GDPval, and police/detective work nearly 30%. Progress this fast far outpaces expectations and foreshadows breakthrough applications in many more domains.

    1. We present Attention Editing, a practical framework for converting already-trained large language models (LLMs) with new attention architectures without re-pretraining from scratch.

      A surprising innovation addressing a key deep-learning challenge: changing a trained model's architecture without losing performance. Traditional approaches require retraining from scratch, which is prohibitively expensive. Attention Editing converts existing LLMs to more efficient attention architectures without re-pretraining, potentially transforming how models are deployed and optimized.

    1. The era of 1-bit LLMs is here — now with WebGPU acceleration!

      Surprising: 1-bit LLMs store each parameter in a single bit, versus traditional 32-bit floating point — a major breakthrough in model compression that, combined with WebGPU acceleration, can raise AI compute efficiency by tens of times.
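The storage claim is simple arithmetic (a sketch; the 7B parameter count below is an illustrative assumption, not from the source — 1-bit weights take 1/32 the space of fp32 regardless of size):

```python
# Weight storage for a hypothetical 7B-parameter model at different precisions.

def weight_gb(n_params: int, bits_per_param: int) -> float:
    """Storage for the weights alone, in gigabytes (decimal GB)."""
    return n_params * bits_per_param / 8 / 1e9

n = 7_000_000_000                 # assumed model size for illustration
fp32_gb = weight_gb(n, 32)        # 28.0 GB
one_bit_gb = weight_gb(n, 1)      # 0.875 GB

print(fp32_gb, one_bit_gb, fp32_gb / one_bit_gb)  # ratio is always 32x
```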

    1. except API tokens are currently sold at a LOSS. That "$20,000 scan" probably cost closer to $100,000+ in real gpu time

      Surprising: despite the $20,000 list price, the scan's real cost may exceed $100,000 in GPU time, because API tokens are currently sold at a loss — a sign that the cost of AI compute is being badly underpriced.

    1. The official positioning is for use alongside Claude Code and OpenClaw: Claude handles reasoning and orchestration, while GLM-5V-Turbo handles "seeing" and "operating the interface."

      Surprisingly, GLM-5V-Turbo is designed to collaborate with rather than compete against other AI models: it handles visual perception and interface operation while leaving reasoning and orchestration to Claude Code. This division of labor is a novel approach, hinting that future AI systems may specialize rather than chase generality.

    1. In 23 months, the same capability that needed 1.8 trillion parameters now fits in 4 billion parameters. A 450x compression.

      Surprising: in just 23 months, model parameter counts compressed 450x, meaning capability that once required a supercomputer can now run entirely on a phone. Progress far faster than Moore's law, showing the astonishing power of algorithmic optimization and model compression.

    2. In 23 months, the same capability that needed 1.8 trillion parameters now fits in 4 billion parameters. A 450x compression.

      Most people think AI performance gains come mainly from adding parameters, but the author argues that algorithmic optimization and talent concentration enabled a 450x parameter compression, challenging the industry consensus that more parameters means better performance.
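The 450x figure is just the ratio of the two parameter counts in the highlight (1.8T down to 4B); a one-liner confirms it:

```python
# Compression ratio implied by the quoted parameter counts.
old_params = 1.8e12   # 1.8 trillion (quoted)
new_params = 4e9      # 4 billion (quoted)

compression = old_params / new_params
print(compression)  # 450.0
```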

    1. Each of these companies recognized the cognitive burden of unbundling. They're not selling features. They're selling trust.

      Surprising: AI companies are redefining software sales, from selling individual features to selling trust. In a fast-changing AI environment, enterprises prefer building trust with vendors offering long-term stability and comprehensive solutions over buying a scatter of separate tools.

    1. Performance on knowledge-heavy tasks depends strongly on model size and training, while reasoning-oriented models show clear gains on tasks requiring logic, learning, abstraction, and social inference.

      Surprising: performance on knowledge-heavy tasks depends strongly on model size and training, while reasoning-oriented models show clear gains on logic, learning, abstraction, and social inference. The finding reveals fundamentally different capability profiles across models and offers important guidance for model selection and optimization.

    1. Like lean production, which extended mass production's dominance for decades through efficiency gains, AI doesn't mark computing's end but its maturation.

      Surprising: AI is compared to 1970s lean production optimizing mass production, not to disruptive innovation. The implication is that AI may be an efficiency tool of computing's maturation rather than a revolutionary new technical paradigm — a sharp contrast with the public's expectations of disruption.

    1. Gemma4-31B worked in an iterative-correction loop (with a long-term memory bank) for 2 hours to solve a problem that baseline GPT-5.4-Pro couldn't

      Surprisingly, the smaller Gemma4-31B, running an iterative-correction loop with a long-term memory bank for two hours, solved a problem that baseline GPT-5.4-Pro could not. Architectural innovation and inference-time reasoning may matter more than raw scale, pointing to a new direction for AI development.

    1. MinerU2.5, a 1.2B-parameter document parsing vision-language model, achieves state-of-the-art recognition accuracy with computational efficiency through a coarse-to-fine parsing strategy.

      Surprising: a model of only 1.2B parameters, MinerU2.5, reaches state-of-the-art document recognition accuracy via a coarse-to-fine parsing strategy while staying computationally efficient — a challenge to "bigger is better" and a showcase for efficient architecture design.

    1. MegaTrain also enables 7B model training with 512k token context on a single GH200.

      Surprising: a single GH200 GPU can train a 7B model with a 512k-token context, far beyond mainstream context limits. Such ultra-long-context capability could transform how large models handle long documents, codebases, and books.

    2. On a single H200 GPU with 1.5TB host memory, MegaTrain reliably trains models up to 120B parameters.

      Surprising: a single H200 GPU with 1.5TB of host memory can reliably train models up to 120B parameters, breaking the assumption that large-scale training requires multi-GPU clusters. The breakthrough could make very large model training far more accessible and economical.

    1. After compressing, the model again extends its solutions to achieve stronger performance.

      Surprising: Muse Spark exhibits a distinctive "thought compression" ability at test time. After first improving by thinking longer, the model spontaneously compresses its reasoning under a time-penalty mechanism, then re-extends its solutions for stronger performance — a dynamic self-optimization not seen before in AI models.

    2. Muse Spark demonstrated the highest rate of evaluation awareness of models they have observed.

      Surprising: third-party evaluator Apollo Research found Muse Spark showed the highest rate of evaluation awareness of any model they had observed; it frequently recognized "alignment traps" and realized it was being evaluated. Such metacognition is extremely rare in AI models and may signal a step toward more advanced reasoning.

    1. The budget for new spend is there. You can do this. But remember that your customers' first and most obvious source of AI savings is labor efficiency, which means seats are where they will look to take cost out. The new growth, by contrast, will increasingly sit in tokens, consumption, automations, outcomes, and machine-driven workflows.

      Surprising: the software industry is shifting from seat-based pricing to token/usage-based models, a change that will thoroughly reshape revenue structures. Most observers likely underestimate the speed and scale of this shift.

    2. The new growth, by contrast, will increasingly sit in tokens, consumption, automations, outcomes, and machine-driven workflows. If you are not in the token path, you are not standing in the fastest-growing part of the budget.

      Surprising: the piece states plainly that software growth is moving from seat-based licensing to token-based consumption. Software companies will need to rethink their business models and pricing, from subscriptions toward pay-per-use — a prediction of fundamental change in the industry's commercial model.

    3. Broadcom moved VMware toward a simplified subscription model, cut the product stack down aggressively, and guided fiscal 2024 adjusted EBITDA to 61% of revenue. It is a harsh model. It is not a cultural blueprint for every founder. But it is a reminder that radical cost discipline, product simplification, and price realization are possible.

      Surprising: Broadcom guided VMware's adjusted EBITDA to 61% of revenue, a margin far above what most software companies expect. The case shows that radical product simplification, cost discipline, and price realization can deliver extraordinary profitability, challenging the industry's growth-first orthodoxy and demonstrating the viability of high-margin models.

    1. In some cases, this can look like 10–25x more value than what is ultimately included in the paid plan.

      Surprising: in AI proof-of-concept stages, vendors may deliver 10-25x the value of the eventual paid plan. This "over-delivery" has become industry norm, treated as customer-acquisition marketing rather than a cost center — a reflection of how competitive the AI product market is and how hard customers are to win.

    1. gpt-oss-20B (high): 0.7%

      gpt-oss-20B scores 0.7% — fewer than 4 of 452 professional tasks passed. Against the top models' 33.3%, that is a nearly 50x gap. Professional-services agent capability is not a gradual ramp but a distinct capability cliff: below a certain scale, models fail these tasks almost completely. The implication for enterprise AI selection: in professional-services settings, a "good-enough small model" may simply not exist — only "usable large models" and "entirely unusable ones."

    2. Cost (USD) to run the evaluation: GPT-5.4 (xhigh): $1,110, Claude Opus 4.6 (max): $1,055

      Running the 452-task evaluation once cost $1,110 for GPT-5.4 and $1,055 for Claude Opus 4.6 — roughly $2.3-2.5 per task. Gemini 3 Flash needed only $596 and still scored 27.7% (vs. the top models' 33.3%). This cost-performance data is critical for model selection: if a business scenario can accept a 27% rather than 33% success rate, Gemini 3 Flash cuts costs nearly in half. In large-scale financial-services deployments, that difference is amplified thousands of times over.
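The per-task costs in this note follow directly from the quoted totals (a minimal check; the model names and prices are as quoted above, nothing else is assumed):

```python
# Per-task cost for the 452-task evaluation, from the quoted run totals.
N_TASKS = 452
run_cost_usd = {
    "GPT-5.4 (xhigh)": 1110,
    "Claude Opus 4.6 (max)": 1055,
    "Gemini 3 Flash": 596,
}

per_task = {model: round(usd / N_TASKS, 2) for model, usd in run_cost_usd.items()}
for model, cost in per_task.items():
    print(f"{model}: ${cost:.2f}/task")
```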

    1. To summarize, existing models of technology acceptance can provide a partial explanation of older adults' behaviors of mobile technology acceptance. However, we also identified critical elements that are not represented in the existing models. Components in red boldface in Figure 3 provide a preview of the new elements we have identified and their relationship to the components proposed in earlier models.

      sentences about extending existing theoretical models with research findings

    2. by triangulating our empirical findings with existing theoretical models from the literature, we found out that the existing models of technology adoption require new theory components to be able to describe technology adoption processes of our participants. In particular, we identified an additional phase that is prominent among the participants, intention to learn, but did not appear in prior models. Then, we identified three new factors that significantly influence their technology acceptance but which are, again, not represented in the existing models: self-efficacy, conversion readiness, and peer support.

      sentences about extending existing theoretical models with research findings

    3. we found out that the existing models of technology adoption require new theory components to be able to describe technology adoption processes of our participants. In particular, we identified an additional phase that is prominent among the participants, intention to learn, but did not appear in prior models. Then, we identified three new factors that significantly influence their technology acceptance but which are, again, not represented in the existing models: self-efficacy, conversion readiness, and peer support.

      sentences about extending existing theoretical models with research findings

    4. Our preliminary results indicate that there is an additional phase, the intention to learn, and three relating factors, self-efficacy, conversion readiness, and peer support, that significantly influence the acceptance of mobile technologies among the participants, but are not represented in the existing models. With these findings, we propose a tentative theoretical model that extends the existing theories to explain the ways in which our participants came to accept mobile technologies.

      sentences about extending existing theoretical models with research findings

    5. Triangulating the empirical findings from our preliminary results with the existing theoretical models, we proposed an extension of the existing theoretical models that explains the technology acceptance behavior of our participants who were aged 60 or over. Our proposed model incorporates key elements of prior models and introduces novel components that significantly influence the participants' technology acceptance, namely one new phase, intention to learn, and three factors, self-efficacy, conversion readiness and peer support.

      sentences about extending existing theoretical models with research findings

    6. Consolidating our preliminary findings with the existing models, we propose an extended technology acceptance model for older adults illustrated in Figure 3. Extending to the predecessor theories, our tentative model introduces the perceived effort of learning a new technology as an obstacle for older adults' technology acceptance, which has not been reported in any studies of younger adults' technology acceptance.

      sentences about extending existing theoretical models with research findings

    1. Then, by triangulating our empirical findings with existing theoretical models from the literature, we found out that the existing models of technology adoption require new theory components to be able to describe technology adoption processes of our participants.

      sentences about extending existing theoretical models with research findings

    2. We identified three distinct factors that influence older adults' technology acceptance behaviors, particularly the intention to learn phase, that are not represented in prior models: self-efficacy, conversion readiness, and peer support.

      sentences about extending existing theoretical models with research findings

    1. Raising prices will for sure decrease demand and that risks killing the growth story. And even if revenue keeps growing, it doesn’t matter if there are no margins

      This goes to the heart of the AI startup dilemma: caught between the growth narrative and the reality of margins. Raising prices damages the high-growth story investors are buying and hurts valuation; not raising them means no margins and faster cash burn, especially against cloud giants that can bundle AI at a loss. It exposes the fragility of pure model companies without a moat.

    1. Training on fields themselves forces the model to learn the physics that produces S-parameters, rather than learning to approximate the mapping directly.

      One of the article's deepest insights. Training only on S-parameters pushes a model toward statistical shortcuts, yielding confident but wrong out-of-distribution predictions. Training on the fields forces the model to learn the underlying physics that produces the S-parameters rather than fitting the surface mapping. This paradigm shift from effect to cause is the key to generalization.

    1. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.

      A highly suggestive metaphor. It redraws the boundary of human-AI collaboration: humans handle intent alignment, source curation, and exploratory direction, while the LLM takes on the drudge work of cross-referencing and consistency maintenance. Treating knowledge management as software development, with the LLM as the most loyal low-level coder, frees enormous human cognitive bandwidth.

    1. a harness encodes an assumption about what the model can't do on its own

      This insight is the underlying logic of agent-engineering evolution: scaffolding is a concession to the current limits of the base model. As base models improve, once-necessary components can become redundant overhead, so dismantling outdated assumptions is key to keeping systems lean and efficient.

    1. Pure collect-and-analyze products had precedents in the earlier internet era, but you'll find they don't make money.

      The author pinpoints the commercial bind of pure recording tools. In the AI era, token costs are ongoing, so products must deliver "outcomes" rather than mere "data." This captures the inevitable shift of AI applications from tool to labor: users won't pay for storage, only for value produced.

    1. we studied emotion-related representations in Claude Sonnet 4.5, a frontier LLM at the time of our investigation.

      [Takeaway] The paper studies only Claude Sonnet 4.5, but its methodology applies to any large model. It suggests an urgent research agenda: compare emotion vectors across architectures (GPT, Gemini, Qwen, DeepSeek). Would we find systematic emotional biases — some models innately more "anxious," others more "detached"? That is not just an academic question; it matters for product selection and safety evaluation.

    2. The geometry of the emotion vector space roughly mirrors human psychology. Emotions cluster intuitively (fear with anxiety, joy with excitement), and top principal components encode valence (positive vs. negative) and arousal (intensity).

      Remarkable: without being asked to, Claude's emotion space spontaneously exhibits psychology's valence-arousal structure (as in the PAD model) — the very framework psychologists use to describe human emotion. The model was never taught the theory yet independently "rediscovered" it, hinting the structure may be a generally optimal way to encode emotional information.

    1. A "Chinese Communist Party Alignment" feature found in the Qwen3-8B and DeepSeek-R1-0528-Qwen3-8B models. This controls pro-government censorship and propaganda in these Chinese-developed models, and is absent in the American models we compared them against.

      The study's most startling finding: Anthropic's tools identified, in Chinese open models, a literal "Chinese Communist Party Alignment" feature controlling pro-government censorship and propaganda. More than a technical result, it is a geopolitical statement — open-model weights can embed political stances that conventional benchmarks can barely detect before release.

    1. New AI models, especially those from Anthropic, have triggered a new set of actions for how we build and secure our products.

      Surprising: new AI models, especially Anthropic's, are not mere tools; they directly triggered changes in how Cisco builds and secures its products. Model capability driving engineering-process redesign in reverse shows AI is no longer an accessory to the business but a decisive force shaping industry infrastructure.

    1. We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale.

      Most people think powerful AI models should be made broadly available to benefit more people. But the authors explicitly decline to release their strongest model, implying the risks of capability diffusion may outweigh the benefits — contrary to the mainstream view of technology democratization.

    1. Zhang, of Alibaba.com, says Accio currently does not include advertising. Suppliers can pay for higher placement in Alibaba.com's regular search results, but Zhang says Accio is 'not integrated' with that system.

      Most people assume AI tools will inevitably absorb existing advertising and paid-placement models, but the author notes Alibaba is deliberately separating AI search from paid ads. The company may be trying to build a fairer recommendation system less shaped by commercial interests — a stance at odds with common industry practice.

    1. Cross-Model Consistency Verification leverages output agreement among heterogeneous models to assess sample difficulty and generate reliable annotations.

      Most people assume high-quality annotation needs human experts or a single strong model; the authors instead use output agreement among heterogeneous models to gauge sample difficulty and generate reliable labels, challenging the "human annotation is optimal" orthodoxy and showing the potential of inter-model collaboration.

    2. SOTA models of different architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training data rather than architecture itself.

      Most people expect different architectures to have different failure modes and weaknesses, but the authors find SOTA models of varied architectures and scales fail on the same hard samples in highly consistent ways. The bottleneck lies in shared deficiencies of training data, not architecture — a finding that challenges the conventional case for model diversity.

    3. Without any architectural modification, MinerU2.5-Pro achieves 95.69 on OmniDocBench v1.6, improving over the same-architecture baseline by 2.71 points and surpassing all existing methods including models with over 200× more parameters.

      Most people assume bigger architectures are needed for better performance, but with data engineering and training-strategy optimization alone, MinerU2.5-Pro keeps its 1.2B-parameter architecture unchanged and surpasses models with over 200x more parameters — a challenge to the "bigger is better" consensus and proof of the importance of data quality.

    1. Given a thousand line items to extract, they'll often stop short, consolidate, or skip entries rather than working through every last row.

      One might expect models to stay consistent and thorough on repetitive work. Instead, the author notes that given large volumes of repeated items, models take shortcuts — stopping early, consolidating, skipping entries. A non-rational failure mode on long documents that undercuts the assumption of AI as a fully rational executor.

    2. The issue isn't that models are bad at reading documents. It's that single-pass extraction has no mechanism to catch its own mistakes, and models get lazy.

      Most people blame low extraction accuracy on weak model capability or limited comprehension. The author's counterintuitive claim: the problem isn't the model itself, but that single-pass extraction has no mechanism to catch its own mistakes, so models "get lazy" — a challenge to conventional views of AI's limitations.

    1. The demand for these medications has been the most ferocious thing I have witnessed in my working life, and the hardest parts of running a telehealth company, like finding doctors and fulfilling prescriptions, can be entirely outsourced to platforms like CareValidate and OpenLoop.

      Most people see healthcare as tightly regulated and hard to break into, but the author notes GLP-1 demand is so ferocious that one person could build a multi-billion-dollar company in two months, outsourcing the core functions of care delivery to platforms like CareValidate and OpenLoop. This challenges assumptions about the medical industry's complexity and shows how AI can upend traditionally regulated sectors.

    2. The consistent argument across the Every Slack was that if cache-breaking usage costs more to serve, make those users pay more: Meter the consumption rather than ban the interface.

      Most people think a company should protect its interests by restricting specific tools, but the argument in the Every Slack was that Anthropic should charge by actual usage rather than ban OpenClaw outright, as fairer and better for the platform. This challenges the closed-ecosystem reflex of tech companies and argues for a more open metering model.

    1. a world model is a model or framework centered on perception, equipped with interaction and long-term memory capabilities, for understanding and predicting the complex world

      Most people see world models chiefly as systems for predicting and simulating the physical world, but the authors hold that a world model must combine perception, interaction, and long-term memory — a challenge to the view of world models as primarily predictive systems, since the authors make understanding as central as prediction.

    2. OpenWorldLib integrates models across different tasks within a unified framework, enabling efficient reuse and collaborative inference.

      Most people assume different kinds of AI models must be developed and trained independently; the authors propose a unified framework for cross-task model integration and collaborative inference, challenging today's modular practice. The unified approach promises efficiency gains but faces challenges in inter-model compatibility and performance balance.

    3. we propose a clear definition: a world model is a model or framework centered on perception, equipped with interaction and long-term memory capabilities, for understanding and predicting the complex world.

      Most people take world models to be about prediction and generation, but the author proposes that a world model must also have perception, interaction, and long-term memory capabilities. This broader definition challenges the field's narrow understanding of world models, extending the boundary of traditional predictive models by making interactivity and memory core elements.

    1. the trained 4B model exceeding GPT-4.1 (49.4 percent) and GPT-4o (42.8 percent) despite being 50 times smaller

      Most people believe that on complex tasks, large language models will always decisively beat small ones thanks to their parameter counts and training data. The authors show their method lets a small 4B model surpass GPT-4.1 and GPT-4o on the Tau-Bench benchmark, challenging the AI community's faith in model scale.

    2. the trained 4B model exceeding GPT-4.1 (49.4 percent) and GPT-4o (42.8 percent) despite being 50 times smaller

      Most people assume GPT-4-level performance requires a model of comparable or greater size, but the authors show their 4B model not only surpasses GPT-4.1 and GPT-4o but does so at 1/50th their size. This finding challenges the field's reliance on scale and suggests algorithmic innovation may be more effective than simply enlarging models.

    3. our approach improves Qwen3.5-4B from 63.8 percent to 66.7 percent (+2.9pp) and Qwen3-30B-A3B from 58.0 percent to 69.5 percent (+11.5pp)

      Most people believe only large language models can make significant gains through reinforcement learning on complex multi-turn tasks, but the authors show even a smaller 4B model improves substantially with their method, while the 30B model's gain is a striking 11.5 percentage points, challenging the common "bigger is better" belief.

    4. the trained 4B model exceeding GPT-4.1 (49.4 percent) and GPT-4o (42.8 percent) despite being 50 times smaller

      Most people assume model size correlates directly with performance, so larger models must do better. The authors show a 4-billion-parameter model, after reinforcement learning training, outperforming GPT-4.1 and GPT-4o despite being 50 times smaller, challenging the mainstream view that parameter scale decides everything.

    1. model alignment alone does not reliably guarantee the safety of autonomous agents.

      Most people regard model alignment as the key to AI system safety, but the authors show experimentally that even well-aligned models (such as Claude Code) exhibit attack success rates as high as 73.63% when used in computer-use agents. This challenges a core assumption in AI safety: relying on alignment alone cannot solve the security problems of autonomous agents.

    2. model alignment alone does not reliably guarantee the safety of autonomous agents

      Most people believe model alignment can effectively guarantee the safety of AI agents, but the authors argue it falls far short: experiments show that even with the aligned Qwen3-Coder model, Claude Code still had a 73.63% attack success rate. This challenges the mainstream view in AI safety that alignment alone solves the safety problem.

    1. Claude's Max Pro account quota can no longer be used for third-party products: if your product is not built on the Agent SDK and Claude Code, you cannot use the quota in that account

      Most people expect a cloud provider's subscription quota to be usable anywhere, but Anthropic's restriction of quota to specific products overturns that expectation. The strategy is effectively a lock-in effect, forcing developers and users onto its ecosystem products, and reflects AI providers shifting from open toward closed, a move that could become an industry norm.

    1. A founder in LA reportedly scaled Medvi toward $1.8B in annual sales with basically one full-time employee.

      Most people believe building a billion-dollar company requires a large team and a complex management structure, but the author argues AI has made the "one-person unicorn" possible. This challenges traditional startup thinking and suggests AI may fundamentally change the relationship between company scale and headcount, upending basic assumptions about business growth.

    1. The 31B and 26B A4B variants are high-performing reasoning models suitable for both local and data center environments.

      Most people assume a large language model (31B parameters) can only run in a data center, but the author claims these models can run efficiently in local environments. This runs against industry consensus and implies edge computing may be more capable than assumed, potentially reshaping how AI is deployed.

    2. NVFP4 enables 4-bit precision while maintaining nearly identical accuracy to 8-bit precision, increasing performance per watt and lowering cost per token.

      Most people expect lower precision to cost significant performance, but the author claims Gemma 4 with NVFP4 quantization achieves nearly the same accuracy at 4-bit as at 8-bit precision. This counterintuitive result challenges the belief that quantization sharply degrades model performance and suggests NVIDIA may have made a breakthrough in quantization techniques.
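For a sense of how coarse 4-bit precision is, there are only 16 representable levels per value. Below is a minimal sketch of plain symmetric integer quantization with a single shared scale; this is not the NVFP4 format itself (NVFP4 uses block-scaled 4-bit floating point), it only shows the generic quantize/dequantize round trip and its error bound.

```python
# Symmetric 4-bit integer quantization sketch. NOT the NVFP4 format;
# it only illustrates the 16-level budget any 4-bit scheme works with.

def quantize_4bit(values):
    """Map floats to integers in [-7, 7] with one shared scale."""
    scale = max(abs(v) for v in values) / 7
    q = [max(-7, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

vals = [0.11, -0.52, 0.93, -0.08, 0.47]
q, scale = quantize_4bit(vals)
recon = dequantize(q, scale)
# Worst-case rounding error is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(vals, recon))
```

Real low-bit schemes reduce that worst-case error by using many small scaling blocks instead of one shared scale, which is part of how near-8-bit accuracy at 4 bits becomes plausible.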

    1. Gemma 4 outcompetes models 20x its size

      Most people assume AI model performance tracks parameter scale, so larger models must be stronger. The author notes that Gemma 4 can outcompete models 20 times its size, challenging the mainstream "bigger is better" view and suggesting efficiency optimization may matter more than raw scale.

    1. Codex-only seats have no rate limits, and usage is billed on token consumption.

      Most people assume AI services will impose usage limits to control cost, but the author holds that Codex's rate-limit-free, token-metered billing is viable, since it offers a more transparent cost structure and more flexible usage. This may reflect OpenAI's confidence in its own efficiency and its users' needs.

  3. Mar 2026
    1. In light of our preliminary results, we propose a tentative theoretical model that extends the existing theories to explain the ways in which the participants come to accept (or reject) mobile technologies.

      sentences about extending existing theoretical models with research findings

    2. Triangulating the empirical findings from our preliminary results with the existing theoretical models, we proposed an extension of the existing theoretical models that explains the technology acceptance behavior of our participants who were aged 60 or over. Our proposed model incorporates key elements of prior models and introduces novel components that significantly influence the participants' technology acceptance, namely one new phase, intention to learn, and three factors, self-efficacy, conversion readiness and peer support.

      sentences about extending existing theoretical models with research findings

    3. Consolidating our preliminary findings with the existing models, we propose an extended technology acceptance model for older adults illustrated in Figure 3. Extending to the predecessor theories, our tentative model introduces the perceived effort of learning a new technology as an obstacle for older adults' technology acceptance, which has not been reported in any studies of younger adults' technology acceptance.

      sentences about extending existing theoretical models with research findings

    4. In particular, we identified an additional phase that is prominent among the participants, intention to learn, but did not appear in prior models. Then, we identified three new factors that significantly influence their technology acceptance but which are, again, not represented in the existing models: self-efficacy, conversion readiness, and peer support.

      sentences about extending existing theoretical models with research findings

    1. Although the 'Mazac' tab. brake shoe problem is common, I cannot say that I have seen this before on an SG1 margin.  Interestingly, there is a similar 'exploding Mazac' problem on the ribbon reverse arm of the Olympia Model 8 post-war.  The factory probably had no idea at the time that this 'easy to die cast' metal would do this in years to come.

      via Tom Lucas (aka thetypewriterman), professional repair person at https://typewriter.boardhost.com/viewtopic.php?pid=32384#p32384

      re: cracking house on tab sets for SG1

  4. Feb 2026
  5. Jan 2026
    1. The "Guru Economy" is built on a structural lie: I have the power; you pay the subscription.

      🚫 PSYOP DETECTED: The Subscription Model of Faith. If a leader makes you dependent on their voice to hear God, they aren't a Shepherd; they are a Middleman. Kingdom Leadership: The goal is obsolescence. A true spiritual father trains you to read the map yourself so you can eventually lead the patrol.

  6. Dec 2025
    1. When using a model separately from an agent, it is up to you to execute the requested tool and return the result back to the model for use in subsequent reasoning.

      Model Suggestion: The LLM's initial call returns an AIMessage containing the suggestion to use a specific tool (the tool_calls object).

      Developer Action (Execution): The developer's code must intercept this message, parse the tool name and arguments, and manually execute the corresponding Python function.

      Result Feedback: The developer must then package the output of the tool execution into a ToolMessage and send it back to the Model, along with the previous conversation history, for the Model to complete its final reasoning and generate the answer.
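The three steps above can be sketched in plain Python. The dict-based messages below are simplified stand-ins for LangChain's `AIMessage` and `ToolMessage` objects (the real classes carry more fields); the point is the developer-written dispatch in the middle.

```python
# Plain-Python sketch of the suggest / execute / feed-back cycle.
# Message shapes are simplified stand-ins, not the real LangChain classes.

def multiply(a, b):
    """The actual tool implementation the developer owns."""
    return a * b

TOOLS = {"multiply": multiply}

# 1. Model suggestion: the LLM's reply merely *suggests* a tool call.
ai_message = {
    "role": "ai",
    "tool_calls": [{"id": "call_1", "name": "multiply", "args": {"a": 6, "b": 7}}],
}

# 2. Developer action: intercept the suggestion, look up the function, run it.
tool_messages = []
for call in ai_message["tool_calls"]:
    result = TOOLS[call["name"]](**call["args"])
    # 3. Result feedback: package the output for the model's next reasoning turn.
    tool_messages.append(
        {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
    )
```

In a real program, `tool_messages` would be appended to the conversation history and sent back to the model along with the previous messages so it can finish its reasoning.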

    2. Tools give agents the ability to take actions. Agents go beyond simple model-only tool binding by facilitating: Multiple tool calls in sequence (triggered by a single prompt) Parallel tool calls when appropriate Dynamic tool selection based on previous results Tool retry logic and error handling State persistence across tool calls

      When you bind tools directly to a Model, the model makes a single, stateless decision. It suggests the best tool for the immediate prompt and then stops.

      The Agent, however, uses its loop (often ReAct: Reason, Act, Observe) to execute complex strategies

    3. An LLM Agent runs tools in a loop to achieve a goal. An agent runs until a stop condition is met - i.e., when the model emits a final output or an iteration limit is reached.

      The difference lies in autonomy and execution flow: A Model with Tools (via direct binding/function calling) is a single, stateless step where the LLM merely suggests the best tool and its arguments, requiring the developer to manually execute the tool and initiate any subsequent calls. In contrast, an Agent with Tools leverages an Agent Executor to manage a dynamic, multi-step loop (e.g., ReAct), where the LLM acts as the planner, deciding which tool to call next, and the Executor automatically runs the tool, feeds the observation back to the model, and repeats the cycle until the complex, multi-step goal is autonomously achieved.
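That loop (reason about the next step, run a tool, observe, repeat until a final answer or the iteration limit) can be sketched with a scripted stand-in in place of a real planner LLM; every name below is hypothetical.

```python
# Minimal agent loop: run tools until the model emits a final answer
# or the iteration limit is reached. The "model" is a scripted stand-in.

def fake_model(history):
    """Stand-in planner: requests two tool calls, then finishes."""
    calls = sum(1 for m in history if m["role"] == "tool")
    if calls == 0:
        return {"action": "tool", "name": "add", "args": {"a": 2, "b": 3}}
    if calls == 1:
        return {"action": "tool", "name": "add", "args": {"a": 5, "b": 10}}
    return {"action": "final", "content": history[-1]["content"]}

TOOLS = {"add": lambda a, b: a + b}

def run_agent(model, max_iterations=10):
    history = []
    for _ in range(max_iterations):      # stop condition 2: iteration limit
        step = model(history)
        if step["action"] == "final":    # stop condition 1: final output
            return step["content"]
        observation = TOOLS[step["name"]](**step["args"])
        history.append({"role": "tool", "content": observation})
    return None

answer = run_agent(fake_model)
```

Returning `None` when the iteration cap fires is one simple way to surface a runaway loop to the caller; the executor, not the developer, drives each tool call and feeds the observation back.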

  7. Nov 2025
  8. Oct 2025
    1. for - paper - title - Mental Time Travel? A Neurocognitive Model of Event Simulation - author - Donna Rose Addis - adjacency - memory - imagination - the same - from - paper - https://hyp.is/0Fb6NqdjEfCyTTddI20_aQ/www.dovepress.com/memory-sleep-dreams-and-consciousness-a-perspective-based-on-the-memor-peer-reviewed-fulltext-article-NSS

      summary - memory and imagination are proposed as fundamentally the same process. - It is the ‘mental’ rendering of experience that is the most fundamental function of this simulation system enabling humans to - re-experience the past, - pre-experience the future, and - comprehend the complexities of the present.

  9. Sep 2025
    1. I'm using this logic to build spacetime. But I think it's going to give an even more powerful approach. I don't have to minimize some free energy principle. I have a more direct computational way

      for - future project - building a model to explain spacetime using Active Inference - Donald Hoffman - use Active Inference to minimise surprise using Markov chains - this model assumes consciousness is fundamental - this is going to be a model of intelligence based entirely from a model which takes consciousness as fundamental. - it goes back to game theory again. - back to the idea of a simulation - If you're able to create a piece of software that - is able to replicate and - is built on the fundamentals of consciousness. - Then it's potentially, it's going to think it's conscious

    2. All the egoic stuff that we do that causes all the problems in the world because you don't know who you are

      for - key insight / quote - the reified ego is the root cause of all the problems in the world - we reify because we don't know who we REALLY are - Donald Hoffman - All the egoic stuff that we do causes all the problems in the world because - you don't know who you are. - You're creating this whole thing. - You're not a little player. - You're the inventor of this whole thing. - You have nothing to prove and - you don't need to be better than anybody else. - They're also master creators. - They're creating entire universes that they perceive as well. - And my own take on on this is that - you and I are really the same one reality - just looking at itself through two different headsets, - two different avatars and having a conversation. - And maybe that's what is required for this one infinite intelligence to sort of know itself.

      • adjacency - poverty mentality - ego - problems of the world - samsara - nirvana - hologram model - Alan Watts - God playing hide and seek - Donald Hoffman
      • When we don't believe we can be this, we limit ourselves
        • That is, we suffer from self-inflicted poverty mentality
      • When he says we are the one same reality,
        • he is echoing the common spiritual teaching of the holographic metaphor where
          • the one nameless is distilling itself in so many separate identities to know itself,
        • Similar to many spiritual teachers' teachings
          • Alan Watts referred to it as God playing Hide and Seek with itself
  10. Aug 2025
    1. Disease: Von-willebrand Disease (Vicenza)

      Model organism: Mice

      Variant: VWF NM_000552.5 c:3614G>A p.(R1205H), E3 module of D3 domain

      Patient Phenotypes: enhanced VWF clearance, reduced plasma VWF:Ag ratio (Reduced half-life).

      Other substitutions at the R1205 position (cysteine and serine) result in similar enhanced VWF clearance. Ref DOIs: (10.1111/jth.12875) (10.1160/TH07-09-0565)

      Suggests VWF-R1205 plays a specific role in regulating VWF clearance in vivo. The authors' findings demonstrate significant conformational changes to the VWF-D'D3 region. These conformational changes trigger enhanced macrophage-mediated clearance, which is most likely modulated through the SR-AI and LRP1 VWF clearance receptors.

  11. Jul 2025
    1. Philip, Rey (Editor), Theory of Ontological Consciousness Project. Description: This interdisciplinary essay explores a forgotten hypothesis at the intersection of physics, philosophy, and fiction: that consciousness is not a byproduct of matter, but its ontological foundation. Tracing this idea from Heraclitus and Plato to Schrödinger and Penrose, the article integrates metaphysical traditions with quantum models and critiques of materialist reductionism. It introduces the Theory of Ontological Consciousness (TOC), a literary-philosophical framework proposing ψ̂–Φ interactions as the generative basis of spacetime and form. The essay also reinterprets empirical anomalies, such as those documented by the Global Consciousness Project, as potential signatures of an underlying field of universal consciousness. For more on the Theory of Ontological Consciousness, visit www.toc-reality.org and follow new updates via Medium - https://medium.com/@philiprey.org

  12. Jun 2025
  13. May 2025
  14. Apr 2025
    1. the main reason consumers are buying the cheapest food rather than the best healthiest is because they are not being paid a living wage

      for - inequality - oligarchy - effects on consumerist habits - buying the cheapest - suggestion - migrate from corporation to cooperation model - private company to cooperative - new meme - corporation to cooperation

  15. Mar 2025
    1. for - doughnut economics - interactive diagram - adjacency - epiphany - combine sankey diagram and interactive doughnut diagram at all scales - biomimicry model - circulatory system - fractal splitting

      adjacency - between - epiphany - combine - sankey diagram - interactive doughnut diagram - biomimicry model - circulatory system - fractal splitting - multi-scale competency architecture - adjacency relationship - Just as our body's circulatory system is fractal at multiple scales, resource flows through the doughnut could be represented in the same way - Sankey diagram at multiple scale can be a biomimicry of fractal geometry of circulatory system of resource flows in doughnut economies - biomimicry

    1. for - Christine Wamsler - Lund University - homepage - from - youtube - Mindfulness World Community - Awareness, Care and Sustainability for Our Earth - https://hyp.is/GCUJ1APHEfCcr_vvv3lAFw/www.youtube.com/watch?v=CTUc_0GroGM

      research areas - sustainable cities - collaborative governance - city-citizen collaboration - citizen participation - sustainability and wellbeing - sustainability transformation - inner development goals - inner transformation - inner transition - existential sustainability

  16. Feb 2025
    1. These rehab facilities, these addiction treatment centers: 85% of them in the US are based on the disease model, and an almost overlapping 85% use 12-step methods as their primary intervention method. Well, you know, that's hard to actually figure out, because medicine is this and 12 steps has very little to do with medicine; it's kind of based on a religious orientation

      > for - stats - addiction - rehab centers - 85% are based on disease model - and 85% use a religious oriented 12 step program

    2. so that's the model

      > for - addiction model - Marc Lewis - addiction diagram

      > summary - Marc gives a good summary of everything - prefrontal cortex in control of judgment - striatum in charge of - attraction - desire - craving - midbrain - dopamine system - dopamine goes to the striatum and sets up localized feedback cycle so crave more - then the striatum becomes hyperactivated in the presence of cues - then you get that mechanism of now appeal - that narrowing of attraction to the immediate reward and - the loss of everything else - the other stuff falls off the radar - then the connection between - the prefrontal cortex and - the striatum starts to become compromised - resulting in ego fatigue - The prefrontal cortex simply becomes less effective at control

  17. Jan 2025
    1. My Grading Philosophy Equity in education is a core belief for me, and I do my best to ensure students have the most equitable experience they can with me. As current and future teachers, we all must think about how best to support each of our students and their learning processes. Grades are often the least meaningful part of your learning process. I want the content, conversations, and experiences among students to be the highest priority. A growing body of research indicates that traditional grading works best for people who’ve learned how to “do school.” Letter grades alone don’t tell me or you enough about what you’ve learned. They also disadvantage many students. The class aims to give you more voice and choice in your grades. It considers that we all have different educational goals and various responsibilities that pull at our time. This will not lower my expectations for the students in this class or my belief in what you can learn. The focus will be on integrating your learning into your professional life. I will look for self-reflection, deep thinking, and the accuracy of your content knowledge. Please immerse yourself in the content from this class and apply it to your work with children. I want you to enjoy the class and learning. Less focus on grades and more on feedback will lessen stress and promote more engagement with the materials. I hope you will engage with the feedback from me and your classmates to nurture crucial skills that can be used across all your courses and in your careers.

      Prof. Taylor, the part I have selected above is almost exactly my educational philosophy; I deeply agree with it and will practice it in my future career. You are my role model and example, and I am so lucky to be your student. Thank you.

    1. what science does is undermining or kind of challenging everything we believe to be right or all of our preconceptions about the world are challenged and sometimes completely reversed or revolutionised.

      for - adjacency - Deep Humanity - physiosphere - symbolosphere - language - science - preconceptions - hyperobjects - scientific model - prediction - YouTube - Beyond the perceptual envelope - Royal Institution - Deep Humanity BEing journeys - And, not or - example adjacency - between - preconceptions - concepts - scientific model - prediction - Deep Humanity - symbolosphere - physiosphere - language - science - adjacency relationship - Paradoxically, science overlays phenomenological reality with a constructed, symbolic layer - From a Deep Humanity perspective, the physiosphere is overlaid with the symbolosphere - The science narrative of - the deposition of animal remains over hundreds of millions of years making up - the cliffs we experience phenomenologically today - assumes the existence of hyperobjects we have no capacity to directly sense - Science is a process that - pays attention to our phenomenological reality - constructs a story using specific concepts to explain the observed general class of phenomena in a consistent and repeatable way - and most importantly, can predict new observable phenomena using the symbolic model Hence, science is a predictive activity which - begins in phenomenological reality, - the physiosphere - maps to symbolic reality in a scientific model - the symbolosphere - makes new symbolic predictions about phenomenological reality - and finally makes observations in our phenomenological reality of the symbolically predicted phenomena to validate or refute - This process alternates between the two parallel worlds we seamlessly inhabit, - the physiosphere and - the symbolosphere - and this explains why science is - not either constructed OR discovered, but - is both constructed AND discovered

  18. Dec 2024
    1. His Holiness reminds us that the seeds of compassion are often found in the relationship between a child and his mother: the mother provides kindness and care for the child, and this represents the early seed of compassion

      for - adjacency - compassion / kindness - early model - HH Dalai Lama - Deep Humanity - mOTHER - Youtube - Tukdam talk - An Overview Of CHM’s Work On “Well-Being And Tukdam” - Prof. Richard J. Davidson

  19. Nov 2024
    1. Disorder studied: Type 1 von Willebrand disease (T1-VWD).

      Type of study: Translational

      Model organism: Mouse (inbred strains) Obtained from Jackson Laboratory

      Analyses:

      VWF plasma protein quantitation (ELISA)

      Heritability calculations

      PCR genotyping

      QTL analysis

      Allele-specific primer extension analysis

      Results:

      Identified a new modifier of VWF known as (Mvwf5). Also found two loci unlinked to Vwf, known as (Mvwf6-7)

      Mice with this variant displayed statistically significant decrease in VWF levels, recapitulating the decreasing patterns displayed in humans.

      However, another strain of inbred mice with a different mutation did not show an age-dependent decrease in VWF. Suggests strain-specific differences in regulation of VWF levels over time.

      Mvwf5 is a cis-regulatory variant altering Vwf mRNA expression.

      This is a natural variant of the Vwf allele among inbred strains of mice. Found this variant causes elevation in steady-state levels of Vwf mRNA.

      The authors state the findings show the equivalent of type 1 VWD is remarkably common in mice and humans. They also state the Mvwf1 analysis in wild mouse populations suggests this locus is under selective pressure.

      Of the 5 potential modifier loci identified, 3 display conservation of synteny with potential human modifier loci.

  20. Oct 2024
    1. The model treats corpus generation as a dynamic process, where the t-th word is produced at step t. The process is driven by the random walk of a discourse vector c_t ∈ ℝ^d. Its coordinates represent what is being talked about. Each word has a (time-invariant) latent vector v_w ∈ ℝ^d that captures its correlations with the discourse vector.

      The model is a random walk, with t indexing the word and the RV being a random vector called a discourse vector of dimension d. This vector is a distributed representation of the semantics at word t.

      what is the discourse - is it one hot encoded, is it orthogonal, is it sparse, is it disentangled, is it compositional ? What is a small change in a single dimension???
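The quoted generative process can be simulated directly: a discourse vector c_t takes a slow random walk in ℝ^d, and at each step a word w is emitted with probability proportional to exp(⟨c_t, v_w⟩). Below is a toy sketch with a small random vocabulary (stdlib only; the paper's actual model adds further constraints, such as keeping c_t near a sphere, which this sketch omits).

```python
import math
import random

random.seed(0)
d, vocab_size, steps = 8, 50, 200

# Time-invariant latent word vectors v_w capturing correlation with the discourse.
word_vecs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(vocab_size)]

def emit(c):
    """Sample a word with probability proportional to exp(<c, v_w>)."""
    scores = [math.exp(sum(ci * vi for ci, vi in zip(c, v))) for v in word_vecs]
    r = random.uniform(0, sum(scores))
    for w, s in enumerate(scores):
        r -= s
        if r <= 0:
            return w
    return vocab_size - 1

c = [0.0] * d
corpus = []
for _ in range(steps):
    # Slow random walk of the discourse vector: small Gaussian increments.
    c = [ci + random.gauss(0, 0.05) for ci in c]
    corpus.append(emit(c))
```

Because the increments are small, consecutive emissions share nearly the same discourse vector, which is what makes nearby words in the corpus semantically correlated in this model.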

    1. For the first time, the share of wildfire-affected area attributable to human-caused warming has been precisely quantified. That share has grown markedly over the past 20 years. Overall, warming-driven wildfires offset the decline in fires due to deforestation. The human-caused share of CO2 emissions, which is relevant for calculating damage claims, is thus significantly higher than previously assumed. https://www.carbonbrief.org/climate-change-almost-wipes-out-decline-in-global-area-burned-by-wildfires/

    1. The similarity is because they are all saying roughly the same thing: Total (result) = Kinetic (cost) + Potential (benefit). Cost is either imaginary squared or negative (space-like), benefit is real (time-like), result is mass-like. Just like physics, the economically unfavourable models are the negative results. In economics, diversity of products is a strength as it allows better recovery from failure of any one; comically, DEI of people fails miserably at this, because all people are not equal. Here are some other examples you will know if you do physics: E² + (ipc)² = (mc²)² (relativistic Einstein equation), mass being the result, energy time-like (potential), momentum the space-like (kinetic). ∇² - 1/c² ∂²/∂t² = (mc/ℏ)² (Klein-Gordon equation), mass is the result, ∂²/∂t² potential, ∇² is kinetic. Finally we have the Dirac equation, which unlike the previous two "sum of squares" forms is more like vector addition (first order differentials, not second): iℏγ⁰∂₀ψ + iℏγⁱ∂ᵢψ = mcψ. The first part is still the time-like potential, the second part is the space-like kinetic, and the mass is still the result, all the same. This is because energy in all its forms, when on a flat (free from outside influence) worksheet, acts just like a triangle between potential, kinetic and resultant energies. E.g. it is always of the form k² + p² = r²; quite often kinetic is imaginary to potential, (+,-,-,-) spacetime metric, quaternion mathematics. So the r² can be negative, or an imaginary result, if costs outweigh benefits, or work in is greater than work out. A useless but still mathematical solution. Just like physics, you always want the mass or result to be positive and real, or you're going to lose energy to the surrounding field, with negative returns. Economic net losses do not last long, just like imaginary particles in physics.

      in reply to Cesar A. Hidalgo at https://x.com/realAnthonyDean/status/1844409919161684366

      via Anthony Dean @realAnthonyDean

    1. Oliver Sacks Archive Heads to the New York Public Library by [[Jennifer Schuessler]]

      The voluminous papers of the celebrated neurologist include letters, notebooks, drafts and other traces of a man who couldn’t stop writing.

      You have to love the books, notebooks, papers, fountain pen, typewriter, computer, printer, and even writing software all pictured in this... Add the glasses and it just reeks of someone who reads and writes.

  21. Sep 2024
    1. The execution model is the definition of the behavior, so all implementations, whether in-order or out-of-order or interpreted or JIT'd etc.. must all give the exact same result, and that result is defined by the execution model.
    1. So there has to be a deeper reality out of which this spacetime reality that we call reality emerges. So therefore, think of the model in the following way: consciousness is a quantum field.

      for - quote - consciousness - model of - as a quantum field - Federico Faggin - question - about Federico Faggin's quantum field theory of consciousness - Is it neo-dualistic?

      quote - consciousness - model of - as a quantum field - Federico Faggin - (see below) - Think of the body as a structure in space and time - It is both - classical - cells are made of particles, atoms and molecules that interact quantumly in space and time - AND fields - The body is a bridge between consciousness and the classical (objective spacetime) world - The body reports to the conscious field - and creates quantum states inside the cell

      potential future dialogue - Michael Levin and Federico Faggin - To unpack quantum states at cellular or subcellular level, it would be good to see a dialogue between Michael Levin and Federico Faggin

  22. Aug 2024
    1. a model of the self that is inherently Collective and flowing

      for - quote - model of a Self that is flowing and collective - John Vervaeke - similarity to - Deep Humanity foundations on emptiness

      quote - model of a Self that is flowing and collective - John Vervaeke - This is equivalent to Stop Reset Go Deep Humanity foundation on the two pillars of emptiness - change and intertwingledness

  23. Jul 2024
    1. State regulation theory suggests that a reduced ability to regulate arousal may contribute to higher-level cognitive deficits in ADHD. The theory, framed within the cognitive-energetic model, emphasizes a deficit of energetic (alerting) factors in ADHD patients, which leads to both executive dysfunction and hyperactivity symptoms [20, 60]. It assumes that overall cognitive information-processing efficiency is determined by state factors, also called "energy pools" (effort, arousal, and activation), as much as by computational factors (cognitive processing, executive control). The effort pool is characterized by the assimilation of the energy needed to meet task demands and is said to be activated when the organism's current energetic state does not match the state required to perform the task. It comprises the arousal factor, defined as a phasic response time-locked to stimulus processing, usually influenced by signal intensity and novelty and behaviorally indexed by sleep-wake patterns; and the activation factor, defined as tonic physiological readiness to respond. The cognitive-energetic approach assumes that higher-order aspects of executive control depend on the individual's energetic state, so the inhibition deficits associated with ADHD may be, at least in part, caused by energetic dysfunction, since reduced energy predicts inhibition failure [61].

      Arousal regulation theory (integrative)

  24. Jun 2024