Consider the deep anatomy of an individual AI session to understand how this telemetry actually works in practice.
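The author calls for a close look at the internal structure of a single AI session in order to better understand how AI is used and measured.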
you can run an inferior model for an infinite number of tokens, and it will never realize(*) that the lack of validation of the start window, if put together with the integer overflow, then put together with the fact the branch where the node should never be NULL is entered regardless, will produce the bug.
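Using the OpenBSD SACK bug as an example, the author offers a surprising finding: a weak model cannot grasp the causal chain behind a complex vulnerability, no matter how long it runs. This points to a fundamental limitation of AI in understanding complex system interactions and challenges the assumption that "unlimited compute can solve any problem".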
Meta feels AI models don’t understand how people use computers, so the company needs real-life examples of how meatbags click their way through a working day so it can build agents.
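Most people assume AI models understand human behavior well, but the author points out that Meta believes AI models do not understand how people use computers, which challenges the common perception of AI technology.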
We found weak evidence that Opus 4.0 and 4.1 had partially memorized cal, but no evidence Opus 4.6 had memorized it, despite performing best of all models considered.
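This finding is surprising because the best-performing model showed no sign of memorization. It may indicate that the latest AI models rely more on genuine understanding and reasoning than on simple recall when solving complex problems, which offers a new perspective for evaluating AI capabilities.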
5.12 Be cautious about trusting AI without having deep understanding.