1 Matching Annotations
  1. May 2026
    1. Rather than supervising what the agent does, we supervise what it's _able_ to do by enforcing access boundaries through, for example, sandboxes, virtual machines, and egress controls.

      行动建议:为AI代理系统实施环境层边界控制,使用沙盒、虚拟机和出口控制技术限制代理的访问能力,而不是仅仅依赖行为监督。这种方法能够从根本上限制代理可能造成的损害范围,即使模型层防御失效。