Hypothesis

Rather than supervising what the agent does, we supervise what it's _able_ to do by enforcing access boundaries through, for example, sandboxes, virtual machines, and egress controls.

行动建议：为AI代理系统实施环境层边界控制，使用沙盒、虚拟机和出口控制技术限制代理的访问能力，而不是仅仅依赖行为监督。这种方法能够从根本上限制代理可能造成的损害范围，即使模型层防御失效。

actionable security-containment how-to

Tags

Annotators

URL