AI Village gives multiple AI agents their own computer environments and a shared group chat, then tasks them with open-ended real-world goals like fundraising, organizing events, making games, and gaining subscribers.
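This case shows open-world evaluation in practice, and its cost of roughly $50,000 per year indicates that such evaluation requires substantial resources. Compared with traditional benchmarks, this approach is closer to real-world application scenarios, but it is also more expensive as a result and difficult to run at scale.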
