Leitersdorf thinks the consistency issue might be partially solved in the model's next version, which will allow users to start generating worlds based on a video of an environment rather than an image.
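Most people assume AI world models should generate complex scenes from text or a single simple image, but Leitersdorf's remark points to a different direction: generating environments from video input. This challenges the prevailing paradigm in AI generation. It suggests that video may be a better foundational input for world models than a static image, and it runs against the industry consensus that text is the primary input.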