These environments demand multi step reasoning, the chaining of multiple skills over many timesteps, and robust decision making under [delayed rewards](https://huggingface.co/papers?q=delayed%20rewards) and [partial observability](https://huggingface.co/papers?q=partial%20observability).
这些环境要求多步推理、在多个时间步长中连锁多个技能,以及在延迟奖励和部分可观测性下的稳健决策,这突显了长期交互环境对智能体能力的挑战。