Diffusion models also waste resources when the desired output is only a few tokens long. They must do far more parallel work to whittle a response down to, say, five tokens, which an autoregressive model produces from beginning to end in just five steps.
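The article objectively points out the limitation of diffusion models in short-text generation, which shows a balanced perspective. It is worth looking more closely at how efficiently diffusion models perform across different task lengths, and at whether Google has optimized for this limitation.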
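The trade-off described above can be illustrated with a rough count of work. The sketch below is a toy cost model, not a description of any real system: it assumes a diffusion model refines a fixed-length block of `block_len` positions for `num_steps` denoising steps, and that an autoregressive model runs one cached forward pass per generated token. The names `Cost`, `autoregressive_cost`, `diffusion_cost`, and the values 256 and 16 are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    sequential_steps: int  # forward passes that must run one after another
    position_updates: int  # token positions computed across all passes


def autoregressive_cost(output_len: int) -> Cost:
    # Assumption: one pass per generated token, each producing one new
    # position (earlier tokens are cached and not recomputed).
    return Cost(sequential_steps=output_len, position_updates=output_len)


def diffusion_cost(block_len: int, num_steps: int) -> Cost:
    # Assumption: every denoising step refines the whole block in parallel,
    # regardless of how many positions end up in the final answer.
    return Cost(sequential_steps=num_steps,
                position_updates=block_len * num_steps)


if __name__ == "__main__":
    BLOCK_LEN, NUM_STEPS = 256, 16  # hypothetical values
    for output_len in (5, 256):
        print(output_len,
              autoregressive_cost(output_len),
              diffusion_cost(BLOCK_LEN, NUM_STEPS))
```

Under these assumed values, a five-token answer costs the autoregressive model 5 sequential steps and 5 position updates, while the diffusion model spends 16 steps and 4096 position updates. For a 256-token answer the picture flips on latency: the diffusion model still needs 16 sequential steps, against 256 for the autoregressive model. Real systems may differ substantially, for example if the block length or step count adapts to the task.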