In the cloud, autoregressive models can batch large numbers of compute jobs from multiple users so they're always churning out tokens, and the high bandwidth memory (HBM) used in these systems can move data around much more efficiently.
文章的核心论点之一,解释了为什么扩散模型更适合本地处理而非云端。这一技术分析值得深入了解,因为它可能影响未来AI模型架构的发展方向。