Applying compute that would otherwise go unutilized to predict and verify additional tokens in parallel (up to three in this implementation) increases throughput at high interactivity.
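Most people assume compute should be spent on the task at hand, but the author proposes using otherwise underutilized compute to predict additional tokens in parallel. This challenges the conventional view of how compute should be allocated and hints at new possibilities for AI compute efficiency.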
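The predict-and-verify idea can be illustrated with a minimal sketch. It assumes a greedy draft-and-verify scheme, which is one common way to realize this kind of parallel token prediction; the excerpt does not say how the author's implementation predicts or verifies tokens. The toy models `target_next` and `draft_next`, the helper names, and the constant `MAX_EXTRA_TOKENS` are all hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Token = int
NextTokenFn = Callable[[Sequence[Token]], Token]

# The excerpt mentions up to three additional tokens per step.
MAX_EXTRA_TOKENS = 3


def target_next(context: Sequence[Token]) -> Token:
    """Toy stand-in for the full model's greedy next-token choice (hypothetical)."""
    return (sum(context) + len(context)) % 7


def draft_next(context: Sequence[Token]) -> Token:
    """Toy cheap predictor: agrees with the target except at some positions."""
    tok = target_next(context)
    return tok if len(context) % 4 != 0 else (tok + 1) % 7


def speculative_step(context: Sequence[Token], target: NextTokenFn,
                     draft: NextTokenFn, k: int = MAX_EXTRA_TOKENS) -> List[Token]:
    """Return between 1 and k + 1 tokens that the target model agrees with."""
    # 1. Predict: cheaply propose k tokens ahead.
    proposed: List[Token] = []
    ctx = list(context)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. Verify: compare each proposal with the target's own choice.
    #    A real system would check all positions in one batched pass,
    #    which is where the spare compute gets used; here it is a loop.
    accepted: List[Token] = []
    ctx = list(context)
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            accepted.append(expected)  # keep the target's token, drop the rest
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every proposal matched, so the target contributes one more token.
    accepted.append(target(ctx))
    return accepted


def generate(prompt: Sequence[Token], n_tokens: int, target: NextTokenFn,
             draft: NextTokenFn) -> Tuple[List[Token], int]:
    """Generate n_tokens and report how many verification steps were needed."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(out, target, draft))
        steps += 1
    return out[:len(prompt) + n_tokens], steps


def generate_baseline(prompt: Sequence[Token], n_tokens: int,
                      target: NextTokenFn) -> List[Token]:
    """Plain one-token-per-step decoding, for comparison."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target(out))
    return out


if __name__ == "__main__":
    tokens, steps = generate([1, 2, 3], 12, target_next, draft_next)
    print(tokens == generate_baseline([1, 2, 3], 12, target_next), steps)
```

In this sketch the output is identical to ordinary one-token-at-a-time decoding, because every kept token is one the target model would have chosen anyway. Any gain comes from needing fewer sequential steps, and it shrinks as the cheap predictions become less accurate.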