In pixel-native generation, more inference often means sampling more outputs: generate twenty images, pick the best one, maybe try again. That is useful, but every attempt is mostly a new roll of the dice.
作者认为当前主流的像素原生生成方法本质上是在'掷骰子',每次尝试都是全新的随机生成。这一观点挑战了当前扩散模型通过增加推理次数提升质量的共识,暗示这种方法效率低下且缺乏系统性改进。