Fortunately, this problem can be resolved by the following observation: The final rounding decisionsfor column i are only affected by updates performed on this very column, and so updates to latercolumns are irrelevant at this point in the process. T
算法层OK,但具体实现时需要频繁计算更新,内存调用效率并不高,因此设置处理区间,区间的参数量化完成后在计算更新
合理设置读取块的大小,使得GPU最快的部分能一次完成计算,而不是频繁置换