we improve the FP8 GEMM precision by promoting to CUDA Cores at an interval of $N_C = 128$ elements MMA for the high-precision accumulation
What does this mean?
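One way to read the quoted sentence: partial sums are accumulated in a limited-precision accumulator for 128 elements along the inner dimension, then copied out and added into a full-precision FP32 accumulator. The sketch below only illustrates that idea in plain Python. It assumes half precision (via `struct` format `"e"`) as a stand-in for the limited-precision Tensor Core accumulator, it does not model FP8 inputs or scaling factors, and the function names `dot_low_precision_only` and `dot_with_promotion` are hypothetical.

```python
import math
import random
import struct

def to_f16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot_low_precision_only(a, b):
    """Accumulate every product in one half-precision accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f16(acc + to_f16(x * y))
    return acc

def dot_with_promotion(a, b, interval=128):
    """Accumulate `interval` products in half precision, then add the
    partial sum into a single-precision accumulator (the "promotion")."""
    acc32 = 0.0
    for start in range(0, len(a), interval):
        partial = 0.0
        for x, y in zip(a[start:start + interval], b[start:start + interval]):
            partial = to_f16(partial + to_f16(x * y))
        acc32 = to_f32(acc32 + partial)
    return acc32

rng = random.Random(0)
K = 16384  # inner (reduction) dimension; an arbitrary illustrative size
a = [to_f16(rng.random()) for _ in range(K)]
b = [to_f16(rng.random()) for _ in range(K)]

reference = math.fsum(x * y for x, y in zip(a, b))
err_naive = abs(dot_low_precision_only(a, b) - reference)
err_promoted = abs(dot_with_promotion(a, b) - reference)
print(f"reference={reference:.3f}  naive error={err_naive:.3f}  promoted error={err_promoted:.3f}")
```

With a long reduction, the single low-precision accumulator eventually becomes too coarse to register small products, while the promoted version keeps each low-precision partial sum short. The interval of 128 mirrors the $N_C = 128$ in the quote; everything else here is an assumption for illustration.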
In summary, the idea is "extrapolate the high frequencies, interpolate the low frequencies". He therefore requires the scale at $i = d/2-1$ to equal exactly the interpolation scale $\frac{L_{train}}{L_{test}}$, which gives the equation
\begin{equation}\left.(10000\kappa)^{-2i/d}\right|_{i=d/2-1} = \left.\frac{L_{train}}{L_{test}}10000^{-2i/d}\right|_{i=d/2-1}\tag{9}\end{equation}
Solving for $\kappa$ gives
\begin{equation}\kappa = \left(\frac{L_{test}}{L_{train}}\right)^{d/(d-2)}\tag{10}\end{equation}
The NTK formula
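The relation between equations (9) and (10) can be checked numerically. The sketch below assumes the usual RoPE frequencies $10000^{-2i/d}$ and picks illustrative values for $d$, $L_{train}$, and $L_{test}$; the helper names `ntk_kappa` and `rope_freq` are hypothetical.

```python
import math

def ntk_kappa(l_train, l_test, d):
    """Equation (10): kappa = (L_test / L_train) ** (d / (d - 2))."""
    return (l_test / l_train) ** (d / (d - 2))

def rope_freq(i, d, base=10000.0):
    """RoPE frequency of the i-th pair: base ** (-2i / d)."""
    return base ** (-2 * i / d)

# Illustrative values only, not taken from the source.
d, l_train, l_test = 128, 2048, 8192
kappa = ntk_kappa(l_train, l_test, d)

# Lowest frequency (i = d/2 - 1): the ratio should equal L_train / L_test,
# i.e. full interpolation, as equation (9) requires.
i_low = d // 2 - 1
ratio_low = rope_freq(i_low, d, 10000.0 * kappa) / rope_freq(i_low, d)

# Highest frequency (i = 0): the ratio should be 1, i.e. left unchanged.
ratio_high = rope_freq(0, d, 10000.0 * kappa) / rope_freq(0, d)

print(f"kappa={kappa:.4f}  ratio_low={ratio_low:.6f}  ratio_high={ratio_high:.6f}")
```

The intermediate indices fall between these two extremes, which is the "extrapolate high frequencies, interpolate low frequencies" behaviour described above.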