1 Matching Annotations
- Jan 2021
-
arxiv.org
-
GEMM operation latency in cuBLAS accounts for 82% and 88% respectively after optimization, accounting for most of the inference time. However, in the original TensorFlow model, GEMM operations account for only 25%. This shows that beam search optimization has achieved good results.
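This suggests to me that most of the latency savings come from the improved beam search strategy: GEMM's share of the time can only rise from 25% to over 80% if the non-GEMM work, beam search included, shrank substantially.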
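As a rough check on that reading, the quoted shares can be turned into an implied overall speedup. This is only a back-of-the-envelope sketch: it assumes the absolute GEMM time is the same before and after optimization, which the quoted passage does not state, and the function names are hypothetical.

```python
def implied_speedup(gemm_share_before: float, gemm_share_after: float) -> float:
    """Overall speedup implied by a change in GEMM's share of total time.

    Assumption (not stated in the paper excerpt): the absolute GEMM time G
    is unchanged, so T_before = G / share_before and T_after = G / share_after.
    """
    for share in (gemm_share_before, gemm_share_after):
        if not 0.0 < share <= 1.0:
            raise ValueError("shares must be in (0, 1]")
    return gemm_share_after / gemm_share_before


def non_gemm_reduction(gemm_share_before: float, gemm_share_after: float) -> float:
    """Factor by which non-GEMM time shrinks under the same assumption."""
    if gemm_share_after >= 1.0:
        raise ValueError("no non-GEMM time left after optimization")
    # Non-GEMM time expressed in units of the (fixed) GEMM time G.
    before = (1.0 - gemm_share_before) / gemm_share_before
    after = (1.0 - gemm_share_after) / gemm_share_after
    return before / after


if __name__ == "__main__":
    # 25% is the quoted share in the original TensorFlow model;
    # 82% and 88% are the two quoted shares after optimization.
    for share_after in (0.82, 0.88):
        speedup = implied_speedup(0.25, share_after)
        shrink = non_gemm_reduction(0.25, share_after)
        print(f"GEMM share {share_after:.0%}: "
              f"overall ~{speedup:.2f}x, non-GEMM time ~{shrink:.1f}x smaller")
```

If the GEMM kernels themselves were also sped up, the true overall speedup would be larger than this estimate, so the figures are at best a lower-bound illustration of the annotation's point.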
-