1 Matching Annotations
  1. Jan 2021
    1. GEMM operation latency in cuBLAS accounts for 82% and 88% respectively after optimization, accounting for most of the inference time. However, in the original TensorFlow model, GEMM operations account for only 25%. This shows that beam search optimization has achieved good results.

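      This suggests to me that most of the latency savings come from the improved beam search strategy: for GEMM's share of inference time to rise from 25% to 82% and 88%, the non-GEMM work, which includes beam search, must have shrunk considerably.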
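
A rough back-of-the-envelope check of that reading, as a minimal sketch. It assumes that the absolute time spent in GEMM is the same before and after optimization, which the quoted passage does not state; if the GEMM kernels themselves also got faster, the real speedup would be larger. The function names are hypothetical and only illustrate the arithmetic.

```python
def implied_speedup(gemm_share_before: float, gemm_share_after: float) -> float:
    """Overall speedup implied by a change in GEMM's share of total time.

    Assumption (not from the source): absolute GEMM time G is unchanged.
    Then total_before = G / share_before and total_after = G / share_after,
    so the speedup total_before / total_after = share_after / share_before.
    """
    for share in (gemm_share_before, gemm_share_after):
        if not 0.0 < share <= 1.0:
            raise ValueError("shares must be in the interval (0, 1]")
    return gemm_share_after / gemm_share_before


def non_gemm_reduction(gemm_share_before: float, gemm_share_after: float) -> float:
    """Fraction by which non-GEMM time shrinks under the same assumption."""
    # Non-GEMM time expressed in units of the (fixed) GEMM time G.
    before = (1.0 - gemm_share_before) / gemm_share_before
    after = (1.0 - gemm_share_after) / gemm_share_after
    return 1.0 - after / before


if __name__ == "__main__":
    # 25% is the original TensorFlow share; 82% and 88% are the shares
    # reported after optimization in the quoted passage.
    for share_after in (0.82, 0.88):
        speedup = implied_speedup(0.25, share_after)
        reduction = non_gemm_reduction(0.25, share_after)
        print(f"GEMM share {share_after:.0%}: "
              f"speedup {speedup:.2f}x, non-GEMM time cut by {reduction:.1%}")
```

Under this assumption the shares alone imply an overall speedup of a little over 3x, with more than 90% of the non-GEMM time removed, which is consistent with the non-GEMM path (including beam search) being where the savings came from.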