Hypothesis

1 Matching Annotations

Jan 2021
arxiv.org

2010.13887.pdf

1. jethro 20 Jan 2021
  
  in Public
  
  GEMM operation latency in cuBLAS accounts for 82% and 88% respectively after optimization, accounting for most of the inference time. However, in the original TensorFlow model, GEMM operations account for only 25%. This shows that beam search optimization has achieved good results.
  
  This suggests to me that most of the latency saving comes from the improved beam search strategy.
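
  A rough back-of-the-envelope check of that reading, as a sketch only. It assumes (this is not stated in the quoted passage) that the absolute GEMM time is roughly the same before and after optimization, so that the change in GEMM share comes entirely from shrinking the non-GEMM work such as beam search. The function names are hypothetical.

```python
def implied_speedup(share_before: float, share_after: float) -> float:
    """Total-latency speedup implied by a rise in GEMM share.

    ASSUMPTION: absolute GEMM time is unchanged by the optimization.
    Then total = gemm_time / share, so the speedup is share_after / share_before.
    """
    return share_after / share_before


def non_gemm_reduction(share_before: float, share_after: float) -> float:
    """Fractional cut in non-GEMM time under the same assumption."""
    # Non-GEMM time expressed in units of the (fixed) GEMM time.
    before = (1.0 - share_before) / share_before
    after = (1.0 - share_after) / share_after
    return 1.0 - after / before


# 25% is the GEMM share quoted for the original TensorFlow model;
# 82% and 88% are the shares quoted after optimization.
for share_after in (0.82, 0.88):
    speedup = implied_speedup(0.25, share_after)
    cut = non_gemm_reduction(0.25, share_after)
    print(f"GEMM share {share_after:.0%}: ~{speedup:.2f}x faster overall, "
          f"non-GEMM time cut by ~{cut:.0%}")
```

  Under that assumption the non-GEMM portion would have to shrink by more than 90%, which is consistent with the beam search changes carrying most of the saving. If the GEMM kernels themselves also got faster, the overall speedup would be larger than this estimate.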

Annotators

jethro

URL

arxiv.org/pdf/2010.13887.pdf