6 Matching Annotations
  1. Apr 2020
  2. Dec 2017
    1. Warp divergence

      When threads inside a warp branch to different execution paths. The hardware serializes the paths: instead of all 32 threads in the warp executing the same instruction, only the threads on the active path execute while the rest are masked off. With a two-way split, on average only half of the threads are active per instruction, so divergence can cost up to 50% of the warp's throughput.
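
      The serialization can be made concrete with a toy model. The sketch below (plain C rather than CUDA, so it runs anywhere; warp size and the lane masks are assumptions) counts the instruction slots a warp issues for a two-way branch: both sides are executed one after the other whenever each side has at least one active lane.

      ```c
      #include <stdio.h>

      #define WARP_SIZE 32

      /* Instruction slots a warp issues for a two-way branch whose
       * sides are each `body_len` instructions long. `taken` is a
       * lane mask: bit i set means lane i takes the branch. */
      static int slots_for_branch(unsigned taken, int body_len) {
          int slots = 0;
          unsigned not_taken = ~taken;      /* lanes on the else-path */
          if (taken)     slots += body_len; /* serialize the if-path  */
          if (not_taken) slots += body_len; /* then the else-path     */
          return slots;
      }

      int main(void) {
          int body = 10;
          /* Uniform branch: every lane takes the same path. */
          int uniform = slots_for_branch(0xFFFFFFFFu, body);
          /* Divergent branch: odd lanes take it, even lanes do not. */
          int divergent = slots_for_branch(0xAAAAAAAAu, body);
          printf("uniform: %d slots, divergent: %d slots\n", uniform, divergent);
          return 0;
      }
      ```

      With a 50/50 split the warp issues twice the slots for the same work, which is the 50% loss the note describes.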

    2. make as many consecutive threads as possible do the same thing

      an important take-home message for dealing with branch divergence.
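
      A sketch of why the rule works (plain C; the 32-thread warp size, 256-thread grid, and the two predicates are assumptions): a 50/50 split expressed as `tid % 2` puts both paths inside every warp, while the same split expressed as `(tid / 32) % 2` keeps each warp's lanes on one path, so no warp diverges.

      ```c
      #include <stdio.h>

      #define WARP_SIZE 32
      #define NTHREADS  256

      /* Count warps that contain lanes on both sides of a predicate. */
      static int divergent_warps(int (*pred)(int tid)) {
          int count = 0;
          for (int w = 0; w < NTHREADS / WARP_SIZE; w++) {
              int taken = 0, not_taken = 0;
              for (int lane = 0; lane < WARP_SIZE; lane++) {
                  int tid = w * WARP_SIZE + lane;
                  if (pred(tid)) taken = 1; else not_taken = 1;
              }
              count += taken && not_taken;
          }
          return count;
      }

      /* Interleaved split: adjacent threads disagree. */
      static int by_parity(int tid) { return tid % 2; }
      /* Blocked split: all lanes of a warp agree. */
      static int by_warp(int tid) { return (tid / WARP_SIZE) % 2; }

      int main(void) {
          printf("tid %% 2      -> %d divergent warps\n", divergent_warps(by_parity));
          printf("(tid/32) %% 2 -> %d divergent warps\n", divergent_warps(by_warp));
          return 0;
      }
      ```

      Same amount of work on each path either way; only the assignment of threads to paths changes, and with it the number of warps that pay the serialization cost.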

    3. Warps are run concurrently in an SM

      This statement conflicts with the earlier statement that only one warp is executed at a time per SM. The resolution: many warps are resident on an SM concurrently, but with a single instruction unit only one of them actually issues an instruction in any given cycle.

    4. Each SM has multiple processors but only one instruction unit

      Q: There is only one instruction unit in an SM, and an SM holds many warps. Does this imply that all the warps within the same SM must execute the same instruction?

      A: No. Each SM has a zero-overhead warp scheduler that prioritizes ready warps along the time dimension: each issue cycle it picks a warp whose next instruction has its operands available, so different warps can be at different points in the instruction stream. Take a look at the figure on page 6 of http://www.math.ncku.edu.tw/~mhchen/HPC/CUDA/GPGPU_Lecture5.pdf
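
      A minimal sketch of what issuing from ready warps buys (plain C; the warp count, instruction count, and 7-cycle stall are made-up numbers, not hardware figures): each warp stalls for several cycles after every instruction, and a scheduler that switches to any ready warp each cycle hides most of that latency, whereas draining one warp at a time does not.

      ```c
      #include <stdio.h>

      #define NWARPS 8
      #define INSNS  4   /* instructions per warp */
      #define STALL  7   /* stall cycles after each instruction */

      /* Cycles to issue everything when the scheduler may switch to
       * any ready warp each cycle (zero-overhead switching). */
      static int cycles_interleaved(void) {
          int left[NWARPS], ready_at[NWARPS];
          int remaining = NWARPS * INSNS, cycle = 0;
          for (int w = 0; w < NWARPS; w++) { left[w] = INSNS; ready_at[w] = 0; }
          while (remaining > 0) {
              for (int w = 0; w < NWARPS; w++) {
                  if (left[w] > 0 && ready_at[w] <= cycle) {
                      left[w]--; remaining--;
                      ready_at[w] = cycle + 1 + STALL; /* stall after issue */
                      break;                           /* one issue per cycle */
                  }
              }
              cycle++;
          }
          return cycle;
      }

      /* Cycles if each warp runs to completion before the next starts:
       * INSNS issues plus a full stall between consecutive issues. */
      static int cycles_serial(void) {
          return NWARPS * (INSNS + (INSNS - 1) * STALL);
      }

      int main(void) {
          printf("serial: %d cycles, interleaved: %d cycles\n",
                 cycles_serial(), cycles_interleaved());
          return 0;
      }
      ```

      With these numbers the serial order waits out every stall (200 cycles), while the interleaved scheduler always finds some ready warp and finishes in 32 cycles; this latency hiding is why an SM keeps many warps resident even though only one issues at a time.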

  3. Nov 2017