12 Matching Annotations
  1. Sep 2018
  2. Jul 2018
  3. Oct 2016
    1. ADD Vd.4S, Vn.4S, Vm.4S

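      Compared with the earlier 32-bit NEON syntax, the element type and lane count have moved into the operand specifiers; they are no longer distinguished by the instruction mnemonic.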
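
      As a rough illustration of what the `.4S` arrangement means, here is a portable C sketch of a four-lane, 32-bit add. In AArch32 NEON the same operation is written `VADD.I32 Qd, Qn, Qm`, with the element type in the mnemonic. The `vec4s` type and `add_4s` function are hypothetical names invented for this sketch, not part of any ARM header.

      ```c
      #include <stdint.h>

      /* Hypothetical stand-in for a 128-bit register viewed as 4 x 32-bit lanes (".4S"). */
      typedef struct {
          uint32_t lane[4];
      } vec4s;

      /* Lane-wise add, modelling ADD Vd.4S, Vn.4S, Vm.4S.
       * Each lane wraps modulo 2^32 independently; no carry crosses lanes. */
      static vec4s add_4s(vec4s n, vec4s m)
      {
          vec4s d;
          for (int i = 0; i < 4; ++i) {
              d.lane[i] = n.lane[i] + m.lane[i];
          }
          return d;
      }
      ```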

  4. Mar 2016
    1. The latencies shown assume the memory access hits in the Level 1 Data Cache

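      Even on an L1 cache hit the load latency is still 4, so loads should be grouped together and issued a little earlier, ahead of the instructions that consume them.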

    1. 6 - (3 × 5) = -10

      Pay attention to the saturation behaviour here.
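
      The quoted line does not give the operand width or signedness, so the following is only a minimal sketch of saturating arithmetic in general, assuming 8-bit lanes. The function names `sat_sub_u8` and `sat_add_s8` are hypothetical; they show results clamping to the representable range instead of wrapping around.

      ```c
      #include <stdint.h>

      /* Unsigned saturating subtract: a result below 0 clamps to 0. */
      static uint8_t sat_sub_u8(uint8_t a, uint8_t b)
      {
          return (a > b) ? (uint8_t)(a - b) : 0;
      }

      /* Signed saturating add: the result clamps to [-128, 127]. */
      static int8_t sat_add_s8(int8_t a, int8_t b)
      {
          int sum = (int)a + (int)b;   /* computed in int, so it cannot overflow */
          if (sum > INT8_MAX) {
              return INT8_MAX;
          }
          if (sum < INT8_MIN) {
              return INT8_MIN;
          }
          return (int8_t)sum;
      }
      ```

      With wrapping arithmetic, `10 - 20` in an unsigned 8-bit lane would give 246; with saturation it gives 0.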

    2. Vectorizing examples

      A very useful methodology for how to vectorize code.

    3. VFP unit is therefore sometimes referred to as the Floating Point Unit (FPU)

      This is what the "fpu" in gcc's -mfpu option refers to.

    4. The ARMv6 architecture introduced a small set of SIMD instructions that operate on multiple 16-bit or 8-bit values packed into standard 32-bit ARM general-purpose register

      Besides the NEON registers, SIMD operations can also be performed directly on the ARM general-purpose registers.
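
      To make the idea of packed values in a 32-bit general-purpose register concrete, here is a portable C sketch that emulates the lane-wise result of an ARMv6 instruction such as `UADD8` (four independent 8-bit adds, each wrapping modulo 256). The name `uadd8_emul` is hypothetical, and the sketch does not model the GE flags that the real instruction also updates.

      ```c
      #include <stdint.h>

      /* Add four packed unsigned 8-bit lanes held in one 32-bit word.
       * The low 7 bits of each byte are added first, so no carry can cross
       * into the neighbouring byte; the top bit of each lane is then
       * restored with XOR. */
      static uint32_t uadd8_emul(uint32_t a, uint32_t b)
      {
          uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
          uint32_t top  = (a ^ b) & 0x80808080u;
          return low7 ^ top;
      }
      ```

      For example, `uadd8_emul(0x01FF0203, 0x01010101)` yields `0x02000304`: the `0xFF + 0x01` lane wraps to `0x00` without disturbing the lane above it.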