2 Matching Annotations
  1. Nov 2021
    1. It partitions optimizer state, gradients and parameters across multiple data parallel processes via a dynamic communication schedule to minimize the communication volume.

      What is the principle behind ZeRO-DP?
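
One rough way to approach the question is to count the model-state memory each data-parallel process holds as more of that state is partitioned. The sketch below is an illustration only, not the library's implementation. It assumes mixed-precision Adam training with 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for fp32 optimizer state. The function name `zero_dp_bytes_per_device` and the `stage` numbering (1 = optimizer state, 2 = plus gradients, 3 = plus parameters) are hypothetical labels chosen for this example. It does not model the communication schedule mentioned in the highlighted sentence.

```python
def zero_dp_bytes_per_device(num_params, num_devices, stage, opt_bytes_per_param=12):
    """Estimate model-state bytes held by one data-parallel process.

    Assumptions (not taken from the annotated page): fp16 weights and
    gradients at 2 bytes each per parameter, and 12 bytes per parameter
    of optimizer state (fp32 weights, momentum, variance) for Adam.
    stage 0 = plain data parallelism, where everything is replicated.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2 or 3")

    param_bytes = 2 * num_params
    grad_bytes = 2 * num_params
    opt_bytes = opt_bytes_per_param * num_params

    # Each stage partitions one more kind of state across the processes.
    if stage >= 1:
        opt_bytes /= num_devices
    if stage >= 2:
        grad_bytes /= num_devices
    if stage >= 3:
        param_bytes /= num_devices

    return param_bytes + grad_bytes + opt_bytes


if __name__ == "__main__":
    # Hypothetical example: 7.5e9 parameters on 64 processes.
    for s in range(4):
        gb = zero_dp_bytes_per_device(7.5e9, 64, s) / 1e9
        print(f"stage {s}: {gb:.2f} GB per process")
```

Under these assumptions, partitioning all three kinds of state divides the per-process footprint by the number of processes, whereas plain data parallelism replicates the full state on every process.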

  2. Sep 2017