pretraining_tp (int, optional, defaults to 1) — Experimental feature. Tensor parallelism rank used during pretraining. Please refer to this document to understand more about it. This value is necessary to ensure exact reproducibility of the pretraining results. Please refer to this issue.
[!NOTE] What does a model's `pretraining_tp` refer to? flashcard
The degree of tensor parallelism used during pretraining.
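
Why this value can matter for exact reproducibility: a rough sketch, assuming that a tensor-parallel linear layer splits its input dimension into `pretraining_tp` slices and sums the partial results. Summing in slices changes the order of floating-point additions, so the output can differ in the last bits from an unsliced computation. The helpers `dot`, `linear`, and `linear_row_sliced` are hypothetical names for illustration in plain Python, not the Transformers API.

```python
import math

def dot(a, b):
    # Plain dot product, accumulated left to right.
    return sum(x * y for x, y in zip(a, b))

def linear(x, weight):
    # weight is a list of rows, one row per output feature.
    return [dot(x, row) for row in weight]

def linear_row_sliced(x, weight, tp):
    # Split the input dimension into `tp` equal slices, compute a partial
    # output per slice, then sum the partials (as a row-parallel layer would).
    in_features = len(x)
    assert in_features % tp == 0, "input size must be divisible by tp"
    step = in_features // tp
    out = [0.0] * len(weight)
    for r in range(tp):
        lo, hi = r * step, (r + 1) * step
        partial = linear(x[lo:hi], [row[lo:hi] for row in weight])
        out = [o + p for o, p in zip(out, partial)]
    return out

x = [0.1, 0.2, 0.3, 0.4]
weight = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 0.25, -0.25]]

full = linear(x, weight)
sliced = linear_row_sliced(x, weight, tp=2)

# The two results agree up to floating-point rounding; bit-for-bit equality
# is not guaranteed, which is why the slicing degree would need to be recorded.
matches = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
              for a, b in zip(full, sliced))
```

With `tp=1` the sliced path reduces to the ordinary computation, which fits the documented default of 1.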