Hypothesis

4 Matching Annotations

Jun 2022
arxiv.org

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

1. sunzhensu 10 Jun 2022
  
  in Public
  
  the high number of compute operations required can result in unrealistically long training times (e.g., training GPT-3 with 175 billion parameters [11] would require approximately 288 years with a single V100 NVIDIA GPU).
  
  GPT-3 training cost
  
  GreenAI
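The 288-year figure can be sanity-checked with a back-of-envelope calculation. This is a sketch, not the Megatron-LM paper's own derivation: it assumes the widely used 6 * N * D approximation for training FLOPs (N parameters, D training tokens) and the 300 billion training tokens reported for GPT-3, then backs out the sustained single-GPU throughput that 288 years would imply. The V100 peak value (125 TFLOP/s FP16 tensor-core, SXM2) is likewise an assumption used only for comparison.

```python
# Back-of-envelope check of "about 288 years on a single V100" for GPT-3.
# Assumptions (not taken from the annotated paper): the 6 * N * D heuristic
# for training FLOPs, D = 300e9 training tokens, and 365-day years.

SECONDS_PER_YEAR = 365 * 24 * 3600
V100_PEAK = 125e12  # assumed FP16 tensor-core peak of a V100 SXM2, in FLOP/s


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs with the 6 * N * D heuristic."""
    return 6.0 * params * tokens


def implied_throughput(total_flops: float, years: float) -> float:
    """Sustained FLOP/s needed to finish total_flops in the given number of years."""
    return total_flops / (years * SECONDS_PER_YEAR)


flops = training_flops(175e9, 300e9)   # roughly 3.15e23 FLOPs
rate = implied_throughput(flops, 288)  # sustained FLOP/s on one GPU

print(f"total training compute: {flops:.3g} FLOPs")
print(f"implied sustained rate: {rate / 1e12:.1f} TFLOP/s "
      f"({rate / V100_PEAK:.0%} of assumed peak)")
```

Under these assumptions the script reports about 34.7 TFLOP/s, roughly 28% of the assumed V100 peak, which is a plausible sustained utilization and suggests the quoted figure is of the right order.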

Tags

GreenAI

Annotators

sunzhensu

URL

arxiv.org/pdf/2104.04473
www.semanticscholar.org

Semantic Scholar

1. sunzhensu 10 Jun 2022
  
  in Public
  
  As a concrete measure, we suggest reporting the total number of floating point operations (FPO) required to generate a result.¹³ FPO provides an estimate to the amount of work performed by a computational process. It is computed analytically by defining a cost to two base operations, ADD and MUL. Based on these operations, the FPO cost of any machine learning abstract operation (e.g., a tanh operation, a matrix multiplication, a convolution operation, or the BERT model) can be computed as a recursive function of these two operations. FPO has been used in the past to quantify the energy footprint of a model [26, 42, 12, 41], but is not widely adopted in AI
  
  Introduction to FLOPs
  
  GreenAI
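The recursive definition in the quote can be illustrated with a small cost model. This is a minimal sketch, not the Green AI paper's own accounting: it assumes unit costs ADD = MUL = 1, and the names `Cost`, `dot`, `matmul`, and `dense` are hypothetical helpers introduced only for illustration.

```python
# Minimal sketch of FPO counting: every abstract operation's cost is built
# recursively from two base operations, ADD and MUL.
# Assumed unit costs: ADD = MUL = 1. All helper names are hypothetical.

from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    adds: int = 0
    muls: int = 0

    def __add__(self, other: "Cost") -> "Cost":
        # Cost of performing two operations one after the other.
        return Cost(self.adds + other.adds, self.muls + other.muls)

    def __mul__(self, k: int) -> "Cost":
        # Cost of repeating the same operation k times.
        return Cost(self.adds * k, self.muls * k)

    @property
    def fpo(self) -> int:
        # Total FPO under the assumed unit costs.
        return self.adds + self.muls


ADD = Cost(adds=1)
MUL = Cost(muls=1)


def dot(n: int) -> Cost:
    """Dot product of two length-n vectors (n >= 1): n MULs and n - 1 ADDs."""
    return MUL * n + ADD * (n - 1)


def matmul(m: int, n: int, p: int) -> Cost:
    """(m x n) @ (n x p): one length-n dot product per output entry."""
    return dot(n) * (m * p)


def dense(batch: int, d_in: int, d_out: int) -> Cost:
    """Linear layer x @ W + b: a matmul plus one ADD per output entry."""
    return matmul(batch, d_in, d_out) + ADD * (batch * d_out)


print(matmul(2, 3, 4).fpo)          # 8 outputs * (3 MUL + 2 ADD) = 40
print(dense(32, 768, 3072).fpo)
```

With unit costs, a length-n dot product costs 2n - 1 FPO, so adding the bias makes a dense layer cost exactly 2 * batch * d_in * d_out FPO. This is why matrix-multiplication FLOP counts are commonly quoted as roughly twice the number of multiply-accumulates.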

Tags

GreenAI

Annotators

sunzhensu

URL

semanticscholar.org/reader/fb73b93de3734a996829caf31e4310e0054e9c6b
arxiv.org

2104.10350.pdf

1. sunzhensu 06 Jun 2022
  
  in Public
  
  For example, NVIDIA estimated that 80–90% of the ML workload is inference processing [Leo19]. Similarly, Amazon Web services claimed that 90% of the ML demand in the cloud is for inference [Bar19].
  
  Evidence for inference's share of overall energy consumption
  
  GreenAI

Tags

GreenAI

Annotators

sunzhensu

URL

arxiv.org/ftp/arxiv/papers/2104/2104.10350.pdf
ar5iv.labs.arxiv.org

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

1. sunzhensu 01 Jun 2022
  
  in Public
  
  A clear recent trend in the AI community is that models are getting significantly larger. It only took 3 months to shift the title of the largest model from BERT-Large to GPT-2 (Radford et al. 2019) in 2020 while the number of parameters of GPT-2 is around 5 times larger than that of BERT-Large. Moreover, GPT-2 further evolves into GPT-3 (Brown et al. 2020) with 175 Billion parameters. More recently, GLM (Du et al. 2021) has clinched the title with surprisingly 1.75 Trillion parameters. These large models consume more data and have better performance than their smaller counterparts
  
  The trend of AI models continually growing larger
  
  GreenAI

Tags

GreenAI

Annotators

sunzhensu

URL

ar5iv.labs.arxiv.org/html/2110.14883
