However, this approach has its own challenges. As ML engineers move more data processing workloads into the training job, training throughput drops. To compensate, they add more data loader workers, which in turn require more CPU and memory. Once the CPU and memory limits of the machine are reached, ML engineers scale vertically by provisioning more expensive GPU machines with more CPU and memory. The GPU resources on these machines remain underutilized because the training job is bottlenecked on CPU.
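As a rough illustration of this scaling pattern, the sketch below shows the common knob that gets turned: the number of data loader worker processes. This assumes a PyTorch DataLoader and a hypothetical ExampleDataset; the original text does not name a specific framework, so treat this as an illustrative example rather than the actual training setup.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ExampleDataset(Dataset):
    """Hypothetical dataset whose __getitem__ performs CPU-heavy
    last mile transforms (decoding, augmentation) per sample."""

    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        # Stand-in for CPU-bound preprocessing of one sample.
        sample = torch.randn(3, 224, 224)
        return sample, idx % 10


# Each worker is a separate process running __getitem__ on the CPU,
# so raising num_workers trades more CPU cores and host memory for
# higher preprocessing throughput on the same machine.
loader = DataLoader(
    ExampleDataset(),
    batch_size=256,
    num_workers=8,      # increased as CPU preprocessing becomes the bottleneck
    pin_memory=True,
    prefetch_factor=4,  # batches prefetched per worker; costs more host memory
)

for batch, labels in loader:
    pass  # the training step would consume the batch on the GPU here
```

Once num_workers saturates the available cores, the only remaining lever on a single machine is a larger (and more expensive) GPU instance, which is exactly the underutilization problem described above.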
Last mile data processing