1 Matching Annotation
  1. Dec 2025
    1. Adam optimizers maintain two moving averages for each model parameter: the first moment (the mean) of the gradients and the second moment (the uncentered variance) of the gradients. In other words, Adam stores two additional values per model parameter in memory; see the sketch below the note.

      additional weights that add to the memory footprint.
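
      A short sketch may make the bookkeeping concrete. The NumPy function below is an illustrative single-step Adam update (the name `adam_step` and its hyperparameter defaults are assumptions, not from the annotated source); the point is that `m` and `v` must each match the parameter's shape:

      ```python
      import numpy as np

      def adam_step(param, grad, m, v, t,
                    lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
          # First moment: exponential moving average of the gradient (its mean).
          m = beta1 * m + (1 - beta1) * grad
          # Second moment: exponential moving average of the squared gradient
          # (the uncentered variance).
          v = beta2 * v + (1 - beta2) * grad ** 2
          # Bias-correct the zero-initialized averages (t is the step count, from 1).
          m_hat = m / (1 - beta1 ** t)
          v_hat = v / (1 - beta2 ** t)
          # Apply the update.
          param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
          return param, m, v

      # m and v are allocated with the same shape as the parameter tensor,
      # so Adam keeps two extra floats per model weight.
      param = np.random.randn(4, 3)
      m = np.zeros_like(param)
      v = np.zeros_like(param)
      param, m, v = adam_step(param, np.ones_like(param), m, v, t=1)
      ```

      With 32-bit optimizer state, a model of N weights thus carries roughly 8N extra bytes (4N for `m`, 4N for `v`) on top of the 4N bytes for the weights themselves.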