Document that dampening is skipped in SGD momentum first step (#152833)

Pointed out by https://x.com/hi_tysam/status/1917318692276174977/photo/2. Changing this behavior 7 years after it was decided would be BC-breaking, so at the very least we are documenting it first.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152833
Approved by: https://github.com/albanD
2025-10-20 21:14:14 +08:00 · 2025-05-05 09:22:07 -07:00
parent 99dac7005f
commit 3bc69cc08d
1 changed file with 3 additions and 1 deletion
--- a/torch/optim/sgd.py
+++ b/torch/optim/sgd.py
@@ -238,7 +238,9 @@ SGD.__doc__ = (

        Moreover, the initial value of the momentum buffer is set to the
        gradient value at the first step. This is in contrast to some other
-        frameworks that initialize it to all zeros.
+        frameworks that initialize it to all zeros. One notable side effect
+        of this decision is that the first momentum value will not be scaled
+        by dampening. Dampening will be applied starting at the second step.

    """
 )
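The side effect described in the new docstring text can be illustrated with a small pure-Python sketch of the momentum buffer for a single scalar parameter. It assumes the update rule `buf = momentum * buf + (1 - dampening) * grad` from the SGD documentation. It ignores Nesterov, weight decay, and the learning-rate step, and it uses hypothetical helper names (`momentum_buffers`, `zero_init_buffers`). It is not PyTorch's implementation. The second helper is a hypothetical zero-initialized variant included only for contrast. The values `momentum=0.5` and `dampening=0.75` are arbitrary, chosen so that the arithmetic is exact in floating point.

```python
def momentum_buffers(grads, momentum, dampening):
    """Momentum buffer after each step, initialized from the first gradient.

    Hypothetical helper: on the first step the buffer is set to the raw
    gradient, so dampening is skipped; from the second step on, dampening
    scales the incoming gradient.
    """
    buffers = []
    buf = None
    for grad in grads:
        if buf is None:
            # First step: copy the gradient as-is (no dampening applied).
            buf = grad
        else:
            # Later steps: decay the old buffer and add the damped gradient.
            buf = momentum * buf + (1 - dampening) * grad
        buffers.append(buf)
    return buffers


def zero_init_buffers(grads, momentum, dampening):
    """Hypothetical contrast: buffer starts at zero, so dampening applies from step one."""
    buffers = []
    buf = 0.0
    for grad in grads:
        buf = momentum * buf + (1 - dampening) * grad
        buffers.append(buf)
    return buffers


grads = [4.0, 4.0, 4.0]
print(momentum_buffers(grads, momentum=0.5, dampening=0.75))   # [4.0, 3.0, 2.5]
print(zero_init_buffers(grads, momentum=0.5, dampening=0.75))  # [1.0, 1.5, 1.75]
```

With a constant gradient, the first-step buffer equals the raw gradient (4.0) rather than the damped value (1.0). Dampening only affects the buffer starting at the second step.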