Document that dampening is skipped in SGD momentum first step (#152833)

Pointed out by https://x.com/hi_tysam/status/1917318692276174977/photo/2.

Changing this behavior 7 years after it was decided would be BC-breaking, so at the very least we are documenting it first.

<img width="642" alt="image" src="https://github.com/user-attachments/assets/3febcb07-e0ed-44a1-bd3b-a8e685711cb4" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152833
Approved by: https://github.com/albanD
Jane Xu
2025-05-05 09:22:07 -07:00
committed by PyTorch MergeBot
parent 99dac7005f
commit 3bc69cc08d


@@ -238,7 +238,9 @@ SGD.__doc__ = (
 Moreover, the initial value of the momentum buffer is set to the
 gradient value at the first step. This is in contrast to some other
-frameworks that initialize it to all zeros.
+frameworks that initialize it to all zeros. One notable side effect
+of this decision is that the first momentum value will not be scaled
+by dampening. Dampening will be applied starting at the second step.
 """
 )
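
For illustration, the documented behavior can be sketched in plain Python for a single scalar parameter. This is a minimal sketch of the update rule described in the docstring, not the actual `torch.optim.SGD` code; the function name `sgd_momentum_buffers` and the `zero_init` flag are hypothetical and exist only to contrast the two initialization choices.

```python
def sgd_momentum_buffers(grads, momentum=0.9, dampening=0.5, zero_init=False):
    """Return the momentum buffer after each step for one scalar parameter.

    Hypothetical helper for illustration only.
    zero_init=False mimics the documented behavior: the buffer starts as the
    raw first gradient, so dampening is skipped on the first step.
    zero_init=True mimics frameworks that start the buffer at zero, so
    dampening applies from the first step onward.
    """
    buf = 0.0 if zero_init else None
    history = []
    for g in grads:
        if buf is None:
            # First step: buffer is the gradient itself, not scaled by (1 - dampening).
            buf = g
        else:
            # Later steps: the usual momentum update with dampening applied.
            buf = momentum * buf + (1 - dampening) * g
        history.append(buf)
    return history


grad_init = sgd_momentum_buffers([1.0, 1.0, 1.0])
zero_start = sgd_momentum_buffers([1.0, 1.0, 1.0], zero_init=True)
print("gradient-initialized buffer:", grad_init)
print("zero-initialized buffer:    ", zero_start)
```

With a constant gradient of 1.0 and dampening of 0.5, the gradient-initialized buffer starts at the full gradient, while the zero-initialized one starts at half of it. Dampening affects both from the second step onward.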