mirror of
https://github.com/pytorch/pytorch.git
synced 2025-10-20 12:54:11 +08:00
Document that dampening is skipped in SGD momentum first step (#152833)
Pointed out by https://x.com/hi_tysam/status/1917318692276174977/photo/2. Changing this behavior now, seven years after it was decided, would be BC-breaking, so at the very least we are documenting it first.

<img width="642" alt="image" src="https://github.com/user-attachments/assets/3febcb07-e0ed-44a1-bd3b-a8e685711cb4" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152833
Approved by: https://github.com/albanD
Committed by: PyTorch MergeBot
Parent commit: 99dac7005f
Commit: 3bc69cc08d
@@ -238,7 +238,9 @@ SGD.__doc__ = (
 
     Moreover, the initial value of the momentum buffer is set to the
     gradient value at the first step. This is in contrast to some other
-    frameworks that initialize it to all zeros.
+    frameworks that initialize it to all zeros. One notable side effect
+    of this decision is that the first momentum value will not be scaled
+    by dampening. Dampening will be applied starting at the second step.
 
     """
 )
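The behavior this diff documents can be sketched in plain Python. This is a minimal illustration of the momentum-buffer update rule from the SGD docstring (not PyTorch's actual implementation); the function name `momentum_buffer_steps` is hypothetical, while the `momentum` and `dampening` parameter names mirror `torch.optim.SGD`:

```python
def momentum_buffer_steps(grads, momentum=0.9, dampening=0.0):
    """Sketch of the SGD momentum buffer update across steps.

    At the first step the buffer is set to the raw gradient, so
    dampening is NOT applied. From the second step onward:
        buf = momentum * buf + (1 - dampening) * grad
    """
    buf = None
    history = []
    for g in grads:
        if buf is None:
            buf = g  # first step: buffer = gradient, dampening skipped
        else:
            buf = momentum * buf + (1 - dampening) * g
        history.append(buf)
    return history
```

With `dampening=0.5` and a constant gradient of 1.0, the first buffer value is 1.0 (undamped), while the second is 0.9 * 1.0 + 0.5 * 1.0 = 1.4, showing the scaling only kicks in at step two.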