From 3bc69cc08d45ec4357ad109e0f1d00dd2c9c9956 Mon Sep 17 00:00:00 2001
From: Jane Xu
Date: Mon, 5 May 2025 09:22:07 -0700
Subject: [PATCH] Document that dampening is skipped in SGD momentum first step
 (#152833)

Pointed out by https://x.com/hi_tysam/status/1917318692276174977/photo/2.

It would be BC-breaking to change this behavior 7 years after it was decided, so we are documenting it first at the very least.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152833
Approved by: https://github.com/albanD
---
 torch/optim/sgd.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/torch/optim/sgd.py b/torch/optim/sgd.py
index ddf247ef6559..8002df8b308d 100644
--- a/torch/optim/sgd.py
+++ b/torch/optim/sgd.py
@@ -238,7 +238,9 @@ SGD.__doc__ = (
     Moreover, the initial value of the momentum buffer is set to the
     gradient value at the first step. This is in contrast to some other
-    frameworks that initialize it to all zeros.
+    frameworks that initialize it to all zeros. One notable side effect
+    of this decision is that the first momentum value will not be scaled
+    by dampening. Dampening will be applied starting at the second step.
 """
 )
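The documented behavior can be illustrated with a minimal sketch of the momentum-buffer update. This is a simplified standalone function, not the actual `torch/optim/sgd.py` code; the function name `momentum_buffer_step` and the specific hyperparameter values are illustrative assumptions.

```python
def momentum_buffer_step(buf, grad, momentum=0.9, dampening=0.5):
    """Sketch of how SGD's momentum buffer evolves per step.

    buf is None on the first step (no buffer exists yet).
    """
    if buf is None:
        # First step: the buffer is initialized to the raw gradient.
        # Dampening is NOT applied here -- this is the documented quirk.
        return grad
    # Subsequent steps: dampening scales the incoming gradient.
    return momentum * buf + (1 - dampening) * grad

grad = 1.0
buf = momentum_buffer_step(None, grad)  # first step: 1.0, not (1 - 0.5) * 1.0
buf = momentum_buffer_step(buf, grad)   # second step: 0.9 * 1.0 + 0.5 * 1.0 = 1.4
```

With dampening = 0.5, a zero-initialized buffer (as in some other frameworks) would instead give 0.5 after the first step; PyTorch's gradient initialization yields the full 1.0.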