Attempts to fix #92656
BC-breaking! This changes the default of `zero_grad()` in `optim` and in `nn` so that gradients are set to `None` instead of zero tensors. We are changing the default because there are proven perf wins and existing code has typically not regressed due to this change. (This note will probably need to be fleshed out more.)
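A hedged illustration of the behavior change (the model and optimizer here are placeholders; the assertions reflect the new default described above):

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

model(torch.randn(8, 4)).sum().backward()

# New default: gradients are reset to None rather than zero-filled tensors.
opt.zero_grad()                      # equivalent to opt.zero_grad(set_to_none=True)
assert all(p.grad is None for p in model.parameters())

# The old behavior is still available explicitly for code that reads .grad afterwards.
model(torch.randn(8, 4)).sum().backward()
opt.zero_grad(set_to_none=False)     # keeps zero-filled gradient tensors
assert all(torch.all(p.grad == 0) for p in model.parameters())
```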
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92731
Approved by: https://github.com/ngimel
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I have marked as skip so far are the ones that cause segfaults. Running xdoctest results in 293 failed and 201 passed tests. The next commits will disable the failing tests. (Unfortunately I don't have a tool that will insert the `# xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
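For reference, a hedged sketch of how a failing doctest gets skipped (the function and doctest below are made up):

```python
def frobnicate(x):
    """
    Placeholder function illustrating the skip directive.

    A block directive written as a comment inside the doctest disables the
    remaining example lines without deleting them:

    >>> # xdoctest: +SKIP
    >>> frobnicate("this example would otherwise run and fail")
    """
    return x
```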
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
I found that the original implementation of `post_localSGD_optimizer.step()` is incorrect:
Whenever `averager.average_parameters()` is called, the averager's built-in step counter is incremented. Therefore, it should be called exactly once per `optimizer.step()`. However, if a model has multiple param groups or params, the current implementation calls `averager.average_parameters()` multiple times and over-increments the step counter.
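A minimal sketch of the intended fix (illustrative only, not the actual patch; it assumes the wrapper keeps `self.optim`, `self.param_groups`, and `self.averager`):

```python
def step(self):
    # Run the wrapped local optimizer first.
    self.optim.step()
    # Average all parameters in a single call so the averager's step counter
    # advances exactly once per optimizer.step(), regardless of how many
    # param groups or params the local optimizer has.
    params = [p for group in self.param_groups for p in group["params"]]
    self.averager.average_parameters(params)
```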
Relevant proposals, since hierarchical SGD can be supported on top of `post_localSGD_optimizer`: https://github.com/pytorch/pytorch/issues/73382, https://github.com/pytorch/pytorch/issues/71325
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74737
Approved by: https://github.com/mrshenli
Summary:
The first arg of the `PostLocalSGDState` ctor, `process_group`, cannot be empty. Here, to simplify the usage, we do not even create a subgroup explicitly.
See the example in unit test: 4feef6c970/torch/testing/_internal/distributed/distributed_test.py (L4260)
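A hedged usage sketch of the simplified construction (the hyperparameter values and the `ddp_model` name are placeholders; `ddp_model` is assumed to be a `DistributedDataParallel`-wrapped model):

```python
import torch.distributed as dist
from torch.distributed.algorithms.ddp_comm_hooks import post_localSGD_hook as post_localSGD

# No subgroup is created by hand; passing None leaves it to the hook state.
state = post_localSGD.PostLocalSGDState(
    process_group=dist.group.WORLD,  # group used for the initial global allreduce phase
    subgroup=None,                   # no explicitly created subgroup
    start_localSGD_iter=100,         # switch to local SGD after 100 iterations
)
ddp_model.register_comm_hook(state, post_localSGD.post_localSGD_hook)
```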
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72792
Reviewed By: samdow
Differential Revision: D34213221
Pulled By: rohan-varma
fbshipit-source-id: 078343f3ee138e175bf835897f190032eb970662
(cherry picked from commit bf90af704fb371eef799a951007cc5d41dbe07a1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71621
Moves this feature to beta as discussed, and cleans up some docs.
Synced offline with wayi1, who mentioned that the current names are preferred as he works to prototype hierarchical allreduce, as discussed in this RFC: https://github.com/pytorch/pytorch/issues/71325.
ghstack-source-id: 147382940
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D33700444
fbshipit-source-id: 8eb543f5b02a119d0790a5c0919e6def6383a067
(cherry picked from commit 656e9809b2429d1924e008164a1f4ca770700a9a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65197
1. The constructor accepts a local optimizer instance instead of the local optimizer's constructor inputs and class type.
2. The parameters are read from the local optimizer's `param_groups` instead of a separate input (see the usage sketch below).
Proposal: https://github.com/pytorch/pytorch/issues/59699
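A hedged usage sketch of the updated constructor (the learning rate, averaging period, warmup, and `ddp_model` are placeholders, and the averager arguments follow its current public signature):

```python
import torch
from torch.distributed.optim import PostLocalSGDOptimizer
from torch.distributed.algorithms.model_averaging.averagers import PeriodicModelAverager

# Build an ordinary local optimizer first; the wrapper now reads its
# param_groups instead of taking a class type plus constructor inputs.
local_optim = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
opt = PostLocalSGDOptimizer(
    optim=local_optim,
    averager=PeriodicModelAverager(period=4, warmup_steps=100),
)
```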
ghstack-source-id: 138307226
Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity
Reviewed By: rohan-varma
Differential Revision: D31007439
fbshipit-source-id: bbb0526e6763ef76775b85088571506b3942c722
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64885
1) The constructor accepts a local optimizer instance instead of the local optimizer's constructor inputs and class type.
2) The parameters are read from the local optimizer's `param_groups` instead of a separate input.
Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 137865867
Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity
Reviewed By: rohan-varma
Differential Revision: D30888794
fbshipit-source-id: 21261b480f6bbb9b2333426020e3f350da3f73c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63340
Adds some needed methods, such as accessing optimizer states. These are necessary for the integration with PyTorch Lightning.
Proposal: https://github.com/pytorch/pytorch/issues/59699
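A hedged sketch of the kind of state access this enables (`opt` is assumed to be a `PostLocalSGDOptimizer` instance and the file name is a placeholder):

```python
import torch

# Checkpoint the wrapped optimizer the way a trainer framework such as
# PyTorch Lightning would: the wrapper exposes the local optimizer's state.
torch.save({"optimizer": opt.state_dict()}, "checkpoint.pt")

# Later, restore the optimizer state from the checkpoint.
opt.load_state_dict(torch.load("checkpoint.pt")["optimizer"])
```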
ghstack-source-id: 135912246
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD
Reviewed By: rohan-varma
Differential Revision: D30328794
fbshipit-source-id: e585b874313bd266fdc7c79936e2af98700c7bad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62131
Wrap `PeriodicModelAverager` as an optimizer.
Currently both the optimizer and the averager require an input `params` arg, even though the latter could actually read the params from the optimizer wrapper. The averager class API will be updated in a follow-up PR.
Proposal: https://github.com/pytorch/pytorch/issues/59699
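A hedged training-loop sketch of what the wrapper is for (`opt` is assumed to be the wrapped optimizer, and `model`, `loss_fn`, and `loader` are placeholders):

```python
for inputs, labels in loader:
    opt.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    # A single step() runs the local optimizer update and, after the warmup
    # period, periodically averages the model parameters across workers.
    opt.step()
```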
ghstack-source-id: 134560248
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_post_localSGD_optimizer_parity
Reviewed By: rohan-varma
Differential Revision: D29881465
fbshipit-source-id: b9634972f4d8bffd3b3eb94f5dbbb19db2bcd759