Commit Graph

23 Commits

Author SHA1 Message Date
443edb9015 [DOCS][DDP] Fix the example of saving and reloading PowerSGD state and hook. (#102721)
Fix the example of saving and reloading the PowerSGD state and hook.
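
For context, a minimal sketch of the save/reload pattern the example documents (`ddp_model` is assumed to be an already-initialized `DistributedDataParallel` instance; the checkpoint path and training loop are placeholders):

```python
import torch
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

state = powerSGD.PowerSGDState(
    process_group=None, matrix_approximation_rank=1, start_powerSGD_iter=2
)
ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)

# ... train, then checkpoint both the model state and the hook state ...
torch.save({"model": ddp_model.state_dict(), "hook_state": state}, "checkpoint.pt")

# On reload, process_group is not serialized: it is reset to the default
# group and a warning is logged (see the pickling PR #79334 further down).
ckpt = torch.load("checkpoint.pt")
ddp_model.load_state_dict(ckpt["model"])
ddp_model.register_comm_hook(ckpt["hook_state"], powerSGD.powerSGD_hook)
```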

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102721
Approved by: https://github.com/H-Huang
2023-06-10 00:15:00 +00:00
8d45f555d7 [BE] [1/3] Rewrite super() calls in caffe2 and benchmarks (#94587)
Rewrite Python built-in class `super()` calls. Only non-semantic changes are applied; the rewrite pattern is illustrated below.

- #94587
- #94588
- #94592
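
For reference, the mechanical rewrite looks like this (illustrative module, not taken from the diff):

```python
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        # Before: Python 2 style, explicit class and instance arguments.
        # super(MyModule, self).__init__()
        # After: Python 3 zero-argument form, identical semantics here.
        super().__init__()
```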

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Cases where the rewrite would change semantics are left unchanged, e.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587
Approved by: https://github.com/ezyang
2023-02-11 18:19:48 +00:00
ae5c166035 Fix two small typos in ddp_comm_hooks.rst (#82047)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82047
Approved by: https://github.com/kit1980
2022-07-23 19:10:57 +00:00
8a6d83079c Functionality/pickling for commhooks (#79334)
This PR addresses issue #75666.
A stateful communication hook can now be saved and reloaded to resume training.

It adds the functionality for the PowerSGD communication hook and tests that the hook can be properly saved and restored.

The PowerSGD implementation uses ``__slots__``; as a result, the introduced `__getstate__` and `__setstate__` methods are written to work with `__slots__` rather than `__dict__`.

`__getstate__`

    Returns a dictionary representing a ``PowerSGDState``, which will be pickled and saved. ``process_group`` is non-serializable and is excluded from the returned state.

`__setstate__`

    Takes the provided ``state`` and reconstructs a ``PowerSGDState``. ``process_group`` is reset to the default group, with a warning issued to the user.
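
A minimal sketch of this pattern, assuming a simplified `__slots__`-based state (hypothetical class; the actual `PowerSGDState` has more slots):

```python
import logging

class SketchState:
    # Hypothetical, simplified stand-in for PowerSGDState.
    __slots__ = ["process_group", "matrix_approximation_rank", "rng"]

    def __getstate__(self):
        # With __slots__ there is no __dict__, so collect the slots by hand
        # and drop the non-serializable process_group.
        return {
            slot: getattr(self, slot)
            for slot in self.__slots__
            if slot != "process_group"
        }

    def __setstate__(self, state):
        logging.warning(
            "process_group is not serializable and is set to the default group."
        )
        self.process_group = None  # None selects the default group in DDP hooks
        for slot, value in state.items():
            setattr(self, slot, value)
```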

Unit test

A hook-independent `_test_hook_pickling` is added with this PR, as well as `test_ddp_hook_pickling_powerSGD`, which tests `powerSGD`’s ability to be saved and reloaded.

Currently, the test creates a DDP model with the provided hook, trains it for 10 epochs, and saves both the model's state and the hook's state.
During reloading, the unit test makes sure that exactly one warning, the proper one, was logged. It then checks that the reloaded hook and the original hook are the same. Finally, it checks that the hook's state was properly initialized:
	- it compares slot values (all but two: `process_group` and `rng`) between the original and reloaded states
	- it checks that the process group was reset to the default group
	- it checks that the random state was restored properly, as sketched below; `rng` is an instance of `np.random.RandomState`, whose state is represented by a tuple containing an `ndarray` of dtype `uint32`, so `np.testing.assert_array_equal` is used for the assertion
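
The `rng` comparison relies on array equality, roughly like this sketch (variable names hypothetical):

```python
import numpy as np

# RandomState.get_state() returns a tuple such as
# ('MT19937', keys, pos, has_gauss, cached_gaussian), where `keys` is an
# ndarray of dtype uint32, so a plain == comparison is not sufficient.
rng_original = np.random.RandomState(seed=0).get_state()
rng_reloaded = np.random.RandomState(seed=0).get_state()

assert rng_original[0] == rng_reloaded[0]  # generator name, a plain str
np.testing.assert_array_equal(rng_original[1], rng_reloaded[1])  # uint32 keys
assert rng_original[2:] == rng_reloaded[2:]  # remaining scalar entries
```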

Future to-dos:
	- Implement similar `__getstate__` and `__setstate__` methods for other stateful communication hooks
	- Add appropriate tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79334
Approved by: https://github.com/rohan-varma, https://github.com/awgu
2022-06-16 23:15:34 +00:00
778af56504 [DDP Comm Hook] Add debugging communication hooks to ddp_comm_hooks.rst (#64352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64352

as title
ghstack-source-id: 137246253

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30694089

fbshipit-source-id: a78110b11d59bb0718f43c99ede23f2fd8ab21d0
2021-09-01 17:37:19 -07:00
a8f9aab840 [DDP Comm Hook] Add bf16 gradient compression to ddp_comm_hooks.rst (#64346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64346

as title
ghstack-source-id: 137170288

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30693513

fbshipit-source-id: 8c64b8404ff3b0322e1bbbd93f6ef051ea91307d
2021-09-01 16:34:00 -07:00
7a3f1386ae Add GradBucket::parameters() to ddp_comm_hooks.rst (#62877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62877

as title
ghstack-source-id: 135214612

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30153490

fbshipit-source-id: d4cec434a53ef6e65b60c065804884d1a114aa0d
2021-08-06 14:50:47 -07:00
34c9f5a8da [DDP Communication Hook] Update get_tensor and set_tensor to be cleaner naming conventions (buffer() and set_buffer()) (#62662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62662

Replaced the methods `get_tensor()` and `set_tensor(.)` in the Python API exposed from the C++ logic with `buffer()` and `set_buffer(.)` for a cleaner interface.
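
A hedged sketch of a comm hook using the renamed accessors (the hook body is illustrative, not from this diff):

```python
import torch.distributed as dist

def allreduce_mean_hook(process_group, bucket):
    # Illustrative hook: average gradients using the renamed accessors.
    group = process_group if process_group is not None else dist.group.WORLD
    tensor = bucket.buffer()                 # formerly bucket.get_tensor()
    tensor.div_(group.size())
    fut = dist.all_reduce(tensor, group=group, async_op=True).get_future()

    def finish(fut):
        bucket.set_buffer(fut.value()[0])    # formerly bucket.set_tensor(.)
        return bucket.buffer()

    return fut.then(finish)
```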

Reviewed By: SciPioneer

Differential Revision: D30012869

fbshipit-source-id: bd8efab583dd89c96f9aeb3dd48a12073f0b1482
2021-08-04 09:27:31 -07:00
db071ef005 [Reland][DDP Communication Hook] Rename 4 Methods of GradBucket Class (#62592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62592

Reland #62510

`GradBucket` is an important class, defined in both C++ and Python, that is used for PyTorch distributed training. We need to rename the following methods for simplicity (usage sketched below):
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last
3) get_per_parameter_tensors -> gradients
4) get_model_params_for_bucket -> parameters
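
As a usage illustration, a hypothetical debugging hook reading the renamed accessors (not part of this diff; `default_hooks.allreduce_hook` performs the actual reduction):

```python
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

def debug_logging_hook(process_group, bucket):
    # Log the renamed accessors, then defer to the stock allreduce hook.
    print(f"bucket {bucket.index()}: last={bucket.is_last()}, "
          f"{len(bucket.gradients())} gradients over "
          f"{len(bucket.parameters())} parameters")
    return default_hooks.allreduce_hook(process_group, bucket)
```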
ghstack-source-id: 134848352

Test Plan: unit test

Reviewed By: andwgu

Differential Revision: D30049431

fbshipit-source-id: 1bcac331aa30e529b7230e3891bc811c531b0ea9
2021-08-02 16:38:09 -07:00
6f95850127 Revert D30024161: [DDP Communication Hook] Rename 4 Methods of GradBucket Class
Test Plan: revert-hammer

Differential Revision:
D30024161 (29c8b1db57)

Original commit changeset: 07e6072a2f7b

fbshipit-source-id: d571c2caadaf7b71fe2aba3c0597bd8074d153de
2021-08-02 10:26:54 -07:00
29c8b1db57 [DDP Communication Hook] Rename 4 Methods of GradBucket Class (#62510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62510

`GradBucket` is an important class, defined in both C++ and Python, that is used for PyTorch distributed training. We need to rename the following methods for simplicity:
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last
3) get_per_parameter_tensors -> gradients
4) get_model_params_for_bucket -> parameters

Test Plan:
Ran the comprehensive test locally, with the following results:
https://pxl.cl/1Ml8b
The two timeout test failures are most likely environment-related; they also fail on my devserver.

Reviewed By: SciPioneer

Differential Revision: D30024161

fbshipit-source-id: 07e6072a2f7b81f731425d9b71f8c8b60d383b0f
2021-08-02 09:33:32 -07:00
581bf01074 [Gradient Compression] Remove unnecessary warning on the rst file and the check on C++ version (#58170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58170

Comm hooks are now supported on the MPI and GLOO backends in addition to NCCL, so these warnings and the check are no longer needed.
ghstack-source-id: 128799123

Test Plan: N/A

Reviewed By: agolynski

Differential Revision: D28388861

fbshipit-source-id: f56a7b9f42bfae1e904f58cdeccf7ceefcbb0850
2021-05-12 14:15:10 -07:00
6a2f046504 [SPMD] Restrict DDP communication hooks to SPSD mode (#55253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55253

Previously, DDP communication hooks took a tensor list as input. Now they take a single tensor, in preparation for retiring SPMD and providing only a single model replica to DDP communication hooks.

The next step is to limit the Reducer to a single model replica.
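
Roughly, the interface change looks like this (method names only approximate the pre/post-change API, and the bodies are no-op placeholders, not real communication):

```python
import torch

def old_style_hook(state, bucket):
    # SPMD: one flattened tensor per model replica, so hooks handled a list.
    tensors = bucket.get_tensors()          # list[torch.Tensor]
    fut = torch.futures.Future()
    fut.set_result(tensors)
    return fut

def new_style_hook(state, bucket):
    # SPSD: a single flattened tensor for the single replica.
    tensor = bucket.get_tensor()            # torch.Tensor
    fut = torch.futures.Future()
    fut.set_result(tensor)
    return fut
```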
ghstack-source-id: 125677637

Test Plan: waitforbuildbot

Reviewed By: zhaojuanmao

Differential Revision: D27533898

fbshipit-source-id: 5db92549c440f33662cf4edf8e0a0fd024101eae
2021-04-05 16:46:47 -07:00
e593044748 [Gradient Compression] Update a warning in ddp_comm_hooks.rst (#55031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55031

It turns out that PowerSGD hooks work with PyTorch's native AMP package, but not with the Apex AMP package, which can somehow mutate gradients during the execution of communication hooks.

{F561544045}
ghstack-source-id: 125268206

Test Plan:
Used the native AMP backend for the same pytext model, and it worked:
f261564342
f261561664

Reviewed By: rohan-varma

Differential Revision: D27436484

fbshipit-source-id: 2b63eb683ce373f9da06d4d224ccc5f0a3016c88
2021-04-02 12:07:50 -07:00
4b00bce156 [Gradient Compression] Introduce fp16_compress_wrapper in ddp_comm_hooks.rst (#54052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54052

Introduce `fp16_compress_wrapper`, which can give some speedup on top of gradient compression algorithms such as PowerSGD.
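
Typical usage, assuming an already-initialized DDP model named `ddp_model` (a sketch based on the documented API):

```python
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

state = powerSGD.PowerSGDState(process_group=None, matrix_approximation_rank=1)
ddp_model.register_comm_hook(
    state, default_hooks.fp16_compress_wrapper(powerSGD.powerSGD_hook)
)
```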

ghstack-source-id: 124001805

Test Plan: {F509205173}

Reviewed By: iseessel

Differential Revision: D27076064

fbshipit-source-id: 4845a14854cafe2112c0caefc1e2532efe9d3ed8
2021-03-16 15:40:10 -07:00
3078233e9a [Gradient Compression] Make FP16 compression as a wrapper that can be combined with other communication hooks (#53808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53808

Create an FP16 wrapper that can combine FP16 gradient compression with any other gradient compression algorithm.
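
The idea, as a hedged sketch (accessor names follow the later renamed `buffer()`/`set_buffer()` API, which postdates this commit; the real implementation is `fp16_compress_wrapper` in `default_hooks`):

```python
import torch

def fp16_wrapper_sketch(hook):
    # Cast the bucket to FP16 before the wrapped hook communicates, then
    # decompress the hook's result back to FP32.
    def wrapped(state, bucket):
        bucket.set_buffer(bucket.buffer().to(torch.float16))
        fut = hook(state, bucket)
        return fut.then(lambda f: f.value().to(torch.float32))
    return wrapped
```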

Test Plan:
Unit test:
```
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_compress_wrapper
```

Performance Test on DDP QPS Benchmark: Check if AllReduce + FP16 Wrapper = FP16 Compression
1) FP16 Compression:
f256897690

2) FP16 Wrapper + AllReduce (after patching D26960986):
f256897289

Reviewed By: SciPioneer

Differential Revision: D26978832

fbshipit-source-id: 0dcd18b050c02f5e9f3cff56344d1f39a04e20c0
2021-03-12 17:31:07 -08:00
8d8a4a0624 Remove the extra ":noindex:" in ddp_comm_hooks.rst (#53855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53855

Remove "noindex" here:

{F492926346}
ghstack-source-id: 123724419

Test Plan:
waitforbuildbot

The doctest failure does not seem to be relevant.

Reviewed By: rohan-varma

Differential Revision: D26967086

fbshipit-source-id: adf9db1144fa1475573f617402fdbca8177b7c08
2021-03-11 17:26:50 -08:00
fe0810e2f8 Add a section to introduce GradBucket class in ddp_comm_hooks.rst (#53253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53253

Since the `GradBucket` class is now public, mention it in ddp_comm_hooks.rst.

Screenshot:
{F478201008}

ghstack-source-id: 123596842

Test Plan: viewed generated html file

Reviewed By: rohan-varma

Differential Revision: D26812210

fbshipit-source-id: 65b70a45096b39f7d41a195e65b365b722645000
2021-03-10 16:14:34 -08:00
b6806308ac typo in docs ddp_comm_hooks.rst (#51986)
Summary:
Fixes a typo in ddp_comm_hooks.rst

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51986

Reviewed By: SciPioneer

Differential Revision: D26360314

Pulled By: mrshenli

fbshipit-source-id: 50349501c53823cbcbad0f72be7c6ac9d51a4120
2021-02-11 12:02:37 -08:00
9e4f3b89c4 [Gradient Compression] Add register_comm_hook API to DDP communication hooks documentation page (#51846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51846

The `register_comm_hook` method is defined in the DistributedDataParallel module, but it is not covered in `distributed.rst`. Since it is closely related to DDP communication hooks, add the docstrings to `ddp_comm_hooks.rst` instead of a reference.
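
For reference, registration is a one-liner on the DDP module (a sketch; `ddp_model` is assumed to be an initialized `DistributedDataParallel` instance):

```python
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

# state=None makes the hook fall back to the default process group.
ddp_model.register_comm_hook(state=None, hook=default_hooks.allreduce_hook)
```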

Screenshot:

{F370425625}
ghstack-source-id: 121278173

Test Plan:
view locally

python_doc_test:
https://app.circleci.com/pipelines/github/pytorch/pytorch/271234/workflows/dc0b443d-8a62-4334-9b42-800c33a68553/jobs/10770636

Reviewed By: rohan-varma

Differential Revision: D26298191

fbshipit-source-id: 32e0685fd3c935cf9a2d129e6c520a94aa3e3817
2021-02-08 15:12:39 -08:00
4b3c99ce4a [Resubmission] Add a documentation page for DDP communication hooks (#51773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51773

Resubmission of #51715.

Minor changes:
1) Removed "Note [Guidance to Tune ``matrix_approximation_rank`` And ``start_powerSGD_iter``]" in powerSGD_hook.py.

2) Removed the duplicate description of `torch.nn.parallel.DistributedDataParallel.register_comm_hook` in ddp_comm_hooks.rst, because it is already covered by distributed.rst.

Also updated the doc based on comments from PowerSGD paper author Thijs Vogels.

It seems that `python_doc_test` was flaky. The previous error message was not informative:
https://app.circleci.com/pipelines/github/pytorch/pytorch/270682/workflows/8d186a3c-d682-46bf-b617-ad4eef5991e2/jobs/10739143, and all the warnings also appeared on the master branch.

Rebasing to a new master branch seems to get this fixed:
https://app.circleci.com/pipelines/github/pytorch/pytorch/270696/workflows/1a3adbea-6443-4876-b87b-e17d90d41428/jobs/10740021/steps

Screenshot:

{F369899792}
ghstack-source-id: 121199613

Test Plan: View locally

Reviewed By: mingzhe09088

Differential Revision: D26272687

fbshipit-source-id: 6677db496a68171798940a80343f4d9a508e15db
2021-02-06 21:22:04 -08:00
d3023d86ba Revert D26249330: [Gradient Compression] Add a documentation page for DDP communication hooks
Test Plan: revert-hammer

Differential Revision:
D26249330 (e62aabac43)

Original commit changeset: ab973390ddb7

fbshipit-source-id: d508daed76219e7ca588cf7fb38aeaaffc61acfd
2021-02-04 22:38:06 -08:00
e62aabac43 [Gradient Compression] Add a documentation page for DDP communication hooks (#51715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51715

Add a documentation page for DDP communication hooks.

Screenshot:

{F369781049}

Test Plan: View locally

Reviewed By: pritamdamania87

Differential Revision: D26249330

fbshipit-source-id: ab973390ddb785c5191f587a1b2b6de7d229e50e
2021-02-04 18:53:53 -08:00