dda071587f
Revert "Make distributed modules importable even when backend not built ( #159889 )" ( #162568 )
...
This reverts commit a0d026688cd69583d5a4e0c6f3e5fda141a7f4a9.
Revert "Always build USE_DISTRIBUTED. (#160449 )"
This reverts commit d80297a6846f1f2c36fd4f19e22919f2abe8fcea.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162568
Approved by: https://github.com/huydhn
2025-09-10 04:29:42 +00:00
d80297a684
Always build USE_DISTRIBUTED. ( #160449 )
...
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab , https://github.com/albanD , https://github.com/dcci
2025-09-08 19:10:36 +00:00
1e0656f063
Revert "Always build USE_DISTRIBUTED. ( #160449 )"
...
This reverts commit de893e96c775023aa3be895060848fac3296772c.
Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to internal changes breaks import checks, see [D81845053](https://www.internalfb.com/diff/D81845053 ) ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3264887002 ))
2025-09-08 07:04:36 +00:00
de893e96c7
Always build USE_DISTRIBUTED. ( #160449 )
...
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab , https://github.com/albanD , https://github.com/dcci
2025-09-05 20:15:11 +00:00
adae7f66aa
Revert "Always build USE_DISTRIBUTED. ( #160449 )"
...
This reverts commit c37103234afc832dcad307e9016230810957c9d5.
Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal build rules, see D81756619 ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3259430011 ))
2025-09-05 18:58:47 +00:00
c37103234a
Always build USE_DISTRIBUTED. ( #160449 )
...
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab , https://github.com/albanD , https://github.com/dcci
2025-09-04 19:43:17 +00:00
b7dad7dd49
Revert "Always build USE_DISTRIBUTED. ( #160449 )"
...
This reverts commit 90b08643c3a6eb1f3265b7d1388bd76660759f46.
Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Already discussed with @ezyang about the internal quirks and errors ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3254219358 ))
2025-09-04 15:25:07 +00:00
90b08643c3
Always build USE_DISTRIBUTED. ( #160449 )
...
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab , https://github.com/albanD , https://github.com/dcci
2025-09-03 07:33:55 +00:00
4e42aa8ffc
Revert "Always build USE_DISTRIBUTED. ( #160449 )"
...
This reverts commit b7034e9c924412bfbe8ee25a22d7e95239b5ca65.
Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, can't be landed with forward fix due to internal tooling problems ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3246689684 ))
2025-09-02 20:28:42 +00:00
b7034e9c92
Always build USE_DISTRIBUTED. ( #160449 )
...
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab , https://github.com/albanD , https://github.com/dcci
2025-09-01 23:00:21 +00:00
42e51cd4b3
Support ddp zero hook XCCL path ( #159240 )
...
The XCCL backend does not have the https://github.com/pytorch/pytorch/issues/62300 issue, so add an XCCL path here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159240
Approved by: https://github.com/guangyey , https://github.com/Skylion007 , https://github.com/EikanWang
2025-08-13 12:37:33 +00:00
f6c89c1ef3
Detach tensor before clone in SGD optimiser and other code ( #159204 )
...
Reverse the pattern of tensor clone followed by detach in SGD and other code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159204
Approved by: https://github.com/Skylion007
2025-07-27 03:31:12 +00:00
e2c9d8d641
Fix non-bitwise type annotations for Tensor operators (see #145838 ) ( #146845 )
...
Fix https://github.com/pytorch/pytorch/issues/145838
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146845
Approved by: https://github.com/Skylion007
2025-06-24 15:41:34 +00:00
3443627e07
Revert "[BE]: Enable RUFF TRY400 rule - log.exception ( #153473 )"
...
This reverts commit 4f4ecc583e0f48ad2d062a53bf91c61ab40b4948.
Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075 ))
2025-05-16 08:29:26 +00:00
4f4ecc583e
[BE]: Enable RUFF TRY400 rule - log.exception ( #153473 )
...
Change logging.error to logging.exception to log additional information when relevant. A few logging.error calls have slipped into try/except blocks since I last did a cleanup here, and the rule is now stabilized, so I am enabling it codebase-wide. I have NOQA'd much of our custom exception stack trace handling for RPC calls and distributed, and tried to fix a few errors based on whether we immediately re-raised the exception or didn't print any exception information where it could be useful.
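As a rough illustration of what TRY400 asks for (a minimal sketch; the logger names and the `risky` helper are hypothetical and not taken from the PR): inside an `except` block, `logging.exception` logs at ERROR level and attaches the active traceback, whereas `logging.error` records only the message unless `exc_info` is passed.

```python
import io
import logging


def risky() -> None:
    # Hypothetical helper that always fails, for illustration only.
    raise ValueError("boom")


def run(use_exception: bool) -> str:
    """Log a failure from `risky` and return the captured log text."""
    stream = io.StringIO()
    logger = logging.getLogger(f"try400_demo_{use_exception}")
    logger.setLevel(logging.ERROR)
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream))
    try:
        risky()
    except ValueError:
        if use_exception:
            # Message plus the traceback of the exception being handled.
            logger.exception("risky() failed")
        else:
            # Message only; this is the pattern TRY400 flags.
            logger.error("risky() failed")
    return stream.getvalue()


with_error = run(use_exception=False)
with_exception = run(use_exception=True)
```

Only `with_exception` contains the traceback text, which is the extra information the rule is meant to preserve.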
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD , https://github.com/cyyever
2025-05-15 13:36:59 +00:00
3555ebb63d
[BE]: Update ruff to 0.11.8 ( #153249 )
...
Fixes a ton of false negatives throughout the codebase. RUFF also properly validates NOQA comments now and most of the changes are fixing typos there or removing filewide flake8 suppressions that were also silencing ruff issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153249
Approved by: https://github.com/cyyever , https://github.com/albanD , https://github.com/seemethere
2025-05-12 18:30:52 +00:00
686dff0098
Fix an incorrect link markup ( #152239 )
...
Remove extra whitespace so the link works correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152239
Approved by: https://github.com/soulitzer
2025-04-28 18:28:08 +00:00
995df34b19
[BE][PYFMT] migrate PYFMT for torch.{distributed,distributions} to ruff format ( #144547 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144547
Approved by: https://github.com/kwen2501
2025-02-28 07:35:56 +00:00
302f56a1f2
Revert "Fix non-bitwise type annotations for Tensor operators (see #145838 ) ( #146845 )"
...
This reverts commit 59b7e52ad8f6146b4364515a7f3e54d6f3edd6da.
Reverted https://github.com/pytorch/pytorch/pull/146845 on behalf of https://github.com/jeanschmidt due to Seems to break a few code dependencies in multiple places ([comment](https://github.com/pytorch/pytorch/pull/146845#issuecomment-2666656834 ))
2025-02-18 19:01:27 +00:00
59b7e52ad8
Fix non-bitwise type annotations for Tensor operators (see #145838 ) ( #146845 )
...
Fix https://github.com/pytorch/pytorch/issues/145838
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146845
Approved by: https://github.com/Skylion007
2025-02-17 22:42:16 +00:00
00ffeca1b1
PEP585 update - torch/distributed ( #145164 )
...
See #145101 for details.
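For readers unfamiliar with PEP 585, a minimal sketch of the kind of rewrite involved (the functions below are hypothetical, not taken from the PR): since Python 3.9 the builtin containers can be subscripted directly, so `typing.List`, `typing.Dict`, and `typing.Tuple` are no longer needed in annotations.

```python
from typing import Dict, List, Tuple


# Before: generic aliases imported from typing.
def shard_sizes_old(sizes: List[int], world_size: int) -> Dict[int, Tuple[int, ...]]:
    return {rank: tuple(sizes[rank::world_size]) for rank in range(world_size)}


# After (PEP 585, Python 3.9+): builtin generics, no typing import required.
def shard_sizes_new(sizes: list[int], world_size: int) -> dict[int, tuple[int, ...]]:
    return {rank: tuple(sizes[rank::world_size]) for rank in range(world_size)}
```

The two functions behave identically at runtime; only the spelling of the annotations changes.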
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145164
Approved by: https://github.com/bobrenjc93
2025-01-21 04:23:29 +00:00
6374332d33
Revert "PEP585 update - torch/distributed ( #145164 )"
...
This reverts commit 6cb186e279bc179a6bb63f0226e24ab42a07b394.
Reverted https://github.com/pytorch/pytorch/pull/145164 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing an inductor test ([comment](https://github.com/pytorch/pytorch/pull/145164#issuecomment-2602875679 ))
2025-01-20 16:46:46 +00:00
6cb186e279
PEP585 update - torch/distributed ( #145164 )
...
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145164
Approved by: https://github.com/bobrenjc93
2025-01-20 00:19:01 +00:00
176cde6240
Use torch with statement in torch distributed module ( #144951 )
...
# Motivation
In https://github.com/pytorch/pytorch/pull/137678 , we used device-agnostic APIs to generalize the distributed module. As this [comment](https://github.com/pytorch/pytorch/pull/137678#discussion_r1828645683 ) said, we will use the `with` statement of `torch.Stream` once https://github.com/pytorch/pytorch/pull/140138 has landed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144951
Approved by: https://github.com/kwen2501 , https://github.com/albanD
2025-01-17 01:49:28 +00:00
08be9ec312
Migrate from Tuple -> tuple in torch/distributed ( #144258 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144258
Approved by: https://github.com/aorenste
2025-01-10 08:34:54 +00:00
e1196dfe51
Deprecate torch._utils.is_compiling() ( #127690 )
...
This PR is split from PR #126898 .
- #126898
------
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690
Approved by: https://github.com/Skylion007 , https://github.com/malfet
2024-12-08 22:55:36 +00:00
08db735629
[BE]: Update mypy to 1.13.0 ( #140808 )
...
Update mypy to 1.13.0. Should hopefully reduce linting time. Has support for orjson cache serialization which should improve mypy cache perf if orjson is installed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140808
Approved by: https://github.com/ezyang , https://github.com/malfet
2024-12-03 02:50:10 +00:00
daa77f3d9f
Revert "[BE]: Update mypy to 1.13.0 ( #140808 )"
...
This reverts commit 00134d68af2ce50560fa5a74473665ea229e6c9d.
Reverted https://github.com/pytorch/pytorch/pull/140808 on behalf of https://github.com/huydhn due to This is failing a distributed test in trunk, target determination missed this test and did not run it on PR ([comment](https://github.com/pytorch/pytorch/pull/140808#issuecomment-2512788426 ))
2024-12-02 20:47:43 +00:00
00134d68af
[BE]: Update mypy to 1.13.0 ( #140808 )
...
Update mypy to 1.13.0. Should hopefully reduce linting time. Has support for orjson cache serialization which should improve mypy cache perf if orjson is installed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140808
Approved by: https://github.com/ezyang , https://github.com/malfet
2024-12-02 18:47:54 +00:00
c82c46ccc7
[C10D] support group_src/dst in broadcast/reduce ops ( #140843 )
...
Also add mypy annotations
Partially addresses RFC 0042 (pytorch/rfcs#71)
See more details/motivation in #140460
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140843
Approved by: https://github.com/kwen2501
2024-11-19 01:23:08 +00:00
1886e33f60
Use device-agnostic runtime API in distributed DDP/FSDP instead of cuda device specific. ( #137678 )
...
# Motivation
This PR aims to use the device-agnostic runtime API in distributed DDP/FSDP instead of `cuda`-specific calls.
cc [@jgong5](https://github.com/jgong5 ) [@gujinghui](https://github.com/gujinghui ) [@EikanWang](https://github.com/EikanWang ) [@fengyuan14](https://github.com/fengyuan14 ) [@guangyey](https://github.com/guangyey )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137678
Approved by: https://github.com/kwen2501 , https://github.com/guangyey , https://github.com/jgong5
2024-11-13 05:32:19 +00:00
1d28b8b6d5
Revert "Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() ( #127690 )"
...
This reverts commit e84d1121ad66a453c8c24fcc098625e2e9764fca.
Reverted https://github.com/pytorch/pytorch/pull/127690 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. More details in D65483292 ([comment](https://github.com/pytorch/pytorch/pull/127690#issuecomment-2458381056 ))
2024-11-05 23:10:38 +00:00
e84d1121ad
Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() ( #127690 )
...
This PR is split from PR #126898 .
- #126898
------
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690
Approved by: https://github.com/Skylion007 , https://github.com/malfet
2024-11-05 10:44:56 +00:00
9d7a0869f0
Make DDP Quantization hooks backend Agnostic ( #138816 )
...
The current DDP quantization hook code uses the `.cuda()` API to move tensors and parameters onto backend devices, so only the CUDA backend works with DDP quantization hooks.
This change makes the code backend-agnostic by moving tensors/parameters based on **tensor.device**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138816
Approved by: https://github.com/kwen2501
2024-10-29 15:02:45 +00:00
c0582fd0f8
Remove unused Python variables in torch/[b-z]* ( #136963 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963
Approved by: https://github.com/ezyang
2024-10-19 16:45:22 +00:00
82443798aa
[Distributed] Refactor compress hook to remove duplicated code ( #138182 )
...
Fix TODO in code
```python
# TODO: create an internal helper function and extract the duplicate code in FP16_compress and BF16_compress.
```
1. Extract common logic in `fp16_compress_hook` and `bf16_compress_hook` to `_compress_hook` method
2. Let `fp16_compress_hook` and `bf16_compress_hook` invoke `_compress_hook` with a different `dtype`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138182
Approved by: https://github.com/awgu
2024-10-18 06:01:15 +00:00
31715be72a
[BE]: Update mypy to 1.11.2 ( #133816 )
...
Updates mypy to 1.11.2 to improve type inference
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816
Approved by: https://github.com/ezyang
2024-09-16 19:44:11 +00:00
3117f2cf67
Revert "[BE]: Update mypy to 1.11.2 ( #133816 )"
...
This reverts commit 55299cfc223fa838aadd8d6d6fa3ed541fa5acd1.
Reverted https://github.com/pytorch/pytorch/pull/133816 on behalf of https://github.com/jeanschmidt due to seems to have broken https://github.com/pytorch/pytorch/actions/runs/10865710499/job/30155699792 on main ([comment](https://github.com/pytorch/pytorch/pull/133816#issuecomment-2352377684 ))
2024-09-16 09:11:16 +00:00
55299cfc22
[BE]: Update mypy to 1.11.2 ( #133816 )
...
Updates mypy to 1.11.2 to improve type inference
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816
Approved by: https://github.com/ezyang
2024-09-14 21:40:36 +00:00
758a0a88a2
[BE][Easy] enable ruff rule PIE790: unnecessary pass statement ( #133200 )
...
This PR removes unnecessary `pass` statements. This is semantically safe because the bytecode for the Python code does not change.
Note that if a function has a docstring, an otherwise empty function does not need a `pass` statement as a placeholder.
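A small illustration of the docstring case (hypothetical functions, not taken from the PR): a body consisting only of a docstring is already a valid function body, so a trailing `pass` adds nothing.

```python
def with_pass() -> None:
    """Do nothing."""
    pass  # flagged by PIE790: the docstring already forms the body


def without_pass() -> None:
    """Do nothing."""
```

Both functions return `None` when called and carry the same docstring.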
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133200
Approved by: https://github.com/malfet , https://github.com/eqy , https://github.com/kit1980
2024-08-15 15:50:19 +00:00
cbee9c1fd2
Revert "Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() ( #127690 )"
...
This reverts commit 0e7e61f7cec82a43f2de52b83eff152d703be7a3.
Reverted https://github.com/pytorch/pytorch/pull/127690 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/127690#issuecomment-2272370386 ))
2024-08-07 00:05:20 +00:00
0e7e61f7ce
Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() ( #127690 )
...
This PR is split from PR #126898 .
- #126898
------
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690
Approved by: https://github.com/Skylion007 , https://github.com/malfet
2024-08-03 09:43:38 +00:00
72d2dba992
Add None return type to init ( #132335 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335
Approved by: https://github.com/albanD
2024-08-01 15:26:45 +00:00
1dd10ac802
[BE] [Reland] Make nn.Module state_dict load_state_dict pre-hook and state_dict post-hook public ( #131690 )
...
Reland https://github.com/pytorch/pytorch/pull/126704
#### Fixes the issue with type of `nn.Module._state_dict_hooks` being changed in that PR which was problematic:
Instead of using `Tuple(Callable, bool)` to keep track of whether the private `_register_state_dict_hook` or the public `register_state_dict_post_hook` API was used to register the hook and toggle the behavior accordingly, I set an attribute on the Callable in the private API, which is never cleaned up.
If a callable previously registered using the private API is registered via the public API, a RuntimeError will be raised.
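The mechanism described above can be sketched roughly as follows. This is a standalone toy, not the `nn.Module` code: `HookRegistry`, the `_from_private_api` attribute, and the method names are all hypothetical and chosen only to show the idea of marking the callable instead of changing the container's element type.

```python
from typing import Any, Callable

Hook = Callable[..., Any]


class HookRegistry:
    """Toy registry; names are illustrative, not torch.nn.Module internals."""

    def __init__(self) -> None:
        # Stays a plain list of callables, not a list of (callable, flag) tuples.
        self.hooks: list[Hook] = []

    def _register_private(self, hook: Hook) -> None:
        # Mark the callable itself. The marker is never removed.
        hook._from_private_api = True  # type: ignore[attr-defined]
        self.hooks.append(hook)

    def register_public(self, hook: Hook) -> None:
        # A callable that already went through the private API is rejected.
        if getattr(hook, "_from_private_api", False):
            raise RuntimeError("hook was previously registered via the private API")
        self.hooks.append(hook)

    def run(self, state: dict) -> list[str]:
        # Call every hook and report which registration path each one took.
        kinds = []
        for hook in self.hooks:
            private = getattr(hook, "_from_private_api", False)
            kinds.append("private" if private else "public")
            hook(state)
        return kinds
```

Because the marker lives on the callable, code that inspects the hook container still sees bare callables, which is the type stability the reland is after.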
#### Copied from previous PR description
Fixes https://github.com/pytorch/pytorch/issues/75287 and https://github.com/pytorch/pytorch/issues/117437
- `nn.Module._register_state_dict_hook` --> add public `nn.Module.register_state_dict_post_hook`
- Add a test as this API was previously untested
- `nn.Module._register_load_state_dict_pre_hook` --> add public `nn.Module.register_load_state_dict_pre_hook` (remove the `with_module` flag, default it to `True`)
~- For consistency with optimizer `load_state_dict_pre_hook` raised by @janeyx99, allow the pre-hook to return a new `state_dict`~
- For the issue pointed out by https://github.com/pytorch/pytorch/issues/117437 regarding the `_register_state_dict_hook` semantic of returning a new state_dict only being respected for the root for the private hook
- Document this for private `_register_state_dict_hook`
- Remove this for the public `register_state_dict_post_hook`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131690
Approved by: https://github.com/albanD
2024-07-26 18:14:07 +00:00
f6edd1f7c9
[BE] Make ActivationWrapper an abstract class ( #129808 )
...
Fixes #95481
Test Plan:
Unit tested checkpoint_wrapper.py by instantiating ActivationWrapper and got TypeError as expected.
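The behavior checked in the test plan can be reproduced in miniature with the standard `abc` module. The classes below are hypothetical stand-ins and are not the actual `ActivationWrapper` definition.

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Hypothetical stand-in for an abstract wrapper base class."""

    @abstractmethod
    def forward(self, x):
        """Subclasses must implement this."""


class Doubling(Wrapper):
    def forward(self, x):
        return 2 * x


try:
    Wrapper()  # abstract classes cannot be instantiated directly
except TypeError as exc:
    error = str(exc)
else:
    error = None
```

Instantiating the base class raises `TypeError`, while a subclass that implements every abstract method can be constructed normally.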
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129808
Approved by: https://github.com/Skylion007
2024-07-02 04:29:43 +00:00
e6d4451ae8
[BE][Easy] enable UFMT for torch/distributed/{algorithms,autograd,benchmarks,checkpoint,elastic}/ ( #128866 )
...
Part of #123062
- #123062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128866
Approved by: https://github.com/fegin
2024-06-18 13:51:53 +00:00
1d233b8f50
Revert "Make nn.Module state_dict load_state_dict pre-hook and state_dict post hook public ( #126704 )"
...
This reverts commit c38b3381a12a0ec033dd417827c530c4474b8165.
Reverted https://github.com/pytorch/pytorch/pull/126704 on behalf of https://github.com/clee2000 due to broke internal typecheck D58394110 (which probably means the code wouldn't work either but I guess it didn't run on the diff). Probably an easy fix? ([comment](https://github.com/pytorch/pytorch/pull/126704#issuecomment-2161299193 ))
2024-06-11 17:45:20 +00:00
c38b3381a1
Make nn.Module state_dict load_state_dict pre-hook and state_dict post hook public ( #126704 )
...
Fixes https://github.com/pytorch/pytorch/issues/75287 and https://github.com/pytorch/pytorch/issues/117437
- `nn.Module._register_state_dict_hook` --> add public `nn.Module.register_state_dict_post_hook`
- Add a test as this API was previously untested
- `nn.Module._register_load_state_dict_pre_hook` --> add public `nn.Module.register_load_state_dict_pre_hook` (remove the `with_module` flag, default it to `True`)
~- For consistency with optimizer `load_state_dict_pre_hook` raised by @janeyx99, allow the pre-hook to return a new `state_dict`~
- Document issue pointed out by https://github.com/pytorch/pytorch/issues/117437 regarding `_register_state_dict_hook` semantic of returning a new state_dict only being respected for the root for private hook
- Remove this for the public `register_state_dict_post_hook`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126704
Approved by: https://github.com/albanD
ghstack dependencies: #126906
2024-06-10 21:50:17 +00:00
90bb510ece
Revert "Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() ( #127690 )"
...
This reverts commit 348b181a97abc2e636a6c18e5880a78e5d1dab94.
Reverted https://github.com/pytorch/pytorch/pull/127690 on behalf of https://github.com/clee2000 due to sorry I think https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456 is still relevant, I will reach out to them to see what needs to be done in internal to get this remerged ([comment](https://github.com/pytorch/pytorch/pull/127690#issuecomment-2159248859 ))
2024-06-10 20:44:42 +00:00
3a0d088517
Flip default value for mypy disallow_untyped_defs [5/11] ( #127842 )
...
See #127836 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127842
Approved by: https://github.com/oulgen
2024-06-08 18:49:18 +00:00