00059db034
Revert "[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )"
...
This reverts commit 09cb34c1dce8fe1b880bbf3115d8ddad3401d871.
Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/malfet due to reverted internally and now can be safely reverted in OSS ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3334176367 ))
2025-09-25 13:47:46 +00:00
09cb34c1dc
[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )
...
Summary:
Original: D81957844 and D81957923
Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well
#buildall
Test Plan:
sandcastle and oss ci
Rollback Plan:
Reviewed By: H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang , https://github.com/dcci
2025-09-22 21:12:18 +00:00
f0078941cf
Revert "[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )"
...
This reverts commit 6c334885d48725197b5d35e2c1543efc0f4198d0.
Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/wdvr due to reverted internally - @ezyang see D82281294 ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3317017530 ))
2025-09-22 05:39:07 +00:00
6c334885d4
[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )
...
Summary:
Original: D81957844 and D81957923
Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well
#buildall
Test Plan:
sandcastle and oss ci
Rollback Plan:
Reviewed By: H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang , https://github.com/dcci
2025-09-12 10:54:42 +00:00
6b59a19242
Revert "[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )"
...
This reverts commit 6e8f17c58029e5fa6bc222b2445ebbc0cbdc17c7.
Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/huydhn due to Reverted internally ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3283985880 ))
2025-09-12 06:52:03 +00:00
6e8f17c580
[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )
...
Summary:
Original: D81957844 and D81957923
Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well
#buildall
Test Plan:
sandcastle and oss ci
Rollback Plan:
Reviewed By: H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang , https://github.com/dcci
2025-09-12 03:56:18 +00:00
dda071587f
Revert "Make distributed modules importable even when backend not built ( #159889 )" ( #162568 )
...
This reverts commit a0d026688cd69583d5a4e0c6f3e5fda141a7f4a9.
Revert "Always build USE_DISTRIBUTED. (#160449 )"
This reverts commit d80297a6846f1f2c36fd4f19e22919f2abe8fcea.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162568
Approved by: https://github.com/huydhn
2025-09-10 04:29:42 +00:00
a0d026688c
Make distributed modules importable even when backend not built ( #159889 )
...
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may be undefined because a backend is not enabled.
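The stubbing idea can be sketched roughly as follows. This is not the code from the PR: `optional_symbol` and the module names are hypothetical, and the sketch only assumes that a missing backend shows up as an `ImportError` or `AttributeError` at lookup time.

```python
import importlib
from typing import Any, Callable


def optional_symbol(module_name: str, attr: str) -> Callable[..., Any]:
    """Return module_name.attr, or a stub that raises on call if it is missing.

    Hypothetical helper for illustration; importing the caller stays safe
    even when the backend that defines `attr` was not built.
    """
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr)
    except (ImportError, AttributeError):

        def _stub(*args: Any, **kwargs: Any) -> Any:
            # The failure is deferred from import time to call time.
            raise RuntimeError(
                f"{module_name}.{attr} is unavailable: backend not built"
            )

        _stub.__name__ = attr
        return _stub


# A symbol that exists is returned unchanged.
sqrt = optional_symbol("math", "sqrt")

# A symbol from a module that does not exist becomes a stub.
all_reduce = optional_symbol("no_such_backend_module", "all_reduce")
```

With this shape, a module that references `all_reduce` can still be imported; only an actual call raises.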
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-08 19:10:36 +00:00
29e09a6545
Revert "Make distributed modules importable even when backend not built ( #159889 )"
...
This reverts commit 01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b.
Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to internal changes breaks import checks, see [D81845053](https://www.internalfb.com/diff/D81845053 ) ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3264887002 ))
2025-09-08 07:04:36 +00:00
01edcd4df8
Make distributed modules importable even when backend not built ( #159889 )
...
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may be undefined because a backend is not enabled.
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-05 20:15:11 +00:00
70f865ac9b
Revert "Make distributed modules importable even when backend not built ( #159889 )"
...
This reverts commit ef3be6726f7ff4b77c22db10cec5b686f9107ea9.
Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to Breaking internal build rules, see D81756619 ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3259430011 ))
2025-09-05 18:58:47 +00:00
ef3be6726f
Make distributed modules importable even when backend not built ( #159889 )
...
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may be undefined because a backend is not enabled.
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-04 20:05:50 +00:00
34aa78274d
Revert "Make distributed modules importable even when backend not built ( #159889 )"
...
This reverts commit 4ae57d448c0a7d37e4cfd5c27d977fad2cef4051.
Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to Failing internal tests, probably typechecks. See D81588399 ([comment](https://github.com/pytorch/pytorch/pull/159889#issuecomment-3253651785 ))
2025-09-04 13:13:52 +00:00
4ae57d448c
Make distributed modules importable even when backend not built ( #159889 )
...
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may be undefined because a backend is not enabled.
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-03 07:33:55 +00:00
420c52ecf3
Revert "Make distributed modules importable even when backend not built ( #159889 )"
...
This reverts commit 626cb7df8161dd4ecb4fe43b60f37ce9076f56b1.
Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, can't be landed with forward fix due to internal tooling problems ([comment](https://github.com/pytorch/pytorch/pull/159889#issuecomment-3246677982 ))
2025-09-02 20:24:01 +00:00
626cb7df81
Make distributed modules importable even when backend not built ( #159889 )
...
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may be undefined because a backend is not enabled.
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-01 23:00:21 +00:00
4ccc0381de
[BE][5/16] fix typos in torch/ (torch/distributed/) ( #156315 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156315
Approved by: https://github.com/Skylion007 , https://github.com/albanD
ghstack dependencies: #156313 , #156314
2025-06-23 02:57:28 +00:00
145d4cdc11
Revert "[BE][5/16] fix typos in torch/ (torch/distributed/) ( #156315 )"
...
This reverts commit c2f0292bd5b4b3206f5b295e96f81cd6c178eb18.
Reverted https://github.com/pytorch/pytorch/pull/156315 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912 ) [HUD commit link](c95f7fa874) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213 ))
2025-06-22 12:31:57 +00:00
c2f0292bd5
[BE][5/16] fix typos in torch/ (torch/distributed/) ( #156315 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156315
Approved by: https://github.com/Skylion007 , https://github.com/albanD
ghstack dependencies: #156313 , #156314
2025-06-22 08:43:26 +00:00
94dc3253a0
[BE][Easy] enable UFMT for torch/distributed/ ( #128870 )
...
Part of #123062
- #123062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128870
Approved by: https://github.com/fegin , https://github.com/wconstab
2024-06-22 18:53:28 +00:00
9c929f6ce9
Revert "[BE][Easy] enable UFMT for torch/distributed/ ( #128870 )"
...
This reverts commit a0e1e20c4157bb3e537fc784a51d7aef1e754157.
Reverted https://github.com/pytorch/pytorch/pull/128870 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/128870#issuecomment-2181780356 ))
2024-06-21 00:38:28 +00:00
a0e1e20c41
[BE][Easy] enable UFMT for torch/distributed/ ( #128870 )
...
Part of #123062
- #123062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128870
Approved by: https://github.com/fegin
ghstack dependencies: #128868 , #128869
2024-06-18 21:49:08 +00:00
9cc040fef6
Switch env variable use in test harnesses to the non-deprecated names to fix warnings ( #114880 )
...
Previously:
```
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
```
With this PR, those warnings disappear. They were introduced in #114077
This change was generated with this sed script, applied with `sed -i -f /tmp/x **/*.{py,hpp,cpp,cc}` and hand inspected.
```
s/\bNCCL_BLOCKING_WAIT\b/TORCH_NCCL_BLOCKING_WAIT/g
s/\bNCCL_ENABLE_TIMING\b/TORCH_NCCL_ENABLE_TIMING/g
s/\bNCCL_DESYNC_DEBUG\b/TORCH_NCCL_DESYNC_DEBUG/g
s/\bNCCL_ASYNC_ERROR_HANDLING\b/TORCH_NCCL_ASYNC_ERROR_HANDLING/g
s/\bENABLE_NCCL_HEALTH_CHECK\b/TORCH_ENABLE_NCCL_HEALTH_CHECK/g
s/\bNCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK\b/TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK/g
```
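For readers without `sed` at hand, the same renames can be sketched in Python. The mapping is copied from the script above; the function name `rename_env_vars` is hypothetical, and the sketch uses a single combined pattern rather than six sequential substitutions (equivalent here, since no replacement text matches another rule).

```python
import re

# Old name -> new name, copied from the sed script above.
RENAMES = {
    "NCCL_BLOCKING_WAIT": "TORCH_NCCL_BLOCKING_WAIT",
    "NCCL_ENABLE_TIMING": "TORCH_NCCL_ENABLE_TIMING",
    "NCCL_DESYNC_DEBUG": "TORCH_NCCL_DESYNC_DEBUG",
    "NCCL_ASYNC_ERROR_HANDLING": "TORCH_NCCL_ASYNC_ERROR_HANDLING",
    "ENABLE_NCCL_HEALTH_CHECK": "TORCH_ENABLE_NCCL_HEALTH_CHECK",
    "NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK": (
        "TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK"
    ),
}

# \b on both sides mirrors the sed rules: only whole identifiers match.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rename_env_vars(text: str) -> str:
    """Replace each deprecated variable name in `text` with its new name."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], text)
```

Because `_` counts as a word character, there is no word boundary inside `TORCH_NCCL_BLOCKING_WAIT`, so an already-renamed name is left alone and the rewrite is safe to run twice.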
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114880
Approved by: https://github.com/kwen2501
2023-12-01 20:08:23 +00:00
ff51f94e32
[Reland] Fix default timeouts for python entrypoints (e.g. init_process_group) ( #113094 )
...
Previous PRs changed the C++ default timeout for PGNccl, but this path
was only hit in some cases, and the Python defaults took over in other
cases.
This PR ensures that the NCCL pg always defaults to the changed NCCL-specific
timeout value.
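A minimal sketch of the intended behavior, assuming the default is chosen in one place from the backend name. The function `resolve_timeout` and both constants are hypothetical, and the durations are illustrative placeholders rather than values read from torch.

```python
from datetime import timedelta
from typing import Optional

# Illustrative placeholders; the real defaults live in torch.
DEFAULT_PG_TIMEOUT = timedelta(minutes=30)
DEFAULT_PG_NCCL_TIMEOUT = timedelta(minutes=10)


def resolve_timeout(backend: str, timeout: Optional[timedelta] = None) -> timedelta:
    """Pick the effective timeout for a process group.

    An explicit timeout always wins; otherwise NCCL gets its own default
    and every other backend gets the generic one.
    """
    if timeout is not None:
        return timeout
    if backend.lower() == "nccl":
        return DEFAULT_PG_NCCL_TIMEOUT
    return DEFAULT_PG_TIMEOUT
```

Routing every Python entrypoint through one such function is what keeps the NCCL-specific default from being overridden by a generic Python-side default.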
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113094
Approved by: https://github.com/fduwjj
2023-11-07 05:34:26 +00:00
75adb9f371
Revert "Fix default timeouts for python entrypoints (e.g. init_process_group) ( #112893 )"
...
This reverts commit f9d47e13813bbefc9f19a6c0430b7122f9d09b91.
Reverted https://github.com/pytorch/pytorch/pull/112893 on behalf of https://github.com/clee2000 due to sorry this seems to have broken inductor f9d47e1381
https://github.com/pytorch/pytorch/actions/runs/6776367936/job/18418174752 ([comment](https://github.com/pytorch/pytorch/pull/112893#issuecomment-1796979811 ))
2023-11-06 22:49:53 +00:00
f9d47e1381
Fix default timeouts for python entrypoints (e.g. init_process_group) ( #112893 )
...
Previous PRs changed the C++ default timeout for PGNccl, but this path
was only hit in some cases, and the Python defaults took over in other
cases.
This PR ensures that the NCCL pg always defaults to the changed NCCL-specific
timeout value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112893
Approved by: https://github.com/xw285cornell , https://github.com/kwen2501 , https://github.com/XilunWu
ghstack dependencies: #112611 , #112803
2023-11-06 20:48:39 +00:00
43ad172c54
make ProcessGroupDefaultTimeout the same as python ( #56549 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56549
This makes `kProcessGroupDefaultTimeout` the same as on the Python
side, and the Python side now uses the pybind value directly instead.
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D27899190
Pulled By: wanchaol
fbshipit-source-id: 388a7f42358b0abed75cf4934fb7b311fd33fee6
2021-04-21 17:56:05 -07:00
5e2f17d77a
Add NCCL_ASYNC_ERROR_HANDLING to docs ( #46856 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46856
Add a reference to NCCL_ASYNC_ERROR_HANDLING in the pytorch docs,
similar to how NCCL_BLOCKING_WAIT is currently described.
ghstack-source-id: 115186877
Test Plan: CI, verifying docs change
Reviewed By: jiayisuse
Differential Revision: D24541822
fbshipit-source-id: a0b3e843bc6392d2787a4bb270118f2dfda5f4ec
2020-10-26 14:41:32 -07:00
6cb9e6b015
Back out "Revert D19871946: [distributed] pass in timeout to TCP store when initializing" ( #33434 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33434
Reland of https://github.com/pytorch/pytorch/pull/33325 , since the
unit test was flaky and failed on land.
To ensure that the test is not flaky, I bumped the timeout so the rendezvous
does not time out (timing out the rendezvous in 1s led to the flakiness). I also
generalized our mechanism for retrying on errors to include retrying on errors
due to timeout in rendezvous.
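The retry idea can be sketched as a small decorator. This is not the test-harness code from the PR: `retry_on_timeout` is a hypothetical name, and the sketch assumes the rendezvous failure surfaces as Python's built-in `TimeoutError`.

```python
import functools
import time


def retry_on_timeout(max_retries: int = 3, delay_s: float = 0.0):
    """Retry the wrapped function when it raises TimeoutError.

    Hypothetical helper: other exceptions propagate immediately, and the
    last TimeoutError is re-raised once the retries are used up.
    """

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except TimeoutError:
                    if attempt == max_retries:
                        raise
                    time.sleep(delay_s)

        return wrapper

    return decorator


calls = {"n": 0}


@retry_on_timeout(max_retries=3)
def flaky_rendezvous():
    # Stand-in for a rendezvous that times out twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("rendezvous timed out")
    return "connected"
```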
ghstack-source-id: 98558377
Test Plan: Added UT test_tcp_store_timeout_set
Differential Revision: D19935390
fbshipit-source-id: 56ccf8c333dd2f954a33614d35cd1642d4e9473a
2020-02-19 17:17:17 -08:00
d4e4beddc4
Revert D19871946: [distributed] pass in timeout to TCP store when initializing
...
Test Plan: revert-hammer
Differential Revision: D19871946
Original commit changeset: dd002180c4c8
fbshipit-source-id: 40b0676c51e43366c0700e81d16cc7927ee8efc2
2020-02-16 19:37:44 -08:00
df47a3abe0
[distributed] pass in timeout to TCP store when initializing ( #33325 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33325
Closes https://github.com/pytorch/pytorch/issues/32924 . There was a bug where for TCPStore, we would not respect the timeout passed into `init_process_group` while constructing the TCPStore. Instead, we'd set the timeout after the rendezvous created the store, meaning that we used the default timeout of 300s while connecting to the server. This diff passes the timeout passed into `init_process_group` to rendezvous so that it can be passed into the constructor for TCPStore, so that we can use the right timeout at construction time.
Question: Should we make this change for FileStore as well? Currently the FileStore constructor does not take in a timeout at all.
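The ordering bug described above can be illustrated with a toy store. `FakeStore` and both rendezvous functions are hypothetical stand-ins, not the TCPStore API; the only assumption taken from the summary is that the store connects in its constructor with a 300s default.

```python
from datetime import timedelta

DEFAULT_STORE_TIMEOUT = timedelta(seconds=300)


class FakeStore:
    """Toy store that 'connects' in its constructor (hypothetical)."""

    def __init__(self, timeout: timedelta = DEFAULT_STORE_TIMEOUT):
        self.timeout = timeout
        # Record which timeout was in effect while connecting.
        self.connect_timeout = timeout

    def set_timeout(self, timeout: timedelta) -> None:
        self.timeout = timeout


def rendezvous_before(timeout: timedelta) -> FakeStore:
    # Old behavior: construct first, apply the user's timeout afterwards,
    # so the connection itself still used the default.
    store = FakeStore()
    store.set_timeout(timeout)
    return store


def rendezvous_after(timeout: timedelta) -> FakeStore:
    # Fixed behavior: the user's timeout reaches the constructor.
    return FakeStore(timeout=timeout)
```

In both versions the store ends up with the requested `timeout`; the difference is only in `connect_timeout`, which is why the bug was visible solely while connecting to the server.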
ghstack-source-id: 98401875
Test Plan: Added a UT
Differential Revision: D19871946
fbshipit-source-id: dd002180c4c883216645b8a97cc472c6116ac117
2020-02-16 17:59:44 -08:00