d2494cbb2b
Revert "[distributed] Replace assert statements with AssertionError exceptions ( #165216 )"
...
This reverts commit 74db92b21868b7e9e77cc966e5d57a8246723cbd.
Reverted https://github.com/pytorch/pytorch/pull/165216 on behalf of https://github.com/clee2000 due to I think this broke distributed/test_pg_wrapper.py::ProcessGroupNCCLWrapperTest::test_debug_level_detail_no_gloo [GH job link](https://github.com/pytorch/pytorch/actions/runs/18492765290/job/52693842750 ) [HUD commit link](74db92b218), note to self: bad TD ([comment](https://github.com/pytorch/pytorch/pull/165216#issuecomment-3402838765 ))
2025-10-14 17:05:16 +00:00
74db92b218
[distributed] Replace assert statements with AssertionError exceptions ( #165216 )
...
Replaces 71 assert statements across 11 files in `torch.distributed` with explicit `if` checks that raise `AssertionError`, so the checks are not stripped when Python runs with the `-O` flag.
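A minimal sketch of the difference, not taken from the PR itself: the function names and the `world_size` check are hypothetical, and `compile(..., optimize=1)` stands in for running the interpreter with `-O`.

```python
# Hypothetical source containing both styles of check.
SRC = '''
def with_assert(world_size):
    assert world_size > 0, "world_size must be positive"
    return world_size

def with_explicit_check(world_size):
    if not world_size > 0:
        raise AssertionError("world_size must be positive")
    return world_size
'''


def load(optimize: int) -> dict:
    """Compile SRC at the given optimization level and return its namespace."""
    namespace: dict = {}
    # optimize=1 removes assert statements, like `python -O` does.
    exec(compile(SRC, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace


def raises_assertion(fn, arg) -> bool:
    """Return True if fn(arg) raises AssertionError."""
    try:
        fn(arg)
    except AssertionError:
        return True
    return False


optimized = load(optimize=1)
# The assert statement is gone, so the invalid value passes through silently.
assert_was_stripped = not raises_assertion(optimized["with_assert"], 0)
# The explicit check is ordinary code and still raises.
explicit_check_kept = raises_assertion(optimized["with_explicit_check"], 0)
```

Both flags end up `True`: only the explicit `if`/`raise` form still guards the value once optimization is enabled.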
Fixes #164878
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165216
Approved by: https://github.com/albanD
2025-10-14 09:58:59 +00:00
00059db034
Revert "[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )"
...
This reverts commit 09cb34c1dce8fe1b880bbf3115d8ddad3401d871.
Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/malfet due to reverted internally and now can be safely reverted in OSS ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3334176367 ))
2025-09-25 13:47:46 +00:00
09cb34c1dc
[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )
...
Summary:
Original: D81957844 and D81957923
Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well
#buildall
Test Plan:
sandcastle and oss ci
Rollback Plan:
Reviewed By: H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang , https://github.com/dcci
2025-09-22 21:12:18 +00:00
f0078941cf
Revert "[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )"
...
This reverts commit 6c334885d48725197b5d35e2c1543efc0f4198d0.
Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/wdvr due to reverted internally - @ezyang see D82281294 ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3317017530 ))
2025-09-22 05:39:07 +00:00
6c334885d4
[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )
...
Summary:
Original: D81957844 and D81957923
Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well
#buildall
Test Plan:
sandcastle and oss ci
Rollback Plan:
Reviewed By: H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang , https://github.com/dcci
2025-09-12 10:54:42 +00:00
6b59a19242
Revert "[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )"
...
This reverts commit 6e8f17c58029e5fa6bc222b2445ebbc0cbdc17c7.
Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/huydhn due to Reverted internally ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3283985880 ))
2025-09-12 06:52:03 +00:00
6e8f17c580
[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )
...
Summary:
Original: D81957844 and D81957923
Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well
#buildall
Test Plan:
sandcastle and oss ci
Rollback Plan:
Reviewed By: H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang , https://github.com/dcci
2025-09-12 03:56:18 +00:00
dda071587f
Revert "Make distributed modules importable even when backend not built ( #159889 )" ( #162568 )
...
This reverts commit a0d026688cd69583d5a4e0c6f3e5fda141a7f4a9.
Revert "Always build USE_DISTRIBUTED. (#160449 )"
This reverts commit d80297a6846f1f2c36fd4f19e22919f2abe8fcea.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162568
Approved by: https://github.com/huydhn
2025-09-10 04:29:42 +00:00
a0d026688c
Make distributed modules importable even when backend not built ( #159889 )
...
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may not be defined because a backend is not enabled.
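As a rough illustration of the stubbing idea, and not the code from this PR, one generic pattern is to fall back to a placeholder that fails only when called. The helper name `load_backend_symbol` and the module names below are hypothetical.

```python
import importlib
from typing import Any, Callable


def load_backend_symbol(module_name: str, symbol: str) -> Callable[..., Any]:
    """Return `symbol` from `module_name`, or a stub that raises when called.

    Hypothetical helper: importing the caller's module always succeeds, and
    the error surfaces only if the missing function is actually used.
    """
    try:
        module = importlib.import_module(module_name)
        return getattr(module, symbol)
    except (ImportError, AttributeError):

        def _stub(*args: Any, **kwargs: Any) -> Any:
            raise RuntimeError(
                f"{symbol} is unavailable because {module_name} was not built"
            )

        _stub.__name__ = symbol
        return _stub


# A symbol that exists is returned unchanged.
sqrt = load_backend_symbol("math", "sqrt")
# A symbol from a module that does not exist becomes a stub.
missing = load_backend_symbol("no_such_backend_module", "allreduce")
```

Calling `missing(...)` raises `RuntimeError`, while merely importing the module that defines it does not.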
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-08 19:10:36 +00:00
29e09a6545
Revert "Make distributed modules importable even when backend not built ( #159889 )"
...
This reverts commit 01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b.
Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to internal changes breaks import checks, see [D81845053](https://www.internalfb.com/diff/D81845053 ) ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3264887002 ))
2025-09-08 07:04:36 +00:00
01edcd4df8
Make distributed modules importable even when backend not built ( #159889 )
...
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may not be defined because a backend is not enabled.
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-05 20:15:11 +00:00
70f865ac9b
Revert "Make distributed modules importable even when backend not built ( #159889 )"
...
This reverts commit ef3be6726f7ff4b77c22db10cec5b686f9107ea9.
Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to Breaking internal build rules, see D81756619 ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3259430011 ))
2025-09-05 18:58:47 +00:00
ef3be6726f
Make distributed modules importable even when backend not built ( #159889 )
...
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may not be defined because a backend is not enabled.
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-04 20:05:50 +00:00
34aa78274d
Revert "Make distributed modules importable even when backend not built ( #159889 )"
...
This reverts commit 4ae57d448c0a7d37e4cfd5c27d977fad2cef4051.
Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to Failing internal tests, probably typechecks. See D81588399 ([comment](https://github.com/pytorch/pytorch/pull/159889#issuecomment-3253651785 ))
2025-09-04 13:13:52 +00:00
4ae57d448c
Make distributed modules importable even when backend not built ( #159889 )
...
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may not be defined because a backend is not enabled.
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-03 07:33:55 +00:00
420c52ecf3
Revert "Make distributed modules importable even when backend not built ( #159889 )"
...
This reverts commit 626cb7df8161dd4ecb4fe43b60f37ce9076f56b1.
Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, can't be landed with forward fix due to internal tooling problems ([comment](https://github.com/pytorch/pytorch/pull/159889#issuecomment-3246677982 ))
2025-09-02 20:24:01 +00:00
626cb7df81
Make distributed modules importable even when backend not built ( #159889 )
...
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may not be defined because a backend is not enabled.
Signed-off-by: Edward Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-01 23:00:21 +00:00
d214901133
Add a title to distributed._dist2.md ( #159385 )
...
Sphinx likes titles and complains when they are missing. This adds a title to address the following warning in the build:
```
WARNING: toctree contains reference to document 'distributed._dist2' that doesn't have a title: no link will be generated
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159385
Approved by: https://github.com/d4l3k
2025-07-30 04:09:41 +00:00
b7def5ff1c
dist2: add support for passing custom configs directly to PG ( #158147 )
...
This is intended to make it easier for users to pass backend-specific "hints" for certain options.
```py
import torch.distributed._dist2 as dist2
pg = dist2.new_group(backend="my_custom_backend", device=..., timeout=..., foo=1234, bar="1234")
pg.allreduce(...)
```
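One way such pass-through options could work is sketched below as a standalone toy that does not use torch. Every name in it (`register_backend`, `toy_new_group`, `MyCustomGroup`) is hypothetical and is not the dist2 implementation; the point is only that unrecognized keyword arguments are forwarded untouched to the backend's factory.

```python
from datetime import timedelta
from typing import Any, Callable, Dict

# Hypothetical registry mapping a backend name to a factory callable.
_BACKENDS: Dict[str, Callable[..., Any]] = {}


def register_backend(name: str, factory: Callable[..., Any]) -> None:
    """Record the factory used to build groups for `name`."""
    _BACKENDS[name] = factory


def toy_new_group(backend: str, timeout: timedelta, **backend_options: Any) -> Any:
    """Build a group, forwarding any extra keyword options to the backend."""
    factory = _BACKENDS[backend]
    return factory(timeout=timeout, **backend_options)


class MyCustomGroup:
    """Toy backend that accepts two custom hints, `foo` and `bar`."""

    def __init__(self, timeout: timedelta, foo: int = 0, bar: str = "") -> None:
        self.timeout = timeout
        self.foo = foo
        self.bar = bar


register_backend("my_custom_backend", MyCustomGroup)
pg = toy_new_group(
    "my_custom_backend", timeout=timedelta(seconds=30), foo=1234, bar="1234"
)
```

In this sketch an option the backend does not declare raises `TypeError` from the factory call, so validation stays with the backend rather than with the generic entry point.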
Test plan:
```
pytest test/distributed/test_dist2.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158147
Approved by: https://github.com/fduwjj
2025-07-15 00:02:54 +00:00
0d77364ee3
dist2: cleanup non-option methods on PG (missing, timeouts) ( #158123 )
...
This updates the ProcessGroup.* API to include timeouts on all non-option-based overloaded methods. It also adds two missing methods: `alltoall_base` and `barrier`.
Following design in: https://docs.google.com/document/d/13R-1t_yESTvmAjcCN-wQjQQadIEu0JNIdS65uZawZzY/edit?tab=t.0#heading=h.3ctbqqopzc89
Test plan:
```
pytest test/distributed/test_dist2.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158123
Approved by: https://github.com/Skylion007 , https://github.com/fduwjj
2025-07-12 00:06:37 +00:00
83700b4488
dist2: add group context manager ( #157988 )
...
This adds new context-manager-based PG management to dist2. It allows managing the active process group in much the same way as a stream:
```py
with dist2.process_group(pg):
    dist2.current_process_group().allreduce(...).wait()
```
matches
```py
with torch.cuda.stream(stream):
    torch.cuda.current_stream().synchronize()
```
Test plan:
```
pytest test/distributed/test_dist2.py -k context
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157988
Approved by: https://github.com/fduwjj
2025-07-10 22:30:19 +00:00
ed051c3084
torch.distributed: add initial _dist2 prototype API ( #157841 )
...
This adds the initial dist2 API as proposed in https://docs.google.com/document/d/13R-1t_yESTvmAjcCN-wQjQQadIEu0JNIdS65uZawZzY/edit?tab=t.0#heading=h.3ctbqqopzc89
This is a WIP experimental API and is a sandbox for a number of new features and quality of life improvements/changes to c10d.
Test plan:
```
pytest test/distributed/test_dist2.py
```
Docs
```
cd docs
make html
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157841
Approved by: https://github.com/fduwjj
2025-07-09 23:40:43 +00:00