d795fb225a
[RFC] Add pyrefly to lintrunner (#165179)
This adds pyrefly to lintrunner as a warning-only linter, allowing us to collect feedback about the tool before switching to pyrefly as the main type checker.
References the steps outlined here: https://github.com/pytorch/pytorch/issues/163283
Test plan:
`lintrunner init`
`lintrunner`
Confirm that when pyrefly errors are present, the results look like: https://gist.github.com/maggiemoss/e6cb2d015dd1ded560ae1329098cf33f
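For context, lintrunner linters are small adapter scripts that print JSON lint messages; warning-only behavior comes from reporting findings at "advice" severity, so the run surfaces them without failing. A minimal sketch of such an adapter, where the script path, the `--output-format` flag, and the JSON field names are all assumptions for illustration rather than pyrefly's documented CLI:
```
# Hypothetical warning-only lintrunner adapter for pyrefly. The
# --output-format flag and the output shape are illustrative, not
# pyrefly's real CLI.
import json
import subprocess

def main() -> None:
    proc = subprocess.run(
        ["pyrefly", "check", "--output-format", "json"],
        capture_output=True,
        text=True,
    )
    for err in json.loads(proc.stdout or "[]"):
        # Severity "advice" is what makes the linter warning-only:
        # lintrunner shows the message but does not fail the run.
        print(json.dumps({
            "path": err.get("path"),
            "line": err.get("line"),
            "char": None,
            "code": "PYREFLY",
            "severity": "advice",
            "name": err.get("kind", "pyrefly-error"),
            "description": err.get("message", ""),
        }))

if __name__ == "__main__":
    main()
```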
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165179
Approved by: https://github.com/ezyang
2025-10-16 20:07:09 +00:00
9944cac6e6
Add suppressions to torch/_inductor (#165062)
Adds suppressions so that pyrefly typechecks clean: https://github.com/pytorch/pytorch/issues/163283
This directory is split across two PRs to keep them from being too large.
Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check
step 1: remove this directory's entries from the project-excludes field in pyrefly.toml
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199
after:
INFO 0 errors (6,884 ignored)
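The suppressions added in step 3 are inline comments. A representative example, assuming pyrefly's `# pyrefly: ignore` comment syntax; the error kind after the second `#` is illustrative, not taken from this log:
```
# pyrefly: ignore  # bad-assignment
count: int = "3"  # statically a type error, silenced by the comment above
```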
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165062
Approved by: https://github.com/oulgen, https://github.com/mlazos
2025-10-09 20:34:20 +00:00
7457d139c5
Add pyrefly suppressions to torch/distributed (7/n) (#165002)
Adds suppressions so that pyrefly typechecks clean: https://github.com/pytorch/pytorch/issues/163283
One more PR remains after this one.
Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check
step 1: remove this directory's entries from the project-excludes field in pyrefly.toml
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199
after:
INFO 0 errors (6,884 ignored)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165002
Approved by: https://github.com/oulgen
2025-10-09 04:08:25 +00:00
00059db034
Revert "[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )"
...
This reverts commit 09cb34c1dce8fe1b880bbf3115d8ddad3401d871.
Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/malfet due to reverted internally and now can be safely reverted in OSS ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3334176367 ))
2025-09-25 13:47:46 +00:00
09cb34c1dc
[RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)
Summary:
Original: D81957844 and D81957923
Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well.
#buildall
Test Plan:
sandcastle and oss ci
Reviewed By: H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
2025-09-22 21:12:18 +00:00
f0078941cf
Revert "[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )"
...
This reverts commit 6c334885d48725197b5d35e2c1543efc0f4198d0.
Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/wdvr due to reverted internally - @ezyang see D82281294 ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3317017530 ))
2025-09-22 05:39:07 +00:00
6c334885d4
[RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)
Summary:
Original: D81957844 and D81957923
Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well.
#buildall
Test Plan:
sandcastle and oss ci
Reviewed By: H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
2025-09-12 10:54:42 +00:00
6b59a19242
Revert "[RELAND] Always build USE_DISTRIBUTED ( #160449 ) and Make distributed modules importable even when backend not built ( #159889 ) ( #162594 )"
...
This reverts commit 6e8f17c58029e5fa6bc222b2445ebbc0cbdc17c7.
Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/huydhn due to Reverted internally ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3283985880 ))
2025-09-12 06:52:03 +00:00
6e8f17c580
[RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)
Summary:
Original: D81957844 and D81957923
Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well.
#buildall
Test Plan:
sandcastle and oss ci
Reviewed By: H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
2025-09-12 03:56:18 +00:00
dda071587f
Revert "Make distributed modules importable even when backend not built ( #159889 )" ( #162568 )
...
This reverts commit a0d026688cd69583d5a4e0c6f3e5fda141a7f4a9.
Revert "Always build USE_DISTRIBUTED. (#160449 )"
This reverts commit d80297a6846f1f2c36fd4f19e22919f2abe8fcea.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162568
Approved by: https://github.com/huydhn
2025-09-10 04:29:42 +00:00
a0d026688c
Make distributed modules importable even when backend not built (#159889)
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may not be defined when a backend is not enabled.
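A sketch of the stubbing pattern described, on the assumption that it amounts to exporting a placeholder when the native symbol is absent (the class chosen here is just an example):
```
# Illustrative stub: if the NCCL backend was not built, expose a
# placeholder that raises a clear error at construction time instead
# of failing at import time.
try:
    from torch._C._distributed_c10d import ProcessGroupNCCL
except ImportError:
    class ProcessGroupNCCL:  # type: ignore[no-redef]
        def __init__(self, *args: object, **kwargs: object) -> None:
            raise RuntimeError(
                "ProcessGroupNCCL is unavailable: PyTorch was built "
                "without NCCL support"
            )
```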
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-08 19:10:36 +00:00
d80297a684
Always build USE_DISTRIBUTED. (#160449)
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci
2025-09-08 19:10:36 +00:00
1e0656f063
Revert "Always build USE_DISTRIBUTED. ( #160449 )"
...
This reverts commit de893e96c775023aa3be895060848fac3296772c.
Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to internal changes breaks import checks, see [D81845053](https://www.internalfb.com/diff/D81845053 ) ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3264887002 ))
2025-09-08 07:04:36 +00:00
29e09a6545
Revert "Make distributed modules importable even when backend not built ( #159889 )"
...
This reverts commit 01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b.
Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to internal changes breaks import checks, see [D81845053](https://www.internalfb.com/diff/D81845053 ) ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3264887002 ))
2025-09-08 07:04:36 +00:00
01edcd4df8
Make distributed modules importable even when backend not built (#159889)
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may not be defined when a backend is not enabled.
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-05 20:15:11 +00:00
de893e96c7
Always build USE_DISTRIBUTED. (#160449)
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci
2025-09-05 20:15:11 +00:00
adae7f66aa
Revert "Always build USE_DISTRIBUTED. ( #160449 )"
...
This reverts commit c37103234afc832dcad307e9016230810957c9d5.
Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal build rules, see D81756619 ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3259430011 ))
2025-09-05 18:58:47 +00:00
70f865ac9b
Revert "Make distributed modules importable even when backend not built ( #159889 )"
...
This reverts commit ef3be6726f7ff4b77c22db10cec5b686f9107ea9.
Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to Breaking internal build rules, see D81756619 ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3259430011 ))
2025-09-05 18:58:47 +00:00
ef3be6726f
Make distributed modules importable even when backend not built (#159889)
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may not be defined when a backend is not enabled.
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-04 20:05:50 +00:00
c37103234a
Always build USE_DISTRIBUTED. (#160449)
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci
2025-09-04 19:43:17 +00:00
b7dad7dd49
Revert "Always build USE_DISTRIBUTED. ( #160449 )"
...
This reverts commit 90b08643c3a6eb1f3265b7d1388bd76660759f46.
Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Already discussed with @ezyang about the internal quirks and errors ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3254219358 ))
2025-09-04 15:25:07 +00:00
34aa78274d
Revert "Make distributed modules importable even when backend not built ( #159889 )"
...
This reverts commit 4ae57d448c0a7d37e4cfd5c27d977fad2cef4051.
Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to Failing internal tests, probably typechecks. See D81588399 ([comment](https://github.com/pytorch/pytorch/pull/159889#issuecomment-3253651785 ))
2025-09-04 13:13:52 +00:00
4ae57d448c
Make distributed modules importable even when backend not built (#159889)
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may not be defined when a backend is not enabled.
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-03 07:33:55 +00:00
90b08643c3
Always build USE_DISTRIBUTED. (#160449)
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci
2025-09-03 07:33:55 +00:00
4e42aa8ffc
Revert "Always build USE_DISTRIBUTED. ( #160449 )"
...
This reverts commit b7034e9c924412bfbe8ee25a22d7e95239b5ca65.
Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, can't be landed with forward fix due to internal tooling problems ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3246689684 ))
2025-09-02 20:28:42 +00:00
420c52ecf3
Revert "Make distributed modules importable even when backend not built ( #159889 )"
...
This reverts commit 626cb7df8161dd4ecb4fe43b60f37ce9076f56b1.
Reverted https://github.com/pytorch/pytorch/pull/159889 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, can't be landed with forward fix due to internal tooling problems ([comment](https://github.com/pytorch/pytorch/pull/159889#issuecomment-3246677982 ))
2025-09-02 20:24:01 +00:00
626cb7df81
Make distributed modules importable even when backend not built (#159889)
This PR is greatly simplified now that it is stacked on top of a PR that always builds with distributed. We only need to stub functions that may not be defined when a backend is not enabled.
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159889
Approved by: https://github.com/wconstab
ghstack dependencies: #160449
2025-09-01 23:00:21 +00:00
b7034e9c92
Always build USE_DISTRIBUTED. (#160449)
Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci
2025-09-01 23:00:21 +00:00
89d842fec5
Make torch.distributed.breakpoint() set a long timeout (#158481)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158481
Approved by: https://github.com/d4l3k
ghstack dependencies: #158469
2025-07-18 02:18:43 +00:00
98c892749b
c10d/Store: add nonblocking mode to queue_pop (#151485)
This adds a nonblocking mode to queue_pop, which allows workers to poll for ready work without blocking the main loop. This is useful when you want a GPU to stay at maximum utilization while items only periodically arrive on the queue.
We also expose a `torch.distributed.QueueEmptyError` so users can catch the error and handle it accordingly.
Test plan:
```
pytest test/distributed/test_store.py -k queue -v -s -x
```
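A minimal sketch of the polling pattern this enables. The `block=False` keyword and the `queue_push` producer call are assumptions about how the API is spelled; the message only says a nonblocking mode and `torch.distributed.QueueEmptyError` exist:
```
import torch.distributed as dist

store = dist.TCPStore("localhost", 29500, world_size=1, is_master=True)
store.queue_push("work", "item-1")  # assumed producer call

try:
    # Assumed spelling of the nonblocking mode added by this PR.
    item = store.queue_pop("work", block=False)
except dist.QueueEmptyError:
    item = None  # queue empty: go back to useful GPU work instead of blocking
```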
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151485
Approved by: https://github.com/fduwjj, https://github.com/tianfengfrank
2025-04-18 02:14:50 +00:00
8bf3f3fc43
[c10d] Add a collective time estimator for NCCL comms (#149343)
This upstreams a feature from newer NCCL that lets users estimate collective communication time.
Resolves #147753
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149343
Approved by: https://github.com/kwen2501
2025-03-19 07:54:02 +00:00
00ffeca1b1
PEP585 update - torch/distributed (#145164)
See #145101 for details.
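PEP 585 lets the builtin container types be used as generics directly, so the `typing` aliases can be dropped. A representative before/after of the kind of mechanical change this update makes:
```
# Before: typing aliases
from typing import Dict, List

def bucket_ranks(ranks: List[int]) -> Dict[int, int]:
    return {r: r % 4 for r in ranks}

# After (PEP 585): builtin generics, no typing import needed
def bucket_ranks_585(ranks: list[int]) -> dict[int, int]:
    return {r: r % 4 for r in ranks}
```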
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145164
Approved by: https://github.com/bobrenjc93
2025-01-21 04:23:29 +00:00
6374332d33
Revert "PEP585 update - torch/distributed ( #145164 )"
...
This reverts commit 6cb186e279bc179a6bb63f0226e24ab42a07b394.
Reverted https://github.com/pytorch/pytorch/pull/145164 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing an inductor test ([comment](https://github.com/pytorch/pytorch/pull/145164#issuecomment-2602875679 ))
2025-01-20 16:46:46 +00:00
6cb186e279
PEP585 update - torch/distributed (#145164)
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145164
Approved by: https://github.com/bobrenjc93
2025-01-20 00:19:01 +00:00
0f90ffe94a
Remove ProcessGroupRoundRobin (#132888)
`_round_robin_process_groups` is deprecated and should be removed.
258f47fc0b/torch/csrc/distributed/c10d/ProcessGroupRoundRobin.cpp (L10-L12)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132888
Approved by: https://github.com/Skylion007, https://github.com/wanchaol, https://github.com/c-p-i-o, https://github.com/fduwjj
2024-08-08 01:07:40 +00:00
21d4c48059
Allow distributed breakpoint to skip the first few calls (#129511)
Summary:
PDB allows conditional breakpoints, but that ability doesn't carry over to the distributed environment. We can still get a conditional breakpoint by doing the following:
```
import torch.distributed as dist

counter = 0

def maybe_break() -> None:
    # Increment on every call; only break once the call site is hot.
    global counter
    counter += 1
    if counter > 100:
        dist.breakpoint()
```
This PR makes dist.breakpoint() support this feature as syntactic sugar.
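A sketch of that sugar, assuming the new argument is a `skip` parameter on torch.distributed.breakpoint (the parameter name is an assumption here, not stated in this log):
```
import torch.distributed as dist

# Assumed spelling: equivalent to the manual counter above, breaking
# only after this call site has been hit more than 100 times.
dist.breakpoint(skip=100)
```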
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129511
Approved by: https://github.com/wconstab, https://github.com/c-p-i-o
2024-08-07 21:57:37 +00:00
94dc3253a0
[BE][Easy] enable UFMT for torch/distributed/ (#128870)
Part of #123062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128870
Approved by: https://github.com/fegin, https://github.com/wconstab
2024-06-22 18:53:28 +00:00
cc8193c707
Revert "[BE] enable UFMT for torch/nn/functional.py
( #128592 )"
...
This reverts commit f6e6e55fa7d883a89ba99584f8632c260519ba73.
Reverted https://github.com/pytorch/pytorch/pull/128592 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/128592#issuecomment-2181783936 ))
2024-06-21 00:44:16 +00:00
9c929f6ce9
Revert "[BE][Easy] enable UFMT for torch/distributed/
( #128870 )"
...
This reverts commit a0e1e20c4157bb3e537fc784a51d7aef1e754157.
Reverted https://github.com/pytorch/pytorch/pull/128870 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/128870#issuecomment-2181780356 ))
2024-06-21 00:38:28 +00:00
a0e1e20c41
[BE][Easy] enable UFMT for torch/distributed/ (#128870)
Part of #123062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128870
Approved by: https://github.com/fegin
ghstack dependencies: #128868, #128869
2024-06-18 21:49:08 +00:00
f6e6e55fa7
[BE] enable UFMT for torch/nn/functional.py (#128592)
Part of #123062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128592
Approved by: https://github.com/mikaylagawarecki
ghstack dependencies: #128596, #128594
2024-06-17 16:29:29 +00:00
62bcdc0ac9
Flip default value for mypy disallow_untyped_defs [4/11] (#127841)
See #127836 for details.
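With `disallow_untyped_defs` enabled, mypy rejects function definitions that lack annotations. A minimal illustration of what the flipped default starts flagging:
```
# Flagged once disallow_untyped_defs is on:
#   error: Function is missing a type annotation
def scale(t, factor):
    return t * factor

# Accepted: fully annotated signature.
def scale_typed(t: float, factor: float) -> float:
    return t * factor
```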
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127841
Approved by: https://github.com/oulgen
2024-06-08 18:36:48 +00:00
ac51920656
Reapply "c10d: add Collectives abstraction ( #125978 )" ( #126695 )
...
This reverts commit d9c3485146913324ab4b3e211d2a4517e138f4af.
Reapplies #125978 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126695
Approved by: https://github.com/c-p-i-o
2024-05-21 18:00:09 +00:00
d9c3485146
Revert "c10d: add Collectives abstraction ( #125978 )"
...
This reverts commit 4b2ae2ac338f3a0de340c9711b03989b8ce66fc6.
Reverted https://github.com/pytorch/pytorch/pull/125978 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/125978#issuecomment-2119858015 ))
2024-05-20 07:40:41 +00:00
4b2ae2ac33
c10d: add Collectives abstraction (#125978)
This adds a new `Collectives` API for distributed collective operations. It is intended to replace the [current Elastic store abstraction](https://github.com/pytorch/pytorch/blob/main/torch/distributed/elastic/utils/store.py) with more performant and debuggable primitives.
Design doc: https://docs.google.com/document/d/147KcKJXEHvk1Q6tISLbJVvLejHg_1kIhBQeu-8RQxhY/edit
The standard implementation uses `StoreCollectives`, but other more performant backends will be added in a follow-up PR.
Test plan:
```
python test/distributed/test_collectives.py -v
```
This tests both functionality (using multiple threads) and timeout behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125978
Approved by: https://github.com/shuqiangzhang
2024-05-17 05:09:11 +00:00
e0e9d3ed79
make sure device mesh can be imported from torch.distributed (#126119)
As titled.
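Concretely, the guarantee is that the device-mesh API is importable from the top-level namespace; the two names below match the current public API, but treat them as assumptions as far as this log goes:
```
# Importable from torch.distributed rather than only from the
# torch.distributed.device_mesh submodule.
from torch.distributed import DeviceMesh, init_device_mesh
```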
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126119
Approved by: https://github.com/kwen2501, https://github.com/anijain2305
2024-05-14 05:00:48 +00:00
1885c3972d
[C10D] Add dist.get_node_local_rank helper (#123992)
Fixes #122816
Summarizing the pros/cons of the request and motivation from #122816:
- (+) It's really common for users to read `os.environ["LOCAL_RANK"]` themselves, so we should provide a helper.
- (-) We can't really control if/how local rank information is made available, but torchrun handles it automatically. We can assume the local rank is correctly passed whenever it is passed at all.
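A minimal usage sketch of the helper; the `fallback_rank` keyword is an assumption about the signature, not something stated in this log:
```
import torch.distributed as dist

# Replaces the hand-rolled os.environ["LOCAL_RANK"] lookup; torchrun
# sets LOCAL_RANK automatically. fallback_rank (assumed) covers
# launchers that don't set it, e.g. single-process runs.
local_rank = dist.get_node_local_rank(fallback_rank=0)
```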
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123992
Approved by: https://github.com/shuqiangzhang, https://github.com/zdevito, https://github.com/XilunWu
2024-04-16 00:09:46 +00:00
435db051d0
get torch.distributed.breakpoint() to work under Python/Meta contexts (#118645)
I noticed that when I put a `torch.distributed.breakpoint()` [here](https://github.com/pytorch/pytorch/blob/main/torch/_subclasses/meta_utils.py#L605), it would fail. This fixes it.
In theory, it would probably be better to have a way for the `barrier()` call to skip the dispatcher completely. I wasn't sure how to do that, though, and this seems to cover 90% of the issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118645
Approved by: https://github.com/yifuwang
2024-04-12 16:36:52 +00:00
c90fdb9ac0
Fix torch.distributed.breakpoint (#115705)
Switches from calling breakpoint() internally to using a subclass of Pdb.
Fixes #115685
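A sketch of the Pdb-subclass approach, on the assumption that the subclass exists to read commands from the controlling terminal even when stdin has been redirected (the class name and details are illustrative):
```
import pdb
import sys

class DistributedPdb(pdb.Pdb):
    """Illustrative Pdb subclass: rebind stdin to the real terminal so
    the prompt works on the breaking rank in a multi-process job
    (POSIX-only, via /dev/stdin)."""

    def interaction(self, *args, **kwargs):
        original_stdin = sys.stdin
        try:
            sys.stdin = open("/dev/stdin")
            super().interaction(*args, **kwargs)
        finally:
            sys.stdin = original_stdin
```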
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115705
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-12-13 20:33:56 +00:00
a827ac71f2
Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta ( #115099 )"
...
This reverts commit eaa64339d640ed1d36520ada379213f8361be5ff.
2023-12-05 08:59:36 -08:00