21 Commits

Author SHA1 Message Date
7457d139c5 Add pyrefly suppressions to torch/distributed (7/n) (#165002)
Adds suppressions to pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

One more PR after this one.

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:
INFO 0 errors (6,884 ignored)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165002
Approved by: https://github.com/oulgen
2025-10-09 04:08:25 +00:00
da003d7b95 [3/N] Import Callable from collections.abc in torch/distributed (#164104)
This is the result of applying the ruff `UP035` check.
`Callable` is imported from `collections.abc` instead of `typing`.
This PR is the follow-up of #164054.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164104
Approved by: https://github.com/Skylion007
2025-09-30 00:28:53 +00:00
00ffeca1b1 PEP585 update - torch/distributed (#145164)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145164
Approved by: https://github.com/bobrenjc93
2025-01-21 04:23:29 +00:00
6374332d33 Revert "PEP585 update - torch/distributed (#145164)"
This reverts commit 6cb186e279bc179a6bb63f0226e24ab42a07b394.

Reverted https://github.com/pytorch/pytorch/pull/145164 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing an inductor test ([comment](https://github.com/pytorch/pytorch/pull/145164#issuecomment-2602875679))
2025-01-20 16:46:46 +00:00
6cb186e279 PEP585 update - torch/distributed (#145164)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145164
Approved by: https://github.com/bobrenjc93
2025-01-20 00:19:01 +00:00
08be9ec312 Migrate from Tuple -> tuple in torch/distributed (#144258)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144258
Approved by: https://github.com/aorenste
2025-01-10 08:34:54 +00:00
b44ecd91ba [c10d] Switch all timer logging in c10d to wait_counter (#141154)
Summary: The original decorator based time logger is bad in performance and capacity. So we want to replace it with pytorch `_WaitCounter` now.

Test Plan: Tested on workload and no regression has been seen: https://fburl.com/scuba/aps_instrumentation_components/mskj73ea

Differential Revision: D66218675

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141154
Approved by: https://github.com/wz337
2024-11-21 01:10:11 +00:00
5107d244ee [c10d][Logging] Remove args and kwargs from c10d logging (#140169)
This PR is trying to reland https://github.com/pytorch/pytorch/pull/139804

We now don't want to log args and kwargs directly because if they contain tensor or tensor subclass it would take lots of time in conversion to string or even not supported.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140169
Approved by: https://github.com/wz337, https://github.com/kwen2501
2024-11-09 13:57:32 +00:00
411203e7c1 Revert D65490202 (#140142)
Summary:
This diff reverts D65490202
This is causing tests to fail on open source. See distributed/test_c10d_logger.py::C10dErrorLoggerTest::test_exception_logger [GH job link](https://github.com/pytorch/pytorch/actions/runs/11736922614/job/32697709457) [HUD commit link](ba9645f6e5)

Test Plan: NA

Differential Revision: D65663063

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140142
Approved by: https://github.com/malfet, https://github.com/huydhn
2024-11-08 21:22:32 +00:00
ba9645f6e5 Fix for T206766523 ("Your diff, D65462767, broke some tests") (#139804)
Summary:
This is trying to fix a regression caused by https://github.com/pytorch/pytorch/pull/139757. We now don't want to log args and kwargs directly because if they contain tensor or tensor subclass it would take lots of time in conversion to string or even not supported.

Reviewed By: fduwjj

Differential Revision: D65490202

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139804
Approved by: https://github.com/XilunWu
2024-11-08 05:57:30 +00:00
bac10cdd6f [DCP] Fix duplicated logging messages when enable both c10d and dcp l… (#130423)
…ogger

Fixes #129951 . Would you take a moment to review it? @LucasLLC

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130423
Approved by: https://github.com/Skylion007
2024-07-11 13:43:39 +00:00
94dc3253a0 [BE][Easy] enable UFMT for torch/distributed/ (#128870)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128870
Approved by: https://github.com/fegin, https://github.com/wconstab
2024-06-22 18:53:28 +00:00
9c929f6ce9 Revert "[BE][Easy] enable UFMT for torch/distributed/ (#128870)"
This reverts commit a0e1e20c4157bb3e537fc784a51d7aef1e754157.

Reverted https://github.com/pytorch/pytorch/pull/128870 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/128870#issuecomment-2181780356))
2024-06-21 00:38:28 +00:00
a0e1e20c41 [BE][Easy] enable UFMT for torch/distributed/ (#128870)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128870
Approved by: https://github.com/fegin
ghstack dependencies: #128868, #128869
2024-06-18 21:49:08 +00:00
3a0d088517 Flip default value for mypy disallow_untyped_defs [5/11] (#127842)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127842
Approved by: https://github.com/oulgen
2024-06-08 18:49:18 +00:00
13070e2753 [DCP] Adds better handling in logging of specific kwargs (#123658)
Adds additional signpost integrations to DCP Logger, to add support for MLU and metric collection.

Differential Revision: [D55803461](https://our.internmc.facebook.com/intern/diff/D55803461/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123658
Approved by: https://github.com/fegin
2024-04-11 21:09:38 +00:00
de7edeea25 [DCP] DCP logger (#121352)
Adds additional logging for improved observability in DCP.

Differential Revision: [D54512626](https://our.internmc.facebook.com/intern/diff/D54512626/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121352
Approved by: https://github.com/wz337, https://github.com/fegin
2024-04-05 17:50:50 +00:00
937d616e82 Re-enable type checking for distributed_c10d.py (#115223)
Re-enable type checking for distributed_c10d.py

Type checking for distributed_c10d.py was inadvertently turned off in issues that have accumulated since.

Note: the backwards compatibility linter does not like some of these changes.  But they were incorrect before.  This needs human verification, however.

#suppress-api-compatibility-check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115223
Approved by: https://github.com/wconstab
2023-12-09 11:07:54 +00:00
07f0413b70 [c10d] add nccl version to c10d logger (#111215)
Summary: NCCL version is essential for debugging purpose and NCCL rollout monitoring. Log this info for easy access.

Test Plan:
run cmf10x on devgpu

https://pxl.cl/3B5gf

https://fburl.com/scuba/pytorch_c10d_logging/lybk2usq

Differential Revision: D50240853

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111215
Approved by: https://github.com/Skylion007
2023-10-16 18:47:49 +00:00
264df88a2d [C10D][Logger]Add more info to c10d logger (#107331)
This PR adds pg_name and world_size to c10d logging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107331
Approved by: https://github.com/kumpera
2023-08-28 15:10:56 +00:00
ee95e37a69 [c10d] Record time spent for init_process_group, new_group, _store_based_barrier (#101912)
1. Record time spent for init_process_group, new_group, _store_based_barrier
2. Rename c10d_error_logger to c10d_logger for generalization.
3. Refactor to move logger wrappers in distributed_c10d.py to logger to c10d_logger.py.
4. Rename the logger wrappers (bc breaking). Exception_handler is renamed to exception_logger to avoid confusion with logging handler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101912
Approved by: https://github.com/fduwjj
2023-05-24 09:36:34 +00:00