16 Commits

Author SHA1 Message Date
9fff8155c3 [2/N] Fix clang-tidy readability checks (#164652)
This PR applies clang-tidy readability checks to jit sources and all headers in the code base.
`readability-redundant-inline-specifier` is suppressed because it would incur too many changes. The check detects redundant `inline` specifiers on function and variable declarations, and many in-class method definitions are marked inline.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164652
Approved by: https://github.com/Skylion007
2025-10-06 01:06:01 +00:00
2c5ed6e7c0 Revert "[2/N] Fix clang-tidy readability checks (#164652)"
This reverts commit 3c5ca685d6f5b6f3971c0cd20a054aa355610419.

Reverted https://github.com/pytorch/pytorch/pull/164652 on behalf of https://github.com/izaitsevfb due to need to revert due to a conflict with revert of https://github.com/pytorch/pytorch/pull/162659 ([comment](https://github.com/pytorch/pytorch/pull/164652#issuecomment-3369346707))
2025-10-05 21:36:57 +00:00
3c5ca685d6 [2/N] Fix clang-tidy readability checks (#164652)
This PR applies clang-tidy readability checks to jit sources and all headers in the code base.
`readability-redundant-inline-specifier` is suppressed because it would incur too many changes. The check detects redundant `inline` specifiers on function and variable declarations, and many in-class method definitions are marked inline.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164652
Approved by: https://github.com/Skylion007
2025-10-05 07:05:11 +00:00
249152475d fix sequence number for group (#134578)
Summary:
Fix sequence number in execution trace dump for matching between collective/p2p op and wait in execution trace replay.

`ProcessGroupNCCL` has two sequence number counters, `seqCollective_` and `seqP2P_`.
b18ba9419e/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp (L1188-L1191)
However, `WorkNCCL` only has one sequence number member `seq_`. b18ba9419e/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp (L387)
We need to match collective and p2p with wait separately.
29b5a462dc

Depend on: https://github.com/pytorch/pytorch/pull/135132

Test Plan: buck2 run mode/dev-nosan kineto/libkineto/fb/integration_tests:pytorch_execution_trace_integration_test

Differential Revision:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134578
Approved by: https://github.com/kwen2501, https://github.com/c-p-i-o
2024-10-10 04:24:06 +00:00
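The matching problem described in #134578 can be illustrated with a small sketch. It is written in Python for brevity and is not the C++ code from the PR; `Group`, `Work`, `launch`, and `match_key` are hypothetical names. It only shows why a bare sequence number is ambiguous when collectives and p2p ops are counted separately.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Work:
    """Hypothetical work handle that remembers which counter issued its seq."""
    is_p2p: bool
    seq: int


@dataclass
class Group:
    """Hypothetical process group with two independent sequence counters."""
    seq_collective: int = 0
    seq_p2p: int = 0

    def launch(self, is_p2p: bool) -> Work:
        # Bump only the counter that belongs to this kind of operation.
        if is_p2p:
            self.seq_p2p += 1
            return Work(True, self.seq_p2p)
        self.seq_collective += 1
        return Work(False, self.seq_collective)


def match_key(work: Work) -> Tuple[str, int]:
    # A bare seq is ambiguous: the first collective and the first p2p op
    # both carry seq == 1, so the key must include the operation kind.
    return ("p2p" if work.is_p2p else "collective", work.seq)


group = Group()
allreduce = group.launch(is_p2p=False)
send = group.launch(is_p2p=True)
```

Here `allreduce` and `send` share the same `seq`, but their `match_key` values differ, so a wait can be paired with the right operation during replay.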
816061843a [Distributed/Profiler] Fix input/output dimension overflow (#134360)
Summary: When using ParamCommsDebugInfo, the input elements and output elements are stored in `int` instead of `int64_t`

Test Plan: Run HTA with new outputted values and make sure overflow does not occur

Reviewed By: fengxizhou

Differential Revision: D61728747

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134360
Approved by: https://github.com/fengxizhou, https://github.com/jeanschmidt
2024-08-25 16:25:56 +00:00
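To see why the `int` to `int64_t` change in #134360 matters, the sketch below reinterprets an element count as a 32-bit and as a 64-bit signed integer. It uses Python's `ctypes` only to mimic fixed-width wraparound; it assumes a 32-bit `int` (typical, but platform dependent), and the element count is a made-up example.

```python
import ctypes


def as_int32(value: int) -> int:
    """Reinterpret value as a signed 32-bit integer (wraps around silently)."""
    return ctypes.c_int32(value).value


def as_int64(value: int) -> int:
    """Reinterpret value as a signed 64-bit integer."""
    return ctypes.c_int64(value).value


# Hypothetical element count: 3 Gi elements, which exceeds 2**31 - 1.
nelems = 3 * 1024**3

stored_in_int32 = as_int32(nelems)  # wraps to a negative number
stored_in_int64 = as_int64(nelems)  # preserved exactly
```

With the 32-bit field the count comes back negative, which is the kind of value a downstream trace analysis would misread; the 64-bit field keeps it intact.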
cyy
c2596fd3e0 [Distributed] [4/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#124032)
This PR continues to fix some clang-tidy warnings in distributed/c10d code, following https://github.com/pytorch/pytorch/pull/123312.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124032
Approved by: https://github.com/Skylion007
2024-04-16 00:42:18 +00:00
9fa922c2ed [profiler] Log process group name instead of pg uid (#124035)
Summary:
As part of the work to unify process group identifiers, log <group_name, group_desc> instead of the pg uid in the profiler.
- group_name remains the unique identifier, e.g. "0", "1".
- group_desc will be the user specified name, e.g. "fsdp".

Reviewed By: aaronenyeshi, kwen2501

Differential Revision: D55610682

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124035
Approved by: https://github.com/aaronenyeshi
2024-04-15 21:49:06 +00:00
5b9e5f854b [profiler] Log process group id instead of backend id (#120475)
Summary:
https://github.com/pytorch/pytorch/pull/104373 introduced backend_id
> an unique ID for the actual backend object, this is also exposed in record_param_comms, so we can correlate these collectives with the right backend object.

However, it is inconvenient to correlate collectives by backend id. Using the pg id (uid) to correlate them directly is a better solution.
This PR changes the ID information exposed in record_param_comms from backend_id to pg_id.

Differential Revision: D53558257

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120475
Approved by: https://github.com/aaronenyeshi
2024-02-29 15:04:33 +00:00
ffc826bf10 [nccl-pg] Store PG global rank information in tracing logs (#115730)
Storing the list of global ranks associated with each PG allows us to correlate traces across different ranks.

Test Plan:

OSS CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115730
Approved by: https://github.com/fduwjj
2023-12-14 00:59:17 +00:00
aa390cec21 [profiler] Fix description to use nelems rather than size (#114735)
We were storing the number of elements in the tensor, rather than the actual bytes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114735
Approved by: https://github.com/aaronenyeshi, https://github.com/yoyoyocmu, https://github.com/kwen2501, https://github.com/fduwjj
2023-12-01 06:21:47 +00:00
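The distinction behind #114735, element count versus byte size, can be sketched as follows. This is a Python illustration with hypothetical helper names and a small assumed table of element sizes, not the profiler's code.

```python
from math import prod
from typing import Sequence

# Assumed bytes-per-element table for a few common dtypes.
ELEMENT_SIZE = {"float16": 2, "float32": 4, "int64": 8}


def nelems(shape: Sequence[int]) -> int:
    """Number of elements in a tensor of the given shape."""
    return prod(shape)


def nbytes(shape: Sequence[int], dtype: str) -> int:
    """Size in bytes: element count times bytes per element."""
    return nelems(shape) * ELEMENT_SIZE[dtype]


shape = (1024, 256)
count = nelems(shape)             # what was actually being stored
size = nbytes(shape, "float32")   # what a field called "size" suggests
```

For a `float32` tensor the two values differ by a factor of four, so a field holding the element count should be described as such.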
ed15fa7cc2 [Kineto][NCCL][3/n] Get the NCCL communication info from PARAM_COMMS_INFO (#111846)
This diff enables the functionality to get the NCCL communication metadata from `c10::DebugInfoKind::PARAM_COMMS_INFO` available in `ThreadLocalDebugInfo`.

To keep the overhead lightweight and avoid comparing the function name on each op, we add the method `bool isNcclMeta()`, whose result is decided during initialization.

Differential Revision: [D50439211](https://our.internmc.facebook.com/intern/diff/D50439211/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111846
Approved by: https://github.com/aaronenyeshi
ghstack dependencies: #111842, #111843
2023-10-25 20:35:06 +00:00
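The `isNcclMeta()` idea from #111846, deciding once at initialization instead of comparing names on every op, can be sketched like this. The class, the method name in Python form, and the constant's value are assumptions for illustration only; the real implementation is C++.

```python
# Assumed name of the call that carries communication metadata.
PARAM_COMMS_CALL_NAME = "record_param_comms"


class OpRecord:
    """Hypothetical per-op record that caches the name check."""

    def __init__(self, name: str) -> None:
        self.name = name
        # One string comparison at construction time...
        self._is_nccl_meta = name == PARAM_COMMS_CALL_NAME

    def is_nccl_meta(self) -> bool:
        # ...so every later query is a plain boolean read.
        return self._is_nccl_meta


comm_op = OpRecord("record_param_comms")
other_op = OpRecord("aten::add")
```

Callers on the hot path then branch on `is_nccl_meta()` without touching the name string again.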
43d0ae4822 [Kineto][NCCL][1/n] Add the world size info in NCCL metadata (#111842)
This diff adds the world size info to the NCCL metadata, as we need it to calculate the algorithmic bandwidth and bus bandwidth.

Differential Revision: [D50439185](https://our.internmc.facebook.com/intern/diff/D50439185/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111842
Approved by: https://github.com/aaronenyeshi, https://github.com/fduwjj
2023-10-25 03:48:55 +00:00
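As a rough sketch of why #111842 needs the world size: bus bandwidth is commonly derived from algorithmic bandwidth by a factor that depends on the number of ranks. The factors below follow the convention documented for nccl-tests; the function names and the set of collectives covered are assumptions for illustration, not the profiler's code.

```python
def algorithmic_bandwidth(nbytes: float, seconds: float) -> float:
    """Algorithmic bandwidth in bytes per second: data size over elapsed time."""
    return nbytes / seconds


def bus_bandwidth(collective: str, nbytes: float, seconds: float,
                  world_size: int) -> float:
    """Scale algorithmic bandwidth by a collective-specific factor of world size."""
    n = world_size
    # Correction factors as documented for nccl-tests.
    factors = {
        "all_reduce": 2 * (n - 1) / n,
        "all_gather": (n - 1) / n,
        "reduce_scatter": (n - 1) / n,
    }
    return algorithmic_bandwidth(nbytes, seconds) * factors[collective]


# Hypothetical measurement: 1 GB all-reduce taking 0.5 s across 8 ranks.
algbw = algorithmic_bandwidth(1_000_000_000, 0.5)
busbw = bus_bandwidth("all_reduce", 1_000_000_000, 0.5, world_size=8)
```

Without `world_size` in the metadata, the factor cannot be computed and only the algorithmic figure is available.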
dd6319198d Apply clang-format to distributed/c10d folder (#107140)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107140
Approved by: https://github.com/H-Huang
2023-08-14 23:16:38 +00:00
55479fe80e Enable capturing of comm collective parameters (#98) (#85368)
Summary:
X-link: https://github.com/facebookresearch/torch_ucc/pull/98

Add tensor input, output, and other metadata for PyTorch comms.

Test Plan: P517138779

Reviewed By: Pavani-Panakanti

Differential Revision: D38357077

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85368
Approved by: https://github.com/H-Huang
2022-10-11 04:38:26 +00:00
21017ad1a1 Dispatch.h: Avoid including ivalue (#64165)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64165

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728587

Pulled By: ezyang

fbshipit-source-id: d0d2e97491d9d5e2d2fc2d6e51420a4467c1bba4
2021-09-15 12:16:44 -07:00
a016150163 Move torch/lib/c10d to torch/csrc/distributed/c10d (#60543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60543

Now that c10d is part of libtorch, it would be nice if all of its sources lived in one place.
ghstack-source-id: 132306292

Test Plan: It builds

Reviewed By: cbalioglu

Differential Revision: D29062002

fbshipit-source-id: d9e1301e9d73e1643fa0f0119cd2d618f1ad52e6
2021-06-24 12:38:51 -07:00