mirror of https://github.com/pytorch/pytorch.git (synced 2025-10-20 21:14:14 +08:00)
[c10d/nccl-pg] allow user to pass process group description (#123472)
Summary:
We need a way to let users set a customized description for a process group, e.g. FSDP, PP.

Here are several use cases of a user-specified group_desc:
- Logging: we can easily match a log line and understand what this collective/PG is used for.
- PyTorch traces (e.g. Kineto, Execution Trace) can benefit from the PG desc, since trace analysis and benchmarks will be able to easily differentiate PG purposes like FSDP, PP.
- Lower-layer collectives (e.g. NCCL) debugging: we will be able to expose the PG desc to the NCCL communicator so NCCL-layer operations can be easily correlated to a PG.

Solution: Add a group_desc field to c10d

Differential Revision: D55781850
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123472
Approved by: https://github.com/kwen2501
Committed by: PyTorch MergeBot
Parent: 73f0ecc1ac
Commit: 4e9094533e
@@ -694,6 +694,8 @@ class TORCH_API ProcessGroup : public torch::CustomClassHolder {
 
   const std::string& getGroupName() const;
   void setGroupName(const std::string& name);
+  const std::string& getGroupDesc() const;
+  void setGroupDesc(const std::string& name);
   void enableCollectivesTiming();
 
   void release_resources() override;
@@ -724,6 +726,7 @@ class TORCH_API ProcessGroup : public torch::CustomClassHolder {
   const int size_;
   const c10::intrusive_ptr<Options> options_;
   const BackendType backendType_;
+  std::string pg_desc_;
 
   // Debug level setting. It is parsed once when ProcessGroup is constructed and
   // remains the same across use of this process group.
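
To illustrate the logging use case from the summary, here is a minimal, self-contained sketch of how a description accessor pair like the one declared above could be used. It does not depend on PyTorch: `MiniProcessGroup`, its `pg_name_` member, and the `logPrefix` helper are hypothetical stand-ins invented for this example, and the accessor bodies are an assumption, since the diff shows only the declarations.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for c10d::ProcessGroup. Only the name/description
// accessors declared in the diff are modeled; the bodies are assumed.
class MiniProcessGroup {
 public:
  explicit MiniProcessGroup(std::string name) : pg_name_(std::move(name)) {}

  const std::string& getGroupName() const { return pg_name_; }
  void setGroupName(const std::string& name) { pg_name_ = name; }

  // Same signatures as the declarations added in the diff.
  const std::string& getGroupDesc() const { return pg_desc_; }
  void setGroupDesc(const std::string& name) { pg_desc_ = name; }

 private:
  std::string pg_name_;  // hypothetical member, not shown in the diff
  std::string pg_desc_;  // stays empty until setGroupDesc is called
};

// Hypothetical helper: builds a log prefix that includes the description
// only when one has been set, so existing log lines keep their shape.
inline std::string logPrefix(const MiniProcessGroup& pg) {
  std::ostringstream os;
  os << "[PG " << pg.getGroupName();
  if (!pg.getGroupDesc().empty()) {
    os << " (" << pg.getGroupDesc() << ")";
  }
  os << "]";
  return os.str();
}
```

With this sketch, calling `setGroupDesc("FSDP")` on a group and then passing it to `logPrefix` yields a prefix that carries the description, which is the kind of log-line matching the summary has in mind.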