[c10d] allow sub group to be eagerly inited even if default one is not (#138665)

Summary:
Currently, eager initialization applies either to all process groups (PGs) or
to none of them. In some cases we do not want to initialize the communicators
for the default PG but still want to initialize them eagerly for a sub PG.
With this change, passing a device_id to new_group makes that possible.
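
A minimal sketch of the intended usage, assuming new_group accepts a device_id keyword as described above. The helper name init_with_eager_subgroup and its parameters are hypothetical, and the NCCL backend with one GPU per rank is an assumption for illustration only.

```python
def init_with_eager_subgroup(rank, world_size, local_rank, subgroup_ranks):
    """Keep the default PG lazy, but eagerly initialize a sub PG (sketch)."""
    # Imported lazily so the sketch can be loaded on a machine without
    # PyTorch; a real script would import these at module level.
    import torch
    import torch.distributed as dist

    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)

    # No device_id is passed here, so the default PG stays in lazy mode:
    # its communicator is only created when it is first needed.
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)

    # Assumption: device_id on new_group requests eager initialization of
    # this sub PG only (the behavior this change describes).
    # Every rank of the default PG must call new_group, member or not.
    return dist.new_group(ranks=subgroup_ranks, device_id=device)
```

Under a launcher such as torchrun, each process would call this helper with its own RANK, WORLD_SIZE and LOCAL_RANK values and the same subgroup_ranks list on every rank.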

Test Plan:
Newly added unit test.

Tags:

Resolves https://github.com/pytorch/pytorch/issues/137018

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138665
Approved by: https://github.com/kwen2501
ghstack dependencies: #138781
Shuqiang Zhang
2024-10-24 13:49:28 -07:00
committed by PyTorch MergeBot
parent 277b32c930
commit 4c91481656
7 changed files with 64 additions and 1 deletions

@@ -445,6 +445,11 @@ class NCCLComm {
#endif
  }

  bool isInitialized() const {
    std::unique_lock<std::mutex> lock(mutex_);
    return initialized_;
  }

  bool isAborted() const {
    std::unique_lock<std::mutex> lock(mutex_);
    return aborted_;