mirror of
https://github.com/pytorch/pytorch.git
synced 2025-10-20 12:54:11 +08:00
[PGNCCL] Use long/short wait for different non-blocking calls (#142291)
In nonblocking mode, we always check whether the NCCL communicator is ready between issuing commands to it. Today this is done by the `waitReady()` function. Unfortunately, `waitReady()` is burdened with `C10D_NCCL_CHECK_TIMEOUT_SLEEP`, which sleeps for an interval between two consecutive checks. While this is fine when waiting for comm init or finalize, it degrades the performance of collective calls (which would almost certainly return success immediately).

This PR adds a `bool longInterval` argument to `waitReady` and lets the call site determine whether a long wait is likely; if not, `waitReady` uses `sched_yield()` to check for readiness more eagerly.

Thanks @eqy for reporting that small collectives suffer a perf impact in nonblocking mode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142291
Approved by: https://github.com/eqy, https://github.com/fduwjj
@@ -329,7 +329,13 @@ class NCCLComm {
   // Wait for the communicator to be ready. This is a blocking function.
   // Useful in nonblocking mode: NCCL requires the communicator to be ready
   // before issuing a second command.
-  void waitReady();
+  // Arguments:
+  // longInterval: if true, wait with sleep of an interval; otherwise, wait
+  // with `sched_yield` which is faster (but acquires CPU more frequently).
+  // Use `longInterval=true` when waiting for initialization or finalize to
+  // complete. Use `longInterval=false` when waiting collective call to return
+  // ncclSuccess.
+  void waitReady(bool longInterval);

   std::optional<std::string> getNcclCommFailureReason() const {
     LockType lock(mutex_);
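The two waiting strategies described in the header comment can be illustrated with a minimal, standalone sketch. This is not the PyTorch implementation: `waitReadySketch`, the `isReady` callback, and the `sleepInterval` default are hypothetical names chosen for illustration, and the callback stands in for whatever readiness query the communicator actually performs. The sketch assumes a POSIX system, where `sched_yield()` is available.

```cpp
#include <chrono>
#include <functional>
#include <sched.h>  // sched_yield (POSIX)
#include <thread>

// Hypothetical stand-in for a communicator readiness wait.
// isReady:       callback that returns true once the operation has completed.
// longInterval:  true  -> sleep between checks (suits init/finalize waits);
//                false -> yield the CPU and re-check right away (suits
//                         collectives that usually complete immediately).
// Returns true if isReady() became true before the timeout, false otherwise.
bool waitReadySketch(
    const std::function<bool()>& isReady,
    bool longInterval,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds sleepInterval = std::chrono::milliseconds(10)) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!isReady()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // timed out without becoming ready
    }
    if (longInterval) {
      // Low CPU usage, but adds up to sleepInterval of latency per check.
      std::this_thread::sleep_for(sleepInterval);
    } else {
      // Gives other threads a chance to run, then polls again immediately.
      sched_yield();
    }
  }
  return true;
}
```

Under these assumptions, a call site waiting for initialization would pass `longInterval=true`, while a call site following a small collective would pass `longInterval=false`, so a result that is ready almost at once is not delayed by a sleep.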