Updates NCCLConfig with QOS variable (#151821)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151821
Approved by: https://github.com/kwen2501
2025-10-20 21:14:14 +08:00 · 2025-04-22 11:07:23 -07:00
parent aa61707a56
commit 334aab0dea
3 changed files with 10 additions and 0 deletions
--- a/docs/source/distributed.rst
+++ b/docs/source/distributed.rst
@@ -143,6 +143,9 @@ for some cloud providers, such as AWS or GCP.
 For a full list of NCCL environment variables, please refer to
 `NVIDIA NCCL's official documentation <https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/docs/env.html>`_

+You can tune NCCL communicators even further using `torch.distributed.ProcessGroupNCCL.NCCLConfig`
+and `torch.distributed.ProcessGroupNCCL.Options`. Learn more about them using `help`
+(e.g. `help(torch.distributed.ProcessGroupNCCL.NCCLConfig)`) in the interpreter.

 .. _distributed-basics:

--- a/torch/csrc/distributed/c10d/NCCLUtils.hpp
+++ b/torch/csrc/distributed/c10d/NCCLUtils.hpp
@@ -66,6 +66,10 @@ static_assert(
 #define NCCL_HAS_MEM_ALLOC
 #endif

+#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 26, 0)
+#define NCCL_HAS_QOS
+#endif
+
 // Macro to throw on a non-successful NCCL return value.
 #define C10D_NCCL_CHECK(cmd, failureReason)                                   \
  do {                                                                        \
--- a/torch/csrc/distributed/c10d/init.cpp
+++ b/torch/csrc/distributed/c10d/init.cpp
@@ -3178,6 +3178,9 @@ for details.
      .def_readwrite("max_ctas", &ncclConfig_t::maxCTAs)
 #ifdef NCCL_HAS_COMM_SPLIT
      .def_readwrite("split_share", &ncclConfig_t::splitShare)
+#endif
+#ifdef NCCL_HAS_QOS
+      .def_readwrite("traffic_class", &ncclConfig_t::trafficClass)
 #endif
      .def_property(
          "net_name",