[SymmMem] Experimental NVSHMEM integration (#151261)

Adding NVSHMEM as a backend for `SymmetricMemory`, implementation of which is in `NVSHMEMSymmetricMemory.cu`.

Moving some helper functions in `CUDASymmetricMemory.cu` to `CUDASymmetricMemoryUtils.cpp`, so that they can be shared by `NVSHMEMSymmetricMemory`. These functions are mostly side-band exchange helpers (`store_all_gather`, `IpcChannel`, etc).

Adding `TORCH_SYMMEM` to control which implementation to use for CUDA tensors, currently support: `CUDA` (in-house impl), `NVSHMEM`.

The NVSHMEM feature is gated by build-time flag: `USE_NVSHMEM=1`. And `NVSHMEM_HOME` setting is required (TODO).

Ported most code from #146593.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151261
Approved by: https://github.com/fegin, https://github.com/fduwjj
This commit is contained in:
Ke Wen
2025-04-30 17:04:57 -07:00
committed by PyTorch MergeBot
parent 13add553b2
commit a7f1ddc184
15 changed files with 895 additions and 286 deletions

View File

@ -740,6 +740,7 @@ cc_library(
"torch/csrc/distributed/c10d/intra_node_comm.cu",
"torch/csrc/distributed/c10d/CUDASymmetricMemory.cu",
"torch/csrc/distributed/c10d/CUDASymmetricMemoryOps.cu",
"torch/csrc/distributed/c10d/CUDASymmetricMemoryUtils.cpp",
"torch/csrc/distributed/c10d/cuda/AsyncMM.cu",
"torch/csrc/distributed/c10d/NanCheck.cu",
"torch/csrc/distributed/c10d/quantization/quantization_gpu.cu",