[2/N][SymmMem] Add MemPool allocator and tests (#161471)

(Porting most of #161008)

Hooking SymmetricMemory Allocator to MemPool so that user can create symmetric tensors with regular `torch.zeros`, `torch.arange` etc factories. Also so that our ops can have functional variants that create `out` tensors on symmetric memory.

To end users, this PR supports a python UI as follows:
```
allocator = symm_mem.get_mempool_allocator(device)
mempool = torch.cuda.MemPool(allocator)
with torch.cuda.use_mem_pool(mempool):
    tensor = torch.arange(numel, dtype=dtype, device=device)
```

Added tests for both use cases above.

Differential Revision: [](https://our.internmc.facebook.com/intern/diff/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161471
Approved by: https://github.com/ngimel
ghstack dependencies: #161470
This commit is contained in:
Ke Wen
2025-08-30 18:32:12 -07:00
committed by PyTorch MergeBot
parent e92cd94153
commit 61e18b5304
10 changed files with 171 additions and 0 deletions

View File

@ -581,6 +581,7 @@ if(USE_CUDA)
${TORCH_SRC_DIR}/csrc/distributed/c10d/symm_mem/CUDASymmetricMemoryOps.cu
${TORCH_SRC_DIR}/csrc/distributed/c10d/symm_mem/CUDASymmetricMemoryUtils.cpp
${TORCH_SRC_DIR}/csrc/distributed/c10d/symm_mem/NCCLSymmetricMemory.cu
${TORCH_SRC_DIR}/csrc/distributed/c10d/symm_mem/cuda_mem_pool.cpp
PROPERTIES COMPILE_FLAGS "-DPYTORCH_C10_DRIVER_API_SUPPORTED=1"
)
endif()