[2/N][SymmMem] Add MemPool allocator and tests (#161471)

(Porting most of #161008)

Hooking SymmetricMemory Allocator to MemPool so that user can create symmetric tensors with regular `torch.zeros`, `torch.arange` etc factories. Also so that our ops can have functional variants that create `out` tensors on symmetric memory.

To end users, this PR supports a python UI as follows:
```
allocator = symm_mem.get_mempool_allocator(device)
mempool = torch.cuda.MemPool(allocator)
with torch.cuda.use_mem_pool(mempool):
    tensor = torch.arange(numel, dtype=dtype, device=device)
```

Added tests for both use cases above.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161471
Approved by: https://github.com/ngimel
ghstack dependencies: #161470
This commit is contained in:
Ke Wen
2025-08-26 14:38:58 -07:00
committed by PyTorch MergeBot
parent 8dd5aa9689
commit 4ed71d5412
10 changed files with 138 additions and 0 deletions

View File

@ -747,6 +747,7 @@ cc_library(
"torch/csrc/distributed/c10d/symm_mem/CUDASymmetricMemory.cu",
"torch/csrc/distributed/c10d/symm_mem/CUDASymmetricMemoryOps.cu",
"torch/csrc/distributed/c10d/symm_mem/CUDASymmetricMemoryUtils.cpp",
"torch/csrc/distributed/c10d/symm_mem/cuda_mem_pool.cpp",
"torch/csrc/distributed/c10d/symm_mem/intra_node_comm.cu",
],
)) + torch_sources,