[2/N][SymmMem] Add MemPool allocator and tests (#161471)

(Porting most of #161008)

Hooks the SymmetricMemory allocator into MemPool so that users can create symmetric tensors with regular factory functions such as `torch.zeros` and `torch.arange`. It also lets our ops provide functional variants that create their `out` tensors on symmetric memory.

For end users, this PR exposes the following Python interface:
```
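import torch
import torch.distributed._symmetric_memory as symm_mem

# `device`, `numel`, and `dtype` are assumed to be defined by the caller.
allocator = symm_mem.get_mempool_allocator(device)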
mempool = torch.cuda.MemPool(allocator)
with torch.cuda.use_mem_pool(mempool):
    tensor = torch.arange(numel, dtype=dtype, device=device)
```

Added tests for both use cases above.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161471
Approved by: https://github.com/ngimel
ghstack dependencies: #161470
Ke Wen
2025-08-26 14:38:58 -07:00
committed by PyTorch MergeBot
parent 8dd5aa9689
commit 4ed71d5412
10 changed files with 138 additions and 0 deletions


@@ -1128,6 +1128,9 @@ This class does not support ``__members__`` property.)");
&::c10d::symmetric_memory::has_multicast_support)
.def_static("set_backend", &::c10d::symmetric_memory::set_backend)
.def_static("get_backend", &::c10d::symmetric_memory::get_backend)
.def_static(
"get_mempool_allocator",
&::c10d::symmetric_memory::get_mempool_allocator)
.def_property_readonly("rank", &SymmetricMemory::get_rank)
.def_property_readonly("world_size", &SymmetricMemory::get_world_size)
.def_property_readonly(