[c10d] Add NCCL memory allocator (#145675)

This PR implements a small UI improvement over #133603.

It prepares an NCCL memory allocator in torch C++ and then exposes it through pybind, so that users can use it directly.

UI:
```
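# `backend` is the NCCL backend object; `mem_allocator` is the allocator this PR exposes.
pool = torch.cuda.MemPool(backend.mem_allocator)
with torch.cuda.use_mem_pool(pool):
    # Allocations made inside this block come from the NCCL-backed pool.
    tensor = torch.arange(1024 * 1024 * 2, device=device)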
```
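
A slightly fuller sketch of the same usage, wrapped in a function with a version check. The helper names `nccl_supports_mem_alloc` and `arange_in_nccl_pool` are hypothetical and not part of this PR. `backend` is assumed to be the NCCL backend of an already initialized process group; how to obtain it is not shown in this commit message. The runtime check against `torch.cuda.nccl.version()` only approximates the compile-time `NCCL_HAS_MEM_ALLOC` guard (NCCL >= 2.19.0) added in this change.

```python
MIN_NCCL_FOR_MEM_ALLOC = (2, 19, 0)  # mirrors the NCCL_HAS_MEM_ALLOC guard


def nccl_supports_mem_alloc(version):
    """Return True if an NCCL version tuple meets the 2.19.0 threshold."""
    # Keep only (major, minor, patch); some builds append a suffix element.
    return tuple(version[:3]) >= MIN_NCCL_FOR_MEM_ALLOC


def arange_in_nccl_pool(backend, device, numel=1024 * 1024 * 2):
    """Allocate a tensor from a MemPool backed by the NCCL allocator.

    Assumes `backend` exposes `mem_allocator` as described in this PR.
    """
    import torch  # imported lazily so the version helper works without torch

    if not nccl_supports_mem_alloc(torch.cuda.nccl.version()):
        # Assumption: older NCCL builds do not provide the allocator.
        raise RuntimeError("NCCL >= 2.19.0 is assumed to be required")

    pool = torch.cuda.MemPool(backend.mem_allocator)
    with torch.cuda.use_mem_pool(pool):
        tensor = torch.arange(numel, device=device)
    # Return the pool too, so the caller controls how long it stays alive.
    return pool, tensor
```

Returning the pool alongside the tensor keeps the pool's lifetime in the caller's hands rather than tying it to the helper's scope.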

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145675
Approved by: https://github.com/syed-ahmed, https://github.com/wconstab
Ke Wen
2025-01-30 10:10:58 -08:00
committed by PyTorch MergeBot
parent 7796e308d0
commit 51ee9b154e
7 changed files with 68 additions and 43 deletions


@@ -86,6 +86,10 @@ static_assert(
 #define NCCL_HAS_COMM_REGISTER
 #endif
 
+#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 19, 0)
+#define NCCL_HAS_MEM_ALLOC
+#endif
+
 // Macro to throw on a non-successful NCCL return value.
 #define C10D_NCCL_CHECK(cmd, failureReason) \
 do { \