Uses context pointer for deleter to enable multiple CUDAPluggableAllocator usage (#130472)

We should be able to create multiple CUDAPluggableAllocators in the same PyTorch program (see https://github.com/pytorch/pytorch/issues/124807, https://github.com/pytorch/pytorch/pull/125722 for context). When mixing CUDAPluggableAllocators in the same PyTorch program, we need to make sure that the deleter passed in through each CUDAPluggableAllocator gets "attached" to the data_ptr and persists until program exit (when it is called to free the memory).

Currently, CUDAPluggableAllocator maintains a global `current_custom_allocator`. When creating the `DataPtr`, `raw_deleter` attaches `custom_raw_deleter` to the DataPtr, which calls `current_custom_allocator->raw_delete(...)`. This approach is fine when only one allocator is in use. With multiple allocators, however, a DataPtr would use the deleter of whichever allocator `current_custom_allocator` points to when the memory is freed. For example, if allocation 1 was made with `cudaMalloc` and allocation 2 with `ncclMemAlloc`, and `current_custom_allocator` currently points to the CUDAPluggableAllocator that uses `ncclMemAlloc`, then cleaning up allocation 1 would call `ncclMemFree` instead of `cudaFree`.

In this PR, we solve the above problem by remembering the `free_fn_` in a deleter context that travels with the DataPtr. Hence, there is no need to go through an allocator object to find the deleter.
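
As a rough illustration of the idea, the sketch below stores the free function in a per-allocation context so that the right one runs regardless of which allocator is "current". It is not the code from this PR: `ToyAllocator`, `OwnedPtr`, `DeleterContext`, `context_deleter`, `free_log`, and `demo` are hypothetical names, and plain `operator new`/`operator delete` stand in for `cudaMalloc`/`ncclMemAlloc` and their free functions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are PyTorch APIs.
using FreeFn = std::function<void(void* ptr, std::size_t size)>;

// Records which allocator's free function released each pointer.
inline std::vector<std::string>& free_log() {
  static std::vector<std::string> log;
  return log;
}

// Per-allocation context: remembers the free function and its arguments.
struct DeleterContext {
  FreeFn free_fn;
  void* data;
  std::size_t size;
};

// C-style deleter that receives only the opaque context pointer.
inline void context_deleter(void* ctx) {
  auto* c = static_cast<DeleterContext*>(ctx);
  assert(c != nullptr);
  c->free_fn(c->data, c->size);  // always the allocation's own free function
  delete c;
}

// Simplified analogue of a data pointer that owns its deleter context.
struct OwnedPtr {
  void* data;
  std::unique_ptr<void, void (*)(void*)> ctx;  // runs context_deleter(ctx)
};

struct ToyAllocator {
  std::string name;

  OwnedPtr allocate(std::size_t size) const {
    void* p = ::operator new(size);  // stand-in for a device allocation
    std::string tag = name;
    auto* ctx = new DeleterContext{
        [tag](void* ptr, std::size_t) {
          free_log().push_back(tag);
          ::operator delete(ptr);  // stand-in for the matching free call
        },
        p, size};
    return OwnedPtr{p, std::unique_ptr<void, void (*)(void*)>(ctx, &context_deleter)};
  }
};

// Allocates from two allocators and returns the order of free calls.
inline std::vector<std::string> demo() {
  free_log().clear();
  ToyAllocator cuda_like{"cuda_like"};
  ToyAllocator nccl_like{"nccl_like"};
  {
    OwnedPtr first = cuda_like.allocate(64);
    OwnedPtr second = nccl_like.allocate(64);
    // Locals are destroyed in reverse order: second, then first.
  }
  return free_log();
}
```

Calling `demo()` frees each block through the function captured at allocation time, so no global allocator lookup is involved when a pointer is released.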

CC: @zdevito @ptrblck @eqy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130472
Approved by: https://github.com/eqy, https://github.com/ezyang
Syed Tousif Ahmed
2024-07-18 11:33:19 +00:00
committed by PyTorch MergeBot
parent 28a74b9fa4
commit 38b7d89aa4
5 changed files with 112 additions and 22 deletions


@@ -661,6 +661,7 @@ libtorch_cuda_core_sources = [
 "torch/csrc/CudaIPCTypes.cpp",
 "torch/csrc/cuda/comm.cpp",
 "torch/csrc/cuda/memory_snapshot.cpp",
+"torch/csrc/cuda/CUDAPluggableAllocator.cpp",
 "torch/csrc/inductor/aoti_runner/model_container_runner_cuda.cpp",
 "torch/csrc/inductor/aoti_torch/shim_cuda.cpp",
 "torch/csrc/jit/codegen/fuser/cuda/fused_kernel.cpp",
@@ -772,7 +773,6 @@ libtorch_python_cuda_core_sources = [
 "torch/csrc/cuda/shared/cudart.cpp",
 "torch/csrc/cuda/shared/nvtx.cpp",
 "torch/csrc/cuda/utils.cpp",
-"torch/csrc/cuda/CUDAPluggableAllocator.cpp",
 ]
 libtorch_python_cuda_sources = libtorch_python_cuda_core_sources + [