Summary: Fixes #20651

Communication collectives in `torch.distributed` call `CUDACachingAllocator::recordStream()` on input and output tensors to prevent their memory blocks from being freed too early. `CUDACachingAllocator` uses the tensor's data pointer to track memory blocks, and it does not accept null pointers. However, an empty tensor's `storage().data()` might be null. In this case, as there is no associated memory block for the empty tensor, it should be fine to make `recordStream()` a no-op.

Tests only cover `broadcast` of empty tensors for the GLOO backend, because GLOO does not support empty inputs (facebookincubator/gloo/issues/179). It can be addressed in either `ProcessGroupGloo` or GLOO itself. Will add more tests when that gap is filled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20658

Differential Revision: D15399371

Pulled By: mrshenli

fbshipit-source-id: d29ebd1c72fddae49531f32695f81b89e42e5a4d
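The no-op behaviour described in the summary can be sketched with a toy allocator. This is only an illustration under stated assumptions: `ToyAllocator`, `StreamId`, `registerBlock`, `streamCount`, and `blockCount` are hypothetical names, and the code is not the real `CUDACachingAllocator` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins: StreamId and ToyAllocator are illustrative only
// and do not mirror the real CUDACachingAllocator types.
using StreamId = int;

class ToyAllocator {
 public:
  // Register a block so that later recordStream() calls can find it.
  void registerBlock(void* ptr) { streams_[ptr]; }

  // Note that `stream` uses the block at `ptr`. A null pointer means the
  // tensor is empty and owns no block, so there is nothing to record.
  void recordStream(void* ptr, StreamId stream) {
    if (ptr == nullptr) {
      return;  // no-op for empty tensors
    }
    auto it = streams_.find(ptr);
    assert(it != streams_.end() && "pointer was not allocated here");
    it->second.insert(stream);
  }

  // Number of distinct streams recorded for the block at `ptr`.
  std::size_t streamCount(void* ptr) const {
    auto it = streams_.find(ptr);
    return it == streams_.end() ? 0 : it->second.size();
  }

  // Number of blocks currently tracked.
  std::size_t blockCount() const { return streams_.size(); }

 private:
  std::unordered_map<void*, std::unordered_set<StreamId>> streams_;
};
```

The early return keeps the null pointer out of the lookup entirely, so an empty tensor neither trips the "unknown pointer" check nor creates a tracking entry.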
c10/cuda is a core library with CUDA functionality. It is distinguished from c10 in that it links against the CUDA library, but like c10 it doesn't contain any kernels, and consists solely of core functionality that is generally useful when writing CUDA code; for example, C++ wrappers for the CUDA C API.
Important notes for developers. If you want to add files or functionality to this folder, TAKE NOTE. The code in this folder is very special, because on our AMD GPU build, we transpile it into c10/hip to provide a ROCm environment. Thus, if you write:
// c10/cuda/CUDAFoo.h
namespace c10 { namespace cuda {
void my_func();
}}
this will get transpiled into:
// c10/hip/HIPFoo.h
namespace c10 { namespace hip {
void my_func();
}}
Thus, if you add new functionality to c10, you must also update `C10_MAPPINGS` in `tools/amd_build/pyHIPIFY/cuda_to_hip_mappings.py` to transpile occurrences of `cuda::my_func` to `hip::my_func`. (At the moment, we do NOT have a catch-all `cuda::` to `hip::` namespace conversion, as not all `cuda` namespaces are converted to `hip::`, even though c10's are.)

Transpilation inside this folder is controlled by `CAFFE2_SPECIFIC_MAPPINGS` (oddly enough). `C10_MAPPINGS` apply to ALL source files.
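To see why each new function needs its own mapping entry, here is a simplified sketch of table-driven rewriting. It is written in C++ purely for illustration (the real mappings live in the Python file named above), and `Mappings` and `applyMappings` are hypothetical names. It uses plain substring replacement, which is an assumption and not necessarily how the actual transpiler matches identifiers.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mapping table, standing in for entries such as those in
// C10_MAPPINGS. Only identifiers listed here are rewritten.
using Mappings = std::vector<std::pair<std::string, std::string>>;

// Replace every occurrence of each listed CUDA identifier with its HIP name.
std::string applyMappings(std::string source, const Mappings& mappings) {
  for (const auto& entry : mappings) {
    const std::string& from = entry.first;
    const std::string& to = entry.second;
    if (from.empty()) {
      continue;  // an empty key would match everywhere; skip it
    }
    std::size_t pos = 0;
    while ((pos = source.find(from, pos)) != std::string::npos) {
      source.replace(pos, from.size(), to);
      pos += to.size();  // continue after the text we just inserted
    }
  }
  return source;
}
```

With a single entry mapping `cuda::my_func` to `hip::my_func`, a call to `c10::cuda::my_func()` is rewritten while an unlisted name such as `other::cuda::helper()` is left alone, which matches the absence of a catch-all `cuda::` to `hip::` rule.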
If you add a new directory to this folder, you MUST update both c10/cuda/CMakeLists.txt and c10/hip/CMakeLists.txt.