Run the NaN-check kernels on the NCCL stream so that the tensors' lifetime management is the same as that of NCCL's pre- and post-kernels, and so that the checks show up on the NCCL stream line in visualizers. If they showed up on the compute line instead, users might get confused ("my code does not have these kernels"). The check is therefore moved after the point where we make the NCCL stream depend on the last compute kernel.

Also moved the declaration of `checkForNan` from Utils.hpp to NCCLUtils.hpp, and renamed Utils.cu to NCCLUtils.cu.

Differential Revision: [D61957573](https://our.internmc.facebook.com/intern/diff/D61957573)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134300
Approved by: https://github.com/shuqiangzhang, https://github.com/wconstab
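For illustration, here is a minimal sketch of the ordering this change establishes; it is not the actual ProcessGroupNCCL code, and the helper name `launchWithNanCheck` and the stream setup are assumptions. The NCCL stream is first made to wait on the compute stream, and only then is the NaN check launched on the NCCL stream, using the `checkForNan` declared in NCCLUtils.hpp (shown below):

#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <ATen/cuda/CUDAEvent.h>
#include <c10/cuda/CUDAStream.h>

// Declared in NCCLUtils.hpp (see the header below).
namespace c10d {
void checkForNan(const at::Tensor& tensor, at::cuda::CUDAStream& stream);
} // namespace c10d

// Hypothetical helper, for illustration only.
void launchWithNanCheck(const at::Tensor& input) {
  // Stream the user's compute kernels run on.
  at::cuda::CUDAStream computeStream = at::cuda::getCurrentCUDAStream();
  // A dedicated stream standing in for the NCCL stream
  // (ProcessGroupNCCL manages its own streams).
  at::cuda::CUDAStream ncclStream = at::cuda::getStreamFromPool();

  // Make the NCCL stream depend on the last compute kernel...
  at::cuda::CUDAEvent depEvent;
  depEvent.record(computeStream);
  depEvent.block(ncclStream);

  // ...and only then launch the NaN check on the NCCL stream, so it shows up
  // on the NCCL stream line in visualizers, not the compute line.
  c10d::checkForNan(input, ncclStream);

  // The NCCL collective would then be enqueued on ncclStream here.
}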
NCCLUtils.hpp:

#pragma once

#ifdef USE_C10D_NCCL

#include <ATen/ATen.h>
#include <c10/cuda/CUDAStream.h>

namespace c10d {

// Check for NaNs in a tensor on a given stream. If any are found, throw a
// device-side error.
void checkForNan(const at::Tensor& tensor, at::cuda::CUDAStream& stream);

} // namespace c10d

#endif // USE_C10D_NCCL
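Since the declaration only promises a device-side error, here is a hypothetical sketch of how the corresponding definition in NCCLUtils.cu could look; the actual PyTorch kernel may differ in dispatch and launch details. It runs a grid-stride loop that traps via CUDA_KERNEL_ASSERT when it encounters a NaN:

#include <algorithm>
#include <ATen/ATen.h>
#include <ATen/Dispatch.h>
#include <c10/cuda/CUDAException.h>
#include <c10/cuda/CUDAStream.h>
#include <c10/macros/Macros.h>

namespace c10d {

// Grid-stride loop: each thread asserts that its elements are not NaN.
// CUDA_KERNEL_ASSERT raises a device-side error if the condition fails.
template <typename T>
__global__ void checkForNanKernel(const T* data, int64_t numel) {
  for (int64_t i = blockIdx.x * blockDim.x + threadIdx.x; i < numel;
       i += static_cast<int64_t>(gridDim.x) * blockDim.x) {
    CUDA_KERNEL_ASSERT(!::isnan(static_cast<float>(data[i])));
  }
}

// Sketch implementation; launch configuration and dispatch are assumptions.
void checkForNan(const at::Tensor& tensor, at::cuda::CUDAStream& stream) {
  const int64_t numel = tensor.numel();
  if (numel == 0) {
    return;
  }
  const int threads = 256;
  const int blocks = static_cast<int>(
      std::min<int64_t>((numel + threads - 1) / threads, 1024));
  AT_DISPATCH_FLOATING_TYPES_AND2(
      at::ScalarType::Half, at::ScalarType::BFloat16,
      tensor.scalar_type(), "checkForNan", [&] {
        checkForNanKernel<scalar_t><<<blocks, threads, 0, stream>>>(
            tensor.const_data_ptr<scalar_t>(), numel);
      });
  C10_CUDA_KERNEL_LAUNCH_CHECK();
}

} // namespace c10d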