Mirror of https://github.com/deepspeedai/DeepSpeed.git (synced 2025-10-20 15:33:51 +08:00)
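CUDA tensors may have a larger storage than numel() * dtype.itemsize due to alignment considerations. Creating dummy tensors with torch.zeros().as_strided() leads to out-of-bounds errors in such cases, because as_strided() cannot create a view that extends past the end of a storage sized for only numel() elements. Create dummy inputs with torch.empty_strided().zero_() instead, which allocates enough storage for the requested size and stride before zero-filling it.

Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>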
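A minimal sketch of the difference, assuming a recent PyTorch (2.x, for untyped_storage()). It runs on CPU for portability: the extra storage is simulated with an explicitly padded stride rather than produced by a CUDA allocation, and make_dummy_like is a hypothetical helper name used only for illustration, not a DeepSpeed function.

```python
import torch


def make_dummy_like(ref: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: zero-filled tensor matching ref's size, stride, dtype, device.

    empty_strided() allocates storage large enough for the requested stride
    layout (including any padding the strides imply); zero_() then fills it.
    """
    return torch.empty_strided(
        ref.size(), ref.stride(), dtype=ref.dtype, device=ref.device
    ).zero_()


# A 4x3 float32 tensor whose rows are padded to 4 elements (stride (4, 1)).
# numel() is 12, but the layout needs 1 + 3*4 + 2*1 = 15 elements of storage.
ref = torch.empty_strided((4, 3), (4, 1), dtype=torch.float32)

# Old approach: allocate exactly numel() elements, then reinterpret the layout.
try:
    torch.zeros(ref.numel(), dtype=ref.dtype).as_strided(ref.size(), ref.stride())
    old_ok = True
except RuntimeError:
    # The strided view would run past the end of the 12-element storage.
    old_ok = False

# New approach: storage is sized from the stride layout, not from numel().
dummy = make_dummy_like(ref)
print("old approach succeeded:", old_ok)
print("dummy stride:", dummy.stride())
print("dummy storage bytes:", dummy.untyped_storage().nbytes())
```

The design point is that empty_strided() derives the storage size from size and stride together, so the dummy tensor reproduces the reference layout exactly instead of assuming it is contiguous.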