DeepSpeed/deepspeed
Junjie Mao 660ee89529 deepcompile: Create dummy inputs using empty_strided (#7564)
CUDA tensors may have a storage larger than numel() * dtype.itemsize due
to alignment considerations. Creating dummy tensors with
torch.zeros().as_strided() leads to out-of-bounds errors in such cases.

Create dummy inputs with torch.empty_strided().zero_() instead.

Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
2025-09-15 14:19:06 -07:00
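
The size mismatch described in the commit message can be sketched in plain Python. This is only an illustration under stated assumptions: `min_storage_elems` is a hypothetical helper, not a DeepSpeed or PyTorch function, and the padded 4x3 layout is made up to stand in for an alignment-padded tensor. It compares the element count a buffer sized by numel() would hold with the element count a strided layout actually addresses.

```python
from math import prod


def min_storage_elems(size, stride):
    """Hypothetical helper: number of storage elements a strided view addresses.

    The highest reachable offset is sum((size_i - 1) * stride_i), so the
    storage must hold one more element than that.
    """
    if any(s == 0 for s in size):
        return 0
    return 1 + sum((s - 1) * st for s, st in zip(size, stride))


# Assumed example layout: a 4x3 tensor whose rows are padded to 8 elements.
size, stride = (4, 3), (8, 1)

numel = prod(size)                        # elements a numel()-sized buffer holds
needed = min_storage_elems(size, stride)  # elements the strided layout addresses

# A buffer sized by numel() is too small for the padded layout, which is
# why viewing it with these strides would go out of bounds.
overflows = needed > numel

# For a contiguous layout the two counts agree, so the problem stays hidden.
contiguous_needed = min_storage_elems(size, (3, 1))
```

Allocating from the size and stride together, as `torch.empty_strided(size, stride)` does, avoids the mismatch because the storage is sized for the layout rather than for the element count alone.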