This PR overcomes the following warning, emitted when using any `torch.distributed` calls with DeepSpeed:

```
[W404 00:15:21.693690333 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
```

It does so by setting `device_id` to the device corresponding to the `LOCAL_RANK` env var (see the sketch at the end of this description).

-------------------

Update: discovered that `torch.distributed` deadlocks with `torch>=2.7.0` when the `device_id` arg is used - switching to draft for now, as we can't commit this until we know how to work around the deadlock.

---------

Signed-off-by: Stas Bekman <stas@stason.org>
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas.bekman@snowflake.com>
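
For reference, here is a minimal standalone sketch of the approach described above (not the DeepSpeed patch itself): it derives the device from `LOCAL_RANK` and passes it as `device_id` to `init_process_group`. It assumes a launcher such as `torchrun` or `deepspeed` has already set the rendezvous env vars, and a torch version that accepts `device_id` (per the update above, `torch>=2.7.0` may deadlock when this arg is used).

```python
# Sketch only: pass device_id at init so NCCL knows the rank-to-GPU mapping
# up front instead of guessing it at the first barrier().
import os

import torch
import torch.distributed as dist


def init_distributed_with_device_id(backend: str = "nccl") -> None:
    # LOCAL_RANK is set by the launcher (torchrun / deepspeed).
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)
    # Passing device_id avoids the "using GPU 0 to perform barrier as devices
    # used by this process are currently unknown" warning.
    dist.init_process_group(backend=backend, device_id=device)


if __name__ == "__main__":
    init_distributed_with_device_id()
    dist.barrier()  # no device guessing needed; mapping was given at init
    dist.destroy_process_group()
```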