DeepSpeed/docs/_tutorials
Ma, Guokai f03d416eae add --bind_cores_to_rank to zero offload tutorial (#7474)
In ZeRO offload, significant time is spent in CPUAdam, which runs on the
CPU. Passing `--bind_cores_to_rank` to the deepspeed launch command
therefore helps improve ZeRO offload performance. This PR adds the flag
to the ZeRO offload tutorial to increase user awareness.

For Qwen2.5-3B fine-tuning on 2 A100-40B cards, running on a host with
128 CPU cores, the average step time is as follows, a nearly 1.3x
performance improvement:
without `--bind_cores_to_rank`: 3084.44ms per step
with `--bind_cores_to_rank`: 2383.16ms per step
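
As a quick check of the "nearly 1.3x" figure, the arithmetic can be sketched from the two step times reported above. The variable names are illustrative only and are not part of DeepSpeed:

```python
# Average step times reported above, in milliseconds.
step_ms_without_binding = 3084.44
step_ms_with_binding = 2383.16

# Speedup is the ratio of the slower step time to the faster one.
speedup = step_ms_without_binding / step_ms_with_binding

# Equivalent view: percentage reduction in time per step.
time_saved_pct = 100 * (1 - step_ms_with_binding / step_ms_without_binding)

print(f"speedup: {speedup:.2f}x")
print(f"step time reduced by {time_saved_pct:.1f}%")
```

This gives a speedup of about 1.29x, or roughly 22.7% less time per step.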

---------

Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>
2025-08-08 10:34:29 -07:00