Mirror of https://github.com/deepspeedai/DeepSpeed.git, synced 2025-10-20 15:33:51 +08:00
This PR introduces **SuperOffload**, an optimizer designed for superchips (NVIDIA GH200 & GB200, AMD MI300A) with high CPU–GPU bandwidth. It enables **full fine-tuning** of **GPT-OSS-20B, Qwen3-14B, and Phi-4** on a single GH200 GPU, achieving up to **~500 TFLOPS**, using Hugging Face Transformers and DeepSpeed with no custom modeling code required.

SuperOffload extends ZeRO-Offload with fine-grained control and CPUAdam rollback utilities, allowing GPU execution to overlap with CPUAdam. This reduces GPU idle time and improves overall efficiency.

Key changes:

- New `SuperOffloadOptimizer_Stage3` optimizer.
- C++/CUDA binding for `adam_rollback` to revert one optimization step.
- Config additions including `super_offload` and `cpuadam_cores_perc`.

A detailed blog and tutorial will be available soon.

---------

Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
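The new config keys might be wired into a DeepSpeed JSON config along these lines. This is a sketch only: the exact placement of `super_offload` and `cpuadam_cores_perc` (shown here under the ZeRO stage-3 offload settings) is an assumption based on the key names, not confirmed by this PR description, and the surrounding values are illustrative.

```json
{
  "train_batch_size": 16,
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "super_offload": true,
    "cpuadam_cores_perc": 0.8
  },
  "bf16": {
    "enabled": true
  }
}
```

Here `cpuadam_cores_perc` presumably caps the fraction of CPU cores CPUAdam may use, leaving headroom for data loading and other host-side work while the optimizer step overlaps with GPU execution.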