---
title: "Up to 5x less communication and 3.4x faster training through 1-bit Adam"
excerpt: ""
date: 2020-09-09 00:00:00
tags: training English
toc: false
---
Adam is an effective and perhaps the most widely used optimizer for training many large-scale deep learning models. However, Adam is generally not compatible with communication-efficient optimization algorithms, so communication cost can become a bottleneck when scaling across distributed devices. We introduce a new algorithm, 1-bit Adam, and its efficient implementation in DeepSpeed. 1-bit Adam offers the same convergence as Adam and incurs up to 5x less communication, which enables up to 3.5x higher throughput for BERT-Large pretraining and up to 2.7x higher throughput for SQuAD fine-tuning on bandwidth-limited clusters.
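Enabling 1-bit Adam is primarily a configuration change. Below is a minimal sketch, assuming the `OneBitAdam` optimizer type and parameters described in the 1-bit Adam tutorial; the placeholder model and all parameter values here are illustrative, not taken from this post.

```python
# Sketch: enabling 1-bit Adam through the DeepSpeed config (illustrative values).
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # placeholder model for illustration

ds_config = {
    "train_batch_size": 64,
    "optimizer": {
        "type": "OneBitAdam",
        "params": {
            "lr": 2e-4,
            "weight_decay": 0.01,
            # Run vanilla Adam for this many warmup steps before switching
            # to 1-bit compressed communication.
            "freeze_step": 1000,
            # Set to True on clusters with CUDA-aware communication (e.g. InfiniBand).
            "cuda_aware": False
        }
    }
}

# deepspeed.initialize builds the distributed engine and the configured optimizer.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```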
- For a brief overview, see our [press release]({{ site.press_release_v3 }}).
- For a detailed technology deep dive, see our blog post.
- For a tutorial on how to reproduce our results, see our 1-bit Adam tutorial.
- The source code for 1-bit Adam can be found in the DeepSpeed repo. The 1-bit Adam optimizer is implemented in onebit_adam.py, and the CUDA-aware communication for 1-bit Adam is in custom_collectives.py. Example code for trying this feature can be found in the DeepSpeedExamples repo, as shown in the tutorial. A conceptual sketch of the underlying 1-bit compression idea follows this list.
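To make the communication savings concrete, the sketch below illustrates the general idea of error-compensated 1-bit compression: each worker communicates only the signs of a residual-corrected tensor plus one shared scale, and keeps the compression error locally to add back on the next step. This is a conceptual illustration only, not the actual implementation in onebit_adam.py or custom_collectives.py.

```python
# Conceptual sketch of error-compensated 1-bit compression (not DeepSpeed's code).
import torch

def one_bit_compress(tensor, error):
    """Compress a tensor to 1 bit per element (sign + one shared scale),
    carrying the compression error forward as local state."""
    corrected = tensor + error              # add residual from the previous step
    scale = corrected.abs().mean()          # one shared magnitude per tensor
    compressed = scale * corrected.sign()   # what would actually be communicated
    new_error = corrected - compressed      # residual kept locally for next step
    return compressed, new_error

# One simulated step: compress a momentum-like tensor and carry the residual.
m = torch.randn(8)
err = torch.zeros_like(m)
m_compressed, err = one_bit_compress(m, err)
```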