---
title: "Progressive Layer Dropping"
excerpt: ""
date: 2020-10-29 00:00:00
tags: training English
toc: false
---

We introduce a new technology called progressive layer dropping (PLD) to speed up the pre-training of Transformer-based networks through efficient and robust compressed training. The pre-training step of Transformer networks often suffers from unbearable overall computational expenses. We analyze the training dynamics and stability of Transformer networks and propose PLD to sparsely update Transformer blocks following a progressive dropping schedule, which smoothly increases the layer dropping rate for each mini-batch as training evolves along both the temporal and the model depth dimensions. PLD allows pre-training to reach similar accuracy on downstream tasks 2.5x faster, and makes training 24% faster when training on the same number of samples, without requiring excessive hardware resources.
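
To make the idea of a progressive dropping schedule concrete, the sketch below is a minimal illustration, not DeepSpeed's implementation: the schedule form, the constants `theta_bar` and `gamma`, and the function names are assumptions chosen for exposition. It shows a keep probability that decays smoothly from 1.0 as training steps accumulate and that shrinks with layer depth, so deeper blocks are skipped more often as training progresses.

```python
import math
import random


def keep_probability(step, layer_idx, num_layers,
                     theta_bar=0.5, gamma=0.001):
    """Illustrative keep probability for one Transformer block.

    The global keep ratio decays smoothly from 1.0 toward theta_bar
    as training progresses, and deeper blocks are kept less often
    than shallower ones. (Schedule and constants are assumptions.)
    """
    # Temporal schedule: starts at 1.0, decays toward theta_bar.
    theta_t = (1.0 - theta_bar) * math.exp(-gamma * step) + theta_bar
    # Depth scaling: the first block is kept almost always, the last
    # block is kept with probability theta_t.
    return 1.0 - (layer_idx + 1) / num_layers * (1.0 - theta_t)


def forward_with_pld(blocks, x, step):
    """Run a stack of residual Transformer blocks, stochastically
    skipping blocks according to the progressive schedule above."""
    num_layers = len(blocks)
    for i, block in enumerate(blocks):
        if random.random() < keep_probability(step, i, num_layers):
            # Residual connection: the block output is added to its
            # input, so a skipped block simply passes x through.
            x = x + block(x)
    return x
```

Because each Transformer block sits on a residual connection, skipping a block leaves the activation path intact, which is what keeps training stable even as whole layers are dropped more aggressively over time.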