In the process of adding one-bit optimizer support for XPU devices, we noticed that across accelerators the main difference in the implementation of `compressed_allreduce` lies in `packbits` and `unpackbits`: CUDA uses cupy and NPU uses torch_npu. Instead of replacing these with XPU-only functions, we provide a CompressedBackend that performs the `compressed_allreduce` work and lets users plug in their own packbits/unpackbits kernels, giving a general path for all kinds of accelerators. In this PR, we:
1. Add CompressedBackend for onebitAdam, onebitLamb and zerooneAdam
2. Add an XPU implementation of packbits/unpackbits with SYCL, built in PackbitsBuilder
3. Add tests for onebit with CompressedBackend

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
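For context, the packbits/unpackbits pair is just a bit-level conversion between a sign tensor and a Byte tensor. The following is a minimal PyTorch reference of those semantics (an illustration only, not the SYCL kernels; the function names here are ours):

```python
import torch

def packbits_reference(sign: torch.Tensor) -> torch.Tensor:
    """Pack a {-1, +1} float tensor into uint8, 8 signs per byte (MSB first)."""
    bits = (sign > 0).to(torch.uint8).reshape(-1, 8)  # numel must be a multiple of 8
    weights = torch.tensor([128, 64, 32, 16, 8, 4, 2, 1], dtype=torch.uint8)
    return (bits * weights).sum(dim=1, dtype=torch.uint8)

def unpackbits_reference(packed: torch.Tensor) -> torch.Tensor:
    """Invert packbits_reference, recovering the {-1, +1} sign tensor."""
    shifts = torch.arange(7, -1, -1)           # MSB-first bit positions
    bits = (packed.unsqueeze(1).int() >> shifts) & 1
    return bits.reshape(-1).float() * 2 - 1
```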
One-Bit tests
In this folder, you can test the functionality and performance of different backends for compressed allreduce, which is the core algorithm in one-bit optimizers such as One-Bit Adam, One-Bit Lamb and Zero-One Adam.
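At its core, compressed allreduce replaces an exact gradient allreduce with a 1-bit sign exchange plus a scalar scale, using error feedback so that compression error is carried into the next step instead of being lost. A minimal sketch of the per-step compression (illustrative, not DeepSpeed's exact code):

```python
import torch

def one_bit_compress(grad: torch.Tensor, error: torch.Tensor):
    """Error-compensated 1-bit compression as used by one-bit optimizers."""
    corrected = grad + error               # add residual from the previous step
    scale = corrected.abs().mean()         # single scalar sent alongside the signs
    sign = corrected.sign()                # the 1-bit payload (packed on the wire)
    new_error = corrected - scale * sign   # residual fed back at the next step
    return scale, sign, new_error
```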
How to run
NCCL and MPI backend
These tests require the relevant communication backend to be installed in your environment: either the NCCL backend of PyTorch distributed, or a Message Passing Interface (MPI) implementation such as MVAPICH2-GDR or OpenMPI. See the Detailed Pre-requisites.
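A quick way to sanity-check the environment (a small sketch; it assumes the MPI tests rely on mpi4py, which is typical for DeepSpeed's MPI path):

```python
import torch

print("NCCL available:", torch.distributed.is_nccl_available())
try:
    from mpi4py import MPI  # MPI backend dependency
    print("MPI world size:", MPI.COMM_WORLD.Get_size())
except ImportError:
    print("mpi4py not installed; MPI backend tests will not run")
```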
To test the accuracy and performance of the NCCL backend:
python test_nccl_backend.py
python test_nccl_perf.py
Similarly, for the MPI backend:
python test_mpi_backend.py
python test_mpi_perf.py
Compressed backend
This backend abstracts the generic part of one-bit optimizers and implements the accelerator-dependent part with a DeepSpeed custom op builder. To use and test this CompressedBackend, make sure that your current accelerator supports PackbitsBuilder, so that it can be loaded to perform high-performance packing and unpacking between float and Byte datatypes. An example can be found in Deepspeed/op_builder/xpu/packbits.py.
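As a rough sketch, the op can be looked up through DeepSpeed's accelerator abstraction and JIT-built on first use. The lookup name and call pattern below follow DeepSpeed's general op-builder convention and are assumptions, not a documented API:

```python
from deepspeed.accelerator import get_accelerator

# Assumption: the current accelerator registers the op as "PackbitsBuilder".
packer = get_accelerator().create_op_builder("PackbitsBuilder").load()

# The builder is expected to expose packbits/unpackbits kernels converting
# between float signs and a Byte buffer. Exact signatures are accelerator-
# specific, so the calls below are illustrative only:
# packed = packer.packbits(sign_tensor, ...)
# signs  = packer.unpackbits(packed, ...)
```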
The test usage is the same as for the other backends:
python test_compressed_backend.py
python test_compressed_perf.py