verl/requirements-npu.txt
0528ba1185 [NPU] feat: Support FSDP worker and vLLM Ascend (#332)
Developers can follow the docs: docs/ascend/ascend.rst

This PR adds support for the Ascend NPU backend.
Co-authored-by: Chendong98 <chendong136@huawei.com>
Co-authored-by: zheliuyu <15750543867@163.com>
Co-authored-by: celestialli <celestialli@outlook.com>
In this PR, we add the capability to detect the NPU device type, and we add a new script for training on NPU.
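
As an illustration, below is a minimal sketch of how such device detection can work. The helper names (`is_npu_available`, `get_device_name`) are assumptions for this example; the actual implementation lives in verl/utils/device.py.

```python
# Minimal sketch of NPU device detection (helper names are illustrative;
# see verl/utils/device.py for the actual implementation).
import torch


def is_npu_available() -> bool:
    """Return True if torch_npu is installed and an NPU is visible."""
    try:
        import torch_npu  # noqa: F401  # importing registers the "npu" backend with torch
        return torch.npu.is_available()
    except ImportError:
        return False


def get_device_name() -> str:
    """Return the device type string used to place models and tensors."""
    if torch.cuda.is_available():
        return "cuda"
    if is_npu_available():
        return "npu"
    return "cpu"


# Usage: keep trainer/worker code device-agnostic by never hard-coding "cuda".
device = get_device_name()
tensor = torch.zeros(4, device=device)
```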

These are the changed files:

1. pyproject.toml: change the version of vllm
2. requirements-npu.txt: requirements for NPU
3. verl/bert_padding.py: adapted from
https://github.com/mlcommons/training_results_v1.1/blob/main/NVIDIA/benchmarks/bert/implementations/pytorch/padding.py
(a simplified sketch of these pad/unpad helpers appears after this list)
4. verl/single_controller/ray/base.py
5. verl/third_party/vllm/vllm_spmd/dtensor_weight_loaders.py
6. verl/trainer/fsdp_sft_trainer.py
7. verl/utils/flops_counter.py
8. verl/utils/fsdp_utils.py
9. verl/workers/actor/dp_actor.py
10. verl/workers/critic/dp_critic.py
11. verl/workers/fsdp_workers.py
12. verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py
13. verl/workers/sharding_manager/fsdp_vllm.py
14. verl/utils/device.py: get the device type for different devices
15. docs/ascend/ascend.md
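
For context on item 3, the pad/unpad helpers gather only the non-padded tokens into a flat tensor before the model forward and scatter them back afterwards, which avoids wasting compute on padding. The snippet below is a hedged, simplified sketch; the actual verl/bert_padding.py follows the MLCommons implementation linked above.

```python
# Simplified sketch of pad/unpad helpers (illustrative only; the real file
# is adapted from the MLCommons BERT padding.py linked in item 3).
import torch
import torch.nn.functional as F


def unpad_input(hidden_states: torch.Tensor, attention_mask: torch.Tensor):
    """Gather non-padded tokens into a flat (total_tokens, hidden) tensor."""
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)  # tokens per sequence
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))  # cumulative lengths
    flat = hidden_states.reshape(-1, hidden_states.shape[-1])[indices]
    return flat, indices, cu_seqlens, int(seqlens.max())


def pad_input(flat: torch.Tensor, indices: torch.Tensor, batch: int, seqlen: int):
    """Scatter the flat tokens back into a (batch, seqlen, hidden) padded tensor."""
    out = torch.zeros(batch * seqlen, flat.shape[-1], dtype=flat.dtype, device=flat.device)
    out[indices] = flat
    return out.reshape(batch, seqlen, -1)
```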

Here is our roadmap:

**Roadmap**

- [x] sft
- [x] ppo
- [x] grpo

News

[2025.03.31] Added results for SFT and GRPO. Qwen2-7B-Instruct was tested on
2*8 devices, and many batch_size-related parameters had to be reduced, so
these results are for reference only. We will announce the reward results
with the default parameters as soon as sleep mode is supported.

[2025.03.03] Modified the Ray adaptation method.

[2025.02.25] The PPO algorithm is supported for training on NPU with the
FSDP backend.

[2025.02.23] The SFT algorithm is supported for training on NPU with the
FSDP backend.

[2025.02.21] The GRPO algorithm is supported for training on NPU with
the FSDP backend.

Requirements
We tested this PR on both Ascend NPU and GPU to ensure that the same code
runs on different devices. The hardware is 8 Atlas 800T A2 and 8 A100
devices. Other software information is shown in the following table.

| Software     | Version                |
|:-------------|-----------------------:|
| transformers | 4.47.1                 |
| accelerate   | 1.3.0                  |
| torch_npu    | 2.5.1.rc1              |
| CANN         | 8.1.RC1 (Not Released) |

About mean error
Due to differences in hardware architecture, we cannot guarantee that the
loss on Ascend NPU is exactly the same as the loss on GPU. In our
experience, a loss difference of less than 2% is acceptable. If the loss
difference is greater than 2%, we will try to fix it. The calculation
formula is as follows.

![loss_comparison](https://github.com/user-attachments/assets/4f62f713-9240-4324-bf7d-3ae59fc85b05)


N represents the number of training steps. For more information, please
refer to the [Calculation accuracy
description](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/LMaccuracy_0001.html).
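
In case the image above does not render, a plausible LaTeX rendering of the formula is given below. This rendering is an assumption based on the surrounding description (the per-step relative loss difference averaged over the N training steps); the image remains authoritative.

$$
\text{mean error} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|\,loss^{\mathrm{NPU}}_{i} - loss^{\mathrm{GPU}}_{i}\,\right|}{loss^{\mathrm{GPU}}_{i}}
$$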

---------

Co-authored-by: Chendong98 <chendong136@huawei.com>
Co-authored-by: zheliuyu <15750543867@163.com>

verl/requirements-npu.txt:

# requirements-npu.txt records the full set of dependencies for development on Ascend NPU
accelerate
codetiming
datasets
dill
hydra-core
numpy
pandas
peft
pyarrow>=15.0.0
pybind11
pylatexenc
ray
tensordict<=0.6.2
transformers>=4.52.0
wandb
mathruler
torchdata
einops
qwen_vl_utils