Mirror of https://github.com/volcengine/verl.git (synced 2025-10-20 13:43:50 +08:00)
For developers, you can follow the docs: docs/ascend/ascend.rst

This PR adds support for the Ascend NPU backend.

Co-authored-by: Chendong98 <chendong136@huawei.com>
Co-authored-by: zheliuyu <15750543867@163.com>
Co-authored-by: celestialli <celestialli@outlook.com>

In this PR, we add the capability to determine the type of NPU device (see the sketch after this description), and we also add a new script for training on NPU. The changed files are:

1. pyproject.toml - change the version of vllm
2. requirements-npu.txt - requirements for NPU
3. verl/bert_padding.py - adapted from https://github.com/mlcommons/training_results_v1.1/blob/main/NVIDIA/benchmarks/bert/implementations/pytorch/padding.py
4. verl/single_controller/ray/base.py
5. verl/third_party/vllm/vllm_spmd/dtensor_weight_loaders.py
6. verl/trainer/fsdp_sft_trainer.py
7. verl/utils/flops_counter.py
8. verl/utils/fsdp_utils.py
9. verl/workers/actor/dp_actor.py
10. verl/workers/critic/dp_critic.py
11. verl/workers/fsdp_workers.py
12. verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py
13. verl/workers/sharding_manager/fsdp_vllm.py
14. verl/utils/device.py - get the device type for different devices
15. docs/ascend/ascend.md

**RoadMap**

- [x] sft
- [x] ppo
- [x] grpo

**News**

[2025.03.31] Added results for SFT and GRPO. Qwen2-7B-Instruct was tested on 2*8 devices, and many batch-size-related parameters had to be reduced, so this result is for reference only. We will announce the reward results with the default parameters as soon as sleep mode is supported.

[2025.03.03] Modified the adaptation method of Ray.

[2025.02.25] The PPO algorithm is supported for training on NPU with the FSDP backend.

[2025.02.23] The SFT algorithm is supported for training on NPU with the FSDP backend.

[2025.02.21] The GRPO algorithm is supported for training on NPU with the FSDP backend.

**Requirements**

We tested this PR on both Ascend NPU and GPU to ensure the same code runs on different devices. The hardware used was 8 Atlas 800T A2 devices and 8 A100 GPUs. Other software information is shown in the following table.

| Software     | Version                |
|:-------------|-----------------------:|
| transformers | 4.47.1                 |
| accelerate   | 1.3.0                  |
| torch_npu    | 2.5.1.rc1              |
| CANN         | 8.1.RC1 (Not Released) |

**About mean error**

Due to differences in hardware structure, we cannot guarantee that the loss on Ascend NPU is exactly the same as that on GPU. In our experience, a loss difference of less than 2% is acceptable; if the difference is greater than 2%, we will try to fix it. In the calculation formula, N represents the number of training steps. For more information, please refer to [Calculation accuracy description](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/LMaccuracy_0001.html).

---------

Co-authored-by: Chendong98 <chendong136@huawei.com>
Co-authored-by: zheliuyu <15750543867@163.com>
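The device-type detection mentioned above lives in verl/utils/device.py; the snippet below is only a minimal sketch of that idea under stated assumptions (torch installed, torch_npu optionally present), and the helper names `get_device_name` / `get_torch_device` are illustrative rather than the actual verl API.

```python
# Minimal sketch of device-type detection (hypothetical helpers, not verl's actual API).
import torch

try:
    import torch_npu  # noqa: F401  # Ascend NPU extension; patches torch.npu when present
    _NPU_AVAILABLE = torch.npu.is_available()
except ImportError:
    _NPU_AVAILABLE = False


def get_device_name() -> str:
    """Return the accelerator type available on this machine: 'npu', 'cuda', or 'cpu'."""
    if _NPU_AVAILABLE:
        return "npu"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


def get_torch_device():
    """Return the matching torch device namespace (torch.npu / torch.cuda / torch.cpu)."""
    return getattr(torch, get_device_name())


if __name__ == "__main__":
    print(f"detected device: {get_device_name()}")
```

Call sites can then construct devices via `torch.device(get_device_name())` instead of hard-coding `"cuda"`, which keeps the same training code portable between GPU and NPU.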
```text
# requirements.txt records the full set of dependencies for development
accelerate
codetiming
datasets
dill
hydra-core
numpy
pandas
peft
pyarrow>=15.0.0
pybind11
pylatexenc
ray
tensordict<=0.6.2
transformers>=4.52.0
wandb
mathruler
torchdata
einops
qwen_vl_utils
```
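To quickly confirm that an installed environment satisfies the pinned constraints above (pyarrow>=15.0.0, tensordict<=0.6.2, transformers>=4.52.0), one option is a small check with the standard-library importlib.metadata; the `check_requirement` helper below is a hypothetical convenience, not part of verl.

```python
# Sketch: report installed versions for a few of the pinned dependencies.
from importlib.metadata import PackageNotFoundError, version


def check_requirement(package: str, expected: str) -> None:
    """Print the installed version of `package` next to the expected constraint."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        installed = "NOT INSTALLED"
    print(f"{package:<12} installed={installed:<12} required {expected}")


if __name__ == "__main__":
    # Constraints taken from the requirements list above.
    check_requirement("pyarrow", ">=15.0.0")
    check_requirement("tensordict", "<=0.6.2")
    check_requirement("transformers", ">=4.52.0")
```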