mirror of
https://github.com/volcengine/verl.git
synced 2025-10-20 13:43:50 +08:00
[doc]fix: optimize ascend docs (#3063)
### What does this PR do?

- Fix incorrect version pairings for several software dependencies in ascend_quick_start.rst.
- Add descriptions of actor.strategy and rollout.name to the support-status tables.
- Rename ascend_profiling_en.rst and ascend_profiling_zh.rst so that the document titles look cleaner.

<img width="402" height="103" alt="image" src="https://github.com/user-attachments/assets/8f9ece22-315e-4f80-8157-04838f7467a3" />

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
@@ -1,4 +1,4 @@
-Data collection based on FSDP (Fully Sharded Data Parallel) backend on Ascend devices(NPU)
+Data collection based on FSDP backend on Ascend devices(en)
 ==========================================================================================
 
 Last updated: 07/24/2025.
@@ -1,6 +1,8 @@
-Data collection based on FSDP backend on Ascend devices
+Data collection based on FSDP backend on Ascend devices(zh)
 ====================================
 
+Data collection based on FSDP backend on Ascend devices
+
 Last updated: 07/24/2025.
 
 This is a tutorial on data collection with the GRPO or DAPO algorithm on Ascend devices using the FSDP backend.
@@ -1,7 +1,7 @@
 verl x Ascend
 ===================================
 
-Last updated: 06/17/2025.
+Last updated: 08/15/2025.
 
 We are adding support for Huawei Ascend devices to verl.
 
@@ -28,9 +28,10 @@ Atlas 900 A2 PODc
 +-----------+-------------+
 | torch     | == 2.5.1    |
 +-----------+-------------+
-| torch_npu | == 2.5.1.RC1|
+| torch_npu | == 2.5.1    |
 +-----------+-------------+
 
+For basic environment setup, refer to this `document <https://gitee.com/ascend/pytorch>`_.
 
 vllm & vllm-ascend
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -80,14 +81,11 @@ vllm & vllm-ascend
 +--------------+---------------+
 | liger-kernel | not supported |
 +--------------+---------------+
-| tensordict   | 0.8.3 (ARM)   |
-+--------------+---------------+
 
-1. Enabling --flash_attention_2 through transformers is supported; transformers must be version 4.52.0 or later.
+1. Enabling --flash_attention_2 through transformers is supported; transformers must be exactly version 4.52.4.
 2. Enabling flash attention acceleration through flash_attn is not supported.
 3. Enabling liger-kernel is not supported.
-4. On ARM servers, tensordict 0.8.3 is required; it can be installed manually after the other dependencies are installed.
-5. On x86 servers, the CPU version of torchvision must be installed.
+4. On x86 servers, the CPU version of torchvision must be installed.
 
 .. code-block:: bash
 
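The version pins on the new side of the two dependency hunks above (torch 2.5.1, torch_npu 2.5.1, transformers 4.52.4) can be checked with a short script before launching a job. This is a minimal sketch, not part of verl: `PINS`, `installed_versions`, and `find_mismatches` are hypothetical names, and it assumes the distributions are registered under the names `torch`, `torch_npu`, and `transformers` and that an exact match (ignoring a local suffix such as `+cpu`) is what the table intends.

```python
from importlib import metadata

# Pins copied from the "+" side of the hunks above. The names in this
# script are hypothetical helpers and are not part of verl.
PINS = {"torch": "2.5.1", "torch_npu": "2.5.1", "transformers": "4.52.4"}


def installed_versions(names):
    """Return {name: version or None} for the given distribution names."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, pins=PINS):
    """List readable problems where an installed version differs from its pin."""
    problems = []
    for name, wanted in pins.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed, expected {wanted}")
            continue
        # Assumption: a local build suffix such as "+cpu" does not matter.
        if have.split("+", 1)[0] != wanted:
            problems.append(f"{name}: found {have}, expected {wanted}")
    return problems


if __name__ == "__main__":
    report = find_mismatches(installed_versions(PINS))
    for line in report or ["all pins satisfied"]:
        print(line)
```

Keeping the comparison in `find_mismatches` separate from the lookup makes it easy to test against a hand-written dictionary on a machine without an NPU.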
@@ -153,50 +151,50 @@ vllm & vllm-ascend
     trainer.total_epochs=1 \
     trainer.device=npu $@
 
-MindSpeed training backend
+(Optional) MindSpeed backend setup
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-1. Install the MindSpeed acceleration library by following the `MindSpeed Readme <https://gitee.com/ascend/MindSpeed>`_.
+1. Install the MindSpeed acceleration library by following the `MindSpeed README <https://gitee.com/ascend/MindSpeed>`_.
 
-2. Set the ``strategy`` of the Verl worker model to ``megatron``, for example ``actor_rollout_ref.actor.strategy=megatron``.
+2. Set the ``strategy`` of the verl worker model to ``megatron``, for example ``actor_rollout_ref.actor.strategy=megatron``.
 
 3. Custom MindSpeed arguments can be passed through the ``override_transformer_config`` parameter; for example, to enable the FA (flash attention) feature for the actor model, use ``+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True``.
 
-4. For more feature information, see the `MindSpeed Verl documentation <https://gitee.com/ascend/MindSpeed/blob/master/docs/user-guide/verl.md>`_.
+4. For more feature information, see the `MindSpeed+verl documentation <https://gitee.com/ascend/MindSpeed/blob/master/docs/user-guide/verl.md>`_.
 
 Support status
 -----------------------------------
 
 **Table 1** RL algorithms
 
-+-----------+-------------------------+-------------+-------------------+----------------------+
-| algorithm | model                   | rewards mae | throughput ratio  | hardware             |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-| GRPO      | Qwen2.5-7B-instruct     | 0.38%       | 0.588             | Atlas 200T A2 Box16  |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-| GRPO      | Qwen2.5-32B-instruct    | 0.30%       | 0.685             | Atlas 200T A2 Box16  |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-| GRPO      | Qwen2.5-VL-3B-instruct  | 3.14%       | 0.470             | Atlas 200T A2 Box16  |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-| GRPO      | Qwen2.5-VL-7B-instruct  | 3.30%       | 0.380             | Atlas 200T A2 Box16  |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-| GRPO      | Qwen2.5-VL-32B-instruct | 0.79%       | 0.568             | Atlas 200T A2 Box16  |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-| DAPO      | Qwen2.5-7B-instruct     | 3.83%       | pending           | Atlas 200T A2 Box16  |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-| DAPO      | Qwen3-8B-base           | 5.3%        | pending           | Atlas 200T A2 Box16  |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-| DAPO      | Qwen3-14B-base          | 5.9%        | pending           | Atlas 200T A2 Box16  |
-+-----------+-------------------------+-------------+-------------------+----------------------+
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+| algorithm | model                   | rewards mae | throughput ratio  | actor.strategy    | rollout.name      | hardware                 |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+| GRPO      | Qwen2.5-7B-instruct     | 0.38%       | 0.588             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+| GRPO      | Qwen2.5-32B-instruct    | 0.30%       | 0.685             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+| GRPO      | Qwen2.5-VL-3B-instruct  | 3.14%       | 0.470             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+| GRPO      | Qwen2.5-VL-7B-instruct  | 3.30%       | 0.380             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+| GRPO      | Qwen2.5-VL-32B-instruct | 0.79%       | 0.568             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+| DAPO      | Qwen2.5-7B-instruct     | 3.83%       | pending           | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+| DAPO      | Qwen3-8B-base           | 5.3%        | pending           | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+| DAPO      | Qwen3-14B-base          | 5.9%        | pending           | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
 
 **Table 2** SFT algorithms
 
-+-----------+-------------------------+----------------+-------------------+----------------------+
-| algorithm | model                   | loss value mae | total time ratio  | hardware             |
-+-----------+-------------------------+----------------+-------------------+----------------------+
-| SFT-PEFT  | Qwen3-8B                | 0.09%          | 0.618             | Atlas 900 A2 PODc    |
-+-----------+-------------------------+----------------+-------------------+----------------------+
-| ReTool-SFT| Qwen2.5-7B-instruct     | 0.08%          | 0.775             | Atlas 900 A2 PODc    |
-+-----------+-------------------------+----------------+-------------------+----------------------+
++-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
+| algorithm | model                   | train loss mae | total time ratio  | actor.strategy    | hardware             |
++-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
+| SFT-PEFT  | Qwen3-8B                | 0.09%          | 0.618             | FSDP              | Atlas 900 A2 PODc    |
++-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
+| ReTool-SFT| Qwen2.5-7B-instruct     | 0.08%          | 0.775             | FSDP              | Atlas 900 A2 PODc    |
++-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
 
 Accuracy comparison notes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
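Steps 2 and 3 of the MindSpeed section in the hunk above amount to two kinds of command-line overrides: one that sets the actor strategy and one per MindSpeed argument passed through ``override_transformer_config``. The sketch below assembles those strings for the actor model only. `mindspeed_actor_overrides` is a hypothetical helper, not a verl or MindSpeed API, and the only option key taken from the docs is `use_flash_attn`.

```python
def mindspeed_actor_overrides(transformer_config=None):
    """Build command-line overrides for the actor model (hypothetical helper).

    transformer_config maps MindSpeed argument names to values; only
    use_flash_attn appears in the documentation above.
    """
    prefix = "actor_rollout_ref.actor"
    args = [f"{prefix}.strategy=megatron"]
    for key, value in (transformer_config or {}).items():
        # The leading "+" follows the example in the docs; in Hydra-style
        # overrides it adds a key that the base config does not define.
        args.append(f"+{prefix}.megatron.override_transformer_config.{key}={value}")
    return args


if __name__ == "__main__":
    for arg in mindspeed_actor_overrides({"use_flash_attn": True}):
        print(arg)
```

The printed lines can be appended to the training command in the same position as the other `key=value` arguments, such as `trainer.device=npu`.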
@@ -217,10 +215,10 @@ Ascend npu and A100 each take "perf/throughput" from the first 4 steps in the logs to average...
 Plan
 -----------------------------------
 
-See the `roadmap <https://github.com/volcengine/verl/discussions/900>`_ for progress on support for more features.
+See the `roadmap <https://github.com/volcengine/verl/discussions/2171>`_ for progress on support for more features.
 
 
 
 Disclaimer
 -----------------------------------
-The Ascend support code provided in verl is for reference only. For commercial use, please get in touch through official channels. Thank you.
+The Ascend support code provided in verl is for reference only. For use in production environments, please get in touch through official channels. Thank you.
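The context line of the last hunk says that "perf/throughput" is averaged over the first 4 logged steps on both Ascend NPU and A100. A rough sketch of how a "throughput ratio" could be derived from that is shown below. The direction of the ratio (NPU mean divided by A100 mean) is an assumption, since the hunk header is truncated, and `throughput_ratio` is a hypothetical name rather than anything in verl.

```python
from statistics import mean


def throughput_ratio(npu_steps, a100_steps, n_steps=4):
    """Ratio of mean throughput over the first n_steps logged steps.

    Assumption: the ratio is NPU over A100. The source text is truncated
    and does not state the direction explicitly.
    """
    if len(npu_steps) < n_steps or len(a100_steps) < n_steps:
        raise ValueError(f"need at least {n_steps} steps from each log")
    return mean(npu_steps[:n_steps]) / mean(a100_steps[:n_steps])


if __name__ == "__main__":
    # Made-up numbers for illustration only; these are not measured values.
    npu = [50.0, 50.0, 50.0, 50.0, 999.0]
    a100 = [100.0, 100.0, 100.0, 100.0]
    print(throughput_ratio(npu, a100))
```

Only the first four entries of each list are used, so the later step in the NPU list does not affect the result.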