[doc]fix: optimize ascend docs (#3063)

### What does this PR do?

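- Fix several dependency version mismatches in ascend_quick_start.rst.
- Add descriptions of actor.strategy and rollout.name to the support status tables.
- Rename the titles of ascend_profiling_en.rst and ascend_profiling_zh.rst so that the document titles look cleaner.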
<img width="402" height="103" alt="image"
src="https://github.com/user-attachments/assets/8f9ece22-315e-4f80-8157-04838f7467a3"
/>

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
This commit is contained in:
Chunyu
2025-08-15 13:24:21 +08:00
committed by GitHub
parent bd756c15c8
commit 28f6e4af7e
3 changed files with 41 additions and 41 deletions

@ -1,4 +1,4 @@
Data collection based on FSDP (Fully Sharded Data Parallel) backend on Ascend devices(NPU)
Data collection based on FSDP backend on Ascend devices(en)
==========================================================================================
Last updated: 07/24/2025.

@ -1,6 +1,8 @@
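Data collection on Ascend devices based on the FSDP backend
Data collection based on FSDP backend on Ascend devices(zh)
====================================
Data collection on Ascend devices based on the FSDP backend
Last updated: 07/24/2025.
This tutorial covers data collection on Ascend devices with the FSDP backend, using the GRPO or DAPO algorithm.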

@ -1,7 +1,7 @@
verl x Ascend
===================================
Last updated: 06/17/2025.
Last updated: 08/15/2025.
We have added support for Huawei Ascend devices to verl.
@ -28,9 +28,10 @@ Atlas 900 A2 PODc
+-----------+-------------+
| torch     | == 2.5.1    |
+-----------+-------------+
| torch_npu | == 2.5.1.RC1|
| torch_npu | == 2.5.1    |
+-----------+-------------+
For basic environment setup, refer to this `document <https://gitee.com/ascend/pytorch>`_
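The pins in the table above can be checked against the local environment before a run. The following is a minimal sketch, not part of verl: the helper name ``check_pins`` and the ``PINNED`` mapping are illustrative assumptions, and the pinned values are copied from the updated rows of the table (``torch == 2.5.1``, ``torch_npu == 2.5.1``).

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the table above (assumption: the updated rows apply).
PINNED = {"torch": "2.5.1", "torch_npu": "2.5.1"}


def check_pins(pins, get_version=version):
    """Return {package: (expected, found)} for each mismatch.

    `found` is None when the package is not installed. `get_version`
    is injectable so the check can be exercised without the packages.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cpu" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected == {expected}, found {found}")
```

An empty result means every pinned package matches. Other pins from this page, such as the transformers version, could be added to ``PINNED`` in the same way.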
vllm & vllm-ascend
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -80,14 +81,11 @@ vllm & vllm-ascend
+--------------+---------------+
| liger-kernel | not supported |
+--------------+---------------+
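| tensordict   | 0.8.3 (ARM)   |
+--------------+---------------+
1. Enabling --flash_attention_2 through transformers is supported; transformers must be version 4.52.0 or later.
1. Enabling --flash_attention_2 through transformers is supported; transformers must be exactly version 4.52.4.
2. Enabling flash attention acceleration through flash_attn is not supported.
3. Enabling liger-kernel is not supported.
4. On ARM servers, tensordict 0.8.3 is required; it can be installed manually after the other dependencies have been installed.
5. On x86 servers, the CPU version of torchvision must be installed.
4. On x86 servers, the CPU version of torchvision must be installed.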
.. code-block:: bash
@ -153,50 +151,50 @@ vllm & vllm-ascend
trainer.total_epochs=1 \
trainer.device=npu $@
MindSpeed training backend
(Optional) Guide to setting up the MindSpeed training backend
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1. Install the MindSpeed acceleration library by following the `MindSpeed Readme <https://gitee.com/ascend/MindSpeed>`_.
1. Install the MindSpeed acceleration library by following the `MindSpeed README <https://gitee.com/ascend/MindSpeed>`_.
2. Set the Verl worker model ``strategy`` to ``megatron``, for example ``actor_rollout_ref.actor.strategy=megatron``
2. Set the verl worker model ``strategy`` to ``megatron``, for example ``actor_rollout_ref.actor.strategy=megatron``
3. Custom MindSpeed arguments can be passed through the ``override_transformer_config`` parameter; for example, to enable the FA feature for the actor model, use ``+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True``
4. For more feature information, see the `MindSpeed Verl documentation <https://gitee.com/ascend/MindSpeed/blob/master/docs/user-guide/verl.md>`_
4. For more feature information, see the `MindSpeed+verl documentation <https://gitee.com/ascend/MindSpeed/blob/master/docs/user-guide/verl.md>`_
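Steps 2 and 3 above amount to adding command-line overrides to the training launch command. As a rough sketch under stated assumptions, the helper below assembles those overrides as strings. The function name ``mindspeed_overrides`` is hypothetical and not part of verl or MindSpeed; only the two option paths come from the text above.

```python
def mindspeed_overrides(transformer_config):
    """Build the override strings described in steps 2 and 3.

    `transformer_config` maps MindSpeed argument names to values,
    e.g. {"use_flash_attn": True}. Hypothetical helper for illustration.
    """
    # Step 2: select the megatron strategy for the actor worker.
    overrides = ["actor_rollout_ref.actor.strategy=megatron"]
    # Step 3: pass custom arguments via override_transformer_config.
    prefix = "+actor_rollout_ref.actor.megatron.override_transformer_config"
    for key, value in transformer_config.items():
        overrides.append(f"{prefix}.{key}={value}")
    return overrides


# Print the overrides one per line, ready to append to a launch command.
print(" \\\n    ".join(mindspeed_overrides({"use_flash_attn": True})))
```

Keeping the overrides in a list makes it straightforward to append them to an existing launch script without editing its other arguments.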
Support status
-----------------------------------
**Table 1** RL algorithms
+-----------+-------------------------+-------------+-------------------+----------------------+
| algorithm | model                   | rewards mae | throughput ratio  | hardware             |
+-----------+-------------------------+-------------+-------------------+----------------------+
| GRPO      | Qwen2.5-7B-instruct     | 0.38%       | 0.588             | Atlas 200T A2 Box16  |
+-----------+-------------------------+-------------+-------------------+----------------------+
| GRPO      | Qwen2.5-32B-instruct    | 0.30%       | 0.685             | Atlas 200T A2 Box16  |
+-----------+-------------------------+-------------+-------------------+----------------------+
| GRPO      | Qwen2.5-VL-3B-instruct  | 3.14%       | 0.470             | Atlas 200T A2 Box16  |
+-----------+-------------------------+-------------+-------------------+----------------------+
| GRPO      | Qwen2.5-VL-7B-instruct  | 3.30%       | 0.380             | Atlas 200T A2 Box16  |
+-----------+-------------------------+-------------+-------------------+----------------------+
| GRPO      | Qwen2.5-VL-32B-instruct | 0.79%       | 0.568             | Atlas 200T A2 Box16  |
+-----------+-------------------------+-------------+-------------------+----------------------+
| DAPO      | Qwen2.5-7B-instruct     | 3.83%       | pending           | Atlas 200T A2 Box16  |
+-----------+-------------------------+-------------+-------------------+----------------------+
| DAPO      | Qwen3-8B-base           | 5.3%        | pending           | Atlas 200T A2 Box16  |
+-----------+-------------------------+-------------+-------------------+----------------------+
| DAPO      | Qwen3-14B-base          | 5.9%        | pending           | Atlas 200T A2 Box16  |
+-----------+-------------------------+-------------+-------------------+----------------------+
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
| algorithm | model                   | rewards mae | throughput ratio  | actor.strategy    | rollout.name      | hardware                 |
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
| GRPO      | Qwen2.5-7B-instruct     | 0.38%       | 0.588             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
| GRPO      | Qwen2.5-32B-instruct    | 0.30%       | 0.685             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
| GRPO      | Qwen2.5-VL-3B-instruct  | 3.14%       | 0.470             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
| GRPO      | Qwen2.5-VL-7B-instruct  | 3.30%       | 0.380             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
| GRPO      | Qwen2.5-VL-32B-instruct | 0.79%       | 0.568             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
| DAPO      | Qwen2.5-7B-instruct     | 3.83%       | pending           | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
| DAPO      | Qwen3-8B-base           | 5.3%        | pending           | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
| DAPO      | Qwen3-14B-base          | 5.9%        | pending           | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
+-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
**Table 2** SFT algorithms
+-----------+-------------------------+----------------+-------------------+----------------------+
| algorithm | model                   | loss value mae | total time ratio  | hardware             |
+-----------+-------------------------+----------------+-------------------+----------------------+
| SFT-PEFT  | Qwen3-8B                | 0.09%          | 0.618             | Atlas 900 A2 PODc    |
+-----------+-------------------------+----------------+-------------------+----------------------+
| ReTool-SFT| Qwen2.5-7B-instruct     | 0.08%          | 0.775             | Atlas 900 A2 PODc    |
+-----------+-------------------------+----------------+-------------------+----------------------+
+-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
| algorithm | model                   | train loss mae | total time ratio  | actor.strategy    | hardware             |
+-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
| SFT-PEFT  | Qwen3-8B                | 0.09%          | 0.618             | FSDP              | Atlas 900 A2 PODc    |
+-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
| ReTool-SFT| Qwen2.5-7B-instruct     | 0.08%          | 0.775             | FSDP              | Atlas 900 A2 PODc    |
+-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
Notes on the accuracy comparison
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -217,10 +215,10 @@ Ascend npu and A100 each take the "perf/throughput" of the first 4 steps in the log to ...
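Roadmap
-----------------------------------
See the `roadmap <https://github.com/volcengine/verl/discussions/900>`_ for the support progress of more features.
See the `roadmap <https://github.com/volcengine/verl/discussions/2171>`_ for the support progress of more features.
Disclaimer
-----------------------------------
The Ascend support code provided in verl is a reference sample only. For commercial use, please reach out through official channels. Thank you.
The Ascend support code provided in verl is a reference sample only. For use in production environments, please reach out through official channels. Thank you.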