[doc]fix: optimize ascend docs (#3063)

### What does this PR do?

- Fix mismatched dependency versions in ascend_quick_start.rst.
- Add actor.strategy and rollout.name columns to the support status tables.
- Rename ascend_profiling_en.rst and ascend_profiling_zh.rst so that the document titles look cleaner.
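Since the first change in this PR concerns dependency version pins, a quick local check may help when following the updated quick start. The sketch below compares installed versions against the pins shown in the diff (torch 2.5.1, torch_npu 2.5.1, transformers 4.52.4). The helper names `installed_version` and `find_mismatches` are hypothetical and not part of verl. It also assumes the distribution names match the names used in the quick-start table, which may differ on your system.

```python
from importlib import metadata

# Pins copied from the quick-start diff in this commit.
PINS = {"torch": "2.5.1", "torch_npu": "2.5.1", "transformers": "4.52.4"}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package that deviates from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        # Ignore a local build tag such as "+cpu" when comparing (an assumption
        # about how local builds are labelled).
        base = found.split("+", 1)[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Running the script prints one line per package that is missing or deviates from its pin, and prints nothing when all three pins are satisfied.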
2025-10-20 13:43:50 +08:00 · 2025-08-15 13:24:21 +08:00
parent bd756c15c8
commit 28f6e4af7e
3 changed files with 41 additions and 41 deletions
--- a/docs/ascend_tutorial/ascend_profiling_en.rst
+++ b/docs/ascend_tutorial/ascend_profiling_en.rst
@@ -1,4 +1,4 @@
-Data collection based on FSDP (Fully Sharded Data Parallel) backend on Ascend devices(NPU)
+Data collection based on FSDP backend on Ascend devices(en)
 ==========================================================================================

 Last updated: 07/24/2025.
--- a/docs/ascend_tutorial/ascend_profiling_zh.rst
+++ b/docs/ascend_tutorial/ascend_profiling_zh.rst
@@ -1,6 +1,8 @@
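-Data collection on Ascend devices with the FSDP backend
+Data collection based on FSDP backend on Ascend devices(zh)
 ====================================
 
+Data collection on Ascend devices with the FSDP backend
+
 Last updated: 07/24/2025.
 
 This tutorial covers data collection on Ascend devices with the FSDP backend, using the GRPO or DAPO algorithm.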
--- a/docs/ascend_tutorial/ascend_quick_start.rst
+++ b/docs/ascend_tutorial/ascend_quick_start.rst
@@ -1,7 +1,7 @@
 verl x Ascend
 ===================================

-Last updated: 06/17/2025.
+Last updated: 08/15/2025.

 We add support for Huawei Ascend devices to verl.

@@ -28,9 +28,10 @@ Atlas 900 A2 PODc
 +-----------+-------------+
 | torch     | == 2.5.1    |
 +-----------+-------------+
-| torch_npu | == 2.5.1.RC1|
+| torch_npu | == 2.5.1    |
 +-----------+-------------+

+For basic environment setup, refer to this `document <https://gitee.com/ascend/pytorch>`_.

 vllm & vllm-ascend
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -80,14 +81,11 @@ vllm & vllm-ascend
 +--------------+---------------+
 | liger-kernel | not supported |
 +--------------+---------------+
-| tensordict   | 0.8.3 (ARM)   |
-+--------------+---------------+

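-1. --flash_attention_2 can be enabled through transformers; transformers must be version 4.52.0 or later.
+1. --flash_attention_2 can be enabled through transformers; transformers must be exactly version 4.52.4.
 2. Flash attention acceleration cannot be enabled through flash_attn.
 3. liger-kernel is not supported.
-4. On ARM servers, tensordict 0.8.3 is required; it can be installed manually after the other dependencies are installed.
-5. On x86 servers, the CPU version of torchvision must be installed.
+4. On x86 servers, the CPU version of torchvision must be installed.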

 .. code-block:: bash

@@ -153,50 +151,50 @@ vllm & vllm-ascend
        trainer.total_epochs=1 \
        trainer.device=npu $@

-MindSpeed training backend
+(Optional) MindSpeed backend setup
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-1. Install the MindSpeed acceleration library by following the `MindSpeed Readme <https://gitee.com/ascend/MindSpeed>`_.
+1. Install the MindSpeed acceleration library by following the `MindSpeed README <https://gitee.com/ascend/MindSpeed>`_.
 
-2. Set the ``strategy`` of the Verl worker model to ``megatron``, for example ``actor_rollout_ref.actor.strategy=megatron``.
+2. Set the ``strategy`` of the verl worker model to ``megatron``, for example ``actor_rollout_ref.actor.strategy=megatron``.
 
 3. Custom MindSpeed arguments can be passed through the ``override_transformer_config`` parameter; for example, to enable the FA feature for the actor model, use ``+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True``.
 
-4. For more feature information, see the `MindSpeed Verl documentation <https://gitee.com/ascend/MindSpeed/blob/master/docs/user-guide/verl.md>`_.
+4. For more feature information, see the `MindSpeed+verl documentation <https://gitee.com/ascend/MindSpeed/blob/master/docs/user-guide/verl.md>`_.

 Support status
 -----------------------------------
 
 **Table 1** RL algorithms

-+-----------+-------------------------+-------------+-------------------+----------------------+
-| algorithm |         model           | rewards mae |  throughput ratio |        hardware      |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-|   GRPO    | Qwen2.5-7B-instruct     |    0.38%    |        0.588      |  Atlas 200T A2 Box16 |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-|   GRPO    | Qwen2.5-32B-instruct    |    0.30%    |        0.685      |  Atlas 200T A2 Box16 |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-|   GRPO    | Qwen2.5-VL-3B-instruct  |    3.14%    |        0.470      |  Atlas 200T A2 Box16 |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-|   GRPO    | Qwen2.5-VL-7B-instruct  |    3.30%    |        0.380      |  Atlas 200T A2 Box16 |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-|   GRPO    | Qwen2.5-VL-32B-instruct |    0.79%    |        0.568      |  Atlas 200T A2 Box16 |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-|   DAPO    | Qwen2.5-7B-instruct     |    3.83%    |        pending    |  Atlas 200T A2 Box16 |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-|   DAPO    | Qwen3-8B-base           |    5.3%     |        pending    |  Atlas 200T A2 Box16 |
-+-----------+-------------------------+-------------+-------------------+----------------------+
-|   DAPO    | Qwen3-14B-base          |    5.9%     |        pending    |  Atlas 200T A2 Box16 |
-+-----------+-------------------------+-------------+-------------------+----------------------+
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+| algorithm |         model           | rewards mae |  throughput ratio |   actor.strategy  |   rollout.name    |         hardware         |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+|   GRPO    | Qwen2.5-7B-instruct     |    0.38%    |        0.588      |        FSDP       |    vllm-ascend    |    Atlas 200T A2 Box16   |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+|   GRPO    | Qwen2.5-32B-instruct    |    0.30%    |        0.685      |        FSDP       |    vllm-ascend    |    Atlas 200T A2 Box16   |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+|   GRPO    | Qwen2.5-VL-3B-instruct  |    3.14%    |        0.470      |        FSDP       |    vllm-ascend    |    Atlas 200T A2 Box16   |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+|   GRPO    | Qwen2.5-VL-7B-instruct  |    3.30%    |        0.380      |        FSDP       |    vllm-ascend    |    Atlas 200T A2 Box16   |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+|   GRPO    | Qwen2.5-VL-32B-instruct |    0.79%    |        0.568      |        FSDP       |    vllm-ascend    |    Atlas 200T A2 Box16   |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+|   DAPO    | Qwen2.5-7B-instruct     |    3.83%    |        pending    |        FSDP       |    vllm-ascend    |    Atlas 200T A2 Box16   |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+|   DAPO    | Qwen3-8B-base           |    5.3%     |        pending    |        FSDP       |    vllm-ascend    |    Atlas 200T A2 Box16   |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+|   DAPO    | Qwen3-14B-base          |    5.9%     |        pending    |        FSDP       |    vllm-ascend    |    Atlas 200T A2 Box16   |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+

 **Table 2** SFT algorithms

-+-----------+-------------------------+----------------+-------------------+----------------------+
-| algorithm |         model           | loss value mae |  total time ratio |        hardware      |
-+-----------+-------------------------+----------------+-------------------+----------------------+
-|  SFT-PEFT | Qwen3-8B                |      0.09%     |       0.618       |  Atlas 900 A2 PODc   |
-+-----------+-------------------------+----------------+-------------------+----------------------+
-| ReTool-SFT| Qwen2.5-7B-instruct     |      0.08%     |       0.775       |  Atlas 900 A2 PODc   |
-+-----------+-------------------------+----------------+-------------------+----------------------+
++-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
+| algorithm |         model           | train loss mae |  total time ratio |   actor.strategy  |        hardware      |
++-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
+|  SFT-PEFT | Qwen3-8B                |      0.09%     |       0.618       |        FSDP       |   Atlas 900 A2 PODc  |
++-----------+-------------------------+----------------+-------------------+-------------------+----------------------+
+| ReTool-SFT| Qwen2.5-7B-instruct     |      0.08%     |       0.775       |        FSDP       |   Atlas 900 A2 PODc  |
++-----------+-------------------------+----------------+-------------------+-------------------+----------------------+

 Accuracy comparison notes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -217,10 +215,10 @@ Ascend npu 和 A100 分别取日志中前4个 step 的 "perf/throughput" 做平
 Plan
 -----------------------------------
 
-See the `roadmap <https://github.com/volcengine/verl/discussions/900>`_ for the support progress of more features.
+See the `roadmap <https://github.com/volcengine/verl/discussions/2171>`_ for the support progress of more features.



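 Disclaimer
 -----------------------------------
-The Ascend support code provided in verl is for reference only. For commercial use, please get in touch through official channels. Thank you.
+The Ascend support code provided in verl is for reference only. For use in production environments, please get in touch through official channels. Thank you.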
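
The MindSpeed hunk in ascend_quick_start.rst describes two kinds of command-line settings: the `actor_rollout_ref.actor.strategy=megatron` switch and extra arguments passed under `override_transformer_config` with a leading `+`. As a rough illustration, the two can be assembled like this. The helper `mindspeed_overrides` is hypothetical and not part of verl or MindSpeed. Only the two key paths quoted in the diff are taken from the source, and any key other than `use_flash_attn` is an assumption you would need to verify against the MindSpeed documentation.

```python
def mindspeed_overrides(transformer_config=None):
    """Build command-line overrides that select the megatron strategy for the actor.

    transformer_config maps MindSpeed argument names to values; each entry is
    emitted under override_transformer_config with the leading "+" used in the
    quick start.
    """
    prefix = "actor_rollout_ref.actor"
    args = [f"{prefix}.strategy=megatron"]
    for key, value in (transformer_config or {}).items():
        args.append(f"+{prefix}.megatron.override_transformer_config.{key}={value}")
    return args


if __name__ == "__main__":
    # Print the overrides one per line so they can be pasted into a launch script.
    for arg in mindspeed_overrides({"use_flash_attn": True}):
        print(arg)
```

The printed lines can be appended to the training command shown earlier in the quick start; how they interact with the rest of that command is not covered by this diff.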