[sglang] feat: add preparation for sglang+verl (#3506)

### What does this PR do?
Add Ascend NPU support for running verl with sglang. Example launch command:

```bash
bash examples/grpo_trainer/run_qwen3_8b_grpo_sglang_1k_npu.sh
```


### Accuracy test
8b:
<img width="747" height="842" alt="8b"
src="https://github.com/user-attachments/assets/f36ef25a-b32f-4c76-97d0-2e5fe53ff183"
/>

30b:
<img width="759" height="850" alt="30b"
src="https://github.com/user-attachments/assets/97979002-7ebf-47fa-ae57-3e9b6637f12c"
/>

### Test


### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Signed-off-by: lbk-sys <hello_lbk@163.com>
Co-authored-by: 1StepForever <wangww1Step@foxmail.com>
This commit is contained in:
lbk-sys
2025-09-29 10:21:01 +08:00
committed by GitHub
parent aa19c1afc4
commit f50e5c2e8f
8 changed files with 395 additions and 32 deletions


@@ -0,0 +1,113 @@
verl x Ascend
===================================
Last updated: 09/25/2025.
We have added support for Huawei Ascend devices to verl.

Hardware support
-----------------------------------
Atlas 200T A2 Box16
Atlas 900 A2 PODc
Atlas 800T A3
Installation
-----------------------------------
Base environment setup
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+-----------+-------------+
| software  | version     |
+-----------+-------------+
| Python    | == 3.11     |
+-----------+-------------+
| CANN      | == 8.3.RC1  |
+-----------+-------------+
| HDK       | == 25.3.RC1 |
+-----------+-------------+
| torch     | == 2.6.0    |
+-----------+-------------+
| torch_npu | == 2.6.0    |
+-----------+-------------+
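
**The sglang NPU backend in verl currently supports only the HDK, CANN, and PTA (torch_npu) versions listed above. Commercially released versions are expected in October 2025.**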
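
Because only the pinned versions are supported, it can help to compare the installed Python packages against the table before going further. The following is a minimal sketch, not part of verl: `find_mismatches` is a hypothetical helper, the pip distribution names `torch` and `torch_npu` are assumed to match the table, and the comparison is an exact match on the public version. CANN and HDK are installed outside pip, so this sketch does not cover them.

```python
from importlib import metadata

# Pins copied from the table above; the distribution names are assumptions.
REQUIRED = {
    "torch": "2.6.0",
    "torch_npu": "2.6.0",
}


def find_mismatches(required, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in required.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Drop a local build tag such as "+cpu" before comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(REQUIRED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package matches its pin. If your build reports a post-release suffix, the exact comparison will flag it and may need to be relaxed.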

To use sglang in verl, install sglang, torch_memory_saver, and verl with the commands below.

sglang
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
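
.. code-block:: bash

    # Clone sglang and install it from source with the NPU extra (srt_npu)
    git clone https://github.com/sgl-project/sglang.git
    cd sglang
    # Use pyproject_other.toml as the build config, keeping a backup of the original
    mv python/pyproject.toml python/pyproject.toml.backup
    mv python/pyproject_other.toml python/pyproject.toml
    pip install -e "python[srt_npu]"
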
Install torch_memory_saver
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    # torch_memory_saver is built from the sgl-kernel-npu repository
    git clone https://github.com/sgl-project/sgl-kernel-npu.git
    cd sgl-kernel-npu
    bash build.sh -a memory-saver
    # Install the wheel produced by the build
    pip install output/torch_memory_saver*.whl

Install verl
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    git clone https://github.com/volcengine/verl.git
    cd verl
    # Install verl in editable mode without dependencies, then the NPU requirements
    pip install --no-deps -e .
    pip install -r requirements-npu.txt

Notes on other third-party libraries
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------------+---------------+
| software      | version       |
+---------------+---------------+
| transformers  | v4.56.1       |
+---------------+---------------+
| triton_ascend | v3.2.0        |
+---------------+---------------+

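1. sglang depends on transformers v4.56.1.
2. sglang depends on triton_ascend v3.2.0.
3. Multimodal models are not yet supported; uninstall the related packages torchvision and timm.

.. code-block:: bash

    # Remove packages that are unsupported or conflict with the NPU build
    pip uninstall torchvision
    pip uninstall timm
    pip uninstall triton
    # Install the versions sglang depends on (triton-ascend comes from the test PyPI index)
    pip install transformers==4.56.1
    pip install -i https://test.pypi.org/simple/ triton-ascend==3.2.0.dev20250925
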
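
After running the commands above, you may want to confirm that the removed packages are really gone. This is a small sketch under stated assumptions, not a verl utility: `still_installed` is a hypothetical helper, and it assumes the distribution names are exactly `torchvision`, `timm`, and `triton` (the separately named `triton-ascend` distribution is not matched by `triton`).

```python
from importlib import metadata

# Packages the notes above say to uninstall; distribution names are assumed.
CONFLICTING = ("torchvision", "timm", "triton")


def still_installed(names, get_version=metadata.version):
    """Return {package: version} for each name that is still installed."""
    present = {}
    for name in names:
        try:
            present[name] = get_version(name)
        except metadata.PackageNotFoundError:
            # Not installed, which is the desired state here.
            continue
    return present


if __name__ == "__main__":
    leftovers = still_installed(CONFLICTING)
    for name, version in leftovers.items():
        print(f"{name} {version} is still installed; run: pip uninstall {name}")
```

An empty dictionary means none of the listed packages remain in the current environment.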
Quick start
-----------------------------------
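Before production use, we recommend running a trial Qwen3-8B GRPO training to verify that the environment is prepared and installed correctly.

1. Download the dataset and preprocess it into parquet format so that it contains the fields required to compute RL rewards.

.. code-block:: bash

    python3 examples/data_preprocess/gsm8k.py --local_save_dir ~/data/gsm8k
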
2. Run the training from the root of the verl repository.

.. code-block:: bash

    bash examples/grpo_trainer/run_qwen3_8b_grpo_sglang_1k_npu.sh


@@ -133,6 +133,7 @@ verl is fast with:
ascend_tutorial/ascend_quick_start.rst
ascend_tutorial/ascend_profiling_zh.rst
ascend_tutorial/ascend_profiling_en.rst
ascend_tutorial/ascend_sglang_quick_start.rst
.. toctree::
:maxdepth: 1