Compare commits

...

1147 Commits

Author SHA1 Message Date
061535208c [recipe] feat: Add example for gpt-oss training using agent loop (#3774)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
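
The title convention above can be checked mechanically. The following is a minimal sketch, not the actual CI check: the function name `is_valid_title` and the regular expression are assumptions made for illustration, and only the module and type lists come from the checklist itself.

```python
import re

# Module and type names copied from the checklist above.
MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Hypothetical pattern: optional [BREAKING], then [modules] type: description
TITLE_RE = re.compile(r"^(\[BREAKING\])?\[([^\]]+)\] (\w+): (\S.*)$")


def is_valid_title(title: str) -> bool:
    """Return True if the title follows `[{modules}] {type}: {description}`."""
    match = TITLE_RE.match(title)
    if match is None:
        return False
    modules = [name.strip() for name in match.group(2).split(",")]
    return all(name in MODULES for name in modules) and match.group(3) in TYPES


print(is_valid_title("[BREAKING][fsdp, megatron] feat: dynamic batching"))
print(is_valid_title("Add Meta-Bandit-LLM to the project list"))
```

The first call matches the example from the checklist; the second has no module prefix and is rejected.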

### Test
TODO: run training test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Hejian Sang <hsang@linkedin.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-15 16:45:11 +08:00
55f651c94d [misc] feat: bump version to 0.7.0.dev (#3772)
2025-10-15 13:40:12 +08:00
22d082f9a4 [recipe] feat: add open math reasoning (#3767)
### What does this PR do?

- Add an open math reasoning recipe that uses the SFT trainer with the model engine
- Support setting the validation dataset to none in the SFT trainer
- Fix main_eval
- Use aiohttp in main_generation_server to avoid a hang in AsyncOpenAI


---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-15 12:11:41 +08:00
8ec9bf64a1 [ci] fix: fix test_engine ci (#3771)
### What does this PR do?

- Fix the test_engine CI for the latest transformers

2025-10-15 12:11:17 +08:00
231d725f69 Revert "[trainer] feat: set interleave to False in dapo trainer" (#3770)
Reverts volcengine/verl#3760
2025-10-15 11:41:33 +08:00
d69164e1cb [misc] feat: bump version to 0.6.0.dev (#3768)
### What does this PR do?

- Bump version

2025-10-15 10:47:13 +08:00
2181d5b33a [recipe] fix: update readme for gmpo-trainer (#3764)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: 刘悦 <liuyue127@xiaohongshu.com>
2025-10-15 10:24:24 +08:00
33eb86f54f [megatron] feat: support qwen3vl (#3763)
### What does this PR do?

Support training Qwen3-VL with Megatron.

1. Add an image with vLLM 0.11 and NeMo's dedicated Megatron that supports
gpt-oss with optimized fused kernels.
2. Add a script for training qwen3vl-30b with Megatron.
3. Make the changes needed to support Qwen3-VL in Megatron (only the forward
functions are registered; the modeling goes through mbridge).


### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
<img width="372" height="314" alt="image"
src="https://github.com/user-attachments/assets/f1126e46-51a9-4e00-958f-5d034b8f94bd"
/>

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-15 10:19:22 +08:00
67f9a21b8e [trainer] feat: set interleave to False in dapo trainer (#3760)
### What does this PR do?

Set `interleave` to `False` in the DAPO trainer. When `rollout.n` is set
to a large value, this prevents multiple identical samples from being
run on the same inference instance, which would otherwise lead to
excessive inference overhead.
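
The effect of the flag can be illustrated with a small sketch. This is not verl's implementation: `repeat_batch` and `split_contiguous` are hypothetical helpers, and the sketch assumes the repeated batch is split into contiguous chunks, one per inference instance.

```python
def repeat_batch(prompts, n, interleave):
    """Repeat each prompt n times, either interleaved or tiled."""
    if interleave:
        # [a, a, ..., b, b, ...]: copies of one prompt sit next to each other
        return [p for p in prompts for _ in range(n)]
    # [a, b, ..., a, b, ...]: the whole batch is repeated n times
    return [p for _ in range(n) for p in prompts]


def split_contiguous(batch, num_instances):
    """Assumed dispatch rule: give each instance one contiguous chunk."""
    size = len(batch) // num_instances
    return [batch[i * size:(i + 1) * size] for i in range(num_instances)]


prompts = ["a", "b"]
interleaved = split_contiguous(repeat_batch(prompts, 4, interleave=True), 2)
tiled = split_contiguous(repeat_batch(prompts, 4, interleave=False), 2)

print(interleaved)  # [['a', 'a', 'a', 'a'], ['b', 'b', 'b', 'b']]
print(tiled)        # [['a', 'b', 'a', 'b'], ['a', 'b', 'a', 'b']]
```

Under this assumed dispatch rule, the interleaved layout puts all four copies of a prompt on one instance, while the tiled layout spreads them evenly.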

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-14 21:13:57 +08:00
d2c51dc186 Add Meta-Bandit-LLM, a long-horizon multiturn interactive awesome use case of verl (#3756)
[Meta-Bandit-LLM](https://github.com/sanxing-chen/meta-bandit-llm/)
uses verl to train an on-policy LLM agent with up to 50-turn
interactions, with support for async vLLM and LoRA.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 12:01:13 +08:00
16c2a21064 Add ARES and Revisual-R1 two awesome multimodal reasoning work using verl. (#3755)
…verl to project list

2025-10-14 10:51:32 +08:00
3abcc09d44 [sglang, recipe] feat: add SGLang as rollout engine for one-step-off-policy (#3531)
### What does this PR do?

This PR extends the one-step-off-policy recipe by adding SGLang as an
alternative rollout engine to vLLM, allowing flexible backend selection
and improving training efficiency.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pull/3460
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

To validate this solution, we adopted the existing experimental
configuration from the recipe one-step-off-policy.

The evaluation demonstrates that the proposed SGLang rollout engine
integration achieves effective acceleration in one-step-off-policy
asynchronous training, providing users with enhanced rollout engine
options for diverse deployment scenarios.

**Experimental Results**

- **Machine Configuration**: 2 nodes with 16 H20 GPUs each
    - Generation: 4 GPUs
    - Training: 12 GPUs
- **Model**: Qwen2.5-Math-7B
- **Max Response Length**: 8,192 tokens
- **Algorithm**: DAPO
- **Rollout Engine**: vLLM, SGLang

| training mode | engine | step | gen | wait_prev_gen | generate_sequences | old_log_prob | update_actor | total time | acc/best@32/mean | acc/maj@32/mean |
|---|---|---|---|---|---|---|---|---|---|---|
| colocate sync | SGLang+FSDP2 | 452 | 131 | - | 125 | 54 | 199 | 12h25m | 0.6560 | 0.4471 |
| one-step-overlap async | SGLang+FSDP2 | 406 | - | 12 | 305 | 58 | 245 | 11h12m (+11%) | 0.6303 | 0.4443 |

* colocate sync: step ≈ gen + old_log_prob + update_actor
* one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences,
old_log_prob + update_actor)
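
As a worked example, the two approximations can be evaluated with the per-phase numbers reported in the table. This is only an arithmetic sketch: the function names are hypothetical, and the timings are taken in whatever unit the table uses.

```python
def colocate_sync_step(gen, old_log_prob, update_actor):
    """Phases run back to back, so their times add up."""
    return gen + old_log_prob + update_actor


def one_step_overlap_step(wait_prev_gen, generate_sequences,
                          old_log_prob, update_actor):
    """Generation and training overlap, so the slower side dominates."""
    return max(wait_prev_gen + generate_sequences,
               old_log_prob + update_actor)


# Per-phase values from the SGLang+FSDP2 rows of the table.
sync_estimate = colocate_sync_step(gen=131, old_log_prob=54, update_actor=199)
async_estimate = one_step_overlap_step(wait_prev_gen=12, generate_sequences=305,
                                       old_log_prob=58, update_actor=245)


def to_minutes(hours, minutes):
    return 60 * hours + minutes


# Relative speedup in total time: 12h25m (sync) versus 11h12m (async).
speedup = to_minutes(12, 25) / to_minutes(11, 12) - 1

print(sync_estimate, async_estimate, f"{speedup:.1%}")  # 384 317 10.9%
```

Both estimates come out below the measured `step` column (452 and 406), which suggests the measured step includes work that the approximations leave out. The total-time ratio agrees with the reported +11%.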

<img width="1218" height="777" alt="image"
src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7"
/>

### API and Usage Example

**Configuration Example**
```bash
# Using SGLang engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=sglang \
    # ... other configuration parameters

# Using vLLM engine
python3 -m recipe.one_step_off_policy.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    # ... other configuration parameters
```

**Script Usage**
```bash
# Using SGLang engine
bash dapo_7b_math_fsdp2_sglang_4_12.sh
bash dapo_7b_math_fsdp2_sglang_colocate.sh

# Using vLLM engine
bash dapo_7b_math_fsdp2_4_12.sh
bash dapo_7b_math_fsdp2_colocate.sh
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: wuxibin <wuxibin@bytedance.com>
2025-10-14 10:48:29 +08:00
5d378b5f95 [rollout] refactor: rename "clip" mode back to "mask" mode (#3750)
# Rollout Importance Sampling Framework

related to https://github.com/volcengine/verl/pull/3694

## Summary

This PR introduces a comprehensive **Rollout Importance Sampling (IS)**
framework to correct distribution mismatch between data-collecting
(rollout) and training policies, a critical factor for ensuring stable
and efficient model training in RL fine-tuning.

This work is motivated by the analysis in our blog post, [When Speed
Kills Stability: Demystifying RL Collapse from the Inference-Training
Mismatch](https://yingru.notion.site/When-Speed-Kills-Stability-271211a558b7808d8b12d403fd15edda).
If you find this implementation useful in your research, please consider
citing:

```bibtex
@misc{liu-li-2025,
  title = {When Speed Kills Stability: Demystifying RL Collapse from the Inference-Training Mismatch},
  url = {https://yingru.notion.site/When-Speed-Kills-Stability-Demystifying-RL-Collapse-from-the-Inference-Training-Mismatch-271211a558b7808d8b12d403fd15edda},
  author = {Jiacai Liu and Yingru Li and Yuqian Fu and Jiawei Wang and Qian Liu and Yu Shen},
  year = {2025},
  month = {September},
}
```

---

## Problem Statement

When using different policies for rollout generation (e.g., vLLM with
BFloat16) and training (e.g., FSDP with FP32), distribution mismatch
occurs, leading to:
- Biased gradient estimates
- Training instability and collapse
- Reduced sample efficiency
- Poor convergence properties

This framework addresses these issues through principled importance
sampling correction.

---

## Key Features & Improvements

### 1. **Flexible Aggregation Levels**
Three methods for calculating IS weights:
- **`token`**: Per-token importance ratios
- **`sequence`**: Product of per-token ratios
- **`geometric`**: Geometric mean of ratios
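
A minimal pure-Python sketch of the three aggregation levels, assuming the importance ratio is `π_training / π_rollout` computed from per-token log-probabilities. The function name `is_weights` is hypothetical, and broadcasting the sequence-level weight back to every token is an assumption; the actual code in `mismatch_helper.py` may differ.

```python
import math


def is_weights(train_logp, rollout_logp, level="token"):
    """Return one IS weight per token for a single sequence.

    train_logp / rollout_logp: per-token log-probs of the sampled tokens
    under the training and rollout policies (assumed inputs).
    """
    log_ratios = [t - r for t, r in zip(train_logp, rollout_logp)]
    if level == "token":
        # Each token keeps its own ratio.
        return [math.exp(lr) for lr in log_ratios]
    if level == "sequence":
        # Product of per-token ratios == exp(sum of log ratios).
        w = math.exp(sum(log_ratios))
    elif level == "geometric":
        # Geometric mean of ratios == exp(mean of log ratios).
        w = math.exp(sum(log_ratios) / len(log_ratios))
    else:
        raise ValueError(f"unknown level: {level}")
    # Assumption: the sequence-level weight is shared by all tokens.
    return [w] * len(log_ratios)


train = [-1.0, -2.0]
rollout = [-1.0, -2.5]
for level in ("token", "sequence", "geometric"):
    print(level, is_weights(train, rollout, level))
```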

### 2. **Advanced Bounding Modes**
Two strategies to control weight variance:
- **`truncate`** (TIS): Caps weights at upper threshold only, preserving
gradients
- **`mask`** (MIS): Zeros out weights outside bounds, more aggressive
filtering
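
The difference between the two modes can be sketched as follows. `bound_weights` is a hypothetical helper, not the function in this PR; the `1 / upper` default for the lower bound follows the configuration comment for `rollout_is_threshold_lower`, and keeping in-bounds weights unchanged in `mask` mode is an assumption.

```python
def bound_weights(weights, upper, lower=None, mode="truncate"):
    """Bound IS weights with either truncation or masking."""
    if lower is None:
        lower = 1.0 / upper  # documented default when the lower threshold is null
    if mode == "truncate":
        # Cap at the upper threshold only; small weights pass through.
        return [min(w, upper) for w in weights]
    if mode == "mask":
        # Zero out anything outside [lower, upper].
        return [w if lower <= w <= upper else 0.0 for w in weights]
    raise ValueError(f"unknown mode: {mode}")


weights = [0.1, 1.0, 3.0]
print(bound_weights(weights, upper=2.0, mode="truncate"))  # [0.1, 1.0, 2.0]
print(bound_weights(weights, upper=2.0, mode="mask"))      # [0.0, 1.0, 0.0]
```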

### 3. **Comprehensive Diagnostics**
Detailed metrics to monitor distribution mismatch and training health:

**Rollout IS Metrics** (automatically prefixed with `mismatch/`):
- Health indicators: `rollout_is_eff_sample_size`, `rollout_is_mean`
- Distribution statistics: `rollout_is_p25`, `rollout_is_p50`,
`rollout_is_p75`, `rollout_is_p95`, `rollout_is_p99`, `rollout_is_max`,
`rollout_is_min`, `rollout_is_std`
- Diagnostics: `rollout_is_veto_fraction`,
`rollout_is_catastrophic_token_fraction`, `rollout_is_masked_fraction`
(mask mode)
- Sequence-level statistics (for sequence/geometric modes):
`rollout_is_seq_mean`, `rollout_is_seq_std`, `rollout_is_seq_max`,
`rollout_is_seq_min`, etc.

**Mismatch Metrics** (computed efficiently within IS weight
computation):
- KL Divergence: `mismatch_kl` (forward KL), `mismatch_k3_kl` (K3
estimator for stability)
- Perplexity: `mismatch_training_ppl`, `mismatch_rollout_ppl`,
`mismatch_ppl_ratio`
- Log perplexity statistics: `mismatch_log_ppl_diff`,
`mismatch_log_ppl_abs_diff`, `mismatch_log_ppl_diff_max`,
`mismatch_log_ppl_diff_min`

### 4. **Outlier Mitigation**
- **Veto mechanism**: Automatically discards samples with catastrophic
importance weights (per-token ratios below threshold)
- Prevents gradient corruption from extreme outliers
- Configurable threshold (default: 1e-4)
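
A rough sketch of the veto idea, assuming a sequence is discarded as soon as any one of its per-token ratios falls below the threshold. The helper names are hypothetical and the exact rule in `mismatch_helper.py` may differ.

```python
def veto_flags(batch_token_ratios, veto_threshold=1e-4):
    """Return one boolean per sequence: True means the sequence is vetoed."""
    return [
        any(ratio < veto_threshold for ratio in seq_ratios)
        for seq_ratios in batch_token_ratios
    ]


def veto_fraction(flags):
    """Fraction of sequences vetoed in the batch."""
    return sum(flags) / len(flags) if flags else 0.0


batch = [
    [0.9, 1.1, 1.0],   # healthy sequence
    [1.0, 1e-6, 1.2],  # one catastrophic token -> vetoed
]
flags = veto_flags(batch)
print(flags, veto_fraction(flags))  # [False, True] 0.5
```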

### 5. **Numerical Stability**
- All core computations in **log-space** to prevent underflow/overflow
- Carefully designed clamping and bounding to maintain numerical
precision
- Safe handling of edge cases (zero probabilities, extreme ratios)
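
To illustrate why log-space matters, the sketch below compares a naive sequence-level ratio with the log-space version. The per-token probabilities (0.5 and 0.49) and the sequence length are made-up values chosen to trigger underflow; this is not code from the PR.

```python
import math

n_tokens = 2000
train_logp = [math.log(0.5)] * n_tokens
rollout_logp = [math.log(0.49)] * n_tokens

# Naive approach: multiply probabilities, then divide.
naive_train = 1.0
naive_rollout = 1.0
for t, r in zip(train_logp, rollout_logp):
    naive_train *= math.exp(t)
    naive_rollout *= math.exp(r)
# Both products underflow to 0.0, so the ratio 0.0 / 0.0 is undefined.
print(naive_train, naive_rollout)

# Log-space approach: sum the log ratios, exponentiate once at the end.
log_weight = sum(t - r for t, r in zip(train_logp, rollout_logp))
seq_weight = math.exp(log_weight)
print(log_weight, seq_weight)  # finite values
```

Working with the summed log ratio also makes it cheap to clamp the value before exponentiating, which is one way to keep extreme ratios from overflowing.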

### 6. **Memory Efficiency**
- Optimized computation to minimize CUDA memory usage
- Efficient metric aggregation without large intermediate tensors
- Suitable for large-scale distributed training

### 7. **Metrics-Only Mode**
- Compute and monitor mismatch metrics **without** applying IS weights
- Useful for:
  - Understanding distribution mismatch before intervention
  - Deciding whether IS correction is needed
  - A/B testing IS impact
- Controlled by the `algorithm.rollout_is` flag: when it is `false`, weights
and metrics are still computed, but the weights are not applied to the loss

### 8. **Universal PPO Support**
- Integrated with **all PPO variants**: vanilla, GSPO, GPG, Clip-Cov,
KL-Cov, geo_mean
- Consistent interface across different policy loss functions
- Automatic weight application when enabled

---

## API and Configuration Changes

### Migration from Legacy TIS

#### **Before (REMOVED)**
```yaml
# Old TIS configuration - NO LONGER SUPPORTED
actor_rollout_ref:
  actor:
    tis_imp_ratio_cap: 2.0  # Removed from actor config
```

The legacy implementation:
- Only supported token-level truncation
- No metrics tracking
- Lacked numerical stability
- Limited configurability

#### **After (New Framework)**

Configuration moved to `algorithm` section for better organization:

```yaml
algorithm:
  # Main on/off switch: null = disabled, float = enabled
  rollout_is_threshold: 2.0

  # Control weight application (independent of metrics computation)
  rollout_is: true  # true = apply weights, false = metrics only

  # Optional: lower threshold (defaults to 1/upper if null)
  rollout_is_threshold_lower: null

  # Aggregation level: "token", "sequence", or "geometric"
  rollout_is_level: token

  # Bounding mode: "truncate" or "mask"
  rollout_is_mode: truncate

  # Veto threshold for catastrophic outliers (null = disabled)
  rollout_is_veto_threshold: 1e-4

# REQUIRED: Enable log probability calculation
actor_rollout_ref:
  rollout:
    calculate_log_probs: true
```
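
The interaction of the two switches in the configuration above can be summarized with a small sketch. The function `rollout_is_state` and its return labels are hypothetical names for illustration; only the semantics (null threshold = off, `rollout_is: false` = metrics only) come from this PR description.

```python
from typing import Optional


def rollout_is_state(rollout_is_threshold: Optional[float], rollout_is: bool) -> str:
    """Map the two config switches to the resulting behavior."""
    if rollout_is_threshold is None:
        # Main switch is off: no weights and no metrics are computed.
        return "disabled"
    if rollout_is:
        # Weights are computed and applied to the policy loss.
        return "apply_weights"
    # Weights and metrics are computed, but the loss is left untouched.
    return "metrics_only"


print(rollout_is_state(None, True))   # disabled
print(rollout_is_state(2.0, False))   # metrics_only
print(rollout_is_state(2.0, True))    # apply_weights
```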

### Configuration Examples

**1. Token-level truncation (recommended starting point)**
```yaml
algorithm:
  rollout_is_threshold: 2.0
  rollout_is: true
  rollout_is_level: token
  rollout_is_mode: truncate
```

**2. Sequence-level masking (more aggressive)**
```yaml
algorithm:
  rollout_is_threshold: 2.0
  rollout_is: true
  rollout_is_level: sequence
  rollout_is_mode: mask
```

**3. Metrics-only mode (monitoring without correction)**
```yaml
algorithm:
  rollout_is_threshold: 2.0
  rollout_is: false  # Compute metrics but don't apply weights
  rollout_is_level: token
  rollout_is_mode: truncate
```

**Example script:** `bash examples/rollout_importance_sampling/run_with_rollout_is.sh`

---

## Code Changes Overview

### New Files (4 files, 1,442 lines)

1. **`verl/trainer/ppo/mismatch_helper.py`** (459 lines)
   - Core implementation of IS weight computation
   - Three aggregation levels: token, sequence, geometric
   - Two bounding modes: truncate, mask
   - Veto mechanism for outlier detection
   - Comprehensive metrics computation (IS + mismatch)
   - All computations in log-space for numerical stability
   - Memory-efficient design

2. **`docs/advance/rollout_is_migration.md`** (642 lines)
   - Comprehensive migration guide from legacy TIS
   - Detailed explanation of all configuration options
   - Recommended threshold ranges for each aggregation level
   - Troubleshooting guide and best practices
   - Metrics interpretation guide

3. **`examples/rollout_importance_sampling/README.md`** (242 lines)
   - Quick start guide with working examples
   - Configuration templates for common scenarios
   - Threshold tuning guidelines
   - Metrics monitoring instructions

4. **`examples/rollout_importance_sampling/run_with_rollout_is.sh`** (99 lines)
   - Complete working example script
   - Demonstrates token-level and sequence-level configurations
   - Ready to run with minimal modifications

### Modified Core Files (9 files)

1. **`verl/trainer/ppo/core_algos.py`** (~50 lines changed)
   - Removed legacy TIS logic (`tis_imp_ratio_cap`)
   - Added `rollout_is_weights` parameter to all policy loss functions
   - Unified IS weight application interface across all PPO variants:
     - `compute_policy_loss_vanilla`
     - `compute_policy_loss_gspo`
     - `compute_policy_loss_gpg`
     - `compute_policy_loss_clip_cov`
     - `compute_policy_loss_kl_cov`
     - `compute_policy_loss_geo_mean`
   - Special handling for `geo_mean` (sequence-level aggregation)

2. **`verl/trainer/ppo/ray_trainer.py`** (~52 lines added)
   - New method: `compute_rollout_importance_weights_and_add_to_batch()`
   - Centralized IS computation (once per batch, on driver)
   - Conditional weight distribution to workers based on `algorithm.rollout_is`
   - Metrics collection and aggregation
   - Integration with existing training loop

3. **`verl/trainer/config/algorithm.py`** (+18 lines)
   - Added 6 new Rollout IS parameters:
     - `rollout_is_threshold` (main on/off switch)
     - `rollout_is` (weight application control)
     - `rollout_is_threshold_lower`
     - `rollout_is_level`
     - `rollout_is_mode`
     - `rollout_is_veto_threshold`
   - Comprehensive docstrings explaining each parameter

4. **`verl/workers/config/actor.py`** (-1 line)
   - Removed deprecated `tis_imp_ratio_cap` parameter

5. **`verl/workers/actor/dp_actor.py`** (~26 lines changed)
   - Updated to use new `rollout_is_weights` parameter
   - Removed legacy TIS logic

6. **`verl/workers/actor/megatron_actor.py`** (~15 lines changed)
   - Updated to use new `rollout_is_weights` parameter
   - Removed legacy TIS logic

7. **Configuration Files** (4 files updated)
   - `verl/trainer/config/ppo_trainer.yaml`
   - `verl/trainer/config/ppo_megatron_trainer.yaml`
   - `verl/trainer/config/_generated_ppo_trainer.yaml`
   - `verl/trainer/config/_generated_ppo_megatron_trainer.yaml`
   - Added default Rollout IS configuration section with explanatory comments

### Testing (2 files, 530 lines)

1. **`tests/trainer/ppo/test_rollout_is.py`** (289 lines)
   - Unit tests for `mismatch_helper.py`
   - Coverage for all aggregation levels (token, sequence, geometric)
   - Coverage for all bounding modes (truncate, mask)
   - Veto mechanism tests
   - Edge case handling (zeros, extremes, empty sequences)
   - Numerical stability verification
   - Metrics correctness validation

2. **`tests/trainer/ppo/test_rollout_is_integration.py`** (241 lines)
   - Integration tests with PPO training loop
   - End-to-end workflow validation
   - Batch processing tests
   - Configuration validation
   - Metrics collection verification
   - Compatibility with distributed training

### Updated Recipes (2 files)

1. **`recipe/dapo/dapo_ray_trainer.py`** (+5 lines)
   - Updated imports to use new framework

2. **`recipe/dapo/run_dapo_qwen2.5_32b_tis.sh`** (~42 lines changed)
   - Migrated from legacy TIS to new Rollout IS configuration
   - Updated documentation and comments

### Documentation Updates (2 files)

1. **`docs/examples/config.rst`** (~22 lines changed)
   - Updated configuration examples
   - Added Rollout IS section

2. **`docs/index.rst`** (+1 line)
   - Added link to Rollout IS migration guide

---

## Implementation Highlights

### Centralized Architecture

The new design follows a clean separation of concerns:

```
ray_trainer.py (driver)
    └─> compute_rollout_importance_weights_and_add_to_batch()
         └─> mismatch_helper.compute_rollout_importance_weights()
              ├─> Computes IS weights (token/sequence/geometric)
              ├─> Applies bounding (truncate/mask)
              ├─> Veto mechanism for outliers
              ├─> Computes IS metrics
              └─> Computes mismatch metrics (KL, PPL)
    └─> Conditionally adds weights to batch (if rollout_is=True)
    └─> Distributes batch to workers

actor workers (dp_actor, megatron_actor)
    └─> Receive batch with rollout_is_weights (if enabled)
    └─> Pass weights to policy loss function

core_algos.py
    └─> All policy loss functions accept rollout_is_weights
    └─> Apply weights if provided: pg_losses *= rollout_is_weights
```
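
The last step of the diagram (`pg_losses *= rollout_is_weights`) can be sketched in plain Python as below. `weighted_policy_loss` is a hypothetical stand-in for the real loss functions in `core_algos.py`, and the token-mean reduction over the response mask is an assumption, not necessarily the aggregation used there.

```python
def weighted_policy_loss(pg_losses, response_mask, rollout_is_weights=None):
    """Apply optional per-token IS weights, then average over valid tokens."""
    if rollout_is_weights is not None:
        # Element-wise scaling, mirroring `pg_losses *= rollout_is_weights`.
        pg_losses = [loss * w for loss, w in zip(pg_losses, rollout_is_weights)]
    total = sum(loss * m for loss, m in zip(pg_losses, response_mask))
    n_valid = sum(response_mask)
    return total / max(n_valid, 1)


pg_losses = [1.0, 2.0, 3.0]
response_mask = [1, 1, 0]  # last position is padding
print(weighted_policy_loss(pg_losses, response_mask))                   # 1.5
print(weighted_policy_loss(pg_losses, response_mask, [0.5, 2.0, 1.0]))  # 2.25
```

When `rollout_is_weights` is absent from the batch (metrics-only mode or the feature disabled), the loss falls back to the unweighted value.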

### Key Design Decisions

1. **Centralized Computation**: IS weights computed once on driver, not
per worker
   - Reduces redundant computation
   - Ensures consistency across workers
   - Simplifies debugging and metrics collection

2. **Configuration in Algorithm**: Moved from actor config to algorithm
config
   - Better conceptual organization (algorithm-level concern, not worker-level)
   - Easier to manage and validate
   - Consistent with other algorithm parameters

3. **Two-Level Control**:
   - `rollout_is_threshold`: Enables/disables entire system (null = off)
   - `rollout_is`: Controls weight application (true = apply, false = metrics only)
   - Allows flexible monitoring and gradual rollout

4. **Metrics Consolidation**: Mismatch metrics computed within IS weight
computation
   - Eliminates duplicate computation
   - Reduces memory overhead
   - Maintains metric accuracy

5. **Universal PPO Support**: Single interface for all PPO variants
   - Minimal code changes required
   - Consistent behavior across algorithms
   - Easy to add new variants

---

## Migration Guide

### For Users of Legacy TIS

**Step 1: Update your configuration file**

```yaml
# OLD (remove this)
actor_rollout_ref:
  actor:
    tis_imp_ratio_cap: 2.0

# NEW (add this)
algorithm:
  rollout_is_threshold: 2.0  # Use same value as old tis_imp_ratio_cap
  rollout_is: true
  rollout_is_level: token
  rollout_is_mode: truncate

# REQUIRED (add if not present)
actor_rollout_ref:
  rollout:
    calculate_log_probs: true
```

**Step 2: Monitor metrics**

The first time you run with the new configuration, check these metrics:
- `mismatch/rollout_is_eff_sample_size`: Should be > 80% of batch size
- `mismatch/rollout_is_veto_fraction`: Should be < 5%
- `mismatch/rollout_is_mean`: Should be close to 1.0
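
As a rough illustration of these checks, the sketch below uses the Kish effective sample size, `(Σw)² / (N · Σw²)`, expressed as a fraction of the batch. This is a common definition and may not match how `rollout_is_eff_sample_size` is computed in this PR. The helper names and the `mean_tol` tolerance are hypothetical; the 80% and 5% limits come from the list above.

```python
def effective_sample_fraction(weights):
    """Kish effective sample size divided by the number of samples."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    if s2 == 0.0:
        return 0.0
    return (s1 * s1) / (len(weights) * s2)


def check_rollout_is_health(ess_fraction, veto_fraction, mean_weight, mean_tol=0.1):
    """Return a list of warnings; an empty list means all checks passed."""
    warnings = []
    if ess_fraction <= 0.8:
        warnings.append("effective sample size is at or below 80% of the batch")
    if veto_fraction >= 0.05:
        warnings.append("veto fraction is 5% or higher")
    if abs(mean_weight - 1.0) > mean_tol:  # mean_tol is an assumed tolerance
        warnings.append("mean IS weight is far from 1.0")
    return warnings


weights = [1.0, 1.0, 1.0, 1.0]
ess = effective_sample_fraction(weights)
print(ess, check_rollout_is_health(ess, veto_fraction=0.0, mean_weight=1.0))
```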

**Step 3: Tune if needed**

If effective sample size is too low:
- Increase `rollout_is_threshold`
- Try `rollout_is_mode: mask` with appropriate lower bound
- Consider `rollout_is_level: sequence` for more aggressive correction

For detailed guidance, see `docs/advance/rollout_is_migration.md`.

### For New Users

Start with recommended defaults:

```yaml
algorithm:
  rollout_is_threshold: 2.0
  rollout_is: true
  rollout_is_level: token
  rollout_is_mode: truncate

actor_rollout_ref:
  rollout:
    calculate_log_probs: true
```

Run the example script to see it in action:
```bash
bash examples/rollout_importance_sampling/run_with_rollout_is.sh
```

---

## Testing

### Unit Tests
- **289 lines** of comprehensive unit tests in `test_rollout_is.py`
- Covers all aggregation levels, bounding modes, and edge cases
- Validates numerical stability and correctness
- Fast execution (~1-2 seconds)

### Integration Tests
- **241 lines** of integration tests in `test_rollout_is_integration.py`
- End-to-end workflow with PPO training loop
- Distributed training compatibility
- Metrics collection validation
- Moderate execution time (~10-20 seconds)

### Running Tests
```bash
# Run all Rollout IS tests
pytest tests/trainer/ppo/test_rollout_is.py -v
pytest tests/trainer/ppo/test_rollout_is_integration.py -v

# Run specific test
pytest tests/trainer/ppo/test_rollout_is.py::test_token_level_truncate -v
```

---

## Metrics Reference

### Rollout IS Metrics (all prefixed with `mismatch/`)

| Metric | Description | Ideal Range |
|--------|-------------|-------------|
| `rollout_is_eff_sample_size` | Effective number of samples after IS | > 80% of batch |
| `rollout_is_mean` | Mean IS weight | ~1.0 |
| `rollout_is_std` | Standard deviation of IS weights | Low variance |
| `rollout_is_p25` | 25th percentile | ~0.8-1.0 |
| `rollout_is_p50` | Median IS weight | ~1.0 |
| `rollout_is_p75` | 75th percentile | ~1.0-1.2 |
| `rollout_is_p95` | 95th percentile | < threshold |
| `rollout_is_p99` | 99th percentile | < threshold |
| `rollout_is_max` | Maximum weight | ≤ threshold |
| `rollout_is_min` | Minimum weight | ≥ lower threshold (mask mode) |
| `rollout_is_veto_fraction` | % sequences vetoed | < 5% |
| `rollout_is_catastrophic_token_fraction` | % catastrophic tokens | < 1% |
| `rollout_is_masked_fraction` | % tokens masked (mask mode) | Variable |

### Mismatch Metrics (all prefixed with `mismatch/`)

| Metric | Description | What It Means |
|--------|-------------|---------------|
| `mismatch_kl` | Forward KL divergence | Distribution difference (rollout vs training) |
| `mismatch_k3_kl` | K3 KL estimator | Stable KL estimate for small divergences |
| `mismatch_training_ppl` | Training policy perplexity | Prediction difficulty of training policy |
| `mismatch_rollout_ppl` | Rollout policy perplexity | Prediction difficulty of rollout policy |
| `mismatch_ppl_ratio` | Ratio of training to rollout PPL | Relative prediction difficulty |
| `mismatch_log_ppl_diff` | Log perplexity difference | Sequence-level PPL mismatch |
| `mismatch_log_ppl_abs_diff` | Absolute log PPL difference | Magnitude of mismatch |
| `mismatch_log_ppl_diff_max` | Max log PPL difference | Worst-case mismatch |
| `mismatch_log_ppl_diff_min` | Min log PPL difference | Best-case mismatch |
| `mismatch_training_log_ppl` | Log of training PPL | Log-scale training perplexity |
| `mismatch_rollout_log_ppl` | Log of rollout PPL | Log-scale rollout perplexity |
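
For orientation, here is a minimal sketch of how a few of these quantities could be estimated from per-token log-probabilities of tokens sampled by the rollout policy. The formulas are standard estimators and are assumptions about this PR's implementation: forward KL as the mean of `log π_rollout − log π_training`, K3 as the mean of `r − 1 − log r` with `r = π_training / π_rollout`, and perplexity as `exp(−mean log-prob)`. The function name `mismatch_metrics` is hypothetical.

```python
import math


def mismatch_metrics(train_logp, rollout_logp):
    """Estimate KL and perplexity mismatch for one sequence of sampled tokens."""
    n = len(train_logp)
    log_ratio = [t - r for t, r in zip(train_logp, rollout_logp)]  # log(pi_train / pi_rollout)
    training_ppl = math.exp(-sum(train_logp) / n)
    rollout_ppl = math.exp(-sum(rollout_logp) / n)
    return {
        # Samples come from the rollout policy, so this estimates KL(rollout || training).
        "mismatch_kl": -sum(log_ratio) / n,
        # K3 estimator: always non-negative, lower variance for small divergences.
        "mismatch_k3_kl": sum(math.exp(lr) - 1.0 - lr for lr in log_ratio) / n,
        "mismatch_training_ppl": training_ppl,
        "mismatch_rollout_ppl": rollout_ppl,
        "mismatch_ppl_ratio": training_ppl / rollout_ppl,
    }


metrics = mismatch_metrics(train_logp=[-1.0, -1.0], rollout_logp=[-0.5, -0.5])
for name, value in metrics.items():
    print(name, round(value, 4))
```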

---

## Performance Impact

### Memory
- Minimal overhead: ~1-2% increase in peak memory usage
- Efficient log-space computation
- No large intermediate tensors

### Computation
- Negligible impact on training speed: < 1% overhead
- Centralized computation on driver (no per-worker redundancy)
- Optimized tensor operations

### Training Stability
- Significant improvement in stability when distribution mismatch exists
- Faster convergence in many scenarios
- Reduced risk of training collapse

---

## Breaking Changes

> [!IMPORTANT]
> This PR contains **BREAKING CHANGES** to the configuration API.

### Removed
- `actor_rollout_ref.actor.tis_imp_ratio_cap`: No longer supported

### Migration Required
All users of the legacy TIS implementation must update their
configuration files. See the migration guide above or
`docs/advance/rollout_is_migration.md` for detailed instructions.

### Backward Compatibility
- No backward compatibility with legacy TIS
- Configuration files with `tis_imp_ratio_cap` will raise validation
errors
- Affected recipes have been updated in this PR

---

## Pre-Submission Checklist

- [x] Search for similar PRs:
[https://github.com/volcengine/verl/pulls?q=is%3Apr+importance+sampling](https://github.com/volcengine/verl/pulls?q=is%3Apr+importance+sampling)
- [x] Format PR title as `[{modules}] {type}: {description}` (checked by
CI)
- **Suggested title:** `[BREAKING][rollout, trainer, algo] feat:
implement comprehensive Rollout Importance Sampling framework`
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md)
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting)
- [x] Add/update
[documentation](https://github.com/volcengine/verl/tree/main/docs) (3
new docs, 2 updated)
- [x] Add unit and integration tests (530 lines of tests)
- [x] Once PR is ready for CI, send message in `ci-request` channel

---

## References

- **Blog post:** [When Speed Kills Stability: Demystifying RL Collapse
from the Inference-Training
Mismatch](https://yingru.notion.site/When-Speed-Kills-Stability-271211a558b7808d8b12d403fd15edda)
- **Migration guide:** `docs/advance/rollout_is_migration.md`
- **Examples:** `examples/rollout_importance_sampling/`
- **Tests:** `tests/trainer/ppo/test_rollout_is*.py`
2025-10-13 11:06:36 -07:00
21271aabb9 [BREAKING][rollout, trainer, algo] feat: comprehensive rollout importance sampling implementation (#3694)
# Rollout Importance Sampling Framework

## Summary

This PR introduces a comprehensive **Rollout Importance Sampling (IS)**
framework to correct distribution mismatch between data-collecting
(rollout) and training policies, a critical factor for ensuring stable
and efficient model training in RL fine-tuning.

This work is motivated by the analysis in our blog post, [When Speed
Kills Stability: Demystifying RL Collapse from the Inference-Training
Mismatch](https://yingru.notion.site/When-Speed-Kills-Stability-271211a558b7808d8b12d403fd15edda).
If you find this implementation useful in your research, please consider
citing:

```bibtex
@misc{liu-li-2025,
  title = {When Speed Kills Stability: Demystifying RL Collapse from the Inference-Training Mismatch},
  url = {https://yingru.notion.site/When-Speed-Kills-Stability-Demystifying-RL-Collapse-from-the-Inference-Training-Mismatch-271211a558b7808d8b12d403fd15edda},
  author = {Jiacai Liu and Yingru Li and Yuqian Fu and Jiawei Wang and Qian Liu and Yu Shen},
  year = {2025},
  month = {September},
}
```

---

## Problem Statement

When using different policies for rollout generation (e.g., vLLM with
BFloat16) and training (e.g., FSDP with FP32), distribution mismatch
occurs, leading to:
- Biased gradient estimates
- Training instability and collapse
- Reduced sample efficiency
- Poor convergence properties

This framework addresses these issues through principled importance
sampling correction.

---

## Key Features & Improvements

### 1. **Flexible Aggregation Levels**
Three methods for calculating IS weights:
- **`token`**: Per-token importance ratios
- **`sequence`**: Product of per-token ratios
- **`geometric`**: Geometric mean of ratios

### 2. **Advanced Bounding Modes**
Two strategies to control weight variance:
- **`truncate`** (TIS): Caps weights at upper threshold only, preserving
gradients
- **`clip`** (CIS): Zeros out weights outside bounds, more aggressive
filtering

### 3. **Comprehensive Diagnostics**
Detailed metrics to monitor distribution mismatch and training health:

**Rollout IS Metrics** (automatically prefixed with `mismatch/`):
- Health indicators: `rollout_is_eff_sample_size`, `rollout_is_mean`
- Distribution statistics: `rollout_is_p25`, `rollout_is_p50`,
`rollout_is_p75`, `rollout_is_p95`, `rollout_is_p99`, `rollout_is_max`,
`rollout_is_min`, `rollout_is_std`
- Diagnostics: `rollout_is_veto_fraction`,
`rollout_is_catastrophic_token_fraction`, `rollout_is_clipped_fraction`
(clip mode)
- Sequence-level statistics (for sequence/geometric modes):
`rollout_is_seq_mean`, `rollout_is_seq_std`, `rollout_is_seq_max`,
`rollout_is_seq_min`, etc.

**Mismatch Metrics** (computed efficiently within IS weight
computation):
- KL Divergence: `mismatch_kl` (forward KL), `mismatch_k3_kl` (K3
estimator for stability)
- Perplexity: `mismatch_training_ppl`, `mismatch_rollout_ppl`,
`mismatch_ppl_ratio`
- Log perplexity statistics: `mismatch_log_ppl_diff`,
`mismatch_log_ppl_abs_diff`, `mismatch_log_ppl_diff_max`,
`mismatch_log_ppl_diff_min`

### 4. **Outlier Mitigation**
- **Veto mechanism**: Automatically discards samples with catastrophic
importance weights (per-token ratios below threshold)
- Prevents gradient corruption from extreme outliers
- Configurable threshold (default: 1e-4)

### 5. **Numerical Stability**
- All core computations in **log-space** to prevent underflow/overflow
- Carefully designed clipping and bounding to maintain numerical
precision
- Safe handling of edge cases (zero probabilities, extreme ratios)

### 6. **Memory Efficiency**
- Optimized computation to minimize CUDA memory usage
- Efficient metric aggregation without large intermediate tensors
- Suitable for large-scale distributed training

### 7. **Metrics-Only Mode**
- Compute and monitor mismatch metrics **without** applying IS weights
- Useful for:
  - Understanding distribution mismatch before intervention
  - Deciding whether IS correction is needed
  - A/B testing IS impact
- Controlled by the `algorithm.rollout_is` flag: when it is `false`, weights
and metrics are still computed, but the weights are not applied to the loss

### 8. **Universal PPO Support**
- Integrated with **all PPO variants**: vanilla, GSPO, GPG, Clip-Cov,
KL-Cov, geo_mean
- Consistent interface across different policy loss functions
- Automatic weight application when enabled

---

## API and Configuration Changes

### Migration from Legacy TIS

#### **Before (REMOVED)**
```yaml
# Old TIS configuration - NO LONGER SUPPORTED
actor_rollout_ref:
  actor:
    tis_imp_ratio_cap: 2.0  # Removed from actor config
```

The legacy implementation:
- Only supported token-level truncation
- No metrics tracking
- Lacked numerical stability
- Limited configurability

#### **After (New Framework)**

Configuration moved to `algorithm` section for better organization:

```yaml
algorithm:
  # Main on/off switch: null = disabled, float = enabled
  rollout_is_threshold: 2.0

  # Control weight application (independent of metrics computation)
  rollout_is: true  # true = apply weights, false = metrics only

  # Optional: lower threshold (defaults to 1/upper if null)
  rollout_is_threshold_lower: null

  # Aggregation level: "token", "sequence", or "geometric"
  rollout_is_level: token

  # Bounding mode: "truncate" or "clip"
  rollout_is_mode: truncate

  # Veto threshold for catastrophic outliers (null = disabled)
  rollout_is_veto_threshold: 1e-4

# REQUIRED: Enable log probability calculation
actor_rollout_ref:
  rollout:
    calculate_log_probs: true
```

### Configuration Examples

**1. Token-level truncation (recommended starting point)**
```yaml
algorithm:
  rollout_is_threshold: 2.0
  rollout_is: true
  rollout_is_level: token
  rollout_is_mode: truncate
```

**2. Sequence-level clipping (more aggressive)**
```yaml
algorithm:
  rollout_is_threshold: 2.0
  rollout_is: true
  rollout_is_level: sequence
  rollout_is_mode: clip
```

**3. Metrics-only mode (monitoring without correction)**
```yaml
algorithm:
  rollout_is_threshold: 2.0
  rollout_is: false  # Compute metrics but don't apply weights
  rollout_is_level: token
  rollout_is_mode: truncate
```

**Example script:** `bash examples/rollout_importance_sampling/run_with_rollout_is.sh`

---

## Code Changes Overview

### New Files (4 files, 1,442 lines)

1. **`verl/trainer/ppo/mismatch_helper.py`** (459 lines)
   - Core implementation of IS weight computation
   - Three aggregation levels: token, sequence, geometric
   - Two bounding modes: truncate, clip
   - Veto mechanism for outlier detection
   - Comprehensive metrics computation (IS + mismatch)
   - All computations in log-space for numerical stability
   - Memory-efficient design

2. **`docs/advance/rollout_is_migration.md`** (642 lines)
   - Comprehensive migration guide from legacy TIS
   - Detailed explanation of all configuration options
   - Recommended threshold ranges for each aggregation level
   - Troubleshooting guide and best practices
   - Metrics interpretation guide

3. **`examples/rollout_importance_sampling/README.md`** (242 lines)
   - Quick start guide with working examples
   - Configuration templates for common scenarios
   - Threshold tuning guidelines
   - Metrics monitoring instructions

4. **`examples/rollout_importance_sampling/run_with_rollout_is.sh`** (99 lines)
   - Complete working example script
   - Demonstrates token-level and sequence-level configurations
   - Ready to run with minimal modifications

### Modified Core Files (9 files)

1. **`verl/trainer/ppo/core_algos.py`** (~50 lines changed)
   - Removed legacy TIS logic (`tis_imp_ratio_cap`)
   - Added `rollout_is_weights` parameter to all policy loss functions
   - Unified IS weight application interface across all PPO variants:
     - `compute_policy_loss_vanilla`
     - `compute_policy_loss_gspo`
     - `compute_policy_loss_gpg`
     - `compute_policy_loss_clip_cov`
     - `compute_policy_loss_kl_cov`
     - `compute_policy_loss_geo_mean`
   - Special handling for `geo_mean` (sequence-level aggregation)

2. **`verl/trainer/ppo/ray_trainer.py`** (~52 lines added)
   - New method: `compute_rollout_importance_weights_and_add_to_batch()`
   - Centralized IS computation (once per batch, on driver)
   - Conditional weight distribution to workers based on `algorithm.rollout_is`
   - Metrics collection and aggregation
   - Integration with existing training loop

3. **`verl/trainer/config/algorithm.py`** (+18 lines)
   - Added 6 new Rollout IS parameters:
     - `rollout_is_threshold` (main on/off switch)
     - `rollout_is` (weight application control)
     - `rollout_is_threshold_lower`
     - `rollout_is_level`
     - `rollout_is_mode`
     - `rollout_is_veto_threshold`
   - Comprehensive docstrings explaining each parameter

4. **`verl/workers/config/actor.py`** (-1 line)
   - Removed deprecated `tis_imp_ratio_cap` parameter

5. **`verl/workers/actor/dp_actor.py`** (~26 lines changed)
   - Updated to use new `rollout_is_weights` parameter
   - Removed legacy TIS logic

6. **`verl/workers/actor/megatron_actor.py`** (~15 lines changed)
   - Updated to use new `rollout_is_weights` parameter
   - Removed legacy TIS logic

7. **Configuration Files** (4 files updated)
   - `verl/trainer/config/ppo_trainer.yaml`
   - `verl/trainer/config/ppo_megatron_trainer.yaml`
   - `verl/trainer/config/_generated_ppo_trainer.yaml`
   - `verl/trainer/config/_generated_ppo_megatron_trainer.yaml`
   - Added default Rollout IS configuration section with explanatory comments

### Testing (2 files, 530 lines)

1. **`tests/trainer/ppo/test_rollout_is.py`** (289 lines)
   - Unit tests for `mismatch_helper.py`
   - Coverage for all aggregation levels (token, sequence, geometric)
   - Coverage for all bounding modes (truncate, clip)
   - Veto mechanism tests
   - Edge case handling (zeros, extremes, empty sequences)
   - Numerical stability verification
   - Metrics correctness validation

2. **`tests/trainer/ppo/test_rollout_is_integration.py`** (241 lines)
   - Integration tests with PPO training loop
   - End-to-end workflow validation
   - Batch processing tests
   - Configuration validation
   - Metrics collection verification
   - Compatibility with distributed training

### Updated Recipes (2 files)

1. **`recipe/dapo/dapo_ray_trainer.py`** (+5 lines)
   - Updated imports to use new framework

2. **`recipe/dapo/run_dapo_qwen2.5_32b_tis.sh`** (~42 lines changed)
   - Migrated from legacy TIS to new Rollout IS configuration
   - Updated documentation and comments

### Documentation Updates (2 files)

1. **`docs/examples/config.rst`** (~22 lines changed)
   - Updated configuration examples
   - Added Rollout IS section

2. **`docs/index.rst`** (+1 line)
   - Added link to Rollout IS migration guide

---

## Implementation Highlights

### Centralized Architecture

The new design follows a clean separation of concerns:

```
ray_trainer.py (driver)
    └─> compute_rollout_importance_weights_and_add_to_batch()
         └─> mismatch_helper.compute_rollout_importance_weights()
              ├─> Computes IS weights (token/sequence/geometric)
              ├─> Applies bounding (truncate/clip)
              ├─> Veto mechanism for outliers
              ├─> Computes IS metrics
              └─> Computes mismatch metrics (KL, PPL)
    └─> Conditionally adds weights to batch (if rollout_is=True)
    └─> Distributes batch to workers

actor workers (dp_actor, megatron_actor)
    └─> Receive batch with rollout_is_weights (if enabled)
    └─> Pass weights to policy loss function

core_algos.py
    └─> All policy loss functions accept rollout_is_weights
    └─> Apply weights if provided: pg_losses *= rollout_is_weights
```

### Key Design Decisions

1. **Centralized Computation**: IS weights computed once on driver, not
per worker
   - Reduces redundant computation
   - Ensures consistency across workers
   - Simplifies debugging and metrics collection

2. **Configuration in Algorithm**: Moved from actor config to algorithm
config
   - Better conceptual organization (algorithm-level concern, not worker-level)
   - Easier to manage and validate
   - Consistent with other algorithm parameters

3. **Two-Level Control**:
   - `rollout_is_threshold`: Enables/disables entire system (null = off)
   - `rollout_is`: Controls weight application (true = apply, false = metrics only)
   - Allows flexible monitoring and gradual rollout

4. **Metrics Consolidation**: Mismatch metrics computed within IS weight
computation
   - Eliminates duplicate computation
   - Reduces memory overhead
   - Maintains metric accuracy

5. **Universal PPO Support**: Single interface for all PPO variants
   - Minimal code changes required
   - Consistent behavior across algorithms
   - Easy to add new variants
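
The two-level control from item 3 amounts to a small decision table. The sketch below restates it as a function; `rollout_is_behavior` and the returned keys are hypothetical names chosen for the example, not part of verl.

```python
def rollout_is_behavior(rollout_is_threshold, rollout_is):
    """Return what the trainer would do for a given pair of settings.

    Hypothetical helper: a null threshold disables everything, otherwise
    metrics are always computed and `rollout_is` decides whether the
    weights are applied to the loss.
    """
    if rollout_is_threshold is None:
        return {"compute_metrics": False, "apply_weights": False}
    return {"compute_metrics": True, "apply_weights": bool(rollout_is)}
```

Setting a threshold with `rollout_is: false` therefore gives a monitoring-only mode, which is the "gradual rollout" path the item describes.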

---

## Migration Guide

### For Users of Legacy TIS

**Step 1: Update your configuration file**

```yaml
# OLD (remove this)
actor_rollout_ref:
  actor:
    tis_imp_ratio_cap: 2.0

# NEW (add this)
algorithm:
  rollout_is_threshold: 2.0  # Use same value as old tis_imp_ratio_cap
  rollout_is: true
  rollout_is_level: token
  rollout_is_mode: truncate

# REQUIRED (add if not present)
actor_rollout_ref:
  rollout:
    calculate_log_probs: true
```
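
For configurations held as nested dictionaries, the same migration can be sketched as a function. This is an illustrative helper, not a tool shipped with the PR; `migrate_tis_config` is a hypothetical name, and it assumes the legacy key lives exactly where the YAML above shows it.

```python
import copy


def migrate_tis_config(cfg):
    """Return a copy of `cfg` with legacy TIS settings moved to Rollout IS."""
    new_cfg = copy.deepcopy(cfg)  # leave the caller's config untouched
    actor = new_cfg.get("actor_rollout_ref", {}).get("actor", {})
    cap = actor.pop("tis_imp_ratio_cap", None)
    if cap is None:
        return new_cfg  # nothing to migrate

    algo = new_cfg.setdefault("algorithm", {})
    algo.setdefault("rollout_is_threshold", cap)  # same value as the old cap
    algo.setdefault("rollout_is", True)
    algo.setdefault("rollout_is_level", "token")
    algo.setdefault("rollout_is_mode", "truncate")

    # Rollout log-probs are required for the weights to be computed.
    rollout = new_cfg["actor_rollout_ref"].setdefault("rollout", {})
    rollout["calculate_log_probs"] = True
    return new_cfg
```

`setdefault` is used so that any Rollout IS values already present in the `algorithm` block take precedence over the migrated defaults.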

**Step 2: Monitor metrics**

The first time you run with the new configuration, check these metrics:
- `mismatch/rollout_is_eff_sample_size`: Should be > 80% of batch size
- `mismatch/rollout_is_veto_fraction`: Should be < 5%
- `mismatch/rollout_is_mean`: Should be close to 1.0
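
As a reference point for the first check, the sketch below uses the common Kish definition of effective sample size, `(sum w)^2 / sum w^2`. Whether verl computes or normalizes `rollout_is_eff_sample_size` in exactly this way is an assumption; the function names are hypothetical.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return (total * total) / total_sq if total_sq > 0 else 0.0


def ess_fraction(weights):
    """ESS as a fraction of the number of samples, for the '> 80%' check."""
    return effective_sample_size(weights) / len(weights) if weights else 0.0
```

Uniform weights give a fraction of 1.0, while a batch where one weight dominates drives the fraction toward `1 / n`.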

**Step 3: Tune if needed**

If effective sample size is too low:
- Increase `rollout_is_threshold`
- Try `rollout_is_mode: clip` with appropriate lower bound
- Consider `rollout_is_level: sequence` for more aggressive correction

For detailed guidance, see `docs/advance/rollout_is_migration.md`.

### For New Users

Start with recommended defaults:

```yaml
algorithm:
  rollout_is_threshold: 2.0
  rollout_is: true
  rollout_is_level: token
  rollout_is_mode: truncate

actor_rollout_ref:
  rollout:
    calculate_log_probs: true
```

Run the example script to see it in action:
```bash
bash examples/rollout_importance_sampling/run_with_rollout_is.sh
```

---

## Testing

### Unit Tests
- **289 lines** of comprehensive unit tests in `test_rollout_is.py`
- Covers all aggregation levels, bounding modes, and edge cases
- Validates numerical stability and correctness
- Fast execution (~1-2 seconds)

### Integration Tests
- **241 lines** of integration tests in `test_rollout_is_integration.py`
- End-to-end workflow with PPO training loop
- Distributed training compatibility
- Metrics collection validation
- Moderate execution time (~10-20 seconds)

### Running Tests
```bash
# Run all Rollout IS tests
pytest tests/trainer/ppo/test_rollout_is.py -v
pytest tests/trainer/ppo/test_rollout_is_integration.py -v

# Run specific test
pytest tests/trainer/ppo/test_rollout_is.py::test_token_level_truncate -v
```

---

## Metrics Reference

### Rollout IS Metrics (all prefixed with `mismatch/`)

| Metric | Description | Ideal Range |
|--------|-------------|-------------|
| `rollout_is_eff_sample_size` | Effective number of samples after IS | > 80% of batch |
| `rollout_is_mean` | Mean IS weight | ~1.0 |
| `rollout_is_std` | Standard deviation of IS weights | Low variance |
| `rollout_is_p25` | 25th percentile | ~0.8-1.0 |
| `rollout_is_p50` | Median IS weight | ~1.0 |
| `rollout_is_p75` | 75th percentile | ~1.0-1.2 |
| `rollout_is_p95` | 95th percentile | < threshold |
| `rollout_is_p99` | 99th percentile | < threshold |
| `rollout_is_max` | Maximum weight | ≤ threshold |
| `rollout_is_min` | Minimum weight | ≥ lower threshold (clip mode) |
| `rollout_is_veto_fraction` | % sequences vetoed | < 5% |
| `rollout_is_catastrophic_token_fraction` | % catastrophic tokens | < 1% |
| `rollout_is_clipped_fraction` | % tokens clipped (clip mode) | Variable |

### Mismatch Metrics (all prefixed with `mismatch/`)

| Metric | Description | What It Means |
|--------|-------------|---------------|
| `mismatch_kl` | Forward KL divergence | Distribution difference (rollout vs training) |
| `mismatch_k3_kl` | K3 KL estimator | Stable KL estimate for small divergences |
| `mismatch_training_ppl` | Training policy perplexity | Prediction difficulty of training policy |
| `mismatch_rollout_ppl` | Rollout policy perplexity | Prediction difficulty of rollout policy |
| `mismatch_ppl_ratio` | Ratio of training to rollout PPL | Relative prediction difficulty |
| `mismatch_log_ppl_diff` | Log perplexity difference | Sequence-level PPL mismatch |
| `mismatch_log_ppl_abs_diff` | Absolute log PPL difference | Magnitude of mismatch |
| `mismatch_log_ppl_diff_max` | Max log PPL difference | Worst-case mismatch |
| `mismatch_log_ppl_diff_min` | Min log PPL difference | Best-case mismatch |
| `mismatch_training_log_ppl` | Log of training PPL | Log-scale training perplexity |
| `mismatch_rollout_log_ppl` | Log of rollout PPL | Log-scale rollout perplexity |
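
To make the KL and perplexity rows concrete, here is a small sketch of how such quantities are commonly estimated from per-token log-probabilities of tokens sampled by the rollout policy. It assumes "forward KL" means KL(rollout || training) and uses the standard k1 and k3 sample estimators; the function name and dictionary keys are hypothetical and the actual verl code may differ.

```python
import math


def mismatch_stats(rollout_logp, train_logp):
    """Estimate KL and perplexity statistics from per-token log-probs."""
    n = len(rollout_logp)
    # log r, where r = pi_train / pi_rollout for each sampled token
    log_ratio = [lt - lr for lt, lr in zip(train_logp, rollout_logp)]

    kl = -sum(log_ratio) / n  # k1 estimator: mean of -log r
    k3_kl = sum(math.exp(d) - 1.0 - d for d in log_ratio) / n  # (r - 1) - log r

    training_ppl = math.exp(-sum(train_logp) / n)
    rollout_ppl = math.exp(-sum(rollout_logp) / n)
    return {
        "kl": kl,
        "k3_kl": k3_kl,
        "training_ppl": training_ppl,
        "rollout_ppl": rollout_ppl,
        "ppl_ratio": training_ppl / rollout_ppl,
    }
```

Each term of the k3 estimator is non-negative, so the estimate never goes below zero, whereas the k1 estimate can be negative on a finite sample.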

---

## Performance Impact

### Memory
- Minimal overhead: ~1-2% increase in peak memory usage
- Efficient log-space computation
- No large intermediate tensors

### Computation
- Negligible impact on training speed: < 1% overhead
- Centralized computation on driver (no per-worker redundancy)
- Optimized tensor operations

### Training Stability
- Significant improvement in stability when distribution mismatch exists
- Faster convergence in many scenarios
- Reduced risk of training collapse

---

## Breaking Changes

> [!IMPORTANT]
> This PR contains **BREAKING CHANGES** to the configuration API.

### Removed
- `actor_rollout_ref.actor.tis_imp_ratio_cap`: No longer supported

### Migration Required
All users of the legacy TIS implementation must update their
configuration files. See the migration guide above or
`docs/advance/rollout_is_migration.md` for detailed instructions.

### Backward Compatibility
- No backward compatibility with legacy TIS
- Configuration files with `tis_imp_ratio_cap` will raise validation
errors
- Affected recipes have been updated in this PR

---

## Pre-Submission Checklist

- [x] Search for similar PRs:
[https://github.com/volcengine/verl/pulls?q=is%3Apr+importance+sampling](https://github.com/volcengine/verl/pulls?q=is%3Apr+importance+sampling)
- [x] Format PR title as `[{modules}] {type}: {description}` (checked by
CI)
- **Suggested title:** `[BREAKING][rollout, trainer, algo] feat:
implement comprehensive Rollout Importance Sampling framework`
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md)
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting)
- [x] Add/update
[documentation](https://github.com/volcengine/verl/tree/main/docs) (3
new docs, 2 updated)
- [x] Add unit and integration tests (530 lines of tests)
- [x] Once PR is ready for CI, send message in `ci-request` channel

---

## References

- **Blog post:** [When Speed Kills Stability: Demystifying RL Collapse
from the Inference-Training
Mismatch](https://yingru.notion.site/When-Speed-Kills-Stability-271211a558b7808d8b12d403fd15edda)
- **Migration guide:** `docs/advance/rollout_is_migration.md`
- **Examples:** `examples/rollout_importance_sampling/`
- **Tests:** `tests/trainer/ppo/test_rollout_is*.py`

---------

Co-authored-by: Yan Bai <bayan@nvidia.com>
2025-10-13 17:05:29 +08:00
7f27789961 [fsdp,doc] refactor: rename warmup_style@FSDPOptimizerConfig -> lr_scheduler_type (#3739)
### What does this PR do?

> Rename `warmup_style` in FSDPOptimizerConfig to `lr_scheduler_type` to
align with the Hugging Face Trainer API.

The following pull request refactors the optimizer, but the naming issue
persists there:
https://github.com/volcengine/verl/pull/3656
### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: weiqi.li <weiqi.li@bytedance.com>
2025-10-13 15:58:59 +08:00
e9ee6b39c6 [model] fix: qwen3vl models shape mismatch error with SP (#3735) 2025-10-13 13:09:10 +08:00
9d4554b931 [model] fix: qwen3vl training stuck with mixed text-image data (#3734) 2025-10-13 13:08:13 +08:00
71cf69e7ad [ci] feat: increase sft e2e time (#3738)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-13 11:29:39 +08:00
7ddb9b29f0 [misc] feat: prototype deprecate DataProto and replace with Tensordict: part 3 (#3600)
### What does this PR do?

This PR continues the work started in PR #3567 by deprecating and
removing the left_right padding mode:
1. Implements no-padding mode for the Megatron engine using nested tensors in
the SFT trainer.
2. Deprecates left_right padding mode for the FSDP/Megatron engines.
3. Introduces a transformation layer within Actor/Critic workers; see
more
[here](https://github.com/volcengine/verl/blob/main/docs/workers/model_engine.rst).
   - **Input Format**: Actor/Critic workers continue to receive data in
     left_right padded format.
   - **Transformation**: This layer dynamically converts
     left_right padded data into the no-padding format using nested tensors.
   - **Engine Format**: FSDP and Megatron engines now operate exclusively
     using the no-padding data format by default.
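
The idea behind the transformation layer can be illustrated without any tensor library. The sketch below strips padding with an attention mask and returns ragged sequences; it uses plain lists where verl uses nested tensors, and both the `unpad` name and the cumulative-length output are assumptions added for the example.

```python
def unpad(input_ids, attention_mask):
    """Convert padded rows into variable-length sequences.

    Returns the ragged sequences plus cumulative sequence lengths, a
    layout often used by packed/no-padding attention kernels.
    """
    sequences = []
    cu_seqlens = [0]
    for ids, mask in zip(input_ids, attention_mask):
        kept = [tok for tok, m in zip(ids, mask) if m == 1]  # drop pad positions
        sequences.append(kept)
        cu_seqlens.append(cu_seqlens[-1] + len(kept))
    return sequences, cu_seqlens
```

Because only the mask decides what is kept, the same function handles left padding, right padding, or both in one row.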


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-13 08:18:09 +08:00
8cc9e3af67 [misc] feat: support offline generation with server mode (#3732) 2025-10-12 11:00:33 +08:00
f07596c02e [misc] feat: support build DataProto from TensorDict (#3726)
### What does this PR do?

Add a utility function to support building DataProto from TensorDict,
which helps integrate TransferQueue into verl.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-11 17:28:18 +08:00
656f4e6705 [rollout] chore: Misc changes for extending internal compatibility (#3701)
### What does this PR do?

* New config field:
    * rollout: `pipeline_model_parallel_size` for internal compatibility
* ~~legacy_data: `agent_name` for default agent name if not specified in
the rldataset~~
* Registry for `RolloutReplica`
* `VERL_USE_EXTERNAL_MODULES` to import desired modules to trigger
external registration
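
The registry plus import-triggered registration described above follows a common plugin pattern, sketched below. Everything here is hypothetical: the decorator and lookup names are invented for the example, and treating `VERL_USE_EXTERNAL_MODULES` as a comma-separated list of module paths is an assumption about its format, not something the PR text states.

```python
import importlib
import os

_REPLICA_REGISTRY = {}  # maps a name to a replica class


def register_replica(name):
    """Class decorator that records a replica implementation under `name`."""
    def decorator(cls):
        _REPLICA_REGISTRY[name] = cls
        return cls
    return decorator


def get_replica_class(name):
    """Look up a previously registered replica class."""
    return _REPLICA_REGISTRY[name]


def load_external_modules(env_var="VERL_USE_EXTERNAL_MODULES"):
    """Import every module named in the environment variable.

    Importing a module runs its top-level code, so any `@register_replica`
    decorators inside it fire as a side effect.
    """
    raw = os.environ.get(env_var, "")
    modules = [m.strip() for m in raw.split(",") if m.strip()]
    for module_name in modules:
        importlib.import_module(module_name)
    return modules
```

The benefit of this design is that external packages can add implementations without the core code importing them by name.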


### Test

Covered by CI.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-11 16:08:39 +08:00
d36d3b9cbe [rollout] feat: add default agent name for agent loop (#3716)
### What does this PR do?

Add `default_agent_loop` config if `agent_name` is absent in RLDataset.
2025-10-11 14:45:30 +08:00
e960fbaeab [rollout] feat: Add gpt-oss tool parser to enable agent loop training for gpt-oss models (#3705)
### What does this PR do?

Add gpt-oss tool parser to enable agent loop training for gpt-oss models
### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test
Manually tested offline. Let me know if we want to add unit tests.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Hejian Sang <hsang@linkedin.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-11 11:53:10 +08:00
d87602432c [fsdp] fix: Handle dict type for per_tensor_param in LoRA weight sync (#3712)
## Description

When `peft_config` is set and `base_sync_done` is `True`,
`per_tensor_param` is assigned directly from the `params` dict instead
of `params.items()`, causing `ValueError: too many values to unpack
(expected 2)` when passed to `get_named_tensor_buckets()` which expects
an iterator of `(name, tensor)` tuples.

This fix adds an `isinstance()` check to handle both dict and iterator
cases, maintaining backward compatibility while fixing SGLang rollout
with LoRA adapters.

**Fixes:** `ValueError` in `sglang_rollout.update_weights()` →
`get_named_tensor_buckets()`
**Related:** Multi-turn RL training with LoRA adapters on SGLang backend

---

### What does this PR do?

This PR fixes a type mismatch bug in `fsdp_workers.py` that occurs when
using LoRA adapters with SGLang backend. The issue manifests during
weight synchronization when FSDP workers attempt to pass parameters to
the bucket creation function.

**Root Cause:** Line 681 in `verl/workers/fsdp_workers.py` assigns
`params` dict directly to `per_tensor_param`, but downstream code at
line 1520 in `get_named_tensor_buckets()` expects an iterator of `(name,
tensor)` tuples for unpacking.
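
The failure is easy to reproduce outside verl. In the sketch below, `iter_named_tensors` is a hypothetical stand-in for any consumer that unpacks `(name, tensor)` pairs, and the parameter names and list values are placeholders rather than real tensors.

```python
def iter_named_tensors(named_params):
    """Stand-in for a consumer that unpacks (name, tensor) pairs."""
    return [(name, tensor) for name, tensor in named_params]


def as_named_pairs(params):
    """Accept either a dict or an iterable of pairs."""
    return params.items() if isinstance(params, dict) else params


params = {"lora_A.weight": [1.0], "lora_B.weight": [2.0]}

# Iterating a dict yields its keys, so each key string is unpacked
# character by character and the two-name unpacking fails.
try:
    iter_named_tensors(params)
    error = None
except ValueError as exc:
    error = str(exc)

# Converting to items() first gives the expected pairs.
pairs = iter_named_tensors(as_named_pairs(params))
```

Here `error` holds the "too many values to unpack" message, while `pairs` contains both entries with their names intact.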

**Solution:** Add backward-compatible `isinstance()` check that converts
dict to `.items()` iterator when needed:
```python
per_tensor_param = params.items() if isinstance(params, dict) else params
```
2025-10-10 21:58:30 +08:00
e01376663b [megatron] feat: add ascend megatron merge support (#3722)
### What does this PR do?

add ascend megatron merge support

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-10 21:54:27 +08:00
152ce6a1de [misc] fix: Allow HF model ID with use_shm (#3663) 2025-10-10 13:44:53 +08:00
2d72c52e1b [misc] fix: model reassign to inner model in vllm patch file (#3668)
### What does this PR do?

`model` was re-assigned to its inner model `model.model`, so it no longer
has a `layers` attribute. This PR fixes the re-assignment and refactors the
code logic.


f50e5c2e8f/verl/utils/vllm/patch.py (L83-L87)


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/issues/2834
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-10 12:13:49 +08:00
eb06fda2a9 [data] fix: merge metrics from all workers in DataProto.concat() (#3699)
## Summary
Fix `DataProto.concat()` to properly merge all `meta_info` keys from all
workers, preventing silent data loss when workers have different
non-metric keys.

## Problem
**Previous implementation** only preserved non-metric `meta_info` from
the first worker:
```python
# Old code - only looks at data[0]
merged_meta_info = {k: v for k, v in data[0].meta_info.items() if k != "metrics"}
```

This caused **silent data loss** when workers had different non-metric
keys:
- `data[0].meta_info = {"config": "A"}` ✓ preserved
- `data[1].meta_info = {"extra_key": "B"}` **lost**
- Result: `{"config": "A"}` - missing `extra_key`

This contradicts the docstring which states meta_info is "merged".

## Solution
**This PR** iterates through ALL workers to merge their non-metric
meta_info while aggregating metrics:
```python
# Merge non-metric meta_info and aggregate metrics from all workers
merged_meta_info = {}  # accumulates non-metric keys across workers
all_metrics = []
for d in data:
    for k, v in d.meta_info.items():
        if k == "metrics":
            if v is not None:
                if isinstance(v, list):
                    all_metrics.extend(v)
                else:
                    all_metrics.append(v)
        else:
            if k in merged_meta_info:
                # Ensure consistency for overlapping non-metric keys
                assert merged_meta_info[k] == v, f"Conflicting values for meta_info key '{k}'"
            else:
                merged_meta_info[k] = v

if all_metrics:
    merged_meta_info["metrics"] = all_metrics
```

**Key improvements**:
- All non-metric keys from all workers are preserved
- Detects conflicting values for the same key across workers
- Aggregates metrics from all workers in a single loop
- Handles edge cases: missing metrics, non-list values
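
The merge logic described above can be tried in isolation with a minimal sketch. It assumes plain dictionaries standing in for each worker's `DataProto.meta_info`; the function name `merge_meta_info` and the sample data are illustrative and are not part of `verl/protocol.py`.

```python
from typing import Any


def merge_meta_info(meta_infos: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge per-worker meta_info dicts (stand-ins for DataProto.meta_info)."""
    merged: dict[str, Any] = {}
    all_metrics: list[Any] = []
    for meta_info in meta_infos:
        for key, value in meta_info.items():
            if key == "metrics":
                if value is None:
                    continue
                # Accept either a list of metric dicts or a single dict.
                if isinstance(value, list):
                    all_metrics.extend(value)
                else:
                    all_metrics.append(value)
            elif key in merged:
                # Overlapping non-metric keys must agree across workers.
                assert merged[key] == value, f"Conflicting values for meta_info key '{key}'"
            else:
                merged[key] = value
    if all_metrics:
        merged["metrics"] = all_metrics
    return merged


# Hypothetical worker payloads: different non-metric keys, mixed metric shapes.
workers = [
    {"config": "A", "metrics": [{"loss": 0.5}]},
    {"extra_key": "B", "metrics": {"loss": 0.25}},
    {"config": "A"},
]
merged = merge_meta_info(workers)
```

Here `extra_key` from the second worker survives the merge, and the third worker, which carries no metrics at all, does not disturb the aggregated list.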

## Testing
Added 6 comprehensive unit tests in `tests/test_protocol_on_cpu.py`:
- `test_concat_metrics_from_multiple_workers` - All workers have metrics
- `test_concat_with_empty_and_non_list_meta_info` - Partial metrics
coverage
- `test_concat_first_worker_missing_metrics` - First worker has no
metrics
- `test_concat_non_list_metrics` - Single dict instead of list
- `test_concat_merge_different_non_metric_keys` - Different keys across
workers
- `test_concat_conflicting_non_metric_keys` - Conflict detection

## Files Changed
- `verl/protocol.py`: Updated `DataProto.concat()` to merge all
meta_info keys
- `tests/test_protocol_on_cpu.py`: Added 2 new tests (6 total) covering
all edge cases

---

### Checklist

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md)
- [x] Pre-commit checks passed (ruff, mypy, etc.)
- [x] Documentation updated (N/A - bug fix, no API changes)
- [x] Unit tests added (6 comprehensive tests covering all edge cases)
- [ ] CI request (pending)
2025-10-10 11:45:08 +08:00
7ffd413734 [megatron, model] fix: VLMs using mbridge together with fused kernels (#3700)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

The current check for fused forward support is too rigid: it fails when
mbridge is used, even for a `Qwen2_5VLModel`, because the Qwen2.5VL
multi-modal model class then comes from the class definition in mbridge
rather than the one in verl.

In addition, many other VLMs supported in mbridge use the `language_model`
attribute, so we only need to ensure that `model.language_model` is an
instance of the mcore-defined `GPTModel`, which is a more flexible and
widely applicable way to check for support.
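
A rough sketch of such an attribute-based check follows. The classes here are local stand-ins, not the real mcore `GPTModel` or any mbridge model, and the helper name `supports_fused_forward` is hypothetical. Treating a bare `GPTModel` as supported is also an assumption made for the example.

```python
class GPTModel:
    """Stand-in for the mcore GPTModel class (the real class is not imported here)."""


class BridgeStyleVLM:
    """Stand-in for a VLM wrapper that exposes its text backbone as `language_model`."""

    def __init__(self) -> None:
        self.language_model = GPTModel()


class UnrelatedModel:
    """A model with no GPTModel backbone."""


def supports_fused_forward(model: object) -> bool:
    # Assumption: a plain GPTModel is itself eligible.
    if isinstance(model, GPTModel):
        return True
    # Check the wrapped language model rather than the wrapper's concrete class,
    # so wrappers defined in other packages are still recognised.
    return isinstance(getattr(model, "language_model", None), GPTModel)


checks = [
    supports_fused_forward(GPTModel()),
    supports_fused_forward(BridgeStyleVLM()),
    supports_fused_forward(UnrelatedModel()),
]
```

Because the check never names the wrapper class, it does not matter which package defines the multi-modal model.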

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-10 11:05:32 +08:00
cf619d68d4 [recipe] fix: move all collabllm files into recipe directory (#3706)
### What does this PR do?

Resolves issue https://github.com/volcengine/verl/issues/3606

1. Move the reward manager into the custom_reward_function file and register it there
2. Register the agent loop in agent.yaml
3. Move collabllm_interation.py into recipe



### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test
```
(TaskRunner pid=52293) step:3 - global_seqlen/min:56551 - global_seqlen/max:94884 - global_seqlen/minmax_diff:38333 - global_seqlen/balanced_min:72054 -
```

### API and Usage Example

n/a

### Design & Code Changes

n/a

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-09 18:50:37 +08:00
23877bcc64 [worker] fix: create a new event loop if none exists (#3703)
### What does this PR do?

I am working on integrating transferqueue into verl. Specifically, we
convert metadata into a `DataProto` in the `register` method of
`single_controller/base/decorator.py`. In this step,
`asyncio.run(tq_client.async_get_data(metadata))` is called to fetch the
data.

If `asyncio.run` and `asyncio.get_event_loop` are called sequentially in
the same thread, a `RuntimeError` (`There is no current event loop in
thread %r`) is raised.

This PR fixes the above-mentioned issue.
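
A minimal sketch of the "create a new event loop if none exists" pattern is shown below. The helper name `get_or_create_event_loop` and the coroutine `fetch_value` are illustrative; this is not the code changed in the PR.

```python
import asyncio


def get_or_create_event_loop() -> asyncio.AbstractEventLoop:
    """Return the thread's current event loop, creating one if none is set."""
    try:
        loop = asyncio.get_event_loop()
    except RuntimeError:
        # Raised when no loop is set, e.g. after asyncio.run() has cleaned up.
        loop = None
    if loop is None or loop.is_closed():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    return loop


async def fetch_value() -> int:
    await asyncio.sleep(0)
    return 42


# asyncio.run() unsets the current loop when it finishes ...
first = asyncio.run(fetch_value())
# ... so a later lookup in the same thread needs the fallback above.
loop = get_or_create_event_loop()
second = loop.run_until_complete(fetch_value())
loop.close()
```

The closed-loop check means the helper can be called again after `loop.close()` and will still hand back a usable loop.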

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-09 17:11:58 +08:00
e56e3df071 [worker] refactor: Add kwargs to checkpoint related functions in BaseEngine and its subclasses (#3662)
### What does this PR do?

Add `**kwargs` to the checkpoint APIs of `BaseEngine` (and thread them
through `FSDPEngine`/`MegatronEngine`) to allow engines and pluggable
checkpoint backends to accept implementation-specific options without
changing the common interface. This enables extension when users
subclass `BaseEngine` or integrate internal engines, while preserving
backward compatibility—existing calls remain unchanged and extra keys
are simply ignored unless a subclass consumes them.
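
The idea can be illustrated with a small sketch. The classes `SketchBaseEngine` and `SketchAsyncEngine`, the method signature, and the `async_save` option are all hypothetical and do not reproduce the actual `BaseEngine` interface in verl.

```python
from typing import Any


class SketchBaseEngine:
    """Hypothetical engine whose checkpoint API tolerates extra options."""

    def save_checkpoint(self, local_path: str, **kwargs: Any) -> dict[str, Any]:
        # The base implementation does not consume any extra keys.
        return {"path": local_path}


class SketchAsyncEngine(SketchBaseEngine):
    """Hypothetical subclass that consumes one backend-specific option."""

    def save_checkpoint(self, local_path: str, **kwargs: Any) -> dict[str, Any]:
        record = super().save_checkpoint(local_path, **kwargs)
        # `async_save` is an invented option used only for illustration.
        record["async_save"] = bool(kwargs.get("async_save", False))
        return record


base_record = SketchBaseEngine().save_checkpoint("/tmp/ckpt", async_save=True)
async_record = SketchAsyncEngine().save_checkpoint("/tmp/ckpt", async_save=True)
```

Callers that pass no extra keys see the same behaviour as before, which is what keeps the change backward compatible.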


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
Co-authored-by: wuxibin <wuxibin@bytedance.com>
2025-10-09 14:56:22 +08:00
54fed7fec7 [rollout] feat: support async mode for multimodal data inference (#3702)
### What does this PR do?


fix https://github.com/volcengine/verl/issues/3518

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-09 14:11:09 +08:00
f06ef09f1c [rollout] fix: Add LoRA datatype based on rollout model type to the LoRA config (#3675)
### What does this PR do?

> Bug fix for #3654

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-09 11:48:32 +08:00
fc489dbaef [rollout] fix: add batch_data_id default value check in AsyncRolloutRequest (#3657)
### What does this PR do?

This PR improves the robustness of the `initialize_request` method in
`verl/workers/rollout/schemas.py`. When `input_ids` exceeds `max_prompt_len`
and the `batch_data_id` field is missing from `values`, the field is now
automatically populated with its default value. This prevents errors
during logging and improves fault tolerance in data processing, making
future extension and troubleshooting more convenient.
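
The defaulting behaviour can be sketched as follows. The function name `check_prompt_length`, the default value `0`, and the log message are assumptions made for the example; only the field names come from the description above.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed default; the real default lives in the request schema.
DEFAULT_BATCH_DATA_ID = 0


def check_prompt_length(values: dict, max_prompt_len: int) -> dict:
    """Fill in batch_data_id before logging an over-length prompt."""
    input_ids = values.get("input_ids", [])
    if len(input_ids) > max_prompt_len:
        # Without this default, the log call below would raise KeyError.
        values.setdefault("batch_data_id", DEFAULT_BATCH_DATA_ID)
        logger.warning(
            "Prompt for batch_data_id=%s has %d tokens, exceeding max_prompt_len=%d",
            values["batch_data_id"],
            len(input_ids),
            max_prompt_len,
        )
    return values


patched = check_prompt_length({"input_ids": [1, 2, 3, 4]}, max_prompt_len=2)
untouched = check_prompt_length({"input_ids": [1]}, max_prompt_len=2)
```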

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: yaopandeng <yaopandeng@baidu.com>
2025-10-09 10:56:10 +08:00
d45d04946b [rollout,sglang] fix: get_tool_call_parser_type for gpt-oss models in sglang rollout (#3661)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

The problem is in the `get_tool_call_parser_type` function in
`sglang_rollout.py` (lines 225-246). The function checks whether
`parser.bot_token.strip()` exists as a single token in the tokenizer's
vocabulary, but for the gpt-oss parser type the `bot_token` is
`<|start|>assistant<|channel|>commentary`, which is a compound token
sequence rather than a single special token. For gpt-oss models:
```
parser.bot_token.strip() = <|start|>assistant<|channel|>commentary
This gets tokenized as [200006, 173781, 200005, 12606, 815] (5 tokens)
```
The check `parser.bot_token.strip() in tokenizer_vocab` therefore returns
`False`, because it looks for the entire string as a single vocabulary
entry. The current logic assumes that `bot_token` is a single special
token present in the vocabulary, but for gpt-oss models it is a sequence
of tokens that must be tokenized.
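
The mismatch can be reproduced with a toy vocabulary. The sketch below shows one possible way to handle compound `bot_token` strings, by checking each embedded special token separately; it is not necessarily the approach taken in this PR, and the function name and the toy vocabulary are illustrative.

```python
import re

# Matches special tokens written as <|name|>.
SPECIAL_TOKEN = re.compile(r"<\|[^|]+\|>")


def bot_token_supported(bot_token: str, vocab: set[str]) -> bool:
    """Return True if bot_token is in vocab, or is built from special tokens that are."""
    token = bot_token.strip()
    if token in vocab:
        # The single-token case that the original check already handled.
        return True
    # Compound case: every embedded special token must be known to the tokenizer.
    pieces = SPECIAL_TOKEN.findall(token)
    return bool(pieces) and all(piece in vocab for piece in pieces)


# Toy vocabulary standing in for a tokenizer's vocabulary.
toy_vocab = {"<|start|>", "<|channel|>", "<tool_call>"}
compound = "<|start|>assistant<|channel|>commentary"

naive_result = compound in toy_vocab
relaxed_result = bot_token_supported(compound, toy_vocab)
```

With this vocabulary the plain membership test fails for the compound string while the relaxed check accepts it, and single-token parsers such as `<tool_call>` behave as before.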

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test
unit test offline
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Hejian Sang <hsang@linkedin.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-09 10:51:37 +08:00
baf7506cff [worker] fix: support for vllm V0 deprecation version (#3687)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

Related to:
- https://github.com/vllm-project/vllm/pull/25901
- https://github.com/vllm-project/vllm/pull/25345

We now first try to import `WorkerWrapperBase` from
`vllm.worker.worker_base`; if that import fails, we fall back to the
`v1` variant of that path.

For the `compute_logits` patch, we remove the import of
`SamplingMetadata` and create a wrapper that accepts any arguments via
`*args, **kwargs` and passes them through to the original method, making
the patch more flexible and future-proof.
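
Both techniques can be sketched generically without importing vLLM. The helpers `import_first_available` and `passthrough_patch` are hypothetical names, and the standard-library modules used below merely stand in for the old and new vLLM module paths.

```python
import importlib
from functools import wraps
from typing import Any, Callable


def import_first_available(module_names: list[str], attr: str) -> Any:
    """Return `attr` from the first module in `module_names` that provides it."""
    last_error: Exception | None = None
    for name in module_names:
        try:
            return getattr(importlib.import_module(name), attr)
        except (ImportError, AttributeError) as exc:
            last_error = exc
    raise ImportError(f"{attr} not found in any of {module_names}") from last_error


def passthrough_patch(original: Callable, post_process: Callable) -> Callable:
    """Wrap `original` without depending on its exact signature."""

    @wraps(original)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Forward whatever the caller supplies, then adjust only the result.
        return post_process(original(*args, **kwargs))

    return wrapper


# Stand-ins: the first module does not exist, so the second one is used.
dumps = import_first_available(["no_such_module_for_demo", "json"], "dumps")


def compute(x: int, scale: int = 1) -> int:
    return x * scale


patched_compute = passthrough_patch(compute, lambda result: result + 1)
```

Since the wrapper never names the wrapped function's parameters, a later change to that signature does not require touching the patch.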

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-09 10:44:31 +08:00
798a6f8ba0 [trainer] feat: Enabled fused adamw (#3692)
### What does this PR do?

Enable fused AdamW, which should generally be faster and more memory
efficient.
Also add a config parameter for setting `eps`.
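
As a rough illustration, the optimizer options could be gathered as below. The `AdamWSettings` dataclass and its default values are hypothetical and do not reproduce verl's config schema; `lr`, `eps`, `weight_decay` and `fused` are, however, keyword arguments accepted by `torch.optim.AdamW`.

```python
from dataclasses import asdict, dataclass


@dataclass
class AdamWSettings:
    """Hypothetical container for the optimizer options discussed above."""

    lr: float = 1e-5
    eps: float = 1e-8
    weight_decay: float = 0.01
    fused: bool = True  # request the fused implementation

    def to_kwargs(self) -> dict:
        return asdict(self)


optimizer_kwargs = AdamWSettings(eps=1e-6).to_kwargs()

# With PyTorch installed, these would be passed along as:
#     torch.optim.AdamW(model.parameters(), **optimizer_kwargs)
```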

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

In our internal code base, we always set fused AdamW to True and it works
fine. However, I don't currently have a step-time comparison with and
without it.
I can also push the same change to the RL code, where setting it to
True should be generally beneficial.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-08 08:13:46 +13:00
ab10eb2671 [model] fix: qwen3vl patch (#3686) 2025-10-07 08:32:53 +13:00
7904d0b672 [ci] fix: fix checkpoint converter ci (#3685)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-06 19:42:47 +13:00
1216ce4599 [ci] fix: merge pre-commit-full into pre-commit (#3684)
### What does this PR do?

Right now, the pre-commit-full workflow always fails because it doesn't
install the related dependencies:
https://github.com/volcengine/verl/actions/runs/18251414892

There is also no reason to duplicate pre-commit as pre-commit-full, since
both define the same workflow. This PR therefore moves the manual
triggers and the schedule to pre-commit and removes pre-commit-full.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-06 15:56:11 +13:00
42c55ac6b3 [model] feat: add qwen3vl (#3681)
### What does this PR do?

Add qwen3vl models
Fixes #3607

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-06 15:21:19 +13:00
327e813136 [rollout] fix: qwen2_vl position_ids shape mismatch (#3653)
### What does this PR do?

Fix a qwen2_vl `position_ids` shape mismatch:
`verl/models/transformers/qwen2_vl.py:process_position_ids` expects
`position_ids` to have a shape of `(4, batch_size, seq_length)`, but
`verl/experimental/agent_loop/agent_loop.py:generate_sequences` returns
`(batch_size, 3, seq_length)` (which is transposed to `(3,
batch_size, seq_length)`), omitting the text dimension. This PR follows
the relevant code in `verl/utils/dataset/rl_dataset.py` to fix the
issue.
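
The shape fix described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: NumPy arrays stand in for torch tensors, the helper name `add_text_position_ids` is hypothetical, and filling padded slots with the placeholder value 1 is an assumption.

```python
import numpy as np


def add_text_position_ids(vision_position_ids, attention_mask):
    """Prepend a text position channel to 3-channel position ids.

    vision_position_ids: (batch_size, 3, seq_length) integer array
    attention_mask:      (batch_size, seq_length), 1 = real token, 0 = padding
    returns:             (batch_size, 4, seq_length)
    """
    mask = attention_mask.astype(bool)
    # Padded slots keep a placeholder value of 1 (assumption of this sketch).
    text_position_ids = np.ones_like(attention_mask, dtype=np.int64)
    # Valid tokens are numbered 0, 1, 2, ... within each row.
    counts = np.cumsum(mask, axis=1) - 1
    text_position_ids[mask] = counts[mask]
    # Insert the text channel in front of the three existing channels.
    return np.concatenate(
        [text_position_ids[:, None, :], vision_position_ids], axis=1
    )


vision = np.zeros((2, 3, 5), dtype=np.int64)
mask = np.array([[0, 0, 1, 1, 1], [1, 1, 1, 1, 0]])
position_ids = add_text_position_ids(vision, mask)    # (2, 4, 5)
channel_first = np.transpose(position_ids, (1, 0, 2))  # (4, 2, 5)
```

After the transpose, the leading dimension is 4, which is the layout the description says `process_position_ids` expects.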

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs). Not
applicable.
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-05 16:03:12 +08:00
83aebcc133 [ci] fix: disable workflows with self-host machines to run on fork (#3677) 2025-10-04 22:02:41 +13:00
4e9faafc94 [model] fix: stuck issue with mixed text-image data (#3670) 2025-10-04 12:47:09 +13:00
f50e5c2e8f [sglang] feat: add preparation for sglang+verl (#3506)
### What does this PR do?
Support NPU for verl + SGLang.

```bash
bash examples/grpo_trainer/run_qwen3_8b_grpo_sglang_1k_npu.sh
```


### Accuracy test
8b:
<img width="747" height="842" alt="8b"
src="https://github.com/user-attachments/assets/f36ef25a-b32f-4c76-97d0-2e5fe53ff183"
/>

30b:
<img width="759" height="850" alt="30b"
src="https://github.com/user-attachments/assets/97979002-7ebf-47fa-ae57-3e9b6637f12c"
/>

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Signed-off-by: lbk-sys <hello_lbk@163.com>
Co-authored-by: 1StepForever <wangww1Step@foxmail.com>
2025-09-29 10:21:01 +08:00
aa19c1afc4 [recipe] feat: add multiturn scripts for vllm backend; fix progess bar in dapo (#3644)
### What does this PR do?

- Add an example script to run multi-turn GRPO with vLLM and FSDP.
- Fix the progress bar in the DAPO trainer.
  - When `enable_filter` is enabled, DAPO runs multiple batch inferences
before each actor update, but the progress bar advanced once per
inference, so it did not match the true training step count and caused
confusion.
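
The intended progress bar behavior can be sketched in plain Python. This is a simplified illustration, not the DAPO trainer code: `StepCounter`, `run_with_filter`, the `"keep"` flag, and discarding surplus samples after an update are all assumptions made for the example.

```python
class StepCounter:
    """Hypothetical stand-in for a progress bar: it only counts update() calls."""

    def __init__(self, total):
        self.total = total
        self.n = 0

    def update(self, k=1):
        self.n += k


def run_with_filter(generation_batches, train_batch_size, total_steps):
    progress = StepCounter(total_steps)
    kept = []            # samples that survived filtering so far
    num_gen_batches = 0  # inference rounds; may exceed the training steps

    for batch in generation_batches:
        num_gen_batches += 1
        kept.extend(sample for sample in batch if sample["keep"])
        if len(kept) < train_batch_size:
            # Not enough samples yet: generate again without
            # advancing the progress bar.
            continue
        # (The actor update would happen here.)
        kept = []
        progress.update(1)  # advance once per training step only
        if progress.n >= total_steps:
            break
    return progress.n, num_gen_batches


batch = [{"keep": True}, {"keep": False}, {"keep": True}, {"keep": False}]
steps, rounds = run_with_filter([batch] * 6, train_batch_size=4, total_steps=10)
```

With half of each batch filtered out, six inference rounds yield three training steps, and the counter reports three.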

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-28 20:28:25 +08:00
9e2072d120 [megatron, training_utils] fix: encoder pp is removed in mcore >= 0.14 (#3640)
### What does this PR do?

The implementation follows
b600e38d7b (diff-ff40c5dfa6c8106a478517375d98bc4e548ff71bcc3e5b25a4c1cc540f31ed3a)

Use `hasattr(parallel_state, "is_inside_encoder")` for backward
compatibility.
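
The `hasattr` check can be sketched as below. Real Megatron-Core is not imported here: `SimpleNamespace` objects stand in for the `parallel_state` module, the wrapper name `is_inside_encoder_compat` is hypothetical, and returning `False` when the attribute is missing is an assumption of this sketch.

```python
from types import SimpleNamespace


def is_inside_encoder_compat(parallel_state):
    """Call is_inside_encoder() only if this parallel_state still has it."""
    if hasattr(parallel_state, "is_inside_encoder"):
        return parallel_state.is_inside_encoder()
    # Attribute is gone (encoder pipeline parallelism removed):
    # treat the rank as not being inside an encoder (assumption).
    return False


# Stand-ins for an older and a newer parallel_state module.
old_parallel_state = SimpleNamespace(is_inside_encoder=lambda: True)
new_parallel_state = SimpleNamespace()

old_result = is_inside_encoder_compat(old_parallel_state)
new_result = is_inside_encoder_compat(new_parallel_state)
```

Feature detection of this kind avoids parsing version strings and keeps both code paths working.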

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-09-28 12:59:32 +08:00
39e531f29e [rollout,vllm] fix: Add LoRA Loading to Async vLLM (#3639)
### What does this PR do?

Currently, async vLLM with AgentWorkerLoop throws an error when
`update_weights` is called with LoRA weights. This PR expands
AgentWorkerLoop support to cover LoRAs.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-28 10:13:40 +08:00
abca659ec7 [megatron, worker] fix: use extract_multi_modal_inputs method for handling multi_modal_inputs (#3641)
Follow up for #3553

### What does this PR do?

Without the changes from #3315, the error below, which occurs when
training on a mixed-modality dataset, remains unresolved, so it makes
sense to add them back.

```logs
File "verl/workers/actor/megatron_actor.py", line 639, in update_policy
  metric_micro_batch = self.forward_backward_batch(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "verl/workers/actor/megatron_actor.py", line 587, in forward_backward_batch
  losses_reduced = forward_backward_func(
                   ^^^^^^^^^^^^^^^^^^^^^^
File "megatron/core/pipeline_parallel/schedules.py", line 595, in forward_backward_no_pipelining
  output_tensor, num_tokens = forward_step(
                              ^^^^^^^^^^^^^
File "megatron/core/pipeline_parallel/schedules.py", line 402, in forward_step
  output_tensor, loss_func = forward_step_func(data_iterator, model)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "verl/workers/actor/megatron_actor.py", line 497, in forward_step
  multi_modal_inputs[key] = torch.cat(
                            ^^^^^^^^^^
RuntimeError: torch.cat(): expected a non-empty list of Tensors
```
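
One way such an empty-list concatenation can be avoided is sketched below. It assumes the failure arises when some samples are text-only and contribute no tensors for a key. This is not verl's `extract_multi_modal_inputs`: the helper `merge_multi_modal_inputs` is hypothetical, and NumPy stands in for torch (`np.concatenate` also raises on an empty list).

```python
import numpy as np


def merge_multi_modal_inputs(samples):
    """Merge per-sample multi-modal inputs, skipping text-only samples.

    samples: list of dicts mapping key -> array; a text-only sample
             is an empty dict.
    """
    # Collect keys from every sample rather than trusting the first one.
    keys = set()
    for sample in samples:
        keys.update(sample.keys())

    merged = {}
    for key in sorted(keys):
        arrays = [sample[key] for sample in samples if key in sample]
        # Concatenate only when at least one sample provides this key.
        if arrays:
            merged[key] = np.concatenate(arrays, axis=0)
    return merged


mixed_batch = [
    {"pixel_values": np.ones((2, 4))},
    {},  # text-only sample
    {"pixel_values": np.zeros((3, 4))},
]
merged = merge_multi_modal_inputs(mixed_batch)
text_only = merge_multi_modal_inputs([{}, {}])
```

A batch made up entirely of text-only samples then produces an empty dict instead of an exception.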

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-09-28 10:08:51 +08:00
4ff3ce2fed [algo, perf] feat: Vectorize GRPO Advantage Estimator - 13~26x Speedup (#3635)
### What does this PR do?  

Implements a vectorized GRPO advantage path for outcome-only RL in
core_algos.py, keeping the original implementation intact and
selectable. This yields large speedups at medium–large batch sizes by
replacing Python-side grouping loops with segment reductions and
one-pass gathers.


Results (CPU, Apple M-series example; float32):
```shell
[CPU] bs=  512 T= 512 G= 10 | orig=5.47ms vec=0.21ms speedup=26.16x
[CPU] bs= 1024 T=1024 G= 16 | orig=11.05ms vec=0.54ms speedup=20.60x
[CPU] bs= 2048 T=2048 G= 32 | orig=23.20ms vec=1.74ms speedup=13.32x
```

```shell
[GRPO] seed=0 groups=5 shape=torch.Size([64, 128]) mask_tokens=4147 adv_max_diff=2.384e-07 ret_max_diff=2.384e-07
[GRPO] seed=1 groups=8 shape=torch.Size([128, 256]) mask_tokens=16364 adv_max_diff=2.384e-07 ret_max_diff=2.384e-07
[GRPO] seed=2 groups=10 shape=torch.Size([512, 512]) mask_tokens=130968 adv_max_diff=4.768e-07 ret_max_diff=4.768e-07
```
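
The grouping idea can be illustrated with a minimal sketch. It is not the code in `core_algos.py`: NumPy stands in for torch, `np.unique` and `np.bincount` play the role of the segment reductions, and the function name, the `eps` default, and the use of the unbiased standard deviation are assumptions. In this sketch, groups with a single response get zero advantage.

```python
import numpy as np


def grpo_advantages(token_level_rewards, response_mask, group_ids, eps=1e-6):
    """Outcome-only, group-normalized advantages without a Python loop.

    token_level_rewards, response_mask: (batch_size, response_length)
    group_ids: (batch_size,) ids; responses to one prompt share an id
    """
    # One scalar outcome score per response.
    scores = (token_level_rewards * response_mask).sum(axis=1)

    # Map arbitrary group ids to dense indices 0..num_groups-1.
    _, inverse = np.unique(np.asarray(group_ids).ravel(), return_inverse=True)
    num_groups = inverse.max() + 1

    # Segment reductions: per-group count, mean, and squared deviation.
    counts = np.bincount(inverse, minlength=num_groups)
    means = np.bincount(inverse, weights=scores, minlength=num_groups) / counts
    sq_dev = np.bincount(
        inverse, weights=(scores - means[inverse]) ** 2, minlength=num_groups
    )
    # Unbiased standard deviation; the maximum avoids division by zero.
    stds = np.sqrt(sq_dev / np.maximum(counts - 1, 1))

    # One gather back to per-response values, then broadcast over tokens.
    normalized = (scores - means[inverse]) / (stds[inverse] + eps)
    return normalized[:, None] * response_mask


rewards = np.zeros((4, 3))
rewards[:, -1] = [1.0, 0.0, 2.0, 4.0]
mask = np.ones((4, 3))
advantages = grpo_advantages(rewards, mask, np.array([7, 7, 9, 9]))
```

The per-group statistics are computed once and gathered back with `inverse`, which replaces the per-group Python loop.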


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: #3634
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-27 17:21:08 +08:00
c03dcb0f8f [model] feat: add glm4v (#3291)
### What does this PR do?

Add GLM4.1V support

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: 武嘉涵 <lambert@wujiahandeMacBook-Pro.local>
Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-09-27 04:12:14 +08:00
84d5619f99 [2/N][rollout] feat: support vllm/sglang DP+EP in server mode (#3530)
### What does this PR do?

Following https://github.com/volcengine/verl/pull/3456, support
vllm/sglang DP+EP in server mode.
2025-09-26 21:52:03 +08:00
64a9860be2 [trainer] fix: Ref to #3596. More import fix for transformers version higher than 4.55.0 (#3608)
### What does this PR do?

See #3596. This PR fixes more imports for transformers versions higher
than 4.55.0.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-26 21:37:46 +08:00
e51305883d [rollout] refactor: Update rollout and reward configs to reuse vllm/sglang replicas (#3625)
### What does this PR do?

To enable reusing the vllm/sglang rollout replica for the reward model,
this PR modifies the rollout and reward configuration.

A follow-up PR will implement the reuse.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-26 17:43:45 +08:00
2234810235 [megatron] feat: add mindspeed engine and support sft (#3599)
### What does this PR do?

As per title.

Co-authored with @baymax591 

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: baymax591 <cbai@mail.nwpu.edu.cn>
2025-09-26 14:39:10 +08:00
377bbb84f0 [recipe] fix: Fix a Typo in One_Step_Off_Policy and Add async of Generative Reward Model in Response Generation (#3369)
Fix a typo in `verl/workers/fsdp_workers.py`:
    original code: if self.model_config.generation_config is not None
    updated code: if self.generation_config is not None

Add async evaluation for the generative reward model (GRM):
Calls to the generative reward model are slow, so it is inefficient to
wait for all responses to be generated before sending them to the GRM
for evaluation. This PR adds an async path that starts GRM evaluation as
soon as each individual response has been generated.
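
A minimal sketch of the idea, assuming an `asyncio`-based pipeline. The
functions `generate_response` and `grm_score` are hypothetical stand-ins
for the rollout and GRM calls, not the code in this PR. Each response is
scored as soon as it is ready instead of waiting for the whole batch:

```python
import asyncio


async def generate_response(prompt: str, delay: float) -> str:
    """Hypothetical stand-in for rollout generation."""
    await asyncio.sleep(delay)
    return f"response to {prompt}"


async def grm_score(response: str) -> float:
    """Hypothetical stand-in for a slow generative reward model call."""
    await asyncio.sleep(0.01)
    return float(len(response))


async def generate_then_score(prompt: str, delay: float) -> tuple[str, float]:
    # Scoring starts as soon as this response is ready,
    # without waiting for the other generations to finish.
    response = await generate_response(prompt, delay)
    return response, await grm_score(response)


async def run_batch(prompts: list[str]) -> list[tuple[str, float]]:
    # Staggered delays mimic responses finishing at different times.
    tasks = [generate_then_score(p, 0.01 * i) for i, p in enumerate(prompts)]
    # gather() runs the per-prompt pipelines concurrently and keeps input order.
    return await asyncio.gather(*tasks)


results = asyncio.run(run_batch(["a", "b", "c"]))
```

Because each prompt gets its own generate-then-score coroutine, a slow
generation only delays its own score.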

---------

Co-authored-by: zhichao (jimmy) <zhichao@inflection.ai>
2025-09-26 13:22:00 +08:00
096ab6dc1b [CI] fix: changed the model used in the PPO test case to Qwen2.5-0.5B to avoid the huggingface download error (#3631)
### What does this PR do?

As per title.

This PR is a temporary workaround for the following CI failure:

https://github.com/volcengine/verl/actions/runs/18013408026/job/51251922982?pr=3625

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-26 13:20:40 +08:00
231e18948d [tool] feat: support load local datasets when preparing datasets (#3621)
### What does this PR do?

This is a follow-up PR to https://github.com/volcengine/verl/pull/3362

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```bash
python examples/data_preprocess/hellaswag.py --local_dataset_path ~/verl/data/hellaswag/ --local_save_dir ~/verl/data/hellaswag_sft
```
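
A minimal sketch of how a preprocessing script might choose between a
local copy and a remote dataset. The flags `--local_dataset_path` and
`--local_save_dir` come from the command above. The helper names
`build_parser` and `resolve_dataset_source` and the `org/dataset`
identifier are hypothetical and not taken from this PR:

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Optional path to a dataset that is already on disk.
    parser.add_argument("--local_dataset_path", default=None)
    # Where the processed files should be written.
    parser.add_argument("--local_save_dir", default=None)
    return parser


def resolve_dataset_source(args: argparse.Namespace, hub_name: str) -> str:
    """Prefer the local dataset path when given, else fall back to the hub name."""
    if args.local_dataset_path is not None:
        return os.path.expanduser(args.local_dataset_path)
    return hub_name


local_args = build_parser().parse_args(["--local_dataset_path", "/tmp/hellaswag"])
remote_args = build_parser().parse_args([])
local_source = resolve_dataset_source(local_args, "org/dataset")
remote_source = resolve_dataset_source(remote_args, "org/dataset")
```

The resolved string would then be passed to whatever loader the script
uses.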

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-26 11:42:53 +08:00
fbfdc81f9a [ci] feat: increase timeout of e2e_sft (#3630)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-26 10:23:25 +08:00
6ff2b43d13 [ci] feat: upgrade sglang to 0.5.2 (#3613)
### What does this PR do?

Resolves the issue described in
https://github.com/volcengine/verl/pull/3530#issuecomment-3332840437
2025-09-26 09:25:53 +08:00
14c397f474 [doc] feat: Adding Table-R1 to the Awesome work (#3627) 2025-09-25 23:26:26 +08:00
21536f2b03 [ci] fix: fix sanity ci (#3626)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-25 23:15:10 +08:00
515f2255ac [ci] fix: use local models/configs/datasets to increase stability (#3616)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-25 22:14:56 +08:00
bf7aac2fa7 [rollout, tool] feat: export rollout rewards to total rewards (#3563)
### What does this PR do?

This PR exports rollout rewards, including tool-calling rewards and
interaction rewards, to the `compute_score` function.

Currently, the rollout `reward_scores` are calculated but not used in the
final `compute_score`.

96e7071de1/verl/workers/rollout/sglang_rollout/sglang_rollout.py (L1320-L1324)
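
A minimal sketch of how a custom scoring function could consume such
rewards. The `extra_info` key `rollout_reward_scores`, the exact-match
base score, and the additive combination are all assumptions made for
illustration and may not match what this PR implements:

```python
def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Hypothetical scorer: exact-match base score plus rollout rewards."""
    base = 1.0 if solution_str.strip() == ground_truth.strip() else 0.0
    # Assumed layout: a dict of per-source rewards collected during rollout,
    # e.g. {"tool": 0.5, "interaction": 0.25}. Missing means no extra reward.
    rollout_scores = (extra_info or {}).get("rollout_reward_scores", {})
    bonus = sum(float(v) for v in rollout_scores.values())
    return base + bonus


with_rollout = compute_score(
    "demo", "4", "4",
    extra_info={"rollout_reward_scores": {"tool": 0.5, "interaction": 0.25}},
)
without_rollout = compute_score("demo", "5", "4")
```

How to weight rollout rewards against the final answer score is a
design choice left to the user-defined function.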

Fix #3525 

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-25 17:33:03 +08:00
616e933e29 [worker] fix: correctly determine is_vlm_model if sp > 1 (#3282)
### What does this PR do?

Addresses the second issue in
https://github.com/volcengine/verl/pull/3281#issuecomment-3239570745

Currently, when Ulysses sequence parallelism (sp) is used, we rely on
`multi_modal_inputs` to check whether the model is multi-modal. This
check fails when `data.return_multi_modal_inputs=False` is set, because
that field won't exist even if the model is a VLM.

A more reliable check is whether the `vision_config` field is present in
`self.actor_module.config`, as is done in
1985eb14ff/verl/workers/fsdp_workers.py (L317-L320)
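
A minimal sketch of the config-based check described above. The helper
name `is_vlm_model` follows the PR title, but the function body and the
`SimpleNamespace` stand-ins for model configs are illustrative
assumptions, not the code in this PR:

```python
from types import SimpleNamespace


def is_vlm_model(model_config) -> bool:
    """Return True if the config declares a vision component.

    The result depends only on the model config, so it is the same
    whether or not the batch carries `multi_modal_inputs`.
    """
    return hasattr(model_config, "vision_config")


# Stand-in configs for illustration only.
text_only_config = SimpleNamespace(hidden_size=1024)
vlm_config = SimpleNamespace(hidden_size=1024, vision_config=SimpleNamespace(depth=32))
```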

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-09-25 17:21:40 +08:00
90154aeeb6 [doc] fix: fix doc (#3614)
### What does this PR do?

- Fix url

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-25 16:11:43 +08:00
7731c5c6ec [rollout] fix: remove code responsible for tool response duplication (#3604)
### What does this PR do?

> The `_handle_processing_tools_state` added the same tool response
twice when using interactions. See
[here](ba8555120a/verl/experimental/agent_loop/tool_agent_loop.py (L273))
and
[here](ba8555120a/verl/experimental/agent_loop/tool_agent_loop.py (L297)).

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-25 16:10:36 +08:00
4d0999c161 [ci] chore: Use local dataset and models in e2e_ascend CI (#3601)
### What does this PR do?

Use local dataset and models in e2e_ascend CI.

Download the datasets locally with the following commands:
```shell
huggingface-cli download --repo-type dataset openai/gsm8k --local-dir ${HOME}/dataset/openai/gsm8k
huggingface-cli download --repo-type dataset hiyouga/geometry3k --local-dir ${HOME}/dataset/hiyouga/geometry3k
```

Download the models locally with the following commands:
```shell
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B-Instruct
huggingface-cli download Qwen/Qwen2.5-VL-3B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct
huggingface-cli download Qwen/Qwen3-0.6B --local-dir ${HOME}/models/Qwen/Qwen3-0.6B
```

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not related.

### API and Usage Example

Not related.

### Design & Code Changes

Not related.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-25 15:14:45 +08:00
3dfa28ae32 [doc] feat: add model engine doc (#3611)
### What does this PR do?

- Add model engine doc

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-25 14:25:44 +08:00
25d78fa913 [recipe] feat: CollabLLM integration for multiturn training (#3574)
### What does this PR do?

This PR adds [CollabLLM](https://aka.ms/CollabLLM) as a training recipe.
The added components include:
- A customized `CollabLLMRewardManager` inheriting from
`AbstractRewardManager` to compute multiturn-aware rewards.
- A customized `CollabLLMAgentLoop` inheriting from `AgentLoop` to
sample future conversations with simulated users, which imports
`CollabLLMInteraction` from `verl/interactions/collabllm_interation.py`.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

The training reward when running `train_rl_collabllm.sh` increases
in a relatively stable manner (on 8xH200):
<img width="964" height="480" alt="9baeb0700e3fa6a56596e14a54bc1049"
src="https://github.com/user-attachments/assets/53a810d8-1dd7-4145-bb28-4e475e9d7d9d"
/>

Validation reward:
<img width="974" height="538" alt="39364fd10523b0fde13d48645809f5e3"
src="https://github.com/user-attachments/assets/c34fe9e7-3d83-4132-8e1a-67e82c221d09"
/>

#### Samples of model generation
After training, when the user asks generic questions with missing
information, the model learns to ask for clarification
<img width="1213" height="562" alt="c8e0ab31948a48ca396c7eccddd13673"
src="https://github.com/user-attachments/assets/ae41cd77-3c77-4402-b9d3-21993b046a18"
/>
and give suggestions:
<img width="1534" height="190" alt="7adb7d33eb9120d337c2a249c6a2dd22"
src="https://github.com/user-attachments/assets/84e1d8c1-f954-403f-b931-bce45cff1612"
/>

(In contrast, with the same prompt, **GPT-5** doesn't ask for any
clarification:)
<img width="1754" height="1126" alt="be8d8577584c0b2356cb352d6f294205"
src="https://github.com/user-attachments/assets/9b734848-9ed0-4496-af11-68bb8f8d8e08"
/>


### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# No change on the existing APIs
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

Changes:
- Main files under `recipe/collabllm`
- Registered `CollabLLMRewardManager` in
`workers/reward_manager/collabllm.py`
- Added `CollabLLMInteraction` in
`verl/interactions/collabllm_interation.py`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs). Added
to `verl/docs/algo/collabllm.md`.
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: The scripts
`train_rl_collabllm.sh` and `train_sft_collabllm.sh` are tested multiple
times.
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Chen Haiquan <chenhaiquan@bytedance.com>
2025-09-25 09:53:39 +08:00
ba8555120a [trainer] fix: Import flash attn utils for Transformers higher than 4.55.0 (#3596)
Import `index_first_axis`, `pad_input`, `unpad_input`, etc. in a different
way to support Transformers versions higher than v4.55.0.

<img width="1372" height="58" alt="Screenshot 2025-09-24 at 2 44 30 PM"
src="https://github.com/user-attachments/assets/fda7196b-2128-425b-ba15-9951fae39ee2"
/>

Since
[PR40002](https://github.com/huggingface/transformers/pull/40002) in
Transformers, `index_first_axis, pad_input, unpad_input` have been moved
to `transformers.modeling_flash_attention_utils`. The original
NPU import path cannot handle this.
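
A version-tolerant import can be sketched as a small helper that tries several module paths in order. This is only an illustration of the idea, not the code from this PR: `import_first` is a hypothetical helper, and the candidate paths in the usage note are assumptions (the exact attribute names exposed by each Transformers version should be checked before relying on them).

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that has it.

    Hypothetical helper: modules that are missing, or that do not define
    `name`, are skipped; an ImportError is raised if every candidate fails.
    """
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{path}: {exc!r}")
    raise ImportError(f"could not import {name!r}; tried: " + "; ".join(errors))
```

It could then be called as `import_first("pad_input", ["transformers.modeling_flash_attention_utils", "flash_attn.bert_padding"])`, where the order of the candidates is an assumption about which location should win.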


### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: A1wayzBeenHere <moyicong@h-partners.com>
Co-authored-by: Huazhong <hzji210@gmail.com>
2025-09-24 23:27:48 +08:00
634bd9352b [CI] chore: reopen ppo test in e2e_ascend CI (#3588)
### What does this PR do?

Fix an error and reopen the PPO test case in the `e2e_ascend` CI test.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not related.

### API and Usage Example

Not related.

### Design & Code Changes

Not related.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-24 17:46:30 +08:00
26a734e740 [algo, perf] feat: Vectorize RLOO Advantage Estimator - 20x Speedup (#3555)
Vectorize the RLOO advantage estimator, reducing its runtime from 130 ms
to 6 ms.
A similar approach should work for the other advantage estimators; I just
haven't had time to do it.

Implements

$$r_i - \frac{\sum_{j\ne i} r_j}{G-1} = \frac{(G-1)r_i - \sum_{j\ne i}
r_j}{G-1} = \frac{G r_i - \sum_{j\in g} r_j}{G-1}$$

<img width="2199" height="628" alt="image"
src="https://github.com/user-attachments/assets/339e5bd2-6949-4460-a297-34268ffc1764"
/>
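
A minimal NumPy sketch of the closed form above, assuming flat arrays of per-sample rewards and group ids. The function name `rloo_advantages`, its arguments, and the handling of single-sample groups are illustrative assumptions, not the code from this PR.

```python
import numpy as np


def rloo_advantages(rewards, group_ids):
    """Leave-one-out advantage (G * r_i - sum_j r_j) / (G - 1), computed per group.

    Assumption: a group with a single sample has no leave-one-out baseline,
    so its reward is returned unchanged.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # Map each sample to a dense group index and count the group sizes.
    _, inverse, counts = np.unique(
        np.asarray(group_ids), return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)
    # Per-group reward sums, broadcast back to one value per sample.
    group_sum = np.bincount(inverse, weights=rewards)[inverse]
    group_size = counts[inverse]
    # Guard the denominator so single-sample groups do not divide by zero.
    denom = np.maximum(group_size - 1, 1)
    loo = (group_size * rewards - group_sum) / denom
    return np.where(group_size > 1, loo, rewards)
```

All samples are processed with a handful of array operations, with no Python loop over groups or samples.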
2025-09-24 17:36:41 +08:00
69b0127b74 [misc] feat: prototype deprecate DataProto and replace with Tensordict: part 2 (#3567)
### What does this PR do?

This PR continues the work started in PR #2733. It adds support for
variable sequence lengths in `MultiTurnSFTDataset` by introducing a
`no_padding` option for `pad_mode`. When this mode is active,
sequences are not padded to a fixed length.
- Implement no-padding mode for the FSDP engine using nested tensors in the
SFT trainer
- Add tests for no-padding mode with `use_remove_padding` both enabled and
disabled
- Fix FSDP2 grad norm issue
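
The difference between a padded batch and a no-padding batch can be illustrated with plain Python lists. This is a sketch only: the PR itself uses nested tensors, whereas the helpers below (`pad_batch`, `pack_batch`, `unpack_batch`, all hypothetical names) use a flat token list plus offsets to convey the same idea.

```python
from itertools import accumulate


def pad_batch(seqs, pad_id=0):
    """Pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]


def pack_batch(seqs):
    """Concatenate sequences without padding and record their boundaries."""
    flat = [tok for s in seqs for tok in s]
    offsets = [0, *accumulate(len(s) for s in seqs)]
    return flat, offsets


def unpack_batch(flat, offsets):
    """Recover the original variable-length sequences from a packed batch."""
    return [flat[start:end] for start, end in zip(offsets, offsets[1:])]
```

For three sequences of lengths 3, 1 and 2, the padded batch stores 9 tokens while the packed batch stores 6 tokens plus 4 offsets.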

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: zhangchi.usc1992 <zhangchi.usc1992@bytedance.com>
2025-09-24 17:12:31 +08:00
1985eb14ff [megatron] fix: revert megatron actor refactor (#3553)
### What does this PR do?

- Revert the megatron actor changes from this PR, which cause a perf degradation:
https://github.com/volcengine/verl/pull/3206
- We also have to revert the following PRs, which modify the same files:
https://github.com/volcengine/verl/pull/3513 and
https://github.com/volcengine/verl/pull/3315
- We will add them back once the problem is fixed

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-24 14:27:05 +08:00
2d362c490b [misc] chore: Update CODEOWNERS (#3594)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-24 14:01:10 +08:00
1b4af4440f [doc] fix: add faq doc to avoid vllm issue 22103 (#3595)
### What does this PR do?

Provide a workaround for [vllm issue
22103](https://github.com/vllm-project/vllm/issues/22103), which may
result in grad norm explosion.

You may hit this issue when the following conditions are met:
1. Using non-Hopper architecture GPUs, such as A100, L20, B200, etc.
2. Using vLLM as the inference engine.
3. The input and output texts are very long, for example, in multi-turn
scenarios using reasoning models like Qwen3 for RL training.

<img width="405" height="278" alt="Screenshot 2025-09-24 1 37 50 PM"
src="https://github.com/user-attachments/assets/47aec2e7-7c31-4cba-9f86-03af4f795457"
/>

The issue can be confirmed by comparing the `rollout_probs_diff_mean`
metrics:

<img width="414" height="267" alt="Screenshot 2025-09-24 1 39 24 PM"
src="https://github.com/user-attachments/assets/f9cd484e-552a-49a4-b2b3-abb9e311c759"
/>

The workaround is:

`+actor_rollout_ref.rollout.engine_kwargs.vllm.disable_cascade_attn=True`
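
The leading `+` is Hydra's syntax for adding a key that is not already present in the config. The sketch below only illustrates the nested structure this override produces; `apply_override` is a hypothetical helper, not Hydra's implementation, and the base config shown in the usage note is a placeholder.

```python
import ast


def apply_override(cfg: dict, override: str) -> dict:
    """Apply one `+a.b.c=value` style override to a nested dict (illustrative only)."""
    key, _, raw = override.partition("=")
    path = key.lstrip("+").split(".")
    # Interpret literals such as True or 8192; fall back to the raw string.
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        value = raw
    node = cfg
    for part in path[:-1]:
        # Create intermediate sections that do not exist yet.
        node = node.setdefault(part, {})
    node[path[-1]] = value
    return cfg
```

Applied to a placeholder config such as `{"actor_rollout_ref": {"rollout": {"name": "vllm"}}}`, the override adds `engine_kwargs.vllm.disable_cascade_attn` under `rollout` and leaves the existing keys untouched.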

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

n/a

### API and Usage Example

n/a

### Design & Code Changes

n/a

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-24 13:47:36 +08:00
f1d212c6ec [megatron] feat: use flash as default attention_backend (#3578)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

1. Add a mapping from strings to Megatron's enum for the `attention_backend`
choice.
2. Use flash attention as the default `attention_backend`, for consistency with
FSDP.
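
The two points above can be sketched as follows. The enum here is a stand-in defined for illustration; the real enum lives in Megatron and its members are not listed in this description, so the member names, `parse_attention_backend`, and the case-insensitive lookup are all assumptions.

```python
from enum import Enum


class AttnBackend(Enum):
    """Stand-in for Megatron's attention backend enum (member names assumed)."""

    flash = 1
    fused = 2
    unfused = 3
    local = 4
    auto = 5


def parse_attention_backend(name: str = "flash") -> AttnBackend:
    """Map a config string to the enum, defaulting to flash attention."""
    try:
        return AttnBackend[name.strip().lower()]
    except KeyError:
        valid = ", ".join(member.name for member in AttnBackend)
        raise ValueError(
            f"unknown attention_backend {name!r}; expected one of: {valid}"
        ) from None
```

Rejecting unknown strings with an explicit error, instead of silently falling back to the default, keeps a typo in the config from changing the backend unnoticed.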
2025-09-24 10:40:25 +08:00
aaa4cf590b [sglang] fix: Support SGLang>=0.5.2 (#3526)
Update imports from `sglang.srt.managers.tokenizer_manager` to
`sglang.srt.managers.io_struct`, following the refactor in
https://github.com/sgl-project/sglang/pull/10028. This should be compatible
with SGLang >=0.4.1.post6 (https://github.com/sgl-project/sglang/pull/2630).

Can merge https://github.com/volcengine/verl/pull/3484
2025-09-23 20:12:17 +08:00
32575408a8 [ci] fix: fix e2e_sppo ci (#3587)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-23 19:41:40 +08:00
0603b7be1b [megatron] fix: fix bug when holding empty parameters with custom pipeline layout (#3565)
### What does this PR do?

The current code may raise a runtime error when a module holds no
parameters.
For example, when running DeepSeek 671B with Megatron
pipeline_model_parallel_layout="E|(t|)*61|L", pp_rank 62 holds no
parameters and crashes when calling `offload_megatron_optimizer` or
`load_megatron_optimizer`. This PR fixes the bug.
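
To see why a stage can end up empty, the layout string can be expanded by hand. The sketch below assumes a simplified reading of the layout grammar (`|` separates stages, `(pattern)*N` repeats a pattern) and assumes the stage index equals the pipeline rank; `expand_layout` and `empty_stages` are hypothetical helpers, not Megatron or verl functions.

```python
import re


def expand_layout(layout: str) -> list[str]:
    """Expand "(pattern)*N" repetitions, then split pipeline stages on "|"."""
    expanded = re.sub(
        r"\(([^()]*)\)\*(\d+)",
        lambda m: m.group(1) * int(m.group(2)),
        layout,
    )
    return expanded.split("|")


def empty_stages(layout: str) -> list[int]:
    """Return the indices of stages that contain no layers at all."""
    return [i for i, stage in enumerate(expand_layout(layout)) if stage == ""]


stages = expand_layout("E|(t|)*61|L")
print(len(stages), empty_stages("E|(t|)*61|L"))
```

Under these assumptions the repeated `t|` group ends with a separator that is immediately followed by another `|`, so the stage just before `L` contains no layers. Code that walks optimizer state on that stage needs to tolerate an empty parameter list.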

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-23 19:07:25 +08:00
5150686536 [misc] feat: remove redundant default params (#3577)
### What does this PR do?
This PR introduces two changes:

1. Removal of redundant default parameters: Default optimizer values are
already set in the .yaml configuration file. Defining them again in
other files is redundant and can cause confusion for users.

2. Alignment of warm-up step logic: Changed the condition from
`num_warmup_steps < 0` to `num_warmup_steps <= 0`. This aligns the code
with the documentation in the YAML file and matches the implementation
in Megatron.

https://github.com/volcengine/verl/blob/main/verl/trainer/config/actor/actor.yaml#L132
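
The effect of the changed condition can be sketched as follows. The fallback to a ratio of the total steps is an assumption about what the YAML documents, and `resolve_warmup_steps` with its parameter names is hypothetical.

```python
def resolve_warmup_steps(
    num_warmup_steps: int, warmup_ratio: float, total_steps: int
) -> int:
    """Pick the warm-up length, treating zero and negative values as "unset"."""
    # With `< 0`, an explicit 0 would have been kept as "no warm-up";
    # with `<= 0`, it falls back to the ratio like any negative value.
    if num_warmup_steps <= 0:
        return int(warmup_ratio * total_steps)
    return num_warmup_steps
```

The only input whose behavior changes is `num_warmup_steps == 0`, which now defers to the ratio instead of disabling warm-up outright.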

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
Co-authored-by: Changlong Yu <changlong.ycl@gmail.com>
2025-09-23 18:49:56 +08:00
5547dbe12b [CI] chore: Update e2e_ascend CI config (#3532)
### What does this PR do?
This PR makes the following changes:
1. All test cases in the `e2e_ascend` CI pipeline use 8 NPUs by default,
which prevents the machine's capacity from being fully utilized. This PR
addresses that. Thanks to @zheliuyu for finding this problem :)
2. Remove the qwen3 grpo test case in `e2e_ascend.yml` because it is
similar to qwen2.5 grpo.
3. Remove the ppo test case in `e2e_ascend.yml` because it has not worked
since the first commit #3502; @xvxuopop is working on a fix.
4. Update the e2e_ascend CI scan path to cover most file-modification
cases.
5. Skip loading `libnuma.so` on Ascend NPU.
6. Fix the error where qwen2_vl flash-attention related functions are
unavailable on Ascend NPU.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not related.

### API and Usage Example

Not related.

### Design & Code Changes

Not related.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-23 18:49:36 +08:00
50368ae291 [trainer] refactor: move rollout log to inheritable trainer (#3576)
### What does this PR do?

Move the rollout logging logic into a standalone function that can be
reused by other trainers, such as `DAPORayTrainer`, to avoid duplicated
code.
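
The shape of this refactor can be illustrated with a toy example. All class and method names below are hypothetical and do not mirror the actual trainer code; the point is only that a subclass overriding the training loop can still call the inherited logging helper.

```python
class BaseTrainer:
    """Toy base trainer that owns the shared rollout-logging helper."""

    def __init__(self):
        self.logged = []

    def _log_rollout_data(self, step, samples):
        # Single place where rollout samples are recorded.
        for sample in samples:
            self.logged.append({"step": step, **sample})

    def fit(self, batches):
        for step, samples in enumerate(batches):
            self._log_rollout_data(step, samples)


class FilteringTrainer(BaseTrainer):
    """Toy subclass with its own loop that reuses the inherited helper."""

    def fit(self, batches):
        for step, samples in enumerate(batches):
            kept = [s for s in samples if s["score"] > 0]
            self._log_rollout_data(step, kept)
```

Because the helper lives on the base class, a change to the logging format only has to be made once.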

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-23 15:52:49 +08:00
4e1948d416 [ci] fix: fix more ci by pin transformers version (#3582)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-23 15:38:44 +08:00
96e7071de1 [trainer,rollout] fix: ensure LoRA weights are loaded when vllm_sleep_level=2 and without using layered_summon (#3541)
### What does this PR do?

Fix an issue where vLLM would load only the base model parameters, not
the LoRA parameters, when `VLLM_SLEEP_LEVEL == 2` and `layered_summon`
is not used.

This fixes the LoRA trainer error where the first rollout used only the
base model parameters, while subsequent rollouts correctly loaded the
LoRA parameters.

Fixes: https://github.com/volcengine/verl/issues/3516
Related PR: https://github.com/volcengine/verl/pull/3461

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-22 13:40:43 +08:00
7e4eec7467 [docker] feat: dockerfile rocm7 initial commit (#3547)
### What does this PR do?

Dockerfile for rocm7.0

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### API and Usage Example

```bash
DOCKER_BUILDKIT=1 docker build -f Dockerfile.rocm -t verl-rocm7.0 .
```
2025-09-22 11:20:39 +08:00
fdbffe7e20 [recipe] fix: init self.model_config in fsdp worker of one-step-off policy (#3556)
### What does this PR do?

Due to an update in the main package, the rollout worker uses
`self.model_config` during `generate_sequences`
(d33c85e2c7/verl/workers/fsdp_workers.py (L869)),
which is not initialized in the current one-step-off recipe. This
causes runtime errors.

Similar code in the default fsdp worker:
d33c85e2c7/verl/workers/fsdp_workers.py (L563)

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[...](https://github.com/volcengine/verl/pull/3531)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-22 11:16:22 +08:00
93dc6f5783 [recipe] fix: spin fsdp_workers.py bugs (#3544)
Fix TypeError: cannot unpack non-iterable NoneType object

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-22 11:13:50 +08:00
d45b44103a [ci] feat: update ci (#3552)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-22 11:06:23 +08:00
bcd227598e [megatron] chore: add a docker image with mcore0.15 and TE2.7 (#3540) 2025-09-22 10:59:33 +08:00
d33c85e2c7 [model] feat: support parameter generator for model engine (#3529) 2025-09-19 23:20:59 +08:00
02b4cd3a85 [recipe] fix: Fix main_spin.py bugs (#3543)
### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-19 22:43:37 +08:00
4f7920e0ab [ci] feat: fix more ci (#3537) 2025-09-19 20:26:03 +08:00
78915c47ed [chore] fix typo (#3535) 2025-09-19 17:41:03 +08:00
bbdf819996 [Megatron] fix: compatible to mcore0.15 (#3534)
### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-19 17:18:55 +08:00
83205fdae0 [ci] feat: using local dataset to avoid network issue (#3533)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-19 16:21:55 +08:00
2f6a5d6b00 [worker] fix: get all multi_modal_inputs keys within a microbatch (#3315)
### What does this PR do?

Address the first issue in
https://github.com/volcengine/verl/pull/3281#issuecomment-3239570745

More work on top of https://github.com/volcengine/verl/pull/1999

Currently, the code takes the keys from the first row of the
microbatch. This goes wrong when the dataset mixes pure-text and
multi-modal samples: if the first sample in the microbatch is pure-text
(its keys contain no `pixel_values` or `image_grid_thw`) while later
samples are multi-modal, the multi-modal keys are missed.

This PR fixes the issue by collecting all available keys for
`multi_modal_inputs` across the whole microbatch, so that the
multi-modal tensors are concatenated together and none of them are
ignored in the situation above.
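
The idea can be illustrated with a minimal sketch. It is not the code from this PR: the helper name `collect_multi_modal_inputs` is hypothetical, and plain Python lists stand in for tensors (list extension plays the role of concatenation along the first dimension).

```python
def collect_multi_modal_inputs(micro_batch):
    """Merge per-sample multi-modal inputs using the union of keys over all rows.

    Hypothetical helper for illustration; lists stand in for tensors.
    """
    # Collect keys from every row, not only from micro_batch[0].
    keys = []
    for row in micro_batch:
        for key in row:
            if key not in keys:
                keys.append(key)

    merged = {}
    for key in keys:
        # Concatenate values from the rows that carry this key;
        # pure-text rows simply contribute nothing.
        merged[key] = [item for row in micro_batch if key in row for item in row[key]]
    return merged


micro_batch = [
    {},  # pure-text sample: no pixel_values / image_grid_thw
    {"pixel_values": [[0.1, 0.2]], "image_grid_thw": [[1, 2, 2]]},
    {"pixel_values": [[0.3, 0.4]], "image_grid_thw": [[1, 2, 2]]},
]

# Looking only at the first row finds no keys at all.
first_row_keys = list(micro_batch[0].keys())

# Taking the union over the microbatch keeps both multi-modal samples.
merged = collect_multi_modal_inputs(micro_batch)
```

With the first-row approach, `first_row_keys` is empty and both images would be dropped; the union-based version keeps them.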

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-09-19 15:57:51 +08:00
90648ae222 [doc] chore: Update owners for ascend_tutorial documents (#3528)
### What does this PR do?

Update owners for ascend_tutorial documents

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not related.

### API and Usage Example

Not related.

### Design & Code Changes

Not related.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-19 10:32:40 +08:00
c7922f0297 [doc] chore: Update ascend quick start document (#3527)
### What does this PR do?

Remove the reward MAEs, loss MAE, total time ratio, and throughput
information from the Ascend quick start document.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not related.

### API and Usage Example

Not related.

### Design & Code Changes

Not related.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-19 10:21:49 +08:00
558d4dd581 [doc] fix: Update Qwen3-30B-A3B info in ascend_quick_start.rst (#3514)
### What does this PR do?

1. Update the model name, based on the training script, so that it is
consistent with the name on the official Hugging Face website; see
https://github.com/volcengine/verl/pull/3189
2. Supplement the Qwen3-30B-A3B model info with megatron as the
actor.strategy, following https://github.com/volcengine/verl/pull/3203

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-19 09:57:42 +08:00
c0e2b9d249 [model] fix: qwen2vl for transformers 4.52.* (#3524) 2025-09-19 06:11:15 +08:00
b6b34b2d30 [megatron] Add TIS support to megatron backend (#3513)
### What does this PR do?

Add the TIS support from https://github.com/volcengine/verl/pull/2953 to
the Megatron actor.
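
As a rough illustration only, and assuming TIS here stands for truncated importance sampling (the PR text does not expand the acronym), the correction can be sketched as a per-token weight that is the ratio between the training-side and rollout-side probabilities, capped from above. The function name `tis_weights` and the parameter `cap` are hypothetical and are not taken from the verl code base.

```python
import math


def tis_weights(old_log_probs, rollout_log_probs, cap):
    """Per-token truncated importance weights: min(exp(old - rollout), cap).

    Hypothetical sketch; names and signature are assumptions.
    """
    return [
        min(math.exp(old - rollout), cap)
        for old, rollout in zip(old_log_probs, rollout_log_probs)
    ]


# First token: both engines agree, so the weight is 1.
# Second token: the ratio would be exp(5), so it is truncated to the cap.
weights = tis_weights([0.0, 0.0], [0.0, -5.0], cap=2.0)
```

In this sketch the weights would multiply the per-token policy-gradient loss; the cap bounds the variance introduced by tokens where the two engines disagree strongly.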

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: Shuang Yu <shuangy@shuangy-mlt.client.nvidia.com>
2025-09-18 23:24:08 +08:00
0d4541f397 [model] fix: refactor qwen2vl patches & support no-image input for fsdp (#3496)
### What does this PR do?

This PR tries to fix #3491 

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Tested with [latest
transformers](6e50a8afb2)

<img width="2448" height="540" alt="image"
src="https://github.com/user-attachments/assets/06d40f40-572c-4454-8e08-115857f61f21"
/>
<img width="2796" height="1394" alt="image"
src="https://github.com/user-attachments/assets/17489b9c-e376-46e3-80d8-71106d304077"
/>
<img width="2098" height="744" alt="image"
src="https://github.com/user-attachments/assets/8c7f736d-bf09-4ba9-9cf4-0d56e367c526"
/>

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

#### ⚠️ Breaking

We adopt a new format for Qwen2VL's position ids: (4, batch size, seq
len).

Given vision position ids (mrope) of shape (3, batch size, seq len) and
text position ids (normal rope) of shape (1, batch size, seq len), we
concatenate the two to obtain the final position ids.

This aligns with the implementation in Transformers >= 4.54.0 🤗

https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469
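
The layout described above can be sketched with nested lists standing in for tensors. The helper names are hypothetical, and placing the text plane first (index 0) followed by the three mrope planes is an assumption about the ordering, which the description does not state.

```python
def make_ids(planes, batch_size, seq_len):
    """Build dummy position ids of shape (planes, batch_size, seq_len)."""
    return [
        [[t for t in range(seq_len)] for _ in range(batch_size)]
        for _ in range(planes)
    ]


def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


batch_size, seq_len = 2, 5
text_position_ids = make_ids(1, batch_size, seq_len)    # normal rope
vision_position_ids = make_ids(3, batch_size, seq_len)  # mrope

# Concatenate along the leading axis (assumed order: text first, then vision).
position_ids = text_position_ids + vision_position_ids
```

A consumer can then split the combined ids back into `position_ids[0]` for text and `position_ids[1:]` for the vision planes, under the same ordering assumption.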

#### 🎤 New

We have refactored the Qwen2VL and Qwen2.5VL patches, supporting
no-image input for FSDP by introducing fake ViT inputs. We have also
removed some redundant code for better maintainability.

#### 🚨 Changes

We moved the Ulysses logic into the attention function, so the position
ids are now scattered before the language model part.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-18 10:10:30 +08:00
214d0f0a94 [data] feat: support customizable loss mask in multi-turn sft dataset (#3507)
### What does this PR do?

- Support a customized loss mask in the multi-turn SFT dataset.
- Previously, the loss mask was set based on whether the role is
"assistant" or not. This is limiting when we only want to fit the last
assistant message. To address this, we explicitly introduce a loss_mask
in the dataset that the user can optionally specify.
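
A minimal sketch of the behaviour described above, not the actual dataset code: it assumes the optional `loss_mask` is attached to each message, which the description does not confirm, and the function name `message_loss_flags` is hypothetical.

```python
def message_loss_flags(messages):
    """Return one 0/1 flag per message saying whether it contributes to the loss.

    Hypothetical helper: an explicit `loss_mask` wins; otherwise fall back to
    the role-based rule (only "assistant" messages are trained on).
    """
    flags = []
    for message in messages:
        if "loss_mask" in message:
            flags.append(int(message["loss_mask"]))
        else:
            flags.append(1 if message["role"] == "assistant" else 0)
    return flags


conversation = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "Let me think."},
    {"role": "user", "content": "Go on."},
    {"role": "assistant", "content": "4"},
]

# Default rule: every assistant message is fitted.
default_flags = message_loss_flags(conversation)

# Custom rule: mask out the first assistant turn, fit only the last one.
conversation[1]["loss_mask"] = 0
custom_flags = message_loss_flags(conversation)
```

In a real pipeline these per-message flags would be expanded to per-token masks after tokenization.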

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-17 18:48:47 +08:00
f4e2047074 [model, ci] feat: add qwen3-8b ppo script on ASCEND NPU (#3502)
### What does this PR do?

Add `examples/ppo_trainer/run_qwen3-8b_npu.sh`.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-17 18:48:24 +08:00
ee8a7af8f4 [recipe] feat: Add qwen2.5-7b DAPO NPU example script (#3501)
### What does this PR do?

#1858 added DAPO support on Ascend NPU but did not include an example
`qwen2.5-7b-instruct` training script; this PR adds it.

The script in this PR is borrowed from
https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/PyTorch/built-in/rl/VeRL_for_PyTorch/test/train_qwen2_5_7b_instruct_DAPO_full_16p.sh

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not related.

### API and Usage Example

Not related.

### Design & Code Changes

Not related.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-17 16:52:28 +08:00
0665153b9a [training_utils] refactor: extract checkpoint handler into a separate file for reuse (#3505)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-17 15:24:58 +08:00
fa924a43c7 [model] fix: fix device (#3500)
### What does this PR do?

- Move micro_batch to device in forward_step

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-17 12:29:42 +08:00
04726dbf12 [ray, single_controller] refactor: Accelerate ray.put with thread (#3495)
### What does this PR do?

For a data size of 6400x20480, this optimization reduced the `ray.put`
time from 28.85s to 19.86s, a ~31% reduction in time (about 1.45x
faster).
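
The two ways of quoting this gain are easy to mix up, so here is a small arithmetic sketch using only the timings reported above. The helper name `speedup_stats` is made up for illustration and is not part of verl or Ray.

```python
from typing import Tuple


def speedup_stats(before_s: float, after_s: float) -> Tuple[float, float]:
    """Return (speedup factor, fraction of wall-clock time saved)."""
    speedup = before_s / after_s
    time_saved = (before_s - after_s) / before_s
    return speedup, time_saved


# Timings reported in this PR description for `ray.put`.
speedup, time_saved = speedup_stats(28.85, 19.86)
print(f"{speedup:.2f}x faster, {time_saved:.1%} less time")
# prints "1.45x faster, 31.2% less time"
```

A "~45%" figure corresponds to the speedup factor (1.45x), while the time actually saved per call is about 31%.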

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[pr2893](3e2bceb1af (diff-32eb7ca0e11460f1eee309256c2fe7d571699b18cea314cbaef4d15f58b4f7b3))
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-17 11:43:45 +08:00
5c98ed1b31 [perf, megatron] fix: bugfix if nvml can not import (#3490)
### What does this PR do?

If `import pynvml` fails, the `initialized` variable is never defined,
and accessing it in the `finally` block raises an error.

```
  File "/tmp/ray/session_2025-09-16_14-38-49_113222_2998/runtime_resources/working_dir_files/_ray_pkg_06ff080dac9922d6/verl/utils/distributed.py", line 46, in set_numa_affinity
    if initialized:
UnboundLocalError: local variable 'initialized' referenced before assignment
```
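
The failure mode can be reproduced in a few lines. This is a minimal sketch, not the actual code in `verl/utils/distributed.py`: the function names and the `import_ok` flag (which simulates a missing `pynvml`) are hypothetical.

```python
def set_affinity_buggy(import_ok: bool) -> str:
    try:
        if not import_ok:
            # Stand-in for a failing `import pynvml`.
            raise ImportError("simulated: pynvml missing")
        initialized = True
        return "bound"
    except ImportError:
        return "skipped"
    finally:
        # `initialized` was never assigned if the import failed,
        # so this line raises UnboundLocalError.
        if initialized:
            pass  # real code would shut the library down here


def set_affinity_fixed(import_ok: bool) -> str:
    initialized = False  # defined before anything can fail
    try:
        if not import_ok:
            raise ImportError("simulated: pynvml missing")
        initialized = True
        return "bound"
    except ImportError:
        return "skipped"
    finally:
        if initialized:
            pass  # real code would shut the library down here
```

Assigning the flag before the `try` block means the `finally` clause always sees a bound name, whichever path was taken.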

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[pr3471](https://github.com/volcengine/verl/pull/3471)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-16 15:15:09 +08:00
cf5263e82b [perf] fix: Init some attrs earlier in Profiler (#3482)
If `Profiler.__init__` returns early because `config.enable == False`,
before `self.prof` is initialized, any later call to `Profiler.check`
(which other methods such as `Profiler.start` call) raises
`AttributeError: 'Profiler' object has no attribute 'prof'`. For the
same reason, `self.saved` should also be initialized earlier.
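
A stripped-down illustration of the problem and the fix follows. It is a sketch under stated assumptions: `ProfilerConfig`, `BuggyProfiler`, and `FixedProfiler` are hypothetical stand-ins, and the body of `check` is invented, so none of this is verl's real `Profiler`.

```python
from dataclasses import dataclass


@dataclass
class ProfilerConfig:
    enable: bool = False


class BuggyProfiler:
    def __init__(self, config: ProfilerConfig):
        self.config = config
        if not config.enable:
            return  # early return: self.prof and self.saved are never set
        self.prof = object()  # stand-in for a real profiler handle
        self.saved = False

    def check(self) -> bool:
        # Raises AttributeError when the profiler was created disabled.
        return self.prof is not None


class FixedProfiler:
    def __init__(self, config: ProfilerConfig):
        self.config = config
        # Attributes exist on every instance, enabled or not.
        self.prof = None
        self.saved = False
        if not config.enable:
            return
        self.prof = object()

    def check(self) -> bool:
        return self.prof is not None
```

With the attributes set before the early return, a disabled profiler answers `check()` with `False` instead of raising.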


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-16 13:04:46 +08:00
fd8ae66726 [1/N][rollout] feat: support vllm/sglang native http server (#3456)
### What does this PR do?

This is the first part of supporting vllm/sglang native http servers in
server-mode rollout. In native http server mode, the inference services
are launched separately from the training engine, and the model runner
shares GPUs with the training engine but runs in a different process.

We're going to support three deployment modes:
- **hybrid mode**: The training engine and model runner share GPUs but
run in different processes. To sync weights, a server adapter in the
training process acts as an http client that sends
wake_up/sleep/update_weights requests to the inference server. This is
used for on-policy training.
- **standalone mode**: The training engine and inference services have
separate GPU resources (a disaggregated architecture). This is used for
off-policy training.
- **colocated mode**: Like hybrid mode, but without a server adapter,
since there is no need to sync weights. This is mainly used for a GRM
service (LLM as a judge).
<img width="2644" height="1276" alt="image"
src="https://github.com/user-attachments/assets/2c1adf2d-adb5-4563-8a1a-8948f93b09b7"
/>
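
The three modes can be summarized as a small lookup table. This is only a sketch of the description above: `RolloutMode`, `ModeTraits`, and `MODE_TRAITS` are hypothetical names, not identifiers from verl, and the standalone weight path is the planned [3/N] follow-up rather than something this PR implements.

```python
from dataclasses import dataclass
from enum import Enum


class RolloutMode(Enum):
    HYBRID = "hybrid"
    STANDALONE = "standalone"
    COLOCATED = "colocated"


@dataclass(frozen=True)
class ModeTraits:
    shares_gpu_with_trainer: bool
    weight_sync: str  # how weights reach the inference server
    typical_use: str


MODE_TRAITS = {
    RolloutMode.HYBRID: ModeTraits(
        True, "server adapter (http client)", "on-policy training"
    ),
    RolloutMode.STANDALONE: ModeTraits(
        False, "planned: NCCL/UCX transfer", "off-policy training"
    ),
    RolloutMode.COLOCATED: ModeTraits(
        True, "none", "GRM service (LLM as a judge)"
    ),
}


def uses_server_adapter(mode: RolloutMode) -> bool:
    """Only hybrid mode is described as needing a server adapter."""
    return MODE_TRAITS[mode].weight_sync.startswith("server adapter")
```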

Follow-up PRs:
- [2/N] support DP+EP
- [3/N] standalone rollout with weight transfer via NCCL/UCX
- [4/N] colocated GRM service with wake_up/sleep (without weight
synchronization)
- [5/N] switch to the `/generate` http api with token-in-token-out:
sglang currently has a `/generate` api but may need some effort to
support multi-modal input, while vllm still lacks a `/generate` api
- [6/N] switch to the sglang/vllm router for better KV-cache-aware load
balancing

The native http server is inspired by the design of
[slime](https://github.com/THUDM/slime); thanks to its authors for their
prior work. Credit also goes to @ChangyiYang and @zhaochenyang20
(https://github.com/volcengine/verl/pull/3090) and @SuperCB
(https://github.com/volcengine/verl/pull/3102) for their prior
contributions.
2025-09-16 10:41:17 +08:00
ac2f790f56 [ray] refactor: Accelerate Tensor serialization by converting to np.ndarray (#3425)
### What does this PR do?

For a data size of 6400x20480, the average total (serialize plus
deserialize) time dropped from 3.32s to 1.32s with this optimization, a
~60% reduction (about 2.5x faster).

```
# tensor
average  serialize:2.58s  deserialize:0.74s  total:3.32s

(TaskRunner pid=1904793) baymax debug serialize time=2.5947s
(TaskRunner pid=1904793) baymax debug serialize time=2.593357s
(TaskRunner pid=1904793) baymax debug serialize time=2.580081s
(TaskRunner pid=1904793) baymax debug serialize time=2.582321s
(WorkerDict pid=1905183) baymax debug deserialize time=0.475745s
(WorkerDict pid=1905184) baymax debug deserialize time=0.538223s
(WorkerDict pid=1905181) baymax debug deserialize time=0.609146s
(WorkerDict pid=1905182) baymax debug deserialize time=0.61064s
(WorkerDict pid=1905189) baymax debug deserialize time=0.597746s
(WorkerDict pid=1905185) baymax debug deserialize time=0.530353s
(WorkerDict pid=1905180) baymax debug deserialize time=0.811555s
(WorkerDict pid=1905194) baymax debug deserialize time=0.513646s
(WorkerDict pid=1905193) baymax debug deserialize time=0.962868s
(WorkerDict pid=1905179) baymax debug deserialize time=0.929226s
(WorkerDict pid=1905186) baymax debug deserialize time=0.701976s
(WorkerDict pid=1905191) baymax debug deserialize time=0.867236s
(WorkerDict pid=1905192) baymax debug deserialize time=0.858472s
(WorkerDict pid=1905187) baymax debug deserialize time=1.045251s
(WorkerDict pid=1905188) baymax debug deserialize time=0.960867s
(WorkerDict pid=1905190) baymax debug deserialize time=1.010673s

# numpy
average  serialize:0.000617s  deserialize:1.32s  total:1.32s

(TaskRunner pid=1729638) baymax debug serialize time=0.00016s
(TaskRunner pid=1729638) baymax debug serialize time=0.000117s
(TaskRunner pid=1729638) baymax debug serialize time=0.000158s
(TaskRunner pid=1729638) baymax debug serialize time=0.000182s
(WorkerDict pid=1730035) baymax debug deserialize time=0.867232s
(WorkerDict pid=1730036) baymax debug deserialize time=0.97372s
(WorkerDict pid=1730028) baymax debug deserialize time=1.08627s
(WorkerDict pid=1730034) baymax debug deserialize time=1.187599s
(WorkerDict pid=1730037) baymax debug deserialize time=1.165926s
(WorkerDict pid=1730025) baymax debug deserialize time=1.281101s
(WorkerDict pid=1730029) baymax debug deserialize time=1.359834s
(WorkerDict pid=1730027) baymax debug deserialize time=1.281978s
(WorkerDict pid=1730030) baymax debug deserialize time=1.329298s
(WorkerDict pid=1730026) baymax debug deserialize time=1.475415s
(WorkerDict pid=1730031) baymax debug deserialize time=1.422345s
(WorkerDict pid=1730033) baymax debug deserialize time=1.378894s
(WorkerDict pid=1730039) baymax debug deserialize time=1.368721s
(WorkerDict pid=1730040) baymax debug deserialize time=1.601587s
(WorkerDict pid=1730042) baymax debug deserialize time=1.768378s
(WorkerDict pid=1730038) baymax debug deserialize time=1.765994s
```
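
As a quick check on the serialize figures, the per-call times logged above can be averaged directly. The sketch below uses only the four `TaskRunner` samples from each run; the variable names are illustrative.

```python
from statistics import mean

# Serialize times (seconds) copied from the TaskRunner log lines above.
tensor_serialize_s = [2.5947, 2.593357, 2.580081, 2.582321]
numpy_serialize_s = [0.00016, 0.000117, 0.000158, 0.000182]

tensor_mean = mean(tensor_serialize_s)
numpy_mean = mean(numpy_serialize_s)
numpy_sum = sum(numpy_serialize_s)

print(f"tensor mean: {tensor_mean:.4f}s")
print(f"numpy mean:  {numpy_mean:.6f}s (sum of samples: {numpy_sum:.6f}s)")
```

The four NumPy samples sum to 0.000617s, so their mean is about 0.00015s. Either way, serialization cost becomes negligible after the change and the total is dominated by deserialization.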

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Huazhong <hzji210@gmail.com>
2025-09-16 09:33:28 +08:00
8ecf123736 [perf, megatron] chore: bind NUMA (#3471)
### What does this PR do?

Improve data transfer efficiency between the CPU and GPU (H2D, D2H) by
binding NUMA affinity, in preparation for the offload feature.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pull/3401
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### API and Usage Example

Set NUMA affinity:
```python
from verl.utils.distributed import set_numa_affinity
set_numa_affinity()
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-16 09:30:26 +08:00
a7b8675f96 [rollout] fix: make agent loop reward worker thread-safe (#3454)
### What does this PR do?

Fixes https://github.com/volcengine/verl/issues/3407
2025-09-15 14:43:52 +08:00
44b919e5fe [ci] chore: add codeowner (#3473)
### What does this PR do?

Add code owners for the NPU-related folders:
- `/recipe/dapo` and `/examples/grpo_trainer`: NPU model scripts
- `/verl/models/transformers`: `npu_patch` for transformers

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-15 12:33:16 +08:00
2061894891 [model] feat: add qwen3 grpo 8b/32b script on ASCEND NPU (#3310)
### What does this PR do?

Add `examples/grpo_trainer/run_qwen3_32b_npu.sh`:
<img width="1014" height="1111" alt="image"
src="https://github.com/user-attachments/assets/8cd59fc2-5f6a-419e-87ac-bf35a71856fb"
/>

Add `examples/grpo_trainer/run_qwen3_8b_npu.sh`:
<img width="844" height="930" alt="image"
src="https://github.com/user-attachments/assets/5c23c7a4-8729-4007-8828-027a8cda4779"
/>



### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...

> Already supported in https://github.com/volcengine/verl/pull/3300

- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Signed-off-by: ZLiao <a627465478@gmail.com>
Co-authored-by: ZLiao <a627465478@gmail.com>
2025-09-15 10:13:01 +08:00
65170f918b [sglang, rollout] feat: enable token-in-token-out for SGLang engine (#2759)
### What does this PR do?

This PR enables token-in-token-out functionality for the SGLang engine,
improving performance by avoiding unnecessary
tokenization/detokenization steps during rollout. The engine can now
work directly with token IDs, and the rollout system passes pre-computed
token IDs to avoid recomputation.

### Checklist Before Starting

- [x] Search for similar PRs. No similar PRs found for SGLang
token-in-token-out functionality.
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

This change maintains backward compatibility and does not require
additional testing beyond existing CI. The functionality is tested
through existing rollout tests.

### API and Usage Example

No API changes are introduced. The enhancement is internal to the SGLang
rollout implementation and transparent to users.

```python
# Usage remains the same - no changes to user-facing APIs
rollout = SGLangRollout(config)
results = await rollout.generate(...)
```

### Design & Code Changes

**High-level design:**
- Enable SGLang engine to skip tokenizer initialization by default
(`skip_tokenizer_init=True`)
- Modify rollout system to extract and pass token IDs directly from
engine output
- Update message handling to accept pre-computed token IDs

**Specific changes:**
1. **`verl/workers/rollout/schemas.py`**:
   - Add optional `content_ids` parameter to the
     `add_assistant_message()` method
   - Only compute token IDs if not provided, avoiding redundant
     tokenization

2. **`verl/workers/rollout/sglang_rollout/sglang_rollout.py`**:
   - Set `skip_tokenizer_init=True` by default for token-in-token-out mode
   - Extract `content` or `content_ids` from engine output
   - Pass `content_ids` to all `add_assistant_message()` calls
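
The "only tokenize when IDs are missing" behaviour described for `add_assistant_message()` can be sketched as follows. This is an illustration under assumptions, not the code in `schemas.py`: the `Conversation` class, the `tokenize_calls` counter, and `toy_tokenize` are hypothetical, and only the `content_ids` parameter name comes from the PR description.

```python
from typing import Callable, List, Optional


class Conversation:
    def __init__(self, tokenize: Callable[[str], List[int]]):
        self._tokenize = tokenize
        self.input_ids: List[int] = []
        self.tokenize_calls = 0  # counts how often we had to tokenize

    def add_assistant_message(
        self, content: str, content_ids: Optional[List[int]] = None
    ) -> None:
        if content_ids is None:
            # Fallback path: no IDs supplied, so tokenize the text.
            self.tokenize_calls += 1
            content_ids = self._tokenize(content)
        # Token-in-token-out path: reuse the IDs the engine returned.
        self.input_ids.extend(content_ids)


def toy_tokenize(text: str) -> List[int]:
    """Toy tokenizer for the demo: one ID per character."""
    return [ord(ch) for ch in text]


conv = Conversation(toy_tokenize)
conv.add_assistant_message("hi")                        # tokenized here
conv.add_assistant_message("ok", content_ids=[7, 8])    # IDs reused as-is
```

Passing the engine's token IDs straight through avoids a detokenize/retokenize round trip, which is the saving the PR describes.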

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: This change is
internal optimization that maintains existing behavior and is covered by
existing tests.
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-13 22:21:37 -07:00
b00b090149 [megatron,recipe] feat: support Qwen3-30B (MoE) DAPO training on ASCEND NPU (#3203)
### What does this PR do?

Fix the megatron config and add an example shell script for Qwen3-30B
DAPO training with megatron.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

critic/reward/mean:

<img width="1304" height="704" alt="dapo_30b_megatron"
src="https://github.com/user-attachments/assets/f2062e24-b37d-4d54-8dd6-e9da25f8c69b"
/>


response_length/mean:

<img width="815" height="407" alt="image"
src="https://github.com/user-attachments/assets/f59b6c7b-4f24-4aa7-9b9e-bb8184dac5d3"
/>

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-13 19:08:23 +08:00
6e6fafdc74 [model] feat: add FSDP/Megatron critic worker with model engine (#3439)
### What does this PR do?

- As title
- Add a test to compare the output of the FSDP/Megatron engine with the
Hugging Face model

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-13 12:18:58 +08:00
3c9b884ecd [model] feat: Add Apertus (#3295)
Pre-release of Apertus from the Swiss AI Initiative

Main modifications from Llama

- xIELU Activation
- QK-norm

Associated Transformers PR
https://github.com/huggingface/transformers/pull/39381
Associated vLLM PR https://github.com/vllm-project/vllm/pull/23068
Associated SGLang PR https://github.com/sgl-project/sglang/pull/9774

GSM8K
<img width="430" height="262" alt="image"
src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed"
/>
<img width="436" height="266" alt="image"
src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08"
/>
2025-09-13 10:03:58 +08:00
b8c6d132a8 [trainer,rollout] fix: model weights will not be loaded when vllm_sleep_level=2 and using lora (#3461)
Fix: https://github.com/volcengine/verl/issues/3159,
https://github.com/volcengine/verl/issues/3437


The default value of `VLLM_SLEEP_LEVEL` was changed to 2 in PR:
https://github.com/volcengine/verl/pull/3019. However, in the previous
code, when using LoRA, the worker would only load LoRA weights when
calling `wake_up`. This does not cause any issues when
`VLLM_SLEEP_LEVEL=1`, since in this mode the base model's weights are
moved to the CPU. However, when `VLLM_SLEEP_LEVEL=2`, the weights are
completely destroyed. Therefore, we need to sync the weights from the
actor every time.
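
The decision described above can be sketched as a small helper. This is only an illustration of the reasoning, not verl's actual code: the function name `weights_to_sync` and the `"base"` / `"lora"` labels are hypothetical.

```python
def weights_to_sync(sleep_level: int, use_lora: bool) -> list[str]:
    """Return which weight groups the actor must push to the rollout on wake-up.

    Hypothetical helper: the labels "base" and "lora" are illustrative only.
    """
    if sleep_level not in (1, 2):
        raise ValueError(f"unsupported sleep level: {sleep_level}")
    if not use_lora:
        # Full fine-tuning: the trained weights always have to be synced.
        return ["base"]
    if sleep_level == 1:
        # Level 1 keeps the base weights on the CPU, so only the adapter changes.
        return ["lora"]
    # Level 2 discards the base weights, so they must be synced again as well.
    return ["base", "lora"]
```

With `sleep_level=2` and LoRA enabled, the helper reports that both the base weights and the adapter have to be synced, which is the case the old code missed.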

Typically, users run LoRA training when they are short on resources.
Therefore, this PR does not forcibly set `VLLM_SLEEP_LEVEL=1` when using
LoRA. Instead, it aims to save CPU memory whenever possible.

The basic vLLM rollout test is currently skipped:
33edd95e13/tests/workers/rollout/rollout_vllm/test_vllm_spmd.py (L71-L72).
Thus, no unit test is included in this PR. I will fix the skipped vLLM
rollout test and propose a follow-up PR to test LoRA vLLM inference in CI.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-12 19:29:25 +08:00
b86cd96eb7 [trainer, fsdp, megatron] feat: Support one step off async rl on Ascend NPU (#2924)
### What does this PR do?

Since Ray's collective communication interface does not support the HCCL
backend, we follow vLLM's [example
code](https://docs.vllm.ai/en/latest/examples/offline_inference/rlhf.html)
to implement weight synchronization between the actor and the
rollout. This PR mainly introduces two changes:
1. Use `StatelessProcessGroup` and `PyNcclCommunicator` instead of Ray's
`create_collective_group` to create the weight synchronization communication
groups.
2. Use the `ray.get_runtime_context().get_accelerator_ids` API instead
of the environment variable `RAY_LOCAL_RANK` to set the device in scenarios
where `is_ray_noset_visible_devices` is true, which fixes the issue at
https://github.com/volcengine/verl/issues/2971.
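
A minimal sketch of the second change, assuming the mapping returned by `ray.get_runtime_context().get_accelerator_ids()` has the shape `{"NPU": ["3"], ...}`. The helper name `local_device_index` and the `"NPU"` resource key are assumptions for illustration; the mapping is passed in as an argument so the snippet runs without a Ray cluster.

```python
def local_device_index(accelerator_ids: dict[str, list[str]], resource: str = "NPU") -> int:
    """Pick this worker's device index from Ray's accelerator-id mapping.

    Hypothetical helper; in a Ray worker the mapping would come from
    ray.get_runtime_context().get_accelerator_ids().
    """
    ids = accelerator_ids.get(resource, [])
    if not ids:
        raise RuntimeError(f"Ray assigned no {resource} device to this worker")
    # When visible devices are not rewritten by Ray, the first assigned ID
    # is taken as the device index to select (an assumption of this sketch).
    return int(ids[0])
```

The returned index would then be handed to the device-selection call of the accelerator runtime in use.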

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-12 19:18:36 +08:00
638856c986 [sglang, tool] fix: fix text only bug (#3448)
### What does this PR do?

When the model is text-only, the message content should be the plain text
itself rather than a `{"type":"text", "text": "XXX"}` item.

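The distinction can be sketched as follows. The helper names `build_content` and `make_message` are hypothetical and do not come from the PR; they only illustrate the two content shapes.

```python
def build_content(text: str, multimodal: bool):
    """Return the message content in the shape the chat template expects."""
    if multimodal:
        # Multimodal templates take a list of typed parts.
        return [{"type": "text", "text": text}]
    # Text-only templates take the string itself.
    return text


def make_message(role: str, text: str, multimodal: bool) -> dict:
    """Build one chat message (hypothetical helper for illustration)."""
    return {"role": role, "content": build_content(text, multimodal)}
```

For a text-only model, `make_message("tool", "42", multimodal=False)` yields a message whose `content` is the string `"42"`.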

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-12 19:04:34 +08:00
b03866768f [ci] feat: move more tests to volcano engine (#3455) 2025-09-12 18:54:55 +08:00
33edd95e13 [worker] fix: respect free_cache_engine flag (#3442)
### What does this PR do?

Continuation of #1464

Now, recent changes have broken the `free_cache_engine` option again.
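
As a rough illustration of what respecting the flag means, the sketch below gates every `sleep`/`wake_up` call on `free_cache_engine`. The wrapper class `CacheAwareRollout` and the `FakeEngine` stand-in are hypothetical and are not taken from the PR.

```python
class FakeEngine:
    """Stand-in for an inference engine; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def sleep(self):
        self.calls.append("sleep")

    def wake_up(self):
        self.calls.append("wake_up")


class CacheAwareRollout:
    """Hypothetical wrapper that forwards sleep/wake_up only when the flag is set."""

    def __init__(self, engine, free_cache_engine: bool):
        self.engine = engine
        self.free_cache_engine = free_cache_engine

    def sleep(self):
        if self.free_cache_engine:
            self.engine.sleep()

    def wake_up(self):
        if self.free_cache_engine:
            self.engine.wake_up()
```

Routing all call sites through one guarded wrapper is one way to keep the option from regressing when new `sleep`/`wake_up` calls are added.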

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

Unit test cases might not be feasible as the `sleep`/`wake_up` call can
happen anywhere in the codebase. An end-to-end test might be
resource-consuming.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-09-11 22:49:55 +08:00
e160d3b2e0 [trainer] fix: Loss calculations for grad accumulation steps (#3332)
### What does this PR do?

For gradient accumulation over micro-batches, the loss should be
normalised before calling `loss.backward()`.
This PR also adds an optimisation so that the all-reduce of gradients is only
performed in the last accumulation step.

Verified by fine-tuning a few open-source models with the changes in this
PR.
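
The effect of the normalisation can be checked with a small, framework-free sketch. It assumes equally sized micro-batches, a scalar weight, and a mean-squared-error loss; all names are illustrative and none of this is the PR's code.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, micro_batches, normalize):
    """Sum per-micro-batch gradients, optionally scaling each loss by 1/N first."""
    n = len(micro_batches)
    total = 0.0
    for xs, ys in micro_batches:
        scale = 1.0 / n if normalize else 1.0
        total += scale * mse_grad(w, xs, ys)
    return total


w = 0.5
micro_batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
full_xs = [x for xs, _ in micro_batches for x in xs]
full_ys = [y for _, ys in micro_batches for y in ys]

full = mse_grad(w, full_xs, full_ys)  # gradient on the whole batch
normalized = accumulated_grad(w, micro_batches, normalize=True)
unnormalized = accumulated_grad(w, micro_batches, normalize=False)
```

Here `normalized` matches `full`, while `unnormalized` is larger by a factor equal to the number of accumulation steps. For the second change, PyTorch's `DistributedDataParallel.no_sync()` context manager is a common way to skip the gradient all-reduce on all but the last step; whether the PR uses it is not stated in the description.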


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-11 22:46:18 +08:00
9bbe745f80 [trainer] feat: VL support freeze vision model (#3178)
### What does this PR do?

Adds support for freezing the vision model when training VL models.
Issue: [2526](https://github.com/volcengine/verl/issues/2526)
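
A minimal sketch of freezing by parameter-name prefix. It works on any iterable of `(name, parameter)` pairs, such as the output of PyTorch's `named_parameters()`. The helper name `freeze_by_prefix` and the `visual.` prefix are assumptions; the real prefix depends on the model implementation, and the PR's own mechanism is not shown in the description.

```python
from types import SimpleNamespace


def freeze_by_prefix(named_parameters, prefixes):
    """Disable gradients for every parameter whose name starts with a given prefix."""
    frozen = []
    for name, param in named_parameters:
        if name.startswith(tuple(prefixes)):
            param.requires_grad = False
            frozen.append(name)
    return frozen


# Stand-in parameters so the sketch runs without a deep-learning framework.
params = {
    "visual.blocks.0.attn.qkv.weight": SimpleNamespace(requires_grad=True),
    "model.layers.0.mlp.up_proj.weight": SimpleNamespace(requires_grad=True),
}
frozen = freeze_by_prefix(params.items(), ["visual."])
```

After the call only the vision-tower parameter has `requires_grad` set to `False`, so an optimizer built from the trainable parameters would skip it.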


### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.


The run `qwen2_vl_7b_function_rm_1756093906` uses vision-freeze mode.

<img width="4374" height="2086" alt="image"
src="https://github.com/user-attachments/assets/107772e4-039d-4ec5-b193-54688f4a7176"
/>


### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mighten Dai <mighten@outlook.com>
2025-09-11 18:17:21 +08:00
f6b09acef4 [worker, sglang] feat: support generative reward model (server mode) (#3441)
### What does this PR do?

Following https://github.com/volcengine/verl/pull/3352, the current
implementation of the reward model supports both discriminative and
generative models.

For the newly supported generative models, users should specify a customized
data processor to (1) convert the rollout into the GenRM chat template
(including the question, response, and optional ground truth), and (2) convert
GenRM responses into final reward scores. These arguments are passed as
`reward_config.data_processor_config.{path/preprocess_fn_name/postprocess_fn_name}`.
A demo implementation can be found in
`tests/workers/reward_model/process_fn.py`.
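
The two processor steps might look like the sketch below. The function signatures, the prompt wording, and the `Score: <number>` convention are all assumptions made for illustration; the actual interface is defined by the demo file referenced above.

```python
import re


def preprocess_fn(question, response, ground_truth=None):
    """Turn one rollout into chat messages for the generative reward model."""
    prompt = f"Question:\n{question}\n\nResponse:\n{response}\n"
    if ground_truth is not None:
        prompt += f"\nReference answer:\n{ground_truth}\n"
    prompt += "\nRate the response from 0 to 10 and end with 'Score: <number>'."
    return [{"role": "user", "content": prompt}]


def postprocess_fn(genrm_response):
    """Parse the last 'Score: <number>' in the GenRM output into a reward in [0, 1]."""
    matches = re.findall(r"Score:\s*(\d+(?:\.\d+)?)", genrm_response)
    if not matches:
        # No parsable score: fall back to the lowest reward.
        return 0.0
    return min(max(float(matches[-1]) / 10.0, 0.0), 1.0)
```

Taking the last match lets the model reason before committing to a score, and clamping keeps malformed outputs such as `Score: 25` inside the reward range.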

Usage of server-mode RMs is demonstrated in
`tests/workers/reward_model/test_discriminative_reward_model.py` and
`tests/workers/reward_model/test_generative_reward_model.py`,
respectively.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pull/2845 (fsdp/megatron mode)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-11 14:05:02 +08:00
4c7812c40b [doc] fix: table column in document (#3430)
### What does this PR do?

Added a missing column in the Qwen3-30B-A3B MoE section.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

n/a

### API and Usage Example

n/a

### Design & Code Changes

n/a
### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-10 13:14:25 +08:00
e48ccf9b97 [doc] feat: add SimpleVLA-RL link in readme (#3433)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: ByteDance <wangjerry@bytedance.com>
Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-09-10 13:13:48 +08:00
b5a5e88fb3 [worker] refactor: move the implementation of rm to workers.roles and polish (#3423) 2025-09-10 05:38:02 +08:00
dfa3933ac4 [tool] feat: support local gsm8k dataset in example/data_preprocess (#3362) 2025-09-09 22:29:56 +08:00
5c46f4f437 [model] feat: replace DataProto with TensorDict in engine (#3422) 2025-09-09 22:28:25 +08:00
a4d8952edc [fsdp, recipe] feat: add grpo reward model example using HH-RLHF dataset (#3417)
### What does this PR do?

An example of training a model with GRPO using a SOTA Bradley-Terry (BT) reward model.

- Reward Model:
[Skywork/Skywork-Reward-V2-Llama-3.1-8B](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B)
- Dataset:
[Dahoas/full-hh-rlhf](https://huggingface.co/datasets/Dahoas/full-hh-rlhf)


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

- Wandb training curve:

<img width="2004" height="614" alt="image"
src="https://github.com/user-attachments/assets/c6dc9003-7b59-43af-8ff4-560114fe5b10"
/>

- AlpacaEval 2.0 eval results:

| Model Name | AlpacaEval LC Win-rate | Win-rate |
|:------|:-------:|:-------:|
| mistralai/Mistral-Nemo-Instruct-2407    | 42.24 | 38.68 |
| mistral12b_skyworkllama8b_grpo_hhrlhf   |  **68.20** | **68.29** |

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-09 17:11:08 +08:00
c4f4caf0cd [misc] feat: prototype deprecate DataProto and replace with Tensordict: part 1 (#2733)
### What does this PR do?

- Add TensorDict utilities and tests to cover the current DataProto
functionalities.
- Add nested tensor example to remove padding throughout the system
- Add image example
- Upgrade tensordict to v0.10

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-09 14:47:32 +08:00
eaf20fff88 [recipe] fix: Add gts argument for recipe _dump_generations (#3348)
### What does this PR do?

PR https://github.com/volcengine/verl/pull/2353 did not update every
`_dump_generations` in the recipe code. This PR adds the `gts` argument
to the ones that were missed.
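
As a rough illustration of what carrying `gts` (ground truths) through a dump helper can look like, here is a minimal standalone sketch. The function name `dump_generations`, its parameters, and the JSONL field names are assumptions made for this example and are not taken from the PR.

```python
import json
import os


def dump_generations(inputs, outputs, gts, scores, dump_path, step):
    """Write one JSON line per sample, including its ground truth (gts).

    Hypothetical helper for illustration only; not verl's implementation.
    """
    n = len(inputs)
    if not (len(outputs) == len(gts) == len(scores) == n):
        raise ValueError("inputs, outputs, gts, and scores must have equal length")

    os.makedirs(dump_path, exist_ok=True)
    filename = os.path.join(dump_path, f"{step}.jsonl")
    with open(filename, "w", encoding="utf-8") as f:
        for i in range(n):
            record = {
                "input": inputs[i],
                "output": outputs[i],
                "gts": gts[i],  # the field a caller would omit if it was not updated
                "score": scores[i],
                "step": step,
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return filename
```

A caller that still uses the older argument list would fail with a `TypeError` here, which is the kind of mismatch this fix addresses.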

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-09 13:11:40 +08:00
662fae30e6 [rollout] fix: raise error if processing multimodal data without vlm processor (#3370)
Fix https://github.com/volcengine/verl/issues/3234
2025-09-09 13:10:48 +08:00
c410364ebf [rollout] chore: Add enable_prefix_caching into config (#3395)
### What does this PR do?

Adds `enable_prefix_caching` to `RolloutConfig` so that prefix caching
can be turned off. Prefix caching provides no significant performance
benefit in short-input-long-output scenarios (e.g., 2k input to 34k
output) and occupies device memory.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### API and Usage Example

To disable prefix caching, set:

```bash
actor_rollout_ref.rollout.enable_prefix_caching=False 
```
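
The following Python sketch shows one way such a flag might flow from a config object into the keyword arguments for an inference engine. `RolloutConfigSketch` and `build_engine_kwargs` are hypothetical names for this example, and the default of `True` is an assumption inferred from the override above rather than something the PR states.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RolloutConfigSketch:
    """Hypothetical stand-in for the rollout config, not verl's actual class."""

    # Assumed default: prefix caching stays on unless explicitly disabled.
    enable_prefix_caching: bool = True


def build_engine_kwargs(cfg: RolloutConfigSketch) -> dict:
    """Collect the keyword arguments that would be handed to the inference engine."""
    return {"enable_prefix_caching": cfg.enable_prefix_caching}


default_cfg = RolloutConfigSketch()
# Similar in spirit to actor_rollout_ref.rollout.enable_prefix_caching=False
long_output_cfg = replace(default_cfg, enable_prefix_caching=False)
print(build_engine_kwargs(long_output_cfg))
```

Keeping the flag in the config means a short-input-long-output job can reclaim the memory used by the prefix cache without code changes.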

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-09 11:03:04 +08:00
eada037455 [vllm] fix: use VLLM_SLEEP_LEVEL=1 on ASCEND NPU (#3355) 2025-09-09 10:07:53 +08:00
7430285068 [ci] refactor: add ci test for refactored reward worker and add some args to GenRM config (#3385)
### What does this PR do?

- Add a CI test for the new reward model: an accuracy check that compares
the results of the server-mode RM with those of the HF RM.
- Add some arguments to the GenRM config (e.g., `reward_type`, sampling parameters).
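
An accuracy check of this kind can be sketched as a tolerance comparison between two score lists. The helper name `scores_match` and the tolerance values are assumptions for illustration; the PR does not state which thresholds its test uses.

```python
import math


def scores_match(server_scores, hf_scores, rel_tol=1e-2, abs_tol=1e-3):
    """Return True if both backends produced the same scores within tolerance.

    Hypothetical helper; the tolerances are placeholders, not values from the PR.
    """
    if len(server_scores) != len(hf_scores):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(server_scores, hf_scores)
    )


# Small numerical drift between backends is accepted, large gaps are not.
print(scores_match([0.5, 1.0], [0.5004, 1.001]))
print(scores_match([0.5], [0.6]))
```

A tolerance is used instead of exact equality because different serving stacks can produce slightly different floating-point results for the same model.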

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-09 09:59:44 +08:00
ce037bd8cc [doc] fix: edit one step off policy readme with original work (#3414)
### What does this PR do?

Updates the readme for the one-step off-policy async trainer with the
original reference paper

I introduced async RL training for LLMs in my paper
https://arxiv.org/abs/2410.18252. My method is exactly the one step
off-policy async replicated here and was on arxiv 7 months before AReal
and published at ICLR. AReal's method is different (fully async replay
buffer) but it includes a nice graphic of my setup!

Love to see you're getting similar speedups to my results and this is a
great recipe!

Figure 2 from my paper

<img width="1040" height="611" alt="image"
src="https://github.com/user-attachments/assets/0812fd40-daae-4346-bb72-85bc526bd3fa"
/>

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pull/2591
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-09 09:26:47 +08:00
491e636a8a [trainer] fix: avoid loading duplicated custom reward function to fix issue #3150 (#3404) 2025-09-09 06:57:55 +08:00
62549582a7 [model] feat: polish megatron engine (#3401)
### What does this PR do?

- Provide the best `prepare_dynamic_batch` parameters to the FSDP and
Megatron engines
- Reuse `prepare_micro_batches` in Megatron engine

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-08 19:42:43 +08:00
a1542172d5 [model] refactor: polishing FSDP model engine (#3394)
### What does this PR do?

- Extract a separate `prepare_micro_batches`
- Fix `prepare_dynamic_batch`
- Make `forward_step` more modular
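
For readers unfamiliar with dynamic batching, the sketch below shows the general idea of splitting a batch into micro-batches under a token budget. It is a simplified greedy grouping written for this note; the function name `prepare_micro_batches_sketch` and its parameters are assumptions, and the engine's real `prepare_micro_batches` may partition differently.

```python
def prepare_micro_batches_sketch(seq_lens, max_tokens_per_batch):
    """Greedily group sample indices so each micro-batch fits a token budget.

    Illustrative only: samples are taken in order, and a new micro-batch is
    started whenever adding the next sample would exceed the budget.
    """
    batches = []
    current = []
    current_tokens = 0
    for idx, n_tokens in enumerate(seq_lens):
        if n_tokens > max_tokens_per_batch:
            raise ValueError(f"sample {idx} alone exceeds the token budget")
        if current and current_tokens + n_tokens > max_tokens_per_batch:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n_tokens
    if current:
        batches.append(current)
    return batches


print(prepare_micro_batches_sketch([4, 3, 5, 2, 6], max_tokens_per_batch=8))
# → [[0, 1], [2, 3], [4]]
```

Grouping by token count rather than by a fixed number of samples keeps memory use per step roughly even when sequence lengths vary widely.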

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-08 14:42:53 +08:00
21dee53e85 [ci] fix: cpu unit test, etp config breaking change (#3390)
### What does this PR do?

[ci] fix: cpu unit test

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-08 13:30:43 +08:00
6159dee4e9 [model, megatron] feat: Add glm air support and make new model directly use mbridge (#3359)
### What does this PR do?

[model, megatron] feat: Add glm air support and make new model directly
use mbridge.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-08 09:48:50 +08:00
d26a913f43 [trainer] fix: Fix ClearML logging (#3384)
### What does this PR do?

Fix a typo and use `close` instead of `mark_completed`, because
`mark_completed` terminates the program.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-07 22:07:47 +08:00
3a89785f9a [deployment] Fix deepseek671B grpo script (#3383)
### What does this PR do?

The current script is not an actual GRPO script. This PR adds the missing
parameters.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-07 21:30:30 +08:00
c3f63ebe9c [misc] fix: set default value of ETP to 1 (#3371)
### What does this PR do?

As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-07 20:00:44 +08:00
f346f96d29 [training_utils] fix: stop using math naming under reward score" (#3378) 2025-09-07 09:57:24 +08:00
cb01f10ba0 [worker,sglang] refactor: deprecate fsdp/megatron reward model with server mode (#3352) 2025-09-06 23:45:41 +08:00
7bc70bbf0b [trainer] feat: add CI for accuracy alignment of SFT trainer with model engine (#3363)
### What does this PR do?

- Add CI for SFT trainer with various fsdp and megatron configurations
and make sure their output matches

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-06 10:48:28 +08:00
10054da277 [doc] fix: fix typo in skypilot_examples.rst (#3368) 2025-09-06 07:41:51 +08:00
0b533f7bcf [rollout, vllm, sglang] fix: allow user customization of repetition_penalty to avoid watchdog timeout during GRPO rollout (#3309)
Allow user customization of `repetition_penalty` to avoid watchdog
timeout during GRPO rollout

### What does this PR do?

This PR adds an interface for users to specify `repetition_penalty`,
which helps avoid repetition in LLM generation and prevents watchdog
timeouts during GRPO rollout. If not specified, `repetition_penalty`
will remain at its default value of `1.0`.


### Checklist Before Starting

- [X] Search for similar PRs. No similar PRs found.
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

This PR can be vetted by existing CI test cases.

### API and Usage Example

Previously, users could not specify `repetition_penalty`, but this PR
adds support for it.

For example, users can now start GRPO training with a command like:

```bash
python -m verl.trainer.main_ppo \
    +actor_rollout_ref.rollout.repetition_penalty=1.05 \
    # other params here...
```

### Design & Code Changes

This PR adds an interface allowing users to specify the
`repetition_penalty` (e.g., `1.05`), while maintaining backward
compatibility with the default value of `1.0`.
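
To illustrate why a value slightly above `1.0` discourages repetition, here is a minimal sketch of the commonly used repetition-penalty formulation (divide positive logits by the penalty, multiply negative ones). It is not the vLLM or SGLang implementation, which may differ in details; the function name `apply_repetition_penalty` is hypothetical.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.0):
    """Return a copy of `logits` with already-generated tokens penalized.

    Assumption: the widely used formulation where a positive logit is
    divided by `penalty` and a negative logit is multiplied by it, so a
    penalty > 1.0 always lowers the score of a repeated token.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] = out[tok] / penalty
        else:
            out[tok] = out[tok] * penalty
    return out


logits = [2.0, -1.0, 0.5]
# Tokens 0 and 1 were already generated; token 2 was not.
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=1.05)
# The default of 1.0 leaves the logits untouched (backward compatible).
unchanged = apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=1.0)
```

With `penalty=1.0` the logits are returned unchanged, which matches the backward-compatible default described above.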

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-05 12:43:17 +08:00
0e25bad451 [vllm] fix: verl + vllm-ascend(version 0.9.1) running failed issue (#3345)
### What does this PR do?

After
[pr#3285](19020f6188),
[issue 2564](https://github.com/volcengine/verl/issues/2564) began to
reappear. This PR follows the modification made in
[pr#2782](https://github.com/volcengine/verl/pull/2782), which resolves
[issue 2564](https://github.com/volcengine/verl/issues/2564) again.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[pr#2782](https://github.com/volcengine/verl/pull/2782)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-05 09:52:08 +08:00
e90f18c40a [model] feat: support ByteDance Seed-OSS 36B model (#3347)
### What does this PR do?

Support the ByteDance Seed-OSS 36B model:
1. Add RL and SFT examples
2. Support MFU metrics

Requirement:
`pip install "transformers>=4.56.0"`

Note: vLLM v0.10.0 does not support Seed-OSS, but it can fall back to
transformers to get it working.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

(TaskRunner pid=373084) step:2 - global_seqlen/min:6260 -
global_seqlen/max:11318 - global_seqlen/minmax_diff:5058 -
global_seqlen/balanced_min:8466 - global_seqlen/balanced_max:8468 -
global_seqlen/mean:8467.375 - actor/entropy:0.47251570224761963 -
actor/kl_loss:0.03297248564194888 - actor/kl_coef:0.001 -
actor/pg_loss:-0.0494408356025815 -
actor/pg_clipfrac:0.019900403218343854 -
actor/ppo_kl:0.020935473148711026 -
actor/pg_clipfrac_lower:9.349289757665247e-05 -
actor/grad_norm:0.47875913605093956 - perf/mfu/actor:0.2823303751694612
- perf/max_memory_allocated_gb:134.74115753173828 -
perf/max_memory_reserved_gb:141.615234375 -
perf/cpu_memory_used_gb:150.75712203979492 - actor/lr:1e-06 -
training/global_step:2 - training/epoch:0 - critic/score/mean:0.3515625
- critic/score/max:1.0 - critic/score/min:0.0 -
critic/rewards/mean:0.3515625 - critic/rewards/max:1.0 -
critic/rewards/min:0.0 - critic/advantages/mean:-0.023741308599710464 -
critic/advantages/max:0.7071057558059692 -
critic/advantages/min:-0.7071057558059692 -
critic/returns/mean:-0.023741308599710464 -
critic/returns/max:0.7071057558059692 -
critic/returns/min:-0.7071057558059692 -
response_length/mean:444.4296875 - response_length/max:1024.0 -
response_length/min:50.0 - response_length/clip_ratio:0.140625 -
response_length_non_aborted/mean:444.4296875 -
response_length_non_aborted/max:1024.0 -
response_length_non_aborted/min:50.0 -
response_length_non_aborted/clip_ratio:0.140625 -
response/aborted_ratio:0.0 - prompt_length/mean:84.78125 -
prompt_length/max:141.0 - prompt_length/min:54.0 -
prompt_length/clip_ratio:0.0 -
timing_s/start_profile:6.250300793908536e-05 -
timing_s/generate_sequences:21.979598999023438 -
timing_s/generation_timing/max:22.295286178588867 -
timing_s/generation_timing/min:21.753456115722656 -
timing_s/generation_timing/topk_ratio:0.125 -
timing_s/gen:39.58543623800506 - timing_s/reward:0.031087818002561107 -
timing_s/old_log_prob:17.46088112698635 - timing_s/ref:5.804751824995037
- timing_s/adv:0.003937039989978075 -
timing_s/update_actor:57.383965655986685 -
timing_s/step:120.27422251200187 -
timing_s/stop_profile:6.923600449226797e-05 -
timing_per_token_ms/gen:0.6958608511260053 -
timing_per_token_ms/ref:0.08569290696637147 -
timing_per_token_ms/adv:5.8120727940744256e-05 -
timing_per_token_ms/update_actor:0.8471333449857052 -
perf/total_num_tokens:67739 - perf/time_per_step:120.27422251200187 -
perf/throughput:70.40057980133741
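
As a quick sanity check on the log above, the reported numbers are mutually consistent if `perf/throughput` is read as tokens per second per rank. That reading, and the rank count derived below, are assumptions inferred from the logged values only, not from the verl source.

```python
# Values copied from the step-2 log line above.
total_num_tokens = 67739
time_per_step_s = 120.27422251200187
reported_seqlen_mean = 8467.375
reported_throughput = 70.40057980133741

# Assumption: global_seqlen/mean is the token count averaged over ranks,
# so dividing the total by the mean recovers the number of ranks.
n_ranks = round(total_num_tokens / reported_seqlen_mean)

# Assumption: throughput is tokens per second per rank.
throughput = total_num_tokens / (time_per_step_s * n_ranks)

print(n_ranks, throughput)
```

Under these assumptions the recomputed value agrees with the logged `perf/throughput` to within rounding.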

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-04 22:41:58 +08:00
72e88ecd79 [trainer] feat: support sft_trainer with model engine (#3341)
### What does this PR do?

- support sft_trainer with model engine
- fix engine interface to handle missing data from non-pp
- add gsm8k multi-turn dataset
- add left-right padding to MultiTurnDataset so that the data format of
SFT matches with RL
- add sft e2e runnable tests with fsdp and megatron backend
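
The left-right padding item can be pictured with the sketch below. It assumes the RL data layout in which prompts are left-padded and responses are right-padded, so the prompt/response boundary sits at a fixed index; the helper names (`left_pad`, `right_pad`, `build_sample`) and the `pad_id` value are hypothetical and not taken from `MultiTurnDataset`.

```python
def left_pad(ids, length, pad_id):
    """Pad `ids` on the left up to `length`."""
    if len(ids) > length:
        raise ValueError("sequence longer than target length")
    return [pad_id] * (length - len(ids)) + list(ids)


def right_pad(ids, length, pad_id):
    """Pad `ids` on the right up to `length`."""
    if len(ids) > length:
        raise ValueError("sequence longer than target length")
    return list(ids) + [pad_id] * (length - len(ids))


def build_sample(prompt_ids, response_ids, max_prompt_len, max_response_len, pad_id=0):
    """Concatenate a left-padded prompt with a right-padded response."""
    input_ids = (left_pad(prompt_ids, max_prompt_len, pad_id)
                 + right_pad(response_ids, max_response_len, pad_id))
    # Masks are derived from lengths, not from comparing against pad_id,
    # so a real token that happens to equal pad_id is still attended to.
    n_left = max_prompt_len - len(prompt_ids)
    n_right = max_response_len - len(response_ids)
    attention_mask = ([0] * n_left
                      + [1] * (len(prompt_ids) + len(response_ids))
                      + [0] * n_right)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


sample = build_sample([11, 12], [21, 22, 23], max_prompt_len=4, max_response_len=5)
```

With this layout the response always starts at index `max_prompt_len`, whatever the individual prompt length.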

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-04 19:40:41 +08:00
90acc8abc1 [doc] fix: Update skypilot_examples.rst (#3344)
### What does this PR do?

As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-04 19:01:59 +08:00
f356fc1e56 [deployment, doc] feat: Add SkyPilot integration examples (#3333)
### What does this PR do?

Adds SkyPilot integration examples for running verl training jobs
on Kubernetes/cloud platforms with GPUs. Includes configurations
for PPO, GRPO, and multi-turn tool usage training.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+skypilot
- [x] Format the PR title as `[{modules}] {type}: {description}`

### Test

Validated SkyPilot YAML configurations for Ray cluster
initialization, dataset downloading, and distributed training setup with
H100 GPUs.

### API and Usage Example

```bash
# Launch PPO training on 2 nodes
sky launch -c verl-ppo examples/skypilot/verl-ppo.yaml --secret WANDB_API_KEY -y

# Launch GRPO training
sky launch -c verl-grpo examples/skypilot/verl-grpo.yaml --secret WANDB_API_KEY -y

# Launch multi-turn tool usage training
sky launch -c verl-multiturn examples/skypilot/verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y
```

### Design & Code Changes

- Added 3 SkyPilot YAML configurations for PPO, GRPO, and
multi-turn training
- Added `examples/skypilot/README.md` with setup guide
- Added `docs/examples/skypilot_examples.rst` documentation
- Updated `docs/index.rst` and `docs/start/multinode.rst` with
references

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-04 16:56:00 +08:00
4d45c12408 [recipe] fix: (dapo_ray_trainer) use global_steps to determine is_last_step when resuming (gen_steps not restored) (#3336)
### What does this PR do?

- When resuming from a checkpoint, gen_steps is not correctly restored,
causing is_last_step to be misdetected.
- Switch is_last_step logic from gen_steps to self.global_steps to
remove the dependency on gen_steps.
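
The failure mode can be reproduced with a small sketch. The `TrainerState` class, the comparison against `total_training_steps`, and the reset of `gen_steps` to zero on resume are illustrative assumptions, not the actual `dapo_ray_trainer` code.

```python
from dataclasses import dataclass


@dataclass
class TrainerState:
    """Hypothetical stand-in for the counters kept by the trainer."""
    global_steps: int = 0
    gen_steps: int = 0
    total_training_steps: int = 10


def is_last_step_from_gen(state: TrainerState) -> bool:
    # Old behaviour (assumed): depends on a counter that is not restored.
    return state.gen_steps >= state.total_training_steps


def is_last_step_from_global(state: TrainerState) -> bool:
    # New behaviour (assumed): depends only on the restored global step.
    return state.global_steps >= state.total_training_steps


# Resume from a checkpoint saved after step 9: global_steps is restored,
# while gen_steps starts again from zero.
resumed = TrainerState(global_steps=9, gen_steps=0)
resumed.global_steps += 1
resumed.gen_steps += 1

print(is_last_step_from_gen(resumed), is_last_step_from_global(resumed))
```

After the resumed run finishes its final step, only the `global_steps`-based check reports the last step.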

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-04 11:30:40 +08:00
a8238d4745 [training_utils] fix: Using a non-tuple sequence for multidimensional indexing is deprecated (#3314) 2025-09-03 20:49:25 +08:00
47a483b620 [recipe] fix: bugfix of refactor omissions (#3328) 2025-09-03 20:48:30 +08:00
9ccaabf5ef [doc]Update README.md, add related works (#3331) 2025-09-03 20:45:54 +08:00
bc7c86398c [misc] feat: create issue template for verl (#3330) 2025-09-03 20:45:20 +08:00
d7a0469977 [model] feat: polish model engine (#3321) 2025-09-03 20:44:39 +08:00
1f533d65e2 [doc] feat: Adding PACS to the Awesome work (#3327) 2025-09-03 19:35:07 +08:00
2d6c6dbb39 [trainer] fix: Correct off-by-one error in SFT loss mask slicing (#3287)
### What does this PR do?

This PR fixes the SFT loss mask, which was shifted so that it always
masked the first generated token, causing the SFTed model to **generate
the wrong first token**.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
No similar PRs found
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

N/A for automated unit/integration tests. Manually verified the fix with
an overfitting experiment described below, as this logic bug is best
demonstrated through training behavior rather than a simple unit test.


### API and Usage Example

This change does not affect current veRL SFT training usage.

### Design & Code Changes
Before:
```python
loss_mask = batch.pop("loss_mask")[:, :-1].reshape(-1).to(self.device_name)
```
After:
```python
loss_mask = batch.pop("loss_mask")[:, 1:].reshape(-1).to(self.device_name)
```
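
The alignment can be checked on a toy sequence. The sketch below uses plain Python lists instead of tensors and hypothetical token names; it assumes the usual next-token setup in which the logits at position `t` are scored against the token at position `t + 1`, so the labels are `tokens[1:]`.

```python
# Two prompt tokens followed by two response tokens.
tokens = ["q0", "q1", "r0", "r1"]
# loss_mask marks response tokens, aligned with `tokens`.
loss_mask = [0, 0, 1, 1]

# Next-token prediction: position t predicts tokens[t + 1].
labels = tokens[1:]


def supervised(mask):
    """Return the labels that contribute to the loss under `mask`."""
    return [label for label, m in zip(labels, mask) if m]


old = supervised(loss_mask[:-1])  # mask aligned with the inputs
new = supervised(loss_mask[1:])   # mask aligned with the labels

print(old, new)
```

With `[:-1]` the first response token `r0` never contributes to the loss, while `[1:]` supervises both `r0` and `r1`.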

### Overfitting Experiments
We did a one-example overfitting SFT experiment using
`qwen2.5-1.5b-base` for 5 epochs to test the necessity and functionality
of this change. The training example is from reasoning data.
```
Input: 
  def reverse_string(s: str) -> str:\n    """\n    Returns the reverse of the input string.\n    >>> reverse_string("hello") "olleh" [...omitted input]
Output: 
  <think>Okay, I need to write a Python function called reverse_string that takes a string s and returns its reverse. Let\'s see. How do I reverse a string in Python? [...omitted output]
```
At inference time, the SFTed model is expected to output `<th`, i.e.,
the first token of `<think>` under the Qwen tokenizer, as its first
token.

**Before fix**
In the model inference, the top 10 first token probabilities are:
```
--- Top 10 Next Token Predictions ---
1. Token: 'Hmm' (ID: 80022) - Probability: 0.4090
2. Token: 'Let' (ID: 10061) - Probability: 0.0492
3. Token: 'The' (ID: 785) - Probability: 0.0473
4. Token: 'This' (ID: 1986) - Probability: 0.0373
5. Token: 'Okay' (ID: 32313) - Probability: 0.0362
6. Token: 'def' (ID: 750) - Probability: 0.0228
7. Token: 'So' (ID: 4416) - Probability: 0.0226
8. Token: 'We' (ID: 1654) - Probability: 0.0215
9. Token: '```' (ID: 73594) - Probability: 0.0153
10. Token: 'Oh' (ID: 11908) - Probability: 0.0118
-------------------------------------

Probability for token '<th' (ID: 13708): 9.012579539557919e-06
```
The top-1 token should be `<th`, yet it receives a very low probability.

**After fix**
The top-10 first-token probabilities are:
```
--- Top 10 Next Token Predictions ---
1. Token: '<th' (ID: 13708) - Probability: 1.0000
2. Token: '<' (ID: 27) - Probability: 0.0000
3. Token: '>' (ID: 29) - Probability: 0.0000
4. Token: 'def' (ID: 750) - Probability: 0.0000
5. Token: 'think' (ID: 26865) - Probability: 0.0000
6. Token: '<pre' (ID: 10120) - Probability: 0.0000
7. Token: '-th' (ID: 7563) - Probability: 0.0000
8. Token: '<td' (ID: 6868) - Probability: 0.0000
9. Token: '(th' (ID: 24365) - Probability: 0.0000
10. Token: '<thead' (ID: 58167) - Probability: 0.0000
-------------------------------------

Probability for token '<th' (ID: 13708): 0.9999651908874512
```
This is the expected result.
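
For reference, listings like the ones above can be produced by applying a
softmax to the logits at the last prompt position and keeping the `k`
largest entries. The sketch below is a pure-Python illustration with
made-up logits over a hypothetical four-token vocabulary; it is not the
script used in the experiment.

```python
import math


def top_k_probs(logits, k):
    """Softmax over a {token: logit} dict, returning the k most probable tokens."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda item: item[1], reverse=True)
    return [(tok, weight / total) for tok, weight in ranked[:k]]


# Made-up logits for illustration only; not taken from the experiment.
toy_logits = {"<th": 12.0, "Hmm": 2.0, "def": 1.0, "Let": 0.5}
top = top_k_probs(toy_logits, k=3)
for rank, (tok, prob) in enumerate(top, start=1):
    print(f"{rank}. Token: {tok!r} - Probability: {prob:.4f}")
```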

The figure below shows selected token logits during the overfitting run
after the fix (`<|endoftext|>` is the padding token):
<img width="2388" height="1386" alt="image"
src="https://github.com/user-attachments/assets/1c7149e7-6738-40ec-8164-f1ca614c1036"
/>


In summary, the previous SFT loss mask **was mistakenly offset by one
position, so the model failed to learn the first generated token**. The
trained model behaved as if one undesired noisy token were inserted after
the input question, as the top-10 first-token probabilities before the
fix show.


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-03 14:28:47 +08:00
02e06fa2e5 [trainer] fix: ray.state.available_resources_per_node is deprecated (#3313)
### What does this PR do?

Get rid of the following warning:
```log
DeprecationWarning: `ray.state.available_resources_per_node` is a private attribute and access will be removed in a future Ray version.
```

Getting the available resources per node has been a DeveloperAPI since
Ray v2.10.0, so this change should be safe:

04c7b49a91

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-09-03 14:23:46 +08:00
19020f6188 [rollout] feat: deprecate all rollout sharding manager (#3285)
### What does this PR do?

Deprecate all rollout sharding managers and replace them with
`trainer_mode` and `rollout_mode` in the hybrid worker.
2025-09-03 13:34:26 +08:00
1c6d9feff4 [single_controller, ray] fix: shut ray down after initializes it (#3317)
### What does this PR do?

To prevent Ascend NPU TBE errors caused by resource leakage, ensure that
`ray.shutdown()` is explicitly called after initializing Ray with
`ray.init()`.
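
One way to guarantee the paired shutdown is a small context manager. This
is only a sketch of the pattern, not the change made in this PR: the
helper name `runtime_session` is hypothetical, and stand-in callables are
used so the example runs without Ray installed.

```python
from contextlib import contextmanager


@contextmanager
def runtime_session(init, shutdown):
    """Call init() on entry and always call shutdown() on exit."""
    init()
    try:
        yield
    finally:
        shutdown()  # runs even if the body raises


# Stand-in callables so the sketch runs without Ray installed.
events = []
with runtime_session(lambda: events.append("init"), lambda: events.append("shutdown")):
    events.append("work")
print(events)
```

With Ray available, the same helper could be used as
`with runtime_session(ray.init, ray.shutdown): ...`.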

Address the first issue in
https://github.com/volcengine/verl/issues/3316

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: lantian7 <liuchun22@huawei.com>
2025-09-03 10:51:36 +08:00
2bef4acb73 [ci, model] feat: add qwen3 CI testcase on ASCEND NPU (#3300)
### What does this PR do?

- Add a Qwen3-0.6B GRPO test case in `tests/special_npu`.
- Set `use_torch_compile=False` for the test cases, since the torch_npu
version in the test image (2.5.1) doesn't support compile mode.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-03 10:51:17 +08:00
844c9299d6 [BREAKING][rollout] feat: Added asynchronous reward model calculation in agent loop (#3152)
### What does this PR do?

> This PR builds on
[PR#3055](https://github.com/volcengine/verl/pull/3055). The agent loop
previously supported only asynchronous reward function calculation; this
PR adds asynchronous reward model calculation.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> To use this feature, add the following options to the launch script
configuration:


```python
    reward_model.enable_resource_pool=True 
    reward_model.n_gpus_per_node=1 
    reward_model.nnodes=1 
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-02 19:25:05 +08:00
ef43469162 [doc] fix: add rStar2-Agent as work using verl (#3298)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: ByteDance <wangjerry@bytedance.com>
2025-09-02 16:36:32 +08:00
abe5e719ee [perf] feat: add npu silu &expand the scope of patch models (#3260)
### What does this PR do?

- Add npu optimized silu.
- Patch silu and RMSNorm for more models.
- Refresh the performance of Qwen3-8B PEFT SFT.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-02 16:35:53 +08:00
f1aeb929c7 [rollout] feat: Refactor agentloop multiturn (#3171)
Refactor the agent loop's if-else-based logic into a state machine
pattern, with a strong focus on reusability.
1. Add Interaction in toolagentloop
2. Refactor agentloop to FSM
3. Designed for reusability
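
As a rough illustration of the pattern (not the actual implementation),
each state can be a handler that returns the next state, so the loop
itself contains no if-else chain. The state names, context keys, and
handlers below are all hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    GENERATING = auto()
    CALLING_TOOL = auto()
    INTERACTING = auto()
    DONE = auto()


def generating(ctx):
    """Pretend to generate a turn, then decide where to go next."""
    ctx["turns"] += 1
    if ctx["turns"] >= ctx["max_turns"]:
        return State.DONE
    return State.CALLING_TOOL if ctx["pending_tool_calls"] else State.INTERACTING


def calling_tool(ctx):
    """Consume one pending tool call and go back to generation."""
    ctx["log"].append(f"tool:{ctx['pending_tool_calls'].pop(0)}")
    return State.GENERATING


def interacting(ctx):
    """Record one interaction step and go back to generation."""
    ctx["log"].append("interaction")
    return State.GENERATING


HANDLERS = {
    State.GENERATING: generating,
    State.CALLING_TOOL: calling_tool,
    State.INTERACTING: interacting,
}


def run(ctx):
    """Drive the machine until a handler returns State.DONE."""
    state = State.GENERATING
    while state is not State.DONE:
        state = HANDLERS[state](ctx)
    return ctx


result = run({"turns": 0, "max_turns": 3, "pending_tool_calls": ["search"], "log": []})
print(result["log"])
```

Adding a new behavior then means adding a state and a handler rather than
another branch in the loop.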
2025-09-02 09:38:11 +08:00
91ee0a2c08 [fsdp, model] feat: support FSDP model engine (#3270)
### What does this PR do?

- Support FSDPEngine and FSDPEngineWithLMHead
- Add tests showing that the FSDP engine matches mcore and Hugging Face
on the Qwen2.5 0.5B model

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: ziheng.jiang <ziheng.jiang@bytedance.com>
2025-09-01 16:17:45 +08:00
c780fc34b4 [fsdp] feat: add NPU fusion kernels for Qwen3 MoE (#3221)
### What does this PR do?

This PR adds the following NPU fusion kernels to the Qwen3 MoE model in
Transformers: GroupedMatMul, SwiGLU, RMSNorm and RoPE.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=fusion++npu+moe
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

Tested with Qwen3-30B-A3B sp8 fsdp32 on Ascend A2:
Without kernel fusion:
<img width="1832" height="468" alt="image"
src="https://github.com/user-attachments/assets/a8632a94-3a27-46f6-b408-2ebc09a37aa3"
/>
With kernel fusion:
<img width="1842" height="440" alt="image"
src="https://github.com/user-attachments/assets/50b8cc21-6720-42bc-9a9d-ae684f4cb0bf"
/>

Test results with train_prompt_bsz=512, sp8, fsdp32 on Ascend A2. The
orange line represents GPU and the pink line represents NPU; the max
absolute error in reward is less than 5%.
<img width="718" height="444" alt="image"
src="https://github.com/user-attachments/assets/3d4a47f6-fb91-40a6-a8e6-bf39545f8375"
/>
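
A check of this kind could be sketched as follows. The PR does not state
how the error was computed, so this assumes a per-step relative difference
against the GPU curve; the function name and the reward values are made up
for illustration.

```python
def max_relative_error(reference, candidate):
    """Largest per-step |candidate - reference| / |reference| across two curves.

    Assumes every reference value is nonzero.
    """
    if len(reference) != len(candidate):
        raise ValueError("curves must have the same number of steps")
    return max(abs(c - r) / abs(r) for r, c in zip(reference, candidate))


# Made-up reward values for illustration; not the measurements from this PR.
gpu_reward = [0.40, 0.50, 0.60, 0.70]
npu_reward = [0.41, 0.49, 0.62, 0.70]

err = max_relative_error(gpu_reward, npu_reward)
print(f"max relative error: {err:.2%}")
```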


### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: Shangwei-Li <lishangwei2@huawei.com>
2025-09-01 11:41:49 +08:00
fd1a121324 [hardware] fix: update source in dockerfile.rocm (#3284)
### What does this PR do?

> Update the resource in `Dockerfile.rocm`

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

```
docker build -f Dockerfile.rocm -t verl-rocm:local .
docker run --rm -it verl-rocm:local python -c "import torch; print('ok')"
```

### Design & Code Changes

> Update the resource in `Dockerfile.rocm`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-09-01 11:32:44 +08:00
14227201ec [training_utils] fix: allow empty image_key/video_key in rl dataset (#3281) 2025-08-31 17:35:00 +08:00
98676e8add [misc] fix: use uid for grouping in validation to avoid prompt confusion in multimodal tasks (#3280)
### What does this PR do?

Fix #3238. Follow #2815. 

#2815 appears to have stalled. This PR switches from grouping by text
prompt to grouping by uid when calculating validation metrics.
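
The difference can be seen in a small sketch. In multimodal tasks two
distinct examples may share the same text prompt, so grouping by prompt
merges them. The helper name and sample data below are hypothetical and
do not reproduce the trainer code.

```python
from collections import defaultdict


def mean_score_per_group(samples, key):
    """Average the scores of samples that share the same value for `key`."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample["score"])
    return {group: sum(scores) / len(scores) for group, scores in groups.items()}


# Two different images share the same text prompt (hypothetical data).
samples = [
    {"uid": "a", "prompt": "Describe the image.", "score": 1.0},
    {"uid": "a", "prompt": "Describe the image.", "score": 0.0},
    {"uid": "b", "prompt": "Describe the image.", "score": 1.0},
    {"uid": "b", "prompt": "Describe the image.", "score": 1.0},
]

by_prompt = mean_score_per_group(samples, "prompt")  # merges both examples
by_uid = mean_score_per_group(samples, "uid")        # keeps them separate
print(by_prompt)
print(by_uid)
```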

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pull/2815.
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Maxwell-Jia <mr.minghui.jia@gamil.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-31 10:13:34 +08:00
f9035b7016 [data] fix: None has no attribute get when extra_info in Parquet is NaN (#3272)
### What does this PR do?

This PR fixes a bug in `rl_dataset.py`.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

All modifications can be covered with existing CI test cases.

### API and Usage Example

API and usage remain the same.

### Design & Code Changes

This PR substitutes a default dict when `extra_info` is None, which
happens when the `extra_info` field in the Parquet file is NaN.
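
A minimal sketch of the guard described above, assuming each row is a plain dict and that an empty `extra_info` cell is read back as either `None` or a float NaN. The function name `normalize_extra_info` and the `index` key are hypothetical and are not taken from `rl_dataset.py`.

```python
import math
from typing import Any


def normalize_extra_info(row: dict[str, Any]) -> dict[str, Any]:
    """Return the row's `extra_info` as a dict, even if the cell was empty."""
    extra_info = row.get("extra_info")
    # An empty cell may come back as None or as a float NaN; neither
    # supports `.get`, so fall back to an empty dict.
    if extra_info is None or (isinstance(extra_info, float) and math.isnan(extra_info)):
        return {}
    return extra_info


row = {"prompt": "2 + 2 = ?", "extra_info": float("nan")}
# Hypothetical key lookup; this no longer raises AttributeError.
index = normalize_extra_info(row).get("index", 0)
```

Normalizing once at load time keeps every later `.get` call on `extra_info` safe without repeating the check.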

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-30 09:50:26 +08:00
a73b2aba85 [worker] fix: Fix missing rollout_log_probs argument in policy loss functions (#3274)
### What does this PR do?


In the recent PR:

- https://github.com/volcengine/verl/pull/2953,

the file `workers/actor/dp_actor.py` was updated so that
`rollout_log_probs` is passed to `policy_loss_fn`:


38d23914ee/verl/workers/actor/dp_actor.py (L448-L456)

In that PR, the "vanilla" policy loss function was modified to accept
`rollout_log_probs` as an argument. However, other policy loss functions
(e.g., "gspo") were not updated accordingly, which leads to an error
such as:

```
TypeError: compute_policy_loss_gspo() got an unexpected keyword argument 'rollout_log_probs'
```

when setting `config.policy_loss.loss_mode` to one of these
alternatives.

Therefore, in this PR, `rollout_log_probs` is also added as an argument
to the other policy loss functions.
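
The failure mode can be illustrated with a toy registry, assuming the caller looks up a loss function by name and always forwards `rollout_log_probs` as a keyword. The registry, the function names, and the arithmetic below are hypothetical placeholders, not the actual verl loss implementations.

```python
from typing import Callable, Optional, Sequence

LossFn = Callable[..., float]
POLICY_LOSS_REGISTRY: dict[str, LossFn] = {}  # hypothetical registry


def register(name: str) -> Callable[[LossFn], LossFn]:
    def wrap(fn: LossFn) -> LossFn:
        POLICY_LOSS_REGISTRY[name] = fn
        return fn
    return wrap


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


@register("vanilla")
def toy_vanilla(log_prob, advantages, rollout_log_probs: Optional[Sequence[float]] = None) -> float:
    # Placeholder arithmetic; the real loss is more involved.
    return -_mean([lp * adv for lp, adv in zip(log_prob, advantages)])


@register("gspo")
def toy_gspo(log_prob, advantages, rollout_log_probs: Optional[Sequence[float]] = None) -> float:
    # Accepts the keyword even though this toy version ignores it.
    return -_mean(advantages) * _mean(log_prob)


def compute_loss(loss_mode: str, log_prob, advantages, rollout_log_probs=None) -> float:
    loss_fn = POLICY_LOSS_REGISTRY[loss_mode]
    # The caller always forwards rollout_log_probs, so every registered
    # function must accept it or this call raises TypeError.
    return loss_fn(log_prob=log_prob, advantages=advantages, rollout_log_probs=rollout_log_probs)


loss = compute_loss("gspo", [-1.0, -2.0], [1.0, 1.0], rollout_log_probs=[-1.0, -2.0])
```

Giving the new parameter a default of `None` in every registered function keeps the shared call site unchanged regardless of which loss mode is selected.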



### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-30 09:08:20 +08:00
e1603dc97f add gptoss grpo example script (#3212)
### What does this PR do?

Adds a script to run the gpt-oss 20B model with verl.


### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: RichardW <richard.junwang@bytedance.com>
Co-authored-by: GeLee-Q  <leege233@gmail.com>
Co-authored-by: zhaochenyang20  <zhaochen20@outlook.com>
2025-08-29 11:32:24 -07:00
4d24449193 [recipe] fix: Remove redundant parameters to resolve errors in the script caused by the latest Verl main branch. (#3252)
### What does this PR do?
Remove redundant parameters to resolve script errors caused by changes
on the latest verl main branch.
Related issue: [issue](https://github.com/volcengine/verl/issues/3248)

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Design & Code Changes
Removed the two unnecessary parameters **dp_model_parallel_size** and
**rollout_world_size** from the relevant files.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-29 21:48:42 +08:00
fc05070fa0 [ckpt] fix: TypeError when save VL model ckpt (#3268)
### What does this PR do?

Fixes #3267.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+checkpoint+
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: Maxwell-Jia <mr.minghui.jia@gamil.com>
2025-08-29 21:41:09 +08:00
cc2799b235 [hardware] fix: Call synchronization when using the td.to("cpu") operation on NPU to avoid potential precision issues (#3222)
### What does this PR do?

In verl, the driver process aggregates the workers' computation results
via Ray. After a worker finishes its computation, it packages the output
with tensordict and transfers it to the CPU. Because tensordict's `to`
operation is non-blocking, a device-to-CPU transfer must be fully
complete before the host uses the data; otherwise, unexpected precision
issues may arise. The tensordict project has already identified and
fixed this problem. Ref:
https://github.com/pytorch/tensordict/issues/725

However, that fix only covers CUDA and MPS devices and does not take
effect for third-party devices such as NPUs. This patch adds the missing
synchronization for NPUs; it can be removed once an equivalent fix is
merged into tensordict.
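
The pattern can be sketched as follows, assuming the batch is a mapping of tensor-like objects and that the device backend exposes a `synchronize()` function (for example `torch.cuda`, or the NPU equivalent provided by the vendor's PyTorch plugin). The helper name `to_cpu_synced` is hypothetical and this is not the code from the patch.

```python
from typing import Any, Mapping, Protocol


class SupportsSynchronize(Protocol):
    def synchronize(self) -> None: ...


def to_cpu_synced(batch: Mapping[str, Any], device_module: SupportsSynchronize) -> dict[str, Any]:
    """Copy every entry of `batch` to the CPU, then block until the copies finish."""
    # Issue all device-to-host copies first; with non_blocking=True these
    # may return before the data has actually arrived in host memory.
    on_cpu = {name: value.to("cpu", non_blocking=True) for name, value in batch.items()}
    # Wait for the device to finish its queued work so the host never
    # reads a partially transferred buffer.
    device_module.synchronize()
    return on_cpu
```

Issuing all copies before a single synchronization keeps the transfers overlapped while still guaranteeing that the returned data is complete.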

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-28 20:42:24 +08:00
1065a29d14 [megatron, model] feat: add MegatronEngine, MegatronEngineForCausalLM (#3235) 2025-08-28 19:36:05 +08:00
e95bd9edf2 [sglang] feat: add native sgl server (#3090)
### What does this PR do?

**Summary**

This PR introduces a native HTTP server implementation for SGLang,
aiming to fundamentally improve flexibility, scalability, and
integration capabilities. By transitioning to a more robust
client-server architecture, this change addresses several core
bottlenecks in the current design.

**Key Changes**

* **Engine Replacement** – Replaced the original `sgl.Engine` instance
with a native HTTP server.  **Completed**
* **Distributed Optimization** – Utilizing a server-based architecture
to remove the requirement of gathering all data to TP rank 0. This
change resolves the previous `dist.barrier` timeout issue by replacing
the collective wait with per-sample synchronization. 🚧 **In Progress**
* **Router Integration** – Plan to integrate with the native SGLang
router for streamlined request handling. 💡 **Nice to have**

**Motivation**

The current `sgl.Engine` driver model presents several architectural
challenges, particularly in complex distributed environments. Moving to
an HTTP server architecture is motivated by the need to solve the
following critical issues:

1.  **Eliminate Data Flow Bottlenecks and Improve Performance:**
* **Problem:** The data flow logic of the existing driver process is
misaligned with the training data flow. It requires all data for a
single SGLang instance to be gathered to TP rank 0. This data is then
processed by the tokenizer manager and sent via ZMQ to the various
schedulers. As a result, the `preprocess` and `postprocess` steps are
slower than expected.
* **Solution:** The HTTP server architecture decentralizes this process,
allowing each rank to handle requests independently. This removes the
"gather to rank 0" bottleneck, dramatically improving data throughput
and overall performance.

2.  **Resolve CPU Resource Contention:**
* **Problem:** At the request level, the SGLang driver object cannot be
pickled for use in subprocesses. This limitation means that the
request-level asynchronous rollout logic and the engine itself are
forced to compete for the same CPU time slices, leading to performance
degradation.
* **Solution:** By decoupling the request handling (client) from the
inference engine (server), we isolate the processes, eliminating the CPU
contention and allowing for more efficient resource utilization.

3.  **Fix Distributed Synchronization Timeouts:**
* **Problem:** The `dist.barrier` timeout is a frequent issue where
worker ranks remain idle while waiting for TP rank 0 to complete its
intensive processing. This collective wait time creates inefficiency and
can lead to failures.
* **Solution:** The HTTP server model shifts this from a collective
barrier to a per-sample synchronization. Workers communicate with the
server as needed, removing the long wait times and making the
distributed setup more stable and efficient.
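
As a rough illustration of the client side of such a setup, the sketch below builds one HTTP request per sample using only the standard library. The `/generate` path, the `text` and `sampling_params` fields, and port 30000 are assumptions based on SGLang's native HTTP API and should be checked against the SGLang documentation; the helper names are hypothetical and this is not the PR's implementation.

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str, max_new_tokens: int = 32) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single sample."""
    payload = {
        "text": prompt,
        "sampling_params": {"temperature": 0.0, "max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, prompt: str) -> dict:
    """Send one request and wait only for this sample's response."""
    with urllib.request.urlopen(build_generate_request(base_url, prompt), timeout=60) as resp:
        return json.load(resp)


# Each rank can build and send its own requests; nothing is gathered to rank 0.
request = build_generate_request("http://127.0.0.1:30000", "The capital of France is")
```

Because each call waits only on its own response, a slow sample delays just the caller that issued it rather than every rank behind a collective barrier.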

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-28 12:40:19 +08:00
1e413344a2 [recipe] feat: Add InfiGUI-G1 recipe for MLLM GUI grounding (#3242)
### What does this PR do?

This PR introduces a new recipe, `infigui-g1`, for training Multimodal
Large Language Models (MLLMs) in GUI grounding tasks. This recipe
implements a reinforcement learning approach that significantly improves
the model's ability to understand and interact with graphical user
interfaces.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/search?q=repo%3Avolcengine%2Fverl+gui&type=pullrequests
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

The effectiveness of this recipe has been validated through experiments.
Key results are as follows:
- The training curves for reward, validation accuracy, and exploration
success rate all show an upward trend.
- After 156 steps of training on sample data, the 3B model achieves a
score of **41.2** on the `screenspot-pro` benchmark, a substantial
improvement over the base model's score of **18.2**.
<img width="345" height="291" alt="Screenshot 2025-08-27 172010"
src="https://github.com/user-attachments/assets/9ecd93d5-4f9b-4c40-831c-79a50fd197c4"
/>
<img width="347" height="292" alt="Screenshot 2025-08-27 171902"
src="https://github.com/user-attachments/assets/2e437c1f-9eb0-4106-a6c3-b22125026a79"
/>
<img width="346" height="293" alt="Screenshot 2025-08-27 171928"
src="https://github.com/user-attachments/assets/9c94515d-1501-40f4-979c-95e2f819dc62"
/>

### API and Usage Example

The recipe is self-contained and can be run using the provided scripts.
For example, to run training with the 3B parameter model:

```bash
# In verl path
bash recipe/infigui-g1/run_3b.sh
```

### Design & Code Changes

This PR adds a new, independent recipe located in `recipe/infigui-g1/`.
The changes are fully encapsulated within this directory and do not
affect any other part of the codebase.

The new files include:
- `recipe/infigui-g1/README.md`: An introduction to the recipe.
- `recipe/infigui-g1/run_3b.sh`, `run_7b.sh`: Scripts to launch
training.
- `recipe/infigui-g1/reward_fn.py`: Custom reward function
implementation.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-27 23:35:22 +08:00
53b68c638b [fsdp, training_utils] Fix: LoRA w/ VLMs when Using Layered Summon (#3231)
### What does this PR do?

Currently, LoRA parameters are not correctly streamed when training Qwen
2.5 VL with `layered_summon=True`. This is due to a missing prefix for
the Qwen VL models.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: "VLM
LoRA", "LoRA"
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Without this change, when using (1) Qwen VLM, (2) LoRA, (3) layered
summon, we see this log when weight updates are sent to vLLM:
```
(WorkerDict pid=424928) INFO:2025-08-26 22:22:24,788:vLLM load weights, loaded_params: 0
```

After:
```
(WorkerDict pid=424928) INFO:2025-08-26 22:22:24,788:vLLM load weights, loaded_params: 504
```


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-27 21:51:42 +08:00
b7df22ec51 [trainer] fix: Unified use of the def to() in Class DataProto (#3227)
Removed all `.to()` calls made directly on TensorDict instances; they
now go through the `to()` method defined on DataProto instead.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-27 19:55:08 +08:00
ff4f30b467 [doc] fix: fix slack invitation link (#3230) 2025-08-27 07:45:09 +08:00
b8dc5377c6 [BREAKING][vllm, fsdp] feat: add Rollout-Training Mismatch Fix -- Truncated importance sampling (#2953)
### What does this PR do?

Support [vLLM-FSDP off-policy importance sampling
correction](https://fengyao.notion.site/off-policy-rl) using Truncated
Importance Sampling (TIS):

<img width="859" height="382" alt="TIS"
src="https://github.com/user-attachments/assets/adc8f797-aa14-4b29-b265-a682c281d08e"
/>
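
As a rough illustration of the idea, the sketch below computes per-token truncated importance weights, assuming the usual TIS form `min(pi_train / pi_rollout, C)`. The function name and the list-based inputs are hypothetical and are not taken from this PR; only the cap `C` corresponds to the `behav_imp_weight_cap` option shown in the usage example.

```python
import math


def truncated_is_weights(train_log_probs, rollout_log_probs, cap=10.0):
    """Return per-token weights min(exp(lp_train - lp_rollout), cap).

    Hypothetical helper: `train_log_probs` are log-probs from the training
    engine, `rollout_log_probs` are log-probs reported by the rollout engine
    for the same sampled tokens.
    """
    if len(train_log_probs) != len(rollout_log_probs):
        raise ValueError("log-prob sequences must have the same length")
    weights = []
    for lp_train, lp_rollout in zip(train_log_probs, rollout_log_probs):
        ratio = math.exp(lp_train - lp_rollout)  # pi_train / pi_rollout
        weights.append(min(ratio, cap))          # truncate at the cap C
    return weights
```

Tokens on which both engines agree get a weight of 1, while tokens the rollout engine strongly under-weighted are clipped at the cap, so no single token can dominate the loss.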




### API and Usage Example

```shell
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=gae \
    data.train_files="$train_files" \
    data.val_files="$test_files" \
    data.train_batch_size=1024 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-32B-Instruct \
    actor_rollout_ref.model.enable_gradient_checkpointing=False \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.actor.use_kl_loss=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=True \
    critic.model.path=Qwen/Qwen2.5-32B-Instruct \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size_per_gpu=8 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger='["console","wandb"]' \
    trainer.project_name='verl_example' \
    trainer.experiment_name='Qwen2.5-32B-Instruct_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=4 \
    trainer.save_freq=20 \
    trainer.test_freq=10 \
    trainer.total_epochs=15 \
    actor_rollout_ref.rollout.calculate_log_probs=True \
    +actor_rollout_ref.actor.behav_imp_weight_cap=10.0 "$@"
# calculate_log_probs=True makes the rollout return its log-probs.
# behav_imp_weight_cap sets the truncation cap C used by TIS.
```

---------

Co-authored-by: Narsil-Dinghuai Zhang 张鼎怀 <dinghuai233@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: LiyuanLucasLiu <llychinalz@gmail.com>
2025-08-26 14:06:07 -07:00
5362d704be [rollout] fix: Restore the parameter 'limit_images' in RolloutConfig (#3217)
### What does this PR do?

- This PR adds the parameter `limit_images` in RolloutConfig. Users can
specify the image limit in vllm by setting
`+actor_rollout_ref.rollout.limit_images=xxx`

---------

Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-08-26 20:30:52 +08:00
9f0f8b0e7c [ci] fix: fix type convergence check (#3219)
### What does this PR do?

[ci] fix: fix type convergence check

2025-08-26 14:17:18 +08:00
27b63c724a [env, sglang] feat: Bump new sglang version to fix vlm OOM (#3216)
### What does this PR do?
- Bump sglang to a new version
- This sglang version fixes the VLM OOM issue; details are in:
https://github.com/sgl-project/sglang/issues/9365

### Test

Tested by following the instructions in
https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/release_log/latest_sglang.md

We now have the new version of sglang:
<img width="786" height="154" alt="image"
src="https://github.com/user-attachments/assets/bcec557e-196c-40c0-aa0f-c19d9f5c3e98"
/>

`gsm8k`:
using `verl/examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh`

[Wandb](https://wandb.ai/popsoda-university-of-washington/multi-turn-grpo-qwen2.5-3b-sglang/runs/dtcdin9b?nw=nwuserpopsoda)
<img width="532" height="329" alt="image"
src="https://github.com/user-attachments/assets/12f67d1a-a57e-497d-bfe5-6ff8c642e83f"
/>

It works well.

2025-08-26 13:29:36 +08:00
4ed7811813 [megatron] refactor: refactor MegatronPPOActor (#3206)
### What does this PR do?

- Print Megatron-related messages only on rank zero
- Remove unused code in the Megatron actor
- Modularize the Megatron loss computation so that it can also be used
for SFT

2025-08-26 10:41:57 +08:00
7592d69cbb [trainer] refactor: PPO config validation fast fail (#3187)
### What does this PR do?

Make the main PPO script validate the config as soon as all the needed
information is available, so that the script fails as fast as possible
when the config contains a bug.
With this change, the config is validated before the tokenizer is
downloaded and loaded and before the data is loaded.
Solves #3182.

### Design & Code Changes

Isolated config validation in utils (out of PpoRayTrainer) and call it
from main_ppo as soon as possible.
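
The fail-fast ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `PPOConfigSketch`, `validate_config_sketch`, `run_sketch` and the two specific checks are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class PPOConfigSketch:
    """Hypothetical subset of a PPO config, for illustration only."""
    train_batch_size: int
    ppo_mini_batch_size: int
    n_gpus_per_node: int
    nnodes: int


def validate_config_sketch(cfg):
    """Cheap checks that need only the config, not a tokenizer or dataset."""
    if cfg.n_gpus_per_node <= 0 or cfg.nnodes <= 0:
        raise ValueError("n_gpus_per_node and nnodes must be positive")
    if cfg.train_batch_size % cfg.ppo_mini_batch_size != 0:
        raise ValueError(
            "train_batch_size must be divisible by ppo_mini_batch_size"
        )


def run_sketch(cfg, load_tokenizer, load_dataset):
    """Validate first, then do the expensive loading steps."""
    validate_config_sketch(cfg)   # fails here, before any download or I/O
    tokenizer = load_tokenizer()  # expensive: may download model files
    dataset = load_dataset()      # expensive: reads the training data
    return tokenizer, dataset
```

With this ordering, a bad config raises before either loader is called, so a typo in the batch sizes costs seconds rather than the time needed to fetch a tokenizer and read the data.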
2025-08-26 10:31:39 +08:00
b4a410197c [doc] fix: fix a documentation typo for nsys (#3214)
### What does this PR do?

[doc] fix: fix a documentation typo for nsys

2025-08-26 10:11:15 +08:00
f67dc19503 [rollout] fix: apply copy_to_local before init hf config (#3204)
Change-Id: Ic0ddfdfa13a38a56571b9c59125e9ebeea5c7802

### What does this PR do?

- Fixed a bug where the original HDFS path was passed due to not using
`copy_to_local` when initializing the hf config.

---------

Co-authored-by: wangzhunheng <wangzhunheng@bytedance.com>
2025-08-26 09:26:00 +08:00
40bf9628ee [data] fix: update parquet_files type check to support multi-file input (#3211) 2025-08-26 05:18:59 +08:00
9b6a07fa77 [docker] feat: update to vllm 0.10.0, mcore 0.13, transformers 4.55.4 (#3192) 2025-08-26 05:17:57 +08:00
a5df7d31ea [perf] fix: fix profiler discrete mode unavailability (#3188)
### What does this PR do?

- Fix the issue where profiling cannot be collected in discrete mode,
for both NPU and nsys.
- Adjust the corresponding unit tests accordingly. 
- Adjust the npu profiler script due to changes in ref.yaml

In discrete mode, distribution is handled through the `annotate` class
method of the `DistProfiler` class in `verl/utils/profiler/profile.py`.
Adjust the `annotate` method of `NPUProfiler` and `NsightSystemsProfiler` to
be an instance method.
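
One way to picture this arrangement is sketched below. It is only an assumption about the general shape, not the code in `verl/utils/profiler/profile.py`: `BackendProfilerSketch`, `DistProfilerSketch`, `Worker` and the `profiler` attribute are all hypothetical. The class-level decorator looks up the worker's profiler at call time and delegates to the backend's instance-level `annotate`, so per-instance state such as the discrete flag is available.

```python
import functools


class BackendProfilerSketch:
    """Hypothetical stand-in for a backend such as an NPU or nsys profiler."""

    def __init__(self, discrete):
        self.discrete = discrete
        self.events = []

    def annotate(self, message):
        """Instance method: can read self.discrete when the call happens."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                if self.discrete:
                    self.events.append(f"start:{message}")
                try:
                    return func(*args, **kwargs)
                finally:
                    if self.discrete:
                        self.events.append(f"stop:{message}")
            return wrapper
        return decorator


class DistProfilerSketch:
    """Hypothetical dispatcher that owns one backend instance."""

    def __init__(self, impl):
        self._impl = impl

    @classmethod
    def annotate(cls, message):
        """Class-level decorator that defers to the backend instance."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(worker, *args, **kwargs):
                profiler = getattr(worker, "profiler", None)
                if profiler is None:  # profiling disabled on this worker
                    return func(worker, *args, **kwargs)
                wrapped = profiler._impl.annotate(message)(func)
                return wrapped(worker, *args, **kwargs)
            return wrapper
        return decorator


class Worker:
    def __init__(self, profiler):
        self.profiler = profiler

    @DistProfilerSketch.annotate("update_actor")
    def update_actor(self, x):
        return x + 1
```

Because the backend is resolved when the decorated method runs, the decorator can be applied at class-definition time, before any profiler instance exists.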

2025-08-25 19:39:31 +08:00
2398d36be3 [recipe] feat: Add Qwen3 30B MoE NPU recipe (#3189)
### What does this PR do?

> Update recipe/dapo/run_dapo_qwen3_30b_npu.sh.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: 
https://github.com/volcengine/verl/pulls?q=fsdp+npu+30b+recipe

### Test

Comparison chart of `critic/rewards/mean`: the orange line is Ascend NPU
and the pink line is GPU.
<img width="3182" height="1272" alt="image"
src="https://github.com/user-attachments/assets/5c275127-6cb3-4bf9-ac89-0fa6abb668c0"
/>


### API and Usage Example

```shell
# Add code snippet or script demonstrating how to use this
cd /path/to/verl
bash recipe/dapo/run_dapo_qwen3_30b_base_npu_fsdp.sh
```

Co-authored-by: Shangwei-Li <lishangwei2@huawei.com>
2025-08-25 19:38:23 +08:00
e243d6dd66 Revert "[rollout] feat: use dummy load_format when init AsyncServer" (#3207)
Reverts volcengine/verl#3184
2025-08-25 19:15:52 +08:00
11a43b6cad [env] fix: Improve License Check Hook Flexibility (#3202)
### What does this PR do?

Solve #3201

#### Problem
The existing license check hook scans all directories recursively from a
single root directory, which causes issues in local development
environments:

* Virtual environments (`.venv`, `venv/`) get scanned and fail license
checks
* No easy way to exclude common build/cache directories without
hardcoding exclusions
* Different behavior between local development (with venvs) and CI/CD
(clean environment)

#### Solution
Modified the `check_license.py` script to accept multiple target
directories instead of a single root directory with exclusions.

### Design & Code Changes
Changed argument from `--directory` to `--directories`
* Now accepts multiple `Path` arguments using `nargs="+"`
* Allows specifying exactly which directories to scan
* in local mode: `--directories examples recipe scripts tests verl
setup.py`
* in github workflow: `--directories .`
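
The argument change can be sketched with `argparse` as below. This is an illustrative sketch rather than the actual `check_license.py`: the helper names, the `required=True` setting and the `.py`-only file filter are assumptions; only the `--directories` flag, `Path` arguments and `nargs="+"` come from the description above.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse one or more scan targets (hypothetical sketch of the CLI)."""
    parser = argparse.ArgumentParser(description="License header check")
    parser.add_argument(
        "--directories",
        nargs="+",      # one or more values after the flag
        type=Path,      # each value is converted to a Path
        required=True,  # assumption: no implicit default root
        help="Directories or files to scan",
    )
    return parser.parse_args(argv)


def iter_python_files(targets):
    """Yield .py files from the given targets only (assumed filter)."""
    for target in targets:
        if target.is_file():
            if target.suffix == ".py":
                yield target
        elif target.is_dir():
            yield from sorted(target.rglob("*.py"))
```

Listing the targets explicitly means directories such as `.venv` are never visited, so no exclusion list has to be maintained.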
2025-08-25 16:50:15 +08:00
58c847b17f [doc] fix: set use_dist_checkpointing to False for ref model in qwen3moe-30b script (#3198)
### What does this PR do?

Set `use_dist_checkpointing` to False for the ref model in the
qwen3moe-30b script, because there is no `dist_megatron_ckpt` model path
for the ref model.
2025-08-25 12:33:24 +08:00
cb5818c6fc [rollout] fix: add missing extra_reward_info to AgentLoopOuput (#3194)
### What does this PR do?

Fix https://github.com/volcengine/verl/pull/3055: add the missing
`extra_reward_info` field to `AgentLoopOutput`, which is needed by the
metrics calculation.
2025-08-25 12:23:32 +08:00
7ff2386987 [rollout, sglang] feat: Add sync mode for bash (#3186)
### What does this PR do?
- Use `sync` mode for `dapo`, `gsm8k` and `geo`

2025-08-24 20:43:11 -07:00
2c7a9c5708 [rollout] feat: use dummy load_format when init AsyncServer (#3184)
### What does this PR do?

- Loading weights in AsyncServer is redundant and time-consuming for
large models
- Use dummy weights instead, since the actual weights will be transferred
by the trainer

2025-08-25 10:30:48 +08:00
28a3e418d8 [misc] feat: Add RL-PLUS to awesome work list (#3197)
### What does this PR do?


**Adding RL-PLUS to the README as a list of work that used veRL, with
only a 1-line change to the README.md.**


2025-08-25 09:33:33 +08:00
4ea0583bad [Optimize]Safe tool parameter access standardization in SGLang rollout (#3196)
Fix https://github.com/volcengine/verl/issues/3195

Changes:
1. 🔒 Replace all direct `dict[key]` access with the `.get(key, {})` pattern for
tool kwargs
2. Add validation in `_preprocess_prompt_to_async_rollout_requests`
3. 🧪 New test cases covering:
   - Missing tool configs
   - Partial `execute_kwargs`
   - Empty tool schemas

Impact:
- Prevents `KeyError` crashes when tools/kwargs are missing
- Maintains the existing flexible tool parameter system
- Zero breaking changes to valid configurations
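
The `.get(key, {})` pattern described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `get_tool_kwargs` and the `create_kwargs` key are assumptions made for the example; only `execute_kwargs` is named in the description.

```python
from typing import Any


def get_tool_kwargs(tools_kwargs: dict[str, Any], tool_name: str) -> tuple[dict, dict]:
    """Return (create_kwargs, execute_kwargs) for a tool, defaulting to empty dicts."""
    # A missing tool entry, or an entry explicitly set to None, becomes {}.
    tool_cfg = tools_kwargs.get(tool_name, {}) or {}
    # Hypothetical key names; each lookup falls back to {} instead of raising KeyError.
    create_kwargs = tool_cfg.get("create_kwargs", {}) or {}
    execute_kwargs = tool_cfg.get("execute_kwargs", {}) or {}
    return create_kwargs, execute_kwargs


# A tool with only partial kwargs no longer crashes the lookup.
print(get_tool_kwargs({"calc": {"execute_kwargs": {"precision": 2}}}, "calc"))
```

The `or {}` guard also covers configs where a key is present but set to `None`, which a bare `.get(key, {})` would pass through unchanged.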
2025-08-24 12:42:58 -07:00
3a394c9bd0 [recipe] fix: Setting DAPO baseline in SGLang multi-turn RL (#3175)
### What does this PR do?


This PR adds the DAPO baseline for SGLang multi-turn rollout. The
previous DAPO multi-turn baseline with retool did not actually
converge: its reward merely encouraged the model to generate more turns
and call more tools, while the answers themselves were not correct.

In this fix, we (SGLang RL Group) run a manual SFT and use the resulting model
`font-info/qwen3-4b-sft-SGLang-RL` instead of
`Qwen/Qwen3-4B-Instruct-2507`. Without fine-tuning, the model cannot
converge.

At the same time, we lower the reward cap in
retool from 0 to -0.6: `result["score"] = min(-0.6, result["score"] +
tool_call_reward)`. Thus, a model that cannot generate the correct
answer gets a score of at most -0.6 rather than 0. With these changes,
training converges in our demonstration.
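
The capping expression quoted above can be looked at in isolation. The sketch below is illustrative only: the function name `capped_score` and the example inputs are hypothetical, and how `tool_call_reward` is computed is not shown in this description; only the `min(-0.6, ...)` expression comes from the PR text.

```python
def capped_score(score: float, tool_call_reward: float, cap: float = -0.6) -> float:
    """Add the tool-call bonus, but never let the result exceed `cap`."""
    return min(cap, score + tool_call_reward)


# A large bonus is clipped at the cap, so extra tool calls alone cannot
# lift the score above -0.6.
print(capped_score(-1.0, 0.7))
# A small bonus stays below the cap and passes through unchanged.
print(capped_score(-1.0, 0.1))
```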

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: Zhuorany <yzr1914001753@gmail.com>
Co-authored-by: mao cheng <maocheng@berkeley.edu>
Co-authored-by: Hecate0821 <hec4te0821@gmail.com>
Co-authored-by: maocheng23 <maocheng@berkeley.edu>
2025-08-22 21:26:44 -07:00
bf56a2aa27 [megatron] feat: set_expandable_segments for megatron (#3181)
### What does this PR do?

- Adds `set_expandable_segments` for Megatron
- We use `set_expandable_segments` to mitigate memory fragmentation
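
For context, PyTorch's CUDA caching allocator exposes an `expandable_segments` option through the `PYTORCH_CUDA_ALLOC_CONF` environment variable. The sketch below shows that environment-variable route only; it is not the `set_expandable_segments` helper from this PR, whose implementation is not shown here, and the function name `enable_expandable_segments` is hypothetical.

```python
import os
from collections.abc import MutableMapping


def enable_expandable_segments(env: MutableMapping[str, str]) -> None:
    """Set `expandable_segments:True` in PYTORCH_CUDA_ALLOC_CONF, keeping other options."""
    existing = env.get("PYTORCH_CUDA_ALLOC_CONF", "")
    # Drop any previous expandable_segments entry so the option appears only once.
    options = [
        opt
        for opt in existing.split(",")
        if opt and not opt.startswith("expandable_segments:")
    ]
    options.append("expandable_segments:True")
    env["PYTORCH_CUDA_ALLOC_CONF"] = ",".join(options)


# Assumption: this runs at the top of the launch script, before CUDA is initialized.
enable_expandable_segments(os.environ)
```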

2025-08-23 10:49:44 +08:00
ce89063712 [misc] feat: Add L40S and A40 flop counts (#3177)
### What does this PR do?

Adds flop counts for more GPUs

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-22 17:31:46 +08:00
f6f910069b [doc] fix: add qwen3moe-30b script and fix error in qwen3-235b (#3174)
1. Add a qwen3moe-30b script for 1 to 4 H20 nodes with best performance
2. Fix errors in the qwen3-235b script:
  - vLLM `enable_expert_parallel` may result in invalid output
  - Megatron `num_layers_in_last_pipeline_stage` is a deprecated option
 
---------
Co-authored-by: Yan Bai <bayan@nvidia.com>
2025-08-22 13:59:24 +08:00
0e15c9b11c [sglang] fix: remove unused padding in SGLang rollout (#3138)
### What does this PR do?

Some unused padding is discussed in this issue:
https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/193
- Only the following key fields need to be returned after
rollout (example from `agent_loop`):
```python
batch = TensorDict(
    {
        "prompts": prompt_ids,  # [bsz, prompt_length]
        "responses": response_ids,  # [bsz, response_length]
        "response_mask": response_mask,  # [bsz, response_length]
        "input_ids": input_ids,  # [bsz, prompt_length + response_length]
        "attention_mask": attention_mask,  # [bsz, prompt_length + response_length]
        # position_ids: [bsz, 3, prompt_length + response_length] or [bsz, prompt_length + response_length]
        "position_ids": position_ids,
    },
    batch_size=len(inputs),
)
```
- Remove unused variables such as `prompt_loss_mask`
- Make `response_position_id` an all-zero tensor
- Copy the class to avoid constructing a new one

### Test

`over_sample = 0.1` 

[wandb](https://wandb.ai/popsoda-university-of-washington/multi-turn-grpo-qwen2.5-3b-sglang/runs/1p87zi7v?nw=nwuserpopsoda)

<img width="1555" height="680" alt="image"
src="https://github.com/user-attachments/assets/b837acab-824d-42c6-ad3d-8342d06397d1"
/>

No issue.

`over_sample = 0.0` 

[wandb](https://wandb.ai/popsoda-university-of-washington/multi-turn-grpo-qwen2.5-3b-sglang/runs/xloii5wm?nw=nwuserpopsoda)

<img width="1532" height="683" alt="image"
src="https://github.com/user-attachments/assets/fd69be47-8182-4461-86d0-86063e6f8e1a"
/>

As expected too

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
2025-08-21 14:01:50 +08:00
5b5e09d9cc [sglang] fix: fall back to default FSDP1 (#3156)

Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-08-20 15:04:59 -07:00
864ba99876 [fsdp, trainer, tool] feat: add memory snapshot & visualization support for debugging GPU memory leaks (#3099)
### What does this PR do?

This PR adds a memory snapshot and visualization tool to help identify
potential GPU memory leaks during training.

In some training runs, we observed increasing GPU memory usage across
steps, suggesting memory might not be properly released. To support
debugging, this PR enables:

* Periodic memory snapshot dumping via PyTorch's internal APIs.
* Manual snapshot dumping at key points (e.g., after each step).
* Easy integration with `torch.memory_viz` for post-hoc visualization.

---

### Checklist Before Starting

* [x] Search: [memory snapshot
PRs](https://github.com/volcengine/verl/pulls?q=is%3Apr+memory+snapshot)
* [x] Title: `[fsdp, trainer, tool] feat: add memory snapshot &
visualization support`

---

### Test

* Enabled `enable_memory_visualize` in config and verified snapshot
`.pickle` files are generated.
* Confirmed snapshot files work with `torch.memory_viz`.
* Validated both periodic and manual snapshot dumping.

---

### API and Usage Example

**Enable in config:**

```yaml
fsdp_config:
  enable_memory_visualize: true
  memory_snapshot_interval_sec: 300
  memory_snapshot_out_dir: "./mem_snapshots"
```

**Manually dump after each step:**

After each step, add a call like this:

```python
if self.config.actor_rollout_ref.actor.fsdp_config.enable_memory_visualize:
    self.actor_rollout_wg.dump_memory_snapshot(
        tag=f"post_update_step{self.global_steps}",
        sub_dir=f"step{self.global_steps}"
    )
```

---

### Design & Code Changes

* New FSDP config fields:
`enable_memory_visualize`, `memory_snapshot_interval_sec`,
`memory_snapshot_out_dir`
* New utility functions in `memory_utils.py`:

  * `enable_memory_visualize()`
  * `dump_memory_snapshot(...)`
  * `MemorySnapshotSampler` (background thread)
* Integrated into `FSDPWorkers` and training loop (`ray_trainer.fit()`)
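
The background-thread idea behind periodic dumping can be sketched with the standard library alone. This is a hedged illustration, not the PR's `MemorySnapshotSampler`: the class name `PeriodicSampler` and its interface are hypothetical, and the actual snapshot call is injected as `dump_fn` so the sketch stays independent of PyTorch.

```python
import threading
from typing import Callable


class PeriodicSampler:
    """Call `dump_fn` every `interval_sec` seconds on a daemon thread until stopped."""

    def __init__(self, dump_fn: Callable[[], None], interval_sec: float) -> None:
        self._dump_fn = dump_fn
        self._interval_sec = interval_sec
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout, so the body runs once per
        # interval and the loop exits promptly once stop() sets the event.
        while not self._stop.wait(self._interval_sec):
            self._dump_fn()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

In a training job, `dump_fn` would wrap whatever writes the snapshot file, for example a call into PyTorch's private `torch.cuda.memory._dump_snapshot`; whether the PR uses exactly that call is an assumption here.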

---------

Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: AniZpZ <aniz1905@gmail.com>
Co-authored-by: narutolhy <582909902@qq.com>
2025-08-20 14:07:04 -07:00
31771cfade [rollout] feat: add response token logprobs in agent loop output (#3151)
### What does this PR do?

Add response token logprobs in agent loop output
2025-08-20 23:19:52 +08:00
d2126e7afd [recipe] feat: support qwen2.5-32B DAPO training script on ASCEND NPU (#3146)
### What does this PR do?
Provides a script for DAPO training of qwen2.5-32B on NPU, and updates
the experiment results.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[[recipe] feat: support qwen3-8B/14B DAPO training on ASCEND
NPU](https://github.com/volcengine/verl/pull/2836)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test


The following are some comparison charts of relevant data from script
testing, where red represents NPU and blue represents GPU.

Critic/rewards/mean Comparison Chart
<img width="1314" height="714" alt="image"
src="https://github.com/user-attachments/assets/3c303100-7106-491b-a6ea-e0bd1926076c"
/>

Response_length/mean Comparison Chart
<img width="1322" height="714" alt="image"
src="https://github.com/user-attachments/assets/9fa01f6f-2774-4b07-a38b-71cb6b5c8359"
/>

Val-core/math_dapo/acc/mean@32 Comparison Chart (tested on AIME 2024)
<img width="1320" height="716" alt="image"
src="https://github.com/user-attachments/assets/b6912e3c-89c6-4999-90bb-fa961edc6e4a"
/>


### API and Usage Example


```bash
cd /path/to/verl
bash recipe/dapo/run_dapo_qwen2.5_32b_npu.sh
```


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-20 20:13:57 +08:00
012d972223 [fsdp, sglang] fix: Using Aggressive Empty Cache instead (#3136)

---------

Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
2025-08-20 19:32:48 +08:00
944264b583 [rollout] fix: KeyError "CPU" init agent loop workers (#3141)
### What does this PR do?


Fixes #3137 by accounting for Ray nodes that are configured with `num_cpus=0`.
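
A minimal sketch of the kind of guard involved, assuming node records shaped like the dicts returned by `ray.nodes()` and assuming that a node started with `num_cpus=0` carries no `"CPU"` entry in its `"Resources"` dict. The function name `nodes_with_cpu` and the sample records are hypothetical; this is not the code from the PR.

```python
def nodes_with_cpu(nodes: list[dict]) -> list[dict]:
    """Keep alive nodes that advertise at least one CPU."""
    return [
        node
        for node in nodes
        # .get("CPU", 0) avoids the KeyError that node["Resources"]["CPU"] would raise.
        if node.get("Alive", False) and node.get("Resources", {}).get("CPU", 0) > 0
    ]


# Hypothetical node records for illustration.
nodes = [
    {"NodeID": "head", "Alive": True, "Resources": {"memory": 1.0}},  # num_cpus=0
    {"NodeID": "worker", "Alive": True, "Resources": {"CPU": 8.0, "GPU": 8.0}},
]
print([node["NodeID"] for node in nodes_with_cpu(nodes)])
```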

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

2025-08-20 19:31:49 +08:00
23d6f77513 [megatron] fix: fix megatron micro_batch_size assertion (#3142)
### What does this PR do?

- Fix the Megatron `micro_batch_size` assertion: when `use_dynamic_bsz` is
enabled, `micro_batch_size` does not need to be set
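
The intended rule can be sketched as a small validation function. This is an illustration under assumptions, not the assertion from the Megatron worker: the function name `check_micro_batch_size` and the error message are hypothetical; only the relationship between `use_dynamic_bsz` and `micro_batch_size` comes from the description.

```python
from typing import Optional


def check_micro_batch_size(use_dynamic_bsz: bool, micro_batch_size: Optional[int]) -> None:
    """Require a positive micro_batch_size only when dynamic batching is off."""
    if use_dynamic_bsz:
        # With dynamic batch sizing there is no fixed micro batch size to validate.
        return
    assert micro_batch_size is not None and micro_batch_size > 0, (
        "micro_batch_size must be a positive integer when use_dynamic_bsz is False"
    )


check_micro_batch_size(use_dynamic_bsz=True, micro_batch_size=None)  # accepted
check_micro_batch_size(use_dynamic_bsz=False, micro_batch_size=4)  # accepted
```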

(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-20 14:37:34 +08:00
ae46f5a41a [ci] fix: model tests, transformers 4.55 has troubles with backward (#3139)
### What does this PR do?

[ci] fix: model tests, transformers 4.55 has troubles with backward

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-20 13:33:12 +08:00
9d66534060 [doc] feat: documentation Update, Ray Job Management Commands (#3131)
### What does this PR do?

Added two Ray CLI commands to the documentation for better job
monitoring:

1. **Job ID Retrieval Command**  
   `ray job list | grep submission_id | grep JobStatus | grep RUNNING | grep -oP 'raysubmit_[^'\''"]+' | head -n 1`  
   This pipeline fetches the latest running job's submission ID by:
   - Filtering active jobs (`RUNNING` status)
   - Extracting `raysubmit_*` IDs
   - Returning the first match

2. **Continuous Log Streaming**  
   `ray job logs <Submission ID> --follow`  
   Added the `--follow` parameter to enable real-time log streaming,
   allowing users to:
   - Continuously monitor job output
   - Debug long-running processes interactively
   - Maintain persistent log connection until job completion

These additions enhance operational visibility for Ray job management
workflows.
2025-08-20 11:03:48 +08:00
de0b31ceee [sglang] feat: make sglang properly handle the max_num_seqs configuration (#3134)
### What does this PR do?

The vLLM async engine reads the `max_num_seqs` option from the YAML config,
but SGLang ignores it.
This PR fixes that.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

2025-08-20 10:59:35 +08:00
26ccffa83f [rollout] fix: numpy.int64 serialization error in Weave tracing during validation (#3112)
### What does this PR do?

During validation steps, the following Pydantic serialization error
occurs when Weave tracing is enabled:
```bash
(AgentLoopWorker pid=2278557, ip=x) weave: Task failed: PydanticSerializationError: Unable to serialize unknown type: <class 'numpy.int64'> [repeated 16286x across cluster]
(AgentLoopWorker pid=2278557, ip=x) ERROR:2025-08-18 16:45:08,385:Task failed: PydanticSerializationError: Unable to serialize unknown type: <class 'numpy.int64'> [repeated 16279x across cluster]
```
The issue occurs in:

313366fd85/verl/experimental/agent_loop/agent_loop.py (L315)

When the batch doesn't contain an "index" field (which commonly happens
during validation), `np.arange()` creates a numpy array with numpy.int64
elements. These values are then passed through the following chain:

1. `get_trajectory_info()` → `trajectory_info` dict with `sample_index`:
`numpy.int64`
2. `rollout_trace_attr()` → `attributes` dict with `sample_index`:
`numpy.int64`
3. `weave.attributes(attributes)` → `Pydantic serialization fails`

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Design & Code Changes

Convert the NumPy array to native Python integers to ensure Pydantic
compatibility.
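
A minimal sketch of the idea, under stated assumptions: the standard `json` module stands in for the Pydantic/Weave serializer (it rejects `numpy.int64` in a similar way), and `to_native_int_list` is a hypothetical helper name, not a function from the repository.

```python
import json

import numpy as np


def to_native_int_list(indices):
    """Convert a NumPy integer array into a list of built-in Python ints."""
    return np.asarray(indices).tolist()


# np.arange() yields NumPy integer scalars, not built-in ints.
sample_indices = np.arange(4)

# A NumPy integer scalar is rejected by the stand-in serializer.
try:
    json.dumps({"sample_index": sample_indices[0]})
    numpy_serializable = True
except TypeError:
    numpy_serializable = False

# After conversion, every element is a plain int and serializes cleanly.
native_indices = to_native_int_list(sample_indices)
payload = json.dumps({"sample_index": native_indices[0]})
```

`ndarray.tolist()` converts each element to the nearest compatible built-in type, so no per-element `int(...)` call is needed.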

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-20 10:12:11 +08:00
fc2bfd9a72 [misc] fix: update peft's version in requirements-npu.txt (#3127)
### What does this PR do?

As the title says.
Constrain the PEFT version so that the SFT workflow is not interrupted on
NPU.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-20 10:11:29 +08:00
6469be213e [recipe] fix: make compute of step consistent across all trainers (#3132)
### What does this PR do?
follow-up to #3117

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-20 09:54:29 +08:00
47033fc8a2 [megatron] fix: mbridge save/load (#2519)
### What does this PR do?

Currently, mbridge does not save optimizers. This PR fixes the mbridge
save and load paths and adds a CI test.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-08-20 09:09:56 +08:00
5deb8cc9a6 [megatron] fix: add temperature parameter for logits scaling (#3133)
### What does this PR do?

> This PR fixes the handling of the temperature parameter in Megatron by
explicitly propagating it through the forward path. Logits are now
divided by the temperature during processing, which aligns Megatron
with the FSDP logits handling implemented in
[dp_actor.py#L192](https://github.com/volcengine/verl/blob/main/verl/workers/actor/dp_actor.py#L192).
The issue became apparent when running Megatron training with
temperature=0.9.
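
The scaling itself can be illustrated with a small sketch. It is plain Python rather than the Megatron or FSDP code, and `scale_logits` and `softmax` are hypothetical helper names chosen for this example.

```python
import math


def scale_logits(logits, temperature):
    """Divide every logit by the sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [x / temperature for x in logits]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
probs_raw = softmax(logits)
# With temperature < 1 the gaps between logits grow, so the
# distribution becomes sharper than the unscaled one.
probs_scaled = softmax(scale_logits(logits, temperature=0.9))
```

If the training side skips this division while the rollout side samples at temperature 0.9, the two sides compute log probabilities under different distributions.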

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> Here is the comparison plot for `raw-mcore (t=0.9)`, `fsdp (t=0.9)`,
and `fixed-mcore (t=0.9)`.

<img width="949" height="497" alt="image"
src="https://github.com/user-attachments/assets/7f06120b-bf8f-4222-86ed-138fbca382f7"
/>

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: xinyanguan <xinyanguan@tencent.com>
2025-08-20 08:46:13 +08:00
afd759789b [trainer] fix: move testing out of step timings (#3117)
### What does this PR do?
Possible fix of #3116

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[query](https://github.com/volcengine/verl/issues?q=step%20timing)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-19 19:54:59 +08:00
6e0fa3f0df [trainer] fix: only load memory in micro batch for compute_log_prob, compute_values and update_critic (#3094)
### What does this PR do?
Changed the data-loading logic to transfer only the current micro-batch to
GPU memory during training/inference, instead of the entire batch, to save
memory. Similar to
12c83e8ada
and https://github.com/volcengine/verl/pull/2908
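
The pattern can be sketched in a framework-agnostic way. This is only an illustration under assumptions: `iter_micro_batches` and `process_in_micro_batches` are hypothetical names, and `to_device` stands in for whatever call moves data to the accelerator (for a PyTorch tensor this would typically be `tensor.to(device)`).

```python
from typing import Callable, Iterator, List, Sequence


def iter_micro_batches(batch: Sequence, micro_batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of the batch, each at most micro_batch_size long."""
    for start in range(0, len(batch), micro_batch_size):
        yield batch[start:start + micro_batch_size]


def process_in_micro_batches(
    batch: Sequence,
    micro_batch_size: int,
    to_device: Callable[[Sequence], Sequence],
    compute: Callable[[Sequence], List],
) -> List:
    """Move one micro-batch at a time to the device and collect the outputs."""
    outputs: List = []
    for micro in iter_micro_batches(batch, micro_batch_size):
        on_device = to_device(micro)  # only this slice is resident on the device
        outputs.extend(compute(on_device))
    return outputs


# Stand-in "device transfer" that records how much data is moved at once.
transfer_sizes = []


def fake_to_device(micro):
    transfer_sizes.append(len(micro))
    return list(micro)


results = process_in_micro_batches(
    batch=list(range(10)),
    micro_batch_size=4,
    to_device=fake_to_device,
    compute=lambda xs: [x * 2 for x in xs],
)
```

The full batch stays in host memory, and peak device usage is bounded by the micro-batch size rather than the batch size.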
2025-08-19 18:18:01 +08:00
04efe11df6 [ci] fix: fix precommit (#3128)
### What does this PR do?

- fix precommit

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-19 18:17:05 +08:00
c3c2f9a9bc [rollout] feat: compute reward score in agent loop (#3055)
### What does this PR do?

Compute the reward score for each prompt as soon as its agent loop
finishes. This can significantly hide the reward computation time.

https://github.com/volcengine/verl/issues/2618
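
A rough `asyncio` sketch of this overlap, under assumptions: `run_agent_loop`, `compute_reward`, and `rollout_with_reward` are hypothetical coroutines that only simulate work with `asyncio.sleep`, and the reward is a placeholder value.

```python
import asyncio


async def run_agent_loop(prompt: str, delay: float) -> str:
    """Simulated multi-turn rollout that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return f"response to {prompt}"


async def compute_reward(response: str) -> float:
    """Simulated reward function; the score here is just the response length."""
    await asyncio.sleep(0.01)
    return float(len(response))


async def rollout_with_reward(prompt: str, delay: float):
    # The reward is computed as soon as this prompt's loop finishes,
    # while slower prompts are still generating.
    response = await run_agent_loop(prompt, delay)
    reward = await compute_reward(response)
    return prompt, reward


async def main():
    jobs = [rollout_with_reward("a", 0.02), rollout_with_reward("b", 0.05)]
    # gather() returns results in the order the jobs were submitted.
    return await asyncio.gather(*jobs)


results = asyncio.run(main())
```

Because scoring for fast prompts runs while slow prompts are still rolling out, little reward time is left once the last rollout ends.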
2025-08-19 16:38:23 +08:00
8494135e5c [rollout] feat: use rollout worker in MegatronWorker (#3111)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-19 15:07:52 +08:00
43cb93c8d1 [trainer] fix: only load memory in micro batch for megatron backend (#3106)
### What does this PR do?

Changed the data-loading logic for the Megatron backend to transfer only
the current micro-batch to GPU memory during training, instead of the
entire batch, to save memory. Similar to
12c83e8ada,
https://github.com/volcengine/verl/pull/2908, and
https://github.com/volcengine/verl/pull/3094
2025-08-19 13:04:34 +08:00
dd13051602 Fix python version (#3103) 2025-08-18 20:46:09 -07:00
6e55669fd0 [trainer, worker] fix: setting old log probs equal to log probs for on policy training (#3119)
### What does this PR do?

Because the training backend's (FSDP/Megatron) recomputation of log probs
is not exact, forwarding the same batch twice yields `old_log_probs` and
`log_probs` that differ even in on-policy training. This PR fixes the
issue by setting `old_log_probs` equal to `log_probs` in on-policy training.
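
The effect on the importance ratio can be illustrated with a small sketch. The numbers are made up for illustration and `importance_ratios` is a hypothetical helper, not code from the trainer.

```python
import math


def importance_ratios(log_probs, old_log_probs):
    """Per-token ratio exp(log_prob - old_log_prob), as used in PPO-style objectives."""
    return [math.exp(new - old) for new, old in zip(log_probs, old_log_probs)]


# Log probs from the current forward pass (illustrative values).
log_probs = [-1.2000, -0.7000, -2.3000]

# A second forward pass over the same batch may drift slightly (illustrative values).
recomputed_old_log_probs = [-1.2003, -0.6998, -2.3001]

# With recomputed old log probs, the ratio deviates from 1 even on-policy.
drifted = importance_ratios(log_probs, recomputed_old_log_probs)

# Reusing log_probs as old_log_probs makes the on-policy ratio exactly 1.
on_policy = importance_ratios(log_probs, log_probs)
```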

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
2025-08-19 09:43:59 +08:00
603c07d999 [doc, perf] feat: add profiling doc (#3113) 2025-08-19 09:06:33 +08:00
313366fd85 [misc] fix: fix precommit (#3109)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-18 16:46:32 +08:00
ee5ac8e182 [doc] feat: Add Kimina-Prover-RL to awesome work (#3108)
### What does this PR do?

Add Kimina-Prover-RL to the list of awesome work using verl.

Kimina-Prover-RL is a training pipeline designed to teach large language
models to solve formal proof goals in Lean 4, using a two-stage output
structure: a natural language reasoning trace followed by corresponding
Lean code.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-18 16:43:33 +08:00
97b65c63c7 [perf] fix: fix npu profiler and add mstx UT (#3052)
### What does this PR do?

- fix the parameter passing error for profile_level
- fix the error when creating npu profiler in discrete mode
- modify the execution script
- modify ascend profiling doc
- add the discrete parameter in tool_config
- add mstx_profile UT

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-18 15:18:08 +08:00
507e932941 [fsdp, trainer, ckpt] feat: support custom model init and merging for FSDP (#3012)
### What does this PR do?

This PR adds support for custom model initialization and merging in
FSDP.

Custom models are no longer required to follow naming conventions
such as `xxxForCausalLM` or `xxxForConditionalGeneration`. In addition, a
model can be loaded with a different AutoClass specified by `auto_map`, for
example using `AutoModelForCausalLM` to load `xxxForConditionalGeneration` or
`xxxForChat`.


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

1. Custom Model Definition
```python
from transformers import Gemma3ForConditionalGeneration

class CustomModelForChat(Gemma3ForConditionalGeneration):
   ...
```
2. Custom Model Config
```json
{
  "architectures": [
    "CustomModelForChat"
  ],
  "auto_map": {
    "AutoConfig": "configuration_custom.CustomModelConfig",
    "AutoModelForCausalLM": "modeling_custom.CustomModelForChat"
  },
  ...
  "transformers_version": "4.53.0",
}

```

3. Testing
If the model config has `auto_map`, then load the specified AutoClass,
otherwise, fall back to the default architectures handling.
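
The selection rule described above could be sketched as follows. This is a minimal illustration that works on a plain config dictionary; the helper name `pick_auto_class_name` and the fallback rule that maps `...ForConditionalGeneration` architectures to `AutoModelForVision2Seq` are assumptions made for the example, not the code added by this PR.

```python
from typing import Any

# Hypothetical defaults for the fallback path (assumptions for this sketch).
DEFAULT_AUTO_CLASS = "AutoModelForCausalLM"
VISION_AUTO_CLASS = "AutoModelForVision2Seq"


def pick_auto_class_name(config: dict[str, Any]) -> str:
    """Return the name of the transformers AutoClass to load a model with."""
    auto_map = config.get("auto_map") or {}
    # Prefer an AutoModel* entry declared by the custom model itself.
    # Entries such as "AutoConfig" are skipped because they are not model classes.
    for name in auto_map:
        if name.startswith("AutoModel"):
            return name
    # Fallback: infer the AutoClass from the architecture naming convention.
    architectures = config.get("architectures") or []
    if any(arch.endswith("ForConditionalGeneration") for arch in architectures):
        return VISION_AUTO_CLASS
    return DEFAULT_AUTO_CLASS


custom_config = {
    "architectures": ["CustomModelForChat"],
    "auto_map": {
        "AutoConfig": "configuration_custom.CustomModelConfig",
        "AutoModelForCausalLM": "modeling_custom.CustomModelForChat",
    },
}
print(pick_auto_class_name(custom_config))
```

With the config from step 2, the `auto_map` entry decides the loader, so the class name `CustomModelForChat` no longer has to match any suffix convention.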

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-18 14:56:32 +08:00
c86c831893 [recipe] fix: checkpoint in last step might be ignored to save in dapo (#3034)
1. The is_last_step variable is not updated in a timely manner and
should be updated promptly after self.gen_step is modified.
2. If, in the last step, the batch is not fully formed due to the
filter_group logic, it will trigger a "continue" statement, thereby
skipping the checkpoint saving logic.

### What does this PR do?

This PR fixes two related issues:

- Ensures the `is_last_step` flag is correctly updated after `self.gen_step`
changes, to properly indicate the last generation step.
- Prevents the checkpoint-saving logic from being incorrectly skipped in
the last step when the batch is not full due to filtering (e.g., via
`filter_group`).

These changes help ensure that checkpoints are saved appropriately at
the end of generation, improving reliability and consistency in training
or inference workflows.
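
The control flow behind these two fixes can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: `run_generation_loop`, `total_gen_steps`, and the `batch_is_full` list are hypothetical stand-ins for the trainer state, and saving a checkpoint is modeled as recording the step number.

```python
def run_generation_loop(total_gen_steps: int, batch_is_full: list[bool]) -> list[int]:
    """Return the generation steps at which a checkpoint would be saved."""
    saved_at: list[int] = []
    gen_step = 0
    for full in batch_is_full:
        gen_step += 1
        # Recompute the flag right after gen_step changes so it is never stale.
        is_last_step = gen_step >= total_gen_steps
        if not full:
            # The batch is still incomplete after filtering. Normally we keep
            # generating, but on the last step we must save before leaving.
            if is_last_step:
                saved_at.append(gen_step)
                break
            continue
        # ... the training update would happen here ...
        if is_last_step:
            saved_at.append(gen_step)
            break
    return saved_at


# The last step has an incomplete batch, yet a checkpoint is still recorded.
print(run_generation_loop(3, [True, False, False]))
```

Without the `is_last_step` check inside the incomplete-batch branch, the `continue` would skip the save on the final step, which is the failure the PR describes.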

### similar PR

https://github.com/volcengine/verl/pull/2619#issue-3242284106

This PR seems to address a similar issue, but I still encountered
problems when using its code. Therefore, I made further modifications
based on that version.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-18 11:19:10 +08:00
bc1b760fb1 [BREAKING] [rollout] feat: add a separate rollout worker (#3071)
### What does this PR do?

- Introduce a separate rollout worker that can be instantiated without
the hybrid engine
- Introduce a `ModelConfig` that wraps all model-related config
- Remove `hf_rollout` (will replace with TP support in the future if
needed)
- Next PR: modify `MegatronWorker` to use the separate rollout worker

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-18 10:57:33 +08:00
4c3310db28 [sglang] fix: Qwen VLM Baseline and sgl CI (#3101)
### What does this PR do?

Fix the recently broken CI on SGLang and Ascend.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-17 19:28:25 -07:00
966719c36a Update ray_trainer.py (#3092)
### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-08-17 20:29:43 +08:00
e32cceea4a [sglang] fix: Qwen VLM Baseline (#3083)
### What does this PR do?

This PR fixes the script in
https://github.com/volcengine/verl/blob/main/examples/grpo_trainer/run_qwen2_5_vl-7b.sh

The core issue was `TypeError: 'NoneType' object is not callable`, which
occurred because the variable `flash_attn_varlen_func` was assigned `None`.
This happened when the primary import from
`transformers.modeling_flash_attention_utils` failed.

I added a nested `try...except` block that first attempts the import from
`transformers` and, if that fails, imports
`flash_attn_varlen_func` directly from the `flash_attn` package.
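
The fallback import pattern can be sketched generically as below. This is an illustration rather than the PR's actual diff: `import_first_available` and `load_flash_attn_varlen_func` are hypothetical helper names, and the sketch uses a loop over candidate modules instead of a literally nested `try...except`.

```python
import importlib
from typing import Any, Optional, Sequence


def import_first_available(module_names: Sequence[str], attr: str) -> Optional[Any]:
    """Return `attr` from the first module that imports and defines it, else None."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # module is not installed, try the next candidate
        value = getattr(module, attr, None)
        if value is not None:
            return value
    return None


def load_flash_attn_varlen_func() -> Optional[Any]:
    """Try transformers first, then the flash_attn package (order from the PR text)."""
    return import_first_available(
        ["transformers.modeling_flash_attention_utils", "flash_attn"],
        "flash_attn_varlen_func",
    )
```

Callers should still check the result for `None` before invoking it, since both imports can fail on a machine without flash attention installed.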

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

I added a new test script here:

`examples/grpo_trainer/run_qwen2_5_vl-7b-sglang.sh`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
2025-08-16 18:22:31 -07:00
e764d408df [fsdp] fix: patch fsdp2 to support hf transformer==4.54.0 and above (#3072)
### What does this PR do?
@ETOgaosion found that FSDP2 does not work with
`GenericForTokenClassification` in transformers==4.54.0 and above:
https://github.com/pytorch/pytorch/issues/160068
https://github.com/volcengine/verl/pull/2947

FSDP2 complains about an object layout mismatch when constructing
`FSDPGenericForTokenClassification`. The solution is to let `FSDPModule`
inherit from `ABC` as well.

This should unblock @vermouth1992 on deprecating FSDP1.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test
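Add the following script to `apply_fsdp2`:
```python
# pip install transformers==4.54.0
# Temporary debug snippet meant to be pasted inside apply_fsdp2 (hence the bare
# `return`); maybe_patch_fsdp_module, fully_shard and fsdp_kwargs are expected
# to be available in that scope.
from transformers import AutoModelForTokenClassification
model = AutoModelForTokenClassification.from_pretrained("Qwen/Qwen2-0.5B")
with maybe_patch_fsdp_module(model):
    fully_shard(model, **fsdp_kwargs)
return
```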

```bash
PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
 data.train_files=$HOME/data/gsm8k/train.parquet \
 data.val_files=$HOME/data/gsm8k/test.parquet \
 data.train_batch_size=256 \
 data.max_prompt_length=512 \
 data.max_response_length=256 \
 actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 actor_rollout_ref.actor.optim.lr=1e-6 \
 actor_rollout_ref.actor.ppo_mini_batch_size=64 \
 actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
 actor_rollout_ref.rollout.name=vllm \
 actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
 actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
 actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
 actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
 critic.optim.lr=1e-5 \
 critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 critic.ppo_micro_batch_size_per_gpu=4 \
 algorithm.kl_ctrl.kl_coef=0.001 \
 trainer.logger=console \
 trainer.val_before_train=False \
 trainer.n_gpus_per_node=2 \
 trainer.nnodes=1 \
 trainer.save_freq=10 \
 trainer.test_freq=10 \
 trainer.total_epochs=15 \
 critic.strategy=fsdp2 \
 actor_rollout_ref.actor.strategy=fsdp2
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-16 15:12:31 +08:00
6bbbff13a1 [fsdp] fix: add missing mixed precision configuration to FSDPEngineConfig (#3068)
### What does this PR do?

The `FSDPEngineConfig` dataclass was missing the `mixed_precision` field
that the runtime code expected. This PR adds:

```python
mixed_precision: Optional[dict[str, Any]] = None
```

The dataclass now properly supports the mixed precision configuration
that the FSDP workers code reads with
`fsdp_config.get("mixed_precision", None)`.


55e3c5bc09/verl/workers/fsdp_workers.py (L371)


Otherwise, if we run with:

```bash
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.actor.fsdp_config.mixed_precision.param_dtype=bf16 \
    actor_rollout_ref.actor.fsdp_config.mixed_precision.reduce_dtype=fp32 \
    actor_rollout_ref.actor.fsdp_config.mixed_precision.buffer_dtype=fp32 \
    # ... other parameters
```

The following error may occur:

```bash
raise InstantiationException(msg) from e
hydra.errors.InstantiationException: Error in call to target 'verl.workers.config.engine.FSDPEngineConfig':
TypeError("FSDPEngineConfig.__init__() got an unexpected keyword argument 'mixed_precision'")
full_key: actor_rollout_ref.actor.fsdp_config
```
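
The same failure mode can be reproduced with plain dataclasses, independent of Hydra. The sketch below is illustrative only: `EngineConfigBefore`, `EngineConfigAfter`, and the `param_offload` placeholder field are hypothetical stand-ins, not the real `FSDPEngineConfig` definition.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EngineConfigBefore:
    # Stand-in for the config before the fix: no mixed_precision field.
    param_offload: bool = False


@dataclass
class EngineConfigAfter:
    param_offload: bool = False
    # The added field; a default of None keeps existing configs unchanged.
    mixed_precision: Optional[dict[str, Any]] = None


overrides = {
    "mixed_precision": {
        "param_dtype": "bf16",
        "reduce_dtype": "fp32",
        "buffer_dtype": "fp32",
    }
}

try:
    EngineConfigBefore(**overrides)
    rejected = False
except TypeError:
    # The generated __init__ does not accept the unknown keyword argument.
    rejected = True

cfg = EngineConfigAfter(**overrides)
print(rejected, cfg.mixed_precision)
```

Because a dataclass-generated `__init__` only accepts declared fields, any key passed by the config system must exist on the dataclass, even if the runtime later reads it with a dictionary-style `get`.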

### Backward compatibility

No behavior change for existing configs (default remains None).

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=mixed_precision
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-16 08:20:28 +08:00
e31a883121 [rollout] fix: vllm sleep level=2 bug (#3082)
### What does this PR do?

1. vLLM sleep with `level=2` had a bug. It has been fixed in
https://github.com/vllm-project/vllm/pull/16889 and the fix is released
in version 0.8.5:
https://github.com/vllm-project/vllm/releases/tag/v0.8.5
2. Fix a typo in the DeepSeek benchmark doc.
2025-08-16 08:19:06 +08:00
2bbd09245c [ray] feat: add support for ray init kwargs (#3049)
### What does this PR do?

This PR adds support for passing parameters to `ray.init`.
Users can now dynamically configure settings such as `address`, `port`,
`_temp_dir`, and more based on their specific needs.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

```bash
# when /tmp/ray/ is used by others
# when ray is initialized at 6379 by others
# when the dashboard is not accessible at localhost
# ...
bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh \
    +ray_kwargs.ray_init._temp_dir=/tmp/ray/my_dir \
    +ray_kwargs.ray_init.address=127.0.0.1:6378 \
    +ray_kwargs.ray_init.dashboard_host=0.0.0.0
```
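
As a rough illustration of how such overrides could reach `ray.init`, the sketch below merges a user-supplied mapping over some defaults and produces the keyword arguments for the call. The helper name `build_ray_init_kwargs` and the default values are assumptions for illustration, not verl's actual code, and the `ray.init` call is left commented out so the snippet runs without Ray installed:

```python
from typing import Any, Mapping, Optional


def build_ray_init_kwargs(
    defaults: Mapping[str, Any],
    user_kwargs: Optional[Mapping[str, Any]] = None,
) -> dict[str, Any]:
    """Merge user overrides over defaults; user values win."""
    merged = dict(defaults)
    merged.update(user_kwargs or {})
    # Drop unset entries so ray.init falls back to its own defaults.
    return {k: v for k, v in merged.items() if v is not None}


# Hypothetical defaults, plus overrides mirroring the command above.
defaults = {"num_cpus": None, "dashboard_host": "127.0.0.1"}
overrides = {
    "_temp_dir": "/tmp/ray/my_dir",
    "address": "127.0.0.1:6378",
    "dashboard_host": "0.0.0.0",
}
ray_init_kwargs = build_ray_init_kwargs(defaults, overrides)
print(ray_init_kwargs)

# import ray
# ray.init(**ray_init_kwargs)
```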

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-15 20:02:56 +08:00
55e3c5bc09 [tool] fix: support non-ascii characters in search results (#3044)
### What does this PR do?

A small change from `json.dumps({"result": final_result})` to
`json.dumps({"result": final_result}, ensure_ascii=False)`, supporting
customized search engines that return docs containing non-ascii
characters (e.g., CJK characters).
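
The effect of the flag can be seen in isolation with the standard library. The sample string below is made up for illustration:

```python
import json

# Example search result containing CJK text (illustrative only).
final_result = "北京是中国的首都"

escaped = json.dumps({"result": final_result})
readable = json.dumps({"result": final_result}, ensure_ascii=False)

print(escaped)   # non-ASCII characters are emitted as \uXXXX escapes
print(readable)  # non-ASCII characters are kept as-is

# Both forms decode to the same Python object.
assert json.loads(escaped) == json.loads(readable)
```

The escaped form is still valid JSON, but it is much harder to read and typically costs more tokens when fed back to a model.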

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pull/1682
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

N/A

### API and Usage Example

N/A

### Design & Code Changes

N/A

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-15 13:55:18 +08:00
d253526c73 [ray] feat: remove worker group register center (#3066)
### What does this PR do?

Remove the worker group register center. Instead, we schedule a task in
the first placement group to get `MASTER_ADDR` and `MASTER_PORT`.
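
A minimal sketch of what such a task might do, using only the standard library. The function name `get_master_addr_port` is hypothetical, and in verl the lookup would run as a remote task on a worker node rather than locally:

```python
import socket


def get_master_addr_port() -> tuple[str, int]:
    """Return this host's address and a currently free TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 asks the OS to pick an unused port.
        sock.bind(("", 0))
        port = sock.getsockname()[1]
    # Note: the port is released here, so another process could
    # claim it before the caller binds to it.
    try:
        addr = socket.gethostbyname(socket.gethostname())
    except OSError:
        # Fall back to loopback if the hostname does not resolve.
        addr = "127.0.0.1"
    return addr, port


master_addr, master_port = get_master_addr_port()
env = {"MASTER_ADDR": master_addr, "MASTER_PORT": str(master_port)}
print(env)
```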
2025-08-15 13:54:46 +08:00
28f6e4af7e [doc]fix: optimize ascend docs (#3063)
### What does this PR do?

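- Fix version-mismatch errors for some software dependencies in `ascend_quick_start.rst`.
- Add descriptions of `actor.strategy` and `rollout.name` to the support-status table.
- Rename `ascend_profiling_en.rst` and `ascend_profiling_zh.rst` so that the document titles look cleaner.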
<img width="402" height="103" alt="image"
src="https://github.com/user-attachments/assets/8f9ece22-315e-4f80-8157-04838f7467a3"
/>

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-15 13:24:21 +08:00
bd756c15c8 [BREAKING][rollout] feat: allow users pass all vllm/sglang engine args (#3037)
This PR lets users pass arbitrary vLLM/SGLang engine arguments and
speeds up Qwen3 rollout through vLLM engine arguments.

1. Deprecate the default values of the previous `engine_kwargs`.
2. Pass all `engine_kwargs` through to the vLLM/SGLang engine.
3. Speed up Qwen3-235B rollout by setting TP=8 and enabling expert
parallelism.
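
A sketch of the forwarding pattern, assuming a hypothetical `build_engine_args` helper and a `rollout_config` dict shaped like `{"engine_kwargs": {"vllm": {...}}}`. The real config layout and the set of accepted engine argument names depend on the verl and vLLM/SGLang versions in use:

```python
from typing import Any


def build_engine_args(
    base_args: dict[str, Any],
    rollout_config: dict[str, Any],
    backend: str,
) -> dict[str, Any]:
    """Combine framework-chosen arguments with user-supplied engine kwargs."""
    user_kwargs = rollout_config.get("engine_kwargs", {}).get(backend) or {}
    # Skip None values so unset options keep the engine's own defaults.
    user_kwargs = {k: v for k, v in user_kwargs.items() if v is not None}
    # User-supplied values take precedence over the base arguments.
    return {**base_args, **user_kwargs}


# Illustrative config: enable expert parallelism, leave swap_space unset.
rollout_config = {
    "engine_kwargs": {
        "vllm": {"enable_expert_parallel": True, "swap_space": None},
    }
}
engine_args = build_engine_args(
    {"tensor_parallel_size": 8}, rollout_config, backend="vllm"
)
print(engine_args)
```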

From top to bottom: tp=16 without EP, tp=8 without EP and tp=8 with EP.
<img width="1000" height="808" alt="image"
src="https://github.com/user-attachments/assets/6b096be4-3896-4e96-8916-d8d6e13a58cc"
/>

PS: DeepSeek-V3 rollout slows down after enabling expert parallelism.
2025-08-14 19:12:26 +08:00
bd3b735514 [trainer] fix: Remove redundant 'data.to()' codes (#3051)
### What does this PR do?

Remove redundant `data.to()` calls.

`data.batch = data.batch.to("cpu")` in `update_actor()`:
The data is already on the CPU after the preceding computation, for
example `compute_log_prob()` in DAPO.

`data = data.to(get_device_id())` in `compute_ref_log_prob()` and
`compute_log_prob()`:
In both cases the data is moved to the GPU or NPU later, at
`self.actor.compute_log_prob(data=data, calculate_entropy=True)`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-14 19:10:21 +08:00
76e41368b4 [hardware] add flops count support for A3 device (#3053)
### What does this PR do?

Add FLOPs count support for the A3 device.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not related.

### API and Usage Example

Not related.

### Design & Code Changes

Not related.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-14 19:09:49 +08:00
8aa09db4b6 [rollout,vllm] feat: support multi-modal in agent loop (#3016)
### What does this PR do?

Follow https://github.com/volcengine/verl/pull/2398, support vLLM
multi-modal.
2025-08-14 19:08:47 +08:00
1a62568f80 [rollout] feat: remove over-catched errors in SGLang rollout (#3047)
### What does this PR do?

As the title says, the error handling should not catch aborts; an abort
should work as expected.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: ChangyiYang <changyiyang2023@gmail.com>
2025-08-13 23:19:21 -07:00
e2e4c35ecb [doc] feat: add benchmark for deepseek (#3046)
Add a benchmark result for DeepSeek. Other benchmark results are on the
way.
2025-08-14 13:29:28 +08:00
ea885f32f0 [rollout] feat: support over sampling rollout in SGLang Rollout (#2929)
### What does this PR do?

This PR introduces an **over-sample strategy** for verl's SGLang
multi-turn rollout to address the long-tail problem, where a few slow
requests disproportionately increase the overall rollout time. The core
idea is to over-sample the number of requests at the start of the
rollout and then aggressively cancel any requests that haven't finished
once a target number of completions is met.

- **Improves rollout efficiency** for multi-turn conversations by
reducing total time spent waiting for slow requests.
- **Implements a new request monitoring and cancellation mechanism** to
cut off unnecessary computation.

The wandb results are as follows:


https://wandb.ai/zhaochenyang20/benchmark_over_sample_2/workspace?nw=nwuserzhaochenyang20

-----

Of course, this strategy has its share of issues. For example, many
might question why the over-long requests that are dropped aren't simply
saved and continued in the next round. This is certainly possible—it's a
partial rollout strategy—but it would require verl to have a data
buffer, which is beyond the scope of this PR. Furthermore, saving and
continuing these requests would introduce an off-policy problem.

There is also a valid concern that this rather "brutal" dropping
strategy could unfairly ignore very long requests. I agree this is a
very reasonable point, but currently, we don't have a lossless solution.
However, our dropping strategy is very flexible and could even change
with our curriculum learning. For instance, in the example I gave, I
just directly dropped the last 20% of requests. **In practice, we can
dynamically adjust this drop rate and even use different dropping
methods. For example, we could record the time (t) at which 80% of
requests have returned and then drop any requests that haven't returned
after 1.5t.**
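
As a concrete illustration of that last rule, the sketch below computes the cutoff from a list of request return times. The function name `drop_deadline` and the sample timings are made up for illustration; a real implementation would have to track the threshold online while requests are still running:

```python
import math


def drop_deadline(
    return_times: list[float], quantile: float = 0.8, slack: float = 1.5
) -> float:
    """Time after which still-running requests would be dropped."""
    ordered = sorted(return_times)
    # Position of the request whose return marks the `quantile` point.
    k = max(1, math.ceil(quantile * len(ordered)))
    t = ordered[k - 1]
    return slack * t


# Hypothetical per-request return times in seconds (two long-tail requests).
times = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 20.0, 45.0]
deadline = drop_deadline(times)
dropped = [x for x in times if x > deadline]
print(deadline, dropped)
```

With these numbers, 80% of the requests have returned by t = 6.0 s, so the deadline is 9.0 s and only the two long-tail requests are dropped.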

We've provided an initial, validated idea and have completed its
implementation. We welcome everyone to join the discussion on how to
accelerate multi-turn rollouts with acceptable losses.

### Test

The new over-sample strategy was tested with an 8-GPU setup on the
**gsm8k** dataset, yielding the following results:

- **Rollout Time:** Significant reduction in overall rollout time per
step.
- **Training Rewards:**
  - The reward metric for training steps shows a positive bias. This is
    because we exclude the aborted requests (which are typically more
    difficult and have lower rewards) from the reward calculation.
  - The reward metric for validation steps remains accurate and aligns
    with the baseline. This is because the cancellation logic is not
    triggered during validation, ensuring a fair and complete evaluation.

### API and Usage Example

This feature modifies `sglang_rollout.py` and `metric_utils.py`. To use
it, follow the standard setup and then run the training script with the
over-sample parameters.


https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/release_log/over_sample.md

### Design & Code Changes

The design is centered on three main functions that orchestrate the
over-sampling logic: `run_with_cancellation`,
`process_request_with_monitoring`, and `monitor_and_cancel`. These
functions rely on global variables, such as `all_tasks` and
`completion_lock`, to manage state.

- **`run_with_cancellation`:** This is the entry point. It launches all
requests as `process_request_with_monitoring` tasks concurrently with a
single `monitor_and_cancel` task. It uses `asyncio.gather` to wait for
all tasks to complete (or be canceled) and converts any exceptions from
canceled tasks into padding requests before returning the final output.

- **`process_request_with_monitoring`:** This async function handles a
single request. It waits for the request to complete using
`_async_rollout_a_request` and then checks a shared counter,
`completed_count`, using a `completion_lock` for thread safety. If the
target completion count has not been reached, it returns the real
result. If the target has been met, it returns padding data instead,
effectively "discarding" the late result.

- **`monitor_and_cancel`:** This is a separate async task that polls the
`completed_count`. Once the count reaches the `target_completion`
threshold, it immediately cancels all remaining tasks and sends an
`abort_requests` signal to the SGLang engine, halting any ongoing GPU
computation for those requests.
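
The interplay of the three functions can be sketched in a few lines of plain `asyncio`. This is a simplified illustration, not the PR's code: `fake_rollout` stands in for `_async_rollout_a_request`, state is kept in closures rather than global variables, the monitor waits on an `asyncio.Event` instead of polling the counter, and the abort signal to the engine is only indicated by a comment:

```python
import asyncio

PADDING = "PAD"  # stands in for a padded (discarded) result


async def fake_rollout(idx: int, delay: float) -> int:
    """Stand-in for a single rollout request."""
    await asyncio.sleep(delay)
    return idx


async def run_with_cancellation(delays: list[float], target_completion: int) -> list:
    completed_count = 0
    completion_lock = asyncio.Lock()
    target_reached = asyncio.Event()

    async def process_request_with_monitoring(idx: int, delay: float):
        nonlocal completed_count
        result = await fake_rollout(idx, delay)
        async with completion_lock:
            if completed_count < target_completion:
                completed_count += 1
                if completed_count >= target_completion:
                    target_reached.set()
                return result
        return PADDING  # finished after the target was met: discard

    async def monitor_and_cancel(tasks: list) -> None:
        await target_reached.wait()
        for task in tasks:
            if not task.done():
                task.cancel()
        # A real implementation would also tell the engine to abort
        # the in-flight requests here.

    tasks = [
        asyncio.create_task(process_request_with_monitoring(i, d))
        for i, d in enumerate(delays)
    ]
    monitor = asyncio.create_task(monitor_and_cancel(tasks))
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Stop the monitor in case the target was never reached.
    monitor.cancel()
    await asyncio.gather(monitor, return_exceptions=True)
    # Cancelled tasks surface as CancelledError; turn them into padding.
    return [PADDING if isinstance(r, BaseException) else r for r in results]


results = asyncio.run(
    run_with_cancellation([0.01, 0.02, 0.03, 0.5, 0.6], target_completion=3)
)
print(results)
```

Here five requests are launched with a target of three completions, so the two slow requests are cancelled and replaced by padding.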

Key code changes:

- **`sglang_rollout.py`**:
  - Adds the three core asynchronous functions for the over-sample
    strategy.
  - The `AsyncEngine` class now includes a new `abort_request` method that
    calls the synchronous `abort_request` in the `tokenizer_manager`.
- **`metric_utils.py`**:
  - The `compute_data_metrics` function is updated to exclude the aborted
    requests (identified by padding) from the denominator when calculating
    average rewards during training. This prevents the training reward from
    being artificially lowered by the zero-reward aborted requests.
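
The effect of the changed denominator can be shown with a toy calculation. The `mean_reward` helper and the numbers below are hypothetical; the real `compute_data_metrics` works on batched data rather than Python lists:

```python
def mean_reward(rewards: list[float], aborted: list[bool]) -> float:
    """Average reward over requests that were not aborted."""
    kept = [r for r, is_aborted in zip(rewards, aborted) if not is_aborted]
    if not kept:
        return 0.0
    return sum(kept) / len(kept)


# Five requests; the last two were aborted and padded with zero reward.
rewards = [1.0, 0.0, 1.0, 0.0, 0.0]
aborted = [False, False, False, True, True]

naive = sum(rewards) / len(rewards)     # aborted requests count as zeros
masked = mean_reward(rewards, aborted)  # aborted requests are left out
print(naive, masked)
```

The naive average is 0.4, while the masked average is 2/3, since only the three completed requests enter the denominator.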

This implementation is designed to be a straightforward and effective
solution for the long-tail problem, though some aspects of the
asynchronous design and the impact on training variance require further
investigation.

---------

Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>
Co-authored-by: PopSoda2002 <zhouhp.me@gmail.com>
Co-authored-by: ChangyiYang <changyiyang2023@gmail.com>
Co-authored-by: PrinsYin <yzr1914001753@gmail.com>
Co-authored-by: WindowsXp-Beta <xinpwei@amazon.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-08-13 21:12:57 -07:00
0807da9115 [misc] feat: add B200 and GB200 flops count (#3041)
### What does this PR do?

- add B200 and GB200 flops count

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-14 09:49:24 +08:00
b6cdcdf805 [doc] feat: Add VTool-R1 in the list of "awesome works using verl (#3036)
Add VTool-R1 into `Awesome work using verl`

### What does this PR do?

This PR adds a recent work built on verl, VTool-R1: VLMs Learn to Think
with Images via Reinforcement Learning on Multimodal Tool Use, to the
"Awesome work using verl" section of `README.md`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
2025-08-13 18:32:58 +08:00
e6843cc82b [fsdp] fix: set _set_allocator_settings to True to avoid fsdp2 oom (#3020)
### What does this PR do?

Enable `expandable_segments` to avoid the growing memory fragmentation
caused by temporary variables during FSDP2 training, which can
intermittently trigger out-of-memory (OOM) errors.

Since neither SGLang nor vLLM works with `expandable_segments:True`,
it has to be turned off during rollout.


### Test

Without this fix, memory reserved could be very high after
compute_log_prob or update_actor.
```
(WorkerDict pid=339320) [2025-08-11 17:43:01] dp actor After compute_log_prob, memory allocated (GB): 5.53, memory reserved (GB): 73.59, device memory used/total (GB): 77.47/79.15
```

With this fix, it stays low during training.
```
(WorkerDict pid=396879) [2025-08-12 07:39:42] dp actor After compute_log_prob, memory allocated (GB): 4.95, memory reserved (GB): 14.20, device memory used/total (GB): 17.72/79.15
```
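
The on/off switching around rollout could be wrapped in a context manager, as sketched below. The name `expandable_segments_disabled` is hypothetical, and the allocator setter is injected so the snippet runs without a GPU; on a CUDA device one would presumably pass PyTorch's private `torch.cuda.memory._set_allocator_settings`, which is an assumption about the environment rather than a description of this PR's code:

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def expandable_segments_disabled(
    set_allocator_settings: Callable[[str], None],
) -> Iterator[None]:
    """Turn expandable segments off for the duration of a rollout."""
    set_allocator_settings("expandable_segments:False")
    try:
        yield
    finally:
        # Re-enable for the training phases that follow.
        set_allocator_settings("expandable_segments:True")


# Stand-in setter that just records the settings strings it receives.
calls: list[str] = []
with expandable_segments_disabled(calls.append):
    calls.append("rollout")  # placeholder for the actual generation step
print(calls)
```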

---------
Co-authored-by: narutolhy <luhongyu.4869@bytedance.com>
Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-08-13 15:59:58 +08:00
22a15365db [ci] fix: try fix vllm test network issue (#3031)
### What does this PR do?

[ci] fix: try fix vllm test network issue

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-13 14:01:32 +08:00
83cfc76f73 [recipe] fix: make LangGraph agent example runnable out-of-the-box (#3029)
### What does this PR do?

Fixes the LangGraph agent recipe so it runs out-of-the-box across
different environments. The original example had undefined variables and
brittle error handling that caused failures. This PR makes it portable,
robust, and self-contained. No breaking API changes.

### Checklist Before Starting

* [x] Search for similar PRs:
[https://github.com/search?q=repo%3Avolcengine%2Fverl+langgraph++&type=pullrequests&state=open](https://github.com/search?q=repo%3Avolcengine%2Fverl+langgraph++&type=pullrequests&state=open)
* [x] Format PR title as `[recipe] fix: make LangGraph agent example
runnable out-of-the-box`

  * `{modules}`: recipe
  * `{type}`: fix
  * No breaking API changes

### Test

**End-to-end validation:**

```bash
# 1. Generate dataset (parameterized)
python recipe/langgraph_agent/example/create_dataset.py --train_size 1000 --test_size 100

# 2. Run training (no modifications needed)
bash recipe/langgraph_agent/example/run_qwen2.5_3b.sh

# 3. SLURM submission (headers included)
sbatch recipe/langgraph_agent/example/run_qwen2.5_3b.sh
```

**Note on `GPUS_PER_NODE` and `NNODES`:**

- `GPUS_PER_NODE`: GPUs per node.  
  Detection order: `SLURM_GPUS_ON_NODE` (if set) → `GPUS_PER_NODE` → `2`.
- `NNODES`: number of nodes.  
  Detection order: `SLURM_JOB_NUM_NODES` (if set) → `NNODES` → `1`.
- Total GPUs = `GPUS_PER_NODE × NNODES` (must be ≥ 2).

Local override (no `SLURM_*` set):
```bash
GPUS_PER_NODE=4 NNODES=2 bash recipe/langgraph_agent/example/run_qwen2.5_3b.sh
```
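
The fallback order described above can be sketched in Python. This is an illustration only, not the logic in `run_qwen2.5_3b.sh`: the function name `resolve_topology` and the choice to raise `ValueError` are assumptions, while the environment variable names and defaults come from the note above.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_topology(env: Optional[Mapping[str, str]] = None) -> Tuple[int, int, int]:
    """Return (gpus_per_node, nnodes, total_gpus) following the documented order.

    Hypothetical helper: the SLURM value wins when set, then the explicit
    override, then the default (2 GPUs per node, 1 node).
    """
    env = os.environ if env is None else env
    # `or` skips both missing and empty-string values.
    gpus_per_node = int(env.get("SLURM_GPUS_ON_NODE") or env.get("GPUS_PER_NODE") or 2)
    nnodes = int(env.get("SLURM_JOB_NUM_NODES") or env.get("NNODES") or 1)
    total = gpus_per_node * nnodes
    if total < 2:
        # The note above requires at least 2 GPUs in total.
        raise ValueError(f"need at least 2 GPUs in total, got {total}")
    return gpus_per_node, nnodes, total
```

Passing a plain dict instead of reading `os.environ` keeps the sketch easy to check without touching the real environment.
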
**Results:**

* Model converged to 100% validation accuracy
(`val-core/lighteval/MATH/reward/mean@4: 1.0`)
* Stable metrics: policy loss, entropy, critic scores all normal
* No crashes or hangs during run
* Robust handling of malformed tool-call JSON (logs warnings)
* Model path fallback works when local model missing
* SLURM detection + fallbacks confirmed

<img width="3066" height="1288" alt="math_expression_tool – Weights &
Biases"
src="https://github.com/user-attachments/assets/f08d5799-f9ce-44a2-8fb2-19c7c401c248"
/>

### API and Usage Example

**No breaking API changes.** Dataset generator now has a CLI interface:

```bash
# Defaults: 5000 train, 500 test → data/math_expression_tool/
python recipe/langgraph_agent/example/create_dataset.py

# Custom sizes & output dir
python recipe/langgraph_agent/example/create_dataset.py \
  --train_size 10000 \
  --test_size 1000 \
  --output_dir my_custom_path

# Training
bash recipe/langgraph_agent/example/run_qwen2.5_3b.sh

# SLURM
sbatch recipe/langgraph_agent/example/run_qwen2.5_3b.sh
```
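
For reference, a CLI with these three flags and defaults could be declared roughly as follows. This is a sketch and not the code in `create_dataset.py`: the function name `parse_args` and the help strings are assumptions; the flag names and default values are taken from the usage above.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse dataset-generation options (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Generate the math expression tool dataset")
    parser.add_argument("--train_size", type=int, default=5000, help="number of training samples")
    parser.add_argument("--test_size", type=int, default=500, help="number of test samples")
    parser.add_argument(
        "--output_dir",
        type=str,
        default="data/math_expression_tool",
        help="directory where the generated files are written",
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```
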

### Design & Code Changes

**Core runability fixes:**

* `run_qwen2.5_3b.sh`:

  * Replace undefined ARNOLD\_\* vars with SLURM detection + fallbacks
  * Fix dataset paths
  * Add HF hub model fallback
  * Apply performance tuning from GSPO recipe
* `chat_model.py`: Harden tool-call parsing for malformed JSON
* `create_dataset.py`: Add CLI args (`--train_size`, `--test_size`,
`--output_dir`) with defaults
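
One way to harden tool-call parsing against malformed JSON is to log a warning and skip the bad entry instead of raising. The sketch below is an assumption about the general pattern, not the actual change in `chat_model.py`; the function name `parse_tool_calls` and the expected `name` key are hypothetical.

```python
import json
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def parse_tool_calls(raw_calls: List[str]) -> List[Dict[str, Any]]:
    """Decode tool calls, skipping entries that are not valid JSON objects."""
    parsed: List[Dict[str, Any]] = []
    for raw in raw_calls:
        try:
            call = json.loads(raw)
        except json.JSONDecodeError as exc:
            # Malformed JSON: warn and keep going rather than abort the rollout.
            logger.warning("Skipping malformed tool call %r: %s", raw, exc)
            continue
        if not isinstance(call, dict) or "name" not in call:
            # Valid JSON, but not shaped like a tool call (assumed schema).
            logger.warning("Skipping tool call without a 'name' field: %r", raw)
            continue
        parsed.append(call)
    return parsed
```
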

**Docs & polish:**

* Update `README.md` with CLI params and SLURM example
* Sort imports to satisfy ruff linting

**Impact:** Example now works out-of-the-box in local and cluster
environments without edits.

### Checklist Before Submitting

* [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md)
* [x] Pre-commit checks: `pre-commit install && pre-commit run
--all-files --show-diff-on-failure --color=always`
* [x] Documentation updated (`README.md`)
* [x] Manual end-to-end test with convergence results
* [x] CI request to be sent in Slack once PR is opened
2025-08-13 11:02:51 +08:00
65c59c719c [trainer,rollout,doc] feat: reduce minimum gpus to 96 for deepseek-v3 (#3019)
### What does this PR do?
Reduce the minimum GPU count to 96 for DeepSeek-V3 and 32 for Qwen3-235B.

Change details:
1. Use CPU Adam to save GPU memory.
2. Change the vLLM sleep level to 2 to save CPU memory.
3. Fix a conflict between Megatron's HybridDeviceOptimizer and verl's
load_megatron_optimizer.
4. Provide new training scripts and documentation.

Training logs:
DeepSeek-V3 with 12 nodes:
<img width="3420" height="1308" alt="image"
src="https://github.com/user-attachments/assets/23bec729-bf39-41c8-a4c2-c51f389d052c"
/>


Qwen3-235B with 4 Nodes:
<img width="3426" height="1380" alt="image"
src="https://github.com/user-attachments/assets/4eeacab4-833f-4409-b294-10bd51d0fde9"
/>

Sleep level 1 vs. sleep level 2 speed on Qwen2.5-7B:
level 1 mean: 8.26 s, level 2 mean: 8.33 s
<img width="698" height="638" alt="image"
src="https://github.com/user-attachments/assets/e3dcbb4b-f841-4d7c-b60f-40da8ffe6c42"
/>
2025-08-13 10:52:45 +08:00
3cc7695f4c [hardware, recipe] chore: support retool sft &update peft sft perf on npu (#3000)
### What does this PR do?

- Add time statistics for all train steps.
- Support retool sft on ascend npu.
- Update peft sft performance.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Tonyztj <1445297443@qq.com>
2025-08-13 10:51:46 +08:00
5957412767 [rollout] feat: add rollout config (#3010)
### What does this PR do?

- Add rollout config

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-13 10:50:27 +08:00
3315c1ab1e [misc] chore: add GPU memory to names that train large models (#3023)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-12 18:37:10 +08:00
0123ca6ce1 [misc] chore: add gpu memory to deepseek script (#3022)
…atron_80gb.sh

### What does this PR do?

- Rename run_deepseek671b_math_megatron.sh to
run_deepseek671b_math_megatron_80gb.sh

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-12 17:46:47 +08:00
77b79cba55 [rollout] fix: Add soft node affinity to the agent loop workers (#3006)
### What does this PR do?
This adds soft node affinity so that AgentLoopWorkers are scheduled on
the same node when possible.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[link](https://github.com/volcengine/verl/issues?q=is%3Aissue%20%20node%20affinity)
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test
Tested on a multi-node Ray cluster; the dashboard confirmed that the
AgentLoopWorkers were assigned to the same node ID.
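
For readers unfamiliar with soft affinity in Ray, the general mechanism can be sketched as below. This illustrates Ray's `NodeAffinitySchedulingStrategy` with `soft=True`; it is not the code added by this PR, and the helper name `soft_affinity_options` and the `Worker` placeholder are assumptions.

```python
from typing import Any, Dict


def soft_affinity_options(node_id: str) -> Dict[str, Any]:
    """Build Ray actor options that prefer `node_id` without requiring it."""
    # Imported lazily so the module can be imported where Ray is not installed.
    from ray.util.scheduling_strategies import NodeAffinitySchedulingStrategy

    # soft=True lets Ray place the actor elsewhere if the node cannot host it.
    return {"scheduling_strategy": NodeAffinitySchedulingStrategy(node_id=node_id, soft=True)}


# Hypothetical usage inside a running Ray job (Worker is a placeholder actor class):
#   node_id = ray.get_runtime_context().get_node_id()
#   workers = [Worker.options(**soft_affinity_options(node_id)).remote() for _ in range(4)]
```

With `soft=False`, scheduling would fail if the target node were unavailable, which is why the soft variant suits a best-effort placement like this one.
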

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-12 17:20:59 +08:00
45b4ce910a [perf] feat: Add rollout longtail observation metrics (#3009)
### What does this PR do?

Add rollout long-tail observation metrics: the maximum and minimum
rollout times and the take-up ratio of the top 10% of rollouts.
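
As a rough illustration of such metrics, the sketch below computes the slowest and fastest rollout times and the share of total time spent in the slowest 10% of rollouts. Reading "take-up ratio" as a share of total rollout time is an assumption, as are the function name and the metric keys; the PR's actual implementation may differ.

```python
from typing import Dict, Sequence


def longtail_metrics(durations: Sequence[float], top_percent: int = 10) -> Dict[str, float]:
    """Summarize how much the slowest rollouts dominate total rollout time."""
    if not durations:
        raise ValueError("durations must be non-empty")
    ordered = sorted(durations, reverse=True)
    # Integer ceiling of n * top_percent / 100, with at least one rollout.
    k = max(1, (len(ordered) * top_percent + 99) // 100)
    total = sum(ordered)
    return {
        "max": ordered[0],
        "min": ordered[-1],
        # Fraction of total time consumed by the slowest k rollouts.
        "top_ratio": sum(ordered[:k]) / total if total > 0 else 0.0,
    }
```

A `top_ratio` far above `top_percent / 100` would suggest that a few long rollouts hold up the whole batch.
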

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-12 13:03:52 +08:00
92cbc2f417 [misc] feat: Support trackio (#3017)
### What does this PR do?

Support Trackio, a lightweight experiment tracking library from Hugging
Face.

Features are listed in https://huggingface.co/blog/trackio

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-12 13:02:20 +08:00
492bd63e7c [ci] fix: add flash_attn_supports_top_left_mask to ignore list (#3004)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-12 11:17:47 +08:00
b8c4871c21 [trainer] fix: reduce memory footprint by moving data to the device only in mini batch (#3011)
### What does this PR do?
Reduce peak memory usage during update_actor/critic by moving data to
the device one mini-batch at a time instead of all at once.
The same approach is used in
[fsdp_workers.py](https://github.com/volcengine/verl/blob/main/verl/workers/fsdp_workers.py#L729)
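
The idea can be sketched independently of verl's data structures. The generator below is a hypothetical, framework-agnostic illustration: it assumes every value in the batch is tensor-like, meaning it supports `len()`, slicing along the first dimension, and `.to(device)` (as `torch.Tensor` does). It is not the code changed in this PR.

```python
from typing import Any, Dict, Iterator


def iter_mini_batches(batch: Dict[str, Any], mini_batch_size: int, device: str) -> Iterator[Dict[str, Any]]:
    """Yield mini-batches, moving each to `device` only when it is consumed.

    The full batch stays where it is (typically CPU memory), so at most one
    mini-batch occupies device memory at a time.
    """
    if mini_batch_size <= 0:
        raise ValueError("mini_batch_size must be positive")
    total = len(next(iter(batch.values())))
    for start in range(0, total, mini_batch_size):
        end = start + mini_batch_size
        # Slice first, then transfer: only this slice is copied to the device.
        yield {key: value[start:end].to(device) for key, value in batch.items()}
```
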


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-12 10:37:45 +08:00
9f4161e250 [recipe] feat: add deepeyes recipe (#2398)
### What does this PR do?

This PR introduces a complete training recipe for [DeepEyes:
Incentivizing "Thinking with Images" via Reinforcement
Learning](https://arxiv.org/abs/2505.14362).

The core feature is the support for multi-turn visual tools,
specifically the `ImageZoomInTool`, integrated with a custom reward
function based on the "LLM-as-a-Judge" pattern to evaluate model
performance.

Additionally, to better monitor and analyze the model's tool-use
behavior, this PR adds functionality to track tool call counts during
the training process and reports these metrics to logging systems like
wandb.

### API and Usage Example

The primary change is the new training recipe for DeepEyes. Users can
start a training run by using the provided configuration file.

1. Preprocess the dataset. We need to add some tool-related extra_info:
```bash
python recipe/deepeyes/deepeyes47k_preprocess.py --dataset_dir <path_to_raw_dataset> --save_dir <path_to_processed_data>
```
2. Start the PPO training:
```bash
bash recipe/deepeyes/run_deepeyes_grpo.sh
```
The training process will automatically load the ImageZoomInTool and the
custom reward function as defined in the recipe.

### Design & Code Changes

- **DeepEyes Recipe Integration**: Added a new recipe directory with
data preprocessing, tool config, and a custom reward function for
DeepEyes.
- **Visual Tool Support**: Implemented `ImageZoomInTool` with robust
bbox validation and resizing.
- **Tool Call Statistics**: Modified the rollout and metrics code to
track and log tool call counts per sample and per step.
- **Bug Fixes**: Fixed image byte handling and ensured special tokens
are preserved during decoding for tool call formatting.
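
The bbox validation mentioned under **Visual Tool Support** might look roughly like the sketch below. This is not the recipe's actual code: the function names, the `(left, top, right, bottom)` layout with integer pixel coordinates, and the `min_size=28` threshold are all assumptions made for illustration.

```python
def _grow(lo, hi, min_size, limit):
    """Widen the interval [lo, hi) to at least min_size while staying in [0, limit]."""
    if hi - lo >= min_size:
        return lo, hi
    # The image itself may be smaller than min_size (hypothetical edge case).
    target = min(min_size, limit)
    center = (lo + hi) / 2
    # Centre the widened interval, then shift it back inside the image if needed.
    new_lo = max(0, min(center - target / 2, limit - target))
    return int(new_lo), int(new_lo + target)


def validate_bbox(bbox, image_width, image_height, min_size=28):
    """Return a usable (left, top, right, bottom) box, or None if it has no area."""
    left, top, right, bottom = bbox
    # Clamp every coordinate into the image rectangle.
    left = max(0, min(left, image_width))
    right = max(0, min(right, image_width))
    top = max(0, min(top, image_height))
    bottom = max(0, min(bottom, image_height))
    # A box with no area after clamping cannot be cropped.
    if right <= left or bottom <= top:
        return None
    # Enlarge boxes that are too small to be worth zooming into.
    left, right = _grow(left, right, min_size, image_width)
    top, bottom = _grow(top, bottom, min_size, image_height)
    return (left, top, right, bottom)
```

With these assumptions, a box that extends past the image edge is clamped, a 10-pixel box is widened to 28 pixels around its centre, and a zero-width box is rejected with `None`.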

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Maxwell-Jia <mr.minghui.jia@gamil.com>
Co-authored-by: xieck13 <xieck13@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-08-12 09:51:58 +08:00
b79263ad60 [perf] refactor: part 2 - Profiler ci test and fixes (#3001)
### What does this PR do?

[perf] refactor part 2: Profiler ci test and fixes

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-12 08:59:39 +08:00
6110410797 [sglang]fix: Reduce memory footprint during rollout by adding load_grad=False when loading megatron weights. (#3007)
When I ran GRPO training with SGLang on DeepSeek-V3 671B with 256 H100
GPUs, I hit an OOM error here. There is no need to load gradients when
converting mcore weights and then running rollout generation.

### What does this PR do?

Reduce peak memory usage during rollout generation by passing
`load_grad=False`, so that gradients are not loaded when calling
`load_megatron_model_to_gpu()`.
The same approach is already used in
[megatron_vllm.py](814e421c54/verl/workers/sharding_manager/megatron_vllm.py (L150))

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[link](https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+rollout+grad)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-11 19:29:46 +08:00
814e421c54 [rollout,vllm] feat: unify vllm and sglang method to async (#2982)
### What does this PR do?

Change vLLM method to async to unify with SGLang.
2025-08-11 14:24:06 +08:00
61dde81e0a [trainer] feat: Specify apply_chat_template_kwargs from config (#2998)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

This PR adds support for specifying `apply_chat_template_kwargs` from the config.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

e.g. For #1711, now users can directly use

```
+data.apply_chat_template_kwargs.enable_thinking=False
```

to disable thinking mode in Qwen3, without the need to modify the code.

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

Users can set or append arbitrary keys under `data.apply_chat_template_kwargs`;
they are forwarded as keyword arguments whenever `.apply_chat_template()` is called.
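
The forwarding described above can be sketched as follows. `StubTokenizer` and `build_prompt` are hypothetical names used only to keep the example self-contained; the stub merely imitates the call shape of a tokenizer's `apply_chat_template()` and is not how Qwen3's real chat template is implemented.

```python
class StubTokenizer:
    """Hypothetical stand-in for a tokenizer; only mimics the call shape."""

    def apply_chat_template(self, messages, add_generation_prompt=True, tokenize=False, **kwargs):
        text = "".join(f"<{m['role']}>{m['content']}" for m in messages)
        if add_generation_prompt:
            text += "<assistant>"
            # Illustrative only: emit an empty thinking block when thinking is disabled.
            if kwargs.get("enable_thinking", True) is False:
                text += "<think></think>"
        return text


def build_prompt(tokenizer, messages, apply_chat_template_kwargs=None):
    """Forward the user-supplied kwargs from the config to apply_chat_template()."""
    extra = dict(apply_chat_template_kwargs or {})
    return tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False, **extra
    )


messages = [{"role": "user", "content": "hi"}]
default_prompt = build_prompt(StubTokenizer(), messages)
no_thinking_prompt = build_prompt(StubTokenizer(), messages, {"enable_thinking": False})
```

The point of the design is that the dataset code never needs to know which keys a given chat template understands; it only passes the mapping through.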

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-08-11 13:52:48 +08:00
a901f56b8f [model] fix: Handle flash_attn_supports_top_left_mask import for older transformers (#2985)
## Summary
Fix ImportError when using older transformers versions that don't have
`flash_attn_supports_top_left_mask` function.

## Root Cause
The `flash_attn_supports_top_left_mask` function was added in newer
versions of transformers. Users with older versions encounter
ImportError.

## Solution  
- Add try/except blocks to handle the import gracefully
- Provide a safe fallback (return False) for older versions
- Applied to all affected model files
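
A minimal sketch of the fallback described above. The import path `transformers.modeling_flash_attention_utils` is an assumption about where newer transformers releases expose the function; the exact code in the affected model files may differ.

```python
try:
    # Assumed location of the helper in newer transformers releases.
    from transformers.modeling_flash_attention_utils import flash_attn_supports_top_left_mask
except ImportError:
    # Older transformers (or no transformers at all): fall back to a safe default.
    def flash_attn_supports_top_left_mask():
        """Fallback: report that top-left masks are not supported."""
        return False
```

Because `ModuleNotFoundError` is a subclass of `ImportError`, the same `except` clause also covers the case where the module itself is missing.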

## Changes
- `verl/models/transformers/qwen2_vl.py`
- `verl/models/transformers/qwen2.py`
- `verl/models/transformers/llama.py`
- `verl/models/transformers/kimi_vl.py`

## Testing
-  Tested import compatibility
-  Verified Python syntax
-  Code follows existing patterns in the codebase

Fixes #2968

---------

Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-08-11 13:38:56 +08:00
e63f6acbb7 [ray] fix: Fix function name in worker helper (#2868)
### What does this PR do?

Fix function name in worker helper.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test
N/A

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Signed-off-by: Ata Fatahi <immrata@gmail.com>
Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-08-11 10:03:40 +08:00
545f899844 [BREAKING] [perf] refactor: Profiler api refactor (#2894)
### What does this PR do?

Refactor profiler CI to a unified way.

TODO:

- nsys use `save_path`
- nsys discrete tests are disabled
- torch profiler

cc: @davidmlw 

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

Global profiler config:

```yaml
global_profiler:
  _target_: verl.utils.profiler.ProfilerConfig
  tool: null
  steps: null
  profile_continuous_steps: false
  save_path: outputs/profile
  tool_config:
    nsys:
      _target_: verl.utils.profiler.config.NsightToolConfig
      discrete: false
    npu:
      _target_: verl.utils.profiler.config.NPUToolConfig
      discrete: false
      contents: []
      level: level1
      analysis: true
    torch:
      _target_: verl.utils.profiler.config.TorchProfilerToolConfig
      step_start: 0
      step_end: null
```

Local profiler config:

```yaml
profiler:

  # Required when using verl.utils.omega_conf_to_dataclass to instantiate dataclass configs
  _target_: verl.utils.profiler.ProfilerConfig

  # profiler tool, default same as profiler.tool in global config
  # choices: nsys, npu, torch
  tool: ${oc.select:global_profiler.tool,null}

  # whether enable profile on critic
  enable: False

  # Whether to profile all ranks.
  all_ranks: False

  # The ranks that will be profiled. [] or [0,1,...]
  ranks: []

  # profile results saving path
  save_path: ${oc.select:global_profiler.save_path,null}

  # specific tool config
  tool_config: ${oc.select:global_profiler.tool_config,null}
```
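
How `enable`, `all_ranks`, and `ranks` might combine to decide whether a given rank is profiled can be sketched as below. The semantics are inferred from the YAML comments above, and `should_profile` is a hypothetical helper, not part of the profiler API.

```python
def should_profile(rank, enable=False, all_ranks=False, ranks=None):
    """Decide whether a worker rank is profiled (assumed semantics)."""
    if not enable:
        # Profiling is switched off for this role entirely.
        return False
    if all_ranks:
        # Every rank is profiled; the explicit list is ignored.
        return True
    # Otherwise only the explicitly listed ranks are profiled.
    return rank in (ranks or [])
```

Under these assumptions, `enable: True` with `ranks: [0, 1]` profiles only ranks 0 and 1, and the default `enable: False` profiles nothing.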

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-11 09:52:41 +08:00
287ef7e262 [rollout] fix: avoid repeated multiplication by n for GRPO (#2881)
For GRPO, the number of generations per prompt has already been applied at
2fdfbdcba6/verl/trainer/ppo/ray_trainer.py (L1117),
so the original code in the Hugging Face rollout multiplied by $n$ a second
time and generated $n^2$ responses for each prompt.
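
The arithmetic behind the bug can be illustrated with a toy count. `count_responses` is a hypothetical helper, not code from the rollout; it only models the two multiplications.

```python
def count_responses(num_prompts, n, rollout_multiplies_again):
    """Count the responses produced for a batch of prompts."""
    # The trainer has already repeated every prompt n times.
    repeated_batch = num_prompts * n
    # The buggy rollout sampled n responses for each repeated prompt again.
    per_item = n if rollout_multiplies_again else 1
    return repeated_batch * per_item


buggy = count_responses(num_prompts=4, n=8, rollout_multiplies_again=True)
fixed = count_responses(num_prompts=4, n=8, rollout_multiplies_again=False)
```

With 4 prompts and $n = 8$, the buggy path yields $4 \cdot 8^2 = 256$ responses, while the fixed path yields the intended $4 \cdot 8 = 32$.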
2025-08-11 09:46:23 +08:00
cb809d66e4 [doc] feat: update contact and news (#2993)
### What does this PR do?

Update contact email. 

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
2025-08-10 09:16:33 +08:00
da7fc8e015 [rollout,trainer] feat: offload param before wake up inference engine (#2977) 2025-08-09 06:57:05 +08:00
beb6246100 [rollout,vllm] fix: max_num_seqs not take effect (#2960) 2025-08-09 06:55:21 +08:00
980b018c85 [ray, trainer] fix: fix working_dir when launching via uv (#2859)
### What does this PR do?

This PR fixes the Ray `working_dir` when launching via `uv run`.

Ray modifies the `runtime_env` when it is run via uv:
[_maybe_modify_runtime_env](b62ce29706/python/ray/_private/worker.py (L1322-L1346))

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

```python
@ray.remote(num_cpus=1)  # please make sure main_task is not scheduled on head
class TaskRunner:
    """Ray remote class for executing distributed PPO training tasks.

    This class encapsulates the main training logic and runs as a Ray remote actor
    to enable distributed execution across multiple nodes and GPUs.
    """

    def run(self, config):
        print(os.getcwd())
```

The working directory printed by the snippet above:

1. When launching without uv: `path/to/verl`
2. When launching with uv:
`/tmp/ray/session_2025-08-01_09-07-33_265741_98359/runtime_resources/working_dir_files/_ray_pkg_225a7865bd7dee4c`,
so checkpoints end up being saved in this temporary directory.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-08 23:18:04 +08:00
21b99ed741 [misc] feat: Added: "tensorboard" to the requirements.txt (#2900)
### What does this PR do?

> This PR adds tensorboard as a dependency to requirements.txt file,
across several Dockerfiles (Dockerfile.ngc.vllm, Dockerfile.ngc.vllm0.8,
Dockerfile.ngc.vllm0.8.sagemaker), a setup script
(install_vllm_sglang_mcore.sh), and the main setup.py file. This change
ensures that the tensorboard package is consistently installed, enabling
visualization of training metrics for various configurations and
deployment environments. This is a maintenance task that enhances the
project's observability without altering core functionality.

### Test

> This change is a dependency update and doesn't require specific
testing beyond confirming the installation is successful.

### API and Usage Example

> No API changes are introduced. The usage of TensorBoard would be
initiated by the user after installing the requirements.

```python
# No code snippet is applicable for this change
```
2025-08-08 22:39:53 +08:00
12c83e8ada [trainer] fix: only load memory in micro batch (#2908)
### What does this PR do?

`update_actor` loaded the whole batch into GPU memory, although only the
current micro batch is needed.
This is a regression from https://github.com/volcengine/verl/pull/2477
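
The effect on peak memory can be illustrated with a toy simulation. Everything here is hypothetical: `FakeDevice` only counts resident samples and stands in for moving tensors to the GPU, so this shows the idea rather than the actual `update_actor` code.

```python
class FakeDevice:
    """Counts how many samples are resident at once (stand-in for GPU memory)."""

    def __init__(self):
        self.resident = 0
        self.peak = 0

    def load(self, samples):
        self.resident += len(samples)
        self.peak = max(self.peak, self.resident)

    def free(self, samples):
        self.resident -= len(samples)


def peak_when_loading_whole_batch(batch, micro_batch_size):
    device = FakeDevice()
    device.load(batch)  # the regression: everything is moved up front
    for start in range(0, len(batch), micro_batch_size):
        _ = batch[start:start + micro_batch_size]  # forward/backward would run here
    device.free(batch)
    return device.peak


def peak_when_loading_micro_batches(batch, micro_batch_size):
    device = FakeDevice()
    for start in range(0, len(batch), micro_batch_size):
        micro_batch = batch[start:start + micro_batch_size]
        device.load(micro_batch)  # only the current micro batch is resident
        device.free(micro_batch)
    return device.peak


batch = list(range(64))
whole_peak = peak_when_loading_whole_batch(batch, 8)
micro_peak = peak_when_loading_micro_batches(batch, 8)
```

In this toy model the peak drops from the full batch size to the micro batch size, which is the kind of saving the fix is after.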


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+micro+batch
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test
<img width="700" height="325" alt="Screenshot 2025-08-05 1:01:53 PM"
src="https://github.com/user-attachments/assets/31dc4fea-8cb0-4f51-8ed2-f93d90a94040"
/>
<img width="1359" height="607" alt="Screenshot 2025-08-05 12:45:50 PM"
src="https://github.com/user-attachments/assets/747636e6-b919-4eca-a3eb-5baf3722b5fc"
/>



### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-08-08 22:38:22 +08:00
31ac4dc6fa [data] fix: fix bug of '_io.BytesIO' object has no attribute 'startswith' (#2430)
### What does this PR do?

FIX: '_io.BytesIO' object has no attribute 'startswith'

https://github.com/volcengine/verl/issues/1976

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test
1. Download the test dataset:
```
huggingface-cli download --repo-type dataset xylcbd/pgdp5k_mini
```
2. Convert the data to Parquet format:
```python
import argparse
import os

import datasets

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_dir", default="~/data/pgdp5k_mini")
    args = parser.parse_args()

    # Expand "~" and make sure the output directory exists before writing.
    local_dir = os.path.expanduser(args.local_dir)
    os.makedirs(local_dir, exist_ok=True)

    data_source = "xylcbd/pgdp5k_mini"
    dataset = datasets.load_dataset(data_source)
    train_dataset = dataset["train"]
    train_dataset.to_parquet(os.path.join(local_dir, "train.parquet"))
```
3. test dataset loading:
```
import os

import torch
from omegaconf import OmegaConf
from torch.utils.data import DataLoader
from verl.utils import hf_processor, hf_tokenizer
from verl.utils.dataset.rl_dataset import RLHFDataset, collate_fn

model_path = "Qwen/Qwen2.5-VL-3B-Instruct"
tokenizer = hf_tokenizer(model_path)
processor = hf_processor(model_path)
config = OmegaConf.create(
    {
        "prompt_key": "prompt",
        "max_prompt_length": 1024,
        "filter_overlong_prompts": True,
        "filter_overlong_prompts_workers": 2,
    }
)
dataset = RLHFDataset(
    data_files=os.path.expanduser("~/data/pgdp5k_mini/train.parquet"),
    tokenizer=tokenizer,
    config=config,
    processor=processor,
)

dataloader = DataLoader(dataset=dataset, batch_size=2, shuffle=True, drop_last=True, collate_fn=collate_fn)

a = next(iter(dataloader))

from verl import DataProto

tensors = {}
non_tensors = {}

for key, val in a.items():
    if isinstance(val, torch.Tensor):
        tensors[key] = val
    else:
        non_tensors[key] = val

data_proto = DataProto.from_dict(tensors=tensors, non_tensors=non_tensors)

assert "multi_modal_data" in data_proto.non_tensor_batch, data_proto
assert "multi_modal_inputs" in data_proto.non_tensor_batch, data_proto

data = dataset[0]["input_ids"]
output = tokenizer.batch_decode([data])[0]
print(f"type: {type(output)}")
print(f"\n\noutput: {output}")
```
4. Error reported before the fix (no error is reported after the fix):
```
AttributeError: '_io.BytesIO' object has no attribute 'startswith'
```
### API and Usage Example

No change for the API.

### Design & Code Changes

No change for the design.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-08-08 22:24:39 +08:00
01b4a290b3 [trainer] refactor: make main_ppo TaskRunner more modular (#2885)
### What does this PR do?

- Added an `__init__()` method to initialize `self.role_worker_mapping =
{}`
- Extracted worker setup logic into dedicated methods:
  - `add_actor_rollout_worker()` - handles strategy-specific worker
    imports and setup (lines 130-153)
  - `add_critic_worker()` - sets up critic worker role mapping (lines
    170-176)
  - `init_resource_pool_mgr()` - creates resource pool specifications
    (lines 178-187)
  - `add_reward_model_worker()` - conditionally adds reward model workers
    (lines 195-203)
  - `add_ref_policy_worker()` - conditionally adds reference policy
    workers (lines 205-208)
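
The shape of the refactor can be sketched as below. Only the method names and `role_worker_mapping` come from the description above; the class name, the role keys, the placeholder worker strings, the method arguments, and the `global_pool` name are assumptions made to keep the example self-contained.

```python
class TaskRunnerSketch:
    """Illustrative stand-in; real worker classes are replaced by plain strings."""

    def __init__(self):
        self.role_worker_mapping = {}
        self.mapping = {}  # role -> resource pool name (hypothetical)

    def _register(self, role, worker):
        self.role_worker_mapping[role] = worker
        self.mapping[role] = "global_pool"

    def add_actor_rollout_worker(self, strategy):
        # The real method imports strategy-specific worker classes here.
        if strategy not in ("fsdp", "fsdp2", "megatron"):
            raise NotImplementedError(f"unknown strategy: {strategy}")
        self._register("actor_rollout", f"{strategy}-actor-rollout-worker")

    def add_critic_worker(self, strategy):
        self._register("critic", f"{strategy}-critic-worker")

    def init_resource_pool_mgr(self, n_gpus_per_node, nnodes):
        # One pool entry per node, each listing that node's GPU count.
        return {"global_pool": [n_gpus_per_node] * nnodes}

    def add_reward_model_worker(self, enabled):
        if enabled:
            self._register("reward_model", "reward-model-worker")

    def add_ref_policy_worker(self, enabled):
        if enabled:
            self._register("ref_policy", "ref-policy-worker")


runner = TaskRunnerSketch()
runner.add_actor_rollout_worker("fsdp")
runner.add_critic_worker("fsdp")
pool_spec = runner.init_resource_pool_mgr(n_gpus_per_node=8, nnodes=2)
runner.add_reward_model_worker(enabled=False)
runner.add_ref_policy_worker(enabled=True)
```

Splitting the setup this way lets a subclass override a single step, such as the critic worker, without copying the whole `run()` body.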

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

relying on existing unit tests


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-08-08 21:04:04 +08:00
ff6978c14c [rollout] feat: add cudagraph_capture_sizes option to customize cuda graph memory (#2956)
### What does this PR do?

1. Enable vLLM CUDA graphs by default
2. Add a `cudagraph_capture_sizes` option to customize CUDA graph memory

vLLM CUDA graphs improved performance in every case I tested, so it is
better to enable them by default, as SGLang does.

<img width="1145" height="321" alt="Screenshot 2025-08-07 11.59.37 AM"
src="https://github.com/user-attachments/assets/b750fb93-f42b-48e8-a5e5-6c5c67e8a5ac"
/>

The default `cudagraph_capture_sizes` gives the best performance but
also uses more memory.
If OOM occurs during the policy update, the `cudagraph_capture_sizes`
option can help reduce memory usage.
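
To make the trade-off concrete, here is a small illustration of why a
shorter capture list saves memory. It assumes the engine pads a decode
batch up to the nearest captured size and runs any larger batch without
a graph; `graph_batch_size` is a hypothetical helper written for this
note, not a vLLM or verl function.

```python
from bisect import bisect_left
from typing import List, Optional


def graph_batch_size(batch_size: int, capture_sizes: List[int]) -> Optional[int]:
    """Return the captured size a batch is padded to, or None if no graph fits."""
    sizes = sorted(capture_sizes)
    idx = bisect_left(sizes, batch_size)  # first captured size >= batch_size
    return sizes[idx] if idx < len(sizes) else None


small_list = [1, 2, 4, 8]  # fewer graphs captured, so less memory reserved
padded = graph_batch_size(3, small_list)
fallback = graph_batch_size(16, small_list)
```

Under these assumptions a batch of 3 is padded to the size-4 graph,
while a batch of 16 exceeds every captured size and would run without a
graph, which is the performance cost paid for the lower memory use.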

<img width="1043" height="318" alt="Screenshot 2025-08-07 12.03.02 PM"
src="https://github.com/user-attachments/assets/2892a67c-6ba9-448c-ae42-2833f010ff06"
/>

Additional memory/latency/batch-size test data from NVIDIA:


![20250807-120513](https://github.com/user-attachments/assets/74b2382a-7f17-42da-ab12-922e29cfa3e2)


### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-08 21:00:13 +08:00
6bd44e6313 [megatron] feat: Allow override optimizer config (#2959)
### What does this PR do?

Allow overriding the Megatron optimizer config to enable features such
as CPU Adam. Enabling a specific feature may still require debugging and
additional implementation work.
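
One way such an override could work is sketched below, assuming the
optimizer settings live in a dataclass. `OptimizerConfig`, its field
names, and `apply_overrides` are hypothetical stand-ins chosen for the
example; they are not Megatron's or verl's real names.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict


@dataclass(frozen=True)
class OptimizerConfig:
    """Illustrative stand-in; the fields are assumptions, not Megatron's."""
    lr: float = 1e-6
    weight_decay: float = 0.01
    cpu_offload: bool = False


def apply_overrides(base: OptimizerConfig, overrides: Dict[str, Any]) -> OptimizerConfig:
    """Return a copy of `base` with user overrides applied, rejecting unknown keys."""
    known = {f.name for f in fields(base)}
    unknown = set(overrides) - known
    if unknown:
        # Fail loudly instead of silently ignoring a misspelled option.
        raise KeyError(f"unknown optimizer options: {sorted(unknown)}")
    return replace(base, **overrides)


cfg = apply_overrides(OptimizerConfig(), {"cpu_offload": True})
```

Rejecting unknown keys is a design choice for the sketch: it surfaces
typos early, at the cost of requiring the config class to declare every
option a user may set.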

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-08 20:59:00 +08:00
d527d91d12 [sglang] fix: Fix No command 'hf' found for dapo multi-turn as alternative baseline (#2973)
### What does this PR do?

> Fix the `No command 'hf' found` error in the DAPO multi-turn alternative baseline


> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-08 15:30:06 +08:00
083da9ab13 [misc] fix: fix DataProto __getstate__ bug (#2962) 2025-08-08 08:24:31 +08:00
ae285703a8 [doc] fix: fix typo in docs/preparation/prepare_data.rst (#2957)
### What does this PR do?

I fixed the typo from RewardModule to RewardModel in
docs/preparation/prepare_data.rst

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-07 21:16:26 +08:00
68598bd31d [rollout] fix: Fix local rank binding issue when setting RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES (#2967)
### What does this PR do?

Fix local rank binding issue when setting
RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES

### Checklist Before Starting

[done] Search for similar PR(s).

### Design & Code Changes

Change `verl/verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py`.

### Checklist Before Submitting

[ done ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
[ done ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).

---------

Co-authored-by: liaochangyue <liaochangyue@bytedance.com>
2025-08-07 20:58:11 +08:00
7bece3cf59 [ci] fix: limit e2e_one_step_off_policy timeout (#2964)
### What does this PR do?

`e2e_one_step_off_policy` may hit a network hang and occupy GPUs for
over an hour, although it normally finishes in 2-3 minutes.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-07 18:01:37 +08:00
3ebe6717ad [megatron] fix: retain MLA config in mcore config converter (#2933)
### What does this PR do?

- The current `check_and_disable_incompatible_configs` function drops
any config key that is not an attribute of `TransformerConfig`. With
`MLATransformerConfig`, it therefore drops MLA settings such as
`q_lora_rank`, which causes many problems in the downstream pipeline.
- This PR refactors `check_and_disable_incompatible_configs` into a
factory function, `check_and_construct_configs`, which accepts a class
type bound to `TransformerConfig` and returns a `TransformerConfig`
instance.
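
A rough sketch of the factory idea, assuming the configs are
dataclasses. The two config classes here are small stand-ins with
made-up fields, and the signature of `check_and_construct_configs` is
an assumption; only the function name and the filtering behaviour come
from the description above.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Type, TypeVar


@dataclass
class TransformerConfig:
    """Stand-in for the real mcore class; fields are illustrative."""
    num_layers: int = 1
    hidden_size: int = 8


@dataclass
class MLATransformerConfig(TransformerConfig):
    """Stand-in subclass that adds an MLA-specific field."""
    q_lora_rank: Optional[int] = None


T = TypeVar("T", bound=TransformerConfig)


def check_and_construct_configs(original: Dict[str, Any], cls: Type[T]) -> T:
    """Keep only the keys that `cls` declares, then build an instance of `cls`."""
    allowed = {f.name for f in fields(cls)}
    kept = {k: v for k, v in original.items() if k in allowed}
    return cls(**kept)


raw = {"num_layers": 2, "q_lora_rank": 512, "unsupported_flag": True}
base_cfg = check_and_construct_configs(raw, TransformerConfig)
mla_cfg = check_and_construct_configs(raw, MLATransformerConfig)
```

Because the allowed keys are derived from the class passed in,
`q_lora_rank` survives when `MLATransformerConfig` is requested and is
dropped only when the plain `TransformerConfig` is.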

@ETOgaosion 

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>
2025-08-07 12:35:18 +08:00
6f559540e7 [sglang] feat: add dapo multi-turn as alternative baseline (#2952)
### What does this PR do?

As the title says: add DAPO multi-turn as an alternative baseline.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
2025-08-06 18:27:07 -07:00
05eb0c7a6d [tool] feat: handle cases when func calling without params (#2936)
### What does this PR do?

This PR handles tool (function) calls whose schema declares no
parameters.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

**input tool config:**
```yaml
tools:
  - class_name: verl.tools.my_tool.RandomNumTool
    config:
      type: native
    tool_schema:
      type: function
      function:
        name: random
        description: Generate a random number.
```

---

**parsed tool:**
```bash
<tools>
{"type": "function", "function": {"name": "random", "description": "Generate a random number.", "parameters": OpenAIFunctionParametersSchema(type='object', properties={}, required=[]), "strict": false}}
</tools>
```
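
The behaviour shown above can be approximated with plain dictionaries.
This is a sketch under assumptions: `normalize_tool_schema` is a
hypothetical helper, and the real code builds an
`OpenAIFunctionParametersSchema` object rather than a dict.

```python
import copy
from typing import Any, Dict

# Default used when a tool declares no parameters at all.
EMPTY_PARAMETERS = {"type": "object", "properties": {}, "required": []}


def normalize_tool_schema(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `schema` whose function always has a `parameters` entry."""
    normalized = copy.deepcopy(schema)  # leave the caller's dict untouched
    function = normalized["function"]
    if function.get("parameters") is None:
        function["parameters"] = copy.deepcopy(EMPTY_PARAMETERS)
    return normalized


tool_schema = {
    "type": "function",
    "function": {"name": "random", "description": "Generate a random number."},
}
parsed = normalize_tool_schema(tool_schema)
```

Filling in an empty object schema means downstream code can always read
`parameters` without special-casing tools that take no arguments.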


> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-06 21:29:15 +08:00
aebc51a235 [megatron] chore: update example 671B script, no offline dist-ckpt needed any more (#2945)
### What does this PR do?

Update the example 671B script; an offline dist-ckpt is no longer needed.
2025-08-06 21:07:01 +08:00
8e1fc242d3 [fsdp] fix: call reshard() to resolve no shard attribute (#2941)
### What does this PR do?

We should call `.reshard()`; otherwise an error is raised for the
undefined attribute "shard". This was a typo introduced by a recent PR:
https://github.com/volcengine/verl/pull/2843


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-06 16:26:37 +08:00
c344e9eb2c [megatron] feat: support for pipeline layout with vpp in mcore 0.13.0 (#2749)
### What does this PR do?

Add support for pipeline layout with VPP in mcore 0.13.0.

This is a breaking change for users on 0.12.0.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: donpromax <donlv1997@163.com>
Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>
2025-08-06 15:58:44 +08:00
d37674c8ae [misc] refactor: deprecate sharding manager (part 1) (#2912)
### What does this PR do?

- Since device_mesh registration was introduced inside the worker, the
sharding manager is no longer needed. Its usage will be removed
gradually on the main branch.
- This PR removes the sharding manager usage inside `fsdp_workers`.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-06 11:05:11 +08:00
796871d7d0 [sglang] fix: remove unnecessary maybe_set_triton_cache_manager (#2926)
### What does this PR do?
Remove the unnecessary `maybe_set_triton_cache_manager`.


> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-05 14:17:11 -07:00
02f4386ae8 [megatron] fix: qwen2vl megatron fused forward param bug (#2595)
### What does this PR do?

fix: qwen2vl megatron fused forward param bug.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
2025-08-05 16:13:46 +08:00
8fd671638c [rollout, sglang] fix: fix encoding logic bug (#2901)
### What does this PR do?

Fix a bug in the `input_ids` encoding logic. The bug appears when a
tool has to be initialized and its initialization returns text or
images: the output is `<im_start>assistant\ntool\nXX`, but it should be
just `<im_start>tool\nXX`.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Run any multi-turn code and confirm that this warning no longer appears:
```python
logger.warning(
    f"Inconsistent training and inference tokenization detected{mode_str}. This may lead to "
    f"unexpected behavior during training. Please review your chat template to determine if this "
    f"is intentional. For more information, refer to the multiturn README.md."
)
```
### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-05 15:53:29 +08:00
d0ecc3fad5 [megatron] refactor: simplify module init in megatron_workers, extract common operations (#2400)
### What does this PR do?

[megatron] refactor: simplify module init in megatron_workers, absorb
common operations

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-08-05 15:33:19 +08:00
c0f99f3da2 [BREAKING] [ray, megatron] feat: remove RayMegatronWorker (#2895)
### What does this PR do?

- Following https://github.com/volcengine/verl/pull/2893, we can now
register dispatch and collect functions directly inside the worker, so
there is no need to maintain RayMegatronWorker and
RayMegatronWorkerGroup, which were a hacky solution.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-05 11:05:38 +08:00
e74fade589 [doc] fix: Specify rollout engine in quickstart.rst (#2905) 2025-08-04 17:56:35 -07:00
3e2bceb1af [ray] feat: support directly register dispatch device mesh (#2893)
### What does this PR do?

A better solution than https://github.com/volcengine/verl/pull/1260

The current dispatch methods are quite limited:

- In the hybrid engine, we would like to dispatch infer_tp x infer_dp
for generation and train_tp x train_dp x train_pp for training. However,
the current implementation can only dispatch train_tp x train_dp x
train_pp for training and dp for generation, and it performs an
allgather inside the worker group.
- When two megatron models are colocated, their device meshes have to be
identical.
- We have to subclass RayWorkerGroup in order to implement various
distributed strategies. This makes create_colocated_worker hacky to
implement in the future.

How this implementation differs:
- We register the dp_rank directly inside the worker, along with whether
the output of this rank will be collected.
- By doing so, we can 1) completely remove MegatronWorker and the need
to subclass RayWorkerGroup in the future to implement flexible dispatch
methods, and 2) remove all other dispatch/collect methods.
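
The registration idea above can be sketched in plain Python. This is a minimal illustration, not the verl API: `RankInfo`, `dispatch`, and `collect` are hypothetical names, and the batch length is assumed to be divisible by the data-parallel size.

```python
from dataclasses import dataclass


@dataclass
class RankInfo:
    """Hypothetical record that each worker registers about itself."""
    dp_rank: int      # which data-parallel shard this worker consumes
    is_collect: bool  # whether this worker's output should be gathered


def dispatch(batch, infos):
    """Give every worker the shard that matches its registered dp_rank."""
    dp_size = max(info.dp_rank for info in infos) + 1
    shard = len(batch) // dp_size  # assumes len(batch) % dp_size == 0
    chunks = [batch[r * shard:(r + 1) * shard] for r in range(dp_size)]
    return [chunks[info.dp_rank] for info in infos]


def collect(outputs, infos):
    """Keep one output per dp_rank (the workers flagged is_collect)."""
    picked = [(info.dp_rank, out) for info, out in zip(infos, outputs) if info.is_collect]
    picked.sort(key=lambda pair: pair[0])
    return [item for _, out in picked for item in out]
```

With four workers arranged as tp=2 x dp=2, the two workers in the same tensor-parallel group register the same `dp_rank` and only one of them sets `is_collect`, so no subclass of the worker group is needed to express the layout.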

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-04 19:57:42 +08:00
13731e553b [rollout] feat: add rollout_skip to skip rollout by reusing previously generated sequences (#2602)
### Adds rollout_skip to skip rollout 

Adds rollout_skip to skip rollout by reusing previously generated
sequences from a specified dump directory.

Two parameters need to be configured:

`actor_rollout_ref.rollout.skip_rollout=True`
Enables the rollout skipping functionality

`actor_rollout_ref.rollout.skip_dump_dir="/tmp/rollout_dump"`
Sets the dump directory path for storing rollout results

#### Behavior:

- On the first run, the system generates the rollout inference results
and dumps them to the specified directory.
- On subsequent runs, the system automatically checks for and loads
results from this directory, skipping the rollout computation.

> Note: The directory path should persist across runs to keep the
caching benefit.
>
> If either of these parameters changes between runs:
> - `actor_rollout_ref.rollout.n`
> - `data.gen_batch_size`
>
> The trainer will:
> 1. Ignore previously dumped data
> 2. Regenerate new rollout sequences
> 3. Create a new dump folder with the naming pattern
`InferGBS{gen_gbs}__N{n}`
> (where `{gen_gbs}` is the current `gen_batch_size` and `{n}` is the
current `rollout.n` value)


This feature is particularly valuable for:
- Development iterations with same parameters
- Debugging sessions


#### Result example: 

worked with NPU
<img width="899" height="681" alt="image"
src="https://github.com/user-attachments/assets/f6253e36-14b8-47ab-9817-ce6c42b3168d"
/>

worked with GPU

<img width="1809" height="950" alt="image"
src="https://github.com/user-attachments/assets/0920e50e-e415-40cb-80b5-2e148015a8e4"
/>


### Checklist Before Starting

- [x] Search for similar PRs. 
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not related.

### API and Usage Example

This is an example of how to patch rollout_skip in `RayPPOTrainer`.
> Both `RayDAPOTrainer()` (in `verl/recipe/dapo/dapo_ray_trainer.py`)
and `RayPPOTrainer()` (in `verl/trainer/ppo/ray_trainer.py`) have
already been adapted.

```python
from verl.utils.rollout_skip import RolloutSkip

...
class RayPPOTrainer:
    ...
    def fit(self):
        ...

        # Add the following two lines:
        rollout_skip = RolloutSkip(self.config, self.actor_rollout_wg)
        rollout_skip.wrap_generate_sequences()

        ...

        for epoch in range(self.config.trainer.total_epochs):
            for batch_dict in self.train_dataloader:
                ...
```

To enable this PR's functionality, simply add these two parameters to
your launch script:

```bash
    actor_rollout_ref.rollout.skip_rollout=True \
    actor_rollout_ref.rollout.skip_dump_dir="/tmp/rollout_dump" \
```

1. `actor_rollout_ref.rollout.skip_rollout=True`
   - Enables the rollout skipping functionality

2. `actor_rollout_ref.rollout.skip_dump_dir="/tmp/rollout_dump"`
   - Sets the dump directory path for storing rollout results

### Design & Code Changes

Not related.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-04 19:42:16 +08:00
b3e999fa74 [FSDP] feat: Allows specifying a different reference model (#2050)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Extend FSDP worker to allow end users to specify separate
actor/reference model.

There were two similar issues/asks
https://github.com/volcengine/verl/issues/699
https://github.com/volcengine/verl/issues/744

I would like some initial feedback on whether this is on the right
track. It is completely fine if someone from the verl core team takes up
this task in a separate PR to speed up the development cycle.

### Usage Example

The default config.yaml sets `model: null` under the reference model
section, which preserves the original behavior of using the actor model
as the reference model. End users can set `path` to a different model if
they wish to use a separate reference model.

```yaml
# default behavior, same as before: the actor model is used as the ref model
ref:
   model: null

# new, specify a different ref model
ref:
   model:
        path: "model path"
```
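
The fallback described by this config can be sketched as below. It is a minimal illustration under stated assumptions: `resolve_ref_model_path` is a hypothetical helper, and plain dicts stand in for the real config objects.

```python
def resolve_ref_model_path(actor_model, ref_model):
    """Return the path the reference model should load from.

    actor_model: dict with a "path" key (stand-in for the actor model config).
    ref_model:   None (the `model: null` default) or a dict with a "path" key.
    """
    if ref_model is None or ref_model.get("path") is None:
        # Default behavior: the reference model reuses the actor model.
        return actor_model["path"]
    return ref_model["path"]
```

Keeping `null` as the default means existing configs behave exactly as before, and only users who set `path` opt into a separate reference model.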


### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-08-04 16:04:02 +08:00
551d4cc56d [misc] feat: support logging rollout prob vs. actor probs in **multi-turn** for debugging purpose, follow up of #1712 (#2808)
### What does this PR do?

This PR is a follow-up to https://github.com/volcengine/verl/pull/1712.
- adds support for recording rollout log-probs in multi-turn
conversations
- moves the diff-computation code into a separate file.
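
As a rough illustration of the kind of diff computation mentioned above, the sketch below compares rollout and actor token probabilities over response tokens only. The function name, the metric keys, and the use of plain Python lists instead of tensors are all assumptions made for this example, not the code added by the PR.

```python
import math


def prob_diff_metrics(rollout_log_probs, actor_log_probs, response_mask):
    """Mean and max absolute probability gap over masked-in tokens.

    All three arguments are flat lists of equal length; a truthy mask
    entry marks a response token that should be included.
    """
    diffs = [
        abs(math.exp(r) - math.exp(a))
        for r, a, m in zip(rollout_log_probs, actor_log_probs, response_mask)
        if m
    ]
    if not diffs:
        return {"probs_diff_mean": 0.0, "probs_diff_max": 0.0}
    return {
        "probs_diff_mean": sum(diffs) / len(diffs),
        "probs_diff_max": max(diffs),
    }
```

In a multi-turn conversation the mask matters more than in the single-turn case, because tool or user turns interleaved with the model's own tokens should presumably be excluded from the comparison.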

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-04 11:54:46 +08:00
4e9d2878ee [fsdp, trainer] fix: save config parameters to wandb in SFT (#2884) 2025-08-03 20:16:05 -07:00
0da8eb67e6 [ci] fix: retry type check on cpu (#2887) 2025-08-03 20:12:00 -07:00
483cd55c76 [trainer] chore: Add ground truth data to generation dumps in RayPPOTrainer (#2353) 2025-08-03 07:39:18 -07:00
6017c9e2fc [tool, sglang] feat: add tool create info (#2870) 2025-08-03 07:38:23 -07:00
65c74dda9b [doc] fix: multi turn argument is not available (#2883)
### What does this PR do?

Removed a deprecated parameter comment

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: kangsheng <kangsheng.ks@bytedance.com>
2025-08-03 18:31:16 +08:00
06bc679a57 [sglang] chore: bump transformer formers 4.54.0 and fix QWen VL issues (#2869)
### What does this PR do?

Do not merge before: https://github.com/volcengine/verl/pull/2720


> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
Bump xformers, fixing patched model code
### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-03 18:30:51 +08:00
d31d1bebd8 [trainer] fix: move UID generation before batch processing for future conditioning support (#2880)
### What does this PR do?

Moves UID generation to the beginning of batch processing, before any
`pop` operations or generation steps. This change:

1. **Fixes timing for future conditioning**: Enables UID generation to
condition on original batch data (e.g., prompt content) before any data
is removed via `pop` operations

The change is backward-compatible and doesn't affect current
functionality, but enables future enhancements where UID generation
might need access to the complete original batch data.
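
The ordering described above can be sketched as follows. This is a minimal illustration, not the trainer's actual code: the dict-based batch, the field names, and the `assign_uids` helper are all assumptions made for the example.

```python
import uuid


def assign_uids(batch: dict) -> dict:
    """Attach one UID per prompt while the batch is still complete."""
    num_prompts = len(batch["prompts"])
    batch["uid"] = [str(uuid.uuid4()) for _ in range(num_prompts)]
    return batch


# Hypothetical batch layout; the real trainer uses its own batch container.
batch = {"prompts": ["2 + 2 = ?", "3 * 3 = ?"], "input_ids": [[1, 2], [3, 4]]}

# 1. Generate UIDs first, while every original field is still available.
batch = assign_uids(batch)

# 2. Only then pop the fields needed for generation.
gen_batch = {"input_ids": batch.pop("input_ids")}
```

Because the UIDs are created before any `pop`, a future UID scheme could read fields such as the prompt content that would otherwise already be gone.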

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-03 16:04:19 +08:00
1836d9537e [misc] feat: add nccl timeout configuration to fsdp workers (#2321)
### What does this PR do?

add nccl timeout config for fsdp backend
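
A rough sketch of how such a timeout could be wired through, assuming a hypothetical config value `nccl_timeout_s` given in seconds (the actual config key added by this PR is not shown here). `torch.distributed.init_process_group` accepts a `timeout` argument as a `datetime.timedelta`.

```python
from datetime import timedelta


def build_process_group_kwargs(nccl_timeout_s: int = 600) -> dict:
    """Turn a timeout in seconds into init_process_group keyword arguments."""
    if nccl_timeout_s <= 0:
        raise ValueError("nccl_timeout_s must be positive")
    return {"backend": "nccl", "timeout": timedelta(seconds=nccl_timeout_s)}


# `nccl_timeout_s` is an illustrative name, not necessarily the real config key.
kwargs = build_process_group_kwargs(nccl_timeout_s=1800)
# In a worker this would be passed on as:
#   torch.distributed.init_process_group(**kwargs)
```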

---------

Signed-off-by: shinytang6 <shinytang6@gmail.com>
Co-authored-by: H <linhaibin.eric@gmail.com>
2025-08-02 21:33:05 -07:00
3f71144961 [trainer, ci] fix: fix error variable in new engine impl and add ci test, fix math_dataset path error (#2647)
### What does this PR do?

PR #1977 is great work. I tried the new engine, found some minor
problems, and added a CI test for FSDPEngine.
- Use the newest name, `gather_outputs_and_unpad`, for that function.
- Removed invalid calculations originally used for gradient accumulation
(gradient accumulation has been moved to loss_fn in the new engine).
- Fixed misuses of two variables.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Signed-off-by: ShareLer <ShareLe@163.com>
Co-authored-by: eric-haibin-lin <linhaibin.eric@gmail.com>
2025-08-02 21:32:33 -07:00
2fdfbdcba6 [doc] fix: Fix the role assignment error in the interaction demo file and doc. (#2476)
### What does this PR do?

Fix the role assignment error in the interaction demo file
verl/interactions/gsm8k_interaction.py and doc. The assistant is
expected to solve problems, while users provide problems and feedback
within the messages list.
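
As an illustration of the intended roles (a sketch only; the message contents and the `append_feedback` helper are made up for this example and are not taken from `gsm8k_interaction.py`):

```python
def append_feedback(messages: list[dict], feedback: str) -> list[dict]:
    """Return a new message list with the interaction's feedback as a user turn."""
    return messages + [{"role": "user", "content": feedback}]


messages = [
    {"role": "user", "content": "What is 12 * 7?"},    # the problem comes from the user
    {"role": "assistant", "content": "12 * 7 = 84"},   # the assistant solves it
]
# Feedback on the solution is again a user turn, never an assistant turn.
messages = append_feedback(messages, "Correct. Please box the final answer.")
roles = [m["role"] for m in messages]
```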

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Update tests/interactions/test_gsm8k_interaction.py.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
2025-08-02 17:04:15 -07:00
a24241092d [misc] refactor: Add AbstractRewardManager abstract class (#2763)
### What does this PR do?

Adds a new `AbstractRewardManager` class to codify the interface for a
reward manager.
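
A minimal sketch of what such an interface might look like. The `__call__` signature and the toy subclass below are assumptions for illustration; the class actually added by this PR may define different methods and argument types.

```python
from abc import ABC, abstractmethod


class AbstractRewardManager(ABC):
    """Illustrative interface: subclasses turn a batch into per-sample rewards."""

    @abstractmethod
    def __call__(self, data: list[dict]) -> list[float]:
        ...


class ExactMatchRewardManager(AbstractRewardManager):
    """Toy implementation: reward 1.0 when the response equals the ground truth."""

    def __call__(self, data: list[dict]) -> list[float]:
        return [1.0 if d["response"] == d["ground_truth"] else 0.0 for d in data]


rewards = ExactMatchRewardManager()([
    {"response": "84", "ground_truth": "84"},
    {"response": "85", "ground_truth": "84"},
])
```

Declaring the interface as an abstract base class means a reward manager that forgets to implement the required method fails at instantiation rather than in the middle of training.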

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-02 16:39:58 -07:00
ae3506dd33 [data] feat: dump train/test example as JSON (#2666)
### What does this PR do?

This PR adds functionality to save one training and one testing example
as JSON files for reference, making it easier to inspect dataset
formatting and preprocessing. It is also intended to support future
debugging and reproducibility improvements.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Manually verified that two files train_example.json and
test_example.json are saved correctly in the specified local_dir.

### API and Usage Example

This change does not alter the public API.

### Design & Code Changes

- Added code to save train_dataset[0] and test_dataset[0] as JSON files
in local_dir

- Helps with quick inspection and reproducibility of dataset inputs
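
The change can be sketched roughly as below. The `dump_example` helper and the stand-in list datasets are assumptions for illustration; only the file names `train_example.json` and `test_example.json` and the use of `local_dir` come from the description above.

```python
import json
import os
import tempfile


def dump_example(example: dict, local_dir: str, filename: str) -> str:
    """Write a single dataset row to local_dir/filename as JSON."""
    os.makedirs(local_dir, exist_ok=True)
    path = os.path.join(local_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(example, f, indent=2, ensure_ascii=False)
    return path


# Stand-in datasets; a real preprocessing script would use its own datasets here.
train_dataset = [{"prompt": "1 + 1 = ?", "answer": "2"}]
test_dataset = [{"prompt": "2 + 3 = ?", "answer": "5"}]

local_dir = tempfile.mkdtemp()
train_path = dump_example(train_dataset[0], local_dir, "train_example.json")
test_path = dump_example(test_dataset[0], local_dir, "test_example.json")
```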

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: easy code
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-08-02 11:09:56 -07:00
afc9de1eba [trainer, hardware] chore: add pin_memory_device when pin_memory is enabled (#2871)
### What does this PR do?

To use `pin_memory` on NPU, we need to set `pin_memory_device="npu"`.

About pin_memory, see:
[torchdata/stateful_dataloader/stateful_dataloader.py](https://github.com/pytorch/data/blob/main/torchdata/stateful_dataloader/stateful_dataloader.py)

```python
if self._pin_memory:
    self._pin_memory_thread_done_event = threading.Event()

    # Queue is not type-annotated
    self._data_queue = queue.Queue()  # type: ignore[var-annotated]
    if self._pin_memory_device == "xpu":
        current_device = torch.xpu.current_device()  # type: ignore[attr-defined]
    elif self._pin_memory_device == torch._C._get_privateuse1_backend_name():
        custom_device_mod = getattr(torch, torch._C._get_privateuse1_backend_name())
        current_device = custom_device_mod.current_device()
    else:
        current_device = torch.cuda.current_device()  # choose cuda for default
    pin_memory_thread = threading.Thread(
        target=_utils.pin_memory._pin_memory_loop,
        args=(
            self._worker_result_queue,
            self._data_queue,
            current_device,
            self._pin_memory_thread_done_event,
            self._pin_memory_device,
        ),
    )
    pin_memory_thread.daemon = True
    pin_memory_thread.start()
    # Similar to workers (see comment above), we only register
    # pin_memory_thread once it is started.
    self._pin_memory_thread = pin_memory_thread
else:
    self._data_queue = self._worker_result_queue  # type: ignore[assignment]

```
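
Given the device dispatch quoted above, one possible way to build the dataloader arguments is sketched here. The `dataloader_pin_kwargs` helper and the rule of only naming non-CUDA devices are assumptions for illustration, not necessarily what this PR implements.

```python
def dataloader_pin_kwargs(pin_memory: bool, device_name: str) -> dict:
    """Build the pin-memory related DataLoader keyword arguments."""
    kwargs = {"pin_memory": pin_memory}
    # Only name a device when pinning is on and the device is not the CUDA default.
    if pin_memory and device_name != "cuda":
        kwargs["pin_memory_device"] = device_name
    return kwargs


npu_kwargs = dataloader_pin_kwargs(pin_memory=True, device_name="npu")
cuda_kwargs = dataloader_pin_kwargs(pin_memory=True, device_name="cuda")
# These would be forwarded as e.g. StatefulDataLoader(dataset, **npu_kwargs).
```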

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-02 10:34:41 -07:00
67187c4fd4 [ci] fix: fix fsdp test in transformers 4.54.1 (#2874)
### What does this PR do?

After upgrading to transformers 4.54.1, the fsdp checkpoint manager test
breaks. Here are some observations:
- If we switch "attn_implementation" to "eager" or "sdpa", everything
works fine, which suggests that the issue lies within the
flash_attention_2 backend of transformers.
- Previously, this test passed in input_ids and attention_mask. However,
workers in verl pass in input_ids and position_ids to utilize rmpad.
After we switched the input to `input_ids` and `position_ids`, all the
tests passed.
- If we do not call loss.backward, everything works fine.
- So FSDP works fine and the checkpoint manager works fine. The problem
must lie in how transformers handles different types of input
combinations in the flash_attention_2 backend.
- In this PR, we modify the input to pass the test.

TODO: write a test to replicate the issues when passing in input_ids and
attention_mask to transformers library

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-02 17:11:54 +08:00
53f9b2ba5e [fsdp,megatron,sglang] feat: Accelerate and Simplify Update weights logic and bump SGLang to 0.4.9.post6 (#2720) 2025-08-02 08:01:06 +08:00
0da1a3de06 [megatron] fix: remove the demising critic.model.enable_gradient_checkpointing flags in the scripts (#2864)
### What does this PR do?

They were removed in #2651, but #2691 overlooked some of them.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] (CI is not needed for this change) Once your PR is ready for CI,
send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-08-01 20:51:33 +08:00
f0fbd67a5d [recipe] feat: modify dapo deepseek megatron script (#2711)
### What does this PR do?

As title.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-01 17:52:19 +08:00
e68dcb7884 [fsdp] feat: optimize fsdp2 (#2843)
### What does this PR do?

- fix fsdp2 load/offload
- optimized fsdp2's sharding placement

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-01 17:48:43 +08:00
a970718ea5 [misc] feat: optimize GRPO-family algorithms with torch.stack and improve tensor creation consistency (#2827)
### What does this PR do?

## 🚀 Performance Optimization for GRPO Algorithms

This PR delivers significant performance improvements to GRPO and
related advantage estimation algorithms through
  comprehensive tensor operation optimizations.

  ### 📈 Performance Gains
  - **GPU**: **6.5x speedup** by fixing device placement issues
  - **CPU**: **40% faster** tensor creation operations
  - **Memory**: Reduced redundant allocations in training loops

  ### 🔧 Key Optimizations

  #### 1. Device-Aware Tensor Creation
- **Replace** `torch.tensor()` with `torch.stack()` for scalar tensor
lists
- **Fixes** device placement bugs where `torch.tensor()` forces CPU
placement
- **Preserves** GPU tensors on GPU, eliminating costly CPU-GPU transfers

  #### 2. Eliminate Redundant Operations
- **Remove** duplicate tensor creation in statistical computation loops
- **Optimize** tensor reuse for both mean and standard deviation
calculations
  - **Standardize** tensor creation patterns across all algorithms

  ### 🎯 Functions Optimized
  - `compute_grpo_outcome_advantage` - core GRPO algorithm
- `compute_reinforce_plus_plus_baseline_outcome_advantage` - RF++
baseline
  - `compute_rloo_outcome_advantage` - RLOO algorithm
  - `compute_opo_outcome_advantage` - OPO algorithm
  - `compute_gpg_outcome_advantage` - GPG algorithm

  ### 🔍 Technical Details

- **Root cause**: `torch.tensor(list_of_tensors)` always creates the result on the CPU.
- **Solution**: `torch.stack(list_of_tensors)` preserves the input tensors' device.

**Before**:
```python
scores_tensor = torch.tensor(id2score[idx])  # Forces CPU, created twice
id2mean[idx] = torch.mean(scores_tensor)
scores_tensor = torch.tensor(id2score[idx])  # Redundant creation
id2std[idx] = torch.std(scores_tensor)
```

**After**:
```python
scores_tensor = torch.stack(id2score[idx])  # Preserves device, created once
id2mean[idx] = torch.mean(scores_tensor)
id2std[idx] = torch.std(scores_tensor)  # Reuses same tensor
```
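For context, the per-group mean and standard deviation computed in these snippets feed a group-wise normalization of the scores. The following is a minimal, dependency-free sketch of that step, not verl's implementation: `group_normalize`, `eps`, and the handling of single-sample groups are assumptions made for illustration, and `statistics.stdev` stands in for the default (unbiased) `torch.std`.

```python
from collections import defaultdict
from statistics import mean, stdev


def group_normalize(scores, group_ids, eps=1e-6):
    """Return (score - group_mean) / (group_std + eps) for every sample."""
    grouped = defaultdict(list)
    for score, gid in zip(scores, group_ids):
        grouped[gid].append(score)

    # Compute each group's statistics once and reuse them for every sample.
    stats = {}
    for gid, values in grouped.items():
        if len(values) == 1:
            # Assumption of this sketch: a single-sample group is left unnormalized.
            stats[gid] = (0.0, 1.0)
        else:
            stats[gid] = (mean(values), stdev(values))

    return [(s - stats[g][0]) / (stats[g][1] + eps) for s, g in zip(scores, group_ids)]


# Two groups of two responses each.
advantages = group_normalize([1.0, 0.0, 1.0, 1.0], ["a", "a", "b", "b"])
```

In group `a` the two samples receive equal and opposite advantages; in group `b` all scores are identical, so every advantage is zero.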

### Safety & Compatibility

- Zero functional changes: Maintains identical mathematical results
- Fully backward compatible: No API modifications
- Extensively tested: CPU/GPU validation with various configurations
- Production ready: All tests pass with identical numerical outputs

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

- Fixed inconsistent tensor creation: changed `torch.std(torch.tensor([id2score[idx]]))` to `torch.std(torch.tensor(id2score[idx]))` to match the pattern used for the mean calculation in the same function.
- Applied to both instances: fixed lines 313 and 675 in `verl/trainer/ppo/core_algos.py`.
- Added comprehensive test coverage: created `tests/trainer/ppo/test_grpo_consistency.py` with multiple test scenarios.
- Added comprehensive test coverage: Created tests/trainer/ppo/test_grpo_consistency.py with multiple test scenarios


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: chiliu <chiliu@paypal.com>
2025-08-01 12:00:24 +08:00
e2b773528f [megatron] feat: Add MindSpeed support on the NPU device (#2707)
### What does this PR do?

Add MindSpeed (Megatron) support on NPU devices.
The Megatron adapter is imported first to avoid import errors, and the
patch is then reapplied according to the configuration.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-01 10:58:29 +08:00
2633140a73 [doc] feat: add verl multinode SkyPilot example (#2849) 2025-07-31 12:48:24 -07:00
c70b7470c1 [recipe] feat: support qwen3-8B/14B DAPO training on ASCEND NPU (#2836)
### What does this PR do?

> Provide Qwen3-8B/14B DAPO training scripts for ASCEND NPU, and update
the experiment results.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[[hardware] feat: support qwen2_5_vl on ASCEND
NPU](https://github.com/volcengine/verl/pull/1924/)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
#### Qwen3-8B-Base model

##### Throughput Comparison
<img width="1058" height="508" alt="image"
src="https://github.com/user-attachments/assets/dd818187-bce2-4b9f-a442-b29a7acedd55"
/>

##### Rewards Comparison
<img width="1048" height="518" alt="image"
src="https://github.com/user-attachments/assets/66d00cc7-efb6-4426-932a-cd63a69474dc"
/>

##### Test Comparison (aime-2024)
<img width="1060" height="506" alt="image"
src="https://github.com/user-attachments/assets/000cebf3-1d5b-402b-b1e6-2cfa5ee7a3ad"
/>

##### Response_length Comparison
<img width="1280" height="608" alt="image"
src="https://github.com/user-attachments/assets/4fe77406-a43b-4d3b-bf13-7a6417887831"
/>

#### Qwen3-14B-Base model

##### Throughput Comparison
<img width="1130" height="614" alt="image"
src="https://github.com/user-attachments/assets/5d03b334-b9c9-485d-ba84-23e628d2f573"
/>

##### Rewards Comparison
<img width="1114" height="534" alt="image"
src="https://github.com/user-attachments/assets/aba90536-eb66-430b-83b6-c4e86a90e917"
/>

##### Test Comparison (aime-2024)
<img width="1126" height="538" alt="image"
src="https://github.com/user-attachments/assets/44c59e5b-9f77-48fc-8bce-9d431f5f3e87"
/>

##### Response_length Comparison
<img width="1280" height="692" alt="image"
src="https://github.com/user-attachments/assets/c008a419-9a1e-4b59-81e1-23b5b3d97660"
/>


### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```bash
ray start --head
bash run_dapo_qwen3_8b_base_npu.sh
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-01 00:21:16 +08:00
6e37279f8e [training_utils] feat: Support assert_case for sandbox fusion (#2374)
Support `assert_case` for sandbox fusion

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-07-31 18:58:20 +08:00
5f8cd1bf38 [CI] feat: update npu image to vLLM-ascend-v0.7.3.post1+mindspeed0.12.1 (#2838)
### Checklist Before Starting

[done] Search for similar PR(s).

### Design & Code Changes

Change .github/workflows/e2e_ascend.yml

### Checklist Before Submitting

[ done ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
[ done ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).

---------

Co-authored-by: liaochangyue <liaochangyue@bytedance.com>
2025-07-31 18:43:22 +08:00
f5bc3cac78 [rollout] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset (#2740)

### What does this PR do?

> `tool_agent_loop` did not pass the tool's `create_kwargs` when calling
a tool, resulting in a missing `ground_truth`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)


### Issue
In the previous implementation, the tool-call parameters stored in the
dataset were not passed to the tool, so `ground_truth` was missing in
the gsm8k task. For example:

<img width="2022" height="186" alt="80084dd040d1a105c12403928ba36d08"
src="https://github.com/user-attachments/assets/51ed35c6-3cab-4feb-a560-5cf6f64feced"
/>

Passing `tools_kwargs` through to the tool call solves this problem.
```python
    async def _call_tool(self, tool_call: FunctionCall, tools_kwargs: dict[str, Any]) -> dict[str, str]:
        """Call tool and return tool response."""
        tool, instance_id = None, None
        try:
            # TODO: append malformed tool_call to the prompt: invalid function name or arguments
            tool_name = tool_call.name
            tool_args = json.loads(tool_call.arguments)
            tool = self.tools[tool_name]
            kwargs = tools_kwargs.get(tool_name, {})
            instance_id = await tool.create(create_kwargs=kwargs.get("create_kwargs", {}))
            tool_response, _, _ = await tool.execute(instance_id, tool_args)
```

The tool can then use `ground_truth`:

<img width="1984" height="188" alt="image"
src="https://github.com/user-attachments/assets/08f75753-4bcb-42f9-a878-5d455e8ed552"
/>
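The lookup pattern in the snippet above can be tried in isolation. The sketch below is illustrative only: `GroundTruthTool`, `call_tool`, and the tool name `check_answer` are hypothetical stand-ins, not verl classes; only the `tools_kwargs[tool_name]["create_kwargs"]` lookup mirrors the code shown in this PR.

```python
import asyncio


class GroundTruthTool:
    """Hypothetical tool that keeps the per-sample create_kwargs it was given."""

    def __init__(self):
        self.instances = {}

    async def create(self, create_kwargs=None):
        instance_id = f"inst-{len(self.instances)}"
        self.instances[instance_id] = dict(create_kwargs or {})
        return instance_id

    async def execute(self, instance_id, tool_args):
        # Without create_kwargs there is no ground_truth to compare against.
        truth = self.instances[instance_id].get("ground_truth")
        correct = tool_args.get("answer") == truth
        return f"correct={correct}", float(correct), {}


async def call_tool(tools, tool_name, tool_args, tools_kwargs):
    tool = tools[tool_name]
    # Per-tool kwargs come from the dataset row; default to empty dicts.
    kwargs = tools_kwargs.get(tool_name, {})
    instance_id = await tool.create(create_kwargs=kwargs.get("create_kwargs", {}))
    response, _, _ = await tool.execute(instance_id, tool_args)
    return response


tools = {"check_answer": GroundTruthTool()}
tools_kwargs = {"check_answer": {"create_kwargs": {"ground_truth": "72"}}}

with_truth = asyncio.run(call_tool(tools, "check_answer", {"answer": "72"}, tools_kwargs))
without_truth = asyncio.run(call_tool(tools, "check_answer", {"answer": "72"}, {}))
```

With `tools_kwargs` forwarded, the tool sees the ground truth and reports a match; with an empty mapping, the same answer is judged incorrect because no ground truth reaches the tool.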


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-31 15:27:03 +08:00
a6002de8ac [tool] fix: load MCP tools in async rollout mode (#2821)
### What does this PR do?

Currently, the tool registry isn't aware of an event loop already
existing, so it may fail when using the new async rollout architecture.
This PR allows `initialize_tools_from_config` to load MCP tools when
using the async architecture by spawning a new, temporary event loop in
a separate thread to load from config.

There is also a minor bugfix to `mcp_base_tool` which fixes a
possibility of concatenating a string to None.

NOTE: in the future, we should use async methods entirely, since this
fix is not the most elegant. This fix works for now as verl is
transitioning to a full async architecture.
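
The general technique described above (running async setup code from a synchronous function while another event loop is already running) can be sketched as follows. This is a minimal illustration under assumptions, not the code in this PR: `run_in_temporary_loop` and `load_tools` are hypothetical names, and the real `initialize_tools_from_config` may differ in detail.

```python
import asyncio
import threading


def run_in_temporary_loop(coro_factory):
    """Run an async callable to completion on a new event loop in a helper thread."""
    result = {}

    def worker():
        loop = asyncio.new_event_loop()
        try:
            result["value"] = loop.run_until_complete(coro_factory())
        except Exception as exc:  # hand the error back to the calling thread
            result["error"] = exc
        finally:
            loop.close()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()  # blocks the caller until loading has finished
    if "error" in result:
        raise result["error"]
    return result["value"]


async def load_tools():
    """Hypothetical stand-in for async tool loading."""
    await asyncio.sleep(0)
    return ["tool_a", "tool_b"]


async def main():
    # An event loop is already running here, so asyncio.run() would raise
    # RuntimeError; the helper thread owns its own loop instead.
    return run_in_temporary_loop(load_tools)


loaded = asyncio.run(main())
```

The trade-off is that `thread.join()` blocks the calling loop while the tools load, which is one reason a fully async path is preferable in the long run.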

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-31 14:22:32 +08:00
0e14d812da [ci] fix: vllm no dataset (#2831)
### What does this PR do?

ci fix: vllm no dataset

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-31 13:56:01 +08:00
4a651f5425 [perf, doc] feat: Add profiling continous steps in one database (#2695)
### What does this PR do?

Some users would like to profile consecutive steps into a single
database so that there is no gap between steps. With the new config
`trainer.profile_continous_steps` enabled, consecutive steps listed in
`profile_steps` are dumped into one database. For example, with
`profile_steps = [1, 2, 5]`, steps 1 and 2 go into one database and
step 5 goes into another.

This PR also adds a warning when nvtx is not available on the CUDA platform.
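
The grouping rule in the example above can be sketched in a few lines. `group_consecutive_steps` is a hypothetical helper written for illustration; it is not the function used in this PR.

```python
def group_consecutive_steps(profile_steps):
    """Split step numbers into runs of consecutive integers."""
    groups = []
    for step in sorted(set(profile_steps)):
        if groups and step == groups[-1][-1] + 1:
            groups[-1].append(step)  # extends the current run
        else:
            groups.append([step])  # starts a new run, i.e. a new database
    return groups


databases = group_consecutive_steps([1, 2, 5])  # [[1, 2], [5]]
```

Each inner list corresponds to one profiling database.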


### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-31 12:26:10 +08:00
1fe72ba510 [sglang] fix: fix missing engine_kwargs (#2823)
### What does this PR do?

- Fix the missing `engine_kwargs`, which caused CI on main to fail

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>
2025-07-31 12:23:51 +08:00
f32e54deaa [docker] feat: Upgrade sglang 0.4.9 + transformers 4.53.2 (#2794)
### What does this PR do?

feat: Upgrade sglang 0.4.9 + transformers 4.53.2

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-31 00:49:27 +08:00
a479fc81b9 [rollout] feat: pass all dataset fields to agent loop run (#2810)
### What does this PR do?

Pass all dataset fields from `RLHFDataset` to agent loop run, including:
- raw_prompt
- tools_kwargs
- multi_modal_data
- ...
2025-07-31 00:34:44 +08:00
cc1d89b7ad [sglang] fix: support the configuration of attention_backend in sglang (#2818)
### What does this PR do?

This resolves issue https://github.com/volcengine/verl/issues/2769.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-30 20:46:23 +08:00
b75b1f0bf1 [algo] feat: add GSPO-token policy loss computation function (#2775)
### What does this PR do?

This PR implements the GSPO-token policy loss calculation proposed in
the paper https://arxiv.org/pdf/2507.18071
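
For orientation, the importance ratios behind this loss can be written as below. This is a sketch based on a reading of the cited paper, not a transcription of the code in this PR; the symbols ($y_i$ for the $i$-th response to prompt $x$, $|y_i|$ for its length, $\mathrm{sg}[\cdot]$ for stop-gradient) are notation chosen here.

```latex
% Sequence-level, length-normalized importance ratio
s_i(\theta)
  = \left( \frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\mathrm{old}}}(y_i \mid x)} \right)^{1/|y_i|}
  = \exp\!\left( \frac{1}{|y_i|} \sum_{t=1}^{|y_i|}
      \log \frac{\pi_\theta(y_{i,t} \mid x, y_{i,<t})}
                {\pi_{\theta_{\mathrm{old}}}(y_{i,t} \mid x, y_{i,<t})} \right)

% Token-level variant: same numerical value as s_i, but gradients flow per token
s_{i,t}(\theta)
  = \mathrm{sg}\!\left[ s_i(\theta) \right] \cdot
    \frac{\pi_\theta(y_{i,t} \mid x, y_{i,<t})}
         {\mathrm{sg}\!\left[ \pi_\theta(y_{i,t} \mid x, y_{i,<t}) \right]}
```

The ratio $s_{i,t}$ then takes the place of the per-token ratio in a clipped, PPO-style objective.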

### Test

<img width="1341" height="637" alt="image"
src="https://github.com/user-attachments/assets/bc5e2245-b0f5-4a1f-aa7c-4c2b28d95142"
/>

Compared GRPO and GSPO under the same settings. GRPO uses the following
script:
```sh
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=512 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.actor.policy_loss.loss_mode="vanilla" \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=10 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger='["console","wandb"]' \
    trainer.project_name='verl_gspo_cmp' \
    trainer.experiment_name='qwen2.5-3B-GRPO' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=20 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 "$@"
```

GSPO uses the following script:
```sh
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=512 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.actor.policy_loss.loss_mode="gspo" \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=10 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger='["console","wandb"]' \
    trainer.project_name='verl_gspo_cmp' \
    trainer.experiment_name='qwen2.5-3B-GSPO' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=20 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 "$@"
```

### API and Usage Example

To use GSPO, users only need to set
`actor_rollout_ref.actor.policy_loss.loss_mode` to `gspo`.

```shell
python3 -m verl.trainer.main_ppo \
  ... \
  actor_rollout_ref.actor.policy_loss.loss_mode="gspo" \
  ...
```
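
The sequence-level importance ratio that GSPO is built on can be sketched in plain Python. This is a minimal illustration based on our reading of the linked paper, not the code added in this PR: the function names and the clip defaults are assumptions chosen for readability. As we understand it, the GSPO-token variant has the same numerical value per token and differs only in how gradients flow.

```python
import math


def sequence_importance_ratio(old_log_prob, log_prob, response_mask):
    """Length-normalized sequence ratio: exp(mean over response tokens of log_prob - old_log_prob)."""
    n_tokens = sum(response_mask)
    if n_tokens == 0:
        return 1.0  # no response tokens: treat the policy as unchanged
    log_ratio_sum = sum(
        (new - old) * m for old, new, m in zip(old_log_prob, log_prob, response_mask)
    )
    return math.exp(log_ratio_sum / n_tokens)


def clipped_sequence_loss(ratio, advantage, clip_low=0.2, clip_high=0.2):
    """PPO-style clipped surrogate applied to one sequence-level ratio (clip values are illustrative)."""
    clipped = min(max(ratio, 1.0 - clip_low), 1.0 + clip_high)
    return -min(ratio * advantage, clipped * advantage)


# Per-token log-probabilities for one response; the last position is padding.
old_lp = [-1.0, -2.0, -0.5, 0.0]
new_lp = [-0.9, -1.8, -0.5, 0.0]
mask = [1, 1, 1, 0]

ratio = sequence_importance_ratio(old_lp, new_lp, mask)
loss = clipped_sequence_loss(ratio, advantage=1.0)
print(ratio, loss)
```

Because the ratio is averaged in log space over the whole response, a single token with an extreme probability change moves it far less than it would move a per-token ratio.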


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: BounharAbdelaziz <bounhar.abdelaziz@gmail.com>
2025-07-30 18:47:09 +08:00
d04c69f47f Revert "[recipe] feat: Add sleep/wakeup mode for gen rm vllm service and add tqdm showing process" (#2813)
Reverts volcengine/verl#2739

Needed for https://github.com/volcengine/verl/pull/2794 to resolve all CI failures.
2025-07-30 16:56:37 +08:00
bf89f612e8 [vllm] fix: verl + vllm-ascend(version 0.9.1) running failed issue (#2782)
### What does this PR do?

> Handle the use case of verl + vllm + vllm-ascend (v0.9.1); for details,
see #2564.
vllm-ascend v0.9.1 is the upcoming commercial release branch; the
previous commercial release branch is vllm-ascend v0.7.3.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> Tested on Ascend NPU: GRPO, FSDP backend, Qwen2.5-0.5B model.

### API and Usage Example

> No changes

### Design & Code Changes

#### vllm + vllm-ascend (v0.9.1): normal use case
vllm 0.7.3 uses PyTorch 2.5.1, where the type hint that `infer_schema`
accepts for models is `List[int]`. vllm 0.9.1 uses PyTorch 2.7 and, to
stay consistent with it, changed the type hint it passes to
`infer_schema` to `list[int]`. The type hints `List[int]` and `list[int]`
are not compatible.
<img width="754" height="434" alt="image"
src="https://github.com/user-attachments/assets/40a40e4f-6092-4d89-baff-95c88437a13b"
/>
vllm-ascend 0.9.1 must be used together with vllm 0.9.1, but the newest
PyTorch version that vllm-ascend 0.9.1 supports is 2.5.1. Because PyTorch
2.5.1 uses `List[int]` as the type hint, vllm-ascend has to patch
`infer_schema` so that models run successfully in a PyTorch 2.5.1
environment. The figure shows the patch workflow.
Because of vllm hardware-plugin limitations, vllm-ascend currently
applies the `list[int]` type-hint patch while the vllm `LLM` instance is
being created. This is fine for most vllm-ascend applications.

#### vllm + vllm-ascend (v0.9.1): current workflow in verl
With verl + vllm-ascend + PyTorch 2.5.1 there is a problem: verl
currently imports vllm models before it creates the vllm `LLM` instance.
As a result, `list[int]` is passed to an `infer_schema` that expects
`List[int]`, and the run fails.
<img width="954" height="660" alt="image"
src="https://github.com/user-attachments/assets/844631b5-1d60-4412-9feb-4324d80d415d"
/>

#### Workflow in this PR to handle the type-hint mismatch in verl
`patch_vllm_moe_model_weight_loader` actually runs after the `LLM`
instance is created. The "import vllm models" step was executed
prematurely during `_build_rollout` only because `vllm_utils.py` also
contains other functions that `_build_rollout` needs: importing those
functions imported the vllm models as a side effect. The fix lets the
vllm hardware plugin's patches take effect first: move
`patch_vllm_moe_model_weight_loader` into a separate file, and import it
just before the weight-loader patching operation.

![image](https://github.com/user-attachments/assets/70a7527e-2671-4435-9d96-ca8595b534c7)
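
The ordering problem described above can be reproduced with a toy model. The sketch below is an illustration under stated assumptions, not vllm, vllm-ascend, or PyTorch code: `SUPPORTED_HINTS`, `infer_param_type`, `apply_plugin_patch`, and `register_op` are hypothetical stand-ins for a schema table keyed by annotation objects, the plugin's patch, and the work triggered by importing a model module.

```python
from typing import List

# Hypothetical stand-in for a schema table keyed by exact annotation objects.
SUPPORTED_HINTS = {List[int]: "int[]"}


def infer_param_type(annotation):
    """Return the schema string for an annotation, or raise if it is unknown."""
    if annotation not in SUPPORTED_HINTS:
        raise ValueError(f"unsupported type hint: {annotation!r}")
    return SUPPORTED_HINTS[annotation]


def apply_plugin_patch():
    """Hypothetical stand-in for the plugin patch: also accept the builtin list[int]."""
    SUPPORTED_HINTS[list[int]] = "int[]"


def register_op(annotation):
    """Stand-in for the registration that happens when a model module is imported."""
    return infer_param_type(annotation)


# typing.List[int] and the builtin list[int] are different objects and do not compare equal.
hints_equal = List[int] == list[int]

try:
    register_op(list[int])  # "import" before the patch is applied: fails
    early_ok = True
except ValueError:
    early_ok = False

apply_plugin_patch()
late_result = register_op(list[int])  # "import" after the patch: succeeds
print(hints_equal, early_ok, late_result)
```

Deferring the import until after the patch has run, as this PR does by moving the function into its own file, corresponds to calling `register_op` only after `apply_plugin_patch`.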


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-07-30 13:10:29 +08:00
23aa10533f [training_utils] fix: enforce 1D object array shape for non-tensor data in collate_fn (#2741)
### What does this PR do?

This PR updates the `collate_fn` logic inside
`verl.utils.dataset.rl_dataset` to consistently handle non-tensor fields
as 1D object arrays, preventing runtime errors during concatenation in
downstream code such as `recipe/dapo/dapo_ray_trainer.py`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

* Tested at: https://github.com/kibitzing/verl/tree/test_tool_n1
* Note: This branch is for testing purposes only and is not intended for
merge.

* The data used for testing comes from the `train.parquet` and
`test.parquet` files released by the [Tool N1
repository](https://github.com/NVlabs/Tool-N1).
* Part of the training script:
```shell
python3 -m recipe.dapo.main_dapo \
        data.train_files=$HOME/Tool-N1/verl/verl/data/train.parquet \
        data.val_files=$HOME/Tool-N1/verl/verl/data/test.parquet \
        data.prompt_key=prompt \
        data.truncation='left' \
        data.max_prompt_length=2048 \
        data.max_response_length=4096 \
        data.gen_batch_size=32 \
        data.train_batch_size=24 \
        actor_rollout_ref.rollout.n=5 \
        algorithm.adv_estimator=grpo \
        algorithm.filter_groups.enable=True \
        algorithm.filter_groups.max_num_gen_batches=10 \
        actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
        ...
```

### Before vs After Behavior (Real Output Logs)
* Before: Inconsistent Shape
```
(TaskRunner pid=114826) Training from scratch
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=3 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=1. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=8 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=2. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1)
(TaskRunner pid=114826) num_prompt_in_batch=13 < prompt_bsz=24
(TaskRunner pid=114826) num_gen_batches=3. Keep generating...
(TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32,)
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
```
This caused shape inconsistency across steps, leading to downstream
errors during concatenation.

* After: Consistent (32,) Shape

```
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=4 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=1. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=10 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=2. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=12 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=3. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=15 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=4. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=19 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=5. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
(TaskRunner pid=133725) num_prompt_in_batch=23 < prompt_bsz=24
(TaskRunner pid=133725) num_gen_batches=6. Keep generating...
(TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,)
```
With the updated logic, the shape is consistently (32,).

* The issue was traced back to the `"conversations"` field in the Tool
N1 dataset. This key contains a list of human–gpt messages. In most
examples, it's a single-turn conversation (list with length 1), but in
some cases, it's a multi-turn conversation (list with length > 1).

### Design & Code Changes

The current `collate_fn` processes non-tensor values with:


1df03f3abf/verl/utils/dataset/rl_dataset.py (L62-L63)

While this generally works, it leads to a subtle issue:
If `val` is a list of lists and all inner lists happen to be of the same
length, NumPy will interpret it as a 2D array with shape (N, L).
However, in many RL scenarios, the structure of non-tensor data (e.g.
variable-length lists across batches) is not guaranteed to be uniform,
which means:

- One batch may produce shape `(N, L)`
- Another may produce `(N,)` where each element is a list of different
lengths
- Another may have shape `(N, L')`

This causes downstream errors like:
`ValueError: all the input arrays must have same number of dimensions,
but the array at index 0 has 2 dimension(s) and the array at index 1 has
1 dimension(s)`

Specifically, this occurs when multiple step-wise batches are
concatenated with:


1df03f3abf/recipe/dapo/dapo_ray_trainer.py (L240)

To enforce consistent 1D object arrays regardless of content, this PR
replaces the original line with:

```python
for key, val in non_tensors.items():
    non_tensors[key] = np.empty(len(val), dtype=object)
    non_tensors[key][:] = val
```
This ensures that `non_tensors[key]` always has shape `(N,)`, which makes
concatenation in downstream logic safer.
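
The shape difference can be reproduced in isolation. This is a small sketch, assuming the original line was roughly `np.array(val, dtype=object)` (the source links to it but does not show it); the helper names and sample data are made up for the illustration.

```python
import numpy as np


def to_object_array_naive(val):
    """Let NumPy infer the shape; equal-length inner lists become a 2D array."""
    return np.array(val, dtype=object)


def to_object_array_1d(val):
    """Allocate a 1D object array first, then fill it, as in the fix above."""
    arr = np.empty(len(val), dtype=object)
    arr[:] = val
    return arr


uniform = [["a"], ["b"], ["c"]]      # every conversation is single-turn
ragged = [["a"], ["b", "c"], ["d"]]  # one conversation is multi-turn

print(to_object_array_naive(uniform).shape)  # (3, 1)
print(to_object_array_naive(ragged).shape)   # (3,)
print(to_object_array_1d(uniform).shape)     # (3,)
print(to_object_array_1d(ragged).shape)      # (3,)

# With the 1D construction, batches from different steps concatenate cleanly.
merged = np.concatenate([to_object_array_1d(uniform), to_object_array_1d(ragged)])
print(merged.shape)
```

Each element of the 1D array is still the original Python list, so downstream code that indexes a sample gets the same object as before.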

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-30 13:07:47 +08:00
2cccd7f09d [vllm,rollout] fix: vllm rollout lock file permission (#2805)
### What does this PR do?

This is a naive solution for [issue
2781](https://github.com/volcengine/verl/issues/2781). While it is not
an elegant implementation, it works fine for me.
> Why use `getpass.getuser()` instead of `os.getlogin()`?
> The latter causes errors when run inside a Ray actor/task. [Here is a
related issue](https://github.com/python/cpython/issues/84998).
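
One plausible shape for such a fix is a lock file whose name includes the current user, so that users on a shared machine never contend for a file owned by someone else. The sketch below is an assumption about the approach, since the PR description does not show the code: the function name and the file name pattern are hypothetical.

```python
import getpass
import os
import tempfile


def user_lock_path(name="vllm_rollout"):
    """Build a per-user lock-file path (the file name pattern is hypothetical)."""
    # getpass.getuser() reads LOGNAME, USER, LNAME, USERNAME in that order and
    # falls back to the password database, so it needs no controlling terminal.
    user = getpass.getuser()
    return os.path.join(tempfile.gettempdir(), f"{name}_{user}.lock")
```

`os.getlogin()` depends on the controlling terminal of the process, which worker processes started by a scheduler typically do not have.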


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: qinghan <qinghan@dewu.com>
2025-07-30 13:02:53 +08:00
c3df0b5eb8 [perf] feat: Padding before batch post-process in agent-loop to save time (#2773)
### What does this PR do?

From issue here:
https://github.com/volcengine/verl/issues/2677

This PR pads the `prompt`, `response`, and `mask` before batch
post-processing to save time.

Main idea:
<img width="1978" height="916" alt="image"
src="https://github.com/user-attachments/assets/bf16d45b-9da8-4d07-aab4-d8773e5ab705"
/>

```python
# prompt_ids: left padded with zeros (e.g., [0,0,0,0,1,2,3,4])
# response_ids: right padded with zeros (e.g., [5,6,7,8,0,0,0,0])
# input_ids: concatenation of prompt + response
# Mask:
# For example, if the prompt is [1,2,3,4] and the response is [5,6,7,(tool start)8,9(tool end),10,11,12]
# - prompt_attention_mask: 0s for padding, 1s for tokens
#   e.g., [0,0,0,0,1,1,1,1]
# - response_attention_mask: 0s for padding, 1s for tokens
#   e.g., [1,1,1,1,1,1,1,1,1,1,1,0,0,0,0]
# attention_mask: concatenation of prompt_attention_mask and response_attention_mask
#   e.g., [0,0,0,0,1,1,1,1(prompt),1,1,1,1,1,1,1,1,1,1,1,0,0,0,0(response)]
# - response_mask: 1s for LLM generated tokens, 0 for tool response/padding tokens
#   e.g., [1,1,1,1,1,1,1,(tool start),0,0(tool end),1,1,0,0,0,0]
# - position_ids: sequential positions for tokens, starting at 0
#   e.g., [0,0,0,0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,0,0,0,0]         
```
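
The layout in the comments above can be written out for a single sample in plain Python. This is a minimal sketch, not the agent-loop code: the function names are hypothetical, inputs are assumed to fit within the target lengths (no truncation), and position ids are zeroed at padded positions to match the example above.

```python
def left_pad(ids, length, pad_id=0):
    """Pad on the left so the prompt ends right where the response begins."""
    return [pad_id] * (length - len(ids)) + list(ids)


def right_pad(ids, length, pad_id=0):
    """Pad on the right so the response starts right after the prompt."""
    return list(ids) + [pad_id] * (length - len(ids))


def build_padded_sample(prompt_ids, response_ids, response_mask, prompt_len, response_len):
    """Pad one sample and derive input_ids, attention_mask, response_mask, position_ids."""
    prompt_attn = left_pad([1] * len(prompt_ids), prompt_len)
    response_attn = right_pad([1] * len(response_ids), response_len)
    attention_mask = prompt_attn + response_attn

    # Running count of real tokens minus one, forced to 0 at padded positions.
    position_ids, seen = [], 0
    for m in attention_mask:
        seen += m
        position_ids.append((seen - 1) * m)

    return {
        "input_ids": left_pad(prompt_ids, prompt_len) + right_pad(response_ids, response_len),
        "attention_mask": attention_mask,
        "response_mask": right_pad(response_mask, response_len),
        "position_ids": position_ids,
    }


# Tokens 8 and 9 stand for a tool response, so they are masked out of the loss.
sample = build_padded_sample(
    prompt_ids=[1, 2, 3, 4],
    response_ids=[5, 6, 7, 8, 9, 10],
    response_mask=[1, 1, 1, 0, 0, 1],
    prompt_len=8,
    response_len=8,
)
for key, value in sample.items():
    print(key, value)
```

Once every sample has the same length, batching reduces to stacking fixed-size rows, which is where the time saving in post-processing comes from.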

### Test

Environment setup: follow this
[tutorial](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/agent_loop.md).
Test config on 4 x H100:
```bash
#!/bin/bash
# run on 4xH100 (GPUs 4-7) with optimizations for stability
# make sure your current working directory is the root of the project

set -x

ulimit -n 65535

# Environment variables to improve network stability
export CUDA_HOME=/usr/local/cuda
export CUDA_VISIBLE_DEVICES=4,5,6,7

PROJECT_DIR="$(pwd)"
CONFIG_PATH="$PROJECT_DIR/examples/sglang_multiturn/config"

python3 -m verl.trainer.main_ppo \
    --config-path="$CONFIG_PATH" \
    --config-name='gsm8k_multiturn_grpo' \
    algorithm.adv_estimator=grpo \
    data.train_batch_size=256 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.return_raw_chat=True \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=32 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=sglang \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.rollout.n=16 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=32 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger='["console","wandb"]' \
    trainer.project_name='gsm8k_async_rl' \
    trainer.experiment_name='qwen2.5-3b_function_rm-gsm8k-sgl-multi-w-tool-verify-n16-agent-loop-v1' \
    trainer.n_gpus_per_node=4 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=20 \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \
    trainer.total_epochs=15 \
    actor_rollout_ref.rollout.update_weights_bucket_megabytes=128 \
    actor_rollout_ref.rollout.trace.backend=weave \
    actor_rollout_ref.rollout.trace.token2text=True \
    actor_rollout_ref.rollout.mode=async \
    actor_rollout_ref.rollout.multi_turn.enable=true
```
Before (v1) and after (v2):

<img width="831" height="632" alt="image"
src="https://github.com/user-attachments/assets/033737e2-1b63-4b25-8b26-ab593db28a90"
/>

<img width="1674" height="1272" alt="image"
src="https://github.com/user-attachments/assets/296fbb37-430f-4f45-84c1-e003930a1896"
/>

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
2025-07-30 12:27:37 +08:00
4857997201 [tool] fix: Typo fix -- Rename to_openai_function_tool_schema to get_openai_tool_schema (#2806)
### What does this PR do?

Fixes a typo in the docstring of some tools.
`to_openai_function_tool_schema()` does not exist.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+to_openai_function_tool_schema
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

N/A

### API and Usage Example

N/A

### Design & Code Changes

N/A

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-30 09:34:08 +08:00
977b7d9ae8 [recipe] feat: @register_policy_loss("geo_mean"); Geometric-Mean Policy Optimization (#2795)
### What does this PR do?

> This is the official implementation of the paper [***Geometric-Mean Policy
Optimization***](https://arxiv.org/abs/2507.20673).

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> The code has trained for 100 iterations, and is still running.

### API and Usage Example

A new policy loss function has been added to
`verl/trainer/ppo/core_algos.py`:
```python
@register_policy_loss("geo_mean")
def compute_policy_loss_geo_mean(
    old_log_prob: torch.Tensor,
    log_prob: torch.Tensor,
    advantages: torch.Tensor,
    response_mask: torch.Tensor,
    loss_agg_mode: str = "token-mean",
    config: Optional[DictConfig | AlgoConfig] = None,
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    ...
```

We also added the directory `examples/gmpo_trainer` for a quick start.
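
For readers unfamiliar with the decorator used above, name-based registration can be sketched as follows. This is a toy stand-in, not verl's implementation: the registry dictionary, `get_policy_loss_fn`, and the `toy_geo_mean` loss are written for this illustration, and the loss omits the clipping and batching that the real function would need.

```python
import math
from typing import Callable, Dict, List

# Toy registry mapping a loss name to the function that computes it.
POLICY_LOSS_REGISTRY: Dict[str, Callable] = {}


def register_policy_loss(name: str):
    """Decorator that stores the decorated function under `name`."""
    def decorator(fn: Callable) -> Callable:
        POLICY_LOSS_REGISTRY[name] = fn
        return fn
    return decorator


def get_policy_loss_fn(name: str) -> Callable:
    """Look up a registered loss function by name."""
    if name not in POLICY_LOSS_REGISTRY:
        raise ValueError(f"unknown policy loss: {name!r}")
    return POLICY_LOSS_REGISTRY[name]


@register_policy_loss("toy_geo_mean")
def toy_geo_mean_loss(
    old_log_prob: List[float],
    log_prob: List[float],
    advantage: float,
    response_mask: List[int],
) -> float:
    """Negative advantage-weighted geometric mean of the per-token probability ratios."""
    n_tokens = max(sum(response_mask), 1)
    mean_log_ratio = sum(
        (new - old) * m for old, new, m in zip(old_log_prob, log_prob, response_mask)
    ) / n_tokens
    # The geometric mean of ratios equals exp of the mean of log-ratios.
    return -math.exp(mean_log_ratio) * advantage


loss_fn = get_policy_loss_fn("toy_geo_mean")
print(loss_fn([-1.0, -1.0], [-0.5, -1.5], 2.0, [1, 1]))
```

With this pattern, a configuration value such as a loss-mode string selects the function at run time without the trainer importing each loss explicitly.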

### Design & Code Changes

> see above

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-29 22:17:57 +08:00
aec8cf40ce [recipe] feat: add QWen2.5-7b-instruct retool (#2800)
### What does this PR do?

- As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-29 17:50:31 +08:00
76298addd0 [recipe] feat: Add sleep/wakeup mode for gen rm vllm service and add tqdm showing process (#2739)
### What does this PR do?

Add a sleep/wakeup mode for the gen RM vLLM service, and add a tqdm
progress bar.

This capability is particularly beneficial when the model server shares
resources with a training workload on the same machine. It allows the
reward model service to be temporarily offloaded (to free up GPU memory)
during intensive training sessions and reloaded when the service is
required again.
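
The offload/reload pattern described above can be sketched as a context manager. This is only an illustration: `wake_up` and `sleep` are hypothetical callables supplied by the caller (for example, thin wrappers around whatever the serving stack exposes), not names taken from this recipe.

```python
from contextlib import contextmanager


@contextmanager
def reward_model_awake(wake_up, sleep):
    """Keep the reward model loaded only for the duration of the block.

    `wake_up` and `sleep` are hypothetical, caller-supplied callables.
    """
    wake_up()  # reload the model before scoring
    try:
        yield
    finally:
        sleep()  # offload again, even on error, so training can reclaim GPU memory


# Usage with stand-in callables that just record the call order.
events = []
with reward_model_awake(lambda: events.append("wake_up"), lambda: events.append("sleep")):
    events.append("score")
print(events)
```

Putting the `sleep` call in a `finally` block means the service is offloaded even if scoring raises, so a failed reward request does not leave GPU memory pinned during the next training step.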
2025-07-29 13:10:20 +08:00
d640f99219 [recipe] fix: fix issue when running split ppo (#2745) 2025-07-29 07:32:59 +08:00
d255783a0a [docker] feat: upgrade vllm to 0.9.1 (#2747) 2025-07-29 07:32:04 +08:00
f98ee1c697 [cfg] fix: fix failing rollout config test on main (#2771)
### What does this PR do?

The CPU unit test has been failing since
https://github.com/volcengine/verl/pull/2757/files was merged.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

---------

Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>
2025-07-28 16:43:56 +08:00
35dc0e6490 [doc] fix: fix typo in agentic RL documentation (#2777)
### What does this PR do?
Fix a typo in agentic RL documentation.

* current 
`bash examples/data_preprocess/gsm8k_tool_agent_loop.py`

* fixed
`python examples/data_preprocess/gsm8k_tool_agent_loop.py`

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-28 16:20:51 +08:00
c9ccbd5c4b [recipe] fix: fix retool SFT dataset (#2764)
### What does this PR do?

- Fix retool data preprocessing (`tools` is now required to be a list)
- Use a more common path to save the dataset

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-28 10:03:28 +08:00
00ac37fe58 [misc] fix: Handle N-D arrays and complex objects in union_numpy_dict (#2768)
### What does this PR do?

This PR fixes a bug in `verl.protocol.union_numpy_dict` where it would
crash on NumPy arrays with more than 2 dimensions. It replaces the
underlying comparison logic with a robust, recursive function that can
handle N-D arrays, nested objects, `NaN` values, and circular
references.

This resolves issue #2766.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

A comprehensive unit test suite has been added to
`tests/test_protocol_on_cpu.py`. The new tests cover the following
scenarios, all of which now pass:
* Merging dictionaries with identical 3D (and higher) dimensional
arrays.
* Correctly failing when N-D arrays with the same shape but different
values are merged.
* Handling nested `object`-dtype arrays containing other arrays,
strings, and `None`.
* Correctly treating `NaN` values at the same position as equal,
mimicking pandas' behavior.
* Safely handling circular references without causing a
`RecursionError`.
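
The scenarios above can be illustrated with a small recursive comparison. This is a sketch under stated assumptions, not the code from this PR: `deep_equal` and `merge_checked` are hypothetical names, and the cycle handling (treating a pair that is already being compared as equal) is one possible design choice.

```python
import numpy as np


def deep_equal(a, b, _seen=None):
    """Recursively compare two values; NaN equals NaN, cycles are tolerated."""
    if _seen is None:
        _seen = set()
    if a is b:
        return True
    key = (id(a), id(b))
    if key in _seen:
        # This pair is already being compared further up the stack (a cycle).
        return True
    if isinstance(a, np.ndarray) or isinstance(b, np.ndarray):
        if not (isinstance(a, np.ndarray) and isinstance(b, np.ndarray)):
            return False
        if a.shape != b.shape:
            return False
        if a.dtype == object or b.dtype == object:
            # Object arrays may hold arrays, strings, None: recurse per element.
            _seen.add(key)
            return all(deep_equal(x, y, _seen) for x, y in zip(a.flat, b.flat))
        if np.issubdtype(a.dtype, np.inexact):
            # Works for any number of dimensions; NaNs at equal positions match.
            return bool(np.array_equal(a, b, equal_nan=True))
        return bool(np.array_equal(a, b))
    if isinstance(a, float) and isinstance(b, float):
        return a == b or (a != a and b != b)
    if isinstance(a, (list, tuple)):
        if type(a) is not type(b) or len(a) != len(b):
            return False
        _seen.add(key)
        return all(deep_equal(x, y, _seen) for x, y in zip(a, b))
    if isinstance(a, dict):
        if not isinstance(b, dict) or a.keys() != b.keys():
            return False
        _seen.add(key)
        return all(deep_equal(a[k], b[k], _seen) for k in a)
    return bool(a == b)


def merge_checked(d1, d2):
    """Merge d2 into d1, refusing to overwrite a key with a different value."""
    for key, value in d2.items():
        if key in d1 and not deep_equal(d1[key], value):
            raise ValueError(f"conflicting values for key {key!r}")
        d1[key] = value
    return d1


# Usage: identical 3-D arrays with a NaN merge cleanly.
x = np.zeros((2, 3, 4))
x[0, 0, 0] = np.nan
merged = merge_checked({"obs": x}, {"obs": x.copy(), "tag": np.array(["a"], dtype=object)})
print(sorted(merged))
```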

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-27 17:24:43 +08:00
2e1a1a6603 [BREAKING] [rollout] chore: remove default rollout selection (#2757)
### What does this PR do?

As title

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-26 10:11:24 -07:00
ea4442470e [algo] refactor: don't special-case compute_policy_loss (#2701)
### What does this PR do?

Currently the vanilla policy loss mode is special-cased. This PR moves
vanilla onto the shared interface and stops special-casing it.
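
One way to read "shared interface" is a name-to-function registry in which `vanilla` is looked up like any other loss mode. The sketch below assumes that reading; the registry, the function names, and the float-list signature are illustrative and are not taken from this PR's diff.

```python
import math

# Hypothetical registry mapping a loss-mode name to its implementation.
POLICY_LOSS_REGISTRY = {}


def register_policy_loss(name):
    """Decorator that registers a policy loss function under `name`."""
    def decorator(fn):
        POLICY_LOSS_REGISTRY[name] = fn
        return fn
    return decorator


def get_policy_loss_fn(name):
    """Look up a loss by name; no mode is special-cased."""
    if name not in POLICY_LOSS_REGISTRY:
        raise ValueError(f"unknown policy loss mode: {name!r}")
    return POLICY_LOSS_REGISTRY[name]


@register_policy_loss("vanilla")
def vanilla_policy_loss(old_log_prob, log_prob, advantages, clip_ratio=0.2):
    """Token-mean PPO clipped surrogate loss over plain Python floats."""
    losses = []
    for old_lp, lp, adv in zip(old_log_prob, log_prob, advantages):
        ratio = math.exp(lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_ratio), 1.0 + clip_ratio)
        losses.append(max(-adv * ratio, -adv * clipped))
    return sum(losses) / len(losses)


# Usage: the caller selects the loss by name, vanilla included.
loss_fn = get_policy_loss_fn("vanilla")
print(loss_fn([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))
```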


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Fred <frederrx@amazon.com>
2025-07-26 10:09:42 -07:00
0f5ab5c854 [doc] feat: add retool blog (#2761)
### What does this PR do?

add link to the retool blog

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
2025-07-26 13:13:55 +08:00
92e81cfcfd [perf] feat: add optional role selection in discrete mode for NPU Profiler (#2750)
### What does this PR do?

Currently, whether in `end-to-end` mode or `discrete` mode, all roles
are fully collected. As the sequence length continues to increase, the
volume of collected data becomes large, leading to slow parsing.
Therefore, we introduce a new feature in the NPU Profiler that allows
optional role selection in `discrete` mode, enabling quick collection of
specific roles.

We have added a new roles parameter in `npu_profile.yaml` to specify the
roles to be collected. The currently supported options are: `all`,
`rollout_generate`, `actor_compute_log_prob`, `actor_update` and
`ref_compute_log_prob`. Setting roles to `["all"]` means all roles will
be collected. Other options can be freely combined, for example:
`["actor_update", "ref_compute_log_prob"]`
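
The selection rule described above can be sketched as follows. The role names come from the description; `resolve_roles` and `should_profile` are hypothetical helpers, and how the profiler actually consumes the `roles` list is not shown here.

```python
# Role names listed in the description above.
SUPPORTED_ROLES = {
    "rollout_generate",
    "actor_compute_log_prob",
    "actor_update",
    "ref_compute_log_prob",
}


def resolve_roles(roles):
    """Expand a `roles` list into the concrete set of roles to collect."""
    requested = set(roles)
    if "all" in requested:
        return set(SUPPORTED_ROLES)
    unknown = requested - SUPPORTED_ROLES
    if unknown:
        raise ValueError(f"unsupported roles: {sorted(unknown)}")
    return requested


def should_profile(role, roles):
    """Return True if `role` should be collected under the given setting."""
    return role in resolve_roles(roles)


# Usage: collect only two roles in discrete mode.
selected = ["actor_update", "ref_compute_log_prob"]
print(should_profile("actor_update", selected))
print(should_profile("rollout_generate", selected))
```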

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-25 21:53:09 +08:00
f107800837 [rollout] feat: remove chat scheduler (#2725)
### What does this PR do?

Remove the chat scheduler as described in #2618
2025-07-25 21:46:35 +08:00
58d698e04b [trainer] refactor: Make sure to keep the type checking (#2634)
### What does this PR do?

Some code in `ppo/ray_trainer.py` fails static type checking (i.e.
`invalid type hints` or `function call with nullable variables`).
This PR fixes these issues so that IDE static type checkers can
analyze the code properly.
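
As an illustration of the "function call with nullable variables" category, the sketch below narrows an `Optional` attribute before calling a method on it. `RefWorker` and `Trainer` are hypothetical stand-ins and do not reproduce the classes in `ppo/ray_trainer.py`.

```python
from typing import Optional


class RefWorker:
    """Hypothetical stand-in for a worker that may or may not be configured."""

    def compute_log_prob(self, batch: list[float]) -> list[float]:
        return [x - 1.0 for x in batch]


class Trainer:
    def __init__(self, ref_worker: Optional[RefWorker] = None) -> None:
        self.ref_worker = ref_worker

    def ref_log_prob(self, batch: list[float]) -> list[float]:
        # Narrow Optional[RefWorker] to RefWorker before calling a method on it;
        # without this check a static type checker flags the call below.
        if self.ref_worker is None:
            raise RuntimeError("reference worker is not configured")
        return self.ref_worker.compute_log_prob(batch)


# Usage
print(Trainer(RefWorker()).ref_log_prob([1.0, 2.0]))
```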

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-24 22:32:07 -07:00
caec858ebb [doc] style: change resize handle from gradient to plain color (#2746) 2025-07-24 21:20:07 -07:00
f407887414 [CI] feat: add mypy to pre-commit (#2614) 2025-07-25 11:36:34 +08:00
dc8b5076c3 [megatron] feat: a bunch of optimzation on vram, sequence packing (#2678)
### What does this PR do?

Add a number of optimizations for Megatron training, including:
1. `aggressive_empty_cache` to avoid OOM on the hybrid engine. Previously
the cache could sometimes grow to as much as 30GB and cause OOMs.
2. Better sequence packing pre/post-processing. Previously there were
several d2h syncs when pre/post-processing the packed sequences.
3. Make `override_ddp_config` compatible with mbridge.

The optimized implementations replace the old ones; no options are
needed to enable them.


### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-25 10:34:33 +08:00
4879d619fc [docker] feat: upgrade to torch 2.7, sglang 0.4.8 (#2617)
### What does this PR do?

[docker] feat: upgrade to torch 2.7, sglang 0.4.8

Stage 2: vllm 0.9.1
Stage 3: mcore 0.13.0

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
2025-07-24 14:53:24 -07:00
bcd336fd46 [doc] feat: add resizable sidebar and improve layout (#2577)
## Summary
This PR adds a resizable sidebar feature and improves the documentation
layout for better user experience.

## Changes
- **Resizable sidebar**: Users can drag to resize the sidebar, with
preference saved in localStorage
- **Full-width layout**: Documentation now uses full screen width for
better readability
- **Responsive design**: Better layout adaptation for different screen
sizes
- **Navigation improvements**: Attempts to improve table of contents
navigation behavior

## Features
- Drag handle on sidebar for resizing
- Double-click to reset sidebar to default width
- localStorage persistence for user preferences
- Improved CSS for better visual experience

## Technical Details
- Added `_static/custom.css` for styling improvements
- Added `_static/js/resizable-sidebar.js` for functionality
- Updated `conf.py` to include new CSS and JS files
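
For reference, a `conf.py` fragment that registers these two files might look like the sketch below. It assumes the docs use Sphinx's standard `html_static_path`, `html_css_files` and `html_js_files` options; the actual lines changed by this PR are not shown here.

```python
# docs/conf.py (fragment, illustrative only)

# Directory whose contents Sphinx copies into the built site.
html_static_path = ["_static"]

# Paths below are relative to html_static_path.
html_css_files = ["custom.css"]
html_js_files = ["js/resizable-sidebar.js"]
```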

## Testing
Tested on the documentation build with successful functionality for
sidebar resizing and layout improvements.
2025-07-24 14:46:38 -07:00
1df03f3abf [ci] fix: release ascend test time, fix one step off-policy CI (#2731)
### What does this PR do?

Relax the Ascend test time limit (recent PR runs were cancelled even
though the operations themselves succeeded), and fix the one-step
off-policy CI.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-24 16:58:16 +08:00
a0248a8f17 [recipe] chore: add retool training script (#2732)
### What does this PR do?

Add retool training script.
2025-07-24 16:34:10 +08:00
8adcffa25a [ci] fix: checkpoint_convertor ci miss a hf model download (#2730)
### What does this PR do?

Fix the checkpoint_convertor CI, which was missing an HF model download.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-24 15:56:08 +08:00
88c084c4f3 [doc] feat: Add agent-lightning in the list of "awesome works using verl (#2726)
Add agent-lightning into `Awesome work using verl`

### What does this PR do?

This PR adds a recent work built upon verl into the "Awesome work using
verl" Section of the README.md file.
Add agent-lightning, a flexible and extensible framework that enables
seamless agent optimization for any existing agent framework, into
`Awesome work using verl`

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
2025-07-24 14:49:27 +08:00
dc3015e9af [tool] fix: geo3k create return str instead of tuple (#2714)
### What does this PR do?

Change the return value of `tool.create` from `instance_id, None` to `instance_id`.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-23 22:56:13 -07:00
73fc53f600 [megatron] fix: resolve backward propagation error in megatron_actor due to shared logits tensor in-place modification (#2484)
### What does this PR do?
Fixes gradient computation conflict in
`verl/workers/actor/megatron_actor.py` when entropy regularization is
enabled:
- **Root Cause**: The entropy calculation `entropy =
vocab_parallel_entropy(logits)` fails during backward propagation
because `log_probs = vocab_parallel_log_probs_from_logits(logits,
label)` performs in-place modifications on the logits tensor earlier in
the code. This corrupts the original computation graph needed for
gradient calculation.
- **Fix**: Decouples tensor dependencies by cloning logits before
entropy calculation to preserve the original computation graph while
maintaining existing log_probs computation flow.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

1. Run modified training script:
```bash
examples/ppo_trainer/run_qwen2-7b_math_gsm8k_megatron.sh \
--actor_rollout_ref.actor.entropy_coeff=0.01
```

2. The following error is observed (before repair):
<img width="1396" height="605" alt="image"
src="https://github.com/user-attachments/assets/0ed0f9f8-f4eb-41d3-9db8-c8f2163de910"
/>

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-24 13:37:18 +08:00
d57bfb02b3 [misc] chore: bump main branch version to v0.5.0.dev (#2718)
### What does this PR do?

bump main branch version to v0.5.0.dev

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
2025-07-24 10:46:16 +08:00
0eed7124fc [sglang] fix: Adding strict naming sanity for sglang (#2719)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

Thanks so much for pointing this out:

https://github.com/volcengine/verl/pull/2672#issuecomment-3105253661

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>
2025-07-24 10:45:57 +08:00
1862f748e5 [ray] feat: RayWorkerGroup support set worker env (#2685)
### What does this PR do?

Support creating Ray workers with customized environment variables.

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Signed-off-by: 陈齐翔 <chenqixiang.cqx@bytedance.com>
2025-07-24 10:07:35 +08:00
6a9a1b872d [ci] test: add CriticWorker unit test, make some util CPU friendly (#2717)
### What does this PR do?

Add a CriticWorker unit test and make some utilities CPU-friendly.

TODO:
- need to add option for attn_implementation. With this, the
actor/critic test can run on CPU nodes without problems.
- extend the test with sequence parallel & dynamic_bsz options

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Signed-off-by: ShareLer <ShareLe@163.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Joel <wuxibin@bytedance.com>
Co-authored-by: Cheetah <1659275352@qq.com>
Co-authored-by: 杨睿 <yangruipis@163.com>
Co-authored-by: X. HU <huxiaobo@zju.edu.cn>
Co-authored-by: Le Xue <48175490+ShareLer@users.noreply.github.com>
Co-authored-by: Ziheng Jiang <ziheng@apache.org>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-23 15:36:10 -07:00
4de3ecf0f0 [cfg] refactor: add ActorConfig, EngineConfig, and ActorWorker unit test, refactor validation code (#2621)
As initially mentioned in
https://github.com/volcengine/verl/discussions/1941, having structured
configuration classes in verl makes argument passing easier for testing
and validation.

This is an extended thread on the current implementation of
configuration schema in verl. Related PRs:
-  https://github.com/volcengine/verl/pull/2117
- https://github.com/volcengine/verl/pull/2621 

# Motivation 
By moving from loose `omegaconf.DictConfig`-based parameters to
structured dataclasses, we gain:
- Type safety & IDE support when accessing fields (e.g. cfg.optim.lr).
- Validation hooks via __post_init__ in each class.
- Immutable defaults with controlled mutability (e.g., an extra field).
- Seamless Hydra/OmegaConf integration and easy per-recipe extension.

# Core: BaseConfig

Hydra natively supports converting a DictConfig to a dataclass, but a
plain dataclass does not support accessing attributes via `get()`. We
introduce a base class to provide backward compatibility and make the
change less abrupt for existing users.

All config dataclasses inherit from BaseConfig, which:
- Implements collections.abc.Mapping → dict-like iteration/access.
- Freezes attributes once set, unless listed in _mutable_fields.
- Provides an `extra: dict[str, Any]` for unchecked extensions.

```python
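import collections.abc
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any, ClassVar


@dataclass
class BaseConfig(collections.abc.Mapping):
    """Dict-like, frozen dataclass with opt-in mutability."""
    # ClassVar: a class-level set, not a dataclass field (a plain set default is rejected).
    _mutable_fields: ClassVar[set[str]] = {"extra"}
    extra: dict[str, Any] = field(default_factory=dict)

    def __setattr__(self, name: str, value):
        # The first assignment (in __init__) is allowed; later ones need opt-in.
        if name in self.__dict__ and name not in self._mutable_fields:
            raise FrozenInstanceError(f"Field '{name}' is frozen")
        super().__setattr__(name, value)

    # Mapping methods: get, __getitem__, __iter__, __len__ …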
```
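
To make the freezing and dict-like access described above concrete, here is a minimal, self-contained sketch. It is not verl's actual implementation: the names `SketchBaseConfig` and `OptimSketch`, and the way the Mapping methods are filled in, are assumptions for illustration only.

```python
import collections.abc
from dataclasses import FrozenInstanceError, dataclass, field, fields
from typing import Any, ClassVar


@dataclass
class SketchBaseConfig(collections.abc.Mapping):
    """Illustrative stand-in for BaseConfig (hypothetical, not verl's code)."""

    # ClassVar keeps this set out of the generated __init__ and field list.
    _mutable_fields: ClassVar[set] = {"extra"}
    extra: dict = field(default_factory=dict)

    def __setattr__(self, name: str, value: Any) -> None:
        # The first assignment happens in __init__; later ones are rejected
        # unless the field is explicitly listed as mutable.
        if name in self.__dict__ and name not in self._mutable_fields:
            raise FrozenInstanceError(f"Field '{name}' is frozen")
        super().__setattr__(name, value)

    # Mapping interface: the Mapping mixin derives get(), keys(), items(), ...
    def __getitem__(self, key: str) -> Any:
        if key not in [f.name for f in fields(self)]:
            raise KeyError(key)
        return getattr(self, key)

    def __iter__(self):
        return (f.name for f in fields(self))

    def __len__(self) -> int:
        return len(fields(self))


@dataclass
class OptimSketch(SketchBaseConfig):
    """Hypothetical config that opts one field into mutability."""

    _mutable_fields: ClassVar[set] = SketchBaseConfig._mutable_fields | {"lr"}
    lr: float = 1e-6
    warmup_steps: int = 0


cfg = OptimSketch(lr=1e-5)
cfg.lr = 3e-5                  # allowed: "lr" is listed as mutable
cfg.extra["sppo_eta"] = 1.0    # unchecked extension via the extra dict
try:
    cfg.warmup_steps = 10      # rejected: frozen once set in __init__
    frozen_error = False
except FrozenInstanceError:
    frozen_error = True
```

Because the class implements `__getitem__`, `__iter__`, and `__len__`, calls such as `cfg.get("lr")` and `dict(cfg)` work without further code, which is what keeps existing `DictConfig`-style call sites usable.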

# Example Config Classes (verl/trainer/config)

Each sub-component of the trainer has its own dataclass, inheriting
BaseConfig.
```yaml
critic:
  checkpoint:
    _target_: verl.trainer.config.CheckpointConfig
    save_contents: ["model","optimizer","extra"]
    load_contents: ["model","optimizer","extra"]
    async_save: false
```
Definition: 
```python
from dataclasses import dataclass, field

from verl.utils.config import omegaconf_to_dataclass


@dataclass
class CheckpointConfig(BaseConfig):  # BaseConfig as defined above
    """What to save/load and async behavior."""
    save_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"])
    load_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"])
    async_save: bool = False

    def __post_init__(self):
        # validation checks go here after initialization
        pass


ckpt_cfg = CheckpointConfig(async_save=True)
print(ckpt_cfg.save_contents)
print(ckpt_cfg.get("save_contents", None))  # second argument is the fallback default
print(ckpt_cfg["save_contents"])

# converting hydra-generated omegaconf.DictConfig to the dataclass config
# (`config` is the DictConfig that Hydra passes to the entrypoint):
ckpt_cfg_from_cli = omegaconf_to_dataclass(config.critic.checkpoint)
```
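
As a hedged illustration of what a `__post_init__` validation hook might contain, the sketch below uses a plain dataclass so that it runs on its own. The class name `CheckpointSketch` and the `ALLOWED_CONTENTS` rule are assumptions made up for this example; they are not verl's actual validation logic.

```python
from dataclasses import dataclass, field

# Hypothetical set of recognised checkpoint components (assumption for this sketch).
ALLOWED_CONTENTS = {"model", "optimizer", "extra"}


def _default_contents() -> list:
    # A fresh list per instance, so instances never share a mutable default.
    return ["model", "optimizer", "extra"]


@dataclass
class CheckpointSketch:
    save_contents: list = field(default_factory=_default_contents)
    load_contents: list = field(default_factory=_default_contents)
    async_save: bool = False

    def __post_init__(self):
        # Runs right after the generated __init__, so bad values fail at construction.
        for name in ("save_contents", "load_contents"):
            unknown = set(getattr(self, name)) - ALLOWED_CONTENTS
            if unknown:
                raise ValueError(f"{name} has unknown entries: {sorted(unknown)}")


ok = CheckpointSketch(async_save=True)
try:
    CheckpointSketch(save_contents=["model", "weights"])
    rejected = False
except ValueError:
    rejected = True
```

Validating in `__post_init__` means a misconfigured value surfaces when the config object is built rather than deep inside a training run.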

# Extending existing config classes
Because configs are now structured, unexpected keys raise exceptions.
There are two ways to add new keys:
## Explicit class extensions:
```python
from dataclasses import dataclass

from verl.workers.config import FSDPActorConfig


@dataclass
class SPPOActorConfig(FSDPActorConfig):
    """Add SPPO-specific temperature/penalty."""
    sppo_eta: float = 1.0
```
When using YAML or the command line, update the target config class:
```yaml
hydra:
  searchpath:
    - file://verl/trainer/config
defaults:
  - ppo_trainer      # base trainer config
  - _self_               # then apply these overrides

actor_rollout_ref:
  actor:
    _target_: recipe.sppo.config.SPPOActorConfig  # new target dataclass, required for the extension
    sppo_eta: 1.0  
```
or directly from command line:
```bash
python main_sppo.py \
  actor_rollout_ref.actor._target_=recipe.sppo.config.SPPOActorConfig \
  actor_rollout_ref.actor.sppo_eta=1.0
```

## Leverage the `extra` field
Adding more keys to the `extra` field of any dataclass that inherits
from `BaseConfig` also works. This way there is no need to define your
own dataclass in Python:
```yaml
hydra:
  searchpath:
    - file://verl/trainer/config
defaults:
  - ppo_trainer      # base trainer config
  - _self_               # then apply these overrides

actor_rollout_ref:
  actor:
    extra:
        sppo_eta: 1.0  
```

# Declaring mutable fields
For historical reasons, some config fields are mutated in place in the
codebase, such as the batch size for data/sequence parallelism. We are
in the process of deprecating this behavior. However, if you
intentionally want to mutate a field, list it in the `_mutable_fields`
attribute:
```python
@dataclass
class CheckpointConfig(BaseConfig):
    """What to save/load and async behavior."""
    _mutable_fields = BaseConfig._mutable_fields | {"save_contents"} # mark save_contents as mutable.

    save_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"])
    load_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"])
    async_save: bool = False
```

# Other helpful resources
The default verl trainer config combines the following config files,
as specified in its `defaults` field:
https://github.com/volcengine/verl/blob/main/verl/trainer/config/ppo_trainer.yaml#L1-L36
- verl/trainer/config/ppo_trainer.yaml  # main config for entrypoint 
- verl/trainer/config/actor/dp_actor.yaml 
- verl/trainer/config/critic/dp_critic.yaml 
- verl/trainer/config/reward_model/dp_reward_model.yaml 
- verl/trainer/config/rollout/rollout.yaml 

To quickly view the full default config in a single file, check the
auto-generated config at
https://github.com/volcengine/verl/blob/main/verl/trainer/config/_generated_ppo_trainer.yaml

# Change log and impact on existing code
This PR converts the following fields to structured dataclasses in the
training pipeline. More can be done in future PRs (contributions from
the community are welcome):
- [x] actor_rollout_ref.actor
- [x] critic 
- [ ] actor_rollout_ref.rollout
- [ ] actor_rollout_ref.ref
- [ ] reward_model
- [ ] data
- [ ] trainer

Changes needed for existing code that added new fields to config:
- see recipe/sppo for an example 
- `OmegaConf.to_container(self.config.model.get("override_config",
OmegaConf.create()))` now has to be manually changed to
`self.config.model.get("override_config", {})`, because
`OmegaConf.to_container` expects a DictConfig but
`config.model.override_config` is already a dict.

# Other Breaking Changes
The default `critic.optim.lr` for Megatron changed from 1e-6 to 1e-5.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Joel <wuxibin@bytedance.com>
Co-authored-by: Cheetah <1659275352@qq.com>
Co-authored-by: 杨睿 <yangruipis@163.com>
Co-authored-by: X. HU <huxiaobo@zju.edu.cn>
Co-authored-by: Le Xue <48175490+ShareLer@users.noreply.github.com>
Co-authored-by: Ziheng Jiang <ziheng@apache.org>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-23 11:45:14 -07:00
8fdc4d3f20 [misc] chore: bump version to v0.5.0 (#2716) 2025-07-23 10:57:10 -07:00
e13863e463 [ci] fix: auto-download model in Megatron-related CI tests (#2698)
### What does this PR do?

Add a step that downloads the model needed for Megatron-related CI
tests.

### Test

See the CI result.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-23 10:49:09 -07:00
f926dc90b0 [sglang] fix: fix is_vlm issue (issue #2639) (#2667) 2025-07-23 10:45:57 -07:00
4ed106698b [megatron] fix: CUDA_DEVICE_MAX_CONNECTIONS in ray error (#2709)
### What does this PR do?

Try to avoid duplicated environment variables in the Ray runtime env.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-23 18:57:57 +08:00
5bfb58e35d [recipe] fix: fix dapo cannot save the checkpoint of last step (#2619)
### What does this PR do?

This PR fixes a bug in the DAPO recipe where `dapo_ray_trainer` cannot
save the checkpoint of the last step.

### Checklist Before Starting

Similar PR
https://github.com/volcengine/verl/pull/2090

### Test

<img width="645" height="21" alt="image"
src="https://github.com/user-attachments/assets/cf501f2c-6b80-49aa-871a-3b066a2003c2"
/>
Before the fix, the last checkpoint is not saved. A checkpoint is saved
only when `training steps % save_freq == 0`.


### Design & Code Changes

`dapo_ray_trainer.py` records only the training steps and not the
generation steps. This PR adds a variable `gen_steps` to record them.
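
The symptom described above (checkpoints written only on multiples of `save_freq`, so the final step can be skipped) can be illustrated with a small standalone sketch. This is not the code from `dapo_ray_trainer.py` and it does not model `gen_steps`; the function name `should_save`, its parameters, and the rule that a non-positive `save_freq` disables saving are assumptions for illustration.

```python
def should_save(step: int, total_steps: int, save_freq: int) -> bool:
    """Save on every `save_freq`-th step and always on the final step."""
    if save_freq <= 0:
        # Assumption for this sketch: a non-positive frequency disables saving.
        return False
    return step % save_freq == 0 or step == total_steps


# With 7 steps and save_freq=3, a modulo-only check would stop at step 6.
modulo_only = [s for s in range(1, 8) if s % 3 == 0]
with_last = [s for s in range(1, 8) if should_save(s, total_steps=7, save_freq=3)]
```

Here `modulo_only` misses step 7 while `with_last` includes it, which is the behavior the fix is after.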


### Others

The checkpoint-loading logic is also incorrect here.
<img width="624" height="340" alt="image"
src="https://github.com/user-attachments/assets/8469de9d-0fcd-47f3-8b74-f4ad7f155802"
/>

The progress bar's initial value should be `self.gen_steps` instead of
the training steps, so `load_checkpoint` and `save_checkpoint` also need
to be fixed.
2025-07-23 17:26:35 +08:00
e9072c58fa [ci] feat: CI request via Feishu (#2699)
### What does this PR do?

Add CI request via Feishu.

### Test

n/a

### API and Usage Example

n/a

### Design & Code Changes

n/a

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-07-23 14:54:15 +08:00
0404956290 [training_utils] fix: align tensorboard default dir for val_log_generation (#2696)
### What does this PR do?

Align the TensorBoard default directory for `val_log_generation`.

---------

Co-authored-by: wangxihuai <wangxihuai@meituan.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-23 14:09:58 +08:00
c95c9ef701 [fsdp,megatron,sglang] fix: Fix torch reduce to speed up update weights (#2692)
### What does this PR do?

**Speed up Qwen3 MoE weight updates from 110s to 37s**

Related to: https://github.com/sgl-project/sglang/pull/8267

Co-authored-by: CuiBo <82354186+SuperCB@users.noreply.github.com>
Co-authored-by: GeLee <865038696@qq.com>

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: CuiBo <82354186+SuperCB@users.noreply.github.com>
Co-authored-by: GeLee <865038696@qq.com>
2025-07-23 13:40:41 +08:00
dc1599b7e4 [rollout] fix: bug in init_engine Method of AsyncSglangServer (#2664)
Fix an error in `AsyncSglangServer.init_engine` when finding workers. The
correct logic should be based on:

gpu_per_node * nodes = dp_size * tp_size
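
As a worked instance of this relation, the data-parallel size follows from the total GPU count and the tensor-parallel size. The helper below is a hypothetical sketch (the name `derive_dp_size` and its arguments are not from the PR) and only restates the arithmetic:

```python
def derive_dp_size(gpus_per_node: int, nnodes: int, tp_size: int) -> int:
    """Solve gpus_per_node * nnodes = dp_size * tp_size for dp_size.

    Hypothetical helper for illustration only.
    """
    total_gpus = gpus_per_node * nnodes
    if tp_size <= 0 or total_gpus % tp_size != 0:
        raise ValueError(
            f"tp_size={tp_size} must be positive and divide the total GPU count {total_gpus}"
        )
    return total_gpus // tp_size


# 2 nodes with 8 GPUs each and tensor parallelism of 4.
print(derive_dp_size(gpus_per_node=8, nnodes=2, tp_size=4))
```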

Also added test steps reported from
https://github.com/volcengine/verl/issues/2633.
2025-07-23 13:09:37 +08:00
4792b70dd4 [megatron] fix: reset recompute_granularity and add backward compatibility fix (#2693)
### What does this PR do?

Reset `recompute_granularity` default to `None` to align with Megatron.
Add backward compatibility fix.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-23 11:16:23 +08:00
4c10dddf74 [fsdp] fix: use torch 2.7 state dict api for torch 2.6 to resolve OOM (#2606)
### What does this PR do?

For torch==2.6.0, the distributed state dict API is buggy and can lead to
OOM. This PR copies the fixed state dict API from torch==2.7.0 into
`verl/third_party`, which is convenient for users who cannot upgrade to
torch==2.7.0.
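
One way to decide when a vendored copy is needed is a version check on the installed torch. The sketch below is an assumption about how such a gate could look, not the PR's implementation: `needs_state_dict_backport` is a hypothetical name, and applying the backport to every 2.6.x release (rather than 2.6.0 only) is also an assumption.

```python
def needs_state_dict_backport(torch_version: str) -> bool:
    """Return True for torch 2.6.x, where the vendored state dict API would be used.

    Hypothetical helper; pass in `torch.__version__` in real code.
    """
    # Drop a local build suffix such as "+cu124" before parsing.
    release = torch_version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) == (2, 6)


for version in ("2.6.0+cu124", "2.7.0", "2.5.1"):
    print(version, needs_state_dict_backport(version))
```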

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### API and Usage Example

```bash
HYDRA_FULL_ERROR=1 CUDA_LAUNCH_BLOCKING=1 python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=512 \
    data.max_prompt_length=1024 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-VL-32B-Instruct \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.strategy=fsdp2 \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=10 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=20 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.enable_chunked_prefill=True \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.strategy=fsdp2 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=20 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    trainer.critic_warmup=0 \
    trainer.logger=['console','tensorboard'] \
    trainer.project_name='verl_grpo_example_geo3k' \
    trainer.experiment_name='qwen2_5_vl_32b_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=5
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

FSDP2 memory snapshot: cpu offloading works and peak memory is slightly
lower than FSDP1
<img width="1193" height="543" alt="Screenshot 2025-07-17 at 14 53 49"
src="https://github.com/user-attachments/assets/2d5b88b2-0d9e-40f7-ad75-f42b9acf1bab"
/>

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
2025-07-22 19:54:33 -07:00
d20e5e07e1 [fsdp, ckpt] fix: Wrap GenerationConfig.from_pretrained with try-except to avoid crashes. (#2659)
### What does this PR do?

Wrapping `GenerationConfig.from_pretrained` in a try-except block to
prevent crashes during checkpoint saving.
[Issue](https://github.com/volcengine/verl/issues/2658)

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[link](https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+GenerationConfig)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-23 10:18:35 +08:00
8888122a89 [megatron] fix: remove the demising model.enable_gradient_checkpointing flags in the script (#2691)
### What does this PR do?

They were removed in https://github.com/volcengine/verl/pull/2651 ... 
@ETOgaosion 

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-23 09:25:30 +08:00
f252da34cf [megatron] fix: CUDA_DEVICE_MAX_CONNECTIONS not taking effect (#2687)
### What does this PR do?

According to Kunlun Li's detailed profiling work, the environment
variable `CUDA_DEVICE_MAX_CONNECTIONS=1` was not taking effect. The
benefit of this setting is described here:
https://github.com/NVIDIA/Megatron-LM/issues/533#issuecomment-1760193239

This PR tries putting the variable in Ray's `runtime_env` so that it
takes effect. This makes it a default option.
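
A minimal sketch of passing the variable through Ray's `runtime_env` follows. The `env_vars` key is part of Ray's runtime environment format; `build_runtime_env` and the extra `NCCL_DEBUG` entry are illustrative assumptions, not code from this PR.

```python
from typing import Dict, Optional


def build_runtime_env(extra_env_vars: Optional[Dict[str, str]] = None) -> dict:
    """Build a Ray runtime_env dict that sets CUDA_DEVICE_MAX_CONNECTIONS by default.

    Hypothetical helper; caller-supplied variables take precedence over the default.
    """
    env_vars = {"CUDA_DEVICE_MAX_CONNECTIONS": "1"}
    env_vars.update(extra_env_vars or {})
    return {"env_vars": env_vars}


runtime_env = build_runtime_env({"NCCL_DEBUG": "WARN"})
print(runtime_env)
# With Ray installed, the dict can be passed as: ray.init(runtime_env=runtime_env)
```

Setting the variable in `runtime_env` means every worker process started by Ray inherits it, instead of relying on the shell environment of the driver.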

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-22 20:51:12 +08:00
244481ac8f [misc] fix: main pre-commit and API change (#2675)
### What does this PR do?

Fix the pre-commit and cpu_unit test errors introduced by a previous PR.
Allow the recompute API change.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-22 15:01:20 +08:00
c5b189a1af [BREAKING][megatron] refactor: activation checkpointing APIs (#2651)
### What does this PR do?

Since the `override_transformer_config` option is already exposed, use it
directly to configure activation recomputation. The default settings are
the same as in `megatron.training`.
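
The idea of resolving recompute settings from an override mapping can be sketched as follows. The three field names exist on Megatron's transformer config; the helper `resolve_recompute_settings`, the `RECOMPUTE_DEFAULTS` table, and the assumption that all three default to `None` are illustrative and not taken from this PR.

```python
from typing import Any, Dict, Optional

# Assumed defaults: recomputation is off unless explicitly overridden.
RECOMPUTE_DEFAULTS: Dict[str, Any] = {
    "recompute_granularity": None,
    "recompute_method": None,
    "recompute_num_layers": None,
}


def resolve_recompute_settings(
    override_transformer_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Overlay user overrides on the defaults, ignoring unrelated keys."""
    overrides = override_transformer_config or {}
    return {key: overrides.get(key, default) for key, default in RECOMPUTE_DEFAULTS.items()}


print(resolve_recompute_settings())
print(
    resolve_recompute_settings(
        {"recompute_granularity": "full", "recompute_method": "uniform", "recompute_num_layers": 1}
    )
)
```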

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-22 10:24:28 +08:00
72cae971d0 [sglang] fix: rename Sglang to SGLang following SGLang's fashion (#2672)
### What does this PR do?

As titled.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>
2025-07-22 09:11:20 +08:00
d062314a18 [data, recipe] fix: remove redundant json parsing (#2671)
### What does this PR do?

> This PR fixes data preprocessing issues in MultiTurnSFTDataset.
Specifically, `json.loads` should not be called in `__getitem__`.
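
To illustrate the point, a dataset can decode any JSON-encoded rows once at construction time so that `__getitem__` only returns already-decoded data. The class below is a hypothetical sketch, not `MultiTurnSFTDataset` itself, and the row format is an assumption:

```python
import json
from typing import Any, List, Sequence, Union


class ParsedOnceDataset:
    """Toy dataset that decodes message rows once, not on every item access."""

    def __init__(self, rows: Sequence[Union[str, List[Any]]]):
        # Rows that are still JSON strings are decoded here; decoded rows pass through.
        self.messages = [json.loads(row) if isinstance(row, str) else row for row in rows]

    def __len__(self) -> int:
        return len(self.messages)

    def __getitem__(self, index: int) -> List[Any]:
        # No json.loads here: the data is already a Python object.
        return self.messages[index]


dataset = ParsedOnceDataset(
    [
        '[{"role": "user", "content": "hi"}]',
        [{"role": "assistant", "content": "hello"}],
    ]
)
print(dataset[0][0]["content"], dataset[1][0]["content"])
```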

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pull/2233
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

After fixing the issue, the results for [ReTool's SFT
recipe](https://github.com/volcengine/verl/blob/main/recipe/retool/run_qwen3_4b_sp4.sh)
are as expected:
<img width="5056" height="2656" alt="W B Chart 7_21_2025, 2_09_33 PM"
src="https://github.com/user-attachments/assets/3252d8d2-7002-4a50-8329-0b0d4da1fa3e"
/>

### API and Usage Example

N/A

### Design & Code Changes

N/A

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-22 09:09:10 +08:00
2bcc5d1212 [misc] fix: fix prompt and response key in gemma7b example (#2610)
### What does this PR do?

Fix the SFT gsm8k gemma7b example. Before this change create_sft_dataset
would error out.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-21 16:06:52 -07:00
e5f0b2aa80 [perf] feat: mistral and gemma3_text mfu compute support (#2622)
### What does this PR do?

Add mistral and gemma3_text mfu compute support

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-21 16:54:11 +08:00
ac826e0558 [tool] chore: Add log for AsyncRolloutRequest ID, and rollout viewr to support request id display and search (#2636)
### What does this PR do?

Add logging of the AsyncRolloutRequest ID in the PPO ray_trainer and
sglang_rollout. Update the rollout viewer to support request ID display
and search.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failur
2025-07-21 12:01:37 +08:00
3f6cd47926 [rollout,vllm] fix: A major issue in random sampling of vllm engine (#2646)
There is an optional config, `+actor_rollout_ref.rollout.seed`, which is
used in

`verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py` (L165-L185, at commit fcb1e191b7)

This config ensures identical initialization of the vllm engine across
workers in a distributed system.

However, in

`verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py` (L202, at commit fcb1e191b7)

`SamplingParams` unexpectedly picks up this `seed` param again when
`actor_rollout_ref.rollout.seed` is explicitly set.

In the sampling params, a seed makes vllm inference reproducible: the
same prompt sampled with the same seed yields the same response.

This causes serious problems, because in recent verl, `ray_trainer.py`
first flattens the input prompts, e.g.

`verl/trainer/ppo/ray_trainer.py` (L1160, at commit fcb1e191b7)

so if `+actor_rollout_ref.rollout.seed` is set, identical prompts will
receive identical responses, leading to a complete collapse of GRPO
training, as every advantage is zero, for example:

<img width="1104" height="858" alt="Screenshot 2025-07-20 002009"
src="https://github.com/user-attachments/assets/32eb1cc3-2ca2-41b9-9a9c-57b5dc557ed1"
/>
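
The collapse described above can be reproduced in miniature. The sketch below is not verl or vllm code: `sample_response`, `reward`, and `grpo_advantages` are hypothetical stand-ins, and the advantage is assumed to be the group-normalized reward `(r - mean) / (std + eps)`. It only illustrates why a shared sampling seed makes every advantage in a group zero.

```python
import random
import statistics


def sample_response(prompt, seed=None):
    """Toy stand-in for one sampled rollout (the prompt is ignored here).

    With a fixed seed the generator always starts from the same state, so
    every call returns the same "response". With seed=None it uses fresh
    entropy.
    """
    rng = random.Random(seed)
    return rng.randrange(1000)


def reward(response):
    """Hypothetical reward: 1.0 for odd responses, 0.0 for even ones."""
    return float(response % 2)


def grpo_advantages(rewards, eps=1e-6):
    """Group-normalized advantages (assumed form): (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# The same prompt repeated n times, as happens after the prompts are flattened.
prompts = ["What is 2 + 2?"] * 8

# Every request carries the same seed, so all eight responses are identical.
seeded_responses = [sample_response(p, seed=42) for p in prompts]
seeded_advantages = grpo_advantages([reward(r) for r in seeded_responses])

print(len(set(seeded_responses)))  # number of distinct responses in the group
print(seeded_advantages)           # all zeros: no learning signal
```

With identical responses the rewards are identical, the group mean equals every reward, and the numerator of each advantage is exactly zero.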
2025-07-21 12:00:28 +08:00
ac414d95c4 [recipe] feat: add QWen 30b moe dapo script that can run on a single 80GB node (#2645)
### What does this PR do?

- As the title says
- Achieves around 0.28 on AIME'24 after 100 steps, which takes around 1
day on a single H800 node
- Note that we start from the base model

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-20 18:49:21 -07:00
5d5ae81cdb [sglang] fix: update response handling and scoring method in GSM8K interaction (#2428)
### What does this PR do?

This PR corrects a mistake when calculating rewards during training with
the `gsm8k_w_interaction` setting.

- Changed the role check from "user" to "assistant" when extracting the
last message content.
- Simplified response assignment by removing unnecessary prefix checks.
- Updated scoring method from "flexible" to "strict" for improved
accuracy in GSM8K interactions.
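
As a rough illustration of the first and third changes, the sketch below picks the last assistant message from a chat history and contrasts a strict extraction with a flexible one. The function names and both regular expressions are assumptions made for this example, not the code touched by the PR; "strict" is taken to mean that the answer must follow a `#### ` marker, and "flexible" to mean that the last number in the text is used.

```python
import re

# Assumed strict format: the final answer must follow a "#### " marker.
STRICT_PATTERN = re.compile(r"#### (-?[0-9.,]+)")
# Assumed flexible format: any number anywhere in the text.
FLEXIBLE_PATTERN = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_assistant_content(messages):
    """Return the content of the most recent assistant message, or ""."""
    for message in reversed(messages):
        if message.get("role") == "assistant":
            return message.get("content", "")
    return ""


def extract_answer(text, method="strict"):
    """Extract a numeric answer string, or None if nothing matches."""
    pattern = STRICT_PATTERN if method == "strict" else FLEXIBLE_PATTERN
    matches = pattern.findall(text)
    if not matches:
        return None
    # Use the last match and drop thousands separators.
    return matches[-1].replace(",", "")


messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4\n#### 4"},
    {"role": "user", "content": "Are you sure it is not 5?"},
]

response = last_assistant_content(messages)
print(extract_answer(response, method="strict"))
print(extract_answer("It is probably 4 or 5", method="strict"))
print(extract_answer("It is probably 4 or 5", method="flexible"))
```

Reading the last user turn instead of the last assistant turn would score the wrong text, and flexible matching can reward a response that never commits to a final answer.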

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

Docs update has been included in the PR.

### Design & Code Changes

No change for the design.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-21 08:06:46 +08:00
fcb1e191b7 [doc] fix: non-standardized path references (#2637)
Fix non-standardized path references.

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

Fix non-standardized path references in
examples/grpo_trainer/run_moonlight16b_math_megatron.sh

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-20 18:49:16 +08:00
7fc3029a1e [doc] fix: add options to enable agent loop (#2624)
### What does this PR do?

Add required options to enable agent loop in document.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-20 06:03:06 +08:00
5d52d15fd3 [trainer] feat: Add FSDPCheckpointManager for SFTtrainer, support resume training, manage the number of CKPTS in keep (#2292)
[trainer] feat: Support resuming from a checkpoint, manage the number of
checkpoints to keep, and stay compatible with previously saved checkpoints

### What does this PR do?

This PR adds checkpoint resume support to the FSDP SFT trainer using
`FSDPCheckpointManager`, enabling seamless continuation of training with
full state restoration — including model weights, optimizer, scheduler,
and training progress.

It also introduces automatic checkpoint retention management, which
controls how many recent checkpoints are kept during training.


### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

To use the resume functionality, add the following settings to your
trainer config:

```yaml
trainer:
  save_freq: 100
  max_ckpt_to_keep: 5  # Maximum number of checkpoints to keep, set to null to keep all
  # Resume mode: "auto", "disable", or "resume_path"
  # "auto": resume from last checkpoint if available
  # "disable": start from scratch
  # "resume_path": resume from a user-defined path
  resume_mode: auto
  # Path to resume training from (used when resume_mode is "resume_path" or "auto")
  resume_from_path: null
  checkpoint:
    # add 'hf_model' to also save the whole model in HF format; by default only the sharded model checkpoint is saved, to save space
    save_contents: ["model", "optimizer", "extra"]
    load_contents: ${trainer.checkpoint.save_contents}
```

Example command-line overrides:
```bash
# Append these overrides to your existing launch script
trainer.save_freq=100
trainer.resume_mode=auto  # "disable": start from scratch; "resume_path": resume from a user-defined path
trainer.resume_from_path=null  # with resume_mode=auto, null resumes from the latest checkpoint; or set an explicit path
trainer.max_ckpt_to_keep=5  # limit the number of saved checkpoints (null for unlimited)
trainer.checkpoint.save_contents=[model,optimizer,extra,hf_model]
```
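
The two behaviours configured above, resume resolution and checkpoint retention, can be sketched as follows. This is an illustration under stated assumptions, not the `FSDPCheckpointManager` implementation: the `global_step_<N>` directory layout and the helper names `list_checkpoints`, `resolve_resume_path`, and `prune_checkpoints` are hypothetical.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Assumed layout: one directory per checkpoint, named global_step_<N>.
STEP_DIR = re.compile(r"^global_step_(\d+)$")


def list_checkpoints(root):
    """Return (step, path) pairs for checkpoint directories, oldest first."""
    found = []
    for child in Path(root).iterdir():
        match = STEP_DIR.match(child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), child))
    return sorted(found)


def resolve_resume_path(root, resume_mode="auto", resume_from_path=None):
    """Pick the checkpoint to resume from, or None to start from scratch."""
    if resume_mode == "disable":
        return None
    if resume_mode == "resume_path":
        if resume_from_path is None:
            raise ValueError("resume_mode='resume_path' needs resume_from_path")
        return Path(resume_from_path)
    if resume_mode == "auto":
        if resume_from_path is not None:
            return Path(resume_from_path)
        checkpoints = list_checkpoints(root)
        return checkpoints[-1][1] if checkpoints else None
    raise ValueError(f"unknown resume_mode: {resume_mode!r}")


def prune_checkpoints(root, max_ckpt_to_keep=None):
    """Delete the oldest checkpoints beyond the limit; None keeps everything."""
    if max_ckpt_to_keep is None:
        return []
    checkpoints = list_checkpoints(root)
    n_stale = max(len(checkpoints) - max_ckpt_to_keep, 0)
    removed = []
    for _, path in checkpoints[:n_stale]:
        shutil.rmtree(path)
        removed.append(path.name)
    return removed


# Demo on a throwaway directory holding checkpoints for steps 100..700.
demo_root = Path(tempfile.mkdtemp())
for step in range(100, 800, 100):
    (demo_root / f"global_step_{step}").mkdir()

removed = prune_checkpoints(demo_root, max_ckpt_to_keep=5)
latest = resolve_resume_path(demo_root, resume_mode="auto")
print(removed, latest.name)
```

Sorting by the parsed step number rather than by directory name avoids ordering `global_step_1000` before `global_step_200`.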

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
2025-07-19 12:15:23 +08:00
69a467f934 [docker] fix: downgrade TransformerEngine version 2.2.1 to allow mcore image using rope fusion and provide another set of v0.5 image (#2611)
### What does this PR do?

Downgrade TransformerEngine version to allow mcore image using rope
fusion and provide another set of v0.5 image.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-18 17:23:19 +08:00
9d7cba4e12 [trainer] refactor: Training Engine Interface and Development Plan (#1977)
# [Refactor] Training Engine Interface and Development Plan

## Motivation  
See the original RFC for background:
https://github.com/volcengine/verl/issues/1371

Modernizing our training loop requires that we:

- **Decouple** training-backend implementation from algorithm code so
each can evolve independently
- **Unify** on a single, well-defined `Engine` interface across
FSDP/Megatron/etc backends
- **Enable** unit-testing of each backend implementation in isolation  
- **Guarantee** algorithm “roles” (Critic, Actor, Rollout, Ref) remain
completely engine-agnostic.

---

## Current Implementation  

This PR:
- Introduces an abstract `BaseEngine` class that defines a unified
training‐engine interface.
- Implements `FSDPEngine`, a concrete `BaseEngine` using PyTorch
FullyShardedDataParallel.
- Provides a `CriticWorker` based on `FSDPEngine` that plugs seamlessly
into existing PPO training code without any changes.


### Classic Training Loop with the New Interface

```python
# 1. Build and initialize engine
engine = FSDPEngine(config)
engine.init_model()
engine.set_loss_fn(loss_fn)

# 2. Training loop
ctx = None  # optional context passed through the preprocess/postprocess hooks
for epoch in range(config.num_epochs):
    for batch in train_loader:
        # a) zero gradients
        engine.optimizer_zero_grad()

        # b) forward + backward
        with engine.train_mode():
            preds, loss, ctx = engine.forward_backward_step(
                batch,
                ctx,
                forward_only=False,
                preprocess_fn=preprocess_fn,
                postprocess_fn=postprocess_fn
            )

        # c) update and schedule
        grad_norm = engine.optimizer_step()
        current_lr = engine.lr_scheduler_step()

# 3. Evaluation
with engine.eval_mode():
    for micro_batch in data:
        preds, ctx = engine.forward_backward_step(
            micro_batch,
            ctx,
            forward_only=True,
            preprocess_fn=preprocess_fn,
            postprocess_fn=postprocess_fn
        )
```
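
To make the control flow of the loop above concrete, here is a toy engine that follows the same call pattern without PyTorch. `ToyEngine` is hypothetical and exists only for this illustration: it fits `y = w * x` with a hand-written gradient for a mean-squared-error loss, whereas a real engine would rely on autograd and distributed state. The context managers are built with `contextlib.contextmanager`, which is one possible way to implement `train_mode()` and `eval_mode()`.

```python
from contextlib import contextmanager


class ToyEngine:
    """Hypothetical single-parameter engine that fits y = w * x."""

    def __init__(self, lr=0.05):
        self.w, self.grad, self.lr = 0.0, 0.0, lr
        self.training = False
        self.loss_fn = None

    def set_loss_fn(self, loss_fn):
        self.loss_fn = loss_fn

    @contextmanager
    def _mode(self, training):
        # Switch mode on entry and restore the previous mode on exit.
        previous, self.training = self.training, training
        try:
            yield self
        finally:
            self.training = previous

    def train_mode(self):
        return self._mode(True)

    def eval_mode(self):
        return self._mode(False)

    def optimizer_zero_grad(self):
        self.grad = 0.0

    def forward_backward_step(self, batch, ctx=None, forward_only=False,
                              preprocess_fn=None, postprocess_fn=None):
        ctx = {} if ctx is None else ctx
        inputs, ctx = preprocess_fn(batch, ctx) if preprocess_fn else (batch, ctx)
        outputs = [self.w * x for x in inputs]
        preds, ctx = postprocess_fn(outputs, ctx) if postprocess_fn else (outputs, ctx)
        if forward_only:
            return preds, ctx
        loss, ctx = self.loss_fn(batch, preds, ctx)
        # "Backward pass": analytic gradient of the MSE loss with respect to w.
        self.grad += sum(2 * (o - t) * x
                         for o, t, x in zip(outputs, batch["y"], inputs)) / len(inputs)
        return preds, loss, ctx

    def optimizer_step(self):
        grad_norm = abs(self.grad)
        self.w -= self.lr * self.grad
        return grad_norm


def preprocess_fn(batch, ctx):
    return batch["x"], ctx


def mse_loss(batch, preds, ctx):
    loss = sum((p - t) ** 2 for p, t in zip(preds, batch["y"])) / len(preds)
    return loss, ctx


engine = ToyEngine()
engine.set_loss_fn(mse_loss)
batch = {"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]}
ctx = None
for _ in range(200):
    engine.optimizer_zero_grad()
    with engine.train_mode():
        preds, loss, ctx = engine.forward_backward_step(
            batch, ctx, preprocess_fn=preprocess_fn)
    engine.optimizer_step()

with engine.eval_mode():
    preds, ctx = engine.forward_backward_step(
        batch, ctx, forward_only=True, preprocess_fn=preprocess_fn)
print(round(engine.w, 4), preds)
```

The driver code only touches interface methods, which is the property that lets algorithm code stay independent of the backend.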

### Detailed BaseEngine Interface
We now introduce an abstract base class, `BaseEngine`, which defines our
unified training-engine interface.

**Key enhancements over the original RFC:**
- **`train_mode()` / `eval_mode()`**  
Context managers to control parameter and activation load/offload at the
start and end of each loop.
- **`shard_data()` / `unshard_data()`**  
  APIs for partitioning and gathering data across devices or workers.  
- **`preprocess_fn` / `postprocess_fn` in `forward_backward_step()`**  
Hooks to apply custom transformations before and after each micro-batch
pass.

Below are the detailed signatures for each core method.

```python

class BaseEngine(object):
    """
    Abstract base class defining the interface for model training engines.

    Engine implementations must subclass BaseEngine and provide concrete behavior for all methods.
    """
    def __init__(self, config):
        """
        Initialize the BaseEngine.

        Args:
            config: Configuration object containing parameters for engine setup.
        """
        raise NotImplementedError

    def init_model(self):
        """
        Instantiate or load the model, optimizer, and learning rate scheduler.

        Should prepare all components necessary for training or evaluation.
        """
        raise NotImplementedError

    def train_mode(self):
        """
        Context manager entry for switching the engine and model into training mode.

        Usage:
            with engine.train_mode():
                # runs in training mode
        """
        raise NotImplementedError

    def eval_mode(self):        
        """
        Context manager entry for switching the engine and model into evaluation mode.

        Usage:
            with engine.eval_mode():
                # runs in evaluation mode
        """
        raise NotImplementedError

    def forward_backward_step(self, 
                              batch, 
                              ctx=None, 
                              forward_only=False, 
                              preprocess_fn=None, 
                              postprocess_fn=None):
        """
        Execute a forward pass (and optional backward pass) over a batch of data.

        Args:
            batch: Raw batch data (e.g., tensors or mappings) to process.
            ctx: Optional context dict passed to preprocess/postprocess functions.
            forward_only: If True, skip gradient computation and backward pass.
            preprocess_fn: Function(batch, ctx) -> (inputs, ctx), applied before model call.
            postprocess_fn: Function(outputs, ctx) -> (predictions, ctx), applied after model call.

        Returns:
            If forward_only:
                (predictions, ctx)
            Else:
                (predictions, loss, ctx)
        """
        raise NotImplementedError

    def optimizer_zero_grad(self):
        """
        Zero out gradients of all parameters before starting a new backward pass.
        """
        raise NotImplementedError

    def optimizer_step(self):
        """
        Perform an optimization step to update model parameters based on accumulated gradients.

        Returns:
            grad_norm (float): The norm of the gradients before clipping or update.
        """
        raise NotImplementedError

    def lr_scheduler_step(self):
        """
        Advance the learning rate scheduler by one step.

        Returns:
            current_lr (float or list[float]): Updated learning rate(s).
        """
        raise NotImplementedError

    def shard_data(self, data):
        """
        Shard or partition data for distributed training or parallel execution.

        Args:
            data: Data structure to be sharded across devices/workers.

        Returns:
            Sharded data in the same format as input.
        """
        raise NotImplementedError

    def unshard_data(self, data):
        """
        Reconstruct or gather sharded data back to a unified format.

        Args:
            data: Sharded data structure to reconstruct.

        Returns:
            Unsharded, combined data.
        """
        raise NotImplementedError
        

    def set_loss_fn(self, loss_fn):
        """
        Set the loss function to be used during training.

        Args:
            loss_fn: Callable(data, predictions, ctx) -> (loss_tensor, new_ctx)
        """
        raise NotImplementedError

    def to(self, device: str, model: bool = True, optimizer: bool = True):
        """
        Move model parameters, optimizer states, or both to the specified device.

        Args:
            device: Target device identifier (e.g., "cuda" or "cpu").
            model: If True, move the model.
            optimizer: If True, move the optimizer states.
        """
        raise NotImplementedError


    def save_checkpoint(self, local_path, hdfs_path=None, global_step=0, max_ckpt_to_keep=None):
        """
        Save model, optimizer, and scheduler states to a checkpoint.

        Args:
            local_path: Local filesystem path to save checkpoint.
            hdfs_path: Optional HDFS path to copy checkpoint.
            global_step: Integer training step number for naming.
            max_ckpt_to_keep: Maximum number of recent checkpoints to retain.
        """
        raise NotImplementedError


    def load_checkpoint(self, local_path, hdfs_path=None, del_local_after_load=True):
        """
        Load model, optimizer, and scheduler states from a checkpoint.

        Args:
            local_path: Local filesystem path of the checkpoint.
            hdfs_path: Optional HDFS path where checkpoint is stored.
            del_local_after_load: Whether to delete local copy after loading.
        """
        raise NotImplementedError
```
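
As an illustration of the `shard_data()` / `unshard_data()` pair, the sketch below partitions a plain list into contiguous shards and gathers them back. It is a simplification under stated assumptions: the functions are free-standing and take an explicit `world_size` argument, whereas an engine would work on its own batch type and derive the partitioning from its process group.

```python
def shard_data(data, world_size):
    """Split a list into world_size contiguous shards.

    Shard sizes differ by at most one; the first len(data) % world_size
    shards receive the extra element.
    """
    base, extra = divmod(len(data), world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        shards.append(data[start:start + size])
        start += size
    return shards


def unshard_data(shards):
    """Concatenate shards back into one list, preserving the original order."""
    return [item for shard in shards for item in shard]


samples = list(range(10))
shards = shard_data(samples, world_size=4)
print(shards)
print(unshard_data(shards) == samples)
```

Keeping the shards contiguous means that unsharding is a plain concatenation and the original sample order is recovered without extra bookkeeping.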

### FSDPEngine Implementation

A concrete `FSDPEngine` implements all methods using PyTorch
FullyShardedDataParallel, supporting all the features that the FSDP
DPCritic worker supports:

- Multi-GPU/model sharding  
- Activation- and optimizer-offload  
- LoRA & sequence parallelism  
- Dynamic batch size and remove padding

### CriticWorker Implementation based on the FSDPEngine
- Unchanged public API 
- Each role calls only BaseEngine methods (init_model,
train_mode/eval_mode, forward_backward_step, etc.)
- No modifications needed in existing algorithms (e.g., PPOTraining)
- New roles can be plugged in identically to legacy code

## Development Plan
We’ll roll this out in three gated phases, controlled by a feature-flag
(`use_legacy_worker_impl`).

### Phase 1: Engine Development
> Flag: use_legacy_worker_impl = True (default)
> New interface under active development

- Refactor Critic, Actor, Rollout, Ref to use only BaseEngine APIs
- Design a hierarchical, immutable config system for engine/backends
- Ensure PPO training curves and final accuracy match legacy
implementation

### Phase 2: Migration
> Flag: use_legacy_worker_impl = False (default) – legacy path logs a
deprecation warning
> All new code targets the new interface; 2–3 months of
integration/stress testing

- Enforce new interface for all feature work
- Gather benchmarks, bug reports, and performance data

### Phase 3: Cleanup
> After Phase 2 validation:
- Remove legacy worker code and flags
- Finalize documentation, update changelogs, close deprecation notices

Please review this refactor and share any feedback or concerns!
Contributions are welcome.
2025-07-17 22:05:21 -07:00
223caf7022 [single_controller] fix: padding for kwargs (#2585)
### What does this PR do?

1. Fix bugs in func `_split_args_kwargs_data_proto_with_auto_padding`:
- Fix the padding_size calculation in kwargs to prevent additional
padding when `data_proto_len % chunks == 0`.
- Add the missing padding processing in kwargs.

2. Abstract the repetitive processing logic to simplify the code.
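
The padding-size issue can be illustrated with a small sketch. The helpers `padding_size` and `pad_and_split` are hypothetical and operate on plain lists rather than `DataProto`; the pre-fix code is not shown in this description, so the naive formula in the comment is only one way such a bug can arise.

```python
def padding_size(data_len, chunks):
    """Number of items to append so that data_len becomes divisible by chunks.

    A naive `chunks - data_len % chunks` returns `chunks` (a full extra
    chunk of padding) when data_len is already divisible; taking the
    result modulo chunks returns 0 in that case.
    """
    return (chunks - data_len % chunks) % chunks


def pad_and_split(items, chunks, pad_value=None):
    """Pad items to a multiple of chunks, then split into equal parts."""
    pad = padding_size(len(items), chunks)
    padded = list(items) + [pad_value] * pad
    size = len(padded) // chunks
    parts = [padded[i * size:(i + 1) * size] for i in range(chunks)]
    return parts, pad


parts_even, pad_even = pad_and_split(list(range(8)), chunks=4)
parts_odd, pad_odd = pad_and_split(list(range(10)), chunks=4)
print(pad_even, parts_even)
print(pad_odd, parts_odd)
```

Returning the padding size alongside the chunks lets the caller strip the padded entries again after the chunks have been processed.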

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Signed-off-by: ShareLer <ShareLe@163.com>
2025-07-18 10:10:49 +08:00
fb810355f3 [tool] fix: supports variable arguments for marked_timer (#2576)
### What does this PR do?

bugfix for npu marked_timer

`  File "xx/recipe/dapo/main_dapo.py", line 167, in run
    trainer.fit()
  File "xx/recipe/dapo/dapo_ray_trainer.py", line 134, in fit
    with marked_timer("gen", timing_raw, "red"):
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/contextlib.py", line 301, in helper
    return _GeneratorContextManager(func, args, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/contextlib.py", line 105, in __init__
    self.gen = func(*args, **kwds)
               ^^^^^^^^^^^^^^^^^^^
TypeError: marked_timer() takes 2 positional arguments but 3 were given`
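
The traceback shows a caller passing a third positional argument (`"red"`) to a timer that accepts only two. A minimal sketch of a timer that tolerates such extra arguments is shown below; it is a hypothetical stand-in, not the NPU implementation changed in this PR.

```python
import time
from contextlib import contextmanager


@contextmanager
def marked_timer(name, timing_raw, *args, **kwargs):
    """Record elapsed seconds under `name`; extra arguments are accepted and ignored."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timing_raw[name] = timing_raw.get(name, 0.0) + elapsed


timing_raw = {}
with marked_timer("gen", timing_raw, "red"):  # the call that previously raised TypeError
    pass
```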


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...

- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-17 13:35:36 -07:00
2b2aa9d3fd [tool] chore: introduce RolloutViewer TUI tools (#2469)
### What does this PR do?

Introduce a RolloutViewer TUI tool to visualize dumped rollout and
reward results easily. It supports:

- async data loading for fast startup
- ⌨️ full keyboard shortcut operation, no mouse needed
- 🔍 text search and highlighting
- 📝 table or plain mode

usage:

```bash
python scripts/rollout_viewer.py ${trainer.rollout_data_dir}
```

Here is a screenshot of the main window:

<img width="2540" height="1416" alt="image"
src="https://github.com/user-attachments/assets/e34e5157-2880-4a21-afb2-73885d0dfb11"
/>



> We are from the Large Model Post-Training Team of 📕 Xiaohongshu's AI
Platform Technology Department, dedicated to developing
high-performance, easily scalable distributed post-training engines.


### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-17 13:30:41 -07:00
7459131411 [hardware] refactor: replace device_name with config.trainer.device (#2542)
### What does this PR do?

In some methods, calling get_device() is redundant, so this PR replaces
get_device() with config.trainer.device.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
2025-07-17 13:29:01 -07:00
2adedb77b4 [doc] chore: add agent loop design doc (#2598)
### What does this PR do?

Add Agent Loop design doc.
2025-07-17 13:27:27 -07:00
H
332c7d53c1 [cfg] refactor: add flatten megatron trainer config generation and verification script (#2582)
### What does this PR do?

- Added CONFIG_SPECS array: "config_name:output_file:config_arg" format
- Now generates both _generated_ppo_trainer.yaml and
_generated_ppo_megatron_trainer.yaml
- Maintains identical output format and verification behavior
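
The generation script itself is not shown here; as an illustration of the `"config_name:output_file:config_arg"` format only, the Python sketch below parses such entries. The `ConfigSpec` type, the `parse_config_spec` helper, and the example entries are hypothetical. Only the two output file names come from the description above.

```python
from typing import NamedTuple


class ConfigSpec(NamedTuple):
    config_name: str
    output_file: str
    config_arg: str


def parse_config_spec(spec: str) -> ConfigSpec:
    """Split a "config_name:output_file:config_arg" entry into its three fields."""
    config_name, output_file, config_arg = spec.split(":", 2)
    return ConfigSpec(config_name, output_file, config_arg)


# Hypothetical entries: config names and arguments are assumed for illustration.
CONFIG_SPECS = [
    "ppo_trainer:_generated_ppo_trainer.yaml:",
    "ppo_megatron_trainer:_generated_ppo_megatron_trainer.yaml:--config-name=ppo_megatron_trainer.yaml",
]

specs = [parse_config_spec(entry) for entry in CONFIG_SPECS]
```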

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: openhands <openhands@all-hands.dev>
2025-07-17 08:08:45 -07:00
H
0b62a6ece1 [cfg] feat: add critic config class (#2583)
Added CriticConfig, MegatronCriticConfig, and FSDPCriticConfig
dataclasses with a clear inheritance hierarchy for critic model
configuration, and updated YAML files to support direct dataclass
instantiation.

## Changes
- Introduced dataclasses for critic configs, all inheriting from
BaseConfig.
- Added _target_ fields to critic YAML files for compatibility with
omega_conf_to_dataclass.
- Added unit tests to verify config instantiation and inheritance.


## Special notice
Both the megatron and fsdp critic configs contain the following
sections:
- model
- optimizer

However, the field names inside these sections are not yet consistent
between the two backends. In this PR, they are treated as
`dict[str, Any]` for flexibility. We shall introduce model config and
optimizer config classes once they are consolidated.

I've also removed `kl_ctrl` from the megatron critic config, since it
is not used. @ETOgaosion
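
The hierarchy described above might look roughly like the sketch below. The three critic class names and `BaseConfig` come from this commit message; every field name, the default values, the empty `BaseConfig` stand-in, and the `instantiate` helper are assumptions for illustration. In particular, `instantiate` only mimics `_target_`-driven construction with a local registry and is not `omega_conf_to_dataclass`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class BaseConfig:
    """Stand-in for the project's BaseConfig; the real class is not reproduced here."""


@dataclass
class CriticConfig(BaseConfig):
    strategy: str = "fsdp"
    # Kept as plain dicts until model and optimizer configs are consolidated.
    model: dict[str, Any] = field(default_factory=dict)
    optim: dict[str, Any] = field(default_factory=dict)


@dataclass
class FSDPCriticConfig(CriticConfig):
    strategy: str = "fsdp"


@dataclass
class MegatronCriticConfig(CriticConfig):
    strategy: str = "megatron"


def instantiate(cfg: dict[str, Any]) -> BaseConfig:
    """Toy `_target_` lookup: pick the class by name, pass the other keys as fields."""
    registry = {
        cls.__name__: cls
        for cls in (CriticConfig, FSDPCriticConfig, MegatronCriticConfig)
    }
    kwargs = {key: value for key, value in cfg.items() if key != "_target_"}
    return registry[cfg["_target_"]](**kwargs)


critic = instantiate({"_target_": "MegatronCriticConfig", "model": {"path": "x"}})
```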

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-07-17 15:59:47 +08:00
40d638c63b [doc] fix: typo in perf_tuning.rst (#2590)
### What does this PR do?

typo in perf_tuning doc
2025-07-17 15:58:34 +08:00
648e3c95cc [doc] fix: fix some contents for one step off policy (#2591)
### What does this PR do?

fix some contents for one step off policy

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Co-authored-by: ArronHZG <hou.zg@foxmail.com>
2025-07-17 15:54:06 +08:00
1775bd638f [trainer] fix: maybe_filter_out_long_prompts on image and video (#2553)
### What does this PR do?

The `filter_out_long_prompts` function incorrectly used the `messages`
variable when it should have used `doc`. This led to prompts with images
or videos not being filtered correctly based on length.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Design & Code Changes

The variable messages is of type list[dict], for example:
```python
[
  {"type": "text", "content": "xxx"},
  {"type": "image", "content": "xxx"}
]
```
The variable doc is a dict, for example:
```python
{
  "data_source": xxx,
  "prompt": xxx,
  "images": xxx,
}
```
We need to retrieve the image or video column from the dataset, load the
actual images or videos, and then pass them into the tokenizer to obtain
the sequence length.

Using `messages` here is incorrect: both its type and its semantics are
wrong for this purpose. We should use `doc` instead.
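
A minimal sketch of the idea, assuming a toy length function in place of the real tokenizer/processor: the length computation receives the whole `doc`, so it can reach the `images` column, which a list of message dicts does not carry. The names `toy_doc_length` and the fixed per-image cost are hypothetical, and this is not the code changed by the PR.

```python
from typing import Any, Callable

Doc = dict[str, Any]


def toy_doc_length(doc: Doc) -> int:
    """Stand-in for processor-based tokenization: text tokens plus a fixed cost per image."""
    text_tokens = len(str(doc["prompt"]).split())
    return text_tokens + 4 * len(doc.get("images", []))


def filter_out_long_prompts(
    docs: list[Doc],
    max_prompt_length: int,
    doc_to_length: Callable[[Doc], int] = toy_doc_length,
) -> list[Doc]:
    """Keep only docs whose full multimodal length fits within the limit."""
    return [doc for doc in docs if doc_to_length(doc) <= max_prompt_length]


docs = [
    {"data_source": "demo", "prompt": "describe this", "images": []},
    {"data_source": "demo", "prompt": "describe this", "images": ["img0", "img1"]},
]
kept = filter_out_long_prompts(docs, max_prompt_length=5)
```

With text-only counting both documents would pass the limit; counting the images from `doc` drops the second one.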

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-17 14:17:20 +08:00
H
d51c52f754 [ci] chore: add codeowner for role/engine (#2587)
### What does this PR do?

add codeowner for role/engine

cc @ZihengJiang
2025-07-16 22:05:04 -07:00
64601e418c set use_kl_in_reward=True in reinforce_plus_plus (#2580)
set use_kl_in_reward=True in reinforce_plus_plus
2025-07-17 12:10:54 +08:00
503ea75f53 [trainer, fsdp, vllm, recipe] feat: one step off async training recipe (#2231)
### What does this PR do?

This PR provides a simple implementation of one step off async training
with fsdp and vllm backend.

We conducted three different experiments with qwen2.5_3b model on 8 A100
GPUs:
1. baseline:  all models are colocated
2. standalone rollout: rollout model runs on 4 GPUs and other models run
on the remaining 4 GPUs
3. one step off: the same model placement as the second experiment, but
with one step off async training
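
The general idea behind "one step off" training can be sketched as below: generation of the next batch is launched before training on the current batch finishes, so each batch after the first was produced by weights that are one update old. This is a simplified, single-process illustration with hypothetical `generate` and `train` callables, not the recipe's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def one_step_off_loop(num_steps, generate, train):
    """Overlap generation of batch t+1 with training on batch t."""
    history = []
    version = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(generate, 0, version)
        for step in range(num_steps):
            batch = pending.result()
            if step + 1 < num_steps:
                # Submitted before this step's update, so it uses the current weights,
                # which will be one update stale when the batch is consumed.
                pending = pool.submit(generate, step + 1, version)
            version = train(batch, version)
            history.append(batch)
    return history


def demo_generate(step, policy_version):
    return {"step": step, "policy_version": policy_version}


def demo_train(batch, policy_version):
    return policy_version + 1


history = one_step_off_loop(4, demo_generate, demo_train)
# Policy versions used for generation: [0, 0, 1, 2]
```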

The pictures below demonstrate the results of these experiments:
<img
src="https://github.com/user-attachments/assets/1df6af46-2242-48e7-a937-a817b278e644"
width="30%" height="auto"><img
src="https://github.com/user-attachments/assets/bd5c1345-466a-478f-b0d3-95d9a8706496"
width="30%" height="auto"><img
src="https://github.com/user-attachments/assets/4cf76800-6763-4468-8b1f-b8be9d0fef51"
width="30%" height="auto">


In these experiments, the baseline has the highest throughput, but we
think this is only because we have not yet found the best configuration
for one step off async training.

The exciting point is that our NCCL-based weight updating for the
rollout model performs well. The latency is shown below:
<img
src="https://github.com/user-attachments/assets/388e5736-ef84-4cf0-a586-6543cefb91be"
width="30%" height="auto">
Most of the time, the latency is under 300 ms, which is negligible for
RLHF. Although it is only implemented with fsdp and vllm for now, we
think it would not be complex to extend it to other backends.


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### API and Usage Example

To use this feature, the `hybrid_engine` option must be disabled so that
the actor model and the rollout model are placed on different GPU
clusters. The `rollout.n_gpus` option has been added to the config file
to indicate how many GPUs the rollout model occupies. The script below
is an example that trains `qwen2.5_3b` with 8 GPUs.

```shell

python3 -m recipe.async.async_main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.shuffle=False \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \
    actor_rollout_ref.actor.optim.lr=3e-6 \
    actor_rollout_ref.hybrid_engine=False \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.rollout.n_gpus=4 \
    actor_rollout_ref.rollout.load_format=safetensors \
    actor_rollout_ref.rollout.layered_summon=True \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=40 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.val_before_train=True \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_grpo_example_gsm8k' \
    trainer.experiment_name='qwen2.5_3b_grpo_async_one_step_off' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=-1 \
    trainer.total_epochs=15 $@
```

### Specific Changes

1. NCCL-based weight updating for the rollout model.
2. One step off async trainer.


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: arron <hou.zg@foxmail.com>
Co-authored-by: lalala-2 <yrzr12345678@gmail.com>
Co-authored-by: openhands <openhands@all-hands.dev>
2025-07-16 19:45:53 -07:00
H
ef3fffc3a2 [trainer] refactor: no need to call load_reward_manager in compute_reward_async (#2557)
### What does this PR do?

Simply make the changes in https://github.com/volcengine/verl/pull/1406
backward compatible. We'll remove the config and tokenizer args in the
next version.
Credit to @emergenz

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Franz Srambical <franz.srambical@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-07-17 09:52:36 +08:00
f0964b6650 [rollout] fix: fix bug for remax when the rollout mode is async (#2574)
### What does this PR do?

> Fix a bug in remax when the rollout mode is async, as mentioned in
https://github.com/volcengine/verl/issues/2551
2025-07-16 22:45:09 +08:00
3f63715a96 [doc] fix: fix non-existing tag of base image in docs (#2569)
### What does this PR do?

This pull request fixes the non-existing tag of base image in the docs.

`verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4-te2.3` =>
`verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4`

Only
[`verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4`](https://hub.docker.com/layers/verlai/verl/base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4/images/sha256-8338539fa36dd8780a9d09eef019f339aa2715f49ac3b6cf738d9ffdba00d75f)
and
[`verlai/verl:base-cu124-cudnn9.8-torch2.6-fa2.7.4-te2.3`](https://hub.docker.com/layers/verlai/verl/base-cu124-cudnn9.8-torch2.6-fa2.7.4-te2.3/images/sha256-6559fd00b049c43fb3eafc1a90ed7464b83653dd79d5c455b1a678dbdb88b3cd)
exist on Docker Hub. Judging from the commit history, the former is
presumably the correct one.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/search?q=repo%3Avolcengine%2Fverl+base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4&type=pullrequests
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

N/A

### API and Usage Example

N/A

### Design & Code Changes

N/A

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Signed-off-by: rudeigerc <rudeigerc@gmail.com>
2025-07-16 15:59:40 +08:00
96b730bbed [megatron] fix: wrong response_mask for megatron + sglang multi-turn (#2543)
### What does this PR do?

When multi-turn is enabled, the observation tokens in `input_ids` must be
masked out because they are not generated by the model. The loss calculation
should therefore use `response_mask` instead of `attention_mask`.
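
To illustrate why the choice of mask matters, here is a minimal sketch in plain Python. The per-token losses, the mask values, and the `masked_mean` helper are all made up for illustration and are not taken from the verl code.

```python
def masked_mean(values, mask):
    """Average `values` over the positions where `mask` is 1."""
    total = sum(v * m for v, m in zip(values, mask))
    count = sum(mask)
    return total / count if count else 0.0


# One multi-turn response: model tokens, a two-token tool observation, a model token.
per_token_loss = [0.5, 0.7, 9.0, 9.0, 0.3]
attention_mask = [1, 1, 1, 1, 1]  # every non-padding token, observations included
response_mask = [1, 1, 0, 0, 1]   # only the tokens the model generated

loss_with_attention = masked_mean(per_token_loss, attention_mask)
loss_with_response = masked_mean(per_token_loss, response_mask)
```

With `attention_mask`, the two observation tokens dominate the average even though the model never produced them; with `response_mask` they are excluded.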

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-16 14:27:07 +08:00
da2ab088d9 [doc] fix: correct link in agentic RL doc (#2567)
fixed an invalid link in the doc.
2025-07-15 23:26:02 -07:00
152c599303 [perf] feat: Clip gsm8k solution string to optimize reward calculation (#2568)
### What does this PR do?

Huapeng: Regular-expression matching sometimes takes too long during reward
calculation, so only the last 300 characters of the solution string are
matched to speed it up.
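
A minimal sketch of the clipping idea, assuming the answer is marked as `#### <number>` at the end of the solution. The pattern, the `extract_answer` helper, and the sample text are illustrative assumptions, not the actual reward function.

```python
import re

# Assumed answer marker: the solution ends with "#### <number>".
ANSWER_PATTERN = re.compile(r"#### (\-?[0-9\.\,]+)")
CLIP_CHARS = 300  # clip length from the PR description


def extract_answer(solution_str, clip_chars=CLIP_CHARS):
    """Return the last '#### <number>' answer found in the tail of the string."""
    tail = solution_str[-clip_chars:]  # only scan the last `clip_chars` characters
    matches = ANSWER_PATTERN.findall(tail)
    if not matches:
        return None
    return matches[-1].replace(",", "")


long_output = "step " * 10_000 + "so the total is 1,234.\n#### 1,234"
answer = extract_answer(long_output)
```

The regex now runs over at most 300 characters regardless of how long the generation is. The trade-off is that an answer marker appearing earlier than the last 300 characters is no longer found.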



<img width="1974" height="1120" alt="image"
src="https://github.com/user-attachments/assets/a339110c-c527-466c-aa83-5efa099b6ba8"
/>


Similar code (DAPO):
https://github.com/BytedTsinghua-SIA/DAPO/blob/main/eval/math_dapo.py#L278


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-15 22:51:44 -07:00
7aabfc437b [rollout] feat: add ReactAgentLoop based on LangGraph (#2463)
### What does this PR do?

This is an initial effort to integrate LangGraph into agent loop:
1. add a LangGraph react agent loop implementation
2. add math expression example to demonstrate react agent loop usage.

### Design & Code Changes

New components
- ChatModel: [custom chat
model](https://python.langchain.com/docs/how_to/custom_chat_model/)
using LangChain abstractions, implementing the following abstract methods:
  - bind_tools: bind tools to the model
  - _generate: native async generation of the chat completion message

- ReactAgentLoop: [LangGraph react
agent](https://langchain-ai.github.io/langgraph/agents/overview/) which
can use tools to perform tasks.

<img width="593" height="467" alt="image"
src="https://github.com/user-attachments/assets/d629b170-03c5-4810-a6b0-4dc27a285c0e"
/>

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-16 13:41:04 +08:00
6e21c0a625 [megatron] feat: support distributed megatron model converter and merger (#2281)
### What does this PR do?


- support a distributed mcore model converter and merger, especially for
huge models like DeepSeek-V3 671B
- fix model merger bugs for DeepSeek-V3, related to
https://github.com/volcengine/verl/pull/2125

background:
https://github.com/volcengine/verl/pull/2125#issuecomment-2993276556
<img width="1189" height="371" alt="image"
src="https://github.com/user-attachments/assets/a317b928-963a-41e5-ae81-d4b6aa669516"
/>


> We are from the Large Model Post-Training Team of 📕 Xiaohongshu's AI
Platform Technology Department, dedicated to developing
high-performance, easily-scalable distributed post-training engines.


### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-16 13:36:33 +08:00
1a89141222 [training_utils] fix: uneven support in split (#2560)
### What does this PR do?

As discussed in #2524, split should support uneven cases to avoid crash
in edge cases.
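
As a rough sketch of what "uneven" support means, the hypothetical helper below splits a plain list into chunks of `split_size` and lets the last chunk be shorter instead of asserting divisibility. It stands in for the real batch type, whose API is not shown here.

```python
def split_uneven(items, split_size):
    """Split `items` into chunks of `split_size`; the last chunk may be shorter."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [items[i:i + split_size] for i in range(0, len(items), split_size)]


# 10 items with split_size 4 would fail a `len % split_size == 0` assertion.
chunks = split_uneven(list(range(10)), 4)
```

Callers that assumed equally sized chunks need to tolerate a shorter final chunk.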

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Unit test added.

### API and Usage Example

This PR avoids crashes like:

```
assert len(self) % split_size == 0, (
```

### Design & Code Changes

N/A

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-16 13:29:27 +08:00
e300d0f099 [doc] feat: add document for agentic RL related features (#2563)
### What does this PR do?

Add a document describing the new features for agentic RL scenarios.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

n/a

### API and Usage Example

n/a


### Design & Code Changes

n/a

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-16 12:51:16 +08:00
3f0773259c [tool] fix: correctly convert 'None' to null in sandbox fusion _process_single_case (#2409)
### What does this PR do?

Currently, `stdin_data` is passed into `_process_single_case` as None in
[`sandbox_fusion_tools`](https://github.com/volcengine/verl/blob/main/verl/tools/sandbox_fusion_tools.py#L179).

In
[`_process_single_case`](https://github.com/volcengine/verl/blob/main/verl/utils/reward_score/sandbox_fusion/utils.py#L301),
we will call `str(None)` which erroneously converts it to `'None'` (a
string) when stdin should be empty.

```python
                api_response, error_msg = call_sandbox_api(
                    sandbox_fusion_url=sandbox_fusion_url,
                    code=current_generation_code,
                    stdin=str(stdin_data),
                    compile_timeout=timeout,
                    run_timeout=timeout,
                    memory_limit_mb=memory_limit_mb,
                    language=language,
                )
```

This PR adds a check for if `stdin_data` is None so that it doesn't get
converted and passed into stdin.
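
A minimal sketch of the difference, using a hypothetical `normalize_stdin` helper and a made-up JSON payload. The real request fields of the sandbox API may differ.

```python
import json


def normalize_stdin(stdin_data):
    """Keep None as None; stringify everything else."""
    return None if stdin_data is None else str(stdin_data)


def build_payload(code, stdin_data):
    # Hypothetical request body, for illustration only.
    return json.dumps({"code": code, "stdin": normalize_stdin(stdin_data)})


buggy = json.dumps({"stdin": str(None)})  # the literal text "None" is sent as stdin
fixed = build_payload("print(1)", None)   # stdin is serialized as JSON null
```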


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Design & Code Changes

Add a line of logic to check whether or not `stdin_data` is None.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-15 20:53:39 -07:00
5f687b211d [sglang] fix: adding missing param for sgl async unit test (#2561)
### What does this PR do?

Sorry for the carelessness: the unit test at
`tests/workers/rollout/test_sglang_async_rollout_w_interaction.py` did not pass.


https://github.com/volcengine/verl/actions/runs/16306898259/job/46054785740

The fix is in the `get_rollout_config` function.

End-to-end training was unaffected; only the unit test needed fixing.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>
2025-07-15 20:22:43 -07:00
218298720f [ci] chore: add single-controller reviewer (#2554)
### What does this PR do?

Add a single-controller reviewer so that they are automatically notified of changes.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

cc @hongpeng-guo
2025-07-16 08:59:45 +08:00
f0d4c76ed6 [sglang] feat: update weights in batch with FSDP (#2559)
### What does this PR do?

Many thanks to @Yangruipis and @zhuzilin: we implemented group-wise weight
updates for SGLang with FSDP.

We are still measuring the speedup with Megatron and FSDP.

For megatron: https://github.com/volcengine/verl/pull/2418


At sgl, we're currently exploring two approaches to optimize resharding:

1. **Grouped calls to `update weights from tensor`**: Previously, we
called this endpoint for each tensor individually. We're now grouping
tensors to reduce the CPU overhead of these calls.
2. **Single large data buffer update**: We're investigating whether we
can form a single large data buffer to update a group of tensors all at
once. This would reduce the number of times the IPC handler is opened
and closed.
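
The second approach can be pictured with the sketch below. Byte strings stand in for tensors, and `pack` / `unpack` are hypothetical helpers; this is not SGLang's API, only an illustration of sending one buffer plus an offset index instead of many separate objects.

```python
def pack(named_blobs):
    """Concatenate byte blobs into one buffer and record where each one lives."""
    buffer = bytearray()
    index = []  # (name, offset, length) for every blob
    for name, blob in named_blobs:
        index.append((name, len(buffer), len(blob)))
        buffer.extend(blob)
    return bytes(buffer), index


def unpack(buffer, index):
    """Recover the individual blobs from the packed buffer."""
    return {name: buffer[off:off + length] for name, off, length in index}


weights = [("layer0.weight", b"\x01\x02\x03\x04"), ("layer0.bias", b"\x05\x06")]
buffer, index = pack(weights)
restored = unpack(buffer, index)
```

Only one buffer handle has to be shared per group; the receiver slices it back into individual entries using the index.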

For the first approach, we're implementing it separately in Megatron and
FSDP. I'm starting by merging the FSDP implementation, and then I'll
create a common interface for Megatron. We're still evaluating the
second approach to see if it's feasible.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>
2025-07-15 16:57:20 -07:00
1fe5daf7f1 [sglang, megatron, perf] feat: speed up megatron sglang weight update by 10x (#2418)
### What does this PR do?

Optimize the performance of the sglang + megatron weight update, following
the bucketing implementation of
[`THUDM/slime`](fb7605cc5f/slime/ray/ppo_actor.py (L452)).

| model | bucket size | weight update time |
| ---- | ----- | ---- |
| Moonlight16B @ 8xH20 | 512MB | 175s -> 18s |
| DeepseekV3 671B @ 512xH20 | 512MB | ONGOING |
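
A minimal sketch of size-based bucketing, assuming each parameter is described by a name and a byte count. The `bucket_by_size` helper and the parameter sizes are invented for illustration and are not the code from this PR or from slime.

```python
def bucket_by_size(named_sizes, bucket_bytes):
    """Group (name, nbytes) pairs into buckets of at most `bucket_bytes` each.

    A single item larger than the limit gets a bucket of its own.
    """
    buckets, current, current_bytes = [], [], 0
    for name, nbytes in named_sizes:
        # Flush the current bucket if adding this item would exceed the limit.
        if current and current_bytes + nbytes > bucket_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets


MB = 1024 * 1024
params = [
    ("embed", 300 * MB),
    ("attn.qkv", 200 * MB),
    ("attn.out", 100 * MB),
    ("mlp.up", 600 * MB),
    ("mlp.down", 50 * MB),
]
buckets = bucket_by_size(params, bucket_bytes=512 * MB)
```

Each bucket then corresponds to one update call, so the number of calls depends on the total size rather than on the number of tensors.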


related to issues https://github.com/volcengine/verl/issues/2419 ,
https://github.com/sgl-project/sglang/issues/6762
https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/169

similar fixes for FSDP: https://github.com/volcengine/verl/pull/2499 


> We are from the Large Model Post-Training Team of 📕 Xiaohongshu's AI
Platform Technology Department, dedicated to developing
high-performance, easily-scalable distributed post-training engines.


### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
2025-07-15 14:46:45 -07:00
a63243b0dd [fsdp] fix: change geo3k model name from non-vl to vl (#2555)
### What does this PR do?

Change `model_name` in the geo3k script from a non-VL model to a VL model.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-15 12:07:42 -07:00
166d91a62e [trainer] refactor: minor code cleanup (#2537)
### What does this PR do?

clean up entrypoint and train loop

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Rely on existing tests.


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-07-15 09:24:49 -07:00
2c0ae781d9 [ray] fix: strip [] for ipv6 address (#2545)
### What does this PR do?

Strip the square brackets from IPv6 addresses such as `[::1]`; torch's
`MASTER_ADDR` does not need them.
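
A minimal sketch of the bracket stripping described above. The helper name `strip_ipv6_brackets` is hypothetical and is not taken from the verl code base; it only illustrates the idea of removing a surrounding `[` `]` pair before the address is handed to torch.

```python
def strip_ipv6_brackets(address: str) -> str:
    """Return the address without enclosing square brackets, if present."""
    # Hypothetical helper for illustration; verl's actual code may differ.
    if address.startswith("[") and address.endswith("]"):
        return address[1:-1]
    return address


print(strip_ipv6_brackets("[::1]"))     # bracketed IPv6 literal
print(strip_ipv6_brackets("10.0.0.5"))  # IPv4 address is returned unchanged
```

Addresses without brackets pass through untouched, so the helper is safe to apply unconditionally.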

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-15 20:29:45 +08:00
2dea2598a1 [data] fix: Add missing init files in verl experimental data folders (#2548)
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

Importing the current version from main fails with the following error
because `__init__.py` files are missing.

```
 from verl.experimental.dataset.sampler import AbstractSampler
ModuleNotFoundError: No module named 'verl.experimental.dataset'
```
The PR https://github.com/volcengine/verl/pull/2381 did not add these
files.

This PR follows the existing packages and adds the missing
`__init__.py` files.
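
As a rough illustration, a check like the following could catch this class of problem before it reaches main. It is only a sketch: the function name `find_dirs_missing_init` is hypothetical, and the rule it applies (a directory with `.py` files should also contain `__init__.py`) is an assumption about how the package is expected to be laid out.

```python
import tempfile
from pathlib import Path


def find_dirs_missing_init(package_root: Path) -> list[Path]:
    """List directories under package_root that hold .py files but no __init__.py."""
    candidates = [package_root, *sorted(p for p in package_root.rglob("*") if p.is_dir())]
    missing = []
    for directory in candidates:
        has_python = any(f.is_file() and f.suffix == ".py" for f in directory.iterdir())
        if has_python and not (directory / "__init__.py").exists():
            missing.append(directory)
    return missing


# Build a throwaway package tree to demonstrate the check.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "pkg"
    (root / "dataset").mkdir(parents=True)
    (root / "__init__.py").touch()
    (root / "dataset" / "sampler.py").touch()  # no __init__.py next to it
    print([p.name for p in find_dirs_missing_init(root)])
```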

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-15 20:29:29 +08:00
10f4eb8cfc [misc] chore: fix typo in function name (#2525)
### What does this PR do?

fix typo `gather_outpus_and_unpad` -> `gather_outputs_and_unpad`

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Signed-off-by: ShareLer <ShareLe@163.com>
2025-07-15 19:06:20 +08:00
473d8ff0c1 [env] fix: bump tensordict to 0.9.1 (#2541)
### What does this PR do?

Bump to tensordict 0.9.1 and ban 0.9.0 per discussions in #2460.

The bug https://github.com/pytorch/tensordict/issues/1374 affects
dp_actor, which crashes because of the wrong batch size.
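
For illustration only, a runtime guard against the banned release could look like the sketch below. The function `check_tensordict_version` and the `BANNED_VERSIONS` set are hypothetical names; the PR itself expresses the ban through the dependency specification, not through code like this.

```python
from importlib import metadata
from typing import Optional

# Releases known to be affected, per the description above.
BANNED_VERSIONS = {"0.9.0"}


def check_tensordict_version(installed: Optional[str] = None) -> str:
    """Return the installed tensordict version, or raise if it is a banned release."""
    if installed is None:
        # Look up the installed distribution only when no version is passed in.
        installed = metadata.version("tensordict")
    if installed in BANNED_VERSIONS:
        raise RuntimeError(f"tensordict {installed} is banned; install 0.9.1 instead")
    return installed


print(check_tensordict_version("0.9.1"))
```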

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-15 19:04:07 +08:00
bbd1288353 [data, megatron] feat: add dynamic batching computational workload balance (#2452)
### What does this PR do?

Improve computational workload balance when `use_dynamic_bsz` is
enabled. The resulting micro-batches are sorted in descending order by
their sum of squared sequence lengths, which approximates the
computation cost of attention. This can help reduce imbalance in data
parallelism and pipeline parallelism.
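
The ordering rule can be sketched in a few lines. This is a simplified illustration, not the code added by the PR: micro-batches are represented as plain lists of sequence lengths, and the names `attention_cost` and `sort_micro_batches_by_cost` are hypothetical.

```python
def attention_cost(seq_lens: list[int]) -> int:
    """Approximate the attention cost of a micro-batch as the sum of squared lengths."""
    return sum(n * n for n in seq_lens)


def sort_micro_batches_by_cost(micro_batches: list[list[int]]) -> list[list[int]]:
    """Return the micro-batches ordered from most to least expensive."""
    return sorted(micro_batches, key=attention_cost, reverse=True)


# Each inner list holds the sequence lengths of one micro-batch.
micro_batches = [[128, 128, 128], [512], [256, 64]]
print(sort_micro_batches_by_cost(micro_batches))
# [[512], [256, 64], [128, 128, 128]]
```

Note that the batch with the most tokens (`[128, 128, 128]`, 384 tokens) ends up last, because the quadratic cost favors long single sequences over many short ones.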


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pull/2381
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

With `use_dynamic_bsz_balance` enabled (the line with the suffix `sort`
in the figure below), MFU improves in Qwen2.5-Math-7B DAPO.
<img width="835" alt="MFU"
src="https://github.com/user-attachments/assets/bc56711c-3d5f-4e91-83e7-29a65f195e57"
/>

More comprehensive [Experiment
Report](https://api.wandb.ai/links/ai4env/tw0zfh5o)


### API and Usage Example

Modify
[`./recipe/dapo/test_dapo_7b_math_megatron.sh`](https://github.com/volcengine/verl/blob/main/recipe/dapo/test_dapo_7b_math_megatron.sh):
```bash
python3 -m verl.trainer.main_ppo \
    --config-path=config \
    --config-name='ppo_megatron_trainer.yaml' \
   ...
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.use_dynamic_bsz_balance=True \
   ...
```

### Design & Code Changes

Specific changes: after the dynamic batch is partitioned, sort the
micro-batches by their computation workload (approximated by the
attention cost).

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-15 14:17:28 +08:00
83d6a80ac0 [fsdp] fix: vlm dynamic batch & unify dynamic batch api (#2524)
### What does this PR do?

The use of `len(data)` is incorrect since the data is a **dict** if we
enable dynamic batch for VLMs. It will return the number of keys in the
dict instead of the number of batch samples.


0b508ab803/verl/workers/actor/dp_actor.py (L540-L542)


0b508ab803/verl/workers/actor/dp_actor.py (L432-L434)

It can work correctly with pure-text LLMs because the data here is a
**tensordict** that has a `len` API.


0b508ab803/verl/workers/actor/dp_actor.py (L441-L443)

To solve this problem, we use `response_mask.shape[0]` to get the number
of samples in dynamic batch.

Nevertheless, I think the current implementation isn't elegant because
the underlying object processed here can be either a **dict** or a
**tensordict**. So I unify the APIs of dynamic batch and provide two
functions: `prepare_dynamic_batch` and `restore_dynamic_batch`. They can
be used both for computing log probs and for updating the actor. They
remove redundant code and keep the fsdp workers clean.
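
The pitfall can be reproduced without torch. In the sketch below, nested lists stand in for tensors, so `len(batch["response_mask"])` plays the role of `response_mask.shape[0]`; the field names other than `response_mask` and the helper `num_samples` are assumptions for illustration.

```python
# A plain dict of per-sample fields, standing in for a VLM micro-batch.
micro_batch = {
    "input_ids": [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    "response_mask": [[0, 1, 1], [0, 0, 1], [1, 1, 1], [0, 1, 0]],
}


def num_samples(batch: dict) -> int:
    """Count samples via the leading dimension of a per-sample field."""
    return len(batch["response_mask"])


print(len(micro_batch))          # 2: the number of keys, not samples
print(num_samples(micro_batch))  # 4: the number of samples
```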

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Test with Qwen2.5-VL-3b dynamic batch on the Geo3k dataset

```bash
python examples/data_preprocess/geo3k.py --local_dir ~/data/geo3k

python -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/geo3k/train.parquet \
    data.val_files=$HOME/data/geo3k/test.parquet \
    data.train_batch_size=512 \
    data.max_prompt_length=1024 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-VL-3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=6144 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=20 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=6144 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_nightly_ci' \
    trainer.experiment_name='qwen2_5_vl_3b_function_rm' \
    trainer.n_gpus_per_node=4 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=15
```

Results: orange: before this PR, blue: after this PR

<img width="4432" height="1290" alt="image"
src="https://github.com/user-attachments/assets/abce366a-98f9-4d97-8a33-9c8a2818c362"
/>


### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
import torch
from verl.protocol import DataProto


def prepare_dynamic_batch(data: DataProto, max_token_len: int) -> tuple[list[DataProto], list[list[int]]]:
    """
    Prepare a batch for dynamic batching.

    Args:
        data (DataProto): The input data.
        max_token_len (int): The maximum token length for dynamic batching.

    Returns:
        Tuple[List[DataProto], List[List[int]]]: A tuple containing a list of DataProto objects
        and a list of index lists.
    """
    ...

def restore_dynamic_batch(data: torch.Tensor, batch_idx_list: list[list[int]]) -> torch.Tensor:
    """
    Restore a batch from dynamic batching.

    Args:
        data (torch.Tensor): The input data.
        batch_idx_list (List[List[int]]): The list of index lists.

    Returns:
        torch.Tensor: The restored data.
    """
    ...
```
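
To show why an index list is returned alongside the micro-batches, here is a toy round trip in plain Python. It is a sketch under stated assumptions: samples are represented only by their sequence lengths, the greedy longest-first grouping is a stand-in rather than verl's partitioning algorithm, and the names `partition_by_token_budget` and `restore_order` are hypothetical.

```python
def partition_by_token_budget(seq_lens: list[int], max_token_len: int) -> list[list[int]]:
    """Group sample indices, longest first, so each group fits the token budget."""
    order = sorted(range(len(seq_lens)), key=lambda i: seq_lens[i], reverse=True)
    groups, current, used = [], [], 0
    for idx in order:
        if current and used + seq_lens[idx] > max_token_len:
            groups.append(current)
            current, used = [], 0
        current.append(idx)
        used += seq_lens[idx]
    if current:
        groups.append(current)
    return groups


def restore_order(flat_outputs: list, batch_idx_list: list[list[int]]) -> list:
    """Put each output back at the position of the sample that produced it."""
    flat_indices = [i for group in batch_idx_list for i in group]
    restored = [None] * len(flat_indices)
    for output, original_idx in zip(flat_outputs, flat_indices):
        restored[original_idx] = output
    return restored


seq_lens = [300, 900, 200, 700]
batch_idx_list = partition_by_token_budget(seq_lens, max_token_len=1000)
print(batch_idx_list)  # [[1], [3, 0], [2]]

# Pretend each sample yields one output, computed group by group.
flat_outputs = [seq_lens[i] * 2 for group in batch_idx_list for i in group]
print(restore_order(flat_outputs, batch_idx_list))  # [600, 1800, 400, 1400]
```

Because grouping reorders the samples, the outputs come back in group order; the index list is what allows them to be mapped back to the original batch order.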

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-15 14:07:41 +08:00
2c407f231f [cfg] fix: fix _generated_ppo_trainer.yaml pre-commit error on main (#2534)
### What does this PR do?

- Run scripts/generate_trainer_config.sh to update auto-generated config
- Adds missing trace configuration fields (backend, token2text)
- Fixes pre-commit hook failure

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-07-14 19:42:20 -07:00
517cc23c9d [megatron] feat: allow override DistributedDataParallelConfig (#2523)
### What does this PR do?

Allow overriding `DistributedDataParallelConfig` for custom
configurations.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-15 09:09:52 +08:00
53ec813847 [ray] refactor: Use public method to get node IP (#2521)
### What does this PR do?

1. Currently, verl uses `ray._private.services.get_node_ip_address()` to
get the node IP. However, it's better to avoid using functions under
`_private`. Instead, we should use the public API
`ray.util.get_node_ip_address()`. Both are equivalent:
c6e2080a96/python/ray/util/__init__.py (L6).

2. Make some methods in `class WorkerHelper` `@staticmethod`s because
they don't rely on the class's state.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test



### API and Usage Example

No

### Design & Code Changes

No

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
2025-07-15 09:09:11 +08:00
d0c7bbbc05 [cfg] refactor: support +extra.any_key usage for the base dataclass config in verl (#2502)
### What does this PR do?

This PR updates the base config in verl:
- support `+extra.any_key` usage for the base config in verl.
- allow selective subfields to be frozen
- add an auto-generated config yaml file
`verl/trainer/config/_generated_ppo_trainer.yaml` for reference, in case
the nested inheritance structure makes the config information too
scattered

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

- added frozen field tests

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

Now you can pass `--xx.profiler.extra.any_new_key=any_plain_value` on
the command line to a dataclass inheriting `verl.BaseConfig`. This way we
can still pass dataclass configs inside verl but allow some flexibility
in accepting new keys from users' ad hoc usage.
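
The combination of an open-ended `extra` mapping and selectively frozen fields can be sketched with a plain dataclass. This is not the implementation of `verl.BaseConfig`: the class `SketchConfig`, its `tool` field, and the `_frozen_fields` attribute are hypothetical names chosen for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a config dataclass with an `extra` escape hatch."""

    tool: str = "nsys"
    # Free-form keys supplied by the user end up here.
    extra: dict[str, Any] = field(default_factory=dict)

    # Unannotated class attribute, so it is not treated as a dataclass field.
    _frozen_fields = ("tool",)

    def __setattr__(self, name: str, value: Any) -> None:
        # Allow the first assignment (made by __init__), reject later ones.
        if name in self._frozen_fields and name in self.__dict__:
            raise FrozenInstanceError(f"field '{name}' is frozen")
        super().__setattr__(name, value)


cfg = SketchConfig(extra={"any_new_key": "any_plain_value"})
print(cfg.extra["any_new_key"])

try:
    cfg.tool = "torch"
except FrozenInstanceError as err:
    print(err)
```

Only the listed fields are protected; `extra` stays mutable, so new keys can still be attached after construction.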


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Lin <haibin@Lins-Laptop.hsd1.wa.comcast.net>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-15 09:06:56 +08:00
def5b28e3d [rollout] feat: support mlflow in rollout trace (#2440)
Implemented mlflow as a rollout trace backend. Compared to weave, mlflow
is a lightweight solution and can be deployed on-premises easily.

### API and Usage Example

docs/advance/rollout_trace.rst
2025-07-15 05:18:40 +08:00
141b1d3251 [recipe] fix: DAPO rewards using sandbox fusion (#2496)
### What does this PR do?

Fix some bugs/outdated code so that we can use sandbox fusion for DAPO.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

Use `load_reward_manager` in `verl.trainer.ppo.reward` instead of
duplicating the code there.

Also, set `acc` in `reward_extra_info` when the returned result is only
a float number (e.g. sandbox fusion).
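
A minimal sketch of the second change, assuming a scoring function may return either a bare float or a dict. The helper name `record_score` and the `"score"` key are assumptions made for illustration, not the exact code in the recipe.

```python
from collections import defaultdict
from typing import Union


def record_score(result: Union[float, dict], reward_extra_info: dict) -> float:
    """Return the scalar score and record extra info for logging."""
    if isinstance(result, dict):
        # Dict results carry their own extra fields; copy them all.
        score = result["score"]
        for key, value in result.items():
            reward_extra_info[key].append(value)
    else:
        # A bare float (e.g. from a sandbox-style scorer) has no extra
        # fields, so record it under `acc` to keep that metric populated.
        score = float(result)
        reward_extra_info["acc"].append(score)
    return score


info = defaultdict(list)
s_float = record_score(1.0, info)
s_dict = record_score({"score": 0.5, "acc": False}, info)
```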

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-07-14 20:10:48 +08:00
0b508ab803 [single_controller] fix: replace unittest.mock.patch with context manager for env var handling (#2498)
### What does this PR do?

Fixes a critical issue in the Ray worker initialization where
environment variables were not being properly preserved when using
`unittest.mock.patch`. This could lead to environment variables being
unexpectedly deleted after worker initialization. The fix replaces the
`patch` usage with a proper context manager for safer environment
variable management.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+environment+variables+patch
- [x] Format the PR title as `[single_controller] fix: replace
unittest.mock.patch with context manager for env var handling`

### Test

This change can be tested through the CI system since it affects core
worker initialization functionality. The following test scenarios should
be covered:
- Worker initialization with pre-existing environment variables
- Worker initialization without pre-existing environment variables
- Multiple worker initializations in sequence
- Error cases during worker initialization

### API and Usage Example

No API changes. Internal implementation change only. The fix uses a new
context manager:

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env_var(key: str, value: str):
    """Context manager for temporarily setting an environment variable."""
    original = os.environ.get(key)
    os.environ[key] = value
    try:
        yield
    finally:
        if original is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = original

# Usage in worker initialization
with temp_env_var("DISABLE_WORKER_INIT", "1"):
    worker = user_defined_cls(*args, **kwargs)
```

### Design & Code Changes

Changes made:
1. Removed dependency on `unittest.mock.patch`
2. Added new `temp_env_var` context manager for safe environment
variable handling
3. Updated worker initialization code in two locations to use the
context manager:
   - In `WorkerDict.__init__` for regular worker initialization
   - In `FusedWorker.__init__` for fused worker initialization
4. Ensures environment variables are properly restored even if
initialization fails
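
The restore behaviour described above can be checked with a small self-contained script. It repeats the `temp_env_var` definition from this PR and exercises three of the listed scenarios; the variable name `DEMO_TEMP_ENV_VAR` is a hypothetical stand-in used only for the demo.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env_var(key: str, value: str):
    """Temporarily set an environment variable, restoring it on exit."""
    original = os.environ.get(key)
    os.environ[key] = value
    try:
        yield
    finally:
        if original is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = original


KEY = "DEMO_TEMP_ENV_VAR"  # hypothetical name, not used by verl

# Case 1: the variable does not exist beforehand, so it is removed afterwards.
os.environ.pop(KEY, None)
with temp_env_var(KEY, "1"):
    inside_unset = os.environ[KEY]
after_unset = os.environ.get(KEY)

# Case 2: the variable exists beforehand, so its old value is restored.
os.environ[KEY] = "original"
with temp_env_var(KEY, "1"):
    inside_set = os.environ[KEY]
after_set = os.environ[KEY]

# Case 3: the body raises, and the old value is still restored.
try:
    with temp_env_var(KEY, "1"):
        raise RuntimeError("simulated initialization failure")
except RuntimeError:
    pass
after_error = os.environ[KEY]

os.environ.pop(KEY, None)  # clean up the demo variable
```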

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md)
- [x] Apply pre-commit checks
- [ ] Add / Update the documentation - N/A (internal implementation
change)
- [x] Add unit tests to verify environment variable handling in worker
initialization
- [ ] Request CI via Slack channel
2025-07-14 16:44:41 +08:00
fbec86d7fe [BUG] fix bug for #2506, when passing as response_mask to policy_loss_fn (#2513)
### What does this PR do?

Fixes https://github.com/volcengine/verl/issues/2506: `advantages` was
incorrectly passed as `response_mask` to `policy_loss_fn` in
`dp_actor.py`.
2025-07-14 13:27:48 +08:00
a31a8f251f [doc] fix: quickstart example can't work on zsh (#2509)
### What does this PR do?

I followed the instructions at
https://verl.readthedocs.io/en/latest/start/quickstart.html to run the
PPO example on my devbox, which uses zsh. However, I got the error
`zsh: no matches found: trainer.logger=[console]` because zsh interprets
`[]` as a glob pattern.

```
(verl) ➜  verl git:(20250713-devbox-2-tmux0-verl-2) ✗ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
 data.train_files=$HOME/data/gsm8k/train.parquet \
 data.val_files=$HOME/data/gsm8k/test.parquet \
 data.train_batch_size=256 \
 data.max_prompt_length=512 \
 data.max_response_length=256 \
 actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 actor_rollout_ref.actor.optim.lr=1e-6 \
 actor_rollout_ref.actor.ppo_mini_batch_size=64 \
 actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
 actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
 actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
 actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
 actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
 critic.optim.lr=1e-5 \
 critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 critic.ppo_micro_batch_size_per_gpu=4 \
 algorithm.kl_ctrl.kl_coef=0.001 \
 trainer.logger=['console'] \
 trainer.val_before_train=False \
 trainer.n_gpus_per_node=1 \
 trainer.nnodes=1 \
 trainer.save_freq=10 \
 trainer.test_freq=10 \
 trainer.total_epochs=15 2>&1 | tee verl_demo.log
zsh: no matches found: trainer.logger=[console]
```

This PR has 3 changes:
* `trainer.logger=['console']` -> `trainer.logger=console`
* `trainer.logger=['console','wandb']` ->
`trainer.logger='["console","wandb"]'`
* `trainer.logger=['console','tensorboard']` ->
`trainer.logger='["console","tensorboard"]'`
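
When such commands are generated from a script, the same quoting can be produced programmatically. The sketch below uses Python's standard `shlex.quote`; it only illustrates the quoting rule and is not part of this PR.

```python
import shlex

# Overrides as they should reach the program, before any shell quoting.
overrides = [
    "trainer.logger=console",
    'trainer.logger=["console","wandb"]',
]

# shlex.quote leaves shell-safe strings untouched and wraps anything
# containing special characters (such as [ ] and ") in single quotes.
quoted = [shlex.quote(o) for o in overrides]

for q in quoted:
    print(q)
```

The first override needs no quoting, while the list form is wrapped in single quotes so that neither zsh nor bash tries to expand the brackets.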

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

* `trainer.logger=console` (zsh)
<img width="898" height="564" alt="image"
src="https://github.com/user-attachments/assets/a957a493-75e6-462b-9974-6b1c4cdf5a80"
/>

* `trainer.logger='["console","wandb"]'` (zsh)
<img width="870" height="565" alt="image"
src="https://github.com/user-attachments/assets/e20613bf-2ccc-4653-b23f-90edc3d568d1"
/>

* `trainer.logger=console` (bash)
  ```bash
ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m
verl.trainer.main_ppo \
  >  data.train_files=$HOME/data/gsm8k/train.parquet \
  >  data.val_files=$HOME/data/gsm8k/test.parquet \
  >  data.train_batch_size=256 \
  >  data.max_prompt_length=512 \
  >  data.max_response_length=256 \
  >  actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  actor_rollout_ref.actor.optim.lr=1e-6 \
  >  actor_rollout_ref.actor.ppo_mini_batch_size=64 \
  >  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
  >  actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
  >  actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
  >  actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
  >  actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
  >  critic.optim.lr=1e-5 \
  >  critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  critic.ppo_micro_batch_size_per_gpu=4 \
  >  algorithm.kl_ctrl.kl_coef=0.001 \
  >  trainer.logger=console \
  >  trainer.val_before_train=False \
  >  trainer.n_gpus_per_node=1 \
  >  trainer.nnodes=1 \
  >  trainer.save_freq=10 \
  >  trainer.test_freq=10 \
  >  trainer.total_epochs=15 2>&1 | tee verl_demo.log
2025-07-14 02:52:27,669 INFO worker.py:1908 -- Started a local Ray
instance. View the dashboard at 127.0.0.1:8265
(TaskRunner pid=1799248) TaskRunner hostname: ip-172-31-9-244, PID:
1799248
(TaskRunner pid=1799248) {'actor_rollout_ref': {'actor': {'checkpoint':
{'load_contents': ['model',
(TaskRunner pid=1799248) 'optimizer',
(TaskRunner pid=1799248) 'extra'],
(TaskRunner pid=1799248) 'save_contents': ['model',
(TaskRunner pid=1799248) 'optimizer',
(TaskRunner pid=1799248) 'extra']},
  ```

* `trainer.logger='["console","wandb"]'` (bash)
  ```bash
ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m
verl.trainer.main_ppo \
  >  data.train_files=$HOME/data/gsm8k/train.parquet \
  >  data.val_files=$HOME/data/gsm8k/test.parquet \
  >  data.train_batch_size=256 \
  >  data.max_prompt_length=512 \
  >  data.max_response_length=256 \
  >  actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  actor_rollout_ref.actor.optim.lr=1e-6 \
  >  actor_rollout_ref.actor.ppo_mini_batch_size=64 \
  >  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
  >  actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
  >  actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
  >  actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
  >  actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
  >  critic.optim.lr=1e-5 \
  >  critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  critic.ppo_micro_batch_size_per_gpu=4 \
  >  algorithm.kl_ctrl.kl_coef=0.001 \
  >  trainer.logger='["console","wandb"]' \
  >  trainer.val_before_train=False \
  >  trainer.n_gpus_per_node=1 \
  >  trainer.nnodes=1 \
  >  trainer.save_freq=10 \
  >  trainer.test_freq=10 \
  >  trainer.total_epochs=15 2>&1 | tee verl_demo.log
2025-07-14 02:54:13,989 INFO worker.py:1908 -- Started a local Ray
instance. View the dashboard at 127.0.0.1:8265
(TaskRunner pid=1805000) TaskRunner hostname: ip-172-31-9-244, PID:
1805000
(TaskRunner pid=1805000) {'actor_rollout_ref': {'actor': {'checkpoint':
{'load_contents': ['model',
(TaskRunner pid=1805000) 'optimizer',
(TaskRunner pid=1805000) 'extra'],
(TaskRunner pid=1805000) 'save_contents': ['model',
(TaskRunner pid=1805000) 'optimizer',
(TaskRunner pid=1805000) 'extra']},
  ```

### API and Usage Example

No

### Design & Code Changes

No

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
2025-07-14 13:26:32 +08:00
4d0f4d056e [doc] feat: update npu profiler doc and script (#2514)
### What does this PR do?

Since the profiler has removed the individual configurations for
`actor`, `rollout`, and `ref`, and now uses a unified configuration
under `actor_rollout_ref.profiler`, the documentation and scripts for
the NPU profiler need to be updated accordingly.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-14 11:10:27 +08:00
92758d681c [env] fix: Change the permissions of install_vllm_sglang_mcore.sh from 644 to 755 to allow execution (#2508)
### What does this PR do?

I followed the instructions at
https://verl.readthedocs.io/en/latest/start/install.html#install-dependencies
to install verl. The guide asks me to run the script
`scripts/install_vllm_sglang_mcore.sh`, but its permission is set to
644.

```
# Make sure you have activated verl conda env
# If you need to run with megatron
bash scripts/install_vllm_sglang_mcore.sh
# Or if you simply need to run with FSDP
USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh
```

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Here are the steps I followed to update the permission.
```sh
(verl) ➜  verl git:(20250713-devbox-2-tmux0-verl) ✗ ./scripts/install_vllm_sglang_mcore.sh
zsh: permission denied: ./scripts/install_vllm_sglang_mcore.sh
(verl) ➜  verl git:(20250713-devbox-2-tmux0-verl) ✗ ll scripts/install_vllm_sglang_mcore.sh
-rw-rw-r-- 1 ubuntu ubuntu 2.4K Jul 13 05:04 scripts/install_vllm_sglang_mcore.sh
(verl) ➜  verl git:(20250713-devbox-2-tmux0-verl) ✗ chmod +x scripts/install_vllm_sglang_mcore.sh
(verl) ➜  verl git:(20250713-devbox-2-tmux0-verl) ✗ ./scripts/install_vllm_sglang_mcore.sh
1. install inference frameworks and pytorch they need
Looking in links: https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python
Collecting sglang==0.4.6.post1 (from sglang[all]==0.4.6.post1)
...
```
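
For reference, the difference between the two modes can be reproduced with a short script. This is a sketch that assumes a POSIX filesystem and uses a temporary stand-in file (`install_demo.sh` is hypothetical), not the real install script.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "install_demo.sh")  # hypothetical stand-in
    with open(script, "w") as f:
        f.write("#!/bin/sh\necho installed\n")

    # Mode 644: readable by everyone, writable by the owner, not executable.
    os.chmod(script, 0o644)
    mode_before = stat.S_IMODE(os.stat(script).st_mode)
    exec_before = bool(mode_before & stat.S_IXUSR)

    # Mode 755: same as above plus the execute bit for owner, group, others.
    os.chmod(script, 0o755)
    mode_after = stat.S_IMODE(os.stat(script).st_mode)
    exec_after = bool(mode_after & stat.S_IXUSR)
```

Git records only whether a regular file is executable (modes 100644 and 100755), which is why the change is described as 644 to 755 even though the local listing shows `-rw-rw-r--`.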

### API and Usage Example

No

### Design & Code Changes

No

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
2025-07-13 15:36:11 -07:00
11e0cf752e [misc] refactor: remove deprecated codes (#2494)
### What does this PR do?

After PR https://github.com/volcengine/verl/pull/2257, I think `vllm_mode`
is no longer used.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).


cc @eric-haibin-lin
2025-07-13 15:34:31 -07:00
8e0b9bd9e5 [recipe] chore: Remove the duplicate definition of class Role (#2503)
### What does this PR do?

`spin_trainer.py` defines a `class Role` that is identical to the
`class Role` defined in `ray_trainer.py`.

* `spin_trainer.py`

4aa02fe166/recipe/spin/spin_trainer.py (L55-L66)

* `ray_trainer.py`

4aa02fe166/verl/trainer/ppo/ray_trainer.py (L67)

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test


https://github.com/volcengine/verl/blob/main/.github/workflows/e2e_spin.yml

### API and Usage Example

No

### Design & Code Changes

No

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
2025-07-13 18:56:02 +08:00
4aa02fe166 [trainer] fix: Allow FSDP2 when doing strategy check (#2497)
### What does this PR do?

Allow FSDP2 when doing strategy check

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

For the `strategy` field, both "fsdp" and "fsdp2" are now considered valid.
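
A minimal sketch of such a check, assuming it reduces to a membership test. The names `VALID_FSDP_STRATEGIES` and `is_fsdp_strategy` are hypothetical and not taken from the trainer code.

```python
# Hypothetical set of strategy names accepted by the FSDP code path.
VALID_FSDP_STRATEGIES = {"fsdp", "fsdp2"}


def is_fsdp_strategy(strategy: str) -> bool:
    """Return True when the strategy should use the FSDP worker path."""
    return strategy in VALID_FSDP_STRATEGIES


results = {name: is_fsdp_strategy(name) for name in ("fsdp", "fsdp2", "megatron")}
```

Using a set rather than an equality test against `"fsdp"` means a further variant could be accepted by editing one line.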

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-07-12 16:31:31 -07:00
eac4863ad7 [env] feat: safely bump py version to 3.10 (#2421)
### What does this PR do?

This PR safely bumps the Python version to 3.10 for two reasons:
1. [`removeprefix`](https://docs.python.org/3.9/whatsnew/3.9.html#new-string-methods-to-remove-prefixes-and-suffixes)
was introduced in Python 3.9:
588f9728f3/verl/single_controller/ray/base.py (L498-L505)
2. [`match`](https://docs.python.org/3.10/whatsnew/3.10.html#simple-pattern-match-to-a-literal)
was introduced in Python 3.10:
588f9728f3/verl/tools/utils/tool_registry.py (L81-L92)



### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-12 16:29:39 -07:00
6519220006 [trainer] fix: use .keys() to check 'response_mask' in TensorDict (#2491) 2025-07-12 14:48:13 +08:00
75f2abf0a5 [sglang] fix: Only flush cache on TP rank=0. (#2455)
### What does this PR do?

> We should call `flush_cache` in the same way it's done in the
`_req_level_generate_sequences` function; otherwise, it will cause an
error when TP16 is enabled.
<img width="575" alt="image"
src="https://github.com/user-attachments/assets/ab569ffe-22d1-402c-a58d-741253794a54"
/>
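
The guard can be pictured roughly as follows. This is a sketch with a stand-in engine class: `DemoEngine` and `flush_cache_on_rank0` are hypothetical names, and the real rollout code may differ in how the engine and rank are obtained.

```python
class DemoEngine:
    """Stand-in for an inference engine that counts cache flushes."""

    def __init__(self) -> None:
        self.flush_calls = 0

    def flush_cache(self) -> None:
        self.flush_calls += 1


def flush_cache_on_rank0(engine: DemoEngine, tp_rank: int) -> bool:
    """Flush only on tensor-parallel rank 0; other ranks do nothing."""
    if tp_rank != 0:
        return False
    engine.flush_cache()
    return True


# Simulate a tensor-parallel group of four ranks.
engines = [DemoEngine() for _ in range(4)]
flushed = [flush_cache_on_rank0(e, rank) for rank, e in enumerate(engines)]
```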


### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-11 20:54:17 -07:00
f0b4abaefc [fsdp] fix: Change the data in the update_actor function from to.('cpu') to to.(get_device_id()) (#2477)
### What does this PR do?

When training the Qwen3-32B model with the DAPO algorithm in a
dual-NPU environment, an error occurred during the update-actor phase:
the partition was found to be empty. The `data.to("cpu")` call in the
`update_actor` function differed from the data handling in other
functions. Rolling it back to `data.to(get_device_id())` resolved the
error. Further verification confirmed that keeping the data on the
device does not trigger OOM issues, so we made this change.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Co-authored-by: 王凯宇 <wangkaiyu11@h-partners.com>
2025-07-12 10:53:05 +08:00
590a62ae45 [training_utils] feat: log_generations_to_swanlab use table (#2489)
### What does this PR do?

Enhance the model output logging section in the
`log_generations_to_swanlab` function to improve visualization.


![20250712-030651](https://github.com/user-attachments/assets/370f3a04-d1c0-4441-bcc4-ddeb27be5e85)

demo link:
https://swanlab.cn/@ZeyiLin/verl_examples/runs/e9hgu4yx78ra74bh1346v/chart#YWh6djBw-MloyeE8ybkk=

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-12 08:56:25 +08:00
c3e953cf44 [docker] feat: provide images with deepep (#2480)
### What does this PR do?

Provide images with deep-ep.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-11 21:12:49 +08:00
d9a6a31c8d [megatron] feat: fused kernel lightweight (#2210)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Integrate @Jianbing-D's fused kernel into the Megatron side. The amount
of memory saved may need further checking.

### Test

Fused kernel e2e tests

#### Alignment

<img width="780" alt="image"
src="https://github.com/user-attachments/assets/b6929f6d-f98d-49a8-a714-2627f2cb7264"
/>

#### Performance

The gray line is the run without the fused kernel.

<img width="384" alt="image"
src="https://github.com/user-attachments/assets/8b19e227-6450-4300-9bf4-0ba6a07cbab0"
/>

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: BlueSpace <gaoziyuan19@mails.ucas.ac.cn>
2025-07-11 15:55:41 +08:00
ada82bb719 [doc] feat: update documentation of nsight profiling (#2470)
### What does this PR do?

Update Nsight profiling documentation accordingly

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

Merge the per-role configs `actor_rollout_ref.{actor,ref,rollout}.profiler`
into a single `actor_rollout_ref.profiler`.
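
As a rough illustration of this merge, the sketch below hoists per-role `profiler` blocks into one shared block on a plain nested dict. The `hoist_profiler` helper and the `discrete` / `ranks` keys are placeholders chosen for illustration; they are not verl's actual schema or migration code.

```python
from copy import deepcopy

ROLES = ("actor", "ref", "rollout")


def hoist_profiler(cfg: dict) -> dict:
    """Return a copy of cfg with per-role profiler blocks merged into one."""
    out = deepcopy(cfg)  # leave the caller's config untouched
    section = out["actor_rollout_ref"]
    shared = dict(section.get("profiler", {}))
    for role in ROLES:
        # Remove the per-role block if present; a missing role is skipped.
        per_role = section.get(role, {}).pop("profiler", None)
        if per_role:
            for key, value in per_role.items():
                # On conflicting keys, the first value seen is kept.
                shared.setdefault(key, value)
    section["profiler"] = shared
    return out


# Hypothetical old-style config with redundant per-role profiler blocks.
old_cfg = {
    "actor_rollout_ref": {
        "actor": {"profiler": {"discrete": False, "ranks": [0]}},
        "ref": {"profiler": {"discrete": False}},
        "rollout": {"name": "sglang"},
    }
}
new_cfg = hoist_profiler(old_cfg)
```

After the call, `new_cfg["actor_rollout_ref"]["profiler"]` holds the merged settings and the individual roles no longer carry their own `profiler` entry.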

### Design & Code Changes

Only documentation update

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-11 11:38:28 +08:00
1dfc1359da [perf] feat: add range tag to start/stop profile; clean actor_rollout_ref.profiler (#2456)
### What does this PR do?

The cost of starting and stopping the profiler on workers is not
negligible: it leaves large gaps between steps in the trace. This PR
adds range tags to the start/stop calls so that these gaps are clearly
attributed.

In addition, `actor_rollout_ref` needs only one `profiler` config, so
the redundant per-role configs are removed.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.


### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-11 10:12:56 +08:00
49fe461fb8 [doc] chore: add documentation for truncation: middle option (#2462)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

The RLHF dataset gained a `middle` truncation option (see PR #1488),
but it was added without annotations and is not mentioned in the docs.
This pull request adds the annotations.
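
For readers unfamiliar with the idea, here is a minimal sketch of what middle truncation usually means: keep the head and the tail of an over-long sequence and drop the tokens in between. The `truncate_middle` function and the even head/tail split are assumptions made for illustration; the actual behavior in the dataset code may differ in detail.

```python
def truncate_middle(token_ids: list[int], max_length: int) -> list[int]:
    """Keep the first and last tokens of a sequence, dropping the middle."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if len(token_ids) <= max_length:
        return list(token_ids)  # already short enough, nothing to drop
    left = max_length // 2      # tokens kept from the start
    right = max_length - left   # tokens kept from the end (>= 1)
    return token_ids[:left] + token_ids[-right:]


prompt = list(range(10))
shortened = truncate_middle(prompt, 4)
```

Compared with left or right truncation, this keeps both the opening instructions and the final question of a long prompt, at the cost of losing context in between.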

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Only docs and function comments update, no test needed.

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: ChitandaErumanga <yinyuqi0001@163.com>
2025-07-11 08:11:53 +08:00
01624c6da7 [doc] fix: colocation documentation updates (#2465)
### What does this PR do?

Update docs and awesome work.
2025-07-11 08:11:17 +08:00
de38ed4218 [env] feat: upgrade tensordict version (#2460)
### What does this PR do?

Upgrade tensordict to latest

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-10 16:12:02 -07:00
a8d9d25574 [misc] feat: add py.typed file to verl/ (#2467)
### What does this PR do?

Adds a [PEP 561](https://peps.python.org/pep-0561/) marker file to
declare that verl ships type information. Now, when I type-check a
package that imports `verl`, I no longer have to add `# type: ignore`
after `import verl`.
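
A minimal sketch of what the marker amounts to, using a throwaway package in a temporary directory: PEP 561 only requires an empty file named `py.typed` at the package root (and that the file is shipped with the package). The `add_py_typed` helper and the `mypkg` name are made up for this example.

```python
import tempfile
from pathlib import Path


def add_py_typed(package_dir: Path) -> Path:
    """Create the (empty) PEP 561 marker file inside a package directory."""
    marker = package_dir / "py.typed"
    marker.touch(exist_ok=True)
    return marker


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "mypkg"  # stand-in for a real package such as verl/
    pkg.mkdir()
    (pkg / "__init__.py").write_text(
        "def double(x: int) -> int:\n    return 2 * x\n"
    )
    marker = add_py_typed(pkg)
    has_marker = marker.is_file()
    marker_size = marker.stat().st_size  # the marker carries no content
```

Type checkers that follow PEP 561 look for this file to decide whether to trust the inline annotations of an installed package.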

 
 

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Co-authored-by: Fred <frederrx@amazon.com>
2025-07-10 16:11:23 -07:00
1f3f0a5309 [misc] fix: add *.yaml to pyproject due to modular config (#2468)
### What does this PR do?

Add all YAML files under the config directories to the wheel build.
Because config loading is now modular, these files must be shipped in
the wheel to avoid loading failures.
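
One way to check a fix like this is to list the YAML files inside a built wheel, since a wheel is an ordinary zip archive. The sketch below runs against a fake wheel created on the fly; `yaml_files_in_wheel` and the `demo` package layout are invented for illustration and do not reflect verl's real file tree.

```python
import tempfile
import zipfile
from pathlib import Path


def yaml_files_in_wheel(wheel_path: Path) -> list[str]:
    """Return the sorted paths of all YAML files packaged in a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name for name in wheel.namelist()
            if name.endswith((".yaml", ".yml"))
        )


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny stand-in wheel instead of a real one.
    fake_wheel = Path(tmp) / "demo-0.0.0-py3-none-any.whl"
    with zipfile.ZipFile(fake_wheel, "w") as zf:
        zf.writestr("demo/__init__.py", "")
        zf.writestr("demo/config/trainer.yaml", "a: 1\n")
        zf.writestr("demo/config/actor/base.yaml", "b: 2\n")
    packaged = yaml_files_in_wheel(fake_wheel)
```

Pointing the same function at a real wheel from `dist/` would show whether nested config files made it into the archive.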

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-10 16:10:32 -07:00
269bb4a4bc [doc] chore: add ICML meetup and upcoming feat (#2431)
### What does this PR do?

Update readme
2025-07-10 21:49:51 +08:00
7b523663e3 [hardware] fix: enable sleep mode on ASCEND NPU (#2459)
### What does this PR do?

We found an OOM issue when running the Qwen2.5-VL model. In the current
version, `actor_rollout_ref.rollout.free_cache_engine=True` must be set
to enable sleep mode.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-10 20:37:31 +08:00
c26b0f2906 [misc] refactor: Replace deepcopy with tensor.clone (#2442)
### What does this PR do?

Optimize tensor copying in `MegatronPPOActor` by replacing
`copy.deepcopy` with `torch.Tensor.clone`, which should improve
performance slightly.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-10 15:18:34 +08:00
fc8acdc607 [cfg] refactor: split fsdp/megatron specific configs, consolidate shared ones for reward_model and critic (#2433) 2025-07-09 22:00:39 -07:00
ab11fff33d [trainer, data] feat: Dynamic Data Generation (#2312)
### What does this PR do?

Add interface to support dynamic data generation which will allow us to
create new tasks between each step of training.

To elaborate, this PR is refactoring the code and providing an interface
to make it easier to implement other dynamic data generation algorithms.
In particular, we want to have the model propose new tasks based on
which tasks currently do or don't succeed. This has been shown to be
useful for webtasks and reasoning: https://arxiv.org/pdf/2506.14205,
https://openreview.net/pdf?id=oVKEAFjEqv,
https://arxiv.org/abs/2502.06776, https://arxiv.org/pdf/2505.03335.

Basic example that could be useful: 

Imagine wanting to generate variations on the hardest tasks for the
current training loop. We implement this as a custom data generator
that makes an LLM API call, followed by a custom sampler that selects
the desirable datapoints as they are generated.
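
The generator-plus-sampler idea can be sketched roughly as follows. Everything here is hypothetical: the `DataGen` base class, its `generate` signature, the `success_rate` field, and the `vary` stub that stands in for the LLM API call are assumptions for illustration, not the interface added by this PR.

```python
from abc import ABC, abstractmethod
from typing import Callable

Datapoint = dict  # assumed record shape: {"prompt": str, "success_rate": float}


class DataGen(ABC):
    """Hypothetical interface: propose new datapoints between training steps."""

    @abstractmethod
    def generate(self, dataset: list[Datapoint]) -> list[Datapoint]:
        ...


class NoOpDataGen(DataGen):
    """Re-appends the first datapoint, so the dataset grows by one."""

    def generate(self, dataset: list[Datapoint]) -> list[Datapoint]:
        return dataset[:1]


class HardestTaskDataGen(DataGen):
    """Proposes variations of the k tasks with the lowest success rate."""

    def __init__(self, propose: Callable[[Datapoint], Datapoint], k: int = 1):
        self.propose = propose  # stand-in for an LLM API call
        self.k = k

    def generate(self, dataset: list[Datapoint]) -> list[Datapoint]:
        hardest = sorted(dataset, key=lambda d: d["success_rate"])[: self.k]
        return [self.propose(d) for d in hardest]


def select_unseen(candidates: list[Datapoint], dataset: list[Datapoint]) -> list[Datapoint]:
    """Toy sampler: keep only candidates whose prompt is not already present."""
    seen = {d["prompt"] for d in dataset}
    return [c for c in candidates if c["prompt"] not in seen]


def vary(d: Datapoint) -> Datapoint:
    # Placeholder for the model-proposed variation of a task.
    return {"prompt": d["prompt"] + " (variation)", "success_rate": 0.0}


dataset = [
    {"prompt": "easy", "success_rate": 0.9},
    {"prompt": "hard", "success_rate": 0.1},
]
new_points = select_unseen(HardestTaskDataGen(vary).generate(dataset), dataset)
dataset_after = dataset + new_points
```

Separating proposal (`generate`) from selection (`select_unseen`) keeps the expensive generation step independent of whatever filtering policy the training loop wants to apply.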

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: is:pr
is:open data generation
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

`bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh`

more details in Usage Example section below.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

1. Change the YAML to enable the data generator:
```
--- a/verl/trainer/config/ppo_trainer.yaml
+++ b/verl/trainer/config/ppo_trainer.yaml
@@ -93,11 +93,11 @@ data:
 
     # The path to the file containing your customized data generation class.
     # E.g. 'verl.utils.dataset.datagen'
-    path: null 
+    path: 'verl.utils.dataset.datagen'
 
     # The class name of the data generation class within the specified file.
     # E.g. NoOpDataGen
-    name: null 
+    name: 'NoOpDataGen'
```

The no-op data generator simply re-appends the first datapoint to the
end of the dataset. You can verify this by printing the size of the
dataset after each step:

```
(TaskRunner pid=71298) step:0 - val-core/openai/gsm8k/reward/mean@1:0.668
(TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset.
Training Progress:   0%|          | 0/435 [00:00<?, ?it/s]
(WorkerDict pid=74307) /workplace/rl_workspace/src/AGIEmergeRL/vendor/verl_2/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.) [repeated 3x across cluster]
(WorkerDict pid=74307)   tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device) [repeated 3x across cluster]
(TaskRunner pid=71298) filter dataset len: 1
(TaskRunner pid=71298) new dataset len: 7474
(TaskRunner pid=71298) 
Filtering prompts longer than 1024 tokens: 100%|██████████| 1/1 [00:00<00:00, 165.34 examples/s]
(TaskRunner pid=71298) 7474
(TaskRunner pid=71298) step:1 - global_seqlen/min:88786.000 - global_seqlen/max:101138.000 - global_seqlen/minmax_diff:12352.000 - global_seqlen/balanced_min:94905.000 - global_seqlen/balanced_max:94905.000 - global_seqlen/mean:94905.000 - actor/entropy:0.361 - actor/kl_loss:0.002 - actor/kl_coef:0.001 - actor/pg_loss:0.022 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.301 - perf/mfu/actor:0.107 - perf/max_memory_allocated_gb:7.201 - perf/max_memory_reserved_gb:12.896 - perf/cpu_memory_used_gb:57.490 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.677 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.677 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.021 - critic/advantages/max:1.500 - critic/advantages/min:-1.500 - critic/returns/mean:-0.021 - critic/returns/max:1.500 - critic/returns/min:-1.500 - response_length/mean:376.086 - response_length/max:1024.000 - response_length/min:58.000 - response_length/clip_ratio:0.020 - prompt_length/mean:365.359 - prompt_length/max:459.000 - prompt_length/min:327.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:44.158 - timing_s/reshard:2.658 - timing_s/gen:47.161 - timing_s/reward:0.423 - timing_s/old_log_prob:15.347 - timing_s/ref:28.668 - timing_s/adv:0.039 - timing_s/update_actor:60.185 - timing_s/step:151.945 - timing_per_token_ms/gen:0.122 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.038 - timing_per_token_ms/update_actor:0.079 - perf/total_num_tokens:759240.000 - perf/time_per_step:151.945 - perf/throughput:624.599
(TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset.
Training Progress:   0%|          | 1/435 [02:32<18:24:31, 152.70s/it]
(TaskRunner pid=71298) filter dataset len: 1
(TaskRunner pid=71298) new dataset len: 7475
```

Note that the original dataset length is 7473 for `gsm8k_w_tool`.
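
The growth seen in the log (7473, then 7474, then 7475) can be reproduced with a toy stand-in. The class names below are hypothetical and the dataset is a plain list rather than an `RLHFDataset`; the sketch only mirrors the described behavior of re-appending the first datapoint.

```python
from abc import ABC, abstractmethod


class DataGenSketch(ABC):
    """Hypothetical stand-in for an abstract data generation interface."""

    @abstractmethod
    def generate(self, dataset: list[dict]) -> list[dict]:
        """Return new rows to append to the dataset."""


class NoOpSketch(DataGenSketch):
    def generate(self, dataset: list[dict]) -> list[dict]:
        # Re-append a copy of the first datapoint, as the no-op generator is described to do.
        return [dict(dataset[0])]


dataset = [{"prompt": f"question {i}"} for i in range(7473)]
generator = NoOpSketch()
sizes = []
for _step in range(2):
    dataset.extend(generator.generate(dataset))
    sizes.append(len(dataset))

print(sizes)  # [7474, 7475]
```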



### High-Level Design

> Demonstrate the high-level design if this PR is complex.

n/a


### Specific Changes

> List the specific changes.

- Add an abstract datagen class that's used in `ray_trainer.py` to add
data to the dataset
- Refactor filtering out of `_read_files_and_tokenize` in `RLHFDataset`
- Add `append_dataframe` to `RLHFDataset`
- Add a util for getting a type from a file.
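
As a rough illustration of how a `path` and `name` pair from the config could be resolved to a class, here is a minimal sketch. It assumes `path` is a dotted module path; `load_class` is a hypothetical helper, not the util added in this PR.

```python
import importlib


def load_class(module_path: str, class_name: str) -> type:
    """Import `module_path` and return the class called `class_name` from it."""
    module = importlib.import_module(module_path)
    obj = getattr(module, class_name)
    if not isinstance(obj, type):
        raise TypeError(f"{module_path}.{class_name} is not a class")
    return obj


# Demonstrated with a standard-library class so the snippet runs anywhere.
ordered_dict_cls = load_class("collections", "OrderedDict")
```

With the config shown earlier, the equivalent call would take `'verl.utils.dataset.datagen'` and `'NoOpDataGen'` as its two arguments.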

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-07-09 13:25:28 -07:00
b3aed0d6c3 [sglang] fix: Fix qwen2vl weight keys issue (#2434)
### What does this PR do?
Reapply https://github.com/volcengine/verl/pull/1880

An earlier PR, https://github.com/volcengine/verl/pull/2365, accidentally
removed the weight key conversion.

##### Why wasn't it caught by CI?

Because all CI runs are based on transformers 4.51, while the issue only
occurs with transformers 4.52.



### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-09 10:24:52 -07:00
526098d664 [Hardware] feat: Support AMD (ROCMm Kernel) - Update Dockerfile/Docker Image (#2390)
### What does this PR do?

> Update Dockerfile/Docker Image

### Checklist Before Starting
- [X] Search for similar PRs. 
- [X] Format the PR title (This will be checked by the CI)

### Test
>  Done

### API and Usage Example

>  Usage example(s) 

[AMD tutorial](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst).


### Design & Code Changes

> Dockerfile/Docker image dependencies:

- ROCm: 6.3.4 (patch version)
- PyTorch: 2.7.0
- vllm: >=0.8.5
- sglang: >=v0.4.6.post4
- megatron-lm: TransformerEngine==1.14.0, megatron-core==0.12.0
- Ray: >=2.45

Also allows VLM training.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-09 10:05:43 -07:00
b5e711eab5 [perf] feat: add npu profiler for FSDP backend (#2194)
### What does this PR do?

Add verl profiling support for NPU on FSDP backend

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

There should be no functional or performance changes.

### API and Usage Example

Add `verl.utils.profiler.mstx_profile`, which implements the
`verl.utils.profiler.profile` interface when `torch_npu` is available.
 
### High-Level Design

This PR references the design of Nsight Systems profiling and implements
`mstx_profile` using the torch_npu interface to enable data collection
on NPU devices.

### Specific Changes

`verl.utils.profiler.mstx_profile` implements the general profiling
interface in `verl.utils.profiler.profile`
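
The idea of selecting a profiling backend based on whether `torch_npu` is installed can be sketched as below. This is only an illustration of the dispatch pattern: `npu_available` and `profile_range` are hypothetical names, the `"mstx"` value is just a label, and no real `torch_npu` API is called.

```python
import contextlib
import importlib.util
import time


def npu_available() -> bool:
    # Checks only whether the torch_npu package can be found; it is not imported.
    return importlib.util.find_spec("torch_npu") is not None


@contextlib.contextmanager
def profile_range(name: str, records: list):
    """Hypothetical range marker that records the chosen backend and elapsed time."""
    backend = "mstx" if npu_available() else "fallback"
    start = time.perf_counter()
    try:
        yield
    finally:
        records.append((name, backend, time.perf_counter() - start))


records = []
with profile_range("update_actor", records):
    sum(range(1000))
```

Keeping the call site identical across backends is what lets the rest of the training code stay unaware of the device it runs on.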

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-09 19:57:36 +08:00
cccc2ef2c9 [cfg] refactor: make the rollout & ref configs more modular (#2410)
### What does this PR do?

Move the rollout and ref configs to standalone files. cc @ETOgaosion

For dp_ref/rollout, default values are added to the YAML to cover the
case where `actor_rollout_ref.actor` does not exist, so that the YAML
can be loaded independently.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Relying on existing tests.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-08 21:49:43 -07:00
ad33564f84 [sglang] fix: Bug in megatron+sglang TP16 update_weights. (#2336)
### What does this PR do?

> We observe the following when using Megatron + Sglang + TP16:
<img width="1236" alt="image"
src="https://github.com/user-attachments/assets/875d83e6-325a-41c4-b778-81b457b508a1"
/>

After investigation, we found that this was caused by the **cudaipc**
mechanism not supporting cross-machine access. This PR fixes the bug.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-08 21:07:09 -07:00
4def91d511 [data] refactor: move sampler api to experimental (#2381) 2025-07-09 09:56:04 +08:00
004da732d3 [rollout] fix: huggingface model config max_position_embeddings assertion for model with extended context length (#737)
### What does this PR do?

Fix the Hugging Face config `max_position_embeddings` assertion error
when using the YaRN rope type.
When a rotary position embedding scaling method is used to extend a
model's context window, the effective `max_position_embeddings` should
be calculated using the scaling factor provided in the config.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Specific Changes

```python
if not model_hf_config.rope_scaling:
    ...
else:
    # Context-extension methods scale the usable context window by `factor`.
    rope_scaling_factor = model_hf_config.rope_scaling.get("factor", 1.0)

    assert model_hf_config.max_position_embeddings * rope_scaling_factor >= config.prompt_length + config.response_length, (
        f"model context length should be greater than total sequence length, got rope_scaling_factor={rope_scaling_factor}"
    )
```
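
A self-contained worked example of the same check follows. The helper names and the numbers are illustrative assumptions, and `rope_scaling` is modeled as a plain dictionary rather than a Hugging Face config object.

```python
from typing import Optional


def effective_context_length(max_position_embeddings: int, rope_scaling: Optional[dict]) -> int:
    """Scale the base context window by the rope scaling factor, if one is configured."""
    factor = rope_scaling.get("factor", 1.0) if rope_scaling else 1.0
    return int(max_position_embeddings * factor)


def fits(max_position_embeddings: int, rope_scaling: Optional[dict],
         prompt_length: int, response_length: int) -> bool:
    """Return True when prompt plus response fit in the effective context window."""
    return effective_context_length(max_position_embeddings, rope_scaling) >= prompt_length + response_length


# Illustrative numbers: a 32768-token base window and a 38000-token total sequence.
without_scaling = fits(32768, None, prompt_length=30000, response_length=8000)
with_yarn = fits(32768, {"rope_type": "yarn", "factor": 4.0}, prompt_length=30000, response_length=8000)
```

Without scaling the check fails because 32768 < 38000. With a factor of 4.0 the effective window is 131072 tokens, so the same sequence lengths pass.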

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. New test case: A simple test case to reproduce
the error of failed assertion when the model is extended using yarn.
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: HL <linhaibin.eric@gmail.com>
Co-authored-by: shi yaorui <shiyaorui@dp.tech>
2025-07-08 17:14:23 -07:00
a4033afb45 [ci] feat: add docstring checker script and comprehensive docstrings (#2378)
### What does this PR do?

Docstrings are now enforced for the following files. We may expand the
list further in the future.
```
        "verl/trainer/ppo/ray_trainer.py",
        "verl/trainer/main_ppo.py", 
        "verl/trainer/ppo/reward.py",
        "verl/utils/reward_score/__init__.py",
        "verl/trainer/ppo/core_algos.py",
        "verl/experimental/agent_loop/agent_loop.py",
        "verl/workers/sharding_manager/fsdp_vllm.py",
        "verl/workers/sharding_manager/fsdp_ulysses.py"
```
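
For readers unfamiliar with how such a check can work, here is a minimal sketch based on the standard `ast` module. It is not the checker script added in this PR; `missing_docstrings` is a hypothetical helper, and skipping underscore-prefixed names is an assumption.

```python
import ast


def missing_docstrings(source: str) -> list[str]:
    """Return 'name (line N)' for each public function or class without a docstring."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Assumption: names starting with an underscore are treated as private and skipped.
            if not node.name.startswith("_") and ast.get_docstring(node) is None:
                found.append((node.lineno, node.name))
    return [f"{name} (line {lineno})" for lineno, name in sorted(found)]


sample = '''
def documented():
    """Has a docstring."""

def undocumented():
    return 1

class Thing:
    def method(self):
        pass
'''

report = missing_docstrings(sample)
```

A CI job could run a function like this over each enforced file and fail when the report is non-empty.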

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-07-09 05:48:48 +08:00
588f9728f3 [ci] fix: forbid ci on forks (#2412)
### What does this PR do?
Verl does not currently prohibit forks from triggering CI tasks.
Normally, tasks targeting regular runners disappear because the fork
has no runner with the same name, but this is not the case for mlp
runners. Although CI tasks from forks fail authentication during
subsequent execution, they still generate a large number of requests
to mlp. For this reason, we have set these workflows to not run on forks.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-08 19:16:29 +08:00
ec4433cd5d [misc] feat: trace rollout generation and tool calls using weave (#2345)
### What does this PR do?

Provide details of rollout generation and tool calls in wandb weave to
help debug agentic RL.

Two new interfaces:
1. `rollout_trace_attr` context manager: marks the sample_index, step,
rollout_n, and experience name for a trajectory.
2. `rollout_trace_op` decorator: marks a method to trace. It must be a
method of an instance.


Related issue: https://github.com/volcengine/verl/issues/2188

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

<img width="1910" alt="Screenshot 2025-07-03 4:09:58 PM"
src="https://github.com/user-attachments/assets/ff30bbca-f9c8-434f-a3c2-0e333d16fa68"
/>


<img width="1895" alt="Screenshot 2025-07-03 4:11:27 PM"
src="https://github.com/user-attachments/assets/0b9ed8db-58a7-4769-88fb-bda204dc9fc8"
/>



### API and Usage Example
Options:
- `+trainer.rollout_trace.backend=weave`: only wandb weave is supported
in this PR. Support for other tracing tools is left to the community.
- `+trainer.rollout_trace.token2text=False`: whether to append the
decoded text to the result of the `run` method.

### High-Level Design

n/a

### Specific Changes

This only works for async rollout from the agent loop. It has no effect
on sync rollout.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-08 17:17:46 +08:00
578501e2f8 [sglang] fix: Import Error in the latest sglang (#2275)
### What does this PR do?

The following import is not supported in sglang >= 0.4.8:
```
from sglang.srt.openai_api.protocol import Tool
```
https://github.com/sgl-project/sglang/releases/tag/v0.4.8:
> The `sglang/srt/openai_api` directory has been removed and replaced
with `sglang/srt/entrypoints/openai`.

So it is replaced with:
```
try:
    from sglang.srt.entrypoints.openai.protocol import Tool
except ImportError:
    from sglang.srt.openai_api.protocol import Tool
```

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
2025-07-07 20:07:15 -07:00
ee6542248b [sglang] fix: only wake up weights on infer_tp 0 (#2403) 2025-07-07 13:24:56 -07:00
3f929af747 [cfg] refactor: make actor config more modular (#2379) 2025-07-08 00:22:03 +08:00
1e7c545eef [tool] fix: Add MCP usage documentation (#2261) 2025-07-07 08:21:32 -07:00
cb3dcc6f2c [ci] feat: use action (#2393)
### What does this PR do?
https://github.com/volcengine/verl/pull/1979 migrated CI tasks to
vemlp. However, the setup and cleanup steps in that early version
exposed too much procedural code, which we have now encapsulated in
https://github.com/volcengine/vemlp-github-runner. For specific usage,
refer to the documentation in `.github/workflows/README.md` of this PR.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-07 17:17:19 +08:00
26e26d1e91 [sglang, rollout, doc] fix: update sglang rollout generate doc (#2385)
### What does this PR do?

This PR updates the documentation for the sglang rollout’s
`generate_sequences` by separating single-turn and multi-turn
explanations for improved readability.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-07 16:10:18 +08:00
26d3a03b02 [misc] refactor: replace pkg_resources with importlib.metadata (#2392)
### What does this PR do?

- `pkg_resources` is deprecated and will be removed as early as
2025-11-30. This patch switches to `importlib.metadata` to avoid future
compatibility issues and suppress warnings.
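
A minimal sketch of what such a migration typically looks like, assuming the old code only queried installed package versions. The helper name `get_installed_version` is hypothetical and is not taken from the verl codebase.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def get_installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Old style (deprecated): pkg_resources.get_distribution(package_name).version
    New style (stdlib):     importlib.metadata.version(package_name)
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        # importlib.metadata raises this when the distribution is not installed.
        return None
```

Because `importlib.metadata` ships with the standard library (Python 3.8+), this lookup does not need setuptools at runtime.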

### Checklist Before Starting

- [X] Search for similar PRs.
- [X] Format the PR title as `[{modules}] {type}: {description}`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-07 15:02:04 +08:00
fc35956543 [BREAKING][rollout] feat: repeat DataProto when n>1 in driver instead of rollout workers (#2324)
### What does this PR do?

Before this PR, when `generate_sequences` was called with sampling
parameter `n>1`, the point at which DataProto was repeated differed
between code paths:
- validation: DataProto is repeated by `n` in the driver, then chunked and
dispatched to rollout workers.
- training
  - batch mode: DataProto is chunked and dispatched to rollout workers,
then repeated in the rollout workers.
  - server mode: DataProto is repeated by `n` in the driver, then chunked and
dispatched to rollout workers.

In batch mode, the `chunk-dispatch-repeat` pattern restricts GRPO
training where we have more GPUs than batch_size. For example,
`batch_size=128, n=16, world_size=256`:
- `chunk-dispatch-repeat`: DataProto(batch_size=128) can't be chunked to
256 shards.
- `repeat-chunk-dispatch`: after repeat, DataProto(batch_size=2048) can
be successfully chunked.

After this PR, DataProto is always repeated in the driver, whether for
validation or training, and in both batch mode and server mode.
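
The arithmetic behind the two orderings can be sketched with plain Python lists standing in for DataProto. The helpers `repeat_interleave` and `chunk` are hypothetical stand-ins, and the sketch assumes that chunking requires the batch to divide evenly across workers and that repeated copies of a prompt are kept adjacent.

```python
from typing import List


def repeat_interleave(batch: List[str], n: int) -> List[str]:
    """Repeat every item n times, keeping copies of one prompt adjacent."""
    return [item for item in batch for _ in range(n)]


def chunk(batch: List[str], num_chunks: int) -> List[List[str]]:
    """Split a batch into equal shards; fail if it cannot be split evenly."""
    if len(batch) % num_chunks != 0:
        raise ValueError(
            f"batch of size {len(batch)} cannot be split into {num_chunks} equal shards"
        )
    size = len(batch) // num_chunks
    return [batch[i * size:(i + 1) * size] for i in range(num_chunks)]


batch_size, n, world_size = 128, 16, 256
prompts = [f"prompt-{i}" for i in range(batch_size)]

# chunk-dispatch-repeat: chunking comes first and fails, since 128 < 256.
try:
    chunk(prompts, world_size)
    chunk_first_ok = True
except ValueError:
    chunk_first_ok = False

# repeat-chunk-dispatch: 128 * 16 = 2048 items split into 256 shards of 8.
shards = chunk(repeat_interleave(prompts, n), world_size)
```

With these numbers each worker receives 8 samples, and the 16 copies of a given prompt span two adjacent workers.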

> [!IMPORTANT]
> This change breaks almost all recipes and projects using verl GRPO as
submodules.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-07-07 14:57:01 +08:00
4c37c97495 [rollout] fix: sglang async fail with Multi-stage Awake feature (#2365)
### What does this PR do?

Fix a regression from https://github.com/volcengine/verl/pull/1911,
which did not update the sglang async branch.

CI did not catch this error because it only ran 1 step, while the error
happens in the second step. The test cases are therefore updated to run
2 steps.

To reproduce the bug, run:
`TOTAL_TRAIN_STEPS=2 ENGINE=sglang ROLLOUT_MODE=async bash tests/special_e2e/ppo_trainer/run_function_reward.sh`

It fails with:
```
(WorkerDict pid=1257286) Total steps: 2, num_warmup_steps: 0
(WorkerDict pid=1257286) Actor use_remove_padding=True
(WorkerDict pid=1257286) Actor use_fused_kernels=False
(AsyncSglangServer pid=1260392) FastAPI listen on 192.168.111.48:40451
(WorkerDict pid=1257286) terminate called after throwing an instance of 'c10::Error'
(WorkerDict pid=1257286)   what():  CUDA error: an illegal memory access was encountered
(WorkerDict pid=1257286) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(WorkerDict pid=1257286) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(WorkerDict pid=1257286) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(WorkerDict pid=1257286)
(WorkerDict pid=1257286) Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
(WorkerDict pid=1257286) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fbf6036c1b6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
(WorkerDict pid=1257286) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fbf60315a76 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
(WorkerDict pid=1257286) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fbf6080d918 in
```



### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20an%20illegal%20memory%20access%20was%20encountered
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test
```
(TaskRunner pid=1647269) step:2 - global_seqlen/min:13075 - global_seqlen/max:14837 - global_seqlen/minmax_diff:1762 - global_seqlen/balanced_min:14231 - global_seqlen/balanced_max:14232 - global_seqlen/mean:14231.5 - actor/entropy:2.0606913566589355 - critic/vf_loss:8.7157882153
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-07 13:56:07 +08:00
5cbad83792 [trainer] fix: Use safe masked mean/sum to handle NaN values outside the mask (#2377)
### What does this PR do?

- For numerical stability, handle NaN values outside the mask when
calculating `masked_mean` and `masked_sum`.
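
A minimal pure-Python sketch of why this matters, assuming the unsafe variant multiplies values by the mask before summing. Since `NaN * 0` is still NaN, a NaN at a masked-out position poisons the result unless that position is replaced by zero first. The function names are hypothetical; a tensor implementation would typically use a `where`-style select instead of the conditional expression.

```python
import math
from typing import Sequence


def naive_masked_sum(values: Sequence[float], mask: Sequence[int]) -> float:
    """Multiply-then-sum: a NaN outside the mask still propagates (NaN * 0 = NaN)."""
    return sum(v * m for v, m in zip(values, mask))


def safe_masked_sum(values: Sequence[float], mask: Sequence[int]) -> float:
    """Select-then-sum: masked-out positions contribute exactly 0.0."""
    return sum(v if m else 0.0 for v, m in zip(values, mask))


def safe_masked_mean(values: Sequence[float], mask: Sequence[int]) -> float:
    """Mean over the positions where the mask is set."""
    count = sum(1 for m in mask if m)
    if count == 0:
        raise ValueError("mask selects no elements")
    return safe_masked_sum(values, mask) / count


values = [1.0, math.nan, 3.0]
mask = [1, 0, 1]  # the NaN sits at a masked-out position

naive_result = naive_masked_sum(values, mask)
safe_sum = safe_masked_sum(values, mask)
safe_mean = safe_masked_mean(values, mask)
```

Here `naive_result` is NaN, while the safe variants only see the two valid entries.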

> We are from the Large Model Post-Training Team of 📕 Xiaohongshu's AI
Platform Technology Department, dedicated to developing
high-performance, easily scalable distributed post-training engines.


### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-07 07:45:29 +08:00
891c873827 [sglang, rollout] refactor: use torch.Tensor in async rollout schemas (#2362) 2025-07-06 13:15:35 -07:00
2a01b21331 [ci] fix: PR title check supports module names with underscore (training_utils) (#2383) 2025-07-06 07:32:54 -07:00
c71fa392c1 [doc] feat: add July events (#2382) 2025-07-06 15:17:42 +08:00
281ecd4cc1 [doc] fix: Fix document config.rst (#2369)
### What does this PR do?

> Fix a typo in the `config.rst` document: the parameter “gemma” -> “gamma”.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pull/2322
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-05 09:26:42 -07:00
e9b38dc382 Revert "[misc] fix: invalid escape sequence '\*'" (#2376)
Reverts volcengine/verl#2375
2025-07-05 21:08:40 +08:00
cbeb3f4dae [rollout] fix: fix hf rollout and add single gpu test (#2371) 2025-07-05 18:51:26 +08:00
50ba712dee [misc] fix: invalid escape sequence '\*' (#2375)
### What does this PR do?

```log
verl/utils/dataset/rl_dataset.py:38: SyntaxWarning: invalid escape sequence '\*'
```

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-07-05 18:49:15 +08:00
9cc307767b [ray] refactor: Seperate the constants into different file (#2025)
### What does this PR do?

Move the Ray runtime env constants into a separate file to clean up the
code.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
2025-07-04 18:50:44 -07:00
9b0e327ecd [doc] fix: add show source option (#2370)
### What does this PR do?

Enable API docs with [source code] option
2025-07-04 17:20:12 -07:00
715724c88f [tool] feat: Add support for tools that generate multimodal data (#2146)
### What does this PR do?

This PR adds support for tools to create and return multimodal data
(images and videos) during rollout. It enhances the framework to
properly handle multimodal inputs that are dynamically generated by
tools during multi-turn conversations.

### Key Features

- Tools can now return images and videos as part of their response
- Added support for processing multimodal inputs in the rollout system
- Introduced a new configuration option `return_multi_modal_inputs` to
control how multimodal inputs are processed
- Updated documentation with examples of how to implement tools that
generate multimodal data

### API and Usage Example

```python
async def execute(self, ...) -> Tuple[str | Dict[str, Any], float, dict]:
    # Process images or videos
    from verl.utils.dataset.vision_utils import process_image, process_video

    img1 = process_image(img1)
    video1 = process_video(video1)

    # Return multimodal data
    return {"image": [img1, ...], "video": [video1, ...], "text": "..."}, 0, {}
```

In your dataset config, set:
```yaml
data:
  return_multi_modal_inputs: False
```

### Specific Changes

- Enhanced `AsyncRolloutRequest` to handle multimodal data from tools
- Updated `add_tool_response_messages` to process multimodal content
- Added documentation for multimodal tool support in the RST docs
- Fixed configuration in example YAML files
- Added proper handling of multimodal inputs in the rollout system

### Checklist Before Submitting

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] New CI unit test(s) are added to cover the code path.
- [X] Rely on existing unit tests on CI that covers the code path.
2025-07-04 16:32:22 -07:00
1b891dc0fb [cfg] fix: pickleing error in multiprocessing in the reward_fn (#2239)
### What does this PR do?

> Fix "Can't pickle local object" error when using custom reward
functions in multiprocessing


When I import `compute_score` from my own Python file, `file_path`
is not None, so the `wrapped_fn` function is used.
The relevant code can be viewed here:

e96f0fbf44/verl/trainer/ppo/reward.py (L25-L57)

When I set `reward_model.reward_manager=prime`, the reward manager uses a
`ProcessPoolExecutor` together with asyncio, leading to the error:

`"Can't pickle local object 'get_custom_reward_fn.<locals>.wrapped_fn'"`.

Root Cause:

The nested closure `wrapped_fn` created in `get_custom_reward_fn()` is
unpicklable for the following reasons:

- Python's `pickle` cannot serialize local functions (functions defined
inside another function).

- The closure dynamically captures variables (`raw_fn` and
`reward_kwargs`) from its outer scope.

This breaks multiprocessing workflows (e.g., `SubprocVecEnv`,
`multiprocessing.Pool`) that rely on pickling.
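
The failure mode, and one common workaround, can be sketched as follows. This is an illustration under assumptions, not necessarily how the PR fixes it: all function names below are hypothetical, and the workaround shown binds the arguments with `functools.partial` over a module-level function instead of defining a closure.

```python
import functools
import pickle


def make_closure_reward_fn(raw_fn, reward_kwargs):
    """Mimics the problematic pattern: returns a function defined locally."""

    def wrapped_fn(*args, **kwargs):
        return raw_fn(*args, **kwargs, **reward_kwargs)

    return wrapped_fn


def call_with_extra_kwargs(raw_fn, reward_kwargs, *args, **kwargs):
    """Module-level helper, so pickle can locate it by its qualified name."""
    return raw_fn(*args, **kwargs, **reward_kwargs)


def make_partial_reward_fn(raw_fn, reward_kwargs):
    """Workaround sketch: bind the arguments with functools.partial."""
    return functools.partial(call_with_extra_kwargs, raw_fn, reward_kwargs)


def can_pickle(obj) -> bool:
    """Report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def example_score(solution: str, ground_truth: str, bonus: float = 0.0) -> float:
    """Toy reward function used only for this illustration."""
    return float(solution == ground_truth) + bonus


closure_fn = make_closure_reward_fn(example_score, {"bonus": 0.5})
partial_fn = make_partial_reward_fn(example_score, {"bonus": 0.5})
```

Both callables behave identically when invoked, but only the closure is rejected by `pickle`, because its qualified name contains `<locals>`. The `partial` object pickles as long as the functions it references are importable at module level in the worker process.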

---------

Co-authored-by: zelongwang <wang@zelongs-MacBook-Pro.local>
Co-authored-by: H <linhaibin.eric@gmail.com>
2025-07-04 16:06:09 -07:00
dbd4ff189b [data] feat: add interface for user-defined curriculum sampler (#2314)
### What does this PR do?

This PR introduces a flexible interface that allows users to plug in
their own Sampler implementations. This is particularly useful for
advanced training strategies like curriculum learning, where the
sampling policy evolves over time to progressively present the model
with increasingly difficult tasks.

Curriculum learning can significantly accelerate training convergence
and improve generalization, especially in complex domains. By decoupling
the Sampler, users can implement task- or environment-specific
curricula—for instance, starting with simpler examples and gradually
incorporating harder ones, or adapting sampling based on the model’s
competence.
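
To illustrate the kind of user-defined sampler such an interface allows, here is a minimal sketch of a difficulty-capped curriculum. The class name, the `update` hook, and its parameters are hypothetical and do not reflect the actual verl interface; a real implementation would likely subclass `torch.utils.data.Sampler`.

```python
import random
from typing import Iterator, List


class DifficultyCurriculumSampler:
    """Hypothetical sampler: yields dataset indices whose difficulty is under a cap."""

    def __init__(self, difficulties: List[float], start_cap: float, step: float, seed: int = 0):
        self.difficulties = difficulties
        self.cap = start_cap
        self.step = step
        self.rng = random.Random(seed)

    def _eligible(self) -> List[int]:
        # Indices of examples that are currently easy enough to be sampled.
        return [i for i, d in enumerate(self.difficulties) if d <= self.cap]

    def __iter__(self) -> Iterator[int]:
        eligible = self._eligible()
        self.rng.shuffle(eligible)
        return iter(eligible)

    def __len__(self) -> int:
        return len(self._eligible())

    def update(self, mean_reward: float, threshold: float = 0.8) -> None:
        """Hypothetical hook: raise the cap once the model performs well enough."""
        if mean_reward >= threshold:
            self.cap += self.step


sampler = DifficultyCurriculumSampler([0.1, 0.5, 0.9], start_cap=0.2, step=0.4)
first_epoch = sorted(sampler)   # only the easiest example is eligible
sampler.update(mean_reward=0.9)  # good performance widens the curriculum
second_epoch = sorted(sampler)
```

The trainer would call the update hook between batches or epochs, so the sampling policy evolves with the model's competence.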

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...

There have been previous attempts to add specific samplers for
curriculum learning ( [search
1](https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+sampler)
[search
2](https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+curriculum)
) but they commit to specific implementations of the curriculum.

This PR just adds an interface so that users can supply their own
implementation. This approach was suggested in [this
comment](https://github.com/volcengine/verl/pull/759/files#r2030220009).

- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)



### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```bash
./examples/sglang_multiturn/run_qwen0.5b_gsm8k_multiturn_curriculum.sh
```

This is a run using the example sampler implementation. Here is the diff
of that script against an existing one, in case we decide to omit the
example runner from this PR.


```bash

diff ./examples/sglang_multiturn/run_qwen0.5b_gsm8k_multiturn_curriculum.sh ./examples/sglang_multiturn/run_qwen3-4b_gsm8k_multiturn.sh

15,17c15
<     data.curriculum.curriculum_class="RandomCurriculumSampler" \
<     data.curriculum.curriculum_class_path="verl.utils.dataset.curriculum_sampler" \
<     data.dataloader_num_workers=0 \
---
>     data.train_batch_size=256 \
20d17
<     data.train_batch_size=256 \
24c21
<     actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
---
>     actor_rollout_ref.model.path=Qwen/Qwen3-4B \


```

 

### Specific Changes

This PR exposes a new interface so that users can implement their own
`Sampler`. I also provide a trivial implementation of this interface,
`RandomCurriculumSampler`, as an example and for test purposes.

### Checklist Before Submitting



- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Fred <frederrx@amazon.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>
2025-07-04 10:53:31 -07:00
c936ec7d5c [trainer, cfg] feat: add BaseConfig for all dataclass configs. Introduce dataclass for algorithm related configs (#2147)
### What does this PR do?

This PR introduces a BaseConfig class that bridges dataclass and hydra's
DictConfig in the codebase. In this PR, the algorithm-related configs
and profiler-related configs are instantiated as dataclasses upfront for
both main_ppo and main_dapo. The config-related changes are expected to
be backward compatible (supporting the xx_config.get() API).

In addition, this PR moves the profiler-related files from
verl.utils.debug to verl.utils.profiler.xx.
`verl.utils.debug.performance.py` is kept for backward compatibility
and will be dropped in a later version.

Main principles:
- Users are not forced to use dataclass configs. All changes are
backward compatible.
- Dataclass configs are converted upfront on a per-entrypoint basis.
Here we target main_ppo.py and main_dapo.py; the other recipes'
entrypoints are left intact.
- The new dataclasses are intentionally frozen. Configs should
not be mutable. Whenever a new field value is needed, make a copy of
the config rather than mutating it.
- Whenever a dataclass config is introduced, we encourage adding simple
CPU-based unit tests for the basic functionality of functions that
rely on it (e.g. the GRPO advantage estimation in core_algorithm.py), and
updating the type annotations of the impacted functions.
- In the YAML file, the `_target_` field should be specified for dataclass
conversion, e.g. `_target_: verl.xxx.XXConfig`.
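
The combination of frozen dataclasses and dict-style access described in these principles can be sketched as follows. This is a simplified illustration, not the actual `BaseConfig` implementation: the class names are prefixed with `Sketch` to mark them as hypothetical, and the field defaults are illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass, fields, replace
from typing import Any


@dataclass(frozen=True)
class SketchBaseConfig:
    """Hypothetical base class adding DictConfig-like access to a frozen dataclass."""

    def _has_field(self, key: str) -> bool:
        return key in {f.name for f in fields(self)}

    def get(self, key: str, default: Any = None) -> Any:
        # Mirrors dict.get: a missing key yields the default instead of raising.
        return getattr(self, key) if self._has_field(key) else default

    def __getitem__(self, key: str) -> Any:
        if not self._has_field(key):
            raise KeyError(key)
        return getattr(self, key)


@dataclass(frozen=True)
class SketchKLControlConfig(SketchBaseConfig):
    type: str = "fixed"
    kl_coef: float = 0.001


cfg = SketchKLControlConfig()

# Frozen: direct assignment is rejected.
try:
    cfg.kl_coef = 0.5
    mutation_rejected = False
except FrozenInstanceError:
    mutation_rejected = True

# To change a value, create a modified copy instead of mutating.
new_cfg = replace(cfg, kl_coef=0.01)
```

Attribute access stays type-checked, existing `.get()` call sites keep working, and `dataclasses.replace` gives the copy-on-change behavior that frozen configs require.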

The PR is built on top of @liuzhenhai93 's contribution.

### Checklist Before Describing the Details

- [x] Searched for similar PR(s).
- [x] PR title is in the format of: `[modules] type: Title`
  - modules: `trainer, cfg`
  - type: `feat`

### Test

- Added comprehensive unit tests in
`tests/trainer/config/test_algorithm_config_on_cpu.py`,
`test_base_config_on_cpu.py`
- Tests cover dataclass creation, nested configuration handling,
backward compatibility, and integration with core algorithms
- All tests pass successfully, validating the functionality and
integration with existing code

### High-Level Design

The design introduces three dataclasses:
1. **`KLControlConfig`**: Handles KL control parameters (type, kl_coef,
horizon, target_kl)
2. **`PFPPOConfig`**: Manages preference feedback PPO parameters
(reweight_method, weight_pow)
3. **`AlgorithmConfig`**: Main algorithm configuration containing all
fields from the YAML config

The conversion uses the existing `verl.utils.omega_conf_to_dataclass`
utility to seamlessly convert from OmegaConf DictConfig to typed
dataclasses.
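
The idea of converting a config tree by following its `_target_` fields can be sketched with the standard library alone. This is not the implementation of `omega_conf_to_dataclass`: the function name `instantiate_from_target` is hypothetical, plain dicts stand in for `DictConfig`, and `types.SimpleNamespace` stands in for a real config dataclass so the example has no dependencies.

```python
import importlib
from typing import Any, Mapping


def instantiate_from_target(node: Any) -> Any:
    """Recursively build objects from mappings that carry a `_target_` key."""
    if not isinstance(node, Mapping):
        return node  # leaf value (number, string, list, ...)
    # Convert children first so nested configs become objects too.
    kwargs = {k: instantiate_from_target(v) for k, v in node.items() if k != "_target_"}
    target = node.get("_target_")
    if target is None:
        return kwargs  # a mapping without a target stays a plain dict
    module_name, _, class_name = target.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**kwargs)


# Assumption: SimpleNamespace replaces a class such as `verl.xxx.XXConfig`.
raw = {
    "_target_": "types.SimpleNamespace",
    "gamma": 1.0,
    "kl_ctrl": {"_target_": "types.SimpleNamespace", "kl_coef": 0.001},
}
algorithm_config = instantiate_from_target(raw)
```

After conversion, nested values are reachable by attribute, e.g. `algorithm_config.kl_ctrl.kl_coef`.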


### API and Usage Example

The API maintains backward compatibility while providing type-safe
access:

```python
# Before (DictConfig)
if config.algorithm.use_kl_in_reward:
    kl_penalty = config.algorithm.kl_penalty
    kl_coef = config.algorithm.kl_ctrl.get("kl_coef", 0.001)

# After (Dataclass) - Type-safe with IDE support
algorithm_config = omega_conf_to_dataclass(config.algorithm)
if algorithm_config.use_kl_in_reward:
    kl_penalty = algorithm_config.kl_penalty  # Type-safe access
    kl_coef = algorithm_config.kl_ctrl.kl_coef  # Nested config access

# Backward compatibility maintained
gamma = algorithm_config.get("gamma", 1.0)  # Still works


# Other cases (excerpt from a unittest.TestCase method, hence `self`)
from dataclasses import FrozenInstanceError

profiler_config = omega_conf_to_dataclass(config)
self.assertEqual(profiler_config.discrete, config.discrete)
self.assertEqual(profiler_config.all_ranks, config.all_ranks)
self.assertEqual(profiler_config.ranks, config.ranks)
assert isinstance(profiler_config, ProfilerConfig)
# Unknown attributes raise, while .get() and [] behave like DictConfig
with self.assertRaises(AttributeError):
    _ = profiler_config.non_existing_key
assert config.get("non_existing_key") == profiler_config.get("non_existing_key")
assert config.get("non_existing_key", 1) == profiler_config.get("non_existing_key", 1)
assert config["discrete"] == profiler_config["discrete"]

# Dataclass configs are frozen, so assignment fails
with self.assertRaises(FrozenInstanceError):
    profiler_config.discrete = False

```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit run --show-diff-on-failure --color=always --all-files`
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

**Note**: This change is fully backward compatible and does not break
any existing APIs. The dataclass provides the same interface as the
original DictConfig while adding type safety and better structure.

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-07-04 08:12:09 -07:00
5c39b51b4b [hardware] feat: support ray actor sharing situation on ASCEND NPU (#2341)
### What does this PR do?

Support ray actor sharing with other actors on ASCEND NPU.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not related.

### API and Usage Example

Not related.

### High-Level Design

Not related.

### Specific Changes

1. Define global var `VISIBLE_DEVICE_PREFIX` in `verl/utils/device.py`
to get `CUDA` or `ASCEND_RT` prefix automatically.
2. Add support for ASCEND NPU when calling `RayClassWithInitArgs` object
with param `sharing_with` specified.

433544f0be/verl/single_controller/ray/base.py (L206-L214)
3. No function parameter names were changed, for consistency.
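
The prefix selection in item 1 can be sketched as below. The helper names and the `is_npu` flag are hypothetical; how `verl/utils/device.py` actually detects the backend is not shown in this PR description. Only the two environment variable names, `CUDA_VISIBLE_DEVICES` and `ASCEND_RT_VISIBLE_DEVICES`, are taken as given.

```python
def visible_devices_env_name(is_npu: bool) -> str:
    """Return the env var that restricts which devices a process can see."""
    prefix = "ASCEND_RT" if is_npu else "CUDA"
    return f"{prefix}_VISIBLE_DEVICES"


def shared_device_env(source_env: dict, is_npu: bool) -> dict:
    """Build env overrides so a new actor sees the same devices as `source_env`.

    Hypothetical helper: returns an empty dict when the source process has
    no visibility restriction set.
    """
    name = visible_devices_env_name(is_npu)
    return {name: source_env[name]} if name in source_env else {}


overrides = shared_device_env({"ASCEND_RT_VISIBLE_DEVICES": "0,1"}, is_npu=True)
```

A dict like `overrides` is the kind of value one would pass as environment variables when launching the second actor.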

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-04 14:50:52 +08:00
8883b29d86 [trainer] fix: pre-commit broken by #2354 (#2358)
### What does this PR do?

fix: pre-commit broken by #2354

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-04 14:50:34 +08:00
0d2af476b6 [rollout] fix: #1646 stop words for sglang rollout (#1991)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [ ] type is in `feat, fix, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

As the title says, this fixes #1646.

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-07-03 23:44:44 -07:00
ebb21b7fc7 [docker] refactor: Migrate images to verlai, support latest flash attention and newer CUDA versions in future (#2085)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Migrate images to verlai, upgrade CUDA support to 12.6, and support the
latest flash attention.

```txt
docker
├── README.md
├── verl0.4-cu124-torch2.6-fa2.7.4
│   ├── Dockerfile.app.sglang.vllm.mcore0.12
│   ├── Dockerfile.app.sglang.vllm.mcore0.13.preview
│   ├── Dockerfile.app.vllm.mcore0.12
│   ├── Dockerfile.app.vllm.mcore0.13.preview
│   ├── Dockerfile.base
│   └── README.md
├── verl0.5-cu126-torch2.7.1-fa2.8.0
│   ├── Dockerfile.app.sglang.mcore0.12
│   ├── Dockerfile.app.sglang.mcore0.13.preview
│   ├── Dockerfile.base.fi0.2.6
│   └── README.md
└── verl0.5-preview-cu128-torch2.7.1-fa2.8.0
    ├── Dockerfile.app.sglang.megatron
    ├── Dockerfile.base.fi0.2.6
    └── README.md
```

- verlai/verl
  - verl0.4
    - base
    - app.sglang.vllm.mcore
    - app.vllm.mcore
  - verl0.5
    - base
    - app.sglang.mcore
    - app.vllm.mcore [may not be supported yet; for debugging]
  - verl0.5-preview
    - base
    - app.sglang.mcore
    - app.vllm.mcore [may not be supported yet; for debugging]


### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-07-04 14:32:02 +08:00
a53fb3089e [ckpt] fix: edit esi doc (#2354)
This PR addresses the "ESI" comprehension issue left by the previous PR
(https://github.com/volcengine/verl/pull/2192).
This PR refines `ppo_trainer.yaml` by expanding the esi_redundant_time
comment to define ESI (Elastic Server Instance) and draw a parallel to a
training plan. In `ray_trainer.py`, it clarifies ESI-related
checkpoint-saving conditions. These edits boost code readability and
maintainability.
2025-07-04 13:34:12 +08:00
18c6ffcf08 [megatron] fix: optimizer scheduler misalignment with FSDP (#2303)
### What does this PR do?

Fix the learning-rate divergence between Megatron and FSDP.
`megatron.training`'s default LR decay policy is linear, which FSDP does
not support yet, so this PR reverts the policy to `constant`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-04 10:39:40 +08:00
212d81463c [perf] feat: support entropy checkpointing without rmpad or sp (#2342)
### What does this PR do?

Support entropy checkpointing without rmpad or sp

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not related.

### API and Usage Example

Not related.

### High-Level Design

Not related.

### Specific Changes

Add support for entropy checkpointing without remove_padding or SP

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-04 08:32:03 +08:00
aba26845f7 [tool] fix: avoid exception when sandbox return None (#2346)
### What does this PR do?

`result.strip()` raises an exception when `result` is `None`. The fix
returns `None` for the metrics and score, because they are not yet
available for the code sandbox.



### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+sandbox+
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-04 08:31:37 +08:00
0332866857 [algo] feat: mask out observation token in GAE (#2337)
### What does this PR do?

Mask out observation tokens in GAE for multi-turn training.
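
One plausible way to mask observation tokens is sketched below for a single sequence in plain Python. The function name `masked_gae`, the list-based inputs, and the exact carry-through rule are assumptions for illustration; the PR's actual implementation in verl may differ.

```python
from typing import List


def masked_gae(
    rewards: List[float],
    values: List[float],
    mask: List[int],
    gamma: float = 1.0,
    lam: float = 1.0,
) -> List[float]:
    """GAE over one sequence, skipping tokens where mask[t] == 0.

    Masked (observation) tokens get zero advantage, and their value
    estimates are never used as a bootstrap target: the backward
    recursion carries the last model-token state straight through them.
    """
    next_value = 0.0  # value of the next unmasked token (0 past the end)
    last_gae = 0.0    # running GAE accumulator from the next unmasked token
    advantages = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        if mask[t]:
            delta = rewards[t] + gamma * next_value - values[t]
            last_gae = delta + gamma * lam * last_gae
            next_value = values[t]
            advantages[t] = last_gae
    return advantages


# Token 1 is an observation token; its value estimate (9.0) is ignored.
adv = masked_gae(rewards=[0.0, 0.0, 1.0], values=[0.5, 9.0, 0.5], mask=[1, 0, 1])
```

Because the recursion skips masked positions, token 0 bootstraps from token 2 as if the observation token were not there.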

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-04 08:17:25 +08:00
11ee5125d2 [ci] chore: add gemini code assistant config (#2349)
### What does this PR do?

Add a Gemini code assistant config. Example code reviews are in
https://github.com/eric-haibin-lin/verl/pull/17#pullrequestreview-2977403886.
The threshold is set to high to avoid too many review comments.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-03 16:53:33 -07:00
7db7f32446 [megatron, fsdp, doc] feat: implement GPG loss. Add GPG advantage estimator implementation. (#2057)
…and integrate into PPO training scripts and core algorithms

### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?
Implement GPG loss (GPG: A Simple and Strong Reinforcement Learning
Baseline for Model Reasoning) which can achieve comparable performance
in less training time.

### Test
some training records:

![image](https://github.com/user-attachments/assets/e82c5913-94e2-47bf-96b3-b42eac546a18)

![image](https://github.com/user-attachments/assets/2ec4cf1b-a9ee-48d0-b9c5-cbeade1b3a1b)


### Specific Changes

- Add GPG documentation in `docs/algo/gpg.md`.
- Add the GPG advantage estimation function in
`verl/trainer/ppo/core_algos.py`.
- Add the `compute_gpg_loss` function in `verl/trainer/ppo/core_algos.py`.
- Add a conditional branch in `verl/workers/actor/dp_actor.py` and
`megatron_actor.py` that decides whether to use the GPG loss.
- Add GPG example scripts in `examples/gpg_trainer`.

### Usage Example

```shell
bash examples/gpg_trainer/run_qwen2-7b_math.sh
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
2025-07-03 15:22:19 -07:00
bc2cc6b34b [rollout] feat: Allow customization of async server class (#2326)
### What does this PR do?

This PR makes two changes:

1. Introduction of a new configuration option
`actor_rollout_ref.rollout.custom_async_server` to allow users to
customize the async server class.
2. Make `load_extern_type` more robust and add support for prefixes such
as `pkg://` or `file://`, without breaking any existing features or
supported paths.
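
The prefix handling in item 2 can be sketched as follows. This is not the actual `load_extern_type` code: the function name `load_type_sketch`, the temporary module name `extern_module`, and the choice to treat an unprefixed path as a file path are all assumptions made for illustration.

```python
import importlib
import importlib.util
from typing import Any


def load_type_sketch(path: str, name: str) -> Any:
    """Load attribute `name` from a module given as `pkg://...` or `file://...`."""
    if path.startswith("pkg://"):
        # Importable package path, e.g. pkg://mypackage.verl.async_server
        module = importlib.import_module(path[len("pkg://"):])
    else:
        # Assumption: a bare path is a file path, so older configs keep working.
        file_path = path[len("file://"):] if path.startswith("file://") else path
        spec = importlib.util.spec_from_file_location("extern_module", file_path)
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load module from {file_path!r}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    if not hasattr(module, name):
        raise AttributeError(f"{name!r} not found in {path!r}")
    return getattr(module, name)
```

For instance, `load_type_sketch("pkg://collections", "OrderedDict")` returns the standard library's `OrderedDict` class.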

Without this PR, it is impossible to use a customized version of
AsyncvLLMServer in custom use cases. We currently rely on a set of ugly
monkey patches to achieve this.
Ultimately, I believe `rollout.name` and `rollout.custom_async_server`
can be combined, but `rollout.name` is currently referenced in too many
places for me to handle all of them in this PR.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[link](https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+async+server)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

I have tested on our internal pipelines. The new patch works as expected
and the old async servers still work as usual.

### API and Usage Example

Our config is something like this:

```yaml
hydra:
  searchpath:
    - pkg://verl/trainer/config

defaults:
  - ppo_trainer
  - _self_

data:
  filter_overlong_prompts: false

actor_rollout_ref:
  rollout:
    mode: async
    custom_async_server:
      path: pkg://mypackage.verl.async_server
      name: CustomizedvLLMServer
```

### High-Level Design

This PR is pretty straightforward.

### Specific Changes

Update the docs. Update behavior in agent loop and async server manager.
Update `load_extern_type` implementation.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: I think it's quite
troublesome to add a CI for this feature. I can add one if you feel
necessary.
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-03 17:52:20 +08:00
433544f0be [megatron] feat: use mbridge as megatron adaptor (#2064)
### What does this PR do?
MBridge provides a seamless bridge between Hugging Face models and
Megatron-Core's optimized implementation for efficient distributed
training and inference. It also offers necessary tools and processes for
integrating Reinforcement Learning (RL) with Megatron. See
https://github.com/ISEEKYAN/mbridge
mbridge is developed and maintained by NVIDIA and provides:
- modeling HF models with Megatron
- loading/saving HF-format weights with no memory overhead
- online export of parameters to the rollout engine via a per-tensor generator
- RL-specific optimizations and friendly APIs on the Megatron side, plus some
early-access features for Megatron.

With mbridge, the direct improvements are:
- a clean interface for Megatron
- no offline dist_ckpt conversion needed
- no offline model merger needed


### Test
Tested on GSM8K with Qwen2-7B-Instruct.
<img width="486" alt="image"
src="https://github.com/user-attachments/assets/dd271e8a-9167-470f-8b0c-dde2bcfe1800"
/>


### High-Level Design
Add an option `actor_rollout_ref.actor.megatron.use_mbridge`, which defaults to
False. Set it to True to enable mbridge. When enabled,
model_instantiate/model_init_load/checkpoint_save/checkpoint_load/per_tensor_generator
are taken over by mbridge.

### Usage Example

add this line to the script:
```
    actor_rollout_ref.actor.megatron.use_mbridge=True \
```


### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-07-03 12:49:51 +08:00
0ea96a2673 [cfg] chore: add non-negative expected_len assertion (#2330)
#### Summary
Added an assertion that fails when the overlong buffer configuration is invalid,
specifically when `overlong_buffer_len > max_response_length`, which
makes `expected_len` negative.

#### Problem
When `overlong_buffer_len` is greater than `max_response_length`, the
calculated `expected_len` becomes negative:
```
expected_len = max_resp_len - overlong_buffer_len  # Results in negative value
```
This causes all reasonable response lengths to be penalized.

#### Solution
Added an `assert self.max_resp_len >= self.overlong_buffer_cfg.len` to
`DAPORewardManager`.
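
A minimal sketch of the check, assuming a simplified stand-in for the reward manager: `OverlongBufferCfg` and `RewardManagerSketch` are hypothetical names used only for illustration, and only the `assert` expression and the `expected_len` formula come from this description.

```python
from dataclasses import dataclass


@dataclass
class OverlongBufferCfg:
    # Length of the overlong buffer, mirroring `overlong_buffer_cfg.len`.
    len: int


class RewardManagerSketch:
    """Hypothetical stand-in for the reward manager; not the verl class."""

    def __init__(self, max_resp_len: int, overlong_buffer_cfg: OverlongBufferCfg):
        self.max_resp_len = max_resp_len
        self.overlong_buffer_cfg = overlong_buffer_cfg
        # Fail fast on an invalid config instead of silently penalizing every response.
        assert self.max_resp_len >= self.overlong_buffer_cfg.len, (
            "overlong_buffer_cfg.len must not exceed max_resp_len"
        )

    def expected_len(self) -> int:
        # Non-negative by construction, thanks to the assertion above.
        return self.max_resp_len - self.overlong_buffer_cfg.len


manager = RewardManagerSketch(1024, OverlongBufferCfg(len=256))
```

Constructing `RewardManagerSketch(128, OverlongBufferCfg(len=256))` raises `AssertionError` rather than yielding a negative `expected_len`.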


#### Changes Made
1. File: `verl/workers/reward_manager/dapo.py`
2025-07-03 10:57:26 +08:00
4a846aa8f5 [hardward] chore: Enable Generation of Wheel File During Docker Build (#2332)
### What does this PR do?

This PR enhances `Dockerfile.rocm` by generating a Python wheel (`.whl`)
as part of the Docker build process.
Changes introduced:
- Add `python setup.py bdist_wheel` immediately after `pip install -e .
--no-deps`
- The wheel is created inside the container under the `dist/` directory

Co-authored-by: HIREMATH <rhiremat@ctr2-alola-ctrl-01.amd.com>
2025-07-02 13:10:51 -07:00
1a4b9779ec [cfg] fix: Security Enhancement Block Dangerous Modules in Sandbox Environment (#2170)
### What does this PR do?

> This PR enhances security in our sandbox environment by disabling
access to potentially dangerous Python modules.

1. Added blocking for the `subprocess` and `ctypes` modules by setting them to
`None` in `sys.modules`
2. Prevents execution of system commands via `subprocess.run()`,
`subprocess.Popen()`, etc.
3. Blocks low-level system access through `ctypes`, which could bypass
Python security restrictions

Examples of destructive calls that this blocks:
```
import subprocess 
subprocess.run("rm -rf *", shell=True)
```
```
import ctypes
libc = ctypes.CDLL(None)
libc.system(b"rm -rf /*") 
```
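
The blocking mechanism relies on standard Python import behavior: when a name maps to `None` in `sys.modules`, importing it raises `ModuleNotFoundError`. The sketch below illustrates this under one stated assumption: the context manager `blocked_modules` and the helper `can_import` are hypothetical names, and the block is made temporary here only so the example restores the interpreter state afterwards, whereas the PR sets the entries for the lifetime of the sandbox.

```python
import contextlib
import importlib
import sys


@contextlib.contextmanager
def blocked_modules(*names):
    """Temporarily make `import <name>` fail by mapping each name to None in sys.modules."""
    missing = object()
    saved = {name: sys.modules.get(name, missing) for name in names}
    try:
        for name in names:
            sys.modules[name] = None
        yield
    finally:
        # Restore the previous state so the rest of the process is unaffected.
        for name, module in saved.items():
            if module is missing:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module


def can_import(name):
    """Return True if the module can be imported, False if the import is blocked or fails."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


with blocked_modules("subprocess", "ctypes"):
    subprocess_blocked = not can_import("subprocess")
    ctypes_blocked = not can_import("ctypes")
```

Note that code which already holds a reference to the module object is not affected by this technique; it only stops new imports.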

---------

Co-authored-by: zelongwang <wang@zelongs-MacBook-Pro.local>
2025-07-02 22:30:33 +08:00
29f50e7dbe [recipe] feat: add retool recipe (#2233)
Add retool training recipe described in [ReTool: Reinforcement Learning
for Strategic Tool Use in LLMs](https://arxiv.org/abs/2504.11536).
2025-07-02 20:05:43 +08:00
2a25e31d29 [doc] feat: FSDP forward prefetch and entropy memory optimizations (#2322)
### What does this PR do?

@eric-haibin-lin As suggested in this comment
https://github.com/volcengine/verl/pull/1927#issuecomment-3018262885,
this PR adds FSDP forward prefetch and entropy calculation memory
optimization to the performance tuning guide.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-02 19:21:48 +08:00
1bdf4d2bc7 [hardware, recipe, ci] feat: Support fsdp peft sft on npu (#2240)
### What does this PR do?

- Support fsdp peft sft on npu.
- Add CI actions to maintain peft sft and sequence parallelism function
on npu.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### API and Usage Example

Run `examples/sft/gsm8k/run_qwen_05_peft_sp2_npu.sh` on GPU and NPU:
```shell
torchrun --standalone --nnodes=1 --nproc_per_node=8 \
     -m verl.trainer.fsdp_sft_trainer \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.prompt_key=extra_info \
    data.response_key=extra_info \
    optim.lr=1e-4 \
    data.prompt_dict_keys=['question'] \
    +data.response_dict_keys=['answer'] \
    data.micro_batch_size_per_gpu=64 \
    model.partial_pretrain=Qwen/Qwen2.5-0.5B-Instruct \
    trainer.default_local_dir=$save_path \
    trainer.project_name=gsm8k-sft \
    trainer.experiment_name=gsm8k-sft-qwen-2.5-0.5b-instruct \
    trainer.logger=['console'] \
    trainer.total_epochs=2 \
    trainer.default_hdfs_dir=null $@ \
    model.lora_rank=32 \
    model.lora_alpha=16 \
    model.target_modules=all-linear \
    model.strategy=fsdp \
    ulysses_sequence_parallel_size=2 \
    use_remove_padding=true
```

Mean absolute error of train loss:

![train_loss_mae](https://github.com/user-attachments/assets/f0c436ae-4d92-44c9-bca8-0b7cde1f4cfe)


### Specific Changes

Enable sequence parallelism (SP):
```shell
--ulysses_sequence_parallel_size=2
--use_remove_padding=true
```

NPU does not support sdpa2, so we need to set model.strategy:
```
--model.strategy=sdpa
```


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-02 15:06:06 +08:00
82d1ef5af2 [sglang] feat: Repeat sampling parameter n into requests of GRPO in SGLang (#2258)
### What does this PR do?

For large-scale GRPO with a large sampling parameter n (say 128 or
more), we take n out of the sampling parameters and instead duplicate
each request n times.

This is beneficial when n is relatively large, but the order of the
input and output requests must be checked.

We create a unique UID for each prompt to enable grouping in GRPO
advantage computation:

1. Each prompt gets a unique UID that is repeated n times along with the
prompt
2. After generation, responses are aligned with prompts using UID
matching
3. In GRPO advantage computation, UID groups responses from the same
prompt
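
The three steps above can be sketched as follows. This is an illustrative outline, not the SGLang rollout code: `expand_requests` and `group_by_uid` are hypothetical helpers, and the "engine" is a stand-in that echoes requests back in reverse order to show that grouping does not depend on ordering.

```python
import uuid
from collections import defaultdict


def expand_requests(prompts, n):
    """Repeat each prompt n times, tagging every copy with the prompt's UID."""
    requests = []
    for prompt in prompts:
        uid = str(uuid.uuid4())
        requests.extend({"uid": uid, "prompt": prompt} for _ in range(n))
    return requests


def group_by_uid(responses):
    """Collect responses that share a UID, regardless of arrival order."""
    groups = defaultdict(list)
    for response in responses:
        groups[response["uid"]].append(response)
    return dict(groups)


# Stand-in for the rollout engine: echo each request, returned in reverse order.
requests = expand_requests(["What is 2 + 2?", "Name a prime number."], n=4)
responses = [{**req, "text": f"answer to: {req['prompt']}"} for req in reversed(requests)]
groups = group_by_uid(responses)
```

Each entry of `groups` then holds the n responses of one prompt, which is the unit over which a group-relative advantage would be computed.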

Note that we only enable this for sglang `_req_level`, i.e., in a
multi-turn GRPO setting.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Will add experiments on the change of rollout time.

### API and Usage Example

N/A

### High-Level Design

N/A

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>
2025-07-02 09:27:24 +08:00
becdb56795 [CI] fix: replace private model in CI test (#2295)
### What does this PR do?

The CI test remote GenRM uses a private model
`dyyyyyyyy/Qwen2.5-1.5B-GenRM-QueryOnly`.
For the stability of the CI, this model has been uploaded to the
official HF repository, i.e., `verl-team/GenRM-CI-Test-1.5B`, and the
model invocation in the CI test has been updated accordingly.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### API and Usage Example

See `What does this PR do?`

### High-Level Design

See `What does this PR do?`

### Specific Changes

See `What does this PR do?`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-01 15:40:06 +08:00
211984b66f [doc] fix: Update ascend_quick_start.rst (#2293)
### What does this PR do?

Fix doc

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-01 14:45:22 +08:00
ba026ef332 [ci, doc] fix: fix transformers version dependency on Ascend NPU (#2291)
### What does this PR do?

Flash Attention 2 in `transformers==4.53.0` does not work on Ascend NPU due
to [PR line
here](3457e8e73e/src/transformers/modeling_flash_attention_utils.py (L109C5-L109C23))
in `transformers`.

To avoid breaking the `e2e_ascend` CI, we pin `transformers==4.52.4`
for Ascend NPU for now.

A corresponding bugfix will be made in `transformers` as soon as
possible. Once a newer transformers version containing the bugfix is
released, we will update the transformers version dependency in verl again.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

Not related

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

Not related

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

Not related

### Specific Changes

Update the `transformers` version in `requirement-npu.txt` and the documentation.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-01 13:45:17 +08:00
b66901505f [doc] chore: add contribution guide (#2290)
### What does this PR do?

add contribution guide 
TODO: add one specific doc for the workflow of adding new models 

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
2025-07-01 09:55:40 +08:00
00a10a8ef3 [ci] refactor: reduce ruff line-length from 300 to 120 (#2287)
### What does this PR do?

Previously, the ruff line length was too large, making the code hard to
read. If we kept that config, manually shortened lines would be
reformatted into long lines as well. This PR contains 3 commits:
- df4bbfca62f41d972c48c8a76088ae2ac29691cf set line length to 120 and run
pre-commit auto-format
- 9d03f183edd9fff4e22215cacacf62c06b7b41d3 let devin fix the multi-line
code
- 9fc8d436f5007535fad3dc49983b01d0d457be9c skip lint for
test_sglang_async_rollout_sf_tools.py. manually adjust format for
rope_utils.py
- last two commits:
  1. merge with main
  2. run lint after merge. add test_sglang_async_rollout_sf_tools.py and
  scripts/legacy_model_merger.py to lint.exclude

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

This PR relies on CI for testing.


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-07-01 09:54:40 +08:00
0508af25b6 [doc] feat: more resources (#2284)
### What does this PR do?

Add some resources about verl to the documentation.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-30 13:50:44 -07:00
024a8b8578 [ckpt, doc] chore: add backward compatibility for model merger and sync docs (#2251)
### What does this PR do?

This PR adds the doc changes missing from
https://github.com/volcengine/verl/pull/2125:
- Synchronize checkpoint content and verl.model_merger with the latest
code
- Add content on how to merge checkpoints in the quick start
documentation to help users understand how to merge checkpoints

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-30 18:42:59 +08:00
8b33abd84f [megatron] feat: add megatron memory log (#2272)
### What does this PR do?

Log memory footprints to wandb during training, as FSDP does.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-30 15:27:02 +08:00
6d9ac2f7b8 [algo] fix: correctly aggregate kl metrics in PPO actor (#2259)
### What does this PR do?

This PR fixes an issue in `dp_actor` where `actor/kl_loss` and
`actor/kl_coef` were being continuously overwritten during the
micro-batch processing loop.

Previously, the long-lived `metrics` dictionary was updated directly,
causing the value for these metrics to reflect only the final
micro-batch of any given step, rather than an aggregation of all
micro-batches within that step.

This change refactors the logic so that all metrics are collected the
same way: `kl_loss` is now collected for each micro-batch, like other
metrics such as `pg_loss`.
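
The difference between the two patterns can be sketched as follows. This is a minimal illustration, not the code in `dp_actor`: the function names are hypothetical, and aggregating by the mean is an assumption, since the description above only says that values are collected per micro-batch.

```python
from collections import defaultdict
from statistics import mean


def overwrite_metrics(micro_batch_kl_losses):
    """Buggy pattern: each micro-batch overwrites the previous value."""
    metrics = {}
    for kl_loss in micro_batch_kl_losses:
        metrics["actor/kl_loss"] = kl_loss  # only the last value survives
    return metrics


def collect_metrics(micro_batch_kl_losses):
    """Fixed pattern: keep one value per micro-batch, aggregate afterwards."""
    collected = defaultdict(list)
    for kl_loss in micro_batch_kl_losses:
        collected["actor/kl_loss"].append(kl_loss)
    # Assumption: the per-step value is the mean over micro-batches.
    return {key: mean(values) for key, values in collected.items()}


losses = [0.10, 0.20, 0.60]
print(overwrite_metrics(losses))  # reflects only the final micro-batch
print(collect_metrics(losses))    # reflects all micro-batches
```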


> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-30 15:26:07 +08:00
7ac0d98f09 [trainer, vllm] feat: add lora exclude_modules to support VL model lora training (#2182)
### What does this PR do?

For multimodal models, vLLM currently supports adding LoRA only to the
language model. The `exclude_modules` option in the LoRA config can be
used to exclude the ViT part from LoRA fine-tuning. This PR adds the
option so that it is available when needed.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+lora+exclude+is%3Aopen.
You will see my closed PR.
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test
Qwen2.5-VL-7B + GRPO + geo3k:
- blue: full parameters
- yellow: all-linear + exclude visual, lora rank 64
- red: all-linear w/o exclude visual, lora rank 64
- purple: ["q_proj","gate_proj"], + exclude visual, lora rank 16

- The red run failed immediately, as expected, with
`KeyError: 'blocks.0.attn.qkv.base_layer.weight'`. Any mismatch in module
names also fails immediately, so the successful runs validate correctness.
- The validation generations of the LoRA VLMs all look normal.
- Added the "Running GEO3K VLM GRPO E2E lora training tests on 8 L20 GPUs with
rmpad using function rm" test to `e2e_ppo_trainer_vllm_vlm`.


![WeCom screenshot 1750837404244](https://github.com/user-attachments/assets/336261f0-5260-45e2-8312-86eb1ae375a5)

![WeCom screenshot 1750837525244](https://github.com/user-attachments/assets/eafae66e-6b61-4db4-853b-a3a0425be2aa)

![WeCom screenshot 17508374562057](https://github.com/user-attachments/assets/f01b098a-b383-4cc6-8f14-d51978121b59)

![WeCom screenshot 17508374786794](https://github.com/user-attachments/assets/75b4d566-cb63-4b63-9b85-300e02711739)

![WeCom screenshot 17508374937879](https://github.com/user-attachments/assets/8ed2979d-30b7-4d4f-85ad-0fee6aded619)


### API and Usage Example

For Qwen2.5VL, set
`actor_rollout_ref.model.exclude_modules='.*visual.*'`.
Other VLMs should be similar, e.g.
`actor_rollout_ref.model.exclude_modules='.*vision_tower.*'` for
kimi-vl.
To avoid failures on special architectures, listing modules explicitly in
`actor_rollout_ref.model.target_modules` is recommended over setting
`actor_rollout_ref.model.target_modules=all-linear`.


### High-Level Design

The main conflict is that, unlike Peft, which only adds `base_layer` to
validated target_modules, vLLM adds `base_layer` to all-linear modules
(q/k/v/gate/up/down) of the LLM when LoRA is applied. For modules that
vLLM stacks (qkv, gate_up), `base_layer` must be added to the module
name, which can unexpectedly catch all-linear modules of the visual
architecture, as in the current
`FSDPVLLMShardingManager.update_params.replace_lora_wrapper`.
My solution is to give LoRA `exclude_modules` priority, so that LoRA and
`base_layer` are never added to the ViT, while all other cases remain
unchanged.
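
The "exclude first, then match targets" rule can be sketched as below. This is a simplified illustration and not the `check_target_module_exists` function adapted from Peft. The helper names are hypothetical, and the matching rules (a string is a full-match regex, a list matches exact names or dotted suffixes) are assumptions made for the example.

```python
import re
from typing import Optional, Sequence, Union

Pattern = Union[str, Sequence[str]]


def _matches(name: str, pattern: Pattern) -> bool:
    """A string is treated as a full-match regex; a list as exact names or suffixes."""
    if isinstance(pattern, str):
        return re.fullmatch(pattern, name) is not None
    return any(name == p or name.endswith("." + p) for p in pattern)


def should_apply_lora(
    name: str,
    target_modules: Pattern,
    exclude_modules: Optional[Pattern] = None,
) -> bool:
    """Exclusion is checked first, so an excluded module never gets a LoRA wrapper."""
    if exclude_modules is not None and _matches(name, exclude_modules):
        return False
    return _matches(name, target_modules)


# A language-model projection is kept; a ViT projection is excluded.
print(should_apply_lora("model.layers.0.self_attn.q_proj", ["q_proj"], ".*visual.*"))
print(should_apply_lora("visual.blocks.0.attn.qkv", ["qkv"], ".*visual.*"))
```

Checking the exclusion before the target match means a broad target list cannot pull ViT modules back in.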

### Specific Changes

- add exclude_modules field to ppo_trainer.yaml
- adapt check_target_module_exists from Peft for standard
target/exclude_modules checking
- refactor replace_lora_wrapper in sharding_manager/fsdp_vllm.py for
correctly matching base_layer modules

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-30 12:15:48 +08:00
52065c6405 [BREAKING][rollout] refactor: drop vllm v0.5.4 and v0.6.3 support (#2257)
### What does this PR do?

This PR removes support for vLLM versions 0.5.4 and 0.6.3 from the verl
repository, completing a comprehensive cleanup of legacy
version-specific code branches. The changes simplify the codebase by
eliminating conditional logic and version-specific implementations,
requiring users to upgrade to vLLM 0.7.0 or later (recommended: vLLM
0.8.3+).

**Key Changes:**
- Deleted legacy rollout implementations (`fire_vllm_rollout.py`,
`vllm_rollout.py`, `test_vllm_hf_loader.py`)
- Removed version-specific directories (`vllm_v_0_5_4`, `vllm_v_0_6_3`) 
- Simplified sharding managers by removing `customized_vllm` flag
conditionals
- Updated configuration files to remove deprecated options
(`use_fire_sampling`)
- Cleaned up documentation and environment variable exports

### Checklist Before Starting

- [x] Search for similar PRs: No similar PRs found for this specific
cleanup
- [x] Format the PR title as `[BREAKING][vllm, rollout, worker]
refactor: Remove vLLM 0.5.4 and 0.6.3 support`
  - Modules: `vllm`, `rollout`, `worker` (primary affected components)
  - Type: `refactor` (code cleanup and simplification)
  - Breaking: Yes, requires vLLM version upgrade

### Test

This PR has been validated through:
- **CI Pipeline**: All existing tests pass with vLLM 0.7.0+ (27 checks
pending/running)
- **Version Detection**: New version check logic properly rejects vLLM
0.5.4/0.6.3 with clear error messages
- **Merge Conflict Resolution**: Successfully resolved complex conflicts
during main branch merge
- **Pre-commit Checks**: All linting and formatting requirements
satisfied

### API and Usage Example

**Breaking Changes:**
- **vLLM Version Requirement**: Minimum supported version is now 0.7.0
(recommended: 0.8.3+)
- **Removed Configuration Options**: `use_fire_sampling` no longer
available in config files
- **Environment Variables**: `VLLM_ATTENTION_BACKEND=XFORMERS` exports
removed (not needed for vLLM 0.7.0+)

**Migration Guide:**
```bash
# Before: vLLM 0.5.4/0.6.3 with custom flags
pip install vllm==0.6.3
export VLLM_ATTENTION_BACKEND=XFORMERS

# After: vLLM 0.8.3+ with V1 API
pip install "vllm>=0.8.3"  # quoted so the shell does not treat >= as a redirect
export VLLM_USE_V1=1  # Recommended for optimal performance
```

**Updated Configuration:**
```yaml
# generation.yaml - removed use_fire_sampling option
rollout:
  name: vllm_rollout
  # use_fire_sampling: False  # <- REMOVED
  
# Use standard vLLM rollout without legacy options
```

### High-Level Design

```mermaid
graph TB
    subgraph "Before: Multi-Version Support"
        A1[vLLM Version Check] --> B1{Version 0.5.4?}
        A1 --> B2{Version 0.6.3?}
        A1 --> B3{Version 0.7.0+?}
        B1 --> C1[Legacy vllm_v_0_5_4 Code]
        B2 --> C2[Legacy vllm_v_0_6_3 Code]
        B3 --> C3[Modern vLLM Code]
    end
    
    subgraph "After: Simplified Support"
        A2[vLLM Version Check] --> B4{Version >= 0.7.0?}
        B4 -->|Yes| C4[Modern vLLM Code Only]
        B4 -->|No| C5[Clear Error Message]
    end
```
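
The "After" branch of the diagram amounts to a single minimum-version gate. The sketch below shows one way to write such a gate; it is not the code in `verl/third_party/vllm/__init__.py`. The function names and the error text are hypothetical, and the parser deliberately ignores suffixes such as `.post1`.

```python
import re

MIN_VLLM_VERSION = (0, 7, 0)


def parse_version(version: str) -> tuple:
    """Extract the leading major.minor[.patch] numbers, ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def ensure_supported_vllm(version: str) -> None:
    """Raise a clear error for versions older than the supported minimum."""
    if parse_version(version) < MIN_VLLM_VERSION:
        minimum = ".".join(str(part) for part in MIN_VLLM_VERSION)
        raise ImportError(
            f"vLLM {version} is no longer supported; please upgrade to {minimum} or later."
        )


ensure_supported_vllm("0.8.3")  # passes silently
```

In practice the installed version string could be read with `importlib.metadata.version("vllm")` and passed to the check.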

### Specific Changes

**Deleted Files:**
- `verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py`
- `verl/workers/rollout/vllm_rollout/vllm_rollout.py` 
- `tests/workers/rollout/rollout_vllm/test_vllm_hf_loader.py`
- `verl/third_party/vllm/vllm_v_0_5_4/` (entire directory)
- `verl/third_party/vllm/vllm_v_0_6_3/` (entire directory)
- `pytest.ini`

**Modified Core Files:**
- `verl/third_party/vllm/__init__.py`: Simplified version detection with
clear error messages
- `verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py`: Removed
cache engine management and version conditionals
- `verl/workers/sharding_manager/fsdp_vllm.py`: Dropped
`customized_vllm` flag logic
- `verl/workers/sharding_manager/megatron_vllm.py`: Simplified weight
loading and cache management

**Configuration Updates:**
- `verl/trainer/config/generation.yaml`: Removed `use_fire_sampling`
option
- `verl/trainer/config/ppo_trainer.yaml`: Removed `use_fire_sampling`
option
- `tests/special_sanity/check_api_docs.py`: Removed `LLMEngine` from
whitelist

**Documentation Updates:**
- `docs/start/install.rst`: Updated to recommend vLLM 0.8.3+ with
`VLLM_USE_V1=1`
- `docs/perf/perf_tuning.rst`: Updated performance recommendations
- Removed 42+ `VLLM_ATTENTION_BACKEND=XFORMERS` exports from bash
scripts

**Reverted Changes:**
- `.github/workflows/vllm.yml`: Restored original container image names
- `docs/faq/faq.rst`: Restored original apptainer commands
- `docs/ascend_tutorial/ascend_quick_start.rst`: Reverted all
modifications
- `examples/tuning/*/`: Restored original `nproc_per_gpu` settings

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs):
Updated install and performance tuning docs
- [x] Add unit or end-to-end test(s): Existing CI tests validate the
changes; legacy-specific tests were removed as intended
- [x] **CI Request**: Once PR is ready, message will be sent to
`ci-request` channel in verl Slack workspace

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-06-29 19:27:22 -07:00
72429f21b7 [rollout] feat: add zeromq vllm distributed executor (#2246)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-30 09:24:44 +08:00
2805ce9137 [doc, ci] fix: fix sandbox doc and enhance CI trigger filter and doc error checking (#2267)
### What does this PR do?

- fix sandbox doc 
- enhance CI trigger filter and doc error checking
- add a rule to check PR description

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-30 08:02:03 +08:00
86ef66ebe6 [trainer] fix: fix split placement (#2227) 2025-06-29 12:42:51 -07:00
afee3acb5d [rollout] fix: Make free_cache_engine option workable in latest vLLM/SGLang (#1464)
### Checklist Before Starting

- [X] Search for similar PR(s).

### What does this PR do?

Make `free_cache_engine` option workable in latest vLLM/SGLang

### High-Level Design

The `actor_rollout_ref.rollout.free_cache_engine` control option appears
to work only for vLLM 0.5.4 and 0.6.3. In newer versions, the sleep /
wake-up mode in the vLLM engine and the release / resume memory
occupation in SGLang are enabled by default, with no way to turn them off.

While always allowing the inference engine to free its cache can be
ideal, it is unfortunately not supported on some devices, such as the
AMD MI250X, which does not support virtual memory management:
https://github.com/vllm-project/vllm/pull/12695#issuecomment-2633919751

So we need to be able to turn it off so that verl can run on those
devices.

In addition, it looks like the latest vLLM no longer requires enforcing
eager mode when we choose to free the cache, so this PR also lifts this
restriction.
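
The intended behavior of the flag can be sketched with a stand-in engine. Everything here is hypothetical: `FakeEngine`, its `wake_up` / `sleep` methods, and `rollout_mode` are illustration names, not the vLLM or SGLang APIs or the verl implementation. The point is only that the sleep / wake-up calls are skipped entirely when `free_cache_engine` is false.

```python
from contextlib import contextmanager


class FakeEngine:
    """Stand-in for an inference engine; records calls instead of touching GPU memory."""

    def __init__(self):
        self.calls = []

    def wake_up(self):
        self.calls.append("wake_up")

    def sleep(self):
        self.calls.append("sleep")


@contextmanager
def rollout_mode(engine, free_cache_engine: bool):
    """Wake the engine for generation and put it back to sleep, only if enabled."""
    if free_cache_engine:
        engine.wake_up()
    try:
        yield engine
    finally:
        if free_cache_engine:
            engine.sleep()


engine = FakeEngine()
with rollout_mode(engine, free_cache_engine=False):
    engine.calls.append("generate")
print(engine.calls)  # no sleep / wake-up calls were made
```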

### Additional Info.

- **Issue Number**: None
- **Training**: both FSDP and Megatron
- **Inference**: both vLLM and SGLang

### Checklist Before Submitting

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title if it breaks any API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add CI test(s) if necessary.

---------

Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
2025-06-29 08:09:54 -07:00
072725c385 [trainer, recipe] feat: add support for external generative reward models (#2121)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Support External Generative Reward Model.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-29 14:42:14 +08:00
7559a6a938 [doc] fix: add time info for each doc, assert sphinx warning in CI (#2255)
### What does this PR do?

Add time info to each doc and assert Sphinx warnings in CI.
The time info helps the community identify docs that may be
outdated before they are removed or updated.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-06-29 11:58:35 +08:00
bd1be62df0 [ci] fix: fix cpu dataset git download error (#2256) 2025-06-28 16:06:01 -07:00
fda87b8046 [worker] fix: OOM on first iteration in multi-turn RL (#2253)
### What does this PR do?

Fix issue #2189.

This bug was introduced in #1911, which relocated
`resume_memory_occupation` in resharding phase before calling
`get_torch_device().empty_cache()`.

Calling `resume_memory_occupation` without first emptying the cache
causes an OOM in the resharding phase of the first iteration, which
prevents the example `run_qwen2.5-3b_gsm8k_multiturn` from running.

Re-adding `get_torch_device().empty_cache()` solves the problem and
allows the example to run again.
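
Why the ordering matters can be shown with a toy memory model. This is a simulation under stated assumptions, not verl or SGLang code: `SimulatedDevice` and `reshard` are hypothetical, the sizes are made up, and the model assumes that the engine's resumed allocation cannot reuse blocks held in the framework's allocator cache.

```python
class SimulatedDevice:
    """Toy memory model: 'cached' memory is held by the allocator but not in use."""

    def __init__(self, capacity_gb: float, in_use_gb: float, cached_gb: float):
        self.capacity_gb = capacity_gb
        self.in_use_gb = in_use_gb
        self.cached_gb = cached_gb

    def empty_cache(self) -> None:
        # Return unused cached blocks to the device.
        self.cached_gb = 0.0

    def resume_memory_occupation(self, size_gb: float) -> None:
        # Assumption: this allocation cannot reuse the allocator's cached blocks.
        free_gb = self.capacity_gb - self.in_use_gb - self.cached_gb
        if size_gb > free_gb:
            raise MemoryError(f"need {size_gb} GB, only {free_gb} GB free")
        self.in_use_gb += size_gb


def reshard(device: SimulatedDevice, engine_gb: float, empty_cache_first: bool) -> None:
    """Resume the engine's memory, optionally emptying the cache beforehand."""
    if empty_cache_first:
        device.empty_cache()
    device.resume_memory_occupation(engine_gb)


device = SimulatedDevice(capacity_gb=80.0, in_use_gb=10.0, cached_gb=40.0)
reshard(device, engine_gb=50.0, empty_cache_first=True)  # succeeds in this model
```

With the same numbers and `empty_cache_first=False`, the toy model raises `MemoryError`, which mirrors the first-iteration OOM described above.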

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-28 17:35:33 +08:00
a306434806 [doc] chore: version bumped to v0.4.1.dev and doc fixes (#2226)
v0.4.1 is released; bump the version number to v0.4.1.dev.
2025-06-27 20:14:23 -07:00
ce6a7b8449 [rollout] fix: use flashattn3 backend in sglang to avoid error in tool call (#2244)
### What does this PR do?

Fixes the error reported in https://github.com/volcengine/verl/issues/2242

On non-Hopper GPUs, the LLM cannot use tools because SGLang uses
flashinfer by default on this type of hardware.
This PR changes the backend to flashattn3 to avoid the error.

### Test

`ROLLOUT_NAME=sglang pytest -svvv tests/experimental/agent_loop/test_basic_agent_loop.py`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-28 09:35:53 +08:00
8ba2f27cb2 [misc] chore: pin transformers under 4.53 (#2241)
### What does this PR do?

Transformers 4.53 does not work with the current vLLM for Qwen2-VL
models:
https://github.com/vllm-project/vllm/issues/19833#issuecomment-3011175952

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-27 18:12:18 +08:00
e96f0fbf44 [model] fix: separate minicpmo data (#2212)
### What does this PR do?

This PR moves the data processing code of minicpm-o to recipes to avoid
breaking the current functionality.

Fixes #2178

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-27 14:20:40 +08:00
b816d17056 [sglang] feat: Add multi-interaction registry support and testing (#2184)
### What does this PR do?

This PR implements multi-interaction support in SGLangRollout, enabling
sample-level interaction selection similar to the existing tools system.
The implementation includes a new interaction registry system that
allows multiple named
interactions to be configured and used within a single rollout instance.
#1630

Core Implementation

- New Interaction Registry System: Created
verl/interactions/utils/interaction_registry.py with functions to
dynamically load and manage multiple interaction instances from
configuration files
- Enhanced SGLangRollout:
  - Replaced single interaction attribute with interaction_map: dict[str,
    BaseInteraction]
  - Updated _initialize_interactions() method to support multiple
    interactions via registry
  - Modified interaction selection logic to use interaction_kwargs.name
    for sample-level binding
- Configuration Updates: Added name field support in interaction config
format with automatic name generation fallback

Data Processing

- Updated GSM8K Preprocessing: Modified
examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name
field in interaction_kwargs
- Enhanced Configuration: Updated
examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml
with explicit name field

Testing & Quality

- Comprehensive Test Suite: Added
tests/interactions/test_interaction_registry.py with full coverage of
registry functionality
- Integration Tests: Created
tests/workers/rollout/test_sglang_multi_interaction.py for
multi-interaction scenarios
- Updated Existing Tests: Modified existing interaction tests to support
new name attribute and configuration format
- Error Handling: Added validation for duplicate names, missing
interactions, and edge cases

Backward Compatibility

- Graceful Degradation: When no interaction config is provided, system
works without interactions (empty interaction_map)
- Default Name Handling: Falls back to "gsm8k" when no name is specified
in interaction_kwargs
- Existing API Preservation: All existing interaction functionality
remains unchanged

Key Features

1. Sample-Level Selection: Each sample can specify which interaction to
use via interaction_kwargs.name
2. Registry Pattern: Similar architecture to existing tools system for
consistency
3. Automatic Naming: Intelligent name generation from class names (e.g.,
Gsm8kInteraction → gsm8k)
4. Duplicate Prevention: Runtime validation prevents naming conflicts
5. Flexible Configuration: Supports both explicit names and automatic
derivation
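
The registry behaviour listed above (automatic naming, duplicate prevention, sample-level selection with a "gsm8k" fallback) can be sketched roughly as follows. This is a minimal illustration, not the code in `interaction_registry.py`: the helper names `derive_name`, `build_interaction_map`, and `select_interaction`, the `class_name` config key, and the stand-in classes are all assumptions, and classes are looked up in a plain dict rather than loaded dynamically.

```python
import re


class BaseInteraction:
    """Hypothetical stand-in for verl's BaseInteraction."""

    def __init__(self, config: dict):
        self.config = config
        self.name = config["name"]


class Gsm8kInteraction(BaseInteraction):
    pass


class CodeReviewInteraction(BaseInteraction):
    pass


def derive_name(cls: type) -> str:
    """Derive a default name from a class name, e.g. Gsm8kInteraction -> gsm8k."""
    stem = re.sub(r"Interaction$", "", cls.__name__)
    # Insert "_" at lower/digit-to-upper boundaries, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", stem).lower()


def build_interaction_map(configs: list[dict], classes: dict[str, type]) -> dict[str, BaseInteraction]:
    """Instantiate one interaction per config entry, keyed by unique name."""
    interaction_map: dict[str, BaseInteraction] = {}
    for cfg in configs:
        cls = classes[cfg["class_name"]]
        # An explicit name wins; otherwise fall back to the derived one.
        name = cfg.get("name") or derive_name(cls)
        if name in interaction_map:
            raise ValueError(f"Duplicate interaction name: {name!r}")
        interaction_map[name] = cls({**cfg, "name": name})
    return interaction_map


def select_interaction(interaction_map: dict[str, BaseInteraction], interaction_kwargs: dict) -> BaseInteraction:
    """Pick the interaction for one sample; defaults to "gsm8k" when unnamed."""
    name = interaction_kwargs.get("name", "gsm8k")
    if name not in interaction_map:
        raise KeyError(f"Interaction {name!r} not found; available: {sorted(interaction_map)}")
    return interaction_map[name]
```

Keying the map by name is what makes per-sample binding cheap: each sample only carries a string in `interaction_kwargs`, and the rollout resolves it with a single dict lookup.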
2025-06-27 14:00:37 +08:00
d8ecba318f [ckpt] feat: support esi execution environment (#2192)
Volcengine provides users with ESI (Elastic Instance). We support
reserved instances on vemlp: when an ESI is about to expire, the
checkpoint (CKPT) saving logic is triggered to limit the loss of
training progress.
ESI on AWS is also supported.
2025-06-27 11:07:27 +08:00
466ef1ad47 [misc] fix: add license (#2230)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-27 11:06:31 +08:00
790a8a29c5 [rollout] feat: add agent loop (#2124)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Add AgentLoopBase and AgentLoopManager for agentic rollout.

### High-Level Design
New Components:
- **AgentLoopBase**: an abstract class representing the rollout loop of a
single prompt. Within the loop, the agent may chat with an
OpenAI-compatible LLM server and interact with various environments.
- **AsyncLLMServerManager**: sends chat completion requests to multiple
LLM servers, providing load balancing and sticky sessions.
- **AgentLoopManager**: gets a batch of prompts from the dataloader and
splits it across multiple AgentLoopWorker instances.
- **AgentLoopWorker**: for each prompt, creates an AgentLoopBase instance
and runs the loop task.
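
How these components fit together can be sketched with plain `asyncio`. This is a toy illustration under stated assumptions, not the verl implementation: the method names (`choose_server`, `chat`, `run`), the `TwoTurnAgentLoop` subclass, the `run_worker` helper, and the "fewest requests so far" balancing rule are all hypothetical, and the LLM call is replaced by a placeholder.

```python
import asyncio
from abc import ABC, abstractmethod


class AsyncLLMServerManager:
    """Toy manager: least-loaded routing plus sticky sessions per request id."""

    def __init__(self, servers: list[str]):
        self.load = {server: 0 for server in servers}  # requests routed so far
        self.sticky: dict[str, str] = {}               # request_id -> server

    def choose_server(self, request_id: str) -> str:
        # First turn: pick the server with the fewest requests so far.
        # Later turns: reuse the pinned server (sticky session).
        if request_id not in self.sticky:
            self.sticky[request_id] = min(self.load, key=self.load.get)
        server = self.sticky[request_id]
        self.load[server] += 1
        return server

    async def chat(self, request_id: str, messages: list[dict]) -> str:
        server = self.choose_server(request_id)
        await asyncio.sleep(0)  # placeholder for a real chat completion request
        return f"{server}: reply after {len(messages)} message(s)"


class AgentLoopBase(ABC):
    """One instance handles the rollout loop of a single prompt."""

    def __init__(self, server_manager: AsyncLLMServerManager):
        self.server_manager = server_manager

    @abstractmethod
    async def run(self, request_id: str, prompt: str) -> list[dict]:
        ...


class TwoTurnAgentLoop(AgentLoopBase):
    """Hypothetical loop: two model turns with one environment observation."""

    async def run(self, request_id: str, prompt: str) -> list[dict]:
        messages = [{"role": "user", "content": prompt}]
        for turn in range(2):
            reply = await self.server_manager.chat(request_id, messages)
            messages.append({"role": "assistant", "content": reply})
            if turn == 0:
                # Stand-in for feedback from an environment or tool.
                messages.append({"role": "user", "content": "observation"})
        return messages


async def run_worker(prompts: list[str], server_manager: AsyncLLMServerManager) -> list[list[dict]]:
    """Worker role: one loop instance per prompt, all run concurrently."""
    tasks = [
        TwoTurnAgentLoop(server_manager).run(f"req-{i}", prompt)
        for i, prompt in enumerate(prompts)
    ]
    return await asyncio.gather(*tasks)
```

A manager in this picture would split a batch into chunks and hand each chunk to a worker, for example `asyncio.run(run_worker(["a", "b"], AsyncLLMServerManager(["server-0", "server-1"])))`. Pinning every turn of a request to one server is presumably what lets a multi-turn conversation reuse server-side state such as a prefix cache.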

<img width="885" alt="image"
src="https://github.com/user-attachments/assets/1f949719-c000-4b94-9ee2-c8a8ff71b109"
/>


### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that cover the code path.
2025-06-27 09:29:29 +08:00
b2235f0a55 [recipe] fix: unsupported operand type(s) for |: 'dict' and 'DictConfig' (#2217)
### What does this PR do?
#### Fix https://github.com/volcengine/verl/issues/2216
#### 1 Fix Config Reference in entropy_trainer.yaml
#### 2 Fix TypeError When Merging `reward_kwargs` and
`cfg_reward_kwargs`
### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Specific Changes

#### 1 Fix Config Reference in entropy_trainer.yaml
- Modified File : `recipe.entropy.config.entropy_trainer.yaml`
- Change: 
```yaml
- reward_model.reward_kwargs.overlong_buffer_cfg: $reward_model.overlong_buffer
+ reward_model.reward_kwargs.overlong_buffer_cfg: ${reward_model.overlong_buffer}
```
- Purpose : Ensures OmegaConf correctly resolves the reference as a
DictConfig object instead of interpreting it as a string.
#### 2 Fix TypeError When Merging `reward_kwargs` and
`cfg_reward_kwargs`
- Modified File : `recipe.entropy.main_entropy.py`
- Change :
```diff
- reward_fn = load_reward_manager(config, tokenizer, num_examine=0, **(merge_dict(reward_kwargs, cfg_reward_kwargs)))
+ reward_fn = load_reward_manager(config, tokenizer, num_examine=0, **OmegaConf.merge(OmegaConf.create(reward_kwargs), cfg_reward_kwargs))
```
- Purpose : Use OmegaConf.merge() to safely merge dict and DictConfig
types.
> Background :
> The DAPORewardManager class accesses the `enable` attribute from
`overlong_buffer_cfg`.
> This fails if `overlong_buffer_cfg` is a regular dict.

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
2025-06-26 17:27:01 -07:00
ed0f308acb [ckpt] fix: conditionally import fsdp/megatron backend (#2224)
### What does this PR do?

`verl/model_merger/__main__.py` allows the user to specify either the
FSDP backend or the Megatron backend, but it forces the user to have
both backends installed. This change moves those imports under the
backend conditional, relieving that requirement.
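
The idea of moving imports under the backend conditional can be sketched as below. This is an illustration only, not the code in `verl/model_merger/__main__.py`: `load_backend` and the module names in `BACKEND_MODULES` are hypothetical, with the standard-library `json` module standing in for an installed backend and a deliberately nonexistent module standing in for one that is not installed.

```python
import importlib
from types import ModuleType

# Hypothetical stand-ins: "json" plays an installed backend, the second entry
# plays a backend whose dependencies are missing from the environment.
BACKEND_MODULES = {
    "fsdp": "json",
    "megatron": "hypothetical_megatron_merger_that_is_not_installed",
}


def load_backend(backend: str) -> ModuleType:
    """Import only the module for the requested backend."""
    if backend not in BACKEND_MODULES:
        raise NotImplementedError(f"Unknown backend: {backend}")
    # The import happens here, inside the branch that needs it, so choosing
    # one backend never triggers an ImportError from the other.
    return importlib.import_module(BACKEND_MODULES[backend])
```

With top-level imports of both modules, the script would fail at startup for any user missing either backend. With the deferred import, `load_backend("fsdp")` succeeds and only `load_backend("megatron")` raises `ImportError`.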

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=merger+is%3Aopen+
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: hard to check these
conditionals in the CI environment, since both dependencies are in the
runner's image
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
2025-06-26 16:40:47 -07:00
ff750e2472 [trainer] fix: indentation error leading to critic_output.get() failure (#2143)
### What does this PR do?

This PR addresses an `IndentationError` that was causing the
`critic_output.get()` call to fail when `self.use_critic` was false.
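
The kind of failure described here can be reproduced in a few lines. The sketch below is an assumption about the shape of the bug, not the trainer's actual code: `FakeFuture`, `update_step_buggy`, and `update_step_fixed` are hypothetical names, and the metric key is made up.

```python
class FakeFuture:
    """Hypothetical stand-in for an async result object exposing .get()."""

    def __init__(self, value: dict):
        self.value = value

    def get(self) -> dict:
        return self.value


def update_step_buggy(use_critic: bool) -> dict:
    metrics = {}
    if use_critic:
        critic_output = FakeFuture({"critic/loss": 0.1})
    # Mis-indented: this line runs even when the critic is disabled, where
    # critic_output was never assigned.
    metrics.update(critic_output.get())
    return metrics


def update_step_fixed(use_critic: bool) -> dict:
    metrics = {}
    if use_critic:
        critic_output = FakeFuture({"critic/loss": 0.1})
        # Correctly indented: only touched when the critic exists.
        metrics.update(critic_output.get())
    return metrics
```

In this sketch the buggy variant raises `UnboundLocalError` (a `NameError` subclass) for `use_critic=False`, while the fixed variant simply returns empty metrics.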

### Checklist Before Starting

- [x] Search for similar PRs. [The PR that caused the
problem](https://github.com/volcengine/verl/pull/281)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### API and Usage Example

> None. This is just a simple bug fix involving a few lines of code.

### High-Level Design

> This is just a simple bug fix involving a few lines of code.

### Specific Changes

> This is just a simple bug fix involving a few lines of code.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-26 14:50:09 -07:00
7b824077d5 [misc] feat: support ValidationGenerationsLogger in vemlp_wandb (#2191)
### What does this PR do?

Add vemlp_wandb support to ValidationGenerationsLogger so that
validation logs can be written to it.
2025-06-26 20:34:25 +08:00
4f1ece8bed [recipe] fix: parameter order in RayPRIMETrainer super().__init__() call (#2172)
### What does this PR do?

- Fixes incorrect parameter order in `RayPRIMETrainer.__init__()` when
calling `super().__init__()`.
- The missing `processor` parameter was causing all subsequent
positional arguments to be passed to wrong parameters, leading to
`reward_fn` being passed as `processor` and `val_reward_fn` being passed
as `reward_fn`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+PRIME
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### API and Usage Example

- No breaking changes to existing API

### High-Level Design

- Simple parameter alignment fix, no design changes

### Specific Changes

- Added `reward_fn=my_reward_fn` and `val_reward_fn=my_val_reward_fn` to
the `super().__init__()` call in `RayPRIMETrainer.__init__()` to
maintain correct parameter alignment with parent class RayPPOTrainer
- Ensures `reward_fn` and `val_reward_fn` are passed to their intended
parameters instead of being shifted due to missing processor argument
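
The positional shift described above is easy to reproduce with a reduced example. The classes below are hypothetical and use a much shorter signature than the real `RayPPOTrainer`; they only illustrate why passing the arguments by keyword keeps them aligned.

```python
class ParentTrainer:
    """Simplified stand-in for the parent trainer's constructor."""

    def __init__(self, config, tokenizer, processor=None, reward_fn=None, val_reward_fn=None):
        self.config = config
        self.tokenizer = tokenizer
        self.processor = processor
        self.reward_fn = reward_fn
        self.val_reward_fn = val_reward_fn


class BuggyChildTrainer(ParentTrainer):
    def __init__(self, config, tokenizer, reward_fn=None, val_reward_fn=None):
        # Positional call that skips `processor`: every later argument lands
        # one slot too early.
        super().__init__(config, tokenizer, reward_fn, val_reward_fn)


class FixedChildTrainer(ParentTrainer):
    def __init__(self, config, tokenizer, reward_fn=None, val_reward_fn=None):
        # Keyword arguments bind to the intended parameters regardless of
        # what the parent inserts in between.
        super().__init__(config, tokenizer, reward_fn=reward_fn, val_reward_fn=val_reward_fn)
```

In the buggy variant the reward function ends up in `processor` and the validation reward function in `reward_fn`, which matches the symptom reported in this PR. Keyword arguments also protect the subclass if the parent signature gains further parameters later.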

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-26 19:37:36 +08:00
a9e3a8fa41 [model] fix: make vlm patch forward compatible (#2215)
### What does this PR do?

Fixes #2213

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-26 19:35:53 +08:00
85bacb1ccc [trainer] fix: Add __init__.py to verl.trainer.config (#2214)
### What does this PR do?

Add `__init__.py` to verl.trainer.config so that it can be discovered
by hydra via the `pkg://` search path.

In my use case, I want to implement my own trainer with my own config,
similar to what DAPO did. I noticed that when DAPO inherits the config,
it directly uses the relative path of VERL. This is not applicable in my
case: my code base lives in a separate directory, and I can't know for
sure where VERL is installed in my environment.

The `pkg://` search path of
[hydra](https://hydra.cc/docs/advanced/search_path/) looks suitable for
my case. However, hydra complains:

```
lib/python3.10/site-packages/hydra/_internal/config_loader_impl.py:216: UserWarning: provider=hydra.searchpath in main, path=verl.trainer.config is not available.
  warnings.warn(
```

This is because `__init__.py` does not exist. The hydra documentation
states specifically:

> pkg:// points to an importable Python module, with . being the
separator. __init__.py files are needed in directories for Python to
treat them as packages.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+config+init
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### API and Usage Example

This PR supports usages like this:

```yaml
hydra:
  searchpath:
    - pkg://verl/trainer/config  # Originally it has to be file:///path/to/verl/trainer/config

defaults:
  - ppo_trainer
  - _self_

my_custom_server:
  port: 9999

data:
  filter_overlong_prompts: false

actor_rollout_ref:
  rollout:
    mode: async
```

### High-Level Design

N/A

### Specific Changes

Adds an `__init__.py`.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: not applicable.
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Yuge Zhang <scotyugochang@gmail.com>
2025-06-26 16:36:28 +08:00
43a5ab3378 [trainer] fix: add missing qwen2_moe flops counter (#2190)
### What does this PR do?

Add the missing qwen2_moe flops counter, which should be the same as the
original qwen3-moe counter.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-26 13:01:40 +08:00
02549d99cc [data] fix: fix the type of parquet_files in SFTDataset (#2203)
### What does this PR do?

Fix the type of parquet_files in sft_dataset.py. When a list of files is
passed, the type of parquet_files is ListConfig, not List[str].

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-26 08:46:31 +08:00
3b3e597042 [megatron] feat: Support of dist checkpoint (#2125)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Support dist checkpoints in saving, loading, and the model merger.

### Test

Algorithm:

<img width="783" alt="image"
src="https://github.com/user-attachments/assets/9a200b47-5937-426a-8da6-c601d2d8328f"
/>

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: H <linhaibin.eric@gmail.com>
2025-06-25 17:17:29 +08:00
411751e610 [model] feat: Add MiniCPM-o 2.6 support (#2178)
@RanchiZhao We reverted the previous commit because we found a critical
bug in #1833:
https://github.com/volcengine/verl/pull/1833/files#diff-e06e73d3a7775a502b7aea91103e7911f6597eb48e4b898db558766cdd41daf9R119-R121

The indentation of the if-else block is incorrect.
2025-06-25 16:30:20 +08:00
c5d4d90af7 [doc] fix: Fix a typo in the profiler's document (#2141)
### What does this PR do?
Fix a typo in the profiler documentation: `use_profiler` should be
`use_profile`.


9b7bb69ea3/verl/utils/debug/profile.py (L49)


> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-25 11:09:59 +08:00
fc6ebc9ebe [megatron,vllm] fix: megatron vllm async rollout server (#2122)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Fix the Megatron vLLM async rollout server; related to
https://github.com/volcengine/verl/pull/2008 and
https://github.com/volcengine/verl/issues/2001

> We are from the Large Model Post-Training Team of **📕 Xiaohongshu's AI
Platform Technology Department** , dedicated to developing
high-performance, easily-scalable distributed post-training engines.

### Test

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-25 10:23:21 +08:00
dc805c7897 [ci] fix: enable e2e ppo trainer test (#2174)
### What does this PR do?

Fix bugs introduced by https://github.com/volcengine/verl/pull/2113

Do not skip the e2e tests when pushing changes to the main branch

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-24 20:16:17 +08:00
68d62518ce [misc] fix: fix timer importance error in split_placement (#2169)
### What does this PR do?

Fix the timer import error in split_placement: it should use `from
verl.trainer.ppo.ray_trainer import marked_timer`, but it currently uses `from
verl.trainer.ppo.ray_trainer import _timer`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not related.

### API and Usage Example

Not related.

### High-Level Design

Not related.

### Specific Changes

Fix the timer import error in split_placement: it should use `from
verl.trainer.ppo.ray_trainer import marked_timer`, but it currently uses
`import _timer`.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-24 17:06:12 +08:00
24707f6d4e [model] fix: Revert "[model] feat: Add MiniCPM-o 2.6 support" (#2176)
Reverts volcengine/verl#1833
2025-06-24 16:44:57 +08:00
e1039aed4f [model] feat: Add MiniCPM-o 2.6 support (#1833)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add MiniCPM-o 2.6 multimodal model support to VERL framework for
vision-language RL training.

### Specific Changes

- **verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py**: Add
MiniCPM-o weight loading
- **verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py**: Add
MiniCPM-o weight loading
- **verl/utils/dataset/vision_utils.py**: Enhanced vision data
processing
- **verl/utils/dataset/rl_dataset.py**: Multimodal dataset support
- **verl/utils/flops_counter.py**: Vision model FLOPS calculation
- **verl/workers/actor/dp_actor.py**: Multimodal model compatibility
- **examples/grpo_trainer/run_minicpmo2_6.sh**: Complete training
example

### Usage Example

```bash
# Train MiniCPM-o 2.6 with GRPO
bash examples/grpo_trainer/run_minicpmo2_6.sh
```

### Test

- [x] Local testing with MiniCPM-o 2.6 on geo3k dataset
- [x] Verified weight loading for both vLLM versions
- [x] Training script validation

### Checklist Before Submitting

- [x] Read the Contribute Guide
- [x] Apply pre-commit checks (will fix in follow-up if needed)
- [ ] No breaking API changes
- [ ] Documentation updates (if needed)
- [x] Rely on existing unit tests

---------

Co-authored-by: RanchiZhao <ranchizhao@example.com>
2025-06-24 14:56:50 +08:00
08be380f95 [worker] feat: allow dist shared file-system initialization (#2154)
### What does this PR do?

Allow `torch.distributed.init_process_group` to fetch `DIST_INIT_METHOD`
from `os.environ` to accelerate single-node initialization.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Before using shared file-system initialization


![WeCom screenshot 17506726709312](https://github.com/user-attachments/assets/cce62dab-dbea-496e-bb60-5cd4e88f8809)

After export DIST_INIT_METHOD='file:///tmp/torch_dist'


![WeCom screenshot 17506729178154](https://github.com/user-attachments/assets/6ed23d76-dda8-44fc-8cb8-5596da0c606d)

### API and Usage Example

Simply add `export DIST_INIT_METHOD='file:///tmp/some_file'` to your
script, and remember to `rm -rf /tmp/some_file` before your next
run.
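
A minimal sketch of how a worker might pick up the variable, assuming it is forwarded as the `init_method` argument of `torch.distributed.init_process_group`. The helper name `dist_init_kwargs` and the default backend are assumptions for illustration, not the code added by this PR.

```python
import os


def dist_init_kwargs(default_backend="nccl"):
    """Build keyword arguments for torch.distributed.init_process_group.

    Hypothetical helper: if DIST_INIT_METHOD is set (for example to
    'file:///tmp/some_file'), it is passed through as init_method;
    otherwise the default environment-variable rendezvous is used.
    """
    kwargs = {"backend": default_backend}
    init_method = os.environ.get("DIST_INIT_METHOD")
    if init_method:
        kwargs["init_method"] = init_method
    return kwargs


# Intended use inside a worker (requires torch; rank and world_size must be
# given explicitly when a file:// init_method is used):
#   torch.distributed.init_process_group(
#       rank=rank, world_size=world_size, **dist_init_kwargs()
#   )
```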

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: very simple to
reproduce
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-24 13:16:09 +08:00
2a6212385a [rollout] feat: Support Multi-stage Awake for SGLang (#1911)
Co-authored with: MrAta (immrata@gmail.com)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

### Motivation

In RL frameworks that use a colocated design, like
[verl](https://github.com/volcengine/verl/tree/main), we need to frequently
offload the training model and load the serving model and KV cache.


#### Background
- Currently SGLang uses
[torch_memory_saver](https://github.com/fzyzcjy/torch_memory_saver) to
pause and resume.
- [torch_memory_saver](https://github.com/fzyzcjy/torch_memory_saver) is
an open-source repo that provides an easy-to-use API to hook **cudaMalloc**
and **cudaFree** so that virtual addresses stay consistent
after pause and resume, which is critical for CUDA Graph to work.
- CUDA Graph is critical for making SGLang run faster in the decoding
phase.


#### Here is the current behavior of VERL + SGLang


![Image](https://github.com/user-attachments/assets/e87e7dd6-f223-4de6-8f07-915eb2030ea8)

1. During training, the training model and optimizer state are in GPU
memory. Once training is done, we offload the optimizer state to CPU and
keep the model weights on the GPU, since they are needed for the weight
update.
2. During the weight update, we awake the SGLang engine, so the paused
memory for model weights and KV cache comes back. Then we update the
serving model from the training model on the fly using the API
`update_weights_in_tensor`.
3. After the model is updated, we delete the training model from GPU
memory.


The above design works pretty well so far. However, it wastes a big
chunk of GPU memory during rollout, which can cause a few issues we've
seen so far:
- **Small KV Cache**: We need to use a relatively low memory fraction
(e.g., 0.6), so the KV cache holds fewer tokens. As a result, we hit
`RuntimeError: Prefill out of memory.
Try to lower your batch size.` when we try to prefill a large number of
requests.
- **Out of Memory**: If we use a memory fraction of 0.8 and run RL for a 32B
model on 8 H100s, it will OOM during the weight update.


#### Challenge
- `torch_memory_saver` currently supports only a singleton, so SGLang
pauses and resumes the KV cache and weights together; they are treated as
one group of memory controlled by the singleton
`torch_memory_saver` instance.

#### Proposal

![Image](https://github.com/user-attachments/assets/7fda9638-0dc2-4c14-bc64-cd20616f350f)

1. During training, we do the same as before.
2. During weight update stage 1, we awake the SGLang model weights
and then update them.
3. During weight update stage 2, we delete the training model weights
from GPU memory.
4. Finally, we awake SGLang's KV cache.
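
The ordering above can be illustrated with a toy model of tagged memory regions. This is only a sketch: `TaggedMemoryPool`, the tag names, and the sizes in `SIZES_GB` are invented for illustration and are not the `torch_memory_saver` or SGLang API. It compares peak memory when weights and KV cache wake together against the staged order.

```python
class TaggedMemoryPool:
    """Toy stand-in for tag-based pause/resume (not a real library API)."""

    def __init__(self, sizes_gb):
        self.sizes_gb = dict(sizes_gb)  # tag -> size in GB
        self.resident = set()           # tags currently on the GPU
        self.peak_gb = 0

    def used_gb(self):
        return sum(self.sizes_gb[tag] for tag in self.resident)

    def resume(self, *tags):
        self.resident.update(tags)
        self.peak_gb = max(self.peak_gb, self.used_gb())

    def release(self, *tags):
        self.resident.difference_update(tags)


# Illustrative sizes only; real numbers depend on the model and hardware.
SIZES_GB = {"train_weights": 20, "rollout_weights": 20, "kv_cache": 30}


def single_stage_peak():
    pool = TaggedMemoryPool(SIZES_GB)
    pool.resume("train_weights")                # left over from training
    pool.resume("rollout_weights", "kv_cache")  # everything wakes together
    pool.release("train_weights")               # after the weight update
    return pool.peak_gb


def multi_stage_peak():
    pool = TaggedMemoryPool(SIZES_GB)
    pool.resume("train_weights")    # left over from training
    pool.resume("rollout_weights")  # stage 1: wake weights, then update them
    pool.release("train_weights")   # stage 2: drop the training weights
    pool.resume("kv_cache")         # finally wake the KV cache
    return pool.peak_gb
```

With these made-up sizes the staged order never holds the training weights and the KV cache at the same time, so its peak is lower than the single-stage peak.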



![Image](https://github.com/user-attachments/assets/f3dab327-dc2e-4ed8-88d7-15e383f77d25)


### Benefit
With the above feature, we can train a larger model on the same GPUs, and
we can also make training/rollout more efficient because we can allocate
a larger KV cache.

### Solution: Keep using Singleton and provide tag based pause/resume

- [x] Support tag based resume/pause:
https://github.com/fzyzcjy/torch_memory_saver/pull/20
- [x] Support Multiple Stage Awake in SGLang:
https://github.com/sgl-project/sglang/pull/7099
- [ ] Support Multiple Stage Awake in verl:
https://github.com/volcengine/verl/pull/1911

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

![Screenshot 2025-06-19 at 12 16
19 PM](https://github.com/user-attachments/assets/a95dd57e-43e1-4f28-8a84-003ec5c043fc)
![Screenshot 2025-06-19 at 12 13
14 PM](https://github.com/user-attachments/assets/f1f4a8a8-1845-4fad-9424-5526d4154dd0)


### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-06-23 14:03:35 -07:00
d69528fe38 [rollout]fix: vllm_rollout_spmd.py when return_raw_chat=True (#2156)
### What does this PR do?

> Fix: the batch size and the size of `raw_prompt` do not match when setting
`data.return_raw_chat=True`

Fix a bug when using `data.return_raw_chat=True` in the GRPO algorithm with a
reward model:
` File
"/ossfs/workspace/repository/verl/verl/single_controller/ray/base.py",
line 625, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
File
"/ossfs/workspace/repository/verl/verl/single_controller/base/decorator.py",
line 534, in inner
    return func(*args, **kwargs)
File "/ossfs/workspace/repository/verl/verl/workers/fsdp_workers.py",
line 634, in generate_sequences
    output = self.rollout.generate_sequences(prompts=prompts)
File "/ossfs/workspace/repository/verl/verl/utils/debug/performance.py",
line 78, in f
    return self.log(decorated_function, *args, **kwargs)
File "/ossfs/workspace/repository/verl/verl/utils/debug/performance.py",
line 88, in log
    output = func(*args, **kwargs)
File
"/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py",
line 116, in decorate_context
    return func(*args, **kwargs)
File
"/ossfs/workspace/repository/verl/verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py",
line 346, in generate_sequences
    return DataProto(batch=batch, non_tensor_batch=non_tensor_batch)
  File "<string>", line 6, in __init__
File "/ossfs/workspace/repository/verl/verl/protocol.py", line 214, in
__post_init__
    self.check_consistency()
File "/ossfs/workspace/repository/verl/verl/protocol.py", line 325, in
check_consistency
assert val.shape[0] == batch_size, f"key {key} length {len(val)} is not
equal to batch size {batch_size}"
AssertionError: key raw_prompt length 128 is not equal to batch size
640`
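
The lengths in the assertion (128 vs. 640) are consistent with each prompt being sampled 5 times while `raw_prompt` was left at its original length. A minimal sketch of the idea, assuming the remedy is to repeat non-tensor fields the same way as the tensor batch; `repeat_interleave` here is a hypothetical pure-Python helper, not the function used in `vllm_rollout_spmd.py`.

```python
def repeat_interleave(values, n):
    """Repeat each element n times, keeping the copies adjacent.

    ["a", "b"] with n=2 becomes ["a", "a", "b", "b"], so item i of the
    repeated list still lines up with response i of the rollout batch.
    """
    return [value for value in values for _ in range(n)]


raw_prompts = [f"prompt-{i}" for i in range(128)]
repeated = repeat_interleave(raw_prompts, 5)  # one copy per sampled response
```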


### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-23 20:56:12 +08:00
2ac410f001 [fsdp] feat: support fsdp2 save hugging face model (#2138)
### What does this PR do?

Support saving Hugging Face models with FSDP2. Previously only FSDP1 was
supported, and FSDP2 led to the error in https://github.com/volcengine/verl/issues/1703.

Fix https://github.com/volcengine/verl/issues/1703.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-23 15:15:36 +08:00
644aaa76bc [sglang] feat: add multimodal input to multiturn async rollout (#2014)
### Checklist Before Starting

- [X] Searched for similar PR(s).

### What does this PR do?
This PR adds image input to the SGLang async rollout. Previously the SGLang
async rollout only supported text. There is also a placeholder for video
data, which will be added as an input when the SGLang engine supports it.

### High-Level Design

Since the SGLang engine already handles image input, we only need to
handle the tokenization properly.

### Specific Changes

Change `self.tokenizer.apply_chat_template()` to
`self.processing_class.apply_chat_template()`. `processing_class` could
be `tokenizer` or `processor`.


### Usage Example
The processor is used automatically to process images when the model's
processor supports that. The tokenizer is used if no processor is
available.
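
A minimal sketch of the fallback described above, assuming both objects expose `apply_chat_template`. The helper `pick_processing_class` and the two stub classes are hypothetical stand-ins for illustration; they are not the verl or Transformers implementations.

```python
def pick_processing_class(tokenizer, processor=None):
    """Prefer the multimodal processor; fall back to the tokenizer."""
    return processor if processor is not None else tokenizer


class StubTokenizer:
    """Text-only stand-in with the method name used in this PR."""

    def apply_chat_template(self, messages, **kwargs):
        return "text:" + " ".join(m["content"] for m in messages)


class StubProcessor(StubTokenizer):
    """Stand-in for a processor that also understands image inputs."""

    def apply_chat_template(self, messages, **kwargs):
        return "multimodal:" + " ".join(m["content"] for m in messages)


messages = [{"role": "user", "content": "describe the picture"}]
text_only = pick_processing_class(StubTokenizer()).apply_chat_template(messages)
with_images = pick_processing_class(
    StubTokenizer(), StubProcessor()
).apply_chat_template(messages)
```

Routing every call through one `processing_class` attribute keeps the rollout code identical for text-only and multimodal models.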

### Checklist Before Submitting

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] New CI unit test(s) are added to cover the code path.
- [X] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: xieck13 <xieck13@gmail.com>
2025-06-22 15:43:46 -07:00
e67ee86f8b [tool] feat: Add memory limit configuration for sandbox fusion (#2105) 2025-06-22 11:06:00 -07:00
c7aa5e845d [sglang] feat: Support async multi-turn rollout with simulation feedback in sglang (#1630) 2025-06-22 09:47:14 -07:00
dff6b96843 [ray] feat: add a test to demonstrate how to perform p2p communication inside worker group (#2131)

### What does this PR do?

As title

### Checklist Before Describing the Details

- [ ] Searched for similar PR(s).
- [ ] PR title is in the format of: `[modules] type: Title`
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data,
cfg`
  - type is in `feat, fix, refactor, chore, test`
- multiple modules are separated by `,` or space, such as `[megatron,
fsdp, doc] feat: xxx`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit run --show-diff-on-failure --color=always --all-files`
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-22 09:45:58 -07:00
ade658f48e [doc] fix: fix index rendering (#2127)
### What does this PR do?

fix the rendering

### Checklist Before Describing the Details

- [x] Searched for similar PR(s).
- [x] PR title is in the format of: `[modules] type: Title`
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data,
cfg`
  - type is in `feat, fix, refactor, chore, test`
- multiple modules are separated by `,` or space, such as `[megatron,
fsdp, doc] feat: xxx`



### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit run --show-diff-on-failure --color=always --all-files`
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-22 09:45:44 -07:00
9b7bb69ea3 [BREAKING][ci] feat: add CI request channel & improve PR template (#2126) 2025-06-21 20:33:31 -07:00
76f63cffa5 [fsdp] refactor: set actor's strategy as default for critic and ref (#2130)
### What does this PR do?

Set the actor's strategy as the default strategy for the critic, ref,
and reward model. In principle, all models should use the same strategy.
With this change, setting `STRATEGY=fsdp2` in `run_function_reward.sh`
makes all models use fsdp2, instead of requiring the strategy to be set
for each role individually.
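
The fallback described above can be sketched as follows. This is a minimal illustration and not verl's actual config code: `RoleConfig` and `resolve_strategies` are hypothetical names, and the assumption is that a role with no explicit strategy inherits the actor's.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RoleConfig:
    """Hypothetical per-role config; None means "inherit from the actor"."""

    strategy: Optional[str] = None


def resolve_strategies(actor_strategy: str, roles: Dict[str, RoleConfig]) -> Dict[str, str]:
    """Return the effective strategy of each role, defaulting to the actor's."""
    return {name: (cfg.strategy or actor_strategy) for name, cfg in roles.items()}


roles = {
    "critic": RoleConfig(),                       # inherits the actor's strategy
    "ref": RoleConfig(),                          # inherits the actor's strategy
    "reward_model": RoleConfig(strategy="fsdp"),  # explicit override is kept
}
resolved = resolve_strategies("fsdp2", roles)
print(resolved)
```

An explicit per-role value still takes precedence, so existing setups that set each role individually keep working under this scheme.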

### Checklist Before Describing the Details

- [x] Searched for similar PR(s).
- [x] PR title is in the format of: `[modules] type: Title`
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data,
cfg`
  - type is in `feat, fix, refactor, chore, test`
- multiple modules are separated by `,` or space, such as `[megatron,
fsdp, doc] feat: xxx`

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit run --show-diff-on-failure --color=always --all-files`
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that cover the code path.
2025-06-21 23:42:43 +08:00
9bc360aa97 [worker] feat: add support for dynamic batch size of multimodal data (#2049)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Add support for dynamic batch size (data packing) for multimodal datasets.

Add an example script
`examples/grpo_trainer/run_qwen2_5_vl-7b_seq_balance.sh`.
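
To illustrate the general idea of dynamic batch size, the sketch below groups samples into micro-batches under a token budget instead of using a fixed sample count. It is a generic first-fit-decreasing heuristic and not necessarily what this PR implements: `pack_by_token_budget` and `max_tokens` are hypothetical names, and `seq_lens` is assumed to already include any image placeholder tokens.

```python
from typing import List


def pack_by_token_budget(seq_lens: List[int], max_tokens: int) -> List[List[int]]:
    """Group sample indices into micro-batches whose token totals stay <= max_tokens."""
    batches: List[List[int]] = []
    totals: List[int] = []
    # Place long samples first so short ones can fill the remaining space.
    for idx in sorted(range(len(seq_lens)), key=lambda i: seq_lens[i], reverse=True):
        n = seq_lens[idx]
        if n > max_tokens:
            raise ValueError(f"sample {idx} has {n} tokens, above the budget {max_tokens}")
        for b, total in enumerate(totals):
            if total + n <= max_tokens:  # first micro-batch with enough room
                batches[b].append(idx)
                totals[b] += n
                break
        else:  # no existing micro-batch fits, so open a new one
            batches.append([idx])
            totals.append(n)
    return batches


seq_lens = [996, 102, 640, 380, 2048, 25, 512, 300]
batches = pack_by_token_budget(seq_lens, max_tokens=3000)
batch_totals = [sum(seq_lens[i] for i in batch) for batch in batches]
print(batches, batch_totals)
```

The number of samples per micro-batch varies, but the token count per micro-batch stays bounded, which is what keeps memory use predictable when sequence lengths differ widely.
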
### Test

The console log below is from training Qwen2.5-VL-7B with PPO on the
Geo3K dataset (`examples/grpo_trainer/run_qwen2_5_vl-7b_seq_balance.sh`).
The experiment was conducted on a single node with 8 NVIDIA A800 GPUs.

```
[2025-06-17 02:42:10] (WorkerDict pid=13539) Skipping monkey patch for Qwen2_5_VLForConditionalGeneration as use_fused_kernels is False or fused_kernels_backend is torch [repeated 7x across cluster]
[2025-06-17 02:42:10] (WorkerDict pid=13361) Model config after override: Qwen2_5_VLConfig {
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "architectures": [
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "Qwen2_5_VLForConditionalGeneration"
[2025-06-17 02:42:10] (WorkerDict pid=13361)   ],
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "attention_dropout": 0.0,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "eos_token_id": 151645,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "hidden_act": "silu",
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "hidden_size": 3584,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "image_token_id": 151655,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "initializer_range": 0.02,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "intermediate_size": 18944,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "max_position_embeddings": 128000,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "max_window_layers": 28,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "model_type": "qwen2_5_vl",
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "num_attention_heads": 28,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "num_hidden_layers": 28,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "num_key_value_heads": 4,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "pad_token_id": 151643,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "rms_norm_eps": 1e-06,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "rope_scaling": {
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "mrope_section": [
[2025-06-17 02:42:10] (WorkerDict pid=13361)       16,
[2025-06-17 02:42:10] (WorkerDict pid=13361)       24,
[2025-06-17 02:42:10] (WorkerDict pid=13361)       24
[2025-06-17 02:42:10] (WorkerDict pid=13361)     ],
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "rope_type": "default",
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "type": "default"
[2025-06-17 02:42:10] (WorkerDict pid=13361)   },
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "rope_theta": 1000000.0,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "sliding_window": 32768,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "tie_word_embeddings": false,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "torch_dtype": "bfloat16",
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "transformers_version": "4.51.0",
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "use_cache": true,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "use_sliding_window": false,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "video_token_id": 151656,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "vision_config": {
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "depth": 32,
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "fullatt_block_indexes": [
[2025-06-17 02:42:10] (WorkerDict pid=13361)       7,
[2025-06-17 02:42:10] (WorkerDict pid=13361)       15,
[2025-06-17 02:42:10] (WorkerDict pid=13361)       23,
[2025-06-17 02:42:10] (WorkerDict pid=13361)       31
[2025-06-17 02:42:10] (WorkerDict pid=13361)     ],
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "hidden_act": "silu",
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "hidden_size": 1280,
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "in_channels": 3,
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "in_chans": 3,
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "intermediate_size": 3420,
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "model_type": "qwen2_5_vl",
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "num_heads": 16,
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "out_hidden_size": 3584,
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "patch_size": 14,
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "spatial_merge_size": 2,
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "spatial_patch_size": 14,
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "temporal_patch_size": 2,
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "tokens_per_second": 2,
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "torch_dtype": "float32",
[2025-06-17 02:42:10] (WorkerDict pid=13361)     "window_size": 112
[2025-06-17 02:42:10] (WorkerDict pid=13361)   },
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "vision_end_token_id": 151653,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "vision_start_token_id": 151652,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "vision_token_id": 151654,
[2025-06-17 02:42:10] (WorkerDict pid=13361)   "vocab_size": 152064
[2025-06-17 02:42:10] (WorkerDict pid=13361) }
[2025-06-17 02:42:10] (WorkerDict pid=13361) 
[2025-06-17 02:42:10] (WorkerDict pid=13361) Monkey patch FlashAttention2.forward in Qwen2.5VL
[2025-06-17 02:42:10] (WorkerDict pid=13361) Monkey patch _flash_attention_forward in transformers.models.qwen2_5_vl.modeling_qwen2_5_vl
[2025-06-17 02:42:10] (WorkerDict pid=13361) Skipping monkey patch for Qwen2_5_VLForConditionalGeneration as use_fused_kernels is False or fused_kernels_backend is torch
[2025-06-17 02:42:10] (WorkerDict pid=13541) Monkey patch FlashAttention2.forward in Qwen2.5VL
[2025-06-17 02:42:10] (WorkerDict pid=13541) Monkey patch _flash_attention_forward in transformers.models.qwen2_5_vl.modeling_qwen2_5_vl
[2025-06-17 02:42:10] (WorkerDict pid=13541) Skipping monkey patch for Qwen2_5_VLForConditionalGeneration as use_fused_kernels is False or fused_kernels_backend is torch
[2025-06-17 02:42:10] (WorkerDict pid=13361) Qwen2_5_VLForConditionalGeneration contains 8.29B parameters
[2025-06-17 02:42:10] (WorkerDict pid=13361) wrap_policy: functools.partial(<function _or_policy at 0x7f8504485b40>, policies=[functools.partial(<function transformer_auto_wrap_policy at 0x7f8504485a20>, transformer_layer_cls={<class 'transformers.models.qwen2_5_vl.modeling_qwen2_5_vl.Qwen2_5_VLDecoderLayer'>, <class 'transformers.models.qwen2_5_vl.modeling_qwen2_5_vl.Qwen2_5_VLVisionBlock'>})])
[2025-06-17 02:42:10] (WorkerDict pid=13361) Total steps: 60, num_warmup_steps: 0
[2025-06-17 02:42:10] (WorkerDict pid=13361) Actor use_remove_padding=True
[2025-06-17 02:42:10] (WorkerDict pid=13361) Actor use_fused_kernels=False
[2025-06-17 02:42:10] (WorkerDict pid=13543) Monkey patch FlashAttention2.forward in Qwen2.5VL [repeated 6x across cluster]
[2025-06-17 02:42:10] (WorkerDict pid=13543) Monkey patch _flash_attention_forward in transformers.models.qwen2_5_vl.modeling_qwen2_5_vl [repeated 6x across cluster]
[2025-06-17 02:42:10] (WorkerDict pid=13543) Skipping monkey patch for Qwen2_5_VLForConditionalGeneration as use_fused_kernels is False or fused_kernels_backend is torch [repeated 6x across cluster]
[2025-06-17 02:42:10] (WorkerDict pid=13361) WARNING 06-16 18:40:12 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f830d065330>
[2025-06-17 02:42:10] (WorkerDict pid=13540) NCCL version 2.21.5+cuda12.4
Training Progress:   0%|          | 0/60 [00:00<?, ?it/s]
[2025-06-17 02:42:18] (WorkerDict pid=13539) /**********/envs/verl/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . [repeated 5x across cluster]
[2025-06-17 02:42:18] (WorkerDict pid=13539)   warnings.warn( [repeated 5x across cluster]
Training Progress:   2%|▏         | 1/60 [04:09<4:05:26, 249.60s/it]
[2025-06-17 02:46:27] (WorkerDict pid=13537) /**********/envs/verl/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . [repeated 2x across cluster]
[2025-06-17 02:46:27] (WorkerDict pid=13537)   warnings.warn( [repeated 2x across cluster]
Training Progress:   3%|▎         | 2/60 [08:04<3:52:47, 240.81s/it]
(TaskRunner pid=9331)
Training Progress:   5%|▌         | 3/60 [11:53<3:43:33, 235.33s/it]
(WorkerDict pid=13540) kwargs: {'n': 5, 'logprobs': 0, 'max_tokens': 2048, 'detokenize': False, 'temperature': 1.0, 'top_k': -1, 'top_p': 1, 'ignore_eos': False}
(WorkerDict pid=13539) WARNING 06-16 18:40:12 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f97d7cc92d0> [repeated 7x across cluster]
(WorkerDict pid=13542) NCCL version 2.21.5+cuda12.4 [repeated 2x across cluster]
(TaskRunner pid=9331) Using LocalLogger is deprecated. The constructor API will change
(WorkerDict pid=13539) kwargs: {'n': 5, 'logprobs': 0, 'max_tokens': 2048, 'detokenize': False, 'temperature': 1.0, 'top_k': -1, 'top_p': 1, 'ignore_eos': False} [repeated 5x across cluster]
(TaskRunner pid=9331) step:1 - global_seqlen/min:194004.000 - global_seqlen/max:215990.000 - global_seqlen/minmax_diff:21986.000 - global_seqlen/balanced_min:203335.000 - global_seqlen/balanced_max:203336.000 - global_seqlen/mean:203335.125 - actor/entropy:0.467 - training/rollout_probs_diff_max:0.378 - training/rollout_probs_diff_mean:0.005 - training/rollout_probs_diff_std:0.011 - actor/kl_loss:0.001 - actor/kl_coef:0.010 - actor/pg_loss:-0.005 - actor/pg_clipfrac:0.001 - actor/ppo_kl:-0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.230 - perf/mfu/actor:0.323 - perf/max_memory_allocated_gb:62.271 - perf/max_memory_reserved_gb:81.812 - perf/cpu_memory_used_gb:0.000 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.394 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.394 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.008 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.008 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:380.995 - response_length/max:2048.000 - response_length/min:25.000 - response_length/clip_ratio:0.007 - prompt_length/mean:254.428 - prompt_length/max:996.000 - prompt_length/min:102.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:66.493 - timing_s/reshard:1.879 - timing_s/gen:70.929 - timing_s/reward:3.603 - timing_s/old_log_prob:34.632 - timing_s/ref:33.643 - timing_s/adv:0.095 - timing_s/update_actor:95.425 - timing_s/step:238.697 - timing_per_token_ms/gen:0.073 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.021 - timing_per_token_ms/update_actor:0.059 - perf/total_num_tokens:1626681.000 - perf/time_per_step:238.697 - perf/throughput:851.856
(WorkerDict pid=13537) kwargs: {'n': 5, 'logprobs': 0, 'max_tokens': 2048, 'detokenize': False, 'temperature': 1.0, 'top_k': -1, 'top_p': 1, 'ignore_eos': False} [repeated 2x across cluster]
(TaskRunner pid=9331) step:2 - global_seqlen/min:190581.000 - global_seqlen/max:220843.000 - global_seqlen/minmax_diff:30262.000 - global_seqlen/balanced_min:209057.000 - global_seqlen/balanced_max:209058.000 - global_seqlen/mean:209057.500 - actor/entropy:0.458 - training/rollout_probs_diff_max:0.415 - training/rollout_probs_diff_mean:0.005 - training/rollout_probs_diff_std:0.011 - actor/kl_loss:0.001 - actor/kl_coef:0.010 - actor/pg_loss:0.017 - actor/pg_clipfrac:0.001 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.252 - perf/mfu/actor:0.327 - perf/max_memory_allocated_gb:62.280 - perf/max_memory_reserved_gb:85.205 - perf/cpu_memory_used_gb:0.000 - actor/lr:0.000 - training/global_step:2.000 - training/epoch:0.000 - critic/score/mean:0.403 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.403 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.016 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.016 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:390.521 - response_length/max:2048.000 - response_length/min:18.000 - response_length/clip_ratio:0.009 - prompt_length/mean:262.783 - prompt_length/max:996.000 - prompt_length/min:103.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:63.223 - timing_s/reshard:2.093 - timing_s/gen:70.164 - timing_s/reward:3.706 - timing_s/old_log_prob:30.945 - timing_s/ref:30.190 - timing_s/adv:0.088 - timing_s/update_actor:96.829 - timing_s/step:232.303 - timing_per_token_ms/gen:0.070 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.018 - timing_per_token_ms/update_actor:0.058 - perf/total_num_tokens:1672460.000 - perf/time_per_step:232.303 - perf/throughput:899.936
(TaskRunner pid=9331) step:3 - global_seqlen/min:197140.000 - global_seqlen/max:212951.000 - global_seqlen/minmax_diff:15811.000 - global_seqlen/balanced_min:205956.000 - global_seqlen/balanced_max:205957.000 - global_seqlen/mean:205956.250 - actor/entropy:0.418 - training/rollout_probs_diff_max:0.319 - training/rollout_probs_diff_mean:0.005 - training/rollout_probs_diff_std:0.011 - actor/kl_loss:0.005 - actor/kl_coef:0.010 - actor/pg_loss:0.065 - actor/pg_clipfrac:0.001 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.199 - perf/mfu/actor:0.332 - perf/max_memory_allocated_gb:62.414 - perf/max_memory_reserved_gb:85.205 - perf/cpu_memory_used_gb:0.000 - actor/lr:0.000 - training/global_step:3.000 - training/epoch:0.000 - critic/score/mean:0.392 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.392 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.004 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.004 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:379.654 - response_length/max:2048.000 - response_length/min:20.000 - response_length/clip_ratio:0.003 - prompt_length/mean:263.959 - prompt_length/max:776.000 - prompt_length/min:103.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:60.097 - timing_s/reshard:2.019 - timing_s/gen:69.763 - timing_s/reward:3.414 - timing_s/old_log_prob:30.005 - timing_s/ref:30.284 - timing_s/adv:0.090 - timing_s/update_actor:93.705 - timing_s/step:227.641 - timing_per_token_ms/gen:0.072 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.018 - timing_per_token_ms/update_actor:0.057 - perf/total_num_tokens:1647650.000 - perf/time_per_step:227.641 - perf/throughput:904.741
(TaskRunner pid=9331)
Training Progress:   7%|▋         | 4/60 [15:41<3:37:00, 232.51s/it]
(TaskRunner pid=9331) step:4 - global_seqlen/min:190149.000 - global_seqlen/max:224987.000 - global_seqlen/minmax_diff:34838.000 - global_seqlen/balanced_min:207060.000 - global_seqlen/balanced_max:207061.000 - global_seqlen/mean:207060.250 - actor/entropy:0.429 - training/rollout_probs_diff_max:0.299 - training/rollout_probs_diff_mean:0.004 - training/rollout_probs_diff_std:0.011 - actor/kl_loss:0.002 - actor/kl_coef:0.010 - actor/pg_loss:0.036 - actor/pg_clipfrac:0.001 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.210 - perf/mfu/actor:0.330 - perf/max_memory_allocated_gb:62.977 - perf/max_memory_reserved_gb:87.430 - perf/cpu_memory_used_gb:0.000 - actor/lr:0.000 - training/global_step:4.000 - training/epoch:0.000 - critic/score/mean:0.406 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.406 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.019 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.019 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:392.973 - response_length/max:2048.000 - response_length/min:25.000 - response_length/clip_ratio:0.010 - prompt_length/mean:254.090 - prompt_length/max:996.000 - prompt_length/min:103.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:64.229 - timing_s/reshard:2.136 - timing_s/gen:71.688 - timing_s/reward:3.684 - timing_s/old_log_prob:28.621 - timing_s/ref:28.663 - timing_s/adv:0.088 - timing_s/update_actor:94.804 - timing_s/step:227.898 - timing_per_token_ms/gen:0.071 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.017 - timing_per_token_ms/update_actor:0.057 - perf/total_num_tokens:1656482.000 - perf/time_per_step:227.898 - perf/throughput:908.567
(TaskRunner pid=9331) test_gen_batch meta info: {'eos_token_id': 151645, 'pad_token_id': 151643, 'recompute_log_prob': False, 'do_sample': False, 'validate': True}
(TaskRunner pid=9331) validation generation end
(TaskRunner pid=9331) [prompt] system
(TaskRunner pid=9331) You are a helpful assistant.
(TaskRunner pid=9331) user
(TaskRunner pid=9331) Chords $\overline{A C}$ and $\overline{D F}$ are equidistant from the center. If the radius of $\odot G$ is 26 find $A C$ You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}.
(TaskRunner pid=9331) assistant
(TaskRunner pid=9331)
(TaskRunner pid=9331) [response] <think>Let's start by analyzing the given information and the diagram. We know that chords \(\overline{AC}\) and \(\overline{DF}\) are equidistant from the center \(G\). This means that the perpendicular distances from \(G\) to \(\overline{AC}\) and \(\overline{DF}\) are the same.
(TaskRunner pid=9331)
(TaskRunner pid=9331) First, let's consider the radius of the circle, which is 26. The perpendicular distance from the center \(G\) to a chord is the length of the segment from \(G\) to the midpoint of the chord. This distance is also the length of the segment from \(G\) to the chord, which is the same for both chords \(\overline{AC}\) and \(\overline{DF}\).
(TaskRunner pid=9331)
(TaskRunner pid=9331) Since the chords are equidistant from the center, the perpendicular distance from \(G\) to \(\overline{AC}\) is the same as the perpendicular distance from \(G\) to \(\overline{DF}\). This distance is 10, as given in the problem.
(TaskRunner pid=9331)
(TaskRunner pid=9331) Now, let's use the Pythagorean theorem in the right triangle formed by the radius, the perpendicular distance, and half the length of the chord. The radius is 26, the perpendicular distance is 10, and half the length of the chord is \( \frac{AC}{2} \).
(TaskRunner pid=9331)
(TaskRunner pid=9331) The Pythagorean theorem states:
(TaskRunner pid=9331) \[ 26^2 = 10^2 + \left( \frac{AC}{2} \right)^2 \]
(TaskRunner pid=9331) \[ 676 = 100 + \left( \frac{AC}{2} \right)^2 \]
(TaskRunner pid=9331) \[ 576 = \left( \frac{AC}{2} \right)^2 \]
(TaskRunner pid=9331) \[ \frac{AC}{2} = \sqrt{576} \]
(TaskRunner pid=9331) \[ \frac{AC}{2} = 24 \]
(TaskRunner pid=9331) \[ AC = 48 \]
(TaskRunner pid=9331)
(TaskRunner pid=9331) So, the length of \(AC\) is 48.</think>
(TaskRunner pid=9331) \boxed{48}
(TaskRunner pid=9331) [ground_truth] 48
(TaskRunner pid=9331) [score] 1.0
(TaskRunner pid=9331)
Training Progress:   8%|▊         | 5/60 [20:34<3:53:09, 254.36s/it]
(TaskRunner pid=9331)
Training Progress:  10%|█         | 6/60 [24:24<3:41:25, 246.02s/it]
(TaskRunner pid=9331) step:5 - global_seqlen/min:196253.000 - global_seqlen/max:210637.000 - global_seqlen/minmax_diff:14384.000 - global_seqlen/balanced_min:205432.000 - global_seqlen/balanced_max:205432.000 - global_seqlen/mean:205432.000 - actor/entropy:0.383 - training/rollout_probs_diff_max:0.349 - training/rollout_probs_diff_mean:0.004 - training/rollout_probs_diff_std:0.011 - actor/kl_loss:0.003 - actor/kl_coef:0.010 - actor/pg_loss:-0.022 - actor/pg_clipfrac:0.001 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.218 - perf/mfu/actor:0.327 - perf/max_memory_allocated_gb:62.977 - perf/max_memory_reserved_gb:87.430 - perf/cpu_memory_used_gb:0.000 - actor/lr:0.000 - val-aux/hiyouga/geometry3k/reward/mean@1:0.450 - val-aux/hiyouga/geometry3k/reward/mean@24:0.550 - val-aux/hiyouga/geometry3k/reward/std@24:0.450 - val-aux/hiyouga/geometry3k/reward/best@2/mean:0.637 - val-aux/hiyouga/geometry3k/reward/best@2/std:0.238 - val-aux/hiyouga/geometry3k/reward/worst@2/mean:0.367 - val-aux/hiyouga/geometry3k/reward/worst@2/std:0.229 - val-aux/hiyouga/geometry3k/reward/best@4/mean:0.789 - val-aux/hiyouga/geometry3k/reward/best@4/std:0.255 - val-aux/hiyouga/geometry3k/reward/worst@4/mean:0.137 - val-aux/hiyouga/geometry3k/reward/worst@4/std:0.153 - val-aux/hiyouga/geometry3k/reward/best@8/mean:0.964 - val-aux/hiyouga/geometry3k/reward/best@8/std:0.118 - val-aux/hiyouga/geometry3k/reward/worst@8/mean:0.097 - val-aux/hiyouga/geometry3k/reward/worst@8/std:0.056 - val-aux/hiyouga/geometry3k/reward/best@16/mean:1.000 - val-aux/hiyouga/geometry3k/reward/best@16/std:0.000 - val-aux/hiyouga/geometry3k/reward/worst@16/mean:0.064 - val-aux/hiyouga/geometry3k/reward/worst@16/std:0.022 - val-aux/hiyouga/geometry3k/reward/best@24/mean:1.000 - val-aux/hiyouga/geometry3k/reward/best@24/std:0.000 - val-aux/hiyouga/geometry3k/reward/worst@24/mean:0.100 - val-aux/hiyouga/geometry3k/reward/worst@24/std:0.000 - val-aux/hiyouga/geometry3k/reward/mean@14:0.550 - 
val-aux/hiyouga/geometry3k/reward/std@14:0.450 - val-aux/hiyouga/geometry3k/reward/best@14/mean:1.000 - val-aux/hiyouga/geometry3k/reward/best@14/std:0.000 - val-aux/hiyouga/geometry3k/reward/worst@14/mean:0.100 - val-aux/hiyouga/geometry3k/reward/worst@14/std:0.000 - val-aux/hiyouga/geometry3k/reward/mean@2:0.548 - val-aux/hiyouga/geometry3k/reward/std@2:0.210 - val-aux/hiyouga/geometry3k/reward/mean@3:0.455 - val-aux/hiyouga/geometry3k/reward/std@3:0.309 - val-aux/hiyouga/geometry3k/reward/best@3/mean:0.664 - val-aux/hiyouga/geometry3k/reward/best@3/std:0.192 - val-aux/hiyouga/geometry3k/reward/worst@3/mean:0.231 - val-aux/hiyouga/geometry3k/reward/worst@3/std:0.235 - val-aux/hiyouga/geometry3k/reward/mean@6:0.475 - val-aux/hiyouga/geometry3k/reward/std@6:0.437 - val-aux/hiyouga/geometry3k/reward/best@6/mean:0.958 - val-aux/hiyouga/geometry3k/reward/best@6/std:0.174 - val-aux/hiyouga/geometry3k/reward/worst@6/mean:0.105 - val-aux/hiyouga/geometry3k/reward/worst@6/std:0.061 - val-core/hiyouga/geometry3k/reward/mean@26:0.612 - val-aux/hiyouga/geometry3k/reward/std@26:0.454 - val-core/hiyouga/geometry3k/reward/best@26/mean:1.000 - val-core/hiyouga/geometry3k/reward/best@26/std:0.000 - val-aux/hiyouga/geometry3k/reward/worst@26/mean:0.012 - val-aux/hiyouga/geometry3k/reward/worst@26/std:0.032 - val-aux/hiyouga/geometry3k/reward/mean@8:0.438 - val-aux/hiyouga/geometry3k/reward/std@8:0.420 - val-aux/hiyouga/geometry3k/reward/mean@5:0.460 - val-aux/hiyouga/geometry3k/reward/std@5:0.400 - val-aux/hiyouga/geometry3k/reward/best@5/mean:0.856 - val-aux/hiyouga/geometry3k/reward/best@5/std:0.255 - val-aux/hiyouga/geometry3k/reward/worst@5/mean:0.135 - val-aux/hiyouga/geometry3k/reward/worst@5/std:0.134 - val-aux/hiyouga/geometry3k/reward/mean@9:0.300 - val-aux/hiyouga/geometry3k/reward/std@9:0.374 - val-aux/hiyouga/geometry3k/reward/best@9/mean:0.908 - val-aux/hiyouga/geometry3k/reward/best@9/std:0.272 - val-aux/hiyouga/geometry3k/reward/worst@9/mean:0.100 - 
val-aux/hiyouga/geometry3k/reward/worst@9/std:0.000 - val-aux/hiyouga/geometry3k/reward/mean@4:0.100 - val-aux/hiyouga/geometry3k/reward/std@4:0.000 - training/global_step:5.000 - training/epoch:1.000 - critic/score/mean:0.388 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.388 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.010 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.010 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:384.739 - response_length/max:2048.000 - response_length/min:18.000 - response_length/clip_ratio:0.007 - prompt_length/mean:257.236 - prompt_length/max:996.000 - prompt_length/min:103.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:63.336 - timing_s/reshard:2.105 - timing_s/gen:69.227 - timing_s/reward:3.572 - timing_s/old_log_prob:29.942 - timing_s/ref:29.623 - timing_s/adv:0.087 - timing_s/update_actor:94.945 - timing_s/testing:51.987 - timing_s/step:279.773 - timing_per_token_ms/gen:0.070 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.018 - timing_per_token_ms/update_actor:0.058 - perf/total_num_tokens:1643456.000 - perf/time_per_step:279.773 - perf/throughput:734.280
(TaskRunner pid=9331) step:6 - global_seqlen/min:200473.000 - global_seqlen/max:216599.000 - global_seqlen/minmax_diff:16126.000 - global_seqlen/balanced_min:207366.000 - global_seqlen/balanced_max:207367.000 - global_seqlen/mean:207366.250 - actor/entropy:0.346 - training/rollout_probs_diff_max:0.239 - training/rollout_probs_diff_mean:0.004 - training/rollout_probs_diff_std:0.011 - actor/kl_loss:0.004 - actor/kl_coef:0.010 - actor/pg_loss:0.013 - actor/pg_clipfrac:0.001 - actor/ppo_kl:-0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.257 - perf/mfu/actor:0.328 - perf/max_memory_allocated_gb:62.977 - perf/max_memory_reserved_gb:87.430 - perf/cpu_memory_used_gb:0.000 - actor/lr:0.000 - training/global_step:6.000 - training/epoch:1.000 - critic/score/mean:0.443 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.443 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.010 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.010 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:381.082 - response_length/max:2048.000 - response_length/min:22.000 - response_length/clip_ratio:0.005 - prompt_length/mean:266.938 - prompt_length/max:996.000 - prompt_length/min:102.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:60.989 - timing_s/reshard:1.788 - timing_s/gen:67.473 - timing_s/reward:3.320 - timing_s/old_log_prob:30.357 - timing_s/ref:31.241 - timing_s/adv:0.090 - timing_s/update_actor:95.860 - timing_s/step:228.698 - timing_per_token_ms/gen:0.069 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.019 - timing_per_token_ms/update_actor:0.058 - perf/total_num_tokens:1658930.000 - perf/time_per_step:228.698 - perf/throughput:906.726
(TaskRunner pid=9331)
Training Progress:  12%|█▏        | 7/60 [28:12<3:32:05, 240.11s/it]
(TaskRunner pid=9331)
Training Progress:  13%|█▎        | 8/60 [31:55<3:23:25, 234.72s/it]
(TaskRunner pid=9331)
Training Progress:  15%|█▌        | 9/60 [35:50<3:19:30, 234.71s/it]
... ...
```
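
As a sanity check on the metrics in the log above, `perf/throughput` appears to be consistent with tokens per second per GPU, i.e. `perf/total_num_tokens / perf/time_per_step / 8`. This reading is an inference from the logged numbers, not something stated in the PR. The snippet recomputes it for steps 1 and 2 with values copied from the log.

```python
N_GPUS = 8  # single node with 8 GPUs, as stated in the test description

# step -> (perf/total_num_tokens, perf/time_per_step in seconds, logged perf/throughput)
steps = {
    1: (1626681, 238.697, 851.856),
    2: (1672460, 232.303, 899.936),
}


def tokens_per_second_per_gpu(total_tokens: int, seconds: float, n_gpus: int) -> float:
    """Assumed definition of throughput: tokens processed per second on each GPU."""
    return total_tokens / seconds / n_gpus


estimates = {
    step: tokens_per_second_per_gpu(tokens, seconds, N_GPUS)
    for step, (tokens, seconds, _) in steps.items()
}
for step, (_, _, logged) in steps.items():
    print(f"step {step}: estimated {estimates[step]:.3f}, logged {logged:.3f}")
```

The small residual difference is expected because the logged step time is rounded to three decimals.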

### Usage Example

```bash
bash examples/grpo_trainer/run_qwen2_5_vl-7b_seq_balance.sh
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Rely on existing unit tests on CI that cover the code path.
- [ ] New CI unit test(s) are added to cover the code path.
2025-06-21 20:49:24 +08:00
0fd4d0ff6a [cfg, perf] refactor: add omega_conf_to_dataclass API, rename WorkerProfiler to DistProfiler, add unit test based on ProfilerConfig (#2117)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Previously, most individual components in verl took an OmegaConf dict
as one of their inputs, making it tedious to set up unit tests. verl is
now gradually introducing a dataclass for each submodule's configuration,
with `verl.utils.omega_conf_to_dataclass` to make the conversion easier.
This PR also provides example unit tests showing how standalone classes
that take a config as input should be tested before they are used
end-to-end. Finally, this PR renames WorkerProfiler to DistProfiler for
clarity.

### Test

Test cases for configuration utilities on CPU:

1. Test basic OmegaConf to dataclass conversion for simple nested
structures
2. Test nested OmegaConf to dataclass conversion for complex
hierarchical configurations
3. Verify all configuration values are correctly converted and
accessible

Test suite for NsightSystemsProfiler functionality:

1. Initialization: Verify profiler state after creation
2. Basic Profiling: Test start/stop functionality
3. Discrete Mode: Test discrete profiling behavior
4. Annotation: Test the annotate decorator in both normal and discrete
modes
5. Config Validation: Verify proper config initialization from OmegaConf


### Usage Example

> Provide usage example(s) for easier usage.

```python
def omega_conf_to_dataclass(config: Union[DictConfig, dict], dataclass_type: Type[Any]) -> Any:
    """
    Convert an OmegaConf DictConfig to a dataclass.

    Args:
        config: The OmegaConf DictConfig or dict to convert.
        dataclass_type: The dataclass type to convert to.

    Returns:
        The dataclass instance.
    """
```
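The conversion idea can be pictured with a standard-library-only sketch. This is not verl's implementation: `dict_to_dataclass`, `ToolConfigSketch`, and `ProfilerConfigSketch` are hypothetical names used for illustration, and the sketch takes a plain nested `dict` rather than an OmegaConf `DictConfig`.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Type


@dataclass
class ToolConfigSketch:
    # Hypothetical nested config, for illustration only.
    discrete: bool = False


@dataclass
class ProfilerConfigSketch:
    # Hypothetical top-level config, for illustration only.
    all_ranks: bool = False
    ranks: list = field(default_factory=list)
    tool: ToolConfigSketch = field(default_factory=ToolConfigSketch)


def dict_to_dataclass(config: dict, dataclass_type: Type[Any]) -> Any:
    """Recursively build `dataclass_type` from a plain nested dict."""
    kwargs = {}
    for f in fields(dataclass_type):
        if f.name not in config:
            continue  # keep the dataclass default for missing keys
        value = config[f.name]
        # Assumes annotations are real classes (no string annotations).
        if is_dataclass(f.type) and isinstance(value, dict):
            value = dict_to_dataclass(value, f.type)
        kwargs[f.name] = value
    return dataclass_type(**kwargs)


cfg = dict_to_dataclass({"all_ranks": True, "tool": {"discrete": True}}, ProfilerConfigSketch)
```

A class that accepts such a dataclass can then be unit-tested by constructing the config directly, without building a full OmegaConf tree.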

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-20 11:41:08 -07:00
H
c87e91b2ef [ci] test: inspect the type annotation of newly added code, focusing on func defs (#2113)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Per https://github.com/volcengine/verl/discussions/2112, type annotations
should be encouraged to improve readability.
In previous PRs, the type check script did not really take effect
(it was either too strict or too loose). In this PR, the check is limited
to function definitions only, with a default threshold. By default, on CI
it inspects only the files changed in the current PR. For reference, below
is a glimpse of the failure cases if we force it to inspect all files under
`verl`.

Upon failure, it prints:
```
f"Please add type annotations for inputs and outputs to meet threshold {args.threshold}. 
Cases exempt from checking:"
"1. Private methods."
"2. Args with name in ('self', 'cls'), or *args / **kwargs"
"3. Files under tests/"
```

```
verl/trainer/main_generation.py:44: def main(config):
verl/trainer/main_generation.py:48: def run_generation(config) -> None:
verl/trainer/main_generation.py:60: def main_task(config):
verl/trainer/main_eval.py:33: def process_item(reward_fn, data_source, response_lst, reward_data):
verl/trainer/main_eval.py:40: def main(config):
verl/trainer/main_ppo.py:26: def main(config):
verl/trainer/main_ppo.py:31: def run_ppo(config) -> None:
verl/trainer/main_ppo.py:182: def create_rl_dataset(data_paths, data_config, tokenizer, processor):
verl/trainer/main_ppo.py:224: def create_rl_sampler(data_config, dataset):
verl/trainer/main_ppo.py:57: def run(self, config):
verl/trainer/fsdp_sft_trainer.py:71: def extract_step(path):
verl/trainer/fsdp_sft_trainer.py:549: def run_sft(config):
verl/trainer/fsdp_sft_trainer.py:572: def main(config):
verl/trainer/fsdp_sft_trainer.py:576: def create_sft_dataset(data_paths, data_config, tokenizer):
verl/trainer/fsdp_sft_trainer.py:384: def training_step(self, batch: TensorDict):
verl/trainer/fsdp_sft_trainer.py:433: def validation_step(self, batch: TensorDict):
verl/trainer/fsdp_sft_trainer.py:444: def save_checkpoint(self, step):
verl/trainer/fsdp_sft_trainer.py:486: def fit(self):
verl/trainer/ppo/reward.py:25: def get_custom_reward_fn(config):
verl/trainer/ppo/reward.py:60: def load_reward_manager(config, tokenizer, num_examine, **reward_kwargs):
verl/trainer/ppo/reward.py:111: def compute_reward(data: DataProto, reward_fn):
verl/trainer/ppo/reward.py:133: def compute_reward_async(data: DataProto, config, tokenizer):
verl/trainer/ppo/reward.py:54: def wrapped_fn(*args, **kwargs):
verl/trainer/ppo/ray_trainer.py:132: def apply_kl_penalty(data: DataProto, kl_ctrl: core_algos.AdaptiveKLController, kl_penalty="kl", multi_turn=Fals
verl/trainer/ppo/ray_trainer.py:181: def compute_response_mask(data: DataProto):
verl/trainer/ppo/ray_trainer.py:199: def compute_advantage(data: DataProto, adv_estimator, gamma=1.0, lam=1.0, num_repeat=1, multi_turn=False, norm_a
verl/trainer/ppo/ray_trainer.py:89: def create_resource_pool(self):
verl/trainer/ppo/ray_trainer.py:710: def init_workers(self):
verl/trainer/ppo/ray_trainer.py:892: def fit(self):
verl/trainer/ppo/ray_trainer.py:381: def check_mutually_exclusive(mbs, mbs_per_gpu, name: str):
verl/trainer/ppo/core_algos.py:34: def register_adv_est(name_or_enum):
verl/trainer/ppo/core_algos.py:53: def get_adv_estimator_fn(name_or_enum):
verl/trainer/ppo/core_algos.py:116: def get_kl_controller(kl_ctrl):
verl/trainer/ppo/core_algos.py:127: def compute_gae_advantage_return(
verl/trainer/ppo/core_algos.py:174: def compute_grpo_outcome_advantage(
verl/trainer/ppo/core_algos.py:231: def compute_grpo_passk_outcome_advantage(
verl/trainer/ppo/core_algos.py:291: def compute_reinforce_plus_plus_baseline_outcome_advantage(token_level_rewards: torch.Tensor, response_mask: torch.Tensor, 
verl/trainer/ppo/core_algos.py:336: def compute_rloo_outcome_advantage(token_level_rewards: torch.Tensor, response_mask: torch.Tensor, index: np.ndarray, 
verl/trainer/ppo/core_algos.py:379: def compute_opo_outcome_advantage(token_level_rewards: torch.Tensor, response_mask: torch.Tensor, index: np.ndarray, 
verl/trainer/ppo/core_algos.py:426: def compute_reinforce_plus_plus_outcome_advantage(token_level_rewards: torch.Tensor, response_mask: torch.Tensor, 
verl/trainer/ppo/core_algos.py:463: def compute_remax_outcome_advantage(token_level_rewards: torch.Tensor, reward_baselines: torch.Tensor, response_mask: 
verl/trainer/ppo/core_algos.py:492: def compute_rewards(token_level_scores, old_log_prob, ref_log_prob, kl_ratio):
verl/trainer/ppo/core_algos.py:497: def agg_loss(loss_mat: torch.Tensor, loss_mask: torch.Tensor, loss_agg_mode: str):
verl/trainer/ppo/core_algos.py:533: def compute_policy_loss(
verl/trainer/ppo/core_algos.py:599: def compute_entropy_loss(logits, response_mask, loss_agg_mode: str = "token-mean"):
verl/trainer/ppo/core_algos.py:616: def compute_value_loss(vpreds: torch.Tensor, returns: torch.Tensor, values: torch.Tensor, response_mask: torch.Tensor, 
verl/trainer/ppo/core_algos.py:651: def kl_penalty(logprob: torch.FloatTensor, ref_logprob: torch.FloatTensor, kl_penalty) -> torch.FloatTensor:
verl/trainer/ppo/core_algos.py:689: def compute_pf_ppo_reweight_data(
verl/trainer/ppo/core_algos.py:43: def decorator(fn):
verl/trainer/ppo/core_algos.py:99: def update(self, current_kl, n_steps):
verl/trainer/ppo/core_algos.py:112: def update(self, current_kl, n_steps):
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:329: def init_cache_engine(self):
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:334: def free_cache_engine(self):
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:355: def from_engine_args(
```

### Usage Example

For current git diffs compared to `main`:
```
python3 tests/special_sanity/type_coverage_check.py 
```
To inspect all files under `verl/`:
```
find verl -type f -name "*.py" | xargs -n 1 python3 tests/special_sanity/type_coverage_check.py --all-lines --debug --target-file
```
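The kind of check described above can be sketched with the standard `ast` module. This is an illustrative approximation, not the contents of `type_coverage_check.py`: the function name `annotation_coverage` is hypothetical, "private" is assumed to mean a leading underscore, and the changed-lines filtering and threshold handling of the real script are not reproduced.

```python
import ast


def annotation_coverage(source: str) -> float:
    """Return the fraction of checked function defs that are fully annotated."""
    total = annotated = 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("_"):
            continue  # exemption: private functions and methods
        # Exemption: skip self/cls. *args/**kwargs are stored separately
        # (node.args.vararg / node.args.kwarg), so they are never checked.
        params = [
            a
            for a in node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            if a.arg not in ("self", "cls")
        ]
        total += 1
        if node.returns is not None and all(a.annotation is not None for a in params):
            annotated += 1
    return annotated / total if total else 1.0


sample = '''
def main(config):
    pass

def run(self, config: dict) -> None:
    pass

def _helper(x):
    pass
'''
coverage = annotation_coverage(sample)
```

Here `main` lacks annotations, `run` is fully annotated, and `_helper` is exempt, so the sample yields a coverage of one half.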

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-21 00:47:23 +08:00
H
92f9381ed0 [ci] test: enforce API docstring checks (#2114)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Any function or class included in `__all__` must have an associated
docstring.
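A static version of this rule can be sketched with `ast`. This is only an assumption about how such a check might look, not the test added by this PR; `missing_docstrings` is a hypothetical helper, and it only handles an `__all__` assigned as a literal list at module level.

```python
import ast


def missing_docstrings(source: str) -> list:
    """List names exported via `__all__` whose def/class lacks a docstring."""
    tree = ast.parse(source)
    exported = set()
    for node in tree.body:
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "__all__" for t in node.targets
        ):
            # Assumes __all__ is a literal list or tuple of strings.
            exported.update(ast.literal_eval(node.value))
    return sorted(
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and node.name in exported
        and ast.get_docstring(node) is None
    )


sample = '''
__all__ = ["documented", "undocumented"]

def documented():
    """Has a docstring."""

def undocumented():
    pass

def not_exported():
    pass
'''
missing = missing_docstrings(sample)
```

Only `undocumented` is reported: `documented` has a docstring and `not_exported` is outside `__all__`.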


### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-21 00:46:35 +08:00
b1cdef84b5 [recipe] feat: Move entropy reward to the entropy recipe (#2118)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?
Move the entropy reward to the entropy recipe, and kl_cov and clip_cov to
the README.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: Jiacheng Chen <jackchan9345@gmail.com>
Co-authored-by: H <linhaibin.eric@gmail.com>
2025-06-20 17:27:40 +08:00
a3498c9fa8 [rollout] fix: fix rollout key not found (#2116)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Fix the rollout `multi_turn.format` key-not-found error.

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-20 15:25:03 +08:00
OC
6642bb2eae [rollout] fix: error in sglang async mode (#2098)
Fixed regression from:
 - https://github.com/volcengine/verl/pull/1668
 - https://github.com/volcengine/verl/pull/1933

Added e2e tests for both sglang and vllm async modes
2025-06-19 19:22:57 -07:00
39b7250b0a [recipe] feat: integrate entropy-mechanism recipe: Clip-Cov and KL-Cov methods (#1830)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add support for the Clip-Cov and KL-Cov methods from the paper "The
Entropy Mechanism of Reinforcement Learning for Reasoning Language
Models". Also add the verifier used in the paper.

### Specific Changes

In `core_algos.py`, we add the Clip-Cov and KL-Cov losses:
```
def compute_policy_loss_clip_cov(
    old_log_prob,
    log_prob,
    advantages,
    response_mask,
    cliprange=None,
    cliprange_low=None,
    cliprange_high=None,
    loss_agg_mode="token-mean",
    clip_ratio=0.0002,
    clip_cov_lb=1.0,
    clip_cov_ub=5.0,
):
    """
    Compute the clipped policy objective and related metrics for Clip-Cov.
    Adapted from
    https://github.com/PRIME-RL/Entropy-Mechanism-of-RL/blob/main/verl/trainer/ppo/core_algos.py
    Args:
        old_log_prob (torch.Tensor):
            Log-probabilities of actions under the old policy, shape (batch_size, response_length).
        log_prob (torch.Tensor):
            Log-probabilities of actions under the current policy, shape (batch_size, response_length).
        advantages (torch.Tensor):
            Advantage estimates for each action, shape (batch_size, response_length).
        response_mask (torch.Tensor):
            Mask indicating which tokens to include in the loss, shape (batch_size, response_length).
        cliprange (float, optional):
            Clipping parameter ε for standard PPO. See https://arxiv.org/abs/1707.06347.
            Defaults to None (must be provided).
        cliprange_low (float, optional):
            Lower clip range for dual-clip PPO. Defaults to same as `cliprange`.
        cliprange_high (float, optional):
            Upper clip range for dual-clip PPO. Defaults to same as `cliprange`.
        loss_agg_mode (str, optional):
            Aggregation mode for `agg_loss`. Defaults to "token-mean".
        clip_ratio (float, optional):
            Ratio for clipping the covariance. Defaults to 0.0002.
        clip_cov_lb (float, optional):
            Lower bound for clipping covariance. Defaults to 1.0.
        clip_cov_ub (float, optional):
            Upper bound for clipping covariance. Defaults to 5.0.
    """
    assert clip_ratio > 0, "clip_ratio should be larger than 0."
    negative_approx_kl = log_prob - old_log_prob
    ratio = torch.exp(negative_approx_kl)
    ppo_kl = verl_F.masked_mean(-negative_approx_kl, response_mask)

    pg_losses1 = -advantages * ratio

    if cliprange_low is None:
        cliprange_low = cliprange
    if cliprange_high is None:
        cliprange_high = cliprange

    corr = torch.ones_like(advantages)
    pg_losses2 = -advantages * torch.clamp(ratio, 1 - cliprange_low, 1 + cliprange_high)
    clip_by_origin = (pg_losses2 > pg_losses1) & (response_mask > 0)

    cov_all = (advantages - verl_F.masked_mean(advantages, response_mask)) * (
        log_prob - verl_F.masked_mean(log_prob.detach(), response_mask)
    )
    cov_all[response_mask == 0] = -torch.inf
    cov_all[clip_by_origin] = -torch.inf

    clip_num = max(int(clip_ratio * response_mask.sum().item()), 1)
    top_k_idx = (cov_all < clip_cov_ub) & (cov_all > clip_cov_lb) & (response_mask > 0)
    top_k_idx = torch.nonzero(top_k_idx)

    if len(top_k_idx) > 0:
        perm = torch.randperm(len(top_k_idx))
        top_k_idx = top_k_idx[perm[:min(clip_num, len(top_k_idx))]]
    else:
        top_k_idx = torch.empty((0, 2), device=cov_all.device, dtype=torch.long)

    corr[top_k_idx[:, 0], top_k_idx[:, 1]] = 0

    pg_clipfrac = verl_F.masked_mean((corr==0).float(), response_mask)

    pg_losses = torch.maximum(pg_losses1, pg_losses2) * corr
    pg_loss = agg_loss(loss_mat=pg_losses, loss_mask=response_mask, loss_agg_mode=loss_agg_mode)

    return pg_loss, pg_clipfrac, ppo_kl, torch.tensor(0.)


def compute_policy_loss_kl_cov(
    old_log_prob,
    log_prob,
    advantages,
    response_mask,
    loss_agg_mode="token-mean",
    k_ratio=0.0002,
    ppo_kl_coef=1,
):
    """
    Compute the policy objective and related metrics for KL-Cov, which adds a
    KL penalty to the tokens with the largest covariance.
    Adapted from
    https://github.com/PRIME-RL/Entropy-Mechanism-of-RL/blob/main/verl/trainer/ppo/core_algos.py
    Args:
        old_log_prob (torch.Tensor):
            Log-probabilities of actions under the old policy, shape (batch_size, response_length).
        log_prob (torch.Tensor):
            Log-probabilities of actions under the current policy, shape (batch_size, response_length).
        advantages (torch.Tensor):
            Advantage estimates for each action, shape (batch_size, response_length).
        response_mask (torch.Tensor):
            Mask indicating which tokens to include in the loss, shape (batch_size, response_length).
        loss_agg_mode (str, optional):
            Aggregation mode for `agg_loss`. Defaults to "token-mean".
        k_ratio (float, optional):
            Ratio for selecting the top-k covariance values. Defaults to 0.0002.
        ppo_kl_coef (float, optional):
            Coefficient for the KL penalty term in the loss. Defaults to 1.
    """
    assert k_ratio > 0, "k_ratio should be larger than 0."
    negative_approx_kl = log_prob - old_log_prob
    abs_kl = negative_approx_kl.abs()
    ratio = torch.exp(negative_approx_kl)
    ppo_kl_abs = verl_F.masked_mean(negative_approx_kl.abs(), response_mask)
    pg_losses1 = -advantages * ratio
    pg_losses_kl = - advantages * ratio + ppo_kl_coef * abs_kl
    pg_losses = pg_losses1

    all_valid = (response_mask > 0)
    all_valid_idx = torch.nonzero(all_valid.reshape(-1), as_tuple=True)[0] 
    all_valid_adv = advantages[all_valid].detach().reshape(-1).cpu()
    all_valid_logp = log_prob[all_valid].detach().reshape(-1).cpu()

    # k_ratio > 0 is asserted above, so k is zero only when there are no valid tokens.
    k = min(k_ratio, len(all_valid_adv))

    if k != 0:
        cov_lst_all = (all_valid_adv - all_valid_adv.mean()) * (all_valid_logp - all_valid_logp.mean())
        k_percent_nums = max(1, int(len(cov_lst_all) * k_ratio))
        large_cov_idxs = torch.topk(cov_lst_all, k_percent_nums, largest=True).indices

        if len(large_cov_idxs) != 0:
            large_cov_idxs = all_valid_idx[large_cov_idxs]
            # Convert flat indices back to (row, column) positions.
            rows = large_cov_idxs // advantages.shape[1]
            cols = large_cov_idxs % advantages.shape[1]
            pg_losses[rows, cols] = pg_losses_kl[rows, cols]

    pg_loss = agg_loss(loss_mat=pg_losses, loss_mask=response_mask, loss_agg_mode=loss_agg_mode)

    return pg_loss, torch.tensor(0.), ppo_kl_abs, torch.tensor(0.)

```
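The token-selection step in `compute_policy_loss_kl_cov` can be illustrated with a toy, pure-Python sketch. It is a simplification under stated assumptions: `top_cov_indices` is a hypothetical helper, the inputs are flat lists that already contain only valid (unmasked) tokens, and no loss is computed.

```python
def top_cov_indices(advantages: list, log_probs: list, k_ratio: float) -> list:
    """Return indices of the tokens with the largest per-token covariance term."""
    n = len(advantages)
    if n == 0:
        return []
    adv_mean = sum(advantages) / n
    logp_mean = sum(log_probs) / n
    # Per-token contribution to the covariance between advantage and log-prob.
    cov = [(a - adv_mean) * (lp - logp_mean) for a, lp in zip(advantages, log_probs)]
    k = max(1, int(n * k_ratio))  # at least one token is always selected
    return sorted(range(n), key=lambda i: cov[i], reverse=True)[:k]


advantages = [2.0, -1.0, 0.5, -1.5]
log_probs = [-0.2, -2.0, -1.0, -0.8]
selected = top_cov_indices(advantages, log_probs, k_ratio=0.5)
```

With these toy values the first two tokens have the largest covariance terms, so they are the ones that would receive the extra `ppo_kl_coef * abs_kl` penalty in the function above.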

In `dp_actor.py`, we add a loss-mode switch:
```
                    loss_mode = self.config.get("loss_mode", "vanilla")
                    if loss_mode not in ["vanilla", "clip_cov", "kl_cov"]:
                        raise ValueError(f"Unsupported loss mode: {loss_mode}. Supported modes are: 'vanilla', 'clip_cov', 'kl_cov'.")

                    if loss_mode == "vanilla":
                        pg_loss, pg_clipfrac, ppo_kl, pg_clipfrac_lower = compute_policy_loss(
                            old_log_prob=old_log_prob,
                            log_prob=log_prob,
                            advantages=advantages,
                            response_mask=response_mask,
                            cliprange=clip_ratio,
                            cliprange_low=clip_ratio_low,
                            cliprange_high=clip_ratio_high,
                            clip_ratio_c=clip_ratio_c,
                            loss_agg_mode=loss_agg_mode,
                        )

                    elif loss_mode == "clip_cov":
                        pg_loss, pg_clipfrac, ppo_kl, pg_clipfrac_lower = compute_policy_loss_clip_cov(
                            old_log_prob=old_log_prob,
                            log_prob=log_prob,
                            advantages=advantages,
                            response_mask=response_mask,
                            cliprange=clip_ratio,
                            cliprange_low=clip_ratio_low,
                            cliprange_high=clip_ratio_high,
                            loss_agg_mode=loss_agg_mode,
                            clip_ratio=self.config.clip_cov_ratio,
                            clip_cov_lb=self.config.clip_cov_lb,
                            clip_cov_ub=self.config.clip_cov_ub,
                        )

                    elif loss_mode == "kl_cov":
                        pg_loss, pg_clipfrac, ppo_kl, pg_clipfrac_lower = compute_policy_loss_kl_cov(
                            old_log_prob=old_log_prob,
                            log_prob=log_prob,
                            advantages=advantages,
                            response_mask=response_mask,
                            loss_agg_mode=loss_agg_mode,
                            k_ratio=self.config.k_ratio,
                            ppo_kl_coef=self.config.ppo_kl_coef,
                        )
```


### Usage Example

We create a recipe named `entropy` (built on the dapo recipe) to store our
scripts, for example `7b_kl_cov.sh`:

```
#!/usr/bin/env bash
set -xeuo pipefail

export WANDB_API_KEY=YOUR_WANDB_API_KEY
# export VLLM_USE_V1=1

project_name='Qwen2.5-7B'
exp_name='klcov'

adv_estimator=grpo

use_kl_in_reward=False
kl_coef=0.0
use_kl_loss=False
kl_loss_coef=0.0

clip_ratio_low=0.2
clip_ratio_high=0.2

max_prompt_length=$((1024 * 2))
max_response_length=$((1024 * 8))
enable_overlong_buffer=False
overlong_buffer_len=$((1024 * 2))
overlong_penalty_factor=1.0

loss_agg_mode="token-mean"
loss_mode="kl_cov"
enable_filter_groups=False
filter_groups_metric=acc
max_num_gen_batches=10
train_prompt_bsz=256
gen_prompt_bsz=$((train_prompt_bsz * 3))
train_prompt_mini_bsz=256
n_resp_per_prompt=8
max_token=20480

# Ray
RAY_ADDRESS=${RAY_ADDRESS:-"http://localhost:8265"}
WORKING_DIR=${WORKING_DIR:-"${PWD}"}
RUNTIME_ENV=${RUNTIME_ENV:-"${WORKING_DIR}/verl/trainer/runtime_env.yaml"}
NNODES=${NNODES:-4}
# Paths
RAY_DATA_HOME=${RAY_DATA_HOME:-"${HOME}/verl"}
MODEL_PATH=${MODEL_PATH:-"/YOUR_MODELPATH"}
CKPTS_DIR=${CKPTS_DIR:-"/YOUR_CKPTS_PATH"}
TRAIN_FILE=${TRAIN_FILE:-"/YOUR_TRAIN_FILE_PATH"}
TEST_FILE=${TEST_FILE:-["/YOUR_TEST_FILE_PATH"]}

# Algorithm
temperature=1.0
top_p=1.0
top_k=-1 # 0 for HF rollout, -1 for vLLM rollout
ppo_kl_coef=1
k_ratio=0.002

# Mathematically equivalent
use_dynamic_bsz=True
infer_micro_batch_size=null
train_micro_batch_size=null
offload=False

HYDRA_FULL_ERROR=1 python -m recipe.entropy.main_entropy \
    data.train_files="${TRAIN_FILE}" \
    data.val_files="${TEST_FILE}" \
    data.prompt_key=prompt \
    data.truncation='left' \
    data.filter_overlong_prompts=False \
    data.max_prompt_length=${max_prompt_length} \
    data.max_response_length=${max_response_length} \
    data.gen_batch_size=${gen_prompt_bsz} \
    data.train_batch_size=${train_prompt_bsz} \
    data.return_raw_chat=True \
    actor_rollout_ref.rollout.n=${n_resp_per_prompt} \
    actor_rollout_ref.actor.use_kl_loss=${use_kl_loss} \
    actor_rollout_ref.actor.kl_loss_coef=${kl_loss_coef} \
    actor_rollout_ref.actor.clip_ratio_low=${clip_ratio_low} \
    actor_rollout_ref.actor.clip_ratio_high=${clip_ratio_high} \
    actor_rollout_ref.actor.clip_ratio_c=10.0 \
    actor_rollout_ref.actor.loss_mode=${loss_mode} \
    actor_rollout_ref.actor.k_ratio=${k_ratio} \
    actor_rollout_ref.actor.ppo_kl_coef=${ppo_kl_coef} \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.rollout.mode=sync \
    algorithm.adv_estimator=${adv_estimator} \
    algorithm.use_kl_in_reward=${use_kl_in_reward} \
    algorithm.kl_ctrl.kl_coef=${kl_coef} \
    algorithm.filter_groups.enable=${enable_filter_groups} \
    algorithm.filter_groups.metric=${filter_groups_metric} \
    algorithm.filter_groups.max_num_gen_batches=${max_num_gen_batches} \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.use_dynamic_bsz=${use_dynamic_bsz} \
    actor_rollout_ref.ref.log_prob_use_dynamic_bsz=${use_dynamic_bsz} \
    actor_rollout_ref.rollout.log_prob_use_dynamic_bsz=${use_dynamic_bsz} \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=${max_token} \
    actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=${max_token} \
    actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=${max_token} \
    actor_rollout_ref.model.path="${MODEL_PATH}" \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.optim.weight_decay=0 \
    actor_rollout_ref.actor.optim.warmup_style=constant \
    actor_rollout_ref.actor.ppo_mini_batch_size=${train_prompt_mini_bsz} \
    actor_rollout_ref.actor.ppo_micro_batch_size=${train_micro_batch_size} \
    actor_rollout_ref.actor.fsdp_config.param_offload=${offload} \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=${offload} \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.actor.grad_clip=1.0 \
    actor_rollout_ref.actor.loss_agg_mode=${loss_agg_mode} \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=1 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.85 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=${infer_micro_batch_size} \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.enable_chunked_prefill=True \
    actor_rollout_ref.rollout.max_num_batched_tokens=${max_token} \
    actor_rollout_ref.rollout.temperature=${temperature} \
    actor_rollout_ref.rollout.top_p=${top_p} \
    actor_rollout_ref.rollout.top_k="${top_k}" \
    actor_rollout_ref.rollout.val_kwargs.temperature=${temperature} \
    actor_rollout_ref.rollout.val_kwargs.top_p=${top_p} \
    actor_rollout_ref.rollout.val_kwargs.top_k=${top_k} \
    actor_rollout_ref.rollout.val_kwargs.do_sample=False \
    actor_rollout_ref.rollout.val_kwargs.n=1 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=${infer_micro_batch_size} \
    actor_rollout_ref.ref.fsdp_config.param_offload=${offload} \
    actor_rollout_ref.ref.ulysses_sequence_parallel_size=1 \
    actor_rollout_ref.actor.fsdp_config.fsdp_size=-1 \
    reward_model.reward_manager=dapo \
    reward_model.overlong_buffer.enable=${enable_overlong_buffer} \
    reward_model.overlong_buffer.len=${overlong_buffer_len} \
    reward_model.overlong_buffer.penalty_factor=${overlong_penalty_factor} \
    trainer.logger=['console','wandb'] \
    trainer.project_name="${project_name}" \
    trainer.experiment_name="${exp_name}" \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes="${NNODES}" \
    trainer.val_before_train=False \
    trainer.test_freq=4 \
    trainer.save_freq=32 \
    trainer.total_epochs=1000 \
    trainer.default_local_dir="${CKPTS_DIR}" \
    trainer.resume_mode=disable

```

### Test

Please refer to Fig. 11 and Table 2 in https://arxiv.org/pdf/2505.22617
for detailed results.

### Additional Info.

NA

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: Jiacheng Chen <jackchan9345@gmail.com>
Co-authored-by: H <linhaibin.eric@gmail.com>
2025-06-19 15:08:43 -07:00
ba908710ff [doc] fix: s/Linkedin/LinkedIn (#2111)
### Checklist Before Starting
as titled
- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Use the official spelling of LinkedIn.

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-19 12:35:32 -07:00
18c2825c53 [trainer] fix: make reward_extra_info optional in reward_result (#2109)
### Checklist Before Starting

- [X] Searched for similar PR(s).
- [X] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Fix the error `Error in reward_fn: reward_extra_info`, which occurs
because some reward function implementations include only
`reward_tensor` in the returned dictionary:

- b401382405/verl/workers/reward_manager/prime.py (L176)
- b401382405/examples/split_placement/main_ppo_split.py (L88)
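The fix amounts to treating the extra-info key as optional. A minimal sketch of that pattern follows; `unpack_reward_result` is a hypothetical helper rather than the code changed in this PR, and a plain list stands in for the reward tensor.

```python
def unpack_reward_result(reward_result: dict) -> tuple:
    """Split a reward-function result into (reward_tensor, reward_extra_info)."""
    reward_tensor = reward_result["reward_tensor"]
    # Some reward functions return only `reward_tensor`; default the extras to {}.
    reward_extra_info = reward_result.get("reward_extra_info", {})
    return reward_tensor, reward_extra_info


# A result without `reward_extra_info` no longer raises KeyError.
tensor, extra = unpack_reward_result({"reward_tensor": [0.0, 1.0]})
```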

### Checklist Before Submitting

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] New CI unit test(s) are added to cover the code path.
- [X] Rely on existing unit tests on CI that covers the code path.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-06-20 01:44:25 +08:00
b401382405 [tool] feat: Add Search Tool implemented with MCP (#1948)
1. An MCP client manager that manages the connection to the
MCP server, including session multiplexing and rate limiting.
2. A Search Tool built on the MCP client and the
[Tavily](https://app.tavily.com/home) MCP server, which delivers the
same capability as the Search R1 Tool.
3. A general MCP tool base for handling the execution
logic.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

1. Register a [Tavily](https://app.tavily.com/home) account.
2. Edit the `mcp_server.json` file by replacing `url` and `auth_token`.
You can also use your own MCP server by following the instructions
provided by
[FastMCP](https://gofastmcp.com/clients/transports#configuration-based-transports)
(supporting SSEServer, stdioServer and streamHTTP).
3. Configure the `mcp_tool_config.yaml` file:  
- `mcp_server_config_path` should point to the JSON file from step 2
- `tool_selected_list` specifies the tools you need to register from the
MCP server
4. *(Optional)* Implement a concrete instance based on `MCPBaseTool` to
parse the results returned by the server

Details are listed in the
[tutorial](https://github.com/AlecHenx/ml-recipe/blob/main/Tutorial%20for%20MCP%20Tool%20in%20veRL.md).

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes part of issue #1837 
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-19 22:41:14 +08:00
f9a7cf3049 [doc] fix: DAPO branch & doc (#2104)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

This PR fixes the broken link for the DAPO branch and adds some details to
the doc.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that cover the code path.
2025-06-19 19:44:54 +08:00
ccefcf05ca [doc] fix: Fix mismatched config description for ppo_epochs in critic (#2102)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> Fix mismatched config description for `ppo_epochs` in critic

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.


![image](https://github.com/user-attachments/assets/72df0d9a-3ac8-418c-b1c0-aa6e6daaccfd)

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-19 18:19:31 +08:00
42f612dc15 [rollout] refactor: Add option for rollout_log_probs, and default as False (#2072)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> As discussed in https://github.com/volcengine/verl/pull/1712, we may
want to minimize communication cost on large clusters, so this PR adds an
option for `rollout_log_probs` that defaults to `False`.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-06-19 15:16:47 +08:00
0077f3e38f [ci] feat: Add CI for checking irregular device api usage (#2089)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Add a CI check for irregular device API usage; it suggests using the APIs in
`verl/utils/device.py` to get the device name or object.

In addition, this CI test case also runs on non-Linux systems (e.g.
Windows), which makes problems easier to debug and locate.

### Test

Not related.

### High-Level Design

Not related.

### Specific Changes

Add a new CI test case that checks for irregular device API usage and
suggests using the APIs in `verl/utils/device.py`.
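
A rough sketch of how such a check could work, assuming it scans Python sources for hard-coded CUDA references and skips whitelisted files. The patterns, function names, and log wording below are illustrative assumptions and may differ from the real `check_device_api_usage.py`:

```python
import re
from pathlib import Path

# Hypothetical patterns for hard-coded device usage; the real check may differ.
IRREGULAR_PATTERNS = [re.compile(r"\.cuda\("), re.compile(r"""["']cuda["']""")]


def find_irregular_lines(source: str) -> list[int]:
    """Return 1-based line numbers that hard-code a CUDA device."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore trailing comments
        if any(p.search(code) for p in IRREGULAR_PATTERNS):
            hits.append(lineno)
    return hits


def check_directory(directory: str, whitelist: frozenset[str] = frozenset()) -> bool:
    """Check every .py file under `directory`; return True if all pass."""
    ok = True
    for path in sorted(Path(directory).rglob("*.py")):
        if path.name in whitelist:
            print(f"[SKIP] File {path} is in the whitelist, checking is skipped.")
            continue
        hits = find_irregular_lines(path.read_text(encoding="utf-8"))
        result = "success" if not hits else f"failed at lines {hits}"
        print(f"[CHECK] File {path}, check result: {result}.")
        ok = ok and not hits
    return ok
```

Because the sketch relies only on `pathlib` and `re`, it behaves the same on Windows and Linux paths.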

### API

Not related.

### Usage Example

```shell
python tests\special_sanity\check_device_api_usage.py --directory ./recipe

[CHECK] File D:\workspace\verl\recipe\char_count\create_dataset.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\char_count\reward_function.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\dapo\dapo_ray_trainer.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\dapo\main_dapo.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\prime\main_prime.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\prime\prime_core_algos.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\prime\prime_dp_rm.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\prime\prime_fsdp_workers.py is detected for device api usage check, check result: success.
[SKIP] File D:\workspace\verl\recipe\prime\prime_ray_trainer.py is in device api usage check whitelist, checking is skipped.
[CHECK] File D:\workspace\verl\recipe\prime\__init__.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\r1\data_process.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\r1\main_eval.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\r1\reward_score.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\r1\__init__.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\r1\tasks\gpqa.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\r1\tasks\livecodebench.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\r1\tasks\math.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\r1\tasks\__init__.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\retool\retool_multi_turn_sft_preprocess.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\spin\core_algos.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\spin\dp_actor.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\spin\fsdp_workers.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\spin\main_spin.py is detected for device api usage check, check result: success.
[SKIP] File D:\workspace\verl\recipe\spin\spin_trainer.py is in device api usage check whitelist, checking is skipped.
[CHECK] File D:\workspace\verl\recipe\sppo\dp_actor.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\sppo\main_sppo.py is detected for device api usage check, check result: success.
[SKIP] File D:\workspace\verl\recipe\sppo\sppo_ray_trainer.py is in device api usage check whitelist, checking is skipped.
[CHECK] File D:\workspace\verl\recipe\sppo\sppo_worker.py is detected for device api usage check, check result: success.
[CHECK] File D:\workspace\verl\recipe\sppo\__init__.py is detected for device api usage check, check result: success.
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-19 10:38:09 +08:00
a44b83c1a5 [misc] feat: update instruction for running dapo on qwen2.5 7b math and add reference wandb (#2094)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

- As title

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-18 19:16:14 -07:00
83cb13ad53 [recipe, doc] fix: fix dapo branch name (#2090)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

As title
2025-06-19 09:35:05 +08:00
7dc3ee7476 [vllm] fix: mv disable_mm_preprocessor_cache to vllm engine_kwargs (#2068)
All scripts using an LLM (non-VLM with the vllm rollout backend) break. Error
details can be found in issue
https://github.com/volcengine/verl/issues/1923, also mentioned in PR
https://github.com/volcengine/verl/pull/1900.
This error currently occurs with vllm>=0.9.0.

The reason is that `disable_mm_preprocessor_cache=True` only works for
VLM, and will cause errors for non-VLM models.

It appears that the default value in vllm is `False`, and the official
guidelines below recommend keeping it `False` even for VLMs:


ca94d7fa00/vllm/config.py (L380C5-L382)

Therefore, it would be better to set `disable_mm_preprocessor_cache`
to `False` here.
2025-06-18 22:43:46 +08:00
ed9cec8081 [megatron] fix: fix qwen2_vl on plain-text data and mix data of plain-text and image-text (#1999)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [ ] type is in `feat, fix, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?


Fix qwen2_vl on plain-text data and on mixed plain-text and
image-text data; see https://github.com/volcengine/verl/pull/1286

### Test


Tested on the gsm8k dataset and on mixed gsm8k and geo3k data.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-18 22:42:50 +08:00
9466d371ee [doc] chore: (baseline.md)Add scripts and logs for performance testing of GRPO-LoRA. (#2083) 2025-06-18 21:59:05 +08:00
d815db5ad8 [trainer] fix: Fix trainer config for val_only (#2084)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Fix: `val_only` was missing from the trainer config structure.

### Test

no need.

### High-Level Design

no need.

### Specific Changes

- verl/trainer/config/ppo_trainer.yaml

### API

no need.

### Usage Example

> For eval only

```shell
python3 -m verl.trainer.main_ppo \
...
trainer.val_before_train=True \
trainer.val_only=True \
...
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
2025-06-18 19:34:32 +08:00
5d54876b48 [training_utils] feat: Add project and experiment name to tensorboard log path (#2080)
Add the project name and experiment name to the log path so that
tensorboard logs from different runs are no longer mixed in the same
folder, which makes the logs easier to manage.
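
As an illustration, the resulting layout can be sketched as follows. This assumes the base directory comes from a `TENSORBOARD_DIR` environment variable with a `tensorboard_log` fallback; the function name `tensorboard_log_dir` is hypothetical:

```python
import os


def tensorboard_log_dir(project_name: str, experiment_name: str) -> str:
    """Build a per-experiment TensorBoard directory.

    Assumption: the base directory comes from a TENSORBOARD_DIR environment
    variable with a "tensorboard_log" fallback; adjust to your setup.
    """
    base_dir = os.environ.get("TENSORBOARD_DIR", "tensorboard_log")
    return os.path.join(base_dir, project_name, experiment_name)


print(tensorboard_log_dir("my_project", "run_001"))
```

Each experiment then gets its own subfolder, so TensorBoard can be pointed at a single run or at a whole project.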

### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish.
Reference related github issues and PRs if that help review.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-18 15:49:02 +08:00
e48421160b [doc] feat: update DAPO doc (#2081)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that cover the code path.
2025-06-18 15:47:27 +08:00
4c2ea9aa21 [sglang] fix: AsyncSglangServer use async wake_up/sleep (#2062)
### Checklist Before Starting

- [X] Searched for similar PR(s).
- [X] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Correctly implement async `wake_up` and `sleep` for AsyncSglangServer. They
are awaited by ActorRolloutRefWorker.
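
To see why the methods must be coroutines, here is a small self-contained sketch. `FakeAsyncServer` and `rollout_cycle` are hypothetical stand-ins, not the real AsyncSglangServer or worker code. If `wake_up` were a plain function returning `None`, `await server.wake_up()` would raise a `TypeError`:

```python
import asyncio


class FakeAsyncServer:
    """Hypothetical stand-in for an async rollout server."""

    def __init__(self) -> None:
        self.awake = False

    async def wake_up(self) -> None:
        await asyncio.sleep(0)  # placeholder for real asynchronous work
        self.awake = True

    async def sleep(self) -> None:
        await asyncio.sleep(0)
        self.awake = False


async def rollout_cycle(server: FakeAsyncServer) -> list[bool]:
    """Await wake_up and sleep in order, recording the state after each."""
    states = []
    await server.wake_up()
    states.append(server.awake)
    await server.sleep()
    states.append(server.awake)
    return states


print(asyncio.run(rollout_cycle(FakeAsyncServer())))
```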

> Add one-line overview of what this PR aims to achieve or accomplish.
Reference related github issues and PRs if that help review.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-18 12:00:52 +08:00
34342365e6 [doc] test: ensure new docs are included in TOC tree (#2070)
### Checklist Before Starting

- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`
- [x] Searched for similar PR(s).

### What does this PR do?

Add docs to the ToC tree of the documentation website.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.



### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-17 20:27:35 -07:00
992ac065a1 [data] fix: multimodal overlong prompt length filtering (#2063)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Prompt length filtering should utilize the processor when handling
multimodal inputs.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-18 03:38:13 +08:00
e48292f698 [perf] feat: Add verl profiling support from Nvidia Nsight System (#1820)
Add verl profiling support from Nvidia Nsight System

### Checklist Before Starting

- [X] Search for similar PR(s).

### What does this PR do?

Add verl profiling support from Nvidia Nsight System

### High-Level Design

This PR adds config fields to trigger Nsight profiling. If
`trainer.profile_steps` is set, Nsight Systems is triggered to
profile the corresponding steps. In each task role, other config
fields also control the profiling details.

The profiling tasks include the single_controller process and the worker
processes. The single_controller process uses the redesigned `marked_timer`
to record each task range in NVTX.

The worker processes dump the GPU execution details. Since veRL has a
hybrid-engine mode and also supports split mode, there are two profiling
modes: discrete or not. Discrete mode means each task generates a
dedicated database; otherwise one large database is generated.
Nsight Systems can import and align multiple databases
automatically.

### Specific Changes

`verl.utils.debug.profile` adds a general profiling interface and
`verl.utils.debug.nvtx_profile` implements the interface.

### API

`verl.utils.debug.performance._timer` has been changed to
`simple_timer`, and `marked_timer` is added to support profiler range
marker.

`verl.utils.debug.profile` wraps the basic profiler interfaces,
including mark_*_range, mark_annotate, ProfilerConfig, WorkerProfiler,
and WorkerProfilerExtension. `verl.utils.debug.nvtx_profile` implements
the interfaces when nvtx is available.
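
The idea of a timer that also marks a profiler range can be sketched as below. This is not verl's `marked_timer`; the name `marked_timer_sketch`, its signature, and the no-op `_mark_range` hook (where an NVTX range would be opened) are assumptions made for illustration:

```python
import time
from contextlib import contextmanager
from typing import Iterator


def _mark_range(name: str) -> None:
    """Hypothetical hook where an NVTX range would start; a no-op here."""


@contextmanager
def marked_timer_sketch(name: str, timing_raw: dict[str, float]) -> Iterator[None]:
    """Time a block and accumulate the elapsed seconds under `name`."""
    _mark_range(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timing_raw[name] = timing_raw.get(name, 0.0) + elapsed


timing_raw: dict[str, float] = {}
with marked_timer_sketch("gen", timing_raw):
    sum(range(1000))
```

Accumulating into a shared dictionary lets several timed blocks with the same name add up within one training step.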

### Usage Example

Two examples are added in
`/examples/ppo_trainer/run_deepseek_math_gsm8k_megatron_nsys.sh`
`/examples/ppo_trainer/run_qwen2-7b_rm_seq_balance_nsys.sh`

### Test

There should be no functional or performance changes.

### Additional Info.

- **Training**: both FSDP and Megatron will be affected.
- **Inference**: both vLLM and SGLang will be affected.

### Checklist Before Submitting

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title if it breaks any API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add CI test(s) if necessary.
2025-06-17 11:05:16 -07:00
8e9e73723f [Bug] fix None check in DataProto print_size() (#2067) 2025-06-17 23:18:27 +08:00
e83215a854 [trainer] chore: Reducing the number of calls to the write (#2043)
### Checklist Before Starting
 Search for similar PR(s).
### What does this PR do?
All entries are first concatenated into a single large string, then
written to the file in one operation
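
A minimal sketch of the technique, assuming the entries are dictionaries dumped as JSON lines. The function name `dump_entries`, the field names, and the file name are hypothetical:

```python
import json
import os
import tempfile


def dump_entries(entries: list[dict], path: str) -> None:
    """Serialize all entries first, then write them with a single call."""
    lines = [json.dumps(entry, ensure_ascii=False) for entry in entries]
    payload = "\n".join(lines) + "\n"
    with open(path, "w", encoding="utf-8") as f:
        f.write(payload)  # one write instead of one per entry


entries = [
    {"input": "1+1", "output": "2", "score": 1.0},
    {"input": "2+2", "output": "5", "score": 0.0},
]
path = os.path.join(tempfile.mkdtemp(), "0.jsonl")
dump_entries(entries, path)
```

The trade-off is that the whole payload is held in memory before it is written, which is usually acceptable for per-step dumps.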
### Test
Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro15,2
      Processor Name: Quad-Core Intel Core i5
      Processor Speed: 2.3 GHz
      Number of Processors: 1
      Total Number of Cores: 4
      L2 Cache (per Core): 256 KB
      L3 Cache: 6 MB
      Hyper-Threading Technology: Enabled
      Memory: 16 GB
      System Firmware Version: 2022.100.22.0.0 (iBridge: 21.16.4222.0.0,0)
      OS Loader Version: 580~1678
      Activation Lock Status: Disabled

<img width="931" alt="截屏2025-06-16 17 59 53"
src="https://github.com/user-attachments/assets/66dbf3cf-e3f6-45a1-8a27-6003b96b7116"
/>

Co-authored-by: Lancer <maruixiang6688@gmail.com>
2025-06-17 20:04:16 +08:00
0333f8dafc [hardware] feat: support qwen2_5_vl on ASCEND NPU (#1924)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Support VLMs on ASCEND NPU

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-17 19:51:06 +08:00
83ebd007e0 [doc] fix: Fix typo for trainer.resume_mode (#2054)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

`default_local_dir` is used, not `default_hdfs_dir`:
7737bf06e5/verl/trainer/ppo/ray_trainer.py (L818-L825)

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-17 11:29:35 +08:00
7737bf06e5 [Doc] Update "Awesome work using verl" Section in README.md (#2045) 2025-06-16 22:31:25 +08:00
a50000fa25 fix: TensorDict usage error (#2046) 2025-06-16 22:30:49 +08:00
cfc5ff2452 [ci] fix: add tests for vllm (#2036)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Fix the failing vllm test

### Test

Added one more test to ensure that a problematic tool class fails
during initialization.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: wuxibin <wuxibin@bytedance.com>
2025-06-16 18:27:28 +08:00
fe8bb0d259 [CI] feat: update npu image to vLLM-ascend-v0.7.3.post1 (#2035)
### Checklist Before Starting

[done] Search for similar PR(s).

### What does this PR do?

Upgrade vLLM-ascend to v0.7.3.post1 to support multimodal
PRs.

### Specific Changes

Change .github/workflows/e2e_ascend.yml

### Checklist Before Submitting
[ done ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
[ done ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).

Co-authored-by: liaochangyue <liaochangyue@bytedance.com>
2025-06-16 13:27:14 +08:00
615f5f1461 [megatron] fix: dpskv3 convert src and dst mixed up bug (#2029)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

- Fix the DeepseekV3 conversion bug introduced in
https://github.com/volcengine/verl/pull/1995, which mixed up the `src`
and `dst` parameters of the function `safe_copy`. Apologies for my mistake.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-16 10:28:15 +08:00
38d9a88170 [misc] fix: fix format (#2023)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish.
Reference related GitHub issues and PRs if that helps review.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-14 22:58:06 +08:00
27bd30dd3c [trainer] fix: fix sft max_position_embeddings (#2019)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish.
Reference related GitHub issues and PRs if that helps review.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-14 22:40:06 +08:00
ca65c363fb [hardware] refactor: refactor part of device management (#1974)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [x] In format of: [modules] type: Title
- [x] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [x] type is in `feat, fix, refactor, chore`
- [x] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Refactor device management calls such as `torch.cuda` and `nccl` in most
of the code under `verl/recipe` and `verl/verl`, which makes it more
convenient to support other devices or platforms.

### Test

Not related.

### High-Level Design

Not related.

### Specific Changes

1. use `get_torch_device()` to get corresponding `torch.device()` object
based on specific device.
2. use `get_device_id()` to get corresponding device rank index based on
specific device.
3. use `get_nccl_backend()` to get corresponding nccl backend based on
specific device.
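
The dispatch idea behind these helpers can be sketched as follows. This is only an illustration under stated assumptions: the real helpers live in verl and query the installed runtime, whereas the functions below (`pick_device_name`, `pick_collective_backend`) are hypothetical, take availability flags as arguments, and do not import torch. The `hccl` and `gloo` backend names are assumptions about what a non-CUDA platform would use.

```python
def pick_device_name(cuda_available: bool, npu_available: bool) -> str:
    """Return the device family to use (hypothetical helper, not verl's API)."""
    if cuda_available:
        return "cuda"
    if npu_available:
        return "npu"
    return "cpu"


def pick_collective_backend(device_name: str) -> str:
    """Map a device family to a collective-communication backend name.

    Assumption: NPUs use "hccl" and CPU falls back to "gloo".
    """
    backends = {"cuda": "nccl", "npu": "hccl"}
    return backends.get(device_name, "gloo")


if __name__ == "__main__":
    name = pick_device_name(cuda_available=False, npu_available=True)
    print(name, pick_collective_backend(name))
```

Call sites then ask the helper for the device or backend instead of hard-coding `torch.cuda` or `"nccl"`, which is what keeps the rest of the code platform-agnostic.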

### API

Not related.

### Usage Example

The modifications in this PR should not be noticeable.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-14 20:53:47 +08:00
d50c6cd66e [fsdp] fix: position_ids in qwen-vl (#1947)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt`
  - [ ] type is in `feat, fix, doc, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp] feat: xxx`

### What does this PR do?

Fix two issues related to position_ids for qwen2_VL/qwen2.5_VL:

(1) Creating the processor with use_fast=True leads to
`Qwen2VLImageProcessorFast` being used; however, the check that decides
whether to handle 3D position ids still used `Qwen2VLImageProcessor`.
(2) 3D position ids were not considered in ulysses_pad.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-06-14 20:50:00 +08:00
ae75bb6af6 [data] fix: fix retool sft data source (#2018)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish.
Reference related GitHub issues and PRs if that helps review.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-14 15:12:13 +08:00
6e15bbe258 [algo] fix: vf_loss factor (#2016)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

- Fix `vf_loss` factor:
ae528e06e9 (diff-af3da2c60785abde478f7bb68c303cd20e044e8af1b1ae93a2698f5b8fd5ed63R646-R647)
- Fix `core_algos.__all__`:

```diff
- __all__ = ["register", "get_adv_estimator_fn", "AdvantageEstimator"]
+ __all__ = ["register_adv_est", "get_adv_estimator_fn", "AdvantageEstimator"]
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-14 14:22:46 +08:00
c3ffce26d1 [ci] feat: pre-commit check all the files by default (#2017)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Linting errors have now been fixed in most files, so it is time to
check all files by default.

This PR

1. fixes the remaining linting errors
(4409ad0070aa11027e13e26c469d46c63cdab7fb)
2. sets the pre-commit to check all the files by default
(4c30c2bb99ffec50b038c2a7ff34e28062d7a168)

> [!NOTE]
> **About merging / rebasing overhead**
> As before, contributors only need to merge / rebase the
files they have changed, so the overhead should be acceptable.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-14 14:22:17 +08:00
e2ffa1c871 [ci] chore: add code owners (#2000)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [ ] type is in `feat, fix, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

#### initial codeowner list

- Added code owners to a small subset of verl namespaces. Some are left
unassigned for now and we may add them in the future.
- Code owners must demonstrate long-term commitment to the project and
sufficient past contributions to the assigned module; the owner list may
change if commitment changes.
- We still need better file/folder separation for VLM-specific
changes.


#### Test structure enforcement
Let the test folder structure mirror the subfolders under `verl`. Below
is an example failure:
```
  Test layout violations found:

  - tests/non_existent_namespace/test_xx.py: must be inside one of ['models', 'single_controller', 'special_distributed', 'special_e2e', 'special_sanity', 'special_standalone', 'third_party', 'tools', 'trainer', 'utils', 'version', 'workers'] (not at tests root)

Guideline:
  Place each test file under   tests/<module_name>/…
  where <module_name> is one of the top-level packages inside 'verl', or is explicitly listed via --allow-dirs.
```
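
A rough sketch of how such a layout check could work, assuming test paths are given as POSIX-style strings relative to the repository root. The function name `find_layout_violations` is hypothetical and the actual CI script may differ.

```python
from pathlib import PurePosixPath


def find_layout_violations(test_files, allowed_dirs):
    """Return the test files that are not under tests/<allowed_dir>/...

    Hypothetical helper for illustration; not the script added by this PR.
    """
    violations = []
    for file_path in test_files:
        parts = PurePosixPath(file_path).parts
        # Expected shape: ("tests", "<module_name>", ..., "test_x.py")
        if len(parts) < 3 or parts[0] != "tests" or parts[1] not in allowed_dirs:
            violations.append(file_path)
    return violations


if __name__ == "__main__":
    allowed = {"trainer", "utils", "workers"}
    files = ["tests/trainer/test_ppo.py", "tests/non_existent_namespace/test_xx.py"]
    for bad in find_layout_violations(files, allowed):
        print(f"- {bad}: must be inside one of {sorted(allowed)}")
```
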



### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-14 10:33:45 +08:00
6681e25ff4 [ckpt] fix: run converter_hf_to_mcore with --test will raise an AttributeError (#2010)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> When converting an HF checkpoint to mcore with --test, an
AttributeError is raised. This PR fixes it.

```sh
[rank0]:   File "verl/scripts/converter_hf_to_mcore.py", line 305, in convert_hf_to_mcore
[rank0]:     test_conversion(megatron_model_provider, tfconfig, output_path, model)
[rank0]:   File "verl/scripts/converter_hf_to_mcore.py", line 78, in test_conversion
[rank0]:     assert dut_data.shape == ref_state_dict.shape, f"{name=} {dut_data.shape=} {ref_data.shape=}"
[rank0]: AttributeError: 'dict' object has no attribute 'shape'
```
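
The traceback suggests the assertion compared a tensor's shape against the whole reference state dict rather than the matching entry. The following is a sketch of the intended per-entry comparison, not the PR's actual diff: `check_shapes` is a hypothetical function, and `SimpleNamespace` objects stand in for tensors.

```python
from types import SimpleNamespace


def check_shapes(dut_state_dict, ref_state_dict):
    """Compare each entry's shape against the matching reference entry."""
    for name, dut_data in dut_state_dict.items():
        ref_data = ref_state_dict[name]  # look up the entry, not the dict itself
        assert dut_data.shape == ref_data.shape, (
            f"{name=} {dut_data.shape=} {ref_data.shape=}"
        )
    return True


if __name__ == "__main__":
    # Stand-ins for tensors: only the .shape attribute matters here.
    dut = {"w": SimpleNamespace(shape=(2, 3))}
    ref = {"w": SimpleNamespace(shape=(2, 3))}
    print(check_shapes(dut, ref))
```
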

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: lixiaoguang12 <lixiaoguang12@meituan.com>
Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
2025-06-14 00:45:24 +08:00
2c85b43299 Stabilize loss calculations by clamping KL divergence values (#1779)
## Stabilize PPO Loss Calculations by Clamping KL Divergence Values

### Summary

This PR improves the numerical stability of PPO training in `verl` by
clamping KL divergence-related values in the loss calculations.
Specifically:

- In `compute_policy_loss`, the `negative_approx_kl` value is now
clamped to the range \([-10, 10]\) before exponentiation and further
use.
- In `kl_penalty` (for the `"low_var_kl"` mode), the KL value is also
clamped to \([-10, 10]\) before further calculations.

### Motivation

During PPO training, extreme log-probability differences can
occasionally occur, leading to numerical instabilities or
exploding/vanishing gradients. By clamping these values, we ensure more
stable and reliable training dynamics, especially in edge cases.

### Changes

- Added `torch.clamp` to `negative_approx_kl` in `compute_policy_loss`.
- Added `torch.clamp` to KL values in `kl_penalty` for `"low_var_kl"`
mode.
- Both are clamped to the range \([-10, 10]\).
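
To illustrate why clamping helps, here is a scalar sketch in plain Python. It is not the verl implementation: the PR uses `torch.clamp` on tensors, while the helpers below (`clamp`, `clipped_ratio`, `low_var_kl`) are hypothetical and operate on floats. The `exp(kl) - kl - 1` form of the low-variance estimator is an assumption made for this example.

```python
import math


def clamp(x, lo=-10.0, hi=10.0):
    """Scalar stand-in for torch.clamp(x, min=lo, max=hi)."""
    return max(lo, min(hi, x))


def clipped_ratio(log_prob, old_log_prob):
    """Probability ratio with the log-difference clamped before exp()."""
    negative_approx_kl = clamp(log_prob - old_log_prob)
    return math.exp(negative_approx_kl)


def low_var_kl(log_prob, ref_log_prob):
    """Low-variance KL estimate; the KL term is clamped before further use."""
    kl = clamp(ref_log_prob - log_prob)
    return math.exp(kl) - kl - 1.0


if __name__ == "__main__":
    # Without the clamp, math.exp(1000.0) would raise OverflowError.
    print(clipped_ratio(1000.0, 0.0))
    print(low_var_kl(-1000.0, 0.0))
```

With the clamp in place, the ratio is bounded by `exp(10)` from above and `exp(-10)` from below, so a single extreme log-probability difference cannot produce an infinite or zero ratio.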

### Related Issues

#891 #721

---------

Co-authored-by: syo <syo@jupiter.local>
2025-06-13 23:43:09 +08:00
ffeaed8c41 [megatron] feat: robust and efficient mcore converter with meta device init and numel check for dpsk (#1995)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [ ] type is in `feat, fix, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

- `DeepseekV3` is too large to load and initialize weights directly, so
initializing on the `meta device` is a better approach.
- Accumulate numel to check that no model weights are missed.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-13 23:32:17 +08:00
2441d533fa [megatron] fix: multiple key error when trying to override megatron tr… (#1990)
Fix `TypeError:
verl.models.mcore.config_converter._get_mla_transformer_config() got
multiple values for keyword argument ` when the user tries to override
the Megatron config.

### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [ ] type is in `feat, fix, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish.
Reference related GitHub issues and PRs if that helps review.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: BlueSpace <gaoziyuan19@mails.ucas.ac.cn>
2025-06-13 23:30:55 +08:00
a90f2d8793 [tests] chore: ppo workflow runs on volcengine machine learning platform (#1979)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [x] In format of: [modules] type: Title
- [x] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [x] type is in `feat, fix, refactor, chore`
- [x] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Currently, GPU-related CI jobs in the verl repository have long
execution times, which is not agile-development friendly.
To address this issue, we're introducing dynamic runners in the CI
workflow. These runners operate under the dedicated account for verl CI
tasks on the VolcanoEngine Machine Learning Platform, alleviating GPU
resource constraints in our CI pipeline.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

This PR serves as a prototype. After merging, we'll monitor its
performance improvement and plan migration for other workflows
accordingly.


### Specific Changes

The workflow configuration requires the following adaptations to support
dynamic runners:
- Remove the container configuration in jobs and add an IMAGE environment
variable to specify the job execution environment.
- Add setup and clean jobs for runner registration and cleanup, with
proper job dependency configuration.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-13 17:21:23 +08:00
8af15da77d [megatron] feat: Config NCCL Timeout for Megatron Backend Model Loading (#1983)
### Checklist Before Starting

- [x] Searched for similar PR(s).

### What does this PR do?

> This merge request addresses an issue encountered when using Megatron
as the backend for loading models with
`load_state_dict_to_megatron_gptmodel`. Specifically, when loading 32B
or larger models on 64 or more GPUs, it is common to exceed the default
NCCL timeout of 10 minutes (the default for
[torch.distributed.init_process_group("nccl")](https://docs.pytorch.org/docs/stable/distributed.html)),
leading to errors during the [dist.barrier()](a1a152ee4a/verl/models/mcore/loader.py (L463))
call.


a1a152ee4a/verl/models/mcore/loader.py (L360)


a1a152ee4a/verl/models/mcore/loader.py (L463)


To mitigate this issue, this PR introduces a configuration option to
increase the NCCL timeout. This enhancement allows users to easily
adjust the timeout duration when encountering errors, improving the
robustness of model loading in distributed settings.
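
As a rough illustration of the idea, the sketch below passes a configurable timeout to `torch.distributed.init_process_group`. The names `nccl_timeout`, `init_distributed`, and `timeout_seconds` are hypothetical and are not the configuration option this PR adds; only the `timeout` argument of `init_process_group` is a real PyTorch parameter.

```python
import datetime


def nccl_timeout(seconds: int) -> datetime.timedelta:
    """Convert a configured timeout in seconds into a timedelta (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("NCCL timeout must be positive")
    return datetime.timedelta(seconds=seconds)


def init_distributed(timeout_seconds: int = 600) -> None:
    """Initialize the NCCL process group with a configurable timeout."""
    # torch is imported lazily so the helper above can be used without a GPU stack.
    import torch.distributed as dist

    if not dist.is_initialized():
        dist.init_process_group(backend="nccl", timeout=nccl_timeout(timeout_seconds))
```

Raising `timeout_seconds` gives slow ranks more time to reach the barrier during model loading, at the cost of detecting genuine hangs later.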

Thank you for considering this change!
2025-06-13 14:52:27 +08:00
cfa1750eb4 [ci] feat: assignment type annotation except for assignment (#2007)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Type checking seems to be too strict.

```py
a = 0
data = data.to("cpu")
```

These assignments seem to need no annotation.

With this change, unannotated assignments only print warnings.
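
One way such a policy could look, as a minimal sketch using the standard `ast` module: functions with missing annotations are collected as errors, while plain assignments are only collected as warnings. This is not the repository's checker; `check_annotations` and its return format are assumptions made for illustration.

```python
import ast
from typing import List, Tuple


def check_annotations(source: str) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Return (errors, warnings): unannotated functions and unannotated assignments."""
    errors: List[Tuple[int, str]] = []
    warnings: List[int] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            args = node.args.args + node.args.kwonlyargs
            missing = [a.arg for a in args
                       if a.annotation is None and a.arg not in ("self", "cls")]
            if missing or node.returns is None:
                errors.append((node.lineno, node.name))
        elif isinstance(node, ast.Assign):
            # Plain `x = value` assignments are reported but do not fail the check.
            warnings.append(node.lineno)
    return sorted(errors), sorted(warnings)


sample = "a = 0\ndef f(x):\n    return x\ndef g(x: int) -> int:\n    return x\n"
errors, warnings = check_annotations(sample)
print("errors:", errors, "warnings:", warnings)
```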

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-13 14:50:48 +08:00
9ec260be23 [ci] chore: add type annotation coverage check (#1935)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt`
  - [ ] type is in `feat, fix, doc, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp] feat: xxx`

### What does this PR do?

See https://github.com/volcengine/verl/issues/1936 for details. Need to
first wait for RFC to pass.


### High-Level Design

Please see RFC for details

### Specific Changes


### Usage Example

```bash
python3 type_coverage_check.py
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-13 08:00:17 +08:00
0de4982168 [ci] chore: add documentation coverage test (#2004)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [ ] type is in `feat, fix, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Added a test that asserts every function and class imported by a target
file, verl/trainer/ppo/ray_trainer.py, has a docstring. We may extend
this test to all frequently inspected/extended modules.

Example error msg:
```
Docstring verification failed:

 • /opt/tiger/open_verl/verl/trainer/ppo/ray_trainer.py:58 - function `verl.utils.seqlen_balancing.log_seqlen_unbalance` is missing a docstring.
Traceback (most recent call last):
  File "/opt/tiger/open_verl/tests/special_sanity/validate_imported_docs.py", line 136, in <module>
    main()
  File "/opt/tiger/open_verl/tests/special_sanity/validate_imported_docs.py", line 129, in main
    raise Exception(" Docstring verification failed.")
Exception:  Docstring verification failed.
```
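
A check of this kind could be sketched as follows. This is not the code in `validate_imported_docs.py`; `find_missing_docstrings` and the `package_prefix` filter are hypothetical, and the sketch inspects a loaded module's namespace rather than parsing its import statements.

```python
import inspect
import types
from typing import List


def find_missing_docstrings(module: types.ModuleType, package_prefix: str) -> List[str]:
    """List functions/classes visible in `module` that come from `package_prefix` and lack a docstring."""
    missing = []
    for obj in vars(module).values():
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        origin = getattr(obj, "__module__", "") or ""
        if not origin.startswith(package_prefix):
            continue  # skip third-party and standard-library objects
        # Use __doc__ directly so inherited docstrings do not hide a missing one.
        if not (obj.__doc__ or "").strip():
            missing.append(f"{origin}.{obj.__name__}")
    return sorted(missing)
```

A caller would import the target module, pass it in with the package name as prefix, and raise if the returned list is non-empty.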

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-13 07:59:42 +08:00
8a247f7dca [doc] fix: revert previous ray cluster description (#1998)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [ ] type is in `feat, fix, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

[doc] fix: revert previous ray cluster description

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-12 09:29:10 -07:00
49b08e9509 [doc] chore: Add GRPO-LoRA Training Resource & Batch Size Tests (#1985)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [x] In format of: [modules] type: Title
- [x] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [x] type is in `feat, fix, refactor, chore`
- [x] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish.
Reference related github issues and PRs if that help review.

1. Tested the minimum resource requirements and corresponding max
batch sizes for 0.5B/1.5B/3B/7B/14B/32B/72B models during GRPO-LoRA
training.
2. Added test scripts for GRPO-LoRA on 0.5B/1.5B/3B/7B/14B/32B/72B
models.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-12 21:37:39 +08:00
5fa911b3ce [ci] refactor: setup testing guidance (#1958) 2025-06-12 06:16:58 -07:00
a0673f0c89 [doc] feat: Add RL-Factory agentic learning project with verl on README (#1994) 2025-06-12 21:04:05 +08:00
4a3881b6b5 Fix TypeError by Removing Duplicate Arguments in run_deepseek671b_math_megatron.sh (#1996) 2025-06-12 21:02:59 +08:00
13475caaa9 [env] fix: npu ray verion to 2.46.0 for CI problem (#1987)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [ ] type is in `feat, fix, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish.
Reference related github issues and PRs if that help review.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-12 17:22:56 +08:00
a1a152ee4a [ckpt] refactor: enhance FSDP checkpoint manager flexibility (#1350)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

This PR enables `FSDPCheckpointManager` to accept `optimizer` and
`lr_scheduler` as None, removing some existing TODOs. Now
`FSDPCheckpointManager` saves and loads only the content listed in
`checkpoint_contents`. This behavior is consistent with
`MegatronCheckpointManager`.

When `optimizer` and `lr_scheduler` are allowed to be None, we can create
an `FSDPCheckpointManager` for `fsdp_module` when FSDPWorkers are
initialized only for rollout (`is_actor==False and is_rollout==True`).
This allows users to use `main_generation.py` to directly load FSDP
checkpoints without merging them into hf_model.

Also added a `save_xx` property in the base class to replace all `"xx" in
checkpoint_contents` statements, making the code cleaner.
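
The pattern described above can be sketched roughly as follows. `CheckpointManagerSketch` is a hypothetical stand-in, not verl's `FSDPCheckpointManager`; the content names `"model"`, `"optimizer"`, and `"extra"`, the property names, and the use of plain dicts as state are all assumptions made so the example runs on its own.

```python
from typing import Any, Dict, Optional, Sequence


class CheckpointManagerSketch:
    """Hypothetical stand-in illustrating `save_xx` properties and optional optimizer state."""

    def __init__(self, model_state: Dict[str, Any],
                 optimizer_state: Optional[Dict[str, Any]] = None,
                 lr_scheduler_state: Optional[Dict[str, Any]] = None,
                 checkpoint_contents: Sequence[str] = ("model", "optimizer", "extra")):
        self.model_state = model_state
        self.optimizer_state = optimizer_state
        self.lr_scheduler_state = lr_scheduler_state
        self.checkpoint_contents = tuple(checkpoint_contents)

    @property
    def save_model(self) -> bool:
        return "model" in self.checkpoint_contents

    @property
    def save_optimizer(self) -> bool:
        return "optimizer" in self.checkpoint_contents

    @property
    def save_extra(self) -> bool:
        return "extra" in self.checkpoint_contents

    def collect(self) -> Dict[str, Any]:
        """Gather only the pieces named in `checkpoint_contents`."""
        out: Dict[str, Any] = {}
        if self.save_model:
            out["model"] = self.model_state
        if self.save_optimizer:
            if self.optimizer_state is None:
                raise ValueError("'optimizer' requested but no optimizer was given")
            out["optimizer"] = self.optimizer_state
        if self.save_extra and self.lr_scheduler_state is not None:
            out["extra"] = {"lr_scheduler": self.lr_scheduler_state}
        return out


# A rollout-only worker has no optimizer or scheduler and handles the model alone.
rollout_only = CheckpointManagerSketch({"w": [0.1, 0.2]}, checkpoint_contents=["model"])
print(rollout_only.collect())
```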

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

Currently CI should test this PR correctly.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: FSDP
- **Inference**: VLLM

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
2025-06-12 09:37:20 +08:00
87d97c9acd [recipe] feat: qwen2.5vl 7b report and guide (#1969)
### What does this PR do?

Add a report and a script containing a tuning guide for training
qwen2.5vl 7b with Megatron.


> Add one-line overview of what this PR aims to achieve or accomplish.
Reference related github issues and PRs if that help review.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting
2025-06-11 20:06:19 +08:00
c8908e197c [fsdp] feat: Memory efficient cross entropy with a linear layer fused (#462)
Implemented the forward and backward passes of the following compute
logic in fused form, which eliminates many intermediate tensors and
reduces peak memory usage.

## Equivalent compute logic:
```python
import typing

import torch


def run_torch_entropy(hidden: torch.Tensor,
                    weight: torch.Tensor,
                    labels: torch.Tensor) -> typing.Tuple[torch.Tensor, torch.Tensor]:
    # Project hidden states onto the vocabulary in float32.
    logits = torch.matmul(hidden.to(torch.float32), weight.to(torch.float32)) # [num_tokens, vocab_size]
    pd = torch.nn.functional.softmax(logits, dim=-1) # [num_tokens, vocab_size]
    # entropy = logsumexp(logits) - sum(softmax(logits) * logits)
    entropy_a = torch.logsumexp(logits, dim=-1) # [num_tokens]
    entropy_b = torch.sum(pd * logits, dim=-1) # [num_tokens]
    entropy = entropy_a - entropy_b
    # Mean cross-entropy over tokens, negated to give the mean log-probability.
    logprobs = torch.nn.functional.cross_entropy(logits, labels) # scalar
    logprobs = torch.neg(logprobs)
    return logprobs, entropy
```
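
The `entropy_a - entropy_b` step follows from the definition of entropy. As a short derivation, writing $z$ for one row of `logits` and $p = \mathrm{softmax}(z)$ (notation introduced here, not in the PR):

$$
\begin{aligned}
H(p) &= -\sum_i p_i \log p_i, \qquad \log p_i = z_i - \operatorname{logsumexp}(z) \\
     &= \operatorname{logsumexp}(z) \sum_i p_i - \sum_i p_i z_i \\
     &= \operatorname{logsumexp}(z) - \sum_i p_i z_i .
\end{aligned}
$$

The first term corresponds to `entropy_a` and the second to `entropy_b`, using $\sum_i p_i = 1$.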

## API
```python
import torch

from verl.utils.kernel import linear_cross_entropy

# Example sizes, chosen here only for illustration.
num_tokens, hidden_size, vocab_size = 1024, 4096, 32000

hidden = torch.randn(num_tokens, hidden_size, dtype=torch.bfloat16, device="cuda")
weight = torch.randn(hidden_size, vocab_size, dtype=torch.bfloat16, device="cuda")
labels = torch.randint(0, vocab_size, (num_tokens,), device="cuda")

loss, entropy = linear_cross_entropy(hidden, weight, labels, reduction="mean")
```

## Storage and latency
<img width="636" alt="image"
src="https://github.com/user-attachments/assets/396b7303-a46a-46b1-a261-917fda034b02"
/>

## Unit test
```shell
$ cd verl/
$ python3 tests/kernel/test_memory_efficient_entropy.py
```

# NOTE
For compatibility, `torch.library.triton_op` was not applied to these
APIs, so `torch.compile` might not work on top of them.

---------

Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>
Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
Co-authored-by: gaoziyuan.955 <gaoziyuan.955@bytedance.com>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
2025-06-11 19:48:47 +08:00
675a06d172 [doc] fix: FSDP typo in README.md (#1956)
Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
2025-06-11 13:25:37 +08:00
9e5510ab3a [rollout] fix: set repetition_penalty=1.0 to AsyncLLM (#1949)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt`
  - [ ] type is in `feat, fix, doc, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp] feat: xxx`

### What does this PR do?
- set repetition_penalty=1.0 for AsyncLLM
- add missing timing metrics, close #1926 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-11 12:53:49 +08:00
0bd03d7c05 [FSDP] feat: Add FSDP forward pefetch and recompute chunking entropy (#1927)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

1. Add FSDP1 forward prefetch configuration.
2. Add chunk entropy computation.
3. Add torch.checkpoint to entropy computation.
4. Move data to device from `ActorRolloutRefWorker.update_actor` to
`DataParallelPPOActor.update_policy`.
5. Add `npu_cross_entropy_loss` fusion kernel.

### High-Level Design

1. For more detail, see [FSDP
forward_prefetch](https://docs.pytorch.org/docs/stable/fsdp.html#module-torch.distributed.fsdp).
2. `logits` is usually a large tensor of shape [bsz\*seq_len, voc], and
`compute_entropy_from_logits` uses [bsz\*seq_len, voc] * (4 (float32) +
2 (autocast of softmax+logsumexp) + 1 (output of softmax)) memory. To
reduce this memory peak, we can compute in chunks, changing
[bsz\*seq_len, voc] to [chunk_size (2048), voc].
3. During the training phase, `enable_gradient_checkpointing=True` does
not apply to the entropy calculation, so this PR adds recomputation of
the entropy to reduce the memory peak during training.
4. `ActorRolloutRefWorker.update_actor` moves all batch data to the
device, but this is unnecessary because
`DataParallelPPOActor.update_policy` moves the data to the device for
each micro batch.
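
To make the effect of chunking in item 2 concrete, here is a back-of-envelope sketch. It takes the factor (4 + 2 + 1) from the description at face value without interpreting its units, so only the ratio between the two results is meaningful. The batch size, sequence length, and vocabulary size are illustrative assumptions, and `relative_entropy_peak` is a hypothetical helper.

```python
def relative_entropy_peak(rows: int, vocab_size: int, factor: int = 4 + 2 + 1) -> int:
    """Relative size of the entropy intermediates for a [rows, vocab_size] logits block."""
    return rows * vocab_size * factor


bsz, seq_len, vocab_size = 8, 4096, 151936  # illustrative sizes (assumed)
chunk_size = 2048                            # chunk size quoted in the description

full = relative_entropy_peak(bsz * seq_len, vocab_size)
chunked = relative_entropy_peak(chunk_size, vocab_size)
print(f"peak shrinks by a factor of {full // chunked}")
```

The reduction is simply `bsz * seq_len / chunk_size`; the estimate ignores the per-token output and any other allocations that stay resident across chunks.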


### Specific Changes

> List the specific changes.

### API

Add 3 new configurations in actor/ref, 1 new configuration in
critic/reward.

- actor_rollout_ref.actor.fsdp_config.forward_prefetch: False
- actor_rollout_ref.actor.entropy_from_logits_with_chunking: False
- actor_rollout_ref.actor.entropy_checkpointing: False
- actor_rollout_ref.ref.fsdp_config.forward_prefetch: False
- actor_rollout_ref.ref.entropy_from_logits_with_chunking: False
- actor_rollout_ref.ref.entropy_checkpointing: False
- critic.model.fsdp_config.forward_prefetch: False
- reward_model.model.fsdp_config.forward_prefetch: False
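
As a usage sketch, the new options could be turned on through `key=value` command-line overrides. The helper below only builds the override strings from the keys listed above; `to_overrides` is a hypothetical name, and which options are worth enabling depends on the workload.

```python
from typing import Dict, List

# Keys are taken from the list above; the values enable every new option.
NEW_OPTIONS: Dict[str, bool] = {
    "actor_rollout_ref.actor.fsdp_config.forward_prefetch": True,
    "actor_rollout_ref.actor.entropy_from_logits_with_chunking": True,
    "actor_rollout_ref.actor.entropy_checkpointing": True,
    "actor_rollout_ref.ref.fsdp_config.forward_prefetch": True,
    "actor_rollout_ref.ref.entropy_from_logits_with_chunking": True,
    "actor_rollout_ref.ref.entropy_checkpointing": True,
    "critic.model.fsdp_config.forward_prefetch": True,
    "reward_model.model.fsdp_config.forward_prefetch": True,
}


def to_overrides(options: Dict[str, bool]) -> List[str]:
    """Render options as `key=value` strings to append to a training command."""
    return [f"{key}={value}" for key, value in options.items()]


for override in to_overrides(NEW_OPTIONS):
    print(override)
```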


### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-11 12:52:19 +08:00
966c84595b [misc] doc: fix typo in deepseek v3 docker image name install.rst (#1957) 2025-06-11 06:54:01 +08:00
d2665c5eb5 [hardware] fix typo in dockerfile (#1950) 2025-06-11 06:46:46 +08:00
7a8122d86a [ci] chore: minor adjustment for PR template (#1952)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt`
  - [ ] type is in `feat, fix, doc, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp] feat: xxx`

### What does this PR do?

Make the PR template more concise 


### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-11 06:46:02 +08:00
22974eeaca [trainer] docs: Fix Typos in Documentation Files (#1954)
Description:  
This pull request corrects several typographical errors in the
documentation:

- In docs/advance/checkpoint.rst, the word "togather" has been corrected
to "together".
- In docs/faq/faq.rst, the word "trainning" has been corrected to
"training".

These changes improve the clarity and professionalism of the
documentation. No functional code changes are included.
2025-06-10 12:30:22 -07:00
b4aa2dce8f [fsdp] fix: fsdp entropy metrics (#1943)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt`
  - [ ] type is in `feat, fix, doc, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp] feat: xxx`

### What does this PR do?

The FSDP entropy calculation did not revert the indices (restore the
original sample order) when dynamic batch size is used.
This does not affect the training loss or gradients, only the metrics
displayed on TensorBoard/wandb.
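
To illustrate what reverting indices means here, the sketch below reorders per-sample values with a permutation and then restores the original order with the inverse permutation. It uses plain lists, and `reverse_indices` is a hypothetical helper, not necessarily the function used in the repository.

```python
from typing import List, Sequence


def reverse_indices(indices: Sequence[int]) -> List[int]:
    """Invert a permutation: reverse[orig_pos] is where that sample ended up."""
    reverse = [0] * len(indices)
    for new_pos, orig_pos in enumerate(indices):
        reverse[orig_pos] = new_pos
    return reverse


per_sample_entropy = [0.9, 0.1, 0.5, 0.3]
indices = [2, 0, 3, 1]  # order in which samples were packed into micro batches

reordered = [per_sample_entropy[i] for i in indices]
restored = [reordered[i] for i in reverse_indices(indices)]
print(restored == per_sample_entropy)
```

Skipping the last step leaves values attached to the wrong samples, which distorts per-sample metrics even though a batch-level sum or mean is unchanged.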

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-10 11:28:48 -07:00
cfa4e701ac [training_utils] Add qwen3 multi-turn sft support (#1889) 2025-06-10 22:08:34 +08:00
3f630e741d [megatron] fix: rope_type typo in config_converter.py (#1944)
![image](https://github.com/user-attachments/assets/bae987fe-9543-4da3-b3bb-5e3bd11cc551)

Fix `TypeError: MLATransformerConfig.__init__() got an unexpected keyword
argument 'rotary_type'`.

### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt`
  - [ ] type is in `feat, fix, doc, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp] feat: xxx`

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-10 21:34:48 +08:00
74463f9129 [hardware] fix: fix issue when sp>1 on ASCEND NPU (#1942) 2025-06-10 20:30:13 +08:00
f880ec4c72 [ckpt] feat: model_merger.py support processing checkpoints with LoRA adapters (#1821) 2025-06-10 20:29:16 +08:00
85fef90d51 [megatron] feat: qwen2.5vl (#1286)
works with qwen2.5vl 3b + geo3k


<img width="1148" alt="image"
src="https://github.com/user-attachments/assets/87c8746c-7f40-4189-9e82-eb1b459669f8"
/>
<img width="1143" alt="image"
src="https://github.com/user-attachments/assets/58bce88d-c53e-45a2-b89c-bfacf4ae9e85"
/>
<img width="1503" alt="image"
src="https://github.com/user-attachments/assets/284ef5c6-2057-4a73-ad56-bed2ef0ece43"
/>
2025-06-10 15:38:16 +08:00
1e1645d8e2 [rollout] feat: add async llm perf script (#1930)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Add perf scripts comparing AsyncLLM executor backends:
- RayDistributedExecutor: default executor with compiled graph
- ExternalRayDistributedExecutor: external executor with remote call

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-10 14:30:07 +08:00
2b5d66a721 [megatron] refactor: support MLATransformerConfig abstraction for DeepSeek V3 (#1836)
I encountered an error when training DeepSeek V3 with the latest code
due to the TransformerConfig not including q_lora_rank, which is
required for DeepSeek V3.

#### Error Message
```
(TaskRunner pid=1256989)   File "/workspace/verl/verl/single_controller/base/megatron/worker.py", line 69, in _init_hf_config_and_tf_config
(TaskRunner pid=1256989)     tf_config = hf_to_mcore_config(hf_config, dtype)
(TaskRunner pid=1256989)   File "/workspace/verl/verl/models/mcore/registry.py", line 131, in hf_to_mcore_config
(TaskRunner pid=1256989)     return MODEL_CONFIG_CONVERTER_REGISTRY[model](hf_config, dtype)
(TaskRunner pid=1256989)   File "/workspace/verl/verl/models/mcore/config_converter.py", line 210, in hf_to_mcore_config_dpskv3
(TaskRunner pid=1256989)     args = _get_base_transformer_config(
(TaskRunner pid=1256989)   File "/workspace/verl/verl/models/mcore/config_converter.py", line 85, in _get_base_transformer_config
(TaskRunner pid=1256989)     return TransformerConfig(**base_config)
(TaskRunner pid=1256989) TypeError: TransformerConfig.__init__() got an unexpected keyword argument 'q_lora_rank'
```

#### Solution
The `hf_to_mcore_config_dpskv3` function should directly create an
`MLATransformerConfig` instance instead of going through
`_get_base_transformer_config()`, since DeepSeek V3 uses Multi-Latent
Attention (MLA) which requires MLA-specific parameters.
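
The failure and the fix can be reproduced with plain dataclasses. This is a minimal sketch, not the verl or Megatron code: `BaseConfig` and `MLAConfig` are hypothetical stand-ins for `TransformerConfig` and `MLATransformerConfig`, and all field values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BaseConfig:
    """Hypothetical stand-in for TransformerConfig (no MLA fields)."""
    num_layers: int
    hidden_size: int


@dataclass
class MLAConfig(BaseConfig):
    """Hypothetical stand-in for MLATransformerConfig."""
    q_lora_rank: int = 512   # illustrative default only
    kv_lora_rank: int = 512  # illustrative default only


def build_base(**kwargs) -> BaseConfig:
    # Mirrors the failing path: every key is forwarded to the base class.
    return BaseConfig(**kwargs)


def build_mla(**kwargs) -> MLAConfig:
    # Mirrors the fix: construct the MLA-aware class directly.
    return MLAConfig(**kwargs)


args = {"num_layers": 2, "hidden_size": 64, "q_lora_rank": 1536}

try:
    build_base(**args)
    base_error = None
except TypeError as exc:
    # The base class rejects the MLA-only keyword, as in the traceback above.
    base_error = str(exc)

mla_config = build_mla(**args)
```

The base-class path raises the same kind of `TypeError` as the traceback, while the MLA-aware class accepts the extra field.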

---------

Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
2025-06-10 13:01:43 +08:00
ea121f0d39 fix sequence parallelism conflict in kimiVL (#1899)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?
Fix sequence parallelism conflict in kimiVL patch.

Background:
A recent VLM-related PR (#1739) changed the sequence parallelism (SP)
logic for VLMs: it splits inputs_embeds after the model's embedding layer
instead of splitting input_ids and position_ids before the forward pass.
However, the SP logic I implemented in the KimiVL PR (#1639) still
followed the old logic, and it split the image tokens at the point where
image_token and text_token are combined to avoid the 'Image features and
image tokens do not match' problem.
Because the two PRs were developed in parallel, their logic conflicted
once both were merged.
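
The "embed first, then split" order described above can be illustrated with a toy example. This is a sketch under stated assumptions, not the verl implementation: `embed` and `split_for_sp_rank` are hypothetical helpers, and embeddings are represented as plain Python objects rather than tensors.

```python
def split_for_sp_rank(sequence, sp_size, sp_rank):
    """Return the contiguous slice of `sequence` owned by one SP rank.

    Assumes the sequence length is already padded to a multiple of sp_size.
    """
    assert len(sequence) % sp_size == 0
    chunk = len(sequence) // sp_size
    return sequence[sp_rank * chunk:(sp_rank + 1) * chunk]


def embed(input_ids, image_token_id, image_features):
    """Toy embedding layer: text tokens become ("text", id); image
    placeholder tokens are replaced, in order, by the image features."""
    features = iter(image_features)
    return [next(features) if tok == image_token_id else ("text", tok)
            for tok in input_ids]


IMAGE = -1  # hypothetical placeholder id for image tokens
input_ids = [11, IMAGE, IMAGE, IMAGE, 12, 13, 14, 15]
image_features = ["img0", "img1", "img2"]

# Embed the full sequence first, so image features and image tokens are
# matched once on the whole sequence, then split the embeddings per SP rank.
full_embeds = embed(input_ids, IMAGE, image_features)
shards = [split_for_sp_rank(full_embeds, sp_size=2, sp_rank=r) for r in range(2)]
```

Because the image features are merged before the split, no rank has to work out which image features belong to its own slice of tokens.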

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- Delete the patch for _merge_with_image_features, which assigned the
image tokens to the corresponding SP rank.
- Adjust the processing related to position_ids in
_ulysses_flash_attn_forward.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test


![image](https://github.com/user-attachments/assets/82ef7a74-66f8-4bb0-a0fc-3702b215c8c0)


### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
2025-06-10 09:45:43 +08:00
6d8b2fe37e [sglang] fix: Fix tool call parser not found error for SGLang==0.4.6.post5 (#1852)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

SGLang multi-turn rollout relies on the bos and eos tokens of each tool
parser to select the right tool parser. SGLang==0.4.6.post5 changed those
tokens for the Qwen2
parser([PR](https://github.com/sgl-project/sglang/pull/6597/files#diff-725eae87b1043c063d85c22b71f415941e2983c60eb52ef1a0d0be89f13b1110))
so it breaks async rollout. This PR updates the logic in Verl to fix the
issue.

Error example:
```
Traceback (most recent call last):
  File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 28, in main
    run_ppo(config)
  File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 40, in run_ppo
    ray.get(runner.run.remote(config))
  File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 2822, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 930, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TaskRunner.run() (pid=64101, ip=100.96.58.10, actor_id=5f74f8de7594144240b2dbcf01000000, repr=<main_ppo.TaskRunner object at 0x7bab01cdd9f0>)
  File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 155, in run
    trainer.init_workers()
  File "/home/jobuser/resources/verl/trainer/ppo/ray_trainer.py", line 837, in init_workers
    self.actor_rollout_wg.init_model()
  File "/home/jobuser/resources/verl/single_controller/ray/base.py", line 51, in __call__
    output = ray.get(output)
ray.exceptions.RayTaskError(ValueError): ray::WorkerDict.actor_rollout_init_model() (pid=84769, ip=100.96.58.10, actor_id=d2da89f7763ecc0e0681bcdd01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7409bc4c7fa0>)
  File "/home/jobuser/resources/verl/single_controller/ray/base.py", line 645, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "/home/jobuser/resources/verl/single_controller/base/decorator.py", line 534, in inner
    return func(*args, **kwargs)
  File "/home/jobuser/resources/verl/workers/fsdp_workers.py", line 564, in init_model
    self.rollout, self.rollout_sharding_manager = self._build_rollout(trust_remote_code=self.config.model.get("trust_remote_code", False))
  File "/home/jobuser/resources/verl/workers/fsdp_workers.py", line 474, in _build_rollout
    rollout = SGLangRollout(
  File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/sglang_rollout.py", line 161, in __init__
    ) = self._initialize_tools(config, tokenizer)
  File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/sglang_rollout.py", line 384, in _initialize_tools
    tool_call_parser_type = get_tool_call_parser_type(tokenizer)
  File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/sglang_rollout.py", line 113, in get_tool_call_parser_type
    raise ValueError(f"No tool call parser found for tokenizer {tokenizer}")
ValueError: No tool call parser found for tokenizer Qwen2TokenizerFast(name_or_path='/shared/public/elr-models/Qwen/Qwen2.5-7B-Instruct/52e20a6f5f475e5c8f6a8ebda4ae5fa6b1ea22ac', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False, added_tokens_decoder={
        151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151646: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151647: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151648: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        151657: AddedToken("<tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
        151658: AddedToken("</tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
        151659: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
        151660: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
        151661: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
        151662: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
        151663: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
        151664: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
}
)
```
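
The lookup that fails in the traceback can be sketched as follows. This is an illustration under loud assumptions, not the verl or SGLang code: the `parsers` registry, the marker strings, and the idea that the changed markers differ from vocabulary entries only by surrounding whitespace are all hypothetical; the linked SGLang PR is the authority on what actually changed.

```python
def find_tool_call_parser(parsers, vocab, strip_markers=True):
    """Return the first parser whose start and end markers are both in `vocab`.

    `parsers` maps a parser name to a (start_marker, end_marker) pair.
    With strip_markers=True, surrounding whitespace is ignored in the lookup.
    """
    for name, (start_marker, end_marker) in parsers.items():
        if strip_markers:
            start_marker, end_marker = start_marker.strip(), end_marker.strip()
        if start_marker in vocab and end_marker in vocab:
            return name
    raise ValueError("No tool call parser found for tokenizer")


# Hypothetical registry and vocabulary, for illustration only.
parsers = {"qwen25": ("<tool_call>\n", "\n</tool_call>")}
vocab = {"<tool_call>", "</tool_call>", "<|im_start|>", "<|im_end|>"}

try:
    find_tool_call_parser(parsers, vocab, strip_markers=False)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

tolerant_result = find_tool_call_parser(parsers, vocab, strip_markers=True)
```

An exact-match lookup is brittle against upstream changes to marker strings, which is why a more tolerant comparison is sketched here.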

### Test

Tested with both SGLang==0.4.6.post4 and SGLang==0.4.6.post5,
successfully executed multiturn RL experiments that failed with
SGLang==0.4.6.post5 before this change.

### Additional Info.

- **Inference**: SGLang

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-10 09:44:33 +08:00
581735a5d8 [rollout] fix: fix async llm config passing (#1933)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt`
  - [ ] type is in `feat, fix, doc, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp] feat: xxx`

### What does this PR do?

Here we should pass full config instead of the sub config. Consumed
here:
https://github.com/volcengine/verl/blob/main/verl/workers/rollout/async_server.py#L111

Also, move the sandbox test to another folder to mirror the source code
folder structure.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-10 09:41:55 +08:00
16662ceff4 [sglang] feat: Efficient and model-agnostic multi-turn messages tokenization and masking (#1668) 2025-06-10 00:13:56 +08:00
d843f95992 [CI] feat: hint PR title in template (#1925) 2025-06-09 23:39:01 +08:00
60138ebd19 [worker] fix: do not break dynamic bsz in dp critic (#1922)
### What does this PR do?

Fix bug introduced in #1839 

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-09 15:26:59 +08:00
af5dbec99b [doc] Add TTS model GRPO tuning project with verl on README (#1918)
### Checklist Before Starting

* [x] Search for similar PR(s).

### What does this PR do?

> Integrates Korean TTS fine-tuning using GRPO optimization based on
LLASA-1B models, significantly improving synthesis quality by reducing
Character Error Rate (CER).

### High-Level Design

> This PR enhances the existing TTS model training pipeline by
introducing a reinforcement learning optimization (GRPO) step using
Whisper's NLL and CER metrics.

### Specific Changes

* Adds GRPO reward calculation based on Character Error Rate (CER) and
Negative Log-Likelihood (NLL).
* Implements a Whisper server to compute NLL metrics efficiently.
* Provides scripts for training (`run_llasa_tts_grpo.sh`) and data
preprocessing (`tts.py`).
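
The CER part of the reward can be sketched as below. This is a minimal illustration, not the PR's implementation: CER is computed as character-level edit distance divided by reference length, and the `reward` function with its weights is a hypothetical way of combining CER and NLL.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (character level)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalized by reference length."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def reward(ref: str, hyp: str, nll: float,
           cer_weight: float = 1.0, nll_weight: float = 0.1) -> float:
    """Hypothetical reward: lower CER and lower NLL give a higher reward."""
    return -(cer_weight * cer(ref, hyp) + nll_weight * nll)


example_cer = cer("abcd", "abxd")  # one substitution over four characters
```

Here `hyp` would be the transcription of the synthesized audio and `nll` the value reported by the Whisper server; both inputs are assumed to come from elsewhere in the pipeline.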

### API

> No changes to existing public APIs. Internal additions only.

### Usage Example

```bash
CUDA_VISIBLE_DEVICES=2 python3 tts/whisper_server.py --port 8001 --model large-v3

WHISPER_SERVER=http://localhost:8001 nohup bash ./examples/grpo_trainer/run_llasa_tts_grpo.sh > verl_grpo_1b.log 2>&1 &
```

### Test

> Evaluated on internal dataset:

* LLasa1B + 15K Korean dataset baseline CER = 0.0266
* LLasa1B + 15K Korean dataset + GRPO optimization CER = 0.0204

The CER drops from 0.0266 to 0.0204 (a relative reduction of about 23%),
which demonstrates the effectiveness of the GRPO optimization.

### Additional Info.

* **Issue Number**: N/A
* **Training**: FSDP, Megatron (as relevant)
* **Inference**: vLLM, SGLang (as relevant)

### Checklist Before Submitting

* [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
* [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
* [ ] Add `[BREAKING]` to the PR title if it breaks any API.
* [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
* [ ] New CI unit test(s) are added to cover the code path.
* [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-09 13:52:23 +08:00
6baa44d605 revert HIP_VISIBLE_DEVICES in worker.py (#1920)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

Sorry, I found in my tests that with the latest branch and the
AMD-modified version of Ray
(https://github.com/ray-project/ray/pull/53531/files), it’s no longer
necessary to override HIP_VISIBLE_DEVICES here. For the sake of keeping
the code clean, could you please revert this change?

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-09 13:51:50 +08:00
cc9bc3fc21 [bugfix] fix megatron model merger (#1774)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix megatron model merger.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- Fix get rank method to support just TP.
- Fix state_dict keys after convert.
- Add mla/moe convert support.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

Test with Qwen3-8B and Qwen2.5-7B.

### Additional Info.

- **Issue Number**: Fixes issue #1757
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
2025-06-09 13:28:24 +08:00
5aa1b046b4 [ppo] feat: add critic valuehead model support for multi-modal PPO (#1839)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

- Support multi-modal PPO, mainly by reusing trl's
`AutoModelForCausalLMWithValueHead` as the critic valuehead model

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-06-09 10:22:54 +08:00
40f5db4a6e [recipe] doc: Rename READMD.md to README.md (#1917)
Fix typo.
2025-06-09 09:35:01 +08:00
8e82bf196c set CUDA and HIP VISIBLE DEVICES (#1914) 2025-06-09 07:57:47 +08:00
916ab431b7 [trainer] refactor: refactor reward manager, advantage estimator (#1916) 2025-06-09 07:57:16 +08:00
2bd291e549 fix typos (#1912)
Hey devs! Fixed typo

recipe/spin/dp_actor.py
slient - silent

recipe/spin/spin_trainer.py
differnt - different
2025-06-08 15:41:11 +08:00
e8645158a3 [trainer] doc: enforce documentation for config fields (#1910)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Force documentation for the trainer yaml file



### Test

Added a test to enforce it.
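
A check of this kind can be sketched as follows. This is a hypothetical illustration, not the test added by the PR: it assumes a key counts as documented when it has an inline `#` comment or a comment line directly above it, and it uses naive line parsing rather than a YAML library.

```python
def undocumented_keys(yaml_text: str) -> list[str]:
    """Return keys that have neither an inline comment nor a comment line
    directly above them. Naive: a '#' inside a quoted value is also
    treated as a comment marker."""
    missing = []
    prev_is_comment = False
    for raw in yaml_text.splitlines():
        line = raw.strip()
        if not line:
            prev_is_comment = False
            continue
        if line.startswith("#"):
            prev_is_comment = True
            continue
        key, sep, rest = line.partition(":")
        if sep and not ("#" in rest or prev_is_comment):
            missing.append(key)
        prev_is_comment = False
    return missing


# Illustrative config text; the field names are examples only.
sample = """
# Trainer settings
trainer:
  total_epochs: 30  # number of epochs
  project_name: demo
"""

missing = undocumented_keys(sample)
```

A unit test could then assert that the returned list is empty for the trainer YAML file, so that any newly added field without documentation fails CI.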


### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-08 12:18:56 +08:00
450d479b38 [recipe] feat: char count (#1908)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add a tiny recipe, char count, that can run on a consumer GPU with only
8GB of memory.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-07 20:57:33 -07:00
f8df864d5f [rollout] fix: error in __collect_lora_params() in FSDPVLLMShardingManager (#1909)
### What does this PR do?

> Fix a bug in DAPO LoRA training on the current main branch; the
entry script is recipe/dapo/test_dapo_7b_math_lora.sh

### Specific Changes

> A two-line code fix, as explained in the "Test" section below

### Usage Example
Add `actor_rollout_ref.model.lora_rank=8 \` to the
"recipe/dapo/test_dapo_7b_math_lora.sh" file to enable LoRA RL training.

```bash
bash recipe/dapo/test_dapo_7b_math_lora.sh
```

### Test
Test script: recipe/dapo/test_dapo_7b_math_lora.sh, which runs DAPO with
LoRA training.

Before this change, training failed with

> TypeError: argument of type 'torch.device' is not iterable

raised by the line below:
orig_dev = "cpu" if "cpu" in next(model.parameters()).device else "cuda"

The error occurs because `torch.device` is neither a string nor an
iterable, so the membership test `"cpu" in ...` cannot be applied to it.

After this change, LoRA RL training starts normally.
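
The error and one possible fix can be reproduced without PyTorch. This is a sketch, not the PR's exact change: `FakeDevice` is a hypothetical stand-in for `torch.device`, which likewise exposes a `.type` string and does not support the `in` operator.

```python
class FakeDevice:
    """Minimal stand-in for torch.device: has `.type`, is not iterable."""

    def __init__(self, type_: str):
        self.type = type_


def original_device_buggy(device):
    # Reproduces the failing expression: `in` needs a container or iterable.
    return "cpu" if "cpu" in device else "cuda"


def original_device_fixed(device):
    # Compare the device's type string instead of testing membership.
    return "cpu" if device.type == "cpu" else "cuda"


try:
    original_device_buggy(FakeDevice("cpu"))
    buggy_error = None
except TypeError as exc:
    buggy_error = str(exc)

fixed_cpu = original_device_fixed(FakeDevice("cpu"))
fixed_cuda = original_device_fixed(FakeDevice("cuda"))
```

With a real model, the comparison would read `next(model.parameters()).device.type == "cpu"`.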

---------

Co-authored-by: qichang.dong <dongqichang@ecmas.ai>
2025-06-08 09:14:03 +08:00
59379539a0 fix qwen2vl grpo for vllm 0.9 and transformers 4.52 (#1880)
### What does this PR do?

Fixes #1710


![image](https://github.com/user-attachments/assets/185d37b6-a4fe-4e89-8eed-72f4477937e8)

1. vLLM 0.9.0 does not support `limit_mm_per_prompt=None`; this
parameter must be a `dict`.
2. Transformers 4.52.* changes the weight keys in the model state dict,
causing mismatches with vLLM's weight loader.

See also:
https://github.com/huggingface/transformers/pull/38385
https://github.com/vllm-project/vllm/pull/19054
https://github.com/vllm-project/vllm/pull/19151

### Test

run `bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh`


![image](https://github.com/user-attachments/assets/b8137c87-f250-40d0-b9c3-c3f44f1a40a1)

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-07 18:09:06 +08:00
897619d738 [tests] chore: add PR title check (#1901)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-07 18:08:14 +08:00
5bf69923f2 fix errors in megatron_workers.py (#1906)
Hey team! Fixed errors in verl/workers/megatron_workers.py

`startegy` - `strategy` x3
2025-06-07 16:56:28 +08:00
01ae0198ff [feat][BREAKING] Megatron: Support learning rate scheduler (#1701)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Support lr scheduler in megatron

### High-Level Design

The Megatron optimizer APIs still differ from FSDP's optimizer APIs in some respects.

### Specific Changes

> List the specific changes.

### API

```yaml
    optim:
      lr: 1e-6
      clip_grad: 1.0
      total_training_steps: -1  # must be overridden by the program
      lr_warmup_init: 0.0  # initial learning rate for warmup, default to 0.0
      lr_warmup_steps: -1 # Prioritized. Negative values mean delegating to lr_warmup_steps_ratio.
      lr_warmup_steps_ratio: 0.  # the total steps will be injected during runtime
      lr_decay_steps: null
      lr_decay_style: linear # select from constant/linear/cosine/inverse_square_root
      min_lr: 0.0 # minimum learning rate, default to 0.0
      weight_decay: 0.01
      weight_decay_incr_style: constant # select from constant/linear/cosine
      lr_wsd_decay_style: exponential # select from constant/exponential/cosine
      lr_wsd_decay_steps: null
      use_checkpoint_opt_param_scheduler: False # use checkpoint optimizer parameter scheduler
```


Note that the Megatron optimizer and FSDP optimizer APIs differ in some
ways.

- The Megatron optimizer scheduler names the period after lr_warmup as
lr_decay_steps, so the ``warmup_style`` actually means the style of lr
decay after warmup.
- The Megatron optimizer also supports scheduling the weight decay itself
(see ``weight_decay_incr_style``).
- ``use_checkpoint_opt_param_scheduler`` determines whether to use the
checkpoint optimizer parameter scheduler. If set to True, the optimizer
parameter scheduler will be saved in the checkpoint and loaded from the
checkpoint when resuming training.
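
How the warmup and decay fields interact can be sketched as below. This is a simplified illustration, not Megatron's scheduler: it covers only the `constant`, `linear`, and `cosine` decay styles, and it assumes `decay_steps` is counted from step 0 so that the decay phase spans `decay_steps - warmup_steps` steps.

```python
import math


def resolve_warmup_steps(lr_warmup_steps, lr_warmup_steps_ratio, total_training_steps):
    """lr_warmup_steps takes priority; a negative value delegates to the ratio."""
    if lr_warmup_steps >= 0:
        return lr_warmup_steps
    return int(lr_warmup_steps_ratio * total_training_steps)


def lr_at(step, *, lr, min_lr, lr_warmup_init, warmup_steps, decay_steps, style):
    """Learning rate at `step`: linear warmup, then decay toward min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        return lr_warmup_init + (lr - lr_warmup_init) * step / warmup_steps
    if style == "constant":
        return lr
    progress = min(1.0, (step - warmup_steps) / max(1, decay_steps - warmup_steps))
    if style == "linear":
        coeff = 1.0 - progress
    elif style == "cosine":
        coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    else:
        raise ValueError(f"unsupported decay style in this sketch: {style}")
    return min_lr + coeff * (lr - min_lr)


warmup = resolve_warmup_steps(-1, 0.25, 200)  # ratio is used because steps < 0
schedule = dict(lr=1.0, min_lr=0.0, lr_warmup_init=0.0,
                warmup_steps=warmup, decay_steps=200, style="linear")
peak = lr_at(warmup, **schedule)
final = lr_at(200, **schedule)
```

In this example the learning rate rises linearly for the first 50 steps, peaks at `lr`, and then decays linearly to `min_lr` at step 200.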

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-06-07 13:19:09 +08:00
01fee0a231 [feat] add validation shuffle (#1886)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

In scenarios involving multiple validation sets, where the difficulty
levels of these sets differ significantly and the generated content
lengths vary notably, the order in which the validation sets are
processed can have a substantial impact on the validation speed.

### High-Level Design

add validation shuffle
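
The effect of the option on sample order can be sketched as follows. This is a hypothetical illustration, not the trainer's data loading code: `build_validation_order` and `batches` are made-up helpers, and the dataset names and sizes are examples.

```python
import random


def build_validation_order(datasets, shuffle, seed=1):
    """Concatenate validation sets as (name, index) pairs, optionally shuffled.

    `datasets` maps a dataset name to its number of samples.
    """
    samples = [(name, idx) for name, size in datasets.items() for idx in range(size)]
    if shuffle:
        # Seeded shuffle so the validation order is reproducible across runs.
        random.Random(seed).shuffle(samples)
    return samples


def batches(samples, batch_size):
    """Split the sample list into consecutive batches."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


datasets = {"easy_short": 8, "hard_long": 8}
ordered = build_validation_order(datasets, shuffle=False)
shuffled = build_validation_order(datasets, shuffle=True)
first_ordered_batch = batches(ordered, 4)[0]
```

Without shuffling, the first batches contain only samples from the first set; with shuffling, samples from sets of different difficulty and generation length are mixed across batches.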

### Usage Example

> Provide usage example(s) for easier usage.

```yaml
validation_shuffle: True
```

### Test

Validation speed increase of over 10%.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-07 13:12:03 +08:00
d02b3d5134 Dockerfile.rocm update tensordict==0.6.2 (#1898)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Update the tensordict version to 0.6.2 in `Dockerfile.rocm`.

This resolves the following PPO training error:
+ python3 -m verl.trainer.main_ppo algorithm.adv_estimator=gae
data.train_files=/root/data/gsm8k/train.parquet
data.val_files=/root/data/gsm8k/test.parquet data.train_batch_size=256
data.max_prompt_length=512 data.max_response_length=512
data.return_raw_chat=True
actor_rollout_ref.model.path=/root/models/Qwen/Qwen2.5-0.5B
actor_rollout_ref.model.use_liger=True
actor_rollout_ref.actor.optim.lr=1e-6
actor_rollout_ref.model.use_remove_padding=True
actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.1
actor_rollout_ref.actor.ppo_mini_batch_size=128
actor_rollout_ref.actor.use_dynamic_bsz=False
actor_rollout_ref.actor.ppo_max_token_len_per_gpu=32768
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2
actor_rollout_ref.actor.ulysses_sequence_parallel_size=1
actor_rollout_ref.actor.fsdp_config.param_offload=False
actor_rollout_ref.actor.fsdp_config.optimizer_offload=False
actor_rollout_ref.actor.use_kl_loss=False
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=32768
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2
actor_rollout_ref.rollout.tensor_model_parallel_size=2
actor_rollout_ref.rollout.name=vllm
actor_rollout_ref.rollout.gpu_memory_utilization=0.8
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=32768
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=2
critic.optim.lr=1e-5 critic.ulysses_sequence_parallel_size=1
critic.model.use_remove_padding=True
critic.optim.lr_warmup_steps_ratio=0.05
critic.model.path=/root/models/Qwen/Qwen2.5-0.5B
critic.model.enable_gradient_checkpointing=False
critic.use_dynamic_bsz=False critic.ppo_max_token_len_per_gpu=32768
critic.ppo_micro_batch_size_per_gpu=2
critic.model.fsdp_config.param_offload=False
critic.model.fsdp_config.optimizer_offload=False
reward_model.enable=True reward_model.ulysses_sequence_parallel_size=1
reward_model.model.path=/root/models/Qwen/Qwen2.5-0.5B
reward_model.model.use_remove_padding=True
reward_model.model.fsdp_config.param_offload=True
reward_model.use_dynamic_bsz=False
reward_model.forward_max_token_len_per_gpu=32768
reward_model.micro_batch_size_per_gpu=2 algorithm.use_kl_in_reward=False
trainer.critic_warmup=0 'trainer.logger=[console]'
trainer.project_name=verl-test
trainer.experiment_name=qwen2.5-0.5b-model-reward-minimal
trainer.nnodes=1 trainer.n_gpus_per_node=8
trainer.val_before_train=False trainer.test_freq=False
trainer.save_freq=-1 trainer.resume_mode=disable trainer.total_epochs=2
trainer.total_training_steps=1
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 112, in _get_module_details
  File "/sgl-workspace/verl/__init__.py", line 22, in <module>
    from .protocol import DataProto
  File "/sgl-workspace/verl/protocol.py", line 30, in <module>
    import tensordict
  File "/usr/local/lib/python3.12/dist-packages/tensordict/__init__.py", line 6, in <module>
    import tensordict._reductions
  File "/usr/local/lib/python3.12/dist-packages/tensordict/_reductions.py", line 11, in <module>
    from tensordict._lazy import LazyStackedTensorDict
  File "/usr/local/lib/python3.12/dist-packages/tensordict/_lazy.py", line 38, in <module>
    from tensordict.memmap import MemoryMappedTensor
  File "/usr/local/lib/python3.12/dist-packages/tensordict/memmap.py", line 25, in <module>
    from torch.multiprocessing.reductions import ForkingPickler
ImportError: cannot import name 'ForkingPickler' from 'torch.multiprocessing.reductions' (/usr/local/lib/python3.12/dist-packages/torch/multiprocessing/reductions.py)

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

Signed-off-by: Vicky Tsang <vtsang@amd.com>
2025-06-07 08:09:12 +08:00
69c2a1a81f [release] chore: bump version to v0.4 (#1897) 2025-06-07 07:49:37 +08:00
043c72bc7b [docs] moe: add docs for deepseek 671b and qwen-236b (#1896) 2025-06-07 07:49:01 +08:00
457f4d2a20 [rollout] feat: follow OpenAI tool calling schema in chat scheduler (#1831) 2025-06-07 07:47:47 +08:00
70bd3d3d6b [feat] Wandb Timing: Add more detailed timing of gen_sequence and weights resharding (#1834) 2025-06-07 07:45:50 +08:00
c0f5ccbe5d [recipe] retool: add retool sft (#1828)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

- Add the retool Qwen3 dataset and SFT
- The original retool dataset doesn't follow the standard Qwen multi-turn
chat template. In this PR, we recompile the dataset and add an SFT script
to train Qwen-8b

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-06-06 10:39:51 -07:00
cfead14adf [misc] fix: fix indent (#1891)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-06 21:11:40 +08:00
2b9a440bb6 update dapo trainer process (#1888)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

Fix the progress bar update frequency when training with DAPO.

### Specific Changes

> List the specific changes.

1. When `algorithm.filter_groups.enable=true` is set, the DAPO training
process skips samples whose advantages are all 0 or 1.
2. However, the progress bar is not updated for the skipped samples, which
can confuse users.
3. This merge request addresses the issue by updating the progress bar
before filtering the samples.
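
The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the trainer's actual loop: `run_epoch`, `keep_batch`, and the plain integer counter (a stand-in for a real progress bar) are hypothetical names introduced here.

```python
def run_epoch(batches, keep_batch):
    """Toy loop: advance progress before filtering, so skipped batches still count."""
    progress = 0  # stand-in for progress_bar.update(1)
    trained = 0
    for batch in batches:
        progress += 1  # update first, regardless of the filter outcome
        if not keep_batch(batch):
            continue  # filtered batch: no training step, but progress already advanced
        trained += 1  # stand-in for the actual training step
    return progress, trained


# Assumed filter for illustration: keep a batch only if its scores are not all identical.
batches = [[0, 0], [0, 1], [1, 1]]
progress, trained = run_epoch(batches, keep_batch=lambda b: len(set(b)) > 1)
```

With the update placed after the filter instead, `progress` would only reach 1 here, which is the confusing behaviour the PR describes.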

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

Co-authored-by: techzhu <techzhu@tencent.com>
2025-06-06 20:30:50 +08:00
2038048184 DAPO npu support (#1858)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Support the DAPO algorithm on NPU.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

1. Replace hardcoded `cuda` usage with `get_torch_device()`
2. Add a `device_name` parameter to `RayDAPOTrainer`

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-06 20:28:59 +08:00
fe23634116 [rollout] feat: ChatScheduler requests sglang fully async (#1769)
Change the sglang rollout pipeline to be fully asynchronous for better
performance.

Resolves issue #1721.

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

In the previous version, sglang's `async_generate` was called from a
synchronous Ray actor through many synchronous functions, which resulted
in poor performance (GPU SM utilization was about 20% with TP2).

This PR changes the whole pipeline to async.
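
As a rough illustration of why an end-to-end async pipeline helps, the sketch below compares awaiting requests one at a time with issuing them concurrently. It uses only `asyncio`; `fake_generate` is a hypothetical stand-in for an engine call and does not reflect the sglang or ChatScheduler API.

```python
import asyncio
import time


async def fake_generate(prompt: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for one generation request."""
    await asyncio.sleep(delay)  # simulates waiting on the inference engine
    return prompt.upper()


async def generate_sequential(prompts):
    # Each request waits for the previous one to finish.
    return [await fake_generate(p) for p in prompts]


async def generate_concurrent(prompts):
    # All requests are in flight at the same time.
    return list(await asyncio.gather(*(fake_generate(p) for p in prompts)))


def timed(coro_fn, prompts):
    """Run a coroutine function and return (result, elapsed seconds)."""
    start = time.perf_counter()
    out = asyncio.run(coro_fn(prompts))
    return out, time.perf_counter() - start


prompts = ["a", "b", "c", "d"]
seq_out, seq_t = timed(generate_sequential, prompts)
con_out, con_t = timed(generate_concurrent, prompts)
```

Both variants return the same outputs; the concurrent one finishes in roughly the time of a single request because the waits overlap.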

Performance comparison with the previous "sglang_async" mode (time
reduction for `timing_s/*`, increase for `perf/throughput`):

| Metric | sglang_async (old) | async (new) | Improvement |
| -- | -- | -- | -- |
| timing_s/gen | 95 | 25 | 73.68% |
| timing_s/step | 170 | 90 | 47.06% |
| perf/throughput | 2700 | 4000 | 48.15% |
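
The percentages in the table can be reproduced with the small calculation below. The helper names are made up for this note; timing rows are treated as a relative reduction and the throughput row as a relative increase.

```python
def pct_reduction(old: float, new: float) -> float:
    """Relative decrease from old to new, in percent (for timings)."""
    return (old - new) / old * 100


def pct_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent (for throughput)."""
    return (new - old) / old * 100


gen = round(pct_reduction(95, 25), 2)            # timing_s/gen
step = round(pct_reduction(170, 90), 2)          # timing_s/step
throughput = round(pct_increase(2700, 4000), 2)  # perf/throughput
print(gen, step, throughput)  # prints 73.68 47.06 48.15
```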

### High-Level Design

see https://github.com/volcengine/verl/pull/1698

This is a follow-up to the PR above.


### Usage Example

examples/grpo_trainer/run_qwen2-7b_seq_balance.sh

### Test

.github/workflows/e2e_ppo_trainer.yml

### Additional Info.

- **Issue Number**: Fixes issue #1721

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-06-06 16:46:30 +08:00
22da46bc1f Add rollout Module Development Progress & Roadmap (#1884)
Updated the README with upcoming rollout-related features and changes.
2025-06-06 16:01:29 +08:00
4653f82fa5 fix: typos (#1879)
fix: typos
2025-06-06 15:25:26 +08:00
9afa8d6dff fix error when ci failed by incorrect sgl-kernel version (#1872)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix the CI failure caused by an incorrect sgl-kernel version in the Docker image:

```
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/utils.py", line 647, in assert_pkg_version
    raise Exception(
Exception: sgl-kernel is installed with version 0.1.0, which is less than the minimum required version 0.1.1. Please reinstall the latest version with `pip install sgl-kernel --force-reinstall`
```
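
For reference, a minimum-version check of this kind can be sketched as below. This is not sglang's `assert_pkg_version`; `parse_version` and `check_min_version` are hypothetical helpers, and the parser only handles purely numeric dotted versions such as `0.1.1` (suffixes like `.post5` are out of scope).

```python
def parse_version(version: str) -> tuple:
    """Turn a purely numeric dotted version like '0.1.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def check_min_version(name: str, installed: str, minimum: str) -> None:
    """Raise if the installed version is lower than the required minimum."""
    if parse_version(installed) < parse_version(minimum):
        raise RuntimeError(
            f"{name} is installed with version {installed}, "
            f"which is less than the minimum required version {minimum}."
        )


check_min_version("sgl-kernel", "0.1.1", "0.1.1")  # passes silently

try:
    check_min_version("sgl-kernel", "0.1.0", "0.1.1")
    failed = False
except RuntimeError:
    failed = True  # the mismatch reported in the CI log above
```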
2025-06-06 13:55:08 +08:00
bd94bd61fe [misc] fix: fix flops for H200 (#1877)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-05 22:29:35 -07:00
dafd33de59 [ray] profiler: add timeline option for performance analyse (#1768)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add an option to generate a Ray timeline for performance analysis.

### Usage Example
Run a job with this option. The trace file is generated at the end of
training and can be viewed at https://ui.perfetto.dev/
```
python3 -m verl.trainer.main_ppo \
    ray_init.timeline_json_file=/tmp/timeline.json \
...
```
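
Besides viewing the file in a trace UI, it can be summarized with a few lines of Python. The sketch below assumes the file follows the Chrome trace event format (complete events with `"ph": "X"`, `name`, and `dur` in microseconds); the exact fields in the generated file were not checked here, and `load_trace` and `summarize_trace` are hypothetical helper names.

```python
import json
from collections import defaultdict


def load_trace(path):
    """Load trace events from a JSON file (a bare list or a {'traceEvents': [...]} object)."""
    with open(path) as f:
        data = json.load(f)
    return data["traceEvents"] if isinstance(data, dict) else data


def summarize_trace(events):
    """Sum the duration of complete ('X') events per name, in seconds, longest first."""
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev.get("name", "<unnamed>")] += ev["dur"] / 1e6  # microseconds to seconds
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Synthetic events for illustration only.
events = [
    {"name": "generate", "ph": "X", "ts": 0, "dur": 2_000_000},
    {"name": "update", "ph": "X", "ts": 0, "dur": 500_000},
    {"name": "generate", "ph": "X", "ts": 5, "dur": 1_000_000},
    {"name": "meta", "ph": "M"},
]
summary = summarize_trace(events)
```

For a real run, `summarize_trace(load_trace("/tmp/timeline.json"))` would give a quick per-task total before opening the full timeline.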


<img width="1347" alt="Screenshot 2025-05-30 13 13 56"
src="https://github.com/user-attachments/assets/ec57ef94-3ecd-467e-b33f-ae0da3a54c49"
/>
2025-06-05 20:03:34 -07:00
78240de7dd [DeepSeek][Docker Image] Update dpsk image (#1870)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Split the Docker image used by CI from the one used to run DeepSeek-V3.
The DeepSeek-V3 image uses cuDNN 9.8 to support MLA.

The new image is
`whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.1-te2.3-deepseekv3`.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-06 09:35:29 +08:00
f1fd0f095d [single controller] feat: mitigate pickle cost (#1862)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Call `ray.put` on all the arguments in advance to avoid duplicate
serialization cost during Megatron dispatch.
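
The saving can be illustrated without Ray. In the sketch below, `pickle` stands in for the serialization that happens when the same argument is sent to every worker, and serializing once up front plays the role of `ray.put` (which stores an object once and returns a reference that several remote calls can share). `CountingPayload`, `dispatch_naive`, and `dispatch_shared` are hypothetical names, not part of verl.

```python
import pickle


class CountingPayload:
    """Payload that counts how many times it gets pickled."""

    pickles = 0

    def __init__(self, data):
        self.data = data

    def __reduce__(self):
        CountingPayload.pickles += 1  # called once per pickle.dumps
        return (CountingPayload, (self.data,))


def dispatch_naive(payload, n_workers):
    # Serialize the same argument again for every worker.
    return [pickle.dumps(payload) for _ in range(n_workers)]


def dispatch_shared(payload, n_workers):
    # Serialize once and hand every worker the same blob (the ray.put idea).
    blob = pickle.dumps(payload)
    return [blob] * n_workers


payload = CountingPayload(list(range(1000)))

CountingPayload.pickles = 0
dispatch_naive(payload, n_workers=4)
naive_count = CountingPayload.pickles

CountingPayload.pickles = 0
dispatch_shared(payload, n_workers=4)
shared_count = CountingPayload.pickles
```

The naive path pays the serialization cost once per worker, the shared path once in total.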

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-06 09:34:58 +08:00
aa11dd19b3 [ppo critic] fix EOS token value to zero (#1850)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

For PPO critic training, the value of EOS tokens should be zero and
should not be fitted. However, the current implementation does not mask
the EOS token values, resulting in non-zero EOS token values. Although
the learning target is zero, when PPO GAE lambda < 1, this affects the
advantage calculation for tokens preceding EOS, thereby impacting
performance.
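
A toy calculation of the effect, under assumed numbers: a three-token response whose last position is EOS, a reward of 1 on the final position, and `gamma = 1`. The `gae` function below is a plain textbook GAE recursion written for this note, not verl's implementation, and the values are made up.

```python
def gae(rewards, values, gamma=1.0, lam=1.0):
    """Generalized advantage estimation over one trajectory (bootstrap value after the end is 0)."""
    advantages = [0.0] * len(rewards)
    next_value = 0.0
    next_adv = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        next_adv = delta + gamma * lam * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages


rewards = [0.0, 0.0, 1.0]
values_eos_zero = [0.4, 0.6, 0.0]     # EOS value masked to zero
values_eos_nonzero = [0.4, 0.6, 0.5]  # EOS value left unmasked

# lambda = 1: the EOS value cancels out for the tokens before EOS.
a1_zero = gae(rewards, values_eos_zero, lam=1.0)
a1_nonzero = gae(rewards, values_eos_nonzero, lam=1.0)

# lambda < 1: the EOS value leaks into the advantages of the preceding tokens.
a05_zero = gae(rewards, values_eos_zero, lam=0.5)
a05_nonzero = gae(rewards, values_eos_nonzero, lam=0.5)
```

With `lam=1.0` the first two advantages agree in both cases, while with `lam=0.5` they differ, which matches the behaviour described above.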

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
2025-06-06 01:05:18 +08:00
45aec859d6 Fixed URL for ProRL in README.md (#1866)
Fixed URL for ProRL in README.md
2025-06-05 22:43:52 +08:00
3870869cc0 Make DeepSeek 671B GRPO example more GPU memory friendly (#1867)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

- Run on 512 GPUs with TP1PP16EP32, 2k input + 4k output
- Add some tips on memory saving

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-05 22:43:29 +08:00
b23829704f [CI]feat:Add NPU CI action and fallback SFT's e2e test defaults to FSDP1 (#1823)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Mirror the CI for verl to run on the NPU and fall back the e2e test of
SFT to FSDP1, as the NPU is not currently adapted for FSDP2.

### Specific Changes

Add `.github/workflows/e2e_ascend.yml`
Change `tests/e2e/sft/run_sft.sh`

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).

---------

Co-authored-by: liaochangyue <liaochangyue@bytedance.com>
2025-06-05 22:03:20 +08:00
a6f15ae0ad Add DeepSeek 671B GRPO example (#1771)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add an example for DeepSeek 671B GRPO

### Specific Changes

- Need https://github.com/volcengine/verl/pull/1694
- Set `torch._dynamo.config.suppress_errors = True` at the entrypoint if the following error occurs:

```
ray.exceptions.RaySystemError: System error: Failed to unpickle serialized exception
traceback: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ray/exceptions.py", line 46, in from_ray_exception
    return pickle.loads(ray_exception.serialized_exception)
TypeError: BackendCompilerFailed.__init__() missing 1 required positional argument: 'inner_exception'
```

### Additional Info.

- vLLM is the backend; SGLang support is in progress
(https://github.com/sgl-project/sglang/issues/6762). Merged when both
backends are ready.
- For DeepSeek-V3-0324 on `gsm8k`, the reward starts at 0.8 and
saturates at around 0.95 after only 3 steps.
- Memory peaks at around 90 GB during the actor update (1.5k input + 2.5k
output); consider using TP/ETP to lower the requirement.
- For gsm8k training using this YAML:


![image](https://github.com/user-attachments/assets/d16cf959-5845-4dd0-95af-07fc35820f18)


### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-06-05 22:00:49 +08:00
2f050a8516 [Mcore] dpskv3 671B (#1694)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

- Support training with DeepSeek-V3 671B
- Support MTP on top of #1284

It is now functionally ready for 671B but still lacks practical validation.

> Add one-line overview of what this PR aims to achieve or accomplish. 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-06-05 21:55:04 +08:00
28587336a9 [megatron] moonlight fix per_tensor_generator (#1772)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

There is a tricky bug in `per_tensor_generator` when it uses
`model.named_parameters()`:
`decoder.layers[n].mlp.router.expert_bias` in `GPTModel` is not returned
by `named_parameters()`, but it is present in `state_dict()`. Before this
fix, the router bias, i.e.
`model.layers.{layer_number}.mlp.gate.e_score_correction_bias`, was not
transferred from mcore to the inference engine.





> Add one-line overview of what this PR aims to achieve or accomplish. 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-06-05 19:56:52 +08:00
81acbb2cc5 [bugfix] Force create checkpoint directory before saving dataloader state. (#1625)
Fix training crash due to missing checkpoint directory

We encountered a training crash with error: "RuntimeError: Parent
directory /workspace/ckpts/global_step_20 does not exist".

It appears that `self.actor_rollout_wg.save_checkpoint`, which should
create the checkpoint directory, might be running asynchronously and
doesn't complete creating the folder in time.

This change explicitly forces creation of the directory before saving
the dataloader state to prevent this race condition.
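
The fix amounts to creating the directory explicitly before writing into it. A minimal sketch, assuming a JSON file as a stand-in for the real dataloader state format; `save_dataloader_state` and the file name `data.json` are hypothetical and chosen only for this example.

```python
import json
import os
import tempfile


def save_dataloader_state(ckpt_dir: str, state: dict) -> str:
    """Create the checkpoint directory if needed, then write the state into it."""
    # exist_ok=True makes this safe even if another process creates the folder first.
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, "data.json")
    with open(path, "w") as f:
        json.dump(state, f)
    return path


# Usage: the nested step folder does not exist yet when the state is saved.
root = tempfile.mkdtemp()
ckpt_dir = os.path.join(root, "ckpts", "global_step_20")
saved_path = save_dataloader_state(ckpt_dir, {"epoch": 1, "step": 20})
```

Because `os.makedirs(..., exist_ok=True)` is idempotent, it does not matter whether the worker that normally creates the folder gets there before or after this call.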

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**:
[1657](https://github.com/volcengine/verl/issues/1657)
- **Training**: FSDP/Megatron
- **Inference**: vLLM

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-06-05 19:33:45 +08:00
f7f8b042d5 [feat] Add support for FSDP2 in GRPO-LoRA (#1844)
1. Add: Support FSDP2 in GRPO-LoRA
2. Format: Automatic code formatting changes initiated by the pre-commit
tool
3. Add: Integrate the end-to-end (e2e) testing of GRPO-LoRA + FSDP2 into
the CI pipeline.
2025-06-05 19:32:53 +08:00
2fe47f71ab Add ProRL to README.md (#1855)
ProRL is a novel training methodology that incorporates KL divergence
control, reference policy resetting, and a diverse suite of tasks. The
empirical analysis reveals that RL-trained models consistently
outperform base models across a wide range of pass@k evaluations,
including scenarios where base models fail entirely regardless of the
number of attempts.

It is built on verl.

Link: https://arxiv.org/abs/2505.24864
2025-06-05 17:51:11 +08:00
2a386cf0e9 [BugFix][CI] Megatron: add ep CI (#1726)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Fix an EP bug and try to add CI with a 15B model, while looking for
smaller models that are more convenient to test.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-06-05 14:02:00 +08:00
5b66489b52 [refactor] Align name_prefix same behavior for pool and wg (#1851)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Follow-up to #1838: make the `name_prefix` mechanism the same for
`RayWorkerGroup` and `RayResourcePool`. It defaults to `None`, in which
case it is initialized randomly.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-06-05 11:52:28 +08:00
565c496f87 fix batch size validation for Megatron (#1811)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix batch size validation for Megatron. `real_train_batch_size` should
be divisible by dp*mbs.
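
The divisibility rule above can be sketched as follows. This is a minimal illustration under assumptions, not verl's actual validation code: the function name `check_megatron_batch_size` and its parameters are hypothetical, with `dp` standing for the data-parallel size and `mbs` for the micro batch size.

```python
def check_megatron_batch_size(real_train_batch_size: int, dp: int, mbs: int) -> None:
    """Raise ValueError unless the batch splits evenly into dp * mbs chunks."""
    # Hypothetical helper for illustration; the names are not taken from verl.
    if dp <= 0 or mbs <= 0:
        raise ValueError("dp and mbs must be positive")
    divisor = dp * mbs
    if real_train_batch_size % divisor != 0:
        raise ValueError(
            f"real_train_batch_size ({real_train_batch_size}) must be divisible "
            f"by dp * mbs ({dp} * {mbs} = {divisor})"
        )


# 1024 % (8 * 4) == 0, so this call returns without raising.
check_megatron_batch_size(1024, dp=8, mbs=4)
```

Failing early with the computed divisor in the message makes a misconfigured run easier to diagnose than an error raised later during batch splitting.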

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-06-05 11:18:12 +08:00
2ed63bbf39 [fix] Adding a default value for RayWorkerGroup.from_detached(name_prefix=None) (#1838)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

In #1443, a new argument `name_prefix` was introduced for the function
`from_detached` without a default value. This PR sets its
default value to `None`, in which case `RayWorkerGroup` will generate a
random string as the prefix. This fix makes the API compatible with
existing usage, and users don't need to worry about this new argument
when a `name_prefix` is not necessary in their context.
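
The default-to-`None` pattern described above can be sketched like this. The helper `resolve_name_prefix` is hypothetical, and the PR does not show how verl actually generates the random string, so the `uuid`-based prefix here is only an assumption for illustration.

```python
import uuid
from typing import Optional


def resolve_name_prefix(name_prefix: Optional[str] = None) -> str:
    """Return the caller's prefix, or a random one when none is given."""
    # Hypothetical helper; verl's real random-string generation may differ.
    if name_prefix is None:
        return uuid.uuid4().hex[:6]
    return name_prefix


explicit = resolve_name_prefix("my_group")  # caller-supplied prefix is kept as-is
generated = resolve_name_prefix()           # random 6-character hex prefix
```

Using `None` as the sentinel keeps existing call sites working unchanged while still letting callers pin a prefix when they need a stable name.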

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-06-04 20:47:58 +08:00
5580b0b057 Add log_generations_to_tensorboard Function (#1841)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

- This PR introduces the `log_generations_to_tensorboard` function.
When `trainer.log_val_generations` is greater than 0 and `tensorboard`
is selected in `trainer.logger`, the function writes the generations
to `generations/text_summary` in TensorBoard.
- I have already tested this in the experiment, and the resulting
TensorBoard is shown in the image below:
<img width="1652"
alt="WeChatWorkScreenshot_d6ac53ed-253e-44a1-b641-4542f3eb1db0"
src="https://github.com/user-attachments/assets/78dcb226-0ada-4af6-9231-f40c558eb3d5"
/>

- The training script is shown below:

```
set -x
model_path=$MODEL_PATH 
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=/data/gsm8k/train.parquet \
    data.val_files=/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.max_prompt_length=512 \
    data.max_response_length=256 \
    data.filter_overlong_prompts=True \
    data.truncation='left' \
    actor_rollout_ref.model.path=$model_path \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=128 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.8 \
    actor_rollout_ref.rollout.n=6 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.log_val_generations=10 \
    trainer.logger=['console','tensorboard'] \
    trainer.project_name='verl_grpo_example_gsm8k' \
    trainer.experiment_name='qwen2_7b_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=20 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 $@
```
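
The logging behaviour described in this PR can be sketched as below. This is not the PR's implementation: the function name `log_generations_as_text` and the `(input, output, score)` sample layout are assumptions. The `writer` argument is assumed to expose `add_text(tag, text, global_step)`, as `torch.utils.tensorboard.SummaryWriter` does.

```python
from typing import List, Tuple


def log_generations_as_text(writer, samples: List[Tuple[str, str, float]], step: int) -> str:
    """Format (input, output, score) samples as a Markdown table and log it."""
    # `writer` is assumed to behave like torch.utils.tensorboard.SummaryWriter.
    rows = ["| input | output | score |", "| --- | --- | --- |"]
    for prompt, response, score in samples:
        # Escape pipes and flatten newlines so each sample stays on one table row.
        cells = [
            str(cell).replace("|", "\\|").replace("\n", " ")
            for cell in (prompt, response, score)
        ]
        rows.append("| " + " | ".join(cells) + " |")
    text = "\n".join(rows)
    # The tag matches the location named in the PR description.
    writer.add_text("generations/text_summary", text, step)
    return text
```

TensorBoard's text panel renders Markdown, so a table keeps prompts, responses, and scores aligned across validation steps.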
2025-06-04 20:41:56 +08:00
fdf7d513e4 fix fsdp train save checkpoint bug (#1843)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix the save_checkpoint logic (otherwise it saves a checkpoint at
every step!)
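
A frequency guard of the kind this fix restores might look like the sketch below. The PR text does not show the actual condition, so the function name and the rule "save only when `save_freq` is positive and divides the step" are assumptions.

```python
def should_save_checkpoint(global_step: int, save_freq: int) -> bool:
    """Return True only on steps that are a multiple of a positive save_freq."""
    # Assumed semantics: save_freq <= 0 (for example -1) disables periodic saving.
    return save_freq > 0 and global_step % save_freq == 0


saved_steps = [s for s in range(1, 101) if should_save_checkpoint(s, save_freq=20)]
print(saved_steps)  # [20, 40, 60, 80, 100]
```

Checking `save_freq > 0` first also avoids a modulo by zero when saving is disabled with `save_freq=0`.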


### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).

Co-authored-by: zhouyi9 <zhouyi9@APBBS24115035.local>
2025-06-04 20:39:53 +08:00
7c49f7098a Add DeepMath to awesome work list (#1847)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add DeepMath to the README list of works that use veRL

### High-Level Design

> 1-line update. 

### Specific Changes

> only changed the readme.md (1-line update).

### API

> No.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-04 20:38:50 +08:00
0a5c491639 [sglang] Fix for broadcast_pyobj nccl timeout in sgl rollout with larger model (e.g. 32B) (#1846)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

A method proposed by Congkai Xie to avoid the NCCL timeout error in
`broadcast_pyobj` during sglang_rollout.

### Specific Changes

> List the specific changes.

- Add `dist.barrier()` before `broadcast_pyobj` so that the NCCL
communication wait does not happen at the same time as TP0 starts the rollout

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: close #1420 
- **Training**: none
- **Inference**: SGLang

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-04 20:38:01 +08:00
fba8f3463a fix sglang e2e_sppo test (#1832)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix the following error in the e2e_sppo CI test:

Exception: sgl-kernel is installed with version 0.0.9.post2, which is
less than the minimum required version 0.1.1. Please reinstall the
latest version with `pip install sgl-kernel --force-reinstall`

For example:

https://github.com/volcengine/verl/actions/runs/15431843178/job/43430980736?pr=1769
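
The version comparison behind the exception above can be illustrated with a small standard-library sketch. It is a simplification written for this note, not sgl-kernel's or SGLang's actual check: it compares only the leading numeric release segment and ignores suffixes such as `.post2`. A real check would more likely rely on `packaging.version.Version`.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '0.0.9.post2' -> (0, 0, 9)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release segments only; pre/post/dev tags are ignored in this sketch."""
    return release_tuple(installed) >= release_tuple(minimum)


print(meets_minimum("0.0.9.post2", "0.1.1"))  # False
print(meets_minimum("0.1.1", "0.1.1"))        # True
```

Comparing integer tuples rather than raw strings avoids lexicographic surprises such as `"0.10.0" < "0.9.0"`.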
2025-06-04 11:44:34 +08:00
996b945e74 Add LUFFY to awesome work list #1608 (#1816)
### What does this PR do?

> Add LUFFY to the README list of works that use veRL

### High-Level Design

> 1-line update. 

### Specific Changes

> only changed the readme.md (1-line update).

### API

> No.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

Co-authored-by: H <linhaibin.eric@gmail.com>
2025-06-04 08:38:57 +08:00
adf775c43b [logging] misc: update PR template and fix lint (#1806) 2025-06-04 07:53:12 +08:00
15ca90ceaa [sft] trainer: port features (dtype, save_freq, test_freq) from PPO config to SFT config (#1451)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Enable the (dtype, save_freq, test_freq) configs to sync SFTTrainer with
PPOTrainer

### Specific Changes

1. Add new config items
2. Sync `default_local_dir`, `default_hdfs_dir` and `logger` with the PPO
config
```yaml
model:
  fsdp_config:
    model_dtype: fp32
trainer:
  save_freq: -1 # unit: iteration
  test_freq: -1
```

### Usage Example

Works the same as `main_ppo`.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
2025-06-03 11:50:24 -07:00
299dde1f86 [docs] Add verl-tool in the list of "awesome works using verl" (#1829) 2025-06-04 00:44:01 +08:00
ed3aec22df [misc] fix fsdp2 has no _fsdp_wrapped_module in lora collect param (#1822)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

[misc] fix fsdp2 has no _fsdp_wrapped_module in lora collect param

### Additional Info.

- **Training**: FSDP
- **Inference**: vllm

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-06-03 19:20:27 +08:00
de5b2f1ca7 [Tracking]feat: support wandb proxy (#1817)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

For environments that cannot access wandb directly, you can add a proxy
setting for wandb without affecting other HTTPS requests.


### Usage Example

see docs/faq/faq.rst
2025-06-03 16:58:25 +08:00
8540d6ce5f [config] feat: Hardcode moe_router_load_balancing_type to none (#1814)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

- Hardcode `moe_router_load_balancing_type` to none as it hurts perf in
Qwen3MoE
- We can provide a config to set it

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-06-03 16:36:34 +08:00
2fbdcb38fb [script] feat: upload qwen3 236b script (#1813)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Upload a script that trains Qwen3 236b on the DAPO dataset. Note that
we set the response length to 4k, which results in many truncations at
the beginning. The training dynamic therefore acts as using RL to
compress the math capabilities of Qwen3 236b into a 4k response length
instead of verbose thinking.

We can achieve 0.5 on AIME'24 after 30 steps. We didn't train for longer.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-06-03 15:55:27 +08:00
99100867be [optimization] feat: move kv cache wakeup after model weights release (#1810)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Details are in the comments inside the code.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-06-03 14:48:57 +08:00
668e5f617b [megatron] fix: critic and reward model load tokenizer from config (#301)
Currently, the worker will fail if the critic or reward model path
doesn't contain a tokenizer. This PR tries to fix this by loading
tokenizer from the config for the previously mentioned case.

- For the critic model, we fall back to loading from
`critic.model.tokenizer_path`.
- For the reward model, we first fall back to loading from
`reward_model.model.rm_tokenizer`, and throw an error if that is not set.
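
The fallback order above can be sketched as a small resolver. This is an illustration under assumptions rather than the PR's code: the function `resolve_tokenizer_path`, its parameters, and the example paths are hypothetical, and raising an error for a missing fallback is only stated in the PR for the reward model.

```python
from typing import Optional


def resolve_tokenizer_path(
    model_path: str,
    has_tokenizer: bool,
    fallback_path: Optional[str],
    fallback_key: str,
) -> str:
    """Prefer the model's own tokenizer, then the configured fallback path."""
    if has_tokenizer:
        return model_path
    if fallback_path:
        return fallback_path
    # Assumption: a missing fallback is treated as a configuration error.
    raise ValueError(f"{model_path} contains no tokenizer; please set {fallback_key}")


# Hypothetical paths, for illustration only.
path = resolve_tokenizer_path(
    "/models/rm",
    has_tokenizer=False,
    fallback_path="/models/actor",
    fallback_key="reward_model.model.rm_tokenizer",
)
print(path)  # /models/actor
```

Naming the config key in the error message tells the user exactly which setting to fill in.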

---------

Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
2025-06-03 13:26:00 +08:00
263115cd9d [dev] fix: note that DP balancing doesn't affect advantage calculation (#1809)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR fixes the comments about DP balancing.

It also adds the DP balancing option to the PRIME trainer, while keeping
the default value as `False`.

### Additional Info.

- **Issue Number**: #1718 
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-06-03 10:20:54 +08:00
7695b8db43 [recipe] prime: Code example for PRIME (#1714)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add running example for PRIME algorithm on coding data of
[Eurus-2-RL-Data](https://huggingface.co/datasets/PRIME-RL/Eurus-2-RL-Data)

### Specific Changes

> Running example
> Log

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
2025-06-02 19:08:11 -07:00
a4b1bb7fb9 [algo] OPO: add implementations and descriptions for OPO (On-Policy RL with Optimal Reward Baseline) (#1796)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add implementations and descriptions for OPO (On-Policy RL with
Optimal Reward Baseline)

### Specific Changes

> Add docs of OPO in `docs/algo/opo.md`.
> Add the advantage estimation function of OPO in
`verl/trainer/ppo/core_algos.py`.
> Add the `opo` option for advantage estimation in
`verl/trainer/ppo/ray_trainer.py`.

### Usage Example

```bash
export GLOBAL_BSZ=256
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_batch_size=${GLOBAL_BSZ} \
    actor_rollout_ref.actor.ppo_mini_batch_size=${GLOBAL_BSZ} \
    actor_rollout_ref.actor.use_kl_loss=False \
    actor_rollout_ref.actor.kl_loss_coef=0.0 \
    actor_rollout_ref.actor.entropy_coeff=0.0 \
    algorithm.kl_ctrl.kl_coef=0.0 \
    ...
```

### Tests
Have tested the changes locally in the provided docker.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
2025-06-02 14:46:06 -07:00
07897f84e5 [AMD] fix: Add support for RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES (Fix AMD support) (#1465)
### Checklist Before Starting

- [X] Search for similar PR(s).

### What does this PR do?

Add support for RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES, and fix AMD
support

### High-Level Design

The current approach to supporting AMD in verl is fundamentally
incorrect and only works by luck:

Calls such as `torch.cuda.is_available()` or
`torch.cuda.get_device_name()` will initialize the CUDA/ROCm
environment:

c65ee728f0/torch/cuda/__init__.py (L342-L392)

Setting CUDA/HIP/ROCR_VISIBLE_DEVICES after CUDA/ROCm is initialized
will not take effect (see
https://github.com/pytorch/pytorch/issues/141678), which means that all
the current code wrapped inside `[SUPPORT AMD: torch]` is mostly a
no-op.

CUDA_VISIBLE_DEVICES also works on AMD, but that is because a lot of
software ported to AMD calls those `torch.cuda.*` functions at import time, e.g.:

- https://github.com/ROCm/TransformerEngine/pull/183
- https://github.com/vllm-project/vllm/pull/15246

Meanwhile, ray/vllm manipulates those *_VISIBLE_DEVICES variables at
runtime, so those `torch.cuda.*` calls poison the current process if the
CUDA/ROCm environment is initialized before the manipulation happens.

So a good solution here is to use a single environment variable
(`CUDA_VISIBLE_DEVICES`) for everything, for consistency and hardware
agnosticism, and to move all the other `*_VISIBLE_DEVICES` values into
the CUDA one. Note that we must pay attention when both the HIP/CUDA and
ROCR env vars are set, as they have different meanings. Both env vars
accept either a list of ints or a list of UUIDs. The ROCR env var is
processed first, which then reduces the number of GPUs that HIP can
select from (see https://github.com/pytorch/pytorch/pull/144026). To
avoid this complexity, we simply raise an error if both are set (which
is also consistent with ray's practice as of 2.45.0).
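
The normalization rule described above can be sketched as follows. This is a simplified illustration, not the PR's code: the function name `normalize_visible_devices` is hypothetical, it operates on a plain mapping instead of `os.environ`, and rejecting conflicting HIP and CUDA values is an extra assumption made for this sketch.

```python
from typing import MutableMapping


def normalize_visible_devices(env: MutableMapping[str, str]) -> None:
    """Fold HIP/ROCR_VISIBLE_DEVICES into CUDA_VISIBLE_DEVICES, in place."""
    rocr = env.get("ROCR_VISIBLE_DEVICES")
    hip = env.get("HIP_VISIBLE_DEVICES")
    cuda = env.get("CUDA_VISIBLE_DEVICES")

    # ROCR is processed before HIP/CUDA and has a different meaning,
    # so mixing the two families is rejected instead of being reconciled.
    if rocr is not None and (hip is not None or cuda is not None):
        raise ValueError(
            "Set either ROCR_VISIBLE_DEVICES or HIP/CUDA_VISIBLE_DEVICES, not both"
        )
    # Assumption for this sketch: conflicting HIP and CUDA values are also an error.
    if hip is not None and cuda is not None and hip != cuda:
        raise ValueError("HIP_VISIBLE_DEVICES and CUDA_VISIBLE_DEVICES disagree")

    value = rocr if rocr is not None else hip
    if value is not None:
        env["CUDA_VISIBLE_DEVICES"] = value
    env.pop("ROCR_VISIBLE_DEVICES", None)
    env.pop("HIP_VISIBLE_DEVICES", None)


env = {"HIP_VISIBLE_DEVICES": "0,1"}
normalize_visible_devices(env)
print(env)  # {'CUDA_VISIBLE_DEVICES': '0,1'}
```

As the description notes, any such normalization only has an effect if it runs before CUDA/ROCm is initialized in the process.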

As for the poisoning issue, until those 2 PRs are merged, we need to
ask users to set `RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES` or
`RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES` so that ray no longer
manipulates these variables, which makes verl workable when there is no
`*_VISIBLE_DEVICES`.

Note that for the latest ray (after its switch to `HIP_VISIBLE_DEVICES`),
we also need this patch: https://github.com/ray-project/ray/pull/52794

### Test

Tested manually on both the Megatron and FSDP backends with vLLM.

### Additional Info.

- **Issue Number**: none
- **Training**: both FSDP and Megatron
- **Inference**: both vLLM and SGLang

### Checklist Before Submitting

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title if it breaks any API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add CI test(s) if necessary.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-06-02 10:12:45 -07:00
ea81658b5f [bugfix] fix select_idxs function in DataProto (#1794)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix the batch_size type error when using DataProto.select_idxs, which
originally causes the TypeError when using DataProto.chunk

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

Fix the batch_size type of select_idxs func.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
import torch
from verl import DataProto
import numpy as np

data = {"random_array": torch.randn(4, 2)}
batch = DataProto.from_dict(data)
valid_mask = np.array([True, False, True, False])

batch_select_idxs = batch[valid_mask]

batch.chunk(2) # correct

batch_select_idxs.chunk(2) # incorrect, raising TypeError

# with tensordict version == 0.6.2
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "./verl/verl/protocol.py", line 667, in chunk
#     batch_lst = self.batch.chunk(chunks=chunks, dim=0)
#   File "/opt/conda/envs/verl/lib/python3.10/site-packages/tensordict/base.py", line 2134, in chunk
#     return self.split(split_size, dim=dim)
#   File "/opt/conda/envs/verl/lib/python3.10/site-packages/tensordict/_td.py", line 1715, in split
#     raise TypeError(WRONG_TYPE)
# TypeError: split(): argument 'split_size' must be int or list of ints
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: N/A.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Co-authored-by: keithsun <keithsun@tencent.com>
2025-06-03 00:09:06 +08:00
0e127b208b chore: fix typos across codebase (#1805)
Fixed typos across codebase.
2025-06-02 21:05:07 +08:00
6e0e860f37 [feat] worker_group: support custom label for specific devices (#1773)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Place a `worker_group` on specific devices when using heterogeneous GPUs
in a ray cluster.

### Specific Changes

Add `accelerator_type` to `RayResourcePool` to set a custom label in the
bundle that identifies specific devices.
Refer to
https://docs.ray.io/en/latest/ray-core/scheduling/resources.html#custom-resources

### API

> Demonstrate how the API changes if any.


### Usage Example

1. Set a custom label when starting the ray cluster
```bash
# H20 Node
ray start --head --port 6379 --resources='{"H20": 1}'
# 4090 Node
ray start --address='<master_ip>:6379' --resources='{"4090": 1}'
```
2. Specify the accelerator type when creating `RayResourcePool`
```python
pool_h20 = RayResourcePool([4], use_gpu=True, accelerator_type='H20')
pool_4090 = RayResourcePool([4], use_gpu=True, accelerator_type='4090')
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-06-02 12:47:48 +08:00
b93b9bc2cb [CI] test: disable unstable test temporarily (#1799)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Disable the always failing test
2025-06-02 08:34:10 +08:00
366d29c084 [eval] fix: fix main_eval (#1797) 2025-06-01 08:40:22 -07:00
3126c8b428 remove redundant 'get_custom_reward_fn' function (#1791)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> remove redundant 'get_custom_reward_fn' function. 

### High-Level Design

> None.

### Specific Changes

> "from verl.trainer.ppo.reward import get_custom_reward_fn" instead of
the local 'get_custom_reward_fn' function in verl/recipe/dapo/main_dapo.py,
verl/recipe/r1/main_eval.py, verl/recipe/spin/main_spin.py, and
verl/verl/trainer/main_eval.py
> remove 'get_custom_reward_fn' function in
verl/verl/trainer/main_ppo.py

### Additional Info.

- **[Issue Number](https://github.com/volcengine/verl/issues/1716)**:
Fixes issue # or discussion # if any.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
2025-06-01 21:45:54 +08:00
1fd729c25e fix import issue for mcore package (#1775)
fix import issue for mcore package in `patch_v012.py`
2025-06-01 16:23:04 +08:00
ad9470068e fix freeze router weights for Qwen2MoE (#1792) 2025-06-01 16:03:57 +08:00
0ae50562cc [doc] fix: Fix doc_testci workflow pipeline (#1767)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

The existing doc test CI never fails, because the Sphinx doc system only
raises on `fatal`; errors and warnings do not block the doc build process.

This PR tries to fix the problem by grepping for `Error` messages in the
build log.
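
As a rough illustration of the idea (not the actual workflow step), the check can be sketched in Python. The marker strings and the sample log lines below are assumptions made up for the example.

```python
from typing import List

# Assumption: a failing build log contains one of these substrings.
ERROR_MARKERS = ("Error", "ERROR")


def find_error_lines(log_text: str) -> List[str]:
    """Return every log line that contains an error marker."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in ERROR_MARKERS)
    ]


# Made-up log content, used only to exercise the helper.
sample_log = "\n".join(
    [
        "building [html]: targets for 3 source files",
        "docs/index.rst:12: ERROR: Unknown directive type.",
        "build succeeded, 1 warning.",
    ]
)

errors = find_error_lines(sample_log)
# A CI step would exit non-zero when any error line is found.
exit_code = 1 if errors else 0
print(exit_code, errors)
```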

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-05-31 23:49:58 -07:00
4de247fe4d [sglang] refactor: Unify async rollout under SGLangRollout, and support sglang==0.4.6.post5 (#1717)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

- Unify the functionality of SGLangRollout and AsyncSGLangRollout,
remove original SGLangRollout and rename AsyncSGLangRollout to
SGLangRollout.
- Make trivial changes due to modification in sglang==0.4.6.post5.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Co-authored-by: zyzshishui <@qq.com>
Co-authored-by: Xiang Long <mindsculptor@yeah.net>
Co-authored-by: ocss884 <ocss.lin@gmail.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: H <linhaibin.eric@gmail.com>
2025-05-31 19:47:25 -07:00
cef6361def [docs] lora: fix lora image and add GRPO docs (#1788)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Fix image rendering
2025-06-01 09:49:42 +08:00
ab97d9b290 [docs] LORA: Train RL(HF) algorithms with LoRA support (#1755)
### Checklist Before Starting

- [done] Search for similar PR(s).

### What does this PR do?

> This PR adds documentation on how to train RL (HF) algorithms with
LoRA support, including configuration parameters and an example script
for practical training.

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
2025-05-31 10:00:40 -07:00
106d33f9ec [docs] ppo: add a page for PPO algorithm (#1781)
### Checklist Before Starting

- [x] Search for similar PR(s).

This PR includes contribution and suggestions from
[richardodliu](https://github.com/richardodliu) in
https://github.com/volcengine/verl/pull/979

### What does this PR do?

Update documentation page, include key configs for PPO and other
recipes.
Pending docs:
- GRPO
- DrGRPO
- DAPO, etc

TODO: let config.rst directly show the content of ppo_trainer.yaml and
other related yaml files. In the yaml file, colocate the comment and
explanation with the option. This way the yaml is always consistent with
the documentation page.

For critical features or algorithms, we list the core configs in a
self-contained page like PPO.md

### High-Level Design

None

### Specific Changes

- use k1, k2, k3 for the kl calculation, still backward compatible
- changed ppo.rst to baseline.md 
- added ppo.md to explain core options for ppo 
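
For readers unfamiliar with the k1/k2/k3 naming, here is a minimal per-token sketch of the three commonly used KL estimators, assuming samples are drawn from the current policy. The function name `kl_estimates` is hypothetical and the scalar form is a simplification; it is not copied from the verl implementation.

```python
import math
from typing import Dict


def kl_estimates(logprob: float, ref_logprob: float) -> Dict[str, float]:
    """Per-token estimators of KL(policy || reference) for a sampled token."""
    log_ratio = logprob - ref_logprob  # log(pi / pi_ref)
    k1 = log_ratio  # unbiased, high variance, can be negative
    k2 = 0.5 * log_ratio**2  # biased, low variance, always >= 0
    # k3 = (r - 1) - log(r) with r = pi_ref / pi: unbiased and always >= 0.
    k3 = math.exp(-log_ratio) - 1.0 + log_ratio
    return {"k1": k1, "k2": k2, "k3": k3}


print(kl_estimates(logprob=-0.5, ref_logprob=-1.2))
```

All three estimators are zero when the policy and reference assign the same log-probability to the token.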


### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-31 09:03:12 -07:00
c5bc81b692 [sglang] Feat: Search Tool Invocation in Multi-Turn RL Training (#1682) 2025-05-31 12:51:19 +08:00
e23e67ba53 [feat] dataproto: Supporting new operations (sample_level_repeat, unfold_column_chunks) for DataProto (#1761)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add and enrich operations on the `DataProto` data class:

1. Make `DataProto` compatible with `self.batch is None`. This is
useful when a `DataProto` contains non-tensor data only,
e.g., images for VLM use cases;
2. `sample_level_repeat`: this function repeats the rows in `DataProto`
multiple times at the sample level;
3. `unfold_column_chunks`: this function splits along the second dim into
`n_splits` folds. Useful for passing grouped tensors that should not
be shuffled in the dataset.
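
The two new operations can be pictured with plain Python lists. This is only a sketch of the semantics as read from the description above: the free functions below are stand-ins, not the `DataProto` methods, and the per-row repeat counts are an assumption.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_level_repeat(rows: Sequence[T], repeat_times: Sequence[int]) -> List[T]:
    """Repeat each row the given number of times, keeping repeats adjacent."""
    if len(rows) != len(repeat_times):
        raise ValueError("need one repeat count per row")
    out: List[T] = []
    for row, times in zip(rows, repeat_times):
        out.extend([row] * times)
    return out


def unfold_column_chunks(rows: Sequence[Sequence[T]], n_splits: int) -> List[List[T]]:
    """Split each row along its second dim into n_splits chunks, one new row each."""
    out: List[List[T]] = []
    for row in rows:
        if len(row) % n_splits != 0:
            raise ValueError("row length must be divisible by n_splits")
        chunk = len(row) // n_splits
        for i in range(n_splits):
            out.append(list(row[i * chunk : (i + 1) * chunk]))
    return out


print(sample_level_repeat(["a", "b"], [2, 1]))
print(unfold_column_chunks([[1, 2, 3, 4], [5, 6, 7, 8]], n_splits=2))
```

The authoritative usage remains the unit tests named in the next section.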

### API & Test

Please check the usage from the added unit test files:
`tests/test_protocol.py`. There are three unit tests added, which are:
`test_dataproto_no_batch`, `test_sample_level_repeat`, and
`test_dataproto_unfold_column_chunks`.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-05-30 13:11:12 -07:00
316644dc8f [docs] Add linux-arm64 platform tensordict package version problem handle FAQ (#1776)
### What does this PR do?
> Add linux-arm64 platform tensordict package version problem handle
FAQ.
> Besides me, there are other people in the community who have
encountered this problem(issue #919 )

### Detailed reason for change:
The Linux-arm64 platform does not have a suitable version of the
tensordict package. The verl requirement for tensordict is <=0.6.2.0.
The version that can be installed on the Linux-arm64 platform is 0.1.2,
but the `"key" in tensordict_var` syntax is not supported by 0.1.2, so
an error occurs. The error message is as follows:
```
  File "/home/mnj/models/code/verl/verl/verl/trainer/main_ppo.py", line 191, in run
    trainer.fit()
  File "/home/mnj/models/code/verl/verl/verl/trainer/ppo/ray_trainer.py", line 1043, in fit
    old_log_prob = self.actor_rollout_wg.compute_log_prob(batch)
  File "/home/mnj/models/code/verl/verl/verl/single_controller/ray/base.py", line 50, in func
    output = ray.get(output)
ray.exceptions.RayTaskError(NotImplementedError): ray::WorkerDict.actor_rollout_compute_log_prob() (pid=152918, ip=172.17.0.5, actor_id=64244f99243c810c9e882f3101000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0xfffc41ca44c0>)
  File "/home/mnj/models/code/verl/verl/verl/single_controller/ray/base.py", line 635, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "/home/mnj/models/code/verl/verl/verl/single_controller/base/decorator.py", line 534, in inner
    return func(*args, **kwargs)
  File "/home/mnj/models/code/verl/verl/verl/workers/fsdp_workers.py", line 739, in compute_log_prob
    output, entropys = self.actor.compute_log_prob(data=data, calculate_entropy=True)
  File "/home/mnj/models/code/verl/verl/verl/utils/debug/performance.py", line 80, in f
    return self.log(decorated_function, *args, **kwargs)
  File "/home/mnj/models/code/verl/verl/verl/utils/debug/performance.py", line 90, in log
    output = func(*args, **kwargs)
  File "/home/mnj/models/code/verl/verl/verl/workers/actor/dp_actor.py", line 289, in compute_log_prob
    entropy, log_probs = self._forward_micro_batch(micro_batch, temperature=temperature, calculate_entropy=calculate_entropy)
  File "/home/mnj/models/code/verl/verl/verl/workers/actor/dp_actor.py", line 83, in _forward_micro_batch
    if "multi_modal_inputs" in micro_batch:
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/tensordict/tensordict.py", line 2932, in __contains__
    raise NotImplementedError(
NotImplementedError: TensorDict does not support membership checks with the `in` keyword. If you want to check if a particular key is in your TensorDict, please use `key in tensordict.keys()` instead.
```
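
The error message itself suggests checking membership through `.keys()`. A minimal sketch of that pattern follows; `LegacyContainer` is a hypothetical stand-in that mimics the behavior shown in the traceback, so the example runs without tensordict installed.

```python
class LegacyContainer:
    """Hypothetical stand-in for a mapping that rejects the `in` operator."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return self._data.keys()

    def __contains__(self, key):
        raise NotImplementedError("use `key in container.keys()` instead")


def has_key(container, key: str) -> bool:
    """Membership check that works whether or not `in` is supported."""
    return key in container.keys()


micro_batch = LegacyContainer({"input_ids": [1, 2, 3]})
print(has_key(micro_batch, "multi_modal_inputs"))
```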

### Versions available on the linux-arm64 platform:
`pip install tensordict==0.6.2`

Output information:
```
ERROR: Could not find a version that satisfies the requirement tensordict==0.6.2 (from versions: 0.0.1a0, 0.0.1b0, 0.0.1rc0, 0.0.2a0, 0.0.2b0, 0.0.3, 0.1.0, 0.1.1, 0.1.2, 0.8.0, 0.8.1, 0.8.2, 0.8.3)
ERROR: No matching distribution found for tensordict==0.6.2
```

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-05-30 20:56:58 +08:00
2aed8d0a45 [BREAKING] config: set the default value of actor.entropy_coeff to 0 (#1770)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

entropy_coeff should be set carefully during RL. When enabled, an
inappropriate coefficient may cause training to collapse. You can see
more empirical experiments in the Skywork Open Reasoner 1 Technical Report
(https://arxiv.org/pdf/2505.22312).

In this PR, the default value of entropy_coeff is set to 0. This is a
breaking change that may affect your experiments, although the majority of
verl example scripts already set it to 0 manually.

We let most example scripts just pick up the default value of 0 for
entropy_coeff. For the few documentation pages where reference model
performance and commands are provided, we modify the docs so that the
experiment results are consistent with the config setup.

### Usage Example

To enable the entropy loss coefficient, use
```bash
actor_rollout_ref.actor.entropy_coeff=0.001 # or other values
```
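
To make the effect of the coefficient concrete, here is a minimal scalar sketch of an entropy-regularized actor loss. The function `actor_loss` and its scalar inputs are illustrative assumptions, not the trainer's actual code path.

```python
def actor_loss(pg_loss: float, entropy: float, entropy_coeff: float = 0.0) -> float:
    """Policy-gradient loss with an optional entropy bonus.

    A positive coefficient rewards higher entropy; with the new default
    of 0 the entropy term drops out entirely.
    """
    return pg_loss - entropy_coeff * entropy


print(actor_loss(pg_loss=0.5, entropy=2.0))  # default: no entropy term
print(actor_loss(pg_loss=0.5, entropy=2.0, entropy_coeff=0.001))
```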

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-30 14:42:53 +08:00
ed3767dcb3 [refactor] update func generator implementation to improve its observability (#1762)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Make the `func_generator` return type a subclass of `Functor` with a
`__call__` method, whose name is `method_name`. Compared to the
previous implementation, this PR makes the log record the
`method_name` explicitly, instead of the previous `<class 'function'>`.
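
A simplified sketch of the idea is shown below. The two-argument `func_generator` signature and the bare `Functor` base class are assumptions for illustration; the real function takes more parameters.

```python
class Functor:
    """Minimal base class for callable objects (illustrative)."""

    def __call__(self, *args, **kwargs):
        raise NotImplementedError


def func_generator(method_name: str, fn):
    """Wrap `fn` in an instance of a Functor subclass named `method_name`."""

    def call(self, *args, **kwargs):
        return fn(*args, **kwargs)

    # type() creates the subclass dynamically so its name matches the method.
    named_cls = type(method_name, (Functor,), {"__call__": call})
    return named_cls()


wrapped = func_generator("compute_log_prob", lambda x: x + 1)
print(type(wrapped))  # the class name now carries the method name
print(wrapped(1))
```

Because the class is named after the method, any log line that prints the callable's type shows the method name rather than a generic function type.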

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-05-30 13:19:45 +08:00
2981aa26db fix a bug that Moe's GPU memory offload is not properly handled (#1766)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

This PR fixes the issue of improper memory offloading for MoE. When
expert parallelism is enabled in Megatron's MoE, additional
expert_parallel_buffers are used to store the buffers, which occupy a
significant amount of GPU memory. The current code fails to offload and
onload these expert_parallel_buffers, resulting in incomplete memory
offloading of the model. This may lead to out-of-memory (OOM) problems.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-30 13:17:39 +08:00
562ac53d05 [docs][NPU] Document optimize: added the dataset preparation step to ascend quick start guide (#1763)
Documentation improvement: added the dataset preparation step to the Ascend
quick start guide.

Without this step, training fails with "FileNotFoundError:
Unable to find '/root/data/gsm8k/train.parquet'". The error stack is as
follows:
```
Traceback (most recent call last):
  File "/home/mnj/models/code/verl/verl/verl/trainer/main_ppo.py", line 63, in main
    run_ppo(config)
  File "/home/mnj/models/code/verl/verl/verl/trainer/main_ppo.py", line 76, in run_ppo
    ray.get(runner.run.remote(config))
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/worker.py", line 2822, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/worker.py", line 930, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(FileNotFoundError): ray::TaskRunner.run() (pid=34827, ip=172.17.0.5, actor_id=6de3b5cdc4feb78723a1aa2901000000, repr=<main_ppo.TaskRunner object at 0xfffc17a22fe0>)
  File "/home/mnj/models/code/verl/verl/verl/trainer/main_ppo.py", line 172, in run
    train_dataset = create_rl_dataset(config.data.train_files, config.data, tokenizer, processor)
  File "/home/mnj/models/code/verl/verl/verl/trainer/main_ppo.py", line 219, in create_rl_dataset
    dataset = dataset_cls(
  File "/home/mnj/models/code/verl/verl/verl/utils/dataset/rl_dataset.py", line 119, in __init__
    self._read_files_and_tokenize()
  File "/home/mnj/models/code/verl/verl/verl/utils/dataset/rl_dataset.py", line 132, in _read_files_and_tokenize
    dataframe = datasets.load_dataset("parquet", data_files=parquet_file)["train"]
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/load.py", line 2062, in load_dataset
    builder_instance = load_dataset_builder(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/load.py", line 1782, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/load.py", line 1497, in dataset_module_factory
    ).get_module()
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/load.py", line 913, in get_module
    data_files = DataFilesDict.from_patterns(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/data_files.py", line 689, in from_patterns
    else DataFilesList.from_patterns(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/data_files.py", line 582, in from_patterns
    resolve_pattern(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/data_files.py", line 383, in resolve_pattern
    raise FileNotFoundError(error_msg)
FileNotFoundError: Unable to find '/root/data/gsm8k/train.parquet'
```

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-05-30 13:15:36 +08:00
28a31d1b55 [fix] self.reward_module._handle.reshard(True) not required for fsdp2 (#1765)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix RewardModelWorker when using FSDP2, where
`self.reward_module._handle.reshard(True)` is not required.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-30 13:11:24 +08:00
96903e0e97 [feat] support kimi_vl VLM model (#1639)
### Checklist Before Starting

- [x] Search for similar PR(s).
Some code will conflict with this PR #1613 

### What does this PR do?

Add initial support for Kimi_vl;
Add sp patch for kimi_vl.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- Add some minor changes to be compatible with kimi_vl
- Add patch to support ulysses_sequence_parallel

### API

> Demonstrate how the API changes if any.

### Usage Example

```bash

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$DATA_PATH/geo3k/test.parquet \
    data.val_files=$DATA_PATH/geo3k/test.parquet \
    data.train_batch_size=16 \
    data.max_prompt_length=2048 \
    data.max_response_length=4096 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    data.shuffle=False \
    +data.trust_remote_code=True \
    actor_rollout_ref.model.path=moonshotai/Kimi-VL-A3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
    actor_rollout_ref.actor.ppo_mini_batch_size=8 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=False \
    actor_rollout_ref.model.trust_remote_code=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.n=8 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.val_before_train=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='Kimi_VL_test' \
    trainer.experiment_name='kimi_vl_grpo_geo3k_cp2' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=50 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 $@

```


### Test & Problem
During development, I discovered some issues, but they do not affect the
code in this PR.
Existing problems (with vllm==0.8.5.post1):
- Occasional vLLM errors
```python
  File "/home/sharele/anaconda3/lib/python3.11/site-packages/vllm/v1/attention/backends/mla/common.py", line 504, in build
    self.page_size)
    ^^^^^^^^^^^^^^
AttributeError: 'MLACommonMetadataBuilder' object has no attribute 'page_size'
```
Related: https://github.com/vllm-project/vllm/issues/16908
Refer to this method to avoid the problem temporarily:
https://github.com/vllm-project/vllm/issues/16908#issuecomment-2820504215

- Garbled output from vLLM under specific circumstances
During testing, I found that when SamplingParams.n > 1, vLLM's output is
some meaningless characters or keeps repeating. This will affect GRPO.

Related: https://github.com/vllm-project/vllm/issues/18378
Note: Using a Hopper architecture GPU can avoid this problem, but it
is not clear whether there are still potential issues.


Training curve:
The training curve will be added soon, after I solve the second problem.


### Additional Info.

- **Issue Number**: #1428 
- **Training**: FSDP
- **Inference**: vLLM

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
2025-05-30 11:16:06 +08:00
9c50ffd0cb [vlm] Support ulysses sequence parallelism for vlm (#1739)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Only apply Ulysses sequence parallelism to the LLM part of the VLM model,
which is the main component, to prevent the `Image features and image
tokens do not match` issue from occurring before `masked_scatter`.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

1. For the VLM model, we only pad the inputs before forward pass without
slicing them; instead, we perform slicing after the embedding stage.
2. In cases where ViT and LLM share/reuse FlashAttention, distinguish
the ViT scenario and skip the Ulysses logic.
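
The first change can be pictured with a toy example that uses plain lists instead of tensors. All helper names below (`pad_to_multiple`, `embed`, `slice_for_rank`) are hypothetical, and the one-dimensional "embedding" is only a placeholder; the point is the ordering: pad, embed the full sequence, then slice per sequence-parallel rank.

```python
from typing import List, Sequence


def pad_to_multiple(token_ids: Sequence[int], sp_size: int, pad_id: int = 0) -> List[int]:
    """Pad the sequence so its length is divisible by the SP world size."""
    pad_len = (sp_size - len(token_ids) % sp_size) % sp_size
    return list(token_ids) + [pad_id] * pad_len


def embed(token_ids: Sequence[int]) -> List[List[float]]:
    """Toy embedding; image features would be merged into the full sequence here."""
    return [[float(t)] for t in token_ids]


def slice_for_rank(embeddings: Sequence[List[float]], sp_rank: int, sp_size: int) -> List[List[float]]:
    """Take this rank's contiguous chunk of the already-embedded sequence."""
    chunk = len(embeddings) // sp_size
    return list(embeddings[sp_rank * chunk : (sp_rank + 1) * chunk])


padded = pad_to_multiple([1, 2, 3, 4, 5], sp_size=2)
full = embed(padded)  # every rank embeds the whole padded sequence
shards = [slice_for_rank(full, rank, 2) for rank in range(2)]
print(shards)
```

Slicing only after the embedding stage keeps all image tokens visible when image features are scattered into the sequence.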

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

```
python -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=/mnt/hdfs/zhudelin123/data/geo3k/train.parquet \
    data.val_files=/mnt/hdfs/zhudelin123/data/geo3k/test.parquet \
    data.train_batch_size=64 \
    data.max_prompt_length=2048 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation=error \
    data.image_key=images \
    actor_rollout_ref.model.path=/mnt/hdfs/Qwen2.5-VL-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=64 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.model.use_fused_kernels=True \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.n=4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=[console,wandb] \
    trainer.project_name=nanzhe_verl_grpo_example_geo3k \
    trainer.experiment_name=qwen2_5_vl_7b_sp2_test \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=2 \
    trainer.save_freq=-1 \
    trainer.test_freq=-1 \
    trainer.default_hdfs_dir=null \
    trainer.total_epochs=1 \
    trainer.resume_mode=disable
```

<img width="481" alt="image"
src="https://github.com/user-attachments/assets/066db41d-46cf-4bc8-9d50-b9a8189c7654"
/>


### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-30 11:12:49 +08:00
55f13ff16f [fix] moonlight runnable with trust_remote_code (#1749) 2025-05-29 22:25:28 +08:00
195f61b0f5 [feat] add fsdp2 to fsdp_sft_trainer (#1713)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add fsdp2 to fsdp_sft_trainer. Resolve issue #1504.

### High-Level Design

Refer to the implementation of #1026.

### Usage Example

```bash
model.strategy=fsdp2
```

### Test

<img width="1095" alt="image"
src="https://github.com/user-attachments/assets/1f70db1c-9ac3-448e-abca-fd302480f0c7"
/>

### Additional Info.

- **Issue Number**: #1504 
- **Training**: [Note which backend this PR will affect: FSDP]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-29 21:36:57 +08:00
7853292336 Fix copy_to_local function calls with incorrect argument usage (#1756)
- Fixed two copy_to_local calls where use_shm was passed as positional
argument
- Changed to use keyword argument use_shm=use_shm to prevent TypeError
- This resolves the 'expected str, bytes or os.PathLike object, not
bool' error
- Affects lines 566 and 607 in verl/workers/fsdp_workers.py

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Changed `copy_to_local(self.config.model.path, use_shm)` to
`copy_to_local(self.config.model.path, use_shm=use_shm)`

### Specific Changes

Problem:
The `copy_to_local` function was being called with `use_shm` as a
positional argument instead of a keyword argument, causing `cache_dir`
to receive a boolean value instead of a string path. This resulted in:

```
TypeError: expected str, bytes or os.PathLike object, not bool
```

Solution:
- Changed `copy_to_local(self.config.model.path, use_shm)` to
`copy_to_local(self.config.model.path, use_shm=use_shm)`
- Fixed two instances in `verl/workers/fsdp_workers.py` (lines 566 and
607)
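
The failure mode described above can be reproduced with a small stand-in. This is a minimal sketch, not verl's actual function: the `copy_to_local` below is a hypothetical simplification that only assumes `cache_dir` is the second positional parameter, as the problem description implies.

```python
import os
import tempfile


def copy_to_local(src, cache_dir=None, use_shm=False):
    """Hypothetical stand-in; only the parameter order matters here."""
    cache_dir = cache_dir if cache_dir is not None else tempfile.gettempdir()
    # os.path.join rejects non-path values, which is where a stray bool fails.
    return os.path.join(cache_dir, os.path.basename(src))


use_shm = False
positional_error = None

# Buggy call: the bool lands in the second positional slot (cache_dir).
try:
    copy_to_local("/models/demo", use_shm)
except TypeError as exc:
    positional_error = str(exc)

# Fixed call: the keyword binds the bool to the intended parameter.
local_path = copy_to_local("/models/demo", use_shm=use_shm)
```

Passing flags by keyword keeps such calls correct even if parameters are later inserted ahead of them in the signature.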

Testing:
- Error no longer occurs during model initialization
- Function calls now correctly pass parameters according to the function
signature

Files Changed:
- `verl/workers/fsdp_workers.py`

Co-authored-by: qingyuhao <qingyuhao@bytedance.com>
2025-05-29 17:01:02 +08:00
904a252379 Add an example script for PF-PPO training (#1753)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add an example script for PF-PPO training

### Specific Changes

> Add an example script `run_deepseek7b_llm_pfppo.sh` in
`examples/ppo_trainer/`

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-29 15:53:43 +08:00
b8ae4a1fba [rollout] feat: Implement sglang async rollout and multi-turn using AsyncServerBase (#1698)

Implemented `AsyncSglangServer`, similar to `AsyncvLLMServer`.

Tested `run_qwen2-7b_seq_balance_sglang.sh` with TP=1; some TODOs
remain:

TODO

- [ ] Improve performance when TP>1. The current implementation is slow
because `sglang_engine.async_generate` is called sequentially for each
request.
- [ ] Test in a multi-node deployment.
- [ ] Add a unit test.
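
The first TODO item contrasts sequential and concurrent dispatch of requests. The sketch below illustrates that difference with plain `asyncio`. `fake_generate` is a hypothetical stand-in for an engine call, and whether the real engine benefits from concurrent submission is an assumption, not something this PR confirms.

```python
import asyncio
import time


async def fake_generate(prompt, delay=0.05):
    """Hypothetical stand-in for an engine's async generate call."""
    await asyncio.sleep(delay)  # simulates time spent waiting on the engine
    return f"response to {prompt}"


async def generate_sequentially(prompts):
    # Each request waits for the previous one to finish.
    return [await fake_generate(p) for p in prompts]


async def generate_concurrently(prompts):
    # All requests are in flight at once; results keep the input order.
    return await asyncio.gather(*(fake_generate(p) for p in prompts))


def timed(coro_fn, prompts):
    """Run one strategy to completion and report its wall-clock time."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(prompts))
    return list(results), time.perf_counter() - start


prompts = ["a", "b", "c", "d"]
seq_results, seq_time = timed(generate_sequentially, prompts)
con_results, con_time = timed(generate_concurrently, prompts)
```

Both strategies return the same responses in the same order; only the total waiting time differs.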


### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

resolve issue: https://github.com/volcengine/verl/issues/1636

### High-Level Design
<img width="462" alt="Screenshot 2025-05-26 20 22 25"
src="https://github.com/user-attachments/assets/f07b218d-8e6e-4ccb-b266-2c514d7b4370"
/>

https://github.com/volcengine/verl/issues/1636

### Specific Changes

add AsyncSglangServer

### API

N/A

### Usage Example

    actor_rollout_ref.rollout.name=sglang \
    actor_rollout_ref.rollout.mode=async \


### Additional Info.

- **Issue Number**: Fixes issue 1636
- **Training**: [none]
- **Inference**: [SGLang]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-29 14:41:27 +08:00
1b17bb6f92 fix: show last step progress bar (#1750)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Update last step progress bar

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

Signed-off-by: shinytang6 <shinytang6@gmail.com>
2025-05-29 14:19:20 +08:00
de553a2eba feat: sandbox fusion for multi-turn (#1525)
- As users of veRL, we want to allow the model to call certain tools
during Actor rollout, incorporating the results into the training
process.
- We aim to support tool-calling capabilities of inference engines using
`sandbox-fusion` as the code execution system, providing the community
with a reimplementation of `retools`.
2025-05-29 12:12:17 +08:00
bb4f97b754 [ray] fix: error when bind async method in create_colocated_worker (#1745)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix a bug when registering an async method on an FSDP worker.

When an async method is used in an FSDP worker, it fails with:
```
>                       raise value.as_instanceof_cause()
E                       ray.exceptions.RayTaskError(TypeError): ray::WorkerDict.critic_sub() (pid=232160, ip=192.168.111.50, actor_id=ca29f2b51caa8e56243d6b8e01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f8c50729270>)
E                         File "/usr/local/lib/python3.10/dist-packages/ray/cloudpickle/cloudpickle.py", line 1479, in dumps
E                           cp.dump(obj)
E                         File "/usr/local/lib/python3.10/dist-packages/ray/cloudpickle/cloudpickle.py", line 1245, in dump
E                           return super().dump(obj)
E                       TypeError: cannot pickle 'coroutine' object

/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py:919: RayTaskError(TypeError)
```

You can reproduce this error in `tests/ray_gpu/test_colocated_workers.py`
with an async method.

### High-Level Design

Wrap the method in an async wrapper if the original method is a coroutine function.
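
A minimal sketch of this idea follows. It is not the code in `_bind_workers_method_to_parent`: the `Worker` class and the `bind_method` helper are hypothetical. It only shows how a forwarding wrapper can match the sync or async kind of the method it wraps, so that callers never receive a bare coroutine object.

```python
import asyncio
import inspect


class Worker:
    def ping(self):
        return "pong"

    async def aping(self):
        await asyncio.sleep(0)
        return "async pong"


def bind_method(worker, name):
    """Hypothetical helper: forward to worker.<name>, keeping its sync/async kind."""
    method = getattr(worker, name)
    if inspect.iscoroutinefunction(method):

        async def async_wrapper(*args, **kwargs):
            # Await here so callers receive the result, not a coroutine object.
            return await method(*args, **kwargs)

        return async_wrapper

    def sync_wrapper(*args, **kwargs):
        return method(*args, **kwargs)

    return sync_wrapper


worker = Worker()
sync_result = bind_method(worker, "ping")()
async_result = asyncio.run(bind_method(worker, "aping")())
```

A plain synchronous wrapper around `aping` would return an un-awaited coroutine, which is the kind of object the traceback above reports as unpicklable.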

### Specific Changes

Changed `_bind_workers_method_to_parent`.

### API

N/A

### Usage Example

tests/ray_gpu/test_colocated_workers.py


### Test

tests/ray_gpu/test_colocated_workers.py

### Additional Info.

- **Issue Number**: required by
https://github.com/volcengine/verl/issues/1721

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-29 11:32:47 +08:00
abb87bc147 [docs] readme: add lora and move social icons (#1743) 2025-05-28 17:02:07 -07:00
913ca6ee24 Improve run_qwen3moe-30b_megatron training script (#1742) 2025-05-29 07:53:01 +08:00
16f6c1ee65 [feat] lora: new feature -- LoRA support for PPO (#1127)
Co-Authored-By: Stephen Xie <stephenx@berkeley.edu>
Co-Authored-By: Tony Lian <longlian@berkeley.edu>
Co-Authored-By: Jiayi Pan <jiayipan@berkeley.edu>
Co-Authored-By: Simon Huang <thelongestusernameofall@gmail.com>

The test script is as follows:


```
#!/bin/bash
#
#   Author  :   simon huang
#   Date    :   2025-04-15 14:20:30
#   
#   For GRPO LoRA Support Dev 
#

set -x
## master:
# ray start --head --port=6379

## slave:
# ray start --address='localhost:6379'


# export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export WANDB_DIR=wandb-kkr1-lora-4p3bv1
export WANDB_PROJECT=simon-kkr1-lora-4p3bv1

# wandb server start --port 9090
export WANDB_BASE_URL=http://wandblocal:9000
export WANDB_API_KEY=local-5239e89783ebebea9bac5509e2bd1a8e734f55f7
# wandb login --relogin --host=http://wandblocal:9000
# export WANDB_MODE=offline

MODEL_PATH=/data1/models/Qwen/Qwen2.5-0.5B-Instruct

export VLLM_ATTENTION_BACKEND=XFORMERS

nproc_per_gpu=1
nnodes=1
nproc_per_node=2
total_procs=$(( nproc_per_gpu * nnodes * nproc_per_node ))
mini_batch_size=$(( total_procs ))

python3 -m verl.trainer.main_ppo \
    --config-name=lora-ppo_trainer.yaml \
    algorithm.adv_estimator=grpo \
    data.train_files=data/kk/parquet/train.parquet \
    data.val_files=data/kk/parquet/val.parquet \
    data.train_batch_size=${total_procs} \
    data.val_batch_size=${total_procs} \
    data.max_prompt_length=2000 \
    data.max_response_length=600 \
    actor_rollout_ref.model.path=$MODEL_PATH \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.model.lora_rank=8 \
    actor_rollout_ref.model.lora_alpha=16 \
    actor_rollout_ref.model.target_modules=[k_proj,v_proj] \
    actor_rollout_ref.actor.optim.lr=3e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=${mini_batch_size} \
    actor_rollout_ref.actor.ppo_micro_batch_size=${mini_batch_size} \
    actor_rollout_ref.actor.use_kl_loss=False \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.fsdp_config.fsdp_size=-1 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=${mini_batch_size} \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.1 \
    actor_rollout_ref.rollout.n=2 \
    actor_rollout_ref.rollout.max_num_seqs=4 \
    actor_rollout_ref.rollout.max_model_len=4000 \
    actor_rollout_ref.rollout.max_num_batched_tokens=4000 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.ref.log_prob_micro_batch_size=${mini_batch_size} \
    actor_rollout_ref.ref.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=1 \
    actor_rollout_ref.actor.entropy_coeff=0.001 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    reward_model.reward_manager=naive \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name=$WANDB_PROJECT \
    trainer.experiment_name=$WANDB_PROJECT \
    trainer.n_gpus_per_node=${nproc_per_node} \
    trainer.nnodes=${nnodes} \
    trainer.default_local_dir=$WANDB_PROJECT \
    trainer.default_hdfs_dir=null \
    trainer.save_freq=1 \
    trainer.test_freq=1 \
    trainer.total_epochs=8 $@ 2>&1 | tee ${WANDB_PROJECT}.log

```


The output log is as follows:

```
(TaskRunner pid=2931272)   [Error] </answer> appears 0 times (expected 1)
(TaskRunner pid=2931272)   [Error] Incorrect tag order: Expected <think>...</think><answer>...</answer>
(TaskRunner pid=2931272)
(TaskRunner pid=2931272)   Format validation: FAIL
(TaskRunner pid=2931272)   Format score: -2
(TaskRunner pid=2931272)
(TaskRunner pid=2931272) [Content Validation] Skipped due to format errors or missing answer
(TaskRunner pid=2931272)
(TaskRunner pid=2931272) --------------------------------------------------------------------------------
(TaskRunner pid=2931272) --------------------------------- Final Score ----------------------------------
(TaskRunner pid=2931272)   Format: -2
(TaskRunner pid=2931272)   Answer: -2
(TaskRunner pid=2931272)   Total: -4
(TaskRunner pid=2931272) ================================================================================
(TaskRunner pid=2931272)
(TaskRunner pid=2931272) local_global_step_folder: simon-kkr1-lora-4p3bv1/global_step_10
(WorkerDict pid=2948236) [rank-0]: LoRA adapter saved to simon-kkr1-lora-4p3bv1/global_step_10/actor/lora_adapter
Training Progress:   0%|          | 10/47200 [05:16<308:34:14, 23.54s/it]
(WorkerDict pid=2948236) [rank-0]: Saving model to /mnt/h800fast/simon/research/Train/RL/volcengine/simonverl/simon-kkr1-lora-4p3bv1/global_step_10/actor/model_world_size_2_rank_0.pt
(WorkerDict pid=2948236) [rank-0]: Saving checkpoint to /mnt/h800fast/simon/research/Train/RL/volcengine/simonverl/simon-kkr1-lora-4p3bv1/global_step_10/actor/model_world_size_2_rank_0.pt
(WorkerDict pid=2948236) [rank-0]: Saving extra_state to /mnt/h800fast/simon/research/Train/RL/volcengine/simonverl/simon-kkr1-lora-4p3bv1/global_step_10/actor/extra_state_world_size_2_rank_0.pt
(TaskRunner pid=2931272) step:10 - global_seqlen/min:1981.000 - global_seqlen/max:4883.000 - global_seqlen/minmax_diff:2902.000 - global_seqlen/balanced_min:3417.000 - global_seqlen/balanced_max:3447.000 - global_seqlen/mean:3432.000 - actor/entropy:1.657 - actor/pg_loss:0.000 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.258 - perf/mfu/actor:0.034 - perf/max_memory_allocated_gb:12.799 - perf/max_memory_reserved_gb:13.301 - perf/cpu_memory_used_gb:49.778 - actor/lr:0.000 - val-core/simon-kkr1/reward/mean@1:-5.278 - val-aux/simon-kkr1/reward/std@1:0.000 - val-core/simon-kkr1/reward/best@1/mean:-5.278 - val-core/simon-kkr1/reward/best@1/std:0.000 - val-aux/simon-kkr1/reward/worst@1/mean:-5.278 - val-aux/simon-kkr1/reward/worst@1/std:0.000 - critic/score/mean:-3.658 - critic/score/max:-1.638 - critic/score/min:-5.734 - critic/rewards/mean:-3.658 - critic/rewards/max:-1.638 - critic/rewards/min:-5.734 - critic/advantages/mean:-0.174 - critic/advantages/max:0.707 - critic/advantages/min:-0.707 - critic/returns/mean:-0.174 - critic/returns/max:0.707 - critic/returns/min:-0.707 - response_length/mean:81.500 - response_length/max:150.000 - response_length/min:28.000 - response_length/clip_ratio:0.000 - prompt_length/mean:1634.500 - prompt_length/max:2319.000 - prompt_length/min:950.000 - prompt_length/clip_ratio:0.000 - timing_s/gen:3.607 - timing_s/old_log_prob:0.482 - timing_s/adv:0.015 - timing_s/update_actor:1.428 - timing_s/testing:5.142 - timing_s/save_checkpoint:2.504 - timing_s/step:13.183 - timing_per_token_ms/adv:0.002 - timing_per_token_ms/update_actor:0.208 - timing_per_token_ms/gen:11.065 - perf/total_num_tokens:6864.000 - perf/time_per_step:13.183 - perf/throughput:260.329
(TaskRunner pid=2931272)
(TaskRunner pid=2931272) ================================================================================
(TaskRunner pid=2931272) ============================ Processing New Sample =============================
(TaskRunner pid=2931272) [Warnning] Failed to locate model response header
(TaskRunner pid=2931272)
```

The LoRA adapter is saved together with the checkpoint, as shown in the screenshot below:
<img width="831" alt="image"
src="https://github.com/user-attachments/assets/5b8b2283-decc-499a-b08c-62dcaa961c9f"
/>


The reward@worst curve after a small amount of training:
<img width="511" alt="image"
src="https://github.com/user-attachments/assets/d3253782-50b8-4f42-b203-38a09685dc24"
/>

---------

Co-authored-by: Stephen Xie <stephenx@berkeley.edu>
Co-authored-by: Tony Lian <longlian@berkeley.edu>
Co-authored-by: Jiayi Pan <jiayipan@berkeley.edu>
Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-05-28 10:53:47 -07:00
18fa5c7e87 [Docker Image] hot fix moonlight tokenizer request (#1740) 2025-05-28 23:30:10 +08:00
75d2b361c2 Add support for PF-PPO (#1719)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add support for [PF-PPO](https://arxiv.org/abs/2409.06957) in verl.

### Specific Changes

- `verl/trainer/config/ppo_trainer.yaml`: Add config for PF-PPO.
- `verl/trainer/ppo/core_algos.py`: Add the `compute_pf_ppo_reweight_data`
function.
- `verl/trainer/ppo/ray_trainer.py`: Apply PF-PPO in `compute_advantage` when
`config.algorithm.use_pf_ppo` is `True`.
- `README.md`: Mention PF-PPO.

### Usage Example

```bash
set -x

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=gae \
    algorithm.use_pf_ppo=True \
    algorithm.pf_ppo.reweight_method=pow \
    algorithm.pf_ppo.weight_pow=2.0 \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.max_prompt_length=512 \
    data.max_response_length=512 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path=deepseek-ai/deepseek-llm-7b-chat \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.actor.use_kl_loss=False \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=32 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.rollout.n=5 \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=True \
    critic.model.path=deepseek-ai/deepseek-llm-7b-chat \
    critic.model.enable_gradient_checkpointing=True \
    critic.ppo_micro_batch_size_per_gpu=32 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_example_gsm8k' \
    trainer.experiment_name='deepseek_llm_7b_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=20 \
    trainer.test_freq=1 \
    trainer.total_epochs=15 $@
```
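
One plausible reading of `reweight_method=pow` with `weight_pow=2.0` is that each sample is weighted by its score raised to that power and the batch is then resampled according to those weights. The sketch below works under that assumption and is not the `compute_pf_ppo_reweight_data` implementation. The function names, the use of the absolute value, and the small `eps` are all hypothetical choices.

```python
import random


def pow_weights(scores, weight_pow=2.0, eps=1e-8):
    """Weight each sample by |score| ** weight_pow (eps keeps every weight positive)."""
    return [abs(s) ** weight_pow + eps for s in scores]


def resample_indices(scores, weight_pow=2.0, seed=0):
    """Draw len(scores) indices with replacement, proportionally to the weights."""
    rng = random.Random(seed)
    weights = pow_weights(scores, weight_pow)
    return rng.choices(range(len(scores)), weights=weights, k=len(scores))


scores = [0.0, 0.5, 1.0, 1.0]
weights = pow_weights(scores)
indices = resample_indices(scores)
```

With a power of 2, a sample scoring 1.0 is drawn about four times as often as one scoring 0.5, so higher-scoring samples dominate the resampled batch.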

### Test

Simple gsm8k test.

<img width="502" alt="image"
src="https://github.com/user-attachments/assets/4298ce20-a691-4edb-8e4a-ef68fb0fb6be"
/>

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
2025-05-28 21:23:48 +08:00
7c91b103f5 [misc] fix: reduce training iter in spin and sppo ci (#1738)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Reduce training iterations in spin and sppo ci to reduce ci time.

### Test

SPIN and SPPO CI

### Additional Info.

No

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-28 19:39:49 +08:00
cdad2e6504 trainer: do not repeat "multi_modal_inputs" in generate_sequences() (#1604)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

"multi_modal_inputs" is not used in the generate_sequences() stage, so
there is no need to pass this field.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-28 18:23:04 +08:00
be47ac44b2 [mcore] moonlight (small model with deepseekv3 arch) (#1284)
Achieves 74.3 on GSM8K, whereas Moonlight reports 77.4.

Closing this performance gap is still a work in progress.
2025-05-28 17:10:29 +08:00
8fe4950061 [BugFix] fix freeze_moe_router typo to enable the config option (#1732)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Fix the `freeze_moe_router` typo so that the config option takes effect,
as pointed out by @duomicoding in #1540 and by @vermouth1992.

**freeze** may describe this function better than **fix**.

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-28 17:05:57 +08:00
432f9e91f1 [feat][BREAKING] Megatron support dynamic batch size, to rebalance the workloads (#1617)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

1. Megatron now supports dynamic batch size to rebalance workloads.
2. Fix missing critic metrics.

### High-Level Design

Follows FSDP's dynamic batch size implementation.

### Specific Changes

Use the `rearrange_micro_batches` API, adapted to satisfy Megatron's VPP
constraints.

```py
vpp_size = mpu.get_virtual_pipeline_model_parallel_world_size()
if vpp_size is not None and vpp_size > 1:
    microbatch_group_size_per_vp_stage = self.tf_config.microbatch_group_size_per_vp_stage
    micro_batches, indices = rearrange_micro_batches(batch=mini_batch.batch, num_batches_devided_by=microbatch_group_size_per_vp_stage, max_token_len=max_token_len)
    assert len(micro_batches) % self.tf_config.microbatch_group_size_per_vp_stage == 0, f"micro_batches {micro_batches} must be divisible by microbatch_group_size_per_vp_stage {microbatch_group_size_per_vp_stage} for megatron backend"
else:
    micro_batches, indices = rearrange_micro_batches(batch=mini_batch.batch, max_token_len=max_token_len)
```

@vermouth1992 please check whether it makes sense.

Megatron's constraint when using the interleaved pipeline schedule:

```py
# If the final micro-batch group has fewer micro-batches than pipeline-parallel size,
# the pipeline will have dependency bubbles.
final_microbatch_group_size = num_microbatches % config.microbatch_group_size_per_vp_stage
if 0 < final_microbatch_group_size < pipeline_parallel_size:
    msg = 'The remainder of M (the total micro-batches) divided by N (number of '
    msg += 'contiguous micro-batches in a virtual pipeline stage) should be 0, '
    msg += 'or larger than or equal to the pipeline-parallel size, but it is '
    msg += f'{final_microbatch_group_size}. '
    msg += 'Otherwise, it introduces dependency bubbles in the pipeline '
    msg += 'and reduces throughput.'
    raise RuntimeError(msg)
```
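
The quoted check rejects micro-batch counts whose remainder is non-zero but smaller than the pipeline-parallel size. Rounding the count up to a multiple of the group size makes the remainder zero, which is why a divisibility argument is passed to `rearrange_micro_batches` above. The following is a small sketch of that arithmetic; both helper names are hypothetical.

```python
def violates_interleaving_constraint(num_microbatches, group_size, pipeline_parallel_size):
    """Mirror the quoted check: a non-zero remainder below the PP size is rejected."""
    remainder = num_microbatches % group_size
    return 0 < remainder < pipeline_parallel_size


def round_up_to_multiple(num_microbatches, group_size):
    """Smallest multiple of group_size that is >= num_microbatches."""
    return -(-num_microbatches // group_size) * group_size


group_size, pp_size = 4, 4
raw = 10  # e.g. a count chosen only from a token budget
padded = round_up_to_multiple(raw, group_size)
```

Here 10 micro-batches leave a remainder of 2, which the check would reject, while the padded count of 12 leaves none.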

### API

The input of Megatron's `forward_backward_batch` has changed, and its
output is now a dict containing the original `output` and the `indices`
needed by `compute_old_log_probs`.

### Usage Example

```bash
    actor_rollout_ref.actor.use_dynamic_bsz=${USE_DYNAMIC_BSZ} \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=${ppo_max_token_len_per_gpu} \
    critic.ppo_max_token_len_per_gpu=${forward_max_token_len_per_gpu} \
```

Other models can reuse this config directly.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-28 10:52:36 +08:00
99e749a1f7 Fix Configuration for Micro Batch Size in Megatron's Ref Policy (#1700)
### What does this PR do?

Fix the micro batch size configuration in Megatron's ref policy.

### High-Level Design
This pull request addresses an issue with the micro batch size
configuration in Megatron's ref policy. The default
`ppo_megatron_trainer.yaml` only includes two configurations:
`log_prob_micro_batch_size` and `log_prob_micro_batch_size_per_gpu`.

54c9b7364c/verl/trainer/config/ppo_megatron_trainer.yaml (L119-L120)

However, `megatron_workers.py` requires
`ref.log_prob_micro_batch_size_per_gpu`,

54c9b7364c/verl/workers/megatron_workers.py (L517-L518)

and `megatron_actor.py` requires `ref.ppo_micro_batch_size_per_gpu`,

54c9b7364c/verl/workers/actor/megatron_actor.py (L271-L274)

which are not directly related to `ppo_micro_batch_size`.

To resolve this, I have modified the configuration calculations and
added `raise ValueError` statements to ensure that the necessary
parameters are defined.

This update ensures that the required parameters are properly handled,
preventing runtime errors and improving the overall robustness of the
training process.

### Changes Made:

- Modified the configuration calculations in megatron_workers.py.

- Added raise ValueError statements to check for the presence of
log_prob_micro_batch_size_per_gpu and ppo_micro_batch_size_per_gpu.
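
The kind of validation described here can be sketched as follows. This is an illustrative helper, not the code added to `megatron_workers.py`: the function name is hypothetical, and deriving the per-GPU value by dividing by the data-parallel size is an assumption.

```python
def resolve_per_gpu_micro_batch_size(config, name, dp_size):
    """Hypothetical helper: return `<name>_per_gpu`, deriving it from `<name>` if needed."""
    per_gpu = config.get(f"{name}_per_gpu")
    if per_gpu is not None:
        return per_gpu
    total = config.get(name)
    if total is None:
        # Fail early with a clear message instead of a later runtime error.
        raise ValueError(f"Set either '{name}' or '{name}_per_gpu' in the ref config.")
    if total % dp_size != 0:
        raise ValueError(f"'{name}'={total} is not divisible by data-parallel size {dp_size}.")
    return total // dp_size


ref_config = {"log_prob_micro_batch_size": 32, "log_prob_micro_batch_size_per_gpu": None}
per_gpu = resolve_per_gpu_micro_batch_size(ref_config, "log_prob_micro_batch_size", dp_size=8)

try:
    resolve_per_gpu_micro_batch_size({}, "log_prob_micro_batch_size", dp_size=8)
    missing_error = None
except ValueError as exc:
    missing_error = str(exc)
```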
2025-05-28 10:51:46 +08:00
9b186eda34 Update README.md (#1731)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR updates the README.md for the SPIN recipe to improve accuracy
and completeness. Key changes include corrections and additions to the
method description, a related works section, and a more concise
introduction.

### High-Level Design

N/A - Focuses on documentation improvements for clarity and accuracy.

### Specific Changes

- Corrected and supplemented the description of the SPIN methodology.
- Added related works with concise introductions to
relevant papers/concepts.
- Refined and clarified the introductory sections of the README.

### API

N/A - Changes are limited to README.md documentation.

### Usage Example

N/A - This PR does not primarily focus on usage examples, but rather on
descriptive content.

```python
# No new standalone code snippets are part of this PR itself.
```
2025-05-28 10:39:31 +08:00
d5570c40ef [mics][fix] Deprecate legacy _default_compute_score API and fix ray utils test (#1729)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Handle comments after #1397 being merged:

1. Add back the `_default_compute_score` API and mark it as deprecated.
2. Fix a broken CI test, `ray_utils_test`, on `parallel_put`.
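
A common way to keep a renamed function available while steering callers to the new name is a thin alias that emits a `DeprecationWarning`. The sketch below shows that pattern and is not the code in this PR: the scorer body and its two-argument signature are stand-ins for illustration only.

```python
import warnings


def default_compute_score(solution, ground_truth):
    """Stand-in scorer for illustration: 1.0 on exact match, else 0.0."""
    return 1.0 if solution == ground_truth else 0.0


def _default_compute_score(*args, **kwargs):
    """Deprecated alias kept for backward compatibility."""
    warnings.warn(
        "_default_compute_score is deprecated; use default_compute_score instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    return default_compute_score(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = _default_compute_score("42", "42")
```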

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-05-28 09:37:03 +08:00
16a13d836e [misc] feat: support logging rollout prob vs. actor probs for debugging purpose (#1712)
### Checklist Before Starting

- [X] Search for similar PR(s).

### What does this PR do?

- Support logging rollout probs vs. actor probs for debugging purpose
- Support both vllm and sglang async

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-28 08:14:31 +08:00
34e409b683 [docs] refactor: Adding doc strings and doc pages for public methods in trainer and utils (#1397)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

* This PR adds docstrings for the public methods in the `trainer` and
`utils` modules, so that these methods can be reused and referenced
more easily.
* Two new doc pages, `PPO Trainer Interface` and `Utilities`, are also
provided under the API Reference section.
* Renamed the function `verl.utils._default_compute_score` to
`verl.utils.default_compute_score`, as it is an external function used
by other modules, i.e., trainer and recipe.

<img width="1093" alt="Screenshot 2025-05-26 at 9 20 31 PM"
src="https://github.com/user-attachments/assets/e361e6bd-a33b-426b-85b4-9fe93ab1e398"
/>


### TODO
This is the second of a series of PRs to improve and stabilize the docs
and API. Stacked on top of #1396
TODO includes adding more useful utility functions to the doc with
improved doc strings.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
Co-authored-by: H <linhaibin.eric@gmail.com>
2025-05-27 14:39:52 -07:00
4d3ca21288 [CI] disable e2e_prime, always hang for 50 minutes (#1728) 2025-05-27 22:39:27 +08:00
54b2677f72 Add dstack example (#2) (#1706)
Co-authored-by: Bihan  Rana <bihan@Bihans-MacBook-Pro.local>
Co-authored-by: peterschmidt85 <andrey.cheptsov@gmail.com>
2025-05-27 08:44:03 +08:00
9846360ee0 fix TimeoutError in aiohttp (#1702) 2025-05-27 08:09:04 +08:00
4583e4c27d [Doc] Add a visual explanation of the configuration to the documentation (#1709)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add a visual explanation of the configuration to the documentation

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-27 02:04:59 +08:00
5fe1839223 [CI] fix some tests scope (#1689)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Refactor and narrow the scope of some tests so that unrelated tests are not triggered.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-26 09:46:30 -07:00
8298f7d267 [Bugfix] Fix for non_fused_kernels passing arguments (#1687)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

An argument-passing error in the non-fused kernels path causes Qwen2_5_VL to fail.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
2025-05-26 22:09:49 +08:00
54c9b7364c update ascend_quick_start doc (#1685)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

update ascend_quick_start.rst

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

1. rename ascend_quick_start.rst
2. add the accuracy and throughput data of GRPO.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-26 15:53:07 +08:00
3d5f15fa9a [fix] use correct variable for saving hf model (#1681) 2025-05-25 18:49:43 +08:00
c60546d305 [misc] fix: fix device (#1671)
### Checklist Before Starting

- [X] Search for similar PR(s).

### What does this PR do?

Currently, the device to run on depends on whether `is_cuda_available`
is True on the driver process. However, the driver process may be a CPU
process that can't see cuda devices even when cuda devices are
available. Thus, it's not appropriate to use `is_cuda_available` to set
the device. Instead, we should set the device explicitly.

In the future, we may have a ray cluster with both NPU and GPU, and we
can use different devices for different workloads. Thus, setting device
explicitly would be a better choice in the long run.

Why CI does not catch this problem: on the CI machines we run
`python3 xxx` directly instead of using a standard Ray cluster with a
dedicated CPU-only head node, and all CI machines have GPUs.
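
The difference between the two approaches can be sketched as follows. This is a minimal illustration, not the code from this PR: `TrainerConfig`, its `device` field, `legacy_resolve_device`, and `resolve_device` are hypothetical names, and the legacy fallback value is an assumption.

```python
from dataclasses import dataclass

SUPPORTED_DEVICES = ("cuda", "npu", "cpu")


@dataclass
class TrainerConfig:
    # Hypothetical config field; the real option name in verl may differ.
    device: str = "cuda"


def legacy_resolve_device(driver_sees_cuda: bool) -> str:
    """Old behaviour (paraphrased): trust what the driver process can see."""
    return "cuda" if driver_sees_cuda else "cpu"


def resolve_device(config: TrainerConfig) -> str:
    """New behaviour (sketch): use the explicitly configured device."""
    if config.device not in SUPPORTED_DEVICES:
        raise ValueError(f"unsupported device: {config.device!r}")
    return config.device


# A CPU-only head node cannot see CUDA devices, even though workers have GPUs.
legacy_choice = legacy_resolve_device(driver_sees_cuda=False)
explicit_choice = resolve_device(TrainerConfig(device="cuda"))
```

With a CPU-only driver, the legacy helper picks the wrong device while the explicit setting is unaffected by what the driver can see.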

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-25 00:06:22 +08:00
45323080ea [misc] fix: fix megatron entropy (#1672)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

In megatron-core, `vocab_parallel_log_probs_from_logits` is an in-place
operator that modifies the logits in place to save memory. As a result,
`vocab_parallel_entropy` produces incorrect results if it is computed
after `vocab_parallel_log_probs_from_logits`. We swap the order so that
the entropy is computed first and the result is correct.
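
The ordering hazard can be reproduced with a toy example in plain Python. The functions below are hypothetical stand-ins, not the Megatron kernels: `log_prob_inplace` simply reuses its input list as scratch space and is assumed to leave probabilities in it, which is enough to show why the entropy must be computed first.

```python
import math


def entropy_from_logits(logits):
    """Entropy of softmax(logits); does not modify its input."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def log_prob_inplace(logits, target):
    """Toy in-place kernel: overwrites `logits` with probabilities."""
    m = max(logits)
    for i, x in enumerate(logits):
        logits[i] = math.exp(x - m)
    z = sum(logits)
    for i in range(len(logits)):
        logits[i] /= z
    return math.log(logits[target])


logits = [2.0, 0.5, -1.0]
expected = entropy_from_logits(list(logits))

# Safe order: entropy first, then the in-place log-prob.
buf = list(logits)
entropy_first = entropy_from_logits(buf)
log_prob_inplace(buf, target=0)

# Unsafe order: the buffer no longer holds logits when entropy is computed.
buf = list(logits)
log_prob_inplace(buf, target=0)
entropy_after = entropy_from_logits(buf)
```

Here `entropy_first` matches the reference value, while `entropy_after` is computed from an overwritten buffer and differs from it.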

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-25 00:04:23 +08:00
7d26d7359e modify the installation method of vllm on different architectures and hyperlink (#1673)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

modify the installation method of vllm on different architectures and
hyperlink

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

1. Modify the installation method of vllm on different architectures
2. Modify the syntax of the hyperlink

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-24 21:54:32 +08:00
cf731e84d9 [sglang] Fix megatron support in sglang and add sglang_async support & CI tasks (#1602)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

- Fix sglang megatron support
- Add sglang_async megatron support
- Add CI task to protect megatron-sglang impl

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.


https://wandb.ai/swordfaith/gsm8k_async_rl/runs/6h7apmbn?nw=nwuserswordfaith

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: SGLang

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: BlueSpace <gaoziyuan19@mails.ucas.ac.cn>
2025-05-24 18:37:41 +08:00
69582dc177 Add verl-agent and GiGPO to the awesome work list (#1660) 2025-05-24 18:33:47 +08:00
3c048ac750 modify the instructions for using verl on ASCEND NPU (#1670)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

modify the instructions for using verl on ASCEND NPU

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

1. Modify the table format
2. Modify the installation method of vllm and vllm-ascend

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-24 18:31:53 +08:00
5dc64391fe [CI] fix: DAPO CI & response_mask (#1666)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR fixes:

- DAPO CI triggering path patterns outdated since #1392
- `response_mask` computation missing in #1652, which went unnoticed because the CI test was skipped

### Tests

- [x] DAPO CI is correctly triggered and passed, e.g.,
https://github.com/volcengine/verl/actions/runs/15223958183/job/42823610223?pr=1666

### Additional Info.

- **Issue Number**: #1392 , #1652 
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-24 14:18:57 +08:00
4779f26164 [Refactor] fused kernel in forward (#1624)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Shifts fused_linear_for_ppo into model.forward for FSDP

### High-Level Design

Self-explanatory.

### Specific Changes

- Update monkey patch to return log_probs and entropy instead of
last_hidden_state.

### API

No changes

### Usage Example

```sh
actor_rollout_ref.model.use_fused_kernels=True
```

### Test


![image](https://github.com/user-attachments/assets/c6af68fb-0200-4aee-9596-0b445afdc562)


### Additional Info.

- This is to fix #1565 
- The original bug arises because we tried to access
model.lm_head.weight from outside of the FSDP wrapped context.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-24 13:50:57 +08:00
02862103ba [Megatron] Support optimizer offload for moe when ep > 1 (#1638)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This simple PR adds support for
[ChainedOptimizer](75b1ca1361/megatron/core/optimizer/optimizer.py (L938))
offloading in the Megatron-LM training environment.

In Megatron-LM, ChainedOptimizer is used when expert parallelism is
enabled (expert_parallel > 1, related to #1467), which is common in
Mixture-of-Experts (MoE) models.
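
As a rough sketch of the idea, offloading a chained optimizer amounts to visiting each wrapped optimizer in turn. The classes below are toy stand-ins rather than Megatron-LM types, and both the `chained_optimizers` attribute name and the `state_device` field are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ToyOptimizer:
    """Hypothetical stand-in for a single optimizer holding state on a device."""
    name: str
    state_device: str = "gpu"


@dataclass
class ToyChainedOptimizer:
    """Hypothetical stand-in for a wrapper that chains several optimizers."""
    chained_optimizers: List[ToyOptimizer] = field(default_factory=list)


def iter_optimizers(optimizer) -> Iterator[ToyOptimizer]:
    """Yield the optimizer itself, or each sub-optimizer if it is a chain."""
    subs = getattr(optimizer, "chained_optimizers", None)
    if subs is None:
        yield optimizer
    else:
        for sub in subs:
            yield from iter_optimizers(sub)


def offload_optimizer_state(optimizer) -> None:
    """Mark every underlying optimizer's state as living on the CPU."""
    for opt in iter_optimizers(optimizer):
        opt.state_device = "cpu"


# Works for a chain (e.g. dense + expert parameters) and for a plain optimizer.
chain = ToyChainedOptimizer([ToyOptimizer("dense"), ToyOptimizer("experts")])
offload_optimizer_state(chain)
single = ToyOptimizer("dense_only")
offload_optimizer_state(single)
```

Treating the plain optimizer as a chain of length one keeps a single code path for both cases.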

This has been tested and validated with the Qwen3-235B-A22B model
configuration.


### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```sh
...
actor_rollout_ref.actor.megatron.optimizer_offload=True \
actor_rollout_ref.actor.megatron.expert_model_parallel_size=16 \
...
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Megatron]
- **Inference**: [none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: charlie.cs <charlie.cs@kakaocorp.com>
Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
2025-05-24 12:42:10 +08:00
72255445f2 [SGLang Async Rollout] Validate prompt_len + max_resp_len <= max_mode… (#1627)
…l_len before generation

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR adds a validation step to prevent generation requests that
exceed the model’s maximum context length in SGLang. Without this check,
multi-turn RL training can fail when the combined length of the prompt
and the maximum response exceeds the model limit. The new validation
ensures `prompt_len + max_resp_len <= max_model_len` before sending
requests to the SGLang engine.
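
A minimal sketch of such a check is shown below. The function names are hypothetical and what the rollout does when the check fails is not described here, so the sketch only reports whether a request fits and how many response tokens remain. The numbers are the ones reported in this PR's error message.

```python
def fits_context(prompt_len: int, max_resp_len: int, max_model_len: int) -> bool:
    """Return True if prompt plus maximum response fits in the context window."""
    return prompt_len + max_resp_len <= max_model_len


def remaining_response_budget(prompt_len: int, max_model_len: int) -> int:
    """Number of tokens still available for the response (never negative)."""
    return max(0, max_model_len - prompt_len)


# Values from the failing request: 23769 prompt tokens, 10240 response tokens,
# and a model limit of 32768 tokens.
ok = fits_context(prompt_len=23769, max_resp_len=10240, max_model_len=32768)
budget = remaining_response_budget(prompt_len=23769, max_model_len=32768)
```

Running the check before calling the engine turns a late `ValueError` from the tokenizer manager into a condition the rollout can handle itself.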


### Test

Successfully tested on my multi-turn RL dataset with `max_turns==30`,
which kept failing with the following error before this change
(Qwen2.5-32B-instruct + GRPO):
```
Traceback (most recent call last):
  File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 64, in main
    run_ppo(config)
  File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 76, in run_ppo
    ray.get(runner.run.remote(config))
  File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 2822, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 930, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TaskRunner.run() (pid=1150536, ip=100.96.248.206, actor_id=85b22be1ed8ef671c739638a01000000, repr=<main_ppo.TaskRunner object at 0x796b0bba7010>)
  File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 183, in run
    trainer.fit()
  File "/home/jobuser/resources/verl/trainer/ppo/ray_trainer.py", line 872, in fit
    val_metrics = self._validate()
  File "/home/jobuser/resources/verl/trainer/ppo/ray_trainer.py", line 607, in _validate
    test_output_gen_batch_padded = self.actor_rollout_wg.generate_sequences(test_gen_batch_padded)
  File "/home/jobuser/resources/verl/single_controller/ray/base.py", line 49, in func
    output = ray.get(output)
ray.exceptions.RayTaskError(ValueError): ray::WorkerDict.actor_rollout_generate_sequences() (pid=1169888, ip=100.96.248.206, actor_id=6deb9fd4b4ff01530920ada301000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7e41e90afa90>)
  File "/home/jobuser/resources/verl/single_controller/ray/base.py", line 625, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "/home/jobuser/resources/verl/single_controller/base/decorator.py", line 534, in inner
    return func(*args, **kwargs)
  File "/home/jobuser/resources/verl/workers/fsdp_workers.py", line 630, in generate_sequences
    output = self.rollout.generate_sequences_with_tools(prompts=prompts)
  File "/home/jobuser/resources/verl/utils/debug/performance.py", line 78, in f
    return self.log(decorated_function, *args, **kwargs)
  File "/home/jobuser/resources/verl/utils/debug/performance.py", line 88, in log
    output = func(*args, **kwargs)
  File "/home/jobuser/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/async_sglang_rollout.py", line 613, in generate_sequences_with_tools
    output_req_list = loop.run_until_complete(
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/async_sglang_rollout.py", line 529, in _async_rollout_a_request
    output = await self._engine.async_generate(
  File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/entrypoints/engine.py", line 265, in async_generate
    return await generator.__anext__()
  File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 403, in generate_request
    tokenized_obj = await self._tokenize_one_request(obj)
  File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 450, in _tokenize_one_request
    self._validate_token_len(obj, input_ids)
  File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 482, in _validate_token_len
    raise ValueError(error_msg)
ValueError: Requested token count exceeds the model's maximum context length of 32768 tokens. You requested a total of 34009 tokens: 23769 tokens from the input messages and 10240 tokens for the completion. Please reduce the number of tokens in the input messages or the completion to fit within the limit.
```

### Additional Info.

- **Inference**: SGLang

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-24 08:45:36 +08:00
96c181a2e6 chore(ci): support FSDP2 for multi-turn SGLangRollout with tool calling (#1650) 2025-05-23 22:52:04 +08:00
0528ba1185 [NPU] feat: Support FSDP worker and vLLM Ascend (#332)
For developers, you can follow the docs: docs/ascend/ascend.rst

This PR adds support for the Ascend NPU backend.
Co-authored-by: Chendong98
[chendong136@huawei.com](mailto:chendong136@huawei.com)
Co-authored-by: zheliuyu <15750543867@163.com>
Co-authored-by: celestialli
[celestialli@outlook.com](mailto:celestialli@outlook.com)
In this PR, we add the capability to determine the NPU device type,
and we also add a new script for training on NPU.

The changes are:

1. pyproject.toml change version of vllm
2. requirements-npu.txt requirements for NPU
3. verl/bert_padding.py Adapted from
https://github.com/mlcommons/training_results_v1.1/blob/main/NVIDIA/benchmarks/bert/implementations/pytorch/padding.py
4. verl/single_controller/ray/base.py
5. verl/third_party/vllm/vllm_spmd/dtensor_weight_loaders.py
6. verl/trainer/fsdp_sft_trainer.py
7. verl/utils/flops_counter.py
8. verl/utils/fsdp_utils.py
9. verl/workers/actor/dp_actor.py
10. verl/workers/critic/dp_critic.py
11. verl/workers/fsdp_workers.py
12. verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py
13. verl/workers/sharding_manager/fsdp_vllm.py
14. verl/utils/device.py get device type for different device
15. docs/ascend/ascend.md 

Here is our roadmap:

**RoadMap**

- [x] sft
- [x] ppo
- [x] grpo

News

[2025.03.31] Add result of SFT and GRPO. Qwen2-7B-Instruct was tested on
2*8 devices, and many params related to batch_size need to be reduced.
So this result is only for reference. We will announce the reward
results of the default params as soon as sleep mode is supported.

[2025.03.03] Modify the adaptation method of Ray

[2025.02.25] The PPO algorithm is supported for training on NPU with the
FSDP backend.

[2025.02.23] The SFT algorithm is supported for training on NPU with the
FSDP backend.

[2025.02.21] The GRPO algorithm is supported for training on NPU with
the FSDP backend.

Requirements
We tested this PR on both Ascend NPU and GPU to ensure that the same
code runs on different devices. The test hardware is 8 Atlas 800T A2
devices and 8 A100 GPUs. Other software versions are shown in the
following table.

| Software     | Version                |
|:-------------|-----------------------:|
| transformers | 4.47.1                 |
| accelerate   | 1.3.0                  |
| torch_npu    | 2.5.1.rc1              |
| CANN         | 8.1.RC1 (Not Released) |

About mean error
Due to differences in hardware architecture, we cannot guarantee that
the loss on Ascend NPU is exactly the same as on GPU. In our
experience, a loss difference of less than 2% is acceptable. If the
loss difference is greater than 2%, we will try to fix it. The
calculation formula is as follows.

![loss_comparison](https://github.com/user-attachments/assets/4f62f713-9240-4324-bf7d-3ae59fc85b05)


N represents the number of training steps. For more information, please
refer to [Calculation accuracy
description](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/LMaccuracy_0001.html)

---------

Co-authored-by: Chendong98 <chendong136@huawei.com>
Co-authored-by: zheliuyu <15750543867@163.com>
2025-05-23 21:28:57 +08:00
a7b2e29cb6 fix: entropy in DAPO (#1652)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR adds entropy computation and logging to DAPO trainer, aligning
with other trainers.

### Additional Info.

- **Issue Number**: #1455
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-23 20:15:55 +08:00
c4faf5c94a [CI] feat: add ignore for CI of SPIN & SPPO (#1653)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR adds ignore patterns to CI for SPIN & SPPO.

### Additional Info.

- **Issue Number**: none
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-23 20:15:32 +08:00
cdee00d628 fix: only load reference policy when needed in DAPO (#1651)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR fixes incorrect initialization so that verl loads the reference
policy only when needed.

### Additional Info.

- **Issue Number**: none
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-23 19:32:19 +08:00
9ddc72520e fix: add loss_agg_mode to critics (#1340)
# What does this PR do?

This PR adds `loss_agg_mode` to critics.

# Before submitting

- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentation with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [x] Did you write any test cases if necessary? Please add CI tests to
your new feature.

# Additional Info

- **Issue Number**: none
- **Training**: both
- **Inference**: none
2025-05-23 16:09:21 +08:00
aaaaaab900 Activation Offloading (#1220)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR adds activation offloading; currently only the FSDP backend is
supported.

### High-Level Design

Our implementation is based on the
[one](https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/cpu_offload.py)
in TransformerEngine. For efficiency, it groups activations by
TransformerLayer and offloads activation groups asynchronously. This
means that the offloading of the i-th activation group and the
computation of the i+1-th activation group happen at the same time, and
there are at most two activation groups in GPU memory.
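
The overlap described above can be illustrated with a toy scheduling simulation. This is only a sketch: `simulate_offload` and its bookkeeping are hypothetical names, no real GPU transfer takes place, and it does not reproduce the TransformerEngine or verl code.

```python
def simulate_offload(num_groups: int) -> int:
    """Return the peak number of activation groups resident in (simulated) GPU memory."""
    resident = set()   # groups currently held in GPU memory
    in_flight = None   # group whose asynchronous offload is still running
    peak = 0
    for i in range(num_groups):
        # Computing group i materializes its activations on the GPU.
        resident.add(i)
        peak = max(peak, len(resident))
        # The offload of group i-1 ran concurrently with this computation
        # and is assumed to have finished by now, freeing its memory.
        if in_flight is not None:
            resident.discard(in_flight)
        # Start offloading group i; it overlaps with the computation of i+1.
        in_flight = i
    return peak


print(simulate_offload(8))  # 2
```

Under these assumptions the peak never exceeds two groups, regardless of how many layers the model has.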

### Specific Changes

1. Add activation offloading support.

### Usage Example

```bash
export VLLM_ATTENTION_BACKEND=XFORMERS

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=./data/gsm8k/train.parquet \
    data.val_files=./data/gsm8k/test.parquet \
    data.train_batch_size=512 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path=./huggingface.co/Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.model.enable_activation_offload=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','tensorboard'] \
    trainer.project_name='verl_grpo_example_gsm8k' \
    trainer.experiment_name='qwen2_7b_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.val_before_train=False \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=15

```


### Test

We conducted experiments on the Qwen2 7B model based on the above script. The memory and throughput data are shown in the figures below, where the blue line represents activation offloading.
<img width="351" alt="image" src="https://github.com/user-attachments/assets/207576a1-3f47-4b40-bf19-60cf8105d609" /> <img width="361" alt="image" src="https://github.com/user-attachments/assets/d58f0f8b-eb5f-4e19-a892-4d778ff26135" />

### Additional Info.

- **Issue Number**: none
- **Training**: This PR will affect FSDP backend
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-23 15:55:02 +08:00
54a5e6ee6d [megatron] feat: save hf model config in megatron checkpoint manager (#1562)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR enables the Megatron backend checkpoint manager to save the hf
model config into verl checkpoints and simplifies our CI, since
`--hf_model_path` was deprecated in
https://github.com/volcengine/verl/pull/1468. It also addresses the comment
https://github.com/volcengine/verl/pull/1468#issuecomment-2883541227.

Note: several changed lines in `verl/utils/megatron_utils.py` are
unrelated to this PR; they were automatically reformatted by pre-commit
hooks.

### Test

The current CI e2e tests should sufficiently cover this PR.

### Additional Info.

- **Training**: Megatron
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-23 14:50:48 +08:00
2c179dae23 Add explicit position_ids to model.generate in hf rollout (#1637)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Added the `position_ids` parameter to the `model.generate` call to
provide explicit control over token positions during text generation. The
position ids were already computed above but never passed to `generate`,
and I could not see why, so I changed this.😂

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-23 09:43:49 +08:00
01abd3c77c Mert/passatk advantage (#1621) 2025-05-23 07:15:27 +08:00
6dfa11adb1 [docs] recipe: fix spin README (#1647)
2025-05-22 13:54:07 -07:00
04acd09d65 [megatron] optimization: avoid padding to logits (#1629)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Avoid a large memory overhead (`bs * seq_len * vocab_size` logit elements)
when training with Megatron.

For example, with bs=4, seq_len=4k, and vocab_size=150k, the memory overhead
is about 4.8 GB.
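
The figure above can be reproduced with a short calculation. This sketch assumes 2 bytes per logit (bf16 or fp16) and reads "4k" as 4000 tokens; neither assumption is stated in the PR, and `logits_bytes` is a hypothetical helper.

```python
def logits_bytes(batch_size: int, seq_len: int, vocab_size: int,
                 bytes_per_element: int = 2) -> int:
    """Size in bytes of a dense [batch_size, seq_len, vocab_size] logits tensor."""
    return batch_size * seq_len * vocab_size * bytes_per_element


# Assumed values: seq_len=4000 and 2-byte elements (bf16/fp16).
size = logits_bytes(batch_size=4, seq_len=4000, vocab_size=150_000)
print(f"{size / 1e9:.1f} GB")  # 4.8 GB
```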

### Specific Changes

- Calculate log_p and entropy directly on the sequence-packed logits,
which avoids unpacking the logits.
- Add a `logit_processor` callback to `forward_function`; `megatron_actor`
passes in the `logit_processor` that calculates log_p and entropy.

### Test

recipe: qwen2-7B PPO gsm8k 
machine: 8*H100
Before changes:
    TP2: OOM
    TP4: OOM
    TP2PP2: about 56s/step, actor MFU of first step is 0.133
After changes:
    TP2: about 40s/step, actor MFU of first step is 0.165


### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-22 23:28:41 +08:00
867d3024bf [Recipe] SPIN: spin algorithm implementation (#1407)
This PR introduces an implementation of the Self-Play Fine-Tuning (SPIN)
algorithm, adapting the existing PPO framework within verl.

You can find more information about SPIN here:
https://github.com/uclaml/SPIN


This implementation adapts the PPO framework for SPIN/Online DPO,
involving these core changes:

* **Objective & Loss:**
    * PPO maximizes cumulative reward via policy gradients and value
      estimates.
    * SPIN/Online DPO directly optimizes preference likelihood using a
      **DPO-specific loss function** (e.g., sigmoid loss).
    * **Code Change:** The primary logic change is in the actor's update
      step (`dp_actor.py: SPINDataParallelPPOActor.update_policy_dpo_with_ref`,
      `fsdp_workers.py: SPINRolloutRefWorker.update_actor_dpo`) and the loss
      calculation (`core_algos.py: compute_online_dpo_loss`).

* **Model Requirements:**
    * PPO uses Actor, Critic, and optionally Reward/Reference models.
    * SPIN/Online DPO uses Actor and a **mandatory Reference Model** for the
      loss calculation, plus a reward source for *preference labeling*. **No
      Critic is needed.**
    * **Code Change:** The `ray_trainer.py` logic was modified to
      manage/update the reference model and *not* initialize/use the critic
      worker. `fsdp_workers.py` was updated for reference model initialization
      and checkpointing.

* **Update Signal:**
    * PPO relies on **Advantage estimates** derived from rewards and the
      critic's value function.
    * SPIN/Online DPO uses the **log probability difference** between
      chosen/rejected pairs under the policy and reference models.
    * **Code Change:** Advantage calculation (`compute_advantage`) was
      removed from the training loop in `ray_trainer.py`. Preference
      determination (`compute_onlineDPO_pref` in `core_algos.py`) was added.

* **Data:**
    * PPO uses (prompt, response, reward, value) tuples.
    * SPIN/Online DPO effectively uses (prompt, chosen_response,
      rejected_response) tuples, requiring preference data generation.
    * **Code Change:** Data processing in `ray_trainer.py` (`fit_dpo`) was
      adapted to handle preference pairs and prepare the specific inputs
      needed for `update_policy_dpo`.
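
For reference, the sigmoid form of the DPO loss mentioned above can be sketched for a single preference pair as follows. This is the standard textbook formulation, not the code of `compute_online_dpo_loss`; the function name `dpo_sigmoid_loss`, the scalar inputs, and the default `beta` are illustrative assumptions.

```python
import math


def dpo_sigmoid_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                     ref_chosen_logp: float, ref_rejected_logp: float,
                     beta: float = 0.1) -> float:
    """Standard DPO sigmoid loss for one (chosen, rejected) pair of sequence log-probs."""
    # Log-probability difference between chosen and rejected responses,
    # under the policy and under the reference model.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# When the policy equals the reference, the loss is log(2).
print(dpo_sigmoid_loss(-1.0, -2.0, -1.0, -2.0))
```

The loss falls below log(2) once the policy separates chosen from rejected responses more strongly than the reference does, which is why no critic or advantage estimate is involved.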

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
2025-05-22 07:08:59 -07:00
c803b1f769 [BugFix] Fix sglang and vllm engine args (#1634)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

#1616 causes vLLM engine argument initialization to fail; it is unclear why
the CI of that PR failed to detect this. Some of the resulting errors:


![image](https://github.com/user-attachments/assets/ac6bb86e-1576-458e-b341-0e949724ac12)

It would be better to separate engine args for different inference systems.

### API

```yml
    engine_kwargs: # inference engine parameters
      vllm:
        swap_space: null # null means "use the engine default value" (usually 4 GB), setting it to, e.g., 32 means 32 GB
      sglang:
        attention_backend: null # null means use the engine default value, available options: flashinfer, triton, flashmla
```
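
One plausible way to honor the "null means use the engine default value" convention is to drop unset entries before forwarding the remaining keys to the engine. The sketch below is an assumption about how such a config could be consumed, not the code in this PR; `resolve_engine_kwargs` is a hypothetical helper.

```python
def resolve_engine_kwargs(engine_kwargs: dict, engine_name: str) -> dict:
    """Pick the sub-config for one engine and drop entries left as None (YAML null)."""
    selected = engine_kwargs.get(engine_name) or {}
    # None means "use the engine default", so the key is simply not passed on.
    return {key: value for key, value in selected.items() if value is not None}


config = {
    "vllm": {"swap_space": 32},
    "sglang": {"attention_backend": None},
}
print(resolve_engine_kwargs(config, "vllm"))    # {'swap_space': 32}
print(resolve_engine_kwargs(config, "sglang"))  # {}
```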

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-22 20:41:17 +08:00
6cbb051753 [PRIME]bug fix: reward scoring hangs after progress reaches 100% (#1466)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Fixes a hang issue during reward scoring where the progress bar
reaches 100% but the program does not continue. Adds robust support for
asynchronous reward computation with subprocess cleanup.

### High-Level Design

> This PR refactors the reward scoring pipeline (`PRIMERewardManager`)
and adds forced cleanup of lingering subprocesses using psutil to avoid
deadlocks.


### Specific Changes

- Replaces `asyncio.run()` inside `verify()` with an external event loop
created via `new_event_loop()` for better compatibility with Ray and
training frameworks.
- Replaces `proc.kill()` with `psutil.terminate()` and `.wait()`, and
moves the cleanup from the exception handler to `finally`, so that worker
subprocesses are cleaned up safely and zombie processes are avoided.
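
The two changes above follow a common pattern, sketched here with the standard library only. The names `score_async` and `verify`, and the sleeping stand-in worker, are hypothetical; `subprocess.Popen.terminate()` and `.wait()` are used in place of the psutil calls named in the PR.

```python
import asyncio
import subprocess
import sys


async def score_async(value: int) -> int:
    """Stand-in for the asynchronous reward computation."""
    await asyncio.sleep(0)
    return value * 2


def verify(value: int) -> int:
    # Stand-in worker subprocess that would otherwise linger after scoring.
    worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    # A dedicated event loop instead of asyncio.run().
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(score_async(value))
    finally:
        # Cleanup runs on success and on failure alike.
        loop.close()
        worker.terminate()
        worker.wait(timeout=10)  # reap the process so it does not become a zombie


print(verify(21))  # 42
```

Placing the cleanup in `finally` means the worker is terminated and reaped even when scoring succeeds, which is the case in which the original hang was reported.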

### Additional Info.

- **Issue Number**: #288 (maybe)
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-22 16:43:08 +08:00
1cfa2be530 [Megatron][BREAKING] Allow override of transformer config to enable custom megatron features like variable PP layers distribution, with CI tests (#1555)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Allow overriding the transformer config to enable custom Megatron
features such as variable PP layer distribution, with CI tests. This is
needed for larger MoE models with 94 layers (Qwen3 MoE) or 61 layers
(DeepSeek V3).

We will first fix the e2e_prime CI by using fused kernels.

**Note that the imbalanced PP layer distribution is currently compatible
only with dist_ckpt load and save; direct Hugging Face load/save is not
supported.**

Also, other megatron arguments can be passed through scripts.

### API

Breaking APIs:

```py
class MegatronWorker(Worker):
    def _init_hf_config_and_tf_config(self, model_path, dtype, override_model_config, override_transformer_config):

# and the models building
```

```yaml
  actor:
    megatron:
      override_transformer_config: {} # common transformer config for all models
```

To avoid having to enter the same transformer config arguments repeatedly,
the other models reuse the actor's config, so the arguments only need to be
specified once.

### Usage Example

```bash
run_ppo_trainer_megatron.sh \
+actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_first_pipeline_stage=13 \
+actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_last_pipeline_stage=11
```
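
To see what such an override implies, the sketch below computes a per-stage layer count. It assumes that the layers not assigned to the first and last stages are split evenly across the middle stages, and it uses a pipeline size of 9 with a 94-layer model purely for illustration; neither the pipeline size nor the helper `layers_per_stage` comes from the PR.

```python
def layers_per_stage(num_layers: int, pp_size: int,
                     first_stage_layers: int, last_stage_layers: int) -> list[int]:
    """Layer count per pipeline stage, assuming an even split over the middle stages."""
    middle_stages = pp_size - 2
    remaining = num_layers - first_stage_layers - last_stage_layers
    if middle_stages <= 0 or remaining <= 0 or remaining % middle_stages != 0:
        raise ValueError("layers cannot be split evenly across the middle stages")
    return [first_stage_layers] + [remaining // middle_stages] * middle_stages + [last_stage_layers]


# Hypothetical setup: 94 layers, 9 pipeline stages, 13 layers first, 11 layers last.
print(layers_per_stage(94, 9, 13, 11))  # [13, 10, 10, 10, 10, 10, 10, 10, 11]
```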

### Additional Info.

- **Training**: Megatron

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-22 13:38:34 +08:00
be215d7b08 [docker] aws efa driver dockerfile (#1631)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add a sample Dockerfile that supports the AWS EFA driver. Without it, NCCL
raises a system error on such AWS instances (e.g., a SageMaker AI pod).
2025-05-21 21:53:21 -07:00
c07013ea39 [vllm] feat: add engine_kwargs in vllm_rollout_spmd to set params in vllm engine (#1442)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> add `engine_kwargs` in vllm_rollout_spmd to set swap_space in vllm
engine.

### Specific Changes

> Add `engine_kwargs` to vllm_rollout_spmd, which can be set in the config
file. The same change has already been made in vllm_rollout. Since vLLM
was updated to 0.8, the default vllm_rollout worker has become
vllm_rollout_spmd, which lacks the `engine_kwargs` available in
vllm_rollout, so this PR adds it.

### Usage Example

> Users can set vLLM engine parameters such as `swap_space` and `seed`
through `engine_kwargs` in the config file. For example, to set
swap_space=32 in vLLM, set the item in the config like this:
```bash
actor_rollout_ref.rollout.engine_kwargs.swap_space=32
```
2025-05-21 20:23:09 -07:00
821689e0e9 docs: Add RM-R1 to awesome work list (#1608)
### What does this PR do?

> Adding RM-R1 to the README as a list of work that used veRL

### High-Level Design

> 1-line update

### Specific Changes

> only changed the readme.md (1-line update)

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-21 16:09:03 -07:00
72683a7ced Expose engine_kwargs from SGLang to Verl configuration (#1616)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Expose `engine_kwargs` from SGLang to Verl configuration

This PR enables RL users to configure `engine_kwargs` directly through
Verl, providing more control and flexibility over inference behavior.



### High-Level Design

One key motivation is the choice of attention backend, which can
significantly affect rollout performance. The SGLang team has observed
that different attention backends perform better in different phases:

- FA3 tends to be more efficient during the prefill stage.
- FlashInfer or Triton generally offer better performance during decode.

Moreover, the optimal backend may change across versions of SGLang. By
exposing these parameters, we allow users to tune their setup based on
the specific use case and version, ultimately improving performance and
adaptability.

### Test

In my setup with Qwen 2.5 7B Instruct on H200:
```
timing_s/step:106.761 (flash infer)
timing_s/step:100.520 (fa3)
timing_s/step:100.364 (triton)
```

Hence, I suggest that our team use fa3 or triton for now.

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-21 08:49:05 -07:00
d475654a2b [minor] fix: use init_empty_weights instead of torch.device("meta") (#1587) 2025-05-21 23:35:40 +08:00
cbc02ebc37 [fix] img not displaying on single controller doc (#1622)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix the missing image in the single controller doc.

### High-Level Design

NA

### Specific Changes

- add `?raw=true` to img link in single_controller doc
- move the single_controller doc along with hybridflow programming guide
in index

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-21 20:23:40 +08:00
2dc3e0ebad [recipe] feat: support running dapo using main_ppo (#1612)
### Checklist Before Starting

- [X] Search for similar PR(s).

### What does this PR do?

- Add two scripts that run dapo using main_ppo with FSDP and megatron
backend
- Fix val reward manager init issue
- Fix missing keys in ppo_trainer_megatron.yaml
- Fix megatron optimizer offload when the optimizer state has not been
initialized.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-21 16:42:29 +08:00
8970cb05f3 Simplify FSDP SGLang pre and post process (#1609)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

As suggested by @BearBiscuit05, we can follow fsdp_vllm to simplify the
device mesh and tp rank logic.

### Test
```
Training Progress:   7%|▋         | 15/210 [25:27<5:27:08, 100.66s/it]
(WorkerDict pid=1408128) update_weights_from_tensor time: 0.8615233898162842 seconds
(WorkerDict pid=1408128) self.sampling_params={'n': 5, 'max_new_tokens': 1024, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'repetition_penalty': 1.0, 'temperature': 1.0, 'top_k': -1, 'top_p': 1, 'ignore_eos': False}
(TaskRunner pid=1406018) list(reward_extra_infos_dict.keys())=[]
(WorkerDict pid=1424002) update_weights_from_tensor time: 1.0217373371124268 seconds [repeated 7x across cluster]
(WorkerDict pid=1424002) self.sampling_params={'n': 5, 'max_new_tokens': 1024, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'repetition_penalty': 1.0, 'temperature': 1.0, 'top_k': -1, 'top_p': 1, 'ignore_eos': False} [repeated 7x across cluster]
(TaskRunner pid=1406018) step:16 - global_seqlen/min:215129.000 - global_seqlen/max:226140.000 - global_seqlen/minmax_diff:11011.000 - global_seqlen/balanced_min:220875.000 - global_seqlen/balanced_max:220876.000 - global_seqlen/mean:220875.250 - actor/entropy_loss:0.116 - actor/kl_loss:0.008 - actor/kl_coef:0.001 - actor/pg_loss:-0.013 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.088 - perf/mfu/actor:1.919 - perf/max_memory_allocated_gb:26.795 - perf/max_memory_reserved_gb:52.268 - perf/cpu_memory_used_gb:77.170 - actor/lr:0.000 - training/global_step:16.000 - training/epoch:2.000 - critic/score/mean:0.957 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.957 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.003 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.003 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:241.269 - response_length/max:725.000 - response_length/min:55.000 - response_length/clip_ratio:0.000 - prompt_length/mean:103.849 - prompt_length/max:232.000 - prompt_length/min:66.000 - prompt_length/clip_ratio:0.000 - timing_s/gen:43.595 - timing_s/reward:0.898 - timing_s/old_log_prob:8.819 - timing_s/ref:8.876 - timing_s/adv:0.125 - timing_s/update_actor:36.083 - timing_s/step:98.803 - timing_per_token_ms/gen:0.035 - timing_per_token_ms/update_actor:0.020 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.005 - perf/total_num_tokens:1767002.000 - perf/time_per_step:98.803 - perf/throughput:2235.507
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-21 12:45:31 +08:00
80af51b609 [constant scheduler] fix: model won't be updated on first training step (#1463) 2025-05-20 20:57:15 -07:00
add17f029e [megatron] support megatron expert parallel (#1467)
### What does this PR do?

support expert parallel in megatron


### High-Level Design

Introduce EPsize and ETPsize. ETPsize is the TPsize for the MoE parts;
setting it to 1 is recommended, meaning that the MoE parts do not use TP.


### Specific Changes

1. mcore model initialization
2. megatron vllm parameter transfer

### Usage Example


```bash
LLM=models/Qwen1.5-MoE-A2.7B-Chat
NODES=1
PP=2
TP=4
VLLM_TP=4
EP=4
ETP=1

python3 -m verl.trainer.main_ppo --config-path=./config --config-name='ppo_megatron_trainer'\
    algorithm.adv_estimator=gae \
    data.train_files="$train_files" \
    data.val_files="$test_files" \
    data.train_batch_size=128 \
    data.max_prompt_length=1024 \
    data.max_response_length=512 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path=$LLM \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.ppo_mini_batch_size=32 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.actor.use_kl_loss=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \
    critic.optim.lr=1e-5 \
    critic.model.path=$LLM \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size_per_gpu=1 \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_megatron_gsm8k_examples' \
    trainer.experiment_name='qwen_moe_instruct_1node_ep' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=$NODES \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=$VLLM_TP \
    actor_rollout_ref.actor.megatron.pipeline_model_parallel_size=$PP \
    actor_rollout_ref.ref.megatron.pipeline_model_parallel_size=$PP \
    critic.megatron.pipeline_model_parallel_size=$PP \
    actor_rollout_ref.actor.megatron.tensor_model_parallel_size=$TP \
    actor_rollout_ref.ref.megatron.tensor_model_parallel_size=$TP \
    critic.megatron.tensor_model_parallel_size=$TP \
    actor_rollout_ref.actor.megatron.expert_model_parallel_size=$EP \
    actor_rollout_ref.ref.megatron.expert_model_parallel_size=$EP \
    critic.megatron.expert_model_parallel_size=$EP \
    actor_rollout_ref.actor.megatron.expert_tensor_parallel_size=$ETP \
    actor_rollout_ref.ref.megatron.expert_tensor_parallel_size=$ETP \
    critic.megatron.expert_tensor_parallel_size=$ETP \
    actor_rollout_ref.actor.megatron.use_dist_checkpointing=True \
    actor_rollout_ref.ref.megatron.use_dist_checkpointing=True \
    critic.megatron.use_dist_checkpointing=True \
    actor_rollout_ref.actor.megatron.dist_checkpointing_path=$DIST_CKPT_PATH \
    actor_rollout_ref.ref.megatron.dist_checkpointing_path=$DIST_CKPT_PATH \
    critic.megatron.dist_checkpointing_path=$DIST_CKPT_PATH \
    actor_rollout_ref.actor.megatron.param_offload=True \
    actor_rollout_ref.ref.megatron.param_offload=True \
    critic.megatron.param_offload=True \
    trainer.total_epochs=100 $@
```

### Test


### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>
2025-05-21 11:05:11 +08:00
7b0426a738 [Docker Image] update images and fix sglang installation (#1606)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

update images and fix sglang installation, the latest image:
`whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6-mcore0.12.0-te2.3`

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- vLLM: 0.8.5.post1
- SGLang: 0.4.6.post4, fix installation
- Megatron: core_v0.12.0 announcement
- TransformerEngine: 2.3

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-21 09:13:51 +08:00
d13507229a docs: update adoption and doc index (#1607)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

update adoption and doc index

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-21 09:12:25 +08:00
0d3360a12d Remove unnecessary broadcast (#1597)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?
Remove unnecessary broadcast

> Add one-line overview of what this PR aims to achieve or accomplish. 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

Removed one line
> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test
```bash



(WorkerDict pid=1099764) Before broadcast: TensorDict(
(WorkerDict pid=1099764)     fields={
(WorkerDict pid=1099764)         attention_mask: Tensor(shape=torch.Size([330, 2048]), device=cuda:0, dtype=torch.int64, is_shared=True),
(WorkerDict pid=1099764)         input_ids: Tensor(shape=torch.Size([330, 2048]), device=cuda:0, dtype=torch.int64, is_shared=True),
(WorkerDict pid=1099764)         position_ids: Tensor(shape=torch.Size([330, 2048]), device=cuda:0, dtype=torch.int64, is_shared=True),
(WorkerDict pid=1099764)         prompts: Tensor(shape=torch.Size([330, 1024]), device=cuda:0, dtype=torch.int64, is_shared=True),
(WorkerDict pid=1099764)         responses: Tensor(shape=torch.Size([330, 1024]), device=cuda:0, dtype=torch.int64, is_shared=True)},
(WorkerDict pid=1099764)     batch_size=torch.Size([330]),
(WorkerDict pid=1099764)     device=None,
(WorkerDict pid=1099764)     is_shared=False)



(WorkerDict pid=1099764) After broadcast: TensorDict(
(WorkerDict pid=1099764)     fields={
(WorkerDict pid=1099764)         attention_mask: Tensor(shape=torch.Size([330, 2048]), device=cuda:0, dtype=torch.int64, is_shared=True),
(WorkerDict pid=1099764)         input_ids: Tensor(shape=torch.Size([330, 2048]), device=cuda:0, dtype=torch.int64, is_shared=True),
(WorkerDict pid=1099764)         position_ids: Tensor(shape=torch.Size([330, 2048]), device=cuda:0, dtype=torch.int64, is_shared=True),
(WorkerDict pid=1099764)         prompts: Tensor(shape=torch.Size([330, 1024]), device=cuda:0, dtype=torch.int64, is_shared=True),
(WorkerDict pid=1099764)         responses: Tensor(shape=torch.Size([330, 1024]), device=cuda:0, dtype=torch.int64, is_shared=True)},
(WorkerDict pid=1099764)     batch_size=torch.Size([330]),
(WorkerDict pid=1099764)     device=None,
(WorkerDict pid=1099764)     is_shared=False)
```


> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-20 10:15:28 -07:00
f41a57a827 [misc] docs: add design doc of single_controller (#1549)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add the design doc for `verl.single_controller`.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- `docs/single_controller.rst`
- `docs/imgs/call_generate_sequences.png`
- `docs/imgs/worker_group_init.png`

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-20 09:24:00 -07:00
1d4e23a562 [BugFix] Megatron: fix missing grad_norm and lr calculation, and fix fsdp grad_norm storage (#1601)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

fix Megatron missing grad_norm and lr calculation, and fix fsdp
grad_norm storage

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

Tested Qwen2-7b with FSDP. The divergence shown below comes from the
different configurations.

<img width="387" alt="image"
src="https://github.com/user-attachments/assets/183c62d0-a86a-4f4b-8168-d98c98961f7b"
/>

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-20 22:04:12 +08:00
15b1b15f99 Support multi-turn rollout with Qwen chat template (#1593)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Support multi-turn rollout with Qwen chat template

### Specific Changes

Currently, Verl's multi-turn rollout only supports ChatML-style
messages. However, Qwen uses a different chat formatting template with
the following key differences:

1. Qwen uses the `user` role tag to wrap tool responses.
2. Qwen merges consecutive tool responses into a single message.

**For example**, for parallel tool calls, ChatML renders consecutive
tool responses like this:
```
<|im_start|>tool
tool response content 1<|im_end|>
<|im_start|>tool
tool response content 2<|im_end|>
```
In contrast, the Qwen chat template renders them as:
```
<|im_start|>user
<tool_response>
tool response content 1
</tool_response>
<tool_response>
tool response content 2
</tool_response><|im_end|>
```

This PR introduces a new `qwen` format option in the config to support
this tool message style.
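
The difference between the two renderings can be sketched as a small function. This is an illustration of the two output formats shown above, not the code added by this PR; the function name `render_tool_responses` and its `fmt` argument are hypothetical.

```python
from typing import Dict, List


def render_tool_responses(messages: List[Dict[str, str]], fmt: str = "chatml") -> str:
    """Render consecutive tool messages in ChatML or Qwen style."""
    contents = [m["content"] for m in messages]
    if fmt == "chatml":
        # One `tool` turn per tool response.
        return "\n".join(f"<|im_start|>tool\n{c}<|im_end|>" for c in contents)
    if fmt == "qwen":
        # All responses are merged into a single `user` turn,
        # each wrapped in <tool_response> tags.
        body = "\n".join(f"<tool_response>\n{c}\n</tool_response>" for c in contents)
        return f"<|im_start|>user\n{body}<|im_end|>"
    raise ValueError(f"unknown format: {fmt}")


tool_messages = [
    {"role": "tool", "content": "tool response content 1"},
    {"role": "tool", "content": "tool response content 2"},
]
print(render_tool_responses(tool_messages, fmt="chatml"))
print(render_tool_responses(tool_messages, fmt="qwen"))
```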

### Usage Example

Set the multi-turn format to `qwen`:
```
multi_turn:
    enable: True
    max_turns: 5
    format: qwen
```

### Test

Verified the rendered messages via print output to ensure:
1. ChatML format remains unchanged.
2. Qwen format aligns with the Qwen chat template as defined in its
HuggingFace tokenizer config.

Tested across the following scenarios:
1. Assistant message without tool calls.
2. Assistant message with one tool call + one tool response message.
3. Assistant message with parallel tool calls + multiple consecutive
tool response messages.

### Additional Info.
- **Inference**: SGLang

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-20 16:56:57 +08:00
a3c4cb386c Disable fused kernels in prime (#1598)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?
Currently, the `e2e_prime` test fails with the error `AttributeError:
'NoneType' object has no attribute 'squeeze'`, which is caused by
[#1212].

In PR [#1568], the parameter `use_fused_kernel` in `ppo_trainer.yaml`
was set to `false`, but the corresponding parameter in
`prime_trainer.yaml` was not updated, which prevents the CI from
passing. Until the root cause of the `use_fused_kernel` failure is
resolved, `use_fused_kernel` should temporarily be set to `false` in
`prime_trainer.yaml`.

### High-Level Design

Not needed

### Specific Changes

- Default use_fused_kernels = False

### API

Not needed

### Usage Example

Not needed

### Test

Not needed

### Additional Info.

Not needed

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-20 16:27:33 +08:00
3eaaf24d5a [rollout] perf: replace AsyncOpenAI to aiohttp client in ChatCompletionScheduler (#1588)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

AsyncOpenAI has a severe performance issue caused by httpx, so this PR
replaces it with an aiohttp client. For train_batch_size=1024,
AsyncOpenAI adds ~25s to each generation phase.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-20 11:31:19 +08:00
457ccd9962 [feat] support logging to ClearML (#1582)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Support logging to [ClearML](https://clear.ml/) system



### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-20 10:27:40 +08:00
88527e6aa5 [BugFix] Megatron: fix checkpoint manager, some states only rank 0 need to save to avoid errors (#1586)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Megatron: fix the checkpoint manager so that states which only rank 0
needs to save are saved by rank 0 alone, avoiding errors

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-20 09:47:06 +08:00
8160ec6a58 Bump to sglang 0.4.6.post4 & unified generate sequences ability between sgl and sgl async (#1577)
### Checklist Before Starting

- [x] Search for similar PR(s).
- Thanks to:
  - close #1558 due to mix of prs
  - close #1449 due to partial fix sgl new version issue
  - close #1300 which is part of current pr
- This pr is co-authored with @ocss884 

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

- bump sglang to 0.4.6.post4
- unified sglang and sglang_async `generate_sequences` api behavior,
e.g. image support
- fix warning for cuda barrier at start of fsdp_workers

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: ocss884 <ocss.lin@gmail.com>
2025-05-20 09:39:07 +08:00
8788e55807 [doc] single_controller: Adding doc strings and doc pages for public methods in single_controller (#1396)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR adds doc string for the public methods inside
`single_controller` module, so that these methods can be reused and
referenced better.
A new doc page `Single Controller Interface` was also added under the
API Reference section.

![Screenshot 2025-05-04 at 4 58
23 PM](https://github.com/user-attachments/assets/3848b0d3-fbab-4023-915f-47620ed2676a)

### TODO:

This is the first of a series of PRs to improve and stabilize the docs
and API. TODOs include:

* `verl/trainer` docs
* `verl/utils` docs
* Generally refine doc string of the whole repo 

Next PR to review is #1397 

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-05-19 16:44:05 -07:00
877e097f74 README: add back DeepRetrieval and add a new work s3 (#1592)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

> (1) Add back DeepRetrieval (the **first** search agent framework
interacting with a search engine) to the "awesome work" list on the main
page, and (2) add a new work, s3 (a much more efficient way, using 70x
less data, to train a powerful search agent!)

### High-Level Design

> Only updates two readme files.

### Specific Changes

> (1) Added "- [DeepRetrieval](https://github.com/pat-jj/DeepRetrieval):
RL Training of **Search Agent** with **Search/Retrieval Outcome**
![GitHub Repo
stars](https://img.shields.io/github/stars/pat-jj/DeepRetrieval)" to the
main page's README.md. (2) Added "- [s3](https://github.com/pat-jj/s3)
**Efficient Yet Effective** Search Agent Training via RL ![GitHub Repo
stars](https://img.shields.io/github/stars/pat-jj/s3)" to the
recipe/README.md

### API

> N/A

### Usage Example

> N/A

### Test

> N/A

### Additional Info.

N/A

### Checklist Before Submitting

- [] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [N/A] Add `[BREAKING]` to the PR title if it breaks any API.
- [N/A] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [N/A] Add CI test(s) if necessary.
2025-05-19 16:28:31 -07:00
ab24d7b5bb [BugFix] fix sglang CI, use stable way to download Qwen 7B model (#1585)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix sglang CI, use stable way to download Qwen 7B model

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-19 22:47:13 +08:00
5b24e01d56 [misc] use lazy import on megatron utils components (#1551)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR turns `from megatron.core import ModelParallelConfig,
tensor_parallel` into a lazy import, so that importing utilities that
don't use these modules no longer imports `megatron.core.xxx` by
default.
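
The lazy-import pattern can be sketched as follows. This is a generic illustration rather than the verl code: the standard-library `fractions` module stands in for the heavy `megatron.core` dependency, and both function names are hypothetical.

```python
def pad_to_multiple(n: int, multiple: int) -> int:
    """Pure utility: importing this module does not pull in the heavy dependency."""
    return ((n + multiple - 1) // multiple) * multiple


def exact_ratio(numerator: int, denominator: int):
    """Utility that needs the heavy dependency and imports it on first use."""
    # Deferred import: stand-in for `from megatron.core import ...`.
    # It runs only when this function is called, not at module import time.
    from fractions import Fraction

    return Fraction(numerator, denominator)


print(pad_to_multiple(10, 8))
print(exact_ratio(2, 4))
```

Callers that only need `pad_to_multiple` never trigger the deferred import, which is the benefit this PR is after for utilities that do not touch Megatron.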

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-05-19 13:08:32 +08:00
8176d3b96d [Megatron] Qwen3moe-part3: fix mcore qwen3 moe config, no need for patching now, offer option to freeze moe router (#1540)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

fix mcore qwen3 moe config `moe_router_pre_softmax`, no need for
patching now, and offer option to freeze moe router

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

Moe models initialization:

```py
def initialize(self, **kwargs):
```

### Usage Example

```python
    moe_router_pre_softmax=False,
```

```yaml
    override_config:
      moe_config:
        freeze_moe_router: False
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-19 11:26:43 +08:00
8845a33d6c [misc] ci: fix typo in PULL_REQUEST_TEMPLATE.md (#1571) 2025-05-19 09:58:30 +08:00
8653b1b200 [misc] feat: support return full prompt with chat template in RLHFDataset (#1567) 2025-05-19 01:13:21 +08:00
b9a6890ff3 disable fused kernels by default (#1568) 2025-05-18 23:27:34 +08:00
530154e153 [merger] fix: avoid setting torch's global device to meta (#1564)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR fixes several issues
(https://github.com/volcengine/verl/issues/1484,
https://github.com/volcengine/verl/issues/1255) that cause the error:
"Cannot copy out of meta tensor; no data!".

The related code in our part is:

d36b5e81d6/scripts/model_merger.py (L131-L132)

The `torch.device("meta")` context manager sets the current global torch
device to "meta". During `auto_model_class.from_config`, various import
statements load third-party libraries, whose `__init__.py` files may
contain global statements that use torch for calculations.

For example, transformers imports
[[torchao](5549da8af9/torchao/optim/subclass_4bit.py (L33)),
which executes the following during initialization:

```python
QMAP_UNSIGNED = torch.linspace(0, 1, 17)[1:].tolist()  # no zero
```

In this case, when using the `torch.device("meta")` context manager,
`torch.linspace(0, 1, 17)` gets created on the meta device, which only
assigns metadata and cannot be moved to CPU. This causes the `.tolist()`
call to fail with the error "Cannot copy out of meta tensor; no data!"

To fix this, we're now using `init_empty_weights` from `accelerate`,
which patches `nn.Module.register_parameter` instead of patching torch's
global device
(417bc52965/src/accelerate/big_modeling.py (L96-L170)),
thus avoiding this issue.

Here's a simple illustration:

```python
>>> import torch
>>> from accelerate import init_empty_weights
>>> with init_empty_weights():
...     QMAP_UNSIGNED = torch.linspace(0, 1, 17)[1:].tolist()
... 
>>> QMAP_UNSIGNED
[0.0625, 0.125, 0.1875, 0.25, 0.3125, 0.375, 0.4375, 0.5, 0.5625, 0.625, 0.6875, 0.75, 0.8125, 0.875, 0.9375, 1.0]
>>> with torch.device("meta"):
...     QMAP_UNSIGNED = torch.linspace(0, 1, 17)[1:].tolist()
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_device.py", line 104, in __torch_function__
    return func(*args, **kwargs)
NotImplementedError: Cannot copy out of meta tensor; no data!
```

cc @ETOgaosion 

### Additional Info.

- **Issue Number**: Fixes issue
https://github.com/volcengine/verl/issues/1484,
https://github.com/volcengine/verl/issues/1255,
https://github.com/volcengine/verl/pull/1468#issuecomment-2886345570
- **Training**: both
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-18 19:17:27 +08:00
d36b5e81d6 Add missing fi to install script (#1559) 2025-05-18 11:15:57 +08:00
Lei
40dcabec38 [BUG] Fix silent bug of using dtype from previous loop scope in build_memory_reference_from_module() (#1553)
This pull request includes a minor fix in the
`build_memory_reference_from_module` function within
`verl/utils/memory_buffer.py`. The change ensures that the correct data
type is passed when calculating the padded number of elements.

* **Bug Fix**:
- Updated the `calc_padded_numel` function call to use `param.dtype`
instead of `dtype`, ensuring compatibility with the parameter's actual
data type.
(`[verl/utils/memory_buffer.pyL107-R107](diffhunk://#diff-77d53102508293685e0b9a1281dbacf7720fb8070db73157aa90157d516004a4L107-R107)`)
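
This class of bug can be illustrated with a minimal, torch-free sketch. `padded_numel` is a simplified, hypothetical stand-in for `calc_padded_numel` (the real signature and alignment are not shown in this PR), and dtypes are represented as plain strings.

```python
from types import SimpleNamespace

# Bytes per element for the dtypes used in this sketch.
ITEMSIZE = {"float32": 4, "bfloat16": 2}


def padded_numel(numel: int, dtype: str, align_bytes: int = 16) -> int:
    """Round `numel` up to a whole number of `align_bytes`-sized chunks."""
    per_chunk = align_bytes // ITEMSIZE[dtype]
    return ((numel + per_chunk - 1) // per_chunk) * per_chunk


def total_buggy(params) -> int:
    # An earlier loop leaves `dtype` bound to the LAST parameter's dtype.
    for p in params:
        dtype = p.dtype
    # Bug: the stale `dtype` is reused for every parameter.
    return sum(padded_numel(p.numel, dtype) for p in params)


def total_fixed(params) -> int:
    # Fix: use each parameter's own dtype.
    return sum(padded_numel(p.numel, p.dtype) for p in params)


params = [
    SimpleNamespace(name="weight", numel=9, dtype="float32"),
    SimpleNamespace(name="bias", numel=9, dtype="bfloat16"),
]
print(total_buggy(params), total_fixed(params))
```

The buggy version raises no error. It silently computes padding with the wrong element size whenever parameters have mixed dtypes, which is why the issue is easy to miss.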

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-17 10:55:28 +08:00
b8bd596811 [Docker Image] use latest vLLM (0.8.5) to fully support Qwen3 moe (#1544) 2025-05-17 07:28:55 +08:00
3f4647f9bc [model merger] refactor model merger for better usage and maintainability (#1468)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR refactors `model_merge`, making the code cleaner and more
maintainable:

- The verl checkpoint manager now saves the model config and
processor/tokenizer (introduced in
https://github.com/volcengine/verl/pull/1288), so `hf_model_path` is no
longer needed. This PR deprecates the argument but keeps it for
backward compatibility.
- The current `model_merge` serves two purposes: merging checkpoints and
testing checkpoints (mainly for CI). This PR splits them into two
sub-commands so that user-facing arguments are easier to manage.
- General code cleanup.

### Test
Our current CI hasn't tested DDP+FSDP e2e training. This PR also adds
DDP+FSDP e2e into CI and tests merging DDP+FSDP checkpoints.

The current CI should test this PR correctly.


### Additional Info.

- **Training**: both
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-16 23:53:08 +08:00
eb077f66e5 Feat/memory optimized loss (#1212)
# What does this PR do?

This PR implements fused losses for alignment. #710
It reduces the memory required for loss calculation to a small constant
amount.

# ChangeLog:

- added the option use_fused_kernels
- monkey patch to make model.forward return last_hidden_state and not
calculate logits
- Added FusedLinearForPPO to verl/utils/experimental/torch_functional.py

# Usage

Simply add the following option
```
actor_rollout_ref.model.use_fused_kernels=True
```

## Before submitting

- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [ ] Did you make sure to update the documentation with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs),
especially for breaking config etc?
- [ ] Did you write any test cases if necessary? Please add CI tests to
your new feature.

# Additional Info:
- The current implementation uses chunking to reduce the memory
consumption to a constant value.
- It works by splitting the loss calculation into chunks of 512 tokens,
computing the log_probs, entropy values, and gradients for each chunk,
and accumulating them.
- However, the current implementation can be slow: it processes each
chunk sequentially in a Python for loop.
- In the future we should consider converting the fused functions into
Triton or some other JIT solution.
- Compared to FusedPPOLossFunction, optimizing hidden_states -> entropy
& log_probs is much better for algorithm developers as the memory heavy
part is optimized away for them and they are free to combine the values
for their own custom loss functions.
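
A dependency-free sketch of the chunking idea described above, forward pass only. It is not `FusedLinearForPPO`: the function name, the list-based tensors, and the omission of gradients are all simplifications made for illustration. The point it shows is that only `chunk_size x vocab` logits exist at any one time.

```python
import math


def chunked_logprob_entropy(hidden, weight, labels, chunk_size=512):
    """Per-token log-prob of `labels` and entropy, computed chunk by chunk.

    hidden: list of T vectors (length H); weight: list of V rows (length H).
    Hypothetical helper; gradients are intentionally left out.
    """
    log_probs, entropies = [], []
    for start in range(0, len(hidden), chunk_size):
        chunk_h = hidden[start:start + chunk_size]
        chunk_y = labels[start:start + chunk_size]
        # Logits for this chunk only: shape (len(chunk_h), V).
        chunk_logits = [[sum(w * x for w, x in zip(row, h)) for row in weight] for h in chunk_h]
        for logits, y in zip(chunk_logits, chunk_y):
            m = max(logits)  # subtract the max for a stable log-sum-exp
            lse = m + math.log(sum(math.exp(v - m) for v in logits))
            logp = [v - lse for v in logits]
            log_probs.append(logp[y])
            entropies.append(-sum(math.exp(lp) * lp for lp in logp))
    return log_probs, entropies


hidden = [[0.1 * (t + 1), -0.2 * t] for t in range(5)]
weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
labels = [0, 2, 1, 1, 0]
chunked = chunked_logprob_entropy(hidden, weight, labels, chunk_size=2)
full = chunked_logprob_entropy(hidden, weight, labels, chunk_size=len(hidden))
```

Because each token is processed independently, the chunked and unchunked results agree; chunking changes peak memory, not the values.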

---------

Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>
2025-05-16 22:52:54 +08:00
b52956409c [megatron] Qwen3moe-part 2: Allow Infer and train tp to be different with CI tests, Fix vllm resharding process (#1444)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

1. This PR eliminates the micro-dp group, as the article describes, and
supports different TP sizes for training and inference.
2. Side effect: Qwen3moe can now run on Megatron, aligned with FSDP.
3. CI tests have been added to check the effect.

### High-Level Design

This PR eliminates the micro-dp group, as the article describes. Since the
`generate_sequence` process involves only the inference engine, there is
no need to consider the training side.

The only remaining problem is that the `dispatch/collect` function cannot
directly use the inference parallel size. The current solution defines a
new `MEGATRON_ALL_DP` dispatch method that treats all ranks as data-parallel
ranks, the same as FSDP.

The data is therefore pre- and post-processed the same way as in FSDP.
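
A rough sketch of what "treating every rank as a data-parallel rank" means for dispatch and collect. The function names below are hypothetical and the batch is a plain list; the real `MEGATRON_ALL_DP` dispatch works on verl's own data structures.

```python
def dispatch_all_dp(batch, world_size):
    """Split the batch into one equal chunk per rank (hypothetical helper)."""
    assert len(batch) % world_size == 0, "batch must divide evenly across ranks"
    per_rank = len(batch) // world_size
    return [batch[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]


def collect_all_dp(outputs):
    """Concatenate per-rank outputs back in rank order (hypothetical helper)."""
    return [item for rank_output in outputs for item in rank_output]


batch = list(range(8))
shards = dispatch_all_dp(batch, world_size=4)
# Each rank processes only its own shard; doubling stands in for generation.
results = collect_all_dp([[x * 2 for x in shard] for shard in shards])
```

Under this view the split depends only on the world size, not on the training-side tensor or pipeline parallel sizes.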

### Specific Changes

Mainly in `megatron_vllm.py`

### API

None

### Usage Example

```sh
actor_rollout_ref.actor.megatron.tensor_model_parallel_size=2 \
actor_rollout_ref.rollout.tensor_model_parallel_size=4 \

# or

actor_rollout_ref.actor.megatron.tensor_model_parallel_size=4 \
actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
```

### Test

Added CI tests.

For e2e test with Qwen 2.5 7B, please refer to
`examples/grpo_trainer/run_qwen2_5-7b_math_megatron_diff_tp.sh`

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: Megatron
- **Inference**: vLLM

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-16 16:39:01 +08:00
12bb85777d [Refactor] Add middle truncation (#1488)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR adds support for a new truncation mode, **middle**, for loading
datasets. It enables data that exceed the `max_prompt_length` to retain
both the beginning and the end of the prompt, instead of truncating
content only from the left or only from the right.

### High-Level Design

The implementation introduces a `"middle"` option, alongside the
existing truncation modes, making changes in both `rl_dataset.py` and
`torch_functional.py`. When selected, the logic splits the allowed max
length roughly in half and keeps the head and tail of the sequence,
effectively discarding the middle section.

### Specific Changes

**In `verl/utils/dataset/rl_dataset.py`:**  
- Added support for `self.truncation == "middle"` at line ~233.
- Keeps the head and the tail of the prompt and drops the middle:
```python
elif self.truncation == "middle":
    left_half = raw_prompt_ids[: self.max_prompt_length // 2]
    right_half = raw_prompt_ids[-self.max_prompt_length // 2 :]
    raw_prompt_ids = left_half + right_half
```

**In `verl/utils/torch_functional.py`:**  
- Added support for `"middle"` truncation mode in the `postprocess_data`
function.
- Updated truncation assertion to include `"middle"`:
```python
assert truncation in ["left", "right", "middle", "error"]
```
- Implemented middle truncation logic:
```python
elif truncation == "middle":
    left_half = max_length // 2
    right_half = max_length - left_half
    input_ids = torch.cat([input_ids[:, :left_half], input_ids[:, -right_half:]], dim=-1)
    attention_mask = torch.cat([attention_mask[:, :left_half], attention_mask[:, -right_half:]], dim=-1)
```
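
The same head-plus-tail rule can be checked on a plain Python list. This is a standalone sketch with a hypothetical `truncate_middle` helper, mirroring the split used in the snippets above rather than reproducing the verl code.

```python
def truncate_middle(token_ids, max_length):
    """Keep the first max_length // 2 tokens and the remaining budget from the end."""
    if len(token_ids) <= max_length:
        return list(token_ids)
    left = max_length // 2
    right = max_length - left
    # Guard right == 0: token_ids[-0:] would return the whole list.
    tail = list(token_ids[-right:]) if right > 0 else []
    return list(token_ids[:left]) + tail


ids = list(range(10))
odd = truncate_middle(ids, 5)
even = truncate_middle(ids, 4)
```

With an odd budget the extra token goes to the tail, so the result always has exactly `max_length` tokens.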

### API

- Adds `"middle"` as a valid option to the `truncation` argument in the
API.

### Usage Example

```python
# Example usage when loading prompts with middle truncation
from verl.utils.dataset.rl_dataset import RLDataset

# Assume tokenizer and other necessary args are already initialized
rl_dataset = RLDataset(
    ...,  # other args
    truncation="middle"
)
```

### Test

This change aligns with precedents from long-context evaluation
benchmarks, where *middle truncation* is the default/preferred method
for handling overly long inputs:

- [LongBench
implementation](2e00731f8d/LongBench/pred.py (L56))
([paper](https://arxiv.org/pdf/2308.14508))
- [InfiniteBench
implementation](51d9b37b0f/src/eval_utils.py (L413))
([paper](https://arxiv.org/pdf/2402.13718))

Both benchmarks favor middle truncation for long inputs, as it better
preserves relevant context information from both the beginning and end
of the sequence.

### Additional Info.

- **Issue Number**: N/A (no linked issue yet)
- **Training**: None affected
- **Inference**: None affected

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Co-authored-by: Wang Siyuan <v-siywang@microsoft.com>
Co-authored-by: Wang Siyuan <wsy0227@sjtu.edu.cn>
2025-05-16 11:31:57 +08:00
2c991f6ca2 [megatron] fix head_dim in GQA model when load from hf ckpt (#1513)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix `head_dim` in GQA models when loading from an HF checkpoint.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- Change how the q and kv `head_dim` are obtained so that they are
compatible with GQA.
- Add the conversions of q_layernorm and k_layernorm in
convert_megatron_model_to_transformers_model for Qwen3.
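
To show why the head dimension matters for GQA, here is a small sketch of deriving q and kv projection sizes from a config. It assumes an HF-style config object with an optional `head_dim` attribute; the `qkv_sizes` helper and the numbers are illustrative and are not taken from this PR.

```python
from types import SimpleNamespace


def qkv_sizes(config):
    """Return (head_dim, q_size, kv_size) for a GQA-style config (hypothetical helper)."""
    # Prefer an explicit head_dim; fall back to hidden_size // num_attention_heads.
    head_dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads
    q_size = config.num_attention_heads * head_dim
    kv_size = config.num_key_value_heads * head_dim
    return head_dim, q_size, kv_size


# Illustrative values only.
explicit_cfg = SimpleNamespace(hidden_size=1024, num_attention_heads=16, num_key_value_heads=8, head_dim=128)
derived_cfg = SimpleNamespace(hidden_size=4096, num_attention_heads=32, num_key_value_heads=8)
explicit = qkv_sizes(explicit_cfg)
derived = qkv_sizes(derived_cfg)
```

In the first config `hidden_size // num_attention_heads` would give 64, not 128, so any code that derives `head_dim` from the hidden size alone would slice the q and kv weights incorrectly.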

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue #1510

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
2025-05-16 10:21:57 +08:00
H
771bd756b3 [misc] docs: move dev folder to scripts. add sandbox documentation to index.rst. (#1539)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

- move dev folder to scripts @ETOgaosion 
- add sandbox documentation to index.rst @chenhaiq  
- installation docs have been updated

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-16 08:12:31 +08:00
a43db53bb5 [chore] refactor: clean utils code. (#1290)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-05-15 16:20:34 -07:00
4e9586a3a0 Fix reinforce_plus_plus_baseline advantage mask (#1527) 2025-05-15 23:39:33 +08:00
6de40fcdfa fix #1534, sglang_async missing offload_param config (#1536)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-15 22:43:00 +08:00
146676091f [misc] fix: no need to use world_size to decide whether to use full_tensor in FSDP2 (#1529)
[misc] fix: no need to use world_size to decide whether to use
full_tensor() for FSDP2 state_dict() when world_size==1

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR simplifies the parameter loading logic within the
`FSDPVLLMShardingManager` by removing an unnecessary `world_size` check
when determining whether to call `full_tensor()` on parameters obtained
from an FSDP2 model's `state_dict()`. The check is unnecessary because
FSDP2 parameters are all `DTensor` instances.

### High-Level Design

The change modifies the update_params method. When loading weights into
the vLLM model, parameters from the FSDP state_dict() (which might be
ShardedTensor or DTensor instances under FSDP2 when world_size == 1) are
converted to full tensors using param.full_tensor(). This PR ensures
this conversion happens if the full_tensor() method is available on the
parameter, without an additional, potentially incorrect, check against
world_size == 1.
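
The duck-typed check described above can be sketched without any distributed setup. `FakeDTensor` and `to_full_tensor` are hypothetical stand-ins used only to show the control flow; they are not verl or PyTorch APIs.

```python
class FakeDTensor:
    """Hypothetical stand-in for a sharded parameter that exposes full_tensor()."""

    def __init__(self, shards):
        self.shards = shards

    def full_tensor(self):
        # Gather all shards into one flat list.
        return [x for shard in self.shards for x in shard]


def to_full_tensor(param):
    # Convert whenever the method exists; no world_size condition is involved.
    return param.full_tensor() if hasattr(param, "full_tensor") else param


sharded = to_full_tensor(FakeDTensor([[1, 2], [3, 4]]))
single_rank = to_full_tensor(FakeDTensor([[1, 2, 3, 4]]))  # the world_size == 1 case
plain = to_full_tensor([5, 6])
```

A single-shard parameter still goes through `full_tensor()`, which is the case the removed `world_size` check used to skip.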

### Specific Changes

Skip. See file changes

### API

No

### Usage Example

No

### Test

No CI changes

### Additional Info.

- **Issue Number**: No
- **Training**: FSDP
- **Inference**: vLLM

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-15 19:18:00 +08:00
11622fc72f Add Seed-Coder project in README (#1532)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Update README to add Seed-Coder as an example project using verl. 

### High-Level Design

N/A

### Specific Changes

Add one line in README about the Seed-Coder project.

### API

N/A

### Usage Example

N/A

### Test

N/A

### Additional Info.

N/A

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-15 18:01:06 +08:00
OC
2c8b2b995f [feat] Sandbox: support sandbox fusion on FaaS & localhost (#1429)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Implement a sandbox fusion backend on FaaS, for example computing reward
scores using a FaaS instance on volcengine.com. It has better performance
and security compared to a local sandbox.

### Specific Changes

Added a code branch in _default_compute_score to choose sandbox
according to sandbox_fusion_url configuration.


### Usage Example

examples/ppo_trainer/run_deepseek7b_llm_sandbox_fusion.sh

### Test

tests/reward_score/test_sandbox_fusion.py
However, the new test cases require the Sandbox API URL to be set in the
env variable SANDBOX_FUSION_URL. If it is not set, most test cases will be
skipped.

### Additional Info.

Using the sandbox on FaaS saves 60% of the reward-processing time compared
with the local sandbox:
<img width="273" alt="Screenshot 2025-05-07 20 37 05"
src="https://github.com/user-attachments/assets/fc9c0e23-6afe-4f34-a28a-a1756e85d45f"
/>


### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-15 17:53:47 +08:00
e12edc7f35 [lr_schedular] fix: implement proper min_lr_ratio support in cosine scheduler (#1400)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix the cosine learning rate scheduler to properly respect
`min_lr_ratio` parameter during both warmup and decay phases.

Update warmup phase to start from `min_lr_ratio` instead of 0, ensure
decay phase never goes below `min_lr_ratio`, and add explicit
`num_cycles` parameter to scheduler config.

Set default values in configuration files and handle `null` values, since
some example YAML configs set `min_lr_ratio` to `null`.
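
A minimal sketch of a cosine schedule multiplier with the behavior described above: warmup starts at `min_lr_ratio`, decay never drops below it, and `None` is treated as 0. The function name and exact formula are assumptions for illustration, not necessarily the code merged in this PR.

```python
import math


def cosine_lr_factor(step, warmup_steps, total_steps, min_lr_ratio=0.0, num_cycles=0.5):
    """Multiplier applied to the base learning rate at `step` (hypothetical helper)."""
    if min_lr_ratio is None:  # YAML `null` arrives as None
        min_lr_ratio = 0.0
    if step < warmup_steps:
        # Linear warmup from min_lr_ratio up to 1.0.
        return min_lr_ratio + (1.0 - min_lr_ratio) * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = math.cos(math.pi * 2.0 * num_cycles * progress)
    # Rescale the cosine from [-1, 1] into [min_lr_ratio, 1.0].
    factor = min_lr_ratio + (1.0 - min_lr_ratio) * 0.5 * (1.0 + cosine)
    return max(min_lr_ratio, factor)


start = cosine_lr_factor(0, warmup_steps=10, total_steps=100, min_lr_ratio=0.1)
peak = cosine_lr_factor(10, warmup_steps=10, total_steps=100, min_lr_ratio=0.1)
end = cosine_lr_factor(100, warmup_steps=10, total_steps=100, min_lr_ratio=0.1)
```

With the default `num_cycles=0.5` the multiplier traces half a cosine period, falling from 1.0 at the end of warmup to `min_lr_ratio` at the final step.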

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: fix https://github.com/volcengine/verl/issues/1376
- **Training**: FSDP
- **Inference**: None

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-15 09:49:11 +08:00
537003548d [bugfix] correct retrieval of max_position_embeddings from config (#1520) 2025-05-15 07:06:23 +08:00
2d16173baa [doc] update docs for custom tool config (#1523)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Update the docs for Custom Tool Configuration, fixing one broken link
and providing more instructions.

### High-Level Design

N/A

### Specific Changes

- fix broken link to gsm8k_tool_config.yaml
- update docs about custom tool config

### API

N/A

### Usage Example

N/A

### Test

N/A

### Additional Info.

- **Issue Number**: #1511
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-14 21:25:28 +08:00
9b45fc14f7 Skip max_position_embeddings > sequence length check for vLLM rollouts if RoPE scaling is used (#1522)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Allow usage of sequence lengths longer than model's
`max_position_embeddings` when RoPE scaling is used.

Added documentation on how to override the RoPE scaling config for models
that support RoPE scaling but don't have it in their config.json file.


### Specific Changes

Skip the check that the model's context length (`max_position_embeddings`)
covers the sequence length for vLLM rollouts if RoPE scaling is used.
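
A small sketch of the conditional check, assuming the sequence length is prompt length plus response length and that RoPE scaling is signalled by a non-empty `rope_scaling` entry. `context_length_ok` and its parameters are hypothetical names, not the verl implementation.

```python
def context_length_ok(max_position_embeddings, prompt_length, response_length, rope_scaling=None):
    """Return True if the rollout may proceed (hypothetical helper)."""
    if rope_scaling:
        # RoPE scaling extends the usable context, so the raw limit is not enforced.
        return True
    return max_position_embeddings >= prompt_length + response_length


without_scaling = context_length_ok(32768, prompt_length=30000, response_length=8000)
with_scaling = context_length_ok(
    32768, prompt_length=30000, response_length=8000,
    rope_scaling={"rope_type": "yarn", "factor": 4.0},  # illustrative values
)
```

Note that this sketch does not validate that the scaling factor actually covers the requested length; that remains the user's responsibility.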


### API

No API changes

### Usage Example

Please see the updated docs for example usage.

### Test

I didn't capture any metrics, but I verified this works for my own
training run with Qwen/Qwen2.5-7B-Instruct with long contexts.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: This affects vLLM, but I can also update SGLang. I've
only tested vLLM for my use case

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-14 19:42:07 +08:00
258a0d92ed [metrics] Add diversed reduce metrics method according to key name (#1497)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

Add support to reduce max and min with np.max and np.min
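
A sketch of reducing metrics according to the key name. The substring rule (`"max"` or `"min"` anywhere in the key) is an assumption, and Python built-ins stand in for `np.max` and `np.min` so that the example has no dependencies.

```python
from statistics import mean


def reduce_metrics(metrics):
    """Reduce each list of values to one number, picking the reducer by key name."""
    reduced = {}
    for key, values in metrics.items():
        if "max" in key:
            reduced[key] = max(values)
        elif "min" in key:
            reduced[key] = min(values)
        else:
            reduced[key] = mean(values)  # default: average across workers
    return reduced


reduced = reduce_metrics({
    "actor/loss": [1.0, 3.0],
    "perf/max_memory_gb": [10.0, 14.0],
    "actor/lr_min": [0.1, 0.2],
})
```

Averaging a per-worker maximum would understate the true peak, which is why keys that carry a max or min need their own reducer.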

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-14 16:12:12 +08:00
43782a24bd [Doc/Docker Image] Update mcore image to use vLLM which support qwen3 and rewrite installation from conda (#1505)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Update the mcore image to use a vLLM version that supports Qwen3, and
rewrite the conda installation instructions.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

Docker image and docs

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: both
- **Inference**: both

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-14 14:40:13 +08:00
0a4d54551f Add Absolute Zero to awesome work list (#1514)
### What does this PR do?

> Just adding Absolute Zero work to the README as a list of work that
used veRL

### High-Level Design

> just information

### Specific Changes

> only changed README.md, added Absolute Zero

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-14 14:38:20 +08:00
21e3acd6d4 [fix][DataProto] Make classmethod from_single_dict return a cls not the class name (#1509)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

`from_single_dict` is a classmethod, but it currently returns
`DataProto.from_dict(.....)` directly. This PR changes it to
`cls.from_dict(.....)` so that any subclass of `DataProto` can reuse
this classmethod to instantiate the subclass.

In the current implementation, calling it on a subclass, e.g.
`MyDataProto.from_single_dict()`, returns a parent class instance (a
`DataProto`) rather than a `MyDataProto` instance.
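
The difference is easy to reproduce with toy classes. `BaseProto` and `MyProto` below are hypothetical stand-ins, not verl's `DataProto`; only the classmethod pattern is the same.

```python
class BaseProto:
    """Hypothetical stand-in for a data container with alternative constructors."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def from_dict(cls, data):
        return cls(dict(data))

    @classmethod
    def from_single_dict_hardcoded(cls, data):
        # Old behavior: the parent class name is hard-coded.
        return BaseProto.from_dict(data)

    @classmethod
    def from_single_dict(cls, data):
        # New behavior: dispatch through cls so subclasses get their own type.
        return cls.from_dict(data)


class MyProto(BaseProto):
    pass


old = MyProto.from_single_dict_hardcoded({"x": 1})
new = MyProto.from_single_dict({"x": 1})
```

Here `old` is a plain `BaseProto` even though it was built through `MyProto`, while `new` has the subclass type.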

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-05-14 09:12:16 +08:00
d4a11ebb44 [utils] Enrich and fix utils from fsdp_utils and seqlen_balancing (#1495)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Enrich and fix utility functions in `verl/utils/fsdp_utils.py` and
`verl/utils/seqlen_balancing.py`.

* In `get_fsdp_wrap_policy`, introduce a unified `_get_attr` helper so
both dict‑based (OmegaConf) and dataclass‑style configs can work.

* In `rearrange_micro_batches`, add two new parameters
(`same_micro_num_in_dp`, `min_num_micro_batch`).

* Also re-organized the workflow pipeline structure to make it align
better with the verl file structure.

### API

In `verl.utils.seqlen_balancing.rearrange_micro_batches`, add two new
parameters (`same_micro_num_in_dp`, `min_num_micro_batch`).

### Usage Example

```python
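from verl.protocol import DataProto
from verl.utils.seqlen_balancing import rearrange_micro_batches

# A toy example; input_ids and attention_mask are assumed to be existing tensors
dataproto = DataProto.from_single_dict({"input_ids": input_ids, "attention_mask": attention_mask})
micros, idx_map = rearrange_micro_batches(dataproto.batch, max_token_len=300, same_micro_num_in_dp=False, min_num_micro_batch=2)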
```

### Test
* Added in `tests/utils/gpu_tests/test_seqlen_balancing.py`

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-05-13 17:01:16 +08:00
9a956c01b3 [doc] Clarifying gpu_memory_utilization for different engines (#1491)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix the outdated description of `gpu_memory_utilization`.
Clarify its definition for different engines (vLLM<v0.7.0, vLLM>=v0.7.0,
SGLang).

### Additional Info.

- **Reference**: 
- for vLLM v0.5.4 and v0.6.3:
cb1adda924/verl/third_party/vllm/vllm_v_0_5_4/worker.py (L208)
and
cb1adda924/verl/third_party/vllm/vllm_v_0_6_3/worker.py (L205)
- for vLLM v0.7.0 and later:
d6484ef3c3/vllm/worker/worker.py (L247-L257),
and
https://docs.vllm.ai/en/latest/api/vllm/vllm.config.html#vllm.config.CacheConfig.gpu_memory_utilization
- SGLang:
6b8706cd4f/verl/workers/rollout/sglang_rollout/sglang_rollout.py (L176),
and
https://docs.sglang.ai/backend/server_arguments.html#memory-and-scheduling

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-13 10:52:24 +08:00
033853168a [refactor][single_controller] Small refactor and fixes in worker.py and ray.base.py (#1470)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

1. The existing `WorkerHelper::_get_pid()` returns nothing; this PR fixes
this nit by returning `os.getpid()`;
2. The `WorkerMeta` class in `worker.py` is only used in `Worker.init()`
and nowhere else. This class just maintains a list of env keys and
wraps a dict named `store`. This PR deletes this class and moves its
contents into the `Worker` class. This makes it easier for a
user to subclass `Worker` with different env keys;
3. In the `merge_resource_pool` function, instead of directly returning a
`RayResourcePool`, this PR changes the return type to `type(rp1)`. In
this way, the function can be applied not only to `RayResourcePool` but
also to any subclass of `RayResourcePool`.
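
Item 3 can be illustrated with a minimal sketch. The classes below are simplified, hypothetical stand-ins rather than verl's actual `RayResourcePool`; the constructor signature and the `process_on_nodes` attribute are assumptions made only for illustration.

```python
class ResourcePool:
    """Hypothetical stand-in for a resource pool: one process count per node."""

    def __init__(self, process_on_nodes):
        self.process_on_nodes = list(process_on_nodes)


class CustomResourcePool(ResourcePool):
    """A user-defined subclass that the merge should preserve."""


def merge_resource_pool(rp1, rp2):
    # Build the result with type(rp1) instead of hard-coding the base class,
    # so merging two subclass instances yields the subclass again.
    return type(rp1)(rp1.process_on_nodes + rp2.process_on_nodes)


merged = merge_resource_pool(CustomResourcePool([8, 8]), CustomResourcePool([4]))
print(type(merged).__name__, merged.process_on_nodes)
```

With a hard-coded base class, `merged` would silently lose any behavior the subclass adds; using `type(rp1)` keeps it, provided the subclass constructor accepts the same arguments.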

### Note

This PR splits off some small nits and refactors from #1454, so that the
small things here could be reviewed and merged sooner before we decide
on the structural PR #1454


### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-05-13 10:27:13 +08:00
cb1adda924 [Bug] Fix the problem of long inference timeouts when using Async rollout (#1483)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

In Async rollout, `AsyncOpenAI` has a default 600-second timeout, which
can lead to timeouts during longer inference. See details at
https://github.com/volcengine/verl/pull/1138#issuecomment-2869686490.

### High-Level Design

See details at
https://github.com/volcengine/verl/pull/1138#issuecomment-2869686490.

### Specific Changes

See details at
https://github.com/volcengine/verl/pull/1138#issuecomment-2869686490.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-12 17:01:50 +08:00
c3b20575d2 [util] docs: add docstrings to metric util functions that recipes reuse (#1395)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

In `/recipes`, a few functions under `trainer/ppo/metric_utils` are
imported and reused. Right now many of them are task dependent and
assume specific keys in the input metric dict.

To make these functions more robust and backward compatible, a few tests
are added. Additionally, one method is moved to verl.utils as a public
API due to its general-purpose nature. An API doc page is added
correspondingly.

In order to make it easy for others to customize verl trainers, many
other classes require further documentation, such as:
- AdvantageEstimator, RayPPOTrainer, apply_kl_penalty, compute_advantage
- from verl.single_controller.ray import RayWorkerGroup
- from verl.trainer.ppo.core_algos import agg_loss
- from verl.trainer.ppo.ray_trainer import ResourcePoolManager, Role,
WorkerType
- from verl.utils.checkpoint.checkpoint_manager import
find_latest_ckpt_path

They shall be enhanced in future PRs. 

### High-Level Design

None

### Specific Changes

- added tests
- added verl.utils.metric namespace

### API

`verl.trainer.ppo.metric_utils.reduce_metrics` changed to
`verl.utils.metric.reduce_metrics`. Deprecation warnings are added.
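
A move like this is commonly handled by leaving a thin wrapper at the old location. The sketch below is a self-contained illustration of that pattern, not the code from this PR: the averaging behavior of `reduce_metrics` and the wrapper name `deprecated_reduce_metrics` are assumptions.

```python
import warnings


def reduce_metrics(metrics):
    """New-location function (assumed behavior): average each list of values."""
    return {key: sum(values) / len(values) for key, values in metrics.items()}


def deprecated_reduce_metrics(metrics):
    """Old-location wrapper: emit a warning, then delegate to the new function."""
    warnings.warn(
        "verl.trainer.ppo.metric_utils.reduce_metrics is deprecated; "
        "use verl.utils.metric.reduce_metrics instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not the wrapper
    )
    return reduce_metrics(metrics)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = deprecated_reduce_metrics({"loss": [1.0, 3.0]})

print(result, [w.category.__name__ for w in caught])
```

Existing recipe imports keep working while callers are nudged toward the new namespace.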

### Usage Example

None

### Test

Added

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
https://github.com/volcengine/verl/issues/1354
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-12 08:49:14 +08:00
f88e2ec4ca [distro] chore: fix incorrect verl main branch version (#1480)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

The main branch version is still 0.2.x; it should have been 0.3.x instead.



### Test

Relying on existing tests.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-12 08:48:39 +08:00
bc9062d74f [sglang] Fix tool format and response position ids padding in AsyncSGLangRollout (#1475)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

Resolved the tool formatting issue: Previously, arguments were stored as
strings, causing iterative addition of `\\` due to multiple calls to
`json.dumps`.
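
The escaping growth described above is easy to reproduce with the standard library alone. The snippet below is a minimal sketch with made-up argument values; it does not use the rollout code itself.

```python
import json

arguments = {"query": 'say "hi"'}

# Bug pattern: the arguments are kept as a JSON string and re-encoded on each pass,
# so every round escapes the quotes and backslashes produced by the previous one.
as_string = json.dumps(arguments)
once = json.dumps(as_string)
twice = json.dumps(once)
print(as_string.count("\\"), once.count("\\"), twice.count("\\"))

# Fix pattern: keep the arguments as a dict and serialize exactly once.
serialized = json.dumps(arguments)
print(json.loads(serialized) == arguments)
```

Storing the arguments as a dict, as the new schema does, removes the repeated encoding step entirely.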

Fixed the `response_position_ids` mismatch between `generate_sequences`
and `generate_sequences_with_tools`: In the earlier implementation,
`generate_sequences_with_tools` used zero padding for positions where
`attention mask == 0`, which resulted in NaN values during the training
phase.

### Specific Changes

> List the specific changes.

- Introduced a new schema, `OpenAIFunctionCallSchema`, to store
converted tool calls.
- Updated the `AsyncSGLangRollout` tool to skip non-dict type arguments
instead of handling any string at the arguments position.
- Aligned `response_position_ids` in `generate_sequences_with_tools`
with the behavior of `generate_sequences`.
- Enhanced tool descriptions to prevent misleading parse errors, as
returning 0.0 caused the model to incorrectly modify answers.

### API

> Demonstrate how the API changes if any.

- Revise the `execute` interface of the tool to directly accept
`dict[str, Any]` instead of a JSON string.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-11 08:01:36 -07:00
db83855616 [ReadMe] Add Seed Paper Explore Paper Data Scale in ReadMe (#1479)
### Checklist Before Starting

None

### What does this PR do?

Add Seed Paper Explore Data Scale in ReadMe

### High-Level Design

Add Seed Paper Explore Data Scale in ReadMe

### Specific Changes

Add Seed Paper Explore Data Scale in ReadMe

### API

None

### Usage Example

None

### Additional Info.

None

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-11 20:20:27 +08:00
f147ede208 [BUG] fix value mask bug in dp_critic (#1440)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

When critic.use_dynamic_size is enabled, values rearrange indices but
attention_mask does not, causing values * attention_mask to produce
unpredictable bugs. This bug may have affected nearly all previous
PPO-based experiments if critic.use_dynamic_size was turned on.
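
The failure mode can be sketched with plain lists. This is a simplified illustration under assumed names (`forward_with_dynamic_batches`, `restore_order`, `apply_mask` are hypothetical helpers), not the critic code or the exact fix in this PR: outputs computed in a rearranged order must be put back in the original order before they are multiplied by a mask that was never rearranged.

```python
def forward_with_dynamic_batches(values_per_sample, order):
    """Simulate a forward pass that processes samples in a rearranged order."""
    return [values_per_sample[i] for i in order]


def restore_order(rearranged, order):
    """Put rearranged outputs back into their original sample positions."""
    restored = [None] * len(rearranged)
    for new_pos, original_pos in enumerate(order):
        restored[original_pos] = rearranged[new_pos]
    return restored


def apply_mask(values, mask):
    """Element-wise product of per-sample values and their attention mask."""
    return [[v * m for v, m in zip(vs, ms)] for vs, ms in zip(values, mask)]


values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # per-sample value predictions
attention_mask = [[1, 1, 0], [1, 0, 0]]      # still in the original sample order
order = [1, 0]                               # order chosen by the dynamic batching

rearranged = forward_with_dynamic_batches(values, order)
buggy = apply_mask(rearranged, attention_mask)  # mask rows no longer match the values
fixed = apply_mask(restore_order(rearranged, order), attention_mask)
print(buggy)
print(fixed)
```

The buggy result masks each sample with another sample's mask, which produces no error and is therefore easy to miss.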

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-10 18:29:04 +08:00
249c26fdc8 [tests] BREAKING: move recipe.dapo.src to recipe.dapo; move test files to their own namespaces (tests/verl/xxx -> tests/xxx) (#1392) 2025-05-10 11:21:53 +08:00
17f283b1e8 [vllm rollout] minor fix: make vllm version determination stronger (#1401) 2025-05-09 18:11:30 -07:00
2d81677ac8 [docs] refactor: use verl consistently in the codebase (#1390)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Always use verl instead of veRL in the codebase, and add a CI check for
this.

### Specific Changes

mostly doc changes


### Test

Added to sanity tests.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

cc @ShaohonChen
2025-05-10 08:54:57 +08:00
c06b9624b3 [utils] Enrich features to a few verl utilities (#1421) 2025-05-09 16:50:59 -07:00
6b8706cd4f [Hardware] Support AMD (ROCMm Kernel) - hardware-agnostic (remove the redundant code) (#1453)
### Checklist Before Starting

- [X] Search for similar PR(s):
[PR#1369](https://github.com/volcengine/verl/pull/1369),
[issue#1488](https://github.com/volcengine/verl/issues/1448)

### What does this PR do?

- Complete [issue#1488](https://github.com/volcengine/verl/issues/1448)

### High-Level Design

- New PR for hardware-agnostic sglang rollout

### Specific Changes

- `verl/workers/rollout/sglang_rollout/async_sglang_rollout.py`
- `verl/workers/rollout/sglang_rollout/sglang_rollout.py`

> We have already submitted a PR to Ray, which is included in `ray>=2.45`.
With that version, the hardware-agnostic rollout implementation is
already supported within the verl codebase; you only need to set
`HIP_VISIBLE_DEVICES` in the training script. I have therefore dropped the
patch that I previously added to the verl codebase.

### Usage Example


[amd_tutorial](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst)

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title if it breaks any API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst).
- [ ] Add CI test(s) if necessary.

---------

Co-authored-by: Yusheng Su <yushensu@pduks-slu000010.amd.com>
2025-05-09 09:22:34 -07:00
b2ca3c855f docs: include SPPO, qwen3, FSDP2 in readme (#1450)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Update news

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-09 09:38:59 +08:00
d3b6c7052e add pip dependances (#1439) 2025-05-08 23:35:55 +08:00
325c028ad2 [sglang] Fix data preprocess mismatch in sgl_multiturn example (#1445) 2025-05-08 23:14:23 +08:00
f90b717653 [ray] fix: make spawn worker group hold strong reference to actors (#1443)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

A spawned RayWorkerGroup gets its actors by name, which holds only a weak
reference to each actor and causes actors to be garbage collected
unexpectedly. Pass the actor handles explicitly in spawn so that
RayWorkerGroup holds strong references to these actors. Closes #1365
https://github.com/volcengine/verl/pull/1138#issuecomment-2862087324

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-08 23:08:36 +08:00
c59ab2f478 [BUG] fix swanlab init bug when config is None (#1441)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

When a user chooses the swanlab logger and does not set config, the original
code `config={"FRAMEWORK": "verl", **config}` raises an error. This PR
fixes this by initializing config as an empty dict if it is None.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

```
if config is None:
    config = {} # make sure config is not None, otherwise **config will raise error
swanlab.init(
    project=project_name,
    experiment_name=experiment_name,
    config={"FRAMEWORK": "verl", **config}, # this is the cause of error when config is None
    logdir=SWANLAB_LOG_DIR,
    mode=SWANLAB_MODE,
)
```
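
The underlying Python behavior can be checked without swanlab installed. The helper below, `build_logger_config`, is a hypothetical name used only for this sketch; it isolates the dict construction from the snippet above.

```python
def build_logger_config(config=None):
    """Return the dict that would be passed as `config=...` to the logger."""
    if config is None:
        config = {}  # unpacking None with ** raises TypeError
    return {"FRAMEWORK": "verl", **config}


# Without the guard, unpacking None fails at runtime.
try:
    {"FRAMEWORK": "verl", **None}
except TypeError as exc:
    print("without the guard:", exc)

print(build_logger_config())
print(build_logger_config({"lr": 1e-6}))
```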

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-08 18:14:57 +08:00
4ae9a0fdab [rollout] fix: missing trust_remote_code option in rollout initialization (#1423)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add missing `trust_remote_code` option in customized vllm rollout and
sglang rollout, fix https://github.com/volcengine/verl/issues/1412.


### Additional Info.

- **Issue Number**: https://github.com/volcengine/verl/issues/1412
- **Training**: FSDP
- **Inference**: both

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-08 14:10:53 +08:00
8a158a50a6 feat: add qwen3 grpo example (#1435)
### Checklist Before Starting
- [ ] Search for similar PR(s).

### What does this PR do?
Tested successfully on the
hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.4-flashinfer0.2.2-cxx11abi0
image.
It outperforms the Qwen2 7B base model by two percentage points on the
test set of GSM8K.
<img width="786" alt="image"
src="https://github.com/user-attachments/assets/a753a383-5fc0-42a8-92a8-be4f8eddec60"
/>


> Add one-line overview of what this PR aims to achieve or accomplish. 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-08 14:10:32 +08:00
8cac3f8efe [single_controller][decorator] Define a DynamicEnum class to make Dispatch and Execute extensible. (#1424)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Today, extending `verl` for proprietary usage largely requires forking it
and adding code changes in the private fork.
For example, the current `verl` API doesn't support adding new
`Dispatch` and `Execute` modes at runtime. The only way to achieve this is
to make a new private fork.

This PR replaces the static `Enum` type of `Dispatch` and `Execute` with
a new `DynamicEnum` type, so that users can call the new APIs
`register_dispatch_mode` and `update_dispatch_mode` to add and define
new distributed modes at runtime using the native `verl` API, instead of
making a fork.


### Specific Changes

* Defined `DynamicEnum` class in `utils.py_functional.py`;
* Re-defined `Dispatch` and `Execute` classes; all existing Enum APIs and
usages are still usable;
* Added `register_dispatch_mode` and `update_dispatch_mode` for users to
register new dispatch modes at runtime;
* nit: `pre-commit` automatically fixed part of code format in another
PR #1331
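
To make the idea of a runtime-extensible enum concrete, here is a minimal sketch of one way such a type could be built. It is an assumption-laden illustration, not verl's `DynamicEnum`: the metaclass, the `register` classmethod, and the `_registry` and `_next_value` attributes are all hypothetical.

```python
class DynamicEnumMeta(type):
    """Metaclass giving the class enum-like iteration, membership, and lookup."""

    def __iter__(cls):
        return iter(cls._registry.values())

    def __contains__(cls, item):
        return item in cls._registry.values()

    def __getitem__(cls, name):
        return cls._registry[name]


class DynamicEnum(metaclass=DynamicEnumMeta):
    _registry = {}
    _next_value = 0

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        return f"<{type(self).__name__}.{self.name}: {self.value}>"

    @classmethod
    def register(cls, name):
        key = name.upper()
        if key in cls._registry:
            raise ValueError(f"{key} is already registered")
        member = cls(key, cls._next_value)
        cls._registry[key] = member
        setattr(cls, key, member)  # enables attribute access such as Dispatch.TEST_MODE
        cls._next_value += 1
        return member


class Dispatch(DynamicEnum):
    _registry = {}   # each subclass keeps its own members
    _next_value = 0


Dispatch.register("ONE_TO_ALL")
test_mode = Dispatch.register("TEST_MODE")
print(list(Dispatch), Dispatch.TEST_MODE is test_mode)
```

Unlike a standard `enum.Enum`, whose members are fixed when the class is defined, this registry can grow after import, which is the property a function like `register_dispatch_mode` relies on.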

### Usage Example

> Provide usage example(s) for easier usage.

```python
def test_register_new_dispatch_mode():
    # Test registration
    def dummy_dispatch(worker_group, *args, **kwargs):
        return args, kwargs

    def dummy_collect(worker_group, output):
        return output

    register_dispatch_mode("TEST_MODE", dummy_dispatch, dummy_collect)

    # Verify enum extension
    _check_dispatch_mode(Dispatch.TEST_MODE)

    # Verify registry update
    assert get_predefined_dispatch_fn(Dispatch.TEST_MODE) == {"dispatch_fn": dummy_dispatch, "collect_fn": dummy_collect}
```

### Test

Added `tests/verl/test_decorator.py`

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-05-08 12:09:41 +08:00
5acd5cab11 [sglang] fix format issue and data_preprocess file path issue in sglang multiturn example README.md (#1437)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

Fix format issue and data_preprocess file path in sglang multiturn
example README.md

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-08 11:37:46 +08:00
312a8cbceb [SGLang] Add support between mcore0.11 and sglang (#1055)
Based on the ongoing alignment between mcore and vllm #851 , I believe
we can simultaneously advance the alignment between mcore and sglang, as
their interfaces are similar. In the end, we will only need to obtain a
generator parameter.
[link](https://github.com/sgl-project/sglang/pull/5345)
2025-05-07 08:57:03 -07:00
8d3631168f docs: update config documentation with validation parameters (#1355)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?
This PR updates some outdated docs on config:
- Add the `filter_overlong_prompts_workers` configuration option, which was
introduced in #890
- Add documentation for `actor_rollout_ref.rollout.val_kwargs`
parameters, fix #1352
- Fix attribution of several configuration options to their proper
namespaces

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-07 22:38:36 +08:00
ba6a2e0bb5 [FSDPCheckpointManager] feat: save huggingface model when 'hf_model' in checkpoint_contents (#1288)
Previously, `FSDPCheckpointManager` did not save the hf model when `hf_model`
was given in `checkpoint_contents`; instead, it only saved the hf model's
config.

This PR correctly saves the huggingface model when 'hf_model' is in
`checkpoint_contents`.
2025-05-07 20:44:46 +08:00
fd3f21cb0e [megatron] qwen3 support (#1337)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Support qwen3 to run with megatron backend.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- Update offline weight convert script (from hf to megatron) for qwen3.
- Add config converter from hf config to mcore config for qwen3.
- Add qk_layernorm weight load logic in mcore loader for qwen3 (dense).
- Add model initializer and forward func for qwen3 (moe).
- Add online weight converter from mcore to hf for qwen3.
- Fix typo in megatron CriticWorker.update_critic.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```bash
# example for qwen3-8B

HF_MODEL_PATH="Your hf ckpt path"
DIST_CKPT_PATH="Your mcore ckpt path"

# convert ckpt from hf to megatron
python3 scripts/converter_hf_to_mcore.py --hf_model_path $HF_MODEL_PATH --output_path $DIST_CKPT_PATH

NODES=1
N_PER_NODE=8
PP=1
TP=8
CP=1
VLLM_TP=8

python3 -m verl.trainer.main_ppo --config-path=./config --config-name='ppo_megatron_trainer'\
    algorithm.adv_estimator=gae \
    data.train_files="$train_files" \
    data.val_files="$test_files" \
    data.train_batch_size=64 \
    data.max_prompt_length=1024 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path=$HF_MODEL_PATH \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.ppo_mini_batch_size=64 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.actor.use_kl_loss=False \
    actor_rollout_ref.actor.megatron.tensor_model_parallel_size=$TP \
    actor_rollout_ref.actor.megatron.pipeline_model_parallel_size=$PP \
    actor_rollout_ref.actor.megatron.context_parallel_size=$CP \
    actor_rollout_ref.actor.megatron.use_dist_checkpointing=True \
    actor_rollout_ref.actor.megatron.dist_checkpointing_path=$DIST_CKPT_PATH \
    actor_rollout_ref.actor.megatron.param_offload=True \
    actor_rollout_ref.actor.megatron.grad_offload=True \
    actor_rollout_ref.actor.megatron.optimizer_offload=True \
    actor_rollout_ref.ref.megatron.tensor_model_parallel_size=$TP \
    actor_rollout_ref.ref.megatron.pipeline_model_parallel_size=$PP \
    actor_rollout_ref.ref.megatron.context_parallel_size=$CP \
    actor_rollout_ref.ref.megatron.use_dist_checkpointing=True \
    actor_rollout_ref.ref.megatron.dist_checkpointing_path=$DIST_CKPT_PATH \
    actor_rollout_ref.ref.megatron.param_offload=True \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=$VLLM_TP \
    critic.optim.lr=1e-5 \
    critic.model.path=$HF_MODEL_PATH \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size_per_gpu=4 \
    critic.megatron.tensor_model_parallel_size=$TP \
    critic.megatron.pipeline_model_parallel_size=$PP \
    critic.megatron.context_parallel_size=$CP \
    critic.megatron.use_dist_checkpointing=True \
    critic.megatron.dist_checkpointing_path=$DIST_CKPT_PATH \
    critic.megatron.param_offload=True \
    critic.megatron.grad_offload=True \
    critic.megatron.optimizer_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_gsm8k_qwen3-8B' \
    trainer.experiment_name='qwen3_8b_gsm8k_gae_megatron' \
    trainer.n_gpus_per_node=$N_PER_NODE \
    trainer.nnodes=$NODES \
    trainer.save_freq=50 \
    trainer.test_freq=10 \
    trainer.total_epochs=100 "$@"


```
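The parallel sizes in the script above have to be consistent with the GPU count. A minimal sketch of that check, assuming the usual Megatron convention that the world size must be divisible by TP x PP x CP and that the quotient is the data-parallel size. The helper name `data_parallel_size` is hypothetical and is not a verl API:

```python
def data_parallel_size(nodes: int, gpus_per_node: int, tp: int, pp: int, cp: int) -> int:
    """Return the data-parallel size implied by the given layout (hypothetical helper)."""
    world_size = nodes * gpus_per_node
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by TP*PP*CP = {model_parallel}"
        )
    return world_size // model_parallel


# Values taken from the script: NODES=1, N_PER_NODE=8, TP=8, PP=1, CP=1.
dp = data_parallel_size(nodes=1, gpus_per_node=8, tp=8, pp=1, cp=1)
print(dp)
```

With the values from the script, all eight GPUs form a single tensor-parallel group, so the data-parallel size is 1.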

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
2025-05-07 20:41:41 +08:00
a43ead6f82 Fix for RM Data Attention Mask Bug (#1411)
[BUG] This issue addresses the bug related to the RM data attention
mask, which was also mentioned in a previous
[issue](https://github.com/volcengine/verl/issues/1341). The fix has
been implemented to ensure proper functionality.
2025-05-07 14:53:57 +08:00
d6e1c6e3c2 [Metric] fix: bootstrap with n == n_resps since with replacement (#1419)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR fixes https://github.com/volcengine/verl/pull/1320 since
bootstrapping is done with replacement, which makes it still meaningful
even when `n == n_resps`
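
A minimal sketch of why resampling with replacement stays informative at `n == n_resps`. This is illustrative only and is not the code in `metric_utils`; the function name `bootstrap_best_at_n` and the example scores are assumptions:

```python
import random


def bootstrap_best_at_n(scores, n, num_draws=2000, seed=0):
    """Estimate best@n by drawing n scores WITH replacement, num_draws times."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_draws):
        resample = rng.choices(scores, k=n)  # sampling with replacement
        total += max(resample)
    return total / num_draws


# One correct response out of four, resampled with n equal to the number of responses.
scores = [0.0, 0.0, 0.0, 1.0]
estimate = bootstrap_best_at_n(scores, n=len(scores))
print(estimate)
```

Without replacement, drawing 4 out of 4 responses always returns the full set, so the maximum would always be 1.0. With replacement, the single correct response is drawn at least once with probability 1 - (3/4)^4, roughly 0.68, so the estimate carries information.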

### Additional Info.

- **Issue Number**: https://github.com/volcengine/verl/pull/1320
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-07 13:22:38 +08:00
c05f6c26b6 Qwen2moe[part1]: add cpu converter option, add CI test for current solutions temporarily (#1267)
Temporarily use CPU to initialize larger models for huggingface to
dist_ckpt conversion.

Also support GQA MoE models.

This may not require CI, as the function can be dependent on VeRL, but
the current solution may need it.
2025-05-07 13:11:02 +08:00
76084d36cb [AMD] upgrade: Upgrade dockerfile and verl codebase (#1369)
## Checklist Before Starting

- [x] Search for similar PR(s). 

## What does this PR do?

1. Base Docker Image: Upgraded the base sglang docker to
`lmsysorg/sglang:v0.4.6.post1-rocm630` along with `torch_memory_saver
(hip version)`, which resolves the ROCm/aiter compatibility
[issue](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/amd-verl-dev/dev.md).

2. vLLM-0.6.3 Rollout Fix: Adjusted the rollout logic so that the
latest VeRL upstream codebase remains compatible both with `vLLM
versions ≤ 0.6.3` (sync mechanism) and with `vLLM versions >=
0.6.3` (async mechanism).

3. Update the ray version to
[2.45.0](https://github.com/ray-project/ray/releases/tag/ray-2.45.0):
[PR#52794](https://github.com/ray-project/ray/pull/52794) and also
support `ray>=2.45.0` within verl - resolve
[verl-issues#1399](https://github.com/volcengine/verl/issues/1399).

- [To-do-1] 3rd party lib - `torch_memory_saver` - rocm virtual memory
allocator issue should be resolved within the [HIP
version](https://github.com/fzyzcjy/torch_memory_saver/issues/9).
- [To-do-2]  New PR for hardware-agnostic vllm/sglang rollout.


## Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Co-authored-by: Yusheng Su <yushensu@pduks-slu000010.amd.com>
2025-05-06 18:06:05 -07:00
3a7376acfe fix: ray worker exit with SYSTEM_ERROR caused by SIGALRM from math reward (#1331)

Since SIGALRM only works in main thread, if it is fired in a sub thread,
the ray worker will exit with SYSTEM_ERROR.
Fixed this problem by using multiprocessing.Process instead of SIGALRM
handling.

# What does this PR do?

Bug fix for the ray worker exiting with SYSTEM_ERROR on timeout in prime math.

# ChangeLog:

Fixed this problem by using multiprocessing.Process instead of SIGALRM
handling.

# Usage

- see tests/utility/test_timeout_decorator.py
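
A minimal sketch of the process-based timeout idea, for illustration only. It is not the decorator added by this PR: the names `run_with_timeout`, `_worker`, and `slow_square` are hypothetical, and the `fork` start method is an assumption that limits the sketch to POSIX systems. Because the deadline is enforced by waiting on a child process rather than by a signal handler, it can be called from a non-main thread:

```python
import multiprocessing as mp
import queue as queue_mod
import time


def _worker(result_queue, func, args, kwargs):
    """Run func in the child process and send back (ok, payload)."""
    try:
        result_queue.put((True, func(*args, **kwargs)))
    except Exception as exc:  # forwarded to the parent and re-raised there
        result_queue.put((False, exc))


def run_with_timeout(func, seconds, *args, **kwargs):
    """Call func(*args, **kwargs) in a child process; raise TimeoutError on overrun."""
    ctx = mp.get_context("fork")  # assumption: POSIX only
    result_queue = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(result_queue, func, args, kwargs))
    proc.start()
    try:
        # Read the result before joining so a large payload cannot block the child.
        ok, payload = result_queue.get(timeout=seconds)
    except queue_mod.Empty:
        proc.terminate()
        proc.join()
        raise TimeoutError(f"call exceeded {seconds} seconds") from None
    proc.join()
    if ok:
        return payload
    raise payload


def slow_square(x, delay):
    """Toy workload: sleep for `delay` seconds, then return x squared."""
    time.sleep(delay)
    return x * x
```

A call such as `run_with_timeout(slow_square, 0.2, 3, 5.0)` raises `TimeoutError` after about 0.2 seconds, and the child process is terminated instead of the worker receiving a signal.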
2025-05-07 01:17:52 +08:00
6ae2de6195 Update guidance of sppo (#1415)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Just add a line of code to git clone verl. I do not know why this is
missed. lol

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-06 08:49:02 -07:00
78c8b2711e [megatron] support mixtral training with megatron backend (#1325)
# What does this PR do?

Add support of Mixtral MOE model training with Megatron backend
including ``Mixtral8x7B`` and ``Mixtral8X22B``.

# ChangeLog:

Adding a new model type in ``mcore`` format is still labor-heavy and
requires the following changes:
- ``hf_to_mcore_config_mixtral``: converts `hf_config` to
`TransformerConfig`. Some common configs are merged into one function,
`_get_base_transformer_config`.
- ``MixtralModel`` in model_initializer.py: a model initializer class
that initializes GPTModel from config.
- `McoreToHFWeightConverterMixtral`: a model conversion class from mcore
to Hugging Face, basically a rename.
- model entry in `registry.py`: adds the entry function or class to the
corresponding registries.

# Usage

- convert
[Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/)
to mcore format
([converted](https://huggingface.co/clyu/Mixtral-8x7B-Instruct-v0.1-mcore/tree/main))
- Run RLOO script as follows:

```bash
set -x

train_files=$gsm8k_train_path
test_files=$gsm8k_test_path

export MEGATRON_MODEL="Mixtral-8x7B-Instruct-v0.1-mcore"

python3 -m verl.trainer.main_ppo --config-path=./config --config-name='ppo_megatron_trainer' \
    algorithm.adv_estimator=rloo \
    data.train_files=$train_files \
    data.val_files=$test_files \
    data.train_batch_size=128 \
    data.truncation="left" \
    data.max_prompt_length=512 \
    data.max_response_length=4096 \
    actor_rollout_ref.model.path=Mixtral-8x7B-Instruct-v0.1 \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.actor.megatron.tensor_model_parallel_size=8 \
    actor_rollout_ref.actor.megatron.pipeline_model_parallel_size=2 \
    actor_rollout_ref.actor.megatron.use_dist_checkpointing=True \
    actor_rollout_ref.actor.megatron.dist_checkpointing_path=${MEGATRON_MODEL} \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
    actor_rollout_ref.rollout.max_num_batched_tokens=8192 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.rollout.n=4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=128 \
    actor_rollout_ref.ref.megatron.tensor_model_parallel_size=8 \
    actor_rollout_ref.ref.megatron.pipeline_model_parallel_size=2 \
    actor_rollout_ref.ref.megatron.use_dist_checkpointing=True \
    actor_rollout_ref.ref.megatron.dist_checkpointing_path=${MEGATRON_MODEL} \
    algorithm.use_kl_in_reward=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.val_before_train=True \
    trainer.logger=['console','wandb'] \
    trainer.log_val_generations=100 \
    trainer.project_name='verl_gsm8k_test' \
    trainer.experiment_name='mixtral-8x7b-rloo-gsm8k' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=2 \
    trainer.save_freq=50 \
    trainer.test_freq=10 \
    trainer.total_epochs=15
```

# What is Missing

- refactor hf2mcore conversion scripts as
https://github.com/volcengine/verl/pull/1267
- Have a good design of onboarding new model class to avoid
labor-intensive changes.


## Before submitting

- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [ ] Did you write any test cases if necessary? Please add CI tests to
your new feature.

# Additional Info: 
- **Issue Number**: None
- **Training**:  Megatron
- **Inference**:  None

---------

Co-authored-by: changlyu <changlyu@ip-10-0-53-184.us-west-2.compute.internal>
2025-05-06 22:38:48 +08:00
d60499d170 [misc] add support for qwen3 model (dense/moe) (#1409)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

>  add mfu compute function for qwen3 model


### Additional Info.

- **Issue Number**: Fixes issue #1313 

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
2025-05-06 19:45:17 +08:00
dd591e8588 docs: Fix readme ppo.rst (#1413)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

Fix readme NVIDIA GPU Results

### Specific Changes

Mark down fix

> List the specific changes.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

Before Fix
![Screenshot 2025-05-05 at 9 47
13 PM](https://github.com/user-attachments/assets/487c6aa4-999f-42af-b47f-e03555d83232)


After Fix
![Screenshot 2025-05-05 at 9 47
05 PM](https://github.com/user-attachments/assets/c9db6bd1-5e1c-4614-b82e-7ba74c53dc37)

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-05 22:11:13 -07:00
8bb009bf47 [CI] feat: separate FSDP2 test & fix: CI trigger (#1389)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

1. Separate the FSDP2 test to avoid blocking other tests.
2. Fix the CI trigger rule to avoid redundant runs (the original PR
triggered unrelated tests, so the rule is fixed based on [the
doc](https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#onpushpull_requestpull_request_targetpathspaths-ignore)).

### Test

For 2, I tested by commenting out the matching path for the workflow
`.yml` and saw that only related workflows are triggered:

Before: <img width="870" alt="image"
src="https://github.com/user-attachments/assets/2f7dbe0c-f638-4a75-8cbc-a364081271fc"
/>

After: <img width="869" alt="image"
src="https://github.com/user-attachments/assets/f5a35d85-f03c-452e-abed-3ca3ce22d699"
/>

### Additional Info.

- **Issue Number**: https://github.com/volcengine/verl/issues/1388
- **Training**: FSDP
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-05 07:20:35 -07:00
ee8c34749d [recipe] sppo: SPPO algorithm implementation (#1222)
Here is a version of the SPPO algorithm implementation.

You can find more information about SPPO here:
[https://github.com/uclaml/SPPO/tree/main](https://github.com/uclaml/SPPO/tree/main)

In short, the main differences between SPPO and PPO are:

1. There is no need to use a critic model.
2. SoftMean is used as the AdvantageEstimator in the trainer.
3. Different loss functions.

I have made an attempt to implement minimal modifications without
altering the code outside the recipe. However, due to the following two
issues, the current code is not entirely elegant:


1. To modify the loss function (including both the loss itself and the
parameters passed in), it is sufficient to modify the `update_policy` in
the `DataParallelSPPOActor`. I attempted to patch this `update_policy`,
but it was unsuccessful. Therefore, I created a class that inherits from
`DataParallelSPPOActor` to override the `update_policy`.
2. Since `ActorRolloutRefWorker` imports `DataParallelSPPOActor` in the
`init_model` function, it also needs to inherit from
`ActorRolloutRefWorker` to override `init_model`. However, I encountered
an issue during inheritance. The base class of `verl`’s `single
controller` calls `actor_rolloutrefworker.super().__init__()`, and the
original `super()` in `ActorRolloutRefWorker` is `Worker`. If we
inherit, it would become `ActorRolloutRefWorker`, which requires passing
parameters to `super().__init__()`, but the `single controller base`
code does not provide any parameters, making inheritance impossible.

I have now submitted a draft PR and would appreciate any suggestions on
code modifications or optimizations!

---------

Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-05-05 09:55:49 +08:00
91fa2a6b94 [docs] fix: typo (#1391) 2025-05-04 12:15:07 -07:00
ec6843c604 [sglang] Upgrade sglang to 0.4.6.post1 & misc fixes (#1385)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?
- [x] upgrade required sglang version to 0.4.6.post1, which supports Qwen3
- [x] fix: flush_cache was never awaited
- [x] remove unused env
- [x] fix: add rank num to port to avoid SGLang picking the same port
when random.seed is set
- [x] feat: disable SGLang memory imbalance check by default
https://github.com/sgl-project/sglang/pull/5426
- [x] update setup.py so that old pip versions can resolve deps
- [x] fix: tools_kwargs length mismatch with batch #1380
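
A minimal sketch of the port collision described in the list above, assuming each rank seeds its random generator identically before picking a port. The helper `pick_port`, its port range, and the `add_rank` flag are hypothetical and are not the actual verl or SGLang code:

```python
import random


def pick_port(rank: int, seed: int, base: int = 20000, span: int = 10000,
              add_rank: bool = True) -> int:
    """Pick a pseudo-random port; optionally offset it by the rank (hypothetical helper)."""
    rng = random.Random(seed)  # every rank uses the same seed
    port = base + rng.randrange(span)
    return port + rank if add_rank else port


# Same seed on every rank: without the offset all ranks collide on one port.
same = {pick_port(rank, seed=42, add_rank=False) for rank in range(8)}
# Adding the rank number makes the ports distinct again.
distinct = {pick_port(rank, seed=42) for rank in range(8)}
print(len(same), len(distinct))
```

The offset keeps the choice deterministic for a given seed while guaranteeing that ranks on the same host do not pick the same port.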

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-04 11:53:21 -07:00
709796f849 [dev] fix: validation metrics (#1374)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

1. Fix the error that `metric` is not added when `n == 1`.
2. Remove `std@1`.
3. Add an assertion for the case where initial validation runs but
`val_metrics` is empty.

### Additional Info.

- **Issue Number**: none
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-04 09:06:53 -07:00
1e47e412a4 [rollout] misc: add demo chat completion scheduler described in ReTool paper (#1297)
Co-authored-by: shengguangming <shengguangming@bytedance.com>
2025-05-04 19:07:22 +08:00
96b46d2661 [feat] Enable update_model_config to take nested dict to update AutoConfig of transformers (#1379)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

* Enable `update_model_config` to take nested dict to update
`AutoConfig` of transformers
* Added a test pipeline for all the tests under `tests/utils`. Any
future unit tests for `verl/utils` should be added here.
* Reorganized the test file structure.

### Usage Example

For the new `update_model_config`, an example looks like below:

```python
  override_config_kwargs = {
      "bos_token_id": self.tokenizer.bos_token_id,
      ...
      "nested_config": {k1: v1, k2: v2},
  }
  update_model_config(actor_model_config, override_config_kwargs=override_config_kwargs)
```
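
A minimal sketch of how a nested override could be applied recursively. It is illustrative only and is not the implementation of `update_model_config`: the helper `apply_overrides` is hypothetical, and `types.SimpleNamespace` stands in for a transformers config object:

```python
from types import SimpleNamespace


def apply_overrides(config, overrides):
    """Set attributes on config; recurse when a dict targets a nested config object."""
    for key, value in overrides.items():
        current = getattr(config, key, None)
        if isinstance(value, dict) and current is not None and not isinstance(current, dict):
            # Nested config object: update its attributes in place.
            apply_overrides(current, value)
        else:
            setattr(config, key, value)


# Stand-in for a model config with one nested sub-config.
cfg = SimpleNamespace(
    bos_token_id=0,
    nested_config=SimpleNamespace(hidden_size=8, num_layers=2),
)
apply_overrides(cfg, {"bos_token_id": 1, "nested_config": {"hidden_size": 16}})
print(cfg.bos_token_id, cfg.nested_config.hidden_size, cfg.nested_config.num_layers)
```

Keys that are not mentioned in the nested dict, such as `num_layers` here, keep their previous values.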

### Test

Added `tests/verl/utils/test_model.py::test_update_model_config`

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-05-04 18:07:09 +08:00
dfb3f70bc5 [fix][ci] fix two pipelines that fails on the main branch (#1378) 2025-05-04 08:02:08 +08:00
9e4074b71a [ci][fix] Enable part of ray test to be run on CPU machine (#1372) 2025-05-03 18:23:33 +08:00
52437be1a6 [trainer] breaking: pass dataset as required args to SFTTrainer; also change ppo ray trainer to take custom datasets as inputs (#1282) 2025-05-02 21:03:22 -07:00
cee3dca867 docs: Add runllm widget for VeRL Doc sites (#1366)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Add runllm widget for https://app.readthedocs.org/projects/verl/ 

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-02 16:28:45 -07:00
78abf052e8 [ray] feat: Making decorator register available for async function (#1370)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR enables the decorators to be applied to async functions.

### High-Level Design

* Simply added an inner wrapper function for async functions inside
the `register` function.

### Usage Example

```python
  @register(dispatch_mode=Dispatch.ONE_TO_ALL, blocking=False)
  async def async_fn(self, sleep_time):
      return await asyncio.sleep(sleep_time * 0.1)
```
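
A minimal sketch of a decorator that wraps both sync and async functions. It is illustrative only: `register_sketch` is a hypothetical stand-in and does not reproduce verl's `register` or its dispatch logic:

```python
import asyncio
import functools
import inspect


def register_sketch(**attrs):
    """Attach metadata to a function, keeping coroutine functions awaitable."""

    def decorator(func):
        if inspect.iscoroutinefunction(func):
            @functools.wraps(func)
            async def inner(*args, **kwargs):
                return await func(*args, **kwargs)
        else:
            @functools.wraps(func)
            def inner(*args, **kwargs):
                return func(*args, **kwargs)

        inner.registered_attrs = dict(attrs)  # hypothetical metadata slot
        return inner

    return decorator


@register_sketch(blocking=False)
async def double_later(x):
    await asyncio.sleep(0)
    return 2 * x


@register_sketch(blocking=True)
def double_now(x):
    return 2 * x


print(asyncio.run(double_later(3)), double_now(3))
```

Branching on `inspect.iscoroutinefunction` keeps the wrapped async function a coroutine function, so callers can still `await` it.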

### Test

* `tests/ray/test_decorator.py`

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
2025-05-02 16:25:14 -07:00
0035afee9c [dataproto] feat: Add auto padding for DataProto (#1356)
### Checklist Before Starting

- [x] Search for similar PR(s).

Coming from #577 , credit to @zw0610 

### What does this PR do?

Today, users must manually duplicate (repeat) a DataProto so its batch
size matches the data‑parallel (dp) size of the target WorkerGroup. This
PR enables `auto_padding` to pad the `DataProto` when chunk is called.

### Specific Changes

* Enriched the `DataProto` so that it can have context of padding during
chunking;
* Modified `decorator.py` so that a `DataProto` can be automatically
padded and chunked with `dispatch_dp_compute_data_proto`;
* Added unit tests under `tests/ray/test_auto_padding.py`.
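
A minimal sketch of the pad, chunk, and unpad idea on plain Python lists. It is illustrative only and does not use the `DataProto` API: the helpers `pad_to_divisor`, `chunk`, and `unpad` are hypothetical, and padding by repeating existing items is an assumption:

```python
def pad_to_divisor(items, divisor):
    """Repeat leading items until len(items) is a multiple of divisor."""
    pad = (-len(items)) % divisor
    padded = items + [items[i % len(items)] for i in range(pad)]
    return padded, pad


def chunk(items, num_chunks):
    """Split items into num_chunks equal parts (length must be divisible)."""
    size = len(items) // num_chunks
    return [items[i * size:(i + 1) * size] for i in range(num_chunks)]


def unpad(items, pad):
    """Drop the trailing padded entries."""
    return items[:len(items) - pad] if pad else items


batch = list(range(10))  # batch size 10, data-parallel size 4
padded, pad = pad_to_divisor(batch, 4)
parts = chunk(padded, 4)
merged = [x for part in parts for x in part]
restored = unpad(merged, pad)
print(pad, [len(p) for p in parts], restored == batch)
```

Recording the pad size alongside the padded batch is what allows the gathered outputs to be trimmed back to the original batch size.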

### API

Two new APIs are introduced under `DataProto`: `padding` and
`is_padding_enabled`.


### Test

Tests added to `tests/ray/test_auto_padding.py`

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
Co-authored-by: Wang Zhang <zhangwang.nozomi@bytedance.com>
Co-authored-by: Wang Zhang <zw199006@gmail.com>
2025-05-02 10:21:27 -07:00
52d8ae3179 [docs] fix: Fix Arxiv Link (#1364)
Arxiv link is not rendering on github or
https://verl.readthedocs.io/en/latest/index.html#

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Makes external link to arxiv paper resolve properly.

### High-Level Design

N/A

### Specific Changes

Single line doc change

### API

N/A

### Usage Example

N/A

### Test
N/A
### Additional Info.

### Checklist Before Submitting

All N/A
2025-05-02 10:04:29 -07:00
db84a40076 [fsdp] feat: support fsdp2 training and inference in fsdp_workers (#1026)
# What does this PR do?

This PR supports fsdp2 for fsdp_worker. Torch version 2.4 or higher is
required.

# Usage Example

```
sh examples/grpo_trainer/run_qwen2-7b.sh \
    actor_rollout_ref.ref.strategy=fsdp2 \
    actor_rollout_ref.actor.strategy=fsdp2 
```
To save more memory, you can add the parameter below to enable the fsdp2
OffloadPolicy:
``` 
actor_rollout_ref.actor.offload_policy=True  
```
You can see the profile comparison between fsdp1 and fsdp2 here:
https://github.com/volcengine/verl/pull/1026#issuecomment-2824343860

---------

Co-authored-by: lixiaoguang12 <lixiaoguang12@meituan.com>
Co-authored-by: shengguangming <shengguangming@bytedance.com>
2025-05-02 21:03:57 +08:00
3f41534ad2 [installation] doc: Fix pip install instructions (#1353)
### Checklist Before Starting

- [X] Search for similar PR(s).

### What does this PR do?

There should be no space between `.` and `[vllm]` or `[sglang]`, or it
will result in error:

```logs
ERROR: Invalid requirement: '[vllm]': Expected package name at the start of dependency specifier
    [vllm]
```

In addition, I rewrote this part to make the instructions clearer (as
`.. or ..` can't be executed by bash directly).

### Additional Info.

- **Issue Number**: none
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title if it breaks any API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add CI test(s) if necessary.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-05-01 15:30:11 -07:00
335a79da72 [docs] fix: typo (#1351) 2025-05-01 11:20:04 -07:00
ed498f9fa5 [recipe] feat: latest reproduction of DAPO (#1336)
# What does this PR do?

This PR updates the latest reproduction results of DAPO.

## Before submitting

- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [x] Did you write any test cases if neccessary? Please add CI tests to
your new feature.

# Additional Info: 

- **Issue Number**: none
- **Training**: none
- **Inference**: none
2025-05-01 12:03:46 +08:00
0e50afc363 [dev] feat: improve PR template (#1343)
This PR tries to improve the PR template itself.
2025-05-01 12:02:36 +08:00
856f902b46 [FIX] metric_utils log best, worst, maj only for n_resps > 1 (#1248)
Solves #1249

Instead of logging best@1/mean and worst@1/mean, which is identical to
mean@1, just do not log it when there is only one validation response
per prompt (`n_resps == 1`). Same applies to std.

Otherwise we get many duplicated plots that show the same thing. 

The only change is the addition of the `if n_resps > 1:` statement.
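
A minimal sketch of the guard, for illustration only. It is not the code in `metric_utils`: the helper `summarize_scores` is hypothetical, and it uses plain max/min instead of the bootstrapped estimates:

```python
import statistics


def summarize_scores(scores):
    """Return validation metrics for one prompt; skip redundant ones when n_resps == 1."""
    n_resps = len(scores)
    metrics = {f"mean@{n_resps}": statistics.fmean(scores)}
    if n_resps > 1:
        # With a single response these would all duplicate mean@1 (or be undefined).
        metrics[f"std@{n_resps}"] = statistics.stdev(scores)
        metrics[f"best@{n_resps}/mean"] = max(scores)
        metrics[f"worst@{n_resps}/mean"] = min(scores)
    return metrics


print(summarize_scores([0.5]))
print(summarize_scores([0.0, 1.0]))
```

With one response per prompt only `mean@1` is reported, which avoids the duplicated plots.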
2025-05-01 05:11:34 +08:00
c9787146e2 [test] fix: test arithmetic_sequence failed to run (#1333)
# What does this PR do?

e2e test `arithmetic_sequence` is currently broken, with error
`TypeError: not a string` thrown on code `tokenizer =
AutoTokenizer.from_pretrained(local_path)` when running
`tests/e2e/run_ray_trainer.sh`. This PR aims to fix it.

In the `arithmetic_sequence` task, `tests.e2e.envs.digit_completion`
module was imported in the beginning but not used. This import seems
meaningless. However, when this library is imported,
`AutoTokenizer.register()` will be called to set configurations for
`AutoTokenizer`. Only after that can `AutoTokenizer` be successfully
initialized in test code to perform subsequent tasks.

## Timeline

- In #934 , to improve CI efficiency, the CI corresponding to
`arithmetic_sequence` was removed.
- In #1010 , according to the `unused_import` rule, this import was
deleted, triggering the bug.

# ChangeLog

- `AutoTokenizer.register` was added explicitly, which ensures the
configurations were set before initialization of `AutoTokenizer`.


# Usage

- the original code `tests/e2e/run_ray_trainer.sh` is available for
tests.

```bash
bash tests/e2e/run_ray_trainer.sh
``` 

## Before submitting

- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [x] Did you write any test cases if necessary? Please add CI tests to
your new feature.

# Additional Info: 
- **Issue Number**: none
- **Training**: none
- **Inference**: none
2025-04-30 19:46:07 +08:00
1d66de22e9 [feat] add FusedWorker (#1278)
on behalf of @zw0610 

FusedWorker is designed to enhance the capabilities of colocated workers.

FusedWorker keeps most of the colocated-worker interfaces: users call
`create_colocated_worker_cls_fused` to create the colocated worker
class, and use `spawn` to split a FusedWorker into a dict of workers.

With colocated workers, the methods of child workers are accessed either
by calling `spawn` and going through the worker dict, or by calling
`{worker_group}.{worker}_{method}`. In FusedWorker, the first approach
is preserved, while the second is replaced: first call
`{worker_group}.fuse(prefixes)` to bind the workers to the worker group,
then use `{worker_group}.{worker}.foo()` to access child workers.
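
The access pattern can be pictured with a toy object. `ToyWorkerGroup`,
`Actor`, and `Critic` are hypothetical stand-ins, not verl classes; only
the `fuse(prefixes)` then `{worker_group}.{worker}.foo()` shape is taken
from the description above.

```python
class Actor:
    def foo(self) -> str:
        return "actor.foo"


class Critic:
    def foo(self) -> str:
        return "critic.foo"


class ToyWorkerGroup:
    """Hypothetical stand-in for a fused worker group; not verl's class."""

    def __init__(self, workers: dict[str, object]):
        self._workers = workers

    def fuse(self, prefixes: list[str]) -> None:
        # Bind each child worker as an attribute named after its prefix.
        for prefix in prefixes:
            setattr(self, prefix, self._workers[prefix])


group = ToyWorkerGroup({"actor": Actor(), "critic": Critic()})
group.fuse(["actor", "critic"])
# Child workers are now reached as {worker_group}.{worker}.foo().
print(group.actor.foo())
print(group.critic.foo())
```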
2025-04-30 17:29:19 +08:00
d7c3d127ca [doc] fix dataset path for gsm8k and url error (#1327)
# What does this PR do?

Fix the dataset path for gsm8k and some URL errors.

# ChangeLog:

Change the README file to fix the gsm8k download path.

# Usage

- You can add one use example below.

```python
# Add code snippet or script demonstrating how to use this 
```
- For algorithm implementation and new model support, you can add
training curve plots and evaluation results below.

## Before submitting

- [ ] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [ ] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [ ] Did you write any test cases if necessary? Please add CI tests to
your new feature.

# Additional Info: 
- **Issue Number**: Fixes issue # or discussion # if any. 
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
2025-04-30 15:18:58 +08:00
HL
940caadf72 docs: add community blogs and fix link rendering (#1324)
# What does this PR do?

Add one-line overview of what this PR aims to achieve or accomplish. 

# ChangeLog:

- Add two reference blogs to README

# Usage

None

## Before submitting

- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [ ] Did you write any test cases if necessary? No tests needed.
2025-04-30 09:46:04 +08:00
6d58ca6ea0 cancel bootstrapping for n=n_samples (#1320)
# What does this PR do?

The validation metrics currently bootstrap their estimates by randomly
sampling 1, 2, 4, 8, 16, ..., n_samples results out of n_samples results.
However, this bootstrapping doesn't make sense for `n=n_samples`, because
you cannot gain more information about the estimate for `pass@n_samples`
if you only have `n_samples` samples.

This produces odd results when doing RL with only one problem in the
validation set (best@N is a value between 0 and 1 instead of exactly 0
or 1).

This PR turns off bootstrapping for the `n=n_samples` case and leaves the
rest of the computations unchanged.
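
A minimal sketch of the idea, assuming a hypothetical helper `best_at_n`
that resamples with replacement (the real metric code may differ in both
naming and sampling details):

```python
import random


def best_at_n(scores: list[float], n: int, n_bootstrap: int = 1000, seed: int = 0) -> float:
    """Estimate best@n from per-response scores (hypothetical helper)."""
    if n >= len(scores):
        # Resampling n_samples out of n_samples adds no information, so the
        # estimate is just the maximum over all available responses.
        return max(scores)
    rng = random.Random(seed)
    # Bootstrap: average the best score over many resamples of size n.
    draws = [max(rng.choices(scores, k=n)) for _ in range(n_bootstrap)]
    return sum(draws) / n_bootstrap


scores = [0.0, 0.0, 1.0, 0.0]
print(best_at_n(scores, n=4))  # exact maximum over all responses
print(best_at_n(scores, n=2))  # bootstrapped mean over resamples
```

With binary rewards and a single validation problem, the `n=4` value is
exactly 0 or 1, while the bootstrapped `n=2` value falls in between.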
2025-04-30 09:45:14 +08:00
015db832dc [fix] Remove grad_offload in rloo example script (#1323)
# What does this PR do?

The `grad_offload` option was removed in #284 for the FSDP backend, so
the current script errors out.

# ChangeLog:

- Remove grad_offload in rloo example script

# Usage

- Run the changed script

## Before submitting

- [X] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [X] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [X] Did you write any test cases if necessary? Please add CI tests to
your new feature.

# Additional Info: 
- **Issue Number**: N/A
- **Training**: FSDP
- **Inference**: None

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-04-30 08:54:29 +08:00
e0d035cd4a [sglang] feat: Add SGLang async multi-turn rollout with tool support (#1037)
A redesigned version of #917 

## Current Status
[Develop log &
Tracker](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/113)

**What Has Been Done**
- Async Rollout Refactoring: Integrate with the tool server to
coordinate tool calls during generation, leveraging request IDs for
state and progress tracking, and support async multi-turn conversations
in agentic RL training (with tool support).
- Async Request Management: Encapsulate rollout requests into a unified
structure, enabling efficient tracking and handling of concurrent
multi-turn dialogues with ChatML-style messages.
- Extensible Tools: A modular design for adapting tools in the
OpenAIFunctionTool format, which is supported by both SGLang and vLLM.
Each tool creates a separate instance, executes when a tool call is
made, calculates a score according to the tool environment state, and
releases its resources.
- Multi-turn support has been implemented for the GSM8K task (a new
version is in progress). However, training has not yet converged, and we
hope the community can help investigate the issue.
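
The tool lifecycle described above (create an instance, execute on a tool
call, score from the tool state, release) can be sketched as follows.
`CalculatorTool` and its method names are hypothetical, and the sketch is
synchronous for brevity, whereas the rollout itself is async:

```python
import uuid


class CalculatorTool:
    """Hypothetical tool following the create/execute/score/release cycle."""

    def __init__(self):
        self._state: dict[str, list[float]] = {}

    def create(self) -> str:
        # One isolated instance per rollout request.
        instance_id = str(uuid.uuid4())
        self._state[instance_id] = []
        return instance_id

    def execute(self, instance_id: str, a: float, b: float) -> float:
        # Called whenever the model emits a tool call.
        result = a + b
        self._state[instance_id].append(result)
        return result

    def calc_reward(self, instance_id: str, expected: float) -> float:
        # Score from the tool's own state: did the last call match?
        history = self._state[instance_id]
        return 1.0 if history and history[-1] == expected else 0.0

    def release(self, instance_id: str) -> None:
        # Free per-request resources once the rollout finishes.
        del self._state[instance_id]


tool = CalculatorTool()
request_id = tool.create()
tool.execute(request_id, 2, 3)
reward = tool.calc_reward(request_id, expected=5)
tool.release(request_id)
print(reward)
```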

**What Is WIP**
- [x] Merge loss mask to training process from last version
- [x] Add more user friendly tool config and e2e tests for gsm8k with
tool training
- [ ] We are going to validate our multiturn feature in open-source
sandbox environments.

## Key Features to Be Introduced in Future Versions

- Integrate a Ray-based agent trainer to enable explicit separation of
the rollout and training pipeline. Provide support for partial rollout
handling and fine-grained request state management.
- Extend the framework to support simulated user interactions (e.g.,
roleplay, interactive feedback) and more complex environment-in-the-loop
RL tasks.

**Future Plan**
[Discussion
Thread](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/74#issuecomment-2763192625)
[RFC
doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md)
will be updated soon.

## Contributors & Acknowledgement

- Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com)
@SwordFaith (Design RFC & core-dev of refactor part)
- Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com)
@zyzshishui (Core-dev)
- Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com)
@zhaochenyang20 (PM)
- Guanhua Wang @WANG-GH 
- Junrong Lin @ocss884 (verl-sglang support)
- Hanchen Zhang
[zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com)
- Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com)
- Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com)
- Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com)
- Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com)
- Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu)
- Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn)
@zh-zheng

---------

Co-authored-by: zyzshishui <492129152@qq.com>
Co-authored-by: guanhua <281484683@qq.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: ocss884 <ocss.lin@gmail.com>
Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
Co-authored-by: HL <linhaibin.eric@gmail.com>
2025-04-29 13:20:06 -07:00
0234d8e3ab fix reward model and add CI test (#1252)
Fix bugs related to #1165 .

The Megatron backend reward model has no CI test, so this PR adds one to
the current PPO trainer CI.

Fix `micro_batch_size_per_gpu`, though it is unclear whether this is
right for the reward config.

The output format is also incorrect with the current
`forward_micro_batch` implementation.
2025-04-29 21:20:21 +08:00
7299763c06 [vllm] add moe patch for qwen3-moe (#1316)
# What does this PR do?

Add moe patch for qwen3-moe. Fix the weight loader issue in vLLM MoE
models. This isn’t a permanent solution, and we may need to contribute
code to vLLM to address the problem caused by FusedMoE. I’m already
seeking suggestions for this.

# ChangeLog:

- Add Qwen3MoeForCausalLM class for moe_patch
2025-04-29 21:18:45 +08:00
93d2ed5ee8 fix: catch any error in math reward function (#1312)
# What does this PR do?

This PR fixes crashes in the math reward function by catching any
possible error.
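
A minimal sketch of the pattern, assuming a hypothetical parser
`parse_answer` and wrapper `safe_math_reward` (not the actual reward
function in the repository):

```python
def parse_answer(solution: str) -> float:
    """Hypothetical parser that may raise on malformed model output."""
    return float(solution.strip().split("=")[-1])


def safe_math_reward(solution: str, ground_truth: float) -> float:
    """Return 1.0 for a correct answer, 0.0 otherwise, and never raise."""
    try:
        return 1.0 if parse_answer(solution) == ground_truth else 0.0
    except Exception:
        # Any parsing or comparison failure counts as an incorrect answer
        # instead of crashing the training loop.
        return 0.0


print(safe_math_reward("x = 4", 4.0))
print(safe_math_reward("x = four", 4.0))
```

The trade-off is that genuine bugs in the parser are also scored as 0.0,
so logging the exception may be worthwhile in practice.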

## Before submitting

- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [x] Did you write any test cases if necessary? Please add CI tests to
your new feature.

# Additional Info: 
- **Issue Number**: None
- **Training**: None
- **Inference**: None
2025-04-29 18:58:17 +08:00
1e75fc04b5 [docs] add pr template (#1287)
# What does this PR do?

Add the PR template to improve the readability of PRs.

## Before submitting

- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [ ] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [ ] Did you write any test cases if necessary? Please add CI tests to
your new feature.
2025-04-29 15:20:39 +08:00
HL
1c66aab162 docs: add DeepWiki and ICLR links (#1283) 2025-04-29 13:52:42 +08:00
1f3cbfcf19 [doc] add the multi modal doc (#1292)
## Motivation
There is currently no documentation for multimodal tasks in verl, so I
think we need to add a related document.
2025-04-29 13:44:38 +08:00
HL
958eae3523 [example] chore: remove verl_getting_started.ipynb (#1281)
Remove the outdated notebook.
2025-04-29 10:55:27 +08:00
f9dae2bb11 [CI] feat: only check changed files (#1294) 2025-04-28 20:24:41 +08:00
ba38413aa5 Option to make model private when pushing to hub, pushing the tokenizer for convenience (#1259)
Very small changes to `model_merger.py` so that tokenizer is pushed to
hub and model can be pushed privately.
2025-04-28 20:17:42 +08:00
ea4cd31987 [merger] fix: merged generation config is inconsistent with hf pre-trained model (#1277)
afeac9a023/scripts/model_merger.py (L195-L200)

A model created by `from_config` won't load the `generation_config.json`
from `args.hf_model_path`; instead, it creates a generation config
separately.

This inconsistency leads to strange generation errors when users run
vllm/hf rollout without carefully overriding
sampling_params/generation_config; see the issue here:
https://github.com/volcengine/verl/issues/1246

This PR introduces a `patch_model_generation_config` function, which
patches the model created from config to correctly use the pretrained
generation config.
Fixes https://github.com/volcengine/verl/issues/1246.
2025-04-28 09:23:19 +08:00
1971133d23 [doc] fix: fix 2 minor issues in installation and reward explanation (#1215)
close
- #1214 
- #1213

Co-authored-by: HL <linhaibin.eric@gmail.com>
2025-04-27 15:40:17 -07:00
b75c4e16d6 [logging] fix: typo of fsdp_checkpoint_manager saving optim path (#1276)
Fix a minor typo in the printed optim saving path in
fsdp_checkpoint_manager.py.
2025-04-27 15:30:30 -07:00
8e5ad4688a [Lint] fix: linting errors in all files (#1280)
This PR enables checking on all files after fixing all the errors:

```
examples/data_preprocess/geo3k.py:41:121: E501 Line too long (121 > 120)
examples/data_preprocess/multiturn.py:54:121: E501 Line too long (185 > 120)
examples/data_preprocess/multiturn.py:59:121: E501 Line too long (210 > 120)
examples/data_preprocess/multiturn.py:73:121: E501 Line too long (229 > 120)
examples/data_preprocess/multiturn.py:78:121: E501 Line too long (211 > 120)
examples/ray/tutorial.ipynb:cell 9:1:121: E501 Line too long (179 > 120)
examples/ray/tutorial.ipynb:cell 15:1:121: E501 Line too long (143 > 120)
examples/ray/tutorial.ipynb:cell 42:14:1: E402 Module level import not at top of cell
recipe/prime/prime_dp_rm.py:145:121: E501 Line too long (153 > 120)
recipe/prime/prime_dp_rm.py:156:121: E501 Line too long (137 > 120)
recipe/prime/prime_dp_rm.py:292:121: E501 Line too long (148 > 120)
recipe/r1/data_process.py:56:121: E501 Line too long (289 > 120)
recipe/r1/data_process.py:113:121: E501 Line too long (166 > 120)
recipe/r1/data_process.py:118:121: E501 Line too long (137 > 120)
recipe/r1/data_process.py:123:121: E501 Line too long (297 > 120)
recipe/r1/data_process.py:131:9: E722 Do not use bare `except`
recipe/r1/tasks/livecodebench.py:61:5: E722 Do not use bare `except`
scripts/diagnose.py:55:9: F841 Local variable `ip` is assigned to but never used
scripts/diagnose.py:165:13: B028 No explicit `stacklevel` keyword argument found
scripts/model_merger.py:42:121: E501 Line too long (184 > 120)
scripts/model_merger.py:146:13: E722 Do not use bare `except`
tests/e2e/arithmetic_sequence/model/create_model_tokenizer.py:28:121: E501 Line too long (440 > 120)
tests/gpu_utility/test_memory_buffers.py:42:5: F841 Local variable `model_named_params` is assigned to but never used
tests/gpu_utility/test_memory_buffers.py:43:5: F841 Local variable `model_copy_named_params` is assigned to but never used
tests/gpu_utility/test_memory_buffers.py:53:5: F841 Local variable `model_wrapper` is assigned to but never used
tests/model/test_transformers_ulysses.py:102:5: F841 Local variable `response_length` is assigned to but never used
tests/model/test_transformers_ulysses.py:181:5: F841 Local variable `response_length` is assigned to but never used
tests/ray/detached_worker/server.py:83:13: F841 Local variable `vpp_rank` is assigned to but never used
tests/ray/test_check_worker_alive.py:37:121: E501 Line too long (121 > 120)
tests/rollout/run_fsdp_vllm.py:22:64: F811 Redefinition of unused `ShardingStrategy` from line 20
tests/rollout/test_sglang_spmd.py:210:121: E501 Line too long (157 > 120)
tests/rollout/test_vllm_spmd.py:20:64: F811 Redefinition of unused `ShardingStrategy` from line 18
tests/sandbox/test_sandbox.py:86:121: E501 Line too long (1615 > 120)
tests/sandbox/test_sandbox.py:87:121: E501 Line too long (1596 > 120)
tests/sanity/check_license.py:22:1: E402 Module level import not at top of file
tests/sanity/check_license.py:23:1: E402 Module level import not at top of file
tests/verl/utils/dataset/test_rl_dataset.py:23:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_rm_dataset.py:22:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_rm_dataset.py:36:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
tests/verl/utils/dataset/test_sft_dataset.py:22:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_sft_dataset.py:50:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
tests/verl/utils/dataset/test_sft_dataset.py:75:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/__init__.py:22:1: E402 Module level import not at top of file
verl/__init__.py:24:1: E402 Module level import not at top of file
verl/__init__.py:25:1: E402 Module level import not at top of file
verl/__init__.py:29:1: E402 Module level import not at top of file
verl/__init__.py:29:15: F401 `.single_controller` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:16:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:18:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:20:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:21:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:22:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:24:5: F401 `.modeling_llama_megatron.ParallelLlamaModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/checkpoint_utils/llama_loader.py:92:121: E501 Line too long (168 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:92:121: E501 Line too long (168 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:274:121: E501 Line too long (127 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelLlamaAttention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelLlamaDecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelLlamaDecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelLlamaMLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelLlamaRMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/parallel_attention.py:196:121: E501 Line too long (134 > 120)
verl/models/llama/megatron/layers/parallel_attention.py:341:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:342:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:343:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:366:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:420:121: E501 Line too long (122 > 120)
verl/models/llama/megatron/layers/parallel_linear.py:82:1: E402 Module level import not at top of file
verl/models/mcore/loader.py:273:121: E501 Line too long (134 > 120)
verl/models/mcore/util.py:26:121: E501 Line too long (202 > 120)
verl/models/qwen2/megatron/__init__.py:16:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:18:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:20:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:21:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:22:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:24:5: F401 `.modeling_qwen2_megatron.ParallelQwen2Model` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:90:121: E501 Line too long (169 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:256:121: E501 Line too long (172 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:90:121: E501 Line too long (169 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:272:121: E501 Line too long (127 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelQwen2Attention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelQwen2DecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelQwen2DecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelQwen2MLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelQwen2RMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/parallel_attention.py:163:121: E501 Line too long (134 > 120)
verl/models/qwen2/megatron/layers/parallel_attention.py:282:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:283:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:284:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:307:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:361:121: E501 Line too long (122 > 120)
verl/models/qwen2/megatron/modeling_qwen2_megatron.py:630:121: E501 Line too long (130 > 120)
verl/models/transformers/llama.py:106:121: E501 Line too long (180 > 120)
verl/models/transformers/llama.py:214:121: E501 Line too long (128 > 120)
verl/models/transformers/llama.py:215:121: E501 Line too long (135 > 120)
verl/models/transformers/monkey_patch.py:145:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:146:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:148:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:157:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/models/transformers/qwen2.py:215:121: E501 Line too long (128 > 120)
verl/models/transformers/qwen2.py:216:121: E501 Line too long (135 > 120)
verl/protocol.py:303:121: E501 Line too long (125 > 120)
verl/protocol.py:352:121: E501 Line too long (171 > 120)
verl/protocol.py:578:121: E501 Line too long (142 > 120)
verl/protocol.py:580:121: E501 Line too long (150 > 120)
verl/protocol.py:583:121: E501 Line too long (167 > 120)
verl/protocol.py:715:1: E402 Module level import not at top of file
verl/protocol.py:725:121: E501 Line too long (121 > 120)
verl/protocol.py:766:1: E402 Module level import not at top of file
verl/protocol.py:768:1: E402 Module level import not at top of file
verl/single_controller/__init__.py:23:1: E402 Module level import not at top of file
verl/single_controller/__init__.py:24:1: E402 Module level import not at top of file
verl/single_controller/base/decorator.py:149:16: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/single_controller/base/decorator.py:198:121: E501 Line too long (134 > 120)
verl/single_controller/base/decorator.py:310:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/single_controller/base/worker.py:137:121: E501 Line too long (131 > 120)
verl/single_controller/base/worker_group.py:89:33: G003 Logging statement uses `+`
verl/single_controller/base/worker_group.py:202:21: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/single_controller/ray/__init__.py:15:19: F401 `.base.RayClassWithInitArgs` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:41: F401 `.base.RayResourcePool` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:58: F401 `.base.RayWorkerGroup` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:74: F401 `.base.create_colocated_worker_cls` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/third_party/sglang/parallel_state.py:135:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/__init__.py:40:40: F401 `.vllm_v_0_6_3.llm.LLMEngine` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/third_party/vllm/__init__.py:45:22: F401 `vllm.LLM` imported but unused
verl/third_party/vllm/__init__.py:46:34: F401 `vllm.distributed.parallel_state` imported but unused
verl/third_party/vllm/__init__.py:50:121: E501 Line too long (141 > 120)
verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py:189:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_5_4/llm.py:136:121: E501 Line too long (132 > 120)
verl/third_party/vllm/vllm_v_0_5_4/llm.py:196:121: E501 Line too long (161 > 120)
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:174:5: F811 Redefinition of unused `llama_megatron_core_te_weight_loader` from line 90
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:205:5: F811 Redefinition of unused `llama_megatron_core_weight_loader` from line 121
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:254:121: E501 Line too long (150 > 120)
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:21: F811 Redefinition of unused `LoadConfig` from line 24
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:45: F811 Redefinition of unused `ModelConfig` from line 26
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:323:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:127:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:245:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120)
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120)
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/worker.py:220:121: E501 Line too long (127 > 120)
verl/third_party/vllm/vllm_v_0_6_3/config.py:46:92: B026 Star-arg unpacking after a keyword argument is strongly discouraged
verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py:225:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_6_3/llm.py:141:121: E501 Line too long (132 > 120)
verl/third_party/vllm/vllm_v_0_6_3/llm.py:169:121: E501 Line too long (161 > 120)
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:52:24: F811 Redefinition of unused `EngineArgs` from line 35
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:21: F811 Redefinition of unused `LoadConfig` from line 25
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:33: F811 Redefinition of unused `ModelConfig` from line 27
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:354:9: F841 Local variable `distributed_executor_backend` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:360:121: E501 Line too long (152 > 120)
verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:199:5: F841 Local variable `params_mapping` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:229:121: E501 Line too long (150 > 120)
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:21: F811 Redefinition of unused `LoadConfig` from line 22
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:45: F811 Redefinition of unused `ModelConfig` from line 22
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:312:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:21: F811 Redefinition of unused `LoadConfig` from line 27
verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:33: F811 Redefinition of unused `ModelConfig` from line 29
verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:129:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:247:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120)
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120)
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/worker.py:217:121: E501 Line too long (127 > 120)
verl/trainer/fsdp_sft_trainer.py:298:121: E501 Line too long (158 > 120)
verl/trainer/fsdp_sft_trainer.py:501:121: E501 Line too long (121 > 120)
verl/trainer/fsdp_sft_trainer.py:550:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:551:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:553:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:553:43: F811 Redefinition of unused `FSDPSFTTrainer` from line 82
verl/trainer/fsdp_sft_trainer.py:554:1: E402 Module level import not at top of file
verl/utils/__init__.py:16:24: F401 `.tokenizer.hf_processor` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/__init__.py:16:38: F401 `.tokenizer.hf_tokenizer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/checkpoint/checkpoint_manager.py:48:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/fsdp_checkpoint_manager.py:51:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/fsdp_checkpoint_manager.py:56:13: B028 No explicit `stacklevel` keyword argument found
verl/utils/checkpoint/fsdp_checkpoint_manager.py:81:121: E501 Line too long (121 > 120)
verl/utils/checkpoint/fsdp_checkpoint_manager.py:98:121: E501 Line too long (124 > 120)
verl/utils/checkpoint/megatron_checkpoint_manager.py:64:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/megatron_checkpoint_manager.py:219:121: E501 Line too long (124 > 120)
verl/utils/dataset/__init__.py:15:25: F401 `.rl_dataset.RLHFDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/__init__.py:16:25: F401 `.rm_dataset.RMDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/__init__.py:17:26: F401 `.sft_dataset.SFTDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/multiturn_sft_dataset.py:96:9: F841 Local variable `current_length` is assigned to but never used
verl/utils/dataset/sft_dataset.py:95:79: B023 Function definition does not bind loop variable `key`
verl/utils/dataset/sft_dataset.py:103:83: B023 Function definition does not bind loop variable `key`
verl/utils/debug/__init__.py:15:26: F401 `.performance.GPUMemoryLogger` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/debug/__init__.py:15:43: F401 `.performance.log_gpu_memory_usage` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/debug/performance.py:68:121: E501 Line too long (127 > 120)
verl/utils/debug/performance.py:71:121: E501 Line too long (126 > 120)
verl/utils/debug/profile.py:15:1: I001 [*] Import block is un-sorted or un-formatted
verl/utils/debug/profile.py:19:15: UP039 [*] Unnecessary parentheses after class definition
verl/utils/debug/profile.py:50:23: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:52:49: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:53:47: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:54:67: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:54:121: E501 Line too long (122 > 120)
verl/utils/flops_counter.py:175:121: E501 Line too long (124 > 120)
verl/utils/hdfs_io.py:135:32: G004 Logging statement uses f-string
verl/utils/import_utils.py:78:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/utils/logger/aggregate_logger.py:46:121: E501 Line too long (131 > 120)
verl/utils/logger/aggregate_logger.py:64:41: G004 Logging statement uses f-string
verl/utils/megatron/tensor_parallel.py:152:121: E501 Line too long (123 > 120)
verl/utils/megatron_utils.py:17:1: I001 [*] Import block is un-sorted or un-formatted
verl/utils/megatron_utils.py:22:20: F401 [*] `torch.nn` imported but unused
verl/utils/megatron_utils.py:34:38: F401 [*] `verl.utils.memory_buffer.build_memory_reference_from_module` imported but unused
verl/utils/megatron_utils.py:332:30: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access.
verl/utils/megatron_utils.py:366:27: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access.
verl/utils/model.py:464:121: E501 Line too long (124 > 120)
verl/utils/rendezvous/ray_backend.py:39:25: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:41:22: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:63:30: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:65:30: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:72:26: G004 Logging statement uses f-string
verl/utils/reward_score/gsm8k.py:47:121: E501 Line too long (201 > 120)
verl/utils/reward_score/math.py:213:121: E501 Line too long (142 > 120)
verl/utils/reward_score/prime_code/__init__.py:16:8: F401 `re` imported but unused
verl/utils/reward_score/prime_code/testing_util.py:131:121: E501 Line too long (688 > 120)
verl/utils/reward_score/prime_code/testing_util.py:168:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:222:9: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:254:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:255:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:259:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:260:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:264:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:265:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:269:121: E501 Line too long (132 > 120)
verl/utils/reward_score/prime_code/testing_util.py:293:21: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:294:25: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:335:121: E501 Line too long (165 > 120)
verl/utils/reward_score/prime_code/testing_util.py:386:121: E501 Line too long (209 > 120)
verl/utils/reward_score/prime_code/testing_util.py:390:121: E501 Line too long (183 > 120)
verl/utils/reward_score/prime_code/testing_util.py:455:121: E501 Line too long (211 > 120)
verl/utils/reward_score/prime_code/testing_util.py:459:121: E501 Line too long (185 > 120)
verl/utils/reward_score/prime_code/testing_util.py:582:121: E501 Line too long (197 > 120)
verl/utils/reward_score/prime_code/testing_util.py:586:121: E501 Line too long (171 > 120)
verl/utils/reward_score/prime_math/__init__.py:106:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:119:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:246:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:315:121: E501 Line too long (128 > 120)
verl/utils/reward_score/prime_math/__init__.py:331:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:407:1: E402 Module level import not at top of file
verl/utils/reward_score/prime_math/__init__.py:429:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading
verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading
verl/utils/reward_score/prime_math/math_normalize.py:54:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:70:17: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:101:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:181:121: E501 Line too long (142 > 120)
verl/utils/tokenizer.py:30:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/tokenizer.py:33:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/tokenizer.py:55:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/torch_functional.py:86:72: E741 Ambiguous variable name: `l`
verl/utils/torch_functional.py:177:5: F841 Local variable `total_params` is assigned to but never used
verl/utils/torch_functional.py:397:1: E402 Module level import not at top of file
verl/utils/torch_functional.py:399:1: E402 Module level import not at top of file
verl/utils/torch_functional.py:400:1: E402 Module level import not at top of file
verl/utils/ulysses.py:246:5: F841 Local variable `sp_size` is assigned to but never used
verl/workers/actor/dp_actor.py:244:13: F841 Local variable `response_mask` is assigned to but never used
verl/workers/actor/megatron_actor.py:22:1: I001 [*] Import block is un-sorted or un-formatted
verl/workers/actor/megatron_actor.py:85:121: E501 Line too long (122 > 120)
verl/workers/actor/megatron_actor.py:86:121: E501 Line too long (128 > 120)
verl/workers/actor/megatron_actor.py:89:121: E501 Line too long (133 > 120)
verl/workers/actor/megatron_actor.py:96:121: E501 Line too long (126 > 120)
verl/workers/actor/megatron_actor.py:175:121: E501 Line too long (135 > 120)
verl/workers/actor/megatron_actor.py:237:121: E501 Line too long (150 > 120)
verl/workers/actor/megatron_actor.py:243:121: E501 Line too long (144 > 120)
verl/workers/actor/megatron_actor.py:245:121: E501 Line too long (130 > 120)
verl/workers/actor/megatron_actor.py:247:121: E501 Line too long (122 > 120)
verl/workers/actor/megatron_actor.py:286:9: F841 Local variable `input_shapes` is assigned to but never used
verl/workers/critic/dp_critic.py:227:21: F841 Local variable `input_ids` is assigned to but never used
verl/workers/critic/dp_critic.py:230:21: F841 Local variable `position_ids` is assigned to but never used
verl/workers/megatron_workers.py:18:1: I001 [*] Import block is un-sorted or un-formatted
verl/workers/reward_manager/__init__.py:15:20: F401 `.batch.BatchRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:16:19: F401 `.dapo.DAPORewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:17:20: F401 `.naive.NaiveRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:18:20: F401 `.prime.PrimeRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/prime.py:61:121: E501 Line too long (217 > 120)
verl/workers/reward_model/__init__.py:15:19: F401 `.base.BasePPORewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_model/megatron/__init__.py:15:27: F401 `.reward_model.MegatronRewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_model/megatron/reward_model.py:65:9: F841 Local variable `ori_bs` is assigned to but never used
verl/workers/reward_model/megatron/reward_model.py:89:121: E501 Line too long (132 > 120)
verl/workers/reward_model/megatron/reward_model.py:215:9: F841 Local variable `input_shapes` is assigned to but never used
verl/workers/rollout/naive/__init__.py:15:28: F401 `.naive_rollout.NaiveRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/rollout/sglang_rollout/__init__.py:14:29: F401 `.sglang_rollout.SGLangRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:22:121: E501 Line too long (129 > 120)
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:51:121: E501 Line too long (157 > 120)
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:153:13: F841 Local variable `log_probs` is assigned to but never used
verl/workers/rollout/vllm_rollout/vllm_rollout.py:22:121: E501 Line too long (129 > 120)
verl/workers/rollout/vllm_rollout/vllm_rollout.py:60:121: E501 Line too long (157 > 120)
verl/workers/sharding_manager/__init__.py:16:5: F401 `verl.utils.import_utils.is_megatron_core_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:17:5: F401 `verl.utils.import_utils.is_sglang_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:21:19: F401 `.base.BaseShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:22:27: F401 `.fsdp_ulysses.FSDPUlyssesShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:29:121: E501 Line too long (149 > 120)
verl/workers/sharding_manager/__init__.py:32:121: E501 Line too long (126 > 120)
verl/workers/sharding_manager/fsdp_sglang.py:99:9: F841 Local variable `load_format` is assigned to but never used
verl/workers/sharding_manager/fsdp_sglang.py:123:121: E501 Line too long (178 > 120)
verl/workers/sharding_manager/fsdp_ulysses.py:59:13: F841 Local variable `sp_size` is assigned to but never used
Found 305 errors.
```

---------

Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
2025-04-27 15:24:30 -07:00
0fb0bedb7f [profile] print cuda system memory and offload actor model after init (#1118)
Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
2025-04-28 02:11:38 +08:00
cea529116f feat: move AsyncLLM ChatCompletionScheduler to separate thread (#1274)
Move AsyncLLM ChatCompletionScheduler to separate thread to avoid making
PPOTrainer async class.
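
A minimal sketch of the pattern described above, assuming an asyncio-based scheduler: the event loop runs in its own thread, so a synchronous trainer can submit coroutines and block on the result. `BackgroundLoop` and `fake_chat_completion` are hypothetical names for illustration, not verl's API.

```python
import asyncio
import threading


class BackgroundLoop:
    """Runs an asyncio event loop in a separate daemon thread."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def run(self, coro):
        # Submit the coroutine to the background loop and block until it finishes.
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result()

    def close(self):
        # Stop the loop from its own thread, then wait for the thread to exit.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


async def fake_chat_completion(prompt):
    # Stand-in for an async request to a chat completion server.
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"
```

The caller stays synchronous: `BackgroundLoop().run(fake_chat_completion("hi"))` returns a plain value, so the trainer class itself needs no `async` methods.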
2025-04-27 22:02:52 +08:00
cb6fc3951d Adding GUI-R1 to the Awesome work (#1275) 2025-04-27 22:00:51 +08:00
afeac9a023 [misc] add offload and profile doc, add validate in profile (#1272) 2025-04-27 17:12:14 +08:00
fbb93e44b1 [CI] feat: only test for push to main (#1271) 2025-04-27 09:51:09 +08:00
cc8fca504d [mcore] add offload param and opt function for megatron (#1162)
## Motivation
This is a PR that supports offload in Megatron. Currently, parameters,
gradients, and optimizers can be offloaded to the CPU when not needed. I
have successfully tested the feasibility of the function using the
memory snap tool. Further accuracy testing is still in progress.

## TODO
- [x] Accuracy testing
2025-04-27 02:03:34 +08:00
85a9b09d85 [profile] add profile for megatron train (#1146)
## Motivation
This is a new feature that adds the functionality of collecting profiles
during the training phase. Since the RL process repeatedly enters the
training process, by default, the profile temporarily captures the
results of the first `update_policy`. Moreover, this modification should
be seamlessly integrated into other training frameworks.
2025-04-27 01:59:32 +08:00
64056835b9 [bugfix] fix: add await for _validate() (#1269)
As titled.
2025-04-26 20:32:46 +08:00
281ed3a41a [rollout] feat: support rollout.n > 1 in hf_rollout (#1199)
Currently, the hf rollout backend only supports `rollout.n == 1`; setting
`rollout.n > 1` leads to an error
(https://github.com/volcengine/verl/issues/1134)

This PR makes hf rollout support `do_sample` and `is_validate`, keeping it
consistent with the vllm and sglang backends, and correctly supports
`rollout.n > 1`.
2025-04-25 15:03:22 -07:00
5c3802687f distro: clean req packages. (#1253)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-04-25 07:14:00 -07:00
af15ae12f5 fix: Correct sampling params setting in sglang evaluation (#1181)
This PR fixes an issue where parameters in `val_kwargs` are not
effectively passed during sglang evaluation when `do_sample=True` is
set. Additionally, since the validation data has already been repeated
in `ray_trainer`, the `n` parameter in `sampling_params` needs to be
correctly configured to prevent errors caused by dimension mismatches.
2025-04-25 20:53:54 +08:00
e8cd4196e3 fix: remove deprecated remove_previous_ckpt key in prime_ray_trainer.py (#1254)
The deprecated `remove_previous_ckpt` key causes checkpoint saving to crash.
See: https://github.com/volcengine/verl/issues/1183
2025-04-25 18:12:18 +08:00
5c0426e134 [AMD] Update AMD performance tuning documentation (#1256)
Update AMD performance tuning documentation according to
@yushengsu-thu's suggestion.

1. fix git branch and link
2. fix tab
2025-04-25 18:10:58 +08:00
aacd3660fc [rollout] feat: introduce vLLM AsyncLLM to support multi-turn rollout (#1138)
### Summary
Introduce vLLM AsyncLLM to support multi-turn rollout and #385 #398 #710

### Architecture


![async_llm_arch](https://github.com/user-attachments/assets/e8cd974c-0c26-4d96-9a9e-b71fd85dd32d)



**New Components**:
- AsyncLLMWorker: standalone vllm server instance
  - FastAPI: provide OpenAI-compatible HTTP server
- AsyncLLM: async LLMEngine for online serving, for more details:
[AsyncLLM](https://github.com/vllm-project/vllm/pull/9826),
[LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine)
- ExternalRayDistributedExecutor: custom executor backend that manages the
workers in a worker group; it looks up the corresponding workers by actor name

- AsyncLLManager: manages a group of vllm server
instances (AsyncLLMWorker)
  - AsyncLLM lifecycle: initialization, wake_up, sleep.
  - FastAPI service discovery

- ChatScheduler: schedule multiple chat completion requests with
multiple server instances
  - Least requests load balance
  - Sticky session with prefix caching
  - Chat completion callback: tools calling
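
The least-requests balancing and sticky-session ideas listed above can be sketched as follows. This is an illustrative sketch only: `LeastRequestsRouter`, its methods, and the session-ID keying are assumptions, not the ChatScheduler's actual interface.

```python
class LeastRequestsRouter:
    """Routes requests to the server with the fewest in-flight requests.

    A session that has been routed once keeps going to the same server,
    so that server's prefix cache can be reused for later turns.
    """

    def __init__(self, servers):
        self.in_flight = {server: 0 for server in servers}
        self.sticky = {}  # session_id -> server

    def acquire(self, session_id):
        server = self.sticky.get(session_id)
        if server is None:
            # First turn of this session: pick the least-loaded server.
            server = min(self.in_flight, key=self.in_flight.get)
            self.sticky[session_id] = server
        self.in_flight[server] += 1
        return server

    def release(self, server):
        # Call when the request completes.
        self.in_flight[server] -= 1
```

Stickiness takes priority over load here; a real scheduler would likely also bound the size of the session table.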

### TODO
- [x] AsyncLLM: initialization/wake_up/sleep
- [x] OpenAI API:  support `/v1/chat/completions`
- [x] RayPPOTrainer integration: replace `generate_sequences` with an HTTP
call to `/v1/chat/completions`
- [x] GSM8K e2e training
- [ ] Add document

---------

Co-authored-by: shengguangming <shengguangming@bytedance.com>
2025-04-25 17:56:34 +08:00
984e8a96c9 [proto] feat: Add bool-type index selection for DataProto (#1082)
After the last change, current DataProto cannot use bool-type index due
to hard-coded batch_size equal to idxs.shape[0].

This patch changes the new batch_size for bool-type idx to idxs.sum().
It's useful when users filter the batch with bool-type masks.
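
A minimal sketch of the batch-size rule described above, using plain Python lists instead of tensors. `select` is a hypothetical helper, not the DataProto implementation.

```python
def select(batch, idxs):
    """Select rows of `batch` (a list) by integer indices or by a boolean mask."""
    if idxs and all(isinstance(i, bool) for i in idxs):
        if len(idxs) != len(batch):
            raise ValueError("boolean mask must match the batch length")
        rows = [row for row, keep in zip(batch, idxs) if keep]
        new_batch_size = sum(idxs)  # number of True entries, not len(idxs)
    else:
        rows = [batch[i] for i in idxs]
        new_batch_size = len(idxs)  # one output row per integer index
    assert len(rows) == new_batch_size
    return rows
```

With a mask, the output size is the count of `True` entries, which is why a batch size hard-coded to the index length breaks boolean selection.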
2025-04-24 22:12:24 -07:00
c71b24d2e9 [SGLang] feat: upgrade to 0.4.5.post3 & fix ipv6 (#1203)
The ipv6 part is picked from
https://github.com/volcengine/verl/pull/1184 cc @BearBiscuit05

---------

Co-authored-by: BearBiscuit05 <xiangyongan@bytedance.com>
Co-authored-by: Gelee-Q <leege233@gmail.com>
2025-04-24 18:23:53 -07:00
5080f47df0 [logging] feat: Add step and epoch metrics (#1250)
Solves #1251

Right now the current global step and current epoch are not being
logged. This would be a useful feature.
2025-04-24 13:43:58 -07:00
5bd1ce3f42 [AMD] Add AMD performance tuning documentation (#1240) 2025-04-24 12:42:56 -07:00
7341f52ca5 [logging] feat: Add Rollout and Validation dumps to file (#916)
Co-authored-by: Mert Unsal <mertunsal1905@gmail.com>
2025-04-24 10:31:03 -07:00
f315ac3b98 [misc] refactor moe bash (#1245) 2025-04-24 22:46:47 +08:00
d5a44dabe5 fix: validation top_p=0.7 for DAPO full (#1241) 2025-04-24 16:15:09 +08:00
a35c044627 Migrate to new image with FlashInfer 0.2.2 + vLLM 0.8.3 + SGLang 0.4.5 + MCore 0.12.0 + TE 2.2 + cuDNN 9.8.0 (#1237)
As both are supported, we now let TE choose the attention backend.

New Image:
`whatcanyousee/verl:ngc-cu124-vllm0.8.3-sglang0.4.5-mcore0.12.0-te2.2`
2025-04-24 16:14:48 +08:00
650115fba9 Fix docs about config page. (#1236)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-04-24 13:12:57 +08:00
HL
f01a932f80 [mcore] refactor: remove the mcore patches (#1229) 2025-04-24 09:40:45 +08:00
22f7e2c21c [vllm] update moe patch for megatron and fsdp (#1200)
## Motivation
This is a fix for the issue where the `weight_loader` in FusedMoe of the
vLLM code could not be used correctly during the resharding phase,
addressed in #923, #1137, and #1139 respectively. The results of these
PRs can now be used together, allowing both FSDP and Megatron to use
the same function and reducing code maintenance costs.
2025-04-24 09:40:12 +08:00
7a01e8c4f3 Update ray_debug_tutorial.rst (#1228) 2025-04-24 09:38:23 +08:00
f95cc7bb54 docker: update Dockerfile.sglang (#1207)
Install ray[default] to include missing components
2025-04-23 11:25:04 -07:00
7cfd705451 fixt: typo (#1217)
Alternatively, we should properly expand on the role of the parameter
`mapping`
2025-04-23 19:21:11 +08:00
a5a77680b6 fix util reward_score/math_dapo.py notes. (#1185)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-04-23 19:19:46 +08:00
65f512bbee Update the ray debug tutorial (#1204)
## Motivation

The existing Ray tutorial is difficult to follow and doesn’t explain how
to debug across multiple breakpoints.

## Modifications

- Updated `multinode.rst` 

## Checklist

- [x] Created independent `ray_debugger.rst` with step‑by‑step
instructions
2025-04-23 19:18:54 +08:00
7b6b7cb5b8 clean codes (#1219)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-04-23 18:11:23 +08:00
6a9bef731c [feat] trainer: compute reward during log_prob for ppo trainer (#1114)
### Description
Add a new parameter to ppo_trainer that enables asynchronous reward
computation during the log_probs phase.

This is particularly useful when reward manager is time-consuming and we
want to overlap its computation with GPU-intensive operations, improving
overall throughput.

By default, this parameter is set to False.
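
A rough sketch of the overlap, assuming the reward function is CPU-bound and can run in a worker thread while the log-prob pass runs. All names here (`slow_reward_fn`, `compute_log_prob`, `launch_reward_async`) are hypothetical stand-ins, and the sleeps only simulate work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_reward_fn(responses):
    time.sleep(0.05)  # stand-in for a slow CPU reward computation
    return [float(len(r)) for r in responses]


def compute_log_prob(responses):
    time.sleep(0.05)  # stand-in for the GPU log-prob pass
    return [-0.1 * len(r) for r in responses]


def step(responses, launch_reward_async=False):
    if not launch_reward_async:
        # Sequential: the GPU idles while rewards are computed.
        rewards = slow_reward_fn(responses)
        log_probs = compute_log_prob(responses)
        return rewards, log_probs
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_reward_fn, responses)  # starts immediately
        log_probs = compute_log_prob(responses)          # overlaps with rewards
        rewards = future.result()                        # wait only if needed
    return rewards, log_probs
```

Both paths return the same values; only the wall-clock time differs.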

#### Example: before and after this PR with the parameter set to True
(GRPO on a 1.5b model):
In the following plot, the CPU reward computation function (taking
around 5min in this case) is now called during log prob phases to avoid
wasting GPU resources.
<img width="617" alt="image"
src="https://github.com/user-attachments/assets/eca2ea18-c966-4525-adde-e9cb96878830"
/>

---------

Co-authored-by: mertunsall <mertunsal1905@gmail.com>
2025-04-22 23:10:26 -07:00
4081d8af1f refactor example and test scripts to use megatron comm/comp overlap and checkpoint save (#1202)
The example Megatron scripts are outdated.
2025-04-23 11:30:30 +08:00
HL
f1a18a2785 docs: update iclr news and gair-nlp/cognition-engineering (#1205) 2025-04-22 18:28:00 -07:00
6501e79589 docker: clean redundant pre-commit dockerfile pip-package (#1195)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-04-22 11:45:54 -07:00
ad4881e16b feat: add torch_compile param for ref model (#1164) 2025-04-23 01:13:43 +08:00
6cc3afb04c fix: enable val_batch_size to address the oom issue during validatat… (#986)
…ion when the val dataset is large in multi-modal scenarios

Co-authored-by: 刘涛 <liutao.lt@bytedance.com>
2025-04-22 22:57:26 +08:00
Hao
74d9918568 fix: update assertion of ppo_mini_batch_size and ppo_micro_batch_size_per_gpu (#833)
The original assertion sits inside an `if` block and is only executed when
`ppo_micro_batch_size` is not None; otherwise, training may produce
NaN.

Related to [Issue #405](https://github.com/volcengine/verl/issues/405)
and [PR #382](https://github.com/volcengine/verl/pull/382).
2025-04-22 21:46:57 +08:00
99fdbf6985 Log gpu mem refactor (#1190)
Use a wrapper to log GPU memory when entering and exiting a function.

Set `VERL_LOGGING_LEVEL=DEBUG` to enable the currently implemented memory
logger, which is wrapped around common functions.
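
A minimal sketch of such a wrapper, assuming a decorator that logs at DEBUG level before and after the call. `log_memory` and the injected `read_memory_bytes` callable are hypothetical; a real version would query the GPU allocator instead.

```python
import functools
import logging
import os

logger = logging.getLogger("memory_sketch")
# Assumption: the environment variable holds a standard level name such as "DEBUG".
logger.setLevel(os.getenv("VERL_LOGGING_LEVEL", "WARNING"))


def log_memory(read_memory_bytes):
    """Decorator factory: log memory usage on entering and exiting a function."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            enabled = logger.isEnabledFor(logging.DEBUG)
            if enabled:
                logger.debug("before %s: %d bytes", fn.__name__, read_memory_bytes())
            try:
                return fn(*args, **kwargs)
            finally:
                if enabled:
                    logger.debug("after %s: %d bytes", fn.__name__, read_memory_bytes())

        return wrapper

    return decorator
```

Checking `isEnabledFor` first means the memory reading is skipped entirely when debug logging is off.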
2025-04-22 13:28:10 +08:00
64672aef34 fix vllm version in setup.py (#1186)
We have upgraded vLLM to 0.8.3 in our Dockerfile:
https://github.com/volcengine/verl/blob/main/docker/Dockerfile.ngc.vllm0.8
2025-04-21 21:18:02 +08:00
103f90113f [dev] fix: instructions about merging from before using ruff (#1180)
Our pre-commit hook and CI action only check the changes for now.

In this PR,

1. We apply `ruff check --fix` and `ruff format`.
2. We remove the unnecessary pipeline from the migration warning,
since directly merging without applying `ruff`, which might cause extra
conflicts, is the best way to avoid introducing extra file changes.
2025-04-20 13:51:46 -07:00
b0e3f1361e [AMD] docker: Support AMD (ROCm Kernel) - Support SGLang (#1179)
[Done]
- Update the Docker file and Apptainer file to support the SGLang
engines
- Add the 3rd-party
[torch_memory_saver](https://github.com/ExtremeViscent/torch_memory_saver)
to the Dockerfile for the ROCm version
2025-04-20 12:51:10 -07:00
725c67666f [ray] fix: ray hang due to num_cpus (#1009)
Fixing #523 according to
https://github.com/volcengine/verl/issues/523#issuecomment-2723652147

Concern: will `num_cpus=1` limit the performance of the cluster
scheduler?
2025-04-20 12:50:17 -07:00
HL
5313d96f9b [CI] fix: add additional pre-commit test before ppo trainer tests (#1175) 2025-04-20 11:16:19 -07:00
6d8f2f6ab9 [algo] feat: Add DrGRPO (#990)
https://github.com/volcengine/verl/issues/742

- Add an option for disabling standard-deviation normalization of
advantages in GRPO.
- This completes one out of two algorithmic changes made by Dr.GRPO to
GRPO, the other one being the removal of sequence-length averaging
during loss aggregation.
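
A small sketch of what the option changes, assuming group-relative advantages are computed per prompt group. The flag name `norm_by_std`, the population standard deviation, and the `eps` value are assumptions made for illustration.

```python
import statistics


def group_advantages(rewards, norm_by_std=True, eps=1e-6):
    """Group-relative advantages for one prompt's sampled responses."""
    mean = statistics.fmean(rewards)
    centered = [r - mean for r in rewards]
    if not norm_by_std:
        # Dr.GRPO-style: subtract the group mean only.
        return centered
    # GRPO-style: additionally divide by the group's standard deviation.
    std = statistics.pstdev(rewards)
    return [c / (std + eps) for c in centered]
```

Without the division, groups with low reward variance no longer have their advantages scaled up.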
2025-04-20 08:44:45 -07:00
b0e2a0ac88 [logging] refactor: use 'from e' for exception stack trace (#1177)
Use the 'from e' inside the try-except statement to keep the stack trace
of the error
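
For illustration, a minimal example of exception chaining with `from e`; `load_config` is a hypothetical function, not code from this PR.

```python
def load_config(text):
    """Parse an integer setting, re-raising parse errors with context."""
    try:
        return int(text)
    except ValueError as e:
        # `from e` stores the original error in __cause__, so the traceback
        # shows both the ValueError and this RuntimeError.
        raise RuntimeError(f"invalid config value: {text!r}") from e
```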
2025-04-20 08:43:43 -07:00
28e45cbde2 [Config] fix: disable XFORMERS by default since we migrated to newer vLLM versions (#1178) 2025-04-20 07:46:20 -07:00
3c46da551d [megatron] fix: avoid initialization of Megatron if not use (#1143)
## Motivation
When using FSDP in an environment that includes Megatron, the components
of Megatron will also be loaded, which may lead to some unnecessary
issues. Therefore, the initialization of Megatron can be postponed until
it is actually used.

---------

Co-authored-by: HL <linhaibin.eric@gmail.com>
2025-04-20 06:51:57 -07:00
1ab271e1b5 [megatron] fix optimizer config (#1104) 2025-04-20 06:50:35 -07:00
4fa7ed6c0d [mcore] qwen2moe support (#1139)
support qwen2moe structure to run with megatron-core
including:
* qwen2moe config converter 
* qwen2moe model initializer
* refactor the online weight converter from mcore to vllm
* qwen2moe online weight converter
* qwen2moe offline weight conversion script from hf to mcore
* a script to run training qwen1.5moe_a2.7b with 4 nodes

TODO
add option to freeze the MoE router weight during training
2025-04-20 12:48:46 +08:00
HL
c54ec18693 docs: update recent news and logo (#1173) 2025-04-19 21:42:19 -07:00
b39c0214c8 Fix ImportError in is_megatron_core_available() and is_vllm_available() Functions (#1131)
Issue: In a Python 3.10 environment, after a plain `import importlib`,
calling `importlib.util.find_spec('megatron.core')` results in the
following error:

833e7d7878/verl/utils/import_utils.py (L21-L30)
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'importlib' has no attribute 'util'
```
This error causes more_spec to be None, which can lead to further issues
in the code execution.

Proposed Solution: I recommend adding import importlib.util to ensure
that the util module is properly imported and available for use. This
change will prevent the AttributeError and allow the find_spec function
to work as intended.
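
A minimal sketch of the proposed fix: import the `importlib.util` submodule explicitly before calling `find_spec`. The helper name `is_package_available` is hypothetical.

```python
import importlib.util  # explicit submodule import; plain `import importlib` may not load it


def is_package_available(name):
    """Return True if `name` can be located without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names such as "pkg.sub" when the parent "pkg" is missing.
        return False
```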

Please see the attached screenshot for reference.
<img width="534" alt="Clipboard_Screenshot_1744872935"
src="https://github.com/user-attachments/assets/92f63ed5-7a52-43ac-86be-2c9585320234"
/>
2025-04-20 10:17:59 +08:00
HL
0fd56b2080 docs: add ReTool (#1154) 2025-04-20 09:20:00 +08:00
f49c5311a4 change deepspeedai url site. (#1171)
DeepSpeed has moved to `deepspeedai` repo.

Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-04-20 09:18:49 +08:00
121f0b034c [CI] fix: only check changed files in CI (#1168)
We also remove the previous workaround of adding ignores.
2025-04-19 11:55:28 -07:00
8719371949 revert multinode first (#1161)
Will explore further
2025-04-19 16:00:15 +08:00
6effd52e16 Update test_dapo_7b.sh to remove extra line (#1155)
Seems like there's an extra argument here that's causing an error when
running
2025-04-19 14:53:56 +08:00
f3dc1d7b78 [BREAKING] ray: rewrite multi-node doc (#1160)
The way to use ray has changed.

Ray related issue: https://github.com/ray-project/ray/issues/52454
2025-04-18 23:14:52 -07:00
HL
568239fb38 CI: limit ruff checks and enable push tests (#1157) 2025-04-19 13:54:45 +08:00
b00f77d855 [dev] feat: migrate from yapf & pylint to ruff based on pre-commit (#1010)
> [!WARNING]
> We are [migrating to `ruff` as the linter and formatter and
`pre-commit` as the managing
tool](https://github.com/volcengine/verl/pull/1010).
>
> If your branch is based on a previous commit using `yapf` and
`pylint`, simply merging might trigger overwhelming linting errors,
while **you are only expected to resolve ones in the files related to
your PR**.
>
> To resolve this issue, please try the following workaround to only
include the files you **really changed** in the PR:
>
> 1. In your branch, fix linting and format with `ruff`: `ruff check
--fix && ruff format`
> 2. Squash into a single commit in a new branch: `git reset --soft
$(git merge-base main HEAD) && git add -A && git commit -m "feat: ..."`
> 3. Merge with the latest main: `git merge origin/main`
> 4. Force push to your branch: `git push --force`

We add the reminder above to the documentation to tell contributors how
to avoid overwhelming linting errors.

### Motivation

According to the discussion in #896, this PR migrates from yapf & pylint to
ruff based on pre-commit, which allows unified version control and
an automatic hook on committing.

### Summary

The `pre-commit` hook and CI

- checks staged / committed files in commits / PR's
- checks all files each month (This should fail before we fix all the
files by the ruff standard)

### Explanation for the Failing CI Workflow `pre-commit`

For now, we only apply `ruff format` and `ruff check --fix` **without
resolving all the errors**, since there are too many errors to resolve,
which causes the CI workflow `pre-commit` to fail.

We leave the remaining errors to future commits.
Specifically, the `pre-commit` hook and CI will require every commit to
fix its related files with `ruff`, which will fix all the files
incrementally.

### Reviewing Suggestion

The commit
3d93f51ba8
is huge since we apply `ruff` to all the files. To review the main
changes, please check the commits before and after it.
2025-04-18 07:49:31 -07:00
c98fb3197b Doc: add a environment to fix that the memory capacity is unbalanced (#1105)
If we use SGLang as the rollout engine, we should export
`SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK` to avoid the error raised when GPU
memory is unbalanced across devices; please refer to [#5426 in
sglang](https://github.com/sgl-project/sglang/pull/5426)

# Why should we export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK when using
SGLang as the rollout engine in verl?
1. verl initializes an SGLangRollout module during rollout, which is used
to evaluate/generate samples.

2. SGLangRollout initializes VerlEngine, which in turn initializes a
`torch.distributed.DeviceMesh` used to support TP.

3. `DeviceMesh.init()` internally checks the free GPU memory of all
participating devices, and if the difference is too large (more than
about 10%), it directly reports an error, preventing initialization
failures or communication deadlock.
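
The kind of check described in step 3 can be sketched as follows. This is not SGLang's actual code: the function names, the exact 10% rule, and the accepted flag values are assumptions made for illustration.

```python
import os

ENV_FLAG = "SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK"


def balance_check_disabled():
    # Assumption: "1" or "true" (case-insensitive) disables the check.
    return os.environ.get(ENV_FLAG, "").lower() in ("1", "true")


def check_memory_balance(free_gib, tolerance=0.10):
    """Raise if the least-free device trails the most-free one by more than `tolerance`."""
    if balance_check_disabled():
        return
    least, most = min(free_gib), max(free_gib)
    if least < most * (1.0 - tolerance):
        raise RuntimeError(
            f"memory is unbalanced: least free {least} GiB, most free {most} GiB"
        )
```

With the flag exported, the same unbalanced readings pass through without an error.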

# Why might GPU memory be inconsistent?
## Ray distributed actors load the model at different times
verl uses Ray for multi-process, multi-GPU concurrent training, and each
`WorkerDict` may be called at different times:
`self.rollout = SGLangRollout(...)`
Different workers initialize the model at different times → different
memory usage.

## Delayed initialization causes memory bias
Some workers enter the model loading/inference process earlier than others,
such as `generate_sequences()` or `compute_log_prob()`.
The GPU memory of early-loaded workers is already occupied by the model, while
the GPU memory of late-loaded workers is still empty → the GPU memory
gap is large.

## Verl+SGLang's TP initialization goes "all device broadcast", but
there is no uniform release timing
SGLangRollout only needs to involve the part of the graphics card used
by the rollout machine, but its VerlEngine initialization calls
torch.distribut.init process group() and broadcast a bunch of weights.
Result in:

Non-rollout cards also participate in communication;

Then initialize DeviceMesh, and the error "inconsistent memory" is
reported.

## Different loading modes of FSDP/TP models also cause deviations
if the following parameters are set
```
actor.fsdp_config.param_offload=True
ref.fsdp_config.param_offload=True
```

Some workers' parameters are on the CPU, while others are sharded to
the GPU in advance. This also creates an asymmetric distribution of
GPU memory.

---------

Co-authored-by: ocss884 <ocss.lin@gmail.com>
2025-04-17 21:28:17 -07:00
ec59b8788c [misc] add sglang support for hdfs file load (#1060) 2025-04-18 10:39:08 +08:00
0bdf7f4698 [misc] qwen moe patch for not find 'load_weights' func (#1137) 2025-04-18 08:05:09 +08:00
ba988bbeb5 [dapo] fix: fix timer for dapo (#1075)
When training with DAPO, continuous filtering and dynamic sampling mean
that each iteration may involve multiple sampling rounds. The timer
should sum these rounds to represent the total sampling time for one
iteration.
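
A minimal sketch of the accumulation idea, assuming a dictionary of timings keyed by name. The helper `accumulate_timer` and the key `"gen"` are hypothetical names for illustration, not the trainer's actual API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


@contextmanager
def accumulate_timer(name, timing_raw):
    """Add the elapsed time of the block to timing_raw[name] instead of overwriting it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timing_raw[name] += time.perf_counter() - start


timing_raw = defaultdict(float)
# One training iteration may need several sampling rounds before
# enough samples survive the filter.
for _ in range(3):
    with accumulate_timer("gen", timing_raw):
        time.sleep(0.01)  # stand-in for one sampling round

print(f"total sampling time: {timing_raw['gen']:.3f}s")
```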

---------

Co-authored-by: lilei <>
2025-04-17 15:00:16 -07:00
25b0f2262f Move entropy to comput log probs to reduce peak memory when calculating entropy. (#1100)
The actor does not calculate the entropy loss if `entropy_coeff == 0`,
and the entropy calculation is moved to `compute_log_probs`.

Tested configuration:

```sh
    data.max_prompt_length=$((1024 * 2)) \
    data.max_response_length=$((1024 * 10)) \
    actor_rollout_ref.rollout.max_num_batched_tokens=$((1024 * 12)) \
    context_parallel_size=2 \
```
2025-04-17 17:35:59 +08:00
19d0d07329 [mcore] resharding model weights by per tensor (#1107)
## Motivation
This is an optimization approach using a per-tensor method to reduce the
additional memory required for model weights during the resharding
phase. Our ultimate goal is to enable mcore to have a method that aligns
with the `full_tensor()` function in FSDP and to deprecate the
`AllGatherPPModel` class in future versions. Currently, this task may
need to be broken down into several subtasks:

## Impact Analysis
1. The model accuracy has been tested on Qwen-7B in vllm version 0.8.2,
and it aligns with the accuracy of the previous method.
2. In terms of memory usage, the `pp_cache` in `AllGatherPPModel` has
been completely deprecated.
3. In terms of runtime, the performance is comparable to the original
method.

## TODO
- [x] Deprecate the `AllGatherPPModel` class in version 0.8.2.
- [x] Ensure forward compatibility for this method.
- [x] Completely deprecate the `AllGatherPPModel` class.
2025-04-17 14:57:10 +08:00
833e7d7878 refactor: main generation should also use pad/unpad from verl.protocol (#1103)
The main generation should use the pad/unpad from verl.protocol to
align with ray_trainer, instead of separate pad/unpad logic.

Also makes small improvements to code readability.
2025-04-17 12:26:04 +08:00
f04a6dbdb7 fix: loading HF model in rank0 for mcore megatron model (#998)
[bug fix] Loading the 72B Qwen model for mcore Megatron caused OOM.
This PR extracts the HF loading logic into a helper function (and
disables `device_map=auto`), and refactors the legacy function
`load_megatron_model_weights` and the new function
`load_megatron_gptmodel_weights`. `load_megatron_model_weights` should
be deprecated once the class RewardModelWorker is also migrated to
Mcore.
2025-04-17 10:20:40 +08:00
be9def6900 [mcore] refactor (#1064)
Refactor the mcore code and add a registry so it can be extended to
more model types such as MoE or VLM.
Clean up some deprecated code such as megatron_config.
The reward model worker now uses the GPTModel API.
2025-04-17 09:49:30 +08:00
d6821a051a [sft] feat: Add WSD (Warmup-Stable-Decay) scheduler for SFT (#1041)
# Add WSD (Warmup-Stable-Decay) Learning Rate Scheduler

## Overview
This PR adds a new learning rate scheduler called WSD
(Warmup-Stable-Decay) that provides more control over the learning rate
schedule during training. The WSD scheduler extends the traditional
cosine scheduler by adding a stable phase where the learning rate
remains constant.

## Features
- **Three-phase schedule**: Warmup → Stable → Decay
- **Configurable stable phase**: Control what percentage of training
maintains a constant learning rate
- **Compatible with existing code**: Minimal changes to the trainer
infrastructure
- **Default to cosine**: Maintains backward compatibility with existing
configurations

## Implementation Details
1. Added `get_wsd_schedule_with_warmup` function to
`verl/utils/torch_functional.py`
2. Updated the SFT trainer to support the new scheduler type
3. Added `lr_scheduler: cosine` as the default in the SFT trainer config

Here's the reference implementation:
6397d56279/pytorch_optimizer/lr_scheduler/wsd.py (L8)

## Usage
To use the WSD scheduler, set the following in your configuration:
```yaml
optim:
  lr_scheduler: wsd  # Options: 'cosine' (default) or 'wsd'
```
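
As an illustration of the three-phase shape, here is a minimal sketch of a learning-rate multiplier. It is not the signature of `get_wsd_schedule_with_warmup`: the function name, the ratio parameters, and the choice of a cosine-shaped decay are assumptions made for this example.

```python
import math


def wsd_lr_multiplier(step, total_steps, warmup_ratio=0.1, stable_ratio=0.8,
                      min_lr_ratio=0.0):
    """Return the factor applied to the base learning rate at `step`.

    Phases (all boundaries are assumptions for this sketch):
      warmup: linear ramp from 0 to 1
      stable: constant 1
      decay:  cosine from 1 down to min_lr_ratio
    """
    warmup_steps = int(total_steps * warmup_ratio)
    stable_steps = int(total_steps * stable_ratio)
    decay_steps = max(1, total_steps - warmup_steps - stable_steps)

    if step < warmup_steps:
        return step / max(1, warmup_steps)
    if step < warmup_steps + stable_steps:
        return 1.0
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr_ratio + (1.0 - min_lr_ratio) * cosine


for step in (0, 5, 10, 50, 95, 100):
    print(step, round(wsd_lr_multiplier(step, total_steps=100), 3))
```

A function of this form could be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, with `total_steps` bound in advance.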

## Benefits
- Better control over learning rate behavior during training
- Potentially improved training stability for certain tasks
- Allows experimentation with different learning rate schedules without
code changes

(trying to get this in to make sure my own branch doesn't end up with a
huge chunk of git conflicts 😓 )

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: HL <linhaibin.eric@gmail.com>
2025-04-16 11:26:19 -07:00
54fbd156b1 [feat] video inputs (#1116)
## What does this MR do?
Adds video input support for qwen2 vl models

## Changes
- process_image function is moved to vision_utils.py
- switch process_image & process_video to both use fetch_image and
fetch_video functions from qwen-vl-utils
- fixed a mrope bug in vllm rollout
2025-04-17 00:39:02 +08:00
3e3d9372a5 fix: missed loss_agg_mode in dp_actor (#945)
as titled
2025-04-17 00:28:48 +08:00
f845a46a17 Update install.rst to fix a typo (#1111)
Fix a typo
2025-04-16 18:25:54 +08:00
635997a5ec [misc] add dummy load for sglang (#1068) 2025-04-16 09:46:09 +08:00
d814965003 [vlm] data: use hf template for vlm models (#1085)
## What does this PR do?

In this PR, we use Hugging Face's chat template (i.e.,
`processor.apply_chat_template`) to compute input ids for VLMs. This
generalizes to more model architectures than the earlier
implementation.

## Who can review?

@vermouth1992 @eric-haibin-lin
2025-04-15 15:05:58 -07:00
588404196b doc: upgrade to vllm 0.8.3 (#1081)
## What does this PR do?

- Upgrade docker image to vllm 0.8.3 to avoid memory leakage
- Add wake up tags to megatron rollout worker

## Who can review?

@vermouth1992 @BearBiscuit05 @ETOgaosion
2025-04-16 01:11:52 +08:00
189c87c37c [CI] feat: try HF_HUB_OFFLINE to fix network errors (#1098)
Trying to fix network errors like

```
huggingface_hub.errors.HfHubHTTPError: 429 Client Error: Too Many Requests for url: ...
```
2025-04-15 22:27:27 +08:00
6bfa45c5c8 [doc] feat: adding CI tests (#1099) 2025-04-15 19:45:13 +08:00
HL
5c984b7748 docs: update awesome work (#1090) 2025-04-14 22:52:44 -07:00
4ec9974735 CI: add vlm CI for sglang rollout (#1088)
As titled.
2025-04-14 20:21:29 -07:00
68958ef877 [feat] vllm: add rollout config swap_space for vllm_rollout (#960)
When training a big model and/or using a very long sequence length with
vLLM rollout, you may encounter the error
```
...
in _swap_out raise RuntimeError( RuntimeError: Aborted due to the lack of CPU swap space. Please increase the swap space to avoid this error
``` 


(Updated) This can be fixed by setting a larger `swap_space` for vLLM.
E.g., in your training bash script you can set
```
...
actor_rollout_ref.rollout.engine_kwargs.swap_space=32 \
...
```
which sets the swap_space to 32GB. Note in most vLLM releases the
default value is 4GB.
2025-04-14 14:16:53 -07:00
ebc3294b7e [misc] ray: Fix typo in colocate (#1074)
- Force both usages of `colocate` and `collocate` to `colocate`, to be
consistent with [vllm
terminology](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_colocate.html).
- both `ResourcePool` and `RayResourcePool` use the same default value
for `max_colocate_count` to avoid surprises.
2025-04-14 10:22:50 -07:00
1559f62d1e fix: remove output.txt (#1086)
Fixing
https://github.com/volcengine/verl/pull/1032#discussion_r2042521317
2025-04-14 10:19:40 -07:00
5ba1dbc606 [ci] feat: improve CI speed to 1-2min per test (#1032)
### Summary

#### Minimize Test Workloads

This PR minimizes the test workloads while keeping them meaningful,
reducing the time cost of a test from >10 min to 1~2 min. Specifically,
we

1. set batch sizes and steps as small but still meaningful numbers:

```bash
train_traj_micro_bsz_per_gpu=2 # b
n_resp_per_prompt=4 # g

train_traj_micro_bsz=$((train_traj_micro_bsz_per_gpu * NUM_GPUS)) # b * n
train_traj_mini_bsz=$((train_traj_micro_bsz * 2)) # 2 * b * n
train_prompt_mini_bsz=$((train_traj_mini_bsz * n_resp_per_prompt)) # 2 * b * n / g
train_prompt_bsz=$((train_prompt_mini_bsz * 2)) # 4 * b * n / g
# ...
TOT_TRAIN_STEPS=${TOT_TRAIN_STEPS:-1}
```

2. disable validation (this costs a lot!) / saving / resuming for
training tests by default and leave them to specialized tests

```bash
# Validation
VAL_BEFORE_TRAIN=${VAL_BEFORE_TRAIN:-False}
TEST_FREQ=${TEST_FREQ:--1}
# Save & Resume
RESUME_MODE=${RESUME_MODE:-disable}
SAVE_FREQ=${SAVE_FREQ:--1}
```

#### Improve Triggering Mode

This PR introduces more comprehensive triggering logic.
Specifically, we

1. consider all Python code by default
2. include related entrypoints (the workflow config, scripts used by it
and hydra config, etc.)
3. exclude unrelated Python code from other components (e.g., recipes,
examples, Megatron, SFT, generation, evaluation, etc. for FSDP training)

An example from `e2e_ppo_trainer`:

```yaml
on:
    paths:
      - "**/*.py"
      # Entrypoints
      - ".github/workflows/e2e_ppo_trainer.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "examples/data_preprocess/geo3k.py"
      - "tests/e2e/ppo_trainer"
      - "verl/trainer/main_ppo.py"
      - "verl/trainer/config/ppo_trainer.yaml"
      - "!examples"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Recipes
      - "!recipe"
      # Megatron
      - "!verl/workers/**/megatron_*.py"
```

#### Avoid missing out errors

Some test scripts didn't end with the main Python command, so its
failure could go unnoticed.

To address this issue, this PR introduces the following options:

```bash
set -xeuo pipefail
```

, which means

- `x`: Print each command before executing it (useful for debugging)
- `e`: Exit immediately if any command fails (returns non-zero exit
status)
- `u`: Treat unset variables as an error
- `o pipefail`: Return the exit status of the last command in a pipeline
that failed, or zero if all succeeded

Together, these options make the script fail fast and provide verbose
output, which helps with debugging and ensuring the script doesn't
continue after encountering errors.

#### Others

Besides, we also

1. unify runner labels into `"L20x8"` to enable preemptive scheduling of
jobs
2. merge test scripts with minimal differences, grouped by entrypoint
(e.g. `ppo_trainer`, `ppo_megatron_trainer`, recipes, etc.), into a base
script with options
2025-04-14 09:48:10 -07:00
d7978b66d9 chore: update diagnose.py (#1078)
occured -> occurred
2025-04-14 21:35:57 +08:00
f6b9bcc359 [logger] fix: fix mlflow (#1073) 2025-04-14 18:13:17 +08:00
866e9808d4 [CI] feat: unify CI label to enbale preemptive schedule for jobs (#1072) 2025-04-14 16:52:30 +08:00
0a4f4b3cc1 mcore readme (#1071)
add a doc for mcore
2025-04-14 16:29:51 +08:00
c46d542772 fix: replace '@' with '_at_' in metric names to comply with MLflow naming constraints (#984)
Fix MLflow metric name errors by replacing '@' with '_at_' during MLflow
logging

MLflow rejects metric names with '@' as below

`mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Invalid
metric name: 'val-aux/semantic_matching/reward/mean@1'. Names may only
contain alphanumerics, underscores (_), dashes (-), periods (.), spaces
( ), and slashes (/).`
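
A minimal sketch of the renaming, assuming metrics arrive as a plain dictionary. The helper names are hypothetical, and the validity pattern is written only from the character set listed in the error message above.

```python
import re

# Character set taken from the MLflow error message: alphanumerics,
# underscores, dashes, periods, spaces, and slashes.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9_\-. /]+")


def is_valid_metric_name(name):
    """Hypothetical check mirroring the constraint quoted in the error."""
    return _ALLOWED_NAME.fullmatch(name) is not None


def sanitize_metric_names(metrics):
    """Return a copy of `metrics` with '@' replaced by '_at_' in every key."""
    return {name.replace("@", "_at_"): value for name, value in metrics.items()}


raw = {"val-aux/semantic_matching/reward/mean@1": 0.5}
clean = sanitize_metric_names(raw)
print(clean)
print(all(is_valid_metric_name(name) for name in clean))
```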

Co-authored-by: wenjie zhao <aswenjie@amazon.com>
2025-04-14 16:25:09 +08:00
f976b1853d Update vllm 0.8.2 with megatron 0.11.0 (#1054)
Part of #851

Includes a minimal upgrade:

1. vllm 0.8.2 with megatron
2. part of per-tensor allgather and load weights
3. fix bugs with context parallel caused by the dataloader random seed;
the behavior seems to have changed in torch 2.6.0
2025-04-14 09:27:35 +08:00
d9df9bbb5f Fix megatron default config (#1053)
#1047 may cause some cases to fail with tp=1, since Megatron prohibits
using sequence parallel in that case.

We still enable sp by default so that users can write scripts
conveniently, and automatically enable or disable sp inside
`_validata_config`.
2025-04-14 01:33:18 +08:00
c4b5f097af docs: update sglang_worker author list and image (#1045) 2025-04-13 07:43:36 -07:00
d4cae44726 [mcore] option to use dist checkpoint (#1030)
mcore dist checkpointing is a parallel-invariant weight format, you can
save and load in arbitrary parallel settings. e.g. save in tp2pp2 and
load in tp4pp1.

This PR introduces an option to use dist checkpoint with the mcore
backend. It is *disabled* by default for backward compatibility, but
future support for *mcore MoE models and VLM models* will work only when
dist ckpt is enabled, for an easier implementation.

Before this PR, when initializing actor and critic workers, each GPU
would load the entire Hugging Face weights and then re-shard them to the
correct mcore model state dict, making the procedure slow and
complicated.
With this PR, we convert HF weights to dist ckpt with offline scripts,
and each GPU loads only its own parts from the dist ckpt. Loading is
faster and online resharding is no longer needed.

When loading `Qwen2-7B-Instruct` for the critic worker, the loading time
dropped from 109s to 25s, a *4.36x* speedup.

The `converter_hf_to_mcore.py` in this version uses the existing online
resharding function to convert weights. It should be refactored for
better efficiency and for MoE/VLM models.
Thanks to #998 for the optimization of loading HF weights only on GPU 0.

Future TODO:
* refactor the converter for efficiency
* support converting MoE models
* support converting VLM models
* re-design `megatron_checkpoint_manager.py` with dist ckpt
* implement converter from mcore dist ckpt to hf / `model_merger.py`
* add docs and example scripts
2025-04-13 17:59:43 +08:00
6dd5e39a11 fix: Megatron_workers batch_size config is not processed correctly (#1029)
The following two batch_sizes don't work correctly when using megatron
backend:
1. actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu for
update_actor()
2. actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu for
compute_ref_log_prob()
#1028
2025-04-13 17:27:21 +08:00
9830b17ba2 fix checkpoint rng_states confliction (#1046)
Only 1 node in a machine saves rng_states, to avoid conflicts and to
read them back properly.

New versions of torch.save can cause races here.

FSDP also splits the rng_states in extra states.
2025-04-13 16:01:06 +08:00
eda9f0e9be reset default tp size (#1047) 2025-04-13 05:58:22 +08:00
HL
d882b62b01 tests: add import utils tests (#1042) 2025-04-11 18:55:54 -07:00
dc1714a428 docs: update sglang_worker authors (#1038)
Add full authors of SGLang RL team. Thanks!
2025-04-11 11:19:07 -07:00
7a4242324c [log] fix: val-core pattern (#1012)
This PR

1. fixes the problem that important metrics like `"mean@{n}"` cannot be
recognized as `val-core` due to the lack of `/...` at the end
2. removes `"std@{n}"` from `val-core`
2025-04-11 09:51:38 -07:00
379945b0d3 docs: update README.md for TRPA (#1034) 2025-04-11 09:50:43 -07:00
8491b9c56d docs: fix doc typo (#1035) 2025-04-11 09:49:53 -07:00
6cbfa48a90 fix: use packaging to compre versions instead of str comparing (#1027)
Use `packaging.version` to compare tensordict's version instead of
comparing strings. String comparison sometimes fails: for example,
"0.10.0" < "0.5.0" evaluates to true when compared as strings.

Also remove an unnecessary return_type, since this return_type will
always be `DataProtoItem`.
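
The version-comparison pitfall can be reproduced with a short sketch. `packaging.version.parse` is the real API from the `packaging` package; the wrapper `version_less_than` is a hypothetical name used only for this example.

```python
from packaging import version


def version_less_than(installed, required):
    """Compare two version strings by their parsed release numbers."""
    return version.parse(installed) < version.parse(required)


# Lexicographic comparison looks at characters, so "1" < "5" decides it.
print("0.10.0" < "0.5.0")
# Parsed comparison treats 10 and 5 as integers.
print(version_less_than("0.10.0", "0.5.0"))
```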
2025-04-11 17:20:47 +08:00
a9bf431075 [recipe] fix: loss_agg_mode for dapo early (#1018) 2025-04-11 11:09:37 +02:00
d2e602a1a7 docs: add AdaRFT to awesome work using verl (#1024) 2025-04-10 22:11:33 -07:00
3256142434 [Breaking] dataset: support customized datasets for RayPPOTrainer (#924)
This PR enables users to specify a customized dataset for
RayPPOTrainer.

NOTE: this is a breaking change. The RLHFDataset interface is now:
```
RLHFDataset(
    data_files: Union[str, List[str]],
    tokenizer: PreTrainedTokenizer,
    config: DictConfig,
    processor: Optional[ProcessorMixin] = None
)
```

and the custom dataset class MUST also use this interface.

cc @eric-haibin-lin
2025-04-10 22:07:42 -07:00
c9e3c57cf8 [megatron] feat: optimize entropy loss (#1007) 2025-04-11 09:37:37 +08:00
HL
3fbb1ad7ed [sglang] docs: fix README index (#1016) 2025-04-11 09:29:09 +08:00
c62e7ac7bc [tuning] docs: add more case for grpo train (#983)
Co-authored-by: HL <linhaibin.eric@gmail.com>
2025-04-10 15:16:47 -07:00
aa58617c69 [sglang] docs: add quickstart doc to use sglang in verl (#1001)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Junrong Lin <33685709+ocss884@users.noreply.github.com>
2025-04-10 08:34:12 -07:00
550bbbbffe [vllm] fix oom when vllm wakeup (vllm >=0.8.3) (#987)
This is a memory optimization method implemented based on this
[fix](https://github.com/vllm-project/vllm/pull/15500). I just
successfully ran a 72B model on 8*H800 cards. Before the fix, I would
encounter an OOM issue. Please note that this fix is only effective for
vLLM >= 0.8.3.
2025-04-10 18:07:10 +08:00
9f405b48a4 [Mcore] context parallel (#970)
support context parallel for mcore backend.
Changes on:
* configs
* model loader
* checkpoint
* single control dispatcher
* forward preprocess and postprocess

---------

Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>
2025-04-10 13:05:58 +08:00
90f5ce15de Change behaviour during raw prompt extraction (#989)
This PR fixes a bug that occurs when the `_switch_chat_template()`
method is called.

According to

https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L222
`data.non_tensor_batch['raw_prompt'][i]` is already a list if
`data.return_raw_chat=True`.

Calling `.tolist()` again will result in an error. Now we check whether
it is already a list before calling this method.
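
A minimal sketch of the guard, assuming the raw prompt is either a Python list or an array-like object exposing `.tolist()`. The function name `to_chat_list` is hypothetical.

```python
def to_chat_list(raw_prompt):
    """Return the chat as a list, converting only if it is not one already."""
    if isinstance(raw_prompt, list):
        # Already a list (e.g. when raw chats are returned as-is):
        # lists have no .tolist(), so calling it would raise AttributeError.
        return raw_prompt
    return raw_prompt.tolist()


chat = [{"role": "user", "content": "hello"}]
print(to_chat_list(chat) is chat)
```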
2025-04-10 09:04:20 +08:00
HL
babd2c183c docs: update recent talks (#996) 2025-04-10 09:03:43 +08:00
1ee730163f fix: add seed to vllm spmd 0.8.3 (#912)
See
8b664706aa
In summary, when using the external launcher in vLLM, a seed must now
be set.

---------

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
2025-04-09 17:40:52 +08:00
fefe951f2a Add support to HSDP model merging. (#971)
Currently the model merger does not support HSDP (the `ddp` mesh dim is
not considered). This PR fixes this.
2025-04-09 07:55:39 +08:00
88db554073 fix: wrong pg_clipfrac_lower (#972)
Currently, `pg_clipfrac_lower` is always 0 by mistake.
2025-04-09 07:55:22 +08:00
fa23b696dd [tuning] docs: record the resource requirements for 70b model (#976)
Co-authored-by: HL <linhaibin.eric@gmail.com>
2025-04-08 11:33:44 -07:00
1a42f14da0 fix: reward_fn_key for PRIME (#975) 2025-04-08 20:03:21 +08:00
713e99e6a1 fix: DAPO wandb link (#978) 2025-04-08 20:02:33 +08:00
6433fd4a97 fix: return list from bootstrap_metric (#969)
Fixing #950
2025-04-08 16:28:43 +08:00
HL
96f7177972 docs: add open-hands, vagen (#963) 2025-04-08 14:11:05 +08:00
fd0eba03cd fix: optim.warmup_style do not take effect (#418) (#959)
Support to set warmup_style=='cosine'.
2025-04-08 11:57:40 +08:00
8400beb87c [merger] fix: move megatron import into megatron related branch (#958)
Users of the FSDP backend may not have Megatron installed, so directly
running this script leads to an import error.
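
The pattern can be sketched as a deferred import that runs only in the branch that needs it. The function name and return values below are placeholders, not the merger script's actual code.

```python
import importlib


def prepare_backend(backend):
    """Import backend-specific dependencies only when that backend is selected."""
    if backend == "fsdp":
        # No Megatron import on this path, so FSDP-only installs keep working.
        return None
    if backend == "megatron":
        # Deferred import: fails here, and only here, if Megatron is missing.
        return importlib.import_module("megatron")
    raise NotImplementedError(f"unknown backend: {backend}")


print(prepare_backend("fsdp"))
```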
2025-04-07 09:50:21 -07:00
c87e9f69e5 [distributed] enhancement: Make register_center named actor waiting time configurable & providing better error info (#947)
## Summary

As mentioned in #491, the `register_center` named actor could be `None`
after the 2-minute waiting time and crash the job for some verl users.

This might be due to (1) uncleaned Ray resources from previous runs; or
(2) the 120s waiting time being too short if the `named_actor` launching
task is delayed in the cluster.

This PR makes the `register_center` named actor waiting time
configurable and longer by default. It also provides better error
info to help users debug the issue themselves.
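
A minimal sketch of a configurable wait with a descriptive error, assuming a lookup callable that returns `None` until the actor exists. The function name, the parameter names, and the 300-second default are assumptions for illustration. The source states only that the wait became configurable and longer than 120s.

```python
import time


def wait_for_named_actor(lookup, name, timeout_s=300.0, poll_interval_s=1.0):
    """Poll `lookup(name)` until it returns an actor or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        actor = lookup(name)
        if actor is not None:
            return actor
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"named actor '{name}' not found after {timeout_s}s. "
                "Possible causes: leftover Ray resources from a previous run, "
                "or a delayed actor launch; consider increasing the timeout."
            )
        time.sleep(poll_interval_s)
```

In a Ray program, `lookup` could wrap `ray.get_actor` and return `None` when it raises `ValueError`.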

## Related issues

#491

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-04-07 09:48:27 -07:00
f20f552873 fix: support non-DTensor when converting fsdp checkpoints to hf model (#925)
As mentioned in https://github.com/volcengine/verl/issues/903, the
model_merger script has a problem with saved FSDP checkpoints trained
with `trainer.n_gpus_per_node=1`: the loaded `weight` is of type
`Tensor` instead of `DTensor`. This PR adds support for this case.
2025-04-07 15:46:33 +08:00
d13434fd7b [megatron] feat: add gradient checkpointing in megatron backend (#944)
### Changes 

Add gradient checkpointing (aka `activation recomputation`) config and
support from Megatron core
(b7ec711cf6/megatron/core/transformer/transformer_config.py (L208-L233))
to make activation checkpointing more efficient for LLMs with 20B+
parameters.

```
 gradient_checkpointing_kwargs:
     activations_checkpoint_method: null
     activations_checkpoint_granularity: null
     activations_checkpoint_num_layers: null
```

### Test 
Tested by loading Qwen 7B/32B with 16k-token input prompts; the OOM
issues no longer occur after adding gradient checkpointing.

### Next Step

Add a `ppo_trainer for megatron` doc to explain the config details in
https://verl.readthedocs.io/en/latest/examples/config.html
2025-04-06 20:49:20 -07:00
82cbc43dc7 feat: Batch Rewards (#871) 2025-04-07 11:15:48 +08:00
6efa0181fa [sglang] feat: enhence sglang_rollout to handle image input (#824)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: GeLee <leege233@gmail.com>
Co-authored-by: ocss884 <ocss.lin@gmail.com>
2025-04-06 12:01:55 -07:00
f8d19735c5 [vllm] feat: enable FSDP and vLLM 0.8.2 to support DPSK v3 training. (#923)
This is a solution to an error that occurs when vLLM 0.8.2 loads models
in the ds_v3 MoE format. For MoE models, directly calling vLLM's
`load_weights` function will result in an error, primarily because the
model params lose the related methods of the `FusedMoe` class.
Therefore, it is necessary to identify and call the specific parameter
loading method based on the model parameter names at runtime. However,
the current version is a dirty implementation because, in reality, if
invasive changes were made to vLLM, it would only require modifying 2
lines of code and adding a new function to identify the layer count (I
have already marked comments in the code). I’m not sure if there’s a
better implementation and would like some suggestions.
2025-04-06 10:54:31 -07:00
d2c60642e7 [log] fix: validation metrics of reward and maj voting (#927)
This PR:

1. calculate metrics for "reward" by default if "acc"s are not
available
2. don't calculate majority voting metrics if "pred"s are not available
2025-04-06 10:42:03 -07:00
7753d37d71 [sglang] ci: upgrade to sglang 0.4.4.post4 (#941) 2025-04-06 10:33:53 -07:00
7afa6c6225 [rf++] style: eos_mask to response_mask for reinforce++-baseline method (#938)
[fix] misleading eos_mask->response_mask for
reinforce_plus_plus_baseline.
2025-04-06 10:10:07 -07:00
HL
40c00c5d52 [ci] chore: reduce CI load part-2 (#942) 2025-04-07 01:07:58 +08:00
HL
526c0908be [ci] chore: reduce CI load (#934) 2025-04-06 10:06:10 -07:00
15263cb86a prompt: Fix computer_score in fsdp_workers.py (#629)
Fix an issue where `rm_data` is undefined when the chat template is not switched

---------

Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
2025-04-05 23:00:54 -07:00
HL
7471e015d2 docs: add triton compile err to faq (#809) 2025-04-06 13:16:20 +08:00
4d722b1768 fix: seq-mean-token-sum loss (#931) 2025-04-06 06:54:39 +02:00
7fc8330d99 [sglang] feat: SGLang rollout multinode support (#915)
Allow multinode tensor parallel for future plans

---------

Co-authored-by: zobinHuang <zobin1999@gmail.com>
Co-authored-by: Jin Pan <jpan236@wisc.edu>
2025-04-05 20:17:35 -07:00
de46048420 [vllm] fix: skip vllm initialization with weight loading (#922)
In version 0.8.2, forgetting to add the `dummy` parameter resulted in
repeated loading. It also needs to be compatible with the default
parameter `dummy_dtensor`.
2025-04-05 19:42:34 -07:00
cc6dd901f7 docker: add verl-sglang dockerfile (#930)
As stated in #915 , add the dockerfile for building verl-sglang image
2025-04-05 10:03:50 -07:00
8447937cb8 [math-verify] fix: TimeoutException (#929) 2025-04-05 08:41:05 -07:00
30259d2c0b Feat: support REINFORCE++-baseline and add script for REINFORCE++ (#908)
Refer to the paper *REINFORCE++* (https://arxiv.org/abs/2501.03262) and
the OpenRLHF project (https://github.com/OpenRLHF/OpenRLHF). We find
that the RF++-baseline demonstrates greater stability than GRPO,
particularly in mathematical scenarios and reasoning tasks.
2025-04-04 18:59:57 -07:00
fb0394143f feat: Add multi-turn SFT support (#195) 2025-04-04 16:17:06 -07:00
HL
4f245a3bd7 tool: add diagnosis script (#918)
add dependency detector for vllm/sglang, as well as cuda info
usage: `python3 scripts/diagnose.py`
2025-04-04 22:51:07 +02:00
HL
0fc8e77b59 docs: update installation and adoption docs (#921) 2025-04-04 22:35:48 +02:00
0407cad23b [dataset] refactor: remove unused filter_prompts parameter from RLHFDataset (#889)
`filter_prompts` has never been used. I think this parameter has been
replaced by `filter_overlong_prompts`, so we can simply remove it.
2025-04-04 09:32:49 -07:00
d5a1c810bd fix: set gen_batch_size based on config (#909) 2025-04-04 09:31:07 -07:00
6974bbaeea [dataset] refactor: use hf Dataset instead of pandas DataFrame in RLHFDataset for speedup (#890)
HF Dataset provides better memory management and can handle larger
datasets. It also supports multi-process acceleration during map/filter
operations (while pandas requires version >2.0).

Now we can specify `filter_overlong_prompts` on large-scale datasets
by setting `filter_overlong_prompts_workers` to an appropriate number.

---------

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
2025-04-03 21:51:53 -07:00
6d931df9ad [log] fix: log after generate_sequences (#819) 2025-04-03 21:39:22 -07:00
f9256b8dbf [algo] misc: remove redundant tile([1, response_length]), efficient broadcast instead (#868)
as titled
2025-04-04 10:29:11 +08:00
3a27a98647 [recipe] feat: integrate DAPO and provide reproduction script (#623)
> [!WARNING]
> As mentioned in
https://github.com/volcengine/verl/pull/623#issuecomment-2733688593, the
implementation of gradient accumulation in verl has been only compatible
with the sequence-mean loss, but all the DAPO experiments with the
token-mean loss were run with the incompatible implementation.
> **We keep it as is for reproducibility in this branch** and will fix
it in another PR for the main branch.

---------

Co-authored-by: Guangming Sheng <shengguangming@bytedance.com>
Co-authored-by: Guangming Sheng <petershengwhu@gmail.com>
2025-04-04 05:46:47 +08:00
cc612dbae6 [dev] feat: default VSCode repo settings to help consistency with CI (#894)
This PR adds default VSCode repo settings to help keep consistent with
the CI, which:

1. enable the `pylint` linter extension
2. set the default formatter as `yapf`
3. but don't organize imports for now (since we haven't got a
functionality for this)
2025-04-04 03:36:47 +08:00
1a7e53d076 test: update vllm_spmd test for > 0.7.3 (#861)
I tested two models, `Deepseek-7B-chat` and `Qwen2-7B`. The former
showed a 0% difference in output, while the latter showed a 10.25%
difference, with no significant issues in the output. So I manually
adjusted the error tolerance to 15%. I’m not sure if this will work.
2025-04-04 00:53:58 +08:00
0338805954 reuse GPTModel, try to fix CI issue (#884)
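Also try to reduce CI time in this version; grpo hangs too many tasks
in L20-1.
Current CI device mapping:

| Job | Device | Time |
| --- | --- | --- |
| Ckpt | 0 | 16m |
| Dataset | 1 | 1m |
| dcf | 1 | 6m |
| dc | 0 | 6m |
| Ea | 1 | 8m |
| grpo | 0 | 15m |
| grpo | 1 | 15m |
| Mega | 0 | 15m |
| Gp | 1 | 2m |
| Gsm8k | 1 | 24m |
| Lora | 1 | 1m |
| sft | 1 | 2m |
| sglang | 1 | 3m |
| Vlm | 1 | 5m |
| Model | 1 | 3m |
| Ray | 0 | 3m |
| Sandbox | 0 | 1m |
| Vllm | 0 | 12m |

Total minutes per device:

- Device 0: 16+6+15+15+3+1+12=68
- Device 1: 1+6+8+15+2+24+1+2+3+5+3=70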
2025-04-03 23:52:52 +08:00
b6dc157202 fix: the error is not raised when using both megatron and hf inference (#885)
Hi there, when using both megatron and
`actor_rollout_ref.rollout.name=hf`, the NotImplementedError is not
raised. The PR fixes it.
```
(TaskRunner pid=229016)   File "/root/verl/verl/workers/megatron_workers.py", line 285, in _build_rollout                                                                            
(TaskRunner pid=229016)     return rollout, sharding_manager                                                                                                                         
(TaskRunner pid=229016) UnboundLocalError: local variable 'rollout' referenced before assignment 
```
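
The failure mode in the traceback above, and the intended behaviour, can be sketched as follows. The function name, the string placeholders, and the single supported rollout name are assumptions for illustration, not the contents of `megatron_workers.py`.

```python
def build_rollout(rollout_name):
    """Return (rollout, sharding_manager), or fail loudly for unsupported names."""
    if rollout_name == "vllm":
        # Placeholders standing in for the real rollout objects.
        rollout = "vllm-rollout"
        sharding_manager = "vllm-sharding-manager"
    else:
        # Without this branch, `rollout` is never assigned and the return
        # statement raises UnboundLocalError instead of a clear message.
        raise NotImplementedError(
            f"rollout '{rollout_name}' is not supported with this backend"
        )
    return rollout, sharding_manager


try:
    build_rollout("hf")
except NotImplementedError as exc:
    print(exc)
```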
2025-04-03 17:17:04 +08:00
b0e0ac5da7 docs: add config docs for evaluation.yaml (#886)
https://github.com/volcengine/verl/pull/777#discussion_r2024195591
2025-04-03 17:16:30 +08:00
8cae42dc29 fix: misleading eos_mask->response_mask (#878)
https://github.com/volcengine/verl/pull/868#discussion_r2024416560
2025-04-03 13:01:07 +08:00
HL
7895c1f472 docs: add megatron grpo qwen2 training logs (#881) 2025-04-03 13:00:24 +08:00
HL
81a15ed78a revert: "Use Mcore GPTModel" (#883)
Reverts volcengine/verl#706 temporarily as it breaks CI 

https://github.com/volcengine/verl/actions/runs/14220739954/attempts/2

```
(TaskRunner pid=10086) 'Initial validation metrics: {}'
(TaskRunner pid=10086) step:0
(TaskRunner pid=10086) list(reward_extra_infos_dict.keys())=[]
(TaskRunner pid=10086) test_gen_batch meta info: {'eos_token_id': 32021, 'pad_token_id': 32014, 'recompute_log_prob': False, 'do_sample': False, 'validate': True}
(TaskRunner pid=10086) validation generation end
(TaskRunner pid=10086) [prompt] You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer
(TaskRunner pid=10086) ### Instruction:
(TaskRunner pid=10086) 
Training Progress:  33%|███▎      | 1/3 [02:39<05:18, 159.11s/it]
(WorkerDict pid=18977) /root/miniconda3/lib/python3.10/site-packages/torch/autograd/graph.py:768: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) [repeated 7x across cluster]
(WorkerDict pid=18977)   return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass [repeated 7x across cluster]
(TaskRunner pid=10086) 
Training Progress:  33%|███▎      | 1/3 [04:51<09:43, 291.93s/it]
(WorkerDict pid=18980) [rank4]:[E402 16:49:38.988158820 ProcessGroupNCCL.cpp:1515] [PG 97 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
(WorkerDict pid=18980) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(WorkerDict pid=18980) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(WorkerDict pid=18980) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(WorkerDict pid=18980) 
(WorkerDict pid=18980) Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
(WorkerDict pid=18980) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fc6e4177f86 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10.so)
(WorkerDict pid=18980) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fc6e4126d10 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10.so)
(WorkerDict pid=18980) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fc6e4594f08 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
(WorkerDict pid=18980) frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fc6927d2a56 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
(WorkerDict pid=18980) frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7fc6927d7c70 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
(WorkerDict pid=18980) frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7fc6927de92a in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
(WorkerDict pid=18980) frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7fc6927e0d6c in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
(WorkerDict pid=18980) frame #7: <unknown function> + 0xdbbf4 (0x7fc9fd477bf4 in /root/miniconda3/bin/../lib/libstdc++.so.6)
(WorkerDict pid=18980) frame #8: <unknown function> + 0x94ac3 (0x7fc9ff2f0ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
(WorkerDict pid=18980) frame #9: clone + 0x44 (0x7fc9ff381a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
(WorkerDict pid=18980) 
(WorkerDict pid=18980) [2025-04-02 16:49:38,666 E 18980 20767] logging.cc:97: Unhandled exception: N3c1016DistBackendErrorE. what(): [PG 97 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
(WorkerDict pid=18980) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(WorkerDict pid=18980) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(WorkerDict pid=18980) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(WorkerDict pid=18980) 
(WorkerDict pid=18980) Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
(WorkerDict pid=18980) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fc6e4177f86 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10.so)
(WorkerDict pid=18980) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fc6e4126d10 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10.so)
(WorkerDict pid=18980) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fc6e4594f08 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
(WorkerDict pid=18980) frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fc6927d2a56 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
(WorkerDict pid=18980) frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7fc6927d7c70 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
(WorkerDict pid=18980) frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7fc6927de92a in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
(WorkerDict pid=18980) frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7fc6927e0d6c in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
(WorkerDict pid=18980) frame #7: <unknown function> + 0xdbbf4 (0x7fc9fd477bf4 in /root/miniconda3/bin/../lib/libstdc++.so.6)
(WorkerDict pid=18980) frame #8: <unknown function> + 0x94ac3 (0x7fc9ff2f0ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
(WorkerDict pid=18980) frame #9: clone + 0x44 (0x7fc9ff381a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
(WorkerDict pid=18980) 
(WorkerDict pid=18980) Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1521 (most recent call first):
(WorkerDict pid=18980) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fc6e4177f86 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10.so)
(WorkerDict pid=18980) frame #1: <unknown function> + 0xe1a5e4 (0x7fc6924625e4 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
(WorkerDict pid=18980) frame #2: <unknown function> + 0xdbbf4 (0x7fc9fd477bf4 in /root/miniconda3/bin/../lib/libstdc++.so.6)
(WorkerDict pid=18980) frame #3: <unknown function> + 0x94ac3 (0x7fc9ff2f0ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
(WorkerDict pid=18980) frame #4: clone + 0x44 (0x7fc9ff381a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
(WorkerDict pid=18980) 
(WorkerDict pid=18980) [2025-04-02 16:49:38,675 E 18980 20767] logging.cc:104: Stack trace: 
(WorkerDict pid=18980)  /root/miniconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xfe543a) [0x7fc9fe5a143a] ray::operator<<()
(WorkerDict pid=18980) /root/miniconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xfe7b78) [0x7fc9fe5a3b78] ray::TerminateHandler()
(WorkerDict pid=18980) /root/miniconda3/bin/../lib/libstdc++.so.6(+0xb135a) [0x7fc9fd44d35a] __cxxabiv1::__terminate()
(WorkerDict pid=18980) /root/miniconda3/bin/../lib/libstdc++.so.6(+0xb13c5) [0x7fc9fd44d3c5]
(WorkerDict pid=18980) /root/miniconda3/bin/../lib/libstdc++.so.6(+0xb134f) [0x7fc9fd44d34f]
(WorkerDict pid=18980) /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so(+0xe1a695) [0x7fc692462695] c10d::ProcessGroupNCCL::ncclCommWatchdog()
(WorkerDict pid=18980) /root/miniconda3/bin/../lib/libstdc++.so.6(+0xdbbf4) [0x7fc9fd477bf4] execute_native_thread_routine
(WorkerDict pid=18980) /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fc9ff2f0ac3]
(WorkerDict pid=18980) /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7fc9ff381a04] __clone
(WorkerDict pid=18980) 
(WorkerDict pid=18980) *** SIGABRT received at time=1743612578 on cpu 118 ***
(WorkerDict pid=18980) PC: @     0x7fc9ff2f29fc  (unknown)  pthread_kill
(WorkerDict pid=18980)     @     0x7fc9ff29e520  (unknown)  (unknown)
(WorkerDict pid=18980) [2025-04-02 16:49:38,675 E 18980 20767] logging.cc:361: *** SIGABRT received at time=1743612578 on cpu 118 ***
(WorkerDict pid=18980) [2025-04-02 16:49:38,675 E 18980 20767] logging.cc:361: PC: @     0x7fc9ff2f29fc  (unknown)  pthread_kill
(WorkerDict pid=18980) [2025-04-02 16:49:38,675 E 18980 20767] logging.cc:361:     @     0x7fc9ff29e520  (unknown)  (unknown)
(WorkerDict pid=18980) Fatal Python error: Aborted
(WorkerDict pid=18980) 
(WorkerDict pid=18980) 
(WorkerDict pid=18980) Extension modules: msgpack._cmsgpack, google._upb._message, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, _brotli, zstandard.backend_c, uvloop.loop, ray._raylet, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, markupsafe._speedups, PIL._imaging, msgspec._core, sentencepiece._sentencepiece, PIL._imagingft, regex._regex, multidict._multidict, yarl._helpers_c, yarl._quoting_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, pyarrow._json, 
zmq.backend.cython.context, zmq.backend.cython.message, zmq.backend.cython.socket, zmq.backend.cython._device, zmq.backend.cython._poll, zmq.backend.cython._proxy_steerable, zmq.backend.cython._version, zmq.backend.cython.error, zmq.backend.cython.utils (total: 96)
Error executing job with overrides: ['algorithm.adv_estimator=gae', 'data.train_files=/github/home/data/gsm8k/train.parquet', 'data.val_files=/github/home/data/gsm8k/test.parquet', 'data.train_batch_size=1024', 'data.max_prompt_length=512', 'data.max_response_length=512', 'actor_rollout_ref.model.path=/github/home/models/deepseek-ai/deepseek-coder-1.3b-instruct', 'actor_rollout_ref.actor.optim.lr=2e-6', 'actor_rollout_ref.actor.ppo_mini_batch_size=256', 'actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4', 'actor_rollout_ref.actor.megatron.pipeline_model_parallel_size=2', 'actor_rollout_ref.actor.megatron.virtual_pipeline_model_parallel_size=2', 'actor_rollout_ref.actor.megatron.tensor_model_parallel_size=4', 'actor_rollout_ref.actor.use_kl_loss=False', 'actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8', 'actor_rollout_ref.rollout.tensor_model_parallel_size=2', 'actor_rollout_ref.rollout.name=vllm', 'actor_rollout_ref.rollout.gpu_memory_utilization=0.5', 'actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16', 'actor_rollout_ref.ref.megatron.pipeline_model_parallel_size=2', 'actor_rollout_ref.ref.megatron.virtual_pipeline_model_parallel_size=2', 'actor_rollout_ref.ref.megatron.tensor_model_parallel_size=2', 'critic.optim.lr=2e-5', 'critic.model.path=/github/home/models/deepseek-ai/deepseek-coder-1.3b-instruct', 'critic.model.enable_gradient_checkpointing=False', 'critic.ppo_micro_batch_size_per_gpu=4', 'critic.megatron.pipeline_model_parallel_size=2', 'critic.megatron.virtual_pipeline_model_parallel_size=2', 'critic.megatron.tensor_model_parallel_size=2', 'algorithm.use_kl_in_reward=True', 'algorithm.kl_penalty=kl', 'algorithm.kl_ctrl.kl_coef=0.001', 'trainer.critic_warmup=0', 'trainer.logger=[console]', 'trainer.project_name=verl_megatron_gsm8k_examples', 'trainer.experiment_name=deepseek_llm_1b3_function_rm', 'trainer.n_gpus_per_node=8', 'trainer.nnodes=1', 'trainer.save_freq=-1', 'trainer.test_freq=1', 'trainer.total_epochs=15', 
'trainer.total_training_steps=3']
(TaskRunner pid=10086) Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market? Let's think step by step and output the final answer after "####".
(TaskRunner pid=10086) ### Response:
(TaskRunner pid=10086) 
(TaskRunner pid=10086) [response] I'm sorry, but as an AI programming assistant, I'm specialized in answering questions related to computer science. I'm not equipped to provide answers to questions about economics or business calculations. I recommend using a calculator or a business-oriented tool for this type of question.
(TaskRunner pid=10086) 
(TaskRunner pid=10086) [ground_truth] 18
(TaskRunner pid=10086) [score] 0.0
(TaskRunner pid=10086) step:1 - global_seqlen/min:48635.000 - global_seqlen/max:51694.000 - global_seqlen/minmax_diff:3059.000 - global_seqlen/balanced_min:49636.000 - global_seqlen/balanced_max:49637.000 - global_seqlen/mean:49636.125 - actor/reward_kl_penalty:0.000 - actor/reward_kl_penalty_coeff:0.001 - critic/vf_loss:0.015 - critic/vf_clipfrac:0.001 - critic/vpred_mean:0.007 - perf/mfu/critic:0.105 - actor/entropy_loss:0.550 - actor/pg_loss:-0.000 - actor/pg_clipfrac:0.018 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - perf/mfu/actor:0.106 - critic/score/mean:0.000 - critic/score/max:0.000 - critic/score/min:0.000 - critic/rewards/mean:0.000 - critic/rewards/max:0.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.000 - critic/advantages/max:4.994 - critic/advantages/min:-5.666 - critic/returns/mean:-0.000 - critic/returns/max:0.000 - critic/returns/min:-0.000 - critic/values/mean:-0.164 - critic/values/max:0.785 - critic/values/min:-1.000 - critic/vf_explained_var:-2803.085 - response_length/mean:239.112 - response_length/max:512.000 - response_length/min:11.000 - response_length/clip_ratio:0.029 - prompt_length/mean:148.670 - prompt_length/max:275.000 - prompt_length/min:106.000 - prompt_length/clip_ratio:0.000 - timing_s/gen:18.608 - timing_s/old_log_prob:15.249 - timing_s/ref:14.488 - timing_s/values:16.315 - timing_s/adv:0.264 - timing_s/update_critic:33.651 - timing_s/update_actor:33.472 - timing_s/testing:25.497 - timing_s/step:157.587 - timing_per_token_ms/adv:0.001 - timing_per_token_ms/gen:0.076 - timing_per_token_ms/update_actor:0.084 - timing_per_token_ms/values:0.041 - timing_per_token_ms/update_critic:0.085 - timing_per_token_ms/ref:0.036 - perf/total_num_tokens:397089.000 - perf/time_per_step:157.587 - perf/throughput:314.976
(TaskRunner pid=10086) list(reward_extra_infos_dict.keys())=[]
(TaskRunner pid=10086) test_gen_batch meta info: {'eos_token_id': 32021, 'pad_token_id': 32014, 'recompute_log_prob': False, 'do_sample': False, 'validate': True}
(WorkerDict pid=18980) WARNING 04-02 16:49:38 model_runner_base.py:143] Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered
(WorkerDict pid=18980) WARNING 04-02 16:49:38 model_runner_base.py:143] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(WorkerDict pid=18980) WARNING 04-02 16:49:38 model_runner_base.py:143] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(WorkerDict pid=18980) WARNING 04-02 16:49:38 model_runner_base.py:143] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(WorkerDict pid=18980) WARNING 04-02 16:49:38 model_runner_base.py:143] 
Traceback (most recent call last):
  File "/data00/tiger/huggingface/verl/verl/verl/trainer/main_ppo.py", line 54, in main
    run_ppo(config)
  File "/data00/tiger/huggingface/verl/verl/verl/trainer/main_ppo.py", line 72, in run_ppo
    ray.get(runner.run.remote(config))
  File "/root/miniconda3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 2667, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/root/miniconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 864, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::TaskRunner.run() (pid=10086, ip=172.20.0.2, actor_id=11bc451866f5759f3a7f540501000000, repr=<main_ppo.TaskRunner object at 0x7fd00c61a110>)
  File "/data00/tiger/huggingface/verl/verl/verl/trainer/main_ppo.py", line 184, in run
    trainer.fit()
  File "/data00/tiger/huggingface/verl/verl/verl/trainer/ppo/ray_trainer.py", line 950, in fit
    val_metrics: dict = self._validate()
  File "/data00/tiger/huggingface/verl/verl/verl/trainer/ppo/ray_trainer.py", line 545, in _validate
    test_output_gen_batch_padded = self.actor_rollout_wg.generate_sequences(test_gen_batch_padded)
  File "/data00/tiger/huggingface/verl/verl/verl/single_controller/ray/base.py", line 42, in func
    output = ray.get(output)
ray.exceptions.RayTaskError(RuntimeError): ray::WorkerDict.actor_rollout_generate_sequences() (pid=18980, ip=172.20.0.2, actor_id=4f21075809bd462a5907ebea01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7fc62ae1ce20>)
  File "/root/miniconda3/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1708, in execute_model
    output: SamplerOutput = self.model.sample(
  File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 571, in sample
    next_tokens = self.sampler(logits, sampling_metadata)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 231, in forward
    self._init_sampling_tensors(logits, sampling_metadata)
  File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 195, in _init_sampling_tensors
    do_min_p) = SamplingTensors.from_sampling_metadata(
  File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 471, in from_sampling_metadata
    sampling_tensors = SamplingTensors.from_lists(
  File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 529, in from_lists
    temperatures_t = torch.tensor(
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
2025-04-03 07:38:21 +08:00
233c11c173 [recipe] r1: support R1 Benchmark Evaluation (#777)
https://github.com/volcengine/verl/issues/708

Supported evaluations:
- [x] GPQA Diamond (english)
- [x] LiveCodeBench (code)
- [x] AIME 2024 (math)
- [x] CNMO 2024 (math)

Test
- [x] DS-R1-Distill-Qwen2.5-1.5B
- [x] DS-R1

---

Example Eval Scripts in `recipes/r1/run_r1_distill_qwen.sh`

---

Eval Results of DS-R1-Distill-Qwen2.5-1.5B (k=8)

Dataset | Test Results | Reported
-- | -- | --
GPQA Diamond | 35.3 | 33.8
LiveCodeBench | 16.9 | 16.9
AIME 2024 | 30.4 | 28.9
CNMO 2024 (en) | 45.1 | -
CNMO 2024 (zh) | 41.0 | -

---

Eval Results (DS-R1)

Dataset | Test Results (k=1) | Test Results (k=4) | Reported
-- | -- | -- | --
GPQA Diamond | 67.7 | 69.6 | 71.5
LiveCodeBench | 64.7 | 63.1 | 65.9
AIME 2024 | 86.7 | 79.2 | 79.8
CNMO 2024 | 75.0 | 78.5 | 78.8

The final eval results will be placed
[here](https://huggingface.co/datasets/dyyyyyyyy/r1-benchmark-eval).
2025-04-02 10:02:15 -07:00
b6cd6b759e Use Mcore GPTModel (#706)
Use official GPTModel in megatron worker, supporting actor and critic workers.
2025-04-02 15:19:28 +08:00
4fec38c5b3 [fix] add dpsk v3 type in config for mfu compute (#872)
The MFU has been tested with the relevant model and can be calculated
normally.
2025-04-02 11:45:45 +08:00
6272b8ce1d Implement Dual-Clip PPO Algorithm (#784)
Add the [Dual-Clip PPO](https://arxiv.org/pdf/1912.09729) algorithm to
enhance the current PPO implementations. Dual-Clip PPO adds a second
clip that applies when the advantage is less than zero, so that the
objective, even when the advantage is multiplied by a huge ratio, does
not fall below a specified lower bound. The concept is illustrated in
the figure below:
<img width="626" alt="Clipboard_Screenshot_1743047374"
src="https://github.com/user-attachments/assets/93952edc-30c8-477e-bc3d-4770fabe55b8"
/>
So the final PPO loss is
<img width="624" alt="Clipboard_Screenshot_1743047410"
src="https://github.com/user-attachments/assets/5900490b-f64a-4bde-87d6-8359615b3337"
/>
This adjustment leads to a modified final loss calculation for the PPO,
which could potentially improve training stability and performance in
certain scenarios. I believe integrating this feature could provide
significant benefits, and I look forward to feedback on this suggestion.
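
As a rough illustration of the idea described above, the following is a minimal per-token sketch in plain Python, not the code added by this PR. The function name and the parameters `clip_eps` and `clip_c` are hypothetical, and the default values are only illustrative.

```python
def dual_clip_ppo_loss(ratio, advantage, clip_eps=0.2, clip_c=3.0):
    """Per-token dual-clip PPO loss (a quantity to minimize).

    Hypothetical sketch: `clip_eps` is the usual PPO clip range and
    `clip_c` (> 1) is the extra dual-clip constant.
    """
    # Standard PPO: take the pessimistic (larger) of the two losses.
    unclipped = -advantage * ratio
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    clipped = -advantage * clipped_ratio
    standard = max(unclipped, clipped)

    if advantage < 0:
        # Dual clip: with a negative advantage and a huge ratio the
        # standard loss is unbounded, so cap it at -clip_c * advantage.
        return min(standard, -advantage * clip_c)
    return standard
```

With `ratio=10` and `advantage=-1`, the standard PPO loss would be 10, while the dual-clip version caps it at `clip_c * 1 = 3`.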
2025-04-02 10:13:22 +08:00
05bdeadc3d [misc] add trust_remote_code param for loading custom tokenizer (#865) 2025-04-02 07:49:17 +08:00
45e02c88e0 Docs: add Rec-R1 in Readme (#869) 2025-04-01 15:19:39 -07:00
437e96bc02 [sglang] doc: Update the SGLang installation instructions to the latest version. (#867) 2025-04-01 09:47:06 -07:00
9dcac14f1b [critic] fix: normalize mini batch size for critic (#853)
To keep consistent with 816dacc7da/verl/workers/fsdp_workers.py (L117)
2025-04-01 19:20:05 +08:00
776b0a9ddc docs: improve installation and ulysses docs (#854) 2025-04-01 10:37:35 +08:00
072fc9feed feat: support no reference model; fix KL issues (#644)
### Before getting started

Difference between KL penalty in reward and KL loss

>  [!TIP]
>
>  1. In-reward KL penalty
>
>
>  $$
> r_t = r_{\varphi}(q, o_{\leq t}) - \beta\ \boxed{\log
\frac{\pi_{\theta}(o_t | q, o_{<t})}{\pi_{\text{ref}}(o_t | q, o_{<t})}}
>  $$
>
>  2. KL Loss
>
>  $$
> L^{\text{PPO}}(\theta) = \mathbb{E}_t [ \min(ratio_t A_t,
> \text{clip}(ratio_t, 1 - \epsilon, 1 + \epsilon) A_t) ] - \beta\ \boxed{D_{\text{KL}}(\pi_{\theta} || \pi_{\text{ref}})}
>  $$
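
To make the difference between the two terms concrete, here is a small plain-Python sketch, not verl's implementation. The function names are hypothetical, and the per-token log-ratio is used as a simple stand-in for the KL estimate, which is an assumption of this example.

```python
def in_reward_kl_penalty(token_scores, logp_actor, logp_ref, beta):
    """In-reward KL: r_t = score_t - beta * (log pi_theta - log pi_ref).

    The penalty changes the reward, so it reaches the policy only
    through the advantages computed from that reward.
    """
    return [
        score - beta * (la - lr)
        for score, la, lr in zip(token_scores, logp_actor, logp_ref)
    ]


def kl_loss_term(logp_actor, logp_ref, beta):
    """KL loss: beta * mean per-token KL estimate.

    This term is subtracted from the PPO objective directly
    (equivalently, added to the loss that is minimized).
    """
    kls = [la - lr for la, lr in zip(logp_actor, logp_ref)]
    return beta * sum(kls) / len(kls)
```

The two functions take the same log-probabilities but feed different parts of the pipeline, which is why using both, or neither, is a meaningful configuration.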

### Problems

1. The current code doesn't support running without a reference model

This feature has been half-implemented since the very first commit but
was never completed. For example, `RayPPOTrainer` has an attribute
`use_reference_policy`, but it is always True because
`role_worker_mapping` always contains `Role.RefPolicy`.

2. Restriction of `use_kl_loss` 

Currently, `use_kl_loss` determines whether to use the in-reward KL
penalty or the KL loss, so we cannot use **both or neither**.


87a813658f/verl/trainer/ppo/ray_trainer.py (L875-L879)


87a813658f/verl/workers/actor/dp_actor.py (L299-L307)

>  [!CAUTION]  
>
>  ### You may have unintentionally adopted in-reward KL penalty
>
> For the experiments you've conducted, if you set
`actor.use_kl_loss=False` or didn't set it (the default is False), ***you
unintentionally used the in-reward KL penalty.*** If you don't want any
KL term, after this commit you should set
`actor_rollout_ref.actor.use_kl_loss=False` and
`algorithm.use_kl_in_reward=False` (or leave them unset, since these are
the defaults).

3. Deprecated config

After investigation, I guess the Critic used to be responsible for the
in-reward KL, but this feature seems to be defunct.

1. At line 290, there may once have been
`config.algorithm.kl_ctrl.target_kl` and `config.critic.kl_ctrl.horizon`,
which are not currently supported.


3ec83117c3/verl/trainer/ppo/ray_trainer.py (L289-L293)

2. In `verl/workers/critic/megatron_critic.py`: redundant assignment of
`self.kl_ctrl`


3b18b0eb74/verl/workers/critic/megatron_critic.py (L69-L73)


### What’s Changed?

1. Added support for not using a reference model.
2. Fixed the incomplete code of the KL controller.
3. Added a test case that uses both KL terms.
4. Fixed some other miscellaneous issues in the code.

### How to disable the reference model

* Set `actor_rollout_ref.actor.use_kl_loss=False` and
`algorithm.use_kl_in_reward=False` (both are False by default, so you
can simply leave them unset)
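
The rule can be summarized as: a reference model is needed only if at least one KL term is enabled. The helper below is a hypothetical sketch of that rule, assuming the config is available as a nested dict; it is not the actual trainer code.

```python
def need_reference_policy(config: dict) -> bool:
    """Return True if either KL term is enabled.

    Hypothetical helper; `config` is assumed to be a nested dict that
    mirrors the two option names used in this commit message.
    """
    use_kl_loss = (
        config.get("actor_rollout_ref", {})
        .get("actor", {})
        .get("use_kl_loss", False)
    )
    use_kl_in_reward = config.get("algorithm", {}).get("use_kl_in_reward", False)
    return bool(use_kl_loss or use_kl_in_reward)
```

Both options default to False in this sketch, so an empty config means no reference model is required.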
2025-04-01 10:14:38 +08:00
c0621e1bcd [ulysses] fix: repeat kv heads by sp_size//nheads_k if nheads_k is less than sp_size (#850) 2025-03-31 16:25:53 -07:00
77babf1956 [BREAKING] feat: support custom datasets for SFT trainer (#832)
This PR breaks the SFTDataset interface, but provides more flexibility
on dataset type and arguments passed in.
Usage:
```
--data.custom_cls.path=/path/to/dataset.py --data.custom_cls.name=MyDataset
```
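
For readers unfamiliar with this pattern, a path-plus-class-name option is typically resolved by importing the file as a module and looking up the class by name. The following is a generic sketch using only the standard library; the function name `load_custom_class` is hypothetical and this is not necessarily how the SFT trainer implements it.

```python
import importlib.util


def load_custom_class(path: str, name: str):
    """Import the Python file at `path` and return the attribute `name`.

    Hypothetical helper illustrating how a `path` / `name` pair such as
    `data.custom_cls.path` and `data.custom_cls.name` could be resolved.
    """
    spec = importlib.util.spec_from_file_location("custom_dataset_module", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    if not hasattr(module, name):
        raise AttributeError(f"{path!r} does not define {name!r}")
    return getattr(module, name)
```

A call such as `load_custom_class("/path/to/dataset.py", "MyDataset")` would then return the class object, which the caller can instantiate with its own arguments.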
2025-04-01 05:36:33 +08:00
d5fbf42b67 [doc] add log_val_generations in trainer (#844) 2025-03-31 12:22:40 -07:00
816dacc7da [doc] feat: doc for val_before_train (#840) 2025-03-31 09:38:15 -07:00
1f78e8b09c [fix] Add param to resolve custom model loading failure (#845) 2025-03-31 19:25:25 +08:00
a03a72a35a [doc] fix: typo for REINFORCE (#846) 2025-03-31 19:24:53 +08:00
7646e08fca [example] rollout: add vllm 0.8.2 mutli nodes generation bash (#838) 2025-03-30 23:07:42 -07:00
64bddb68f5 [BREAKING config] fix: move val_before_train to config yaml. Using trainer.val_before_train instead of +trainer.val_before_train going forward (#820) 2025-03-30 23:05:48 -07:00
7fbf609197 [BREAKING config] feat: add mlflow val generation log and uri config (#822)
### Changes

- Add mlflow validation generation in `ValidationGenerationsLogger` in
the form of MLFlow artifact files.
- Add the config of `MLFLOW_TRACKING_URI` in mlflow tracking. 
- rename `val_generations_to_log_to_wandb` to `log_val_generations`

### Test
Tested on self-hosted MLflow servers.
2025-03-30 08:44:36 -07:00
0e99caa2b3 docs: add PURE to README.md (#826)
add our work,
[PURE](https://tungsten-ink-510.notion.site/Stop-Gamma-Decay-Min-Form-Credit-Assignment-Is-All-Process-Reward-Model-Needs-for-Reasoning-19fcb6ed0184804eb07fd310b38af155?pvs=4),
to the "Awesome work using verl" section in README
2025-03-30 15:38:10 +00:00
5138a22c66 [sglang] fix: add memory saver support to sglang rollout to avoid OOMs (#756)
as title

---------

Co-authored-by: ocss884 <ocss.lin@gmail.com>
2025-03-30 08:36:16 -07:00
ccab83654c Megatron checkpoint default not save hf_models, and provide model merge tool. (#780)
Because CI is too slow, the checkpoint features and functions are
combined here in one PR.

# Add Layer idx to decode layers

It is hard to attach a "correct" layer number to each layer: in verl's
current Megatron implementation, the layers of each pp and vpp rank
start from index 0, which is inconvenient for the merging tool.

The difficulty mainly comes from the `torch.nn.ModuleList` implementation,
[which suggests, and in effect forces, direct use of the index rather
than a custom layer
number](8a40fca9a1/torch/nn/modules/container.py (L302C5-L324C66)).

The current solution is to rewrite each layer number to its actual
number, starting from the pp and vpp offset, when saving a Megatron
checkpoint, and to restore the local index when loading. The merging
tool then needs no extra scans.
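
The offset arithmetic can be sketched as follows. This example assumes an interleaved layout in which each (pp rank, vpp rank) pair holds one contiguous chunk of layers and chunks are ordered first by vpp rank, then by pp rank; the function names are hypothetical and the layout is an assumption, not taken from this PR.

```python
def global_layer_index(local_idx, pp_rank, vpp_rank, num_layers, pp_size, vpp_size):
    """Map a chunk-local layer index (starting at 0) to a global one.

    Assumed layout: layers are split evenly into pp_size * vpp_size
    chunks, ordered by vpp rank first and pp rank second.
    """
    layers_per_chunk = num_layers // (pp_size * vpp_size)
    if not 0 <= local_idx < layers_per_chunk:
        raise ValueError("local_idx is outside this chunk")
    offset = vpp_rank * (num_layers // vpp_size) + pp_rank * layers_per_chunk
    return offset + local_idx


def local_layer_index(global_idx, num_layers, pp_size, vpp_size):
    """Inverse mapping used when loading: recover the chunk-local index."""
    layers_per_chunk = num_layers // (pp_size * vpp_size)
    return global_idx % layers_per_chunk
```

For 8 layers with `pp_size=2` and `vpp_size=2`, the four chunks cover global layers 0-1, 2-3, 4-5 and 6-7, so every saved layer gets a unique number.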

# Huggingface Model loader logic simplified

Since every rank has access to `state_dict`, there is no need to
broadcast all the weights from rank 0 across the mp and dp groups. The
previous implementation was too costly and could cause OOM issues,
because each rank could take up space for the whole model on the GPU.

The loader logic was also not straightforward: each rank only needs to
load its own vpp_size chunks of layers, so there is no reason to iterate
over the whole num_layers.

So the current solution is for every rank to load its own sharded
weights from `state_dict`.

This requires storage nodes that every compute node can connect to. For
those who can only store the huggingface model on rank 0, the original
implementation is kept in a deprecated file alongside the new version.

# Modify test scripts to reuse downloaded huggingface model

Avoid errors when connecting with huggingface to access metadata.

# Modify CI workflows to enable load-balance of CI machines

Currently L20-0 takes six more jobs than L20-1; this change tries to
reduce the pipeline bubble of each task.
2025-03-30 10:39:40 +08:00
797f9994b7 Fix typo on installation guide (#813)
Modify the version number of Megatron-llm from ``core_v0.11.0`` to
``core_r0.11.0``
2025-03-29 17:27:10 +08:00
0cf4ca4757 [misc] add deepseek v3 flops compute func (#814) 2025-03-29 17:26:41 +08:00
f3913d0014 [megatron] fix: remove redundant return value for hf_config (#722) 2025-03-28 21:53:54 -07:00
50cba4aab9 docs: update checkpoint doc (#800)
Also fix some APIs.
2025-03-28 21:27:01 -07:00
4f32b32c99 ci/cd: add pylint to CI (#811)
* add a workflow to run pylint
* add a section to `pyproject.toml` that blacklists all rules which
would trigger given the current code
* pin a version of pylint in `requirements.txt` for reproducibility

In a followup PR I will remove some rules from the blacklist and fix
some bugs.
2025-03-28 14:59:38 -07:00
093e9599dd [trainer] fix: skip the update step when encountering gradient overflow (#789)
Model training may crash due to issues such as mixed-precision updates
or corrupted data. To prevent abnormal updates, this change checks
grad_norm when updating the model and skips the step on overflow, which
may serve as a temporary fix. However, if similar issues occur
frequently, the data and loss design need further investigation for
more thorough troubleshooting.
Covers: #637 #747 #751
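
The check itself is small. The following framework-free sketch shows the control flow, assuming the gradient norm is already available as a Python float; `apply_step` and `zero_grad` are hypothetical callbacks standing in for the optimizer calls.

```python
import math


def guarded_update(grad_norm: float, apply_step, zero_grad) -> bool:
    """Apply the optimizer step only if the gradient norm is finite.

    Hypothetical sketch: returns True if the step was applied and
    False if it was skipped because of inf/NaN gradients.
    """
    if not math.isfinite(grad_norm):
        zero_grad()  # discard the bad gradients instead of applying them
        return False
    apply_step()
    return True
```

Skipped steps are worth logging, since a high skip rate points to the data or loss problems mentioned above.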
2025-03-28 09:48:20 -07:00
52e80fc143 Fix padding length for sglang rollout in veRL (#773)
Fixed a portion of the issues encountered during VLM GPTO training as
mentioned in the article.

https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/veRL-VLM.md

When do_sample=False, the model replicas on different DP ranks output
sequences of different lengths, which may then be padded to different
lengths. This ultimately causes shape mismatches when the outputs are
gathered and results in errors in collect_dp_compute_data_proto. The
following situation occurred:
```
DataProto(batch=TensorDict(
    fields={
        attention_mask: Tensor(shape=torch.Size([151, 5120]), device=cpu, dtype=torch.int64, is_shared=False),
        input_ids: Tensor(shape=torch.Size([151, 5120]), device=cpu, dtype=torch.int64, is_shared=False),
        position_ids: Tensor(shape=torch.Size([151, 5120]), device=cpu, dtype=torch.int64, is_shared=False),
        prompts: Tensor(shape=torch.Size([151, 1024]), device=cpu, dtype=torch.int64, is_shared=False),
        responses: Tensor(shape=torch.Size([151, 4096]), device=cpu, dtype=torch.int64, is_shared=False)},
    batch_size=torch.Size([151]),
    device=cpu,
    is_shared=False), non_tensor_batch={}, meta_info={}), DataProto(batch=TensorDict(
    fields={
        attention_mask: Tensor(shape=torch.Size([151, 5120]), device=cpu, dtype=torch.int64, is_shared=False),
        input_ids: Tensor(shape=torch.Size([151, 5120]), device=cpu, dtype=torch.int64, is_shared=False),
        position_ids: Tensor(shape=torch.Size([151, 5120]), device=cpu, dtype=torch.int64, is_shared=False),
        prompts: Tensor(shape=torch.Size([151, 1024]), device=cpu, dtype=torch.int64, is_shared=False),
        responses: Tensor(shape=torch.Size([151, 4096]), device=cpu, dtype=torch.int64, is_shared=False)},
    batch_size=torch.Size([151]),
    device=cpu,
    is_shared=False), non_tensor_batch={}, meta_info={}), DataProto(batch=TensorDict(
    fields={
        attention_mask: Tensor(shape=torch.Size([151, 3072]), device=cpu, dtype=torch.int64, is_shared=False),
        input_ids: Tensor(shape=torch.Size([151, 3072]), device=cpu, dtype=torch.int64, is_shared=False),
        position_ids: Tensor(shape=torch.Size([151, 3072]), device=cpu, dtype=torch.int64, is_shared=False),
        prompts: Tensor(shape=torch.Size([151, 1024]), device=cpu, dtype=torch.int64, is_shared=False),
        responses: Tensor(shape=torch.Size([151, 2048]), device=cpu, dtype=torch.int64, is_shared=False)},
    batch_size=torch.Size([151]),
    device=cpu,
    is_shared=False), non_tensor_batch={}, meta_info={}), DataProto(batch=TensorDict(
    fields={
        attention_mask: Tensor(shape=torch.Size([151, 3072]), device=cpu, dtype=torch.int64, is_shared=False),
        input_ids: Tensor(shape=torch.Size([151, 3072]), device=cpu, dtype=torch.int64, is_shared=False),
        position_ids: Tensor(shape=torch.Size([151, 3072]), device=cpu, dtype=torch.int64, is_shared=False),
        prompts: Tensor(shape=torch.Size([151, 1024]), device=cpu, dtype=torch.int64, is_shared=False),
        responses: Tensor(shape=torch.Size([151, 2048]), device=cpu, dtype=torch.int64, is_shared=False)},
    batch_size=torch.Size([151]),
    device=cpu,
    is_shared=False), non_tensor_batch={}, meta_info={})]
```

This modification resolves this issue.
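
The failure mode above can be illustrated with a small framework-free sketch. This is not the change made in this commit: it uses plain Python lists in place of tensors, and the helper names `pad_chunks_to_common_length` and `concat_chunks` are hypothetical. It only shows why chunks padded to different lengths cannot be concatenated along the batch dimension until they share one sequence length.

```python
from typing import List

# One chunk = rows of token ids returned by one data-parallel worker (assumed layout).
Chunk = List[List[int]]


def pad_chunks_to_common_length(chunks: List[Chunk], pad_value: int = 0) -> List[Chunk]:
    """Right-pad every row so that all chunks share the longest row length."""
    target = max((len(row) for chunk in chunks for row in chunk), default=0)
    return [
        [row + [pad_value] * (target - len(row)) for row in chunk]
        for chunk in chunks
    ]


def concat_chunks(chunks: List[Chunk]) -> Chunk:
    """Concatenate chunks along the batch dimension, rejecting mixed widths."""
    widths = {len(row) for chunk in chunks for row in chunk}
    if len(widths) > 1:
        raise ValueError(f"inconsistent sequence lengths: {sorted(widths)}")
    return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    worker_a = [[1, 2, 3, 4], [5, 6, 7, 8]]  # rows padded to length 4
    worker_b = [[9, 10], [11, 12]]           # rows padded to length 2
    merged = concat_chunks(pad_chunks_to_common_length([worker_a, worker_b]))
    print(merged)
```

Calling `concat_chunks` on the unpadded chunks raises `ValueError`, which mirrors the shape mismatch between the `[151, 4096]` and `[151, 2048]` response tensors shown above.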

---------

Co-authored-by: GeLee-Q <8650386969@qq.com>
2025-03-28 22:28:02 +08:00
36a0f06d8a Update README.md (#797)
For bolding some key words in description of MetaSpatial.
2025-03-28 15:18:09 +08:00
1106 changed files with 130713 additions and 32788 deletions

10
.gemini/config.yaml Normal file

@ -0,0 +1,10 @@
have_fun: false
code_review:
disable: false
comment_severity_threshold: HIGH
max_review_comments: -1
pull_request_opened:
help: false
summary: false
code_review: true
ignore_patterns: []

30
.github/CODEOWNERS vendored Normal file

@ -0,0 +1,30 @@
/docs @eric-haibin-lin @zhaochenyang20 @hongpeng-guo
/docs/amd_tutorial @yushengsu-thu
/docs/slang_multiturn @zhaochenyang20 @SwordFaith
/docs/ascend_tutorial @FightingZhen
/recipe/dapo @tongyx361 @PeterSH6 @vermouth1992 @tardis-key @FightingZhen @ji-huazhong
/recipe/spin @zhaochenyang20
/recipe/sppo @zhaochenyang20
/third_party/sglang @zhaochenyang20 @SwordFaith
/third_party/vllm @PeterSH6 @wuxibin89
/examples/grpo_trainer @vermouth1992 @PeterSH6 @tardis-key @FightingZhen @ji-huazhong
/verl/single_controller @zw0610 @wuxibin89 @hongpeng-guo
/verl/trainer @eric-haibin-lin @vermouth1992 @tongyx361 @PeterSH6
/verl/models/mcore @ISEEKYAN @vermouth1992
/verl/models/transformers @vermouth1992 @PeterSH6 @tardis-key @FightingZhen @ji-huazhong
/verl/workers/engine @eric-haibin-lin @vermouth1992 @ZihengJiang
/verl/workers/roles @eric-haibin-lin @vermouth1992 @ZihengJiang
/verl/workers/engine/fsdp @eric-haibin-lin @vermouth1992 @ZihengJiang
/verl/workers/rollout/vllm_rollout @wuxibin89 @PeterSH6 @chenhaiq
/verl/workers/rollout/sglang_rollout @zhaochenyang20 @SwordFaith @chenhaiq
/verl/workers/actor/megatron_actor.py @ISEEKYAN @vermouth1992
/verl/workers/critic/megatron_critic.py @ISEEKYAN @vermouth1992
/verl/workers/megatron_workers.py @ISEEKYAN @vermouth1992
/tests/single_controller @zw0610 @wuxibin89
/tests/trainer @eric-haibin-lin @vermouth1992 @tongyx361 @PeterSH6
/tests/workers/rollout/vllm_rollout @wuxibin89 @PeterSH6 @chenhaiq

65
.github/ISSUE_TEMPLATE/bug-report.yml vendored Normal file

@ -0,0 +1,65 @@
# modified from https://github.com/huggingface/transformers/blob/main/.github/ISSUE_TEMPLATE/bug-report.yml?plain=1
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve verl
labels: [ "bug" ]
body:
- type: markdown
attributes:
value: |
Thanks for taking the time to fill out this bug report! 🤗
- type: textarea
id: system-info
attributes:
label: System Info
description: Please share your system info with us. You can run the command `python scripts/diagnose.py` and copy-paste its output below.
placeholder: verl version, platform, python version, ...
validations:
required: true
- type: checkboxes
id: information-scripts-examples
attributes:
label: Information
description: 'The problem arises when using:'
options:
- label: "The official example scripts"
- label: "My own modified scripts"
- type: checkboxes
id: information-tasks
attributes:
label: Tasks
description: "The tasks I am working on are:"
options:
- label: "An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)"
- label: "My own task or dataset (give details below)"
- type: textarea
id: reproduction
validations:
required: true
attributes:
label: Reproduction
description: |
Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
Please include relevant config information with your code.
If you have code snippets, error messages, stack traces please provide them here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
placeholder: |
Steps to reproduce the behavior:
1.
2.
3.
- type: textarea
id: expected-behavior
validations:
required: true
attributes:
label: Expected behavior
description: "A clear and concise description of what you would expect to happen."

2
.github/ISSUE_TEMPLATE/config.yml vendored Normal file

@ -0,0 +1,2 @@
blank_issues_enabled: true
version: 0.1


@ -0,0 +1,32 @@
# modified from https://github.com/huggingface/transformers/blob/main/.github/ISSUE_TEMPLATE/feature-request.yml?plain=1
name: "\U0001F680 Feature request"
description: Submit a proposal/request for a new verl feature
labels: [ "Feature request" ]
body:
- type: textarea
id: feature-request
validations:
required: true
attributes:
label: Feature request
description: |
A clear and concise description of the feature proposal. Please provide a link to the paper and code in case they exist.
- type: textarea
id: motivation
validations:
required: true
attributes:
label: Motivation
description: |
Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
- type: textarea
id: contribution
validations:
required: true
attributes:
label: Your contribution
description: |
Is there any way that you could help, e.g. by submitting a PR? Make sure to read the CONTRIBUTING.MD [readme](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md)

40
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file

@ -0,0 +1,40 @@
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
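
As an illustration of the title convention above, here is a minimal Python sketch of a title check. It is an approximation written for this template, not the script that the CI actually runs; the names `TITLE_RE` and `check_pr_title` are hypothetical, and the module and type lists are copied from the bullets above.

```python
import re

# Allowed values, copied from the checklist above.
MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Optional [BREAKING] prefix, then [modules], a type, a colon, and a description.
TITLE_RE = re.compile(
    r"^(?P<breaking>\[BREAKING\])?\[(?P<modules>[^\]]+)\]\s+(?P<type>\w+):\s+(?P<desc>\S.*)$"
)


def check_pr_title(title: str) -> bool:
    """Return True if the title follows `[{modules}] {type}: {description}`."""
    match = TITLE_RE.match(title)
    if match is None:
        return False
    modules = [name.strip() for name in match.group("modules").split(",")]
    return all(name in MODULES for name in modules) and match.group("type") in TYPES


if __name__ == "__main__":
    print(check_pr_title("[BREAKING][fsdp, megatron] feat: dynamic batching"))
    print(check_pr_title("Update README.md (#797)"))
```

The sketch accepts a comma-separated module list and rejects titles that lack the bracketed prefix or use an unknown module or type.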
### Test
> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s) if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)


@ -0,0 +1,147 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
# - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
name: e2e_eval_aime24
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
# For push, for now only anti-patterns are specified so it is more conservative
# and achieves higher coverage.
push:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!*.md"
- "!docker/**"
- "!docs/**"
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
- "!recipe/**"
- "recipe/r1"
- "!recipe/r1/README.md"
pull_request:
branches:
- main
paths:
- "**/*.py"
# Other entrypoints
- "!*.md"
- "!docker/**"
- "!docs/**"
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Home
- "recipe/r1"
- "!recipe/r1/README.md"
# Other recipes
- "!recipe/**"
# Entrypoints
- ".github/workflows/e2e_eval_aime24.yml"
- "tests/special_e2e/run_r1_distill_qwen_aime24_eval.sh"
- "verl/trainer/main_generation.py"
- "verl/trainer/config/generation.yaml"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare permissions just read content.
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
e2e_eval_aime24:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test,gpu,math]
pip3 install math-verify transformers==4.56.2
- name: Prepare aime24 dataset
run: |
ray stop --force
python3 recipe/r1/data_process.py --task aime2024
- name: Running generation and evaluation in AIME 2024
run: |
ray stop --force
bash tests/special_e2e/run_r1_distill_qwen_aime24_eval.sh
cleanup:
runs-on: ubuntu-latest
needs: [setup, e2e_eval_aime24]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


@ -0,0 +1,133 @@
name: e2e_ppo_trainer_deprecate
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
# For push, for now only anti-patterns are specified so it is more conservative
# and achieves higher coverage.
push:
branches:
- disabled_ci
pull_request:
branches:
- disabled_ci
paths:
- "**/*.py"
# Other entrypoints
- "!**/*.md"
- "!docker/**"
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Docs
- "!docs/**"
# Recipes
- "!recipe/**"
# Megatron
- "!verl/workers/**/megatron_*.py"
# Entrypoints
- ".github/workflows/e2e_ppo_trainer.yml"
- "examples/data_preprocess/gsm8k.py"
- "examples/data_preprocess/geo3k.py"
- "tests/special_e2e/ppo_trainer"
- "verl/trainer/main_ppo.py"
- "verl/trainer/config/ppo_trainer.yaml"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare permissions just read content.
permissions:
contents: read
jobs:
pre_commit_for_ppo:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.12"]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
with:
python-version: ${{ matrix.python-version }}
- name: Install the current repository
run: |
pip install -e .
- name: Set ruff --output-format=github
run: |
sed -i 's/--output-format=full/--output-format=github/' .pre-commit-config.yaml
git add .pre-commit-config.yaml
- uses: pre-commit/action@v3.0.1
with:
extra_args: "" # Overriding default "--all-files"
e2e_ppo_trainer_sglang_multiturn_with_tool:
runs-on: [L20x8]
needs: pre_commit_for_ppo
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
container:
image: verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install -e .[test,gpu,sglang]
- name: Prepare gsm8k dataset with tool
run: |
ray stop --force
python3 examples/data_preprocess/gsm8k_multiturn_w_tool.py --local_save_dir $HOME/data/gsm8k_verl_sgl_multi_turn_preprocessed
- name: Running GSM8K with tool E2E training tests on 8 L20 GPUs with rmpad using function rm and save ckpt with sglang
run: |
ray stop --force
bash tests/special_e2e/run_gsm8k_fsdp_sgl_multiturn_w_tool.sh
- name: Running GSM8K with tool E2E training tests with FSDP2
run: |
ray stop --force
FSDP_STRATEGY=fsdp2 bash tests/special_e2e/run_gsm8k_fsdp_sgl_multiturn_w_tool.sh
e2e_ppo_trainer_sglang_vlm_multiturn_with_tool:
runs-on: [L20x8]
needs: pre_commit_for_ppo
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
container:
image: verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install -e .[test,geo,gpu,sglang]
- name: Prepare geo3k dataset with tool
run: |
ray stop --force
python3 examples/data_preprocess/geo3k_multiturn_w_tool.py --local_dir $HOME/data/geo3k_verl_sgl_multi_turn_preprocessed
- name: Running GEO3K with tool E2E training tests on 8 L20 GPUs with rmpad using function rm and save ckpt with sglang
run: |
ray stop --force
bash tests/special_e2e/run_geo3k_fsdp_sgl_multiturn_w_tool.sh
- name: Running GEO3K with tool E2E training tests with FSDP2
run: |
ray stop --force
FSDP_STRATEGY=fsdp2 bash tests/special_e2e/run_geo3k_fsdp_sgl_multiturn_w_tool.sh


@ -0,0 +1,155 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
# - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
name: e2e_ppo_trainer_megatron_sglang_deprecate
on:
# Trigger the workflow on push or pull request,
# but only for the main branch.
# For push, for now only anti-patterns are specified so it is more conservative
# and achieves higher coverage.
push:
branches:
- disabled_ci
pull_request:
branches:
- disabled_ci
paths:
- "**/*.py"
# Other entrypoints
- "!docker/**"
# Docs
- "!**/*.md"
- "!docs/**"
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# FSDP
- "!verl/workers/**/*dp_*.py"
# Entrypoints
- ".github/workflows/e2e_ppo_trainer_megatron_sglang.yml"
- "examples/data_preprocess/gsm8k.py"
- "examples/data_preprocess/geo3k.py"
- "tests/special_e2e/run_ppo_trainer_megatron.sh"
- "verl/trainer/main_ppo.py"
- "verl/trainer/config/ppo_megatron_trainer.yaml"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare permissions just read content.
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
e2e_ppo_trainer_megatron-qwen3:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen3) with validation and saving
run: |
ray stop --force
ENGINE=sglang ALL_OFFLOAD=True VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen3) testing learning rate scheduler
run: |
ray stop --force
ENGINE=sglang LR_WARMUP_STEPS=1 TOTAL_TRAIN_STEPS=2 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Test Megatron checkpoints merging function (Qwen3 Actor and Critic)
run: |
exp_name="qwen3-0.6b-megatron-gsm8k-minimal"
python -m verl.model_merger test --backend megatron --tie-word-embedding --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface
- name: clean up
run: |
rm -rf checkpoints
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
e2e_ppo_trainer_megatron-deepseek,
e2e_ppo_trainer_megatron-qwen3,
e2e_ppo_trainer_megatron-different-train-infer-tp-qwen-tie-embedding,
e2e_ppo_trainer_megatron-qwen-override-transformer-config,
e2e_ppo_trainer_megatron-deepseek-override-transformer-config,
e2e_ppo_trainer_megatron-moe-expert-parallel,
e2e_ppo_trainer_megatron-qwen2_5vl-3b,
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


@ -0,0 +1,66 @@
name: e2e_prime_deprecate
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- disabled_ci
pull_request:
branches:
- disabled_ci
paths:
- "**/*.py"
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Other recipes
- "!recipe/**"
# Megatron
- "!verl/workers/**/megatron_*.py"
# Home
- "recipe/prime"
# Entrypoints
- ".github/workflows/e2e_prime.yml"
- "examples/data_preprocess/gsm8k.py"
- "tests/special_e2e/run_prime.sh"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare permissions just read content.
permissions:
contents: read
jobs:
e2e_prime:
runs-on: [L20x8]
timeout-minutes: 50 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
container:
image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test,gpu]
- name: Prepare gsm8k dataset
run: |
ray stop --force
python3 examples/data_preprocess/gsm8k.py
- name: Running GSM8K E2E with prime alg
run: |
ray stop --force
bash tests/special_e2e/run_prime.sh


@ -0,0 +1,119 @@
name: e2e_spin
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Other recipes
- "!recipe/**"
# Megatron
- "!verl/workers/**/megatron_*.py"
# Home
- "recipe/spin"
# Entrypoints
- ".github/workflows/e2e_spin.yml"
- "examples/data_preprocess/gsm8k.py"
- "tests/special_e2e/run_spin.sh"
- "!examples"
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Other recipes
- "!recipe/**"
# Megatron
- "!verl/workers/**/megatron_*.py"
# Home
- "recipe/spin"
# Entrypoints
- ".github/workflows/e2e_spin.yml"
- "examples/data_preprocess/gsm8k.py"
- "tests/special_e2e/run_spin.sh"
- "!examples"
# Declare permissions just read content.
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
e2e_spin:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install -e .[test,gpu,sglang]
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running the E2E test with the spin algorithm
run: |
ray stop --force
bash tests/special_e2e/run_spin.sh
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
e2e_spin
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


@ -0,0 +1,118 @@
name: e2e_sppo
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Other recipes
- "!recipe/**"
# Megatron
- "!verl/workers/**/megatron_*.py"
# Home
- "recipe/sppo"
# Entrypoints
- ".github/workflows/e2e_sppo.yml"
- "examples/data_preprocess/gsm8k.py"
- "tests/special_e2e/run_sppo.sh"
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Other recipes
- "!recipe/**"
# Megatron
- "!verl/workers/**/megatron_*.py"
# Home
- "recipe/sppo"
# Entrypoints
- ".github/workflows/e2e_sppo.yml"
- "examples/data_preprocess/gsm8k.py"
- "tests/special_e2e/run_sppo.sh"
# Declare permissions just read content.
permissions:
contents: read
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
TRANSFORMERS_VERSION: "4.56.2"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
e2e_sppo:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install -e .[test,gpu,sglang]
- name: Prepare MATH dataset
run: |
python3 examples/data_preprocess/math_dataset.py --local_dataset_path $HOME/models/hf_data/DigitalLearningGmbH/MATH-lighteval
- name: Running the E2E test with the SPPO algorithm
run: |
ray stop --force
bash tests/special_e2e/run_sppo.sh
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
e2e_sppo
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

73
.github/workflows/README.md vendored Normal file

@ -0,0 +1,73 @@
### Adding a New Workflow
When adding a new workflow for continuous integration (CI), you have two runner options: a fixed runner or an elastically provisioned Vemlp machine.
- **Fixed Runner**: To use a fixed runner, specify it in your workflow using the `runs-on` keyword, like `runs-on: [L20x8]`.
- **Vemlp Runner**: Opting for a Vemlp machine allows you to launch tasks elastically.
Here is a template to assist you. This template is designed for using Vemlp machines. Currently, for each workflow, you need to create a `setup` and a `cleanup` job. When using this template, the main parts you need to modify are the `IMAGE` environment variable and the specific `job steps`.
```yaml
name: Your Default Workflow
on:
push:
branches:
- main
- v0.*
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
- ".github/workflows/template.yml"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
permissions:
contents: read
env:
IMAGE: "your vemlp image" # e.g. "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.4-vllm0.8.5-mcore0.12.2"
DYNAMIC_RUNNER_URL: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner" # public veFaas api
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
task-id: ${{ steps.create-runner.outputs.task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_URL }}"
image: "${{ env.IMAGE }}"
your_job:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'default-runner' }}"]
steps:
xxxx # your jobs
cleanup:
runs-on: ubuntu-latest
needs: [setup, your_job]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_URL }}"
task-id: "${{ needs.setup.outputs.task-id }}"
```
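A workflow derived from this template only works if every `${{ env.NAME }}` reference matches a key declared in the top-level `env:` block. The following is a minimal sketch of a check for that, not something verl's CI runs: the function names are hypothetical, only the top-level `env:` block is inspected (job-level `env:` blocks are ignored), and the file is assumed to keep standard YAML indentation.

```python
import re

# Matches `${{ env.NAME }}` expressions; whitespace inside the braces is optional.
ENV_REF = re.compile(r"\$\{\{\s*env\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def declared_env_keys(workflow_text: str) -> set:
    """Collect keys declared under the top-level `env:` block.

    Uses indentation only (no YAML parser), so it is a heuristic.
    """
    keys = set()
    in_env = False
    for line in workflow_text.splitlines():
        # Skip blank lines and full-line comments.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if line[0] not in " \t":
            # A new top-level key starts here; track whether it is `env:`.
            in_env = line.split("#")[0].strip() == "env:"
            continue
        if in_env:
            match = re.match(r"\s+([A-Za-z_][A-Za-z0-9_]*)\s*:", line)
            if match:
                keys.add(match.group(1))
    return keys


def undefined_env_refs(workflow_text: str) -> set:
    """Return names used as `${{ env.NAME }}` but never declared at top level."""
    return set(ENV_REF.findall(workflow_text)) - declared_env_keys(workflow_text)
```

Passing the text of a workflow file to `undefined_env_refs` returns the set of referenced but undeclared names; an empty set means every `env` reference resolves to a top-level declaration.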
### Model and Dataset
To keep CI from depending on the network, we pre-download datasets to an NFS mount on the CI machine. Models are stored under \${HOME}/models and datasets under \${HOME}/models/hf_data.
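The layout described above can be wrapped in a few path helpers so that a test fails fast when a model or dataset is missing, instead of falling back to a download. This is a sketch under stated assumptions: `local_model_path`, `local_dataset_path`, and `require_local` are hypothetical names rather than functions provided by verl, and only the `${HOME}/models` and `${HOME}/models/hf_data` locations come from this README.

```python
from pathlib import Path


def ci_models_root(home=None) -> Path:
    """Root of the pre-downloaded models, i.e. ${HOME}/models."""
    base = Path(home) if home is not None else Path.home()
    return base / "models"


def local_model_path(repo_id: str, home=None) -> Path:
    """Map a hub-style id such as 'Qwen/Qwen2.5-0.5B' to its local directory."""
    return ci_models_root(home) / repo_id


def local_dataset_path(name: str, home=None) -> Path:
    """Map a dataset name to its directory under ${HOME}/models/hf_data."""
    return ci_models_root(home) / "hf_data" / name


def require_local(path: Path) -> Path:
    """Return the path unchanged, or raise if it has not been pre-downloaded."""
    if not path.exists():
        raise FileNotFoundError(f"expected pre-downloaded asset at {path}")
    return path
```

For example, `require_local(local_dataset_path("gsm8k"))` either returns the dataset directory under `${HOME}/models/hf_data` or raises before any training script starts.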

.github/workflows/check-pr-title.yml

@ -0,0 +1,58 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default, tests are run with GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts whose names end with `on_cpu.py` are run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`: runs pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
on:
pull_request:
types: [opened, edited, synchronize]
jobs:
check-title:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Run PR title checker
run: python3 tests/special_sanity/check_pr_title.py
env:
PR_TITLE: ${{ github.event.pull_request.title }}
- name: Run PR description checker
run: python3 tests/special_sanity/check_pr_description.py
env:
PR_TITLE: ${{ github.event.pull_request.title }}
GITHUB_EVENT_PATH: ${{ github.event_path }}


@ -0,0 +1,175 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default, tests are run with GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts whose names end with `on_cpu.py` are run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`: runs pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
name: checkpoint_converter
# latest version: Megatron-LM core_r0.11.0 https://github.com/NVIDIA/Megatron-LM/tree/core_r0.11.0
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.*
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# FSDP
- "!verl/workers/**/*dp_*.py"
# Entrypoints
- ".github/workflows/checkpoint_converter.yml"
- ".github/workflows/e2e_ppo_trainer_megatron.yml"
- "examples/data_preprocess/gsm8k.py"
- "tests/special_e2e/run_ppo_trainer_megatron.sh"
- "verl/trainer/main_ppo.py"
- "verl/trainer/config/ppo_megatron_trainer.yaml"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare permissions just read content.
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
checkpoint_converter:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 20 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install -e .[test]
# - name: Download Model to Use
# run: |
# huggingface-cli download Qwen/Qwen2.5-0.5B --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B
# huggingface-cli download deepseek-ai/deepseek-coder-1.3b-instruct --local-dir ${HOME}/models/deepseek-ai/deepseek-coder-1.3b-instruct
# export HF_HUB_OFFLINE=1
- name: Running Huggingface to Megatron dist_ckpt converter (Qwen/Qwen2.5-0.5B)
run: |
ray stop --force
python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/Qwen/Qwen2.5-0.5B --output_path checkpoints/Qwen/Qwen2.5-0.5B --test
- name: Running Huggingface to Megatron dist_ckpt converter (deepseek-ai/deepseek-coder-1.3b-instruct)
run: |
ray stop --force
python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/deepseek-ai/deepseek-coder-1.3b-instruct --output_path checkpoints/deepseek-ai/deepseek-coder-1.3b-instruct --test
- name: Clean up
run: |
rm -rf checkpoints
checkpoint_converter_large_moe_models:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 30 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
HF_ENDPOINT: "https://hf-mirror.com"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install -e .[test]
# - name: Download Model to Use
# run: |
# huggingface-cli download Qwen/Qwen1.5-MoE-A2.7B-Chat --local-dir ${HOME}/models/Qwen/Qwen1.5-MoE-A2.7B-Chat
# export HF_HUB_OFFLINE=1
- name: Running Huggingface to Megatron dist_ckpt CPU converter (Qwen/Qwen1.5-MoE-A2.7B-Chat)
run: |
ray stop --force
python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/Qwen/Qwen1.5-MoE-A2.7B-Chat --output_path checkpoints/Qwen/Qwen1.5-MoE-A2.7B-Chat --use_cpu_initialization
- name: Running distributed Huggingface to Megatron dist_ckpt CPU converter (Qwen/Qwen1.5-MoE-A2.7B-Chat)
run: |
ray stop --force
torchrun --nproc_per_node 8 --nnodes 1 scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/Qwen/Qwen1.5-MoE-A2.7B-Chat --output_path checkpoints/Qwen/Qwen1.5-MoE-A2.7B-Chat_dist --use_cpu_initialization
- name: clean up
run: |
rm -rf checkpoints
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
checkpoint_converter,
checkpoint_converter_large_moe_models
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


@ -1,64 +0,0 @@
name: checkpoints
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/checkpoints.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- "verl/trainer/config/*.yaml"
- .github/workflows/checkpoints.yml
- "tests/e2e/*.sh"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare permissions just read content.
permissions:
contents: read
jobs:
e2e_gsm8k_megatron:
runs-on: [self-hosted, l20-0]
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install -e .[test]
- name: Prepare gsm8k dataset
run: |
python3 examples/data_preprocess/gsm8k.py
- name: Running Checkpoint Integration Test (Qwen Megatron)
run: |
ray stop --force
export PYTHONPATH=$PYTHONPATH:/opt/nvidia/Megatron-LM
bash tests/checkpoint/run_qwen_megatron_ckpt.sh
- name: Running Checkpoint Integration Test (Deepseek Megatron)
run: |
ray stop --force
export PYTHONPATH=$PYTHONPATH:/opt/nvidia/Megatron-LM
bash tests/checkpoint/run_deepseek_megatron_ckpt.sh

.github/workflows/cpu_unit_tests.yml

@ -0,0 +1,89 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default, tests are run with GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts whose names end with `on_cpu.py` are run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`: runs pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
name: cpu_unit_tests
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.*
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
- .github/workflows/cpu_unit_tests.yml
- "!recipe/**/*.py"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare permissions just read content.
permissions:
contents: read
jobs:
cpu_unit_tests:
if: github.repository_owner == 'volcengine'
runs-on: [L20x8]
timeout-minutes: 20 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
container:
image: verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip install -e .[test,prime,geo]
pip install --upgrade "ray>=2.40.0" pillow
- name: Download datasets
run: |
huggingface-cli download verl-team/gsm8k-v0.4.1 --repo-type dataset --local-dir ~/verl-data/gsm8k
python3 examples/data_preprocess/geo3k.py
- name: Running CPU unit tests
run: |
echo '[pytest]' > pytest.ini
echo 'python_files = *_on_cpu.py' >> pytest.ini
pytest -s -x --asyncio-mode=auto tests/


@ -1,61 +0,0 @@
name: dataset
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/dataset.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/dataset.yml
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare permissions just read content.
permissions:
contents: read
jobs:
ray:
runs-on: [self-hosted, l20-1]
timeout-minutes: 10 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip install hf_transfer
pip install -e .[test]
pip install --upgrade "ray>=2.40.0"
pip install cupy-cuda12x
- name: Running dataset tests
run: |
[ ! -d "$HOME/verl-data" ] && git clone --depth 1 https://github.com/eric-haibin-lin/verl-data ~/verl-data
pytest -s -x tests/verl/utils/dataset/test_rl_dataset.py
pytest -s -x tests/verl/utils/dataset/test_sft_dataset.py
# pytest -s -x tests/verl/utils/dataset/test_rm_dataset.py
- name: Running ray test using cupy (move it to L20 when dockerfile ready)
run: |
cd tests/ray
pytest -s -x test_rvdz.py

.github/workflows/doc.yml

@ -0,0 +1,100 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default, tests are run with GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts whose names end with `on_cpu.py` are run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`: runs pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
name: doc_test
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.*
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
- "docs/**"
- .github/workflows/doc.yml
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare permissions: read-only contents, plus the write scopes needed for deploy-pages.
permissions:
contents: read # for checkout
pages: write # for deploy-pages
id-token: write # for deploy-pages
jobs:
doc_test:
runs-on: ubuntu-latest
timeout-minutes: 5 # Increase this timeout value as needed
strategy:
matrix:
python-version: ["3.10"]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
with:
python-version: ${{ matrix.python-version }}
- name: Install the current repository
run: |
pip install -e .[test] --no-deps
pip install -r docs/requirements-docs.txt
- name: Run doc make html
run: |
cd docs
make clean
make html SPHINXOPTS="--keep-going -w _build/sphinx.log"
if grep -q ": ERROR:" _build/sphinx.log; then
echo "🚨 Sphinx doc build contained ERRORs - see _build/sphinx.log"
exit 1
fi
if grep -q "WARNING: document isn't included in any toctree" _build/sphinx.log; then
echo "🚨 Sphinx doc build contained WARNING. Please include newly added docs in index.rst. See _build/sphinx.log for details"
exit 1
fi
if grep -q "WARNING: Inline emphasis" _build/sphinx.log; then
echo "🚨 Sphinx doc build contained WARNING. Please check inline emphasis is correct. See _build/sphinx.log for details"
exit 1
fi
if grep -q "WARNING: Definition list ends without a blank line" _build/sphinx.log; then
echo "🚨 Sphinx doc build contained WARNING. Please check if the indentation is correct. See _build/sphinx.log for details"
exit 1
fi


@ -1,3 +1,35 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default, tests are run with GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts whose names end with `on_cpu.py` are run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`: runs pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
name: e2e_ascend
on:
@ -6,34 +38,47 @@ on:
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/e2e_ascend.yml
- v0.*
pull_request:
branches:
- main
- v0.2.x
paths:
- ".github/workflows/e2e_ascend.yml"
- "**/*.py"
- .github/workflows/e2e_ascend.yml
- "docs/ascend_tutorial/**"
- "examples/**"
- "recipe/**"
- "tests/special_npu/**"
- "tests/special_sanity/**"
- "verl/**"
- "pyproject.toml"
- "requirements-npu.txt"
- "setup.py"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
permissions:
contents: read
jobs:
test:
if: github.repository_owner == 'volcengine'
name: verl Ascend test (self-host)
runs-on: [self-hosted, npu-0]
timeout-minutes: 5 # Increase this timeout value as needed
env:
HF_HUB_ENABLE_HF_TRANSFER: 1
timeout-minutes: 40 # Increase this timeout value as needed
container:
image: quay.io/ascend/cann:8.0.0-910b-ubuntu22.04-py3.10
image: crispig/verl_npu:cann8.1rc1-py3.10-torch2.5.1-vllm-ascend0.7.3.post1-mindspeed0121-250731
volumes:
- /usr/local/dcmi:/usr/local/dcmi
- /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
- /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/
- /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info
- /etc/ascend_install.info:/etc/ascend_install.info
- /data00/dataset:/github/home/dataset
- /data00/models:/github/home/models
# Use self-host cache speed up pip and model download
# - /home/action/actions-runner/_work/cache:/github/home/.cache/
options: >-
@ -41,8 +86,15 @@ jobs:
--device /dev/davinci_manager
--device /dev/devmm_svm
--device /dev/hisi_hdc
--network host
--privileged
--network "host"
--shm-size 16g
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- name: Check npu and CANN info
run: |
@ -50,6 +102,55 @@ jobs:
npu-smi info
- name: Checkout volcengine/verl repo
uses: actions/checkout@v4
- name: Run test
- name: Install the current repository
run: |
lscpu
pip3 install hf_transfer peft
pip3 install -r requirements-npu.txt
pip install -e .
- name: Install torchvision
run: |
pip install torchvision==0.20.1+cpu --index-url https://download.pytorch.org/whl/cpu
- name: Uninstall Triton
run: |
pip uninstall -y triton
- name: Preprocess gsm8k dataset
run: |
python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/dataset/openai/gsm8k
- name: Preprocess geo3k dataset
run: |
python examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/dataset/hiyouga/geometry3k
- name: Running gsm8k e2e qwen3 training tests with PPO on ASCEND NPU
run: |
ray stop --force
bash tests/special_npu/run_qwen3_06b_ppo.sh
rm -rf $HOME/ckpts
- name: Running gsm8k e2e training tests with peft sft on ASCEND NPU
run: |
ray stop --force
bash tests/special_npu/run_qwen2_5_05b_sft_peft_sp2.sh
rm -rf $HOME/ckpts
- name: Running gsm8k e2e training tests with GRPO on ASCEND NPU
run: |
ray stop --force
bash tests/special_npu/run_qwen2_5_05b_grpo.sh
rm -rf $HOME/ckpts
- name: Running geo3k e2e training tests with GRPO on ASCEND NPU
run: |
ray stop --force
bash tests/special_npu/run_qwen2_5_vl_3b_npu.sh
rm -rf $HOME/ckpts
- name: Running gsm8k e2e training tests with DAPO on ASCEND NPU
run: |
ray stop --force
bash tests/special_npu/run_qwen2_5_05b_dapo.sh
rm -rf $HOME/ckpts
- name: Running gsm8k e2e training tests with GRPO MindSpeed on ASCEND NPU
run: |
ray stop --force
USE_DIST_CKPT=True bash tests/special_npu/run_qwen2_5_05b_grpo_mindspeed.sh
rm -rf $HOME/dist_ckpt/qwen2_5_05b_grpo_mindspeed
rm -rf $HOME/ckpts
- name: Running NPU profiling unit tests
run: |
ray stop --force
pytest -s -x tests/utils/test_special_mstx_profile.py

.github/workflows/e2e_dapo.yml

@ -0,0 +1,145 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default, tests are run with GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts whose names end with `on_cpu.py` are run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`: runs pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
name: e2e_dapo
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
# For push, for now only anti-patterns are specified so it is more conservative
# and achieves higher coverage.
push:
branches:
- main
- v0.*
paths:
- "verl/*.py"
# Other entrypoints
- "!examples/*trainer*"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Megatron
- "!verl/workers/**/megatron_*.py"
- "!recipe/**"
- "recipe/dapo"
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Other recipes
- "!recipe/**"
# Megatron
- "!verl/workers/**/megatron_*.py"
# Home
- "recipe/dapo"
# Entrypoints
- ".github/workflows/e2e_dapo.yml"
- "examples/data_preprocess/gsm8k.py"
- "tests/special_e2e/run_dapo.sh"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare permissions just read content.
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
e2e_dapo:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test,gpu]
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running the E2E test with the DAPO algorithm
run: |
ray stop --force
bash tests/special_e2e/run_dapo.sh
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
e2e_dapo
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


@ -1,55 +0,0 @@
name: e2e_digit_completion
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/e2e_digit_completion.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- "verl/trainer/config/*.yaml"
- .github/workflows/e2e_digit_completion.yml
- "tests/e2e/*.sh"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare permissions just read content.
permissions:
contents: read
jobs:
e2e_digit_completion:
runs-on: [self-hosted, l20-0]
timeout-minutes: 20 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.2
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install -e .[test]
- name: Running digit completion e2e training tests on 8 L20 GPUs
run: |
ray stop --force
bash tests/e2e/run_ray_trainer.sh


@ -1,47 +0,0 @@
name: e2e_digit_completion_fire
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
paths:
- "**/*.py"
- .github/workflows/e2e_digit_completion_fire.yml
pull_request:
branches:
- main
paths:
- "**/*.py"
- .github/workflows/e2e_digit_completion_fire.yml
- "tests/e2e/*.sh"
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
e2e_digit_completion:
runs-on: [self-hosted, l20-0]
timeout-minutes: 20 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install -e .[test]
- name: Running digit completion e2e training tests on 8 L20 GPUs
run: |
ray stop --force
bash tests/e2e/run_ray_trainer_fire_sampling.sh

141
.github/workflows/e2e_genrm_remote.yml vendored Normal file

@ -0,0 +1,141 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with GPUs available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by YAML files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml` runs pytest on all test scripts without the `on_cpu.py` suffix.
# - Since the CPU/GPU unit tests run all tests under `tests` by default, make sure to exclude tests from them manually when
# - a new workflow YAML is added to `.github/workflows`
# - new tests are added to a workflow mentioned in 2.
name: e2e_genrm_remote
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.*
paths:
- "**/*.py"
- "tests/**"
- "!recipe/**"
- "recipe/genrm_remote"
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Other recipes
- "!recipe/**"
# Megatron
- "!verl/workers/**/megatron_*.py"
# Home
- "recipe/genrm_remote"
- "!recipe/genrm_remote/README.md"
# Entrypoints
- ".github/workflows/e2e_genrm_remote.yml"
- "examples/data_preprocess/gsm8k.py"
- "tests/special_e2e/run_genrm_remote.sh"
- "tests/special_e2e/generation/run_gen_qwen05_server.sh"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
e2e_genrm_remote:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test,gpu]
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running the E2E test with the Generative Reward Model
run: |
ray stop --force
bash tests/special_e2e/run_genrm_remote.sh
ray stop --force
bash tests/special_e2e/generation/run_gen_qwen05_server.sh
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
e2e_genrm_remote
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


@ -1,70 +0,0 @@
name: e2e_grpo
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/e2e_grpo.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- "verl/trainer/config/*.yaml"
- .github/workflows/e2e_grpo.yml
- "tests/e2e/*.sh"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
e2e_gsm8k_megatron:
runs-on: [self-hosted, l20-0]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install -e .[test]
- name: Prepare gsm8k dataset
run: |
python3 examples/data_preprocess/gsm8k.py
- name: Running GRPO gsm8k e2e training tests with FSDP on 8 L20 GPUs (Deepseek)
run: |
ray stop --force
bash tests/e2e/run_deepseek_grpo.sh
- name: Running GRPO gsm8k e2e training tests with 3D parallelism on 8 L20 GPUs with Megatron (Deepseek)
run: |
ray stop --force
bash tests/e2e/run_deepseek_grpo_megatron.sh
- name: Running GRPO gsm8k e2e training tests with FSDP on 8 L20 GPUs (Qwen)
run: |
ray stop --force
bash tests/e2e/run_qwen_grpo.sh
- name: Running GRPO gsm8k e2e training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)
run: |
ray stop --force
bash tests/e2e/run_qwen_grpo_megatron.sh


@ -1,97 +0,0 @@
name: e2e_gsm8k
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/e2e_gsm8k.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- "verl/trainer/config/*.yaml"
- .github/workflows/e2e_gsm8k.yml
- "tests/e2e/*.sh"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
e2e_gsm8k:
runs-on: [self-hosted, l20-1]
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.2
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install -e .[test,gpu]
- name: Prepare gsm8k dataset
run: |
ray stop --force
python3 examples/data_preprocess/gsm8k.py
- name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using function rm and save ckpt
run: |
ray stop --force
bash tests/e2e/run_qwen_gsm8k_function_rm.sh
- name: Running gsm8k e2e without rmpad using function rm and load ckpt from previous step
run: |
ray stop --force
bash tests/e2e/run_qwen_gsm8k_function_rm_no_rmpad.sh
rm -rf ~/ckpt/*
- name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using function rm (GRPO)
run: |
ray stop --force
bash tests/e2e/run_qwen_gsm8k_function_rm_grpo.sh
- name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using function rm (ReMax)
run: |
ray stop --force
bash tests/e2e/run_qwen_gsm8k_function_rm_remax.sh
- name: Running gsm8k e2e with rmpad using model rm
run: |
ray stop --force
bash tests/e2e/run_qwen_gsm8k_model_rm.sh
- name: Running gsm8k e2e without rmpad using model rm
run: |
ray stop --force
bash tests/e2e/run_qwen_gsm8k_model_rm_no_rmpad.sh
- name: Running gsm8k e2e with rmpad using model rm and ulysses sp=2
run: |
ray stop --force
bash tests/e2e/run_qwen_gsm8k_model_rm_ulysses.sh
- name: Running gsm8k e2e with rmpad using model rm and dynamic batch size
run: |
ray stop --force
bash tests/e2e/run_qwen_gsm8k_model_rm_seq_balance.sh
- name: Running gsm8k e2e with rmpad using model rm with Liger Kernel enabled
run: |
ray stop --force
bash tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh
- name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using customized reward function
run: |
ray stop --force
bash tests/e2e/run_qwen_gsm8k_custom_function_rm.sh


@ -1,63 +0,0 @@
name: e2e_gsm8k_megatron
# latest version: Megatron-LM core_r0.11.0 https://github.com/NVIDIA/Megatron-LM/tree/core_r0.11.0
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/e2e_gsm8k_megatron.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- "verl/trainer/config/*.yaml"
- .github/workflows/e2e_gsm8k_megatron.yml
- "tests/e2e/*.sh"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
e2e_gsm8k_megatron:
runs-on: [self-hosted, l20-0]
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install -e .[test]
- name: Prepare gsm8k dataset
run: |
python3 examples/data_preprocess/gsm8k.py
- name: Running gsm8k e2e training tests with 3D parallelism on 8 L20 GPUs with Megatron (Deepseek)
run: |
ray stop --force
bash tests/e2e/run_deepseek_megatron_parallelism.sh
- name: Running gsm8k e2e training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)
run: |
ray stop --force
bash tests/e2e/run_qwen_megatron_parallelism.sh


@ -1,54 +0,0 @@
name: e2e_gsm8k_prime
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/e2e_gsm8k_prime.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- "verl/trainer/config/*.yaml"
- .github/workflows/e2e_gsm8k_prime.yml
- "tests/e2e/*.sh"
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
e2e_gsm8k:
runs-on: [self-hosted, l20-1]
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.2
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install -e .[test,gpu]
- name: Prepare gsm8k dataset
run: |
ray stop --force
python3 examples/data_preprocess/gsm8k.py
- name: Running gsm8k e2e with prime alg
run: |
ray stop --force
bash tests/e2e/run_qwen_gsm8k_prime.sh


@ -1,59 +0,0 @@
name: e2e_lora
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/e2e_lora.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/e2e_lora.yml
- "tests/e2e/*.sh"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
e2e_lora:
runs-on: [self-hosted, l20-1]
timeout-minutes: 5 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer peft
pip3 install -e .[test]
- name: Prepare gsm8k dataset
run: |
ray stop --force
python3 examples/data_preprocess/gsm8k.py
- name: Running gsm8k e2e training tests with LoRA
run: |
ray stop --force
bash tests/sft/run_sft_qwen05_peft.sh 8 $HOME/ckpts/
rm -rf $HOME/ckpts/*


@ -0,0 +1,178 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with GPUs available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by YAML files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml` runs pytest on all test scripts without the `on_cpu.py` suffix.
# - Since the CPU/GPU unit tests run all tests under `tests` by default, make sure to exclude tests from them manually when
# - a new workflow YAML is added to `.github/workflows`
# - new tests are added to a workflow mentioned in 2.
name: e2e_one_step_off_policy
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
# For push, only exclusion patterns are specified for now, which is more conservative
# and achieves higher coverage.
push:
branches:
- main
- v0.*
paths:
- "**/*.py"
- "!**/*.md"
- "!**/*.sh"
# Other entrypoints
- "!examples/*trainer*"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
- "!recipe/**"
- "recipe/one_step_off_policy"
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
- "!**/*.md"
- "!**/*.sh"
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Other recipes
- "!recipe/**"
# Home
- "recipe/one_step_off_policy"
# Entrypoints
- ".github/workflows/e2e_one_step_off_policy.yml"
- "examples/data_preprocess/gsm8k.py"
- "tests/special_e2e/run_one_step_off_policy.sh"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
TRANSFORMERS_VERSION: "4.56.2"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
# Test FSDP2 strategy
e2e_one_step_off_policy_fsdp2:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 10 # Increase timeout for async training
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
ACTOR_STRATEGY: "fsdp2"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test,gpu]
pip3 install transformers==$TRANSFORMERS_VERSION
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running the E2E test with one_step_off_policy algorithm (FSDP2)
run: |
ray stop --force
bash tests/special_e2e/run_one_step_off_policy.sh
# Test Megatron strategy
e2e_one_step_off_policy_megatron:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 10 # Increase timeout for async training
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
ACTOR_STRATEGY: "megatron"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test,gpu]
pip3 install transformers==$TRANSFORMERS_VERSION
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running the E2E test with one_step_off_policy algorithm (Megatron)
run: |
ray stop --force
bash tests/special_e2e/run_one_step_off_policy.sh
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
e2e_one_step_off_policy_fsdp2,
e2e_one_step_off_policy_megatron
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

79
.github/workflows/e2e_ppo_trainer.yml vendored Normal file

@ -0,0 +1,79 @@
name: e2e_ppo_trainer
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
# For push, only exclusion patterns are specified for now, which is more conservative
# and achieves higher coverage.
push:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# Megatron
- "!verl/workers/**/megatron_*.py"
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!**/*.md"
- "!docker/**"
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Docs
- "!docs/**"
# Recipes
- "!recipe/**"
# Megatron
- "!verl/workers/**/megatron_*.py"
# Entrypoints
- ".github/workflows/e2e_ppo_trainer.yml"
- "examples/data_preprocess/gsm8k.py"
- "examples/data_preprocess/geo3k.py"
- "tests/special_e2e/ppo_trainer"
- "verl/trainer/main_ppo.py"
- "verl/trainer/config/ppo_trainer.yaml"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
pre_commit_for_ppo:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.12"]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
with:
python-version: ${{ matrix.python-version }}
- name: Install the current repository
run: |
pip install -e .
- name: Set ruff --output-format=github
run: |
sed -i 's/--output-format=full/--output-format=github/' .pre-commit-config.yaml
git add .pre-commit-config.yaml
- uses: pre-commit/action@v3.0.1
with:
extra_args: "" # Overriding default "--all-files"


@ -0,0 +1,281 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with GPUs available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by YAML files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml` runs pytest on all test scripts without the `on_cpu.py` suffix.
# - Since the CPU/GPU unit tests run all tests under `tests` by default, make sure to exclude tests from them manually when
# - a new workflow YAML is added to `.github/workflows`
# - new tests are added to a workflow mentioned in 2.
name: e2e_ppo_trainer_megatron_sglang
on:
# Trigger the workflow on push or pull request,
# but only for the main branch.
# For push, only exclusion patterns are specified for now, which is more conservative
# and achieves higher coverage.
push:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# FSDP
- "!verl/workers/**/*dp_*.py"
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!docker/**"
# Docs
- "!**/*.md"
- "!docs/**"
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# FSDP
- "!verl/workers/**/*dp_*.py"
# Entrypoints
- "verl/workers/rollout/sglang_rollout/*"
- ".github/workflows/e2e_ppo_trainer_megatron_sglang.yml"
- "examples/data_preprocess/gsm8k.py"
- "examples/data_preprocess/geo3k.py"
- "tests/special_e2e/run_ppo_trainer_megatron.sh"
- "verl/trainer/main_ppo.py"
- "verl/trainer/config/ppo_megatron_trainer.yaml"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
e2e_ppo_trainer_megatron-deepseek:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)
run: |
ray stop --force
OPTIM_MEMORY_EFFICIENT=True ENGINE=sglang SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek), async mode with checkpoint resume
run: |
ray stop --force
export VLLM_USE_V1=1
ray start --head
ENGINE=sglang MODE=async RESUME_MODE=auto MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct TOTAL_TRAIN_STEPS=2 bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Test Megatron checkpoints merging function (DeepSeek Actor and Critic)
run: |
exp_name="deepseek-coder-1.3b-instruct-megatron-gsm8k-minimal"
python -m verl.model_merger test --backend megatron --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface
- name: Profiling GRPO GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)
run: |
ray stop --force
PROFILE_ENABLE=True ENGINE=sglang ADV_ESTIMATOR=grpo USE_DYNAMIC_BSZ=False MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/run_ppo_trainer_megatron.sh
if [ -z "$( ls -A '/tmp/ray/session_latest/logs/nsight/' )" ]; then
echo "[ERROR] no profiling files found"
exit 1
else
echo "[SUCCESS] profile success"
fi
- name: clean up
run: |
rm -rf checkpoints
e2e_ppo_trainer_megatron-different-train-infer-tp-qwen-tie-embedding:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with tie-embedding Megatron (Qwen) with train tp > infer tp
run: |
ray stop --force
ENGINE=sglang VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 TRAIN_TP=2 INFER_TP=1 MODEL_ID=Qwen/Qwen2.5-1.5B bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen) with train tp < infer tp
run: |
ray stop --force
ENGINE=sglang VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 TRAIN_TP=1 INFER_TP=2 MODEL_ID=Qwen/Qwen2.5-1.5B bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: clean up
run: |
rm -rf checkpoints
e2e_ppo_trainer_megatron-qwen-override-transformer-config:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
# - name: Download Model to Use
# run: |
# huggingface-cli download Qwen/Qwen2.5-0.5B --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B
# export HF_HUB_OFFLINE=1
- name: Prepare dist_ckpt of Qwen2.5-0.5B, uneven layer distribution only supports dist_ckpt
run: |
python3 scripts/converter_hf_to_mcore.py --hf_model_path ${HOME}/models/Qwen/Qwen2.5-0.5B --output_path checkpoints/verl-test/qwen2.5-0.5b-megatron
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)
run: |
ray stop --force
ENGINE=sglang SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 SKIP_SAVE_HF_MODEL=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh +actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_first_pipeline_stage=8 +actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_last_pipeline_stage=4 actor_rollout_ref.actor.megatron.use_dist_checkpointing=true actor_rollout_ref.actor.megatron.dist_checkpointing_path=checkpoints/verl-test/qwen2.5-0.5b-megatron actor_rollout_ref.ref.megatron.use_dist_checkpointing=true actor_rollout_ref.ref.megatron.dist_checkpointing_path=checkpoints/verl-test/qwen2.5-0.5b-megatron critic.megatron.use_dist_checkpointing=true critic.megatron.dist_checkpointing_path=checkpoints/verl-test/qwen2.5-0.5b-megatron reward_model.megatron.use_dist_checkpointing=true reward_model.megatron.dist_checkpointing_path=checkpoints/verl-test/qwen2.5-0.5b-megatron
cp -r checkpoints checkpoints-dut
ENGINE=sglang SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Test Megatron checkpoints merging function (Qwen Actor and Critic)
run: |
exp_name="qwen2.5-0.5b-megatron-gsm8k-minimal"
python -m verl.model_merger test --backend megatron --tie-word-embedding --local_dir checkpoints-dut/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints-dut/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface
- name: clean up
run: |
rm -rf checkpoints
e2e_ppo_trainer_megatron-deepseek-override-transformer-config:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)
run: |
ray stop --force
ENGINE=sglang SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct COMMON_PP=2 COMMON_VPP=null bash tests/special_e2e/run_ppo_trainer_megatron.sh +actor_rollout_ref.actor.megatron.override_transformer_config.account_for_embedding_in_pipeline_split=true +actor_rollout_ref.actor.megatron.override_transformer_config.account_for_loss_in_pipeline_split=true
- name: Test Megatron checkpoints merging function (DeepSeek Actor and Critic)
run: |
exp_name="deepseek-coder-1.3b-instruct-megatron-gsm8k-minimal"
python -m verl.model_merger test --backend megatron --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface
- name: clean up
run: |
rm -rf checkpoints
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
e2e_ppo_trainer_megatron-deepseek,
e2e_ppo_trainer_megatron-different-train-infer-tp-qwen-tie-embedding,
e2e_ppo_trainer_megatron-qwen-override-transformer-config,
e2e_ppo_trainer_megatron-deepseek-override-transformer-config,
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"
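
The header comments of these workflows describe how test scripts are split across accelerators: anything under `special_npu` targets NPUs, scripts whose names end with `on_cpu.py` run on CPU, and everything else runs with GPUs available. The following is a minimal sketch of that rule, not the project's actual selection logic; the function name `accelerator_for` and the example file names are hypothetical.

```python
from pathlib import PurePosixPath


def accelerator_for(test_path: str) -> str:
    """Classify a test script as "npu", "cpu", or "gpu" by its path.

    Assumption: this mirrors the naming rule in the workflow header comments,
    not any real code in the repository.
    """
    path = PurePosixPath(test_path)
    # Tests under a `special_npu` folder are reserved for NPUs.
    if "special_npu" in path.parts:
        return "npu"
    # Scripts whose file name ends with `on_cpu.py` run on CPU resources.
    if path.name.endswith("on_cpu.py"):
        return "cpu"
    # Everything else is run with GPUs available by default.
    return "gpu"


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    for example in (
        "tests/utils/test_config_on_cpu.py",
        "tests/models/test_engine.py",
        "tests/special_npu/test_smoke.py",
    ):
        print(example, "->", accelerator_for(example))
```

A suffix check on the file name alone keeps the rule easy to apply from a glob such as `tests/**/test_*_on_cpu.py`, which is the pattern the comments give for the CPU unit-test workflow.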


@ -0,0 +1,275 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with GPUs available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`: runs pytest on all test scripts without the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
# - a new workflow yaml is added to `.github/workflows`
# - new tests are added to a workflow mentioned in 2.
name: e2e_ppo_trainer_megatron_sglang_2
on:
# Trigger the workflow on push or pull request,
# but only for the main and v0.* branches.
# For push, for now only anti-patterns are specified so it is more conservative
# and achieves higher coverage.
push:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# FSDP
- "!verl/workers/**/*dp_*.py"
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!docker/**"
# Docs
- "!**/*.md"
- "!docs/**"
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# FSDP
- "!verl/workers/**/*dp_*.py"
# Entrypoints
- "verl/workers/rollout/sglang_rollout/*"
- ".github/workflows/e2e_ppo_trainer_megatron_sglang.yml"
- "examples/data_preprocess/gsm8k.py"
- "examples/data_preprocess/geo3k.py"
- "tests/special_e2e/run_ppo_trainer_megatron.sh"
- "verl/trainer/main_ppo.py"
- "verl/trainer/config/ppo_megatron_trainer.yaml"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permission for repository contents.
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
e2e_ppo_trainer_megatron-moe-expert-parallel:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen MoE, expert parallel)
run: |
ray stop --force
MEGATRON_CI_DISABLE_EXPANDABLE_SEGMENTS=1 \
ADV_ESTIMATOR=grpo USE_DUMMY_MODEL=True DUMMY_MODEL_CONFIG_PATH=tests/special_e2e/ppo_trainer/expert_parallel/qwen2moe_minimal.json \
PPO_MAX_TOKEN_LEN=512 FWD_MAX_TOKEN_LEN=512 \
MAX_PROMPT_LENGTH=256 MAX_RESPONSE_LENGTH=256 \
MODEL_ID=Qwen/Qwen1.5-MoE-A2.7B-Chat \
ENGINE=sglang COMMON_PP=2 COMMON_VPP=null COMMON_CP=1 COMMON_TP=4 COMMON_EP=4 COMMON_ETP=1 INFER_TP=8 \
USE_DIST_CKPT=True ALL_OFFLOAD=True SKIP_SAVE_HF_MODEL=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: clean up
run: |
rm -rf checkpoints
e2e_ppo_trainer_megatron-qwen2_5vl-3b:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
- name: Prepare Geo3k dataset
run: |
python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k/
- name: Prepare dist_ckpt of Qwen2.5-VL-3B, only supports dist_ckpt
run: |
python3 scripts/converter_hf_to_mcore.py --hf_model_path ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct --output_path checkpoints/verl-test/qwen2.5-vl-3b-megatron
- name: Running Geo3k E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)
run: |
ray stop --force
ENGINE=sglang TRAIN_FILES=${HOME}/data/geo3k/train.parquet VAL_FILES=${HOME}/data/geo3k/test.parquet MAX_PROMPT_LENGTH=1024 MAX_RESPONSE_LENGTH=2048 MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct ADV_ESTIMATOR=grpo USE_DYNAMIC_BSZ=False SKIP_SAVE_HF_MODEL=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 COMMON_TP=2 USE_DIST_CKPT=true DIST_CKPT_PATH=checkpoints/verl-test/qwen2.5-vl-3b-megatron bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: clean up
run: |
rm -rf checkpoints
e2e_ppo_trainer_sglang:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install -e .[test,gpu,sglang]
- name: Prepare gsm8k dataset
run: |
ray stop --force
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm and save ckpt
run: |
ray stop --force
ENGINE=sglang bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Running GSM8K E2E training tests on sglang async
run: |
ray stop --force
TOTAL_TRAIN_STEPS=2 ENGINE=sglang ROLLOUT_MODE=async bash tests/special_e2e/ppo_trainer/run_function_reward.sh
e2e_ppo_trainer_sglang_vlm:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install -e .[test,geo,gpu,sglang] --no-deps
# Geo3k
- name: Prepare GEO3K dataset
run: |
ray stop --force
python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k/
- name: Running GEO3K VLM E2E training tests on 8 L20 GPUs with rmpad using function rm
run: |
ray stop --force
TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \
ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
ENGINE=sglang GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Running GEO3K VLM E2E with rmpad using torch fused kernel (Qwen2.5-VL)
run: |
ray stop --force
FUSED_KERNELS=True TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
ENGINE=sglang GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Running GEO3K VLM E2E with rmpad using triton fused kernel (Qwen2.5-VL)
run: |
ray stop --force
FUSED_KERNELS=True FUSED_KERNEL_BACKEND=triton \
TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
ENGINE=sglang GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
bash tests/special_e2e/ppo_trainer/run_function_reward.sh
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
e2e_ppo_trainer_megatron-moe-expert-parallel,
e2e_ppo_trainer_megatron-qwen2_5vl-3b,
e2e_ppo_trainer_sglang,
e2e_ppo_trainer_sglang_vlm
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"
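
The `paths` filters in these workflows mix positive patterns with `!` negations, and the order matters: per the GitHub Actions documentation, a matching negative pattern after a positive match excludes the path, and a later matching positive pattern includes it again. The sketch below approximates that "last matching pattern wins" rule in Python. It is an illustration under simplifying assumptions, not GitHub's implementation: only `*`, `**`, and `**/` are handled, and the helper names (`glob_to_regex`, `path_selected`, `workflow_triggers`) are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified path glob into a compiled regex.

    Assumption: only `*`, `**`, and `**/` are supported; other GitHub
    filter syntax (`?`, `+`, `[]`) is treated as literal text.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")  # any characters, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # any characters except "/"
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def path_selected(path: str, patterns: list[str]) -> bool:
    """Apply the patterns in order; the last pattern that matches decides."""
    selected = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if glob_to_regex(glob).fullmatch(path):
            selected = not negated
    return selected


def workflow_triggers(changed_files: list[str], patterns: list[str]) -> bool:
    """A workflow runs if at least one changed file passes the filter."""
    return any(path_selected(f, patterns) for f in changed_files)


# A subset of the pull_request filter from the workflow above.
PATTERNS = [
    "**/*.py",
    "!tests/**",
    "!verl/trainer/main_*.py",
    "!recipe/**",
    "!verl/workers/**/*dp_*.py",
    "tests/special_e2e/run_ppo_trainer_megatron.sh",
    "verl/trainer/main_ppo.py",
]

if __name__ == "__main__":
    print(path_selected("verl/trainer/main_ppo.py", PATTERNS))
    print(path_selected("verl/trainer/main_generation.py", PATTERNS))
```

Because the explicit entrypoints come last in the list, they are re-included even though an earlier negation such as `!tests/**` or `!verl/trainer/main_*.py` had excluded them.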


@ -0,0 +1,292 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with GPUs available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`: runs pytest on all test scripts without the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
# - a new workflow yaml is added to `.github/workflows`
# - new tests are added to a workflow mentioned in 2.
name: e2e_ppo_trainer_megatron_vllm
on:
# Trigger the workflow on push or pull request,
# but only for the main and v0.* branches.
# For push, for now only anti-patterns are specified so it is more conservative
# and achieves higher coverage.
push:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# FSDP
- "!verl/workers/**/*dp_*.py"
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!docker/**"
# Docs
- "!**/*.md"
- "!docs/**"
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# FSDP
- "!verl/workers/**/*dp_*.py"
# Entrypoints
- ".github/workflows/e2e_ppo_trainer_megatron_vllm.yml"
- "examples/data_preprocess/gsm8k.py"
- "examples/data_preprocess/geo3k.py"
- "tests/special_e2e/run_ppo_trainer_megatron.sh"
- "verl/trainer/main_ppo.py"
- "verl/trainer/config/ppo_megatron_trainer.yaml"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permission for repository contents.
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
TRANSFORMERS_VERSION: "4.56.2"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
e2e_ppo_trainer_megatron-deepseek:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
pip3 install math-verify transformers==$TRANSFORMERS_VERSION
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron, use mbridge e2e to pre-load and save (Deepseek)
run: |
ray stop --force
ALL_OFFLOAD=True SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 USE_MBRIDGE=True USE_DIST_CKPT=False \
bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron, use mbridge e2e to resume from the saved checkpoint (Deepseek)
run: |
ray stop --force
RESUME_MODE=auto MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct TOTAL_TRAIN_STEPS=2 SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 USE_MBRIDGE=True USE_DIST_CKPT=False \
bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)
run: |
ray stop --force
export VLLM_USE_V1=1
ray start --head
MODE=async USE_FUSED_KERNELS=True MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct TOTAL_TRAIN_STEPS=2 SAVE_FREQ=2 bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Test Megatron checkpoints merging function (DeepSeek Actor and Critic)
run: |
exp_name="deepseek-coder-1.3b-instruct-megatron-gsm8k-minimal"
python -m verl.model_merger test --backend megatron --local_dir checkpoints/verl-test/${exp_name}/global_step_2/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_2/actor/huggingface
python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_2/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_2/critic/huggingface
- name: Test Megatron distributed checkpoints merging function (DeepSeek)
run: |
exp_name="deepseek-coder-1.3b-instruct-megatron-gsm8k-minimal"
torchrun --nproc_per_node 4 --nnodes 1 -m verl.model_merger merge --backend megatron --local_dir checkpoints/verl-test/${exp_name}/global_step_2/actor --target_dir checkpoints/verl-test/${exp_name}/global_step_2/actor/hf_model
- name: Running GRPO GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Deepseek)
run: |
ray stop --force
ADV_ESTIMATOR=grpo USE_DYNAMIC_BSZ=False MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: clean up
run: |
rm -rf checkpoints
e2e_ppo_trainer_megatron-qwen3:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
pip3 install math-verify transformers==$TRANSFORMERS_VERSION
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen3) with validation and saving
run: |
ray stop --force
ALL_OFFLOAD=True VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen3) testing learning rate scheduler
run: |
ray stop --force
LR_WARMUP_STEPS=1 TOTAL_TRAIN_STEPS=2 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Test Megatron checkpoints merging function (Qwen3 Actor and Critic)
run: |
exp_name="qwen3-0.6b-megatron-gsm8k-minimal"
python -m verl.model_merger test --backend megatron --tie-word-embedding --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface
- name: clean up
run: |
rm -rf checkpoints
e2e_ppo_trainer_megatron-different-train-infer-tp-qwen-tie-embedding:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
pip3 install math-verify transformers==$TRANSFORMERS_VERSION
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with tie-embedding Megatron (Qwen) with train tp > infer tp
run: |
ray stop --force
VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 TRAIN_TP=2 INFER_TP=1 MODEL_ID=Qwen/Qwen2.5-1.5B bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen) with train tp < infer tp
run: |
ray stop --force
VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 TRAIN_TP=1 INFER_TP=2 ALL_OFFLOAD=True MODEL_ID=Qwen/Qwen2.5-1.5B bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: clean up
run: |
rm -rf checkpoints
e2e_ppo_trainer_megatron-qwen-override-transformer-config:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
pip3 install math-verify transformers==$TRANSFORMERS_VERSION
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
# - name: Download Model to Use
# run: |
# huggingface-cli download Qwen/Qwen2.5-0.5B --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B
# export HF_HUB_OFFLINE=1
- name: Prepare dist_ckpt of Qwen2.5-0.5B, uneven layer distribution only supports dist_ckpt
run: |
python3 scripts/converter_hf_to_mcore.py --hf_model_path ${HOME}/models/Qwen/Qwen2.5-0.5B --output_path checkpoints/verl-test/qwen2.5-0.5b-megatron
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)
run: |
ray stop --force
SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 SKIP_SAVE_HF_MODEL=1 USE_DIST_CKPT=True DIST_CKPT_PATH=checkpoints/verl-test/qwen2.5-0.5b-megatron \
bash tests/special_e2e/run_ppo_trainer_megatron.sh +actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_first_pipeline_stage=8 +actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_last_pipeline_stage=4
cp -r checkpoints checkpoints-dut
SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: Test Megatron checkpoints merging function (Qwen Actor and Critic)
run: |
exp_name="qwen2.5-0.5b-megatron-gsm8k-minimal"
python -m verl.model_merger test --backend megatron --tie-word-embedding --local_dir checkpoints-dut/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints-dut/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface
- name: clean up
run: |
rm -rf checkpoints
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
e2e_ppo_trainer_megatron-deepseek,
e2e_ppo_trainer_megatron-qwen3,
e2e_ppo_trainer_megatron-different-train-infer-tp-qwen-tie-embedding,
e2e_ppo_trainer_megatron-qwen-override-transformer-config,
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


@ -0,0 +1,420 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with GPUs available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`: runs pytest on all test scripts without the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
# - a new workflow yaml is added to `.github/workflows`
# - new tests are added to a workflow mentioned in 2.
name: e2e_ppo_trainer_megatron_vllm_2
on:
# Trigger the workflow on push or pull request,
# but only for the main and v0.* branches.
# For push, for now only anti-patterns are specified so it is more conservative
# and achieves higher coverage.
push:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# FSDP
- "!verl/workers/**/*dp_*.py"
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!docker/**"
# Docs
- "!**/*.md"
- "!docs/**"
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# FSDP
- "!verl/workers/**/*dp_*.py"
# Entrypoints
- ".github/workflows/e2e_ppo_trainer_megatron_vllm.yml"
- "examples/data_preprocess/gsm8k.py"
- "examples/data_preprocess/geo3k.py"
- "tests/special_e2e/run_ppo_trainer_megatron.sh"
- "verl/trainer/main_ppo.py"
- "verl/trainer/config/ppo_megatron_trainer.yaml"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permission for repository contents.
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
TRANSFORMERS_VERSION: "4.56.2"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
e2e_ppo_trainer_megatron-deepseek-override-transformer-config:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
pip3 install transformers==$TRANSFORMERS_VERSION
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)
run: |
ray stop --force
SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct COMMON_PP=2 COMMON_VPP=null bash tests/special_e2e/run_ppo_trainer_megatron.sh +actor_rollout_ref.actor.megatron.override_transformer_config.account_for_embedding_in_pipeline_split=true +actor_rollout_ref.actor.megatron.override_transformer_config.account_for_loss_in_pipeline_split=true
- name: Test Megatron checkpoints merging function (DeepSeek Actor and Critic)
run: |
exp_name="deepseek-coder-1.3b-instruct-megatron-gsm8k-minimal"
python -m verl.model_merger test --backend megatron --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface
- name: clean up
run: |
rm -rf checkpoints
e2e_ppo_trainer_megatron-moe-expert-parallel:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
pip3 install mbridge
pip3 install transformers==$TRANSFORMERS_VERSION
- name: Prepare GSM8K dataset
run: |
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen MoE)
run: |
ray stop --force
ADV_ESTIMATOR=grpo USE_DUMMY_MODEL=True DUMMY_MODEL_CONFIG_PATH=tests/special_e2e/ppo_trainer/expert_parallel/qwen2moe_minimal.json \
PPO_MAX_TOKEN_LEN=512 FWD_MAX_TOKEN_LEN=512 \
MAX_PROMPT_LENGTH=256 MAX_RESPONSE_LENGTH=256 \
MODEL_ID=Qwen/Qwen1.5-MoE-A2.7B-Chat USE_MBRIDGE=True \
COMMON_PP=2 COMMON_VPP=null COMMON_CP=1 COMMON_TP=4 COMMON_EP=4 COMMON_ETP=1 INFER_TP=8 \
USE_DIST_CKPT=True ALL_OFFLOAD=True SKIP_SAVE_HF_MODEL=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: clean up
run: |
rm -rf checkpoints
e2e_ppo_trainer_megatron-qwen2_5vl-3b:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
pip3 install transformers==$TRANSFORMERS_VERSION
- name: Prepare Geo3k dataset
run: |
python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k/
- name: Prepare dist_ckpt of Qwen2.5-VL-3B, only supports dist_ckpt
run: |
python3 scripts/converter_hf_to_mcore.py --hf_model_path ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct --output_path checkpoints/verl-test/qwen2.5-vl-3b-megatron
- name: Running Geo3k E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)
run: |
ray stop --force
TRAIN_FILES=${HOME}/data/geo3k/train.parquet VAL_FILES=${HOME}/data/geo3k/test.parquet \
MAX_PROMPT_LENGTH=1024 MAX_RESPONSE_LENGTH=2048 MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct ADV_ESTIMATOR=grpo \
USE_DYNAMIC_BSZ=False USE_FUSED_KERNELS=True SKIP_SAVE_HF_MODEL=1 \
COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 COMMON_TP=2 USE_DIST_CKPT=true \
DIST_CKPT_PATH=checkpoints/verl-test/qwen2.5-vl-3b-megatron bash tests/special_e2e/run_ppo_trainer_megatron.sh
- name: clean up
run: |
rm -rf checkpoints
e2e_ppo_trainer_vllm:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test,vllm]
pip3 install transformers==$TRANSFORMERS_VERSION
- name: Prepare GSM8K dataset
run: |
ray stop --force
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
# HF sanity
# - name: Running GSM8K E2E training tests on 1 L20 GPU with hf for sanity
# run: |
# ray stop --force
# bash tests/special_e2e/ppo_trainer/run_single_gpu.sh
# # HF sanity
# - name: Running GSM8K E2E training tests on 1 L20 GPU with engine interface for sanity.
# run: |
# ray stop --force
# bash tests/special_e2e/ppo_trainer/run_single_gpu_with_engine.sh
# Function RM
- name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with validation and saving (FSDP_SIZE=8)
run: |
ray stop --force
VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp-size8" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm after resuming
run: |
ray stop --force
RESUME_MODE=auto VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp-size8" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Test merging FSDP checkpoints (Qwen Actor)
run: |
exp_name="qwen2.5-0.5b-function-reward-minimal-fsdp-size8"
python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
- name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with validation and saving (DDP_SIZE=2, FSDP_SIZE=4)
run: |
ray stop --force
VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True FSDP_SIZE=4 VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-ddp-size2-fsdp-size4" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Test merging DDP+FSDP checkpoints (Qwen Actor)
run: |
exp_name="qwen2.5-0.5b-function-reward-minimal-ddp-size2-fsdp-size4"
python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
- name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with validation and saving (FSDP2)
run: |
ray stop --force
VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp2-size8" STRATEGY=fsdp2 bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Test merging FSDP2 checkpoints (Qwen Actor)
run: |
exp_name="qwen2.5-0.5b-function-reward-minimal-fsdp2-size8"
python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
- name: Running GSM8K E2E without rmpad using function rm
run: |
ray stop --force
RM_PAD=False bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm (GRPO)
run: |
ray stop --force
ADV_ESTIMATOR=grpo USE_KL=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm (ReMax)
run: |
ray stop --force
ADV_ESTIMATOR=remax USE_KL=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using customized reward function
run: |
ray stop --force
CUSTOM_REWARD_FN=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with in-reward kl and kl loss
run: |
ray stop --force
USE_KL=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh
# LoRA tests
- name: Running GSM8K E2E training tests on 8 L20 GPUs with grpo lora using function rm with use_shm
run: |
ray stop --force
ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Running GSM8K E2E training tests on 8 L20 GPUs with grpo lora using function rm with use_shm and layered_summon
run: |
ray stop --force
ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors LAYERED_SUMMON=True TOTAL_TRAIN_STEPS=1 SAVE_FREQ=1 FSDP_SIZE=4 VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Test GRPO LoRA checkpoints merging function
run: |
export EXP_NAME="qwen2.5-0.5b-function-reward-minimal"
ls checkpoints/verl-test/${EXP_NAME}/global_step_1/actor
cat checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/huggingface/config.json
python3 -m verl.model_merger merge --backend fsdp --local_dir checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/ --target_dir checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/huggingface
- name: Running GSM8K E2E training tests on 8 L20 GPUs with grpo lora using function rm with use_shm and layered_summon with fsdp2
run: |
ray stop --force
ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors LAYERED_SUMMON=True STRATEGY=fsdp2 bash tests/special_e2e/ppo_trainer/run_function_reward.sh
# Model RM
- name: Running GRPO GSM8K E2E training tests with FSDP on 8 L20 GPUs (DeepSeek)
run: |
ray stop --force
MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Running GSM8K E2E with rmpad using model rm
run: |
ray stop --force
bash tests/special_e2e/ppo_trainer/run_model_reward.sh
- name: Running GSM8K E2E without rmpad using model rm
run: |
ray stop --force
RM_PAD=False bash tests/special_e2e/ppo_trainer/run_model_reward.sh
- name: Running GSM8K E2E with rmpad using model rm and ulysses sp=2
run: |
ray stop --force
SP_SIZE=2 bash tests/special_e2e/ppo_trainer/run_model_reward.sh
- name: Running GSM8K E2E with rmpad using model rm and dynamic batch size
run: |
ray stop --force
SEQ_BALANCE=True bash tests/special_e2e/ppo_trainer/run_model_reward.sh
- name: Running GSM8K E2E with rmpad using model rm with Liger Kernel enabled
run: |
ray stop --force
LIGER=True bash tests/special_e2e/ppo_trainer/run_model_reward.sh
- name: Running GSM8K E2E with rmpad using model rm with Fused Kernel enabled
run: |
ray stop --force
FUSED_KERNELS=True bash tests/special_e2e/ppo_trainer/run_model_reward.sh
- name: Running GSM8K E2E with rmpad using model rm with Fused Kernel enabled (triton backend)
run: |
ray stop --force
FUSED_KERNEL=True FUSED_KERNEL_BACKEND=triton bash tests/special_e2e/ppo_trainer/run_model_reward.sh
- name: Running GSM8K E2E training tests on vllm async
run: |
ray stop --force
export VLLM_USE_V1=1
ray start --head
TOTAL_TRAIN_STEPS=2 ENGINE=vllm ROLLOUT_MODE=async bash tests/special_e2e/ppo_trainer/run_function_reward.sh
e2e_ppo_trainer_vllm_vlm:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test,gpu,vllm,geo,trl]
pip3 install transformers==$TRANSFORMERS_VERSION
# Geo3k
- name: Prepare GEO3K dataset
run: |
python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k/
- name: Running GEO3K VLM GRPO E2E training tests on 8 L20 GPUs with rmpad using function rm
run: |
ray stop --force
TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \
ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
SP_SIZE=2 \
bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Running GEO3K VLM PPO E2E training tests on 8 L20 GPUs with rmpad using function rm
run: |
ray stop --force
TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \
ADV_ESTIMATOR=gae RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
SP_SIZE=2 \
bash tests/special_e2e/ppo_trainer/run_function_reward.sh
- name: Running GEO3K VLM GRPO E2E lora training tests on 8 L20 GPUs with rmpad using function rm
run: |
ray stop --force
TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \
ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
SP_SIZE=2 \
LORA_RANK=32 LORA_EXCLUDE=".*visual.*" \
bash tests/special_e2e/ppo_trainer/run_function_reward.sh
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
e2e_ppo_trainer_megatron-deepseek-override-transformer-config,
e2e_ppo_trainer_megatron-moe-expert-parallel,
e2e_ppo_trainer_megatron-qwen2_5vl-3b,
e2e_ppo_trainer_vllm,
e2e_ppo_trainer_vllm_vlm
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"
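The `run:` steps above configure each launcher by prefixing the command with `NAME=value` assignments (for example `ADV_ESTIMATOR=grpo USE_KL=True bash ...`). Below is a minimal sketch of how that pattern typically works, assuming the launcher reads each variable with a shell default. The launcher here is a hypothetical stand-in written to a temp file, not the real `run_function_reward.sh`, and the defaults `gae` and `False` are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a test launcher; not the real run_function_reward.sh.
launcher=$(mktemp)
cat > "$launcher" <<'EOF'
#!/usr/bin/env bash
# Each knob falls back to an (assumed) default unless the caller sets it.
ADV_ESTIMATOR=${ADV_ESTIMATOR:-gae}
USE_KL=${USE_KL:-False}
echo "adv_estimator=${ADV_ESTIMATOR} use_kl=${USE_KL}"
EOF

# No overrides: the defaults apply. The subshell unsets any inherited values.
default_out=$(unset ADV_ESTIMATOR USE_KL; bash "$launcher")

# Per-invocation overrides: the assignments are scoped to this one command.
grpo_out=$(ADV_ESTIMATOR=grpo USE_KL=True bash "$launcher")

echo "$default_out"
echo "$grpo_out"
rm -f "$launcher"
```

Because the assignments apply only to the single command they prefix, consecutive steps can reuse the same launcher with different settings without leaking state between them.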


@ -1,3 +1,34 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`, run pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
name: e2e_sft
on:
@ -6,18 +37,28 @@ on:
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/e2e_sft.yml
- v0.*
pull_request:
branches:
- main
- v0.2.x
- v0.*
paths:
- "**/*.py"
- .github/workflows/e2e_sft.yml
- "tests/e2e/*.sh"
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# Megatron
- "!verl/workers/**/megatron_*.py"
# Entrypoints
- ".github/workflows/e2e_sft.yml"
- "examples/data_preprocess/gsm8k.py"
- "tests/special_e2e/sft"
- "verl/trainer/fsdp_sft_trainer.py"
- "verl/trainer/config/sft_trainer.yaml"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
@ -25,47 +66,96 @@ concurrency:
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
e2e_sft:
runs-on: [self-hosted, l20-1]
timeout-minutes: 5 # Increase this timeout value as needed
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 30 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
options: --gpus all --shm-size=10g
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install -e .[test,gpu]
pip3 install peft
pip3 install --no-deps -e .[test,gpu]
- name: Prepare gsm8k dataset
run: |
ray stop --force
python3 examples/data_preprocess/gsm8k.py
- name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using function rm
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm
run: |
ray stop --force
bash tests/sft/run_sft.sh
- name: Running gsm8k e2e training tests on 8 L20 GPUs with sequence parallelism
bash tests/special_e2e/sft/run_sft.sh
- name: Running GSM8K E2E training tests on 8 L20 GPUs w/o rmpad using function rm
run: |
ray stop --force
bash examples/sft/gsm8k/run_qwen_05_sp2.sh 8 $HOME/ckpts/
RM_PAD=False bash tests/special_e2e/sft/run_sft.sh
- name: Running GSM8K E2E training tests on 8 L20 GPUs with sequence parallelism
run: |
ray stop --force
SP_SIZE=2 bash tests/special_e2e/sft/run_sft.sh
- name: Check loss difference between sequence parallel vs. default implementation
run: |
ray stop --force
bash tests/sft/run_sft_sp_loss_match.sh
- name: Running gsm8k e2e training tests on 8 L20 GPUs with sequence parallelism and liger
ENTRYPOINT="tests/special_e2e/sft/test_sp_loss_match.py" SP_SIZE=2 bash tests/special_e2e/sft/run_sft.sh
- name: Running GSM8K E2E training tests on 8 L20 GPUs with sequence parallelism and liger
run: |
ray stop --force
bash tests/sft/run_sft_qwen05_sp2_liger.sh 8 $HOME/ckpts/
rm -rf $HOME/ckpts/
SP_SIZE=2 LIGER=True bash tests/special_e2e/sft/run_sft.sh
- name: Running GSM8K E2E training tests with LoRA
run: |
ray stop --force
LORA_RANK=32 bash tests/special_e2e/sft/run_sft.sh
- name: Run GSM8K E2E training and resume tests resuming from the checkpoint manager
run: |
ray stop --force
LORA_RANK=32 RESUME_MODE=auto TOTAL_TRAIN_STEP=2 bash tests/special_e2e/sft/run_sft.sh
# TODO: multiturn
- name: Prepare gsm8k dataset
run: |
ray stop --force
python3 examples/data_preprocess/gsm8k_multiturn_sft.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Running GSM8K E2E training tests with multiturn and various configs and compare results
run: |
bash tests/special_e2e/sft/test_sft_engine_all.sh
cleanup:
runs-on: ubuntu-latest
needs: [setup, e2e_sft]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"
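The `pull_request.paths` lists in these workflows mix positive patterns with `!` negative ones. GitHub Actions evaluates such a list in order, and the last matching pattern decides whether a changed file counts. Below is a minimal sketch of that ordering rule only. It uses bash pattern matching as a simplified stand-in for GitHub's filter syntax (so `*` also crosses `/` here), and the pattern list and file names are illustrative assumptions rather than the exact lists from the workflows.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simplified model: bash pattern matching stands in for GitHub's filter
# syntax, so `*` here also crosses `/`. Only the ordering rule is modeled.
patterns=('verl/*.py' '!verl/trainer/main_*.py' 'verl/trainer/main_ppo.py')

is_included() {
  local path=$1 included=no pat
  for pat in "${patterns[@]}"; do
    if [[ $pat == '!'* ]]; then
      # Negative pattern: a match excludes a previously included path.
      if [[ $path == ${pat:1} ]]; then included=no; fi
    elif [[ $path == $pat ]]; then
      # Positive pattern: a match includes the path (again).
      included=yes
    fi
  done
  echo "$included"
}

for p in verl/utils/fs.py verl/trainer/main_eval.py verl/trainer/main_ppo.py docs/index.md; do
  echo "$p -> $(is_included "$p")"
done
```

In this model a file excluded by a negative pattern can be brought back by a later positive pattern, which is why the workflows list their own entrypoints after the exclusions.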


@ -1,60 +0,0 @@
name: e2e_sglang_gsm8k
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/e2e_sglang_gsm8k.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- "verl/trainer/config/*.yaml"
- .github/workflows/e2e_sglang_gsm8k.yml
- "tests/e2e/*.sh"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
e2e_sglang_gsm8k:
runs-on: [self-hosted, l20-1]
timeout-minutes: 40 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: ocss884/verl-sglang:ngc-th2.5.1-cu126-sglang0.4.3.post3
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install -e .[test,gpu,sglang] --no-deps
- name: Prepare gsm8k dataset
run: |
ray stop --force
python3 examples/data_preprocess/gsm8k.py
- name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using function rm and save ckpt
run: |
ray stop --force
bash tests/e2e/run_qwen_gsm8k_function_rm.sh sglang


@ -1,54 +0,0 @@
name: e2e_vlm_geo3k
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/e2e_vlm_geo3k.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/e2e_vlm_geo3k.yml
- "tests/e2e/*.sh"
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
e2e_vlm_geo3k:
runs-on: [self-hosted, l20-1]
timeout-minutes: 10 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.2
options: --gpus all --shm-size=40g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install -e .[test,geo,vllm]
python -c "import transformers; print(transformers.__version__)"
- name: Prepare geo3k dataset
run: |
ray stop --force
python3 examples/data_preprocess/geo3k.py
- name: Running geo3k vlm e2e training tests on 8 L20 GPUs with rmpad using function rm
run: |
ray stop --force
bash tests/e2e/run_qwen2vl_geo3k_function_rm.sh

.github/workflows/gpu_unit_tests.yml

@ -0,0 +1,113 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`, run pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
name: GPU unit tests
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.4.x
paths:
- "**/*.py"
- .github/workflows/gpu_unit_tests.yml
pull_request:
branches:
- main
- v0.4.x
paths:
# The order that you define paths patterns matters:
# A matching negative pattern (prefixed with !) after a positive match will exclude the path.
# A matching positive pattern after a negative match will include the path again.
- "**/*.py"
# Other entrypoints
- "!examples/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
- "!recipe/**"
# Entrypoints
- .github/workflows/gpu_unit_tests.yml
- "tests/**test_*.py"
# Ignore CPU tests
- "!tests/*_on_cpu.py"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
gpu_unit_tests:
if: github.repository_owner == 'volcengine'
runs-on: [L20x8]
timeout-minutes: 60 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install --no-deps -e .[test]
pip3 install --upgrade "ray>=2.40.0"
pip3 install cupy-cuda12x
- name: Download Model to Use
run: |
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct
huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct
export HF_HUB_OFFLINE=1
# Disable requests to avoid network errors
- name: Run all GPU unit tests
run: |
pytest -s -x --ignore-glob="*test_special_*.py" --ignore-glob='*on_cpu.py' --ignore-glob="*test_vllm*" --ignore-glob="*_sglang*" --ignore-glob="*_hf_rollout*" --ignore-glob="tests/models/" --ignore-glob='tests/special*' --ignore-glob="tests/experimental" --ignore-glob="tests/workers/reward_model" tests/
- name: Testing LinearCrossEntropyTP Correctness, Computation Time and Memory Consumption
run: |
LOW_MEMORY=True torchrun --standalone --nnodes=1 --nproc-per-node=8 tests/utils/test_special_linear_cross_entropy_tp.py
- name: Testing FSDP2 actor functionality
run: |
torchrun --standalone --nnodes=1 --nproc-per-node=2 tests/workers/actor/test_special_dp_actor.py
- name: Testing FSDP2 critic functionality
run: |
torchrun --standalone --nnodes=1 --nproc-per-node=2 tests/workers/critic/test_special_dp_critic.py
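Each `run:` block executes in its own shell process, so the `export HF_HUB_OFFLINE=1` in the download step likely does not reach the test steps that follow it. If the intent is to keep later steps offline, one option is to append the variable to the file named by `$GITHUB_ENV`, which GitHub Actions loads into subsequent steps. Below is a minimal sketch that simulates two steps as child shells. The temp-file fallback is an assumption so the sketch also runs outside Actions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Outside GitHub Actions there is no $GITHUB_ENV, so fall back to a temp file.
GITHUB_ENV=${GITHUB_ENV:-$(mktemp)}
unset HF_HUB_OFFLINE

# "Step 1": a plain export lives only as long as this child shell.
bash -c 'export HF_HUB_OFFLINE=1'

# "Step 2": a fresh child shell does not see the exported value.
after_export=$(bash -c 'echo "${HF_HUB_OFFLINE:-unset}"')

# Persisting instead: append NAME=value to the file named by $GITHUB_ENV.
echo "HF_HUB_OFFLINE=1" >> "$GITHUB_ENV"
persisted=$(grep '^HF_HUB_OFFLINE=' "$GITHUB_ENV" | tail -n 1)

echo "after export: ${after_export}"
echo "written to GITHUB_ENV: ${persisted}"
```

An alternative with the same effect is a job-level or step-level `env:` entry, which avoids the shell write entirely.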


@ -1,4 +1,36 @@
name: model_rmpad
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`, run pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
# name: Check PR Title
name: model
on:
# Trigger the workflow on push or pull request,
@ -6,76 +38,194 @@ on:
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/model.yml
- v0.*
pull_request:
branches:
- main
- v0.2.x
- v0.*
paths:
- "**/*.py"
- .github/workflows/model.yml
- "verl/**/*.py"
# Entrypoints
- ".github/workflows/model.yml"
- "tests/special_distributed/test_fsdp_ckpt.py"
- "tests/special_distributed/test_mcore_config_converter.py"
- "tests/special_distributed/test_tensor_dict.py"
- "tests/models/**"
- "tests/special_distributed/run_all.sh"
# Declare read-only permissions for repository contents.
permissions:
permissions:
contents: read
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
model_rmpad:
runs-on: [self-hosted, l20-1]
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 20 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
options: --gpus all --shm-size=10g
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository and upgrade to latest transformers/flash_attn
fetch-depth: 0
- name: Install the current repository and upgrade to latest transformers(4.54.0)/flash_attn, transformers 4.55.0 has strange behavior with model backward
run: |
pip3 install -e .[test]
pip3 install --no-deps -e .[test]
pip3 install --upgrade transformers
- name: Running rmpad model tests on 8 L20 GPUs + flash_attn 2.5.8
run: |
pytest -s tests/model/test_transformer.py
pytest -s tests/models/test_transformer.py
- name: Running rmpad model tests on 8 L20 GPUs + latest flash_attn
run: |
pip3 install --upgrade flash_attn --no-build-isolation
pytest -s tests/model/test_transformer.py
pytest -s tests/models/test_transformer.py
- name: Running FSDP rmpad model tests on 8 L20 GPUs + latest flash_attn
run: |
pip3 install hf_transfer
torchrun --nproc_per_node=8 tests/checkpoint/test_fsdp_ckpt.py
STRATEGY=fsdp torchrun --nproc_per_node=8 tests/special_distributed/test_fsdp_ckpt.py
- name: Running transformers ulysses tests on 8 L20 GPUs + latest transformers
run: |
torchrun --nproc_per_node=8 -m pytest tests/model/test_transformers_ulysses.py
- name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.49.0
torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
- name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.54.1
run: |
pip3 install transformers==4.49.0
torchrun --nproc_per_node=8 -m pytest tests/model/test_transformers_ulysses.py
- name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.48.0
pip3 install transformers==4.54.1
torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
- name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.53.2
run: |
pip3 install transformers==4.48.0
torchrun --nproc_per_node=8 -m pytest tests/model/test_transformers_ulysses.py
- name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.47.0
pip3 install transformers==4.53.2
torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
- name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.52.0
run: |
pip3 install transformers==4.47.0
torchrun --nproc_per_node=8 -m pytest tests/model/test_transformers_ulysses.py
- name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.46.0
run: |
pip3 install transformers==4.46.0
torchrun --nproc_per_node=8 -m pytest tests/model/test_transformers_ulysses.py
- name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.45.0
run: |
pip3 install transformers==4.45.0
torchrun --nproc_per_node=8 -m pytest tests/model/test_transformers_ulysses.py
pip3 install transformers==4.52.0
torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
- name: Run distributed test
run: |
bash tests/distributed/run_all.sh
bash tests/special_distributed/run_all.sh
# TODO: Move this back to model_rmpad once FSDP2 is stable.
# NOTE: List as an independent job to make rerun easier.
model_rmpad_fsdp2_unstable:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 20 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository and upgrade to latest transformers/flash_attn
run: |
pip3 install --no-deps -e .[test]
pip3 install --upgrade transformers
- name: Running FSDP2 rmpad model tests on 8 L20 GPUs + latest flash_attn
run: |
STRATEGY=fsdp2 torchrun --nproc_per_node=8 tests/special_distributed/test_fsdp_ckpt.py
mcore_config_converter:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 20 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
pip install --upgrade "huggingface_hub[cli]"
# - name: Download model config files
# run: |
# hf download Qwen/Qwen2.5-7B config.json --local-dir $HOME/configs/Qwen/Qwen2.5-7B
# hf download Qwen/Qwen3-8B config.json --local-dir $HOME/configs/Qwen/Qwen3-8B
# hf download deepseek-ai/deepseek-coder-1.3b-instruct config.json --local-dir $HOME/configs/deepseek-ai/deepseek-coder-1.3b-instruct
# hf download Qwen/Qwen2-57B-A14B config.json --local-dir $HOME/configs/Qwen/Qwen2-57B-A14B
# hf download Qwen/Qwen3-30B-A3B config.json --local-dir $HOME/configs/Qwen/Qwen3-30B-A3B
# hf download deepseek-ai/DeepSeek-V3-Base config.json --local-dir $HOME/configs/deepseek-ai/DeepSeek-V3-Base
- name: Running mcore config converter tests on 8 L20 GPUs
run: |
torchrun --nproc_per_node=8 tests/special_distributed/test_mcore_config_converter.py
model_engine:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 20 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install --no-deps -e .[test]
pip3 install --upgrade tensordict transformers
pip install --upgrade "huggingface_hub[cli]"
- name: Download model config files
run: |
hf download Qwen/Qwen2.5-0.5B-Instruct --local-dir $HOME/models/Qwen/Qwen2.5-0.5B-Instruct
- name: Running mcore engine tests on 8 L20 GPUs
run: |
ray stop --force
pytest -s -x tests/models/test_engine.py
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
model_rmpad,
model_rmpad_fsdp2_unstable,
mcore_config_converter,
model_engine
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

.github/workflows/pre-commit.yml

@ -0,0 +1,40 @@
# c.f. https://github.com/pre-commit/action?tab=readme-ov-file#using-this-action
name: pre-commit
# No need to avoid / cancel lightweight pre-commit jobs
on:
schedule:
- cron: "0 0 * * 0"
pull_request:
push:
branches:
- main
- v0.*
# Allow manual triggering
workflow_dispatch:
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
pre-commit:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.12"]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
with:
python-version: ${{ matrix.python-version }}
- name: Install the current repository
run: |
pip install -e .
- name: Set ruff --output-format=github
run: |
sed -i 's/--output-format=full/--output-format=github/' .pre-commit-config.yaml
git add .pre-commit-config.yaml
# Check "--all-files" by default
- uses: pre-commit/action@v3.0.1


@ -1,54 +0,0 @@
name: ray
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/ray_test.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/ray_test.yml
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
ray:
runs-on: [self-hosted, l20-0]
timeout-minutes: 5 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip install hf_transfer
pip install -e .[test]
pip install --upgrade "ray>=2.40.0"
- name: Running ray tests that need 8 GPUs
run: |
cd tests/ray
pytest -s -x --ignore=test_check_worker_alive.py --ignore=test_rvdz.py .

.github/workflows/reward_model.yml

@ -0,0 +1,131 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`, runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`, runs pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
# name: Check PR Title
name: reward_model
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.*
pull_request:
branches:
- main
- v0.*
paths:
- "verl/**/*.py"
# Entrypoints
- ".github/workflows/reward_model.yml"
- "tests/workers/reward_model/**"
# Declare read-only permissions for repository contents.
permissions:
contents: read
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-sglang0.4.10.post2-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
TRANSFORMERS_VERSION: "4.56.2"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
reward_model:
needs: setup
runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]
timeout-minutes: 20 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK: "True"
NCCL_SHM_DISABLE: "1"
NCCL_P2P_DISABLE: "1"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install -e .[test]
# - name: Download model config files
# run: |
# hf download Skywork/Skywork-Reward-V2-Llama-3.2-1B --local-dir $HOME/models/Skywork/Skywork-Reward-V2-Llama-3.2-1B
# hf download verl-team/GenRM-CI-Test-1.5B --local-dir $HOME/models/verl-team/GenRM-CI-Test-1.5B
- name: Running discriminative reward model tests on 8 L20 GPUs
run: |
unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
pytest -s -x tests/workers/reward_model/test_discriminative_reward_model.py
- name: Running generative reward model tests on 8 L20 GPUs
run: |
unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
pytest -s -x tests/workers/reward_model/test_generative_reward_model.py
cleanup:
runs-on: ubuntu-latest
needs:
[
setup,
reward_model
]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"
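
The test layout described in the header comment of this workflow can be sketched as a small classifier. This is only an illustration of the stated convention, not code from the repository: the function names `accelerator_for` and `verl_namespace` are hypothetical, and so are the example paths other than `tests/models/test_transformer.py` and `tests/special_distributed/test_fsdp_ckpt.py`, which appear in the workflows above.

```python
from pathlib import PurePosixPath
from typing import Optional


def accelerator_for(test_path: str) -> str:
    """Classify a test script as 'npu', 'cpu', or 'gpu' per the header comment."""
    path = PurePosixPath(test_path)
    if path.parts[:1] != ("tests",):
        raise ValueError(f"not under tests/: {test_path}")
    if "special_npu" in path.parts:
        return "npu"  # NPU tests live under tests/special_npu
    if path.name.endswith("on_cpu.py"):
        return "cpu"  # the on_cpu.py suffix marks CPU-only scripts
    return "gpu"  # default: run with a GPU available


def verl_namespace(test_path: str) -> Optional[str]:
    """Map tests/<category>/... to verl/<category>; None for special_* folders."""
    parts = PurePosixPath(test_path).parts
    if len(parts) < 3 or parts[0] != "tests":
        return None
    category = parts[1]
    if category.startswith("special_"):
        return None  # special_* folders do not mirror a verl sub-namespace
    return f"verl/{category}"


if __name__ == "__main__":
    for example in (
        "tests/models/test_transformer.py",
        "tests/special_npu/test_example.py",  # hypothetical file name
        "tests/utils/test_example_on_cpu.py",  # hypothetical file name
    ):
        print(example, accelerator_for(example), verl_namespace(example))
```

Keeping the rule in one function makes it easy to see why a new test file needs the `on_cpu.py` suffix if it is meant to run without a GPU.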


@ -1,54 +0,0 @@
name: sandbox
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/sandbox.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/sandbox.yml
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
jobs:
sandbox:
runs-on: [self-hosted, l20-0]
timeout-minutes: 3 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
options: --gpus all --shm-size=10g
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install -e .[test,prime]
pip3 install vllm==0.5.4
- name: Running sandbox tests on 8 L20 GPUs
run: |
cd tests/sandbox
pytest -s -x .


@ -1,3 +1,35 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`, runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`, runs pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
# name: Check PR Title
name: sanity
on:
@ -6,17 +38,15 @@ on:
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/sanity.yml
- v0.*
pull_request:
branches:
- main
- v0.2.x
- v0.*
paths:
- "**/*.py"
- .github/workflows/sanity.yml
- "tests/special_sanity/**"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
@ -24,7 +54,7 @@ concurrency:
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
permissions:
contents: read
jobs:
@ -42,13 +72,38 @@ jobs:
python-version: ${{ matrix.python-version }}
- name: Install the current repository
run: |
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip3 install -r requirements.txt
pip install -e .[test]
- name: Run sanity test
run: |
pytest -s -x tests/sanity
- name: Run utility test
run: |
pytest -s -x tests/utility
pytest -s -x tests/special_sanity
- name: Run license test
run: |
python3 tests/sanity/check_license.py --directory .
python3 tests/special_sanity/check_license.py --directories .
- name: Assert naming convention
run: |
if grep -rIn --exclude-dir=.git --exclude-dir=.github --exclude-dir=venv --exclude-dir=__pycache__ 'veRL' .; then
echo "Please use verl instead of veRL in the codebase"
exit 1
fi
- name: Assert SGLang naming convention
run: |
if grep -rIn --exclude-dir=.git --exclude-dir=.github --exclude-dir=venv --exclude-dir=__pycache__ -E 'Sglang|sgLang|sglAng|sglaNg|sglanG' .; then
echo "Please use SGLang or sglang as the formal name of SGLang rollout engine"
exit 1
fi
- name: Validate test folder structure
run: python3 tests/special_sanity/validate_structure.py
- name: Assert documentation requirement for functions
run: python3 tests/special_sanity/validate_imported_docs.py
- name: Assert device api usage in verl/recipe
run: python3 tests/special_sanity/check_device_api_usage.py --directory ./recipe
- name: Assert device api usage in verl/verl
run: python3 tests/special_sanity/check_device_api_usage.py --directory ./verl
- name: Assert documentation time info
run: python3 tests/special_sanity/check_docs_time_info.py
- name: Check docstrings for specified files
run: python3 tests/special_sanity/check_docstrings.py
- name: Check DataProto for specified folders
run: python3 tests/special_sanity/check_dataproto_usage.py -d ./verl/workers/engine
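
The two naming-convention steps above rely on `grep` patterns. As a rough Python equivalent, assuming only the two patterns shown in the workflow (`veRL` and `Sglang|sgLang|sglAng|sglaNg|sglanG`), the check can be sketched as follows. The function name `naming_violations` is hypothetical, and the sketch scans a single string rather than walking the repository as `grep -r` does.

```python
import re

# Patterns copied from the workflow's grep invocations.
VERL_BAD = re.compile(r"veRL")
SGLANG_BAD = re.compile(r"Sglang|sgLang|sglAng|sglaNg|sglanG")


def naming_violations(text: str) -> list[str]:
    """Return one message per offending line, prefixed with its line number."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if VERL_BAD.search(line):
            problems.append(f"{lineno}: use verl instead of veRL")
        if SGLANG_BAD.search(line):
            problems.append(f"{lineno}: use SGLang or sglang")
    return problems


if __name__ == "__main__":
    sample = "veRL uses Sglang\nverl uses SGLang"
    for message in naming_violations(sample):
        print(message)
```

Note that the SGLang pattern lists five specific spellings, so a variant outside that list (for example `SGlang`) would not be caught by it.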


@ -10,9 +10,11 @@ on:
# To guarantee Maintained check is occasionally updated. See
# https://github.com/ossf/scorecard/blob/main/docs/checks.md#maintained
schedule:
- cron: '27 7 * * 1'
- cron: "27 7 * * 1"
push:
branches: [ "main" ]
branches:
- main
- v0.*
# Declare default permissions as read only.
permissions: read-all


@ -2,6 +2,7 @@ on:
push:
branches:
- main
- v0.*
pull_request:
permissions:
@ -11,11 +12,11 @@ jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
with:
fetch-depth: 0
- name: Secret Scanning
uses: trufflesecurity/trufflehog@7dc056a193116ba8d82154bf0549381c8fb8545c # v3.88.14
with:
extra_args: --results=verified,unknown
- name: Checkout code
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
with:
fetch-depth: 0
- name: Secret Scanning
uses: trufflesecurity/trufflehog@7dc056a193116ba8d82154bf0549381c8fb8545c # v3.88.14
with:
extra_args: --results=verified,unknown

.github/workflows/sgl.yml

@ -0,0 +1,178 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`, runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`, runs pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
name: sgl
on:
# workflow_dispatch: # Manual
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.*
paths:
- "**/*.py"
- .github/workflows/sgl.yml
pull_request:
branches:
- main
- v0.*
paths:
- "**/*.py"
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# FSDP
- "!verl/workers/**/*dp_*.py"
# Megatron
- "!verl/workers/**/megatron_*.py"
# vLLM
- "!**/*vllm*"
# Recipes
- "!recipe/**"
# Entrypoints
- ".github/workflows/sgl.yml"
- "tests/rollout/*sglang*"
- "tests/rollout/async_rollout_utils.py"
- "tests/workers/rollout/*interaction*"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
sgl:
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 35 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: 1
SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK: "True"
NCCL_SHM_DISABLE: "1"
NCCL_P2P_DISABLE: "1"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer fastmcp
pip3 install -e .[test]
# - name: Download Model to Use
# run: |
# huggingface-cli download Qwen/Qwen2.5-0.5B --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B
# huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-1.5B-Instruct
# huggingface-cli download Qwen/Qwen2.5-VL-3B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct
# export HF_HUB_OFFLINE=1
- name: Prepare gsm8k dataset
run: |
ray stop --force
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Test the latest SGLang Rollout async with agent loop
run: |
ROLLOUT_NAME=sglang pytest -svvv tests/experimental/agent_loop
# huggingface-cli download verl-team/gsm8k-v0.4.1 --repo-type dataset --local-dir ~/verl-data/gsm8k
- name: Test the latest SGLang
run: |
cd tests/workers/rollout
torchrun --nnodes=1 --nproc_per_node=2 $(which pytest) -s test_sglang_spmd.py
- name: Test the latest SGLang Rollout async with interaction
run: |
cd tests/workers/rollout
torchrun --nnodes=1 --nproc_per_node=2 $(which pytest) -s test_sglang_async_rollout_w_interaction.py
- name: Test the latest SGLang Multi Interaction
run: |
cd tests/workers/rollout
torchrun --nnodes=1 --nproc_per_node=2 $(which pytest) -s test_sglang_multi_interaction.py
- name: Test the latest SGLang Rollout async with tool
run: |
cd tests/workers/rollout
torchrun --nnodes=1 --nproc_per_node=2 $(which pytest) -s test_sglang_async_rollout_w_tools.py
- name: Test the latest SGLang Rollout async with sandbox fusion tool
run: |
cd tests/workers/rollout
pytest -s test_sglang_async_rollout_sf_tools.py
- name: Test the latest SGLang Rollout async with search tool
run: |
cd tests/workers/rollout
pytest -s test_sglang_async_rollout_search_tools.py
- name: Test the latest SGLang Rollout async with mcp search tool
run: |
cd tests/workers/rollout
pytest -s test_sglang_async_rollout_mcp_tools.py
# Note(haibin.lin): for any new test, please update gpu_unit_tests.yml to avoid repeated tests
- name: Test the latest SGLang Rollout async with multimodal delta
run: |
cd tests/workers/rollout
pytest -s test_sglang_async_rollout_multimodal_delta.py
cleanup:
runs-on: ubuntu-latest
needs: [setup, sgl]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


@ -0,0 +1,31 @@
name: Type Annotation and Docstring Coverage
on:
pull_request:
paths:
- '**/*.py'
- '.github/workflows/type-coverage-check.yml'
jobs:
type-coverage-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # 🚨 Important: fetch full history so `origin/main` is available
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.10'
- name: Install dependencies
run: |
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip3 install -r requirements.txt
pip3 install -e . --no-deps
- name: Run type annotation coverage check
run: |
python3 tests/special_sanity/type_coverage_check.py
- name: Run docstring coverage check
run: |
python3 tests/special_sanity/check_api_docs.py verl


@ -1,3 +1,34 @@
# # Tests layout
# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...
# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments
# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.
# # Workflow layout
# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
# - `cpu_unit_tests.yml`, runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
# - `gpu_unit_tests.yml`, runs pytest on all scripts whose file names lack the `on_cpu.py` suffix.
# - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
# - new workflow yaml is added to `.github/workflows`
# - new tests are added to workflow mentioned in 2.
name: vllm
on:
@ -6,18 +37,32 @@ on:
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/vllm.yml
- v0.*
pull_request:
branches:
- main
- v0.2.x
- v0.*
paths:
- "**/*.py"
- "verl/trainer/config/*.yaml"
- .github/workflows/vllm.yml
# Other entrypoints
- "!examples/**"
- "!tests/**"
- "!verl/trainer/main_*.py"
- "!verl/trainer/fsdp_sft_trainer.py"
# Recipes
- "!recipe/**"
# FSDP
- "!verl/workers/**/*dp_*.py"
# Megatron
- "!verl/workers/**/megatron_*.py"
# SGLang
- "!**/*sglang*"
# Entrypoints
- ".github/workflows/vllm.yml"
- "tests/special_e2e/generation"
- "tests/workers/rollout"
- "verl/trainer/main_generation.py"
- "verl/trainer/config/generation.yaml"
# Cancel jobs on the same ref if a new one is triggered
concurrency:
@ -25,46 +70,76 @@ concurrency:
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare read-only permissions for repository contents.
permissions:
permissions:
contents: read
env:
IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"
DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"
jobs:
setup:
if: github.repository_owner == 'volcengine'
runs-on: ubuntu-latest
outputs:
runner-label: ${{ steps.create-runner.outputs.runner-label }}
mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
steps:
- uses: actions/checkout@v4
- id: create-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "create"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-image: "${{ env.IMAGE }}"
vllm:
runs-on: [self-hosted, l20-0]
timeout-minutes: 20 # Increase this timeout value as needed
needs: setup
runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
timeout-minutes: 35 # Increase this timeout value as needed
env:
HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
NO_PROXY: "localhost,127.0.0.1"
HF_HUB_ENABLE_HF_TRANSFER: 1
container:
image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
options: --gpus all --shm-size=10g
NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
HF_ENDPOINT: "https://hf-mirror.com"
HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
fetch-depth: 0
- name: Install the current repository
run: |
pip3 install hf_transfer
pip3 install -e .[test]
pip3 install vllm==0.5.4
- name: Running vllm tests on 8 L20 GPUs
# - name: Download Model to Use
# run: |
# huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B-Instruct
# huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-1.5B-Instruct
# huggingface-cli download Qwen/Qwen2.5-VL-3B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct
# huggingface-cli download OldKingMeister/Qwen2.5-1.5B-Instruct-YaRN --local-dir ${HOME}/models/OldKingMeister/Qwen2.5-1.5B-Instruct-YaRN
# export HF_HUB_OFFLINE=1
- name: Prepare gsm8k dataset
run: |
cd tests/rollout
torchrun --standalone --nnodes=1 --nproc_per_node=8 $(which pytest) -s test_vllm_hf_loader.py
ray stop --force
python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
- name: Test the latest vLLM Rollout async with agent loop
run: |
ROLLOUT_NAME=vllm pytest -svvv tests/experimental/agent_loop
- name: Test the latest vLLM
run: |
pip3 install --upgrade vllm==0.7.3
cd tests/rollout
torchrun --standalone --nnodes=1 --nproc_per_node=4 $(which pytest) -s test_vllm_spmd.py
- name: Run Qwen 0.5B generation test
torchrun --standalone --nnodes=1 --nproc_per_node=4 $(which pytest) -s tests/workers/rollout/rollout_vllm/test_vllm_spmd.py
- name: Test the latest vLLM on model with rope scaling
run: |
cd tests/generation
bash ./run_gen_qwen05.sh 4 $HOME/data/gen/qwen_05_gen_test.parquet 2
rm -rf $HOME/data/gen/qwen_05_gen_test.parquet
- name: Run Qwen 0.5B generation test when world_size == 1
run: |
cd tests/generation
bash ./run_gen_qwen05.sh 1 $HOME/data/gen/qwen_05_gen_test.parquet 1
rm -rf $HOME/data/gen/qwen_05_gen_test.parquet
torchrun --standalone --nnodes=1 --nproc_per_node=4 $(which pytest) -s tests/workers/rollout/rollout_vllm/test_vllm_model_rope_scaling.py
# Note(haibin.lin): for any new test, please update gpu_unit_tests.yml to avoid repeated tests
cleanup:
runs-on: ubuntu-latest
needs: [setup, vllm]
if: always()
steps:
- id: destroy-runner
uses: volcengine/vemlp-github-runner@v1
with:
mode: "destroy"
faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


@ -1,56 +0,0 @@
name: yapf
on:
# Trigger the workflow on push or pull request,
# but only for the main branch
push:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/yapf_format.yml
pull_request:
branches:
- main
- v0.2.x
paths:
- "**/*.py"
- .github/workflows/yapf_format.yml
# Cancel jobs on the same ref if a new one is triggered
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
# Declare permissions just read content.
permissions:
contents: read
jobs:
yapf:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.12"]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
# - name: checkout
# run: |
# commits=${{ github.event.pull_request.commits }}
# if [[ -n "$commits" ]]; then
# # Prepare enough depth for diffs with main
# git fetch --depth="$(( commits + 1 ))"
# fi
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install --upgrade yapf
pip install toml==0.10.2
- name: Running yapf
run: |
yapf -r -vv -d --style=./.style.yapf verl tests examples

7
.gitignore vendored

@ -33,6 +33,7 @@ lib64/
parts/
sdist/
var/
tmp/
*.egg-info/
.installed.cfg
*.egg
@ -57,6 +58,8 @@ nosetests.xml
coverage.xml
*,cover
.hypothesis/
pytest.ini
output.txt
# Translations
*.mo
@ -108,9 +111,6 @@ ENV/
# Mac
.DS_Store
# output logs
tests/e2e/toy_examples/deepspeed/synchronous/output.txt
# vim
*.swp
@ -125,3 +125,4 @@ tests/e2e/toy_examples/deepspeed/synchronous/output.txt
logs
log
outputs
.history

37
.pre-commit-config.yaml Normal file

@ -0,0 +1,37 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: "v0.12.2"
hooks:
- id: ruff
args: ["--fix", "--show-fixes", "--output-format=full"]
exclude: ^.*\.(ipynb)$
- id: ruff-format
- repo: https://github.com/pre-commit/mirrors-mypy
rev: 'v1.17.0'
hooks:
- id: mypy
- repo: local
hooks:
- id: autogen-trainer-cfg
name: Generate and verify verl/trainer/config/_generated_*.yaml
entry: scripts/generate_trainer_config.sh
language: script
pass_filenames: false
- repo: local
hooks:
- id: check-docstrings
name: Check doc string coverage
entry: python3 tests/special_sanity/check_docstrings.py
language: python
pass_filenames: false
- repo: local
hooks:
- id: check-license
name: Check license
entry: python3 tests/special_sanity/check_license.py --directories examples recipe scripts tests verl setup.py
language: python
pass_filenames: false

View File

@ -1,5 +0,0 @@
[style]
based_on_style = google
column_limit = 120
indent_width = 4
split_arguments_when_comma_terminated: true

15
.vscode/settings.json vendored Normal file

@ -0,0 +1,15 @@
{
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.codeActionsOnSave": {
"source.organizeImports": "always",
}
},
"files.associations": {
"array": "cpp",
"string_view": "cpp",
"initializer_list": "cpp",
"utility": "cpp"
},
"iis.configDir": ""
}

89
CONTRIBUTING.md Normal file

@ -0,0 +1,89 @@
# Contributing to verl
Thank you for considering a contribution to verl! We welcome contributions of any kind - bug fixes, enhancements, documentation improvements, or even just feedback. Whether you're an experienced developer or this is your first open-source project, your help is invaluable.
Your support can take many forms:
- Report issues or unexpected behaviors.
- Suggest or implement new features.
- Improve or expand documentation.
- Review pull requests and assist other contributors.
- Spread the word: share verl in blog posts, social media, or give the repo a ⭐.
## Finding Issues to Contribute
Looking for ways to dive in? Check out these issues:
- [Good first issues](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22)
- [Call for contribution](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22call%20for%20contribution%22)
Furthermore, you can learn the development plan and roadmap via [RFC](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3ARFC) and [Roadmap](https://github.com/volcengine/verl/issues?q=state%3Aopen%20label%3A%22roadmap%22).
## Developing
- **Python-only**: install verl via `pip install -e .[test,vllm]` or `pip install -e .[test,sglang]` and iterate quickly. For full dependency setup, check out the verl [installation doc](https://verl.readthedocs.io/en/latest/start/install.html).
## Code Linting and Formatting
We rely on pre-commit to keep our code consistent. To set it up:
```bash
pip install pre-commit
pre-commit install
# for staged changes
pre-commit run
# for all files in the repo
pre-commit run --all-files
# run a specific hook with pre-commit
# pre-commit run --all-files --show-diff-on-failure --color=always <hook-id>
pre-commit run --all-files --show-diff-on-failure --color=always ruff
pre-commit run --all-files --show-diff-on-failure --color=always autogen-trainer-cfg
```
## Testing
Our test suites run on GitHub Actions. Check these workflows for details:
- [GPU unit tests](https://github.com/volcengine/verl/blob/main/.github/workflows/gpu_unit_tests.yml)
- [CPU unit tests](https://github.com/volcengine/verl/blob/main/.github/workflows/cpu_unit_tests.yml)
- [vLLM tests](https://github.com/volcengine/verl/blob/main/.github/workflows/vllm.yml)
- [SGLang tests](https://github.com/volcengine/verl/blob/main/.github/workflows/sgl.yml)
### Adding CI tests
If possible, please add CI test(s) for your new feature:
1. Find the most relevant workflow yml file, which usually corresponds to a `hydra` default config (e.g. `ppo_trainer`, `ppo_megatron_trainer`, `sft_trainer`, etc).
2. Add related path patterns to the `paths` section if not already included.
3. Minimize the workload of the test script(s) (see existing scripts for examples).
## Building the Docs
```
# Ensure verl is on your PYTHONPATH, e.g.:
pip install -e .[test]
# Install documentation dependencies
pip install -r requirements-docs.txt
# Generate HTML docs
make clean
make html
# Preview locally
python -m http.server -d _build/html/
```
Open your browser at http://localhost:8000 to explore the docs.
## Pull Requests & Code Reviews
Thanks for submitting a PR! To streamline reviews:
- Follow our Pull Request Template for title format and checklist.
- Adhere to our pre-commit lint rules and ensure all checks pass.
- Update docs for any user-facing changes.
- Add or update tests in the CI workflows, or explain why tests aren't applicable.
## License
See the [LICENSE](https://github.com/volcengine/verl/blob/main/LICENSE) file for full details.
## Thank You
We appreciate your contributions to verl. Your efforts help make the project stronger and more user-friendly. Happy coding!

227
README.md

@ -1,14 +1,25 @@
<h1 style="text-align: center;">verl: Volcano Engine Reinforcement Learning for LLM</h1>
<div align="center">
👋 Hi, everyone!
verl is a RL training library initiated by <b>ByteDance Seed team</b> and maintained by the verl community.
<br>
<br>
</div>
<div align="center">
<a href="https://deepwiki.com/volcengine/verl"><img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" style="height:20px;"></a>
[![GitHub Repo stars](https://img.shields.io/github/stars/volcengine/verl)](https://github.com/volcengine/verl/stargazers)
![GitHub forks](https://img.shields.io/github/forks/volcengine/verl)
[![Twitter](https://img.shields.io/twitter/follow/verl_project)](https://twitter.com/verl_project)
<a href="https://join.slack.com/t/verlgroup/shared_invite/zt-2w5p9o4c3-yy0x2Q56s_VlGLsJ93A6vA"><img src="https://img.shields.io/badge/Slack-verl-blueviolet?logo=slack&amp"></a>
<a href="https://join.slack.com/t/verl-project/shared_invite/zt-3c6mc2khw-v0lo6NfDPuFP6OnkrZwfqw"><img src="https://img.shields.io/badge/Slack-verl-blueviolet?logo=slack&amp"></a>
<a href="https://arxiv.org/pdf/2409.19256"><img src="https://img.shields.io/static/v1?label=EuroSys&message=Paper&color=red"></a>
![GitHub contributors](https://img.shields.io/github/contributors/volcengine/verl)
[![Documentation](https://img.shields.io/badge/documentation-blue)](https://verl.readthedocs.io/en/latest/)
<a href="https://raw.githubusercontent.com/eric-haibin-lin/verl-community/refs/heads/main/WeChat.JPG"><img src="https://img.shields.io/badge/微信-green?logo=wechat&amp"></a>
</div>
![seed logo](https://github.com/user-attachments/assets/c42e675e-497c-4508-8bb9-093ad4d1f216)
<h1 style="text-align: center;">verl: Volcano Engine Reinforcement Learning for LLMs</h1>
verl is a flexible, efficient and production-ready RL training library for large language models (LLMs).
@ -16,7 +27,7 @@ verl is the open-source version of **[HybridFlow: A Flexible and Efficient RLHF
verl is flexible and easy to use with:
- **Easy extension of diverse RL algorithms**: The hybrid-controller programming model enables flexible representation and efficient execution of complex Post-Training dataflows. Build RL dataflows such as GRPO, PPO in a few lines of code.
- **Easy extension of diverse RL algorithms**: The hybrid-controller programming model enables flexible representation and efficient execution of complex post-training dataflows. Build RL dataflows such as GRPO, PPO in a few lines of code.
- **Seamless integration of existing LLM infra with modular APIs**: Decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks, such as FSDP, Megatron-LM, vLLM, SGLang, etc
@ -24,7 +35,6 @@ verl is flexible and easy to use with:
- Ready integration with popular HuggingFace models
verl is fast with:
- **State-of-the-art throughput**: SOTA LLM training and inference engine integrations and SOTA RL throughput.
@ -34,86 +44,144 @@ verl is fast with:
</p>
## News
- [2025/03] [DAPO](https://dapo-sia.github.io/) is the open-sourced SOTA RL algorithm that achieves 50 points on AIME 2024 based on the Qwen2.5-32B pre-trained model, surpassing the previous SOTA achieved by DeepSeek's GRPO (DeepSeek-R1-Zero-Qwen-32B). DAPO's training is fully powered by verl and the reproduction code is [publicly available](https://github.com/volcengine/verl/tree/gm-tyx/puffin/main/recipe/dapo) now.
- [2025/03] We will present verl(HybridFlow) at EuroSys 2025. See you in Rotterdam!
- [2025/03] We introduced the programming model of verl at the [vLLM Beijing Meetup](https://mp.weixin.qq.com/s/n77GibL2corAtQHtVEAzfg) and [verl intro and updates](https://github.com/eric-haibin-lin/verl-community/blob/main/slides/verl-lmsys-meetup.pdf) at the [LMSys Meetup](https://lu.ma/ntjrr7ig) in Sunnyvale mid March.
- [2025/02] verl v0.2.0.post2 is released! See [release note](https://github.com/volcengine/verl/releases/) for details.
- [2025/01] [Doubao-1.5-pro](https://team.doubao.com/zh/special/doubao_1_5_pro) is released with SOTA-level performance on LLM & VLM. The RL scaling preview model is trained using verl, reaching OpenAI O1-level performance on math benchmarks (70.0 pass@1 on AIME).
- [2025/08] verl is presented in the [PyTorch Expert Exchange Webinar](https://www.youtube.com/watch?v=Vd79NmmqY3Q&t=2s). [Slides](https://github.com/eric-haibin-lin/verl-community/blob/main/slides/verl_talk_pytorch_2025_08.pdf) available.
- [2025/07] The [ReTool](https://arxiv.org/pdf/2504.11536) recipe is fully open sourced. [Blog](https://www.notion.so/verl-reTool-recipe-Using-multi-round-conversations-and-code-sandboxing-to-improve-the-math-of-large-23a8b5b7feba80b386b2e5b5e3c1cde0)
- [2025/07] The first verl meetup will be held at ICML Vancouver on July 16th! Please [join us](https://lu.ma/0ek2nyao) if you are at ICML! (onsite only)
- [2025/06] verl with Megatron backend enables large MoE models such as [DeepSeek-671B and Qwen3-235B](https://verl.readthedocs.io/en/latest/perf/dpsk.html).
- [2025/03] [DAPO](https://dapo-sia.github.io/) is the open-sourced SOTA RL algorithm that achieves 50 points on AIME 2024 based on the Qwen2.5-32B pre-trained model, surpassing the previous SOTA achieved by DeepSeek's GRPO (DeepSeek-R1-Zero-Qwen-32B). DAPO's training is fully powered by verl and the reproduction code is available in `recipe/dapo` now.
<details><summary> more... </summary>
<ul>
<li>[2025/04] [Seed-Thinking-v1.5](https://github.com/ByteDance-Seed/Seed-Thinking-v1.5/blob/main/seed-thinking-v1.5.pdf) tech report is released! Trained with verl, Seed-Thinking-v1.5 achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains.</li>
<li>[2025/07] verl keynote at [AWS AI Hours Singapore](https://pages.awscloud.com/aws-ai-hours-sg.html#agenda) on 7/8, verl & verl-agent project updates at [Agent for SWE meetup](https://lu.ma/e498qhsi) by LF AI & Data Singapore on 7/11.</li>
<li>[2025/06] verl team will provide latest project updates at [PyTorch Day China](https://www.lfasiallc.com/pytorch-day-china/) on June 7th. Meet our dev team in Beijing!</li>
<li> [2025/04] [VAPO](https://arxiv.org/pdf/2504.05118) (value-based augmented PPO) paper covers our latest RL method for reasoning models. Trained from Qwen-32B-base model, VAPO achieves 60.4 on AIME 2024, outperforming DAPO-32B.</li>
<li>[2025/05] [PF-PPO](https://arxiv.org/abs/2409.06957), accepted to ICML 2025, is now supported in verl! PF-PPO enhances policy learning efficiency and robustness by filtering potentially noisy reward signals and reusing high-quality experiences via a replay buffer.</li>
<li>[2025/04] We will give a tutorial about latest post-training techniques and programming guide for verl at [ICLR 2025 Expo](https://iclr.cc/virtual/2025/calendar?filter_events=Expo+Talk+Panel&filter_rooms=), [SCI-FM workshop](https://open-foundation-model.github.io/) and [LMSys afterparty](https://lu.ma/d23nyynm). Talk materials available [here](https://github.com/eric-haibin-lin/verl-community/tree/main/iclr25). </li>
<li>[2025/03] verl v0.3.0.post1 is released! See [release note](https://github.com/volcengine/verl/releases/) for details. It achieves [~1.4x speedup](https://tongyx361.github.io/blogs/posts/verl-intro/#/verl-flexible-and-efficient-rl-for-llms) compared to prev versions.</li>
<li>[2025/05] verl will be presented at [A2M Shanghai](https://a2m.msup.com.cn/home/?aid=4488&city=shanghai) on 5/16 - 5/17.</li>
<li>[2025/05] verl will be presented at [GOSIM x PyTorch Day 2025](https://paris2025.gosim.org/). See you in Paris! </li>
<li>[2025/03] We introduced the programming model of verl at the [vLLM Beijing Meetup](https://mp.weixin.qq.com/s/n77GibL2corAtQHtVEAzfg) and [verl intro and updates](https://github.com/eric-haibin-lin/verl-community/blob/main/slides/verl-lmsys-meetup.pdf) at the [SGLang-LMSYS Org Meetup](https://lu.ma/ntjrr7ig) in Sunnyvale mid-March.</li>
<li>[2025/03] We will present verl(HybridFlow) at EuroSys 2025. See you in Rotterdam!</li>
<li>[2025/02] verl v0.2.0.post2 is released!</li>
<li>[2025/02] We presented verl in the <a href="https://lu.ma/ji7atxux">Bytedance/NVIDIA/Anyscale Ray Meetup</a>. See you in San Jose!</li>
<li>[2025/01] [Doubao-1.5-pro](https://team.doubao.com/zh/special/doubao_1_5_pro) is released with SOTA-level performance on LLM & VLM. The RL scaling preview model is trained using verl, reaching OpenAI O1-level performance on math benchmarks (70.0 pass@1 on AIME).</li>
<li>[2024/12] verl is presented at Ray Forward 2024. Slides available <a href="https://github.com/eric-haibin-lin/verl-community/blob/main/slides/Ray_Forward_2024_%E5%B7%AB%E9%94%A1%E6%96%8C.pdf">here</a></li>
<li>[2024/10] verl is presented at Ray Summit. <a href="https://www.youtube.com/watch?v=MrhMcXkXvJU&list=PLzTswPQNepXntmT8jr9WaNfqQ60QwW7-U&index=37">Youtube video</a> available.</li>
<li>[2024/12] The team presented <a href="https://neurips.cc/Expo/Conferences/2024/workshop/100677">Post-training LLMs: From Algorithms to Infrastructure</a> at NeurIPS 2024. <a href="https://github.com/eric-haibin-lin/verl-data/tree/neurips">Slides</a> and <a href="https://neurips.cc/Expo/Conferences/2024/workshop/100677">video</a> available.</li>
<li>[2024/10] verl is presented at Ray Summit. <a href="https://www.youtube.com/watch?v=MrhMcXkXvJU&list=PLzTswPQNepXntmT8jr9WaNfqQ60QwW7-U&index=37">Youtube video</a> available.</li>
<li>[2024/08] HybridFlow (verl) is accepted to EuroSys 2025.</li>
</ul>
</details>
## Key Features
- **FSDP** and **Megatron-LM** for training.
- **vLLM**, **SGLang**(experimental) and **HF Transformers** for rollout generation.
- Compatible with Hugging Face Transformers and Modelscope Hub: Qwen-2.5, Llama3.1, Gemma2, DeepSeek-LLM, etc
- **FSDP**, **FSDP2** and **Megatron-LM** for training.
- **vLLM**, **SGLang** and **HF Transformers** for rollout generation.
- Compatible with Hugging Face Transformers and Modelscope Hub: [Qwen-3](https://github.com/volcengine/verl/blob/main/examples/grpo_trainer/run_qwen3-8b.sh), Qwen-2.5, Llama3.1, Gemma2, DeepSeek-LLM, etc
- Supervised fine-tuning.
- Reinforcement learning with [PPO](examples/ppo_trainer/), [GRPO](examples/grpo_trainer/), [ReMax](examples/remax_trainer/), [Reinforce++](https://verl.readthedocs.io/en/latest/examples/config.html#algorithm), [RLOO](examples/rloo_trainer/), [PRIME](recipe/prime/), etc.
- Support model-based reward and function-based reward (verifiable reward)
- Support vision-language models (VLMs) and [multi-modal RL](examples/grpo_trainer/run_qwen2_5_vl-7b.sh)
- Reinforcement learning with [PPO](examples/ppo_trainer/), [GRPO](examples/grpo_trainer/), [GSPO](recipe/gspo/), [ReMax](examples/remax_trainer/), [REINFORCE++](https://verl.readthedocs.io/en/latest/examples/config.html#algorithm), [RLOO](examples/rloo_trainer/), [PRIME](recipe/prime/), [DAPO](recipe/dapo/), [DrGRPO](recipe/drgrpo), [KL_Cov & Clip_Cov](recipe/entropy) etc.
- Support model-based reward and function-based reward (verifiable reward) for math, [coding](https://github.com/volcengine/verl/tree/main/recipe/dapo), etc
- Support vision-language models (VLMs) and [multi-modal RL](examples/grpo_trainer/run_qwen2_5_vl-7b.sh) with Qwen2.5-vl, Kimi-VL
- [Multi-turn with tool calling](https://github.com/volcengine/verl/tree/main/examples/sglang_multiturn)
- LLM alignment recipes such as [Self-play preference optimization (SPPO)](https://github.com/volcengine/verl/tree/main/recipe/sppo)
- Flash attention 2, [sequence packing](examples/ppo_trainer/run_qwen2-7b_seq_balance.sh), [sequence parallelism](examples/ppo_trainer/run_deepseek7b_llm_sp2.sh) support via DeepSpeed Ulysses, [LoRA](examples/sft/gsm8k/run_qwen_05_peft.sh), [Liger-kernel](examples/sft/gsm8k/run_qwen_05_sp2_liger.sh).
- Scales up to 70B models and hundreds of GPUs.
- Scales up to 671B models and hundreds of GPUs with [expert parallelism](https://github.com/volcengine/verl/pull/1467)
- Multi-gpu [LoRA RL](https://verl.readthedocs.io/en/latest/advance/ppo_lora.html) support to save memory.
- Experiment tracking with wandb, swanlab, mlflow and tensorboard.
## Upcoming Features
- DeepSeek 671b optimizations with Megatron v0.11
- Multi-turn rollout optimizations
## Upcoming Features and Changes
- Q3 Roadmap https://github.com/volcengine/verl/issues/2388
- DeepSeek 671b optimizations with Megatron https://github.com/volcengine/verl/issues/1033
- Multi-turn rollout and tool-use optimizations https://github.com/volcengine/verl/issues/1882
- [Agent integration](https://github.com/volcengine/verl/tree/main/verl/experimental/agent_loop)
- Async and off-policy architecture https://github.com/volcengine/verl/pull/2231
- List of breaking changes since v0.4 https://github.com/volcengine/verl/discussions/2270
## Getting Started
<a href="https://verl.readthedocs.io/en/latest/index.html"><b>Documentation</b></a>
**Quickstart:**
- [Installation](https://verl.readthedocs.io/en/latest/start/install.html)
- [Quickstart](https://verl.readthedocs.io/en/latest/start/quickstart.html)
- [Programming Guide](https://verl.readthedocs.io/en/latest/hybrid_flow.html)
- [Programming Guide](https://verl.readthedocs.io/en/latest/hybrid_flow.html) & [Tech Talk](https://hcqnc.xetlk.com/sl/3vACOK) (in Chinese)
- [PPO in verl](https://verl.readthedocs.io/en/latest/algo/ppo.html)
- [GRPO in verl](https://verl.readthedocs.io/en/latest/algo/grpo.html)
**Running a PPO example step-by-step:**
- Data and Reward Preparation
- [Prepare Data for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html)
- [Implement Reward Function for Dataset](https://verl.readthedocs.io/en/latest/preparation/reward_function.html)
- Understanding the PPO Example
- [PPO Example Architecture](https://verl.readthedocs.io/en/latest/examples/ppo_code_architecture.html)
- [Config Explanation](https://verl.readthedocs.io/en/latest/examples/config.html)
- [Run GSM8K Example](https://verl.readthedocs.io/en/latest/examples/gsm8k_example.html)
- [Prepare Data for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html)
- [Implement Reward Function for Dataset](https://verl.readthedocs.io/en/latest/preparation/reward_function.html)
- [PPO Example Architecture](https://verl.readthedocs.io/en/latest/examples/ppo_code_architecture.html)
- [Config Explanation](https://verl.readthedocs.io/en/latest/examples/config.html)
**Reproducible algorithm baselines:**
- [PPO, GRPO, ReMax](https://verl.readthedocs.io/en/latest/experiment/ppo.html)
- [RL performance on coding, math](https://verl.readthedocs.io/en/latest/algo/baseline.html)
**For code explanation and advanced usage (extension):**
- PPO Trainer and Workers
- [PPO Ray Trainer](https://verl.readthedocs.io/en/latest/workers/ray_trainer.html)
- [PyTorch FSDP Backend](https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html)
- [Megatron-LM Backend](https://verl.readthedocs.io/en/latest/index.html)
- Advance Usage and Extension
- [Ray API design tutorial](https://verl.readthedocs.io/en/latest/advance/placement.html)
- [Extend to Other RL(HF) algorithms](https://verl.readthedocs.io/en/latest/advance/dpo_extension.html)
- Advanced Usage and Extension
- [Add Models with the FSDP Backend](https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html)
- [Add Models with the Megatron-LM Backend](https://verl.readthedocs.io/en/latest/advance/megatron_extension.html)
- [Multi-turn Rollout Support](https://verl.readthedocs.io/en/latest/sglang_multiturn/multiturn.html)
- [Search Tool Integration](https://verl.readthedocs.io/en/latest/sglang_multiturn/search_tool_example.html)
- [Sandbox Fusion Integration](https://verl.readthedocs.io/en/latest/examples/sandbox_fusion_example.html)
- [Deployment using Separate GPU Resources](https://github.com/volcengine/verl/tree/main/examples/split_placement)
- [Extend to Other RL(HF) algorithms](https://verl.readthedocs.io/en/latest/advance/dpo_extension.html)
- [Ray API design tutorial](https://verl.readthedocs.io/en/latest/advance/placement.html)
**Blogs from the community**
- [使用verl进行GRPO分布式强化学习训练最佳实践](https://www.volcengine.com/docs/6459/1463942)
- [HybridFlow veRL 原文浅析](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/readme.md)
- [最高提升20倍吞吐量豆包大模型团队发布全新 RLHF 框架,现已开源!](https://team.doubao.com/en/blog/%E6%9C%80%E9%AB%98%E6%8F%90%E5%8D%8720%E5%80%8D%E5%90%9E%E5%90%90%E9%87%8F-%E8%B1%86%E5%8C%85%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%9B%A2%E9%98%9F%E5%8F%91%E5%B8%83%E5%85%A8%E6%96%B0-rlhf-%E6%A1%86%E6%9E%B6-%E7%8E%B0%E5%B7%B2%E5%BC%80%E6%BA%90)
- [When Reasoning Models Break Tokenization: The Hidden Complexity of Multiturn Training](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/fast_tokenization/multiturn_tokenization_and_masking.md)
- [verl deployment on AWS SageMaker](https://medium.com/@kaige.yang0110/run-verl-on-sagemaker-using-4x8-l40s-gpus-8e6d5c3c61d3)
- [verl x SGLang Multi-turn Code Walkthrough](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme_EN.md)
- [Optimizing SGLang Memory Usage in verl](https://hebiao064.github.io/rl-memory-management)
- [SGLang, verl, OpenBMB and Tsinghua University: Pioneering End-to-End Multi-Turn RLHF](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/verl-multiturn-rollout-Release.md)
- [Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration](https://rocm.blogs.amd.com/artificial-intelligence/verl-large-scale/README.html)
- [veMLP x verl :玩转强化学习训练](https://mp.weixin.qq.com/s/7nbqxk4knMGd-hQE9ls2tA)
- [使用 verl 进行 GRPO 分布式强化学习训练最佳实践](https://www.volcengine.com/docs/6459/1463942)
- [HybridFlow verl 原文浅析](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/readme.md)
- [最高提升 20 倍吞吐量!豆包大模型团队发布全新 RLHF 框架,现已开源!](https://team.doubao.com/en/blog/%E6%9C%80%E9%AB%98%E6%8F%90%E5%8D%8720%E5%80%8D%E5%90%9E%E5%90%90%E9%87%8F-%E8%B1%86%E5%8C%85%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%9B%A2%E9%98%9F%E5%8F%91%E5%B8%83%E5%85%A8%E6%96%B0-rlhf-%E6%A1%86%E6%9E%B6-%E7%8E%B0%E5%B7%B2%E5%BC%80%E6%BA%90)
## Performance Tuning Guide
Performance is essential for on-policy RL algorithms. We have written a detailed [performance tuning guide](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html) to help you optimize it.
## Use vLLM v0.8
veRL now supports vLLM>=0.8.0 when using FSDP as the training backend. Please refer to [this document](https://github.com/volcengine/verl/blob/main/docs/README_vllm0.8.md) for installation guide and more information.
## Upgrade to vLLM >= v0.8.2
verl now supports vLLM>=0.8.2 when using FSDP as the training backend. Please refer to [this document](https://github.com/volcengine/verl/blob/main/docs/README_vllm0.8.md) for the installation guide and more information. Please avoid vllm 0.7.x, which contains bugs that may lead to OOMs and unexpected errors.
## Use Latest SGLang
SGLang is fully supported with verl, and SGLang RL Group is working extensively on building unique features, including multi-turn agentic RL, VLM RLHF, server-based RL, and partial rollout. Please refer to [this document](https://verl.readthedocs.io/en/latest/workers/sglang_worker.html) for the installation guide and more information.
## Upgrade to FSDP2
verl is fully embracing FSDP2! FSDP2 is recommended by the torch distributed team, provides better throughput and memory usage, and is composable with other features (e.g. torch.compile). To enable FSDP2, simply use verl main and set the following options:
```
actor_rollout_ref.ref.strategy=fsdp2
actor_rollout_ref.actor.strategy=fsdp2
critic.strategy=fsdp2
reward_model.strategy=fsdp2
```
Furthermore, FSDP2 CPU offloading is compatible with gradient accumulation. You can turn it on to save memory with `actor_rollout_ref.actor.fsdp_config.offload_policy=True`. For more details, see https://github.com/volcengine/verl/pull/1026
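As a rough illustration of how the options above might be combined, the sketch below assembles them into a single command line. The override keys and values are copied from this section; the helper names (`build_overrides`, `build_command`) and the entrypoint passed in the usage example are assumptions made for illustration, not part of verl's API.

```python
import shlex

# Override keys and values copied from the FSDP2 option list above.
FSDP2_OVERRIDES = {
    "actor_rollout_ref.ref.strategy": "fsdp2",
    "actor_rollout_ref.actor.strategy": "fsdp2",
    "critic.strategy": "fsdp2",
    "reward_model.strategy": "fsdp2",
}

# Optional CPU offload flag, also taken verbatim from the text above.
CPU_OFFLOAD_OVERRIDE = "actor_rollout_ref.actor.fsdp_config.offload_policy=True"


def build_overrides(cpu_offload: bool = False) -> list[str]:
    """Return the key=value overrides (hypothetical helper, not a verl function)."""
    overrides = [f"{key}={value}" for key, value in FSDP2_OVERRIDES.items()]
    if cpu_offload:
        overrides.append(CPU_OFFLOAD_OVERRIDE)
    return overrides


def build_command(entrypoint: list[str], cpu_offload: bool = False) -> str:
    """Join an entrypoint and the overrides into one shell-safe command string."""
    return shlex.join(entrypoint + build_overrides(cpu_offload))


if __name__ == "__main__":
    # The entrypoint here is an assumption; substitute the launcher you actually use.
    print(build_command(["python3", "-m", "verl.trainer.main_ppo"], cpu_offload=True))
```

Keeping the overrides in one mapping makes it easy to apply the same strategy to all four roles at once, so a run does not end up with mixed FSDP and FSDP2 settings by accident.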
## AMD Support (ROCm Kernel)
verl now supports FSDP as the training engine (Megatron support coming soon) and integrates with both vLLM and SGLang as inference engines. Please refer to [this document](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst) for the installation guide and more information, and [this document](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_vllm_page.rst) for vLLM performance tuning on ROCm.
## Citation and acknowledgement
If you find the project helpful, please cite:
- [HybridFlow: A Flexible and Efficient RLHF Framework](https://arxiv.org/abs/2409.19256v2)
- [A Framework for Training Large Language Models for Code Generation via Proximal Policy Optimization](https://i.cs.hku.hk/~cwu/papers/gmsheng-NL2Code24.pdf)
@ -126,41 +194,74 @@ If you find the project helpful, please cite:
}
```
verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The project is adopted and supported by Anyscale, Bytedance, LMSys.org, Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, University of Hong Kong, and many more.
verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The project is adopted and contributed to by Bytedance, Anyscale, LMSys.org, [Alibaba Qwen team](https://github.com/QwenLM/), Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, University of Hong Kong, ke.com, [All Hands AI](https://www.all-hands.dev/), [ModelBest](http://modelbest.cn/), JD AI Lab, Microsoft Research, [StepFun](https://www.stepfun.com/), Amazon, LinkedIn, Meituan, [Camel-AI](https://www.camel-ai.org/), [OpenManus](https://github.com/OpenManus), Xiaomi, NVIDIA research, [Baichuan](https://www.baichuan-ai.com/home), [RedNote](https://www.xiaohongshu.com/), [SwissAI](https://www.swiss-ai.org/), [Moonshot AI (Kimi)](https://www.moonshot-ai.com/), Baidu, Snowflake, Skywork.ai, JetBrains, [IceSword Lab](https://www.iceswordlab.com), and many more.
## Awesome work using verl
- [TinyZero](https://github.com/Jiayi-Pan/TinyZero): a reproduction of **DeepSeek R1 Zero** recipe for reasoning tasks ![GitHub Repo stars](https://img.shields.io/github/stars/Jiayi-Pan/TinyZero)
- [DAPO](https://dapo-sia.github.io/): the fully open source SOTA RL algorithm that beats DeepSeek-R1-zero-32B ![GitHub Repo stars](https://img.shields.io/github/stars/volcengine/verl)
- [SkyThought](https://github.com/NovaSky-AI/SkyThought): RL training for Sky-T1-7B by NovaSky AI team. ![GitHub Repo stars](https://img.shields.io/github/stars/NovaSky-AI/SkyThought)
- [simpleRL-reason](https://github.com/hkust-nlp/simpleRL-reason): SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild ![GitHub Repo stars](https://img.shields.io/github/stars/hkust-nlp/simpleRL-reason)
- [Easy-R1](https://github.com/hiyouga/EasyR1): **Multi-modal** RL training framework ![GitHub Repo stars](https://img.shields.io/github/stars/hiyouga/EasyR1)
- [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL): LLM Agents RL tuning framework for multiple agent environments. ![GitHub Repo stars](https://img.shields.io/github/stars/OpenManus/OpenManus-RL)
- [deepscaler](https://github.com/agentica-project/deepscaler): iterative context scaling with GRPO ![GitHub Repo stars](https://img.shields.io/github/stars/agentica-project/deepscaler)
- [PRIME](https://github.com/PRIME-RL/PRIME): Process reinforcement through implicit rewards ![GitHub Repo stars](https://img.shields.io/github/stars/PRIME-RL/PRIME)
- [rllm](https://github.com/agentica-project/rllm): async RL training with [verl-pipeline](https://github.com/agentica-project/verl-pipeline) ![GitHub Repo stars](https://img.shields.io/github/stars/agentica-project/rllm)
- [RAGEN](https://github.com/ZihanWang314/ragen): a general-purpose reasoning **agent** training framework ![GitHub Repo stars](https://img.shields.io/github/stars/ZihanWang314/ragen)
- [Logic-RL](https://github.com/Unakar/Logic-RL): a reproduction of DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset. ![GitHub Repo stars](https://img.shields.io/github/stars/Unakar/Logic-RL)
- [Search-R1](https://github.com/PeterGriffinJin/Search-R1): RL with reasoning and **searching (tool-call)** interleaved LLMs ![GitHub Repo stars](https://img.shields.io/github/stars/PeterGriffinJin/Search-R1)
- [ReSearch](https://github.com/Agent-RL/ReSearch): Learning to **Re**ason with **Search** for LLMs via Reinforcement Learning ![GitHub Repo stars](https://img.shields.io/github/stars/Agent-RL/ReSearch)
- [DeepRetrieval](https://github.com/pat-jj/DeepRetrieval): Hacking **Real Search Engines** and **retrievers** with LLMs via RL for **information retrieval** ![GitHub Repo stars](https://img.shields.io/github/stars/pat-jj/DeepRetrieval)
- [cognitive-behaviors](https://github.com/kanishkg/cognitive-behaviors): Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs ![GitHub Repo stars](https://img.shields.io/github/stars/kanishkg/cognitive-behaviors)
- [MetaSpatial](https://github.com/PzySeere/MetaSpatial): Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse ![GitHub Repo stars](https://img.shields.io/github/stars/PzySeere/MetaSpatial)
- [DeepEnlighten](https://github.com/DolbyUUU/DeepEnlighten): Reproduce R1 with **social reasoning** tasks and analyze key findings ![GitHub Repo stars](https://img.shields.io/github/stars/DolbyUUU/DeepEnlighten)
- [Skywork-OR1](https://github.com/SkyworkAI/Skywork-OR1): Skywork open reasoner series ![GitHub Repo stars](https://img.shields.io/github/stars/SkyworkAI/Skywork-OR1)
- [ToRL](https://github.com/GAIR-NLP/ToRL): Scaling tool-integrated RL ![GitHub Repo stars](https://img.shields.io/github/stars/GAIR-NLP/ToRL)
- [Absolute Zero Reasoner](https://github.com/LeapLabTHU/Absolute-Zero-Reasoner): [A self-play reasoning framework that uses no human-curated data](https://arxiv.org/abs/2505.03335) ![GitHub Repo stars](https://img.shields.io/github/stars/LeapLabTHU/Absolute-Zero-Reasoner)
- [verl-agent](https://github.com/langfengQ/verl-agent): A scalable training framework for **long-horizon LLM/VLM agents**, along with a new algorithm **GiGPO** ![GitHub Repo stars](https://img.shields.io/github/stars/langfengQ/verl-agent)
- [RL-Factory](https://github.com/Simple-Efficient/RL-Factory): An easy and efficient RL post-training framework for Agentic Learning ![GitHub Repo stars](https://img.shields.io/github/stars/Simple-Efficient/RL-Factory)
- [ReTool](https://retool-rl.github.io/): ReTool: reinforcement learning for strategic tool use in LLMs. Code release is in progress...
- [verl-tool](https://github.com/TIGER-AI-Lab/verl-tool): A unified and easy-to-extend tool-agent training framework based on verl ![GitHub Repo stars](https://img.shields.io/github/stars/TIGER-AI-Lab/verl-tool)
- [MemAgent](https://github.com/BytedTsinghua-SIA/MemAgent): MemAgent: Reshaping Long-Context LLM with Multi-Conv RL based Memory Agent ![GitHub Repo stars](https://img.shields.io/github/stars/BytedTsinghua-SIA/MemAgent)
- [POLARIS](https://github.com/ChenxinAn-fdu/POLARIS): A Post-training recipe for scaling RL on Advanced Reasoning models ![GitHub Repo stars](https://img.shields.io/github/stars/ChenxinAn-fdu/POLARIS)
- [GUI-R1](https://github.com/ritzz-ai/GUI-R1): **GUI-R1**: A Generalist R1-style Vision-Language Action Model For **GUI Agents** ![GitHub Repo stars](https://img.shields.io/github/stars/ritzz-ai/GUI-R1)
- [DeepRetrieval](https://github.com/pat-jj/DeepRetrieval): RL Training of **Search Agent** with **Search/Retrieval Outcome** ![GitHub Repo stars](https://img.shields.io/github/stars/pat-jj/DeepRetrieval)
- [Code-R1](https://github.com/ganler/code-r1): Reproducing R1 for **Code** with Reliable Rewards ![GitHub Repo stars](https://img.shields.io/github/stars/ganler/code-r1)
- [self-rewarding-reasoning-LLM](https://arxiv.org/pdf/2502.19613): self-rewarding and correction with **generative reward models** ![GitHub Repo stars](https://img.shields.io/github/stars/RLHFlow/Self-rewarding-reasoning-LLM)
- [critic-rl](https://github.com/HKUNLP/critic-rl): LLM critics for code generation ![GitHub Repo stars](https://img.shields.io/github/stars/HKUNLP/critic-rl)
- [DQO](https://arxiv.org/abs/2410.09302): Enhancing multi-Step reasoning abilities of language models through direct Q-function optimization
- [FIRE](https://arxiv.org/abs/2410.21236): Flaming-hot initiation with regular execution sampling for large language models
- [DeepResearcher](https://github.com/GAIR-NLP/DeepResearcher): Scaling deep research via reinforcement learning in real-world environments ![GitHub Repo stars](https://img.shields.io/github/stars/GAIR-NLP/DeepResearcher)
- [VAGEN](https://github.com/RAGEN-AI/VAGEN): Training VLM agents with multi-turn reinforcement learning ![GitHub Repo stars](https://img.shields.io/github/stars/RAGEN-AI/VAGEN)
- [RM-R1](https://arxiv.org/abs/2505.02387): RL training of reasoning reward models ![GitHub Repo stars](https://img.shields.io/github/stars/RM-R1-UIUC/RM-R1)
- [LUFFY](https://arxiv.org/pdf/2504.14945): Learning to Reason under Off-Policy Guidance ![GitHub Repo stars](https://img.shields.io/github/stars/ElliottYan/LUFFY)
- [DeepMath](https://github.com/zwhe99/DeepMath): DeepMath-103K data and series models for math reasoning ![GitHub Repo stars](https://img.shields.io/github/stars/zwhe99/DeepMath)
- [PACS](https://github.com/ritzz-ai/PACS): Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR ![GitHub Repo stars](https://img.shields.io/github/stars/ritzz-ai/PACS)
- [Entropy Mechanism of RL](https://github.com/PRIME-RL/Entropy-Mechanism-of-RL): The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning ![GitHub Repo stars](https://img.shields.io/github/stars/PRIME-RL/Entropy-Mechanism-of-RL)
- [LLaSA-TTS-GRPO](https://github.com/channel-io/ch-tts-llasa-rl-grpo): TTS fine-tuning with GRPO optimization based on LLASA models ![GitHub Repo stars](https://img.shields.io/github/stars/channel-io/ch-tts-llasa-rl-grpo)
- [PF-PPO](https://arxiv.org/abs/2409.06957): Policy Filtration for PPO based on the reliability of reward signals for more efficient and robust RLHF.
- [RACRO](https://github.com/gyhdog99/RACRO2): Build multi-modal reasoning models via decoupling it into query-conditioned captioning and text-only reasoning ![GitHub Repo stars](https://img.shields.io/github/stars/gyhdog99/RACRO2)
- [Agent Lightning](https://github.com/microsoft/agent-lightning): A flexible and extensible framework that enables seamless agent optimization for any existing agent framework. ![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/agent-lightning)
- [VTool-R1](https://github.com/VTOOL-R1/vtool-r1): VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use. ![GitHub Repo stars](https://img.shields.io/github/stars/VTOOL-R1/vtool-r1)
- [Kimina-Prover-RL](https://github.com/project-numina/kimina-prover-rl/tree/main/recipe/kimina_prover_rl): Training pipeline for formal theorem proving, based on a paradigm inspired by DeepSeek-R1.
- [RL-PLUS](https://github.com/YihongDong/RL-PLUS): Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization.
- [rStar2-Agent](https://github.com/microsoft/rStar): Using reinforcement learning with multi-step tool-calling for math tasks, rStar2-Agent-14B reaches frontier-level math reasoning in just 510 RL training steps ![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/rStar)
- [Vision-SR1](https://github.com/zli12321/Vision-SR1): Self-Rewarding Vision-Language Model via Reasoning Decomposition ![GitHub Repo stars](https://img.shields.io/github/stars/zli12321/Vision-SR1)
- [SimpleVLA-RL](https://github.com/PRIME-RL/SimpleVLA-RL): SimpleVLA-RL: A Simple yet Effective Vision-Language Action Model for Reinforcement Learning ![GitHub Repo stars](https://img.shields.io/github/stars/PRIME-RL/SimpleVLA-RL)
- [Table-R1](https://github.com/Table-R1/Table-R1): Table-R1: Inference-Time Scaling for Table Reasoning ![GitHub Repo stars](https://img.shields.io/github/stars/Table-R1/Table-R1)
- [Revisual-R1](https://github.com/CSfufu/Revisual-R1): Revisual-R1: Advancing Multimodal Reasoning From Optimized Cold Start to Staged Reinforcement Learning ![GitHub Repo stars](https://img.shields.io/github/stars/CSfufu/Revisual-R1)
- [ARES](https://github.com/shawn0728/ARES): ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping ![GitHub Repo stars](https://img.shields.io/github/stars/shawn0728/ARES)
- [Meta-Bandit-LLM](https://github.com/sanxing-chen/meta-bandit-llm): Meta-Bandit-LLM: Long-horizon multiturn interactive training for meta-bandit agents ![GitHub Repo stars](https://img.shields.io/github/stars/sanxing-chen/meta-bandit-llm)
and many more awesome works listed in [recipe](recipe/README.md).
## Contribution Guide
Contributions from the community are welcome! Please check out our [project roadmap](https://github.com/volcengine/verl/issues/22) and [release plan](https://github.com/volcengine/verl/issues/354) to see where you can contribute.
### Code formatting
We use yapf (Google style) to enforce strict code formatting when reviewing PRs. To reformat your code locally, make sure you have installed the **latest** version of `yapf`:
```bash
pip3 install yapf --upgrade
```
Then, make sure you are at the top level of the verl repo and run:
```bash
bash scripts/format.sh
```
We are HIRING! Send us an [email](mailto:haibin.lin@bytedance.com) if you are interested in internship/FTE opportunities in MLSys/LLM reasoning/multimodal alignment.
See [contributions guide](CONTRIBUTING.md)
## About [ByteDance Seed Team](https://team.doubao.com/)
Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society. You can get to know ByteDance Seed better through the following channels 👇
<div>
<a href="https://team.doubao.com/">
<img src="https://img.shields.io/badge/Website-%231e37ff?style=for-the-badge&logo=bytedance&logoColor=white"></a>
<a href="https://github.com/user-attachments/assets/469535a8-42f2-4797-acdf-4f7a1d4a0c3e">
<img src="https://img.shields.io/badge/WeChat-07C160?style=for-the-badge&logo=wechat&logoColor=white"></a>
<a href="https://www.xiaohongshu.com/user/profile/668e7e15000000000303157d?xsec_token=ABl2-aqekpytY6A8TuxjrwnZskU-6BsMRE_ufQQaSAvjc%3D&xsec_source=pc_search">
<img src="https://img.shields.io/badge/Xiaohongshu-%23FF2442?style=for-the-badge&logo=xiaohongshu&logoColor=white"></a>
<a href="https://www.zhihu.com/org/dou-bao-da-mo-xing-tuan-dui/">
<img src="https://img.shields.io/badge/zhihu-%230084FF?style=for-the-badge&logo=zhihu&logoColor=white"></a>
</div>
---
We are HIRING! Send us an [email](mailto:the.verl.project@gmail.com) if you are interested in internship/FTE opportunities in RL for agents.

docker/Apptainerfile.rocm

@ -0,0 +1,57 @@
Bootstrap: docker
# Support - Training: fsdp; Inference: vllm
# FROM: rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
# Support - Training: fsdp; Inference: vllm, sglang
FROM lmsysorg/sglang:v0.4.5-rocm630
%environment
export PYTORCH_ROCM_ARCH="gfx90a;gfx942"
export HIPCC_COMPILE_FLAGS_APPEND="--amdgpu-target=gfx90a;gfx942 -D__HIP_PLATFORM_AMD__"
export CFLAGS="-D__HIP_PLATFORM_AMD__"
export CXXFLAGS="-D__HIP_PLATFORM_AMD__"
%post
# Create source directory
mkdir -p /opt/src
# Uninstall and reinstall vllm
pip uninstall -y vllm
cd /opt/src
git clone -b v0.6.3 https://github.com/vllm-project/vllm.git
cd vllm
MAX_JOBS=$(nproc) python3 setup.py install
cd /opt
rm -rf /opt/src/vllm
# Install dependencies
pip install "tensordict<0.6" --no-deps
pip install accelerate \
codetiming \
datasets \
dill \
hydra-core \
liger-kernel \
numpy \
pandas \
peft \
"pyarrow>=15.0.0" \
pylatexenc \
"ray[data,train,tune,serve]" \
torchdata \
transformers \
wandb \
orjson \
pybind11
# Clone and install verl from GitHub
cd /opt
git clone https://github.com/volcengine/verl.git
cd verl
# Uncomment to use a specific version
# git checkout v0.3.0.post0
pip install -e . --no-deps
# Install torch_memory_saver
pip install git+https://github.com/ExtremeViscent/torch_memory_saver.git --no-deps


@ -0,0 +1,55 @@
# Base image with AWS EFA support
# Build framework images on top of this one
FROM verlai/verl:app-verl0.5-sglang0.4.6.post5-mcore0.12.2
# For AWS instances with an EFA network interface (SageMaker AI Pod),
# install the EFA driver:
######## AWS EFA ############
ENV NCCL_VERSION=2.25.1-1
ENV DEBIAN_FRONTEND=noninteractive
ENV EFA_INSTALLER_VERSION=1.40.0
ENV AWS_OFI_NCCL_VERSION=1.14.2
ENV FI_EFA_SET_CUDA_SYNC_MEMOPS=0
ENV FI_PROVIDER=efa
RUN apt update && apt install -y linux-image-generic libhwloc-dev
RUN cd /tmp && \
curl -O https://efa-installer.amazonaws.com/aws-efa-installer-${EFA_INSTALLER_VERSION}.tar.gz && \
tar -xf aws-efa-installer-${EFA_INSTALLER_VERSION}.tar.gz && \
cd aws-efa-installer && \
./efa_installer.sh -y -g --skip-kmod --skip-limit-conf --no-verify && \
ldconfig && \
rm -rf /tmp/aws-efa-installer /var/lib/apt/lists/*
# NCCL EFA Plugin
RUN cd /tmp && \
curl -LO https://github.com/aws/aws-ofi-nccl/archive/refs/tags/v${AWS_OFI_NCCL_VERSION}.tar.gz && \
tar -xzf /tmp/v${AWS_OFI_NCCL_VERSION}.tar.gz && \
rm /tmp/v${AWS_OFI_NCCL_VERSION}.tar.gz && \
mv aws-ofi-nccl-${AWS_OFI_NCCL_VERSION} aws-ofi-nccl && \
cd /tmp/aws-ofi-nccl && \
./autogen.sh && \
./configure --prefix=/opt/amazon/efa \
--with-libfabric=/opt/amazon/efa \
--with-cuda=/usr/local/cuda \
--enable-platform-aws \
--with-mpi=/opt/amazon/openmpi && \
make -j$(nproc) install && \
rm -rf /tmp/aws-ofi-nccl
# NCCL
RUN echo "/usr/local/lib" >> /etc/ld.so.conf.d/local.conf && \
echo "/opt/amazon/openmpi/lib" >> /etc/ld.so.conf.d/efa.conf && \
ldconfig
ENV OMPI_MCA_pml=^cm,ucx \
OMPI_MCA_btl=tcp,self \
OMPI_MCA_btl_tcp_if_exclude=lo,docker0,veth_def_agent \
OPAL_PREFIX=/opt/amazon/openmpi \
NCCL_SOCKET_IFNAME=^docker,lo,veth_def_agent \
FI_EFA_USE_HUGE_PAGE=0
# docker build -t verl:awsefa --label "commit=$(git rev-parse --short HEAD)" .
# on aws:
# docker run --ipc=host --privileged --name verldev --gpus all --network=host --shm-size=1800gb -itd verl:awsefa
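The two download steps in the Dockerfile above build their URLs from the pinned `ENV` versions. As a quick check outside Docker, a minimal sketch like the following expands the same URL templates so the archive names can be inspected before a build. The function names `efa_installer_url` and `ofi_nccl_url` are hypothetical helpers introduced here for illustration; they are not part of verl or the AWS tooling.

```bash
set -eu

# Versions copied from the ENV lines of the Dockerfile above.
EFA_INSTALLER_VERSION="1.40.0"
AWS_OFI_NCCL_VERSION="1.14.2"

# Hypothetical helper: expands the EFA installer URL template used by the curl step.
efa_installer_url() {
  printf 'https://efa-installer.amazonaws.com/aws-efa-installer-%s.tar.gz\n' "$1"
}

# Hypothetical helper: expands the aws-ofi-nccl tag archive URL template.
ofi_nccl_url() {
  printf 'https://github.com/aws/aws-ofi-nccl/archive/refs/tags/v%s.tar.gz\n' "$1"
}

efa_installer_url "$EFA_INSTALLER_VERSION"
ofi_nccl_url "$AWS_OFI_NCCL_VERSION"
```

Changing either version variable and rerunning the script shows which archive a bumped `ENV` value would fetch, without starting an image build.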


@ -1,9 +0,0 @@
FROM verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
RUN pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
RUN cd /opt/nvidia && git clone --single-branch --branch core_r0.11.0 https://github.com/NVIDIA/Megatron-LM.git Megatron-LM
# only config pip index with https://pypi.tuna.tsinghua.edu.cn/simple if needed
# unset for now
RUN cd /opt/nvidia/Megatron-LM && pip3 install --no-deps -e .


@ -3,12 +3,12 @@ FROM nvcr.io/nvidia/pytorch:24.05-py3
# uninstall nv-pytorch fork
RUN pip3 uninstall pytorch-quantization \
pytorch-triton \
torch \
torch-tensorrt \
torchvision \
xgboost transformer_engine flash_attn \
apex megatron-core -y
pytorch-triton \
torch \
torch-tensorrt \
torchvision \
xgboost transformer_engine flash_attn \
apex megatron-core -y
RUN pip3 install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
@ -35,10 +35,11 @@ RUN pip3 install --no-cache-dir \
'tensordict<0.6' \
'transformers' \
'vllm==0.6.3.post1' \
'wandb'
'wandb' \
'tensorboard'
# full dependencies
RUN pip3 install pytest yapf py-spy pyext liger-kernel
RUN pip3 install pytest pre-commit py-spy pyext liger-kernel
# =============== Megatron dependencies (optional) =================
# install Transformer Engine, which requires FA 2.5.8. Do it in a separate step for docker cache


@ -1,17 +1,13 @@
# Start from the NVIDIA official image (ubuntu-22.04 + python-3.10)
# Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html
FROM nvcr.io/nvidia/pytorch:24.08-py3
# uninstall nv-pytorch fork
RUN pip3 uninstall -y pytorch-quantization \
pytorch-triton torch torch-tensorrt torchvision \
xgboost transformer_engine flash_attn apex megatron-core
# Define environments
ENV MAX_JOBS=32
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Define installation arguments
@ -42,21 +38,34 @@ RUN pip config set global.index-url "${PIP_INDEX}" && \
pip config set global.extra-index-url "${PIP_INDEX}" && \
python -m pip install --upgrade pip
# Install torch-2.6.0 + vllm-0.8.2
RUN pip install --no-cache-dir vllm==0.8.2 torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 tensordict torchdata \
transformers>=4.49.0 accelerate datasets peft hf-transfer \
ray[default] codetiming hydra-core pandas pyarrow>=15.0.0 pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler \
pytest yapf py-spy pyext pre-commit ruff
# Uninstall nv-pytorch fork
RUN pip uninstall -y torch torchvision torchaudio \
pytorch-quantization pytorch-triton torch-tensorrt \
xgboost transformer_engine flash_attn apex megatron-core grpcio
# Install flash_attn-2.7.4.post1
RUN pip uninstall -y transformer-engine flash-attn && \
wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \
# Install torch-2.6.0+cu124 + vllm-0.8.3
# torch-2.6.0+cu124: cxx11abi=False
# torch-2.6.0+cu126: cxx11abi=True
# see https://github.com/flashinfer-ai/flashinfer/issues/911
RUN pip install --no-cache-dir "vllm==0.8.3" "torch==2.6.0" "torchvision==0.21.0" "torchaudio==2.6.0" "tensordict==0.6.2" torchdata \
"transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=15.0.0" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler \
pytest py-spy pyext pre-commit ruff tensorboard
# Install flash-attn-2.7.4.post1 (cxx11abi=False)
RUN wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \
pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
# Fix cv2
# Install flashinfer-0.2.2.post1+cu124 (cxx11abi=False)
# vllm-0.8.3 does not support flashinfer>=0.2.3
# see https://github.com/vllm-project/vllm/pull/15777
RUN wget -nv https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2.post1/flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \
pip install --no-cache-dir flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl
# Fix packages
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --no-cache-dir nvidia-ml-py>=12.560.30 opencv-python-headless==4.8.0.74 fastapi==0.115.6 && \
pip install --no-cache-dir --upgrade optree>=0.13.0
pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
# Install verl
RUN pip install --no-cache-dir verl[vllm] -U
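Several of the changed lines in the Dockerfile above add quotes around version specifiers such as `"transformers[hf_xet]>=4.51.0"`. In a shell-form `RUN`, an unquoted `>=` is parsed as an output redirection, so the version constraint never reaches pip. The sketch below reproduces the effect in a scratch directory, with `printf` standing in for `pip install` and `pkg` as a placeholder package name (both are assumptions made for the demonstration).

```bash
set -eu

# Work in a throwaway directory so the stray file is easy to see.
workdir="$(mktemp -d)"
cd "$workdir"

# Unquoted: the shell runs `printf '%s\n' pkg` and redirects stdout
# into a file named "=1.0"; the command never sees the constraint.
printf '%s\n' pkg>=1.0

# Quoted: the whole specifier is passed through as a single argument.
quoted="$(printf '%s\n' "pkg>=1.0")"

echo "file created by the unquoted form: $(ls)"
echo "argument seen by the quoted form: $quoted"
```

The same reasoning applies to any specifier containing `>` or `<` on a `RUN pip install` line, which is why quoting each requirement is the safer habit.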


@ -27,7 +27,7 @@ RUN apt-get update && \
RUN pip install --no-cache-dir vllm==0.8.2 torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 tensordict torchdata==0.11.0 \
transformers>=4.49.0 accelerate datasets peft hf-transfer \
ray[default] codetiming hydra-core pandas pyarrow>=15.0.0 pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler \
pytest yapf py-spy pyext pre-commit ruff
pytest pre-commit py-spy pyext ruff tensorboard
# Install flash_attn-2.7.4.post1
RUN pip uninstall -y transformer-engine flash-attn && \


@ -1,30 +1,296 @@
# Build the docker in the repo dir:
# docker build -f docker/Dockerfile.rocm -t verl-rocm:03.04.2015 .
# docker images # you can find your built docker
# FROM "compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-rel-6.4:94_ubuntu22.04_py3.10_pytorch_release-2.7_575e247"
# FROM "rlfoundation.azurecr.io/rocm6.3.4:vllm-0.8.5-numa-patch-ubuntu-22.04"
FROM "rlsys/rocm-6.3.4-patch:rocm6.3.4-numa-patch_ubuntu-22.04"
SHELL ["/bin/bash", "-ceuxo", "pipefail"]
ENV MAX_JOBS=512
ENV PATH="/usr/local/python3.12/bin:$PATH"
RUN ln -sf /usr/bin/python3.12 /usr/bin/python && \
ln -sf /usr/bin/pip3.12 /usr/bin/pip
############################################
############################################
RUN apt-get update
RUN apt-get install -y pkg-config liblzma-dev
############################################
############################################
FROM rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
###########################################
##########Install TransformerEngine########
###########################################
WORKDIR /workspace/
# transformer-engine install
# https://github.com/ROCm/TransformerEngine
# Set working directory
# WORKDIR $PWD/app
RUN rm -rf TransformerEngine
RUN git clone --recursive https://github.com/ROCm/TransformerEngine.git
WORKDIR /workspace/TransformerEngine
RUN git checkout 236178e5
# git checkout bb061ade
# git checkout 864405c
ENV NVTE_FRAMEWORK=pytorch
ENV NVTE_ROCM_ARCH=gfx942
ENV NVTE_USE_HIPBLASLT=1
ENV NVTE_USE_ROCM=1
# export CMAKE_PREFIX_PATH="/opt/rocm:/opt/rocm/hip:/usr/local:/usr:${CMAKE_PREFIX_PATH:-}"
ENV CMAKE_PREFIX_PATH="/opt/rocm:/opt/rocm/hip:/usr/local:/usr"
# ENV NVTE_BUILD_MAX_JOBS=$(MAX_JOBS)
RUN MAX_JOBS=${MAX_JOBS} pip install . -vvv
WORKDIR /workspace/
###########################################
###########################################
###########################################
####################################################################################
################Install vllm - sglang requires vllm 0.6.7 dependency################
####################################################################################
#### Require vllm 0.6.7 - checkout 113274a0
WORKDIR /workspace/
RUN rm -rf vllm
RUN pip uninstall -y vllm
# Refer to here (down-grade vllm to 0.6.3): https://docs.vllm.ai/en/v0.6.3/getting_started/amd-installation.html
RUN git clone https://github.com/ROCm/vllm.git
# git clone https://github.com/vllm-project/vllm.git
WORKDIR /workspace/vllm
RUN git checkout 113274a0
ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"
#ENV MAX_JOBS=512
ENV MAX_JOBS=${MAX_JOBS}
RUN pip install "boto3>=1.26.0"
RUN pip install setuptools_scm
# Installs the package into site-packages, so the cloned repo can be deleted afterwards
RUN python3 setup.py install
WORKDIR /workspace/
####################################################################################
####################################################################################
####################################################################################
###########################################
############For hack docker################
###########################################
RUN pip install setuptools==75.8.0
###########################################
###########################################
###########################################
###########################################
############build sglang###################
###########################################
# Set environment variables
ENV BASE_DIR=/sgl-workspace
ENV BUILD_TYPE=all
ENV SGL_REPO=https://github.com/sgl-project/sglang
ENV SGL_BRANCH=v0.4.6.post5
ENV TRITON_REPO=https://github.com/ROCm/triton.git
ENV TRITON_COMMIT=improve_fa_decode_3.0.0
ENV AITER_REPO=https://github.com/ROCm/aiter.git
ENV AITER_COMMIT=v0.1.2
# v0.1.2 version - commit id: 9d11f47
# ENV AITER_COMMIT=9d11f47
ENV HIP_FORCE_DEV_KERNARG=1
ENV HSA_NO_SCRATCH_RECLAIM=1
ENV SGLANG_SET_CPU_AFFINITY=1
ENV SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1
ENV NCCL_MIN_NCHANNELS=112
ENV MOE_PADDING=1
ENV VLLM_FP8_PADDING=1
ENV VLLM_FP8_ACT_PADDING=1
ENV VLLM_FP8_WEIGHT_PADDING=1
ENV VLLM_FP8_REDUCE_CONV=1
ENV TORCHINDUCTOR_MAX_AUTOTUNE=1
ENV TORCHINDUCTOR_MAX_AUTOTUNE_POINTWISE=1
ENV HIPCC_COMPILE_FLAGS_APPEND="--offload-arch=gfx942"
ENV AMDGPU_TARGETS=gfx942
ENV ROCM_ARCH=gfx942
ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"
# Install vllm
RUN pip uninstall -y vllm && \
rm -rf vllm && \
git clone -b v0.6.3 https://github.com/vllm-project/vllm.git && \
cd vllm && \
MAX_JOBS=$(nproc) python3 setup.py install && \
cd .. && \
rm -rf vllm
# Switch to working directory
WORKDIR /sgl-workspace
# Copy the entire project directory
COPY . .
# Clean and create directory
RUN rm -rf /sgl-workspace && mkdir -p /sgl-workspace
# Install dependencies
RUN pip install "tensordict<0.6" --no-deps && \
# Clone and build sglang
RUN git clone ${SGL_REPO} \
&& cd sglang \
&& git checkout ${SGL_BRANCH} || echo "Using default branch" \
&& cd sgl-kernel \
&& rm -f pyproject.toml \
&& mv pyproject_rocm.toml pyproject.toml \
&& python setup_rocm.py install \
&& cd .. \
&& if [ "$BUILD_TYPE" = "srt" ]; then \
python -m pip --no-cache-dir install -e "python[srt_hip]"; \
else \
python -m pip --no-cache-dir install -e "python[all_hip]"; \
fi \
&& cd /sgl-workspace \
&& cp -r /sgl-workspace/sglang /sglang \
&& python -m pip cache purge
# Install common Python packages
RUN pip install IPython orjson python-multipart torchao pybind11
# Rebuild Triton
RUN pip uninstall -y triton || true \
&& git clone ${TRITON_REPO} \
&& cd triton \
&& git checkout ${TRITON_COMMIT} \
&& cd python \
&& python3 setup.py install \
&& cd /sgl-workspace
# ENV HIPCC_COMPILE_FLAGS_APPEND="--offload-arch=gfx942 --amdgpu-lower-module-lds-strategy=1"
# ENV HIPCC_COMPILE_FLAGS_APPEND="--offload-arch=gfx942"
# Build aiter
#version: Commit 9d11f47
# && git checkout ${AITER_COMMIT} \
RUN pip uninstall -y aiter || true
RUN git clone ${AITER_REPO} \
&& cd aiter \
&& git checkout ${AITER_COMMIT} \
&& git submodule sync \
&& git submodule update --init --recursive \
&& PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python3 setup.py install \
&& cd /sgl-workspace
# && PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python3 setup.py develop \
# && PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python3 setup.py develop \
# Copy MI300X config
RUN find /sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/ \
/sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/configs/ \
-type f -name '*MI300X*' | \
xargs -I {} sh -c 'vf_config=$(echo "$1" | sed "s/MI300X/MI300X_VF/"); cp "$1" "$vf_config"' -- {}
# Environment setup complete.
RUN echo "Environment setup complete."
WORKDIR /workspace/
###########################################
###########################################
###########################################
###########################################
###############vllm v0.8.5#################
###########################################
# ENV GITHUB_USERNAME=yushengsu-thu
# ENV GITHUB_MAIL=yushengsu@gmail.com
# RUN git config --global user.name "${GITHUB_USERNAME}" \
# && git config --global user.email "${GITHUB_MAIL}"
WORKDIR /workspace/
ENV VLLM_TARGET_DEVICE=rocm
ENV ROCM_PATH=/opt/rocm
ENV SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5.dev
# Find the repo path in: DockerFile/Dockerfile.rocm_yang
# RUN git clone https://github.com/RLFoundation/vllm-patch.git
RUN pip uninstall -y vllm || true
RUN rm -rf vllm-patch
RUN git clone https://github.com/RLFoundation/vllm-patch.git \
&& cd vllm-patch \
&& git checkout v0.8.5-sleep-numa \
&& rm -rf build/ dist/ *.egg-info \
&& ln -sf /opt/rocm/lib/libamdhip64.so /usr/lib/libamdhip64.so \
&& SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5.dev PYTORCH_ROCM_ARCH="gfx90a;gfx942" MAX_JOBS=${MAX_JOBS} python3 setup.py install
# RUN SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5.dev PYTORCH_ROCM_ARCH="gfx90a;gfx942" MAX_JOBS=${MAX_JOBS} python3 setup.py develop
WORKDIR /workspace/
###########################################
###########################################
###########################################
#########################################
#### Install megatron-core###############
#########################################
RUN pip uninstall -y megatron-core && \
git clone https://github.com/yushengsu-thu/Megatron-LM-amd_version.git && \
cd Megatron-LM-amd_version && \
pip install -vvv -e . && \
cd /workspace/
#########################################
#########################################
#########################################
#######################################
################apex###################
#######################################
WORKDIR /workspace/
RUN pip uninstall -y apex && \
git clone https://github.com/ROCm/apex.git && \
cd apex && \
python setup.py install && \
cd /workspace/
#######################################
#######################################
#######################################
################################################################################
###########################Add torch_memory_saver###############################
################################################################################
# Set environment variables
ENV HIPCC_COMPILE_FLAGS_APPEND="--amdgpu-target=gfx90a;gfx942 -D__HIP_PLATFORM_AMD__"
ENV CFLAGS="-D__HIP_PLATFORM_AMD__"
ENV CXXFLAGS="-D__HIP_PLATFORM_AMD__"
RUN pip install "git+https://github.com/YangWang92/torch_memory_saver_numa.git@numa"
################################################################################
################################################################################
################################################################################
########################################
######Install ray#######################
########################################
# need to add this patch: https://github.com/ray-project/ray/pull/53531/files
RUN pip uninstall ray -y
RUN pip install "ray[data,train,tune,serve]>=2.47.0"
########################################
########################################
########################################
##########################################
#######Install other dependencies#########
##########################################
RUN pip install "tensordict==0.6.2" --no-deps && \
pip install accelerate \
codetiming \
datasets \
@ -36,10 +302,21 @@ RUN pip install "tensordict<0.6" --no-deps && \
peft \
"pyarrow>=15.0.0" \
pylatexenc \
"ray[data,train,tune,serve]" \
torchdata \
transformers \
wandb \
orjson \
pybind11 && \
pip install -e . --no-deps
pybind11
WORKDIR /workspace/
RUN git clone https://github.com/volcengine/verl.git && \
cd verl && \
pip install -e .
##########################################
##########################################
##########################################
WORKDIR /workspace/
CMD ["/usr/bin/bash"]
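The "Copy MI300X config" step in the Dockerfile above pipes `find` into `xargs` and uses `sed` to derive an `MI300X_VF` filename for each tuned config it finds. A minimal sketch of the same pipeline, run against a scratch directory, shows what it produces. The config filename used here is a made-up placeholder; the real names come from the sglang repository.

```bash
set -eu

# Scratch directory standing in for the sglang config directories.
cfg_dir="$(mktemp -d)"

# Placeholder config file (hypothetical name, empty JSON body).
echo '{}' > "$cfg_dir/device_name=AMD_Instinct_MI300X.json"

# Same find | xargs | sed pattern as the Dockerfile step:
# for each matching file, copy it to a sibling whose name has MI300X_VF.
find "$cfg_dir" -type f -name '*MI300X*' | \
  xargs -I {} sh -c 'vf_config=$(echo "$1" | sed "s/MI300X/MI300X_VF/"); cp "$1" "$vf_config"' -- {}

ls "$cfg_dir"
```

Because `sed` replaces only the first `MI300X` on each line, the pattern assumes the string does not also appear earlier in the directory path; that holds for the paths used in the Dockerfile.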

docker/Dockerfile.rocm7

@ -0,0 +1,141 @@
# default base image
ARG REMOTE_VLLM="1"
ARG COMMON_WORKDIR=/app
ARG BASE_IMAGE=rocm/vllm-dev:base_rocm7_0930_rc1_20250916_tuned_20250917
FROM ${BASE_IMAGE} AS base
ARG ARG_PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH=${ARG_PYTORCH_ROCM_ARCH:-${PYTORCH_ROCM_ARCH}}
# Install some basic utilities
RUN apt-get update -q -y && apt-get install -q -y \
sqlite3 libsqlite3-dev libfmt-dev libmsgpack-dev libsuitesparse-dev \
apt-transport-https ca-certificates wget curl
# Remove sccache
RUN python3 -m pip install --upgrade pip
RUN apt-get purge -y sccache; python3 -m pip uninstall -y sccache; rm -f "$(which sccache)"
ARG COMMON_WORKDIR
WORKDIR ${COMMON_WORKDIR}
# -----------------------
# vLLM fetch stages
FROM base AS fetch_vllm_0
ONBUILD COPY ./ vllm/
FROM base AS fetch_vllm_1
#ARG VLLM_REPO="https://github.com/ROCm/vllm.git"
#ARG VLLM_BRANCH="main"
ARG VLLM_REPO=https://github.com/HollowMan6/vllm.git
ARG VLLM_BRANCH="sleep_amd"
ONBUILD RUN git clone ${VLLM_REPO} \
&& cd vllm \
&& git checkout ${VLLM_BRANCH}
FROM fetch_vllm_${REMOTE_VLLM} AS fetch_vllm
# -----------------------
# vLLM build stages
FROM fetch_vllm AS build_vllm
# Build vLLM
RUN cd vllm \
&& python3 -m pip install -r requirements/rocm.txt \
&& python3 setup.py clean --all \
&& ln -sf /opt/rocm/lib/libamdhip64.so /usr/lib/libamdhip64.so \
&& VLLM_TARGET_DEVICE=rocm ROCM_PATH=/opt/rocm/ VLLM_GPU_LANG=HIP SETUPTOOLS_SCM_PRETEND_VERSION=0.8.4.dev python3 setup.py bdist_wheel --dist-dir=dist
#&& python3 setup.py bdist_wheel --dist-dir=dist
FROM scratch AS export_vllm
ARG COMMON_WORKDIR
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/dist/*.whl /
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/requirements /requirements
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/benchmarks /benchmarks
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/tests /tests
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/examples /examples
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/.buildkite /.buildkite
# -----------------------
# Test vLLM image
FROM base AS test
RUN python3 -m pip install --upgrade pip && rm -rf /var/lib/apt/lists/*
# Install vLLM
RUN --mount=type=bind,from=export_vllm,src=/,target=/install \
cd /install \
&& pip install -U -r requirements/rocm.txt \
&& pip install -U -r requirements/rocm-test.txt \
&& pip uninstall -y vllm \
&& pip install *.whl
WORKDIR /vllm-workspace
ARG COMMON_WORKDIR
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm /vllm-workspace
# install development dependencies (for testing)
RUN cd /vllm-workspace \
&& rm -rf vllm \
&& python3 -m pip install -e tests/vllm_test_utils \
&& python3 -m pip install lm-eval[api]==0.4.4 \
&& python3 -m pip install pytest-shard
# -----------------------
# Final vLLM image
FROM base AS final
RUN python3 -m pip install --upgrade pip && rm -rf /var/lib/apt/lists/*
# Error related to odd state for numpy 1.20.3 where there is no METADATA etc, but an extra LICENSES_bundled.txt.
# Manually remove it so that later steps of numpy upgrade can continue
RUN case "$(which python3)" in \
*"/opt/conda/envs/py_3.9"*) \
rm -rf /opt/conda/envs/py_3.9/lib/python3.9/site-packages/numpy-1.20.3.dist-info/;; \
*) ;; esac
RUN python3 -m pip install --upgrade huggingface-hub[cli]
# Install vLLM
RUN --mount=type=bind,from=export_vllm,src=/,target=/install \
cd /install \
&& pip install -U -r requirements/rocm.txt \
&& pip uninstall -y vllm \
&& pip install *.whl
ARG COMMON_WORKDIR
# Copy over the benchmark scripts as well
COPY --from=export_vllm /benchmarks ${COMMON_WORKDIR}/vllm/benchmarks
COPY --from=export_vllm /examples ${COMMON_WORKDIR}/vllm/examples
ENV RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1
ENV TOKENIZERS_PARALLELISM=false
# Environment variable that can improve safetensors loading and end-to-end time
ENV SAFETENSORS_FAST_GPU=1
# Performance environment variable.
ENV HIP_FORCE_DEV_KERNARG=1
# -----------------------
# Install verl
RUN pip install "tensordict==0.6.2" --no-deps && \
pip install accelerate \
codetiming \
datasets \
dill \
hydra-core \
liger-kernel \
numpy \
pandas \
peft \
"pyarrow>=15.0.0" \
pylatexenc \
torchdata \
wandb \
orjson \
pybind11
WORKDIR /workspace/
RUN git clone https://github.com/volcengine/verl.git && \
cd verl && \
pip install -e .
CMD ["/bin/bash"]


@ -0,0 +1,58 @@
# Build the docker in the repo dir:
# docker build -f docker/Dockerfile.rocm -t verl-rocm:03.04.2015 .
# docker images # you can find your built docker
# Support - Training: fsdp; Inference: vllm
# FROM rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
# Support - Training: fsdp; Inference: vllm, sglang
FROM lmsysorg/sglang:v0.4.6.post5-rocm630
# Set working directory
# WORKDIR $PWD/app
# Set environment variables
ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"
ENV HIPCC_COMPILE_FLAGS_APPEND="--amdgpu-target=gfx90a;gfx942 -D__HIP_PLATFORM_AMD__"
ENV CFLAGS="-D__HIP_PLATFORM_AMD__"
ENV CXXFLAGS="-D__HIP_PLATFORM_AMD__"
# Install vllm
RUN pip uninstall -y vllm && \
rm -rf vllm && \
git clone -b v0.6.3 https://github.com/vllm-project/vllm.git && \
cd vllm && \
MAX_JOBS=$(nproc) python3 setup.py install && \
cd .. && \
rm -rf vllm
# Copy the entire project directory
COPY . .
# Install dependencies
RUN pip install "tensordict==0.6.2" --no-deps && \
pip install accelerate \
codetiming \
datasets \
dill \
hydra-core \
liger-kernel \
numpy \
pandas \
peft \
"pyarrow>=15.0.0" \
pylatexenc \
"ray[data,train,tune,serve]<2.45.0" \
torchdata \
transformers \
wandb \
orjson \
pybind11
RUN git clone https://github.com/volcengine/verl.git && \
cd verl && \
pip install -e .
# Install torch_memory_saver
RUN pip install git+https://github.com/ExtremeViscent/torch_memory_saver.git --no-deps


@ -0,0 +1,323 @@
# FROM "compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-rel-6.4:94_ubuntu22.04_py3.10_pytorch_release-2.7_575e247"
# FROM "rlfoundation.azurecr.io/rocm6.3.4:vllm-0.8.5-numa-patch-ubuntu-22.04"
FROM "rlsys/rocm-6.3.4-patch:rocm6.3.4-numa-patch_ubuntu-22.04"
SHELL ["/bin/bash", "-ceuxo", "pipefail"]
ENV MAX_JOBS=512
ENV PATH="/usr/local/python3.12/bin:$PATH"
RUN ln -sf /usr/bin/python3.12 /usr/bin/python && \
ln -sf /usr/bin/pip3.12 /usr/bin/pip
############################################
############################################
RUN apt-get update
RUN apt-get install -y pkg-config liblzma-dev
############################################
############################################
###########################################
##########Install TransformerEngine########
###########################################
WORKDIR /workspace/
# transformer-engine install
# https://github.com/ROCm/TransformerEngine
RUN rm -rf TransformerEngine
RUN git clone --recursive https://github.com/ROCm/TransformerEngine.git
WORKDIR /workspace/TransformerEngine
RUN git checkout 236178e5
# git checkout bb061ade
# git checkout 864405c
ENV NVTE_FRAMEWORK=pytorch
ENV NVTE_ROCM_ARCH=gfx942
ENV NVTE_USE_HIPBLASLT=1
ENV NVTE_USE_ROCM=1
# export CMAKE_PREFIX_PATH="/opt/rocm:/opt/rocm/hip:/usr/local:/usr:${CMAKE_PREFIX_PATH:-}"
ENV CMAKE_PREFIX_PATH="/opt/rocm:/opt/rocm/hip:/usr/local:/usr"
# ENV NVTE_BUILD_MAX_JOBS=${MAX_JOBS}
RUN MAX_JOBS=${MAX_JOBS} pip install . -vvv
WORKDIR /workspace/
###########################################
###########################################
###########################################
####################################################################################
################Install vllm - sglang requires vllm 0.6.7 dependency################
####################################################################################
#### Require vllm 0.6.7 - checkout 113274a0
WORKDIR /workspace/
RUN rm -rf vllm
RUN pip uninstall -y vllm
# Refer to here (down-grade vllm to 0.6.3): https://docs.vllm.ai/en/v0.6.3/getting_started/amd-installation.html
RUN git clone https://github.com/ROCm/vllm.git
# git clone https://github.com/vllm-project/vllm.git
WORKDIR /workspace/vllm
RUN git checkout 113274a0
ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"
#ENV MAX_JOBS=512
ENV MAX_JOBS=${MAX_JOBS}
RUN pip install "boto3>=1.26.0"
RUN pip install setuptools_scm
# `setup.py install` copies the package into site-packages, so the cloned repo can be deleted afterwards
RUN python3 setup.py install
WORKDIR /workspace/
####################################################################################
####################################################################################
####################################################################################
###########################################
############For hack docker################
###########################################
RUN pip install setuptools==75.8.0
###########################################
###########################################
###########################################
###########################################
############build sglang###################
###########################################
# Set environment variables
ENV BASE_DIR=/sgl-workspace
ENV BUILD_TYPE=all
ENV SGL_REPO=https://github.com/sgl-project/sglang
ENV SGL_BRANCH=v0.4.6.post5
ENV TRITON_REPO=https://github.com/ROCm/triton.git
ENV TRITON_COMMIT=improve_fa_decode_3.0.0
ENV AITER_REPO=https://github.com/ROCm/aiter.git
ENV AITER_COMMIT=v0.1.2
# v0.1.2 version - commit id: 9d11f47
# ENV AITER_COMMIT=9d11f47
ENV HIP_FORCE_DEV_KERNARG=1
ENV HSA_NO_SCRATCH_RECLAIM=1
ENV SGLANG_SET_CPU_AFFINITY=1
ENV SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1
ENV NCCL_MIN_NCHANNELS=112
ENV MOE_PADDING=1
ENV VLLM_FP8_PADDING=1
ENV VLLM_FP8_ACT_PADDING=1
ENV VLLM_FP8_WEIGHT_PADDING=1
ENV VLLM_FP8_REDUCE_CONV=1
ENV TORCHINDUCTOR_MAX_AUTOTUNE=1
ENV TORCHINDUCTOR_MAX_AUTOTUNE_POINTWISE=1
ENV HIPCC_COMPILE_FLAGS_APPEND="--offload-arch=gfx942"
ENV AMDGPU_TARGETS=gfx942
ENV ROCM_ARCH=gfx942
ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"
# Switch to working directory
WORKDIR /sgl-workspace
# Clean and create directory
RUN rm -rf /sgl-workspace && mkdir -p /sgl-workspace
# Clone and build sglang
RUN git clone ${SGL_REPO} \
&& cd sglang \
&& git checkout ${SGL_BRANCH} || echo "Using default branch" \
&& cd sgl-kernel \
&& rm -f pyproject.toml \
&& mv pyproject_rocm.toml pyproject.toml \
&& python setup_rocm.py install \
&& cd .. \
&& if [ "$BUILD_TYPE" = "srt" ]; then \
python -m pip --no-cache-dir install -e "python[srt_hip]"; \
else \
python -m pip --no-cache-dir install -e "python[all_hip]"; \
fi \
&& cd /sgl-workspace \
&& cp -r /sgl-workspace/sglang /sglang \
&& python -m pip cache purge
# Install common Python packages
RUN pip install IPython orjson python-multipart torchao pybind11
# Rebuild Triton
RUN pip uninstall -y triton || true \
&& git clone ${TRITON_REPO} \
&& cd triton \
&& git checkout ${TRITON_COMMIT} \
&& cd python \
&& python3 setup.py install \
&& cd /sgl-workspace
# ENV HIPCC_COMPILE_FLAGS_APPEND="--offload-arch=gfx942 --amdgpu-lower-module-lds-strategy=1"
# ENV HIPCC_COMPILE_FLAGS_APPEND="--offload-arch=gfx942"
# Build aiter
#version: Commit 9d11f47
# && git checkout ${AITER_COMMIT} \
RUN pip uninstall -y aiter || true
RUN git clone ${AITER_REPO} \
&& cd aiter \
&& git checkout ${AITER_COMMIT} \
&& git submodule sync \
&& git submodule update --init --recursive \
&& PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python3 setup.py install \
&& cd /sgl-workspace
# && PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python3 setup.py develop \
# && PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python3 setup.py develop \
# Copy MI300X config
RUN find /sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/ \
/sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/configs/ \
-type f -name '*MI300X*' | \
xargs -I {} sh -c 'vf_config=$(echo "$1" | sed "s/MI300X/MI300X_VF/"); cp "$1" "$vf_config"' -- {}
# Environment setup complete.
RUN echo "Environment setup complete."
WORKDIR /workspace/
###########################################
###########################################
###########################################
###########################################
###############vllm v0.8.5#################
###########################################
# ENV GITHUB_USERNAME=yushengsu-thu
# ENV GITHUB_MAIL=yushengsu@gmail.com
# RUN git config --global user.name "${GITHUB_USERNAME}" \
# && git config --global user.email "${GITHUB_MAIL}"
WORKDIR /workspace/
ENV VLLM_TARGET_DEVICE=rocm
ENV ROCM_PATH=/opt/rocm
ENV SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5.dev
# Find the repo path in: DockerFile/Dockerfile.rocm_yang
# RUN git clone https://github.com/RLFoundation/vllm-patch.git
RUN pip uninstall -y vllm || true
RUN rm -rf vllm-patch
RUN git clone https://github.com/RLFoundation/vllm-patch.git \
&& cd vllm-patch \
&& git checkout v0.8.5-sleep-numa \
&& rm -rf build/ dist/ *.egg-info \
&& ln -sf /opt/rocm/lib/libamdhip64.so /usr/lib/libamdhip64.so \
&& SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5.dev PYTORCH_ROCM_ARCH="gfx90a;gfx942" MAX_JOBS=${MAX_JOBS} python3 setup.py install
# RUN SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5.dev PYTORCH_ROCM_ARCH="gfx90a;gfx942" MAX_JOBS=${MAX_JOBS} python3 setup.py develop
WORKDIR /workspace/
###########################################
###########################################
###########################################
#########################################
#### Install megatron-core###############
#########################################
RUN pip uninstall -y megatron-core && \
git clone https://github.com/yushengsu-thu/Megatron-LM-amd_version.git && \
cd Megatron-LM-amd_version && \
pip install -vvv -e . && \
cd /workspace/
#########################################
#########################################
#########################################
#######################################
################apex###################
#######################################
WORKDIR /workspace/
RUN pip uninstall -y apex && \
git clone https://github.com/ROCm/apex.git && \
cd apex && \
python setup.py install && \
cd /workspace/
#######################################
#######################################
#######################################
################################################################################
###########################Add torch_memory_saver###############################
################################################################################
# Set environment variables
ENV HIPCC_COMPILE_FLAGS_APPEND="--amdgpu-target=gfx90a;gfx942 -D__HIP_PLATFORM_AMD__"
ENV CFLAGS="-D__HIP_PLATFORM_AMD__"
ENV CXXFLAGS="-D__HIP_PLATFORM_AMD__"
RUN pip install "git+https://github.com/YangWang92/torch_memory_saver_numa.git@numa"
################################################################################
################################################################################
################################################################################
########################################
######Install ray#######################
########################################
# need to add this patch: https://github.com/ray-project/ray/pull/53531/files
RUN pip uninstall ray -y
RUN pip install "ray[data,train,tune,serve]>=2.47.0"
########################################
########################################
########################################
##########################################
#######Install other dependencies#########
##########################################
RUN pip install "tensordict==0.6.2" --no-deps && \
pip install accelerate \
codetiming \
datasets \
dill \
hydra-core \
liger-kernel \
numpy \
pandas \
peft \
"pyarrow>=15.0.0" \
pylatexenc \
torchdata \
wandb \
orjson \
pybind11
WORKDIR /workspace/
RUN git clone https://github.com/volcengine/verl.git && \
cd verl && \
pip install -e .
##########################################
##########################################
##########################################
WORKDIR /workspace/
CMD ["/usr/bin/bash"]

docker/Dockerfile.sglang Normal file

@ -0,0 +1,55 @@
# Start from the NVIDIA official image (ubuntu-22.04 + python-3.10)
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html
FROM nvcr.io/nvidia/pytorch:24.08-py3
# Define environments
ENV MAX_JOBS=32
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
# Define installation arguments
ARG APT_SOURCE=https://mirrors.ustc.edu.cn/ubuntu/
# Set apt source
RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \
{ \
echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \
} > /etc/apt/sources.list
# Install systemctl
RUN apt-get update && \
apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
apt-get clean
# Install tini
RUN apt-get update && \
apt-get install -y tini && \
apt-get clean
# Change pip source
ARG PIP_INDEX=https://mirrors.aliyun.com/pypi/simple/
RUN pip config set global.index-url "${PIP_INDEX}" && \
pip config set global.extra-index-url "${PIP_INDEX}" && \
python -m pip install --upgrade pip
# Install sglang-0.4.6.post5 and torch-memory-saver
RUN pip uninstall -y cuda-python && pip install "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install torch-memory-saver --no-cache-dir
# Install torch-2.6.0
RUN pip install --no-cache-dir torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 tensordict torchdata \
"transformers>=4.49.0" accelerate datasets peft hf_transfer \
"ray[default]" codetiming hydra-core pandas "pyarrow>=15.0.0" pylatexenc qwen-vl-utils wandb liger-kernel \
pytest pre-commit py-spy pyext
# Install flash_attn-2.7.4.post1
RUN pip uninstall -y transformer-engine flash-attn && \
wget -v https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \
pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
# Fix cv2
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --no-cache-dir "nvidia-ml-py>=12.560.30" opencv-python-headless==4.8.0.74 fastapi==0.115.6


@ -23,7 +23,7 @@ RUN pip3 install --no-cache-dir \
RUN pip3 install --no-cache-dir flash-attn==2.7.0.post2 --no-build-isolation
# vllm depends on ray, and veRL does not support ray > 2.37
# vllm depends on ray
RUN pip3 install --no-cache-dir vllm==0.6.3 ray==2.10
# install apex


@ -0,0 +1,115 @@
# Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html
FROM nvcr.io/nvidia/pytorch:24.08-py3
# Define environments
ENV MAX_JOBS=32
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Define installation arguments
ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/
ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# Set apt source
RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \
{ \
echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \
} > /etc/apt/sources.list
# Install systemctl
RUN apt-get update && \
apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
apt-get clean
# Install tini
RUN apt-get update && \
apt-get install -y tini aria2 && \
apt-get clean
# Change pip source
RUN pip config set global.index-url "${PIP_INDEX}" && \
pip config set global.extra-index-url "${PIP_INDEX}" && \
python -m pip install --upgrade pip
# Uninstall nv-pytorch fork
RUN pip uninstall -y torch torchvision torchaudio \
pytorch-quantization pytorch-triton torch-tensorrt \
xgboost transformer_engine flash_attn apex megatron-core grpcio
# Reinstall CUDA 12.4
RUN aria2c https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \
mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
RUN aria2c --always-resume=true --max-tries=99999 https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \
dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \
cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && \
apt-get -y install cuda-toolkit-12-4 && \
rm cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \
update-alternatives --set cuda /usr/local/cuda-12.4 && \
rm -rf /usr/local/cuda-12.6
# Install torch-2.6.0+cu124 + vllm-0.8.5.post1 + sglang-0.4.6.post5
# torch-2.6.0+cu124: cxx11abi=False
# torch-2.6.0+cu126: cxx11abi=True
# see https://github.com/flashinfer-ai/flashinfer/issues/911
# Install sglang-0.4.6.post5 and torch-memory-saver
RUN pip install --resume-retries 999 "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install --resume-retries 999 torch-memory-saver --no-cache-dir
RUN pip install --resume-retries 999 --no-cache-dir "vllm==0.8.5.post1" "torch==2.6.0" "torchvision==0.21.0" "torchaudio==2.6.0" "tensordict==0.6.2" torchdata
RUN pip install --resume-retries 999 --no-cache-dir "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=15.0.0" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile \
pytest py-spy pyext pre-commit ruff
# Install flash-attn-2.7.4.post1 (cxx11abi=False)
RUN wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \
pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
# Fix packages
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
# Install cudnn
RUN aria2c --max-tries=9999 https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && \
apt-get -y install cudnn-cuda-12 && \
rm cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install Apex
RUN git clone https://github.com/NVIDIA/apex.git && \
cd apex && \
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --no-deps --no-cache-dir git+https://github.com/NVIDIA/TransformerEngine.git@v2.3
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2
# Fix opencv
RUN pip install opencv-python
RUN pip install opencv-fixer && \
python -c "from opencv_fixer import AutoFix; AutoFix()"
# Install verl
# Reset pip config
RUN pip config unset global.index-url && \
pip config unset global.extra-index-url
RUN apt-get update && \
apt-get install -y aria2 libfreeimage3 libfreeimage-dev zlib1g

docker/README.md Normal file

@ -0,0 +1,72 @@
# Dockerfiles of verl
We provide pre-built Docker images for quick setup. Starting with this version, we use a new image release hierarchy to improve productivity and stability.
The images fall into three broad categories:
- **Base Image**: Contains only basic dependencies, without inference or training frameworks. You can install vLLM or SGLang directly on top of it without reinstalling torch or CUDA.
- **Application Image**: Stable version with inference and training frameworks installed.
- **Preview Image**: Unstable version with the latest frameworks and features.
The first two types of images are hosted in the Docker Hub [verlai/verl](https://hub.docker.com/r/verlai/verl) repository, while the preview images are hosted in community repositories.
> Image versions map to verl releases. For example, an image whose tag contains ``verl0.4`` is built for verl release ``v0.4.x``.
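
The tag-to-release mapping can be checked mechanically. The following is a minimal sketch, assuming every tag embeds a ``verl<major>.<minor>`` token as the tags in this README do; the helper name `verl_release_for_tag` is hypothetical and not part of verl.

```sh
# Hypothetical helper (not part of verl): derive the verl release series from an image tag.
# Assumes the tag contains a "verl<major>.<minor>" token, e.g. "verl0.5".
verl_release_for_tag() {
  tag="$1"
  # Extract the first "verl<major>.<minor>" token from the tag.
  token=$(printf '%s\n' "$tag" | grep -oE 'verl[0-9]+\.[0-9]+' | head -n 1)
  if [ -z "$token" ]; then
    echo "no verl version found in tag: $tag" >&2
    return 1
  fi
  # Strip the "verl" prefix and format as a release series: "verl0.5" -> "v0.5.x"
  printf 'v%s.x\n' "${token#verl}"
}

# prints: v0.5.x
verl_release_for_tag "app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"
```

The helper returns a non-zero status for tags without a version token, so it can guard a script that expects an image matching a given verl release.
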
## Base Image
The stable base image is ``verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4``, available with different CUDA versions.
The base image is updated infrequently, and application images can be built on top of it without reinstalling the base packages.
## Application Image
Starting with this version, we provide separate images for vLLM and SGLang because their dependencies, such as FlashInfer, have diverged.
There are two types of application images available:
- **vLLM with FSDP and Megatron**: ``verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2``
- **SGLang with FSDP and Megatron**: ``verlai/verl:app-verl0.5-transformers4.55.4-sglang0.4.10.post2-mcore0.13.0-te2.2``
Docker images with Megatron backends can run post-training of large language models such as ``Qwen/Qwen3-235B-A22B`` and ``deepseek-ai/DeepSeek-V3-0324``. Refer to the :doc:`Large Language Model Post-Training documentation<../perf/dpsk>` for more details.
Application images may be updated frequently, and their Dockerfiles can be found in ``docker/verl[version]-[packages]/Dockerfile.app.[frameworks]``. Starting from the base image, it is easy to build your own application image with the desired inference and training frameworks.
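
As a rough illustration of building on the base image, the sketch below writes a two-line Dockerfile and shows the build command. The file name `Dockerfile.app.custom`, the image tag `my-verl-app:custom`, and the pinned vLLM version (mirrored from the application image tag above) are assumptions for illustration, not an official recipe.

```sh
# Sketch only: file name, image tag, and the pinned vLLM version are illustrative assumptions.
BASE_IMAGE="verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4"

# Write a minimal application Dockerfile on top of the base image.
cat > Dockerfile.app.custom <<EOF
# Start from the verl base image so torch and CUDA are not reinstalled
FROM ${BASE_IMAGE}
# Add the inference framework of your choice (the version here is an example)
RUN pip install --no-cache-dir "vllm==0.10.0"
EOF

# Build the image (requires Docker; uncomment to run):
# docker build -f Dockerfile.app.custom -t my-verl-app:custom .
```

For a complete setup, compare against the Dockerfiles under ``docker/``, which also install the training-side packages.
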
## Community Image
For vLLM with FSDP, please refer to the [hiyouga/verl](https://hub.docker.com/r/hiyouga/verl) repository; the latest version is ``hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.4-flashinfer0.2.2-cxx11abi0``.
For SGLang with FSDP, please refer to the [ocss884/verl-sglang](https://hub.docker.com/r/ocss884/verl-sglang) repository; the latest version is ``ocss884/verl-sglang:ngc-th2.6.0-cu126-sglang0.4.6.post5``, which is provided by the SGLang RL Group.
For the latest vLLM with Megatron, please refer to the [iseekyan/verl](https://hub.docker.com/r/iseekyan/verl) repository; the latest version is ``iseekyan/verl:nemo.gptoss_vllm0.11.0``.
See the files under ``docker/`` for NGC-based images or if you want to build your own.
Note that for AWS instances with an EFA network interface (SageMaker AI Pod), you need to install the EFA driver as shown in ``docker/Dockerfile.extenstion.awsefa``.
## Installation from Docker
After pulling the desired Docker image and installing the desired inference and training frameworks, you can run it with the following steps:
1. Launch the desired Docker image and attach to it:
```sh
docker create --runtime=nvidia --gpus all --net=host --shm-size="10g" --cap-add=SYS_ADMIN -v .:/workspace/verl --name verl <image:tag> sleep infinity
docker start verl
docker exec -it verl bash
```
2. If you use the images provided, you only need to install verl itself without dependencies:
```sh
# install the nightly version (recommended)
git clone https://github.com/volcengine/verl && cd verl
pip3 install --no-deps -e .
```
[Optional] If you want to switch between different frameworks, you can install verl with the following commands:
```sh
# install the nightly version (recommended)
git clone https://github.com/volcengine/verl && cd verl
pip3 install -e .[vllm]
pip3 install -e .[sglang]
```
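
If you launch containers from several images, the commands from step 1 can be parameterized. The following is a minimal sketch; the function name `print_verl_launch` and its argument order are hypothetical. It only prints the commands so you can review them before copying them into a terminal.

```sh
# Hypothetical wrapper (not part of verl): print the launch commands from step 1
# for a given image. Nothing is executed; the commands are only printed.
print_verl_launch() {
  image="$1"          # image to launch, e.g. one of the tags listed above
  name="${2:-verl}"   # container name (defaults to "verl" as in step 1)
  shm="${3:-10g}"     # shared-memory size (defaults to 10g as in step 1)
  cat <<EOF
docker create --runtime=nvidia --gpus all --net=host --shm-size="${shm}" --cap-add=SYS_ADMIN -v .:/workspace/verl --name ${name} ${image} sleep infinity
docker start ${name}
docker exec -it ${name} bash
EOF
}

print_verl_launch "verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"
```

Passing a second and third argument overrides the container name and shared-memory size, for example `print_verl_launch <image:tag> verl-dev 32g`.
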


@ -0,0 +1,41 @@
# Start from the verl base image
# Dockerfile.base
FROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4
# Define environments
ENV MAX_JOBS=32
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install sglang-0.4.6.post5 and torch-memory-saver
RUN pip install --resume-retries 999 "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install torch-memory-saver --no-cache-dir
# Some sglang operations in 0.4.6.post5 require vllm
# [Warning] vllm may pull in packages that are incompatible with sglang, for example flashinfer
RUN pip install --resume-retries 999 --no-cache-dir vllm==0.8.5.post1
# Fix packages
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2
# Fix for transformers 4.53.0
RUN pip3 install --no-cache-dir "transformers[hf_xet]<4.52.0"
# Install mbridge
RUN pip3 install --no-cache-dir mbridge


@ -0,0 +1,82 @@
# Start from the verl base image
# Dockerfile.base
FROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4
# Define environments
ENV MAX_JOBS=32
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install sglang-0.4.6.post5 and torch-memory-saver
RUN pip install --resume-retries 999 "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install torch-memory-saver --no-cache-dir
# Some sglang operations in 0.4.6.post5 require vllm
# [Warning] vllm may pull in packages that are incompatible with sglang, for example flashinfer
RUN pip install --resume-retries 999 --no-cache-dir vllm==0.8.5.post1
# Fix packages
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2
# Fix for transformers 4.53.0
RUN pip3 install --no-cache-dir "transformers[hf_xet]<4.52.0"
# Install mbridge
RUN pip3 install --no-cache-dir mbridge
# Install DeepEP
## the dependency of IBGDA
RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so
## Clone and build deepep and deepep-nvshmem
RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \
git clone https://github.com/deepseek-ai/DeepEP.git && \
cd DeepEP && git checkout a84a248
# Prepare nvshmem
RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \
tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \
cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch
ENV CUDA_HOME=/usr/local/cuda
### Set MPI environment variables. Errors occur when they are not set.
ENV CPATH=/usr/local/mpi/include:$CPATH
ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH
ENV GDRCOPY_HOME=/workspace/gdrcopy
## Build deepep-nvshmem
RUN cd deepep-nvshmem && \
NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_MPI_SUPPORT=0 \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_PMIX_SUPPORT=0 \
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
NVSHMEM_USE_GDRCOPY=1 \
cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install
ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install
ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH
ENV PATH=$NVSHMEM_DIR/bin:$PATH
## Build deepep
RUN cd DeepEP && \
python setup.py install


@ -0,0 +1,82 @@
# Start from the verl base image
# Dockerfile.base
FROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4
# Define environments
ENV MAX_JOBS=32
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install sglang-0.4.6.post5 and torch-memory-saver
RUN pip install --resume-retries 999 "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install torch-memory-saver --no-cache-dir
# Some sglang operations in 0.4.6.post5 require vllm
# [Warning] vllm may pull in packages that are incompatible with sglang, for example flashinfer
RUN pip install --resume-retries 999 --no-cache-dir vllm==0.8.5.post1
# Fix packages
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.5
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_r0.13.0
# Fix for transformers 4.53.0
RUN pip3 install --no-cache-dir "transformers[hf_xet]<4.52.0"
# Install mbridge
RUN pip3 install --no-cache-dir mbridge
# Install DeepEP
## Dependency of IBGDA: expose libmlx5.so via a symlink
RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so
## Clone and build deepep and deepep-nvshmem
RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \
git clone https://github.com/deepseek-ai/DeepEP.git && \
cd DeepEP && git checkout a84a248
# Prepare nvshmem
RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \
tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \
cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch
ENV CUDA_HOME=/usr/local/cuda
### Set MPI environment variables; the build errors out when they are not set.
ENV CPATH=/usr/local/mpi/include:$CPATH
ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH
ENV GDRCOPY_HOME=/workspace/gdrcopy
## Build deepep-nvshmem
RUN cd deepep-nvshmem && \
NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_MPI_SUPPORT=0 \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_PMIX_SUPPORT=0 \
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
NVSHMEM_USE_GDRCOPY=1 \
cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install
ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install
ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH
ENV PATH=$NVSHMEM_DIR/bin:$PATH
## Build deepep
RUN cd DeepEP && \
python setup.py install

@ -0,0 +1,47 @@
# Start from the verl base image
# Dockerfile.base
FROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4
# Define environments
ENV MAX_JOBS=32
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install torch-2.6.0+cu124 + vllm-0.8.5.post1
# torch-2.6.0+cu124: cxx11abi=False
# torch-2.6.0+cu126: cxx11abi=True
# see https://github.com/flashinfer-ai/flashinfer/issues/911
RUN pip install --resume-retries 999 --no-cache-dir vllm==0.8.5.post1
# Install flashinfer-0.2.2.post1+cu124 (cxx11abi=False), matching the wheel downloaded below
# vllm-0.8.3 does not support flashinfer>=0.2.3
# see https://github.com/vllm-project/vllm/pull/15777
RUN aria2c --max-tries=9999 https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2.post1/flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \
pip install --no-cache-dir flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \
rm flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl
# Fix packages
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2
# Fix for transformers 4.53.0
RUN pip3 install --no-cache-dir "transformers[hf_xet]<4.52.0"
# Install mbridge
RUN pip3 install --no-cache-dir mbridge

@ -0,0 +1,88 @@
# Start from the verl base image
# Dockerfile.base
FROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4
# Define environments
ENV MAX_JOBS=32
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install torch-2.6.0+cu124 + vllm-0.8.5.post1
# torch-2.6.0+cu124: cxx11abi=False
# torch-2.6.0+cu126: cxx11abi=True
# see https://github.com/flashinfer-ai/flashinfer/issues/911
RUN pip install --resume-retries 999 --no-cache-dir vllm==0.8.5.post1
# Install flashinfer-0.2.2.post1+cu124 (cxx11abi=False), matching the wheel downloaded below
# vllm-0.8.3 does not support flashinfer>=0.2.3
# see https://github.com/vllm-project/vllm/pull/15777
RUN aria2c --max-tries=9999 https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2.post1/flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \
pip install --no-cache-dir flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \
rm flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl
# Fix packages
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2
# Fix for transformers 4.53.0
RUN pip3 install --no-cache-dir "transformers[hf_xet]<4.52.0"
# Install mbridge
RUN pip3 install --no-cache-dir mbridge
# Install DeepEP
## Dependency of IBGDA: expose libmlx5.so via a symlink
RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so
## Clone and build deepep and deepep-nvshmem
RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \
git clone https://github.com/deepseek-ai/DeepEP.git && \
cd DeepEP && git checkout a84a248
# Prepare nvshmem
RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \
tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \
cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch
ENV CUDA_HOME=/usr/local/cuda
### Set MPI environment variables; the build errors out when they are not set.
ENV CPATH=/usr/local/mpi/include:$CPATH
ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH
ENV GDRCOPY_HOME=/workspace/gdrcopy
## Build deepep-nvshmem
RUN cd deepep-nvshmem && \
NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_MPI_SUPPORT=0 \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_PMIX_SUPPORT=0 \
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
NVSHMEM_USE_GDRCOPY=1 \
cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install
ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install
ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH
ENV PATH=$NVSHMEM_DIR/bin:$PATH
## Build deepep
RUN cd DeepEP && \
python setup.py install

@ -0,0 +1,85 @@
# Start from the verl base image
# Dockerfile.base
FROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4
# Define environments
ENV MAX_JOBS=32
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install torch-2.6.0+cu124 + vllm-0.8.5.post1
# torch-2.6.0+cu124: cxx11abi=False
# torch-2.6.0+cu126: cxx11abi=True
# see https://github.com/flashinfer-ai/flashinfer/issues/911
RUN pip install --resume-retries 999 --no-cache-dir vllm==0.8.5.post1
# Install flashinfer-0.2.2.post1+cu124 (cxx11abi=False), matching the wheel downloaded below
# vllm-0.8.3 does not support flashinfer>=0.2.3
# see https://github.com/vllm-project/vllm/pull/15777
RUN aria2c --max-tries=9999 https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2.post1/flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \
pip install --no-cache-dir flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \
rm flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl
# Fix packages
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.5
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2
# Install mbridge
RUN pip3 install --no-cache-dir mbridge
# Install DeepEP
## Dependency of IBGDA: expose libmlx5.so via a symlink
RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so
## Clone and build deepep and deepep-nvshmem
RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \
git clone https://github.com/deepseek-ai/DeepEP.git && \
cd DeepEP && git checkout a84a248
# Prepare nvshmem
RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \
tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \
cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch
ENV CUDA_HOME=/usr/local/cuda
### Set MPI environment variables; the build errors out when they are not set.
ENV CPATH=/usr/local/mpi/include:$CPATH
ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH
ENV GDRCOPY_HOME=/workspace/gdrcopy
## Build deepep-nvshmem
RUN cd deepep-nvshmem && \
NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_MPI_SUPPORT=0 \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_PMIX_SUPPORT=0 \
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
NVSHMEM_USE_GDRCOPY=1 \
cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install
ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install
ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH
ENV PATH=$NVSHMEM_DIR/bin:$PATH
## Build deepep
RUN cd DeepEP && \
python setup.py install

@ -0,0 +1,113 @@
# Base Docker Image of verl, with CUDA/Torch/FlashAttn/Apex/TransformerEngine, without other frameworks
# Target: verlai/verl:base-v2-cu124-cudnn9.8-torch2.6-fa2.8.0-te2.3
# Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html
FROM nvcr.io/nvidia/pytorch:24.08-py3
# Define environments
ENV MAX_JOBS=16
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Define installation arguments
ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/
ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# Set apt source
RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \
{ \
echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \
} > /etc/apt/sources.list
# Install systemd (provides systemctl)
RUN apt-get update && \
apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
apt-get clean
# Install tini and aria2
RUN apt-get update && \
apt-get install -y tini aria2 && \
apt-get clean
# Change pip source
RUN pip config set global.index-url "${PIP_INDEX}" && \
pip config set global.extra-index-url "${PIP_INDEX}" && \
python -m pip install --upgrade pip
# Uninstall nv-pytorch fork
RUN pip uninstall -y torch torchvision torchaudio \
pytorch-quantization pytorch-triton torch-tensorrt \
xgboost transformer_engine flash_attn apex megatron-core grpcio
# Reinstall CUDA 12.4
RUN aria2c https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \
mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
RUN aria2c --always-resume=true --max-tries=99999 https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \
dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \
cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && \
apt-get -y install cuda-toolkit-12-4 && \
rm cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \
update-alternatives --set cuda /usr/local/cuda-12.4 && \
rm -rf /usr/local/cuda-12.6
RUN pip install --resume-retries 999 --no-cache-dir torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
RUN pip install --resume-retries 999 --no-cache-dir "tensordict==0.6.2" torchdata "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
# Install flash-attn-2.7.4.post1 (cxx11abi=False)
RUN wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \
pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
# Fix packages
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
# Install cudnn
RUN aria2c --max-tries=9999 https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && \
apt-get -y install cudnn-cuda-12 && \
rm cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
# Install Apex
RUN git clone https://github.com/NVIDIA/apex.git && \
cd apex && \
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# Profiling tools
RUN aria2c --always-resume=true --max-tries=99999 https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_3/nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \
apt-get update && apt-get install -y libxcb-cursor0 && \
dpkg -i ./nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \
rm -rf /usr/local/cuda/bin/nsys && \
ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys /usr/local/cuda/bin/nsys && \
rm -rf /usr/local/cuda/bin/nsys-ui && \
ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \
rm nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb
# Fix opencv
RUN pip install --resume-retries 999 --no-cache-dir opencv-python
RUN pip install --resume-retries 999 --no-cache-dir opencv-fixer && \
python -c "from opencv_fixer import AutoFix; AutoFix()"
RUN pip install --resume-retries 999 --no-cache-dir cuda-bindings
# Reset pip config
RUN pip config unset global.index-url && \
pip config unset global.extra-index-url
RUN apt-get update && \
apt-get install -y libfreeimage3 libfreeimage-dev zlib1g htop

@ -0,0 +1,31 @@
# verl image with verl v0.4.x
## Important package versions
```txt
cuda==12.4
cudnn==9.8.0
torch==2.6.0
flash_attn==2.7.4
sglang==0.4.6.post5
vllm==0.8.5.post1
nvidia-cudnn-cu12==9.8.0.87
transformer_engine==2.3
megatron.core==core_v0.12.2
# Preview
transformer_engine==2.5
megatron.core==core_r0.13.0
```
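
To confirm that a running container matches the pip-installable pins listed above, something like the following could be used. This is a minimal sketch, not part of verl: the `check_pins` name and the one-pin-per-line file layout are hypothetical, and it assumes `pip freeze` prints each package in the same `name==version` form. Entries such as `cuda`, `cudnn`, or `megatron.core` are not pip pins and would need a separate check.

```shell
# Hypothetical helper (not part of verl): compare expected "name==version"
# pins from a file against `pip freeze`-style lines read from stdin.
# Prints one line per pin that is missing or different and returns non-zero
# if any such pin exists.
check_pins() (
  expected_file="$1"
  actual="$(cat)"            # capture the `pip freeze` output from stdin
  status=0
  while IFS= read -r pin; do
    # Skip blank lines and comment lines such as "# Preview".
    case "$pin" in ''|'#'*) continue ;; esac
    if ! printf '%s\n' "$actual" | grep -Fqx -- "$pin"; then
      echo "mismatch: expected $pin"
      status=1
    fi
  done < "$expected_file"
  exit "$status"
)

# Demo with inline data instead of a real `pip freeze`:
pins="$(mktemp)"
printf 'torch==2.6.0\nvllm==0.8.5.post1\n' > "$pins"
printf 'torch==2.6.0\nvllm==0.8.5.post1\nnumpy==1.26.4\n' | check_pins "$pins" && echo "all pins match"
rm -f "$pins"
```

Inside a container the intended use would be `pip freeze | check_pins pins.txt`. Exact-line matching is deliberately strict, so a build with a local version suffix would be reported as a mismatch.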
## Target
- Base image:
  - `verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4`
- App image:
  - `verlai/verl:app-verl0.4-sglang0.4.6.post5-vllm0.8.5-mcore0.12.2-te2.2`: SGLang 0.4.6.post5 requires vLLM, and vLLM can have package conflicts with SGLang
  - `verlai/verl:app-verl0.4-sglang0.4.6.post5-vllm0.8.5-mcore0.12.2-te2.2-deepep`: Built with DeepEP
  - `verlai/verl:app-verl0.4-vllm0.8.5-mcore0.12.2-te2.2`
  - `verlai/verl:app-verl0.4-vllm0.8.5-mcore0.12.2-te2.2-deepep`: Built with DeepEP
- Preview image:
  - `verlai/verl:app-verl0.4-sglang0.4.6.post5-vllm0.8.5-mcore0.13.0-te2.2-preview`
  - `verlai/verl:app-verl0.4-vllm0.8.5-mcore0.13.0-te2.2-preview`

@ -0,0 +1,37 @@
# Start from the verl base image
# Dockerfile.base
FROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4
# Define environments
ENV MAX_JOBS=8
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install sglang-0.4.10
# Install FlashInfer Python package
RUN pip install --upgrade pip setuptools packaging
RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation flashinfer-python==0.2.9rc1
RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation "sglang[all]==0.4.10.post2"
# Fix packages
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]==4.55.4" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.13.0
# Install mbridge
RUN pip3 install --no-cache-dir mbridge

@ -0,0 +1,37 @@
# Start from the verl base image
# Dockerfile.base
FROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4
# Define environments
ENV MAX_JOBS=8
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install sglang-0.4.9.post6
# Install FlashInfer Python package
RUN pip install --upgrade pip setuptools packaging
RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation flashinfer-python==0.2.9rc1
RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation "sglang[all]==0.4.9.post6"
# Fix packages
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]==4.55.4" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.13.0
# Install mbridge
RUN pip3 install --no-cache-dir mbridge

@ -0,0 +1,38 @@
# Start from the verl base image
# Dockerfile.base
FROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4
# Define environments
ENV MAX_JOBS=32
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install torch-2.7.1+cu126 + vllm-0.10.0
RUN pip install --resume-retries 999 --no-cache-dir vllm==0.10.0
# Fix packages
# transformers 4.54.0 is still not supported
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.55.4" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.13.0
# Install mbridge
RUN pip3 install --no-cache-dir mbridge
# Fix qwen vl
RUN pip3 install --no-cache-dir --no-deps trl

@ -0,0 +1,39 @@
# Start from the verl base image
# Dockerfile.base
FROM iseekyan/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4-h100
# Define environments
ENV MAX_JOBS=32
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install torch-2.7.1+cu126 + vllm-0.10.0
RUN pip install --resume-retries 999 --no-cache-dir vllm==0.10.0
# Fix packages
# transformers 4.54.0 is still not supported
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.55.4" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.7
RUN pip install onnxscript
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.15.0rc4
# Install mbridge
RUN pip3 install --no-cache-dir mbridge==v0.15.0
# Fix qwen vl
RUN pip3 install --no-cache-dir --no-deps trl

@ -0,0 +1,133 @@
# Base Docker Image of verl, with CUDA/Torch/FlashAttn/Apex/TransformerEngine, without other frameworks
# Target: verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.8.0-fi0.2.6
# Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html
FROM nvcr.io/nvidia/pytorch:24.08-py3
# Define environments
ENV MAX_JOBS=16
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Define installation arguments
ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/
ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# Set apt source
RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \
{ \
echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \
} > /etc/apt/sources.list
# Install systemd (provides systemctl)
RUN apt-get update && \
apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
apt-get clean
# Install tini, aria2, and other system packages
RUN apt-get update && \
apt-get install -y tini aria2 libfreeimage3 libfreeimage-dev zlib1g htop && \
apt-get clean
# Change pip source
RUN pip config set global.index-url "${PIP_INDEX}" && \
pip config set global.extra-index-url "${PIP_INDEX}" && \
python -m pip install --upgrade pip
# Uninstall nv-pytorch fork
RUN pip uninstall -y torch torchvision torchaudio \
pytorch-quantization pytorch-triton torch-tensorrt \
xgboost transformer_engine flash_attn apex megatron-core grpcio
RUN pip install --resume-retries 999 --no-cache-dir torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1
# Install flash-attn-2.7.4.post1; although built against torch2.6, it is compatible with torch2.7
# https://github.com/Dao-AILab/flash-attention/issues/1644#issuecomment-2899396361
RUN ABI_FLAG=$(python -c "import torch; print('TRUE' if torch._C._GLIBCXX_USE_CXX11_ABI else 'FALSE')") && \
URL="https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abi${ABI_FLAG}-cp310-cp310-linux_x86_64.whl" && \
FILE="flash_attn-2.7.4.post1+cu12torch2.6cxx11abi${ABI_FLAG}-cp310-cp310-linux_x86_64.whl" && \
wget -nv "${URL}" && \
pip install --no-cache-dir "${FILE}"
# Fix packages
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
# Install cudnn
RUN aria2c --max-tries=9999 https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && \
apt-get -y install cudnn-cuda-12 && \
rm cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
# Install Apex
RUN pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" --resume-retries 999 git+https://github.com/NVIDIA/apex.git
# Profiling tools
RUN aria2c --always-resume=true --max-tries=99999 https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_3/nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \
apt-get update && apt-get install -y libxcb-cursor0
RUN apt-get install -y ./nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \
rm -rf /usr/local/cuda/bin/nsys && \
ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys /usr/local/cuda/bin/nsys && \
rm -rf /usr/local/cuda/bin/nsys-ui && \
ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \
rm nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb
RUN pip install --resume-retries 999 --no-cache-dir "tensordict==0.6.2" torchdata "transformers[hf_xet]>=4.52.3" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas cuda-bindings \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
# Install DeepEP
## Dependency of IBGDA: expose libmlx5.so via a symlink
RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so
## Clone and build deepep and deepep-nvshmem
RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \
git clone https://github.com/deepseek-ai/DeepEP.git && \
cd DeepEP && git checkout a84a248
# Prepare nvshmem
RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \
tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \
cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch
ENV CUDA_HOME=/usr/local/cuda
### Set MPI environment variables; the build errors out when they are not set.
ENV CPATH=/usr/local/mpi/include:$CPATH
ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH
ENV GDRCOPY_HOME=/workspace/gdrcopy
## Build deepep-nvshmem
RUN cd deepep-nvshmem && \
NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_MPI_SUPPORT=0 \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_PMIX_SUPPORT=0 \
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
NVSHMEM_USE_GDRCOPY=1 \
cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install
ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install
ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH
ENV PATH=$NVSHMEM_DIR/bin:$PATH
## Build deepep
RUN cd DeepEP && \
python setup.py install
# Reset pip config
RUN pip config unset global.index-url && \
pip config unset global.extra-index-url

@ -0,0 +1,27 @@
# verl image with verl v0.5
## Important package versions
```txt
cuda==12.6
cudnn==9.8.0
torch==2.7.1
flash_attn==2.7.4.post1
sglang==0.4.9.post6
vllm==0.8.5.post1
nvidia-cudnn-cu12==9.8.0.87
transformer_engine==2.3
megatron.core==core_v0.12.2
# Preview
transformer_engine==2.5
megatron.core==core_r0.13.0
```
## Target
- Base image:
  - `verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4`: A base image with DeepEP built in, for vLLM/SGLang
- App image:
  - `verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2`
  - `verlai/verl:app-verl0.5-transformers4.55.4-sglang0.4.10.post2-mcore0.13.0-te2.2`
  - `iseekyan/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.15.0-te2.7`
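
The tags above appear to follow a dash-separated `<kind>-verl<version>-<component><version>-...` pattern. As a minimal sketch, assuming that convention holds, a tag can be split into its parts to see which component versions an image carries. The `describe_tag` helper is hypothetical and not part of verl.

```shell
# Hypothetical helper (not part of verl): print each dash-separated part of
# an image tag on its own line. Assumes the naming pattern seen in the tags
# listed above. The function body runs in a subshell so IFS and the globbing
# setting do not leak into the caller.
describe_tag() (
  tag="${1#*:}"   # drop the "repository:" prefix if one is present
  IFS='-'
  set -f          # disable globbing while the unquoted tag is split
  for part in $tag; do
    printf '%s\n' "$part"
  done
)

describe_tag "verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"
```

For the tag shown, the parts are `app`, `verl0.5`, `transformers4.55.4`, `vllm0.10.0`, `mcore0.13.0`, and `te2.2`, one per line.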

@ -0,0 +1,37 @@
# Start from the verl base image
# Dockerfile.base
FROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.8.0
# Define environments
ENV MAX_JOBS=8
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install sglang-0.4.8 and torch-memory-saver
# Install FlashInfer Python package
RUN pip install --upgrade pip setuptools packaging
RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation flashinfer-python==0.2.6.post1
RUN pip install --resume-retries 999 --no-cache-dir "sglang[all]==0.4.8" && pip install torch-memory-saver --no-cache-dir
# Fix packages
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.3
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2
# Install mbridge
RUN pip3 install --no-cache-dir mbridge

@ -0,0 +1,37 @@
# Start from the verl base image
# Dockerfile.base
FROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.8.0
# Define environments
ENV MAX_JOBS=8
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install sglang-0.4.8 and torch-memory-saver
# Install FlashInfer Python package
RUN pip install --upgrade pip setuptools packaging
RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation flashinfer-python==0.2.6.post1
RUN pip install --resume-retries 999 --no-cache-dir "sglang[all]==0.4.8" && pip install torch-memory-saver --no-cache-dir
# Fix packages
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.5
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2
# Install mbridge
RUN pip3 install --no-cache-dir mbridge

@ -0,0 +1,132 @@
# Base Docker Image of verl, with CUDA/Torch/FlashAttn/Apex/TransformerEngine, without other frameworks
# Target: verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.8.0-fi0.2.6
# Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html
FROM nvcr.io/nvidia/pytorch:24.08-py3
# Define environments
ENV MAX_JOBS=16
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Define installation arguments
ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/
ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# Set apt source
RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \
{ \
echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \
} > /etc/apt/sources.list
# Install systemctl
RUN apt-get update && \
apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
apt-get clean
# Install tini
RUN apt-get update && \
apt-get install -y tini aria2 libfreeimage3 libfreeimage-dev zlib1g htop && \
apt-get clean
# Change pip source
RUN pip config set global.index-url "${PIP_INDEX}" && \
pip config set global.extra-index-url "${PIP_INDEX}" && \
python -m pip install --upgrade pip
# Uninstall nv-pytorch fork
RUN pip uninstall -y torch torchvision torchaudio \
pytorch-quantization pytorch-triton torch-tensorrt \
xgboost transformer_engine flash_attn apex megatron-core grpcio
RUN pip install --resume-retries 999 --no-cache-dir torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1
# Install flash-attn-2.8.0.post2 (the cxx11abi flag is detected from the installed torch)
RUN ABI_FLAG=$(python -c "import torch; print('TRUE' if torch._C._GLIBCXX_USE_CXX11_ABI else 'FALSE')") && \
URL="https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2+cu12torch2.7cxx11abi${ABI_FLAG}-cp310-cp310-linux_x86_64.whl" && \
FILE="flash_attn-2.8.0.post2+cu12torch2.7cxx11abi${ABI_FLAG}-cp310-cp310-linux_x86_64.whl" && \
wget -nv "${URL}" && \
pip install --no-cache-dir "${FILE}"
# Fix packages
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
# Install cudnn
RUN aria2c --max-tries=9999 https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && \
apt-get -y install cudnn-cuda-12 && \
rm cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
# Install Apex
RUN pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" --resume-retries 999 git+https://github.com/NVIDIA/apex.git
# Profiling tools
RUN aria2c --always-resume=true --max-tries=99999 https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_3/nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \
apt-get update && apt-get install -y libxcb-cursor0
RUN apt-get install -y ./nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \
rm -rf /usr/local/cuda/bin/nsys && \
ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys /usr/local/cuda/bin/nsys && \
rm -rf /usr/local/cuda/bin/nsys-ui && \
ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \
rm nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb
RUN pip install --resume-retries 999 --no-cache-dir "tensordict==0.6.2" torchdata "transformers[hf_xet]>=4.53" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas cuda-bindings \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pyext pre-commit ruff
# Install DeepEP
## the dependency of IBGDA
RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so
## Clone and build deepep and deepep-nvshmem
RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \
git clone https://github.com/deepseek-ai/DeepEP.git && \
cd DeepEP && git checkout a84a248
# Prepare nvshmem
RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \
tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \
cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch
ENV CUDA_HOME=/usr/local/cuda
### Set MPI environment variables. Errors occur when these are not set.
ENV CPATH=/usr/local/mpi/include:$CPATH
ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH
ENV GDRCOPY_HOME=/workspace/gdrcopy
## Build deepep-nvshmem
RUN cd deepep-nvshmem && \
NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_MPI_SUPPORT=0 \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_PMIX_SUPPORT=0 \
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
NVSHMEM_USE_GDRCOPY=1 \
cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install
ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install
ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH
ENV PATH=$NVSHMEM_DIR/bin:$PATH
## Build deepep
RUN cd DeepEP && \
python setup.py install
# Reset pip config
RUN pip config unset global.index-url && \
pip config unset global.extra-index-url

@ -0,0 +1,27 @@
# verl image with verl v0.5
## Important package versions
```txt
cuda==12.6
cudnn==9.8.0
torch==2.7.1
flash_attn==2.8.0
sglang==0.4.8
vllm==0.8.5.post1
nvidia-cudnn-cu12==9.8.0.87
transformer_engine==2.3
megatron.core==core_v0.12.2
# Preview
transformer_engine==2.5
megatron.core==core_r0.13.0
```
## Target
- Base image:
- `verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.8.0`: base image with DeepEP built in
- App image:
- `verlai/verl:app-verl0.5-sglang0.4.9-mcore0.12.2`
- `verlai/verl:app-verl0.5-sglang0.4.9-mcore0.13.0-preview`
- vLLM: the latest version is temporarily not supported

@ -0,0 +1,36 @@
# Start from the verl base image
# Dockerfile.base
FROM verlai/verl:base-verl0.5-preview-cu128-cudnn9.8-torch2.7.1-fa2.8.0-fi0.2.6
# Define environments
ENV MAX_JOBS=8
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Install sglang-0.4.8 and torch-memory-saver
# Install FlashInfer Python package
RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation flashinfer-python==0.2.6.post1
RUN pip install --resume-retries 999 --no-cache-dir "sglang[all]==0.4.8" && pip install torch-memory-saver --no-cache-dir
# Fix packages
RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pre-commit ruff
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.5
# Install Megatron-LM
RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_r0.13.0
# Install mbridge
RUN pip3 install --no-cache-dir mbridge

@ -0,0 +1,91 @@
# Base Docker Image of verl, with CUDA/Torch/FlashAttn/Apex/TransformerEngine, without other frameworks
# Target: verlai/verl:base-verl0.5-preview-cu128-cudnn9.8-torch2.7.1-fa2.8.0-fi0.2.6
# Start from the NVIDIA official image (ubuntu-24.04 + cuda-12.8 + python-3.12)
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-02.html
FROM nvcr.io/nvidia/pytorch:25.02-py3
# Define environments
ENV MAX_JOBS=16
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
# Define installation arguments
ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/
ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# Set apt source
RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \
{ \
echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \
echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \
} > /etc/apt/sources.list
# Install systemctl
RUN apt-get update && \
apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
apt-get clean
# Install tini
RUN apt-get update && \
apt-get install -y tini aria2 libfreeimage3 libfreeimage-dev zlib1g htop && \
apt-get clean
# Change pip source
RUN pip config set global.index-url "${PIP_INDEX}" && \
pip config set global.extra-index-url "${PIP_INDEX}" && \
python -m pip install --upgrade pip
# Uninstall nv-pytorch fork
RUN pip uninstall -y torch torchvision torchaudio \
pytorch-quantization pytorch-triton torch-tensorrt \
xgboost transformer_engine flash_attn apex megatron-core grpcio
RUN pip install --resume-retries 999 --no-cache-dir torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
# Install flash-attn-2.8.0.post2 (the cxx11abi flag is detected from the installed torch)
RUN ABI_FLAG=$(python -c "import torch; print('TRUE' if torch._C._GLIBCXX_USE_CXX11_ABI else 'FALSE')") && \
URL="https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2+cu12torch2.7cxx11abi${ABI_FLAG}-cp312-cp312-linux_x86_64.whl" && \
FILE="flash_attn-2.8.0.post2+cu12torch2.7cxx11abi${ABI_FLAG}-cp312-cp312-linux_x86_64.whl" && \
wget -nv "${URL}" && \
pip install --no-cache-dir "${FILE}"
# Fix packages
RUN pip uninstall -y pynvml nvidia-ml-py && \
pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
# Install cudnn
RUN aria2c --max-tries=9999 https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \
cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && \
apt-get -y install cudnn-cuda-12 && \
rm cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
# Install Apex
RUN pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" --resume-retries 999 git+https://github.com/NVIDIA/apex.git
# Profiling tools
RUN aria2c --always-resume=true --max-tries=99999 https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_3/nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \
apt-get update && apt-get install -y libxcb-cursor0
RUN apt-get install -y ./nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \
rm -rf /usr/local/cuda/bin/nsys && \
ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys /usr/local/cuda/bin/nsys && \
rm -rf /usr/local/cuda/bin/nsys-ui && \
ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \
rm nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb
RUN pip install --resume-retries 999 --no-cache-dir "tensordict==0.6.2" torchdata "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas cuda-bindings \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pre-commit ruff
# Reset pip config
RUN pip config unset global.index-url && \
pip config unset global.extra-index-url

@ -0,0 +1,26 @@
# verl image with verl v0.5
## Important package versions
```txt
cuda==12.8
cudnn==9.8.0
torch==2.7.1
flash_attn==2.8.0
sglang==0.4.8
transformer_engine==2.5
megatron.core==core_r0.13.0
nvidia-cudnn-cu12==9.8.0.87
```
## Target
- Base image:
- `verlai/verl:base-verl0.5-preview-cu128-cudnn9.8-torch2.7.1-fa2.8.0`: base image with FlashInfer 0.2.6.post1 built in
- App image:
- `verlai/verl:app-verl0.5-preview-sglang0.4.8-mcore0.13.0-preview`
- vLLM: the latest version is temporarily not supported
## !!!Notice!!!
- pyext is no longer maintained and does not work with Python 3.12; consider using a replacement and deprecating this package.

@ -0,0 +1,4 @@
FROM verlai/verl:base-verl0.6-cu128-cudnn9.8-torch2.8.0-fa2.7.4
RUN pip install --no-cache-dir "sglang[all]==0.5.2"
RUN pip install --no-cache-dir "torch-memory-saver==0.0.9rc1"

@ -0,0 +1,108 @@
# Start from the NVIDIA official image (ubuntu-24.04 + cuda-12.8 + python-3.12)
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-03.html
FROM nvcr.io/nvidia/pytorch:25.03-py3
# Define environments
ENV MAX_JOBS=32
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV PIP_ROOT_USER_ACTION=ignore
ENV HF_HUB_ENABLE_HF_TRANSFER="1"
ENV PIP_CONSTRAINT=""
ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# Change pip source
RUN pip config set global.index-url "${PIP_INDEX}" && \
pip config set global.extra-index-url "${PIP_INDEX}" && \
pip config set global.no-cache-dir "true" && \
python -m pip install --upgrade pip
# Install systemctl
RUN apt-get update && \
apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
apt-get clean
# Install libxml2
RUN apt-get update && \
apt-get install -y libxml2 aria2 && \
apt-get clean
# Uninstall nv-pytorch fork
RUN pip uninstall -y torch torchvision torchaudio \
pytorch-quantization pytorch-triton torch-tensorrt \
transformer_engine flash_attn apex megatron-core \
xgboost opencv grpcio
# Fix packages
RUN pip install --no-cache-dir tensordict torchdata "transformers[hf_xet]==4.55.4" accelerate datasets peft hf-transfer \
"numpy<2.0.0" "pyarrow>=19.0.1" pandas \
ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \
pytest py-spy pre-commit ruff
# Fix cv2
RUN rm -rf /usr/local/lib/python3.12/dist-packages/cv2
# Install torch
RUN pip install --no-cache-dir torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128
# Install flash-attn
RUN pip install --no-cache-dir --no-build-isolation flash_attn==2.7.4.post1
# Install DeepEP
# the dependency of IBGDA
RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so
# Clone and build deepep and deepep-nvshmem
RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \
git clone https://github.com/deepseek-ai/DeepEP.git && \
cd DeepEP && git checkout a84a248
# Prepare nvshmem
RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \
tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \
cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch
## Build deepep-nvshmem
RUN apt-get install -y ninja-build cmake
ENV CUDA_HOME=/usr/local/cuda
### Set MPI environment variables. Errors occur when these are not set.
ENV CPATH=/usr/local/mpi/include:$CPATH
ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH
ENV GDRCOPY_HOME=/workspace/gdrcopy
ENV GDRCOPY_INCLUDE=/workspace/gdrcopy/include
RUN cd deepep-nvshmem && \
NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_MPI_SUPPORT=0 \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_PMIX_SUPPORT=0 \
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
NVSHMEM_USE_GDRCOPY=1 \
cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install
ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install
ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH
ENV PATH=$NVSHMEM_DIR/bin:$PATH
## Build deepep
RUN cd DeepEP && \
python setup.py install
# Install Apex
RUN pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git
# Install TransformerEngine
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1
# Install Megatron-LM
RUN git clone -b core_v0.13.0 https://github.com/NVIDIA/Megatron-LM.git && \
cd Megatron-LM && pip3 install --no-deps -e .
# Install mbridge
RUN pip3 install --no-cache-dir git+https://github.com/ISEEKYAN/mbridge.git

@ -0,0 +1,15 @@
FROM nvcr.io/nvidia/nemo:25.07.gpt_oss
RUN git clone -b v0.11.0 --depth 1 https://github.com/vllm-project/vllm.git /opt/vllm
RUN pip install setuptools_scm
RUN cd /opt/vllm && pip install --no-deps --no-build-isolation --no-cache-dir -e .
RUN pip install cbor2 setproctitle blake3 openai_harmony pybase64 msgspec partial_json_parser py-cpuinfo diskcache gguf
RUN pip install --upgrade transformers tokenizers
RUN pip install codetiming tensordict mathruler pylatexenc
RUN pip3 install --no-cache-dir mbridge

@ -1,9 +1,12 @@
# verl documents
# verl documentation
## Build the docs
```bash
# Install dependencies.
# To view the auto-generated API docstrings, make sure verl is importable from your Python path. For instance, install verl via:
# pip install -e "..[test]"
# Install dependencies needed for building docs.
pip install -r requirements-docs.txt
# Build the docs.
@ -16,4 +19,4 @@ make html
```bash
python -m http.server -d _build/html/
```
Launch your browser and navigate to http://localhost:8000 to view the documentation.
Launch your browser and navigate to http://localhost:8000 to view the documentation. Alternatively, drag the file `_build/html/index.html` into your browser to view it directly.
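
If you would rather start the preview server from a Python script, the standard library offers the same functionality as the `python -m http.server` command above. This is only a sketch: `make_docs_server` and `serve_docs` are illustrative names, and unlike the command-line form it binds to the loopback interface only.

```python
import functools
import http.server


def make_docs_server(directory="_build/html", port=8000):
    """Build an HTTP server that serves the rendered docs from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to loopback only so the docs are not exposed on the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve_docs(directory="_build/html", port=8000):
    """Serve the docs until interrupted with Ctrl+C."""
    server = make_docs_server(directory, port)
    print(f"Serving {directory} at http://127.0.0.1:{server.server_address[1]}")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling `serve_docs()` from the `docs/` directory after `make html` should behave like the shell command; passing `port=0` lets the operating system pick a free port.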

@ -1,8 +1,10 @@
# Upgrading to vllm >= 0.7
Note: verl+vllm 0.8.3 is now stable. Please see ``docs/README_vllm0.8.md`` for the upgrade guide.
## Installation
Note: This version of veRL+vllm 0.7+ supports **FSDP** for training and **vLLM** for rollout.
Note: At the time of writing, verl+vllm 0.7.x supports **FSDP** for training and **vLLM** for rollout.
```
# Create the conda environment
@ -47,11 +49,11 @@ After installation, examples using FSDP as training backends can be used. By def
```
actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=False \
actor_rollout_ref.rollout.free_cache_engine=True \
```
For a typical job like examples/ppo_trainer/run_qwen2-7b_seq_balance.sh, the rollout generation time is 115 seconds with vLLM0.6.3, while it is 85 seconds with vLLM0.7.0. By enabling the cudagraph, the generation duration is further reduced to 62 seconds.
For a typical job like examples/ppo_trainer/run_qwen2-7b_seq_balance.sh, the rollout generation time is 85 seconds with vLLM0.7.0. By enabling the cudagraph, the generation duration is further reduced to 62 seconds.
**Note:** Currently, with vLLM>=0.7 and the V0 engine, setting `n` greater than 1 in `SamplingParams` can make rollout generation time unstable: some iterations see bursts in generation time.
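
To put the rollout timings quoted above in relative terms, the short calculation below derives the speedup and the share of time saved. It uses only the three figures from the text (115 s, 85 s, 62 s); the helper names are illustrative.

```python
# Rollout generation times quoted in the text, in seconds.
TIMINGS = {
    "vLLM 0.6.3": 115.0,
    "vLLM 0.7.0": 85.0,
    "vLLM 0.7.0 + CUDA graph": 62.0,
}


def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return before / after


def reduction_pct(before, after):
    """Percentage of wall-clock time saved going from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    base = TIMINGS["vLLM 0.7.0"]
    graph = TIMINGS["vLLM 0.7.0 + CUDA graph"]
    print(
        f"CUDA graph: {speedup(base, graph):.2f}x faster, "
        f"{reduction_pct(base, graph):.0f}% less time"
    )
```

For this job, enabling CUDA graph on vLLM 0.7.0 works out to roughly a 1.37x speedup, or about 27% less generation time.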
@ -68,4 +70,4 @@ VLLM_USE_PRECOMPILED=1 pip install --editable .
```
Then you can enable the V1 engine by setting `export VLLM_USE_V1=1`. In some benchmark tests, the V1 engine demonstrates a 1.5x speed improvement over the vLLM V0 engine.
The stable support of the vLLM V1 engine will come soon.
Stable support for the vLLM V1 engine is available on verl main.

@ -1,8 +1,10 @@
# Upgrading to vLLM >= 0.8
Last updated: 05/04/2025.
## Installation
Note: This version of veRL+vLLM 0.8+ supports **FSDP** for training and **vLLM** for rollout.
Note: This version of verl+vLLM 0.8+ supports **FSDP** for training and **vLLM** for rollout.
```bash
# Create the conda environment
@ -15,34 +17,30 @@ cd verl
pip3 install -e .
# Install the latest stable version of vLLM
pip3 install vllm==0.8.2
pip3 install vllm==0.8.3
# Install flash-attn
pip3 install flash-attn --no-build-isolation
```
We have a pre-built docker image for veRL+vLLM 0.8.2. You can direct import it with the following command:
We have a pre-built docker image for verl+vLLM 0.8.3. You can pull it directly with the following command:
```bash
docker pull hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.2
docker pull hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.3-flashinfer0.2.2-cxx11abi0
```
## Features
vLLM 0.8+ supports cuda graph and V1 engine by default in veRL. To enable these features, remember to add the following lines to the bash script:
vLLM 0.8+ supports cuda graph and V1 engine by default in verl. To enable these features, remember to add the following lines to the bash script:
```bash
actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=False \
actor_rollout_ref.rollout.free_cache_engine=True \
```
and also **remove** the environment variable if it exists:
```bash
export VLLM_ATTENTION_BACKEND=XFORMERS
```
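
If training is launched from a Python wrapper rather than a shell, the same cleanup can be done on the environment passed to the child process. This is a minimal sketch under that assumption; `rollout_env` is a hypothetical helper, not a verl API.

```python
import os


def rollout_env(base=None):
    """Copy an environment mapping and drop the legacy attention-backend override."""
    env = dict(os.environ if base is None else base)
    # pop() with a default is a no-op when the variable was never set.
    env.pop("VLLM_ATTENTION_BACKEND", None)
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when starting the training script, which leaves the parent shell's environment untouched.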
## Notes
If you upgrade directly to vllm>=0.8, some dependency packages may change version. If you encounter the following problems:

docs/_static/custom.css vendored Normal file
@ -0,0 +1,217 @@
/* Make the documentation use full screen width */
.wy-nav-content {
max-width: none !important;
width: 100% !important;
padding: 1.618em 3.236em !important;
}
/* Adjust the content wrapper - will be set by JavaScript */
.wy-nav-content-wrap {
margin-left: 300px;
transition: margin-left 0.2s ease;
width: auto !important;
position: relative !important;
background: white !important;
min-height: 100vh !important;
}
/* Make the main content area responsive */
.rst-content {
max-width: none !important;
width: 100% !important;
}
/* Optional: Adjust table widths to prevent overflow */
.rst-content table.docutils {
width: 100% !important;
table-layout: auto !important;
}
/* Optional: Better code block width handling */
.rst-content .highlight {
width: 100% !important;
}
/* Content area positioning already handled above */
/* Optional: Improve readability with some margin on very wide screens */
@media (min-width: 1400px) {
.wy-nav-content {
max-width: none !important;
margin: 0 auto !important;
}
}
/* Resizable sidebar styles */
.wy-nav-side {
position: fixed !important;
top: 0 !important;
bottom: 0 !important;
left: 0 !important;
width: 300px;
min-width: 200px;
max-width: 600px;
display: flex;
flex-direction: column;
z-index: 200 !important;
}
/* Ensure sidebar header (logo, search) adapts to width */
.wy-side-nav-search {
width: 100% !important;
box-sizing: border-box !important;
padding: 0.809em 0.809em !important;
}
.wy-side-nav-search input[type="text"] {
width: 100% !important;
box-sizing: border-box !important;
}
/* Make logo/title area responsive */
.wy-side-nav-search > div.version {
width: 100% !important;
}
.wy-side-nav-search > a {
width: 100% !important;
display: block !important;
white-space: nowrap !important;
overflow: hidden !important;
text-overflow: ellipsis !important;
}
/* Responsive adjustments for narrow sidebar */
@media (max-width: 300px) {
.wy-side-nav-search > a {
font-size: 0.9em !important;
}
.wy-side-nav-search input[type="text"] {
font-size: 0.8em !important;
}
}
/* Ensure search input doesn't overflow */
.wy-side-nav-search form {
width: 100% !important;
margin: 0 !important;
}
/* Make search icon responsive */
.wy-side-nav-search .wy-dropdown {
width: 100% !important;
}
/* Adjust search results dropdown width */
.wy-side-nav-search .wy-dropdown-menu {
width: 100% !important;
max-width: none !important;
left: 0 !important;
right: 0 !important;
}
/* Resize handle is created by JavaScript */
/* Make sure the sidebar content doesn't overflow */
.wy-side-scroll {
width: 100% !important;
flex: 1 !important;
overflow-y: auto !important;
overflow-x: hidden !important;
padding-right: 10px !important;
box-sizing: border-box !important;
scroll-behavior: auto !important; /* Prevent smooth scrolling on sidebar itself */
}
/* Ensure proper scroll behavior for main content area */
html {
scroll-behavior: smooth !important;
}
/* Ensure anchor links work properly in main content */
.wy-nav-content-wrap {
scroll-behavior: smooth !important;
}
/* Fix scroll to target for anchor links */
.rst-content {
scroll-behavior: smooth !important;
}
/* Fix anchor scroll offset to account for fixed header */
.rst-content .section {
scroll-margin-top: 60px;
}
/* Fix anchor scroll offset for headers */
.rst-content h1, .rst-content h2, .rst-content h3, .rst-content h4, .rst-content h5, .rst-content h6 {
scroll-margin-top: 60px;
}
/* Fix anchor scroll offset for specific scroll targets */
.rst-content .headerlink {
scroll-margin-top: 60px;
}
/* Fix sidebar navigation styling */
.wy-menu-vertical {
width: 100% !important;
}
.wy-menu-vertical li {
width: 100% !important;
}
.wy-menu-vertical a {
width: 100% !important;
word-wrap: break-word !important;
white-space: normal !important;
}
/* Content area margin is handled by JavaScript */
/* Custom drag handle (more visible) */
.resize-handle {
position: absolute;
top: 0;
right: 0;
width: 8px;
height: 100%;
background: #ccc;
cursor: col-resize;
z-index: 1001;
opacity: 0.3;
transition: opacity 0.2s ease;
}
.resize-handle:hover {
opacity: 0.8;
background: #999;
}
.resize-handle::before {
content: '';
position: absolute;
top: 50%;
left: 50%;
width: 2px;
height: 20px;
background: #666;
transform: translate(-50%, -50%);
border-radius: 1px;
}
.resize-handle:hover::before {
background: #333;
}
/* Ensure smooth resizing */
.wy-nav-side.resizing {
user-select: none;
pointer-events: none;
}
.wy-nav-side.resizing .wy-side-scroll {
overflow: hidden;
}

docs/_static/js/resizable-sidebar.js vendored Normal file
@ -0,0 +1,251 @@
// Resizable sidebar functionality
document.addEventListener('DOMContentLoaded', function() {
const sidebar = document.querySelector('.wy-nav-side');
const content = document.querySelector('.wy-nav-content-wrap');
if (!sidebar || !content) return;
// Create resize handle
const resizeHandle = document.createElement('div');
resizeHandle.className = 'resize-handle';
sidebar.appendChild(resizeHandle);
let isResizing = false;
let startX = 0;
let startWidth = 0;
// Get initial width
const getInitialWidth = () => {
return 300; // Default width
};
// Save width to localStorage
const saveWidth = (width) => {
localStorage.setItem('sidebar-width', width);
};
// Load width from localStorage
const loadWidth = () => {
const savedWidth = localStorage.getItem('sidebar-width');
if (savedWidth) {
const width = parseInt(savedWidth, 10);
if (width >= 200 && width <= 600) {
return width;
}
}
return getInitialWidth();
};
// Apply width to sidebar and content
const applyWidth = (width) => {
// Update sidebar width
sidebar.style.width = width + 'px';
// Update content margin with !important to override any CSS
content.style.setProperty('margin-left', width + 'px', 'important');
// Also update any other content wrapper that might exist
const contentInner = document.querySelector('.wy-nav-content');
if (contentInner) {
contentInner.style.setProperty('margin-left', '0px', 'important');
}
// Force reflow and repaint
sidebar.offsetHeight;
content.offsetHeight;
// Trigger window resize event to notify other components
window.dispatchEvent(new Event('resize'));
};
// Initialize with saved width
const initialWidth = loadWidth();
applyWidth(initialWidth);
// Mouse down on resize handle
resizeHandle.addEventListener('mousedown', (e) => {
isResizing = true;
startX = e.clientX;
startWidth = parseInt(window.getComputedStyle(sidebar).width, 10);
sidebar.classList.add('resizing');
document.body.style.cursor = 'col-resize';
document.body.style.userSelect = 'none';
// Add overlay to prevent iframe issues
const overlay = document.createElement('div');
overlay.style.cssText = `
position: fixed;
top: 0;
left: 0;
width: 100%;
height: 100%;
z-index: 9999;
cursor: col-resize;
`;
overlay.id = 'resize-overlay';
document.body.appendChild(overlay);
e.preventDefault();
});
// Mouse move
document.addEventListener('mousemove', (e) => {
if (!isResizing) return;
const width = startWidth + e.clientX - startX;
const clampedWidth = Math.max(200, Math.min(600, width));
applyWidth(clampedWidth);
});
// Mouse up
document.addEventListener('mouseup', () => {
if (!isResizing) return;
isResizing = false;
sidebar.classList.remove('resizing');
document.body.style.cursor = '';
document.body.style.userSelect = '';
// Remove overlay
const overlay = document.getElementById('resize-overlay');
if (overlay) {
overlay.remove();
}
// Save the current width
const currentWidth = parseInt(window.getComputedStyle(sidebar).width, 10);
saveWidth(currentWidth);
});
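// No window 'resize' handler is registered here: applyWidth() dispatches a
// synthetic 'resize' event, so a handler that called applyWidth() would loop forever.
// The sidebar width is set only by dragging (or double-click reset), so nothing needs recalculating.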
// Double-click to reset to default width
resizeHandle.addEventListener('dblclick', () => {
const defaultWidth = 300;
applyWidth(defaultWidth);
saveWidth(defaultWidth);
});
});
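The width handling in the resize script above can be separated from the DOM, which makes the clamping and fallback rules easier to check. The following is a minimal sketch, not part of the original script: the constant names and the functions `clampWidth`, `parseSavedWidth`, and `widthDuringDrag` are hypothetical, and the 300px fallback assumes `getInitialWidth()` returns its default.

```javascript
// Hypothetical constants mirroring the literals used in the script above.
const MIN_WIDTH = 200;
const MAX_WIDTH = 600;
const DEFAULT_WIDTH = 300; // assumption: getInitialWidth() returns its default

// Clamp a dragged width into the allowed range.
function clampWidth(width) {
  return Math.max(MIN_WIDTH, Math.min(MAX_WIDTH, width));
}

// Turn a stored string (or null) into a usable width.
// Out-of-range or malformed values are not clamped; they fall back,
// which matches how loadWidth() treats localStorage contents.
function parseSavedWidth(saved, fallback = DEFAULT_WIDTH) {
  const width = parseInt(saved, 10);
  return width >= MIN_WIDTH && width <= MAX_WIDTH ? width : fallback;
}

// Width during a drag: starting width plus horizontal pointer travel, clamped.
function widthDuringDrag(startWidth, startX, clientX) {
  return clampWidth(startWidth + clientX - startX);
}
```

Keeping these as pure functions means the 200-600px rule lives in one place instead of being repeated in the load path and the mousemove handler.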
// Fix navigation issues - Using MutationObserver for reliable initialization
document.addEventListener('DOMContentLoaded', function() {
let navigationFixed = false;
function setupNavigationFix() {
if (navigationFixed) return;
// Find all links in the sidebar
const sidebarLinks = document.querySelectorAll('.wy-menu-vertical a');
// Only proceed if we have sidebar links
if (sidebarLinks.length === 0) return;
console.log('Setting up navigation fix...');
sidebarLinks.forEach(function(link) {
const href = link.getAttribute('href');
// Clone the link to remove all existing event listeners
const newLink = link.cloneNode(true);
// Add our own click handler
newLink.addEventListener('click', function(e) {
console.log('Link clicked:', href);
// If it's an anchor link within the same page
if (href && href.startsWith('#') && href !== '#') {
e.preventDefault();
e.stopPropagation();
const targetId = href.substring(1);
const targetElement = document.getElementById(targetId);
if (targetElement) {
// Calculate offset for fixed header
const headerHeight = 60;
const elementPosition = targetElement.getBoundingClientRect().top;
const offsetPosition = elementPosition + window.pageYOffset - headerHeight;
window.scrollTo({
top: offsetPosition,
behavior: 'smooth'
});
// Update URL hash
if (history.pushState) {
history.pushState(null, null, '#' + targetId);
} else {
location.hash = '#' + targetId;
}
}
}
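// For all other links (not in-page anchors or javascript: URLs), navigate explicitly.
// Modified clicks and links with a target are left to the browser so that
// "open in new tab" does not also navigate the current page.
else if (href && !href.startsWith('#') && !href.startsWith('javascript:')) {
if (e.ctrlKey || e.metaKey || e.shiftKey || e.altKey || newLink.target) return;
console.log('Navigating to link:', href);
window.location.href = href;
}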
});
// Replace the old link with the new one
link.parentNode.replaceChild(newLink, link);
});
navigationFixed = true;
// Handle initial page load with hash
if (window.location.hash) {
// Wait for the next animation frame so layout has settled before measuring the target position
requestAnimationFrame(() => {
const targetId = window.location.hash.substring(1);
const targetElement = document.getElementById(targetId);
if (targetElement) {
const headerHeight = 60;
const elementPosition = targetElement.getBoundingClientRect().top;
const offsetPosition = elementPosition + window.pageYOffset - headerHeight;
window.scrollTo({
top: offsetPosition,
behavior: 'smooth'
});
}
});
}
}
// Try to set up navigation fix immediately
setupNavigationFix();
// If it didn't work, use MutationObserver to watch for when sidebar links are added
if (!navigationFixed) {
const observer = new MutationObserver(function(mutations) {
// Only react to batches that added nodes; setupNavigationFix() checks for sidebar links itself
const nodesAdded = mutations.some(function(mutation) {
return mutation.type === 'childList' && mutation.addedNodes.length > 0;
});
if (!nodesAdded) return;
setupNavigationFix();
if (navigationFixed) {
observer.disconnect();
}
});
// Start observing the document for changes
observer.observe(document.body, {
childList: true,
subtree: true
});
// Fallback: after 5 seconds, make one last attempt and stop observing whether or not it succeeded
setTimeout(function() {
if (!navigationFixed) {
setupNavigationFix();
}
observer.disconnect();
}, 5000);
}
});
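The branching in the click handler above, and the scroll position it computes, can be sketched as two pure functions. This is an illustrative sketch rather than code from the original script: `classifyHref` and `anchorScrollTop` are hypothetical names, and the 60px default simply reuses the hard-coded `headerHeight` from the handler.

```javascript
// Hypothetical helper: decide what the click handler does with an href.
// Returns 'scroll' for in-page anchors, 'navigate' for ordinary links,
// and 'ignore' for a missing href, a bare '#', or a javascript: URL.
function classifyHref(href) {
  if (!href) return 'ignore';
  if (href.startsWith('#')) return href === '#' ? 'ignore' : 'scroll';
  if (href.startsWith('javascript:')) return 'ignore';
  return 'navigate';
}

// Hypothetical helper: absolute scroll position that leaves room for a fixed header.
// rectTop is getBoundingClientRect().top (relative to the viewport),
// scrollY is the current vertical scroll offset of the page.
function anchorScrollTop(rectTop, scrollY, headerHeight = 60) {
  return rectTop + scrollY - headerHeight;
}
```

With helpers like these, the click handler and the initial-hash handling could share `anchorScrollTop` instead of repeating the same three lines of arithmetic.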

Some files were not shown because too many files have changed in this diff