Comparing 44ac51fa2b...061535208c - verl

mirror of https://github.com/volcengine/verl.git synced 2025-10-20 13:43:50 +08:00

Author	SHA1	Message	Date
HEJIAN SANG	061535208c	[recipe] feat: Add example for gpt-oss training using agent loop (#3774 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test TODO: run training test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Hejian Sang <hsang@linkedin.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-15 16:45:11 +08:00
Chi Zhang	55f651c94d	[misc] feat: bump version to 0.7.0.dev (#3772 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-15 13:40:12 +08:00
Chi Zhang	22d082f9a4	[recipe] feat: add open math reasoning (#3767 ) ### What does this PR do? - Add open math reasoning recipe using sft trainer with model engine - Support setting none to val dataset in sft trainer - Fix main_eval - Using aiohttp for main_generation_server to avoid hang in AsyncOpenAI ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-15 12:11:41 +08:00
Chi Zhang	8ec9bf64a1	[ci] fix: fix test_engine ci (#3771 ) ### What does this PR do? - fix test_engine ci for latest transformers ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-15 12:11:17 +08:00
Chi Zhang	231d725f69	Revert "[trainer] feat: set interleave to False in dapo trainer" (#3770 ) Reverts volcengine/verl#3760	2025-10-15 11:41:33 +08:00
Chi Zhang	d69164e1cb	[misc] feat: bump version to 0.6.0.dev (#3768 ) ### What does this PR do? - Bump version ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-15 10:47:13 +08:00
Liu Yue	2181d5b33a	[recipe] fix: update readme for gmpo-trainer (#3764 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: 刘悦 <liuyue127@xiaohongshu.com>	2025-10-15 10:24:24 +08:00
Yan Bai	33eb86f54f	[megatron] feat: support qwen3vl (#3763 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. support training qwen3vl with megatron 1. add an image with vllm0.11 and nemo's dedicated megatron that support gpt-oss with optimized fused kernels. 2. add a script of training qwen3vl-30b with megatron 3. necessary changes to support qwen3vl megatron. (just register forward functions, the modeling is through mbridge) ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. <img width="372" height="314" alt="image" src="https://github.com/user-attachments/assets/f1126e46-51a9-4e00-958f-5d034b8f94bd" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-15 10:19:22 +08:00
jiaqiw09	67f9a21b8e	[trainer] feat: set interleave to False in dapo trainer (#3760 ) ### What does this PR do? Set interleave to False. This way, during inference, if rollout.n is set to a large value, it can prevent multiple identical samples from being run on the same instance, which would otherwise lead to excessive inference overhead. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-14 21:13:57 +08:00
Sanxing Chen	d2c51dc186	Add Meta-Bandit-LLM, a long-horizon multiturn interative awesome use case of verl (#3756 ) [Meta-Bandit-LLM](https://github.com/sanxing-chen/meta-bandit-llm/) utilizes verl to train on-policy LLM agent with up to 50-turn interations, with support of async vLLM and LoRA. ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-14 12:01:13 +08:00
凪	16c2a21064	Add ARES and Revisual-R1 two awesome multimodal reasoning work using verl. (#3755 ) …verl to project list ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-14 10:51:32 +08:00
KAMiPan	3abcc09d44	[sglang, recipe] feat: add SGLang as rollout engine for one-step-off-policy (#3531 ) ### What does this PR do? This PR extends the one-step-off-policy recipe by adding SGLang as an alternative rollout engine to vLLM, allowing flexible backend selection and improving training efficiency. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pull/3460 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test To validate this solution, we adopted the existing experimental configuration from the recipe one-step-off-policy. The evaluation demonstrates that the proposed SGLang rollout engine integration achieves effective acceleration in one-step-off-policy asynchronous training, providing users with enhanced rollout engine options for diverse deployment scenarios. Experimental Results - Machine Configuration: 2 nodes with 16 H20 GPUs each - Generation: 4 GPUs - Training: 12 GPUs - Model: Qwen2.5-Math-7B - Max Response Length: 8,192 tokens - Algorithm: DAPO - Rollout Engine: vLLM, SGLang \| training mode \| engine \| step \| gen \| wait_prev_gen \| generate_sequences \| old_log_prob \| update_actor \| total time \| acc/best@32/mean \| acc/maj@32/mean \| \|------------------------\|----------------\|------\|-----\|---------------\|--------------------\|--------------\|--------------\|---------------\|------------------\|-----------------\| \| colocate sync \| SGLang+FSDP2 \| 452 \| 131 \| - \| 125 \| 54 \| 199 \| 12h25m \| 0.6560 \| 0.4471 \| \| one-step-overlap async \| SGLang+FSDP2 \| 406 \| - \| 12 \| 305 \| 58 \| 245 \| 11h12m (+11%) \| 0.6303 \| 0.4443 \| * colocate sync: step ≈ gen + old_log_prob + update_actor * one-step-overlap async: step ≈ max(wait_prev_gen + generate_sequences, old_log_prob + update_actor) <img width="1218" height="777" alt="image" src="https://github.com/user-attachments/assets/58734164-2534-492f-bf00-1e80faae0fe7" /> ### API and Usage Example Configuration Example ```bash # Using SGLang engine python3 -m recipe.one_step_off_policy.main_ppo \ actor_rollout_ref.rollout.name=sglang \ # ... other configuration parameters # Using vLLM engine python3 -m recipe.one_step_off_policy.main_ppo \ actor_rollout_ref.rollout.name=vllm \ # ... other configuration parameters ``` Script Usage ```bash # Using SGLang engine bash dapo_7b_math_fsdp2_sglang_4_12.sh bash dapo_7b_math_fsdp2_sglang_colocate.sh # Using vLLM engine bash dapo_7b_math_fsdp2_4_12.sh bash dapo_7b_math_fsdp2_colocate.sh ``` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: wuxibin <wuxibin@bytedance.com>	2025-10-14 10:48:29 +08:00
Yingru Li	5d378b5f95	[rollout] refactor: rename "clip" mode back to "mask" mode (#3750 ) # Rollout Importance Sampling Framework related to https://github.com/volcengine/verl/pull/3694 ## Summary This PR introduces a comprehensive Rollout Importance Sampling (IS) framework to correct distribution mismatch between data-collecting (rollout) and training policies, a critical factor for ensuring stable and efficient model training in RL fine-tuning. This work is motivated by the analysis in our blog post, [When Speed Kills Stability: Demystifying RL Collapse from the Inference-Training Mismatch](https://yingru.notion.site/When-Speed-Kills-Stability-271211a558b7808d8b12d403fd15edda). If you find this implementation useful in your research, please consider citing: ```bibtex @misc{liu-li-2025, title = {When Speed Kills Stability: Demystifying RL Collapse from the Inference-Training Mismatch}, url = {https://yingru.notion.site/When-Speed-Kills-Stability-Demystifying-RL-Collapse-from-the-Inference-Training-Mismatch-271211a558b7808d8b12d403fd15edda}, author = {Jiacai Liu and Yingru Li and Yuqian Fu and Jiawei Wang and Qian Liu and Yu Shen}, year = {2025}, month = {September}, } ``` --- ## Problem Statement When using different policies for rollout generation (e.g., vLLM with BFloat16) and training (e.g., FSDP with FP32), distribution mismatch occurs, leading to: - Biased gradient estimates - Training instability and collapse - Reduced sample efficiency - Poor convergence properties This framework addresses these issues through principled importance sampling correction. --- ## Key Features & Improvements ### 1. Flexible Aggregation Levels Three methods for calculating IS weights: - `token`: Per-token importance ratios - `sequence`: Product of per-token ratios - `geometric`: Geometric mean of ratios ### 2. Advanced Bounding Modes Two strategies to control weight variance: - `truncate` (TIS): Caps weights at upper threshold only, preserving gradients - `mask` (MIS): Zeros out weights outside bounds, more aggressive filtering ### 3. Comprehensive Diagnostics Detailed metrics to monitor distribution mismatch and training health: Rollout IS Metrics (automatically prefixed with `mismatch/`): - Health indicators: `rollout_is_eff_sample_size`, `rollout_is_mean` - Distribution statistics: `rollout_is_p25`, `rollout_is_p50`, `rollout_is_p75`, `rollout_is_p95`, `rollout_is_p99`, `rollout_is_max`, `rollout_is_min`, `rollout_is_std` - Diagnostics: `rollout_is_veto_fraction`, `rollout_is_catastrophic_token_fraction`, `rollout_is_masked_fraction` (mask mode) - Sequence-level statistics (for sequence/geometric modes): `rollout_is_seq_mean`, `rollout_is_seq_std`, `rollout_is_seq_max`, `rollout_is_seq_min`, etc. Mismatch Metrics (computed efficiently within IS weight computation): - KL Divergence: `mismatch_kl` (forward KL), `mismatch_k3_kl` (K3 estimator for stability) - Perplexity: `mismatch_training_ppl`, `mismatch_rollout_ppl`, `mismatch_ppl_ratio` - Log perplexity statistics: `mismatch_log_ppl_diff`, `mismatch_log_ppl_abs_diff`, `mismatch_log_ppl_diff_max`, `mismatch_log_ppl_diff_min` ### 4. Outlier Mitigation - Veto mechanism: Automatically discards samples with catastrophic importance weights (per-token ratios below threshold) - Prevents gradient corruption from extreme outliers - Configurable threshold (default: 1e-4) ### 5. Numerical Stability - All core computations in log-space to prevent underflow/overflow - Carefully designed clamping and bounding to maintain numerical precision - Safe handling of edge cases (zero probabilities, extreme ratios) ### 6. Memory Efficiency - Optimized computation to minimize CUDA memory usage - Efficient metric aggregation without large intermediate tensors - Suitable for large-scale distributed training ### 7. Metrics-Only Mode - Compute and monitor mismatch metrics without applying IS weights - Useful for: - Understanding distribution mismatch before intervention - Deciding whether IS correction is needed - A/B testing IS impact - Controlled by `algorithm.rollout_is` flag (independent of weight computation) ### 8. Universal PPO Support - Integrated with all PPO variants: vanilla, GSPO, GPG, Clip-Cov, KL-Cov, geo_mean - Consistent interface across different policy loss functions - Automatic weight application when enabled --- ## API and Configuration Changes ### Migration from Legacy TIS #### ❌ Before (REMOVED) ```yaml # Old TIS configuration - NO LONGER SUPPORTED actor_rollout_ref: actor: tis_imp_ratio_cap: 2.0 # Removed from actor config ``` The legacy implementation: - Only supported token-level truncation - No metrics tracking - Lacked numerical stability - Limited configurability #### ✅ After (New Framework) Configuration moved to `algorithm` section for better organization: ```yaml algorithm: # Main on/off switch: null = disabled, float = enabled rollout_is_threshold: 2.0 # Control weight application (independent of metrics computation) rollout_is: true # true = apply weights, false = metrics only # Optional: lower threshold (defaults to 1/upper if null) rollout_is_threshold_lower: null # Aggregation level: "token", "sequence", or "geometric" rollout_is_level: token # Bounding mode: "truncate" or "mask" rollout_is_mode: truncate # Veto threshold for catastrophic outliers (null = disabled) rollout_is_veto_threshold: 1e-4 # REQUIRED: Enable log probability calculation actor_rollout_ref: rollout: calculate_log_probs: true ``` ### Configuration Examples 1. Token-level truncation (recommended starting point) ```yaml algorithm: rollout_is_threshold: 2.0 rollout_is: true rollout_is_level: token rollout_is_mode: truncate ``` 2. Sequence-level masking (more aggressive) ```yaml algorithm: rollout_is_threshold: 2.0 rollout_is: true rollout_is_level: sequence rollout_is_mode: mask ``` 3. Metrics-only mode (monitoring without correction) ```yaml algorithm: rollout_is_threshold: 2.0 rollout_is: false # Compute metrics but don't apply weights rollout_is_level: token rollout_is_mode: truncate ``` Example script: `bash examples/rollout_importance_sampling/run_with_rollout_is.sh` --- ## Code Changes Overview ### New Files (4 files, 1,442 lines) 1. `verl/trainer/ppo/mismatch_helper.py` (459 lines) - Core implementation of IS weight computation - Three aggregation levels: token, sequence, geometric - Two bounding modes: truncate, mask - Veto mechanism for outlier detection - Comprehensive metrics computation (IS + mismatch) - All computations in log-space for numerical stability - Memory-efficient design 2. `docs/advance/rollout_is_migration.md` (642 lines) - Comprehensive migration guide from legacy TIS - Detailed explanation of all configuration options - Recommended threshold ranges for each aggregation level - Troubleshooting guide and best practices - Metrics interpretation guide 3. `examples/rollout_importance_sampling/README.md` (242 lines) - Quick start guide with working examples - Configuration templates for common scenarios - Threshold tuning guidelines - Metrics monitoring instructions 4. `examples/rollout_importance_sampling/run_with_rollout_is.sh` (99 lines) - Complete working example script - Demonstrates token-level and sequence-level configurations - Ready to run with minimal modifications ### Modified Core Files (9 files) 1. `verl/trainer/ppo/core_algos.py` (~50 lines changed) - Removed legacy TIS logic (`tis_imp_ratio_cap`) - Added `rollout_is_weights` parameter to all policy loss functions - Unified IS weight application interface across all PPO variants: - `compute_policy_loss_vanilla` - `compute_policy_loss_gspo` - `compute_policy_loss_gpg` - `compute_policy_loss_clip_cov` - `compute_policy_loss_kl_cov` - `compute_policy_loss_geo_mean` - Special handling for `geo_mean` (sequence-level aggregation) 2. `verl/trainer/ppo/ray_trainer.py` (~52 lines added) - New method: `compute_rollout_importance_weights_and_add_to_batch()` - Centralized IS computation (once per batch, on driver) - Conditional weight distribution to workers based on `algorithm.rollout_is` - Metrics collection and aggregation - Integration with existing training loop 3. `verl/trainer/config/algorithm.py` (+18 lines) - Added 6 new Rollout IS parameters: - `rollout_is_threshold` (main on/off switch) - `rollout_is` (weight application control) - `rollout_is_threshold_lower` - `rollout_is_level` - `rollout_is_mode` - `rollout_is_veto_threshold` - Comprehensive docstrings explaining each parameter 4. `verl/workers/config/actor.py` (-1 line) - Removed deprecated `tis_imp_ratio_cap` parameter 5. `verl/workers/actor/dp_actor.py` (~26 lines changed) - Updated to use new `rollout_is_weights` parameter - Removed legacy TIS logic 6. `verl/workers/actor/megatron_actor.py` (~15 lines changed) - Updated to use new `rollout_is_weights` parameter - Removed legacy TIS logic 7. Configuration Files (4 files updated) - `verl/trainer/config/ppo_trainer.yaml` - `verl/trainer/config/ppo_megatron_trainer.yaml` - `verl/trainer/config/_generated_ppo_trainer.yaml` - `verl/trainer/config/_generated_ppo_megatron_trainer.yaml` - Added default Rollout IS configuration section with explanatory comments ### Testing (2 files, 530 lines) 1. `tests/trainer/ppo/test_rollout_is.py` (289 lines) - Unit tests for `mismatch_helper.py` - Coverage for all aggregation levels (token, sequence, geometric) - Coverage for all bounding modes (truncate, mask) - Veto mechanism tests - Edge case handling (zeros, extremes, empty sequences) - Numerical stability verification - Metrics correctness validation 2. `tests/trainer/ppo/test_rollout_is_integration.py` (241 lines) - Integration tests with PPO training loop - End-to-end workflow validation - Batch processing tests - Configuration validation - Metrics collection verification - Compatibility with distributed training ### Updated Recipes (2 files) 1. `recipe/dapo/dapo_ray_trainer.py` (+5 lines) - Updated imports to use new framework 2. `recipe/dapo/run_dapo_qwen2.5_32b_tis.sh` (~42 lines changed) - Migrated from legacy TIS to new Rollout IS configuration - Updated documentation and comments ### Documentation Updates (2 files) 1. `docs/examples/config.rst` (~22 lines changed) - Updated configuration examples - Added Rollout IS section 2. `docs/index.rst` (+1 line) - Added link to Rollout IS migration guide --- ## Implementation Highlights ### Centralized Architecture The new design follows a clean separation of concerns: ``` ray_trainer.py (driver) └─> compute_rollout_importance_weights_and_add_to_batch() └─> mismatch_helper.compute_rollout_importance_weights() ├─> Computes IS weights (token/sequence/geometric) ├─> Applies bounding (truncate/mask) ├─> Veto mechanism for outliers ├─> Computes IS metrics └─> Computes mismatch metrics (KL, PPL) └─> Conditionally adds weights to batch (if rollout_is=True) └─> Distributes batch to workers actor workers (dp_actor, megatron_actor) └─> Receive batch with rollout_is_weights (if enabled) └─> Pass weights to policy loss function core_algos.py └─> All policy loss functions accept rollout_is_weights └─> Apply weights if provided: pg_losses = rollout_is_weights ``` ### Key Design Decisions 1. Centralized Computation: IS weights computed once on driver, not per worker - Reduces redundant computation - Ensures consistency across workers - Simplifies debugging and metrics collection 2. Configuration in Algorithm: Moved from actor config to algorithm config - Better conceptual organization (algorithm-level concern, not worker-level) - Easier to manage and validate - Consistent with other algorithm parameters 3. Two-Level Control: - `rollout_is_threshold`: Enables/disables entire system (null = off) - `rollout_is`: Controls weight application (true = apply, false = metrics only) - Allows flexible monitoring and gradual rollout 4. Metrics Consolidation: Mismatch metrics computed within IS weight computation - Eliminates duplicate computation - Reduces memory overhead - Maintains metric accuracy 5. Universal PPO Support: Single interface for all PPO variants - Minimal code changes required - Consistent behavior across algorithms - Easy to add new variants --- ## Migration Guide ### For Users of Legacy TIS Step 1: Update your configuration file* ```yaml # OLD (remove this) actor_rollout_ref: actor: tis_imp_ratio_cap: 2.0 # NEW (add this) algorithm: rollout_is_threshold: 2.0 # Use same value as old tis_imp_ratio_cap rollout_is: true rollout_is_level: token rollout_is_mode: truncate # REQUIRED (add if not present) actor_rollout_ref: rollout: calculate_log_probs: true ``` Step 2: Monitor metrics The first time you run with the new configuration, check these metrics: - `mismatch/rollout_is_eff_sample_size`: Should be > 80% of batch size - `mismatch/rollout_is_veto_fraction`: Should be < 5% - `mismatch/rollout_is_mean`: Should be close to 1.0 Step 3: Tune if needed If effective sample size is too low: - Increase `rollout_is_threshold` - Try `rollout_is_mode: mask` with appropriate lower bound - Consider `rollout_is_level: sequence` for more aggressive correction For detailed guidance, see `docs/advance/rollout_is_migration.md`. ### For New Users Start with recommended defaults: ```yaml algorithm: rollout_is_threshold: 2.0 rollout_is: true rollout_is_level: token rollout_is_mode: truncate actor_rollout_ref: rollout: calculate_log_probs: true ``` Run the example script to see it in action: ```bash bash examples/rollout_importance_sampling/run_with_rollout_is.sh ``` --- ## Testing ### Unit Tests - 289 lines of comprehensive unit tests in `test_rollout_is.py` - Covers all aggregation levels, bounding modes, and edge cases - Validates numerical stability and correctness - Fast execution (~1-2 seconds) ### Integration Tests - 241 lines of integration tests in `test_rollout_is_integration.py` - End-to-end workflow with PPO training loop - Distributed training compatibility - Metrics collection validation - Moderate execution time (~10-20 seconds) ### Running Tests ```bash # Run all Rollout IS tests pytest tests/trainer/ppo/test_rollout_is.py -v pytest tests/trainer/ppo/test_rollout_is_integration.py -v # Run specific test pytest tests/trainer/ppo/test_rollout_is.py::test_token_level_truncate -v ``` --- ## Metrics Reference ### Rollout IS Metrics (all prefixed with `mismatch/`) \| Metric \| Description \| Ideal Range \| \|--------\|-------------\|-------------\| \| `rollout_is_eff_sample_size` \| Effective number of samples after IS \| > 80% of batch \| \| `rollout_is_mean` \| Mean IS weight \| ~1.0 \| \| `rollout_is_std` \| Standard deviation of IS weights \| Low variance \| \| `rollout_is_p25` \| 25th percentile \| ~0.8-1.0 \| \| `rollout_is_p50` \| Median IS weight \| ~1.0 \| \| `rollout_is_p75` \| 75th percentile \| ~1.0-1.2 \| \| `rollout_is_p95` \| 95th percentile \| < threshold \| \| `rollout_is_p99` \| 99th percentile \| < threshold \| \| `rollout_is_max` \| Maximum weight \| ≤ threshold \| \| `rollout_is_min` \| Minimum weight \| ≥ lower threshold (mask mode) \| \| `rollout_is_veto_fraction` \| % sequences vetoed \| < 5% \| \| `rollout_is_catastrophic_token_fraction` \| % catastrophic tokens \| < 1% \| \| `rollout_is_masked_fraction` \| % tokens masked (mask mode) \| Variable \| ### Mismatch Metrics (all prefixed with `mismatch/`) \| Metric \| Description \| What It Means \| \|--------\|-------------\|---------------\| \| `mismatch_kl` \| Forward KL divergence \| Distribution difference (rollout vs training) \| \| `mismatch_k3_kl` \| K3 KL estimator \| Stable KL estimate for small divergences \| \| `mismatch_training_ppl` \| Training policy perplexity \| Prediction difficulty of training policy \| \| `mismatch_rollout_ppl` \| Rollout policy perplexity \| Prediction difficulty of rollout policy \| \| `mismatch_ppl_ratio` \| Ratio of training to rollout PPL \| Relative prediction difficulty \| \| `mismatch_log_ppl_diff` \| Log perplexity difference \| Sequence-level PPL mismatch \| \| `mismatch_log_ppl_abs_diff` \| Absolute log PPL difference \| Magnitude of mismatch \| \| `mismatch_log_ppl_diff_max` \| Max log PPL difference \| Worst-case mismatch \| \| `mismatch_log_ppl_diff_min` \| Min log PPL difference \| Best-case mismatch \| \| `mismatch_training_log_ppl` \| Log of training PPL \| Log-scale training perplexity \| \| `mismatch_rollout_log_ppl` \| Log of rollout PPL \| Log-scale rollout perplexity \| --- ## Performance Impact ### Memory - Minimal overhead: ~1-2% increase in peak memory usage - Efficient log-space computation - No large intermediate tensors ### Computation - Negligible impact on training speed: < 1% overhead - Centralized computation on driver (no per-worker redundancy) - Optimized tensor operations ### Training Stability - Significant improvement in stability when distribution mismatch exists - Faster convergence in many scenarios - Reduced risk of training collapse --- ## Breaking Changes > [!IMPORTANT] > This PR contains BREAKING CHANGES to the configuration API. ### Removed - `actor_rollout_ref.actor.tis_imp_ratio_cap`: No longer supported ### Migration Required All users of the legacy TIS implementation must update their configuration files. See the migration guide above or `docs/advance/rollout_is_migration.md` for detailed instructions. ### Backward Compatibility - No backward compatibility with legacy TIS - Configuration files with `tis_imp_ratio_cap` will raise validation errors - Affected recipes have been updated in this PR --- ## Pre-Submission Checklist - [x] Search for similar PRs: [https://github.com/volcengine/verl/pulls?q=is%3Apr+importance+sampling](https://github.com/volcengine/verl/pulls?q=is%3Apr+importance+sampling) - [x] Format PR title as `[{modules}] {type}: {description}` (checked by CI) - Suggested title: `[BREAKING][rollout, trainer, algo] feat: implement comprehensive Rollout Importance Sampling framework` - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md) - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting) - [x] Add/update [documentation](https://github.com/volcengine/verl/tree/main/docs) (3 new docs, 2 updated) - [x] Add unit and integration tests (530 lines of tests) - [x] Once PR is ready for CI, send message in `ci-request` channel --- ## References - Blog post: [When Speed Kills Stability: Demystifying RL Collapse from the Inference-Training Mismatch](https://yingru.notion.site/When-Speed-Kills-Stability-271211a558b7808d8b12d403fd15edda) - Migration guide: `docs/advance/rollout_is_migration.md` - Examples: `examples/rollout_importance_sampling/` - Tests: `tests/trainer/ppo/test_rollout_is*.py`	2025-10-13 11:06:36 -07:00
Yingru Li	21271aabb9	[BREAKING][rollout, trainer, algo] feat: comprehensive rollout importance sampling implementation (#3694 ) # Rollout Importance Sampling Framework ## Summary This PR introduces a comprehensive Rollout Importance Sampling (IS) framework to correct distribution mismatch between data-collecting (rollout) and training policies, a critical factor for ensuring stable and efficient model training in RL fine-tuning. This work is motivated by the analysis in our blog post, [When Speed Kills Stability: Demystifying RL Collapse from the Inference-Training Mismatch](https://yingru.notion.site/When-Speed-Kills-Stability-271211a558b7808d8b12d403fd15edda). If you find this implementation useful in your research, please consider citing: ```bibtex @misc{liu-li-2025, title = {When Speed Kills Stability: Demystifying RL Collapse from the Inference-Training Mismatch}, url = {https://yingru.notion.site/When-Speed-Kills-Stability-Demystifying-RL-Collapse-from-the-Inference-Training-Mismatch-271211a558b7808d8b12d403fd15edda}, author = {Jiacai Liu and Yingru Li and Yuqian Fu and Jiawei Wang and Qian Liu and Yu Shen}, year = {2025}, month = {September}, } ``` --- ## Problem Statement When using different policies for rollout generation (e.g., vLLM with BFloat16) and training (e.g., FSDP with FP32), distribution mismatch occurs, leading to: - Biased gradient estimates - Training instability and collapse - Reduced sample efficiency - Poor convergence properties This framework addresses these issues through principled importance sampling correction. --- ## Key Features & Improvements ### 1. Flexible Aggregation Levels Three methods for calculating IS weights: - `token`: Per-token importance ratios - `sequence`: Product of per-token ratios - `geometric`: Geometric mean of ratios ### 2. Advanced Bounding Modes Two strategies to control weight variance: - `truncate` (TIS): Caps weights at upper threshold only, preserving gradients - `clip` (CIS): Zeros out weights outside bounds, more aggressive filtering ### 3. Comprehensive Diagnostics Detailed metrics to monitor distribution mismatch and training health: Rollout IS Metrics (automatically prefixed with `mismatch/`): - Health indicators: `rollout_is_eff_sample_size`, `rollout_is_mean` - Distribution statistics: `rollout_is_p25`, `rollout_is_p50`, `rollout_is_p75`, `rollout_is_p95`, `rollout_is_p99`, `rollout_is_max`, `rollout_is_min`, `rollout_is_std` - Diagnostics: `rollout_is_veto_fraction`, `rollout_is_catastrophic_token_fraction`, `rollout_is_clipped_fraction` (clip mode) - Sequence-level statistics (for sequence/geometric modes): `rollout_is_seq_mean`, `rollout_is_seq_std`, `rollout_is_seq_max`, `rollout_is_seq_min`, etc. Mismatch Metrics (computed efficiently within IS weight computation): - KL Divergence: `mismatch_kl` (forward KL), `mismatch_k3_kl` (K3 estimator for stability) - Perplexity: `mismatch_training_ppl`, `mismatch_rollout_ppl`, `mismatch_ppl_ratio` - Log perplexity statistics: `mismatch_log_ppl_diff`, `mismatch_log_ppl_abs_diff`, `mismatch_log_ppl_diff_max`, `mismatch_log_ppl_diff_min` ### 4. Outlier Mitigation - Veto mechanism: Automatically discards samples with catastrophic importance weights (per-token ratios below threshold) - Prevents gradient corruption from extreme outliers - Configurable threshold (default: 1e-4) ### 5. Numerical Stability - All core computations in log-space to prevent underflow/overflow - Carefully designed clipping and bounding to maintain numerical precision - Safe handling of edge cases (zero probabilities, extreme ratios) ### 6. Memory Efficiency - Optimized computation to minimize CUDA memory usage - Efficient metric aggregation without large intermediate tensors - Suitable for large-scale distributed training ### 7. Metrics-Only Mode - Compute and monitor mismatch metrics without applying IS weights - Useful for: - Understanding distribution mismatch before intervention - Deciding whether IS correction is needed - A/B testing IS impact - Controlled by `algorithm.rollout_is` flag (independent of weight computation) ### 8. Universal PPO Support - Integrated with all PPO variants: vanilla, GSPO, GPG, Clip-Cov, KL-Cov, geo_mean - Consistent interface across different policy loss functions - Automatic weight application when enabled --- ## API and Configuration Changes ### Migration from Legacy TIS #### ❌ Before (REMOVED) ```yaml # Old TIS configuration - NO LONGER SUPPORTED actor_rollout_ref: actor: tis_imp_ratio_cap: 2.0 # Removed from actor config ``` The legacy implementation: - Only supported token-level truncation - No metrics tracking - Lacked numerical stability - Limited configurability #### ✅ After (New Framework) Configuration moved to `algorithm` section for better organization: ```yaml algorithm: # Main on/off switch: null = disabled, float = enabled rollout_is_threshold: 2.0 # Control weight application (independent of metrics computation) rollout_is: true # true = apply weights, false = metrics only # Optional: lower threshold (defaults to 1/upper if null) rollout_is_threshold_lower: null # Aggregation level: "token", "sequence", or "geometric" rollout_is_level: token # Bounding mode: "truncate" or "clip" rollout_is_mode: truncate # Veto threshold for catastrophic outliers (null = disabled) rollout_is_veto_threshold: 1e-4 # REQUIRED: Enable log probability calculation actor_rollout_ref: rollout: calculate_log_probs: true ``` ### Configuration Examples 1. Token-level truncation (recommended starting point) ```yaml algorithm: rollout_is_threshold: 2.0 rollout_is: true rollout_is_level: token rollout_is_mode: truncate ``` 2. Sequence-level clipping (more aggressive) ```yaml algorithm: rollout_is_threshold: 2.0 rollout_is: true rollout_is_level: sequence rollout_is_mode: clip ``` 3. Metrics-only mode (monitoring without correction) ```yaml algorithm: rollout_is_threshold: 2.0 rollout_is: false # Compute metrics but don't apply weights rollout_is_level: token rollout_is_mode: truncate ``` Example script: `bash examples/rollout_importance_sampling/run_with_rollout_is.sh` --- ## Code Changes Overview ### New Files (4 files, 1,442 lines) 1. `verl/trainer/ppo/mismatch_helper.py` (459 lines) - Core implementation of IS weight computation - Three aggregation levels: token, sequence, geometric - Two bounding modes: truncate, clip - Veto mechanism for outlier detection - Comprehensive metrics computation (IS + mismatch) - All computations in log-space for numerical stability - Memory-efficient design 2. `docs/advance/rollout_is_migration.md` (642 lines) - Comprehensive migration guide from legacy TIS - Detailed explanation of all configuration options - Recommended threshold ranges for each aggregation level - Troubleshooting guide and best practices - Metrics interpretation guide 3. `examples/rollout_importance_sampling/README.md` (242 lines) - Quick start guide with working examples - Configuration templates for common scenarios - Threshold tuning guidelines - Metrics monitoring instructions 4. `examples/rollout_importance_sampling/run_with_rollout_is.sh` (99 lines) - Complete working example script - Demonstrates token-level and sequence-level configurations - Ready to run with minimal modifications ### Modified Core Files (9 files) 1. `verl/trainer/ppo/core_algos.py` (~50 lines changed) - Removed legacy TIS logic (`tis_imp_ratio_cap`) - Added `rollout_is_weights` parameter to all policy loss functions - Unified IS weight application interface across all PPO variants: - `compute_policy_loss_vanilla` - `compute_policy_loss_gspo` - `compute_policy_loss_gpg` - `compute_policy_loss_clip_cov` - `compute_policy_loss_kl_cov` - `compute_policy_loss_geo_mean` - Special handling for `geo_mean` (sequence-level aggregation) 2. `verl/trainer/ppo/ray_trainer.py` (~52 lines added) - New method: `compute_rollout_importance_weights_and_add_to_batch()` - Centralized IS computation (once per batch, on driver) - Conditional weight distribution to workers based on `algorithm.rollout_is` - Metrics collection and aggregation - Integration with existing training loop 3. `verl/trainer/config/algorithm.py` (+18 lines) - Added 6 new Rollout IS parameters: - `rollout_is_threshold` (main on/off switch) - `rollout_is` (weight application control) - `rollout_is_threshold_lower` - `rollout_is_level` - `rollout_is_mode` - `rollout_is_veto_threshold` - Comprehensive docstrings explaining each parameter 4. `verl/workers/config/actor.py` (-1 line) - Removed deprecated `tis_imp_ratio_cap` parameter 5. `verl/workers/actor/dp_actor.py` (~26 lines changed) - Updated to use new `rollout_is_weights` parameter - Removed legacy TIS logic 6. `verl/workers/actor/megatron_actor.py` (~15 lines changed) - Updated to use new `rollout_is_weights` parameter - Removed legacy TIS logic 7. Configuration Files (4 files updated) - `verl/trainer/config/ppo_trainer.yaml` - `verl/trainer/config/ppo_megatron_trainer.yaml` - `verl/trainer/config/_generated_ppo_trainer.yaml` - `verl/trainer/config/_generated_ppo_megatron_trainer.yaml` - Added default Rollout IS configuration section with explanatory comments ### Testing (2 files, 530 lines) 1. `tests/trainer/ppo/test_rollout_is.py` (289 lines) - Unit tests for `mismatch_helper.py` - Coverage for all aggregation levels (token, sequence, geometric) - Coverage for all bounding modes (truncate, clip) - Veto mechanism tests - Edge case handling (zeros, extremes, empty sequences) - Numerical stability verification - Metrics correctness validation 2. `tests/trainer/ppo/test_rollout_is_integration.py` (241 lines) - Integration tests with PPO training loop - End-to-end workflow validation - Batch processing tests - Configuration validation - Metrics collection verification - Compatibility with distributed training ### Updated Recipes (2 files) 1. `recipe/dapo/dapo_ray_trainer.py` (+5 lines) - Updated imports to use new framework 2. `recipe/dapo/run_dapo_qwen2.5_32b_tis.sh` (~42 lines changed) - Migrated from legacy TIS to new Rollout IS configuration - Updated documentation and comments ### Documentation Updates (2 files) 1. `docs/examples/config.rst` (~22 lines changed) - Updated configuration examples - Added Rollout IS section 2. `docs/index.rst` (+1 line) - Added link to Rollout IS migration guide --- ## Implementation Highlights ### Centralized Architecture The new design follows a clean separation of concerns: ``` ray_trainer.py (driver) └─> compute_rollout_importance_weights_and_add_to_batch() └─> mismatch_helper.compute_rollout_importance_weights() ├─> Computes IS weights (token/sequence/geometric) ├─> Applies bounding (truncate/clip) ├─> Veto mechanism for outliers ├─> Computes IS metrics └─> Computes mismatch metrics (KL, PPL) └─> Conditionally adds weights to batch (if rollout_is=True) └─> Distributes batch to workers actor workers (dp_actor, megatron_actor) └─> Receive batch with rollout_is_weights (if enabled) └─> Pass weights to policy loss function core_algos.py └─> All policy loss functions accept rollout_is_weights └─> Apply weights if provided: pg_losses = rollout_is_weights ``` ### Key Design Decisions 1. Centralized Computation: IS weights computed once on driver, not per worker - Reduces redundant computation - Ensures consistency across workers - Simplifies debugging and metrics collection 2. Configuration in Algorithm: Moved from actor config to algorithm config - Better conceptual organization (algorithm-level concern, not worker-level) - Easier to manage and validate - Consistent with other algorithm parameters 3. Two-Level Control: - `rollout_is_threshold`: Enables/disables entire system (null = off) - `rollout_is`: Controls weight application (true = apply, false = metrics only) - Allows flexible monitoring and gradual rollout 4. Metrics Consolidation: Mismatch metrics computed within IS weight computation - Eliminates duplicate computation - Reduces memory overhead - Maintains metric accuracy 5. Universal PPO Support: Single interface for all PPO variants - Minimal code changes required - Consistent behavior across algorithms - Easy to add new variants --- ## Migration Guide ### For Users of Legacy TIS Step 1: Update your configuration file* ```yaml # OLD (remove this) actor_rollout_ref: actor: tis_imp_ratio_cap: 2.0 # NEW (add this) algorithm: rollout_is_threshold: 2.0 # Use same value as old tis_imp_ratio_cap rollout_is: true rollout_is_level: token rollout_is_mode: truncate # REQUIRED (add if not present) actor_rollout_ref: rollout: calculate_log_probs: true ``` Step 2: Monitor metrics The first time you run with the new configuration, check these metrics: - `mismatch/rollout_is_eff_sample_size`: Should be > 80% of batch size - `mismatch/rollout_is_veto_fraction`: Should be < 5% - `mismatch/rollout_is_mean`: Should be close to 1.0 Step 3: Tune if needed If effective sample size is too low: - Increase `rollout_is_threshold` - Try `rollout_is_mode: clip` with appropriate lower bound - Consider `rollout_is_level: sequence` for more aggressive correction For detailed guidance, see `docs/advance/rollout_is_migration.md`. ### For New Users Start with recommended defaults: ```yaml algorithm: rollout_is_threshold: 2.0 rollout_is: true rollout_is_level: token rollout_is_mode: truncate actor_rollout_ref: rollout: calculate_log_probs: true ``` Run the example script to see it in action: ```bash bash examples/rollout_importance_sampling/run_with_rollout_is.sh ``` --- ## Testing ### Unit Tests - 289 lines of comprehensive unit tests in `test_rollout_is.py` - Covers all aggregation levels, bounding modes, and edge cases - Validates numerical stability and correctness - Fast execution (~1-2 seconds) ### Integration Tests - 241 lines of integration tests in `test_rollout_is_integration.py` - End-to-end workflow with PPO training loop - Distributed training compatibility - Metrics collection validation - Moderate execution time (~10-20 seconds) ### Running Tests ```bash # Run all Rollout IS tests pytest tests/trainer/ppo/test_rollout_is.py -v pytest tests/trainer/ppo/test_rollout_is_integration.py -v # Run specific test pytest tests/trainer/ppo/test_rollout_is.py::test_token_level_truncate -v ``` --- ## Metrics Reference ### Rollout IS Metrics (all prefixed with `mismatch/`) \| Metric \| Description \| Ideal Range \| \|--------\|-------------\|-------------\| \| `rollout_is_eff_sample_size` \| Effective number of samples after IS \| > 80% of batch \| \| `rollout_is_mean` \| Mean IS weight \| ~1.0 \| \| `rollout_is_std` \| Standard deviation of IS weights \| Low variance \| \| `rollout_is_p25` \| 25th percentile \| ~0.8-1.0 \| \| `rollout_is_p50` \| Median IS weight \| ~1.0 \| \| `rollout_is_p75` \| 75th percentile \| ~1.0-1.2 \| \| `rollout_is_p95` \| 95th percentile \| < threshold \| \| `rollout_is_p99` \| 99th percentile \| < threshold \| \| `rollout_is_max` \| Maximum weight \| ≤ threshold \| \| `rollout_is_min` \| Minimum weight \| ≥ lower threshold (clip mode) \| \| `rollout_is_veto_fraction` \| % sequences vetoed \| < 5% \| \| `rollout_is_catastrophic_token_fraction` \| % catastrophic tokens \| < 1% \| \| `rollout_is_clipped_fraction` \| % tokens clipped (clip mode) \| Variable \| ### Mismatch Metrics (all prefixed with `mismatch/`) \| Metric \| Description \| What It Means \| \|--------\|-------------\|---------------\| \| `mismatch_kl` \| Forward KL divergence \| Distribution difference (rollout vs training) \| \| `mismatch_k3_kl` \| K3 KL estimator \| Stable KL estimate for small divergences \| \| `mismatch_training_ppl` \| Training policy perplexity \| Prediction difficulty of training policy \| \| `mismatch_rollout_ppl` \| Rollout policy perplexity \| Prediction difficulty of rollout policy \| \| `mismatch_ppl_ratio` \| Ratio of training to rollout PPL \| Relative prediction difficulty \| \| `mismatch_log_ppl_diff` \| Log perplexity difference \| Sequence-level PPL mismatch \| \| `mismatch_log_ppl_abs_diff` \| Absolute log PPL difference \| Magnitude of mismatch \| \| `mismatch_log_ppl_diff_max` \| Max log PPL difference \| Worst-case mismatch \| \| `mismatch_log_ppl_diff_min` \| Min log PPL difference \| Best-case mismatch \| \| `mismatch_training_log_ppl` \| Log of training PPL \| Log-scale training perplexity \| \| `mismatch_rollout_log_ppl` \| Log of rollout PPL \| Log-scale rollout perplexity \| --- ## Performance Impact ### Memory - Minimal overhead: ~1-2% increase in peak memory usage - Efficient log-space computation - No large intermediate tensors ### Computation - Negligible impact on training speed: < 1% overhead - Centralized computation on driver (no per-worker redundancy) - Optimized tensor operations ### Training Stability - Significant improvement in stability when distribution mismatch exists - Faster convergence in many scenarios - Reduced risk of training collapse --- ## Breaking Changes > [!IMPORTANT] > This PR contains BREAKING CHANGES to the configuration API. ### Removed - `actor_rollout_ref.actor.tis_imp_ratio_cap`: No longer supported ### Migration Required All users of the legacy TIS implementation must update their configuration files. See the migration guide above or `docs/advance/rollout_is_migration.md` for detailed instructions. ### Backward Compatibility - No backward compatibility with legacy TIS - Configuration files with `tis_imp_ratio_cap` will raise validation errors - Affected recipes have been updated in this PR --- ## Pre-Submission Checklist - [x] Search for similar PRs: [https://github.com/volcengine/verl/pulls?q=is%3Apr+importance+sampling](https://github.com/volcengine/verl/pulls?q=is%3Apr+importance+sampling) - [x] Format PR title as `[{modules}] {type}: {description}` (checked by CI) - Suggested title: `[BREAKING][rollout, trainer, algo] feat: implement comprehensive Rollout Importance Sampling framework` - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md) - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting) - [x] Add/update [documentation](https://github.com/volcengine/verl/tree/main/docs) (3 new docs, 2 updated) - [x] Add unit and integration tests (530 lines of tests) - [x] Once PR is ready for CI, send message in `ci-request` channel --- ## References - Blog post: [When Speed Kills Stability: Demystifying RL Collapse from the Inference-Training Mismatch](https://yingru.notion.site/When-Speed-Kills-Stability-271211a558b7808d8b12d403fd15edda) - Migration guide: `docs/advance/rollout_is_migration.md` - Examples: `examples/rollout_importance_sampling/` - Tests: `tests/trainer/ppo/test_rollout_is*.py` --------- Co-authored-by: Yan Bai <bayan@nvidia.com>	2025-10-13 17:05:29 +08:00
yangbaoxing	7f27789961	[fsdp,doc] refactor: rename warmup_style@FSDPOptimizerConfig -> lr_scheduler_type (#3739 ) ### What does this PR do? > Rename `warmup_style` in FSDPOptimizerConfig to `lr_scheduler_type` to align with Hugging Face Trainer API。 The following pull request is for refactoring the optimizer, however, the naming issue persists. https://github.com/volcengine/verl/pull/3656 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: weiqi.li <weiqi.li@bytedance.com>	2025-10-13 15:58:59 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	e9ee6b39c6	[model] fix: qwen3vl models shape mismatch error with SP (#3735 )	2025-10-13 13:09:10 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	9d4554b931	[model] fix: qwen3vl training stuck with mixed text-image data (#3734 )	2025-10-13 13:08:13 +08:00
Chi Zhang	71cf69e7ad	[ci] feat: increase sft e2e time (#3738 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-13 11:29:39 +08:00
Houmin Wei	7ddb9b29f0	[misc] feat: prototype deprecate DataProto and replace with Tensordict: part 3 (#3600 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This PR continues the work started in PR #3567 by deprecating and removing the left_right padding mode 1. Implement no-padding mode for Megatron engine using nested tensors in sft trainer 2. Deprecating left_right padding mode for FSDP/Megatron engine 3. Introduces a transformation layer within Actor/Critic workers, see more [here](https://github.com/volcengine/verl/blob/main/docs/workers/model_engine.rst) - Input Format: Actor/Critic workers continue to receive data in left_rightpadded format. - Transformation: This layer dynamically converts left_rightpadded data into the no-padding format using nested tensors. - Engine Format: FSDP and Megatron engines now operate exclusively using the no-padding data format by default. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-13 08:18:09 +08:00
Chi Zhang	8cc9e3af67	[misc] feat: support offline generation with server mode (#3732 )	2025-10-12 11:00:33 +08:00
Huazhong	f07596c02e	[misc] feat: support build DataProto from TensordDict (#3726 ) ### What does this PR do? Add a utility function to support building DataProto from TensorDict, which helps integrate TransferQueue into verl. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-11 17:28:18 +08:00
Peng Wu	656f4e6705	[rollout] chore: Misc changes for extending internal compatibility (#3701 ) ### What does this PR do? * New config field: * rollout: `pipeline_model_parallel_size` for internal compatibility * ~~legacy_data: `agent_name` for default agent name if not specified in the rldataset~~ * Registry for `RolloutReplica` * `VERL_USE_EXTERNAL_MODULES` to import desired modules to trigger external registration ### Test Be covered by CI ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-11 16:08:39 +08:00
Joel	d36d3b9cbe	[rollout] feat: add default agent name for agent loop (#3716 ) ### What does this PR do? Add `default_agent_loop` config if `agent_name` is absent in RLDataset.	2025-10-11 14:45:30 +08:00
HEJIAN SANG	e960fbaeab	[rollout] feat: Add gpt-oss tool parser to enable agent loop training for gpt-oss models (#3705 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Add gpt-oss tool parser to enable agent loop training for gpt-oss models ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Manually test offline. Let me know if we want to add unit tests. > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Hejian Sang <hsang@linkedin.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-11 11:53:10 +08:00
Pouria Mistani	d87602432c	[fsdp] fix: Handle dict type for per_tensor_param in LoRA weight sync (#3712 ) ## Description When `peft_config` is set and `base_sync_done` is `True`, `per_tensor_param` is assigned directly from the `params` dict instead of `params.items()`, causing `ValueError: too many values to unpack (expected 2)` when passed to `get_named_tensor_buckets()` which expects an iterator of `(name, tensor)` tuples. This fix adds an `isinstance()` check to handle both dict and iterator cases, maintaining backward compatibility while fixing SGLang rollout with LoRA adapters. Fixes: `ValueError` in `sglang_rollout.update_weights()` → `get_named_tensor_buckets()` Related: Multi-turn RL training with LoRA adapters on SGLang backend --- ### What does this PR do? This PR fixes a type mismatch bug in `fsdp_workers.py` that occurs when using LoRA adapters with SGLang backend. The issue manifests during weight synchronization when FSDP workers attempt to pass parameters to the bucket creation function. Root Cause: Line 681 in `verl/workers/fsdp_workers.py` assigns `params` dict directly to `per_tensor_param`, but downstream code at line 1520 in `get_named_tensor_buckets()` expects an iterator of `(name, tensor)` tuples for unpacking. Solution: Add backward-compatible `isinstance()` check that converts dict to `.items()` iterator when needed: ```python per_tensor_param = params.items() if isinstance(params, dict) else params	2025-10-10 21:58:30 +08:00
jiaqiw09	e01376663b	[megatron] feat: add ascend megatron merge support (#3722 ) ### What does this PR do? add ascend megatron merge support ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-10 21:54:27 +08:00
EduardDurech	152ce6a1de	[misc] fix: Allow HF model ID with `use_shm` (#3663 )	2025-10-10 13:44:53 +08:00
Changlong Yu	2d72c52e1b	[misc] fix: model reassign to inner model in vllm patch file (#3668 ) ### What does this PR do? The `model` has been re-assigned to its inner model `model.model` so it does not have `layers` . fixed the reassign issue and refactor the code logic. `f50e5c2e8f/verl/utils/vllm/patch.py (L83-L87)` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/issues/2834 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-10 12:13:49 +08:00
Yingru Li	eb06fda2a9	[data] fix: merge metrics from all workers in DataProto.concat() (#3699 ) ## Summary Fix `DataProto.concat()` to properly merge all `meta_info` keys from all workers, preventing silent data loss when workers have different non-metric keys. ## Problem Previous implementation only preserved non-metric `meta_info` from the first worker: ```python # Old code - only looks at data[0] merged_meta_info = {k: v for k, v in data[0].meta_info.items() if k != "metrics"} ``` This caused silent data loss when workers had different non-metric keys: - `data[0].meta_info = {"config": "A"}` ✓ preserved - `data[1].meta_info = {"extra_key": "B"}` ❌ lost - Result: `{"config": "A"}` - missing `extra_key` This contradicts the docstring which states meta_info is "merged". ## Solution This PR iterates through ALL workers to merge their non-metric meta_info while aggregating metrics: ```python # Merge non-metric meta_info and aggregate metrics from all workers all_metrics = [] for d in data: for k, v in d.meta_info.items(): if k == "metrics": if v is not None: if isinstance(v, list): all_metrics.extend(v) else: all_metrics.append(v) else: if k in merged_meta_info: # Ensure consistency for overlapping non-metric keys assert merged_meta_info[k] == v, f"Conflicting values for meta_info key '{k}'" else: merged_meta_info[k] = v if all_metrics: merged_meta_info["metrics"] = all_metrics ``` Key improvements: - ✅ All non-metric keys from all workers are preserved - ✅ Detects conflicting values for the same key across workers - ✅ Aggregates metrics from all workers in a single loop - ✅ Handles edge cases: missing metrics, non-list values ## Testing Added 6 comprehensive unit tests in `tests/test_protocol_on_cpu.py`: - `test_concat_metrics_from_multiple_workers` - All workers have metrics - `test_concat_with_empty_and_non_list_meta_info` - Partial metrics coverage - `test_concat_first_worker_missing_metrics` - First worker has no metrics - `test_concat_non_list_metrics` - Single dict instead of list - `test_concat_merge_different_non_metric_keys` - Different keys across workers - `test_concat_conflicting_non_metric_keys` - Conflict detection ## Files Changed - `verl/protocol.py`: Updated `DataProto.concat()` to merge all meta_info keys - `tests/test_protocol_on_cpu.py`: Added 2 new tests (6 total) covering all edge cases --- ### Checklist - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md) - [x] Pre-commit checks passed (ruff, mypy, etc.) - [x] Documentation updated (N/A - bug fix, no API changes) - [x] Unit tests added (4 comprehensive tests covering all edge cases) - [ ] CI request (pending)	2025-10-10 11:45:08 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	7ffd413734	[megatron, model] fix: VLMs using mbridge together with fused kernels (#3700 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Current code is too rigid in checking support of fused forward and will go wrong if we use mbridge and even it's a Qwen2_5VLModel, since then the defined Qwen2.5VL multi-modal model class will be from class definition in mbridge instead of the one in verl. Also, many other VLMs supported in mbridge uses the `language_model` attribute, and we just need to ensure that `model.language_model` is an instance of mcore defined `GPTModel`, which should be a more flexible and applicable way for checking support. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-10-10 11:05:32 +08:00
OC	cf619d68d4	[recipe] fix: move all collabllm files into recipe directory (#3706 ) ### What does this PR do? resolve issue https://github.com/volcengine/verl/issues/3606 1. move and register reward manager into custom_reward_function file 2. register agent loop in agent.yaml 3. move collabllm_interation.py into recipe ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test ``` (TaskRunner pid=52293) step:3 - global_seqlen/min:56551 - global_seqlen/max:94884 - global_seqlen/minmax_diff:38333 - global_seqlen/balanced_min:72054 - ``` ### API and Usage Example n/a ### Design & Code Changes n/a ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-09 18:50:37 +08:00
Huazhong	23877bcc64	[worker] fix: create a new event loop if none exists (#3703 ) ### What does this PR do? I am working on integrating transferqueue into verl. Specifically, we convert metadata into dataproto in the `register` method of `single_controller/base/decorator.py/`. In this step, `asyncio.run(tq_client.async_get_data(metadata)` is called to get the specific data. If `asyncio.run` and `asyncio.get_event_loop` are called sequentially in the same thread, a RuntimeError: `There is no current event loop in thread %r` is thrown. This PR fixes the above-mentioned issue. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-09 17:11:58 +08:00
Hongpeng Guo	e56e3df071	[worker] refactor: Add `kwargs` to checkpoint related functions in `BaseEngine` and its subclasses (#3662 ) ### What does this PR do? Add `**kwargs` to the checkpoint APIs of `BaseEngine` (and thread them through `FSDPEngine`/`MegatronEngine`) to allow engines and pluggable checkpoint backends to accept implementation-specific options without changing the common interface. This enables extension when users subclass `BaseEngine` or integrate internal engines, while preserving backward compatibility—existing calls remain unchanged and extra keys are simply ignored unless a subclass consumes them. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu> Co-authored-by: wuxibin <wuxibin@bytedance.com>	2025-10-09 14:56:22 +08:00
xichengpro	54fed7fec7	[rollout] feat: support async mode for multimodal data inference (#3702 ) ### What does this PR do? fix https://github.com/volcengine/verl/issues/3518 ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-09 14:11:09 +08:00
mgilmore-relace	f06ef09f1c	[rollout] fix: Add LoRA datatype based on rollout model type to the LoRA config (#3675 ) ### What does this PR do? > Bug fix for #3654 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-09 11:48:32 +08:00
Pandeng Yao	fc489dbaef	[rollout] fix: add batch_data_id default value check in AsyncRolloutRequest (#3657 ) ### What does this PR do? This PR improves the robustness of the initialize_request method in verl/workers/rollout/schemas.py. When input_ids exceed max_prompt_len, if the batch_data_id field is missing from values, it will be automatically populated with the default value. This prevents errors during logging and enhances fault tolerance in data processing, making future extension and troubleshooting more convenient. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: yaopandeng <yaopandeng@baidu.com>	2025-10-09 10:56:10 +08:00
HEJIAN SANG	d45d04946b	[rollout,sglang] fix: get_tool_call_parser_type for gpt-oss models in sglang rollout (#3661 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. The problem is in the `get_tool_call_parser_type` function in sglang_rollout.py (lines 225-246). The function is checking if `parser.bot_token.strip()` exists as a single token in the tokenizer's vocabulary, but for the gpt-oss parser type, the bot_token is `<\|start\|>assistant<\|channel\|>commentary`, which is a compound token sequence rather than a single special token. For gpt-oss models, ``` parser.bot_token.strip() = <\|start\|>assistant<\|channel\|>commentary This gets tokenized as [200006, 173781, 200005, 12606, 815] (5 tokens) ``` The check parser.bot_token.strip() in tokenizer_vocab returns False because it's looking for this entire string as a single vocabulary entry The current logic assumes that bot_token should be a single special token that exists in the vocabulary, but for GPT-OSS models, it's actually a sequence of tokens that need to be tokenized. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test unit test offline > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Hejian Sang <hsang@linkedin.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-09 10:51:37 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	baf7506cff	[worker] fix: support for vllm V0 deprecation version (#3687 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Related to: - https://github.com/vllm-project/vllm/pull/25901 - https://github.com/vllm-project/vllm/pull/25345 Now we first try to import `WorkerWrapperBase` from `vllm.worker.worker_base`, if we have an error, we append `v1` there. For `compute_logits` patch, we can just remove the import of `SamplingMetadata`, create a wrapper that accepts any arguments with args, *kwargs, and pass them through to the original method, so that it can be more flexible and future-proof. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-10-09 10:44:31 +08:00
Puneesh Khanna	798a6f8ba0	[trainer] feat: Enabled fused adamw (#3692 ) ### What does this PR do? Enable fused adamw which should be generally faster and more memory efficient. Also provide the config parameter to set eps. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test In our internal code base, we always set fused adamw to True and works fine. However right now, I don't have the step time comparison with and without it. At the same time, I can push same change in RL code too and setting to True should be generally beneficial. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-08 08:13:46 +13:00
Yaowei Zheng	ab10eb2671	[model] fix: qwen3vl patch (#3686 )	2025-10-07 08:32:53 +13:00
Chi Zhang	7904d0b672	[ci] fix: fix checkpoint converter ci (#3685 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-06 19:42:47 +13:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	1216ce4599	[ci] fix: merge pre-commit-full into pre-commit (#3684 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Right now, the pre-commit-full will always fail as it doesn't install related dependencies: https://github.com/volcengine/verl/actions/runs/18251414892 And there's no reason to duplicate pre-commit for pre-commit-full as they are defined as same workflow, so this PR moves manual triggers and schedule to pre-commit and remove pre-commit-full ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-10-06 15:56:11 +13:00
Yaowei Zheng	42c55ac6b3	[model] feat: add qwen3vl (#3681 ) ### What does this PR do? Add qwen3vl models Fixes #3607 ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-06 15:21:19 +13:00
m-Just	327e813136	[rollout] fix: qwen2_vl position_ids shape mismatch (#3653 ) ### What does this PR do? > Fix qwen2_vl position_id shape mismatch: `verl/models/transformers/qwen2_vl.py:process_position_ids` expects `position_ids` to have a shape of `(4, batch_size, seq_length)` but `verl/experimental/agent_loop/agent_loop.py:generate_sequences` returns `(batch_size, 3, seq_length)` (which will be transposed to `(3, batch_size, seq_length)`), ignoring the text dimension. This PR follows the relevant code in `verl/utils/dataset/rl_dataset.py` to fix the issue. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). Not applicable. - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-05 16:03:12 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	83aebcc133	[ci] fix: disable workflows with self-host machines to run on fork (#3677 )	2025-10-04 22:02:41 +13:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	4e9faafc94	[model] fix: stuck issue with mixed text-image data (#3670 )	2025-10-04 12:47:09 +13:00
lbk-sys	f50e5c2e8f	[sglang] feat: add preparation for sglang+verl (#3506 ) ### What does this PR do? support npu for verl + sglang ```python bash examples/grpo_trainer/run_qwen3_8b_grpo_sglang_1k_npu.sh ``` ### Accuracy test 8b： <img width="747" height="842" alt="8b" src="https://github.com/user-attachments/assets/f36ef25a-b32f-4c76-97d0-2e5fe53ff183" /> 30b： <img width="759" height="850" alt="30b" src="https://github.com/user-attachments/assets/97979002-7ebf-47fa-ae57-3e9b6637f12c" /> ### Test ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Signed-off-by: lbk-sys <hello_lbk@163.com> Co-authored-by: 1StepForever <wangww1Step@foxmail.com>	2025-09-29 10:21:01 +08:00
jiaqiw09	aa19c1afc4	[recipe] feat: add multiturn scripts for vllm backend; fix progess bar in dapo (#3644 ) ### What does this PR do? - Add example scirpt to run mutip-turn grpo in vllm and fsdp - fix progressbar in dapo trainer - When enable_filter is enabled, DAPO runs multiple batch inferences before each actor update, but the progress bar advances once per inference—mismatching the true training step count and leading to confusion. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-28 20:28:25 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	9e2072d120	[megatron, training_utils] fix: encoder pp is removed in mcore >= 0.14 (#3640 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Implementation refers to `b600e38d7b (diff-ff40c5dfa6c8106a478517375d98bc4e548ff71bcc3e5b25a4c1cc540f31ed3a)` Use `hasattr(parallel_state, "is_inside_encoder")` for backward compatibility. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-09-28 12:59:32 +08:00
Kion Fallah	39e531f29e	[rollout,vllm] fix: Add LoRA Loading to Async vLLM (#3639 ) ### What does this PR do? Currently, async vLLM with AgentWorkerLoop throws an error when `update_weights` with LoRA weights. This expands support for AgentWorkerLoop with LoRAs. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-28 10:13:40 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	abca659ec7	[megatron, worker] fix: use `extract_multi_modal_inputs` method for handling `multi_modal_inputs` (#3641 ) Follow up for #3553 ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Without those changes in #3315, the error when we train the mixture modal dataset will remain unresolved, so it would be a good idea to add them back. ```logs File "verl/workers/actor/megatron_actor.py", line 639, in update_policy metric_micro_batch = self.forward_backward_batch( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "verl/workers/actor/megatron_actor.py", line 587, in forward_backward_batch losses_reduced = forward_backward_func( ^^^^^^^^^^^^^^^^^^^^^^ File "megatron/core/pipeline_parallel/schedules.py", line 595, in forward_backward_no_pipelining output_tensor, num_tokens = forward_step( ^^^^^^^^^^^^^ File "megatron/core/pipeline_parallel/schedules.py", line 402, in forward_step output_tensor, loss_func = forward_step_func(data_iterator, model) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "verl/workers/actor/megatron_actor.py", line 497, in forward_step multi_modal_inputs[key] = torch.cat( ^^^^^^^^^^ RuntimeError: torch.cat(): expected a non-empty list of Tensors ``` ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-09-28 10:08:51 +08:00
CedricHuang	4ff3ce2fed	[algo, perf] feat: Vectorize GRPO Advantage Estimator - 13～26x Speedup (#3635 ) ### What does this PR do? Implements a vectorized GRPO advantage path for outcome-only RL in core_algos.py, keeping the original implementation intact and selectable. This yields large speedups at medium–large batch sizes by replacing Python-side grouping loops with segment reductions and one-pass gathers. Results (CPU, Apple M-series example; float32): ```shell [CPU] bs= 512 T= 512 G= 10 \| orig=5.47ms vec=0.21ms speedup=26.16x [CPU] bs= 1024 T=1024 G= 16 \| orig=11.05ms vec=0.54ms speedup=20.60x [CPU] bs= 2048 T=2048 G= 32 \| orig=23.20ms vec=1.74ms speedup=13.32x ``` ```shell [GRPO] seed=0 groups=5 shape=torch.Size([64, 128]) mask_tokens=4147 adv_max_diff=2.384e-07 ret_max_diff=2.384e-07 [GRPO] seed=1 groups=8 shape=torch.Size([128, 256]) mask_tokens=16364 adv_max_diff=2.384e-07 ret_max_diff=2.384e-07 [GRPO] seed=2 groups=10 shape=torch.Size([512, 512]) mask_tokens=130968 adv_max_diff=4.768e-07 ret_max_diff=4.768e-07 ``` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: #3634 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-27 17:21:08 +08:00
Lambert	c03dcb0f8f	[model] feat: add glm4v (#3291 ) ### What does this PR do? Add GLM4.1V support ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: 武嘉涵 <lambert@wujiahandeMacBook-Pro.local> Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>	2025-09-27 04:12:14 +08:00
Joel	84d5619f99	[2/N][rollout] feat: support vllm/sglang DP+EP in server mode (#3530 ) ### What does this PR do? Following https://github.com/volcengine/verl/pull/3456, support vllm/sglang DP+EP in server mode.	2025-09-26 21:52:03 +08:00
A1waysBeenHere	64a9860be2	[trainer] fix: Ref to #3596 . More import fix for transformers version higher than 4.55.0 (#3608 ) ### What does this PR do? Ref to #3596, more import fix for transformers version higher than 4.55.0 > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-26 21:37:46 +08:00
Yuyang Ding	e51305883d	[rollout] refactor: Update rollout and reward configs to reuse vllm/sglang replicas (#3625 ) ### What does this PR do? To enable reusing the vllm/sglang rollout replica for the reward model, I made some modifications to the rollout and reward configuration. Following PR will implement the reuse. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-26 17:43:45 +08:00
Huazhong	2234810235	[megatron] feat: add mindspeed engine and support sft (#3599 ) ### What does this PR do? As per title. Co-authored with @baymax591 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: baymax591 <cbai@mail.nwpu.edu.cn>	2025-09-26 14:39:10 +08:00
Zhichao Wang	377bbb84f0	[recipe] fix: Fix a Typo in One_Step_Off_Policy and Add async of Generative Reward Model in Response Generation (#3369 ) Fix a typo in verl/workers/fsdp_workers.py: original code: if self.model_config.generation_config is not None updated code: if self.generation_config is not None Add async of generation reward model (GRM): As the generative reward model is slow in the call. It is unreasonable to wait for all responses to be generated before sending to GRM for evaluation. So I add an async to start GRM evaluation once individual response generation is finished. --------- Co-authored-by: zhichao (jimmy) <zhichao@inflection.ai>	2025-09-26 13:22:00 +08:00
Huazhong	096ab6dc1b	[CI] fix: changed the model used in the PPO test case to Qwen2.5-0.5B to avoid the huggingface download error (#3631 ) ### What does this PR do? As per title. This PR is a temporary workaround for the following issues： https://github.com/volcengine/verl/actions/runs/18013408026/job/51251922982?pr=3625 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-26 13:20:40 +08:00
Huazhong	231e18948d	[tool] feat: support load local datasets when preparing datasets (#3621 ) ### What does this PR do? This is a follow-up PR to https://github.com/volcengine/verl/pull/3362 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python python examples/data_preprocess/hellaswag.py --local_dataset_path ~/verl/data/hellaswag/ --local_save_dir ~/verl/data/hellaswag_sft ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-26 11:42:53 +08:00
Chi Zhang	fbfdc81f9a	[ci] feat: increase timeout of e2e_sft (#3630 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-26 10:23:25 +08:00
Joel	6ff2b43d13	[ci] feat: upgrade sglang to 0.5.2 (#3613 ) ### What does this PR do? Solve https://github.com/volcengine/verl/pull/3530#issuecomment-3332840437	2025-09-26 09:25:53 +08:00
FlowRays	14c397f474	[doc] feat: Adding Table-R1 to the Awesome work (#3627 )	2025-09-25 23:26:26 +08:00
Chi Zhang	21536f2b03	[ci] fix: fix sanity ci (#3626 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-25 23:15:10 +08:00
Chi Zhang	515f2255ac	[ci] fix: use local models/configs/datasets to increase stability (#3616 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-25 22:14:56 +08:00
Qizhi Chen	bf7aac2fa7	[rollout, tool] feat: export rollout rewards to total rewards (#3563 ) ### What does this PR do? This PR exports rollout rewards including tool calling rewards and interaction rewards to `compute_score` fn. Currently, rollout reward_scores is calculated but not used in the final `compute_score`. `96e7071de1/verl/workers/rollout/sglang_rollout/sglang_rollout.py (L1320-L1324)` Fix #3525 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-25 17:33:03 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	616e933e29	[worker] fix: correctly determine is_vlm_model if sp > 1 (#3282 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Address 2nd issue in https://github.com/volcengine/verl/pull/3281#issuecomment-3239570745 Currently, if we use ulysses sp, we rely on `multi_modal_inputs` to check if it's a multi-modal model, but this can go wrong when we set `data.return_multi_modal_inputs=False`, as that field won't exist even if it's the VLM model. As a result, it would be a reliable way to check by seeing if `vision_config` field is in `self.actor_module.config` referring to `1985eb14ff/verl/workers/fsdp_workers.py (L317-L320)` ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-09-25 17:21:40 +08:00
Chi Zhang	90154aeeb6	[doc] fix: fix doc (#3614 ) ### What does this PR do? - Fix url ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-25 16:11:43 +08:00
mgilmore-relace	7731c5c6ec	[rollout] fix: remove code responsible for tool response duplication (#3604 ) ### What does this PR do? > The `_handle_processing_tools_state` added the same tool response twice when using interactions. See [here](`ba8555120a/verl/experimental/agent_loop/tool_agent_loop.py (L273)`) and [here](`ba8555120a/verl/experimental/agent_loop/tool_agent_loop.py (L297)`). ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-25 16:10:36 +08:00
Zhen	4d0999c161	[ci] chore: Use local dataset and models in e2e_ascend CI (#3601 ) ### What does this PR do? Use local dataset and models in e2e_ascend CI. Local datasets download by following commands: ```shell huggingface-cli download --repo-type dataset openai/gsm8k --local-dir ${HOME}/dataset/openai/gsm8k huggingface-cli download --repo-type dataset hiyouga/geometry3k --local-dir ${HOME}/dataset/hiyouga/geometry3k ``` Local models download by following commands: ```shell huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B-Instruct huggingface-cli download Qwen/Qwen2.5-VL-3B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct huggingface-cli download Qwen/Qwen3-0.6B --local-dir ${HOME}/models/Qwen/Qwen3-0.6B ``` ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Not related. ### API and Usage Example Not related. ### Design & Code Changes Not related. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-25 15:14:45 +08:00
Chi Zhang	3dfa28ae32	[doc] feat: add model engine doc (#3611 ) ### What does this PR do? - Add model engine doc ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-25 14:25:44 +08:00
Shirley Wu	25d78fa913	[recipe] feat: CollabLLM integration for multiturn training (#3574 ) ### What does this PR do? This PR add [CollabLLM](https://aka.ms/CollabLLM) as a training recipe. The added components include - A customized `CollabLLMRewardManager` inheriting from `AbstractRewardManager` to compute multiturn-aware rewards. - A customized `CollabLLMAgentLoop` inheriting from `AgentLoop` to sample future conversations with simulated users, which imports `CollabLLMInteraction` from `verl/interactions/collabllm_interation.py`. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. The training rewards when running `train_rl_collabllm.sh` is increasing in a relatively stable manner (on 8xH200): <img width="964" height="480" alt="9baeb0700e3fa6a56596e14a54bc1049" src="https://github.com/user-attachments/assets/53a810d8-1dd7-4145-bb28-4e475e9d7d9d" /> Validation reward: <img width="974" height="538" alt="39364fd10523b0fde13d48645809f5e3" src="https://github.com/user-attachments/assets/c34fe9e7-3d83-4132-8e1a-67e82c221d09" /> #### Samples of model generation After training, when user asks generic questions with missing information, the model learns to ask for clarification <img width="1213" height="562" alt="c8e0ab31948a48ca396c7eccddd13673" src="https://github.com/user-attachments/assets/ae41cd77-3c77-4402-b9d3-21993b046a18" /> and give suggestions: <img width="1534" height="190" alt="7adb7d33eb9120d337c2a249c6a2dd22" src="https://github.com/user-attachments/assets/84e1d8c1-f954-403f-b931-bce45cff1612" /> (In contrast, with the same prompt, GPT-5 doesn't ask for any clarification:) <img width="1754" height="1126" alt="be8d8577584c0b2356cb352d6f294205" src="https://github.com/user-attachments/assets/9b734848-9ed0-4496-af11-68bb8f8d8e08" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # No change on the existing APIs ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. Changes: - Main files under `recipe/collabllm` - Registered `CollabLLMRewardManager` in `workers/reward_manager/collabllm.py` - Added `CollabLLMInteraction` in `verl/interactions/collabllm_interation.py` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). Added to `verl/docs/algo/collabllm.md`. - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: The scripts `train_rl_collabllm.sh` and `train_sft_collabllm.sh` are tested multiple times. - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Chen Haiquan <chenhaiquan@bytedance.com>	2025-09-25 09:53:39 +08:00
A1waysBeenHere	ba8555120a	[trainer] fix: Import flash attn utils for Transformers higher than 4.55.0 (#3596 ) Import the index_first_axis, pad_input, unpad_input, etc in a different way to handle the case for Transformers version higher than v4.55.0 <img width="1372" height="58" alt="Screenshot 2025-09-24 at 2 44 30 PM" src="https://github.com/user-attachments/assets/fda7196b-2128-425b-ba15-9951fae39ee2" /> Since the modification of [PR40002](https://github.com/huggingface/transformers/pull/40002) in Transformers, `index_first_axis, pad_input, unpad_input` have been moved to the `transformers.modeling_falsh_attention_utils`. The original import way for NPU cannot handle it. ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: A1wayzBeenHere <moyicong@h-partners.com> Co-authored-by: Huazhong <hzji210@gmail.com>	2025-09-24 23:27:48 +08:00
Zhen	634bd9352b	[CI] chore: reopen ppo test in e2e_ascend CI (#3588 ) ### What does this PR do? Fix error and reopen ppo test case in `e2e_ascend` CI test. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Not related. ### API and Usage Example Not related. ### Design & Code Changes Not related. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-24 17:46:30 +08:00
EduardDurech	26a734e740	[algo, perf] feat: Vectorize RLOO Advantage Estimator - 20x Speedup (#3555 ) Vectorize RLOO advantage estimator 130ms -> 6ms Similar method can be done for other advantage estimators, I just don't have time Implements $$r_i - \frac{\sum_{j\ne i} r_j}{G-1} = \frac{(G-1)r_i - \sum_{j\ne i} r_j}{G-1} = \frac{G r_i - \sum_{j\in g} r_j}{G-1}$$ <img width="2199" height="628" alt="image" src="https://github.com/user-attachments/assets/339e5bd2-6949-4460-a297-34268ffc1764" />	2025-09-24 17:36:41 +08:00
Houmin Wei	69b0127b74	[misc] feat: prototype deprecate DataProto and replace with Tensordict: part 2 (#3567 ) ### What does this PR do? This PR continues the work started in PR #2733, it adds support for variable sequence lengths in MultiTurnSFTDataset by introducing a `no_padding` option for the pad_mode. When this mode is active, sequences are not padded to a fixed length. - Implement no-padding mode for FSDP engine using nested tensors in sft trainer - Add test for no-padding mode both enable/disable use_remove_padding - Fix FSDP2 gradnorm issue ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: zhangchi.usc1992 <zhangchi.usc1992@bytedance.com>	2025-09-24 17:12:31 +08:00
Chi Zhang	1985eb14ff	[megatron] fix: revert megatron actor refactor (#3553 ) ### What does this PR do? - Revert megatron actor changes in this PR that causes perf degradation: https://github.com/volcengine/verl/pull/3206 - We have to revert following PRs that modify the files too: https://github.com/volcengine/verl/pull/3513 and https://github.com/volcengine/verl/pull/3315 - We will add them back when we fix the problem ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-24 14:27:05 +08:00
Chi Zhang	2d362c490b	[misc] chore: Update CODEOWNERS (#3594 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-24 14:01:10 +08:00
OC	1b4af4440f	[doc] fix: add faq doc to avoid vllm issue 22103 (#3595 ) ### What does this PR do? Provide a workaround for [vllm issue 22103](https://github.com/vllm-project/vllm/issues/22103), which may result grad norm explosion. You may hit this issue when match below conditions: 1. Using non-Hopper architecture GPUs, such as A100, L20, B200, etc. 2. Using vLLM as the inference engine. 3. The input and output texts are very long, for example, in multi-turn scenarios using reasioning models like Qwen3 for RL training. <img width="405" height="278" alt="截屏2025-09-24 下午1 37 50" src="https://github.com/user-attachments/assets/47aec2e7-7c31-4cba-9f86-03af4f795457" /> The issue can be confirmed from comparing rollout_probs_diff_mean metrics: <img width="414" height="267" alt="截屏2025-09-24 下午1 39 24" src="https://github.com/user-attachments/assets/f9cd484e-552a-49a4-b2b3-abb9e311c759" /> The workaround is: `+actor_rollout_ref.rollout.engine_kwargs.vllm.disable_cascade_attn=True` ### Checklist Before Starting - [ x] Search for similar PRs. Paste at least one query link here: ... - [ x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test n/a ### API and Usage Example n/a ### Design & Code Changes n/a ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-24 13:47:36 +08:00
Yan Bai	f1d212c6ec	[megatron] feat: use flash as default attention_backend (#3578 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. 1. add mapping from string to megatron's enum for attention_backend choice. 2. use flash attention as default attention_backend, for consistency to FSDP	2025-09-24 10:40:25 +08:00
EduardDurech	aaa4cf590b	[sglang] fix: Support SGLang>=0.5.2 (#3526 ) `sglang.srt.managers.[tokenizer_manager->io_struct]` fixes refactor https://github.com/sgl-project/sglang/pull/10028, should be compatible >=0.4.1.post6 https://github.com/sgl-project/sglang/pull/2630 Can merge https://github.com/volcengine/verl/pull/3484	2025-09-23 20:12:17 +08:00
Chi Zhang	32575408a8	[ci] fix: fix e2e_sppo ci (#3587 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-23 19:41:40 +08:00
HaochenYuan	0603b7be1b	[megatron] fix: fix bug when holding empty parameters with custom pipeline layout (#3565 ) ### What does this PR do? Current code may cause runtime error when one module holds empty parameter. For example, when running DeepSeek 671B with megatron pipeline_model_parallel_layout="E\|(t\|)*61\|L", pp_rank 62 holds empty parameter and will crash when call the `offload_megatron_optimizer ` or `load_megatron_optimizer ` function. This PR fixes the bug. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-23 19:07:25 +08:00
kang sheng	5150686536	[misc] feat: remove redundant default params (#3577 ) ### What does this PR do? This PR introduces two changes: 1. Removal of redundant default parameters: Default optimizer values are already set in the .yaml configuration file. Defining them again in other files is redundant and can cause confusion for users. 2. Alignment of warm-up step logic: Changed the condition from `num_warmup_steps < 0` to `num_warmup_steps <= 0`. This aligns the code with the documentation in the YAML file and matches the implementation in Megatron. https://github.com/volcengine/verl/blob/main/verl/trainer/config/actor/actor.yaml#L132 --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com> Co-authored-by: Changlong Yu <changlong.ycl@gmail.com>	2025-09-23 18:49:56 +08:00
Zhen	5547dbe12b	[CI] chore: Update e2e_ascend CI config (#3532 ) ### What does this PR do? This PR is committed for solving 3 things: 1. All test cases in `e2e_ascend` CI pipeline use 8 NPUs by default, which prevents the machine's original performance from being fully utilized. This PR is committed for solving this problem. Thank @zheliuyu for finding this problem :) 2. Remove qwen3 grpo test case in `e2e_ascend.yml` because it is similar to qwen2.5 grpo. 3. Remove ppo test case in `e2e_ascend.yml` because it is not work since first commit #3502 , @xvxuopop is working for solving this. 4. Update e2e_ascend CI scan path for covering most of file modification case. 5. Ignore loading `libnuma.so` for Ascend NPU. 6. Fix qwen2_vl flash-attention related functions unavailable error for Ascend NPU. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Not related. ### API and Usage Example Not related. ### Design & Code Changes Not related. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-23 18:49:36 +08:00
Changlong Yu	50368ae291	[trainer] refactor: move rollout log to inheritable trainer (#3576 ) ### What does this PR do? move log rollout logic to one standalone function that can be re-used in other trainers such as `DAPORayTrainer` etc and avoid duplicated code. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-23 15:52:49 +08:00
Chi Zhang	4e1948d416	[ci] fix: fix more ci by pin transformers version (#3582 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-23 15:38:44 +08:00
Yu Jiaqi	96e7071de1	[trainer,rollout] fix: ensure LoRA weights are loaded when vllm_sleep_level=2 and without using layerd_summon (#3541 ) ### What does this PR do? Fix issue where VLLM would only load base model parameters and not LoRA parameters when VLLM_SLEEP_LEVEL == 2 and not using layered_summon. This fixes the LoRA trainer error where the first rollout would only use base model parameters, and subsequent rollouts would correctly load LoRA parameters. Fixes: https://github.com/volcengine/verl/issues/3516 Related PR: https://github.com/volcengine/verl/pull/3461 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-22 13:40:43 +08:00
vickytsang	7e4eec7467	[docker] feat: dockerfile rocm7 initial commit (#3547 ) ### What does this PR do? Dockerfile for rocm7.0 ### Checklist Before Starting - [x ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### API and Usage Example ```bash DOCKER_BUILDKIT=1 docker build -f Dockerfile.rocm -t verl-rocm7.0 . ```	2025-09-22 11:20:39 +08:00
Zilong Wang	fdbffe7e20	[recipe] fix: init self.model_config in fsdp worker of one-step-off policy (#3556 ) ### What does this PR do? Due to updated in the main package, the rollout worker calls `self.model_config` during `generate_sequences` (`d33c85e2c7/verl/workers/fsdp_workers.py (L869)`) which hasn't been initialized in current one-step-off recipe. This will through out runtime errors. Similar code in the default fsdp worker: `d33c85e2c7/verl/workers/fsdp_workers.py (L563)` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [...](https://github.com/volcengine/verl/pull/3531) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-22 11:16:22 +08:00
Nuo Chen	93dc6f5783	[recipe] fix: spin fsdp_workers.py bugs (#3544 ) Fix TypeError: cannot unpack non-iterable NoneType object ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-22 11:13:50 +08:00
Chi Zhang	d45b44103a	[ci] feat: update ci (#3552 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-22 11:06:23 +08:00
Yan Bai	bcd227598e	[megatron] chore: add a docker image for with mcore0.15 and TE2.7 (#3540 )	2025-09-22 10:59:33 +08:00
Chi Zhang	d33c85e2c7	[model] feat: support parameter generator for model engine (#3529 )	2025-09-19 23:20:59 +08:00
Nuo Chen	02b4cd3a85	[recipe] fix: Fix main_spin.py bugs (#3543 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-19 22:43:37 +08:00
Chi Zhang	4f7920e0ab	[ci] feat: fix more ci (#3537 )	2025-09-19 20:26:03 +08:00
Jiayi Yan	78915c47ed	[chore] fix typo (#3535 )	2025-09-19 17:41:03 +08:00
Yan Bai	bbdf819996	[Megatron] fix: compatible to mcore0.15 (#3534 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-19 17:18:55 +08:00
Chi Zhang	83205fdae0	[ci] feat: using local dataset to avoid network issue (#3533 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-19 16:21:55 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	2f6a5d6b00	[worker] fix: get all `multi_modal_inputs` keys with in a microbatch (#3315 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Address the first issue in https://github.com/volcengine/verl/pull/3281#issuecomment-3239570745 More work on top of https://github.com/volcengine/verl/pull/1999 Currently, the code gets the keys from the first row within the microbatch, This can go wrong if the dataset is a mixture of pure-text with multi-modal, where the first data in the microbatch is a pure-text one (no `pixel_values` or `image_grid_thw` exists in the key), and the microbatch still contains multi-modal data. This PR fixes this issue by collecting all available keys for `multi_modal_inputs` within the microbatch, and so that we can concatenate those multi-modal tensors together without ignoring some of them under the above situation. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-09-19 15:57:51 +08:00
Zhen	90648ae222	[doc] chore: Update owners for ascend_tutorial documents (#3528 ) ### What does this PR do? Update owners for ascend_tutorial documents ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Not related. ### API and Usage Example Not related. ### Design & Code Changes Not related. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-19 10:32:40 +08:00
Zhen	c7922f0297	[doc] chore: Update ascend quick start document (#3527 ) ### What does this PR do? Remove reward maes, loss mae, total time ratio and throughput information in ascend quick start document. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Not related. ### API and Usage Example Not related. ### Design & Code Changes Not related. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-19 10:21:49 +08:00
X. HU	558d4dd581	[doc] fix: Update Qwen3-30B-A3B info in ascend_quick_start.rst (#3514 ) ### What does this PR do? 1. Update the model name based on the training script to keep it consistent with the Hugging Face official website. https://github.com/volcengine/verl/pull/3189 2. Supplement the Qwen3-30B-A3B model info with actor.strategy as megatron according to https://github.com/volcengine/verl/pull/3203 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-19 09:57:42 +08:00
Yaowei Zheng	c0e2b9d249	[model] fix: qwen2vl for transformers 4.52.* (#3524 )	2025-09-19 06:11:15 +08:00
sharonyu-115	b6b34b2d30	[megatron] Add TIS support to megatron backend (#3513 ) ### What does this PR do? Add the TIS support from https://github.com/volcengine/verl/pull/2953 to megatron actor ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: Shuang Yu <shuangy@shuangy-mlt.client.nvidia.com>	2025-09-18 23:24:08 +08:00
Yaowei Zheng	0d4541f397	[model] fix: refactor qwen2vl patches & support no-image input for fsdp (#3496 ) ### What does this PR do? This PR tries to fix #3491 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Tested with [latest transformers](`6e50a8afb2`) <img width="2448" height="540" alt="image" src="https://github.com/user-attachments/assets/06d40f40-572c-4454-8e08-115857f61f21" /> <img width="2796" height="1394" alt="image" src="https://github.com/user-attachments/assets/17489b9c-e376-46e3-80d8-71106d304077" /> <img width="2098" height="744" alt="image" src="https://github.com/user-attachments/assets/8c7f736d-bf09-4ba9-9cf4-0d56e367c526" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes #### ⚠️ Breaking We adopt a new format for Qwen2VL's position ids: (4, batch size, seq len) Assuming a vision position ids (mrope) has a shape of (3, batch size, seq len) and a text position ids (normal rope) has a shape of (1, batch size, seq len), we concatenate both to obtain the final position ids. This aligns with the implementation in the Transformers >= 4.54.0 🤗 https://github.com/huggingface/transformers/blob/v4.54.0/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py#L1469 #### 🎤 New We have refactored the Qwen2VL and Qwen2.5VL patches, supporting no-image input for FSDP by introducing fake ViT inputs. We have also removed some redundant code for better maintainability. #### 🚨 Changes We move the ulysses logic into the attention function. So the position ids will be scattered before the language model part. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-18 10:10:30 +08:00
Chi Zhang	214d0f0a94	[data] feat: support customizable loss mask in multi-turn sft dataset (#3507 ) ### What does this PR do? - Support customized loss mask in multi-turn sft dataset - Previously, we set loss mask based on whether the role is "assistant" or not. This is limited if we only want to fit the last assistant message. To tackle this problem, we explicitly introduce a loss_mask in the dataset that can be optionally specified by the user. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-17 18:48:47 +08:00
xvxuopop	f4e2047074	[model, ci] feat: add qwen3-8b ppo script on ASCEND NPU (#3502 ) ### What does this PR do? add examples/ppo_trainer/run_qwen3-8b_npu.sh > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-17 18:48:24 +08:00
Zhen	ee8a7af8f4	[recipe] feat: Add qwen2.5-7b DAPO NPU example script (#3501 ) ### What does this PR do? #1858 support DAPO on Ascend NPU, but example `qwen2.5-7b-instruct` training script is not added, which will be added through this PR. The script in this PR is borrowed from https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/PyTorch/built-in/rl/VeRL_for_PyTorch/test/train_qwen2_5_7b_instruct_DAPO_full_16p.sh ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Not related. ### API and Usage Example Not related. ### Design & Code Changes Not relaetd. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-17 16:52:28 +08:00
Chi Zhang	0665153b9a	[training_utils] refactor: extract checkpoint handler into a separate file for reuse (#3505 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-17 15:24:58 +08:00
Chi Zhang	fa924a43c7	[model] fix: fix device (#3500 ) ### What does this PR do? - Move micro_batch to device in forward_step ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-17 12:29:42 +08:00
baymax591	04726dbf12	[ray, single_controller] refactor: Accelerate ray.put with thread (#3495 ) ### What does this PR do? For a data size of 6400x20480, the time of `ray.put` was reduced from 28.85s to 19.86s following this optimization, resulting in a ~45% improvement. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [pr2893](`3e2bceb1af (diff-32eb7ca0e11460f1eee309256c2fe7d571699b18cea314cbaef4d15f58b4f7b3)`) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-17 11:43:45 +08:00
baymax591	5c98ed1b31	[perf, megatron] fix: bugfix if nvml can not import (#3490 ) ### What does this PR do? If the `import pynvml` fails, the `initialized` variable will not be defined, and accessing it in the finally block will cause an error. ``` File "/tmp/ray/session_2025-09-16_14-38-49_113222_2998/runtime_resources/working_dir_files/_ray_pkg_06ff080dac9922d6/verl/utils/distributed.py", line 46, in set_numa_affinity if initialized: UnboundLocalError: local variable 'initialized' referenced before assignment ``` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [pr3471](https://github.com/volcengine/verl/pull/3471) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-16 15:15:09 +08:00
Brilliant Hanabi	cf5263e82b	[perf] fix: Init some attrs earlier in Profiler (#3482 ) If Profiler init process return with config.enable == False before initialize self.prof, you will get `AttributeError: 'Profiler' object has no attribute 'prof'` when use Profiler.check (called by other funcs such as `Profiler.start`). For the same reasons, self.saved should also be initialized earlier. ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-16 13:04:46 +08:00
Joel	fd8ae66726	[1/N][rollout] feat: support vllm/sglang native http server (#3456 ) ### What does this PR do? This is the first part to support vllm/sglang native http server in server mode rollout. In native http server mode, the inference services are launched separately from the training engine, and the model runner share GPU with training engine but in different processes. We're going to support three deployment modes: - hybrid mode: Training engine and model runner share GPU but in different process. To sync weights, there's a server adapter in training process, which is a http client to send wake_up/sleep/update_weights request to inference server. This is used for on-policy training. - standalone mode: Training engine and inference services have separate GPU resource, disaggregated architecture. This is used for off-policy training. - colocated mode: Like hybrid mode, but without server adapter since no need to sync weights. This is mainly used for GRM service (LLM as a judge). <img width="2644" height="1276" alt="image" src="https://github.com/user-attachments/assets/2c1adf2d-adb5-4563-8a1a-8948f93b09b7" /> Following PR will be: - [2/N] support DP+EP - [3/N] standalone rollout with weight transfer by NCCL/UCX - [4/N] colocated GRM service with wake_up/sleep(without weight synchronization) - [5/N] switch to `/generate` http api with token-in-token-out: currently sglang has `/generate` api but may need some effort to support multi-modal; while vllm still lack `/generate` api - [6/N] switch to sglang/vllm router with better kv-cache awareness load balance The native http server is inspired by the design of [slime](https://github.com/THUDM/slime), thanks to their prior work. Also credit to @ChangyiYang @zhaochenyang20 https://github.com/volcengine/verl/pull/3090 @SuperCB https://github.com/volcengine/verl/pull/3102 with their prior contribution.	2025-09-16 10:41:17 +08:00
baymax591	ac2f790f56	[ray] refactor: Accelerate Tensor serialization by converting to np.ndarray (#3425 ) ### What does this PR do? For a data size of 6400x20480, the average serialization duration was reduced from 3.32s to 1.32s following this optimization, resulting in a ~151% improvement. ``` # tensor average serialize：2.58s deserialize：0.74s total：3.32s TaskRunner pid=1904793) baymax debug serialize time=2.5947s (TaskRunner pid=1904793) baymax debug serialize time=2.593357s (TaskRunner pid=1904793) baymax debug serialize time=2.580081s (TaskRunner pid=1904793) baymax debug serialize time=2.582321s (WorkerDict pid=1905183) baymax debug deserialize time=0.475745s (WorkerDict pid=1905184) baymax debug deserialize time=0.538223s (WorkerDict pid=1905181) baymax debug deserialize time=0.609146s (WorkerDict pid=1905182) baymax debug deserialize time=0.61064s (WorkerDict pid=1905189) baymax debug deserialize time=0.597746s (WorkerDict pid=1905185) baymax debug deserialize time=0.530353s (WorkerDict pid=1905180) baymax debug deserialize time=0.811555s (WorkerDict pid=1905194) baymax debug deserialize time=0.513646s (WorkerDict pid=1905193) baymax debug deserialize time=0.962868s (WorkerDict pid=1905179) baymax debug deserialize time=0.929226s (WorkerDict pid=1905186) baymax debug deserialize time=0.701976s (WorkerDict pid=1905191) baymax debug deserialize time=0.867236s (WorkerDict pid=1905192) baymax debug deserialize time=0.858472s (WorkerDict pid=1905187) baymax debug deserialize time=1.045251s (WorkerDict pid=1905188) baymax debug deserialize time=0.960867s (WorkerDict pid=1905190) baymax debug deserialize time=1.010673s # numpy average serialize：0.000617s deserialize：1.32s total：1.32s [36m(TaskRunner pid=1729638)[0m baymax debug serialize time=0.00016s [36m(TaskRunner pid=1729638)[0m baymax debug serialize time=0.000117s [36m(TaskRunner pid=1729638)[0m baymax debug serialize time=0.000158s [36m(TaskRunner pid=1729638)[0m baymax debug serialize time=0.000182s [36m(WorkerDict pid=1730035)[0m baymax debug deserialize time=0.867232s [36m(WorkerDict pid=1730036)[0m baymax debug deserialize time=0.97372s [36m(WorkerDict pid=1730028)[0m baymax debug deserialize time=1.08627s [36m(WorkerDict pid=1730034)[0m baymax debug deserialize time=1.187599s [36m(WorkerDict pid=1730037)[0m baymax debug deserialize time=1.165926s [36m(WorkerDict pid=1730025)[0m baymax debug deserialize time=1.281101s [36m(WorkerDict pid=1730029)[0m baymax debug deserialize time=1.359834s [36m(WorkerDict pid=1730027)[0m baymax debug deserialize time=1.281978s [36m(WorkerDict pid=1730030)[0m baymax debug deserialize time=1.329298s [36m(WorkerDict pid=1730026)[0m baymax debug deserialize time=1.475415s [36m(WorkerDict pid=1730031)[0m baymax debug deserialize time=1.422345s [36m(WorkerDict pid=1730033)[0m baymax debug deserialize time=1.378894s [36m(WorkerDict pid=1730039)[0m baymax debug deserialize time=1.368721s [36m(WorkerDict pid=1730040)[0m baymax debug deserialize time=1.601587s [36m(WorkerDict pid=1730042)[0m baymax debug deserialize time=1.768378s [36m(WorkerDict pid=1730038)[0m baymax debug deserialize time=1.765994s ``` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Huazhong <hzji210@gmail.com>	2025-09-16 09:33:28 +08:00
Simiao Zhang	8ecf123736	[perf, megatron] chore: bind NUMA (#3471 ) ### What does this PR do? Improve the data transfer efficiency between the CPU and GPU (H2D, D2H), prepare for the offload feature. > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pull/3401 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### API and Usage Example set numa affinity ```python set_numa_affinity() ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-16 09:30:26 +08:00
Joel	a7b8675f96	[rollout] fix: make agent loop reward worker thread-safe (#3454 ) ### What does this PR do? Fixed https://github.com/volcengine/verl/issues/3407	2025-09-15 14:43:52 +08:00
X. HU	44b919e5fe	[ci] chore: add codeowner (#3473 ) ### What does this PR do? add codeowner in npu folder /recipe/dapo and /examples/grpo_trainer: model scripts of npu /verl/models/transformers: npu_patch for transformers ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-15 12:33:16 +08:00
X. HU	2061894891	[model] feat: add qwen3 grpo 8b/32b script on ASCEND NPU (#3310 ) ### What does this PR do? add examples/grpo_trainer/run_qwen3_32b_npu.sh <img width="1014" height="1111" alt="image" src="https://github.com/user-attachments/assets/8cd59fc2-5f6a-419e-87ac-bf35a71856fb" /> add examples/grpo_trainer/run_qwen3_8b_npu.sh <img width="844" height="930" alt="image" src="https://github.com/user-attachments/assets/5c23c7a4-8729-4007-8828-027a8cda4779" /> > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... > already support in https://github.com/volcengine/verl/pull/3300 - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Signed-off-by: ZLiao <a627465478@gmail.com> Co-authored-by: ZLiao <a627465478@gmail.com>	2025-09-15 10:13:01 +08:00
Nan Jiang	65170f918b	[sglang, rollout] feat: enable token-in-token-out for SGLang engine (#2759 ) ### What does this PR do? This PR enables token-in-token-out functionality for the SGLang engine, improving performance by avoiding unnecessary tokenization/detokenization steps during rollout. The engine can now work directly with token IDs, and the rollout system passes pre-computed token IDs to avoid recomputation. ### Checklist Before Starting - [x] Search for similar PRs. No similar PRs found for SGLang token-in-token-out functionality. - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test This change maintains backward compatibility and does not require additional testing beyond existing CI. The functionality is tested through existing rollout tests. ### API and Usage Example No API changes are introduced. The enhancement is internal to the SGLang rollout implementation and transparent to users. ```python # Usage remains the same - no changes to user-facing APIs rollout = SGLangRollout(config) results = await rollout.generate(...) ``` ### Design & Code Changes High-level design: - Enable SGLang engine to skip tokenizer initialization by default (`skip_tokenizer_init=True`) - Modify rollout system to extract and pass token IDs directly from engine output - Update message handling to accept pre-computed token IDs Specific changes: 1. `verl/workers/rollout/schemas.py`: - Add optional `content_ids` parameter to `add_assistant_message()` method - Only compute token IDs if not provided, avoiding redundant tokenization 2. `verl/workers/rollout/sglang_rollout/sglang_rollout.py`: - Set `skip_tokenizer_init=True` by default for token-in-token-out mode - Extract `content` or `content_ids` from engine output - Pass `content_ids` to all `add_assistant_message()` calls ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: This change is internal optimization that maintains existing behavior and is covered by existing tests. - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-13 22:21:37 -07:00
Lingfeng Wang	b00b090149	[megatron,recipe] feat: support Qwen3-30B (MoE) DAPO training on ASCEND NPU (#3203 ) ### What does this PR do? Fix of megatron config, and example shell of Qwen3-30B-Dapo with megatron. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test critic/reward/mean: <img width="1304" height="704" alt="dapo_30b_megatron" src="https://github.com/user-attachments/assets/f2062e24-b37d-4d54-8dd6-e9da25f8c69b" /> response_length/mean: <img width="815" height="407" alt="image" src="https://github.com/user-attachments/assets/f59b6c7b-4f24-4aa7-9b9e-bb8184dac5d3" /> ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-13 19:08:23 +08:00
Chi Zhang	6e6fafdc74	[model] feat: add FSDP/Megatron critic worker with model engine (#3439 ) ### What does this PR do? - As title - Add a test to compare the output of FSDP/Megatron engine with huggingface model ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-13 12:18:58 +08:00
EduardDurech	3c9b884ecd	[model] feat: Add `Apertus` (#3295 ) Pre-release of Apertus from the Swiss AI Initiative Main modifications from Llama - xIELU Activation - QK-norm Associated Transformers PR https://github.com/huggingface/transformers/pull/39381 Associated vLLM PR https://github.com/vllm-project/vllm/pull/23068 Associated SGLang PR https://github.com/sgl-project/sglang/pull/9774 GSM8K <img width="430" height="262" alt="image" src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed" /> <img width="436" height="266" alt="image" src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08" />	2025-09-13 10:03:58 +08:00
kang sheng	b8c6d132a8	[trainer,rollout] fix: model weights will not be loaded when vllm_sleep_level=2 and using lora (#3461 ) Fix: https://github.com/volcengine/verl/issues/3159, https://github.com/volcengine/verl/issues/3437 The default value of `VLLM_SLEEP_LEVEL` was changed to 2 in PR: https://github.com/volcengine/verl/pull/3019. However, in the previous code, when using LoRA, the worker would only load LoRA weights when calling `wake_up`. This does not cause any issues when `VLLM_SLEEP_LEVEL=1`, since in this mode the base model's weights are moved to the CPU. However, when `VLLM_SLEEP_LEVEL=2`, the weights are completely destroyed. Therefore, we need to sync the weights from the actor every time. Typically, users run LoRA training when they are short on resources. Therefore, this PR does not forcibly set `VLLM_SLEEP_LEVEL=1` when using LoRA. On the contrary, it aims to save CPU memory whenever possible. The basic vLLM rollout is currently skipped: `33edd95e13/tests/workers/rollout/rollout_vllm/test_vllm_spmd.py (L71-L72)`. Thus, no unittest is included in this PR. I will fix the skipped vLLM rollout and propose a follow-up PR to test LoRA vLLM inference in CI. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-12 19:29:25 +08:00
Huazhong	b86cd96eb7	[trainer, fsdp, megatron] feat: Support one step off async rl on Ascend NPU (#2924 ) ### What does this PR do? Since Ray's collective communication interface does not support the hccl backend, we refer to the [example code](https://docs.vllm.ai/en/latest/examples/offline_inference/rlhf.html) of vLLM and complete the weight synchronization between actor and rollout. This PR mainly introduces two changes: 1. Use `StatelessProcessGroup` and `PyNcclCommunicator` instead of ray's `create_collective_group` to create weight synchronization communication groups. 2. Use the `ray.get_runtime_context().get_accelerator_ids` API instead of the environment variable `RAY_LOCAL_RANK` to set device in scenarios where is_ray_noset_visible_devices is true, so as to fix the issue at https://github.com/volcengine/verl/issues/2971. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-12 19:18:36 +08:00
Nan Jiang	638856c986	[sglang, tool] fix: fix text only bug (#3448 ) ### What does this PR do? When the model is text only, we should not do `{"type":"text", "text": "XXX"}`, should just add the text. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-12 19:04:34 +08:00
Chi Zhang	b03866768f	[ci] feat: move more tests to volcano engine (#3455 )	2025-09-12 18:54:55 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	33edd95e13	[worker] fix: respect free_cache_engine flag (#3442 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Continuation of #1464 Now, recent changes have broken the `free_cache_engine` option again. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. Unit test cases might not be feasible as the `sleep`/`wake_up` call can happen anywhere in the codebase. An end-to-end test might be resource-consuming. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-09-11 22:49:55 +08:00
Puneesh Khanna	e160d3b2e0	[trainer] fix: Loss calculations for grad accumulation steps (#3332 ) ### What does this PR do? For gradient accumulation steps over micro batches, loss should be normalised before calling loss.backwards(). Also add an optimisation so that the all reduce of gradients is only performed in the last accumulation step. Verified fine-tuning on few open source models with the changes in this PR. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-11 22:46:18 +08:00
maijia-cwh	9bbe745f80	[trainer] feat: VL support freeze vision model (#3178 ) ### What does this PR do? vl model support freeze vision model issue: [2526](https://github.com/volcengine/verl/issues/2526) > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. qwen2_vl_7b_function_rm_1756093906 is vision freeze mode <img width="4374" height="2086" alt="image" src="https://github.com/user-attachments/assets/107772e4-039d-4ec5-b193-54688f4a7176" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Mighten Dai <mighten@outlook.com>	2025-09-11 18:17:21 +08:00
Yuyang Ding	f6b09acef4	[worker, sglang] feat: support generative reward model (server mode) (#3441 ) ### What does this PR do? Following https://github.com/volcengine/verl/pull/3352, current implementation of the reward model has supported both discriminative and generative models. For newly supported generative models, users should specify a customized data processor to (1) convert rollout to genrm (including question, response, and optional ground truth) chat template, and (2) convert genrm responses to final reward scores. This args can be passed as `reward_config.data_processor_config.{path/preprocess_fn_name/postprocess_fn_name}`, respectively. The demo implementation can be seen in `tests/workers/reward_model/process_fn.py`. Specific usage of server mode RMs can be checked in `tests/workers/reward_model/test_discriminative_reward_model.py` and `tests/workers/reward_model/test_generative_reward_model.py`, respectively. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pull/2845 (fsdp/megatron mode) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-11 14:05:02 +08:00
OC	4c7812c40b	[doc] fix: table column in document (#3430 ) ### What does this PR do? Added a missing column in Qwen3-30B-A3B MOE part. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test n/a ### API and Usage Example n/a ### Design & Code Changes n/a ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-10 13:14:25 +08:00
Jiarui Fang（方佳瑞）	e48ccf9b97	[doc] feat: add SimpleVLA-RL link in readme (#3433 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: ByteDance <wangjerry@bytedance.com> Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>	2025-09-10 13:13:48 +08:00
Yuyang Ding	b5a5e88fb3	[worker] refactor: move the implementation of rm to workers.roles and polish (#3423 )	2025-09-10 05:38:02 +08:00
X. HU	dfa3933ac4	[tool] feat: support local gsm8k dataset in example/data_preprocess (#3362 )	2025-09-09 22:29:56 +08:00
Chi Zhang	5c46f4f437	[model] feat: replace DataProto with TensorDict in engine (#3422 )	2025-09-09 22:28:25 +08:00
Changlong Yu	a4d8952edc	[fsdp, recipe] feat: add grpo reward model example using HH-RLHF dataset (#3417 ) ### What does this PR do? One example of using SOTA BT reward model to train GRPO model - Reward Model: [Skywork/Skywork-Reward-V2-Llama-3.1-8B](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B) - Dataset: [Dahoas/full-hh-rlhf](https://huggingface.co/datasets/Dahoas/full-hh-rlhf) ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. - Wandb training curve: <img width="2004" height="614" alt="image" src="https://github.com/user-attachments/assets/c6dc9003-7b59-43af-8ff4-560114fe5b10" /> - AlpacaEval 2.0 eval results: \| Model Name \| AlpacaEval LC Win-rate \| Win-rate \|:------\|:-------:\|:-------:\| \| mistralai/Mistral-Nemo-Instruct-2407 \| 42.24 \| 38.68 \| \| mistral12b_skyworkllama8b_grpo_hhrlhf \| 68.20 \| 68.29 \| ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-09 17:11:08 +08:00
Chi Zhang	c4f4caf0cd	[misc] feat: prototype deprecate DataProto and replace with Tensordict: part 1 (#2733 ) ### What does this PR do? - Add TensorDict utilities and tests to cover the current DataProto functionalities. - Add nested tensor example to remove padding throughout the system - Add image example - Upgrade tensordict to v0.10 ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-09 14:47:32 +08:00
Brilliant Hanabi	eaf20fff88	[recipe] fix: Add gts argument for recipe _dump_generations (#3348 ) ### What does this PR do? PR https://github.com/volcengine/verl/pull/2353 forgot to update all `_dump_generations` in recipe codes > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-09 13:11:40 +08:00
kang sheng	662fae30e6	[rollout] fix: raise error if processing multimodal data without vlm processor (#3370 ) Fix https://github.com/volcengine/verl/issues/3234	2025-09-09 13:10:48 +08:00
Lingfeng Wang	c410364ebf	[rollout] chore: Add enable_prefix_caching into config (#3395 ) ### What does this PR do? Added enable_prefix_caching to the RolloutConfig. This feature provides no significant performance benefit in short-input-long-output scenarios (e.g., 2k input to 34k output) and occupies a certain amount of device memory. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### API and Usage Example to disable enable_prefix_caching, using ```bash actor_rollout_ref.rollout.enable_prefix_caching=False ``` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-09 11:03:04 +08:00
Roseisrosie	eada037455	[vllm] fix: use VLLM_SLEEP_LEVEL=1 on ASCEND NPU (#3355 )	2025-09-09 10:07:53 +08:00
Yuyang Ding	7430285068	[ci] refactor: add ci test for refactored reward worker and add some args to GenRM config (#3385 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. - add ci test for new reward model (accuracy check for the results of server mode rm and hf rm) - add some args for genrm (e.g., reward_type, sampling parameters) ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-09 09:59:44 +08:00
Michael	ce037bd8cc	[doc] fix: edit one step off policy readme with original work (#3414 ) ### What does this PR do? Updates the readme for the one-step off-policy async trainer with the original reference paper I introduced async RL training for LLMs in my paper https://arxiv.org/abs/2410.18252. My method is exactly the one step off-policy async replicated here and was on arxiv 7 months before AReal and published at ICLR. AReal's method is different (fully async replay buffer) but it includes a nice graphic of my setup! Love to see you're getting similar speedups to my results and this is a great recipe! Figure 2 from my paper <img width="1040" height="611" alt="image" src="https://github.com/user-attachments/assets/0812fd40-daae-4346-bb72-85bc526bd3fa" /> ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pull/2591 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-09 09:26:47 +08:00
Shaopeng Fu	491e636a8a	[trainer] fix: avoid loading duplicated custom reward function to fix issue #3150 (#3404 )	2025-09-09 06:57:55 +08:00
Chi Zhang	62549582a7	[model] feat: polish megatron engine (#3401 ) ### What does this PR do? - Provide best prepare_dynamic_batch parameters to fsdp and megatron engine - Reuse `prepare_micro_batches` in Megatron engine ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-08 19:42:43 +08:00
Chi Zhang	a1542172d5	[model] refactor: polishing FSDP model engine (#3394 ) ### What does this PR do? - Extract a separate prepare_micro_batches - Fix prepare_dynamic_batch - Make `forward_step` more modular ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-08 14:42:53 +08:00
Blue Space	21dee53e85	[ci] fix: cpu unit test, etp config breaking change (#3390 ) ### What does this PR do? [ci] fix: cpu unit test ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-08 13:30:43 +08:00
Blue Space	6159dee4e9	[model, megatron] feat: Add glm air support and make new model directly use mbridge (#3359 ) ### What does this PR do? [model, megatron] feat: Add glm air support and make new model directly use mbridge. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-08 09:48:50 +08:00
Tialo	d26a913f43	[trainer] fix: Fix ClearML logging (#3384 ) ### What does this PR do? fix typo and use `close` because `mark_completed` closes the program > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-07 22:07:47 +08:00
HaochenYuan	3a89785f9a	[deployment] Fix deepseek671B grpo script (#3383 ) ### What does this PR do? The current script is not actual grpo script. This PR adds the missing parameters. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-07 21:30:30 +08:00
Chi Zhang	c3f63ebe9c	[misc] fix: set default value of ETP to 1 (#3371 ) ### What does this PR do? As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-07 20:00:44 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	f346f96d29	[training_utils] fix: stop using `math` naming under reward score" (#3378 )	2025-09-07 09:57:24 +08:00
Yuyang Ding	cb01f10ba0	[worker,sglang] refactor: deprecate fsdp/megatron reward model with server mode (#3352 )	2025-09-06 23:45:41 +08:00
Chi Zhang	7bc70bbf0b	[trainer] feat: add CI for accuracy alignment of SFT trainer with model engine (#3363 ) ### What does this PR do? - Add CI for SFT trainer with various fsdp and megatron configurations and make sure their output matches ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-06 10:48:28 +08:00
Alex Kim	10054da277	[doc] fix: fix typo in skypilot_examples.rst (#3368 )	2025-09-06 07:41:51 +08:00
Mighten Dai	0b533f7bcf	[rollout, vllm, sglang] fix: allow user customization of `repetition_penalty` to avoid watchdog timeout during GRPO rollout (#3309 ) Allow user customization of `repetition_penalty` to avoid watchdog timeout during GRPO rollout ### What does this PR do? This PR adds an interface for users to specify `repetition_penalty`, which helps avoid repetition in LLM generation and prevents watchdog timeouts during GRPO rollout. If not specified, `repetition_penalty` will remain at its default value of `1.0`. ### Checklist Before Starting - [X] Search for similar PRs. No similar PRs found. - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test This PR can be vetted by existing CI test cases. ### API and Usage Example Previously, users could not specify `repetition_penalty`, but this PR adds support for it. For example, users can now start GRPO training with a command like: ```bash python -m verl.trainer.main_ppo \ +actor_rollout_ref.rollout.repetition_penalty=1.05 \ # other params here... ``` ### Design & Code Changes This PR adds an interface allowing users to specify the `repetition_penalty` (e.g., `1.05`), while maintaining backward compatibility with the default value of `1.0`. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-05 12:43:17 +08:00
baymax591	0e25bad451	[vllm] fix: verl + vllm-ascend(version 0.9.1) running failed issue (#3345 ) ### What does this PR do? After [pr#3285](`19020f6188`), [issue 2564](https://github.com/volcengine/verl/issues/2564) began to reappear. Following the modification of [pr#2782](https://github.com/volcengine/verl/pull/2782), [issue 2564](https://github.com/volcengine/verl/issues/2564) was solved. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [pr#2782](https://github.com/volcengine/verl/pull/2782) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-05 09:52:08 +08:00
OC	e90f18c40a	[model] feat: support ByteDance Seed-OSS 36B model (#3347 ) ### What does this PR do? support ByteDance Seed-OSS 36B model: 1. add RL and SFT example 2. support mfu metrics Requirement: pip install transformers>=4.56.0 Notes: vllm v0.10.0 does not support Seed-OSS, but can fail back to transformers to get it working. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test (TaskRunner pid=373084) step:2 - global_seqlen/min:6260 - global_seqlen/max:11318 - global_seqlen/minmax_diff:5058 - global_seqlen/balanced_min:8466 - global_seqlen/balanced_max:8468 - global_seqlen/mean:8467.375 - actor/entropy:0.47251570224761963 - actor/kl_loss:0.03297248564194888 - actor/kl_coef:0.001 - actor/pg_loss:-0.0494408356025815 - actor/pg_clipfrac:0.019900403218343854 - actor/ppo_kl:0.020935473148711026 - actor/pg_clipfrac_lower:9.349289757665247e-05 - actor/grad_norm:0.47875913605093956 - perf/mfu/actor:0.2823303751694612 - perf/max_memory_allocated_gb:134.74115753173828 - perf/max_memory_reserved_gb:141.615234375 - perf/cpu_memory_used_gb:150.75712203979492 - actor/lr:1e-06 - training/global_step:2 - training/epoch:0 - critic/score/mean:0.3515625 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.3515625 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.023741308599710464 - critic/advantages/max:0.7071057558059692 - critic/advantages/min:-0.7071057558059692 - critic/returns/mean:-0.023741308599710464 - critic/returns/max:0.7071057558059692 - critic/returns/min:-0.7071057558059692 - response_length/mean:444.4296875 - response_length/max:1024.0 - response_length/min:50.0 - response_length/clip_ratio:0.140625 - response_length_non_aborted/mean:444.4296875 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:50.0 - response_length_non_aborted/clip_ratio:0.140625 - response/aborted_ratio:0.0 - prompt_length/mean:84.78125 - prompt_length/max:141.0 - prompt_length/min:54.0 - prompt_length/clip_ratio:0.0 - timing_s/start_profile:6.250300793908536e-05 - timing_s/generate_sequences:21.979598999023438 - timing_s/generation_timing/max:22.295286178588867 - timing_s/generation_timing/min:21.753456115722656 - timing_s/generation_timing/topk_ratio:0.125 - timing_s/gen:39.58543623800506 - timing_s/reward:0.031087818002561107 - timing_s/old_log_prob:17.46088112698635 - timing_s/ref:5.804751824995037 - timing_s/adv:0.003937039989978075 - timing_s/update_actor:57.383965655986685 - timing_s/step:120.27422251200187 - timing_s/stop_profile:6.923600449226797e-05 - timing_per_token_ms/gen:0.6958608511260053 - timing_per_token_ms/ref:0.08569290696637147 - timing_per_token_ms/adv:5.8120727940744256e-05 - timing_per_token_ms/update_actor:0.8471333449857052 - perf/total_num_tokens:67739 - perf/time_per_step:120.27422251200187 - perf/throughput:70.40057980133741 ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-04 22:41:58 +08:00
Chi Zhang	72e88ecd79	[trainer] feat: support sft_trainer with model engine (#3341 ) ### What does this PR do? - support sft_trainer with model engine - fix engine interface to handle missing data from non-pp - add gsm8k multi-turn dataset - add left-right padding to MultiTurnDataset so that the data format of SFT matches with RL - add sft e2e runnable tests with fsdp and megatron backend ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-04 19:40:41 +08:00
Chi Zhang	90acc8abc1	[doc] fix: Update skypilot_examples.rst (#3344 ) ### What does this PR do? As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-04 19:01:59 +08:00
Alex Kim	f356fc1e56	[deployment, doc] feat: Add SkyPilot integration examples (#3333 ) ### What does this PR do? Adds SkyPilot integration examples for running verl training jobs on Kubernetes/cloud platforms with GPUs. Includes configurations for PPO, GRPO, and multi-turn tool usage training. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+skypilot - [x] Format the PR title as `[{modules}] {type}: {description}` ### Test Validated SkyPilot YAML configurations for Ray cluster initialization, dataset downloading, and distributed training setup with H100 GPUs. ### API and Usage Example ```bash # Launch PPO training on 2 nodes sky launch -c verl-ppo examples/skypilot/verl-ppo.yaml --secret WANDB_API_KEY -y # Launch GRPO training sky launch -c verl-grpo examples/skypilot/verl-grpo.yaml --secret WANDB_API_KEY -y # Launch multi-turn tool usage training sky launch -c verl-multiturn examples/skypilot/verl-multiturn-tools.yaml --secret WANDB_API_KEY --secret HF_TOKEN -y ``` Design & Code Changes - Added 3 SkyPilot YAML configurations for PPO, GRPO, and multi-turn training - Added `examples/skypilot/README.md` with setup guide - Added `docs/examples/skypilot_examples.rst` documentation - Updated `docs/index.rst` and `docs/start/multinode.rst` with references ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-04 16:56:00 +08:00
alexchiu	4d45c12408	[recipe] fix: (dapo_ray_trainer) use global_steps to determine is_last_step when resuming (gen_steps not restored) (#3336 ) ### What does this PR do? - When resuming from a checkpoint, gen_steps is not correctly restored, causing is_last_step to be misdetected. - Switch is_last_step logic from gen_steps to self.global_steps to remove the dependency on gen_steps. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-04 11:30:40 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	a8238d4745	[training_utils] fix: Using a non-tuple sequence for multidimensional indexing is deprecated (#3314 )	2025-09-03 20:49:25 +08:00
baymax591	47a483b620	[recipe] fix: bugfix of refactor omissions (#3328 )	2025-09-03 20:48:30 +08:00
The-Hierophant	9ccaabf5ef	[doc]Update README.md, add related works (#3331 )	2025-09-03 20:45:54 +08:00
kang sheng	bc7c86398c	[misc] feat: create issue template for verl (#3330 )	2025-09-03 20:45:20 +08:00
Chi Zhang	d7a0469977	[model] feat: polish model engine (#3321 )	2025-09-03 20:44:39 +08:00
Geaming	1f533d65e2	[doc] feat: Adding PACS to the Awesome work (#3327 )	2025-09-03 19:35:07 +08:00
Cheng	2d6c6dbb39	[trainer] fix: Correct off-by-one error in SFT loss mask slicing (#3287 ) ### What does this PR do? This PR fixes the SFT loss mask, which always masked the first generated token and would lead to the SFTed model behaving as generating the wrong first token. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... No similar PRs found - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test N/A for automated unit/integration tests. Manually verified the fix with an overfitting experiment described below, as this logic bug is best demonstrated through training behavior rather than a simple unit test. ### API and Usage Example Don't affect the current veRL SFT training usages. ### Design & Code Changes before ```python loss_mask = batch.pop("loss_mask")[:, :-1].reshape(-1).to(self.device_name) ``` now ```python loss_mask = batch.pop("loss_mask")[:, 1:].reshape(-1).to(self.device_name) ``` ### Overfitting Experiments We did a one-example overfitting SFT experiment using `qwen2.5-1.5b-base` for 5 epochs to test the necessity and functionality of this change. The training example is from reasoning data. ``` Input: def reverse_string(s: str) -> str:\n """\n Returns the reverse of the input string.\n >>> reverse_string("hello") "olleh" [...omitted input] Output: <think>Okay, I need to write a Python function called reverse_string that takes a string s and returns its reverse. Let\'s see. How do I reverse a string in Python? [...omitted output] ``` In the expected case of model inference, the SFTed model would easily output `<th`, i.e., the first token of `<think>` per the Qwen tokenizer, as the first token. Before fix In the model inference, the top 10 first token probabilities are: ``` --- Top 10 Next Token Predictions --- 1. Token: 'Hmm' (ID: 80022) - Probability: 0.4090 2. Token: 'Let' (ID: 10061) - Probability: 0.0492 3. Token: 'The' (ID: 785) - Probability: 0.0473 4. Token: 'This' (ID: 1986) - Probability: 0.0373 5. Token: 'Okay' (ID: 32313) - Probability: 0.0362 6. Token: 'def' (ID: 750) - Probability: 0.0228 7. Token: 'So' (ID: 4416) - Probability: 0.0226 8. Token: 'We' (ID: 1654) - Probability: 0.0215 9. Token: '```' (ID: 73594) - Probability: 0.0153 10. Token: 'Oh' (ID: 11908) - Probability: 0.0118 ------------------------------------- Probability for token '<th' (ID: 13708): 9.012579539557919e-06 ``` However, the top 1 should be `<th` while it gets very low prob. After fix The top 10 first token probabilities are ``` --- Top 10 Next Token Predictions --- 1. Token: '<th' (ID: 13708) - Probability: 1.0000 2. Token: '<' (ID: 27) - Probability: 0.0000 3. Token: '>' (ID: 29) - Probability: 0.0000 4. Token: 'def' (ID: 750) - Probability: 0.0000 5. Token: 'think' (ID: 26865) - Probability: 0.0000 6. Token: '<pre' (ID: 10120) - Probability: 0.0000 7. Token: '-th' (ID: 7563) - Probability: 0.0000 8. Token: '<td' (ID: 6868) - Probability: 0.0000 9. Token: '(th' (ID: 24365) - Probability: 0.0000 10. Token: '<thead' (ID: 58167) - Probability: 0.0000 ------------------------------------- Probability for token '<th' (ID: 13708): 0.9999651908874512 ``` which is expected. The following are some selected indicative token logits during the overfitting training after fix (below the `<\|endoftext\|>` is a padding token): <img width="2388" height="1386" alt="image" src="https://github.com/user-attachments/assets/1c7149e7-6738-40ec-8164-f1ca614c1036" /> In summary, the previous SFT loss mask mistakenly shifted one bit, so the model failed to learn the first generated token. The trained model behaves like adding one undesired noisy token after the input question, as shown in the top 10 first token probabilities. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-03 14:28:47 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	02e06fa2e5	[trainer] fix: `ray.state.available_resources_per_node` is deprecated (#3313 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Get rid of the following warning: ```log DeprecationWarning: `ray.state.available_resources_per_node` is a private attribute and access will be removed in a future Ray version. ``` Getting available resource per node becomes a DeveloperAPI starting from ray v2.10.0, so it should be pretty safe to make this change: `04c7b49a91` ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-09-03 14:23:46 +08:00
Joel	19020f6188	[rollout] feat: deprecate all rollout sharding manager (#3285 ) ### What does this PR do? Deprecate all rollout sharding manager and replaced by `trainer_mode` and `rollout_mode` in hybrid worker.	2025-09-03 13:34:26 +08:00
lantian7	1c6d9feff4	[single_controller, ray] fix: shut ray down after initializes it (#3317 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. To prevent Ascend NPU TBE errors caused by resource leakage, ensure that ray.shutdown()is explicitly called after initializing Ray with ray.init(). Address the first issue in https://github.com/volcengine/verl/issues/3316 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: lantian7 <liuchun22@huawei.com>	2025-09-03 10:51:36 +08:00
X. HU	2bef4acb73	[ci, model] feat: add qwen3 CI testcase on ASCEND NPU (#3300 ) ### What does this PR do? - add qwen3-0.6b grpo in tests/special_npu - set use_torch_compile=False for testcases since the torch_npu version in the test image is 2.5.1 which doesn't support compile mode > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-03 10:51:17 +08:00
echo-rain	844c9299d6	[BREAKING][rollout] feat: Added asynchronous reward model calculation in agent loop (#3152 ) ### What does this PR do? > This PR will be based on [PR#3055](https://github.com/volcengine/verl/pull/3055), and will further support asynchronous calculation of reward models based on the agent loop which only supports asynchronous reward function calculation. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > If you want to use this feature, you need to add the following configuration to the startup script configuration item ```python reward_model.enable_resource_pool=True reward_model.n_gpus_per_node=1 reward_model.nnodes=1 ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-02 19:25:05 +08:00
Jiarui Fang（方佳瑞）	ef43469162	[doc] fix: add rStar2-Agent as work using verl (#3298 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: ByteDance <wangjerry@bytedance.com>	2025-09-02 16:36:32 +08:00
Chunyu	abe5e719ee	[perf] feat: add npu silu &expand the scope of patch models (#3260 ) ### What does this PR do? - Add npu optimized silu. - Patch silu and RMSNorm for more models. - Refresh the performance of Qwen3-8B PEFT SFT. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-02 16:35:53 +08:00
Shizhan Lu	f1aeb929c7	[rollout] feat: Refactor agentloop multiturn (#3171 ) refactoring agentloop's if-else-based logic to a state machine pattern, with a strong focus on reusability. 1. Add Interaction in toolagentloop 2. Refactor agentloop to FSM 3. Designed for reusability	2025-09-02 09:38:11 +08:00
Chi Zhang	91ee0a2c08	[fsdp, model] feat: support FSDP model engine (#3270 ) ### What does this PR do? - Support FSDPEngine and FSDPEngineWithLMHead - Add tests and show that fsdp engine matches with mcore and huggingface on QWen 2.5 0.5b model ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: ziheng.jiang <ziheng.jiang@bytedance.com>	2025-09-01 16:17:45 +08:00
Shangwei-Li	c780fc34b4	[fsdp] feat: add NPU fusion kernels for Qwen3 MoE (#3221 ) ### What does this PR do? This PR adds following NPU fusion kernels to Qwen3 MoE model in Transformers: GroupedMatMul, SwiGLU, RMSNorm and RoPE. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=fusion++npu+moe - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. Tested with Qwen3-30B-A3B sp8 fsdp32 on Ascend A2: Without kernel fusion: <img width="1832" height="468" alt="image" src="https://github.com/user-attachments/assets/a8632a94-3a27-46f6-b408-2ebc09a37aa3" /> WIth kernel fusion <img width="1842" height="440" alt="image" src="https://github.com/user-attachments/assets/50b8cc21-6720-42bc-9a9d-ae684f4cb0bf" /> Test results with train_prompt_bsz=512 sp8 fsdp32 on Ascend A2. The orange line represents GPU, the pink line represents NPU, max absolute error in reward is less than 5%. <img width="718" height="444" alt="image" src="https://github.com/user-attachments/assets/3d4a47f6-fb91-40a6-a8e6-bf39545f8375" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: Shangwei-Li <lishangwei2@huawei.com>	2025-09-01 11:41:49 +08:00
Ethan (Yusheng) Su	fd1a121324	[hardware] fix: update source in dockerfile.rocm (#3284 ) ### What does this PR do? > Update the resource in `Dockerfile.rocm` ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > docker build -f Dockerfile.rocm -t verl-rocm:local . ``` docker run --rm -it verl-rocm:local python -c "import torch; print('ok')" ``` ### Design & Code Changes > Update the resource in `Dockerfile.rocm` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-01 11:32:44 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	14227201ec	[training_utils] fix: allow empty image_key/video_key in rl dataset (#3281 )	2025-08-31 17:35:00 +08:00
Minghui Jia	98676e8add	[misc] fix: use uid for grouping in validation to avoid prompt confusion in multimodal tasks (#3280 ) ### What does this PR do? Fix #3238. Follow #2815. #2815 seems to have no follow-up process. This PR switched from text prompt to grouping by uid when calculating validation metrics. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pull/2815. - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Maxwell-Jia <mr.minghui.jia@gamil.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-31 10:13:34 +08:00
Mighten Dai	f9035b7016	[data] fix: `None` has no attribute `get` when `extra_info` in Parquet is NaN (#3272 ) ### What does this PR do? This PR wants to fix a bug in rl_dataset.py ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test All modifications can be covered with existing CI test cases. ### API and Usage Example API and usage remain the same. ### Design & Code Changes This PR injects a default dict when `extra_info` is None, due to the `extra_info` field in Parquet file is NaN. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-30 09:50:26 +08:00
kAIto47802	a73b2aba85	[worker] fix: Fix missing `rollout_log_probs` argument in policy loss functions (#3274 ) ### What does this PR do? <!-- > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. --> In the recent PR: - https://github.com/volcengine/verl/pull/2953, the file `workers/actor/dp_actor.py` was updated so that `rollout_log_probs` is passed to `policy_loss_fn`: `38d23914ee/verl/workers/actor/dp_actor.py (L448-L456)` In that PR, the "vanilla" policy loss function was modified to accept `rollout_log_probs` as an argument. However, other policy loss functions (e.g., "gspo") were not updated accordingly, which leads to an error such as: ``` TypeError: compute_policy_loss_gspo() got an unexpected keyword argument 'rollout_log_probs' ``` when setting `config.policy_loss.loss_mode` to one of these alternatives. Therefore, in this PR, `rollout_log_probs` is also added as an argument to the other policy loss functions. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-30 09:08:20 +08:00
richard wang	e1603dc97f	add gptoss grpo example script (#3212 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Adding a script to run gpt-oss 20B model with VeRL. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: RichardW <richard.junwang@bytedance.com> Co-authored-by: GeLee-Q <leege233@gmail.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-08-29 11:32:24 -07:00
ZLiao	4d24449193	[recipe] fix: Remove redundant parameters to resolve errors in the script caused by the latest Verl main branch. (#3252 ) ### What does this PR do? Remove redundant parameters to resolve errors in the script caused by the latest Verl main branch. Related issue: [issue](https://github.com/volcengine/verl/issues/3248) ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Design & Code Changes Removed the two unnecessary parameters dp_model_parallel_size and rollout_world_size from the relevant files. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-29 21:48:42 +08:00
Minghui Jia	fc05070fa0	[ckpt] fix: TypeError when save VL model ckpt (#3268 ) ### What does this PR do? Fix #3267 . ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+checkpoint+ - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: Maxwell-Jia <mr.minghui.jia@gamil.com>	2025-08-29 21:41:09 +08:00
Huazhong	cc2799b235	[hardware] fix: Call synchronization when using the td.to("cpu") operation on NPU to avoid potential precision issues (#3222 ) ### What does this PR do? In verl, the driver process aggregates the computation results of workers via Ray. Therefore, after a worker completes its computation job, it will package the output using tensordict and transfer it to the CPU. Since the `to` operation of tensordict is non-blocking, when transferring data from a device to the CPU, it is necessary to ensure that a batch of data has been completely transferred before being used on the host; otherwise, unexpected precision issues may arise. Tensordict has already noticed this problem and fixed it. Ref: https://github.com/pytorch/tensordict/issues/725 However, the relevant modifications only cover CUDA and MPS devices and do not take effect for third-party devices such as NPUs. This patch fixes this issue, and the relevant modifications can be removed once the fix is merged into tensordict. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-28 20:42:24 +08:00
Chi Zhang	1065a29d14	[megatron, model] feat: add MegatronEngine, MegatronEngineForCausalLM (#3235 )	2025-08-28 19:36:05 +08:00
Changyi Yang	e95bd9edf2	[sglang] feat: add native sgl server (#3090 ) ### What does this PR do? Summary This PR introduces a native HTTP server implementation for SGLang, aiming to fundamentally improve flexibility, scalability, and integration capabilities. By transitioning to a more robust client-server architecture, this change addresses several core bottlenecks in the current design. Key Changes * Engine Replacement – Replaced the original `sgl.Engine` instance with a native HTTP server. ✅ Completed * Distributed Optimization – Utilizing a server-based architecture to remove the requirement of gathering all data to TP rank 0. This change resolves the previous `dist.barrier` timeout issue by replacing the collective wait with per-sample synchronization. 🚧 In Progress * Router Integration – Plan to integrate with the native SGLang router for streamlined request handling. 💡 Nice to have Motivation The current `sgl.Engine` driver model presents several architectural challenges, particularly in complex distributed environments. Moving to an HTTP server architecture is motivated by the need to solve the following critical issues: 1. Eliminate Data Flow Bottlenecks and Improve Performance: * Problem: The data flow logic of the existing driver process is misaligned with the training data flow. It requires all data for a single SGLang instance to be gathered to TP rank 0. This data is then processed by the tokenizer manager and sent via ZMQ to the various schedulers. As a result, the `preprocess` and `postprocess` steps are slower than expected. * Solution: The HTTP server architecture decentralizes this process, allowing each rank to handle requests independently. This removes the "gather to rank 0" bottleneck, dramatically improving data throughput and overall performance. 2. Resolve CPU Resource Contention: * Problem: At the request level, the SGLang driver object cannot be pickled for use in subprocesses. This limitation means that the request-level asynchronous rollout logic and the engine itself are forced to compete for the same CPU time slices, leading to performance degradation. * Solution: By decoupling the request handling (client) from the inference engine (server), we isolate the processes, eliminating the CPU contention and allowing for more efficient resource utilization. 3. Fix Distributed Synchronization Timeouts: * Problem: The `dist.barrier` timeout is a frequent issue where worker ranks remain idle while waiting for TP rank 0 to complete its intensive processing. This collective wait time creates inefficiency and can lead to failures. * Solution: The HTTP server model shifts this from a collective barrier to a per-sample synchronization. Workers communicate with the server as needed, removing the long wait times and making the distributed setup more stable and efficient. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-28 12:40:19 +08:00
Yuhang Liu	1e413344a2	[recipe] feat: Add InfiGUI-G1 recipe for MLLM GUI grounding (#3242 ) ### What does this PR do? This PR introduces a new recipe, `infigui-g1`, for training Multimodal Large Language Models (MLLMs) in GUI grounding tasks. This recipe implements a reinforcement learning approach that significantly improves the model's ability to understand and interact with graphical user interfaces. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/search?q=repo%3Avolcengine%2Fverl+gui&type=pullrequests - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test The effectiveness of this recipe has been validated through experiments. Key results are as follows: - The training curves for reward, validation accuracy, and exploration success rate all show a upward trend. - After 156 steps of training on sample data, the 3b model achieves a score of 41.2 on the `screenspot-pro` benchmark, a substantial improvement over the base model's score of 18.2. <img width="345" height="291" alt="Screenshot 2025-08-27 172010" src="https://github.com/user-attachments/assets/9ecd93d5-4f9b-4c40-831c-79a50fd197c4" /> <img width="347" height="292" alt="Screenshot 2025-08-27 171902" src="https://github.com/user-attachments/assets/2e437c1f-9eb0-4106-a6c3-b22125026a79" /> <img width="346" height="293" alt="Screenshot 2025-08-27 171928" src="https://github.com/user-attachments/assets/9c94515d-1501-40f4-979c-95e2f819dc62" /> ### API and Usage Example The recipe is self-contained and can be run using the provided scripts. For example, to run training with the 3B parameter model: ```bash # In verl path bash recipe/infigui-g1/run_3b.sh ``` ### Design & Code Changes This PR adds a new, independent recipe located in `recipe/infigui-g1/`. The changes are fully encapsulated within this directory and do not affect any other part of the codebase. The new files include: - `recipe/infigui-g1/README.md`: An introduction to the recipe. - `recipe/infigui-g1/run_3b.sh`, `run_7b.sh`: Scripts to launch training. - `recipe/infigui-g1/reward_fn.py`: Custom reward function implementation. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-27 23:35:22 +08:00
Kion Fallah	53b68c638b	[fsdp, training_utils] Fix: LoRA w/ VLMs when Using Layered Summon (#3231 ) ### What does this PR do? Currently, LoRA parameters are not correctly streamed when training Qwen 2.5 VL with `layered_summon=True`. This is due to a missing prefix for the Qwen VL models. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: "VLM LoRA", "LoRA" - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Without this change, when using (1) Qwen VLM, (2) LoRA, (3) layered summon, we see this log when weight updates are sent to vLLM: ``` [36m(WorkerDict pid=424928)[0m INFO:2025-08-26 22:22:24,788:vLLM load weights, loaded_params: 0 ``` After: ``` [36m(WorkerDict pid=424928)[0m INFO:2025-08-26 22:22:24,788:vLLM load weights, loaded_params: 504 ``` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-27 21:51:42 +08:00
A1waysBeenHere	b7df22ec51	[trainer] fix: Unified use of the def to() in Class DataProto (#3227 ) Removed all `.to()` operations work on TensorDict instance directly, make them use `def to()` in DataProto instead. > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-27 19:55:08 +08:00
H (on leave)	ff4f30b467	[doc] fix: fix slack invitation link (#3230 )	2025-08-27 07:45:09 +08:00
Feng Yao	b8dc5377c6	[BREAKING][vllm, fsdp] feat: add Rollout-Training Mismatch Fix -- Truncated importance sampling (#2953 ) ### What does this PR do? Support [vLLM-FSDP off-policy importance sampling correction](https://fengyao.notion.site/off-policy-rl) using Truncated Importance Sampling (TIS): <img width="859" height="382" alt="TIS" src="https://github.com/user-attachments/assets/adc8f797-aa14-4b29-b265-a682c281d08e" /> ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python python3 -m verl.trainer.main_ppo \ algorithm.adv_estimator=gae \ data.train_files="$train_files" \ data.val_files="$test_files" \ data.train_batch_size=1024 \ data.max_prompt_length=1024 \ data.max_response_length=1024 \ data.filter_overlong_prompts=True \ data.truncation='error' \ actor_rollout_ref.model.path=Qwen/Qwen2.5-32B-Instruct \ actor_rollout_ref.model.enable_gradient_checkpointing=False \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=256 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ actor_rollout_ref.actor.use_kl_loss=False \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \ actor_rollout_ref.rollout.tensor_model_parallel_size=4 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \ critic.optim.lr=1e-5 \ critic.model.use_remove_padding=True \ critic.model.path=Qwen/Qwen2.5-32B-Instruct \ critic.model.enable_gradient_checkpointing=False \ critic.ppo_micro_batch_size_per_gpu=8 \ critic.model.fsdp_config.param_offload=False \ critic.model.fsdp_config.optimizer_offload=False \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.logger='["console","wandb"]' \ trainer.project_name='verl_example' \ trainer.experiment_name='Qwen2.5-32B-Instruct_function_rm' \ trainer.n_gpus_per_node=8 \ trainer.nnodes=4 \ trainer.save_freq=20 \ trainer.test_freq=10 \ trainer.total_epochs=15 \ actor_rollout_ref.rollout.calculate_log_probs=True \ # add this config to return rollout prob +actor_rollout_ref.actor.behav_imp_weight_cap=10.0$@ # add this config to set up C value in TIS ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Narsil-Dinghuai Zhang 张鼎怀 <dinghuai233@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: LiyuanLucasLiu <llychinalz@gmail.com>	2025-08-26 14:06:07 -07:00
sty-yyj	5362d704be	[rollout] fix: Restore the parameter 'limit_images' in RolloutConfig (#3217 ) ### What does this PR do? - This PR adds the parameter `limit_images` in RolloutConfig. Users can specify the image limit in vllm by setting `+actor_rollout_ref.rollout.limit_images=xxx` ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>	2025-08-26 20:30:52 +08:00
Blue Space	9f0f8b0e7c	[ci] fix: fix type convergence check (#3219 ) ### What does this PR do? [ci] fix: fix type convergence check ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-26 14:17:18 +08:00
Huapeng Zhou	27b63c724a	[env, sglang] feat: Bump new sglang version to fix vlm OOM (#3216 ) ### What does this PR do? - Bump new version of sglang - This version's sglang can fix vlm OOM issue, detail are in: https://github.com/sgl-project/sglang/issues/9365 ### Test Using instruction following https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/release_log/latest_sglang.md Now we have new version of sglang: <img width="786" height="154" alt="image" src="https://github.com/user-attachments/assets/bcec557e-196c-40c0-aa0f-c19d9f5c3e98" /> `gsm8k`: using `verl/examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh` [Wandb](https://wandb.ai/popsoda-university-of-washington/multi-turn-grpo-qwen2.5-3b-sglang/runs/dtcdin9b?nw=nwuserpopsoda) <img width="532" height="329" alt="image" src="https://github.com/user-attachments/assets/12f67d1a-a57e-497d-bfe5-6ff8c642e83f" /> It can work well. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-26 13:29:36 +08:00
Chi Zhang	4ed7811813	[megatron] refactor: refactor MegatronPPOActor (#3206 ) ### What does this PR do? - Make megatron related print only print on rank zero - Remove unused code in megatron actor - Modularize megatron loss computation so that it can be used for SFT as well ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-26 10:41:57 +08:00
Slim Frikha	7592d69cbb	[trainer] refactor: PPO config validation fast fail (#3187 ) ### What does this PR do? Make main ppo script validate config as soon as all needed info is available. this enables the script to fail as fast as possible in case of bug in config. New changes would avoid downloading and loading tokenizer and loading data before validating config solve #3182 ### Design & Code Changes Isolated config validation in utils (out of PpoRayTrainer) and call it from main_ppo as soon as possible.	2025-08-26 10:31:39 +08:00
Liwei Ma	b4a410197c	[doc] fix: fix a documentation typo for nsys (#3214 ) ### What does this PR do? [doc] fix: fix a documentation typo for nsys ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-26 10:11:15 +08:00
Zhunheng Wang	f67dc19503	[rollout] fix: apply copy_to_local before init hf config (#3204 ) Change-Id: Ic0ddfdfa13a38a56571b9c59125e9ebeea5c7802 ### What does this PR do? - Fixed a bug where the original HDFS path was passed due to not using `copy_to_local` when initializing the hf config. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: wangzhunheng <wangzhunheng@bytedance.com>	2025-08-26 09:26:00 +08:00
pool	40bf9628ee	[data] fix: update parquet_files type check to support multi-file input (#3211 )	2025-08-26 05:18:59 +08:00
Blue Space	9b6a07fa77	[docker] feat: update to vllm 0.10.0, mcore 0.13, transformers 4.55.4 (#3192 )	2025-08-26 05:17:57 +08:00
YumiMom	a5df7d31ea	[perf] fix: fix profiler discrete mode unavailability (#3188 ) ### What does this PR do? - Fix the issue where profiling cannot be collected in discrete mode, for both NPU and nsys. - Adjust the corresponding unit tests accordingly. - Adjust the npu profiler script due to changes in ref.yaml In discrete mode, distribution is handled through the `annotate` class method of the `DistProfiler` class in `verl/utils/profiler/profile.py`. Adjust the `annotat` method of NPUProfiler and NsightSystemsProfiler to be instance method. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-25 19:39:31 +08:00
Shangwei-Li	2398d36be3	[recipe] feat: Add Qwen3 30B MoE NPU recipe (#3189 ) ### What does this PR do? > Update recipe/dapo/run_dapo_qwen3_30b_npu.sh. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=fsdp+npu+30b+recipe - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. Critic/rewards/mean Comparison Chart, where the orange line represents ascend NPU, the pink line represents GPU. <img width="3182" height="1272" alt="image" src="https://github.com/user-attachments/assets/5c275127-6cb3-4bf9-ac89-0fa6abb668c0" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```shell # Add code snippet or script demonstrating how to use this cd /path/to/verl bash recipe/dapo/run_dapo_qwen3_30b_base_npu_fsdp.sh ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: Shangwei-Li <lishangwei2@huawei.com>	2025-08-25 19:38:23 +08:00
Chi Zhang	e243d6dd66	Revert "[rollout] feat: use dummy load_format when init AsyncServer" (#3207 ) Reverts volcengine/verl#3184	2025-08-25 19:15:52 +08:00
Slim Frikha	11a43b6cad	[env] fix: Improve License Check Hook Flexibility (#3202 ) ### What does this PR do? Solve #3201 #### Problem The existing license check hook scans all directories recursively from a single root directory, which causes issues in local development environments: * Virtual environments (`.venv`, `venv/`) get scanned and fail license checks * No easy way to exclude common build/cache directories without hardcoding exclusions * Different behavior between local development (with venvs) and CI/CD (clean environment) #### Solution Modified the `check_license.py` script to accept multiple target directories instead of a single root directory with exclusions. ### Design & Code Changes Changed argument from `--directory` to `--directories` * Now accepts multiple `Path` arguments using `nargs="+"` * Allows specifying exactly which directories to scan * in local mode: `--directories examples recipe scripts tests verl setup.py` * in github workflow: `--directories .`	2025-08-25 16:50:15 +08:00
none0663	58c847b17f	[doc] fix: set use_dist_checkpointing to False for ref model in qwen3moe-30b script (#3198 ) ### What does this PR do? Set use_dist_checkpointing to False for ref model in qwen3moe-30b script, because there is not dist_megatron_ckpt model path for ref model.	2025-08-25 12:33:24 +08:00
Joel	cb5818c6fc	[rollout] fix: add missing extra_reward_info to AgentLoopOuput (#3194 ) ### What does this PR do? Fix https://github.com/volcengine/verl/pull/3055, add missing `extra_reward_info` to AgentLoopOuput, which is needed by metrics calculation.	2025-08-25 12:23:32 +08:00
Huapeng Zhou	7ff2386987	[rollout, sglang] feat: Add sync mode for bash (#3186 ) ### What does this PR do? - Use `sync` mode for `dapo`, `gsm8k` and `geo` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`	2025-08-24 20:43:11 -07:00
Chi Zhang	2c7a9c5708	[rollout] feat: use dummy load_format when init AsyncServer (#3184 ) ### What does this PR do? - Loading weights in AsyncServer is duplicated and is time-consuming for large models - Use dummy weights instead as the actual weights will be transferred by the trainer ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-25 10:30:48 +08:00
董益宏	28a3e418d8	[misc] feat: Add RL-PLUS to awesome work list (#3197 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Adding RL-PLUS to the README as a list of work that used veRL, with only a 1-line change to the README.md. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-25 09:33:33 +08:00
Zhe Zhang	4ea0583bad	[Optimize]Safe tool parameter access standardization in SGLang rollout (#3196 ) Fix https://github.com/volcengine/verl/issues/3195 Changes: 1. 🔒 Replace all direct dict[key] access with .get(key, {}) pattern for tool kwargs 2. ✅ Add validation in _preprocess_prompt_to_async_rollout_requests 3. 🧪 New test cases covering: • Missing tool configs • Partial execute_kwargs • Empty tool schemas Impact: • Prevents KeyError crashes when tools/kwargs are missing • Maintains existing flexible tool parameter system • Zero breaking changes to valid configurations	2025-08-24 12:42:58 -07:00
Chayenne	3a394c9bd0	[recipe] fix: Setting DAPO baseline in SGLang multi-turn RL (#3175 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This PR adds the dapo baseline in SGLang multi-turn rollout. Basically speaking, the previous DAPO multi-turn baseline with retool doesn't actually converge, since we find that the previous reward of retool is just encouraging the model to generate more turns to call more tools. The answers are not actually correct. In this fix, we (SGLang RL Group) do a manual SFT and make a new model `font-info/qwen3-4b-sft-SGLang-RL` instead of `Qwen/Qwen3-4B-Instruct-2507`. Without finetune, the model can not converge. In the same time, we reduce the default value of minial reward in retool, from 0 to -0.6, `result["score"] = min(-0.6, result["score"] + tool_call_reward)`. Thus, if a model can not generate the correct answer, it will get a score as -0.6, rather than 0. So in our demonstration, we do converge! ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: Zhuorany <yzr1914001753@gmail.com> Co-authored-by: mao cheng <maocheng@berkeley.edu> Co-authored-by: Hecate0821 <hec4te0821@gmail.com> Co-authored-by: maocheng23 <maocheng@berkeley.edu>	2025-08-22 21:26:44 -07:00
Chi Zhang	bf56a2aa27	[megatron] feat: set_expandable_segments for megatron (#3181 ) ### What does this PR do? - As title - We use set_expandable_segments to resolve memory fragmentation ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-23 10:49:44 +08:00
Fabian Joswig	ce89063712	[misc] feat: Add L40S and A40 flop counts (#3177 ) ### What does this PR do? Adds flop counts for more GPUs ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-22 17:31:46 +08:00
OC	f6f910069b	[doc] fix: add qwen3moe-30b script and fix error in qwen3-235b (#3174 ) 1. add qwen3moe-30b script for 1 to 4 H20 nodes with best performance 2. fix error in qwen3-235b: - vllm enable_expert_parallel may result invalid output - megratron num_layers_in_last_pipeline_stage is a depreciate option --------- Co-authored-by: Yan Bai <bayan@nvidia.com>	2025-08-22 13:59:24 +08:00
Huapeng Zhou	0e15c9b11c	[sglang] fix: remove unused padding in SGLang rollout (#3138 ) ### What does this PR do? What does this PR do? There are some unused padding talked in this issue: https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/193 - There are just 5 key fields which need to return back after rollout(example in `agent_loop`): ```python batch = TensorDict( { "prompts": prompt_ids, # [bsz, prompt_length] "responses": response_ids, # [bsz, response_length] "response_mask": response_mask, # [bsz, response_length] "input_ids": input_ids, # [bsz, prompt_length + response_length] "attention_mask": attention_mask, # [bsz, prompt_length + response_length] "position_ids": position_ids, # position_ids: [bsz, 3, prompt_length + response_length] or [bsz, prompt_length + response_length] }, batch_size=len(inputs), ) ``` - Remove some unused variable like `prompt_loss_mask` - Make `response_position_id` all zero tensor - Copy class to avoid constructing a new class ### Test `over_sample = 0.1` [wandb](https://wandb.ai/popsoda-university-of-washington/multi-turn-grpo-qwen2.5-3b-sglang/runs/1p87zi7v?nw=nwuserpopsoda) <img width="1555" height="680" alt="image" src="https://github.com/user-attachments/assets/b837acab-824d-42c6-ad3d-8342d06397d1" /> No issue. `over_sample = 0.0` [wandb](https://wandb.ai/popsoda-university-of-washington/multi-turn-grpo-qwen2.5-3b-sglang/runs/xloii5wm?nw=nwuserpopsoda) <img width="1532" height="683" alt="image" src="https://github.com/user-attachments/assets/fd69be47-8182-4461-86d0-86063e6f8e1a" /> As expected too ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`	2025-08-21 14:01:50 +08:00
Chayenne	5b5e09d9cc	[sglang] fix: fall back to default FSDP1 (#3156 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-08-20 15:04:59 -07:00
Chayenne	864ba99876	[fsdp, trainer, tool] feat: add memory snapshot & visualization support for debugging GPU memory leaks (#3099 ) ### What does this PR do? This PR adds a memory snapshot and visualization tool to help identify potential GPU memory leaks during training. In some training runs, we observed increasing GPU memory usage across steps, suggesting memory might not be properly released. To support debugging, this PR enables: * Periodic memory snapshot dumping via PyTorch's internal APIs. * Manual snapshot dumping at key points (e.g., after each step). * Easy integration with `torch.memory_viz` for post-hoc visualization. --- ### Checklist Before Starting * [x] Search: [[memory snapshot PRs](https://github.com/volcengine/verl/pulls?q=is%3Apr+memory+snapshot)](https://github.com/volcengine/verl/pulls?q=is%3Apr+memory+snapshot) * [x] Title: `[fsdp, trainer, tool] feat: add memory snapshot & visualization support` --- ### Test * Enabled `enable_memory_visualize` in config and verified snapshot `.pickle` files are generated. * Confirmed snapshot files work with `torch.memory_viz`. * Validated both periodic and manual snapshot dumping. --- ### API and Usage Example Enable in config: ```yaml fsdp_config: enable_memory_visualize: true memory_snapshot_interval_sec: 300 memory_snapshot_out_dir: "./mem_snapshots" ``` Manually dump after each step: after each step, adds like this: ```python if self.config.actor_rollout_ref.actor.fsdp_config.enable_memory_visualize: self.actor_rollout_wg.dump_memory_snapshot( tag=f"post_update_step{self.global_steps}", sub_dir=f"step{self.global_steps}" ) ``` --- ### Design & Code Changes * New FSDP config fields: `enable_memory_visualize`, `memory_snapshot_interval_sec`, `memory_snapshot_out_dir` * New utility functions in `memory_utils.py`: * `enable_memory_visualize()` * `dump_memory_snapshot(...)` * `MemorySnapshotSampler` (background thread) * Integrated into `FSDPWorkers` and training loop (`ray_trainer.fit()`) --------- Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: AniZpZ <aniz1905@gmail.com> Co-authored-by: narutolhy <582909902@qq.com>	2025-08-20 14:07:04 -07:00
Joel	31771cfade	[rollout] feat: add response token logprobs in agent loop output (#3151 ) ### What does this PR do? Add response token logprobs in agent loop output	2025-08-20 23:19:52 +08:00
ZLiao097	d2126e7afd	[recipe] feat: support qwen2.5-32B DAPO training script on ASCEND NPU (#3146 ) ### What does this PR do? Provide an script for DAPO-training qwen2.5-32B on NPU, and update experiment result. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [[recipe] feat: support qwen3-8B/14B DAPO training on ASCEND NPU](https://github.com/volcengine/verl/pull/2836) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. The following are some comparison charts of relevant data from script testing, where red represents NPU and blue represents GPU. Critic/rewards/mean Comparison Chart <img width="1314" height="714" alt="image" src="https://github.com/user-attachments/assets/3c303100-7106-491b-a6ea-e0bd1926076c" /> Response_length/mean Comparison Chart <img width="1322" height="714" alt="image" src="https://github.com/user-attachments/assets/9fa01f6f-2774-4b07-a38b-71cb6b5c8359" /> Val-core/math_dapo/acc/mean@32 Comparison Chart (Test by aime-2024) <img width="1320" height="716" alt="image" src="https://github.com/user-attachments/assets/b6912e3c-89c6-4999-90bb-fa961edc6e4a" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```bash cd /path/to/verl bash recipe/dapo/run_dapo_qwen2.5_32b_npu.sh ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-20 20:13:57 +08:00
Chayenne	012d972223	[fsdp, sglang] fix: Using Agreesive Empty Cache instead (#3136 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>	2025-08-20 19:32:48 +08:00
Kiv Chen	944264b583	[rollout] fix: KeyError "CPU" init agent loop workers (#3141 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Fix #3137. Take into consideration the ray nodes set to `num_cpus=0`. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-20 19:31:49 +08:00
Chi Zhang	23d6f77513	[megatron] fix: fix megatron micro_batch_size assertion (#3142 ) ### What does this PR do? - fix megatron micro_batch_size assertion. When using `use_dynamic_bsz`, we don't need to set `micro_batch_size` ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-20 14:37:34 +08:00
Blue Space	ae46f5a41a	[ci] fix: model tests, transformers 4.55 has troubles with backward (#3139 ) ### What does this PR do? [ci] fix: model tests, transformers 4.55 has troubles with backward ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-20 13:33:12 +08:00
none0663	9d66534060	[doc] feat: documentation Update, Ray Job Management Commands (#3131 ) ### What does this PR do? Added two Ray CLI commands to the documentation for better job monitoring: 1. Job ID Retrieval Command `ray job list \| grep submission_id \| grep JobStatus \| grep RUNNING \| grep -oP 'raysubmit_[^'\''"]+' \| head -n 1` This pipeline fetches the latest running job's submission ID by: - Filtering active jobs (`RUNNING` status) - Extracting `raysubmit_` IDs - Returning the first match 2. Continuous Log Streaming* `ray job logs <Submission ID> --follow` Added the `--follow` parameter to enable real-time log streaming, allowing users to: - Continuously monitor job output - Debug long-running processes interactively - Maintain persistent log connection until job completion These additions enhance operational visibility for Ray job management workflows.	2025-08-20 11:03:48 +08:00
binary-husky	de0b31ceee	[sglang] feat: make sglang properly handle the `max_num_seqs` configuration (#3134 ) ### What does this PR do? vllm async engine receives the `max_num_seqs` option from yaml, but sglang ignore it, this PR patches this issue. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.	2025-08-20 10:59:35 +08:00
Wei Wu	26ccffa83f	[rollout] fix: numpy.int64 serialization error in Weave tracing during validation (#3112 ) ### What does this PR do? During validation steps, the following Pydantic serialization error occurs when Weave tracing is enabled: ```bash (AgentLoopWorker pid=2278557, ip=x) weave: Task failed: PydanticSerializationError: Unable to serialize unknown type: <class 'numpy.int64'> [repeated 16286x across cluster] (AgentLoopWorker pid=2278557, ip=x) ERROR:2025-08-18 16:45:08,385:Task failed: PydanticSerializationError: Unable to serialize unknown type: <class 'numpy.int64'> [repeated 16279x across cluster] ``` The issue occurs in: `313366fd85/verl/experimental/agent_loop/agent_loop.py (L315)` When the batch doesn't contain an "index" field (which commonly happens during validation), `np.arange()` creates a numpy array with numpy.int64 elements. These values are then passed through the following chain: 1. `get_trajectory_info()` → `trajectory_info` dict with `sample_index`: `numpy.int64` 2. `rollout_trace_attr()` → `attributes` dict with `sample_index`: `numpy.int64` 3. `weave.attributes(attributes)` → `Pydantic serialization fails` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Design & Code Changes Convert numpy array to Python native integers to ensure Pydantic compatibility. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-20 10:12:11 +08:00
Chunyu	fc2bfd9a72	[misc] fix: update peft's version in requirements-npu.txt (#3127 ) ### What does this PR do? As title. Limit the version of PEFT to ensure sft's workflow is not interrupted on npu. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-20 10:11:29 +08:00
Tialo	6469be213e	[recipe] fix: make compute of `step` consistent across all trainers (#3132 ) ### What does this PR do? follow-up to #3117 > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-20 09:54:29 +08:00
Blue Space	47033fc8a2	[megatron] fix: mbridge save/load (#2519 ) ### What does this PR do? Currently mbridge will not save optimizers. Fix mbridge save and load path. Add CI test. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-08-20 09:09:56 +08:00
Xinyan Guan	5deb8cc9a6	[megatron] fix: add temperature parameter for logits scaling (#3133 ) ### What does this PR do? > This PR fixes the handling of the temperature parameter in Megatron by explicitly propagating it through the forward path. Logits are now scaled by dividing with temperature during processing, aligning Megatron with the FSDP logits handling implemented in [dp_actor.py#L192](https://github.com/volcengine/verl/blob/main/verl/workers/actor/dp_actor.py#L192). The issue became apparent when running Megatron training with temperature=0.9. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > Here is the comparison plot for `raw-mcore (t=0.9)`, `fsdp (t=0.9)`, and `fixed-mcore (t=0.9)`. <img width="949" height="497" alt="image" src="https://github.com/user-attachments/assets/7f06120b-bf8f-4222-86ed-138fbca382f7" /> ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: xinyanguan <xinyanguan@tencent.com>	2025-08-20 08:46:13 +08:00
Tialo	afd759789b	[trainer] fix: move `testing` out of `step` timings (#3117 ) ### What does this PR do? Possible fix of #3116 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [query](https://github.com/volcengine/verl/issues?q=step%20timing) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submittin - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-19 19:54:59 +08:00
none0663	6e0fa3f0df	[trainer] fix: only load memory in micro batch for compute_log_prob, compute_values and update_critic (#3094 ) ### What does this PR do? Modified data loading logic to transfer only micro-batches to GPU memory during training/inference instead of the entire batch for saving memory. Like the pr `12c83e8ada` and https://github.com/volcengine/verl/pull/2908	2025-08-19 18:18:01 +08:00
Chi Zhang	04efe11df6	[ci] fix: fix precommit (#3128 ) ### What does this PR do? - fix precommit ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-19 18:17:05 +08:00
Joel	c3c2f9a9bc	[rollout] feat: compute reward score in agent loop (#3055 ) ### What does this PR do? Compute reward score for each prompt once the agent loop is finished, this can significantly hide the reward computation time. https://github.com/volcengine/verl/issues/2618	2025-08-19 16:38:23 +08:00
Chi Zhang	8494135e5c	[rollout] feat: use rollout worker in MegatronWorker (#3111 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-19 15:07:52 +08:00
none0663	43cb93c8d1	[trainer] fix: only load memory in micro batch for megatron backend (#3106 ) ### What does this PR do? Modified data loading logic to transfer only micro-batches to GPU memory during training instead of the entire batch for saving memory for megatron backend. Like the pr `12c83e8ada` https://github.com/volcengine/verl/pull/2908, and https://github.com/volcengine/verl/pull/3094	2025-08-19 13:04:34 +08:00
Zhe Zhang	dd13051602	Fix python version (#3103 )	2025-08-18 20:46:09 -07:00
Sahil Patel	6e55669fd0	[trainer, worker] fix: setting old log probs equal to log probs for on policy training (#3119 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Since training backend (FSDP/Megatron)'s recompute of log probs are not accurate, so given an exact batch forwarding twice, the `old_log_probs` vs `log_probs` are not the same even in on policy training. This PR quickly fix this issue. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>	2025-08-19 09:43:59 +08:00
Blue Space	603c07d999	[doc, perf] feat: add profiling doc (#3113 )	2025-08-19 09:06:33 +08:00
Chi Zhang	313366fd85	[misc] fix: fix precommit (#3109 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-18 16:46:32 +08:00
thibautbar	ee5ac8e182	[doc] feat: Add Kimina-Prover-RL to awesome work (#3108 ) ### What does this PR do? Add Kimina-Prover-RL to the list of awesome work using verl. Kimina-Prover-RL is a training pipeline designed to teach large language models to solve formal proof goals in Lean 4, using a two-stage output structure: a natural language reasoning trace followed by corresponding Lean code. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-18 16:43:33 +08:00
YumiMom	97b65c63c7	[perf] fix: fix npu profiler and add mstx UT (#3052 ) ### What does this PR do? - fix the parameter passing error for profile_level - fix the error when creating npu profiler in discrete mode - modify the execution script - modify ascend profiling doc - add the discrete parameter in tool_config - add mstx_profile UT ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-18 15:18:08 +08:00
Qizhi Chen	507e932941	[fsdp, trainer, ckpt] feat: support custom model init and merging for FSDP (#3012 ) ### What does this PR do? This PR adds support for custom model initialization and merging in fsdp. Custom models are no longer required to follow the naming conventions like `xxxForCausalLM` or `xxxForConditionalGeneration`. Besides, it can be loaded using different AutoClass specified by `auto_map`, such as using `AutoModelForCausalLM` to load `xxxForConditionalGeneration` or `xxxForChat`, etc.. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test 1. Custom Model Definition ```python from transformers import Gemma3ForConditionalGeneration class CustomModelForChat(Gemma3ForConditionalGeneration): ... ``` 2. Custom Model Config ```json { "architectures": [ "CustomModelForChat" ], "auto_map": { "AutoConfig": "configuration_custom.CustomModelConfig", "AutoModelForCausalLM": "modeling_custom.CustomModelForChat" }, ... "transformers_version": "4.53.0", } ``` 3. Testing If the model config has `auto_map`, then load the specified AutoClass, otherwise, fall back to the default architectures handling. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-18 14:56:32 +08:00
syt-nju	c86c831893	[recipe] fix: checkpoint in last step might be ignored to save in dapo (#3034 ) 1. The is_last_step variable is not updated in a timely manner and should be updated promptly after self.gen_step is modified. 2. If, in the last step, the batch is not fully formed due to the filter_group logic, it will trigger a "continue" statement, thereby skipping the checkpoint saving logic. ### What does this PR do? This PR fixes two related issues: Ensures the is_last_step flag is correctly updated after self.gen_step changes, to properly indicate the last generation step. Prevents the checkpoint-saving logic from being incorrectly skipped in the last step when the batch is not full due to filtering (e.g., via filter_group). These changes help ensure that checkpoints are saved appropriately at the end of generation, improving reliability and consistency in training or inference workflows. ### similar PR https://github.com/volcengine/verl/pull/2619#issue-3242284106 This PR seems to address a similar issue, but I still encountered problems when using its code. Therefore, I made further modifications based on that version. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-18 11:19:10 +08:00
Chi Zhang	bc1b760fb1	[BREAKING] [rollout] feat: add a separate rollout worker (#3071 ) ### What does this PR do? - Introduce a separate rolloutworker that can be instantiated without hybridengine - Introduce a ModelConfig that wraps all model related config - Remove hf_rollout (will replace with TP support in the future if needed) - Next PR: modify MegatronWorker to use separate rollout worker ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-18 10:57:33 +08:00
Chayenne	4c3310db28	[sglang] fix: Qwen VLM Baseline and sgl CI (#3101 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. fix recent borken CI on SGLang and asend. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-17 19:28:25 -07:00
zlHuang	966719c36a	Update ray_trainer.py (#3092 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>	2025-08-17 20:29:43 +08:00
Chayenne	e32cceea4a	[sglang] fix: Qwen VLM Baseline (#3083 ) ### What does this PR do? This PR fix the script in https://github.com/volcengine/verl/blob/main/examples/grpo_trainer/run_qwen2_5_vl-7b.sh The core issue was `TypeError: 'NoneType'` object is not callable which occurred because the variable flash_attn_varlen_func was assigned None. This happened when the primary import from `transformers.modeling_flash_attention_utils` failed. I add a nested try...except block to first attempt the import from transformers, and if that fails, to then try importing `flash_attn_varlen_func` directly from the `flash_attn` package as a solution. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. I added a new test script here: `examples/grpo_trainer/run_qwen2_5_vl-7b-sglang.sh` ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>	2025-08-16 18:22:31 -07:00
Wei (Will) Feng	e764d408df	[fsdp] fix: patch fsdp2 to support hf transformer==4.54.0 and above (#3072 ) ### What does this PR do? @ETOgaosion found that fsdp2 cannot work with GenericForTokenClassification in transformer==4.54.0+ https://github.com/pytorch/pytorch/issues/160068 https://github.com/volcengine/verl/pull/2947 FSDP2 complains about object layerout mismatch when constructing FSDPGenericForTokenClassification. The solution is to let FSDPModule inherit ABC as well this should unblock @vermouth1992 on deprecating fsdp1 > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test add following script to apply_fsdp2 ``` # pip install transformers==4.54.0 from transformers import AutoModelForTokenClassification model = AutoModelForTokenClassification.from_pretrained("Qwen/Qwen2-0.5B") with maybe_patch_fsdp_module(model): fully_shard(model, **fsdp_kwargs) return ``` ``` PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ data.train_files=$HOME/data/gsm8k/train.parquet \ data.val_files=$HOME/data/gsm8k/test.parquet \ data.train_batch_size=256 \ data.max_prompt_length=512 \ data.max_response_length=256 \ actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.actor.ppo_mini_batch_size=64 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \ actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \ critic.optim.lr=1e-5 \ critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \ critic.ppo_micro_batch_size_per_gpu=4 \ algorithm.kl_ctrl.kl_coef=0.001 \ trainer.logger=console \ trainer.val_before_train=False \ trainer.n_gpus_per_node=2 \ trainer.nnodes=1 \ trainer.save_freq=10 \ trainer.test_freq=10 \ trainer.total_epochs=15 \ critic.strategy=fsdp2 \ actor_rollout_ref.actor.strategy=fsdp2 ``` > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-16 15:12:31 +08:00
Hsiang-Yu Tsou	6bbbff13a1	[fsdp] fix: add missing mixed precision configuration to FSDPEngineConfig (#3068 ) ### What does this PR do? The `FSDPEngineConfig` dataclass was missing the `mixed_precision` field that the runtime code expected. By adding: ```python mixed_precision: Optional[dict[str, Any]] = None ``` The dataclass now properly supports the mixed precision configuration that the FSDP workers code uses with `fsdp_config.get("mixed_precision", None).` `55e3c5bc09/verl/workers/fsdp_workers.py (L371)` Otherwise, if we run with: ```bash python3 -m verl.trainer.main_ppo \ actor_rollout_ref.actor.fsdp_config.mixed_precision.param_dtype=bf16 \ actor_rollout_ref.actor.fsdp_config.mixed_precision.reduce_dtype=fp32 \ actor_rollout_ref.actor.fsdp_config.mixed_precision.buffer_dtype=fp32 \ # ... other parameters ``` The following error may occur: ```bash raise InstantiationException(msg) from e hydra.errors.InstantiationException: Error in call to target 'verl.workers.config.engine.FSDPEngineConfig': TypeError("FSDPEngineConfig.__init__() got an unexpected keyword argument 'mixed_precision'") full_key: actor_rollout_ref.actor.fsdp_config ``` ### Backward compatibility No behavior change for existing configs (default remains None). ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=mixed_precision - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-16 08:20:28 +08:00
kang sheng	e31a883121	[rollout] fix: vllm sleep level=2 bug (#3082 ) ### What does this PR do? 1. vllm sleep level=2 has bug and has been fixed: https://github.com/vllm-project/vllm/pull/16889 and the bug fixed is released in version 0.8.5: https://github.com/vllm-project/vllm/releases/tag/v0.8.5 2. fix a typo in deepseek benchmark doc.	2025-08-16 08:19:06 +08:00
Qizhi Chen	2bbd09245c	[ray] feat: add support for ray init kwargs (#3049 ) ### What does this PR do? This PR adds support for passing parameters to `ray.init`. Users can now dynamically configure settings such as `address`, `port`, `_temp_dir`, and more based on their specific needs. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test ```bash # when /tmp/ray/ is used by others # when ray is initialized at 6379 by others # when the dashboard is not accessible at localhost # ... bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh \ +ray_kwargs.ray_init._temp_dir=/tmp/ray/my_dir \ +ray_kwargs.ray_init.address=127.0.0.1:6378 \ +ray_kwargs.ray_init.dashboard_host=0.0.0.0 ``` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-15 20:02:56 +08:00
Necolizer	55e3c5bc09	[tool] fix: support non-ascii characters in search results (#3044 ) ### What does this PR do? A small change from `json.dumps({"result": final_result})` to `json.dumps({"result": final_result}, ensure_ascii=False)`, supporting customized search engines that return docs containing non-ascii characters (e.g., CJK characters). ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pull/1682 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test N/A ### API and Usage Example N/A ### Design & Code Changes N/A ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-15 13:55:18 +08:00
Joel	d253526c73	[ray] feat: remove worker group register center (#3066 ) ### What does this PR do? Remove worker group register center, instead we schedule a task in first placement group to get `MASTER_ADDR` and `MASTER_PORT`.	2025-08-15 13:54:46 +08:00
Chunyu	28f6e4af7e	[doc]fix: optimize ascend docs (#3063 ) ### What does this PR do? - 修复ascend_quick_start.rst中一些依赖软件的版本匹配错误。 - 支持现状表格中增加对actor.strategy和rollout.name的说明。 - 重命名ascend_profiling_en.rst和ascend_profiling_zh.rst，使文档标题看起来更美观些。 <img width="402" height="103" alt="image" src="https://github.com/user-attachments/assets/8f9ece22-315e-4f80-8157-04838f7467a3" /> ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-15 13:24:21 +08:00
kang sheng	bd756c15c8	[BREAKING][rollout] feat: allow users pass all vllm/sglang engine args (#3037 ) This PR allows users to pass all vllm/sglang engine args and optimizes qwen3 rollout speed through vllm Engine argument. 1. deprecate the default value of previous engine_kwargs 2. pass all the engine_kwargs to vllm/sglang engine 3. optimize Qwen3-235B rollout speed by setting TP=8 and enabling expert parallel. From top to bottom: tp=16 without EP, tp=8 without EP and tp=8 with EP. <img width="1000" height="808" alt="image" src="https://github.com/user-attachments/assets/6b096be4-3896-4e96-8916-d8d6e13a58cc" /> PS: The DeepSeek-V3's rollout slows down after enabling expert parallelism.	2025-08-14 19:12:26 +08:00
A1waysBeenHere	bd3b735514	[trainer] fix: Remove redundant 'data.to()' codes (#3051 ) ### What does this PR do? Removed redundant ```data.to()``` codes. `data.batch = data.batch.to("cpu")` in `def update_actor()`: The data already loaded into CPU after latest computation, which is`def compute_log_prob()` in DAPO for example. `data = data.to(get_device_id())` in `def compute_ref_log_prob() & def compute_log_prob()`: Both of data are gonna be move to GPU or NPU... at `self.actor.compute_log_prob(data=data, calculate_entropy=True)`. > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-14 19:10:21 +08:00
codingma	76e41368b4	[hardware] add flops count support for A3 device (#3053 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. add flops count support for A3 device ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Not related. ### API and Usage Example Not related. ### Design & Code Changes Not related. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-14 19:09:49 +08:00
Joel	8aa09db4b6	[rollout,vllm] feat: support multi-modal in agent loop (#3016 ) ### What does this PR do? Follow https://github.com/volcengine/verl/pull/2398, support vLLM multi-modal.	2025-08-14 19:08:47 +08:00
Chayenne	1a62568f80	[rollout] feat: remove over-catched errors in SGLang rollout (#3047 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. As named, the catch should not cover abort. Abort should work as expected. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com> Co-authored-by: ChangyiYang <changyiyang2023@gmail.com>	2025-08-13 23:19:21 -07:00
kang sheng	e2e4c35ecb	[doc] feat: add benchmark for deepseek (#3046 ) Add a benchmark result for Deepseek. Other benchmark results are on the way.	2025-08-14 13:29:28 +08:00
Chayenne	ea885f32f0	[rollout] feat: support over sampling rollout in SGLang Rollout (#2929 ) ### What does this PR do? This PR introduces an over-sample strategy for verl's SGLang multi-turn rollout to address the long-tail problem, where a few slow requests disproportionately increase the overall rollout time. The core idea is to over-sample the number of requests at the start of the rollout and then aggressively cancel any requests that haven't finished once a target number of completions is met. - Improves rollout efficiency for multi-turn conversations by reducing total time spent waiting for slow requests. - Implements a new request monitoring and cancellation mechanism to cut off unnecessary computation. wandb results is as follow: https://wandb.ai/zhaochenyang20/benchmark_over_sample_2/workspace?nw=nwuserzhaochenyang20 ----- Of course, this strategy has its share of issues. For example, many might question why the over-long requests that are dropped aren't simply saved and continued in the next round. This is certainly possible—it's a partial rollout strategy—but it would require verl to have a data buffer, which is beyond the scope of this PR. Furthermore, saving and continuing these requests would introduce an off-policy problem. There is also a valid concern that this rather "brutal" dropping strategy could unfairly ignore very long requests. I agree this is a very reasonable point, but currently, we don't have a lossless solution. However, our dropping strategy is very flexible and could even change with our curriculum learning. For instance, in the example I gave, I just directly dropped the last 20% of requests. In practice, we can dynamically adjust this drop rate and even set different dropping methods. For example, we could record the return time (t) for the 80% of requests and then drop any requests that haven't returned after 1.5t. We've provided an initial, validated idea and have completed its implementation. We welcome everyone to join the discussion on how to accelerate multi-turn rollouts with acceptable losses. ### Test The new over-sample strategy was tested with an 8-GPU setup on the gsm8k dataset, yielding the following results: - Rollout Time: Significant reduction in overall rollout time per step. - Training Rewards: - The reward metric for training steps shows a positive bias. This is because we exclude the aborted requests (which are typically more difficult and have lower rewards) from the reward calculation. - The reward metric for validation steps remains accurate and aligns with the baseline. This is because the cancellation logic is not triggered during validation, ensuring a fair and complete evaluation. ### API and Usage Example This feature modifies `sglang_rollout.py` and `metric_utils.py`. To use it, follow the standard setup and then run the training script with the over-sample parameters. https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/release_log/over_sample.md ### Design & Code Changes The design is centered on three main functions that orchestrate the over-sampling logic: `run_with_cancellation`, `process_request_with_monitoring`, and `monitor_and_cancel`. These functions rely on global variables, such as `all_tasks` and `completion_lock`, to manage state. - `run_with_cancellation`: This is the entry point. It launches all requests as `process_request_with_monitoring` tasks concurrently with a single `monitor_and_cancel` task. It uses `asyncio.gather` to wait for all tasks to complete (or be canceled) and converts any exceptions from canceled tasks into padding requests before returning the final output. - `process_request_with_monitoring`: This async function handles a single request. It waits for the request to complete using `_async_rollout_a_request` and then checks a shared counter, `completed_count`, using a `completion_lock` for thread safety. If the target completion count has not been reached, it returns the real result. If the target has been met, it returns padding data instead, effectively "discarding" the late result. - `monitor_and_cancel`: This is a separate async task that polls the `completed_count`. Once the count reaches the `target_completion` threshold, it immediately cancels all remaining tasks and sends an `abort_requests` signal to the SGLang engine, halting any ongoing GPU computation for those requests. Key code changes: - `sglang_rollout.py`: - Adds the three core asynchronous functions for the over-sample strategy. - The `AsyncEngine` class now includes a new `abort_request` method that calls the synchronous `abort_request` in the `tokenizer_manager`. - `metric_utils.py`: - The `compute_data_metrics` function is updated to exclude the aborted requests (identified by padding) from the denominator when calculating average rewards during training. This prevents the training reward from being artificially lowered by the zero-reward aborted requests. This implementation is designed to be a straightforward and effective solution for the long-tail problem, though some aspects of the asynchronous design and the impact on training variance require further investigation. --------- Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com> Co-authored-by: PopSoda2002 <zhouhp.me@gmail.com> Co-authored-by: ChangyiYang <changyiyang2023@gmail.com> Co-authored-by: PrinsYin <yzr1914001753@gmail.com> Co-authored-by: WindowsXp-Beta <xinpwei@amazon.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-08-13 21:12:57 -07:00
Chi Zhang	0807da9115	[misc] feat: add B200 and GB200 flops count (#3041 ) ### What does this PR do? - add B200 and GB200 flops count ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-14 09:49:24 +08:00
Jingcheng Yang	b6cdcdf805	[doc] feat: Add VTool-R1 in the list of "awesome works using verl (#3036 ) Add VTool-R1 into `Awesome work using verl` ### What does this PR do? This PR adds a recent work built upon verl into the "Awesome work using verl" Section of the README.md file. Add VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use, into `Awesome work using verl` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`	2025-08-13 18:32:58 +08:00
OC	e6843cc82b	[fsdp] fix: set _set_allocator_settings to True to avoid fsdp2 oom (#3020 ) ### What does this PR do? Enable expandable_segments to avoid the increasing memory fragmentation caused by temporary variables during the training process of fsdp2, which may trigger probabilistic out-of-memory (OOM) errors. Since both sglang and vllm can not work with expandable_segments:True, it has to be turn off during rollout. ### Test Without this fix, memory reserved could be very high after compute_log_prob or update_actor. ``` (WorkerDict pid=339320) [2025-08-11 17:43:01] dp actor After compute_log_prob, memory allocated (GB): 5.53, memory reserved (GB): 73.59, device memory used/total (GB): 77.47/79.15 ``` With this fix, it stays low during training. ``` (WorkerDict pid=396879) [2025-08-12 07:39:42] dp actor After compute_log_prob, memory allocated (GB): 4.95, memory reserved (GB): 14.20, device memory used/total (GB): 17.72/79.15 ``` --------- Co-authored-by: narutolhy <luhongyu.4869@bytedance.com>" Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>	2025-08-13 15:59:58 +08:00
Blue Space	22a15365db	[ci] fix: try fix vllm test network issue (#3031 ) ### What does this PR do? [ci] fix: try fix vllm test network issue ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-13 14:01:32 +08:00
Philipp Normann	83cfc76f73	[recipe] fix: make LangGraph agent example runnable out-of-the-box (#3029 ) ### What does this PR do? Fixes the LangGraph agent recipe so it runs out-of-the-box across different environments. The original example had undefined variables and brittle error handling that caused failures. This PR makes it portable, robust, and self-contained. No breaking API changes. ### Checklist Before Starting * [x] Search for similar PRs: [https://github.com/search?q=repo%3Avolcengine%2Fverl+langgraph++\&type=pullrequests\&state=open](https://github.com/search?q=repo%3Avolcengine%2Fverl+langgraph++&type=pullrequests&state=open) * [x] Format PR title as `[recipe] fix: make LangGraph agent example runnable out-of-the-box` * `{modules}`: recipe * `{type}`: fix * No breaking API changes ### Test ✅ End-to-end validation: ```bash # 1. Generate dataset (parameterized) python recipe/langgraph_agent/example/create_dataset.py --train_size 1000 --test_size 100 # 2. Run training (no modifications needed) bash recipe/langgraph_agent/example/run_qwen2.5_3b.sh # 3. SLURM submission (headers included) sbatch recipe/langgraph_agent/example/run_qwen2.5_3b.sh ``` Note on `GPUS_PER_NODE` and `NNODES`: - `GPUS_PER_NODE`: GPUs per node. Detection order: `SLURM_GPUS_ON_NODE` (if set) → `GPUS_PER_NODE` → `2`. - `NNODES`: number of nodes. Detection order: `SLURM_JOB_NUM_NODES` (if set) → `NNODES` → `1`. - Total GPUs = `GPUS_PER_NODE × NNODES` (must be ≥ 2). Local override (no `SLURM_` set): ```bash GPUS_PER_NODE=4 NNODES=2 bash recipe/langgraph_agent/example/run_qwen2.5_3b.sh ``` Results:* * Model converged to 100% validation accuracy (`val-core/lighteval/MATH/reward/mean@4: 1.0`) * Stable metrics: policy loss, entropy, critic scores all normal * No crashes or hangs during run * Robust handling of malformed tool-call JSON (logs warnings) * Model path fallback works when local model missing * SLURM detection + fallbacks confirmed <img width="3066" height="1288" alt="math_expression_tool – Weights & Biases" src="https://github.com/user-attachments/assets/f08d5799-f9ce-44a2-8fb2-19c7c401c248" /> ### API and Usage Example No breaking API changes. Dataset generator now has a CLI interface: ```bash # Defaults: 5000 train, 500 test → data/math_expression_tool/ python recipe/langgraph_agent/example/create_dataset.py # Custom sizes & output dir python recipe/langgraph_agent/example/create_dataset.py \ --train_size 10000 \ --test_size 1000 \ --output_dir my_custom_path # Training bash recipe/langgraph_agent/example/run_qwen2.5_3b.sh # SLURM sbatch recipe/langgraph_agent/example/run_qwen2.5_3b.sh ``` ### Design & Code Changes Core runability fixes: * `run_qwen2.5_3b.sh`: * Replace undefined ARNOLD\_\* vars with SLURM detection + fallbacks * Fix dataset paths * Add HF hub model fallback * Apply performance tuning from GSPO recipe * `chat_model.py`: Harden tool-call parsing for malformed JSON * `create_dataset.py`: Add CLI args (`--train_size`, `--test_size`, `--output_dir`) with defaults Docs & polish: * Update `README.md` with CLI params and SLURM example * Sort imports to satisfy ruff linting Impact: Example now works out-of-the-box in local and cluster environments without edits. ### Checklist Before Submitting * [x] Read the [[Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md)](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md) * [x] Pre-commit checks: `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` * [x] Documentation updated (`README.md`) * [x] Manual end-to-end test with convergence results * [x] CI request to be sent in Slack once PR is opened	2025-08-13 11:02:51 +08:00
kang sheng	65c59c719c	[trainer,rollout,doc] feat: reduce minimum gpus to 96 for deepseek-v3 (#3019 ) ### What does this PR do? reduce minimum gpus to 96 for deepseek-v3 and 32 for Qwen3-235B change details: 1. use cpu adam to save GPU memory 2. change vllm sleep level to 2 to save CPU memory 3. fix conflict between megatron HybridDeviceOptimizer and verl load_megatron_optimizer. 4. provide new training scripts and document. training logs: DeepSeek-V3 with 12 Nodes: <img width="3420" height="1308" alt="image" src="https://github.com/user-attachments/assets/23bec729-bf39-41c8-a4c2-c51f389d052c" /> Qwen3-235B with 4 Nodes: <img width="3426" height="1380" alt="image" src="https://github.com/user-attachments/assets/4eeacab4-833f-4409-b294-10bd51d0fde9" /> sleep1 vs sleep2 speed on Qwen2.5-7B: level1 mean:8.26s level2 mean: 8.33s <img width="698" height="638" alt="image" src="https://github.com/user-attachments/assets/e3dcbb4b-f841-4d7c-b60f-40da8ffe6c42" />	2025-08-13 10:52:45 +08:00
Chunyu	3cc7695f4c	[hardware, recipe] chore: support retool sft &update peft sft perf on npu (#3000 ) ### What does this PR do? - Add time statistics for all train steps. - Support retool sft on ascend npu. - Update peft sft performance. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Tonyztj <1445297443@qq.com>	2025-08-13 10:51:46 +08:00
Chi Zhang	5957412767	[rollout] feat: add rollout config (#3010 ) ### What does this PR do? - Add rollout config ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-13 10:50:27 +08:00
Chi Zhang	3315c1ab1e	[misc] chore: add GPU memory to names that train large models (#3023 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-12 18:37:10 +08:00
Chi Zhang	0123ca6ce1	[misc] chore: add gpu memory to deepseek script (#3022 ) …atron_80gb.sh ### What does this PR do? - Rename run_deepseek671b_math_megatron.sh to run_deepseek671b_math_megatron_80gb.sh ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-12 17:46:47 +08:00
Joost van Doorn	77b79cba55	[rollout] fix: Add soft node affinity to the agent loop workers (#3006 ) ### What does this PR do? This adds (soft) node affinity such that AgentLoopWorkers get scheduled in the same node if possible. > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [link](https://github.com/volcengine/verl/issues?q=is%3Aissue%20%20node%20affinity) - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Tested by running on our cluster with a ray cluster with multiple nodes, and verified AgentLoopWorker's are assigned to the same node id through the dashboard. > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-12 17:20:59 +08:00
Blue Space	45b4ce910a	[perf] feat: Add rollout longtail observation metrics (#3009 ) ### What does this PR do? [perf] feat: Add rollout longtail observation metrics, show max and min rollout timing and top 10% rollout take-up ratio. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-12 13:03:52 +08:00
黄石	92cbc2f417	[misc] feat: Support trackio (#3017 ) ### What does this PR do? Support Trackio, a lightweight experiment tracking library from Hugging Face. Features are listed in https://huggingface.co/blog/trackio ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-12 13:02:20 +08:00
Chi Zhang	492bd63e7c	[ci] fix: add `flash_attn_supports_top_left_mask` to ignore list (#3004 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-12 11:17:47 +08:00
Huazhong	b8c4871c21	[trainer] fix: reduce memory footprint by moving data to the device only in mini batch (#3011 ) ### What does this PR do? Reduce peak memory usage during update_actor/critic by moving data to the device only in mini batch. Same operation can be seen in [fsdp_workers.py](https://github.com/volcengine/verl/blob/main/verl/workers/fsdp_workers.py#L729) ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-12 10:37:45 +08:00
Minghui Jia	9f4161e250	[recipe] feat: add deepeyes recipe (#2398 ) ### What does this PR do? This PR introduces a complete training recipe for [DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning](https://arxiv.org/abs/2505.14362). The core feature is the support for multi-turn visual tools, specifically the `ImageZoomInTool`, integrated with a custom reward function based on the "LLM-as-a-Judge" pattern to evaluate model performance. Additionally, to better monitor and analyze the model's tool-use behavior, this PR adds functionality to track tool call counts during the training process and reports these metrics to logging systems like wandb. ### API and Usage Example The primary change is the new training recipe for DeepEyes. Users can start a training run by using the provided configuration file. 1. Preprocess the dataset. We need to add some tool-related extra_info: ```bash python recipe/deepeyes/deepeyes47k_preprocess.py --dataset_dir <path_to_raw_dataset> --save_dir <path_to_processed_data> ``` 2. Start the PPO training: ```bash bash recipe/deepeyes/run_deepeyes_grpo.sh ``` The training process will automatically load the ImageZoomInTool and the custom reward function as defined in the recipe. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes - DeepEyes Recipe Integration: Added a new recipe directory with data preprocessing, tool config, and a custom reward function for DeepEyes. - Visual Tool Support: Implemented `ImageZoomInTool` with robust bbox validation and resizing. - Tool Call Statistics: Modified the rollout and metrics code to track and log tool call counts per sample and per step. - Bug Fixes: Fixed image byte handling and ensured special tokens are preserved during decoding for tool call formatting. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Maxwell-Jia <mr.minghui.jia@gamil.com> Co-authored-by: xieck13 <xieck13@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-08-12 09:51:58 +08:00
Blue Space	b79263ad60	[perf] refactor: part 2 - Profiler ci test and fixes (#3001 ) ### What does this PR do? [perf] refactor part 2: Profiler ci test and fixes ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-12 08:59:39 +08:00
HaochenYuan	6110410797	[sglang]fix: Reduce memory footprint during rollout by adding load_grad=False when loading megatron weights. (#3007 ) When I ran grpo training with sglang on DeepSeek-V3 671B with 256*H100, found OOM error here. There is no need to load grad when attempts to convert mcore weights and then conducts rollout generation. ### What does this PR do? Reduce peak memory usage during rollout generation by not loading gradient when calling `load_megatron_model_to_gpu()`. Same operation can be seen in [megatron_vllm.py](`814e421c54/verl/workers/sharding_manager/megatron_vllm.py (L150)`) ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [link](https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+rollout+grad) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-11 19:29:46 +08:00
Joel	814e421c54	[rollout,vllm] feat: unify vllm and sglang method to async (#2982 ) ### What does this PR do? Change vLLM method to async to unify with SGLang.	2025-08-11 14:24:06 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	61dde81e0a	[trainer] feat: Specify apply_chat_template_kwargs from config (#2998 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Now we support specifying apply_chat_template_kwargs from config ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. e.g. For #1711, now users can directly use ``` +data.apply_chat_template_kwargs.enable_thinking=False ``` to disable thinking mode in Qwen3, without the need to modify the code. ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. Users can pass and append anything to `data.apply_chat_template_kwargs`, and this will be passed when `.apply_chat_template()` is called. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-08-11 13:52:48 +08:00
李琼羽	a901f56b8f	[model] fix: Handle flash_attn_supports_top_left_mask import for older transformers (#2985 ) ## Summary Fix ImportError when using older transformers versions that don't have `flash_attn_supports_top_left_mask` function. ## Root Cause The `flash_attn_supports_top_left_mask` function was added in newer versions of transformers. Users with older versions encounter ImportError. ## Solution - Add try/except blocks to handle the import gracefully - Provide a safe fallback (return False) for older versions - Applied to all affected model files ## Changes - `verl/models/transformers/qwen2_vl.py` - `verl/models/transformers/qwen2.py` - `verl/models/transformers/llama.py` - `verl/models/transformers/kimi_vl.py` ## Testing - ✅ Tested import compatibility - ✅ Verified Python syntax - ✅ Code follows existing patterns in the codebase Fixes #2968 --------- Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>	2025-08-11 13:38:56 +08:00
Ata Fatahi	e63f6acbb7	[ray] fix: Fix function name in worker helper (#2868 ) ### What does this PR do? Fix function name in worker helper. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test N/A ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Signed-off-by: Ata Fatahi <immrata@gmail.com> Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>	2025-08-11 10:03:40 +08:00
Blue Space	545f899844	[BREAKING] [perf] refactor: Profiler api refactor (#2894 ) ### What does this PR do? Refactor profiler CI to a unified way. TODO: - nsys use `save_path` - nsys descrete tests are disabled - torch profiler cc: @davidmlw ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example Global profiler config: ```yaml global_profiler: _target_: verl.utils.profiler.ProfilerConfig tool: null steps: null profile_continuous_steps: false save_path: outputs/profile tool_config: nsys: _target_: verl.utils.profiler.config.NsightToolConfig discrete: false npu: _target_: verl.utils.profiler.config.NPUToolConfig discrete: false contents: [] level: level1 analysis: true torch: _target_: verl.utils.profiler.config.TorchProfilerToolConfig step_start: 0 step_end: null ``` Local profiler config: ```yaml profiler: # Required when using verl.utils.omega_conf_to_dataclass to instantiate dataclass configs _target_: verl.utils.profiler.ProfilerConfig # profiler tool, default same as profiler.tool in global config # choices: nsys, npu, torch tool: ${oc.select:global_profiler.tool,null} # whether enable profile on critic enable: False # Whether to profile all ranks. all_ranks: False # The ranks that will be profiled. [] or [0,1,...] ranks: [] # profile results saving path save_path: ${oc.select:global_profiler.save_path,null} # specific tool config tool_config: ${oc.select:global_profiler.tool_config,null} ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-11 09:52:41 +08:00
Narsil-Dinghuai Zhang 张鼎怀	287ef7e262	[rollout] fix: avoid repeated multiplication by n for GRPO (#2881 ) For GRPO the number of generation has already been specified at `2fdfbdcba6/verl/trainer/ppo/ray_trainer.py (L1117)`, so the original code in huggingface rollout will generate $n^2$ responses for each prompt.	2025-08-11 09:46:23 +08:00
H	cb809d66e4	[doc] feat: update contact and news (#2993 ) ### What does this PR do? Update contact email. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`	2025-08-10 09:16:33 +08:00
OC	da7fc8e015	[rollout,trainer] feat: offload param before wake up inference engine (#2977 )	2025-08-09 06:57:05 +08:00
Joel	beb6246100	[rollout,vllm] fix: max_num_seqs not take effect (#2960 )	2025-08-09 06:55:21 +08:00
Qizhi Chen	980b018c85	[ray, trainer] fix: fix working_dir when launching via uv (#2859 ) ### What does this PR do? This PR fix the ray working_dir when launching via `uv run` `ray` will change the runtime_env when running via uv: [_maybe_modify_runtime_env](`b62ce29706/python/ray/_private/worker.py (L1322-L1346)`) ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test ```python @ray.remote(num_cpus=1) # please make sure main_task is not scheduled on head class TaskRunner: """Ray remote class for executing distributed PPO training tasks. This class encapsulates the main training logic and runs as a Ray remote actor to enable distributed execution across multiple nodes and GPUs. """ def run(self, config): print(os.getcwd()) ``` 1. When launching not using uv: `path/to/verl` 2. When launching using uv: `/tmp/ray/session_2025-08-01_09-07-33_265741_98359/runtime_resources/working_dir_files/_ray_pkg_225a7865bd7dee4c`, then checkpoints will be saved on this temp dir. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-08 23:18:04 +08:00
Rasul Alakbarli	21b99ed741	[misc] feat: Added: "tensorboard" to the requirements.txt (#2900 ) ### What does this PR do? > This PR adds tensorboard as a dependency to requirements.txt file, across several Dockerfiles (Dockerfile.ngc.vllm, Dockerfile.ngc.vllm0.8, Dockerfile.ngc.vllm0.8.sagemaker), a setup script (install_vllm_sglang_mcore.sh), and the main setup.py file. This change ensures that the tensorboard package is consistently installed, enabling visualization of training metrics for various configurations and deployment environments. This is a maintenance task that enhances the project's observability without altering core functionality. ### Test > This change is a dependency update and doesn't require specific testing beyond confirming the installation is successful. ### API and Usage Example > No API changes are introduced. The usage of TensorBoard would be initiated by the user after installing the requirements. ```python # No code snippet is applicable for this change	2025-08-08 22:39:53 +08:00
OC	12c83e8ada	[trainer] fix: only load memory in micro batch (#2908 ) ### What does this PR do? In update_actor, it load the whole bath into GPU memory, actually only the micro batch is necessary. It is a regression from https://github.com/volcengine/verl/pull/2477 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+micro+batch - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test <img width="700" height="325" alt="截屏2025-08-05 下午1 01 53" src="https://github.com/user-attachments/assets/31dc4fea-8cb0-4f51-8ed2-f93d90a94040" /> <img width="1359" height="607" alt="截屏2025-08-05 下午12 45 50" src="https://github.com/user-attachments/assets/747636e6-b919-4eca-a3eb-5baf3722b5fc" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>	2025-08-08 22:38:22 +08:00
xylcbd	31ac4dc6fa	[data] fix: fix bug of '_io.BytesIO' object has no attribute 'startswith' (#2430 ) ### What does this PR do? FIX: '_io.BytesIO' object has no attribute 'startswith' https://github.com/volcengine/verl/issues/1976 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test 1. download the test dataset. ``` huggingface-cli download --repo-type dataset xylcbd/pgdp5k_mini ``` 2. convert data to parquet format ``` import argparse import os import datasets if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("--local_dir", default="~/data/pgdp5k_mini") args = parser.parse_args() data_source = "xylcbd/pgdp5k_mini" dataset = datasets.load_dataset(data_source) train_dataset = dataset["train"] train_dataset.to_parquet(os.path.join(args.local_dir, "train.parquet")) ``` 3. test dataset loading: ``` import os import torch from omegaconf import OmegaConf from torch.utils.data import DataLoader from verl.utils import hf_processor, hf_tokenizer from verl.utils.dataset.rl_dataset import RLHFDataset, collate_fn model_path = "Qwen/Qwen2.5-VL-3B-Instruct" tokenizer = hf_tokenizer(model_path) processor = hf_processor(model_path) config = OmegaConf.create( { "prompt_key": "prompt", "max_prompt_length": 1024, "filter_overlong_prompts": True, "filter_overlong_prompts_workers": 2, } ) dataset = RLHFDataset( data_files=os.path.expanduser("~/data/pgdp5k_mini/train.parquet"), tokenizer=tokenizer, config=config, processor=processor, ) dataloader = DataLoader(dataset=dataset, batch_size=2, shuffle=True, drop_last=True, collate_fn=collate_fn) a = next(iter(dataloader)) from verl import DataProto tensors = {} non_tensors = {} for key, val in a.items(): if isinstance(val, torch.Tensor): tensors[key] = val else: non_tensors[key] = val data_proto = DataProto.from_dict(tensors=tensors, non_tensors=non_tensors) assert "multi_modal_data" in data_proto.non_tensor_batch, data_proto assert "multi_modal_inputs" in data_proto.non_tensor_batch, data_proto data = dataset[0]["input_ids"] output = tokenizer.batch_decode([data])[0] print(f"type: type{output}") print(f"\n\noutput: {output}") ``` 4. Error reported before repair (no error reported after repair) ``` AttributeError: '_io.BytesIO' object has no attribute 'startswith' ``` ### API and Usage Example No change for the API. ### Design & Code Changes No change for the design. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-08-08 22:24:39 +08:00
H	01b4a290b3	[trainer] refactor: make main_ppo TaskRunner more modular (#2885 ) ### What does this PR do? - Added `__init__()` method to initialize `self.role_worker_mapping = {}` - Extracted worker setup logic into dedicated methods: - `add_actor_rollout_worker()` - handles strategy-specific worker imports and setup (lines 130-153) - `add_critic_worker()` - sets up critic worker role mapping (lines 170-176) - `init_resource_pool_mgr()` - creates resource pool specifications (lines 178-187) - `add_reward_model_worker()` - conditionally adds reward model workers (lines 195-203) - `add_ref_policy_worker()` - conditionally adds reference policy workers (lines 205-208) ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test relying on existing unit tests ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-08-08 21:04:04 +08:00
OC	ff6978c14c	[rollout] feat: add cudagraph_capture_sizes option to customize cuda graph memory (#2956 ) ### What does this PR do? 1. enable vllm cuda graph in default 2. add a `cudagraph_capture_sizes` option to customize cuda graph memory vllm cuda graph can improve performance in every case I have tested. It is better to enable in default as sglang. <img width="1145" height="321" alt="截屏2025-08-07 上午11 59 37" src="https://github.com/user-attachments/assets/b750fb93-f42b-48e8-a5e5-6c5c67e8a5ac" /> The default cudagraph_capture_sizes has best performance, but also come with larger memory occuption. If oom occurred during update policy, `cudagraph_capture_sizes ` option can help to reduce memory. <img width="1043" height="318" alt="截屏2025-08-07 下午12 03 02" src="https://github.com/user-attachments/assets/2892a67c-6ba9-448c-ae42-2833f010ff06" /> Additional memory\latency\batch size testing data from NV: ![20250807-120513](https://github.com/user-attachments/assets/74b2382a-7f17-42da-ab12-922e29cfa3e2) ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-08 21:00:13 +08:00
Blue Space	6bd44e6313	[megatron] feat: Allow override optimizer config (#2959 ) ### What does this PR do? Megatron allow override optimizer config to enable features like cpu adam. But specific feature enable needs debug and implementation. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-08 20:59:00 +08:00
none0663	d527d91d12	[sglang] fix: Fix No command 'hf' found for dapo multi-turn as alternative baseline (#2973 ) ### What does this PR do? > Fix No command 'hf' found for dapo multi-turn as alternative baseline > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-08 15:30:06 +08:00
Chi Zhang	083da9ab13	[misc] fix: fix DataProto __getstate__ bug (#2962 )	2025-08-08 08:24:31 +08:00
Nariaki Tateiwa	ae285703a8	[doc] fix: fix typo in docs/preparation/prepare_data.rst (#2957 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. I fixed the typo from RewardModule to RewardModel in docs/preparation/prepare_data.rst ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-07 21:16:26 +08:00
ChangyueLiao	68598bd31d	[rollout] fix: Fix local rank binding issue when setting RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES (#2967 ) ### What does this PR do? Fix local rank binding issue when setting RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES ### Checklist Before Starting [done] Search for similar PR(s). ### Design & Code Changes change verl/verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py ### Checklist Before Submitting [ done ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). [ done ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). --------- Co-authored-by: liaochangyue <liaochangyue@bytedance.com>	2025-08-07 20:58:11 +08:00
Blue Space	7bece3cf59	[ci] fix: limit e2e_one_step_off_policy timeout (#2964 ) ### What does this PR do? e2e_one_step_off_policy may encounter network hanging issue, occupy GPUs over 1h, which normally execute in 2~3 minites. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-07 18:01:37 +08:00
杨睿	3ebe6717ad	[megatron] fix: retain MLA config in mcore config converter (#2933 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. - in the current `check_and_disable_incompatible_configs` function, we will drop config if it's not an attribute of `TransformerConfig`, however when using `MLATransformerConfig`, this funcion will drop mla config like `q_lora_rank`, and cause a lots of problems in the downstream pipeline - this pr refactored `check_and_disable_incompatible_configs` to a factory function `check_and_construct_configs `, which accecpt a class type bounded with TransformerConfig, and return a TransformerConfig instance. @ETOgaosion ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>	2025-08-07 12:35:18 +08:00
Chayenne	6f559540e7	[sglang] feat: add dapo multi-turn as alternative baseline (#2952 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. as named ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>	2025-08-06 18:27:07 -07:00
Qizhi Chen	05eb0c7a6d	[tool] feat: handle cases when func calling without params (#2936 ) ### What does this PR do? This PR enhances tool calling without params. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test input tool config: ```yaml tools: - class_name: verl.tools.my_tool.RandomNumTool config: type: native tool_schema: type: function function: name: random description: Generate a random number. ``` --- parsed tool: ```bash <tools> {"type": "function", "function": {"name": "random", "description": "Generate a random number.", "parameters": OpenAIFunctionParametersSchema(type='object', properties={}, required=[]), "strict": false}} </tools> ``` > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-06 21:29:15 +08:00
Yan Bai	aebc51a235	[megatron] chore: update example 671B script, no offline dist-ckpt needed any more (#2945 ) ### What does this PR do? update example 671B script, no offline dist-ckpt needed any more	2025-08-06 21:07:01 +08:00
Wei (Will) Feng	8e1fc242d3	[fsdp] fix: call reshard() to resolve no shard attribute (#2941 ) ### What does this PR do? we should call .reshard() otherwise it throws error of undefined attribute "shard". this is a typo from a recent PR https://github.com/volcengine/verl/pull/2843 > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-06 16:26:37 +08:00
黄石	c344e9eb2c	[megatron] feat: support for pipeline layout with vpp in mcore 0.13.0 (#2749 ) ### What does this PR do? Add support for pipeline layout with vpp in mcore 0.13.0. Breaking change for user in 0.12.0. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: donpromax <donlv1997@163.com> Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>	2025-08-06 15:58:44 +08:00
Chi Zhang	d37674c8ae	[misc] refactor: deprecate sharding manager (part 1) (#2912 ) ### What does this PR do? - Since we introduce register device_mesh inside the worker, there is no need to use sharding manager any longer. We will remove the usage of sharding manager gradually in the main branch. - This PR removes the sharding manager usage inside fsdp_workers ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-06 11:05:11 +08:00
Stefan He	796871d7d0	[sglang] fix: remove unnecessary maybe_set_triton_cache_manager (#2926 ) ### What does this PR do? remove unnecessary maybe_set_triton_cache_manager > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-05 14:17:11 -07:00
杨睿	02f4386ae8	[megatron] fix: qwen2vl megatron fused forward param bug (#2595 ) ### What does this PR do? fix: qwen2vl megatron fused forward param bug. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn> Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>	2025-08-05 16:13:46 +08:00
Nan Jiang	8fd671638c	[rollout, sglang] fix: fix encoding logic bug (#2901 ) ### What does this PR do? Fix the `input_ids` encoding logic bug. This bug appears when we have tool to init and tool init return some text or images. The output will be `<im_start>assistant\ntool\nXX` but we want just `<im_start>tool\nXX`. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Run any multiturn code and no this error ```python logger.warning( f"Inconsistent training and inference tokenization detected{mode_str}. This may lead to " f"unexpected behavior during training. Please review your chat template to determine if this " f"is intentional. For more information, refer to the multiturn README.md." ) ``` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-05 15:53:29 +08:00
Blue Space	d0ecc3fad5	[megatron] refactor: simplify module init in megatron_workers, extract common operations (#2400 ) ### What does this PR do? [megatron] refactor: simplify module init in megatron_workers, absorb common operations ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-08-05 15:33:19 +08:00
Chi Zhang	c0f99f3da2	[BREAKING] [ray, megatron] feat: remove RayMegatronWorker (#2895 ) ### What does this PR do? - Following https://github.com/volcengine/verl/pull/2893, we can now directly register dispatch and collect function inside the worker. So, there is no need to maintain RayMegatronWorker and RayMegatronWorkerGroup, which is a hacking solution ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-05 11:05:38 +08:00
Long(Tony) Lian	e74fade589	[doc] fix: Specify rollout engine in quickstart.rst (#2905 )	2025-08-04 17:56:35 -07:00
Chi Zhang	3e2bceb1af	[ray] feat: support directly register dispatch device mesh (#2893 ) ### What does this PR do? A better solution than https://github.com/volcengine/verl/pull/1260 The current dispatch methods are quite limited: In hybrid engine, we would like to dispatch infer_tp x infer_dp for generation and train_tp x train_dp x train_pp for training. However, currently implementation can only dispatch train_tp x train_dp x train_pp for training and dp for generation and perform allgather inside the workergroup. When two megatron models colocate, their device mesh has to be identical. We have to subclass RayWorkerGroup in order to implement various distributed strategies. This makes create_colocated_worker hacky to implement in the future. The difference in this implementation: - We register directly inside the worker of the dp_rank and whether the output of this rank will be collected. - By doing so, we can 1) completely remote MegatronWorker and the necessity to subclass RayWorkerGroup in the future to implement flexible dispatch methods. 2) remove all other dispatch/collect methods ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-04 19:57:42 +08:00
Lingfeng Wang	13731e553b	[rollout] feat: add rollout_skip to skip rollout by reusing previously generated sequences (#2602 ) ### Adds rollout_skip to skip rollout Adds rollout_skip to skip rollout by reusing previously generated sequences from a specified dump directory. Two parameters need to be configured: `actor_rollout_ref.rollout.skip_rollout=True` Enables the rollout skipping functionality `actor_rollout_ref.rollout.skip_use_default_dir="/tmp/rollout_dump"` Sets the dump directory path for storing rollout results #### Behavior: On first run: The system will generate and dump the rollout inference results to the specified directory On subsequent runs: The system will automatically check for and load results from this directory, skipping the rollout computation > Note: The directory path should be persistent across runs to maintain the caching benefit > If either of these parameters changes between runs: > - actor_rollout_ref.rollout.n > - data.gen_batch_size > > The trainner will: > 1. Ignore previously dumped data > 2. Regenerate new rollout sequences > 3. Create a new dump folder with the naming pattern: `InferGBS{gen_gbs}__N{n}` > (where {gen_gbs} is the current gen_batch_size and {n} is the current rollout.n value) This feature is particularly valuable for: - Development iterations with same parameters - Debugging sessions #### Result example: worked with NPU <img width="899" height="681" alt="image" src="https://github.com/user-attachments/assets/f6253e36-14b8-47ab-9817-ce6c42b3168d" /> worked with GPU <img width="1809" height="950" alt="image" src="https://github.com/user-attachments/assets/0920e50e-e415-40cb-80b5-2e148015a8e4" /> ### Checklist Before Starting - [x] Search for similar PRs. - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Not related. ### API and Usage Example This is an example of how to patch rollout_skip in `RayPPOTrainer`. > Both `RayDAPOTrainer()` (in `verl/recipe/dapo/dapo_ray_trainer.py`) and `RayPPOTrainer()`(in `verl/trainer/ppo/ray_trainer.py`) have already been adapted. ```python from verl.utils.rollout_skip import RolloutSkip ... class RayPPOTrainer: ... def fit(self): ... # Add code as follow: rollout_skip = RolloutSkip(self.config, self.actor_rollout_wg) rollout_skip.wrap_generate_sequences() ... for epoch in range(self.config.trainer.total_epochs): for batch_dict in self.train_dataloader: ... ``` To enable this PR's functionality, simply add these two parameters to your launch script: ```bash actor_rollout_ref.rollout.skip_rollout=True \ actor_rollout_ref.rollout.skip_dump_dir="/tmp/rollout_dump" \ ``` 1. `actor_rollout_ref.rollout.skip_rollout=True` - Enables the rollout skipping functionality 2. `actor_rollout_ref.rollout.skip_use_default_dir="/tmp/rollout_dump"` - Sets the dump directory path for storing rollout results ### Design & Code Changes Not related. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-04 19:42:16 +08:00
Ethen	b3e999fa74	[FSDP] feat: Allows specifying a different reference model (#2050 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Extend FSDP worker to allow end users to specify separate actor/reference model. There were two similar issues/asks https://github.com/volcengine/verl/issues/699 https://github.com/volcengine/verl/issues/744 Wanted to get some initial feedback if this is on the right track. Completely fine if someone from verl core team can take up this task in separate PR to speed up development cycle. ### Usage Example Default config.yaml will have model = null under reference model section to preserve the original behavior of using actor model as reference model. End users can change path to a different model if they wish to use a separate reference model. ```yaml # default behavior, same as before actor = ref model ref: model: null # new, specify a different ref model ref: model: path: "model path" ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-08-04 16:04:02 +08:00
TomQunChao	551d4cc56d	[misc] feat: support logging rollout prob vs. actor probs in multi-turn for debugging purpose, follow up of #1712 (#2808 ) ### What does this PR do? This PR is a follow-up to https://github.com/volcengine/verl/pull/1712. - adds support for recording rollout log-probs in multi-turn conversations - moves the diff-computation code into a separate file. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-04 11:54:46 +08:00
GOFISHING	4e9d2878ee	[fsdp, trainer] fix: save config parameters to wandb in SFT (#2884 )	2025-08-03 20:16:05 -07:00
Blue Space	0da8eb67e6	[ci] fix: retry type check on cpu (#2887 )	2025-08-03 20:12:00 -07:00
pool	483cd55c76	[trainer] chore: Add ground truth data to generation dumps in RayPPOTrainer (#2353 )	2025-08-03 07:39:18 -07:00
Nan Jiang	6017c9e2fc	[tool, sglang] feat: add tool create info (#2870 )	2025-08-03 07:38:23 -07:00
kang sheng	65c74dda9b	[doc] fix: multi turn argument is not available (#2883 ) ### What does this PR do? Removed a deprecated parameter comment ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Co-authored-by: kangsheng <kangsheng.ks@bytedance.com>	2025-08-03 18:31:16 +08:00
Stefan He	06bc679a57	[sglang] chore: bump transformer formers 4.54.0 and fix QWen VL issues (#2869 ) ### What does this PR do? Do not merge before: https://github.com/volcengine/verl/pull/2720 > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Bump xformers, fixing patched model code ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-03 18:30:51 +08:00
Nan Jiang	d31d1bebd8	[trainer] fix: move UID generation before batch processing for future conditioning support (#2880 ) ### What does this PR do? Moves UID generation to the beginning of batch processing, before any `pop` operations or generation steps. This change: 1. Fixes timing for future conditioning: Enables UID generation to condition on original batch data (e.g., prompt content) before any data is removed via `pop` operations The change is backward-compatible and doesn't affect current functionality, but enables future enhancements where UID generation might need access to the complete original batch data. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-03 16:04:19 +08:00
Liang Tang	1836d9537e	[misc] feat: add nccl timeout configuration to fsdp workers (#2321 ) ### What does this PR do? add nccl timeout config for fsdp backend --------- Signed-off-by: shinytang6 <shinytang6@gmail.com> Co-authored-by: H <linhaibin.eric@gmail.com>	2025-08-02 21:33:05 -07:00
Le Xue	3f71144961	[trainer, ci] fix: fix error variable in new engine impl and add ci test, fix math_dataset path error (#2647 ) ### What does this PR do? PR #1977 is a great job, I tried using the new engine and found some minor problems and add ci test for FSDPEngine. - Use newest name `gather_outputs_and_unpad` for the function `gather_outputs_and_unpad`. - Removed invalid calculations originally used for gradient accumulation (gradient accumulation has been moved to loss_fn in new engine). - Fixed misuses of two variable. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Signed-off-by: ShareLer <ShareLe@163.com> Co-authored-by: eric-haibin-lin <linhaibin.eric@gmail.com>	2025-08-02 21:32:33 -07:00
Qiao	2fdfbdcba6	[doc] fix: Fix the role assignment error in the interaction demo file and doc. (#2476 ) ### What does this PR do? Fix the role assignment error in the interaction demo file verl/interactions/gsm8k_interaction.py and doc. The assistant is expected to solve problems, while users provide problems and feedback within the messages list. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Update tests/interactions/test_gsm8k_interaction.py. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: H <linhaibin.eric@gmail.com>	2025-08-02 17:04:15 -07:00
Frederick Robinson	a24241092d	[misc] refactor: Add `AbstractRewardManager` abstract class (#2763 ) ### What does this PR do? Adds a new `AbstractRewardManager` class to codify the interface for a reward manager. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-02 16:39:58 -07:00
Wenkai Fang	ae3506dd33	[data] feat: dump train/test example as JSON (#2666 ) ### What does this PR do? This PR adds functionality to save one training and one testing example as JSON files for reference, making it easier to inspect dataset formatting and preprocessing. Related to potential future debugging and reproducibility improvements. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Manually verified that two files train_example.json and test_example.json are saved correctly in the specified local_dir. ### API and Usage Example This change does not alter the public API. ### Design & Code Changes - Added code to save train_dataset[0] and test_dataset[0] as JSON files in local_dir - Helps with quick inspection and reproducibility of dataset inputs ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: easy code - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-08-02 11:09:56 -07:00
Chunyu	afc9de1eba	[trainer, hardware] chore: add pin_memory_device when pin_memory is enabled (#2871 ) ### What does this PR do? To use pin_cemory, we need to set pin_memory_name="npu". About pin_memory, see: [torchdata/stateful_dataloader/stateful_dataloader.py](https://github.com/pytorch/data/blob/main/torchdata/stateful_dataloader/stateful_dataloader.py) ``` if self._pin_memory: self._pin_memory_thread_done_event = threading.Event() # Queue is not type-annotated self._data_queue = queue.Queue() # type: ignore[var-annotated] if self._pin_memory_device == "xpu": current_device = torch.xpu.current_device() # type: ignore[attr-defined] elif self._pin_memory_device == torch._C._get_privateuse1_backend_name(): custom_device_mod = getattr(torch, torch._C._get_privateuse1_backend_name()) current_device = custom_device_mod.current_device() else: current_device = torch.cuda.current_device() # choose cuda for default pin_memory_thread = threading.Thread( target=_utils.pin_memory._pin_memory_loop, args=( self._worker_result_queue, self._data_queue, current_device, self._pin_memory_thread_done_event, self._pin_memory_device, ), ) pin_memory_thread.daemon = True pin_memory_thread.start() # Similar to workers (see comment above), we only register # pin_memory_thread once it is started. self._pin_memory_thread = pin_memory_thread else: self._data_queue = self._worker_result_queue # type: ignore[assignment] ``` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-02 10:34:41 -07:00
Chi Zhang	67187c4fd4	[ci] fix: fix fsdp test in transformers 4.54.1 (#2874 ) ### What does this PR do? When we upgrade to transformers 4.54.1, the fsdp checkpoint manager test breaks, and here are some observations: - If we switch the "attn_implementation" to "eager" or "sdpa", everything works fine. So, it suggests that the issue lies within the flash_attention_2 backend of transformers. - Previously, this test passes in input_ids and attention_mask. However, workers in verl passes in input_ids and position_ids to utilize rmpad. After we switch the input to `input_ids` and `position_ids`, all the tests passed. - If we do not call loss.backward, everything works fine - So, FSDP works fine, checkpoint manager works fine. The problem must lie in how transformers handles different type of input combinations in flash_attention_2 backend. - In this PR, we modify the input to pass the test. TODO: write a test to replicate the issues when passing in input_ids and attention_mask to transformers library ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-02 17:11:54 +08:00
Stefan He	53f9b2ba5e	[fsdp,megatron,sglang] feat: Accelerate and Simplify Update weights logic and bump SGLang to 0.4.9.post6 (#2720 )	2025-08-02 08:01:06 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	0da1a3de06	[megatron] fix: remove the demising critic.model.enable_gradient_checkpointing flags in the scripts (#2864 ) ### What does this PR do? They were removed in #2651, but #2691 overlooked some of them. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] (CI is not needed for this change) Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-08-01 20:51:33 +08:00
Chi Zhang	f0fbd67a5d	[recipe] feat: modify dapo deepseek megatron script (#2711 ) ### What does this PR do? As title. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-01 17:52:19 +08:00
Chi Zhang	e68dcb7884	[fsdp] feat: optimize fsdp2 (#2843 ) ### What does this PR do? - fix fsdp2 load/offload - optimized fsdp2's sharding placement ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-01 17:48:43 +08:00
633WHU	a970718ea5	[misc] feat: optimize GRPO-family algorithms with torch.stack and improve tensor creation consistency (#2827 ) ### What does this PR do? ## 🚀 Performance Optimization for GRPO Algorithms This PR delivers significant performance improvements to GRPO and related advantage estimation algorithms through comprehensive tensor operation optimizations. ### 📈 Performance Gains - GPU: 6.5x speedup by fixing device placement issues - CPU: 40% faster tensor creation operations - Memory: Reduced redundant allocations in training loops ### 🔧 Key Optimizations #### 1. Device-Aware Tensor Creation - Replace `torch.tensor()` with `torch.stack()` for scalar tensor lists - Fixes device placement bugs where `torch.tensor()` forces CPU placement - Preserves GPU tensors on GPU, eliminating costly CPU-GPU transfers #### 2. Eliminate Redundant Operations - Remove duplicate tensor creation in statistical computation loops - Optimize tensor reuse for both mean and standard deviation calculations - Standardize tensor creation patterns across all algorithms ### 🎯 Functions Optimized - `compute_grpo_outcome_advantage` - core GRPO algorithm - `compute_reinforce_plus_plus_baseline_outcome_advantage` - RF++ baseline - `compute_rloo_outcome_advantage` - RLOO algorithm - `compute_opo_outcome_advantage` - OPO algorithm - `compute_gpg_outcome_advantage` - GPG algorithm ### 🔍 Technical Details Root Cause: `torch.tensor(list_of_tensors)` always creates result on CPU Solution: `torch.stack(list_of_tensors)` preserves input tensor device Before: ```python scores_tensor = torch.tensor(id2score[idx]) # Forces CPU, created twice id2mean[idx] = torch.mean(scores_tensor) scores_tensor = torch.tensor(id2score[idx]) # Redundant creation id2std[idx] = torch.std(scores_tensor) After: scores_tensor = torch.stack(id2score[idx]) # Preserves device, created once id2mean[idx] = torch.mean(scores_tensor) id2std[idx] = torch.std(scores_tensor) # Reuses same tensor ``` ✅ Safety & Compatibility - ✅ Zero functional changes: Maintains identical mathematical results - ✅ Fully backward compatible: No API modifications - ✅ Extensively tested: CPU/GPU validation with various configurations - ✅ Production ready: All tests pass with identical numerical outputs ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. - Fixed inconsistent tensor creation: Changed torch.std(torch.tensor([id2score[idx]])) to torch.std(torch.tensor(id2score[idx])) to match the pattern used in mean calculation on the same function - Applied to both instances: Fixed lines 313 and 675 in verl/trainer/ppo/core_algos.py - Added comprehensive test coverage: Created tests/trainer/ppo/test_grpo_consistency.py with multiple test scenarios ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: chiliu <chiliu@paypal.com>	2025-08-01 12:00:24 +08:00
CurryRice233	e2b773528f	[megatron] feat: Add MindSpeed support on the NPU device (#2707 ) ### What does this PR do? Add MindSpeed(Megatron) support on the NPU device. First, import the Megatron adapter to avoid import errors, and reapply the patch according to the configuration. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-01 10:58:29 +08:00
Clark	2633140a73	[doc] feat: add verl multinode SkyPilot example (#2849 )	2025-07-31 12:48:24 -07:00
zhihe-wang	c70b7470c1	[recipe] feat: support qwen3-8B/14B DAPO training on ASCEND NPU (#2836 ) ### What does this PR do? >Provide qwen3-8B/14B DAPO training script on ASCEND NPU, and update experiment result. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [[hardware] feat: support qwen2_5_vl on ASCEND NPU](https://github.com/volcengine/verl/pull/1924/) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. #### Qwen3-8B-Base model ##### Throughput Comparison <img width="1058" height="508" alt="image" src="https://github.com/user-attachments/assets/dd818187-bce2-4b9f-a442-b29a7acedd55" /> ##### Rewards Comparison <img width="1048" height="518" alt="image" src="https://github.com/user-attachments/assets/66d00cc7-efb6-4426-932a-cd63a69474dc" /> ##### Test Comparison (aime-2024) <img width="1060" height="506" alt="image" src="https://github.com/user-attachments/assets/000cebf3-1d5b-402b-b1e6-2cfa5ee7a3ad" /> ##### Response_length Comparison <img width="1280" height="608" alt="image" src="https://github.com/user-attachments/assets/4fe77406-a43b-4d3b-bf13-7a6417887831" /> #### Qwen3-14B-Base model ##### Throughput Comparison <img width="1130" height="614" alt="image" src="https://github.com/user-attachments/assets/5d03b334-b9c9-485d-ba84-23e628d2f573" /> ##### Rewards Comparison <img width="1114" height="534" alt="image" src="https://github.com/user-attachments/assets/aba90536-eb66-430b-83b6-c4e86a90e917" /> ##### Test Comparison (aime-2024) <img width="1126" height="538" alt="image" src="https://github.com/user-attachments/assets/44c59e5b-9f77-48fc-8bce-9d431f5f3e87" /> ##### Response_length Comparison <img width="1280" height="692" alt="image" src="https://github.com/user-attachments/assets/c008a419-9a1e-4b59-81e1-23b5b3d97660" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```bash ray start --head bash run_dapo_qwen3_8b_base_npu.sh ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-01 00:21:16 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	6e37279f8e	[training_utils] feat: Support `assert_case` for sandbox fusion (#2374 ) Support `assert_case` for sandbox fusion Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-07-31 18:58:20 +08:00
ChangyueLiao	5f8cd1bf38	[CI] feat: update npu image to vLLM-ascend-v0.7.3.post1+mindspeed0.12.1 (#2838 ) ### Checklist Before Starting [done] Search for similar PR(s). ### Design & Code Changes Change .github/workflows/e2e_ascend.yml ### Checklist Before Submitting [ done ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). [ done ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). --------- Co-authored-by: liaochangyue <liaochangyue@bytedance.com>	2025-07-31 18:43:22 +08:00
vllbc02	f5bc3cac78	[rollout] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset (#2740 ) … in dataset ### What does this PR do? > tool_agent_loop did not pass in the call tool's' creat_kwargs', resulting in a missing ground_truth ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) ### issue In the previous implementation, the parameters for tool calls in the dataset were not passed in, resulting in the absence of ground_truth in the gsm8k task. Like: <img width="2022" height="186" alt="80084dd040d1a105c12403928ba36d08" src="https://github.com/user-attachments/assets/51ed35c6-3cab-4feb-a560-5cf6f64feced" /> On this basis, passing tool_kwargs can solve this problem. ```python async def _call_tool(self, tool_call: FunctionCall, tools_kwargs: dict[str, Any]) -> dict[str, str]: """Call tool and return tool response.""" tool, instance_id = None, None try: # TODO: append malformed tool_call to the prompt: invalid function name or arguments tool_name = tool_call.name tool_args = json.loads(tool_call.arguments) tool = self.tools[tool_name] kwargs = tools_kwargs.get(tool_name, {}) instance_id = await tool.create(create_kwargs=kwargs.get("create_kwargs", {})) tool_response, _, _ = await tool.execute(instance_id, tool_args) ``` So the `ground_truth` can be used in Tool: <img width="1984" height="188" alt="image" src="https://github.com/user-attachments/assets/08f75753-4bcb-42f9-a878-5d455e8ed552" /> ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-31 15:27:03 +08:00
Mathew Han	a6002de8ac	[tool] fix: load MCP tools in async rollout mode (#2821 ) ### What does this PR do? Currently, the tool registry isn't aware of an event loop already existing, so it may fail when using the new async rollout architecture. This PR allows `initialize_tools_from_config` to load MCP tools when using the async architecture by spawning a new, temporary event loop in a separate thread to load from config. There is also a minor bugfix to `mcp_base_tool` which fixes a possibility of concatenating a string to None. NOTE: in the future, we should use async methods entirely, since this fix is not the most elegant. This fix works for now as verl is transitioning to a full async architecture. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-31 14:22:32 +08:00
Blue Space	0e14d812da	[ci] fix: vllm no dataset (#2831 ) ### What does this PR do? ci fix: vllm no dataset ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-31 13:56:01 +08:00
Liwei Ma	4a651f5425	[perf, doc] feat: Add profiling continous steps in one database (#2695 ) ### What does this PR do? Some customers would like to observe continuous steps in one database, so the gap between steps can be eliminated. The feature will dump the continuous steps in `profile_steps` into one database controlled by a new config, `trainer.profile_continous_steps`. For example [1, 2, 5], 1 and 2 will be in one database, 5 will be in another. Also add warning when nvtx is not available in cuda platform. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-31 12:26:10 +08:00
Chi Zhang	1fe72ba510	[sglang] fix: fix missing engine_kwargs (#2823 ) ### What does this PR do? - fix missing engine_kwargs that causes CI on main to fail ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>	2025-07-31 12:23:51 +08:00
Blue Space	f32e54deaa	[docker] feat: Upgrade sglang 0.4.9 + transformers 4.53.2 (#2794 ) ### What does this PR do? feat: Upgrade sglang 0.4.9 + transformers 4.53.2 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-31 00:49:27 +08:00
Joel	a479fc81b9	[rollout] feat: pass all dataset fields to agent loop run (#2810 ) ### What does this PR do? Pass all dataset fields from `RLHFDataset` to agent loop run, including: - raw_prompt - tools_kwargs - multi_modal_data - ...	2025-07-31 00:34:44 +08:00
X. HU	cc1d89b7ad	[sglang] fix: support the configuration of attention_backend in sglang (#2818 ) ### What does this PR do? This resolves issue https://github.com/volcengine/verl/issues/2769. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-30 20:46:23 +08:00
Qunhong Zeng	b75b1f0bf1	[algo] feat: add GSPO-token policy loss computation function (#2775 ) ### What does this PR do? This PR implements the GSPO-token policy loss calculation proposed by paper https://arxiv.org/pdf/2507.18071 ### Test <img width="1341" height="637" alt="image" src="https://github.com/user-attachments/assets/bc5e2245-b0f5-4a1f-aa7c-4c2b28d95142" /> Compared GRPO and GSPO under the same settings. GRPO uses the following script: ``` sh python3 -m verl.trainer.main_ppo \ algorithm.adv_estimator=grpo \ data.train_files=$HOME/data/gsm8k/train.parquet \ data.val_files=$HOME/data/gsm8k/test.parquet \ data.train_batch_size=512 \ data.max_prompt_length=512 \ data.max_response_length=1024 \ data.filter_overlong_prompts=True \ data.truncation='error' \ actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=128 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=40 \ actor_rollout_ref.actor.use_kl_loss=True \ actor_rollout_ref.actor.kl_loss_coef=0.001 \ actor_rollout_ref.actor.kl_loss_type=low_var_kl \ actor_rollout_ref.actor.entropy_coeff=0 \ actor_rollout_ref.actor.policy_loss.loss_mode="vanilla" \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=40 \ actor_rollout_ref.rollout.tensor_model_parallel_size=2 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \ actor_rollout_ref.rollout.n=10 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=40 \ actor_rollout_ref.ref.fsdp_config.param_offload=True \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.logger='["console","wandb"]' \ trainer.project_name='verl_gspo_cmp' \ trainer.experiment_name='qwen2.5-3B-GRPO' \ trainer.n_gpus_per_node=8 \ trainer.nnodes=1 \ trainer.save_freq=20 \ trainer.test_freq=5 \ trainer.total_epochs=15 $@ ``` GSPO uses the following script: ```sh python3 -m verl.trainer.main_ppo \ algorithm.adv_estimator=grpo \ data.train_files=$HOME/data/gsm8k/train.parquet \ data.val_files=$HOME/data/gsm8k/test.parquet \ data.train_batch_size=512 \ data.max_prompt_length=512 \ data.max_response_length=1024 \ data.filter_overlong_prompts=True \ data.truncation='error' \ actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=128 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=40 \ actor_rollout_ref.actor.use_kl_loss=True \ actor_rollout_ref.actor.kl_loss_coef=0.001 \ actor_rollout_ref.actor.kl_loss_type=low_var_kl \ actor_rollout_ref.actor.entropy_coeff=0 \ actor_rollout_ref.actor.policy_loss.loss_mode="gspo" \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=40 \ actor_rollout_ref.rollout.tensor_model_parallel_size=2 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \ actor_rollout_ref.rollout.n=10 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=40 \ actor_rollout_ref.ref.fsdp_config.param_offload=True \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.logger='["console","wandb"]' \ trainer.project_name='verl_gspo_cmp' \ trainer.experiment_name='qwen2.5-3B-GRPO' \ trainer.n_gpus_per_node=8 \ trainer.nnodes=1 \ trainer.save_freq=20 \ trainer.test_freq=5 \ trainer.total_epochs=15 $@ ``` ### API and Usage Example To use GSPO, users only need to set `actor_rollout_ref.actor.policy_loss.loss_mode` to `gspo`. ```shell python3 -m verl.trainer.main_ppo \ ... \ actor_rollout_ref.actor.policy_loss.loss_mode="gspo" \ ... ``` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: BounharAbdelaziz <bounhar.abdelaziz@gmail.com>	2025-07-30 18:47:09 +08:00
Blue Space	d04c69f47f	Revert "[recipe] feat: Add sleep/wakeup mode for gen rm vllm service and add tqdm showing process" (#2813 ) Reverts volcengine/verl#2739 For https://github.com/volcengine/verl/pull/2794 to solve all CI faults.	2025-07-30 16:56:37 +08:00
leo-pony	bf89f612e8	[vllm] fix: verl + vllm-ascend(version 0.9.1) running failed issue (#2782 ) ### What does this PR do? > Handle the use case of verl + vllm + vllm-ascend(v0.9.1), detail information see #2564 vllm-ascend v0.9.1 is the next upcoming commercial release branch, with the previous commercial release branch is vllm-ascend v0.7.3. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > Test on ascend npu: GRPO, FSDP backend, Qwen2.5-0.5B mode. ### API and Usage Example > No changes ### Design & Code Changes #### vllm+vllm-ascend(v0.9.1) normal use case: In vllm 0.7.3 uses pytorch2.5.1, and the type hint for infer_schema of mode is List[int]. vllm 0.9.1 uses pytorch 2.7, and to keep consistence with pytorch2.7, and vllm changed hint type to list[int] to infer_schema of mode. Type hint List[int] and list[int] is not compatible. <img width="754" height="434" alt="image" src="https://github.com/user-attachments/assets/40a40e4f-6092-4d89-baff-95c88437a13b" /> vllm-ascend version 0.9.1 needs to be used in conjunction with vllm 0.9.1. But vllm-ascend 0.9.1 max supportted pytorch version is 2.5.1. As pytorch 2.5.1 using List[int] as type hint, vllm-ascend needs add patch to infer_schema of mode to successfully running in pytorch 2.5.1 environment. The patch workflow as following graph display. As vllm hardware limits, vllm-ascend currently patch type hint list[int] during vllm LLM instance creating process. This is okay for most vllm-ascend applications. #### vllm+vllm-ascend(v0.9.1) currently workflow in verl: For verl+vllm-ascend+pytorch2.5.1 there has a problem, as following: verl currently import vllm modes before create vllm LLM instance operation. So error take place: pass list[int] to infer_schema which needs List[int], and then running failed. <img width="954" height="660" alt="image" src="https://github.com/user-attachments/assets/844631b5-1d60-4412-9feb-4324d80d415d" /> #### workflow in this PR in verl to handle hint type mismatch issue: Actully patch_vllm_moe_model_weight_loader is after the operation of LLM create. "import vllm modes" action just been prematurely executed during _build_rollout processing, just file becasue vllm_utils.py in which there has another functions that needs by _build_rollout, by the way those functions been imported, vllm mode been imported. Lets vllm-hardware plugins makes it's patchs take effect, then this problem fixed: move patch_vllm_moe_model_weight_loader funciton to independent file, and import patch_vllm_moe_model_weight_loader just before wight loader patching operation. ![image](https://github.com/user-attachments/assets/70a7527e-2671-4435-9d96-ca8595b534c7) ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-07-30 13:10:29 +08:00
kibitzing	23aa10533f	[training_utils] fix: enforce 1D object array shape for non-tensor data in collate_fn (#2741 ) ### What does this PR do? This PR updates the `collate_fn` logic inside `verl.utils.dataset.rl_dataset` to consistently handle non-tensor fields as 1D object arrays, preventing runtime errors during concatenation in downstream code such as `recipe/dapo/dapo_ray_trainer.py`. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test * Tested at: https://github.com/kibitzing/verl/tree/test_tool_n1 * Note: This branch is for testing purposes only and is not intended for merge. * The data used for testing comes from the `train.parquet` and `test.parquet` files released by the [Tool N1 repository](https://github.com/NVlabs/Tool-N1). * part of training script ```python python3 -m recipe.dapo.main_dapo \ data.train_files=$HOME/Tool-N1/verl/verl/data/train.parquet \ data.val_files=$HOME/Tool-N1/verl/verl/data/test.parquet \ data.prompt_key=prompt \ data.truncation='left' \ data.max_prompt_length=2048 \ data.max_response_length=4096 \ data.gen_batch_size=32 \ data.train_batch_size=24 \ actor_rollout_ref.rollout.n=5 \ algorithm.adv_estimator=grpo \ algorithm.filter_groups.enable=True \ algorithm.filter_groups.max_num_gen_batches=10 \ actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \ ... ``` ### Before vs After Behavior (Real Output Logs) * Before: Inconsistent Shape ``` (TaskRunner pid=114826) Training from scratch (TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1) (TaskRunner pid=114826) num_prompt_in_batch=3 < prompt_bsz=24 (TaskRunner pid=114826) num_gen_batches=1. Keep generating... (TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1) (TaskRunner pid=114826) num_prompt_in_batch=8 < prompt_bsz=24 (TaskRunner pid=114826) num_gen_batches=2. Keep generating... (TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32, 1) (TaskRunner pid=114826) num_prompt_in_batch=13 < prompt_bsz=24 (TaskRunner pid=114826) num_gen_batches=3. Keep generating... (TaskRunner pid=114826) new_batch.non_tensor_batch["conversations"].shape=(32,) ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s) ``` This caused shape inconsistency across steps, leading to downstream errors during concatenation. * After: Consistent (32,) Shape ``` (TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,) (TaskRunner pid=133725) num_prompt_in_batch=4 < prompt_bsz=24 (TaskRunner pid=133725) num_gen_batches=1. Keep generating... (TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,) (TaskRunner pid=133725) num_prompt_in_batch=10 < prompt_bsz=24 (TaskRunner pid=133725) num_gen_batches=2. Keep generating... (TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,) (TaskRunner pid=133725) num_prompt_in_batch=12 < prompt_bsz=24 (TaskRunner pid=133725) num_gen_batches=3. Keep generating... (TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,) (TaskRunner pid=133725) num_prompt_in_batch=15 < prompt_bsz=24 (TaskRunner pid=133725) num_gen_batches=4. Keep generating... (TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,) (TaskRunner pid=133725) num_prompt_in_batch=19 < prompt_bsz=24 (TaskRunner pid=133725) num_gen_batches=5. Keep generating... (TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,) (TaskRunner pid=133725) num_prompt_in_batch=23 < prompt_bsz=24 (TaskRunner pid=133725) num_gen_batches=6. Keep generating... (TaskRunner pid=133725) new_batch.non_tensor_batch["conversations"].shape=(32,) ``` With the updated logic, the shape is consistently (32,). * The issue was traced back to the `"conversations"` field in the Tool N1 dataset. This key contains a list of human–gpt messages. In most examples, it's a single-turn conversation (list with length 1), but in some cases, it's a multi-turn conversation (list with length > 1). ### Design & Code Changes The current `collate_fn` processes non-tensor values with: `1df03f3abf/verl/utils/dataset/rl_dataset.py (L62-L63)` While this generally works, it leads to a subtle issue: If `val` is a list of lists and all inner lists happen to be of the same length, NumPy will interpret it as a 2D array with shape (N, L). However, in many RL scenarios, the structure of non-tensor data (e.g. variable-length lists across batches) is not guaranteed to be uniform, which means: - One batch may produce shape `(N, L)` - Another may produce `(N,)` where each element is a list of different lengths - Another may have shape `(N, L')` This causes downstream errors like: `ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)` Specifically, this occurs when multiple step-wise batches are concatenated with: `1df03f3abf/recipe/dapo/dapo_ray_trainer.py (L240)` To enforce consistent 1D object arrays regardless of content, this PR replaces the original line with: ```python for key, val in non_tensors.items(): non_tensors[key] = np.empty(len(val), dtype=object) non_tensors[key][:] = val ``` This ensures that`non_tensors[key]` always has shape (N,) which makes concatenation in downstream logic safer. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-30 13:07:47 +08:00
hanhui	2cccd7f09d	[vllm,rollout] fix: vllm rollout lock file permission (#2805 ) ### What does this PR do? This is an naive solution for [issue 2781](https://github.com/volcengine/verl/issues/2781). While it is not an elegant implementation, it works fine for me. > Why use `getpass.getuser()` instead of `os.getlogin()`? > The latter causes errors while running by ray actor/task. [Here is an related issue](https://github.com/python/cpython/issues/84998). ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: qinghan <qinghan@dewu.com>	2025-07-30 13:02:53 +08:00
Huapeng Zhou	c3df0b5eb8	[perf] feat: Padding before batch post-process in agent-loop to save time (#2773 ) ### What does this PR do? From issue here: https://github.com/volcengine/verl/issues/2677 Try to pad the `prompt`, `response` & `mask` before batch post-processing to save time Main idea: <img width="1978" height="916" alt="image" src="https://github.com/user-attachments/assets/bf16d45b-9da8-4d07-aab4-d8773e5ab705" /> ```python # prompt_ids: left padded with zeros (e.g., [0,0,0,0,1,2,3,4]) # response_ids: right padded with zeros (e.g., [5,6,7,8,0,0,0,0]) # input_ids: concatenation of prompt + response # Mask: # For example, if the prompt is [1,2,3,4] and the response is [5,6,7,(tool start)8,9(tool end),10,11,12] # - prompt_attention_mask: 0s for padding, 1s for tokens # e.g., [0,0,0,0,1,1,1,1] # - response_attention_mask: 0s for padding, 1s for tokens # e.g., [1,1,1,1,1,1,1,1,1,1,1,0,0,0,0] # attention_mask: concatenation of prompt_attention_mask and response_attention_mask # e.g., [0,0,0,0,1,1,1,1(prompt),1,1,1,1,1,1,1,1,1,1,1,0,0,0,0(response)] # - response_mask: 1s for LLM generated tokens, 0 for tool response/padding tokens # e.g., [1,1,1,1,1,1,1,(tool start),0,0(tool end),1,1,0,0,0,0] # - position_ids: sequential positions for tokens, starting at 0 # e.g., [0,0,0,0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,0,0,0,0] ``` ### Test Environment setup: follow this [tutorial](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/agent_loop.md) Test config in 4 * H100 ```bash #!/bin/bash # run on 8xH100 with optimizations for stability # make sure your current working directory is the root of the project set -x ulimit -n 65535 # 增加网络稳定性环境变量 export CUDA_HOME=/usr/local/cuda export CUDA_VISIBLE_DEVICES=4,5,6,7 PROJECT_DIR="$(pwd)" CONFIG_PATH="$PROJECT_DIR/examples/sglang_multiturn/config" python3 -m verl.trainer.main_ppo \ --config-path="$CONFIG_PATH" \ --config-name='gsm8k_multiturn_grpo' \ algorithm.adv_estimator=grpo \ data.train_batch_size=256 \ data.max_prompt_length=1024 \ data.max_response_length=1024 \ data.filter_overlong_prompts=True \ data.truncation='error' \ data.return_raw_chat=True \ actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=128 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16 \ actor_rollout_ref.actor.use_kl_loss=True \ actor_rollout_ref.actor.kl_loss_coef=0.001 \ actor_rollout_ref.actor.kl_loss_type=low_var_kl \ actor_rollout_ref.actor.entropy_coeff=0 \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=32 \ actor_rollout_ref.rollout.tensor_model_parallel_size=2 \ actor_rollout_ref.rollout.name=sglang \ actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \ actor_rollout_ref.rollout.n=16 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=32 \ actor_rollout_ref.ref.fsdp_config.param_offload=True \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.logger='["console","wandb"]' \ trainer.project_name='gsm8k_async_rl' \ trainer.experiment_name='qwen2.5-3b_function_rm-gsm8k-sgl-multi-w-tool-verify-n16-agent-loop-v1' \ trainer.n_gpus_per_node=4 \ trainer.nnodes=1 \ trainer.save_freq=-1 \ trainer.test_freq=20 \ data.train_files=$HOME/data/gsm8k/train.parquet \ data.val_files=$HOME/data/gsm8k/test.parquet \ actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \ trainer.total_epochs=15 \ actor_rollout_ref.rollout.update_weights_bucket_megabytes=128 \ actor_rollout_ref.rollout.trace.backend=weave \ actor_rollout_ref.rollout.trace.token2text=True \ actor_rollout_ref.rollout.mode=async \ actor_rollout_ref.rollout.multi_turn.enable=true ``` Before(v1) & After(v2) <img width="831" height="632" alt="image" src="https://github.com/user-attachments/assets/033737e2-1b63-4b25-8b26-ab593db28a90" /> <img width="1674" height="1272" alt="image" src="https://github.com/user-attachments/assets/296fbb37-430f-4f45-84c1-e003930a1896" /> > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`	2025-07-30 12:27:37 +08:00
William Zeng	4857997201	[tool] fix: Typo fix -- Rename `to_openai_function_tool_schema` to `get_openai_tool_schema` (#2806 ) ### What does this PR do? Fixes a typo in the docstring of some tools. `to_openai_function_tool_schema()` does not exist. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+to_openai_function_tool_schema - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test N/A ### API and Usage Example N/A ### Design & Code Changes N/A ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-30 09:34:08 +08:00
Liu Yue	977b7d9ae8	[recipe] feat: @register_policy_loss("geo_mean"); Geometric-Mean Policy Optimization (#2795 ) ### What does this PR do? > This is the official implementaion of paper [*Geometric-Mean Policy Optimization*](https://arxiv.org/abs/2507.20673). ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > The code has trained for 100 iterations, and is still running. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. A new policy loss function has been added into "verl/trainer/ppo/core_algos.py" ```python @register_policy_loss("geo_mean") def compute_policy_loss_geo_mean( old_log_prob: torch.Tensor, log_prob: torch.Tensor, advantages: torch.Tensor, response_mask: torch.Tensor, loss_agg_mode: str = "token-mean", config: Optional[DictConfig \| AlgoConfig] = None, ) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch. Tensor]: ... ``` We also added directory "examples/gmpo_trainer" for quick start. ### Design & Code Changes > see above ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-29 22:17:57 +08:00
Chi Zhang	aec8cf40ce	[recipe] feat: add QWen2.5-7b-instruct retool (#2800 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-29 17:50:31 +08:00
none0663	76298addd0	[recipe] feat: Add sleep/wakeup mode for gen rm vllm service and add tqdm showing process (#2739 ) ### What does this PR do? Add sleep/wakeup mode for gen rm vllm service and add tqdm showing process. This capability is particularly beneficial when the model server shares resources with a training workload on the same machine. It allows the reward model service to be temporarily offloaded (to free up GPU memory) during intensive training sessions and reloaded when the service is required again.	2025-07-29 13:10:20 +08:00
Cheetah	d640f99219	[recipe] fix: fix issue when running split ppo (#2745 )	2025-07-29 07:32:59 +08:00
Blue Space	d255783a0a	[docker] feat: upgrade vllm to 0.9.1 (#2747 )	2025-07-29 07:32:04 +08:00
H	f98ee1c697	[cfg] fix: fix failing rollout config test on main (#2771 ) ### What does this PR do? The cpu unit test is broken when https://github.com/volcengine/verl/pull/2757/files is merged. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` --------- Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>	2025-07-28 16:43:56 +08:00
kibitzing	35dc0e6490	[doc] fix: fix typo in agentic RL documentation (#2777 ) ### What does this PR do? Fix a typo in agentic RL documentation. * current `bash examples/data_preprocess/gsm8k_tool_agent_loop.py` * fixed `python examples/data_preprocess/gsm8k_tool_agent_loop.py` > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-28 16:20:51 +08:00
Chi Zhang	c9ccbd5c4b	[recipe] fix: fix retool SFT dataset (#2764 ) ### What does this PR do? - Fix retool data preprocessing (now tools requires to be a list) - Use more common path to save dataset ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-28 10:03:28 +08:00
Mike Dean	00ac37fe58	[misc] fix: Handle N-D arrays and complex objects in union_numpy_dict (#2768 ) ### What does this PR do? This PR fixes a bug in `verl.protocol.union_numpy_dict` where it would crash on NumPy arrays with more than 2 dimensions. It replaces the underlying comparison logic with a robust, recursive function that can handle N-D arrays, nested objects, `NaN` values, and circular references. This resolves issue #2766. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test A comprehensive unit test suite has been added to `tests/test_protocol_on_cpu.py`. The new tests cover the following scenarios, all of which now pass: * Merging dictionaries with identical 3D (and higher) dimensional arrays. * Correctly failing when N-D arrays with the same shape but different values are merged. * Handling nested `object`-dtype arrays containing other arrays, strings, and `None`. * Correctly treating `NaN` values at the same position as equal, mimicking pandas' behavior. * Safely handling circular references without causing a `RecursionError`. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-27 17:24:43 +08:00
Chi Zhang	2e1a1a6603	[BREAKING] [rollout] chore: remove default rollout selection (#2757 ) ### What does this PR do? As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-26 10:11:24 -07:00
Frederick Robinson	ea4442470e	[algo] refactor: don't special-case `compute_policy_loss` (#2701 ) ### What does this PR do? currently the vanilla policy loss mode is special cased. this moves vanilla onto the shared interface and stops speical-casing it. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Fred <frederrx@amazon.com>	2025-07-26 10:09:42 -07:00
H	0f5ab5c854	[doc] feat: add retool blog (#2761 ) ### What does this PR do? add link to the retool blog ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`	2025-07-26 13:13:55 +08:00
YumiMom	92e81cfcfd	[perf] feat: add optional role selection in discrete mode for NPU Profiler (#2750 ) ### What does this PR do? Currently, whether in `end-to-end` mode or `discrete` mode, all roles are fully collected. As the sequence length continues to increase, the volume of collected data becomes large, leading to slow parsing. Therefore, we introduce a new feature in the NPU Profiler that allows optional role selection in `discrete` mode, enabling quick collection of specific roles. We have added a new roles parameter in `npu_profile.yaml` to specify the roles to be collected. The currently supported options are: `all`, `rollout_generate`, `actor_compute_log_prob`, `actor_update` and `ref_compute_log_prob`. Setting roles to `["all"]` means all roles will be collected. Other options can be freely combined, for example: `["actor_update", "ref_compute_log_prob"]` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-25 21:53:09 +08:00
Joel	f107800837	[rollout] feat: remove chat scheduler (#2725 ) ### What does this PR do? Remove chat scheduler as describe in #2618	2025-07-25 21:46:35 +08:00
Yeonwoo Sung	58d698e04b	[trainer] refactor: Make sure to keep the type checking (#2634 ) ### What does this PR do? Some codes in the `ppo/ray_trainer.py` fails static type checking (i.e. `invalid type hints` or `function call with nullable variables`). This PR fixes these issues to keep the static type checkers of IDE to track the code syntax properly. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-24 22:32:07 -07:00
Tingberer	caec858ebb	[doc] style: change resize handle from gradient to plain color (#2746 )	2025-07-24 21:20:07 -07:00
Frederick Robinson	f407887414	[CI] feat: add `mypy` to pre-commit (#2614 )	2025-07-25 11:36:34 +08:00
Yan Bai	dc8b5076c3	[megatron] feat: a bunch of optimzation on vram, sequence packing (#2678 ) ### What does this PR do? add a bunch of optimizations for megatron training, including: 1. aggressive_empty_cache to avoid OOM on hybrid engine. Before this sometimes the cache could use as much as 30GB so bring OOMs. 2. better sequence packing pre/post-process. Before this there are a few times of d2h sync when pre/post-process the sequence packing. 3. make `override_ddp_config` compatible to mbridge. The optimized implementations have replaced the old ones, no options needed to enable them. > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-25 10:34:33 +08:00
Blue Space	4879d619fc	[docker] feat: upgrade to torch 2.7, sglang 0.4.8 (#2617 ) ### What does this PR do? [docker] feat: upgrade to torch 2.7, sglang 0.4.8 Stage 2: vllm 0.9.1 Stage 3: mcore 0.13.0 ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>	2025-07-24 14:53:24 -07:00
Tingberer	bcd336fd46	[doc] feat: add resizable sidebar and improve layout (#2577 ) ## Summary This PR adds a resizable sidebar feature and improves the documentation layout for better user experience. ## Changes - Resizable sidebar: Users can drag to resize the sidebar, with preference saved in localStorage - Full-width layout: Documentation now uses full screen width for better readability - Responsive design: Better layout adaptation for different screen sizes - Navigation improvements: Attempts to improve table of contents navigation behavior ## Features - Drag handle on sidebar for resizing - Double-click to reset sidebar to default width - localStorage persistence for user preferences - Improved CSS for better visual experience ## Technical Details - Added `_static/custom.css` for styling improvements - Added `_static/js/resizable-sidebar.js` for functionality - Updated `conf.py` to include new CSS and JS files ## Testing Tested on the documentation build with successful functionality for sidebar resizing and layout improvements.	2025-07-24 14:46:38 -07:00
Blue Space	1df03f3abf	[ci] fix: release ascend test time, fix one step off-policy CI (#2731 ) ### What does this PR do? release ascend test time, recent PRs got cancelled but operation successfully, fix one step off-policy CI. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-24 16:58:16 +08:00
Joel	a0248a8f17	[recipe] chore: add retool training script (#2732 ) ### What does this PR do? Add retool training script.	2025-07-24 16:34:10 +08:00
Blue Space	8adcffa25a	[ci] fix: checkpoint_convertor ci miss a hf model download (#2730 ) ### What does this PR do? fix: checkpoint_convertor ci miss a hf model download ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-24 15:56:08 +08:00
Wang Zilong	88c084c4f3	[doc] feat: Add agent-lightning in the list of "awesome works using verl (#2726 ) Add agent-lightning into `Awesome work using verl` ### What does this PR do? This PR adds a recent work built upon verl into the "Awesome work using verl" Section of the README.md file. Add agent-lightning, a flexible and extensible framework that enables seamless agent optimization for any existing agent framework, into `Awesome work using verl` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`	2025-07-24 14:49:27 +08:00
Nan Jiang	dc3015e9af	[tool] fix: geo3k create return str instead of tuple (#2714 ) ### What does this PR do? change tool.create return from `instance_id, None` to `instance_id` ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-23 22:56:13 -07:00
Bowei Song	73fc53f600	[megatron] fix: resolve backward propagation error in megatron_actor due to shared logits tensor in-place modification (#2484 ) ### What does this PR do? Fixes gradient computation conflict in `verl/workers/actor/megatron_actor.py` when entropy regularization is enabled: - Root Cause: The entropy calculation `entropy = vocab_parallel_entropy(logits)` fails during backward propagation because `log_probs = vocab_parallel_log_probs_from_logits(logits, label)` performs in-place modifications on the logits tensor earlier in the code. This corrupts the original computation graph needed for gradient calculation. - Fix: Decouples tensor dependencies by cloning logits before entropy calculation to preserve the original computation graph while maintaining existing log_probs computation flow. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test 1. Run modified training script: ```bash examples/ppo_trainer/run_qwen2-7b_math_gsm8k_megatron.sh \ --actor_rollout_ref.actor.entropy_coeff=0.01 ``` 2. The following error is observed (before repair): <img width="1396" height="605" alt="image" src="https://github.com/user-attachments/assets/0ed0f9f8-f4eb-41d3-9db8-c8f2163de910" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-24 13:37:18 +08:00
H	d57bfb02b3	[misc] chore: bump main branch version to v0.5.0.dev (#2718 ) ### What does this PR do? bump main branch version to v0.5.0.dev ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`	2025-07-24 10:46:16 +08:00
Chayenne	0eed7124fc	[sglang] fix: Adding strict naming sanity for sglang (#2719 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Thanks so much for pointing this out: https://github.com/volcengine/verl/pull/2672#issuecomment-3105253661 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>	2025-07-24 10:45:57 +08:00
Jason Chen	1862f748e5	[ray] feat: RayWorkerGroup support set worker env (#2685 ) ### What does this PR do? Support creating Ray worker with customized environment variable. > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Signed-off-by: 陈齐翔 <chenqixiang.cqx@bytedance.com>	2025-07-24 10:07:35 +08:00
H	6a9a1b872d	[ci] test: add CriticWorker unit test, make some util CPU friendly (#2717 ) ### What does this PR do? add CriticWorker unit test, make some util CPU friendly TODO: - need to add option for attn_implementation. With this, the actor/critic test can run on CPU nodes without problems. - extend the test with sequence parallel & dynamic_bsz options ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Signed-off-by: ShareLer <ShareLe@163.com> Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joel <wuxibin@bytedance.com> Co-authored-by: Cheetah <1659275352@qq.com> Co-authored-by: 杨睿 <yangruipis@163.com> Co-authored-by: X. HU <huxiaobo@zju.edu.cn> Co-authored-by: Le Xue <48175490+ShareLer@users.noreply.github.com> Co-authored-by: Ziheng Jiang <ziheng@apache.org> Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-23 15:36:10 -07:00
H	4de3ecf0f0	[cfg] refactor: add ActorConfig, EngineConfig, and ActorWorker unit test, refactor validation code (#2621 ) As initially mentioned in https://github.com/volcengine/verl/discussions/1941, having structured configuration classes in verl makes argument passing easier for testing and validation. This is an extended thread on the current implementation of configuration schema in verl. Related PRs: - https://github.com/volcengine/verl/pull/2117 - https://github.com/volcengine/verl/pull/2621 # Motivation By moving from loose `omegaconfig.DictConfig`-based parameters to structured dataclasses, we gain: - Type safety & IDE support when accessing fields (e.g. cfg.optim.lr). - Validation hooks via __post_init__ in each class. - Immutable defaults with controlled mutability (e.g., an extra field). - Seamless Hydra/OmegaConf integration and easy per-recipe extension. # Core: BaseConfig hydra natively provides support for converting DictConfig to dataclass, but dataclass does not support accessing attribute via `get()`. We introduce a base class to provide backward compatibility and make the change less abrupt for existing users. All config dataclasses inherit from BaseConfig, which: - Implements collections.abc.Mapping → dict-like iteration/access. - Freezes attributes once set, unless listed in _mutable_fields. - Provides an `extra: dict[str, Any]` for unchecked extensions. ```python @dataclass class BaseConfig(collections.abc.Mapping): """Dict-like, frozen dataclass with opt-in mutability.""" _mutable_fields: set[str] = {"extra"} extra: dict[str, Any] = field(default_factory=dict) def __setattr__(self, name: str, value): if name in self.__dict__ and name not in self._mutable_fields: raise FrozenInstanceError(f"Field '{name}' is frozen") super().__setattr__(name, value) # Mapping methods: get, __getitem__, __iter__, __len__ … ``` # Example Config Classes (verl/trainer/config) Each sub-component of the trainer has its own dataclass, inheriting BaseConfig. ```yaml: critic: checkpoint: _target_: verl.trainer.config.CheckpointConfig save_contents: ["model","optimizer","extra"] load_contents: ["model","optimizer","extra"] async_save: false ``` Definition: ```python @dataclass class CheckpointConfig(BaseConfig): """What to save/load and async behavior.""" save_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"]) load_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"]) async_save: bool = False def __post_init__(self): # validation checks go here after initialization ckpt_cfg = CheckpointConfig(async_save=True) print(ckpt_cfg.save_contents) print(ckpt_cfg.get("save_contents", default_value)) print(ckpt_cfg["save_contents"]) # converting hydra-generated omegaconf.DictConfig to the dataclass config: from verl.utils.config import omegaconf_to_dataclass ckpt_cfg_from_cli = omegaconf_to_dataclass(config.critic.checkpoint) ``` # Extending existing config classes Because now configs become structured, unexpected keys would raise exceptions. To add new keys, there are two ways: ## Explicit class extensions: ```python from verl.workers.config import FSDPActorConfig @dataclass class SPPOActorConfig(FSDPActorConfig): """Add SPPO-specific temperature/penalty.""" sppo_eta: float = 1.0 ``` When using yaml or from command line, update the target config class: ```yaml hydra: searchpath: - file://verl/trainer/config defaults: - ppo_trainer # base trainer config - _self_ # then apply these overrides actor_rollout_ref: actor: _target_: recipe.sppo.config.SPPOActorConfig # new target dataclass required for extension sppo_eta: 1.0 ``` or directly from command line: ```bash python main_sppo.py \ actor_rollout_ref.actor._target_=recipe.sppo.config.SPPOActorConfig \ actor_rollout_ref.actor.sppo_eta=1.0 ``` ## Leverage the `extra` field Adding more keys to the `extra` field of any dataclass that inherits from `BaseConfig` also works. This way there's no need to define your own dataclass in python: ```yaml hydra: searchpath: - file://verl/trainer/config defaults: - ppo_trainer # base trainer config - _self_ # then apply these overrides actor_rollout_ref: actor: extra: sppo_eta: 1.0 ``` # Declaring mutable fields For historical reasons some fields in the configs are mutated inplace in the codebase such as batch size for data/sequence parallelism. We are in the process of deprecating this kind of behavior. However, if you want to intentionally mutate one field, specify it with the `_mutable_fields` attr: ```python @dataclass class CheckpointConfig(BaseConfig): """What to save/load and async behavior.""" _mutable_fields = BaseConfig._mutable_fields \| {"save_contents"} # mark save_contents as mutable. save_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"]) load_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"]) async_save: bool = False ``` # Other helpful resources verl default trainer configs combines the following config files together, specified in the `_defaults_` field: https://github.com/volcengine/verl/blob/main/verl/trainer/config/ppo_trainer.yaml#L1-L36 - verl/trainer/config/ppo_trainer.yaml # main config for entrypoint - verl/trainer/config/actor/dp_actor.yaml - verl/trainer/config/critic/dp_critic.yaml - verl/trainer/config/reward_model/dp_reward_model.yaml - verl/trainer/config/rollout/rollout.yaml To quickly peek the default full config in a single file, you can check the auto-generated full config in https://github.com/volcengine/verl/blob/main/verl/trainer/config/_generated_ppo_trainer.yaml # Change log and impact on existing code This PR converts the following fields to structured dataclass in the training pipeline. More can be done in future PRs (contributions from the community is welcome) - [x] actor_rollout_ref.actor - [x] critic - [ ] actor_rollout_ref.rollout - [ ] actor_rollout_ref.ref - [ ] reward_model - [ ] data - [ ] trainer Changes needed for existing code that added new fields to config: - see recipe/sppo for an example - `OmegaConf.to_container(self.config.model.get("override_config", OmegaConf.create()))` now has to manually changed to `self.config.model.get("override_config", {})`. Because OmegaConf.to_container expects a DictConfig but config.model.override_config is already a dict. # Other Breaking Changes critic.optim.lr for megatron changed from 1e-6 to 1e-5 --------- Signed-off-by: ShareLer <ShareLe@163.com> Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Joel <wuxibin@bytedance.com> Co-authored-by: Cheetah <1659275352@qq.com> Co-authored-by: 杨睿 <yangruipis@163.com> Co-authored-by: X. HU <huxiaobo@zju.edu.cn> Co-authored-by: Le Xue <48175490+ShareLer@users.noreply.github.com> Co-authored-by: Ziheng Jiang <ziheng@apache.org> Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-23 11:45:14 -07:00
H	8fdc4d3f20	[misc] chore: bump version to v0.5.0 (#2716 )	2025-07-23 10:57:10 -07:00
Shawn/Yuxuan Tong	e13863e463	[ci] fix: auto-download model in Megatron-related CI tests (#2698 ) ### What does this PR do? Add a step that downloads the model needed for Megatron-related CI tests. ### Test See the CI result. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-23 10:49:09 -07:00
Nan Jiang	f926dc90b0	[sglang] fix: fix is_vlm issue (issue #2639 ) (#2667 )	2025-07-23 10:45:57 -07:00
Blue Space	4ed106698b	[megatron] fix: CUDA_DEVICE_MAX_CONNECTIONS in ray error (#2709 ) ### What does this PR do? Try avoiding repeated env vars in ray runtime env. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-23 18:57:57 +08:00
Zhirong Chen	5bfb58e35d	[recipe] fix: fix dapo cannot save the checkpoint of last step (#2619 ) ### What does this PR do? This checkpoint fix the bug that in dapo recipe the dapo_ray_trainer cannot save the checkpoint of last step. ### Checklist Before Starting Similar PR https://github.com/volcengine/verl/pull/2090 ### Test <img width="645" height="21" alt="image" src="https://github.com/user-attachments/assets/cf501f2c-6b80-49aa-871a-3b066a2003c2" /> Can not save the last checkpoint. Only save the checkpoint with training steps % save_freq=0 ### Design & Code Changes dapo_ray_trainer.py only record training steps variable but not record generation steps. I add a variable gen_steps to record it. ### Others Load checkpoint logic is also incorrect here. <img width="624" height="340" alt="image" src="https://github.com/user-attachments/assets/8469de9d-0fcd-47f3-8b74-f4ad7f155802" /> Progress bar initial value should be self.gen_steps instead of self.train steps, thus we also need to fix load_checkpoint and save_checkpoint. 0	2025-07-23 17:26:35 +08:00
Shawn/Yuxuan Tong	e9072c58fa	[ci] feat: CI request via Feishu (#2699 ) ### What does this PR do? Add CI request via Feishu. ### Test n/a ### API and Usage Example n/a ### Design & Code Changes n/a ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-07-23 14:54:15 +08:00
Xihuai Wang	0404956290	[training_utils] fix: align tensorboard default dir for val_log_generation (#2696 ) ### What does this PR do? align tensorboard default dir for val_log_generation --------- Co-authored-by: wangxihuai <wangxihuai@meituan.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-23 14:09:58 +08:00
Stefan He	c95c9ef701	[fsdp,megatron,sglang] fix: Fix torch reduce to speed up update weights (#2692 ) ### What does this PR do? Speed up QWen3 MOE update weights from 110s to 37s Related to : https://github.com/sgl-project/sglang/pull/8267 Co-authored-by: CuiBo <82354186+SuperCB@users.noreply.github.com> Co-authored-by: GeLee <865038696@qq.com> > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: CuiBo <82354186+SuperCB@users.noreply.github.com> Co-authored-by: GeLee <865038696@qq.com>	2025-07-23 13:40:41 +08:00
OC	dc1599b7e4	[rollout] fix: bug in init_engine Method of AsyncSglangServer (#2664 ) Fix error in AsyncSglangServer.init_engine when find works. The correct logic should be based on: gpu_per_node * nodes = dp_size * tp_size Also added test steps reported from https://github.com/volcengine/verl/issues/2633.	2025-07-23 13:09:37 +08:00
Blue Space	4792b70dd4	[megatron] fix: reset recompute_granularity and add backward compatibility fix (#2693 ) ### What does this PR do? Reset `recompute_granularity` default to `None` to align with Megatron. Add backward compatibility fix. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-23 11:16:23 +08:00
Wei (Will) Feng	4c10dddf74	[fsdp] fix: use torch 2.7 state dict api for torch 2.6 to resolve OOM (#2606 ) ### What does this PR do? for torch==2.6.0, distributed state dict is buggy and can leads to OOM copy the fixed state dict api from torch==2.7.0 to verl/third_party. It's convinent for users who cannot upgrade to torch==2.7.0 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python HYDRA_FULL_ERROR=1 CUDA_LAUNCH_BLOCKING=1 python3 -m verl.trainer.main_ppo algorithm.adv_estimator=grpo data.train_files=$HOME/data/gsm8k/train.parquet data.val_files=$HOME/data/gsm8k/test.parquet data.train_batch_size=512 data.max_prompt_length=1024 data.max_response_length=2048 data.filter_overlong_prompts=True data.truncation='error' data.image_key=images actor_rollout_ref.model.path=Qwen/Qwen2.5-VL-32B-Instruct actor_rollout_ref.model.use_remove_padding=True actor_rollout_ref.model.enable_gradient_checkpointing=True actor_rollout_ref.actor.strategy=fsdp2 actor_rollout_ref.actor.optim.lr=1e-6 actor_rollout_ref.actor.ppo_mini_batch_size=128 actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=10 actor_rollout_ref.actor.use_kl_loss=True actor_rollout_ref.actor.kl_loss_coef=0.01 actor_rollout_ref.actor.kl_loss_type=low_var_kl actor_rollout_ref.actor.entropy_coeff=0 actor_rollout_ref.rollout.name=vllm actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=20 actor_rollout_ref.rollout.tensor_model_parallel_size=4 actor_rollout_ref.rollout.gpu_memory_utilization=0.6 actor_rollout_ref.rollout.enable_chunked_prefill=True actor_rollout_ref.rollout.enforce_eager=False actor_rollout_ref.rollout.free_cache_engine=False actor_rollout_ref.rollout.n=5 actor_rollout_ref.ref.strategy=fsdp2 actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=20 actor_rollout_ref.ref.fsdp_config.param_offload=True trainer.critic_warmup=0 trainer.logger=['console','tensorboard'] trainer.project_name='verl_grpo_example_geo3k' trainer.experiment_name='qwen2_5_vl_32b_function_rm' trainer.n_gpus_per_node=8 trainer.nnodes=1 trainer.save_freq=-1 trainer.test_freq=5 trainer.total_epochs=5 ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). FSDP2 memory snapshot: cpu offloading works and peak memory is slightly lower than FSDP1 <img width="1193" height="543" alt="Screenshot 2025-07-17 at 14 53 49" src="https://github.com/user-attachments/assets/2d5b88b2-0d9e-40f7-ad75-f42b9acf1bab" /> --------- Co-authored-by: H <linhaibin.eric@gmail.com>	2025-07-22 19:54:33 -07:00
rj42	d20e5e07e1	[fsdp, ckpt] fix: Wrap `GenerationConfig.from_pretrained` with try-except to avoid crashes. (#2659 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Wrapping `GenerationConfig.from_pretrained` in a try-except block to prevent crashes during checkpoint saving. [Issue](https://github.com/volcengine/verl/issues/2658) ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [link](https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+GenerationConfig) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-23 10:18:35 +08:00
H	8888122a89	[megatron] fix: remove the demising model.enable_gradient_checkpointing flags in the script (#2691 ) ### What does this PR do? They were removed in https://github.com/volcengine/verl/pull/2651 ... @ETOgaosion ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-23 09:25:30 +08:00
Blue Space	f252da34cf	[megatron] fix: CUDA_DEVICE_MAX_CONNECTIONS not taking effect (#2687 ) ### What does this PR do? According to Kunlun Li 's detailed profiling work, envvar `CUDA_DEVICE_MAX_CONNECTIONS=1` was not taking effect, the benefit of this setting described here: https://github.com/NVIDIA/Megatron-LM/issues/533#issuecomment-1760193239 Try put this variable in ray `runtime_env` to take effect. This will make it a default option. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-22 20:51:12 +08:00
Blue Space	244481ac8f	[misc] fix: main pre-commit and API change (#2675 ) ### What does this PR do? Fix pre-commit error led by previous PR and cpu_unit test. Allow recompute API change. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-22 15:01:20 +08:00
Blue Space	c5b189a1af	[BREAKING][megatron] refactor: activation checkpointing APIs (#2651 ) ### What does this PR do? Since we directly offer `override_transformer_config` option, we directly use it to recompute activations. Default settings are the same with `megatron.training`. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-22 10:24:28 +08:00
Chayenne	72cae971d0	[sglang] fix: rename Sglang to SGLang following SGLang's fashion (#2672 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. As titled. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>	2025-07-22 09:11:20 +08:00
Zhihui Xie	d062314a18	[data, recipe] fix: remove redundant json parsing (#2671 ) ### What does this PR do? > This PR fixes data preprocessing issues in MultiTurnSFTDataset. Specifically, `json.loads` should not be called in `__getitem__`. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pull/2233 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test After fixing the issue, the results for [ReTool's SFT recipe](https://github.com/volcengine/verl/blob/main/recipe/retool/run_qwen3_4b_sp4.sh) are as expected: <img width="5056" height="2656" alt="W B Chart 7_21_2025, 2_09_33 PM" src="https://github.com/user-attachments/assets/3252d8d2-7002-4a50-8329-0b0d4da1fa3e" /> > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example N/A ### Design & Code Changes N/A ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-22 09:09:10 +08:00
Lin Yuan	2bcc5d1212	[misc] fix: fix prompt and response key in gemma7b example (#2610 ) ### What does this PR do? Fix the SFT gsm8k gemma7b example. Before this change create_sft_dataset would error out. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-21 16:06:52 -07:00
Xihuai Wang	e5f0b2aa80	[perf] feat: mistral and gemma3_text mfu compute support (#2622 ) ### What does this PR do? Add mistral and gemma3_text mfu compute support --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-21 16:54:11 +08:00
Hecate	ac826e0558	[tool] chore: Add log for AsyncRolloutRequest ID, and rollout viewr to support request id display and search (#2636 ) ### What does this PR do? Add log for AsyncRolloutRequest ID in PPO ray_trainer and sglang_rolllout. Update rollout viewr to support request id display and search ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failur	2025-07-21 12:01:37 +08:00
Guanning Zeng	3f6cd47926	[rollout,vllm] fix: A major issue in random sampling of vllm engine (#2646 ) There is a optional config `+actor_rollout_ref.rollout.seed`, which is used in [`fcb1e191b7/verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py (L165-L185)`) This config ensures identical initialization of vllm engine in distributed systems. However in [`fcb1e191b7/verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py (L202)`) `class SamplingParam` unexpectedly adopts this `seed` param again when `actor_rollout_ref.rollout.seed` is explicitly set. In sampling param, this means the reproducibility during vllm inference. This will cause serious problems, because in recent verl, the `ray_trainer.py` will first flatten the input prompts, e.g. [`fcb1e191b7/verl/trainer/ppo/ray_trainer.py (L1160)`) so if `+actor_rollout_ref.rollout.seed` is set, identical prompts will receive identical responses, leading to a completely collapse of GRPO training, as every advantage is zero, for example: <img width="1104" height="858" alt="Screenshot 2025-07-20 002009" src="https://github.com/user-attachments/assets/32eb1cc3-2ca2-41b9-9a9c-57b5dc557ed1" />	2025-07-21 12:00:28 +08:00
Chi Zhang	ac414d95c4	[recipe] feat: add QWen 30b moe dapo script that can run on a single 80GB node (#2645 ) ### What does this PR do? - As title - Achieves around 0.28 AIME'24 after 100 steps which takes around 1 day on a H800 single node - Note that we start from base model ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-20 18:49:21 -07:00
Aaron Yee	5d5ae81cdb	[sglang] fix: update response handling and scoring method in GSM8K interaction (#2428 ) ### What does this PR do? This PR corrects a mistake when calculating rewards during training with the `gsm8k_w_interaction` setting. - Changed the role check from "user" to "assistant" when extracting the last message content. - Simplified response assignment by removing unnecessary prefix checks. - Updated scoring method from "flexible" to "strict" for improved accuracy in GSM8K interactions. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example Docs update has been included in the PR. ### Design & Code Changes No change for the design. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-21 08:06:46 +08:00
beep-bebop	fcb1e191b7	[doc] fix: non-standardized path references (#2637 ) Fix non-standardized path references. ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Fix non-standardized path references in examples/grpo_trainer/run_moonlight16b_math_megatron.sh ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-20 18:49:16 +08:00
OC	7fc3029a1e	[doc] fix: add options to enable agent loop (#2624 ) ### What does this PR do? Add required options to enable agent loop in document. ### Checklist Before Starting - [ x] Search for similar PRs. Paste at least one query link here: ... - [ x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-20 06:03:06 +08:00
shaofei hu	5d52d15fd3	[trainer] feat: Add FSDPCheckpointManager for SFTtrainer, support resume training, manage the number of CKPTS in keep (#2292 ) [trainer] feat: Support resume from checkpoint, manage the number of CKPTS in keep, compatible with previously saved CKPTS ### What does this PR do? This PR adds checkpoint resume support to the FSDP SFT trainer using `FSDPCheckpointManager`, enabling seamless continuation of training with full state restoration — including model weights, optimizer, scheduler, and training progress. Introduces automatic checkpoint retention management, allowing control over how many recent checkpoints to keep during training. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example To use the resume functionality, set the following configuration to your trainer config: ```yaml trainer: save_freq: 100 max_ckpt_to_keep: 5 # Maximum number of checkpoints to keep, set to null to keep all # Resume mode: "auto", "disable", or "resume_path" # "auto": resume from last checkpoint if available # "disable": start from scratch # "resume_path": resume from a user-defined path resume_mode: auto # Path to resume training from (used when resume_mode is "resume_path" or "auto") resume_from_path: null checkpoint: # with 'hf_model' you can save whole model as hf format, now only use sharded model checkpoint to save space save_contents: ["model", "optimizer", "extra"] load_contents: ${trainer.checkpoint.save_contents} ``` Example Python usage: ```python # Set these options to your existing script trainer.save_freq=100 trainer.resume_mode=auto # "disable": start from scratch, "resume_path": resume from a user-defined path trainer.resume_from_path=null # "null" uses the latest ckpt when set resume_mode auto, or you can specifies the path trainer.max_ckpt_to_keep=5 # limit number of saved checkpoints (null for unlimited) trainer.checkpoint.save_contents=[model,optimizer,extra,hf_model] ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>	2025-07-19 12:15:23 +08:00
Blue Space	69a467f934	[docker] fix: downgrade TransformerEngine version 2.2.1 to allow mcore image using rope fusion and provide another set of v0.5 image (#2611 ) ### What does this PR do? Downgrade TransformerEngine version to allow mcore image using rope fusion and provide another set of v0.5 image. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-18 17:23:19 +08:00
Ziheng Jiang	9d7cba4e12	[trainer] refactor: Training Engine Interface and Development Plan (#1977 ) # [Refactor] Training Engine Interface and Development Plan ## Motivation See the original RFC for background: https://github.com/volcengine/verl/issues/1371 Modernizing our training loop requires that we: - Decouple training-backend implementation from algorithm code so each can evolve independently - Unify on a single, well-defined `Engine` interface across FSDP/Megatron/etc backends - Enable unit-testing of each backend implementation in isolation - Guarantee algorithm “roles” (Critic, Actor, Rollout, Ref) remain completely engine-agnostic. --- ## Current Implementation This PR: - Introduces an abstract `BaseEngine` class that defines a unified training‐engine interface. - Implements `FSDPEngine`, a concrete `BaseEngine` using PyTorch FullyShardedDataParallel. - Provides a `CriticWorker` based on `FSDPEngine` that plugs seamlessly into existing PPO training code without any changes. ### Classic Training Loop with the New Interface ```python # 1. Build and initialize engine engine = FSDPEngine(config) engine.init_model() engine.set_loss_fn(loss_fn) # 2. Training loop for epoch in range(config.num_epochs): for batch in train_loader: # a) zero gradients engine.optimizer_zero_grad() # b) forward + backward with engine.train_mode(): preds, loss, ctx = engine.forward_backward_step( batch, ctx, forward_only=False, preprocess_fn=preprocess_fn, postprocess_fn=postprocess_fn ) # c) update and schedule grad_norm = engine.optimizer_step() current_lr = engine.lr_scheduler_step() # 3. Evaluation with engine.eval_mode(): for micro_batch in data: preds, ctx = engine.forward_backward_step( micro_batch, ctx, forward_only=True, preprocess_fn=preprocess_fn, postprocess_fn=postprocess_fn ) ``` ### Detailed BaseEngine Interface We now introduce an abstract base class, `BaseEngine`, which defines our unified training-engine interface. Key enhancements over the original RFC: - `train_mode()` / `eval_mode()` Context managers to control parameter and activation load/offload at the start and end of each loop. - `shard_data()` / `unshard_data()` APIs for partitioning and gathering data across devices or workers. - `preprocess_fn` / `postprocess_fn` in `forward_backward_step()` Hooks to apply custom transformations before and after each micro-batch pass. Below are the detailed signatures for each core method. ```python class BaseEngine(object): """ Abstract base class defining the interface for model training engines. Engine implementations must subclass BaseEngine and provide concrete behavior for all methods. """ def __init__(self, config): """ Initialize the BaseEngine. Args: config: Configuration object containing parameters for engine setup. """ raise NotImplementedError def init_model(self): """ Instantiate or load the model, optimizer, and learning rate scheduler. Should prepare all components necessary for training or evaluation. """ raise NotImplementedError def train_mode(self): """ Context manager entry for switching the engine and model into training mode. Usage: with engine.train_mode(): # runs in training mode """ raise NotImplementedError def eval_mode(self): """ Context manager entry for switching the engine and model into evaluation mode. Usage: with engine.eval_mode(): # runs in evaluation mode """ raise NotImplementedError def forward_backward_step(self, batch, ctx=None, forward_only=False, preprocess_fn=None, postprocess_fn=None): """ Execute a forward pass (and optional backward pass) over a batch of data. Args: batch: Raw batch data (e.g., tensors or mappings) to process. ctx: Optional context dict passed to preprocess/postprocess functions. forward_only: If True, skip gradient computation and backward pass. preprocess_fn: Function(batch, ctx) -> (inputs, ctx), applied before model call. postprocess_fn: Function(outputs, ctx) -> (predictions, ctx), applied after model call. Returns: If forward_only: (predictions, ctx) Else: (predictions, loss, ctx) """ raise NotImplementedError def optimizer_zero_grad(self): """ Zero out gradients of all parameters before starting a new backward pass. """ raise NotImplementedError def optimizer_step(self): """ Perform an optimization step to update model parameters based on accumulated gradients. Returns: grad_norm (float): The norm of the gradients before clipping or update. """ raise NotImplementedError def lr_scheduler_step(self): """ Advance the learning rate scheduler by one step. Returns: current_lr (float or list[float]): Updated learning rate(s). """ raise NotImplementedError def shard_data(self, data): """ Shard or partition data for distributed training or parallel execution. Args: data: Data structure to be sharded across devices/workers. Returns: Sharded data in the same format as input. """ raise NotImplementedError def unshard_data(self, data): """ Reconstruct or gather sharded data back to a unified format. Args: data: Sharded data structure to reconstruct. Returns: Unsharded, combined data. """ raise NotImplementedError def set_loss_fn(self, loss_fn): """ Set the loss function to be used during training. Args: loss_fn: Callable(data, predictions, ctx) -> (loss_tensor, new_ctx) """ raise NotImplementedError def to(self, device: str, model: bool = True, optimizer: bool = True): """ Move model parameters, optimizer states, or both to the specified device. Args: device: Target device identifier (e.g., "cuda" or "cpu"). model: If True, move the model. optimizer: If True, move the optimizer states. """ raise NotImplementedError def save_checkpoint(self, local_path, hdfs_path=None, global_step=0, max_ckpt_to_keep=None): """ Save model, optimizer, and scheduler states to a checkpoint. Args: local_path: Local filesystem path to save checkpoint. hdfs_path: Optional HDFS path to copy checkpoint. global_step: Integer training step number for naming. max_ckpt_to_keep: Maximum number of recent checkpoints to retain. """ raise NotImplementedError def load_checkpoint(self, local_path, hdfs_path=None, del_local_after_load=True): """ Load model, optimizer, and scheduler states from a checkpoint. Args: local_path: Local filesystem path of the checkpoint. hdfs_path: Optional HDFS path where checkpoint is stored. del_local_after_load: Whether to delete local copy after loading. """ raise NotImplementedError ``` ### FSDPEngine Implementaion A concrete `FSDPEngine` implements all methods using PyTorch FullyShardedDataParallel, supporting all the features that FSDP DPCritic Worker support: - Multi-GPU/model sharding - Activation- and optimizer-offload - LoRA & sequence parallelism - Dynamic batch size and remove padding ### CriticWorker Implementation based on the FSDPEngine - Unchanged public API - Each role calls only BaseEngine methods (init_model, train_mode/eval_mode, forward_backward_step, etc.) - No modifications needed in existing algorithms (e.g., PPOTraining) - New roles can be plugged in identically to legacy code ## Development Plan We’ll roll this out in three gated phases, controlled by a feature-flag (`use_legacy_worker_impl`). ### Phase 1: Engine Development > Flag: use_legacy_worker_impl = True (default) > New interface under active development - Refactor Critic, Actor, Rollout, Ref to use only BaseEngine APIs - Design a hierarchical, immutable config system for engine/backends - Ensure PPO training curves and final accuracy match legacy implementation ### Phase 2: Migration > Flag: use_legacy_worker_impl = False (default) – legacy path logs a deprecation warning > All new code targets the new interface; 2–3 months of integration/stress testing - Enforce new interface for all feature work - Gather benchmarks, bug reports, and performance data ### Phase 3: Cleanup > After Phase 2 validation: - Remove legacy worker code and flags - Finalize documentation, update changelogs, close deprecation notices Please review this refactor and share any feedback or concerns! Contributions are welcome.	2025-07-17 22:05:21 -07:00
Le Xue	223caf7022	[single_controller] fix: padding for kwargs (#2585 ) ### What does this PR do? 1. Fix bugs in func `_split_args_kwargs_data_proto_with_auto_padding`: - Fix the padding_size calculation in kwargs to prevent additional padding when `data_proto_len % chunks == 0`. - Add the missing padding processing in kwargs. 2. Abstract the repetitive processing logic to simplify the code. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Signed-off-by: ShareLer <ShareLe@163.com>	2025-07-18 10:10:49 +08:00
X. HU	fb810355f3	[tool] fix: supports variable arguments for marked_timer (#2576 ) ### What does this PR do? bugfix for npu marked_timer ` File "xx/recipe/dapo/main_dapo.py", line 167, in run trainer.fit() File "xx/recipe/dapo/dapo_ray_trainer.py", line 134, in fit with marked_timer("gen", timing_raw, "red"): ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/contextlib.py", line 301, in helper return _GeneratorContextManager(func, args, kwds) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/contextlib.py", line 105, in __init__ self.gen = func(args, *kwds) ^^^^^^^^^^^^^^^^^^^ TypeError: marked_timer() takes 2 positional arguments but 3 were given` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-17 13:35:36 -07:00
杨睿	2b2aa9d3fd	[tool] chore: introduce RolloutViewer TUI tools (#2469 ) ### What does this PR do? Introduce a RolloutViewer TUI tools to visualize rollout and reward dumped results easily, which supports: - ⚡ async data loading, lightning open speed - ⌨️ full keyboard shortcut operation, you don't need a mouse - 🔍 text search and highlight, you won't miss anything - 📝 table or plain mode usage: ```bash python scripts/rollout_viewer.py ${trainer.rollout_data_dir} ``` here is the main window screen shot: <img width="2540" height="1416" alt="image" src="https://github.com/user-attachments/assets/e34e5157-2880-4a21-afb2-73885d0dfb11" /> > We are from the Large Model Post-Training Team of 📕 Xiaohongshu's AI Platform Technology Department , dedicated to developing high-performance, easily-scalable distributed post-training engines. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-17 13:30:41 -07:00
Cheetah	7459131411	[hardware] refactor: replace device_name with config.trainer.device (#2542 ) ### What does this PR do? In some methods, the get_device() method is redundant, and we plan to replace get_deivce with config.trainer.device ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: H <linhaibin.eric@gmail.com>	2025-07-17 13:29:01 -07:00
Joel	2adedb77b4	[doc] chore: add agent loop design doc (#2598 ) ### What does this PR do? Add Agent Loop design doc.	2025-07-17 13:27:27 -07:00
H	332c7d53c1	[cfg] refactor: add flatten megatron trainer config generation and verification script (#2582 ) ### What does this PR do? - Added CONFIG_SPECS array: "config_name:output_file:config_arg" format - Now generates both _generated_ppo_trainer.yaml and _generated_ppo_megatron_trainer.yaml - Maintains identical output format and verification behavior ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: openhands <openhands@all-hands.dev>	2025-07-17 08:08:45 -07:00
H	0b62a6ece1	[cfg] feat: add critic config class (#2583 ) Added CriticConfig, MegatronCriticConfig, and FSDPCriticConfig dataclasses with a clear inheritance hierarchy for critic model configuration, and updated YAML files to support direct dataclass instantiation. ## Changes - Introduced dataclasses for critic configs, all inheriting from BaseConfig. - Added _target_ fields to critic YAML files for compatibility with omega_conf_to_dataclass. - Added unit tests to verify config instantiation and inheritance. ## Special notice both megatron and fsdp critic contains the following config: - model - optimizer however, the config names in these two configs are not yet consistent. In this PR, they are retreated as `dict[str, Any]` for flexibility. We shall introduce model config and optimizer config are they are consolidated. I've also removed kl_cntrol from megatron critic config, they're not used @ETOgaosion --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-07-17 15:59:47 +08:00
Xihuai Wang	40d638c63b	[doc] fix: typo in perf_tuning.rst (#2590 ) ### What does this PR do? typo in perf_tuning doc	2025-07-17 15:58:34 +08:00
meituan-search	648e3c95cc	[doc] fix: fix some contents for one step off policy (#2591 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. fix some contents for one step off policy ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). Co-authored-by: ArronHZG <hou.zg@foxmail.com>	2025-07-17 15:54:06 +08:00
Qifan Zhang	1775bd638f	[trainer] fix: maybe_filter_out_long_prompts on image and video (#2553 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. The `filter_out_long_prompts` function incorrectly used the `messages` variable when it should have used `doc`. This led to prompts with images or videos not being filtered correctly based on length. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. The variable messages is of type list[dict], for example: ```python [ {"type": "text", "content": "xxx"}, {"type": "image", "content": "xxx"} ] ``` The variable doc is a dict, for example: ```python { "data_source": xxx. "prompt": xxx, "images": xxx, } ``` We need to retrieve the image or video column from the dataset, load the actual images or videos, and then pass them into the tokenizer to obtain the sequence length. Using messages here is incorrect — both the type and semantics are inappropriate. We should be using doc instead. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-17 14:17:20 +08:00
H	d51c52f754	[ci] chore: add codeowner for role/engine (#2587 ) ### What does this PR do? add codeowner for role/engine cc @ZihengJiang	2025-07-16 22:05:04 -07:00
Titanpku	64601e418c	set use_kl_in_reward=True in reinforce_plus_plus (#2580 ) set use_kl_in_reward=True in reinforce_plus_plus	2025-07-17 12:10:54 +08:00
imh966	503ea75f53	[trainer, fsdp, vllm, recipe] feat: one step off async training recipe (#2231 ) ### What does this PR do? This PR provides a simple implementation of one step off async training with fsdp and vllm backend. We conducted three different experiments with qwen2.5_3b model on 8 A100 GPUs: 1. baseline: all models are colocated 2. standalone rollout: rollout model runs on 4 GPUs and other models run on remaining 4GPUs 3. one step off: the same model placement as the second experiment, but with one step off async training The pictures below demonstrate the results of these experiments: <img src="https://github.com/user-attachments/assets/1df6af46-2242-48e7-a937-a817b278e644" width="30%" height="auto"><img src="https://github.com/user-attachments/assets/bd5c1345-466a-478f-b0d3-95d9a8706496" width="30%" height="auto"><img src="https://github.com/user-attachments/assets/4cf76800-6763-4468-8b1f-b8be9d0fef51" width="30%" height="auto"> In these experiments, baseline has the highest throughput, but we think it is just because we didn't find the best configure for one step off async training. The exciting point is that our nccl based weights updating for rollout model has great performance. The latency is showed below: <img src="https://github.com/user-attachments/assets/388e5736-ef84-4cf0-a586-6543cefb91be" width="30%" height="auto"> At most of time, the latency is under 300ms, which is negligible for RLHF. Although it is only implemented with fsdp and vllm now, we think it is not complex to extend it to the other backend. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. To use this feature, `hybrid_engine` option must be disabled to separate actor model and rollout model into difference GPU cluster. `rollout.n_gpus` option has been added to configure file to indicate how many GPUs rollout model would be occupied. The script below is an example to train `qwen2.5_3b` with 8 GPUs. ```shell python3 -m recipe.async.async_main_ppo \ algorithm.adv_estimator=grpo \ data.train_files=$HOME/data/gsm8k/train.parquet \ data.val_files=$HOME/data/gsm8k/test.parquet \ data.train_batch_size=1024 \ data.max_prompt_length=512 \ data.max_response_length=1024 \ data.filter_overlong_prompts=True \ data.truncation='error' \ data.shuffle=False \ actor_rollout_ref.model.path=Qwen/Qwen2.5-3B-Instruct \ actor_rollout_ref.actor.optim.lr=3e-6 \ actor_rollout_ref.hybrid_engine=False \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=256 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=40 \ actor_rollout_ref.actor.use_kl_loss=True \ actor_rollout_ref.actor.kl_loss_coef=0.001 \ actor_rollout_ref.actor.kl_loss_type=low_var_kl \ actor_rollout_ref.actor.entropy_coeff=0 \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=40 \ actor_rollout_ref.rollout.tensor_model_parallel_size=2 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \ actor_rollout_ref.rollout.n=5 \ actor_rollout_ref.rollout.n_gpus=4 \ actor_rollout_ref.rollout.load_format=safetensors \ actor_rollout_ref.rollout.layered_summon=True \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=40 \ actor_rollout_ref.ref.fsdp_config.param_offload=True \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.val_before_train=True \ trainer.logger=['console','wandb'] \ trainer.project_name='verl_grpo_example_gsm8k' \ trainer.experiment_name='qwen2.5_3b_grpo_async_one_step_off' \ trainer.n_gpus_per_node=8 \ trainer.nnodes=1 \ trainer.save_freq=-1 \ trainer.test_freq=-1 \ trainer.total_epochs=15 $@ ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes 1. nccl based weights updating for rollout model. 5. one step off async trainer. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: arron <hou.zg@foxmail.com> Co-authored-by: lalala-2 <yrzr12345678@gmail.com> Co-authored-by: openhands <openhands@all-hands.dev>	2025-07-16 19:45:53 -07:00
H	ef3fffc3a2	[trainer] refactor: no need to call load_reward_manager in compute_reward_async (#2557 ) ### What does this PR do? Simply make changes in https://github.com/volcengine/verl/pull/1406 backward compatible. We'll remove the args for config & tokenizer in next version. Credit to @emergenz ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Franz Srambical <franz.srambical@gmail.com> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-07-17 09:52:36 +08:00
none0663	f0964b6650	[rollout] fix: fix bug for remax when the rollout mode is async (#2574 ) ### What does this PR do? > fix bug for remax when the rollout mode is async, as metioned in https://github.com/volcengine/verl/issues/2551	2025-07-16 22:45:09 +08:00
Yuchen Cheng	3f63715a96	[doc] fix: fix non-existing tag of base image in docs (#2569 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This pull request fixes the non-existing tag of base image in the docs. `verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4-te2.3` => `verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4` Only [`verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4`](https://hub.docker.com/layers/verlai/verl/base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4/images/sha256-8338539fa36dd8780a9d09eef019f339aa2715f49ac3b6cf738d9ffdba00d75f) and [`verlai/verl:base-cu124-cudnn9.8-torch2.6-fa2.7.4-te2.3`](https://hub.docker.com/layers/verlai/verl/base-cu124-cudnn9.8-torch2.6-fa2.7.4-te2.3/images/sha256-6559fd00b049c43fb3eafc1a90ed7464b83653dd79d5c455b1a678dbdb88b3cd) exist on the Dockerhub. Guess the previous one is the correct one according to the commit history. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/search?q=repo%3Avolcengine%2Fverl+base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4&type=pullrequests - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. N/A ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` N/A ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. N/A ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). Signed-off-by: rudeigerc <rudeigerc@gmail.com>	2025-07-16 15:59:40 +08:00
杨睿	96b730bbed	[megatron] fix: wrong response_mask for megatron + sglang mutli-turn (#2543 ) ### What does this PR do? when multi-turn is enabled , we need to mask the observation response from input_ids, which is not generated by the model. so we should use `reponse_mask` instead of `attention_mask` for loss calculation ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-16 14:27:07 +08:00
OC	da2ab088d9	[doc] fix: correct link in agentic RL doc (#2567 ) fixed an invalid link in the doc.	2025-07-15 23:26:02 -07:00
Huapeng Zhou	152c599303	[perf] feat: Clip gsm8k solution string to optimize reward calculation (#2568 ) ### What does this PR do? Huapeng: For regular expression matching, sometimes it cost too long for reward calculation, so clip the last 300 chars to speed up. <img width="1974" height="1120" alt="image" src="https://github.com/user-attachments/assets/a339110c-c527-466c-aa83-5efa099b6ba8" /> Similar code(DAPO): https://github.com/BytedTsinghua-SIA/DAPO/blob/main/eval/math_dapo.py#L278 ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-15 22:51:44 -07:00
Joel	7aabfc437b	[rollout] feat: add ReactAgentLoop based on LangGraph (#2463 ) ### What does this PR do? This is an initial effort to integrate LangGraph into agent loop: 1. add a LangGraph react agent loop implementation 2. add math expression example to demonstrate react agent loop usage. ### Design & Code Changes New components - ChatModel: [custom chat model](https://python.langchain.com/docs/how_to/custom_chat_model/) using LangChain abstractions, implementing following abstract method: - bind_tools: bind tools to the model - _generate: native async generate chat completion message - ReactAgentLoop: [LangGraph react agent](https://langchain-ai.github.io/langgraph/agents/overview/) which can use tools to perform tasks. <img width="593" height="467" alt="image" src="https://github.com/user-attachments/assets/d629b170-03c5-4810-a6b0-4dc27a285c0e" /> ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-16 13:41:04 +08:00
杨睿	6e21c0a625	[megatron] feat: support distributed megatron model converter and merger (#2281 ) ### What does this PR do? - support distributed mcore model converter and merger, especially for huge models like dpskv3 671B - fix model merger bugs for dpskv3, related to https://github.com/volcengine/verl/pull/2125 background: https://github.com/volcengine/verl/pull/2125#issuecomment-2993276556 <img width="1189" height="371" alt="image" src="https://github.com/user-attachments/assets/a317b928-963a-41e5-ae81-d4b6aa669516" /> > We are from the Large Model Post-Training Team of 📕 Xiaohongshu's AI Platform Technology Department , dedicated to developing high-performance, easily-scalable distributed post-training engines. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-16 13:36:33 +08:00
Yuge Zhang	1a89141222	[training_utils] fix: uneven support in split (#2560 ) ### What does this PR do? As discussed in #2524, split should support uneven cases to avoid crash in edge cases. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Unit test added. ### API and Usage Example This PR avoids crashes like: ``` assert len(self) % split_size == 0, ( ``` ### Design & Code Changes N/A ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-16 13:29:27 +08:00
OC	e300d0f099	[doc] feat: add document for agentic RL related features (#2563 ) ### What does this PR do? add a document to describe new features in Agentic RL scenario. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test n/a ### API and Usage Example n/a ### Design & Code Changes n/a ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-16 12:51:16 +08:00
Mathew Han	3f0773259c	[tool] fix: correctly convert 'None' to null in sandbox fusion _process_single_case (#2409 ) ### What does this PR do? Currently, `stdin_data` is passed into `_process_single_case` as None in [`sandbox_fusion_tools`](https://github.com/volcengine/verl/blob/main/verl/tools/sandbox_fusion_tools.py#L179). In [`_process_single_case`](https://github.com/volcengine/verl/blob/main/verl/utils/reward_score/sandbox_fusion/utils.py#L301), we will call `str(None)` which erroneously converts it to `'None'` (a string) when stdin should be empty. ```python api_response, error_msg = call_sandbox_api( sandbox_fusion_url=sandbox_fusion_url, code=current_generation_code, stdin=str(stdin_data), compile_timeout=timeout, run_timeout=timeout, memory_limit_mb=memory_limit_mb, language=language, ) ``` This PR adds a check for if `stdin_data` is None so that it doesn't get converted and passed into stdin. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Design & Code Changes Add a line of logic to check whether or not `stdin_data` is None. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-15 20:53:39 -07:00
Chayenne	5f687b211d	[sglang] fix: adding missing param for sgl async unit test (#2561 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Sorry for the carelessness that do not pass the unit test at `tests/workers/rollout/test_sglang_async_rollout_w_interaction.py`. https://github.com/volcengine/verl/actions/runs/16306898259/job/46054785740 Just fix it in the `get_rollout_config` function. The e2e training is correct. Just fix the unit test. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>	2025-07-15 20:22:43 -07:00
H	218298720f	[ci] chore: add single-controller reviewer (#2554 ) ### What does this PR do? add single-controller reviewer so changes are automatically notified. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` cc @hongpeng-guo	2025-07-16 08:59:45 +08:00
Chayenne	f0d4c76ed6	[sglang] feat: update weights in batch with FSDP (#2559 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Thanks so much to @Yangruipis and @zhuzilin, we implemented the group-wise weights update for SGLang in FSDP. We are still testing the speed up in megtron and FSDP. For megatron: https://github.com/volcengine/verl/pull/2418 At sgl, we're currently exploring two approaches to optimize resharding: 1. Grouped calls to `update weights from tensor`: Previously, we called this endpoint for each tensor individually. We're now grouping tensors to reduce the CPU overhead of these calls. 2. Single large data buffer update: We're investigating whether we can form a single large data buffer to update a group of tensors all at once. This would reduce the number of times the IPC handler is opened and closed. For the first approach, we're implementing it separately in Megatron and FSDP. I'm starting by merging the FSDP implementation, and then I'll create a common interface for Megatron. We're still evaluating the second approach to see if it's feasible. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>	2025-07-15 16:57:20 -07:00
杨睿	1fe5daf7f1	[sglang, megatron, perf] feat: speed up megatron sglang weight update by 10x (#2418 ) ### What does this PR do? optimize the performance of sglang+megatron weight update refer to the bucketing implementation of [`THUDM/slime`](`fb7605cc5f/slime/ray/ppo_actor.py (L452)`). \|model\| bucket size MB \|boost \| \| ---- \| ----- \| ---- \| \| Moonlight16B @ 8xH20 \| 512MB \| 175s -> 18s \| \|DeepseekV3 671B @ 512xH20\| 512MB \| ONGOING \| releated to issues https://github.com/volcengine/verl/issues/2419 , https://github.com/sgl-project/sglang/issues/6762 https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/169 similar fixes for FSDP: https://github.com/volcengine/verl/pull/2499 > We are from the Large Model Post-Training Team of 📕 Xiaohongshu's AI Platform Technology Department , dedicated to developing high-performance, easily-scalable distributed post-training engines. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-07-15 14:46:45 -07:00
Nan Jiang	a63243b0dd	[fsdp] fix: change geo3k model name from non-vl to vl (#2555 ) ### What does this PR do? Fix geo3k script `model_name` from non vl model to vl model ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-15 12:07:42 -07:00
H	166d91a62e	[trainer] refactor: minor code cleanup (#2537 ) ### What does this PR do? clean up entrypoint and train loop ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Rely on existing tests. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-07-15 09:24:49 -07:00
Joel	2c0ae781d9	[ray] fix: strip [] for ipv6 address (#2545 ) ### What does this PR do? Strip square brackets of ipv6 address `[::1]`, torch `MASTER_ADDRESS` doesn't need it. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-15 20:29:45 +08:00
Joost van Doorn	2dea2598a1	[data] fix: Add missing init files in verl experimental data folders (#2548 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Upon import of version from main we get this error due to the missing `__init__.py` files. ``` from verl.experimental.dataset.sampler import AbstractSampler ModuleNotFoundError: No module named 'verl.experimental.dataset' ``` The pr in https://github.com/volcengine/verl/pull/2381 forgot to add these files. In this PR I followed what's in existing files and added the missing `__init__.py` files. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-15 20:29:29 +08:00
ShareLer	10f4eb8cfc	[misc] chore: fix typo in function name (#2525 ) ### What does this PR do? fix typo `gather_outpus_and_unpad` -> `gather_outputs_and_unpad` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Signed-off-by: ShareLer <ShareLe@163.com>	2025-07-15 19:06:20 +08:00
Yuge Zhang	473d8ff0c1	[env] fix: bump tensordict to 0.9.1 (#2541 ) ### What does this PR do? Bump to tensordict 0.9.1 and ban 0.9.0 per discussions in #2460. This bug: https://github.com/pytorch/tensordict/issues/1374 has an impact on dp_actor, making it crash because of the wrong batch size. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-15 19:04:07 +08:00
Simiao Zhang	bbd1288353	[data, megatron] feat: add dynamic batching computational workload balance (#2452 ) ### What does this PR do? To improve computational workload balance when using `use_dynamic_batch`. Sort the resulting micro-batches by their sum of squared sequence lengths (approximate the computation cost of attention) in descending order. This can help reduce imbalance in data parallelism and pipeline parallelism. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pull/2381 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test `using_dynamic_batch_balance` (The line with the suffix `sort` in the figure below) can get better MFU in Qwen2.5-Math-7 DAPO. <img width="835" alt="MFU" src="https://github.com/user-attachments/assets/bc56711c-3d5f-4e91-83e7-29a65f195e57" /> More comprehensive [Experiment Report](https://api.wandb.ai/links/ai4env/tw0zfh5o) ### API and Usage Example modify [`./recipe/dapo/test_dapo_7b_math_megatron.sh`](https://github.com/volcengine/verl/blob/main/recipe/dapo/test_dapo_7b_math_megatron.sh) ```bash python3 -m verl.trainer.main_ppo \ --config-path=config \ --config-name='ppo_megatron_trainer.yaml' \ ... actor_rollout_ref.actor.use_dynamic_bsz=True \ actor_rollout_ref.actor.use_dynamic_bsz_balance=True \ ... ``` ### Design & Code Changes Specific changes: sort the micro batch by their computation workload (approximate by the Attention) after the partition of dynamic batch. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-15 14:17:28 +08:00
Yaowei Zheng	83d6a80ac0	[fsdp] fix: vlm dynamic batch & unify dynamic batch api (#2524 ) ### What does this PR do? The use of `len(data)` is incorrect since the data is a dict if we enable dynamic batch for VLMs. It will return the number of keys in the dict instead of the number of batch samples. `0b508ab803/verl/workers/actor/dp_actor.py (L540-L542)` `0b508ab803/verl/workers/actor/dp_actor.py (L432-L434)` It can work correctly with pure-text LLMs because the data here is a tensordict that has a `len` API. `0b508ab803/verl/workers/actor/dp_actor.py (L441-L443)` To solve this problem, we use `response_mask.shape[0]` to get the number of samples in dynamic batch. Nevertheless, I think the current implementation isn't elegant because the underlying object processed here can be either a dict or a tensordict. So I unify the APIs of dynamic batch and provide two functions: `prepare_dynamic_batch` and `restore_dynamic_batch`. They can be used for both computing log probs and updating actor. They remove the redundant code and make a clean view for the fsdp workers. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Test with Qwen2.5-VL-3b dynamic batch on the Geo3k dataset ```bash python examples/data_preprocess/geo3k.py --local_dir ~/data/geo3k python -m verl.trainer.main_ppo \ algorithm.adv_estimator=grpo \ data.train_files=$HOME/data/geo3k/train.parquet \ data.val_files=$HOME/data/geo3k/test.parquet \ data.train_batch_size=512 \ data.max_prompt_length=1024 \ data.max_response_length=2048 \ data.filter_overlong_prompts=True \ data.truncation='error' \ data.image_key=images \ actor_rollout_ref.model.path=Qwen/Qwen2.5-VL-3B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=128 \ actor_rollout_ref.actor.use_dynamic_bsz=True \ actor_rollout_ref.actor.ppo_max_token_len_per_gpu=6144 \ actor_rollout_ref.actor.use_kl_loss=True \ actor_rollout_ref.actor.kl_loss_coef=0.01 \ actor_rollout_ref.actor.kl_loss_type=low_var_kl \ actor_rollout_ref.actor.entropy_coeff=0 \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=20 \ actor_rollout_ref.rollout.tensor_model_parallel_size=2 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \ actor_rollout_ref.rollout.enable_chunked_prefill=False \ actor_rollout_ref.rollout.enforce_eager=False \ actor_rollout_ref.rollout.free_cache_engine=False \ actor_rollout_ref.rollout.n=5 \ actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=6144 \ actor_rollout_ref.ref.fsdp_config.param_offload=True \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.logger=['console','wandb'] \ trainer.project_name='verl_nightly_ci' \ trainer.experiment_name='qwen2_5_vl_3b_function_rm' \ trainer.n_gpus_per_node=4 \ trainer.nnodes=1 \ trainer.save_freq=-1 \ trainer.test_freq=5 \ trainer.total_epochs=15 ``` Results: orange: before this PR, blue: after this PR <img width="4432" height="1290" alt="image" src="https://github.com/user-attachments/assets/abce366a-98f9-4d97-8a33-9c8a2818c362" /> ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python def prepare_dynamic_batch(data: DataProto, max_token_len: int) -> tuple[list[DataProto], list[list[int]]]: """ Prepare a batch for dynamic batching. Args: data (DataProto): The input data. max_token_len (int): The maximum token length for dynamic batching. Returns: Tuple[List[DataProto], List[List[int]]]: A tuple containing a list of DataProto objects and a list of index lists. """ ... def restore_dynamic_batch(data: torch.Tensor, batch_idx_list: list[list[int]]) -> torch.Tensor: """ Restore a batch from dynamic batching. Args: data (torch.Tensor): The input data. batch_idx_list (List[List[int]]): The list of index lists. Returns: torch.Tensor: The restored data. """ ... ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-15 14:07:41 +08:00
H	2c407f231f	[cfg] fix: fix _generated_ppo_trainer.yaml pre-commit error on main (#2534 ) ### What does this PR do? - Run scripts/generate_trainer_config.sh to update auto-generated config - Adds missing trace configuration fields (backend, token2text) - Fixes pre-commit hook failure ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-07-14 19:42:20 -07:00
Blue Space	517cc23c9d	[megatron] feat: allow override DistributedDataParallelConfig (#2523 ) ### What does this PR do? Allow to override `DistributedDataParallelConfig` for custom configurations. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-15 09:09:52 +08:00
Kai-Hsun Chen	53ec813847	[ray] refactor: Use public method to get node IP (#2521 ) ### What does this PR do? 1. Currently, verl uses `ray._private.services.get_node_ip_address()` to get the node IP. However, it's better to avoid using functions under `_private`. Instead, we should use the public API `ray.util.get_node_ip_address()`. Both are equivalent: `c6e2080a96/python/ray/util/__init__.py (L6)`. 2. Update some methods in `class WorkerHelper` to be `@staticmethods` because they don't rely on the class's state. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test ### API and Usage Example No ### Design & Code Changes No ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>	2025-07-15 09:09:11 +08:00
H	d0c7bbbc05	[cfg] refactor: support +extra.any_key usage for the base dataclass config in verl (#2502 ) ### What does this PR do? This PR makes update to the base config in verl: - support +extra.any_key usage for the base config in verl. - allow selective subfields to be frozen - add a auto-generated config yaml file `verl/trainer/config/_generated_ppo_trainer.yaml` for reference purpose, in case the nested inheritance structure makes the config information too scattered ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test - added frozen field tests ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. Now you can pass `--xx.profiler.extra.any_new_key=any_plain_value` in command line to a dataclass inheriting `verl.BaseConfig`. This way we can still pass dataclass configs inside verl but allow some flexiblity in accepting new keys from users' adhoc usage. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Lin <haibin@Lins-Laptop.hsd1.wa.comcast.net> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-15 09:06:56 +08:00
OC	def5b28e3d	[rollout] feat: support mlflow in rollout trace (#2440 ) Implemented mlflow as rollout trace backend. Comparing to weave, mlflow is a lite weight solution and can be deployed on-premises easily. ### API and Usage Example docs/advance/rollout_trace.rst	2025-07-15 05:18:40 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	141b1d3251	[recipe] fix: DAPO rewards using sandbox fusion (#2496 ) ### What does this PR do? Fix some bugs/outdated code so that we can use sandbox fusion for DAPO. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes Use `load_reward_manager` in `verl.trainer.ppo.reward` instead of duplicating the code there. Also, set `acc` in `reward_extra_info` when the returned result is only a float number (e.g. sandbox fusion). ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-07-14 20:10:48 +08:00
Guangming Sheng	0b508ab803	[single_controller] fix: replace unittest.mock.patch with context manager for env var handling (#2498 ) ### What does this PR do? Fixes a critical issue in the Ray worker initialization where environment variables were not being properly preserved when using `unittest.mock.patch`. This could lead to environment variables being unexpectedly deleted after worker initialization. The fix replaces the `patch` usage with a proper context manager for safer environment variable management. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+environment+variables+patch - [x] Format the PR title as `[single_controller] fix: replace unittest.mock.patch with context manager for env var handling` ### Test This change can be tested through the CI system since it affects core worker initialization functionality. The following test scenarios should be covered: - Worker initialization with pre-existing environment variables - Worker initialization without pre-existing environment variables - Multiple worker initializations in sequence - Error cases during worker initialization ### API and Usage Example No API changes. Internal implementation change only. The fix uses a new context manager: ```python @contextmanager def temp_env_var(key: str, value: str): """Context manager for temporarily setting an environment variable.""" original = os.environ.get(key) os.environ[key] = value try: yield finally: if original is None: os.environ.pop(key, None) else: os.environ[key] = original # Usage in worker initialization with temp_env_var("DISABLE_WORKER_INIT", "1"): worker = user_defined_cls(args, *kwargs) ``` ### Design & Code Changes Changes made: 1. Removed dependency on `unittest.mock.patch` 2. Added new `temp_env_var` context manager for safe environment variable handling 3. Updated worker initialization code in two locations to use the context manager: - In `WorkerDict.__init__` for regular worker initialization - In `FusedWorker.__init__` for fused worker initialization 4. Ensures environment variables are properly restored even if initialization fails ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md) - [x] Apply pre-commit checks - [ ] Add / Update the documentation - N/A (internal implementation change) - [x] Add unit tests to verify environment variable handling in worker initialization - [ ] Request CI via Slack channel	2025-07-14 16:44:41 +08:00
none0663	fbec86d7fe	[BUG] fix bug for #2506 , when passing as response_mask to policy_loss_fn (#2513 ) ### What does this PR do? [BUG] advantages is incorrectly passed as response_mask to policy_loss_fn in dp_actor.py #2506 fix https://github.com/volcengine/verl/issues/2506	2025-07-14 13:27:48 +08:00
Kai-Hsun Chen	a31a8f251f	[doc] fix: quickstart example can't work on zsh (#2509 ) ### What does this PR do? I followed the instructions at https://verl.readthedocs.io/en/latest/start/quickstart.html to run the PPO example on my devbox, which uses zsh. However, I got the error zsh: no matches found: `trainer.logger=[console]` because `[]` is interpreted as a glob pattern in zsh. ``` (verl) ➜ verl git:(20250713-devbox-2-tmux0-verl-2) ✗ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ data.train_files=$HOME/data/gsm8k/train.parquet \ data.val_files=$HOME/data/gsm8k/test.parquet \ data.train_batch_size=256 \ data.max_prompt_length=512 \ data.max_response_length=256 \ actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.actor.ppo_mini_batch_size=64 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \ actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \ critic.optim.lr=1e-5 \ critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \ critic.ppo_micro_batch_size_per_gpu=4 \ algorithm.kl_ctrl.kl_coef=0.001 \ trainer.logger=['console'] \ trainer.val_before_train=False \ trainer.n_gpus_per_node=1 \ trainer.nnodes=1 \ trainer.save_freq=10 \ trainer.test_freq=10 \ trainer.total_epochs=15 2>&1 \| tee verl_demo.log zsh: no matches found: trainer.logger=[console] ``` This PR has 3 changes: * `trainer.logger=['console']` -> `trainer.logger=console` * `trainer.logger=['console','wandb']` -> `trainer.logger='["console","wandb"]'` * `trainer.logger=['console','tensorboard']` -> `trainer.logger='["console","tensorboard"]'` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test * `trainer.logger=console` (zsh) <img width="898" height="564" alt="image" src="https://github.com/user-attachments/assets/a957a493-75e6-462b-9974-6b1c4cdf5a80" /> * ``trainer.logger='["console","wandb"]'`` (zsh) <img width="870" height="565" alt="image" src="https://github.com/user-attachments/assets/e20613bf-2ccc-4653-b23f-90edc3d568d1" /> * `trainer.logger=console` (bash) ```bash ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ > data.train_files=$HOME/data/gsm8k/train.parquet \ > data.val_files=$HOME/data/gsm8k/test.parquet \ > data.train_batch_size=256 \ > data.max_prompt_length=512 \ > data.max_response_length=256 \ > actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ > actor_rollout_ref.actor.optim.lr=1e-6 \ > actor_rollout_ref.actor.ppo_mini_batch_size=64 \ > actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ > actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \ > actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ > actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ > actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \ > critic.optim.lr=1e-5 \ > critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \ > critic.ppo_micro_batch_size_per_gpu=4 \ > algorithm.kl_ctrl.kl_coef=0.001 \ > trainer.logger=console \ > trainer.val_before_train=False \ > trainer.n_gpus_per_node=1 \ > trainer.nnodes=1 \ > trainer.save_freq=10 \ > trainer.test_freq=10 \ > trainer.total_epochs=15 2>&1 \| tee verl_demo.log 2025-07-14 02:52:27,669 INFO worker.py:1908 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 (TaskRunner pid=1799248) TaskRunner hostname: ip-172-31-9-244, PID: 1799248 (TaskRunner pid=1799248) {'actor_rollout_ref': {'actor': {'checkpoint': {'load_contents': ['model', (TaskRunner pid=1799248) 'optimizer', (TaskRunner pid=1799248) 'extra'], (TaskRunner pid=1799248) 'save_contents': ['model', (TaskRunner pid=1799248) 'optimizer', (TaskRunner pid=1799248) 'extra']}, ``` * `trainer.logger='["console","wandb"]'` (bash) ```bash ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ > data.train_files=$HOME/data/gsm8k/train.parquet \ > data.val_files=$HOME/data/gsm8k/test.parquet \ > data.train_batch_size=256 \ > data.max_prompt_length=512 \ > data.max_response_length=256 \ > actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ > actor_rollout_ref.actor.optim.lr=1e-6 \ > actor_rollout_ref.actor.ppo_mini_batch_size=64 \ > actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ > actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \ > actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ > actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ > actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \ > critic.optim.lr=1e-5 \ > critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \ > critic.ppo_micro_batch_size_per_gpu=4 \ > algorithm.kl_ctrl.kl_coef=0.001 \ > trainer.logger='["console","wandb"]' \ > trainer.val_before_train=False \ > trainer.n_gpus_per_node=1 \ > trainer.nnodes=1 \ > trainer.save_freq=10 \ > trainer.test_freq=10 \ > trainer.total_epochs=15 2>&1 \| tee verl_demo.log 2025-07-14 02:54:13,989 INFO worker.py:1908 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 (TaskRunner pid=1805000) TaskRunner hostname: ip-172-31-9-244, PID: 1805000 (TaskRunner pid=1805000) {'actor_rollout_ref': {'actor': {'checkpoint': {'load_contents': ['model', (TaskRunner pid=1805000) 'optimizer', (TaskRunner pid=1805000) 'extra'], (TaskRunner pid=1805000) 'save_contents': ['model', (TaskRunner pid=1805000) 'optimizer', (TaskRunner pid=1805000) 'extra']}, ``` ### API and Usage Example No ### Design & Code Changes No ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>	2025-07-14 13:26:32 +08:00
YumiMom	4d0f4d056e	[doc] feat: update npu profiler doc and script (#2514 ) ### What does this PR do? Since the profiler has removed the individual configurations for `actor`, `rollout`, and `ref`, and now uses a unified configuration under `actor_rollout_ref.profiler`, the documentation and scripts for the NPU profiler need to be updated accordingly. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-14 11:10:27 +08:00
Kai-Hsun Chen	92758d681c	[env] fix: Change the permissions of `install_vllm_sglang_mcore.sh` from 644 to 755 to allow execution (#2508 ) ### What does this PR do? I followed the instructions at https://verl.readthedocs.io/en/latest/start/install.html#install-dependencies to install verl. The guide asks me to run the script `scripts/install_vllm_sglang_mcore.sh`, but its permission is set to 644. ``` # Make sure you have activated verl conda env # If you need to run with megatron bash scripts/install_vllm_sglang_mcore.sh # Or if you simply need to run with FSDP USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh ``` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Here are the steps I followed to update the permission. ```sh (verl) ➜ verl git:(20250713-devbox-2-tmux0-verl) ✗ ./scripts/install_vllm_sglang_mcore.sh zsh: permission denied: ./scripts/install_vllm_sglang_mcore.sh (verl) ➜ verl git:(20250713-devbox-2-tmux0-verl) ✗ ll scripts/install_vllm_sglang_mcore.sh -rw-rw-r-- 1 ubuntu ubuntu 2.4K Jul 13 05:04 scripts/install_vllm_sglang_mcore.sh (verl) ➜ verl git:(20250713-devbox-2-tmux0-verl) ✗ chmod +x scripts/install_vllm_sglang_mcore.sh (verl) ➜ verl git:(20250713-devbox-2-tmux0-verl) ✗ ./scripts/install_vllm_sglang_mcore.sh 1. install inference frameworks and pytorch they need Looking in links: https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python Collecting sglang==0.4.6.post1 (from sglang[all]==0.4.6.post1) ... ``` ### API and Usage Example No ### Design & Code Changes No ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>	2025-07-13 15:36:11 -07:00
Huazhong Ji	11e0cf752e	[misc] refactor: remove deprecated codes (#2494 ) ### What does this PR do? After PR https://github.com/volcengine/verl/pull/2257, I think vllm_mode is no longer used ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). cc @eric-haibin-lin	2025-07-13 15:34:31 -07:00
Kai-Hsun Chen	8e0b9bd9e5	[recipe] chore: Remove the duplicate definition of `class Role` (#2503 ) ### What does this PR do? `spin_trainer.py` defines `class Role` which is totally the same as `class Role` defined in `ray_trainer.py`. * `spin_trainer.py` `4aa02fe166/recipe/spin/spin_trainer.py (L55-L66)` * `ray_trainer.py` `4aa02fe166/verl/trainer/ppo/ray_trainer.py (L67)` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test https://github.com/volcengine/verl/blob/main/.github/workflows/e2e_spin.yml ### API and Usage Example No ### Design & Code Changes No ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>	2025-07-13 18:56:02 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	4aa02fe166	[trainer] fix: Allow FSDP2 when doing strategy check (#2497 ) ### What does this PR do? Allow FSDP2 when doing strategy check ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. For `strategy` field, now both "fsdp" and "fsdp2" are considered valid. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-07-12 16:31:31 -07:00
Qizhi Chen	eac4863ad7	[env] feat: safely bump py version to 3.10 (#2421 ) ### What does this PR do? This PR safely bumps python version to 3.10 for two reasons: 1. [`removeprefix`](https://docs.python.org/3.9/whatsnew/3.9.html#new-string-methods-to-remove-prefixes-and-suffixes) was introduced in python 3.9 `588f9728f3/verl/single_controller/ray/base.py (L498-L505)` 2. [`match`](https://docs.python.org/3.10/whatsnew/3.10.html#simple-pattern-match-to-a-literal) was introduced in python 3.10 `588f9728f3/verl/tools/utils/tool_registry.py (L81-L92)` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-12 16:29:39 -07:00
askender	6519220006	[trainer] fix: use .keys() to check 'response_mask' in TensorDict (#2491 )	2025-07-12 14:48:13 +08:00
CuiBo	75f2abf0a5	[sglang] fix: Only flush cache on TP rank=0. (#2455 ) ### What does this PR do? > We should call `flush_cache` in the same way it's done in the `_req_level_generate_sequences` function; otherwise, it will cause an error when TP16 is enabled. <img width="575" alt="image" src="https://github.com/user-attachments/assets/ab569ffe-22d1-402c-a58d-741253794a54" /> ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-11 20:54:17 -07:00
keilo	f0b4abaefc	[fsdp] fix: Change the data in the update_actor function from to.('cpu') to to.(get_device_id()) (#2477 ) ### What does this PR do? > When training the Qwen3-32B model by using the DAPO algorithm in a dual-NPU environment, an error occurred during the update actor phase where the partition was found to be empty. We found that the data.to("cpu") operation in the update_actor function differed from the data handling methods in other functions. Rolling it back to data.to(get_device_id()) successfully resolved the error. Further verification confirmed that keeping the data on the device side does not trigger OOM issues. Therefore, we implemented this modification. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). Co-authored-by: 王凯宇 <wangkaiyu11@h-partners.com>	2025-07-12 10:53:05 +08:00
Ze-Yi LIN	590a62ae45	[training_utils] feat: log_generations_to_swanlab use table (#2489 ) ### What does this PR do? Enhance the model output logging section in the `log_generations_to_swanlab` function to improve visualization. ![20250712-030651](https://github.com/user-attachments/assets/370f3a04-d1c0-4441-bcc4-ddeb27be5e85) demo link: https://swanlab.cn/@ZeyiLin/verl_examples/runs/e9hgu4yx78ra74bh1346v/chart#YWh6djBw-MloyeE8ybkk= ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-12 08:56:25 +08:00
Blue Space	c3e953cf44	[docker] feat: provide images with deepep (#2480 ) ### What does this PR do? Provide images with deep-ep. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-11 21:12:49 +08:00
Yan Bai	d9a6a31c8d	[megatron] feat: fused kernel lightweight (#2210 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Integrate @Jianbing-D 's fused kernel into megatron side. Memory saving amount may need some further check. ### Test Fused kernel e2e tests #### Alignment <img width="780" alt="image" src="https://github.com/user-attachments/assets/b6929f6d-f98d-49a8-a714-2627f2cb7264" /> #### Performance Gray line is no fused kernel. <img width="384" alt="image" src="https://github.com/user-attachments/assets/8b19e227-6450-4300-9bf4-0ba6a07cbab0" /> ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: BlueSpace <gaoziyuan19@mails.ucas.ac.cn>	2025-07-11 15:55:41 +08:00
Liwei Ma	ada82bb719	[doc] feat: update documentation of nsight profiling (#2470 ) ### What does this PR do? Update Nsight profiling documentation accordingly ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. Merge config actor_rollout_ref.(actor, ref, rollout).profiler to actor_rollout_ref.profiler ### Design & Code Changes Only documentation update ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-11 11:38:28 +08:00
Liwei Ma	1dfc1359da	[perf] feat: add range tag to start/stop profile; clean actor_rollout_ref.profiler (#2456 ) ### What does this PR do? I found the cost of workers start/stop profile is not negligible, there are big gap between steps which is annoying. So I add range tag to them, making it clear. Another change, I realize that `actor_rollout_ref` needs only one `profiler` config, and needn't redundant for each role. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-11 10:12:56 +08:00
Wang Siyuan	49fe461fb8	[doc] chore: add documentation for truncation: middle option (#2462 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. RLHF dataset has gotten a middle option of truncation without annotation and this option is not mentioned in the docs (See PR #1488 ). The annotations are added in this pull request. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Only docs and function comments update, no test needed. > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: ChitandaErumanga <yinyuqi0001@163.com>	2025-07-11 08:11:53 +08:00
H	01624c6da7	[doc] fix: colocation documentation updates (#2465 ) ### What does this PR do? Update docs and awesome work.	2025-07-11 08:11:17 +08:00
Chi Zhang	de38ed4218	[env] feat: upgrade tensordict version (#2460 ) ### What does this PR do? Upgrade tensordict to latest ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-10 16:12:02 -07:00
Frederick Robinson	a8d9d25574	[misc] feat: add py.typed file to `verl/` (#2467 ) ### What does this PR do? Adds a [pep 561](https://peps.python.org/pep-0561/) marker file to express that verl supports types. Now, when I typecheck my package which imports `verl` I no longer have to add `# type: ignore` after `import verl`. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). Co-authored-by: Fred <frederrx@amazon.com>	2025-07-10 16:11:23 -07:00
Nan Jiang	1f3f0a5309	[misc] fix: add *.yaml to pyproject due to modular config (#2468 ) ### What does this PR do? Add all yaml file in configs to wheel building. Since current config loading is quite modular, we need to add those files to wheel to avoid loading issue ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-10 16:10:32 -07:00
H	269bb4a4bc	[doc] chore: add ICML meetup and upcoming feat (#2431 ) ### What does this PR do? Update readme	2025-07-10 21:49:51 +08:00
Cheetah	7b523663e3	[hardware] fix: enable sleep mode on ASCEND NPU (#2459 ) ### What does this PR do? We found that there is an OOM issue when running the Qwen2.5-VL model, In the current version, it is necessary to set actor_rollout_ref.rollout.free_cache_engine=True to enable sleep mode. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-10 20:37:31 +08:00
Huazhong Ji	c26b0f2906	[misc] refactor: Replace deepcopy with tensor.clone (#2442 ) ### What does this PR do? Optimize tensor copying in `MegatronPPOActor` by replacing copy.deepcopy with torch.Tensor.clone, which should improve performance slightly. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-10 15:18:34 +08:00
H	fc8acdc607	[cfg] refactor: split fsdp/megatron specific configs, consolidate shared ones for reward_model and critic (#2433 )	2025-07-09 22:00:39 -07:00
Justin Wong	ab11fff33d	[trainer, data] feat: Dynamic Data Generation (#2312 ) ### What does this PR do? Add interface to support dynamic data generation which will allow us to create new tasks between each step of training. To elaborate, this PR is refactoring the code and providing an interface to make it easier to implement other dynamic data generation algorithms. In particular, we want to have the model propose new tasks based on which tasks currently do or don't succeed. This has been shown to be useful for webtasks and reasoning: https://arxiv.org/pdf/2506.14205, https://openreview.net/pdf?id=oVKEAFjEqv, https://arxiv.org/abs/2502.06776, https://arxiv.org/pdf/2505.03335. Basic example that could be useful: Imagine wanting to generate variations on the hardest tasks for the current training loop. We implement this as a LLM API call as a custom data generator followed by a custom sampler that selects the desirable datapoints as they're generated. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: is:pr is:open data generation - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. `bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh` more details in Usage Example section below. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. 1. Change the yaml to enable ``` --- a/verl/trainer/config/ppo_trainer.yaml +++ b/verl/trainer/config/ppo_trainer.yaml @@ -93,11 +93,11 @@ data: # The path to the file containing your customized data generation class. # E.g. 'verl.utils.dataset.datagen' - path: null + path: 'verl.utils.dataset.datagen' # The class name of the data generation class within the specified file. # E.g. NoOpDataGen - name: null + name: 'NoOpDataGen' ``` The noop dataset just reappends the first datapoint at the end. You can see that this correctly happened by printing out the size of the dataset each epoch: ``` (TaskRunner pid=71298) step:0 - val-core/openai/gsm8k/reward/mean@1:0.668 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%\| \| 0/435 [00:00<?, ?it/s] (WorkerDict pid=74307) /workplace/rl_workspace/src/AGIEmergeRL/vendor/verl_2/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.) [repeated 3x across cluster] (WorkerDict pid=74307) tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device) [repeated 3x across cluster] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7474 (TaskRunner pid=71298) Filtering prompts longer than 1024 tokens: 100%\|██████████\| 1/1 [00:00<00:00, 165.34 examples/s] (TaskRunner pid=71298) 7474 (TaskRunner pid=71298) step:1 - global_seqlen/min:88786.000 - global_seqlen/max:101138.000 - global_seqlen/minmax_diff:12352.000 - global_seqlen/balanced_min:94905.000 - global_seqlen/balanced_max:94905.000 - global_seqlen/mean:94905.000 - actor/entropy:0.361 - actor/kl_loss:0.002 - actor/kl_coef:0.001 - actor/pg_loss:0.022 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.301 - perf/mfu/actor:0.107 - perf/max_memory_allocated_gb:7.201 - perf/max_memory_reserved_gb:12.896 - perf/cpu_memory_used_gb:57.490 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.677 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.677 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.021 - critic/advantages/max:1.500 - critic/advantages/min:-1.500 - critic/returns/mean:-0.021 - critic/returns/max:1.500 - critic/returns/min:-1.500 - response_length/mean:376.086 - response_length/max:1024.000 - response_length/min:58.000 - response_length/clip_ratio:0.020 - prompt_length/mean:365.359 - prompt_length/max:459.000 - prompt_length/min:327.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:44.158 - timing_s/reshard:2.658 - timing_s/gen:47.161 - timing_s/reward:0.423 - timing_s/old_log_prob:15.347 - timing_s/ref:28.668 - timing_s/adv:0.039 - timing_s/update_actor:60.185 - timing_s/step:151.945 - timing_per_token_ms/gen:0.122 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.038 - timing_per_token_ms/update_actor:0.079 - perf/total_num_tokens:759240.000 - perf/time_per_step:151.945 - perf/throughput:624.599 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%\| \| 1/435 [02:32<18:24:31, 152.70s/it] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7475 ``` Note the original dataset length is 7473 for `gsm8k_w_tool` ### High-Level Design > Demonstrate the high-level design if this PR is complex. n/a ### Specific Changes > List the specific changes. - Add an abstract datagen class that's used in ray_trainer.py to add data to the dataset - We refactor filtering out of `_read_files_and_tokenize` in RLHFDataset - We add `append_dataframe` to RLHFDataset - Add util for getting type from file. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-07-09 13:25:28 -07:00
Stefan He	b3aed0d6c3	[sglang] fix: Fix qwen2vl weight keys issue (#2434 ) ### What does this PR do? Reapply https://github.com/volcengine/verl/pull/1880 A earlier PR: https://github.com/volcengine/verl/pull/2365 accidentally removed the weight key conversion: ##### Why it wasn't caught by CI? Because all CI are based on transformers 4.51, while the issue only happens for transformer 4.52 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-09 10:24:52 -07:00
Ethan (Yusheng) Su	526098d664	[Hardware] feat: Support AMD (ROCMm Kernel) - Update Dockerfile/Docker Image (#2390 ) ### What does this PR do? > Update Dockerfile/Docker Image ### Checklist Before Starting - [X] Search for similar PRs. - [X] Format the PR title (This will be checked by the CI) ### Test > Done ### API and Usage Example > Usage example(s) [AMD_toturial](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst). ### Design & Code Changes > Dockerfile/Docker Image dependency: ROCm: 6.3.4 (patch version) Pytoch: 2.7.0 vllm: >=0.8.5 sglang: >=v0.4.6.post4 megatron-lm: TransformerEngine==1.14.0, megatron-core==0.12.0 Ray: >=2.45 Also allow VLM training ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-09 10:05:43 -07:00
YumiMom	b5e711eab5	[perf] feat: add npu profiler for FSDP backend (#2194 ) ### What does this PR do? Add verl profiling support for NPU on FSDP backend ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test There should be no functional changes and performance changes. ### API and Usage Example Add `verl.utils.profiler.mstx_profile` which implements the `verl.utils.profiler.profile` interfaces when torch_npu is available. ### High-Level Design This PR references the design of Nsight Systems profiling and implements `mstx_profile` using the torch_npu interface to enable data collection on NPU devices. ### Specific Changes `verl.utils.profiler.mstx_profile` implements the general profiling interface in `verl.utils.profiler.profile` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-09 19:57:36 +08:00
H	cccc2ef2c9	[cfg] refactor: make the rollout & ref configs more modular (#2410 ) ### What does this PR do? move rollout and ref configs to standalone files. cc @ETOgaosion for dp_ref/rollout, default values are added to the yaml if actor_rollout_ref.actor does not exist, so that the yaml can be loaded independently. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Relying on existing tests. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-08 21:49:43 -07:00
CuiBo	ad33564f84	[sglang] fix: Bug in megatron+sglang TP16 update_weights. (#2336 ) ### What does this PR do? > We observe the following when using Megatron + Sglang + TP16: <img width="1236" alt="image" src="https://github.com/user-attachments/assets/875d83e6-325a-41c4-b778-81b457b508a1" /> After investigation, we found that this was caused by the cudaipc mechanism not supporting cross-machine access. We have resolved and fixed this bug. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-08 21:07:09 -07:00
H	4def91d511	[data] refactor: move sampler api to experimental (#2381 )	2025-07-09 09:56:04 +08:00
Wang Siyuan	004da732d3	[rollout] fix: huggingface model config max_position_embeddings assertion for model with extended context length (#737 ) ### What does this PR do? Fix hf config max_position_embeddings assertion error when using rope type yarn. When using extra rotary position embedding methods to extend a model's context window, a new max_position_embeddings should be calculated using the extend scaling factor provided in the config. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes ```python if not model_hf_config.rope_scaling: ... else: rope_scaling_factor = model_hf_config.rope_scaling.get("factor", 1.0) assert model_hf_config.max_position_embeddings * rope_scaling_factor >= config.prompt_length + config.response_length, ( f"model context length should be greater than total sequence length, got rope_scaling_factor={rope_scaling_factor} ``` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. New test case: A simple test case to reproduce the error of failed assertion when the model is extended using yarn. - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: HL <linhaibin.eric@gmail.com> Co-authored-by: shi yaorui <shiyaorui@dp.tech>	2025-07-08 17:14:23 -07:00
H	a4033afb45	[ci] feat: add docstring checker script and comprehensive docstrings (#2378 ) ### What does this PR do? Added a few files where docstring is enforced. We may expand it further in the future. ``` "verl/trainer/ppo/ray_trainer.py", "verl/trainer/main_ppo.py", "verl/trainer/ppo/reward.py", "verl/utils/reward_score/__init__.py", "verl/trainer/ppo/core_algos.py", "verl/experimental/agent_loop/agent_loop.py", "verl/workers/sharding_manager/fsdp_vllm.py", "verl/workers/sharding_manager/fsdp_ulysses.py" ``` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-07-09 05:48:48 +08:00
Shizhan Lu	588f9728f3	[ci] fix: forbid ci on forks (#2412 ) ### What does this PR do? Verl does not prohibit forked branches from initiating CI tasks. Normally, tasks on regular runners will disappear due to the absence of a runner with the same name, but this is not the case for mlp runners. Although CI tasks from these forked branches will fail authentication during subsequent execution, they still generate a large number of requests for mlp. For this reason, we have set it to "not run on forks". ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-08 19:16:29 +08:00
OC	ec4433cd5d	[misc] feat: trace rollout generation and tool calls using weave (#2345 ) ### What does this PR do? Provide rollout generation and tool calls details in wandb weave to help debugging agentic RL. 2 new interfaces: 1. rollout_trace_attr contextmanager: used to mark sample_index、step、rollout_n and experience name for a trajectory. 2. rollout_trace_op decorator：mark the method to trace. It must be a method of an instance. related issue https://github.com/volcengine/verl/issues/2188 ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test <img width="1910" alt="截屏2025-07-03 下午4 09 58" src="https://github.com/user-attachments/assets/ff30bbca-f9c8-434f-a3c2-0e333d16fa68" /> <img width="1895" alt="截屏2025-07-03 下午4 11 27" src="https://github.com/user-attachments/assets/0b9ed8db-58a7-4769-88fb-bda204dc9fc8" /> ### API and Usage Example options: +trainer.rollout_trace.backend=weave: only wandb weave is support in this PR. Leave the reset of trace tool to the community. +trainer.rollout_trace.token2text=False: whether append decoded text in result of run method. ### High-Level Design n/a ### Specific Changes Only works for async rollout from agent loop. No effect for sync rollout. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-08 17:17:46 +08:00
Yuyang Ding	578501e2f8	[sglang] fix: Import Error in the latest sglang (#2275 ) ### What does this PR do? The following import is not supported in sglang >= 0.4.8 ``` from sglang.srt.openai_api.protocol import Tool ``` https://github.com/sgl-project/sglang/releases/tag/v0.4.8: > The `sglang/srt/openai_api` directory has been removed and replaced with `sglang/srt/entrypoints/openai`. So replaced with ``` try: from sglang.srt.entrypoints.openai.protocol import Tool except ImportError: from sglang.srt.openai_api.protocol import Tool ``` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`	2025-07-07 20:07:15 -07:00
Chayenne	ee6542248b	[sglang] fix: only wake up weights on infer_tp 0 (#2403 )	2025-07-07 13:24:56 -07:00
H	3f929af747	[cfg] refactor: make actor config more modular (#2379 )	2025-07-08 00:22:03 +08:00
Alec Henx	1e7c545eef	[tool] fix: Add MCP usage documentation (#2261 )	2025-07-07 08:21:32 -07:00
Shizhan Lu	cb3dcc6f2c	[ci] feat: use action (#2393 ) ### What does this PR do? In https://github.com/volcengine/verl/pull/1979, this PR migrates CI tasks to the vemlp. However, the early version's setup and cleanup steps exposed too much procedural code, which we have encapsulated in https://github.com/volcengine/vemlp-github-runner. For specific usage, refer to the documentation in `.github/workflows/README.md`of this pr. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-07 17:17:19 +08:00
Qizhi Chen	26e26d1e91	[sglang, rollout, doc] fix: update sglang rollout generate doc (#2385 ) ### What does this PR do? This PR updates the documentation for the sglang rollout’s `generate_sequences` by separating single-turn and multi-turn explanations for improved readability. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-07 16:10:18 +08:00
askender	26d3a03b02	[misc] refactor: replace pkg_resources with importlib.metadata (#2392 ) ### What does this PR do? - pkg_resources is deprecated and will be removed as early as 2025-11-30. This patch switches to importlib.metadata to avoid future compatibility issues and suppress warnings. ### Checklist Before Starting - [X] Search for similar PRs. - [X] Format the PR title as `[{modules}] {type}: {description}` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-07 15:02:04 +08:00
Joel	fc35956543	[BREAKING][rollout] feat: repeat DataProto when n>1 in driver instead of rollout workers (#2324 ) ### What does this PR do? Before this PR, when `generate_sequences` with sampling param n>1, DataProto repeat is quit diverge. - validation: DataProto is repeated by `n` in driver, then chunked and dispatched to rollout workers. - training - batch mode: DataProto is chunked and dispatched to rollout workers, then repeated in rollout workers - server mode: DataProto is repeated by `n` in driver, then chunked and dispatched to rollout workers. In batch mode, the `chunk-dispatch-repeat` pattern restricts GRPO training where we have more GPUs than batch_size. For example, `batch_size=128, n=16, world_size=256`: - `chunk-dispatch-repeat`: DataProto(batch_size=128) can't be chunked to 256 shards. - `repeat-chunk-dispatch`: after repeat, DataProto(batch_size=2048) can be successfully chunked. After this PR, always repeat DataProto in driver whether it's validate or training, batch mode or server mode. > [!IMPORTANT] > This change breaks almost all recipes and projects using verl GRPO as submodules. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-07-07 14:57:01 +08:00
OC	4c37c97495	[rollout] fix: sglang async fail with Multi-stage Awake feature (#2365 ) ### What does this PR do? Fix a regression from https://github.com/volcengine/verl/pull/1911, because the PR did not change the sglang async branch. CI did not catch this error because it only run 1 step, but this error happen in the second test. So I update the testcases to run 2 steps. To reproduce the bug, run test: TOTAL_TRAIN_STEPS=2 ENGINE=sglang ROLLOUT_MODE=async bash tests/special_e2e/ppo_trainer/run_function_reward.sh It fail with: ``` (WorkerDict pid=1257286) Total steps: 2, num_warmup_steps: 0 (WorkerDict pid=1257286) Actor use_remove_padding=True (WorkerDict pid=1257286) Actor use_fused_kernels=False (AsyncSglangServer pid=1260392) FastAPI listen on [192.168.111.48:40451](http://192.168.111.48:40451/) (WorkerDict pid=1257286) terminate called after throwing an instance of 'c10::Error' (WorkerDict pid=1257286) what(): CUDA error: an illegal memory access was encountered (WorkerDict pid=1257286) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. (WorkerDict pid=1257286) For debugging consider passing CUDA_LAUNCH_BLOCKING=1 (WorkerDict pid=1257286) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. (WorkerDict pid=1257286) (WorkerDict pid=1257286) Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first): (WorkerDict pid=1257286) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fbf6036c1b6 in /usr/local/lib/python3.10/dist-packages/torch/lib/[libc10.so](http://libc10.so/)) (WorkerDict pid=1257286) frame #1: c10::detail::torchCheckFail(char const, char const, unsigned int, std::string const&) + 0x64 (0x7fbf60315a76 in /usr/local/lib/python3.10/dist-packages/torch/lib/[libc10.so](http://libc10.so/)) (WorkerDict pid=1257286) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const, char const, int, bool) + 0x118 (0x7fbf6080d918 in ``` ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20an%20illegal%20memory%20access%20was%20encountered - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test ``` (TaskRunner pid=1647269) step:2 - global_seqlen/min:13075 - global_seqlen/max:14837 - global_seqlen/minmax_diff:1762 - global_seqlen/balanced_min:14231 - global_seqlen/balanced_max:14232 - global_seqlen/mean:14231.5 - actor/entropy:2.0606913566589355 - critic/vf_loss:8.7157882153 ``` ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-07 13:56:07 +08:00
杨睿	5cbad83792	[trainer] fix: Use safe masked mean/sum to handle NaN values outside the mask (#2377 ) ### What does this PR do? - for numerical stability, handle nan outside the mask when calculating masked_mean and masked_sum > We are from the Large Model Post-Training Team of 📕 Xiaohongshu's AI Platform Technology Department , dedicated to developing high-performance, easily-scalable distributed post-training engines. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-07 07:45:29 +08:00
Nan Jiang	891c873827	[sglang, rollout] refactor: use torch.Tensor in async rollout schemas (#2362 )	2025-07-06 13:15:35 -07:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	2a01b21331	[ci] fix: PR title check supports module names with underscore (`training_utils`) (#2383 )	2025-07-06 07:32:54 -07:00
H	c71fa392c1	[doc] feat: add July events (#2382 )	2025-07-06 15:17:42 +08:00
shuyhere	281ecd4cc1	[doc] fix: Fix document config.rst (#2369 ) ### What does this PR do? > Fix document config.rst: the parameter“gemma” -> “gamma”. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pull/2322 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-05 09:26:42 -07:00
Chi Zhang	e9b38dc382	Revert "[misc] fix: invalid escape sequence '\*'" (#2376 ) Reverts volcengine/verl#2375	2025-07-05 21:08:40 +08:00
H	cbeb3f4dae	[rollout] fix: fix hf rollout and add single gpu test (#2371 )	2025-07-05 18:51:26 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	50ba712dee	[misc] fix: invalid escape sequence '\' (#2375 ) ### What does this PR do? ```log verl/utils/dataset/rl_dataset.py:38: SyntaxWarning: invalid escape sequence '\' ``` ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-07-05 18:49:15 +08:00
Yeonwoo Sung	9cc307767b	[ray] refactor: Seperate the constants into different file (#2025 ) ### What does this PR do? Move the ray runtime env constant into separate file to clean up the code. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: H <linhaibin.eric@gmail.com>	2025-07-04 18:50:44 -07:00
H	9b0e327ecd	[doc] fix: add show source option (#2370 ) ### What does this PR do? Enable API docs with [source code] option	2025-07-04 17:20:12 -07:00
Nan Jiang	715724c88f	[tool] feat: Add support for tools that generate multimodal data (#2146 ) ### What does this PR do? This PR adds support for tools to create and return multimodal data (images and videos) during rollout. It enhances the framework to properly handle multimodal inputs that are dynamically generated by tools during multi-turn conversations. ### Key Features - Tools can now return images and videos as part of their response - Added support for processing multimodal inputs in the rollout system - Introduced a new configuration option `return_multi_modal_inputs` to control how multimodal inputs are processed - Updated documentation with examples of how to implement tools that generate multimodal data ### API and Usage Example ```python async def execute(self, ...) -> Tuple[str \| Dict[str, Any], float, dict]: # Process images or videos from verl.utils.dataset.vision_utils import process_image, process_video img1 = process_image(img1) video1 = process_video(video1) # Return multimodal data return {"image": [img1, ...], "video": [video1, ...], "text": "..."}, 0, {} ``` In your dataset config, set: ```yaml data: return_multi_modal_inputs: False ``` ### Specific Changes - Enhanced `AsyncRolloutRequest` to handle multimodal data from tools - Updated `add_tool_response_messages` to process multimodal content - Added documentation for multimodal tool support in the RST docs - Fixed configuration in example YAML files - Added proper handling of multimodal inputs in the rollout system ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [X] New CI unit test(s) are added to cover the code path. - [X] Rely on existing unit tests on CI that covers the code path.	2025-07-04 16:32:22 -07:00
none0663	1b891dc0fb	[cfg] fix: pickleing error in multiprocessing in the reward_fn (#2239 ) ### What does this PR do? > Fix "Can't pickle local object" error when using custom reward functions in multiprocessing When I import `compute_score` from my own Python file, the `file_path` is not None and it will call the `wrapped_fn ` method. You can view the relevant code at the following： `e96f0fbf44/verl/trainer/ppo/reward.py (L25-L57)` When I set` reward_model.reward_manager=prime`, it will call the `ProcessPoolExecutor` to use asyncio, leading to the error: `"Can't pickle local object 'get_custom_reward_fn.<locals>.wrapped_fn'".` Root Cause: The nested closure `wrapped_fn` created in `get_custom_reward_fn() `is unpicklable for the following reasons: - Python's `pickle` cannot serialize local functions (functions defined inside another function). - The closure dynamically captures variables (`raw_fn` and `reward_kwargs`) from its outer scope. This breaks multiprocessing workflows (e.g., `SubprocVecEnv`, `multiprocessing.Pool)` that rely on pickling. --------- Co-authored-by: zelongwang <wang@zelongs-MacBook-Pro.local> Co-authored-by: H <linhaibin.eric@gmail.com>	2025-07-04 16:06:09 -07:00
Frederick Robinson	dbd4ff189b	[data] feat: add interface for user-defined curriculum sampler (#2314 ) ### What does this PR do? This PR introduces a flexible interface that allows users to plug in their own Sampler implementations. This is particularly useful for advanced training strategies like curriculum learning, where the sampling policy evolves over time to progressively present the model with increasingly difficult tasks. Curriculum learning can significantly accelerate training convergence and improve generalization, especially in complex domains. By decoupling the Sampler, users can implement task- or environment-specific curricula—for instance, starting with simpler examples and gradually incorporating harder ones, or adapting sampling based on the model’s competence. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... There have been previous attempts to add specific samplers for curriculum learning ( [search 1](https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+sampler) [search 2](https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+curriculum) ) but they commit to specific implementations of the curriculum. This PR just adds an interface so that users can supply their own implementation. This approach was suggested in [this comment](https://github.com/volcengine/verl/pull/759/files#r2030220009). - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```bash ./examples/sglang_multiturn/run_qwen0.5b_gsm8k_multiturn_curriculum.sh ``` This is a run using the example sampler implementation. here is the diff of that script with an existing one in case we decide to omit the example runner in this PR. ```bash diff ./examples/sglang_multiturn/run_qwen0.5b_gsm8k_multiturn_curriculum.sh ./examples/sglang_multiturn/run_qwen3-4b_gsm8k_multiturn.sh 15,17c15 < data.curriculum.curriculum_class="RandomCurriculumSampler" \ < data.curriculum.curriculum_class_path="verl.utils.dataset.curriculum_sampler" \ < data.dataloader_num_workers=0 \ --- > data.train_batch_size=256 \ 20d17 < data.train_batch_size=256 \ 24c21 < actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ --- > actor_rollout_ref.model.path=Qwen/Qwen3-4B \ ``` ### Specific Changes This PR exposes a new interface so that users can implement their own `Sampler`. I also provide a trivial implementation of this interface - `RandomCurriculumSampler` as an example / for test purposes. ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Fred <frederrx@amazon.com> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>	2025-07-04 10:53:31 -07:00
H	c936ec7d5c	[trainer, cfg] feat: add BaseConfig for all dataclass configs. Introduce dataclass for algorithm related configs (#2147 ) ### What does this PR do? This PR introduces a BaseConfig class that bridges dataclass and hydra's DictConfig in the codebase. In this PR, the algorithm related configs and profiler related configs are instantiated as dataclass upfront for both main_ppo and main_dapo. The config related changes are expected to be backward compatible (supporting xx_config.get() API) Besides, this PR also moves the profiler related files under verl.utils.debug to verl.utils.profiler.xx. The `verl.utils.debug.performance.py` is kept for backward compatibility purpose and we'll drop it in later versions. Main principle: - users are not forced to use dataclass configs. All changes are backward compatible. - dataclass configs are converted upfront on a per entrypoint basis. Here we target main_ppo.py and main_dapo.py, and the other recipes' entrypoints are left intact. - the new dataclass are intentionally set to be frozen. Configs should not be mutable. Whenever a new field is needed, we should make a copy of the config for a new one. - whenever a dataclass config is introduced, we encourage having simple cpu-based unit tests to test the basic functionality of functions that rely on it (e.g. the grpo adv estimation in core_algorithm.py). and then also update all type annotation for the impacted functions. - in the yaml file, `_target_` field should be specified for dataclass conversion. e.g. `_target_: verl.xxx.XXConfig` The PR is built on top of @liuzhenhai93 's contribution. ### Checklist Before Describing the Details - [x] Searched for similar PR(s). - [x] PR title is in the format of: `[modules] type: Title` - modules: `trainer, cfg` - type: `feat` ### Test - Added comprehensive unit tests in `tests/trainer/config/test_algorithm_config_on_cpu.py`, `test_base_config_on_cpu.py` - Tests cover dataclass creation, nested configuration handling, backward compatibility, and integration with core algorithms - All tests pass successfully, validating the functionality and integration with existing code ### High-Level Design The design introduces three dataclasses: 1. `KLControlConfig`: Handles KL control parameters (type, kl_coef, horizon, target_kl) 2. `PFPPOConfig`: Manages preference feedback PPO parameters (reweight_method, weight_pow) 3. `AlgorithmConfig`: Main algorithm configuration containing all fields from the YAML config The conversion uses the existing `verl.utils.omega_conf_to_dataclass` utility to seamlessly convert from OmegaConf DictConfig to typed dataclasses. ### API and Usage Example The API maintains backward compatibility while providing type-safe access: ```python # Before (DictConfig) if config.algorithm.use_kl_in_reward: kl_penalty = config.algorithm.kl_penalty kl_coef = config.algorithm.kl_ctrl.get("kl_coef", 0.001) # After (Dataclass) - Type-safe with IDE support algorithm_config = omega_conf_to_dataclass(config.algorithm) if algorithm_config.use_kl_in_reward: kl_penalty = algorithm_config.kl_penalty # Type-safe access kl_coef = algorithm_config.kl_ctrl.kl_coef # Nested config access # Backward compatibility maintained gamma = algorithm_config.get("gamma", 1.0) # Still works # other cases profiler_config = omega_conf_to_dataclass(config) self.assertEqual(profiler_config.discrete, config.discrete) self.assertEqual(profiler_config.all_ranks, config.all_ranks) self.assertEqual(profiler_config.ranks, config.ranks) assert isinstance(profiler_config, ProfilerConfig) with self.assertRaises(AttributeError): _ = profiler_config.non_existing_key assert config.get("non_existing_key") == profiler_config.get("non_existing_key") assert config.get("non_existing_key", 1) == profiler_config.get("non_existing_key", 1) assert config["discrete"] == profiler_config["discrete"] from dataclasses import FrozenInstanceError with self.assertRaises(FrozenInstanceError): profiler_config.discrete = False ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit run --show-diff-on-failure --color=always --all-files` - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path. Note: This change is fully backward compatible and does not break any existing APIs. The dataclass provides the same interface as the original DictConfig while adding type safety and better structure. --------- Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-07-04 08:12:09 -07:00
Zhen	5c39b51b4b	[hardware] feat: support ray actor sharing situation on ASCEND NPU (#2341 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Support ray actor sharing with other actors on ASCEND NPU. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. Not related. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. Not related. ### High-Level Design > Demonstrate the high-level design if this PR is complex. Not related. ### Specific Changes 1. Define global var `VISIBLE_DEVICE_PREFIX` in `verl/utils/device.py` to get `CUDA` or `ASCEND_RT` prefix automatically. 2. Add support for ASCEND NPU when calling `RayClassWithInitArgs` object with param `sharing_with` specified. `433544f0be/verl/single_controller/ray/base.py (L206-L214)` 3. No function params names changed for consistancy. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-04 14:50:52 +08:00
Blue Space	8883b29d86	[trainer] fix: pre-commit broken by #2354 (#2358 ) ### What does this PR do? fix: pre-commit broken by #2354 ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-04 14:50:34 +08:00
linxxx3	0d2af476b6	[rollout] fix: #1646 stop words for sglang rollout (#1991 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Reference related github issues and PRs if that help review. as title，fix #1646 ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-07-03 23:44:44 -07:00
Blue Space	ebb21b7fc7	[docker] refactor: Migrate images to verlai, support latest flash attention and newer CUDA versions in future (#2085 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Migrate images to verlai, upgrade CUDA support to 12.6 and support latest flash attention ```txt docker ├── README.md ├── verl0.4-cu124-torch2.6-fa2.7.4 │ ├── Dockerfile.app.sglang.vllm.mcore0.12 │ ├── Dockerfile.app.sglang.vllm.mcore0.13.preview │ ├── Dockerfile.app.vllm.mcore0.12 │ ├── Dockerfile.app.vllm.mcore0.13.preview │ ├── Dockerfile.base │ └── README.md ├── verl0.5-cu126-torch2.7.1-fa2.8.0 │ ├── Dockerfile.app.sglang.mcore0.12 │ ├── Dockerfile.app.sglang.mcore0.13.preview │ ├── Dockerfile.base.fi0.2.6 │ └── README.md └── verl0.5-preview-cu128-torch2.7.1-fa2.8.0 ├── Dockerfile.app.sglang.megatron ├── Dockerfile.base.fi0.2.6 └── README.md ``` - verlai/verl - verl0.4 - base - app.sglang.vllm.mcore - app.vllm.mcore - verl0.5 - base - app.sglang.mcore - app.vllm.mcore [may not support now, for debug] - verl0.5-preview - base - app.sglang.mcore - app.vllm.mcore [may not support now, for debug] ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-07-04 14:32:02 +08:00
Shizhan Lu	a53fb3089e	[ckpt] fix: edit esi doc (#2354 ) This PR addresses the "ESI" comprehension issue left by the previous PR (https://github.com/volcengine/verl/pull/2192). This PR refines `ppo_trainer.yaml` by expanding the esi_redundant_time comment to define ESI (Elastic Server Instance) and draw a parallel to a training plan. In `ray_trainer.py`, it clarifies ESI-related checkpoint-saving conditions. These edits boost code readability and maintainability.	2025-07-04 13:34:12 +08:00
Blue Space	18c6ffcf08	[megatron] fix: optimizer scheduler misalignment with FSDP (#2303 ) ### What does this PR do? Fix learning rate divergence with FSDP, megatron.training's default lr decay policy is linear, but FSDP has not supported this, so return back to `constant`. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-04 10:39:40 +08:00
Zhen	212d81463c	[perf] feat: support entropy checkpointing without rmpad or sp (#2342 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Support entropy checkpointing without rmpad or sp ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. Not related. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. Not related. ### High-Level Design > Demonstrate the high-level design if this PR is complex. Not related. ### Specific Changes Add support for entropy checkpointing without remove_padding or SP ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-04 08:32:03 +08:00
OC	aba26845f7	[tool] fix: avoid exception when sandbox return None (#2346 ) ### What does this PR do? result.strip() may raise exception when it is None. Fixed by return None for metrics and score, because they are not available yet for code sandbox. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+sandbox+ - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-04 08:31:37 +08:00
Joel	0332866857	[algo] feat: mask out observation token in GAE (#2337 ) ### What does this PR do? Mask out observation tokens in GAE for multi turn training. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-04 08:17:25 +08:00
H	11ee5125d2	[ci] chore: add gemini code assistant config (#2349 ) ### What does this PR do? add gemini code assistant config. The example code reviews are in https://github.com/eric-haibin-lin/verl/pull/17#pullrequestreview-2977403886. The threshold is set to high to avoid too many review comments. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-03 16:53:33 -07:00
Loong	7db7f32446	[megatron, fsdp, doc] feat: implement GPG loss. Add GPG advantage estimator implementation. (#2057 ) …and integrate into PPO training scripts and core algorithms ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Implement GPG loss (GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning) which can achieve comparable performance in less training time. ### Test some training records: ![image](https://github.com/user-attachments/assets/e82c5913-94e2-47bf-96b3-b42eac546a18) ![image](https://github.com/user-attachments/assets/2ec4cf1b-a9ee-48d0-b9c5-cbeade1b3a1b) ### Specific Changes > List the specific changes. Add doc of GPG in docs/algo/gpg.md Add the addvantage estimation function of gpg in verl/trainer/ppo/core_algos.py. Add compute_gpg_loss function of gpg in verl/ trainer/ppo/core_algos.py. Add a conditional branch to determine whether to use the GPG loss in verl/workers/actor/dp_actor.py and megatron_actor.py Add example scripts of GPG in examples/gpg_trainer. ### Usage Example ```shell # Add code snippet or script demonstrating how to use this bash examples/gpg_trainer/run_qwen2-7b_math.sh ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). --------- Co-authored-by: H <linhaibin.eric@gmail.com>	2025-07-03 15:22:19 -07:00
Yuge Zhang	bc2cc6b34b	[rollout] feat: Allow customization of async server class (#2326 ) ### What does this PR do? This PR contains two aspects: 1. Introduction of a new configuration option `actor_rollout_ref.rollout.custom_async_server` to allow users to customize the async server class. 2. Make `load_extern_type` more robust and support prefix like `pkg://` or `file://`, while non-breaking to any existing features and supported paths. Without this PR, it's impossible to use a customized version of AsyncvLLMServer in customized use case. We are currently using a set of ugly monkey patch to achieve this goal. Ultimately I believe `rollout.name` and `rollout.custom_async_server` can be combined. But `rollout.name` is currently referenced in too many places. It's quite difficult for me to handle all of them. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: [link](https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+async+server) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test I have tested on our internal pipelines. The new patch works as expected and the old async servers still work as usual. ### API and Usage Example Our config is something like this: ```yaml hydra: searchpath: - pkg://verl/trainer/config defaults: - ppo_trainer - _self_ data: filter_overlong_prompts: false actor_rollout_ref: rollout: mode: async custom_async_server: path: pkg://mypackage.verl.async_server name: CustomizedvLLMServer ``` ### High-Level Design This PR is pretty straightforward. ### Specific Changes Update the docs. Update behavior in agent loop and async server manager. Update `load_extern_type` implementation. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: I think it's quite troublesome to add a CI for this feature. I can add one if you feel necessary. - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-03 17:52:20 +08:00
Yan Bai	433544f0be	[megatron] feat: use mbridge as megatron adaptor (#2064 ) ### What does this PR do? MBridge provides a seamless bridge between Hugging Face models and Megatron-Core's optimized implementation for efficient distributed training and inference. It also offers necessary tools and processes for integrating Reinforcement Learning (RL) with Megatron. see https://github.com/ISEEKYAN/mbridge mbridge is developed and maintained by NVIDIA, providing functions for: - modeling HF models with megatron - loading/saving HF format weights with no memory overhead - online export parameter to rollout engine with per-tensor-generator - RL specific optimization and friendly APIs on Megatron side. Some early access features for megatron. with mbridge, the direct improvement is: - a clean interface for megatron - no offline dist_ckpt conversion needed - no offline model merger needed ### Test tested with GSM8k qwen2-7B-instruct <img width="486" alt="image" src="https://github.com/user-attachments/assets/dd271e8a-9167-470f-8b0c-dde2bcfe1800" /> ### High-Level Design add an option `actor_rollout_ref.actor.megatron.use_mbridge`, default is False. Set it to true for enable. when enabled, the model_instantiate/model_init_load/checkpoint_save/checkpoint_load/per_tensor_generator will be taken over by mbridge ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example add this line to the script: ``` actor_rollout_ref.actor.megatron.use_mbridge=True \ ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-07-03 12:49:51 +08:00
Shiye Lei	0ea96a2673	[cfg] chore: add non-negative expected_len assertion (#2330 ) #### Summary Added a assertion when the overlong buffer configuration is invalid, specifically when `overlong_buffer_len > max_response_length` which causes `expected_len` to be negative. #### Problem When `overlong_buffer_len` is greater than `max_response_length`, the calculated `expected_len` becomes negative: ``` expected_len = max_resp_len - overlong_buffer_len # Results in negative value ``` This causes all reasonable response lengths to be penalized. #### Solution Added a `assert self.max_resp_len >= self.overlong_buffer_cfg.len` in DAPORewardManager #### Changes Made 1. File: `verl/workers/reward_manager/dapo.py`	2025-07-03 10:57:26 +08:00
rhiremat	4a846aa8f5	[hardward] chore: Enable Generation of Wheel File During Docker Build (#2332 ) ### What does this PR do? The PR enhances the Dockerfile.rocm by generating a Python wheel (.whl) as a part of Docker build process. Changes introduced: - Add python setup.py bdist_wheel immediately after pip install -e . --no-deps - The wheel is created inside the container under the dist/ directory Co-authored-by: HIREMATH <rhiremat@ctr2-alola-ctrl-01.amd.com>	2025-07-02 13:10:51 -07:00
none0663	1a4b9779ec	[cfg] fix: Security Enhancement Block Dangerous Modules in Sandbox Environment (#2170 ) ### What does this PR do? > This PR enhances security in our sandbox environment by disabling access to potentially dangerous Python modules. 1. Added blocking for subprocess and ctypes modules by setting them to None in sys.modules 3. Prevents execution of system commands via subprocess.run(), subprocess.Popen(), etc. 4. Blocks low-level system access through ctypes which could bypass Python security restrictions some built-in functions that can be destructive like below ``` import subprocess subprocess.run("rm -rf ", shell=True) ``` ``` import ctypes libc = ctypes.CDLL(None) libc.system(b"rm -rf /") ``` --------- Co-authored-by: zelongwang <wang@zelongs-MacBook-Pro.local>	2025-07-02 22:30:33 +08:00
Joel	29f50e7dbe	[recipe] feat: add retool recipe (#2233 ) Add retool training recipe described in [ReTool: Reinforcement Learning for Strategic Tool Use in LLMs](https://arxiv.org/abs/2504.11536).	2025-07-02 20:05:43 +08:00
CurryRice233	2a25e31d29	[doc] feat: FSDP forward prefetch and entropy memory optimizations (#2322 ) ### What does this PR do? @eric-haibin-lin As this comment says https://github.com/volcengine/verl/pull/1927#issuecomment-3018262885, add FSDP forward prefetch and entropy calculation memory optimization to performance tuning guide. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-02 19:21:48 +08:00
Chunyu	1bdf4d2bc7	[hardware, recipe, ci] feat: Support fsdp peft sft on npu (#2240 ) ### What does this PR do? - Support fsdp peft sft on npu. - Add CI actions to maintain peft sft and sequence parallelism function on npu. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example Run examples/sft/gsm8k/run_qwen_05_peft_sp2_npu.sh on gpu and npu: ```xshell torchrun --standalone --nnodes=1 --nproc_per_node=8 \ -m verl.trainer.fsdp_sft_trainer \ data.train_files=$HOME/data/gsm8k/train.parquet \ data.val_files=$HOME/data/gsm8k/test.parquet \ data.prompt_key=extra_info \ data.response_key=extra_info \ optim.lr=1e-4 \ data.prompt_dict_keys=['question'] \ +data.response_dict_keys=['answer'] \ data.micro_batch_size_per_gpu=64 \ model.partial_pretrain=Qwen/Qwen2.5-0.5B-Instruct \ trainer.default_local_dir=$save_path \ trainer.project_name=gsm8k-sft \ trainer.experiment_name=gsm8k-sft-qwen-2.5-0.5b-instruct \ trainer.logger=['console'] \ trainer.total_epochs=2 \ trainer.default_hdfs_dir=null $@ \ model.lora_rank=32 \ model.lora_alpha=16 \ model.target_modules=all-linear \ model.strategy=fsdp \ ulysses_sequence_parallel_size=2 \ use_remove_padding=true ``` Mean absolute error of train loss: ![train_loss_mae](https://github.com/user-attachments/assets/f0c436ae-4d92-44c9-bca8-0b7cde1f4cfe) ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes Enable sp: ```xhell --ulysses_sequence_parallel_size=2 --use_remove_padding=true ``` NPU does not support sdpa2, so we need to set model.strategy: ``` --model.strategy=sdpa ``` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-02 15:06:06 +08:00
Chayenne	82d1ef5af2	[sglang] feat: Repeat sampling parameter n into requests of GRPO in SGLang (#2258 ) ### What does this PR do? For a large-scale GRPO with a huge sampling parameter n (say 128 or more), we take the sampling times n out to directly duplicate the requests. This is beneficial if our n is relatively large. But we need to check the order of the input and output requests. We create unique UIDs for each prompt to enable grouping in GRPO advantage computation 1. Each prompt gets a unique UID that is repeated n times along with the prompt 2. After generation, responses are aligned with prompts using UID matching 3. In GRPO advantage computation, UID groups responses from the same prompt Note that we only enable this for sglang `_req_level`, i.e., in ma ulti-turn setting GRPO. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Will add experiments on the change of rollout time. ### API and Usage Example N/A ### High-Level Design N/A ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: zhaochenyang <zhaochenyang20@gmail.com>	2025-07-02 09:27:24 +08:00
Yuyang Ding	becdb56795	[CI] fix: replace private model in CI test (#2295 ) ### What does this PR do? The CI test remote GenRM uses a private model `dyyyyyyyy/Qwen2.5-1.5B-GenRM-QueryOnly`. For the stability of the CI, this model has been uploaded to the official HF repository, i.e., `verl-team/GenRM-CI-Test-1.5B`, and the model invocation in the CI test has been updated accordingly. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example See `What does this PR do?` ### High-Level Design See `What does this PR do?` ### Specific Changes See `What does this PR do?` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-01 15:40:06 +08:00
Chi Zhang	211984b66f	[doc] fix: Update ascend_quick_start.rst (#2293 ) ### What does this PR do? Fix doc ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-01 14:45:22 +08:00
Zhen	ba026ef332	[ci, doc] fix: fix transformers version dependency on Ascend NPU (#2291 ) ### What does this PR do? Flash Attention2 in `transformers==4.53.0` is not work on Ascend NPU due to [PR line here](`3457e8e73e/src/transformers/modeling_flash_attention_utils.py (L109C5-L109C23)`) in `transformers`. In order to not affect `e2e_ascend` CI, we have to set `transformers==4.52.4` for Ascend NPU situation by force now. Corresponding bugfix in `transformers` will be conducted as soon as possible, after newer transformers version containing bugfix released, we will update the transformers version dependency in verl again. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. Not related ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. Not related ### High-Level Design > Demonstrate the high-level design if this PR is complex. Not related ### Specific Changes `transformers` version in `requirement-npu.txt` and document ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-01 13:45:17 +08:00
H	b66901505f	[doc] chore: add contribution guide (#2290 ) ### What does this PR do? add contribution guide TODO: add one specific doc for the workflow of adding new models ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`	2025-07-01 09:55:40 +08:00
H	00a10a8ef3	[ci] refactor: reduce ruff line-length from 300 to 120 (#2287 ) ### What does this PR do? Previously the ruff line-len is too large, making it hard for users to view code. If we keep the config, manually created short lines will be formatted to long lines as well. This PR contains 3 commits: - df4bbfca62f41d972c48c8a76088ae2ac29691cf set line len to 120 and run pre-commit auto-format - 9d03f183edd9fff4e22215cacacf62c06b7b41d3 let devin fix the multi-line code - 9fc8d436f5007535fad3dc49983b01d0d457be9c skip lint for test_sglang_async_rollout_sf_tools.py. manually adjust format for rope_utils.py - last two commits: 1. merge with main 2. run lint after merge. add test_sglang_async_rollout_sf_tools.py and scripts/legacy_model_merger.py to lint.exclude ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test This PR relies on CI for testing. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-07-01 09:54:40 +08:00
Shawn/Yuxuan Tong	0508af25b6	[doc] feat: more resources (#2284 ) ### What does this PR do? Add some resources about verl to the documentation. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-30 13:50:44 -07:00
Qunhong Zeng	024a8b8578	[ckpt, doc] chore: add backward compatibility for model merger and sync docs (#2251 ) ### What does this PR do? This PR add missing doc changes in https://github.com/volcengine/verl/pull/2125: - Synchronize checkpoint content and verl.model_merger with the latest code - Add content on how to merge checkpoints in the quick start documentation to help users understand how to merge checkpoints ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-30 18:42:59 +08:00
Blue Space	8b33abd84f	[megatron] feat: add megatron memory log (#2272 ) ### What does this PR do? Log memory footprints in wandb during running like FSDP does. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-30 15:27:02 +08:00
Qunhong Zeng	6d9ac2f7b8	[algo] fix: correctly aggregate kl metrics in PPO actor (#2259 ) ### What does this PR do? This PR fix an issue in dp_actor where `actor/kl_loss` and `actor/kl_coef` were being continuously overwritten during the micro-batch processing loop. Previously, the long-lived `metrics` dictionary was updated directly, causing the value for these metrics to reflect only the final micro-batch of any given step, rather than an aggregation of all micro-batches within that step. This change refactors the logic to align the collection of all metrics, now `kl_loss` is collected for each micro-batch, the same as other metrics like `pg_loss`. > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-30 15:26:07 +08:00
CongLin	7ac0d98f09	[trainer, vllm] feat: add lora exclude_modules to support VL model lora training (#2182 ) ### What does this PR do? Regarding multimodal models, vLLM currently only supports adding LoRA to language model. We can use exclude_modules in lora config to exclude the ViT part from applying lora finetuning. Anyway, just prepare this feature for any possible use. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+lora+exclude+is%3Aopen. you will see my closed pr - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Qwen2.5-VL-7B + GRPO + geo3k: - blue: full parameters - yellow: all-linear + exclude visual, lora rank 64 - red: all-linear w/o exclude visual, lora rank 64 - purple: ["q_proj","gate_proj"], + exclude visual, lora rank 16 - The red directly failed as expected with KeyError: 'blocks.0.attn.qkv.base_layer.weight'. Any mismatching in module names will also fail directly, so successful runs validate the correctness. - The val generations of lora VLM all look normal. - Add "Running GEO3K VLM GRPO E2E lora training tests on 8 L20 GPUs with rmpad using function rm" test to e2e_ppo_trainer_vllm_vlm ![企业微信截图_1750837404244](https://github.com/user-attachments/assets/336261f0-5260-45e2-8312-86eb1ae375a5) ![企业微信截图_1750837525244](https://github.com/user-attachments/assets/eafae66e-6b61-4db4-853b-a3a0425be2aa) ![企业微信截图_17508374562057](https://github.com/user-attachments/assets/f01b098a-b383-4cc6-8f14-d51978121b59) ![企业微信截图_17508374786794](https://github.com/user-attachments/assets/75b4d566-cb63-4b63-9b85-300e02711739) ![企业微信截图_17508374937879](https://github.com/user-attachments/assets/8ed2979d-30b7-4d4f-85ad-0fee6aded619) ### API and Usage Example For Qwen2.5VL, set `actor_rollout_ref.model.exclude_modules='.visual.'` It should be similar for other VLMs, e.g. `actor_rollout_ref.model.exclude_modules='.vision_tower.'` for kimi-vl. To avoid failure for special architectures, specifying actor_rollout_ref.model.target_modules is recommended over setting actor_rollout_ref.model.target_modules=all-linear ### High-Level Design The main conflict is that unlike Peft which only adds base_layer to validated target_modules, vllm adds base_layer to all-linaer modules (q/k/v/gate/up/down) of LLM with lora applied. When dealing with modules to be stacked in vllm (qkv, gate_up), base_layer must be added to their module name, which can unexpectedly involve all-linear modules of the visual architecture, as is in current `FSDPVLLMShardingManager.update_params.replace_lora_wrapper`. My solution is prioritizing lora exclude_modules to ensure that lora and base_layer will not be added to ViT, while the rest cases should remain unchanged. ### Specific Changes - add exclude_modules field to ppo_trainer.yaml - adapt check_target_module_exists from Peft for standard target/exclude_modules checking - refactor replace_lora_wrapper in sharding_manager/fsdp_vllm.py for correctly matching base_layer modules > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-30 12:15:48 +08:00
H	52065c6405	[BREAKING][rollout] refactor: drop vllm v0.5.4 and v0.6.3 support (#2257 ) ### What does this PR do? This PR removes support for vLLM versions 0.5.4 and 0.6.3 from the verl repository, completing a comprehensive cleanup of legacy version-specific code branches. The changes simplify the codebase by eliminating conditional logic and version-specific implementations, requiring users to upgrade to vLLM 0.7.0 or later (recommended: vLLM 0.8.3+). Key Changes: - Deleted legacy rollout implementations (`fire_vllm_rollout.py`, `vllm_rollout.py`, `test_vllm_hf_loader.py`) - Removed version-specific directories (`vllm_v_0_5_4`, `vllm_v_0_6_3`) - Simplified sharding managers by removing `customized_vllm` flag conditionals - Updated configuration files to remove deprecated options (`use_fire_sampling`) - Cleaned up documentation and environment variable exports ### Checklist Before Starting - [x] Search for similar PRs: No similar PRs found for this specific cleanup - [x] Format the PR title as `[BREAKING][vllm, rollout, worker] refactor: Remove vLLM 0.5.4 and 0.6.3 support` - Modules: `vllm`, `rollout`, `worker` (primary affected components) - Type: `refactor` (code cleanup and simplification) - Breaking: Yes, requires vLLM version upgrade ### Test This PR has been validated through: - CI Pipeline: All existing tests pass with vLLM 0.7.0+ (27 checks pending/running) - Version Detection: New version check logic properly rejects vLLM 0.5.4/0.6.3 with clear error messages - Merge Conflict Resolution: Successfully resolved complex conflicts during main branch merge - Pre-commit Checks: All linting and formatting requirements satisfied ### API and Usage Example Breaking Changes: - vLLM Version Requirement: Minimum supported version is now 0.7.0 (recommended: 0.8.3+) - Removed Configuration Options: `use_fire_sampling` no longer available in config files - Environment Variables: `VLLM_ATTENTION_BACKEND=XFORMERS` exports removed (not needed for vLLM 0.7.0+) Migration Guide: ```bash # Before: vLLM 0.5.4/0.6.3 with custom flags pip install vllm==0.6.3 export VLLM_ATTENTION_BACKEND=XFORMERS # After: vLLM 0.8.3+ with V1 API pip install vllm>=0.8.3 export VLLM_USE_V1=1 # Recommended for optimal performance ``` Updated Configuration: ```yaml # generation.yaml - removed use_fire_sampling option rollout: name: vllm_rollout # use_fire_sampling: False # <- REMOVED # Use standard vLLM rollout without legacy options ``` ### High-Level Design ```mermaid graph TB subgraph "Before: Multi-Version Support" A1[vLLM Version Check] --> B1{Version 0.5.4?} A1 --> B2{Version 0.6.3?} A1 --> B3{Version 0.7.0+?} B1 --> C1[Legacy vllm_v_0_5_4 Code] B2 --> C2[Legacy vllm_v_0_6_3 Code] B3 --> C3[Modern vLLM Code] end subgraph "After: Simplified Support" A2[vLLM Version Check] --> B4{Version >= 0.7.0?} B4 -->\|Yes\| C4[Modern vLLM Code Only] B4 -->\|No\| C5[Clear Error Message] end ``` ### Specific Changes Deleted Files: - `verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py` - `verl/workers/rollout/vllm_rollout/vllm_rollout.py` - `tests/workers/rollout/rollout_vllm/test_vllm_hf_loader.py` - `verl/third_party/vllm/vllm_v_0_5_4/` (entire directory) - `verl/third_party/vllm/vllm_v_0_6_3/` (entire directory) - `pytest.ini` Modified Core Files: - `verl/third_party/vllm/__init__.py`: Simplified version detection with clear error messages - `verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py`: Removed cache engine management and version conditionals - `verl/workers/sharding_manager/fsdp_vllm.py`: Dropped `customized_vllm` flag logic - `verl/workers/sharding_manager/megatron_vllm.py`: Simplified weight loading and cache management Configuration Updates: - `verl/trainer/config/generation.yaml`: Removed `use_fire_sampling` option - `verl/trainer/config/ppo_trainer.yaml`: Removed `use_fire_sampling` option - `tests/special_sanity/check_api_docs.py`: Removed `LLMEngine` from whitelist Documentation Updates: - `docs/start/install.rst`: Updated to recommend vLLM 0.8.3+ with `VLLM_USE_V1=1` - `docs/perf/perf_tuning.rst`: Updated performance recommendations - Removed 42+ `VLLM_ATTENTION_BACKEND=XFORMERS` exports from bash scripts Reverted Changes: - `.github/workflows/vllm.yml`: Restored original container image names - `docs/faq/faq.rst`: Restored original apptainer commands - `docs/ascend_tutorial/ascend_quick_start.rst`: Reverted all modifications - `examples/tuning//`: Restored original `nproc_per_gpu` settings ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs): Updated install and performance tuning docs - [x] Add unit or end-to-end test(s): Existing CI tests validate the changes; legacy-specific tests were removed as intended - [x] CI Request*: Once PR is ready, message will be sent to `ci-request` channel in verl Slack workspace --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-06-29 19:27:22 -07:00
Joel	72429f21b7	[rollout] feat: add zeromq vllm distributed executor (#2246 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-30 09:24:44 +08:00
H	2805ce9137	[doc, ci] fix: fix sandbox doc and enhance CI trigger filter and doc error checking (#2267 ) ### What does this PR do? - fix sandbox doc - enhance CI trigger filter and doc error checking - add a rule to check PR description ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-30 08:02:03 +08:00
Chi Zhang	86ef66ebe6	[trainer] fix: fix split placement (#2227 )	2025-06-29 12:42:51 -07:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	afee3acb5d	[rollout] fix: Make `free_cache_engine` option workable in latest vLLM/SGLang (#1464 ) ### Checklist Before Starting - [X] Search for similar PR(s). ### What does this PR do? Make `free_cache_engine` option workable in latest vLLM/SGLang ### High-Level Design It looks like `actor_rollout_ref.rollout.free_cache_engine` control option only works for vLLM version 0.5.4 and 0.6.3, the sleep / wake up mode in vLLM engine, as well as release / resume memory occupation in SGLang is enabled by default and there's no way to turn them off. While always alllowing inference engine to free cache can be ideal, it's unfortunately not supported on some devices, such as AMD Mi250x, since it doesn't support virtual memory management: https://github.com/vllm-project/vllm/pull/12695#issuecomment-2633919751 So we would need to be able to turn it off so that verl can run on those devices. In addition, it looks like we no longer need to enforce eager in latest vLLM when we choose to free cache, so this PR also lifted this restriction. ### Additional Info. - Issue Number: None - Training: both FSDP and Megatron - Inference: both vLLM and SGLang ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [X] Add CI test(s) if neccessary. --------- Signed-off-by: Hollow Man <hollowman@opensuse.org> Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>	2025-06-29 08:09:54 -07:00
Yuyang Ding	072725c385	[trainer, recipe] feat: add support for external generative reward models (#2121 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Support External Generative Reward Model. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-29 14:42:14 +08:00
H	7559a6a938	[doc] fix: add time info for each doc, assert sphinx warning in CI (#2255 ) ### What does this PR do? add time info for each doc, assert sphinx warning in CI. The time info is helpful for the community to identify docs that may be too old before it's actually removed or updated. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-06-29 11:58:35 +08:00
H	bd1be62df0	[ci] fix: fix cpu dataset git download error (#2256 )	2025-06-28 16:06:01 -07:00
Tianyun Zhao	fda87b8046	[worker] fix: OOM on first iteration in multi-turn RL (#2253 ) ### What does this PR do? Fix issue #2189. This bug was introduced in #1911, which relocated `resume_memory_occupation` in resharding phase before calling `get_torch_device().empty_cache()`. Calling `resume_memory_occupation` without emptying cache before will cause OOM on resharding phase of the first iteration, which prevents the example `run_qwen2.5-3b_gsm8k_multiturn` to run. Re-adding `get_torch_device().empty_cache()` solves the problem, and allows the example to run again. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-28 17:35:33 +08:00
H	a306434806	[doc] chore: version bumped to v0.4.1.dev and doc fixes (#2226 ) v0.4.1 is released and bump the version number to v0.4.1.dev	2025-06-27 20:14:23 -07:00
OC	ce6a7b8449	[rollout] fix: use flashattn3 backend in sglang to avoid error in tool call (#2244 ) ### What does this PR do? Fix error found in https://github.com/volcengine/verl/issues/2242 In none-hopper gpu, llm can not use tools because sglang use flashinfer in default on this type of hardware. Changed backend to flashattn3 to avoid this error ### Test ROLLOUT_NAME=sglang pytest -svvv tests/experimental/agent_loop/test_basic_agent_loop.py ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-28 09:35:53 +08:00
Yaowei Zheng	8ba2f27cb2	[misc] chore: pin transformers under 4.53 (#2241 ) ### What does this PR do? Transformers 4.53 does not work with the current vLLM for Qwen2-VL models: https://github.com/vllm-project/vllm/issues/19833#issuecomment-3011175952 > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-27 18:12:18 +08:00
Yaowei Zheng	e96f0fbf44	[model] fix: separate minicpmo data (#2212 ) ### What does this PR do? This PR moves the data process code of minicpm-o to recipes to avoid breaking the current function Fixes #2178 > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-27 14:20:40 +08:00
Xiang Long	b816d17056	[sglang] feat: Add multi-interaction registry support and testing (#2184 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This PR implements multi-interaction support in SGLangRollout, enabling sample-level interaction selection similar to the existing tools system. The implementation includes a new interaction registry system that allows multiple named interactions to be configured and used within a single rollout instance. #1630 Core Implementation - New Interaction Registry System: Created verl/interactions/utils/interaction_registry.py with functions to dynamically load and manage multiple interaction instances from configuration files - Enhanced SGLangRollout: - Replaced single interaction attribute with interaction_map: dict[str, BaseInteraction] - Updated _initialize_interactions() method to support multiple interactions via registry - Modified interaction selection logic to use interaction_kwargs.name for sample-level binding - Configuration Updates: Added name field support in interaction config format with automatic name generation fallback Data Processing - Updated GSM8K Preprocessing: Modified examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name field in interaction_kwargs - Enhanced Configuration: Updated examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml with explicit name field Testing & Quality - Comprehensive Test Suite: Added tests/interactions/test_interaction_registry.py with full coverage of registry functionality - Integration Tests: Created tests/workers/rollout/test_sglang_multi_interaction.py for multi-interaction scenarios - Updated Existing Tests: Modified existing interaction tests to support new name attribute and configuration format - Error Handling: Added validation for duplicate names, missing interactions, and edge cases Backward Compatibility - Graceful Degradation: When no interaction config is provided, system works without interactions (empty interaction_map) - Default Name Handling: Falls back to "gsm8k" when no name is specified in interaction_kwargs - Existing API Preservation: All existing interaction functionality remains unchanged Key Features 1. Sample-Level Selection: Each sample can specify which interaction to use via interaction_kwargs.name 2. Registry Pattern: Similar architecture to existing tools system for consistency 3. Automatic Naming: Intelligent name generation from class names (e.g., Gsm8kInteraction → gsm8k) 4. Duplicate Prevention: Runtime validation prevents naming conflicts 5. Flexible Configuration: Supports both explicit names and automatic derivation	2025-06-27 14:00:37 +08:00
Shizhan Lu	d8ecba318f	[ckpt] feat: support esi execution environment (#2192 ) Volcengine provides users with ESI(Elastic Instance). We supported the reserved instances for the vemlp, and when an ESI is about to expire, the logic to save the checkpoint (CKPT) will be triggered to reduce training data loss. We also support ESI for AWS.	2025-06-27 11:07:27 +08:00
Chi Zhang	466ef1ad47	[misc] fix: add license (#2230 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-27 11:06:31 +08:00
Joel	790a8a29c5	[rollout] feat: add agent loop (#2124 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Add AgentLoopBase and AgentLoopManager for agentic rollout. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design New Components: - AgentLoopBase: abstract class represents the loop of a single prompt's rollout, in the loop, agent may chat with OpenAI compatible LLM server and interact with various environments. - AsyncLLMServerManager: send chat completion requests to multiple LLM servers, providing load balance and sticky session. - AgentLoopManager: get a batch of prompts from dataloader and split to multiple AgentLoopWorker - AgentLoopWorker: for each prompt, create a AgentLoopBase instance, run loop task. <img width="885" alt="image" src="https://github.com/user-attachments/assets/1f949719-c000-4b94-9ee2-c8a8ff71b109" /> ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-27 09:29:29 +08:00
xichengpro	b2235f0a55	[recipe] fix: unsupported operand type(s) for \|: 'dict' and 'DictConfig' (#2217 ) ### What does this PR do? #### Fix https://github.com/volcengine/verl/issues/2216 #### 1 Fix Config Reference in entropy_trainer.yaml #### 2 Fix TypeError When Merging `reward_kwargs` and `cfg_reward_kwargs` ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. #### 1 Fix Config Reference in entropy_trainer.yaml - Modified File : `recipe.entropy.config.entropy_trainer.yaml` - Change: ```yaml - reward_model.reward_kwargs.overlong_buffer_cfg: $reward_model.overlong_buffer + reward_model.reward_kwargs.overlong_buffer_cfg: ${reward_model.overlong_buffer} ``` - Purpose : Ensures OmegaConf correctly resolves the reference as a DictConfig object instead of interpreting it as a string. #### 2 Fix TypeError When Merging `reward_kwargs` and `cfg_reward_kwargs` - Modified File : `recipe.entropy.main_entropy.py` - Change : ```yaml - reward_fn = load_reward_manager(config, tokenizer, num_examine=0, (merge_dict(reward_kwargs, cfg_reward_kwargs))) + reward_fn = load_reward_manager(config, tokenizer, num_examine=0, OmegaConf.merge(OmegaConf.create(reward_kwargs), cfg_reward_kwargs)) ``` - Purpose : Use OmegaConf.merge() to safely merge dict and DictConfig types. > Background : > The DAPORewardManager class accesses the `enable` attribute from `overlong_buffer_cfg`. > This fails if `overlong_buffer_cfg` is a regular dict. > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: H <linhaibin.eric@gmail.com>	2025-06-26 17:27:01 -07:00
jvmncs	ed0f308acb	[ckpt] fix: conditionally import fsdp/megatron backend (#2224 ) ### What does this PR do? `verl/model_merger/__main__.py` allows the user to specify either the FSDP backend or the Megatron backend, but it forces the user to have both backends installed. This change moves those imports under the backend conditional, relieving that requirement. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=merger+is%3Aopen+ - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: hard to check these conditionals in the CI environment, since both dependencies are in the runner's image - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: H <linhaibin.eric@gmail.com>	2025-06-26 16:40:47 -07:00
xingyunjohn1	ff750e2472	[trainer] fix: indentation error leading to critic_output.get() failure (#2143 ) ### What does this PR do? This PR addresses an `IndentationError` that was causing the `critic_output.get()` call to fail when `self.use_critic` was false. ### Checklist Before Starting - [x] Search for similar PRs. [The PR cause the problem](https://github.com/volcengine/verl/pull/281) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > None. This is just a simple bug fix involving a few lines of code. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > This is just a simple bug fix involving a few lines of code. ### Specific Changes > This is just a simple bug fix involving a few lines of code. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-26 14:50:09 -07:00
OC	7b824077d5	[misc] feat: support ValidationGenerationsLogger in vemlp_wandb (#2191 ) ### What does this PR do? Implement vemlp_wandb in ValidationGenerationsLogger in order to write validation log into it.	2025-06-26 20:34:25 +08:00
Jaewan Park	4f1ece8bed	[recipe] fix: parameter order in RayPRIMETrainer super().__init__() call (#2172 ) ### What does this PR do? <!-- > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. --> - Fixes incorrect parameter order in `RayPRIMETrainer.__init__()` when calling `super().__init__()`. - The missing `processor` parameter was causing all subsequent positional arguments to be passed to wrong parameters, leading to `reward_fn` being passed as `processor` and `val_reward_fn` being passed as `reward_fn`. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+PRIME - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example <!-- > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` --> - No breaking changes to existing API ### High-Level Design <!-- > Demonstrate the high-level design if this PR is complex. --> - Simple parameter alignment fix, no design changes ### Specific Changes <!-- > List the specific changes. --> - Added `reward_fn=my_reward_fn` and `val_reward_fn=my_val_reward_fn` to the `super().__init__()` call in `RayPRIMETrainer.__init__()` to maintain correct parameter alignment with parent class RayPPOTrainer - Ensures `reward_fn` and `val_reward_fn` are passed to their intended parameters instead of being shifted due to missing processor argument ### Checklist Before Submitting <!-- > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. --> - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-26 19:37:36 +08:00
Yaowei Zheng	a9e3a8fa41	[model] fix: make vlm patch forward compatible (#2215 ) ### What does this PR do? Fixes #2213 > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-26 19:35:53 +08:00
Yuge Zhang	85bacb1ccc	[trainer] fix: Add __init__.py to verl.trainer.config (#2214 ) ### What does this PR do? Add `__init__.py` for verl.trainer.config, so that it can be discussed by hydra via searchpath `pkg://`. In my use case, I want to implement my own trainer with my own config, similar to DAPO did. I noticed that when DAPO inherits the config, it directly uses the relative path of VERL. This is not applicable in my case. My code base is another separated directory, I can't know for sure where VERL is installed in my environment. Usage of `pkg://` of [hydra](https://hydra.cc/docs/advanced/search_path/) looks suitable in my case. However, it complains: ``` lib/python3.10/site-packages/hydra/_internal/config_loader_impl.py:216: UserWarning: provider=hydra.searchpath in main, path=verl.trainer.config is not available. warnings.warn( ``` This is because `__init__.py` does not exist. In hydra documentation, it states specifically: > pkg:// points to an importable Python module, with . being the separator. __init__.py files are needed in directories for Python to treat them as packages. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+config+init - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example This PR supports usages like this: ```yaml hydra: searchpath: - pkg://verl/trainer/config # Originally it has to be file:///path/to/verl/trainer/config defaults: - ppo_trainer - _self_ my_custom_server: port: 9999 data: filter_overlong_prompts: false actor_rollout_ref: rollout: mode: async ``` ### High-Level Design N/A ### Specific Changes Adds an `__init__.py`. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: not applicable. - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Yuge Zhang <scotyugochang@gmail.com>	2025-06-26 16:36:28 +08:00
Blue Space	43a5ab3378	[trainer] fix: add missing qwen2_moe flops counter (#2190 ) ### What does this PR do? Add missing qwen2_moe flops counter, shall be the same as original qwen3-moe counter. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-26 13:01:40 +08:00
Xu Huang	02549d99cc	[data] fix: fix the type of parquet_files in SFTDataset (#2203 ) ### What does this PR do? Fix the type of parquet_files in sft_dataset.py. When sending a list of files, the type of parquet_files is ListConfig, not List[str]. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-26 08:46:31 +08:00
Blue Space	3b3e597042	[megatron] feat: Support of dist checkpoint (#2125 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Support of dist checkpoint in saving, loading and model merger. ### Test Algorithm: <img width="783" alt="image" src="https://github.com/user-attachments/assets/9a200b47-5937-426a-8da6-c601d2d8328f" /> ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: H <linhaibin.eric@gmail.com>	2025-06-25 17:17:29 +08:00
Yaowei Zheng	411751e610	[model] feat: Add MiniCPM-o 2.6 support (#2178 ) @RanchiZhao We reverted the previous commit because we find a critical bug in #1833 https://github.com/volcengine/verl/pull/1833/files#diff-e06e73d3a7775a502b7aea91103e7911f6597eb48e4b898db558766cdd41daf9R119-R121 The indentation size of the if-else block is incorrect	2025-06-25 16:30:20 +08:00
Yang Wang	c5d4d90af7	[doc] fix: Fix a typo in the profiler's document (#2141 ) ### What does this PR do? Fix a typo in the profiler's document, `use_profiler` should be `use_profile`. `9b7bb69ea3/verl/utils/debug/profile.py (L49)` > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-25 11:09:59 +08:00
杨睿	fc6ebc9ebe	[megatron,vllm] fix: megatron vllm async rollout server (#2122 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? fix megatron vllm async rollout, releated to https://github.com/volcengine/verl/pull/2008 and https://github.com/volcengine/verl/issues/2001 > We are from the Large Model Post-Training Team of 📕 Xiaohongshu's AI Platform Technology Department , dedicated to developing high-performance, easily-scalable distributed post-training engines. ### Test ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-25 10:23:21 +08:00
Yaowei Zheng	dc805c7897	[ci] fix: enable e2e ppo trainer test (#2174 ) ### What does this PR do? Fix bugs introduced by https://github.com/volcengine/verl/pull/2113 Do not skip the e2e tests when pushing changes to the main branch > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-24 20:16:17 +08:00
Zhen	68d62518ce	[misc] fix: fix timer importance error in split_placement (#2169 ) ### What does this PR do? fix timer importance error in split_placement, should use `from verl.trainer.ppo.ray_trainer import marked_timer`, but got `from verl.trainer.ppo.ray_trainer import _timer` now. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Not related. ### API and Usage Example Not related. ### High-Level Design Not related. ### Specific Changes fix timer importance error in split_placement, should use `from verl.trainer.ppo.ray_trainer import marked_timer`, but got `import _timer` now. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-24 17:06:12 +08:00
Yaowei Zheng	24707f6d4e	[model] fix: Revert "[model] feat: Add MiniCPM-o 2.6 support" (#2176 ) Reverts volcengine/verl#1833	2025-06-24 16:44:57 +08:00
yuanqian_zhao	e1039aed4f	[model] feat: Add MiniCPM-o 2.6 support (#1833 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Add MiniCPM-o 2.6 multimodal model support to VERL framework for vision-language RL training. ### Specific Changes - verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py: Add MiniCPM-o weight loading - verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py: Add MiniCPM-o weight loading - verl/utils/dataset/vision_utils.py: Enhanced vision data processing - verl/utils/dataset/rl_dataset.py: Multimodal dataset support - verl/utils/flops_counter.py: Vision model FLOPS calculation - verl/workers/actor/dp_actor.py: Multimodal model compatibility - examples/grpo_trainer/run_minicpmo2_6.sh: Complete training example ### Usage Example ```bash # Train MiniCPM-o 2.6 with GRPO bash examples/grpo_trainer/run_minicpmo2_6.sh ``` ### Test - [x] Local testing with MiniCPM-o 2.6 on geo3k dataset - [x] Verified weight loading for both vLLM versions - [x] Training script validation ### Checklist Before Submitting - [x] Read the Contribute Guide - [x] Apply pre-commit checks (will fix in follow-up if needed) - [ ] No breaking API changes - [ ] Documentation updates (if needed) - [x] Rely on existing unit tests --------- Co-authored-by: RanchiZhao <ranchizhao@example.com>	2025-06-24 14:56:50 +08:00
Cong Lin	08be380f95	[worker] feat: allow dist shared file-system initialization (#2154 ) ### What does this PR do? Allow torch.distributed.init_process_group to fetch "DIST_INIT_METHOD" from os.environ to accelerate single node initialization. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Before using shared file-system initialization ![企业微信截图_17506726709312](https://github.com/user-attachments/assets/cce62dab-dbea-496e-bb60-5cd4e88f8809) After export DIST_INIT_METHOD='file:///tmp/torch_dist' ![企业微信截图_17506729178154](https://github.com/user-attachments/assets/6ed23d76-dda8-44fc-8cb8-5596da0c606d) ### API and Usage Example Simply add ```export DIST_INIT_METHOD='file:///tmp/some_file'``` to your script, and remember to ```rm -rf /tmp/some_file``` before your next run. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: very simple to reproduce - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-24 13:16:09 +08:00
Stefan He	2a6212385a	[rollout] feat: Support Multi-stage Awake for SGLang (#1911 ) Co-authored with: MrAta (immrata@gmail.com) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? ### Motivation In RL Ecosystem which use colocate design like [verl](https://github.com/volcengine/verl/tree/main), we need to offload training model and load serving model & KV Cache frequently. #### Background - Currently SGLang is using [torch_memory_saver](https://github.com/fzyzcjy/torch_memory_saver) to pause and resume. - [torch_memory_saver](https://github.com/fzyzcjy/torch_memory_saver) is a open source repo that provided easy to use api to hack cudaMalloc and cudaFree to make sure the virtual address could be consistent after pause and resume, which is critical to ensure CUDA Graph work. - CUDA Graph is critical to make sure SGLang runs faster in decoding phases. #### Here is the current behavior of VERL + SGLang ![Image](https://github.com/user-attachments/assets/e87e7dd6-f223-4de6-8f07-915eb2030ea8) 1. During Training, we have training model and optimizer state in the GPU Memory, and once training is done, we will offload optimizer state to cpu and keep the model weights in GPU, which is needed in Update Weight. 2. During Update Weight, we awake the SGLang engine, so those paused memory of Model Weights and KV Cache will come back. Then we update model from training model to serving model on the fly using the api: `update_weights_in_tensor` 3. After Model being updated, we delete the training model from GPU Memory. Above design works pretty well so far, however, this would waste a big chunk of GPU Memory during rollout, which could cause a few issues we've seen so far: - Small KV Cache: We need to use relative lower number of mem fraction ratio (e.g: 0.6), hence our KV Cache has less tokens. Given KV Cache has less tokens, we will hit `RuntimeError: Prefill out of memory. Try to lower your batch size.` when we try prefill large number of requests. - Out of Memory: If we use mem fraction ratio 0.8 and run RL for 32B model on 8 H100, it will OOM during update weight #### Challenge - `torch_memory_saver` currently only supports Singleton, hence SGLang will pause and resume KV Cache + Weights together, they are treated as the same group of memory controlled by the singleton `torch_memory_saver` instance #### Proposal ![Image](https://github.com/user-attachments/assets/7fda9638-0dc2-4c14-bc64-cd20616f350f) 1. During Training, we do the same 2. During Update Weight Stage 1, we awake the model weights from SGLang and then update weights 3. During Update Weight Stage 2, we delete the training model weights from GPU Memory 4. Awake the SGLang's KV Cache ![Image](https://github.com/user-attachments/assets/f3dab327-dc2e-4ed8-88d7-15e383f77d25) ### Benefit With above feature, we can train larger model with same GPU, we can also make training/rollout more efficient given we can allocate larger KV Cache ### Solution: Keep using Singleton and provide tag based pause/resume - [x] Support tag based resume/pause: https://github.com/fzyzcjy/torch_memory_saver/pull/20 - [x] Support Multiple Stage Awake in SGLang: https://github.com/sgl-project/sglang/pull/7099 - [ ] Support Multiple Stage Awake in verl: https://github.com/volcengine/verl/pull/1911 ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test ![Screenshot 2025-06-19 at 12 16 19 PM](https://github.com/user-attachments/assets/a95dd57e-43e1-4f28-8a84-003ec5c043fc) ![Screenshot 2025-06-19 at 12 13 14 PM](https://github.com/user-attachments/assets/f1f4a8a8-1845-4fad-9424-5526d4154dd0) ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-06-23 14:03:35 -07:00
Joy Zhou	d69528fe38	[rollout]fix: vllm_rollout_spmd.py when return_raw_chat=True (#2156 ) ### What does this PR do? > fix : batch size and the size of raw_prompt unmatching when setting `data.return_raw_chat=True` fix bug when using `data.return_raw_chat=True` in GRPO algorithm with reward model: ` File "/ossfs/workspace/repository/verl/verl/single_controller/ray/base.py", line 625, in func return getattr(self.worker_dict[key], name)(args, kwargs) File "/ossfs/workspace/repository/verl/verl/single_controller/base/decorator.py", line 534, in inner return func(args, *kwargs) File "/ossfs/workspace/repository/verl/verl/workers/fsdp_workers.py", line 634, in generate_sequences output = self.rollout.generate_sequences(prompts=prompts) File "/ossfs/workspace/repository/verl/verl/utils/debug/performance.py", line 78, in f return self.log(decorated_function, args, *kwargs) File "/ossfs/workspace/repository/verl/verl/utils/debug/performance.py", line 88, in log output = func(args, *kwargs) File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(args, **kwargs) File "/ossfs/workspace/repository/verl/verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py", line 346, in generate_sequences return DataProto(batch=batch, non_tensor_batch=non_tensor_batch) File "<string>", line 6, in __init__ File "/ossfs/workspace/repository/verl/verl/protocol.py", line 214, in __post_init__ self.check_consistency() File "/ossfs/workspace/repository/verl/verl/protocol.py", line 325, in check_consistency assert val.shape[0] == batch_size, f"key {key} length {len(val)} is not equal to batch size {batch_size}" AssertionError: key raw_prompt length 128 is not equal to batch size 640` ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-23 20:56:12 +08:00
Qunhong Zeng	2ac410f001	[fsdp] feat: support fsdp2 save hugging face model (#2138 ) ### What does this PR do? Support FSDP2 save HF model. Previously only supported FSDP1, and FSDP2 will lead to error in https://github.com/volcengine/verl/issues/1703. Fix https://github.com/volcengine/verl/issues/1703. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-23 15:15:36 +08:00
Nan Jiang	644aaa76bc	[sglang] feat: add multimodal input to multiturn async rollout (#2014 ) ### Checklist Before Starting - [X] Searched for similar PR(s). ### What does this PR do? This PR adds image input to sglang async rollout. Previously sglang async rollout only support text. There is also a placeholder for video data, will be added as an input when SGLang engine supports it. ### High-Level Design Since sglang engine already handle the image input, just need to properly handling the tokenization. ### Specific Changes Change `self.tokenizer.apply_chat_template()` to `self.processing_class.apply_chat_template()`. `processing_class` could be `tokenizer` or `processor`. ### Usage Example It will automatically using processor to process image when the model's processor supports that. It will use tokenizer if there is no processor available ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [X] New CI unit test(s) are added to cover the code path. - [X] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: xieck13 <xieck13@gmail.com>	2025-06-22 15:43:46 -07:00
Shizhan Lu	e67ee86f8b	[tool] feat: Add memory limit configuration for sandbox fusion (#2105 )	2025-06-22 11:06:00 -07:00
He Du	c7aa5e845d	[sglang] feat: Support async multi-turn rollout with simulation feedback in sglang (#1630 )	2025-06-22 09:47:14 -07:00
Chi Zhang	dff6b96843	[ray] feat: add a test to demonstrate how to perform p2p communication inside wor… (#2131 ) …ker group ### What does this PR do? As title ### Checklist Before Describing the Details - [ ] Searched for similar PR(s). - [ ] PR title is in the format of: `[modules] type: Title` - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg` - type is in `feat, fix, refactor, chore, test` - multiple modules are seperated by `,` or space, such as `[megatron, fsdp, doc] feat: xxx` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit run --show-diff-on-failure --color=always --all-files` - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-22 09:45:58 -07:00
H	ade658f48e	[doc] fix: fix index rendering (#2127 ) ### What does this PR do? fix the rendering ### Checklist Before Describing the Details - [x] Searched for similar PR(s). - [x] PR title is in the format of: `[modules] type: Title` - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg` - type is in `feat, fix, refactor, chore, test` - multiple modules are seperated by `,` or space, such as `[megatron, fsdp, doc] feat: xxx` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit run --show-diff-on-failure --color=always --all-files` - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-22 09:45:44 -07:00
Shawn/Yuxuan Tong	9b7bb69ea3	[BREAKING][ci] feat: add CI request channel & improve PR template (#2126 )	2025-06-21 20:33:31 -07:00
Qunhong Zeng	76f63cffa5	[fsdp] refactor: set actor's strategy as default for critic and ref (#2130 ) ### What does this PR do? Set actor's strategy as the default strategy for critic, ref and reward model. In principle, all actors should use the same strategy. With this change, we can set `STRATEGY=fsdp2` in `run_function_reward.sh` and all models can use fsdp2 as strategy, instead of setting it for each role individually. ### Checklist Before Describing the Details - [x] Searched for similar PR(s). - [x] PR title is in the format of: `[modules] type: Title` - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg` - type is in `feat, fix, refactor, chore, test` - multiple modules are seperated by `,` or space, such as `[megatron, fsdp, doc] feat: xxx` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit run --show-diff-on-failure --color=always --all-files` - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-21 23:42:43 +08:00
Zerui Wang	9bc360aa97	[worker] feat: add support for dynamic batch size of multimodal data (#2049 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Add support for dynamic batch size (data packing) of multimodal dataset. Add an example script `examples/grpo_trainer/run_qwen2_5_vl-7b_seq_balance.sh`. ### Test The console log from training Qwen2.5-VL-7B with PPO on the Geo3K dataset (`examples/grpo_trainer/run_qwen2_5_vl-7b_seq_balance.sh`). The experiment was conducted on a single node with 8 NVIDIA A800 GPUs. ``` [2025-06-17 02:42:10] (WorkerDict pid=13539) Skipping monkey patch for Qwen2_5_VLForConditionalGeneration as use_fused_kernels is False or fused_kernels_backend is torch [repeated 7x across cluster] [2025-06-17 02:42:10] (WorkerDict pid=13361) Model config after override: Qwen2_5_VLConfig { [2025-06-17 02:42:10] (WorkerDict pid=13361) "architectures": [ [2025-06-17 02:42:10] (WorkerDict pid=13361) "Qwen2_5_VLForConditionalGeneration" [2025-06-17 02:42:10] (WorkerDict pid=13361) ], [2025-06-17 02:42:10] (WorkerDict pid=13361) "attention_dropout": 0.0, [2025-06-17 02:42:10] (WorkerDict pid=13361) "eos_token_id": 151645, [2025-06-17 02:42:10] (WorkerDict pid=13361) "hidden_act": "silu", [2025-06-17 02:42:10] (WorkerDict pid=13361) "hidden_size": 3584, [2025-06-17 02:42:10] (WorkerDict pid=13361) "image_token_id": 151655, [2025-06-17 02:42:10] (WorkerDict pid=13361) "initializer_range": 0.02, [2025-06-17 02:42:10] (WorkerDict pid=13361) "intermediate_size": 18944, [2025-06-17 02:42:10] (WorkerDict pid=13361) "max_position_embeddings": 128000, [2025-06-17 02:42:10] (WorkerDict pid=13361) "max_window_layers": 28, [2025-06-17 02:42:10] (WorkerDict pid=13361) "model_type": "qwen2_5_vl", [2025-06-17 02:42:10] (WorkerDict pid=13361) "num_attention_heads": 28, [2025-06-17 02:42:10] (WorkerDict pid=13361) "num_hidden_layers": 28, [2025-06-17 02:42:10] (WorkerDict pid=13361) "num_key_value_heads": 4, [2025-06-17 02:42:10] (WorkerDict pid=13361) "pad_token_id": 151643, [2025-06-17 02:42:10] (WorkerDict pid=13361) "rms_norm_eps": 1e-06, [2025-06-17 02:42:10] (WorkerDict pid=13361) "rope_scaling": { [2025-06-17 02:42:10] (WorkerDict pid=13361) "mrope_section": [ [2025-06-17 02:42:10] (WorkerDict pid=13361) 16, [2025-06-17 02:42:10] (WorkerDict pid=13361) 24, [2025-06-17 02:42:10] (WorkerDict pid=13361) 24 [2025-06-17 02:42:10] (WorkerDict pid=13361) ], [2025-06-17 02:42:10] (WorkerDict pid=13361) "rope_type": "default", [2025-06-17 02:42:10] (WorkerDict pid=13361) "type": "default" [2025-06-17 02:42:10] (WorkerDict pid=13361) }, [2025-06-17 02:42:10] (WorkerDict pid=13361) "rope_theta": 1000000.0, [2025-06-17 02:42:10] (WorkerDict pid=13361) "sliding_window": 32768, [2025-06-17 02:42:10] (WorkerDict pid=13361) "tie_word_embeddings": false, [2025-06-17 02:42:10] (WorkerDict pid=13361) "torch_dtype": "bfloat16", [2025-06-17 02:42:10] (WorkerDict pid=13361) "transformers_version": "4.51.0", [2025-06-17 02:42:10] (WorkerDict pid=13361) "use_cache": true, [2025-06-17 02:42:10] (WorkerDict pid=13361) "use_sliding_window": false, [2025-06-17 02:42:10] (WorkerDict pid=13361) "video_token_id": 151656, [2025-06-17 02:42:10] (WorkerDict pid=13361) "vision_config": { [2025-06-17 02:42:10] (WorkerDict pid=13361) "depth": 32, [2025-06-17 02:42:10] (WorkerDict pid=13361) "fullatt_block_indexes": [ [2025-06-17 02:42:10] (WorkerDict pid=13361) 7, [2025-06-17 02:42:10] (WorkerDict pid=13361) 15, [2025-06-17 02:42:10] (WorkerDict pid=13361) 23, [2025-06-17 02:42:10] (WorkerDict pid=13361) 31 [2025-06-17 02:42:10] (WorkerDict pid=13361) ], [2025-06-17 02:42:10] (WorkerDict pid=13361) "hidden_act": "silu", [2025-06-17 02:42:10] (WorkerDict pid=13361) "hidden_size": 1280, [2025-06-17 02:42:10] (WorkerDict pid=13361) "in_channels": 3, [2025-06-17 02:42:10] (WorkerDict pid=13361) "in_chans": 3, [2025-06-17 02:42:10] (WorkerDict pid=13361) "intermediate_size": 3420, [2025-06-17 02:42:10] (WorkerDict pid=13361) "model_type": "qwen2_5_vl", [2025-06-17 02:42:10] (WorkerDict pid=13361) "num_heads": 16, [2025-06-17 02:42:10] (WorkerDict pid=13361) "out_hidden_size": 3584, [2025-06-17 02:42:10] (WorkerDict pid=13361) "patch_size": 14, [2025-06-17 02:42:10] (WorkerDict pid=13361) "spatial_merge_size": 2, [2025-06-17 02:42:10] (WorkerDict pid=13361) "spatial_patch_size": 14, [2025-06-17 02:42:10] (WorkerDict pid=13361) "temporal_patch_size": 2, [2025-06-17 02:42:10] (WorkerDict pid=13361) "tokens_per_second": 2, [2025-06-17 02:42:10] (WorkerDict pid=13361) "torch_dtype": "float32", [2025-06-17 02:42:10] (WorkerDict pid=13361) "window_size": 112 [2025-06-17 02:42:10] (WorkerDict pid=13361) }, [2025-06-17 02:42:10] (WorkerDict pid=13361) "vision_end_token_id": 151653, [2025-06-17 02:42:10] (WorkerDict pid=13361) "vision_start_token_id": 151652, [2025-06-17 02:42:10] (WorkerDict pid=13361) "vision_token_id": 151654, [2025-06-17 02:42:10] (WorkerDict pid=13361) "vocab_size": 152064 [2025-06-17 02:42:10] (WorkerDict pid=13361) } [2025-06-17 02:42:10] (WorkerDict pid=13361) [2025-06-17 02:42:10] (WorkerDict pid=13361) Monkey patch FlashAttention2.forward in Qwen2.5VL [2025-06-17 02:42:10] (WorkerDict pid=13361) Monkey patch _flash_attention_forward in transformers.models.qwen2_5_vl.modeling_qwen2_5_vl [2025-06-17 02:42:10] (WorkerDict pid=13361) Skipping monkey patch for Qwen2_5_VLForConditionalGeneration as use_fused_kernels is False or fused_kernels_backend is torch [2025-06-17 02:42:10] (WorkerDict pid=13541) Monkey patch FlashAttention2.forward in Qwen2.5VL [2025-06-17 02:42:10] (WorkerDict pid=13541) Monkey patch _flash_attention_forward in transformers.models.qwen2_5_vl.modeling_qwen2_5_vl [2025-06-17 02:42:10] (WorkerDict pid=13541) Skipping monkey patch for Qwen2_5_VLForConditionalGeneration as use_fused_kernels is False or fused_kernels_backend is torch [2025-06-17 02:42:10] (WorkerDict pid=13361) Qwen2_5_VLForConditionalGeneration contains 8.29B parameters [2025-06-17 02:42:10] (WorkerDict pid=13361) wrap_policy: functools.partial(<function _or_policy at 0x7f8504485b40>, policies=[functools.partial(<function transformer_auto_wrap_policy at 0x7f8504485a20>, transformer_layer_cls={<class 'transformers.models.qwen2_5_vl.modeling_qwen2_5_vl.Qwen2_5_VLDecoderLayer'>, <class 'transformers.models.qwen2_5_vl.modeling_qwen2_5_vl.Qwen2_5_VLVisionBlock'>})]) [2025-06-17 02:42:10] (WorkerDict pid=13361) Total steps: 60, num_warmup_steps: 0 [2025-06-17 02:42:10] (WorkerDict pid=13361) Actor use_remove_padding=True [2025-06-17 02:42:10] (WorkerDict pid=13361) Actor use_fused_kernels=False [2025-06-17 02:42:10] (WorkerDict pid=13543) Monkey patch FlashAttention2.forward in Qwen2.5VL [repeated 6x across cluster] [2025-06-17 02:42:10] (WorkerDict pid=13543) Monkey patch _flash_attention_forward in transformers.models.qwen2_5_vl.modeling_qwen2_5_vl [repeated 6x across cluster] [2025-06-17 02:42:10] (WorkerDict pid=13543) Skipping monkey patch for Qwen2_5_VLForConditionalGeneration as use_fused_kernels is False or fused_kernels_backend is torch [repeated 6x across cluster] [2025-06-17 02:42:10] (WorkerDict pid=13361) WARNING 06-16 18:40:12 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f830d065330> [2025-06-17 02:42:10] (WorkerDict pid=13540) NCCL version 2.21.5+cuda12.4 Training Progress: 0%\| \| 0/60 [00:00<?, ?it/s] [2025-06-17 02:42:18] (WorkerDict pid=13539) /********/envs/verl/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . [repeated 5x across cluster] [2025-06-17 02:42:18] (WorkerDict pid=13539) warnings.warn( [repeated 5x across cluster] Training Progress: 2%\|▏ \| 1/60 [04:09<4:05:26, 249.60s/it] [2025-06-17 02:46:27] (WorkerDict pid=13537) /********/envs/verl/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:690: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . [repeated 2x across cluster] [2025-06-17 02:46:27] (WorkerDict pid=13537) warnings.warn( [repeated 2x across cluster] Training Progress: 3%\|▎ \| 2/60 [08:04<3:52:47, 240.81s/it] (TaskRunner pid=9331) Training Progress: 5%\|▌ \| 3/60 [11:53<3:43:33, 235.33s/it] (WorkerDict pid=13540) kwargs: {'n': 5, 'logprobs': 0, 'max_tokens': 2048, 'detokenize': False, 'temperature': 1.0, 'top_k': -1, 'top_p': 1, 'ignore_eos': False} (WorkerDict pid=13539) WARNING 06-16 18:40:12 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f97d7cc92d0> [repeated 7x across cluster] (WorkerDict pid=13542) NCCL version 2.21.5+cuda12.4 [repeated 2x across cluster] (TaskRunner pid=9331) Using LocalLogger is deprecated. The constructor API will change (WorkerDict pid=13539) kwargs: {'n': 5, 'logprobs': 0, 'max_tokens': 2048, 'detokenize': False, 'temperature': 1.0, 'top_k': -1, 'top_p': 1, 'ignore_eos': False} [repeated 5x across cluster] (TaskRunner pid=9331) step:1 - global_seqlen/min:194004.000 - global_seqlen/max:215990.000 - global_seqlen/minmax_diff:21986.000 - global_seqlen/balanced_min:203335.000 - global_seqlen/balanced_max:203336.000 - global_seqlen/mean:203335.125 - actor/entropy:0.467 - training/rollout_probs_diff_max:0.378 - training/rollout_probs_diff_mean:0.005 - training/rollout_probs_diff_std:0.011 - actor/kl_loss:0.001 - actor/kl_coef:0.010 - actor/pg_loss:-0.005 - actor/pg_clipfrac:0.001 - actor/ppo_kl:-0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.230 - perf/mfu/actor:0.323 - perf/max_memory_allocated_gb:62.271 - perf/max_memory_reserved_gb:81.812 - perf/cpu_memory_used_gb:0.000 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.394 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.394 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.008 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.008 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:380.995 - response_length/max:2048.000 - response_length/min:25.000 - response_length/clip_ratio:0.007 - prompt_length/mean:254.428 - prompt_length/max:996.000 - prompt_length/min:102.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:66.493 - timing_s/reshard:1.879 - timing_s/gen:70.929 - timing_s/reward:3.603 - timing_s/old_log_prob:34.632 - timing_s/ref:33.643 - timing_s/adv:0.095 - timing_s/update_actor:95.425 - timing_s/step:238.697 - timing_per_token_ms/gen:0.073 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.021 - timing_per_token_ms/update_actor:0.059 - perf/total_num_tokens:1626681.000 - perf/time_per_step:238.697 - perf/throughput:851.856 (WorkerDict pid=13537) kwargs: {'n': 5, 'logprobs': 0, 'max_tokens': 2048, 'detokenize': False, 'temperature': 1.0, 'top_k': -1, 'top_p': 1, 'ignore_eos': False} [repeated 2x across cluster] (TaskRunner pid=9331) step:2 - global_seqlen/min:190581.000 - global_seqlen/max:220843.000 - global_seqlen/minmax_diff:30262.000 - global_seqlen/balanced_min:209057.000 - global_seqlen/balanced_max:209058.000 - global_seqlen/mean:209057.500 - actor/entropy:0.458 - training/rollout_probs_diff_max:0.415 - training/rollout_probs_diff_mean:0.005 - training/rollout_probs_diff_std:0.011 - actor/kl_loss:0.001 - actor/kl_coef:0.010 - actor/pg_loss:0.017 - actor/pg_clipfrac:0.001 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.252 - perf/mfu/actor:0.327 - perf/max_memory_allocated_gb:62.280 - perf/max_memory_reserved_gb:85.205 - perf/cpu_memory_used_gb:0.000 - actor/lr:0.000 - training/global_step:2.000 - training/epoch:0.000 - critic/score/mean:0.403 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.403 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.016 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.016 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:390.521 - response_length/max:2048.000 - response_length/min:18.000 - response_length/clip_ratio:0.009 - prompt_length/mean:262.783 - prompt_length/max:996.000 - prompt_length/min:103.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:63.223 - timing_s/reshard:2.093 - timing_s/gen:70.164 - timing_s/reward:3.706 - timing_s/old_log_prob:30.945 - timing_s/ref:30.190 - timing_s/adv:0.088 - timing_s/update_actor:96.829 - timing_s/step:232.303 - timing_per_token_ms/gen:0.070 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.018 - timing_per_token_ms/update_actor:0.058 - perf/total_num_tokens:1672460.000 - perf/time_per_step:232.303 - perf/throughput:899.936 (TaskRunner pid=9331) step:3 - global_seqlen/min:197140.000 - global_seqlen/max:212951.000 - global_seqlen/minmax_diff:15811.000 - global_seqlen/balanced_min:205956.000 - global_seqlen/balanced_max:205957.000 - global_seqlen/mean:205956.250 - actor/entropy:0.418 - training/rollout_probs_diff_max:0.319 - training/rollout_probs_diff_mean:0.005 - training/rollout_probs_diff_std:0.011 - actor/kl_loss:0.005 - actor/kl_coef:0.010 - actor/pg_loss:0.065 - actor/pg_clipfrac:0.001 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.199 - perf/mfu/actor:0.332 - perf/max_memory_allocated_gb:62.414 - perf/max_memory_reserved_gb:85.205 - perf/cpu_memory_used_gb:0.000 - actor/lr:0.000 - training/global_step:3.000 - training/epoch:0.000 - critic/score/mean:0.392 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.392 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.004 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.004 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:379.654 - response_length/max:2048.000 - response_length/min:20.000 - response_length/clip_ratio:0.003 - prompt_length/mean:263.959 - prompt_length/max:776.000 - prompt_length/min:103.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:60.097 - timing_s/reshard:2.019 - timing_s/gen:69.763 - timing_s/reward:3.414 - timing_s/old_log_prob:30.005 - timing_s/ref:30.284 - timing_s/adv:0.090 - timing_s/update_actor:93.705 - timing_s/step:227.641 - timing_per_token_ms/gen:0.072 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.018 - timing_per_token_ms/update_actor:0.057 - perf/total_num_tokens:1647650.000 - perf/time_per_step:227.641 - perf/throughput:904.741 (TaskRunner pid=9331) Training Progress: 7%\|▋ \| 4/60 [15:41<3:37:00, 232.51s/it] (TaskRunner pid=9331) step:4 - global_seqlen/min:190149.000 - global_seqlen/max:224987.000 - global_seqlen/minmax_diff:34838.000 - global_seqlen/balanced_min:207060.000 - global_seqlen/balanced_max:207061.000 - global_seqlen/mean:207060.250 - actor/entropy:0.429 - training/rollout_probs_diff_max:0.299 - training/rollout_probs_diff_mean:0.004 - training/rollout_probs_diff_std:0.011 - actor/kl_loss:0.002 - actor/kl_coef:0.010 - actor/pg_loss:0.036 - actor/pg_clipfrac:0.001 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.210 - perf/mfu/actor:0.330 - perf/max_memory_allocated_gb:62.977 - perf/max_memory_reserved_gb:87.430 - perf/cpu_memory_used_gb:0.000 - actor/lr:0.000 - training/global_step:4.000 - training/epoch:0.000 - critic/score/mean:0.406 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.406 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.019 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.019 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:392.973 - response_length/max:2048.000 - response_length/min:25.000 - response_length/clip_ratio:0.010 - prompt_length/mean:254.090 - prompt_length/max:996.000 - prompt_length/min:103.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:64.229 - timing_s/reshard:2.136 - timing_s/gen:71.688 - timing_s/reward:3.684 - timing_s/old_log_prob:28.621 - timing_s/ref:28.663 - timing_s/adv:0.088 - timing_s/update_actor:94.804 - timing_s/step:227.898 - timing_per_token_ms/gen:0.071 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.017 - timing_per_token_ms/update_actor:0.057 - perf/total_num_tokens:1656482.000 - perf/time_per_step:227.898 - perf/throughput:908.567 (TaskRunner pid=9331) test_gen_batch meta info: {'eos_token_id': 151645, 'pad_token_id': 151643, 'recompute_log_prob': False, 'do_sample': False, 'validate': True} (TaskRunner pid=9331) validation generation end (TaskRunner pid=9331) [prompt] system (TaskRunner pid=9331) You are a helpful assistant. (TaskRunner pid=9331) user (TaskRunner pid=9331) Chords $\overline{A C}$ and $\overline{D F}$ are equidistant from the center. If the radius of $\odot G$ is 26 find $A C$ You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}. (TaskRunner pid=9331) assistant (TaskRunner pid=9331) (TaskRunner pid=9331) [response] <think>Let's start by analyzing the given information and the diagram. We know that chords $\overline{AC}$ and $\overline{DF}$ are equidistant from the center $G$. This means that the perpendicular distances from $G$ to $\overline{AC}$ and $\overline{DF}$ are the same. (TaskRunner pid=9331) (TaskRunner pid=9331) First, let's consider the radius of the circle, which is 26. The perpendicular distance from the center $G$ to a chord is the length of the segment from $G$ to the midpoint of the chord. This distance is also the length of the segment from $G$ to the chord, which is the same for both chords $\overline{AC}$ and $\overline{DF}$. (TaskRunner pid=9331) (TaskRunner pid=9331) Since the chords are equidistant from the center, the perpendicular distance from $G$ to $\overline{AC}$ is the same as the perpendicular distance from $G$ to $\overline{DF}$. This distance is 10, as given in the problem. (TaskRunner pid=9331) (TaskRunner pid=9331) Now, let's use the Pythagorean theorem in the right triangle formed by the radius, the perpendicular distance, and half the length of the chord. The radius is 26, the perpendicular distance is 10, and half the length of the chord is $ \frac{AC}{2} $. (TaskRunner pid=9331) (TaskRunner pid=9331) The Pythagorean theorem states: (TaskRunner pid=9331) \[ 26^2 = 10^2 + \left( \frac{AC}{2} \right)^2 \] (TaskRunner pid=9331) \[ 676 = 100 + \left( \frac{AC}{2} \right)^2 \] (TaskRunner pid=9331) \[ 576 = \left( \frac{AC}{2} \right)^2 \] (TaskRunner pid=9331) \[ \frac{AC}{2} = \sqrt{576} \] (TaskRunner pid=9331) \[ \frac{AC}{2} = 24 \] (TaskRunner pid=9331) \[ AC = 48 \] (TaskRunner pid=9331) (TaskRunner pid=9331) So, the length of $AC$ is 48.</think> (TaskRunner pid=9331) \boxed{48} (TaskRunner pid=9331) [ground_truth] 48 (TaskRunner pid=9331) [score] 1.0 (TaskRunner pid=9331) Training Progress: 8%\|▊ \| 5/60 [20:34<3:53:09, 254.36s/it] (TaskRunner pid=9331) Training Progress: 10%\|█ \| 6/60 [24:24<3:41:25, 246.02s/it] (TaskRunner pid=9331) step:5 - global_seqlen/min:196253.000 - global_seqlen/max:210637.000 - global_seqlen/minmax_diff:14384.000 - global_seqlen/balanced_min:205432.000 - global_seqlen/balanced_max:205432.000 - global_seqlen/mean:205432.000 - actor/entropy:0.383 - training/rollout_probs_diff_max:0.349 - training/rollout_probs_diff_mean:0.004 - training/rollout_probs_diff_std:0.011 - actor/kl_loss:0.003 - actor/kl_coef:0.010 - actor/pg_loss:-0.022 - actor/pg_clipfrac:0.001 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.218 - perf/mfu/actor:0.327 - perf/max_memory_allocated_gb:62.977 - perf/max_memory_reserved_gb:87.430 - perf/cpu_memory_used_gb:0.000 - actor/lr:0.000 - val-aux/hiyouga/geometry3k/reward/mean@1:0.450 - val-aux/hiyouga/geometry3k/reward/mean@24:0.550 - val-aux/hiyouga/geometry3k/reward/std@24:0.450 - val-aux/hiyouga/geometry3k/reward/best@2/mean:0.637 - val-aux/hiyouga/geometry3k/reward/best@2/std:0.238 - val-aux/hiyouga/geometry3k/reward/worst@2/mean:0.367 - val-aux/hiyouga/geometry3k/reward/worst@2/std:0.229 - val-aux/hiyouga/geometry3k/reward/best@4/mean:0.789 - val-aux/hiyouga/geometry3k/reward/best@4/std:0.255 - val-aux/hiyouga/geometry3k/reward/worst@4/mean:0.137 - val-aux/hiyouga/geometry3k/reward/worst@4/std:0.153 - val-aux/hiyouga/geometry3k/reward/best@8/mean:0.964 - val-aux/hiyouga/geometry3k/reward/best@8/std:0.118 - val-aux/hiyouga/geometry3k/reward/worst@8/mean:0.097 - val-aux/hiyouga/geometry3k/reward/worst@8/std:0.056 - val-aux/hiyouga/geometry3k/reward/best@16/mean:1.000 - val-aux/hiyouga/geometry3k/reward/best@16/std:0.000 - val-aux/hiyouga/geometry3k/reward/worst@16/mean:0.064 - val-aux/hiyouga/geometry3k/reward/worst@16/std:0.022 - val-aux/hiyouga/geometry3k/reward/best@24/mean:1.000 - val-aux/hiyouga/geometry3k/reward/best@24/std:0.000 - val-aux/hiyouga/geometry3k/reward/worst@24/mean:0.100 - val-aux/hiyouga/geometry3k/reward/worst@24/std:0.000 - val-aux/hiyouga/geometry3k/reward/mean@14:0.550 - val-aux/hiyouga/geometry3k/reward/std@14:0.450 - val-aux/hiyouga/geometry3k/reward/best@14/mean:1.000 - val-aux/hiyouga/geometry3k/reward/best@14/std:0.000 - val-aux/hiyouga/geometry3k/reward/worst@14/mean:0.100 - val-aux/hiyouga/geometry3k/reward/worst@14/std:0.000 - val-aux/hiyouga/geometry3k/reward/mean@2:0.548 - val-aux/hiyouga/geometry3k/reward/std@2:0.210 - val-aux/hiyouga/geometry3k/reward/mean@3:0.455 - val-aux/hiyouga/geometry3k/reward/std@3:0.309 - val-aux/hiyouga/geometry3k/reward/best@3/mean:0.664 - val-aux/hiyouga/geometry3k/reward/best@3/std:0.192 - val-aux/hiyouga/geometry3k/reward/worst@3/mean:0.231 - val-aux/hiyouga/geometry3k/reward/worst@3/std:0.235 - val-aux/hiyouga/geometry3k/reward/mean@6:0.475 - val-aux/hiyouga/geometry3k/reward/std@6:0.437 - val-aux/hiyouga/geometry3k/reward/best@6/mean:0.958 - val-aux/hiyouga/geometry3k/reward/best@6/std:0.174 - val-aux/hiyouga/geometry3k/reward/worst@6/mean:0.105 - val-aux/hiyouga/geometry3k/reward/worst@6/std:0.061 - val-core/hiyouga/geometry3k/reward/mean@26:0.612 - val-aux/hiyouga/geometry3k/reward/std@26:0.454 - val-core/hiyouga/geometry3k/reward/best@26/mean:1.000 - val-core/hiyouga/geometry3k/reward/best@26/std:0.000 - val-aux/hiyouga/geometry3k/reward/worst@26/mean:0.012 - val-aux/hiyouga/geometry3k/reward/worst@26/std:0.032 - val-aux/hiyouga/geometry3k/reward/mean@8:0.438 - val-aux/hiyouga/geometry3k/reward/std@8:0.420 - val-aux/hiyouga/geometry3k/reward/mean@5:0.460 - val-aux/hiyouga/geometry3k/reward/std@5:0.400 - val-aux/hiyouga/geometry3k/reward/best@5/mean:0.856 - val-aux/hiyouga/geometry3k/reward/best@5/std:0.255 - val-aux/hiyouga/geometry3k/reward/worst@5/mean:0.135 - val-aux/hiyouga/geometry3k/reward/worst@5/std:0.134 - val-aux/hiyouga/geometry3k/reward/mean@9:0.300 - val-aux/hiyouga/geometry3k/reward/std@9:0.374 - val-aux/hiyouga/geometry3k/reward/best@9/mean:0.908 - val-aux/hiyouga/geometry3k/reward/best@9/std:0.272 - val-aux/hiyouga/geometry3k/reward/worst@9/mean:0.100 - val-aux/hiyouga/geometry3k/reward/worst@9/std:0.000 - val-aux/hiyouga/geometry3k/reward/mean@4:0.100 - val-aux/hiyouga/geometry3k/reward/std@4:0.000 - training/global_step:5.000 - training/epoch:1.000 - critic/score/mean:0.388 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.388 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.010 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.010 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:384.739 - response_length/max:2048.000 - response_length/min:18.000 - response_length/clip_ratio:0.007 - prompt_length/mean:257.236 - prompt_length/max:996.000 - prompt_length/min:103.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:63.336 - timing_s/reshard:2.105 - timing_s/gen:69.227 - timing_s/reward:3.572 - timing_s/old_log_prob:29.942 - timing_s/ref:29.623 - timing_s/adv:0.087 - timing_s/update_actor:94.945 - timing_s/testing:51.987 - timing_s/step:279.773 - timing_per_token_ms/gen:0.070 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.018 - timing_per_token_ms/update_actor:0.058 - perf/total_num_tokens:1643456.000 - perf/time_per_step:279.773 - perf/throughput:734.280 (TaskRunner pid=9331) step:6 - global_seqlen/min:200473.000 - global_seqlen/max:216599.000 - global_seqlen/minmax_diff:16126.000 - global_seqlen/balanced_min:207366.000 - global_seqlen/balanced_max:207367.000 - global_seqlen/mean:207366.250 - actor/entropy:0.346 - training/rollout_probs_diff_max:0.239 - training/rollout_probs_diff_mean:0.004 - training/rollout_probs_diff_std:0.011 - actor/kl_loss:0.004 - actor/kl_coef:0.010 - actor/pg_loss:0.013 - actor/pg_clipfrac:0.001 - actor/ppo_kl:-0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.257 - perf/mfu/actor:0.328 - perf/max_memory_allocated_gb:62.977 - perf/max_memory_reserved_gb:87.430 - perf/cpu_memory_used_gb:0.000 - actor/lr:0.000 - training/global_step:6.000 - training/epoch:1.000 - critic/score/mean:0.443 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.443 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.010 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.010 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:381.082 - response_length/max:2048.000 - response_length/min:22.000 - response_length/clip_ratio:0.005 - prompt_length/mean:266.938 - prompt_length/max:996.000 - prompt_length/min:102.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:60.989 - timing_s/reshard:1.788 - timing_s/gen:67.473 - timing_s/reward:3.320 - timing_s/old_log_prob:30.357 - timing_s/ref:31.241 - timing_s/adv:0.090 - timing_s/update_actor:95.860 - timing_s/step:228.698 - timing_per_token_ms/gen:0.069 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.019 - timing_per_token_ms/update_actor:0.058 - perf/total_num_tokens:1658930.000 - perf/time_per_step:228.698 - perf/throughput:906.726 (TaskRunner pid=9331) Training Progress: 12%\|█▏ \| 7/60 [28:12<3:32:05, 240.11s/it] (TaskRunner pid=9331) Training Progress: 13%\|█▎ \| 8/60 [31:55<3:23:25, 234.72s/it] (TaskRunner pid=9331) Training Progress: 15%\|█▌ \| 9/60 [35:50<3:19:30, 234.71s/it] ... ... ``` ### Usage Example ```bash bash examples/grpo_trainer/run_qwen2_5_vl-7b_seq_balance.sh ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Rely on existing unit tests on CI that covers the code path. - [ ] New CI unit test(s) are added to cover the code path.	2025-06-21 20:49:24 +08:00
H	0fd4d0ff6a	[cfg, perf] refactor: add omega_conf_to_dataclass API, rename WorkerProfiler to DistProfiler, add unit test based on ProfilerConfig (#2117 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Previously, most of individual components in verl takes omega conf dict as one of the input, making it tedious to setup unit tests. Now verl is gradually introducing dataclass for each sub module for configuration, with `verl.utils.omega_conf_to_dataclass` to make the conversion easier. This PR also provide example unit tests on how standalone classes with config as the input should be tested before using them end-to-end. Finally, this PR also renames WorkerProfiler to DistProfiler for clarity. ### Test Test cases for configuration utilities on CPU. 1. Test basic OmegaConf to dataclass conversion for simple nested structures 2. Test nested OmegaConf to dataclass conversion for complex hierarchical configurations 3. Verify all configuration values are correctly converted and accessible Test suite for NsightSystemsProfiler functionality 1. Initialization: Verify profiler state after creation 2. Basic Profiling: Test start/stop functionality 3. Discrete Mode: Test discrete profiling behavior 4. Annotation: Test the annotate decorator in both normal and discrete modes 5. Config Validation: Verify proper config initialization from OmegaConf ### Usage Example > Provide usage example(s) for easier usage. ```python def omega_conf_to_dataclass(config: Union[DictConfig, dict], dataclass_type: Type[Any]) -> Any: """ Convert an OmegaConf DictConfig to a dataclass. Args: config: The OmegaConf DictConfig or dict to convert. dataclass_type: The dataclass type to convert to. Returns: The dataclass instance. """ ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-20 11:41:08 -07:00
H	c87e91b2ef	[ci] test: inspect the type annotation of newly added code, focusing on func defs (#2113 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Per https://github.com/volcengine/verl/discussions/2112, type annotation should be encouraged to increase readability. In previous PRs, the type check script does not really take effect (either too strict or too loose). In this PR, the check is limited to only function definitions, with a default threshold. By default on CI it only inspect the files changed in the current PR. For reference, below is a glimpse of failure cases if we force it to inspect all files under `verl`. Upon failure, it prints: ``` f"Please add type annotations for inputs and outputs to meet threshold {args.threshold}. Cases exempt from checking:" "1. Private methods." "2. Args with name in ('self', 'cls'), or args / kwargs" "3. Files under tests/" ``` ``` verl/trainer/main_generation.py:44: def main(config): verl/trainer/main_generation.py:48: def run_generation(config) -> None: verl/trainer/main_generation.py:60: def main_task(config): verl/trainer/main_eval.py:33: def process_item(reward_fn, data_source, response_lst, reward_data): verl/trainer/main_eval.py:40: def main(config): verl/trainer/main_ppo.py:26: def main(config): verl/trainer/main_ppo.py:31: def run_ppo(config) -> None: verl/trainer/main_ppo.py:182: def create_rl_dataset(data_paths, data_config, tokenizer, processor): verl/trainer/main_ppo.py:224: def create_rl_sampler(data_config, dataset): verl/trainer/main_ppo.py:57: def run(self, config): verl/trainer/fsdp_sft_trainer.py:71: def extract_step(path): verl/trainer/fsdp_sft_trainer.py:549: def run_sft(config): verl/trainer/fsdp_sft_trainer.py:572: def main(config): verl/trainer/fsdp_sft_trainer.py:576: def create_sft_dataset(data_paths, data_config, tokenizer): verl/trainer/fsdp_sft_trainer.py:384: def training_step(self, batch: TensorDict): verl/trainer/fsdp_sft_trainer.py:433: def validation_step(self, batch: TensorDict): verl/trainer/fsdp_sft_trainer.py:444: def save_checkpoint(self, step): verl/trainer/fsdp_sft_trainer.py:486: def fit(self): verl/trainer/ppo/reward.py:25: def get_custom_reward_fn(config): verl/trainer/ppo/reward.py:60: def load_reward_manager(config, tokenizer, num_examine, reward_kwargs): verl/trainer/ppo/reward.py:111: def compute_reward(data: DataProto, reward_fn): verl/trainer/ppo/reward.py:133: def compute_reward_async(data: DataProto, config, tokenizer): verl/trainer/ppo/reward.py:54: def wrapped_fn(args, *kwargs): verl/trainer/ppo/ray_trainer.py:132: def apply_kl_penalty(data: DataProto, kl_ctrl: core_algos.AdaptiveKLController, kl_penalty="kl", multi_turn=Fals verl/trainer/ppo/ray_trainer.py:181: def compute_response_mask(data: DataProto): verl/trainer/ppo/ray_trainer.py:199: def compute_advantage(data: DataProto, adv_estimator, gamma=1.0, lam=1.0, num_repeat=1, multi_turn=False, norm_a verl/trainer/ppo/ray_trainer.py:89: def create_resource_pool(self): verl/trainer/ppo/ray_trainer.py:710: def init_workers(self): verl/trainer/ppo/ray_trainer.py:892: def fit(self): verl/trainer/ppo/ray_trainer.py:381: def check_mutually_exclusive(mbs, mbs_per_gpu, name: str): verl/trainer/ppo/core_algos.py:34: def register_adv_est(name_or_enum): verl/trainer/ppo/core_algos.py:53: def get_adv_estimator_fn(name_or_enum): verl/trainer/ppo/core_algos.py:116: def get_kl_controller(kl_ctrl): verl/trainer/ppo/core_algos.py:127: def compute_gae_advantage_return( verl/trainer/ppo/core_algos.py:174: def compute_grpo_outcome_advantage( verl/trainer/ppo/core_algos.py:231: def compute_grpo_passk_outcome_advantage( verl/trainer/ppo/core_algos.py:291: def compute_reinforce_plus_plus_baseline_outcome_advantage(token_level_rewards: torch.Tensor, response_mask: torch.Tensor, verl/trainer/ppo/core_algos.py:336: def compute_rloo_outcome_advantage(token_level_rewards: torch.Tensor, response_mask: torch.Tensor, index: np.ndarray, verl/trainer/ppo/core_algos.py:379: def compute_opo_outcome_advantage(token_level_rewards: torch.Tensor, response_mask: torch.Tensor, index: np.ndarray, verl/trainer/ppo/core_algos.py:426: def compute_reinforce_plus_plus_outcome_advantage(token_level_rewards: torch.Tensor, response_mask: torch.Tensor, verl/trainer/ppo/core_algos.py:463: def compute_remax_outcome_advantage(token_level_rewards: torch.Tensor, reward_baselines: torch.Tensor, response_mask: verl/trainer/ppo/core_algos.py:492: def compute_rewards(token_level_scores, old_log_prob, ref_log_prob, kl_ratio): verl/trainer/ppo/core_algos.py:497: def agg_loss(loss_mat: torch.Tensor, loss_mask: torch.Tensor, loss_agg_mode: str): verl/trainer/ppo/core_algos.py:533: def compute_policy_loss( verl/trainer/ppo/core_algos.py:599: def compute_entropy_loss(logits, response_mask, loss_agg_mode: str = "token-mean"): verl/trainer/ppo/core_algos.py:616: def compute_value_loss(vpreds: torch.Tensor, returns: torch.Tensor, values: torch.Tensor, response_mask: torch.Tensor, verl/trainer/ppo/core_algos.py:651: def kl_penalty(logprob: torch.FloatTensor, ref_logprob: torch.FloatTensor, kl_penalty) -> torch.FloatTensor: verl/trainer/ppo/core_algos.py:689: def compute_pf_ppo_reweight_data( verl/trainer/ppo/core_algos.py:43: def decorator(fn): verl/trainer/ppo/core_algos.py:99: def update(self, current_kl, n_steps): verl/trainer/ppo/core_algos.py:112: def update(self, current_kl, n_steps): verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:329: def init_cache_engine(self): verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:334: def free_cache_engine(self): verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:355: def from_engine_args( ``` ### Usage Example For current git diffs compared to `main`: ``` python3 tests/special_sanity/type_coverage_check.py ``` For inspecting all files under `verl/` ``` find verl -type f -name ".py" \| xargs -n 1 python3 tests/special_sanity/type_coverage_check.py --all-lines --debug --target-file ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-21 00:47:23 +08:00
H	92f9381ed0	[ci] test: enforce API docstring checks (#2114 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? For any function or class included in `__all__`, there must be docstring associated. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-21 00:46:35 +08:00
Yuchen Zhang	b1cdef84b5	[recipe] feat: Move entropy reward to the entropy recipe (#2118 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Move entropy reward to the entropy recipe, and kl_cov anf clip_cov to README > Add one-line overview of what this PR aims to achieve or accomplish. Reference related github issues and PRs if that help review. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: Jiacheng Chen <jackchan9345@gmail.com> Co-authored-by: H <linhaibin.eric@gmail.com>	2025-06-20 17:27:40 +08:00
Blue Space	a3498c9fa8	[rollout] fix: fix rollout key not found (#2116 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? fix rollout `multi_turn.format` key not found error. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-20 15:25:03 +08:00
OC	6642bb2eae	[rollout] fix: error in sgyang async mode (#2098 ) Fixed regression from: - https://github.com/volcengine/verl/pull/1668 - https://github.com/volcengine/verl/pull/1933 Added e2e test for both sglang and vllm async mode test	2025-06-19 19:22:57 -07:00
Yuchen Zhang	39b7250b0a	[recipe] feat: integrate entropy-mechanism recipe: Clip-Cov and KL-Cov methods (#1830 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add support for the Clip-Cov and KL-Cov methods in paper: The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models. Also add the verifier used in the paper. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. in `core_algos.py`, we add the clip-cov and kl-cov loss ``` def compute_policy_loss_clip_cov( old_log_prob, log_prob, advantages, response_mask, cliprange=None, cliprange_low=None, cliprange_high=None, loss_agg_mode="token-mean", clip_ratio=0.0002, clip_cov_lb=1.0, clip_cov_ub=5.0, ): """ Compute the clipped policy objective and related metrics for Clip-Cov. Adapted from https://github.com/PRIME-RL/Entropy-Mechanism-of-RL/blob/main/verl/trainer/ppo/core_algos.py Args: old_log_prob (torch.Tensor): Log-probabilities of actions under the old policy, shape (batch_size, response_length). log_prob (torch.Tensor): Log-probabilities of actions under the current policy, shape (batch_size, response_length). advantages (torch.Tensor): Advantage estimates for each action, shape (batch_size, response_length). response_mask (torch.Tensor): Mask indicating which tokens to include in the loss, shape (batch_size, response_length). cliprange (float, optional): Clipping parameter ε for standard PPO. See https://arxiv.org/abs/1707.06347. Defaults to None (must be provided). cliprange_low (float, optional): Lower clip range for dual-clip PPO. Defaults to same as `cliprange`. cliprange_high (float, optional): Upper clip range for dual-clip PPO. Defaults to same as `cliprange`. loss_agg_mode (str, optional): Aggregation mode for `agg_loss`. Defaults to "token-mean". clip_ratio (float, optional): Ratio for clipping the covariance. Defaults to 0.0002. clip_cov_lb (float, optional): Lower bound for clipping covariance. Defaults to 1.0. clip_cov_ub (float, optional): Upper bound for clipping covariance. Defaults to 5.0. """ assert clip_ratio > 0, "clip_ratio should be larger than 0." negative_approx_kl = log_prob - old_log_prob ratio = torch.exp(negative_approx_kl) ppo_kl = verl_F.masked_mean(-negative_approx_kl, response_mask) pg_losses1 = -advantages * ratio if cliprange_low is None: cliprange_low = cliprange if cliprange_high is None: cliprange_high = cliprange corr = torch.ones_like(advantages) pg_losses2 = -advantages * torch.clamp(ratio, 1 - cliprange_low, 1 + cliprange_high) clip_by_origin = (pg_losses2 > pg_losses1) & (response_mask > 0) cov_all = (advantages- verl_F.masked_mean(advantages, response_mask)) * (log_prob- verl_F.masked_mean(log_prob.detach(), response_mask)) cov_all[response_mask == 0] = -torch.inf cov_all[clip_by_origin] = -torch.inf clip_num = max(int(clip_ratio * response_mask.sum().item()), 1) top_k_idx = (cov_all < clip_cov_ub) & (cov_all > clip_cov_lb) & (response_mask > 0) top_k_idx = torch.nonzero(top_k_idx) if len(top_k_idx) > 0: perm = torch.randperm(len(top_k_idx)) top_k_idx = top_k_idx[perm[:min(clip_num, len(top_k_idx))]] else: top_k_idx = torch.empty((0, 2), device=cov_all.device, dtype=torch.long) corr[top_k_idx[:, 0], top_k_idx[:, 1]] = 0 pg_clipfrac = verl_F.masked_mean((corr==0).float(), response_mask) pg_losses = torch.maximum(pg_losses1, pg_losses2) * corr pg_loss = agg_loss(loss_mat=pg_losses, loss_mask=response_mask, loss_agg_mode=loss_agg_mode) return pg_loss, pg_clipfrac, ppo_kl, torch.tensor(0.) def compute_policy_loss_kl_cov( old_log_prob, log_prob, advantages, response_mask, loss_agg_mode="token-mean", k_ratio=0.0002, ppo_kl_coef=1, ): """ Compute the clipped policy objective and related metrics for Clip-Cov. Adapted from https://github.com/PRIME-RL/Entropy-Mechanism-of-RL/blob/main/verl/trainer/ppo/core_algos.py Args: old_log_prob (torch.Tensor): Log-probabilities of actions under the old policy, shape (batch_size, response_length). log_prob (torch.Tensor): Log-probabilities of actions under the current policy, shape (batch_size, response_length). advantages (torch.Tensor): Advantage estimates for each action, shape (batch_size, response_length). response_mask (torch.Tensor): Mask indicating which tokens to include in the loss, shape (batch_size, response_length). loss_agg_mode (str, optional): Aggregation mode for `agg_loss`. Defaults to "token-mean". k_ratio (float, optional): Ratio for selecting the top-k covariance values. Defaults to 0.0002. ppo_kl_coef (float, optional): Coefficient for the KL penalty term in the loss. Defaults to 1. """ assert k_ratio > 0, "k_ratio should be larger than 0." negative_approx_kl = log_prob - old_log_prob abs_kl = negative_approx_kl.abs() ratio = torch.exp(negative_approx_kl) ppo_kl_abs = verl_F.masked_mean(negative_approx_kl.abs(), response_mask) pg_losses1 = -advantages * ratio pg_losses_kl = - advantages * ratio + ppo_kl_coef * abs_kl pg_losses = pg_losses1 all_valid = (response_mask > 0) all_valid_idx = torch.nonzero(all_valid.reshape(-1), as_tuple=True)[0] all_valid_adv = advantages[all_valid].detach().reshape(-1).cpu() all_valid_logp = log_prob[all_valid].detach().reshape(-1).cpu() k = min(k_ratio, len(all_valid_adv)) if k != 0: cov_lst_all = (all_valid_adv - all_valid_adv.mean()) * (all_valid_logp - all_valid_logp.mean()) k_percent_nums = max(1, int(len(cov_lst_all) * k_ratio)) large_cov_idxs = torch.topk(cov_lst_all, k_percent_nums, largest=True).indices if len(large_cov_idxs) != 0: large_cov_idxs = all_valid_idx[large_cov_idxs] pg_losses[large_cov_idxs // advantages.shape[1], large_cov_idxs % advantages.shape[1]] = pg_losses_kl[large_cov_idxs // advantages.shape[1], large_cov_idxs % advantages.shape[1]] pg_loss = agg_loss(loss_mat=pg_losses, loss_mask=response_mask, loss_agg_mode=loss_agg_mode) return pg_loss, torch.tensor(0.), ppo_kl_abs, torch.tensor(0.) ``` in the `dp_actor.py`, we add the loss mode switch feature: ``` loss_mode = self.config.get("loss_mode", "vanilla") if loss_mode not in ["vanilla", "clip_cov", "kl_cov"]: raise ValueError(f"Unsupported loss mode: {loss_mode}. Supported modes are: 'vanilla', 'clip_cov', 'kl_cov'.") if loss_mode == "vanilla": pg_loss, pg_clipfrac, ppo_kl, pg_clipfrac_lower = compute_policy_loss( old_log_prob=old_log_prob, log_prob=log_prob, advantages=advantages, response_mask=response_mask, cliprange=clip_ratio, cliprange_low=clip_ratio_low, cliprange_high=clip_ratio_high, clip_ratio_c=clip_ratio_c, loss_agg_mode=loss_agg_mode, ) elif loss_mode == "clip_cov": pg_loss, pg_clipfrac, ppo_kl, pg_clipfrac_lower= compute_policy_loss_clip_cov( old_log_prob=old_log_prob, log_prob=log_prob, advantages=advantages, response_mask=response_mask, cliprange=clip_ratio, cliprange_low=clip_ratio_low, cliprange_high=clip_ratio_high, loss_agg_mode=loss_agg_mode, clip_ratio=self.config.clip_cov_ratio, clip_cov_lb=self.config.clip_cov_lb, clip_cov_ub=self.config.clip_cov_ub, ) elif loss_mode == "kl_cov": pg_loss, pg_clipfrac, ppo_kl, pg_clipfrac_lower= compute_policy_loss_kl_cov( old_log_prob=old_log_prob, log_prob=log_prob, advantages=advantages, response_mask=response_mask, loss_agg_mode=loss_agg_mode, k_ratio=self.config.k_ratio, ppo_kl_coef=self.config.ppo_kl_coef, ) ``` ### Usage Example > Provide usage example(s) for easier usage. We create a recipe (built on dapo recipe) named entropy to store our scripts, for example the `7b_kl_cov.sh`: ``` #!/usr/bin/env bash set -xeuo pipefail export WANDB_API_KEY=YOUR_WANDB_API_KEY # export VLLM_USE_V1=1 project_name='Qwen2.5-7B' exp_name='klcov' adv_estimator=grpo use_kl_in_reward=False kl_coef=0.0 use_kl_loss=False kl_loss_coef=0.0 clip_ratio_low=0.2 clip_ratio_high=0.2 max_prompt_length=$((1024 * 2)) max_response_length=$((1024 * 8)) enable_overlong_buffer=False overlong_buffer_len=$((1024 * 2)) overlong_penalty_factor=1.0 loss_agg_mode="token-mean" loss_mode="kl_cov" enable_filter_groups=False filter_groups_metric=acc max_num_gen_batches=10 train_prompt_bsz=256 gen_prompt_bsz=$((train_prompt_bsz * 3)) train_prompt_mini_bsz=256 n_resp_per_prompt=8 max_token=20480 # Ray RAY_ADDRESS=${RAY_ADDRESS:-"http://localhost:8265"} WORKING_DIR=${WORKING_DIR:-"${PWD}"} RUNTIME_ENV=${RUNTIME_ENV:-"${WORKING_DIR}/verl/trainer/runtime_env.yaml"} NNODES=${NNODES:-4} # Paths RAY_DATA_HOME=${RAY_DATA_HOME:-"${HOME}/verl"} MODEL_PATH=${MODEL_PATH:-"/YOUR_MODELPATH"} CKPTS_DIR=${CKPTS_DIR:-"/YOUR_CKPTS_PATH"} TRAIN_FILE=${TRAIN_FILE:-"/YOUR_TRAIN_FILE_PATH"} TEST_FILE=${TEST_FILE:-["/YOUR_TRAIN_FILE_PATH"]} # Algorithm temperature=1.0 top_p=1.0 top_k=-1 # 0 for HF rollout, -1 for vLLM rollout ppo_kl_coef=1 k_ratio=0.002 # Mathematically equivalent use_dynamic_bsz=True infer_micro_batch_size=null train_micro_batch_size=null offload=False HYDRA_FULL_ERROR=1 python -m recipe.entropy.main_entropy \ data.train_files="${TRAIN_FILE}" \ data.val_files="${TEST_FILE}" \ data.prompt_key=prompt \ data.truncation='left' \ data.filter_overlong_prompts=False \ data.max_prompt_length=${max_prompt_length} \ data.max_response_length=${max_response_length} \ data.gen_batch_size=${gen_prompt_bsz} \ data.train_batch_size=${train_prompt_bsz} \ data.return_raw_chat=True \ actor_rollout_ref.rollout.n=${n_resp_per_prompt} \ actor_rollout_ref.actor.use_kl_loss=${use_kl_loss} \ actor_rollout_ref.actor.kl_loss_coef=${kl_loss_coef} \ actor_rollout_ref.actor.clip_ratio_low=${clip_ratio_low} \ actor_rollout_ref.actor.clip_ratio_high=${clip_ratio_high} \ actor_rollout_ref.actor.clip_ratio_c=10.0 \ actor_rollout_ref.actor.loss_mode=${loss_mode} \ actor_rollout_ref.actor.k_ratio=${k_ratio} \ actor_rollout_ref.actor.ppo_kl_coef=${ppo_kl_coef} \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \ actor_rollout_ref.rollout.mode=sync \ algorithm.adv_estimator=${adv_estimator} \ algorithm.use_kl_in_reward=${use_kl_in_reward} \ algorithm.kl_ctrl.kl_coef=${kl_coef} \ algorithm.filter_groups.enable=${enable_filter_groups} \ algorithm.filter_groups.metric=${filter_groups_metric} \ algorithm.filter_groups.max_num_gen_batches=${max_num_gen_batches} \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.use_dynamic_bsz=${use_dynamic_bsz} \ actor_rollout_ref.ref.log_prob_use_dynamic_bsz=${use_dynamic_bsz} \ actor_rollout_ref.rollout.log_prob_use_dynamic_bsz=${use_dynamic_bsz} \ actor_rollout_ref.actor.ppo_max_token_len_per_gpu=${max_token} \ actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=${max_token} \ actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=${max_token} \ actor_rollout_ref.model.path="${MODEL_PATH}" \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.actor.optim.weight_decay=0 \ actor_rollout_ref.actor.optim.warmup_style=constant \ actor_rollout_ref.actor.ppo_mini_batch_size=${train_prompt_mini_bsz} \ actor_rollout_ref.actor.ppo_micro_batch_size=${train_micro_batch_size} \ actor_rollout_ref.actor.fsdp_config.param_offload=${offload} \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=${offload} \ actor_rollout_ref.actor.entropy_coeff=0 \ actor_rollout_ref.actor.grad_clip=1.0 \ actor_rollout_ref.actor.loss_agg_mode=${loss_agg_mode} \ actor_rollout_ref.actor.ulysses_sequence_parallel_size=1 \ actor_rollout_ref.rollout.gpu_memory_utilization=0.85 \ actor_rollout_ref.rollout.log_prob_micro_batch_size=${infer_micro_batch_size} \ actor_rollout_ref.rollout.tensor_model_parallel_size=2 \ actor_rollout_ref.rollout.enable_chunked_prefill=True \ actor_rollout_ref.rollout.max_num_batched_tokens=${max_token} \ actor_rollout_ref.rollout.temperature=${temperature} \ actor_rollout_ref.rollout.top_p=${top_p} \ actor_rollout_ref.rollout.top_k="${top_k}" \ actor_rollout_ref.rollout.val_kwargs.temperature=${temperature} \ actor_rollout_ref.rollout.val_kwargs.top_p=${top_p} \ actor_rollout_ref.rollout.val_kwargs.top_k=${top_k} \ actor_rollout_ref.rollout.val_kwargs.do_sample=False \ actor_rollout_ref.rollout.val_kwargs.n=1 \ actor_rollout_ref.ref.log_prob_micro_batch_size=${infer_micro_batch_size} \ actor_rollout_ref.ref.fsdp_config.param_offload=${offload} \ actor_rollout_ref.ref.ulysses_sequence_parallel_size=1 \ actor_rollout_ref.actor.fsdp_config.fsdp_size=-1 \ reward_model.reward_manager=dapo \ reward_model.overlong_buffer.enable=${enable_overlong_buffer} \ reward_model.overlong_buffer.len=${overlong_buffer_len} \ reward_model.overlong_buffer.penalty_factor=${overlong_penalty_factor} \ trainer.logger=['console','wandb'] \ trainer.project_name="${project_name}" \ trainer.experiment_name="${exp_name}" \ trainer.n_gpus_per_node=8 \ trainer.nnodes="${NNODES}" \ trainer.val_before_train=False \ trainer.test_freq=4 \ trainer.save_freq=32 \ trainer.total_epochs=1000 \ trainer.default_local_dir="${CKPTS_DIR}" \ trainer.resume_mode=disable ``` ### Test Please refer to the Fig 11 and Tab 2 in https://arxiv.org/pdf/2505.22617 for detailed results. ### Additional Info. NA ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Co-authored-by: Jiacheng Chen <jackchan9345@gmail.com> Co-authored-by: H <linhaibin.eric@gmail.com>	2025-06-19 15:08:43 -07:00
Stefan He	ba908710ff	[doc] fix: s/Linkedin/LinkedIn (#2111 ) ### Checklist Before Starting as titled - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Use formal name of LinkedIn > Add one-line overview of what this PR aims to achieve or accomplish. Reference related github issues and PRs if that help review. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-19 12:35:32 -07:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	18c2825c53	[trainer] fix: make `reward_extra_info` optional in `reward_result` (#2109 ) ### Checklist Before Starting - [X] Searched for similar PR(s). - [X] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Fix the error message: `Error in reward_fn: reward_extra_info`, as for some reward function implementation, only `reward_tensor` is included in the returned dictionary. - `b401382405/verl/workers/reward_manager/prime.py (L176)` - `b401382405/examples/split_placement/main_ppo_split.py (L88)` ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [X] New CI unit test(s) are added to cover the code path. - [X] Rely on existing unit tests on CI that covers the code path. Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-06-20 01:44:25 +08:00
Alec Henx	b401382405	[tool] feat: Add Search Tool implemented with MCP (#1948 ) 1. MCP client manager which manages the connection with MCP server, such as session multiplexing, rate limit. 2. Search Tool with MCP client and [Tavily](https://app.tavily.com/home) MCP server, which delivers the same capability with Search R1 Tool. 3. A general MCP tool base for handling the logic of executing. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. 1. Register a [Tavily](https://app.tavily.com/home) account 2. Edit the `mcp_server.json` file by replacing `url` and `auth_token`. Surely, you can use your own MCP server according to the instructions provided by [FastMCP](https://gofastmcp.com/clients/transports#configuration-based-transports) (supporting SSEServer, stdioServer and streamHTTP) 3. Configure the `mcp_tool_config.yaml` file: - `mcp_server_config_path` should point to the JSON file from step 2 - `tool_selected_list` specifies the tools you need to register from the MCP server 4. (Optional) Implement a concrete instance based on `MCPBaseTool` to parse the results returned by the server Details are listed in [tutorial](https://github.com/AlecHenx/ml-recipe/blob/main/Tutorial%20for%20MCP%20Tool%20in%20veRL.md) ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes part of issue #1837 - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-19 22:41:14 +08:00
Shawn/Yuxuan Tong	f9a7cf3049	[doc] fix: DAPO branch & doc (#2104 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? This PR fixes the broken link for DAPO branch and add some details to the doc. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that cover the code path.	2025-06-19 19:44:54 +08:00
xichengpro	ccefcf05ca	[doc] fix: Fix mismatched config description for `ppo_epochs` in critic (#2102 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > Fix mismatched config description for `ppo_epochs` in critic ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ![image](https://github.com/user-attachments/assets/72df0d9a-3ac8-418c-b1c0-aa6e6daaccfd) > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-19 18:19:31 +08:00
Maozhou Ge	42f612dc15	[rollout] refactor: Add option for rollout_log_probs, and default as `False` (#2072 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > As discussed in https://github.com/volcengine/verl/pull/1712, we may want to minimize communication cost on large clusters, add an option for it and default as `False` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>	2025-06-19 15:16:47 +08:00
Zhen	0077f3e38f	[ci] feat: Add CI for checking irregular device api usage (#2089 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Add CI for checking irregular device api usage, suggest using api in `verl/utils/device.py` to get device name or object. Besides, this CI test case is friendly for non-linux system (e.g. windows), which is easier to debug and find out the problem. ### Test Not related. ### High-Level Design Not related. ### Specific Changes Add a new CI test case for checking irregular device api usage, suggest using api in `verl/utils/device.py`. ### API Not related. ### Usage Example ```shell python tests\special_sanity\check_device_api_usage.py --directory ./recipe` [CHECK] File D:\workspace\verl\recipe\char_count\create_dataset.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\char_count\reward_function.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\dapo\dapo_ray_trainer.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\dapo\main_dapo.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\prime\main_prime.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\prime\prime_core_algos.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\prime\prime_dp_rm.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\prime\prime_fsdp_workers.py is detected for device api usage check, check result: success. [SKIP] File D:\workspace\verl\recipe\prime\prime_ray_trainer.py is in device api usage check whitelist, checking is skipped. [CHECK] File D:\workspace\verl\recipe\prime\__init__.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\r1\data_process.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\r1\main_eval.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\r1\reward_score.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\r1\__init__.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\r1\tasks\gpqa.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\r1\tasks\livecodebench.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\r1\tasks\math.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\r1\tasks\__init__.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\retool\retool_multi_turn_sft_preprocess.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\spin\core_algos.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\spin\dp_actor.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\spin\fsdp_workers.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\spin\main_spin.py is detected for device api usage check, check result: success. [SKIP] File D:\workspace\verl\recipe\spin\spin_trainer.py is in device api usage check whitelist, checking is skipped. [CHECK] File D:\workspace\verl\recipe\sppo\dp_actor.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\sppo\main_sppo.py is detected for device api usage check, check result: success. [SKIP] File D:\workspace\verl\recipe\sppo\sppo_ray_trainer.py is in device api usage check whitelist, checking is skipped. [CHECK] File D:\workspace\verl\recipe\sppo\sppo_worker.py is detected for device api usage check, check result: success. [CHECK] File D:\workspace\verl\recipe\sppo\__init__.py is detected for device api usage check, check result: success. ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-19 10:38:09 +08:00
Chi Zhang	a44b83c1a5	[misc] feat: update instruction for running dapo on qwen2.5 7b math and add reference wandb (#2094 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? - As title ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-18 19:16:14 -07:00
H	83cb13ad53	[recipe, doc] fix: fix dapo branch name (#2090 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? As title	2025-06-19 09:35:05 +08:00
Yuyang Ding	7dc3ee7476	[vllm] fix: mv disable_mm_preprocessor_cache to vllm engine_kwargs (#2068 ) All scripts using LLM (Non-VLM + vllm rollout backend) break (Error details can be found at issue https://github.com/volcengine/verl/issues/1923, also mentioned in PR https://github.com/volcengine/verl/pull/1900) This error currently occurs in vllm>=0.9.0). The reason is that `disable_mm_preprocessor_cache=True` only works for VLM, and will cause errors for non-VLM models. It appears that the default value in vllm is `False` and it's recommended to be set to False, even for VLM, according to official guidelines below: `ca94d7fa00/vllm/config.py (L380C5-L382)` Therefore, it's would be better to set `disable_mm_preprocessor_cache` to `False` here.	2025-06-18 22:43:46 +08:00
Jiaming Huang	ed9cec8081	[megatron] fix: fix qwen2_vl on plain-text data and mix data of plain-text and image-text (#1999 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? fix qwen2_vl on plain-text data and mix data of plain-text and image-text, refer to https://github.com/volcengine/verl/pull/1286 ### Test test on gsm8k dataset and mix data of gsm8k and geo3k. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-18 22:42:50 +08:00
thelongestusernameofall	9466d371ee	[doc] chore: (baseline.md)Add scripts and logs for performance testing of GRPO-LoRA. (#2083 )	2025-06-18 21:59:05 +08:00
Yam(长琴)	d815db5ad8	[trainer] fix: Fix trainer config for `val_only` (#2084 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? fix: val_only not in trainer structure ### Test no need. ### High-Level Design no need. ### Specific Changes - verl/trainer/config/ppo_trainer.yaml ### API no need. ### Usage Example > For eval only ```python python3 -m verl.trainer.main_ppo \ ... trainer.val_before_train=True \ trainer.val_only=True \ ... ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).	2025-06-18 19:34:32 +08:00
Geaming	5d54876b48	[training_utils] feat: Add project and experiment name to tensorboard log path (#2080 ) By adding project name and experiment name to the log path, avoid all tensorboard logs being mixed in the same folder, improving log management clarity. ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Reference related github issues and PRs if that help review. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-18 15:49:02 +08:00
Shawn/Yuxuan Tong	e48421160b	[doc] feat: update DAPO doc (#2081 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that cover the code path.	2025-06-18 15:47:27 +08:00
Jiarui Fang（方佳瑞）	4c2ea9aa21	[sglang] fix: AsyncSglangServer use async wake_up/sleep (#2062 ) ### Checklist Before Starting - [X] Searched for similar PR(s). - [X] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Correctly implement async wake_up and sleep for AsyncSglangServer. They are called in await manner by ActorRolloutRefWorker. > Add one-line overview of what this PR aims to achieve or accomplish. Reference related github issues and PRs if that help review. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-18 12:00:52 +08:00
H	34342365e6	[doc] test: ensure new docs are included in TOC tree (#2070 ) ### Checklist Before Starting - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` - [x] Searched for similar PR(s). ### What does this PR do? Add docs to the ToC tree of the documentation website. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-17 20:27:35 -07:00
Lumeng Wu	992ac065a1	[data] fix: multimodal overlong prompt length filtering (#2063 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Prompt length filtering should utilize the processor when handling multimodal inputs. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-18 03:38:13 +08:00
Liwei Ma	e48292f698	[perf] feat: Add verl profiling support from Nvidia Nsight System (#1820 ) Add verl profiling support from Nvidia Nsight System ### Checklist Before Starting - [X] Search for similar PR(s). ### What does this PR do? Add verl profiling support from Nvidia Nsight System ### High-Level Design This PR add config fileds to trigger Nsight profiling. If `trainer.profile_steps` is set, Nsight system will be triggered to profiling the corresponding steps. In each task role, other config fields control also control the profiling details. The profiling tasks include the single_controller process and the worker process. Single_controller process uses the re-designed `marked_timer` to record each task range in NVTX. The worker processes dumps the GPU execution details. Since veRL has hybrid-engine mode and supports split mode, there are two profiling modes, discrete or not. Discrete mode means each task will generate a dedicate database; otherwise a whole giant database will be generated. Nsight system supports to import and align multiple databases automatically. ### Specific Changes `verl.utils.debug.profile` add general profling interface and `verl.utils.debug.nvtx_profile` implements the interface. ### API `verl.utils.debug.performance._timer` has been changed to `simple_timer`, and `marked_timer` is added to support profiler range marker. `verl.utils.debug.profile` wrappers the basic profiler interfaces, including mark__range, mark_annotate, ProfilerConfig, WorkerProfiler, and WorkerProfilerExtension. `verl.utils.debug.nvtx_profile` implements the interfaces when nvtx is available. ### Usage Example Two examples are added in `/examples/ppo_trainer/run_deepseek_math_gsm8k_megatron_nsys.sh` `/examples/ppo_trainer/run_qwen2-7b_rm_seq_balance_nsys.sh` ### Test There should be no functional changes and performance changes. ### Additional Info. - Training: both FSDP, Megatron will be affected. - Inference*: both vLLM, SGLang will be affected. ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [X] Add CI test(s) if necessary.	2025-06-17 11:05:16 -07:00
Maozhou Ge	8e9e73723f	[Bug] fix `None` check in DataProto print_size() (#2067 )	2025-06-17 23:18:27 +08:00
Lancer	e83215a854	[trainer] chore: Reducing the number of calls to the write (#2043 ) ### Checklist Before Starting Search for similar PR(s). ### What does this PR do? All entries are first concatenated into a single large string, then written to the file in one operation ### Test Hardware Overview: Model Name: MacBook Pro Model Identifier: MacBookPro15,2 Processor Name: Quad-Core Intel Core i5 Processor Speed: 2.3 GHz Number of Processors: 1 Total Number of Cores: 4 L2 Cache (per Core): 256 KB L3 Cache: 6 MB Hyper-Threading Technology: Enabled Memory: 16 GB System Firmware Version: 2022.100.22.0.0 (iBridge: 21.16.4222.0.0,0) OS Loader Version: 580~1678 Activation Lock Status: Disabled <img width="931" alt="截屏2025-06-16 17 59 53" src="https://github.com/user-attachments/assets/66dbf3cf-e3f6-45a1-8a27-6003b96b7116" /> Co-authored-by: Lancer <maruixiang6688@gmail.com>	2025-06-17 20:04:16 +08:00
Cheetah	0333f8dafc	[hardware] feat: support qwen2_5_vl on ASCEND NPU (#1924 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? support vLMs on ASCEND NPU ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-17 19:51:06 +08:00
William Zeng	83ebd007e0	[doc] fix: Fix typo for `trainer.resume_mode` (#2054 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? `default_local_dir` is used, not `default_hdfs_dir`: `7737bf06e5/verl/trainer/ppo/ray_trainer.py (L818-L825)` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-17 11:29:35 +08:00
Kai Chen	7737bf06e5	[Doc] Update "Awesome work using verl" Section in README.md (#2045 )	2025-06-16 22:31:25 +08:00
zhihe-wang	a50000fa25	fix: TensorDict usage error (#2046 )	2025-06-16 22:30:49 +08:00
H	cfc5ff2452	[ci] fix: add tests for vllm (#2036 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Fix the failing vllm test ### Test Added one more test to make sure problematic tool class should fail during initialization ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: wuxibin <wuxibin@bytedance.com>	2025-06-16 18:27:28 +08:00
ChangyueLiao	fe8bb0d259	[CI] feat: update npu image to vLLM-ascend-v0.7.3.post1 (#2035 ) ### Checklist Before Starting [done] Search for similar PR(s). ### What does this PR do? Version of vLLM-ascend upgraded to v0.7.3.post1 to support multimodal PRs. ### Specific Changes Change .github/workflows/e2e_ascend.yml ### Checklist Before Submitting [ done ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). [ done ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). Co-authored-by: liaochangyue <liaochangyue@bytedance.com>	2025-06-16 13:27:14 +08:00
杨睿	615f5f1461	[megatron] fix: dpskv3 convert src and dst mixed up bug (#2029 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? - fix DeepseekV3 convert bug introduced from https://github.com/volcengine/verl/pull/1995 which mixed up the `src` and `dst` parameters of function `safe_copy`. appologize for my mistake ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-16 10:28:15 +08:00
Chi Zhang	38d9a88170	[misc] fix: fix format (#2023 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Reference related github issues and PRs if that help review. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-14 22:58:06 +08:00
Chi Zhang	27bd30dd3c	[trainer] fix: fix sft max_position_embeddings (#2019 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Reference related github issues and PRs if that help review. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-14 22:40:06 +08:00
Zhen	ca65c363fb	[hardware] refactor: refactor part of device management (#1974 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [x] In format of: [modules] type: Title - [x] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [x] type is in `feat, fix, refactor, chore` - [x] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Refactor device management such as `torch.cuda` and `nccl` in most part of code in `verl/recipe` and `verl/verl`, which is more convinent for supporting other devices or platforms. ### Test Not related. ### High-Level Design Not related. ### Specific Changes 1. use `get_torch_device()` to get corresponding `torch.device()` object based on specific device. 2. use `get_device_id()` to get corresponding device rank index based on specific device. 3. use `get_nccl_backend()` to get corresponding nccl backend based on specific device. ### API Not related. ### Usage Example Monifications in this PR should not be perceived. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-14 20:53:47 +08:00
ShareLer	d50c6cd66e	[fsdp] fix: position_ids in qwen-vl (#1947 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt` - [ ] type is in `feat, fix, doc, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp] feat: xxx` ### What does this PR do? Fix two issues releated to position_ids for qwen2_VL/qwen2.5_VL: (1) Create processor with use_fast=True lead to use `Qwen2VLImageProcessorFast`, however, when determining whether to handle 3D position ids, the Qwen2VLImageProcessor was still used. (2) And 3D position is not considered in ulysses_pad. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path. --------- Signed-off-by: ShareLer <ShareLe@163.com> Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>	2025-06-14 20:50:00 +08:00
Chi Zhang	ae75bb6af6	[data] fix: fix retool sft data source (#2018 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Reference related github issues and PRs if that help review. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-14 15:12:13 +08:00
Shawn/Yuxuan Tong	6e15bbe258	[algo] fix: `vf_loss` factor (#2016 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? - Fix `vf_loss` factor: `ae528e06e9 (diff-af3da2c60785abde478f7bb68c303cd20e044e8af1b1ae93a2698f5b8fd5ed63R646-R647)` - Fix `core_algos.__all__`: ```diff - __all__ = ["register", "get_adv_estimator_fn", "AdvantageEstimator"] + __all__ = ["register_adv_est", "get_adv_estimator_fn", "AdvantageEstimator"] ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-14 14:22:46 +08:00
Shawn/Yuxuan Tong	c3ffce26d1	[ci] feat: pre-commit check all the files by default (#2017 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? We found that most files have fixed the linting errors, so it might be the time to check all the files by default. This PR 1. fixes the remaining linting errors (4409ad0070aa11027e13e26c469d46c63cdab7fb) 2. sets the pre-commit to check all the files by default (4c30c2bb99ffec50b038c2a7ff34e28062d7a168) > [!NOTE] > About merging / rebasing overhead > Similar to the previous, contributors only need to merge / rebase the files they have changed, so the overhead should be acceptable. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-14 14:22:17 +08:00
H	e2ffa1c871	[ci] chore: add code owners (#2000 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? #### initial codeowner list - Added codeowners to a small subset of verl namespaces. Some are left unassigned for now and we may add them in the future. - The code owners must demonstrate long-term commitments to the project, sufficient past contribution to the assigned module, and owner list may change if commitment changes - we yet need to have better file/folder separation for vlm specific changes #### Test structure enforcement Let the test folder structure mirror the subfolders under `verl`. Below is an example failure: ``` ❌ Test layout violations found: - tests/non_existent_namespace/test_xx.py: must be inside one of ['models', 'single_controller', 'special_distributed', 'special_e2e', 'special_sanity', 'special_standalone', 'third_party', 'tools', 'trainer', 'utils', 'version', 'workers'] (not at tests root) Guideline: Place each test file under tests/<module_name>/… where <module_name> is one of the top-level packages inside 'verl', or is explicitly listed via --allow-dirs. ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-14 10:33:45 +08:00
lxg2015	6681e25ff4	[ckpt] fix: run converter_hf_to_mcore with --test will raise an AttributeError (#2010 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > when I converter hf ckpt to mcore with --test, an AttributeError raised , this PR will fixed it ```sh [rank0]: File "verl/scripts/converter_hf_to_mcore.py", line 305, in convert_hf_to_mcore [rank0]: test_conversion(megatron_model_provider, tfconfig, output_path, model) [rank0]: File "verl/scripts/converter_hf_to_mcore.py", line 78, in test_conversion [rank0]: assert dut_data.shape == ref_state_dict.shape, f"{name=} {dut_data.shape=} {ref_data.shape=}" [rank0]: AttributeError: 'dict' object has no attribute 'shape' ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: lixiaoguang12 <lixiaoguang12@meituan.com> Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>	2025-06-14 00:45:24 +08:00
syo093c	2c85b43299	Stabilize loss calculations by clamping KL divergence values (#1779 ) ## Stabilize PPO Loss Calculations by Clamping KL Divergence Values ### Summary This PR improves the numerical stability of PPO training in `verl` by clamping KL divergence-related values in the loss calculations. Specifically: - In `compute_policy_loss`, the `negative_approx_kl` value is now clamped to the range $[-10, 10]$ before exponentiation and further use. - In `kl_penalty` (for the `"low_var_kl"` mode), the KL value is also clamped to $[-10, 10]$ before further calculations. ### Motivation During PPO training, extreme log-probability differences can occasionally occur, leading to numerical instabilities or exploding/vanishing gradients. By clamping these values, we ensure more stable and reliable training dynamics, especially in edge cases. ### Changes - Added `torch.clamp` to `negative_approx_kl` in `compute_policy_loss`. - Added `torch.clamp` to KL values in `kl_penalty` for `"low_var_kl"` mode. - Both are clamped to the range $[-10, 10]$. ### Related Issues #891 #721 --------- Co-authored-by: syo <syo@jupiter.local>	2025-06-13 23:43:09 +08:00
杨睿	ffeaed8c41	[megatron] feat: robust and efficient mcore converter with meta device init and numel check for dpsk (#1995 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? - `DeepseekV3` is too large to load and init weights, as `meta device` is a better approach. - accumulate numel to check if model weight is not missed ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-13 23:32:17 +08:00
Dong Lyu	2441d533fa	[megatron] fix: multiple key error when trying to override megatron tr… (#1990 ) fix `TypeError: verl.models.mcore.config_converter._get_mla_transformer_config() got multiple values for keyword argument ` when user trying to override megatron config ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Reference related github issues and PRs if that help review. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: BlueSpace <gaoziyuan19@mails.ucas.ac.cn>	2025-06-13 23:30:55 +08:00
htc070011	a90f2d8793	[tests] chore: ppo workflow runs on volcengine machine learning platform (#1979 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [x] In format of: [modules] type: Title - [x] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [x] type is in `feat, fix, refactor, chore` - [x] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Currently, GPU-related CI jobs in the verl repository have long execution times, which is not agile-development friendly. To address this issue, we're introducing dynamic runners in the CI workflow. These runners operate under the dedicated account for verl CI tasks on the VolcanoEngine Machine Learning Platform, alleviating GPU resource constraints in our CI pipeline. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design This PR serves as a prototype. After merging, we'll monitor its performance improvement and plan migration for other workflows accordingly. ### Specific Changes The workflow configuration requires the following adaptations to support dynamic runners: Remove container configuration in jobs and add an IMAGE environment variable to specify the job execution environment Add setup and clean jobs for runner registration and cleanup, with proper job dependency configuration ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-13 17:21:23 +08:00
none0663	8af15da77d	[megatron] feat: Config NCCL Timeout for Megatron Backend Model Loading (#1983 ) ### Checklist Before Starting - [x] Searched for similar PR(s). ### What does this PR do? > This merge request addresses an issue encountered when using Megatron as the backend for loading models with `load_state_dict_to_megatron_gptmodel`. Specifically, when loading 32B or larger models on 64 or more GPUs, it is common to exceed the default NCCL timeout of 10 minutes(default 10 mins for [torch.distributed.init_process_group("nccl")](https://docs.pytorch.org/docs/stable/distributed.html), leading to errors during the[ dist.barrier() ](`a1a152ee4a/verl/models/mcore/loader.py (L463)`)call. `a1a152ee4a/verl/models/mcore/loader.py (L360)` `a1a152ee4a/verl/models/mcore/loader.py (L463)` To mitigate this issue, this PR introduces a configuration option to increase the NCCL timeout. This enhancement allows users to easily adjust the timeout duration when encountering errors, improving the robustness of model loading in distributed settings. Thank you for considering this change!	2025-06-13 14:52:27 +08:00
Blue Space	cfa1750eb4	[ci] feat: assignment type annotation except for assignment (#2007 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - type is in `feat, fix, refactor, chore` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Type checking seems to be too strict. ```py a = 0 data = data.to("cpu") ``` seems have no need for annotation. Assignment only print warnings. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-13 14:50:48 +08:00
H	9ec260be23	[ci] chore: add type annotation coverage check (#1935 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt` - [ ] type is in `feat, fix, doc, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp] feat: xxx` ### What does this PR do? See https://github.com/volcengine/verl/issues/1936 for details. Need to first wait for RFC to pass. ### High-Level Design Please see RFC for details ### Specific Changes ### Usage Example ```bash python3 type_coverage_check.py ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-13 08:00:17 +08:00
H	0de4982168	[ci] chore: add documentation coverage test (#2004 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Added a test that asserts every function and class imported by a target file verl/trainer/ppo/ray_trainer.py. We may extend this test further for all frequently inspected/extended modules. Example error msg: ``` Docstring verification failed: • /opt/tiger/open_verl/verl/trainer/ppo/ray_trainer.py:58 - function `verl.utils.seqlen_balancing.log_seqlen_unbalance` is missing a docstring. Traceback (most recent call last): File "/opt/tiger/open_verl/tests/special_sanity/validate_imported_docs.py", line 136, in <module> main() File "/opt/tiger/open_verl/tests/special_sanity/validate_imported_docs.py", line 129, in main raise Exception("❌ Docstring verification failed.") Exception: ❌ Docstring verification failed. ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-13 07:59:42 +08:00
Blue Space	8a247f7dca	[doc] fix: revert previous ray cluster description (#1998 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? [doc] fix: revert previous ray cluster description ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-12 09:29:10 -07:00
thelongestusernameofall	49b08e9509	[doc] chore: Add GRPO-LoRA Training Resource & Batch Size Tests (#1985 ) ### Checklist Before Starting - [ y] Searched for similar PR(s). - [ y] Checked PR Title format - [ y] In format of: [modules] type: Title - [ y] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ y] type is in `feat, fix, refactor, chore` - [ y] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Reference related github issues and PRs if that help review. 1. Add: Tested the minimum resource requirements and corresponding max batch sizes for 0.5B/1.5B/3B/7B/14B/32B/72B models during GRPO-LoRA training. 2. Add: Added test scripts for GRPO-LoRA on 0.5B/1.5B/3B/7B/14B/32B/72B models. ### Checklist Before Submitting - [y ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [y ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [y ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [y ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [y ] New CI unit test(s) are added to cover the code path. - [y ] Rely on existing unit tests on CI that covers the code path.	2025-06-12 21:37:39 +08:00
H	5fa911b3ce	[ci] refactor: setup testing guidance (#1958 )	2025-06-12 06:16:58 -07:00
lxg2015	a0673f0c89	[doc] feat: Add RL-Factory agentic learning project with verl on README (#1994 )	2025-06-12 21:04:05 +08:00
none0663	4a3881b6b5	Fix TypeError by Removing Duplicate Arguments in run_deepseek671b_math_megatron.sh (#1996 )	2025-06-12 21:02:59 +08:00
tim1024	13475caaa9	[env] fix: npu ray verion to 2.46.0 for CI problem (#1987 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [ ] type is in `feat, fix, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Reference related github issues and PRs if that help review. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-12 17:22:56 +08:00
Qunhong Zeng	a1a152ee4a	[ckpt] refactor: enhance FSDP checkpoint manager flexibility (#1350 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. This PR enables `FSDPCheckpointManager` to accept optimizer and `lr_scheduler` as None, removing some existing TODO. Now `FSDPCheckpointManager` performs saving and loading according to `checkpoint_contents`, only saving/loading content in `checkpoint_contents`. This behavior is consistent with `MegatronCheckpointManager`. When allowing `optimizer` and `lr_scheduler` to be None, we can create an `FSDPCheckpointManager` for `fsdp_module` when FSDPWorkers are initialized only for rollout (`is_actor==False and is_rollout==True`). This allows users to use `main_generation.py` to directly load FSDP checkpoints without merging them into hf_model. Also, added `save_xx` property in the base class to replace all `"xx" in checkpoint_contents` statements, making the code look better. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. Currently CI should test this PR correctly. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: FSDP - Inference: VLLM ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. --------- Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn> Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>	2025-06-12 09:37:20 +08:00
Yan Bai	87d97c9acd	[recipe] feat: qwen2.5vl 7b report and guide (#1969 ) ### What does this PR do? add a report and a script containing tuning guide of megatron training qwen2.5vl 7b > Add one-line overview of what this PR aims to achieve or accomplish. Reference related github issues and PRs if that help review. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting	2025-06-11 20:06:19 +08:00
Jianbing-D	c8908e197c	[fsdp] feat: Memory efficient cross entropy with a linear layer fused (#462 ) Implemented forward and backward of the following compute logics, which eliminated many intermediate storage tensors, and resulted in reduced peak memory usage. ## Equivalent compute logic: ```python def run_torch_entropy(hidden: torch.Tensor, weight: torch.Tensor, labels: torch.Tensor) -> typing.List[torch.Tensor]: logits = torch.matmul(hidden.to(torch.float32), weight.to(torch.float32)) # [num_tokens, vocab_size] pd = torch.nn.functional.softmax(logits, dim=-1) # [num_tokens, vocab_size] entropy_a = torch.logsumexp(logits, dim=-1) # [num_tokens] entropy_b = torch.sum(pd * logits, dim=-1) # [num_tokens] entropy = entropy_a - entropy_b logprobs = torch.nn.functional.cross_entropy(logits, labels) # [1] logprobs = torch.neg(logprobs) return logprobs, entropy ``` ## API ```python from verl.utils.kernel import linear_cross_entropy hidden = torch.randn(num_tokens, hidden_size, dtype=torch.bfloat16, device="cuda") weight = torch.randn(hidden_size, vocab_size, dtype=torch.bfloat16, device="cuda") labels = torch.randint(0, vocab_size, (num_tokens,), device="cuda") loss, entropy = linear_cross_entropy(hidden, weight, labels, reduction="mean") ``` ## Storage and latency <img width="636" alt="image" src="https://github.com/user-attachments/assets/396b7303-a46a-46b1-a261-917fda034b02" /> ## Unit test ```shell $ cd verl/ $ python3 tests/kernel/test_memory_efficient_entropy.py ``` # NOTE For compatibility, `torch.library.triton_op` was not applied to those APIs, so that `torch.compile` might not be able to be enabled on top of it. --------- Signed-off-by: Jianbing Dong <jianbingd@nvidia.com> Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn> Co-authored-by: gaoziyuan.955 <gaoziyuan.955@bytedance.com> Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>	2025-06-11 19:48:47 +08:00
Liyuan Liu	675a06d172	[doc] fix: FSDP typo in README.md (#1956 ) Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn> Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>	2025-06-11 13:25:37 +08:00
Joel	9e5510ab3a	[rollout] fix: set repetition_penalty=1.0 to AsyncLLM (#1949 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt` - [ ] type is in `feat, fix, doc, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp] feat: xxx` ### What does this PR do? - set repetition_penalty=1.0 for AsyncLLM - add missing timing metrics, close #1926 ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-11 12:53:49 +08:00
CurryRice233	0bd03d7c05	[FSDP] feat: Add FSDP forward pefetch and recompute chunking entropy (#1927 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? 1. Add fsdp1 forward pefetch configuration. 2. Add chunk entropy computation. 3. Add torch.checkpoint to entropy computation. 4. Move data to device from `ActorRolloutRefWorker.update_actor` to `DataParallelPPOActor.update_policy`. 5. Add `npu_cross_entropy_loss` fusion kernel. ### High-Level Design 1. More detail see [FSDP forward_pefetch](https://docs.pytorch.org/docs/stable/fsdp.html#module-torch.distributed.fsdp) 2. `logits` usually is a large tensor [bsz\seq_len, voc], on `compute_entropy_from_logits` will use [bsz\seq_len, voc] * (4(float32) + 2(autocast of softmax+logsumexp) + 1(output of softmax)) memory. To reduce this memory peak, we can use chunk calculation, changing [bszseq_len, voc] to [chunk_size(2048), voc]. 3. During the training phase, `enable_gradient_checkpointing=True` is not applicable to entropy calculation, so add the recomputation function of entropy to reduce the memory peak during the training phase. 4. On `ActorRolloutRefWorker.update_actor` all batch data is moved to the device, but this is unnecessary, `DataParallelPPOActor.update_policy` will move the data to the device for each micro batch. ### Specific Changes > List the specific changes. ### API Add 3 new configurations in actor/ref, 1 new configuration in critic/reward. - actor_rollout_ref.actor.fsdp_config.forward_prefetch: False - actor_rollout_ref.actor.entropy_from_logits_with_chunking: False - actor_rollout_ref.actor.entropy_checkpointing: False - actor_rollout_ref.ref.fsdp_config.forward_prefetch: False - actor_rollout_ref.ref.entropy_from_logits_with_chunking: False - actor_rollout_ref.ref.entropy_checkpointing: False - critic.model.fsdp_config.forward_prefetch: False - reward_model.model.fsdp_config.forward_prefetch: False ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference*: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-11 12:52:19 +08:00
ZhiyuLi-Nvidia	966c84595b	[misc] doc: fix typo in deepseek v3 docker image name install.rst (#1957 )	2025-06-11 06:54:01 +08:00
vickytsang	d2665c5eb5	[hardware] fix typo in dockerfile (#1950 )	2025-06-11 06:46:46 +08:00
H	7a8122d86a	[ci] chore: minor adjustment for PR template (#1952 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt` - [ ] type is in `feat, fix, doc, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp] feat: xxx` ### What does this PR do? Make the PR template more concise ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-11 06:46:02 +08:00
leopardracer	22974eeaca	[trainer] docs: Fix Typos in Documentation Files (#1954 ) Description: This pull request corrects several typographical errors in the documentation: - In docs/advance/checkpoint.rst, the word "togather" has been corrected to "together". - In docs/faq/faq.rst, the word "trainning" has been corrected to "training". These changes improve the clarity and professionalism of the documentation. No functional code changes are included.	2025-06-10 12:30:22 -07:00
Blue Space	b4aa2dce8f	[fsdp] fix: fsdp entropy metrics (#1943 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt` - [ ] type is in `feat, fix, doc, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp] feat: xxx` ### What does this PR do? FSDP entropy calculation forgot to revert indices when use dynamic batch size. This does not affect training loss or gradient, but rather the metrics displayed on tensorboard/wandb. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-10 11:28:48 -07:00
Xiang Long	cfa4e701ac	[training_utils] Add qwen3 multi-turn sft support (#1889 )	2025-06-10 22:08:34 +08:00
Dong Lyu	3f630e741d	[megatron] fix: rope_type typo in config_converter.py (#1944 ) ![image](https://github.com/user-attachments/assets/bae987fe-9543-4da3-b3bb-5e3bd11cc551) fix TypeError: MLATransformerConfig.__init__() got an unexpected keyword argument 'rotary_type' ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt` - [ ] type is in `feat, fix, doc, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp] feat: xxx` ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-10 21:34:48 +08:00
Cheetah	74463f9129	[hardware] fix: fix issue when sp>1 on ASCEND NPU (#1942 )	2025-06-10 20:30:13 +08:00
thelongestusernameofall	f880ec4c72	[ckpt] feat: model_merger.py support processing checkpoints with LoRA adapters (#1821 )	2025-06-10 20:29:16 +08:00
Yan Bai	85fef90d51	[megatron] feat: qwen2.5vl (#1286 ) works with qwen2.5vl 3b + geo3k <img width="1148" alt="image" src="https://github.com/user-attachments/assets/87c8746c-7f40-4189-9e82-eb1b459669f8" /> <img width="1143" alt="image" src="https://github.com/user-attachments/assets/58bce88d-c53e-45a2-b89c-bfacf4ae9e85" /> <img width="1503" alt="image" src="https://github.com/user-attachments/assets/284ef5c6-2057-4a73-ad56-bed2ef0ece43" />	2025-06-10 15:38:16 +08:00
Joel	1e1645d8e2	[rollout] feat: add async llm perf script (#1930 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Add perf scripts comparing AsyncLLM backend: - RayDistributedExecutor: default executor with compiled graph - ExternalRayDistributedExecutor: external executor with remote call ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-10 14:30:07 +08:00
jinqinn	2b5d66a721	[megatron] refactor: support MLATransformerConfig abstraction for DeepSeek V3 (#1836 ) I encountered an error when training DeepSeek V3 with the latest code due to the TransformerConfig not including q_lora_rank, which is required for DeepSeek V3. #### Error Message ``` (TaskRunner pid=1256989) File "/workspace/verl/verl/single_controller/base/megatron/worker.py", line 69, in _init_hf_config_and_tf_config (TaskRunner pid=1256989) tf_config = hf_to_mcore_config(hf_config, dtype) (TaskRunner pid=1256989) File "/workspace/verl/verl/models/mcore/registry.py", line 131, in hf_to_mcore_config (TaskRunner pid=1256989) return MODEL_CONFIG_CONVERTER_REGISTRY[model](hf_config, dtype) (TaskRunner pid=1256989) File "/workspace/verl/verl/models/mcore/config_converter.py", line 210, in hf_to_mcore_config_dpskv3 (TaskRunner pid=1256989) args = _get_base_transformer_config( (TaskRunner pid=1256989) File "/workspace/verl/verl/models/mcore/config_converter.py", line 85, in _get_base_transformer_config (TaskRunner pid=1256989) return TransformerConfig(**base_config) (TaskRunner pid=1256989) TypeError: TransformerConfig.__init__() got an unexpected keyword argument 'q_lora_rank' ``` #### Solution The `hf_to_mcore_config_dpskv3` function should directly create an `MLATransformerConfig` instance instead of going through `_get_base_transformer_config()`, since DeepSeek V3 uses Multi-Latent Attention (MLA) which requires MLA-specific parameters. --------- Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>	2025-06-10 13:01:43 +08:00
ShareLer	ea121f0d39	fix sequence parallelism conflict in kimiVL (#1899 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Fix sequence parallelism conflict in kimiVL patch. Background: A recent VLM-related PR(#1739 ) has modified the sequence parallelism logic of VLM: Split inputs_embeds after the model's embedding layer instand of spliting input_ids and position_ids before forward. However, the SP logic I implemented in KimiVL's PR(#1639 ) was still implemented in accordance with the old logic. And split the image token at the combination of image_token and text_token to avoid the problem of 'the Image features and image tokens do not match'. Since these two PR were developed in parallel which led to logical conflicts after the PR were merged. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes - Delete the patch for _merge_with_image_features which to assign the image token to the corresponding SP rank. - Adjust the processing related to position_ids in _ulysses_flash_attn_forward. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test ![image](https://github.com/user-attachments/assets/82ef7a74-66f8-4bb0-a0fc-3702b215c8c0) ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path. --------- Signed-off-by: ShareLer <ShareLe@163.com>	2025-06-10 09:45:43 +08:00
Yanbin Jiang	6d8b2fe37e	[sglang] fix: Fix tool call parser not found error for SGLang==0.4.6.post5 (#1852 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? SGLang multiturn rollout relies on bos and eos token in the tool parser to retrieve the right tool parser. SGLang==0.4.6.post5 changed those tokens for Qwen2 parser([PR](https://github.com/sgl-project/sglang/pull/6597/files#diff-725eae87b1043c063d85c22b71f415941e2983c60eb52ef1a0d0be89f13b1110)) so it breaks async rollout. This PR updates the logic in Verl to fix the issue. Error example: ``` Traceback (most recent call last): File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 28, in main run_ppo(config) File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 40, in run_ppo ray.get(runner.run.remote(config)) File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper return fn(args, kwargs) File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper return func(args, *kwargs) File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 2822, in get values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout) File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 930, in get_objects raise value.as_instanceof_cause() ray.exceptions.RayTaskError(ValueError): ray::TaskRunner.run() (pid=64101, ip=100.96.58.10, actor_id=5f74f8de7594144240b2dbcf01000000, repr=<main_ppo.TaskRunner object at 0x7bab01cdd9f0>) File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 155, in run trainer.init_workers() File "/home/jobuser/resources/verl/trainer/ppo/ray_trainer.py", line 837, in init_workers self.actor_rollout_wg.init_model() File "/home/jobuser/resources/verl/single_controller/ray/base.py", line 51, in __call__ output = ray.get(output) ray.exceptions.RayTaskError(ValueError): ray::WorkerDict.actor_rollout_init_model() (pid=84769, ip=100.96.58.10, actor_id=d2da89f7763ecc0e0681bcdd01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7409bc4c7fa0>) File "/home/jobuser/resources/verl/single_controller/ray/base.py", line 645, in func return getattr(self.worker_dict[key], name)(args, *kwargs) File "/home/jobuser/resources/verl/single_controller/base/decorator.py", line 534, in inner return func(args, kwargs) File "/home/jobuser/resources/verl/workers/fsdp_workers.py", line 564, in init_model self.rollout, self.rollout_sharding_manager = self._build_rollout(trust_remote_code=self.config.model.get("trust_remote_code", False)) File "/home/jobuser/resources/verl/workers/fsdp_workers.py", line 474, in _build_rollout rollout = SGLangRollout( File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/sglang_rollout.py", line 161, in __init__ ) = self._initialize_tools(config, tokenizer) File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/sglang_rollout.py", line 384, in _initialize_tools tool_call_parser_type = get_tool_call_parser_type(tokenizer) File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/sglang_rollout.py", line 113, in get_tool_call_parser_type raise ValueError(f"No tool call parser found for tokenizer {tokenizer}") ValueError: No tool call parser found for tokenizer Qwen2TokenizerFast(name_or_path='/shared/public/elr-models/Qwen/Qwen2.5-7B-Instruct/52e20a6f5f475e5c8f6a8ebda4ae5fa6b1ea22ac', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<\|im_end\|>', 'pad_token': '<\|endoftext\|>', 'additional_special_tokens': ['<\|im_start\|>', '<\|im_end\|>', '<\|object_ref_start\|>', '<\|object_ref_end\|>', '<\|box_start\|>', '<\|box_end\|>', '<\|quad_start\|>', '<\|quad_end\|>', '<\|vision_start\|>', '<\|vision_end\|>', '<\|vision_pad\|>', '<\|image_pad\|>', '<\|video_pad\|>']}, clean_up_tokenization_spaces=False, added_tokens_decoder={ 151643: AddedToken("<\|endoftext\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151644: AddedToken("<\|im_start\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151645: AddedToken("<\|im_end\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151646: AddedToken("<\|object_ref_start\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151647: AddedToken("<\|object_ref_end\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151648: AddedToken("<\|box_start\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151649: AddedToken("<\|box_end\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151650: AddedToken("<\|quad_start\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151651: AddedToken("<\|quad_end\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151652: AddedToken("<\|vision_start\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151653: AddedToken("<\|vision_end\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151654: AddedToken("<\|vision_pad\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151655: AddedToken("<\|image_pad\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151656: AddedToken("<\|video_pad\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151657: AddedToken("<tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151658: AddedToken("</tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151659: AddedToken("<\|fim_prefix\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151660: AddedToken("<\|fim_middle\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151661: AddedToken("<\|fim_suffix\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151662: AddedToken("<\|fim_pad\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151663: AddedToken("<\|repo_name\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 151664: AddedToken("<\|file_sep\|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), } ) ``` ### Test Tested with both SGLang==0.4.6.post4 and SGLang==0.4.6.post5, successfully executed multiturn RL experiments that failed with SGLang==0.4.6.post5 before this change . ### Additional Info. - Inference**: SGLang ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-10 09:44:33 +08:00
H	581735a5d8	[rollout] fix: fix async llm config passing (#1933 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [ ] In format of: [modules] type: Title - [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt` - [ ] type is in `feat, fix, doc, refactor, chore` - [ ] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp] feat: xxx` ### What does this PR do? Here we should pass full config instead of the sub config. Consumed here: https://github.com/volcengine/verl/blob/main/verl/workers/rollout/async_server.py#L111 Also, move the sandbox test another folder to mirror source code folder structure. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-10 09:41:55 +08:00
Yanbin Jiang	16662ceff4	[sglang] feat: Efficient and model-agnostic multi-turn messages tokenization and masking (#1668 )	2025-06-10 00:13:56 +08:00
Blue Space	d843f95992	[CI] feat: hint PR title in template (#1925 )	2025-06-09 23:39:01 +08:00
Yaowei Zheng	60138ebd19	[worker] fix: do not break dynamic bsz in dp critic (#1922 ) ### What does this PR do? Fix bug introduced in #1839 ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-09 15:26:59 +08:00
Seungyoun, Shin	af5dbec99b	[doc] Add TTS model GRPO tuning project with verl on README (#1918 ) ### Checklist Before Starting * [x] Search for similar PR(s). ### What does this PR do? > Integrates Korean TTS fine-tuning using GRPO optimization based on LLASA-1B models, significantly improving synthesis quality by reducing Character Error Rate (CER). ### High-Level Design > This PR enhances the existing TTS model training pipeline by introducing a reinforcement learning optimization (GRPO) step using Whisper's NLL and CER metrics. ### Specific Changes * Adds GRPO reward calculation based on Character Error Rate (CER) and Negative Log-Likelihood (NLL). * Implements a Whisper server to compute NLL metrics efficiently. * Provides scripts for training (`run_llasa_tts_grpo.sh`) and data preprocessing (`tts.py`). ### API > No changes to existing public APIs. Internal additions only. ### Usage Example ```bash CUDA_VISIBLE_DEVICES=2 python3 tts/whisper_server.py --port 8001 --model large-v3 WHISPER_SERVER=http://localhost:8001 nohup bash ./examples/grpo_trainer/run_llasa_tts_grpo.sh > verl_grpo_1b.log 2>&1 & ``` ### Test > Evaluated on internal dataset: * LLasa1B + 15K Korean dataset baseline CER = 0.0266 * LLasa1B + 15K Korean dataset + GRPO optimization CER = 0.0204 The reduction in CER demonstrates the effectiveness of the GRPO optimization. ### Additional Info. * Issue Number: N/A * Training: FSDP, Megatron (as relevant) * Inference: vLLM, SGLang (as relevant) ### Checklist Before Submitting * [ ] Read the [[Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). * [ ] Apply [[pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). * [ ] Add `[BREAKING]` to the PR title if it breaks any API. * [ ] Update the documentation about your changes in the [[docs](https://github.com/volcengine/verl/tree/main/docs)](https://github.com/volcengine/verl/tree/main/docs). * [ ] New CI unit test(s) are added to cover the code path. * [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-09 13:52:23 +08:00
Yang Wang	6baa44d605	revert HIP_VISIBLE_DEVICES in worker.py (#1920 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Sorry, I found in my tests that with the latest branch and the AMD-modified version of Ray (https://github.com/ray-project/ray/pull/53531/files), it’s no longer necessary to override HIP_VISIBLE_DEVICES here. For the sake of keeping the code clean, could you please revert this change? ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-09 13:51:50 +08:00
ShareLer	cc9bc3fc21	[bugfix] fix megatron model merger (#1774 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Fix megatron model merger. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes - Fix get rank method to support just TP. - Fix state_dict keys after convert. - Add mla/moe convert support. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test Test with Qwen3-8B and Qwen2.5-7B. ### Additional Info. - Issue Number: Fixes issue #1757 - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary. --------- Signed-off-by: ShareLer <ShareLe@163.com> Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>	2025-06-09 13:28:24 +08:00
杨睿	5aa1b046b4	[ppo] feat: add critic valuehead model support for multi-modal PPO (#1839 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? - 支持多模的 PPO，主要是复用 trl 的 `AutoModelForCausalLMWithValueHead` 作为 critic valuehead model ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>	2025-06-09 10:22:54 +08:00
Costa Huang	40f5db4a6e	[recipe] doc: Rename READMD.md to README.md (#1917 ) Fix typo.	2025-06-09 09:35:01 +08:00
Yang Wang	8e82bf196c	set CUDA and HIP VISIBLE DEVICES (#1914 )	2025-06-09 07:57:47 +08:00
H	916ab431b7	[trainer] refactor: refactor reward manager, advantage estimator (#1916 )	2025-06-09 07:57:16 +08:00
David Klank	2bd291e549	fix typos (#1912 ) Hey devs! Fixed typo recipe/spin/dp_actor.py slient - silent recipe/spin/spin_trainer.py differnt - different	2025-06-08 15:41:11 +08:00
H	e8645158a3	[trainer] doc: enforce documentation for config fields (#1910 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Force documentation for the trainer yaml file ### Test Added a test to enforce it. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-08 12:18:56 +08:00
Chi Zhang	450d479b38	[recipe] feat: char count (#1908 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Add a tiny recipe char count that can be run on a consumer GPU with only 8GB. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-07 20:57:33 -07:00
Rocke Dong	f8df864d5f	[rollout] fix: error in __collect_lora_params() in FSDPVLLMShardingManager (#1909 ) ### What does this PR do? > Fix bug on DAPO lora training, based on currently main branch, the entrance file is recipe/dapo/test_dapo_7b_math_lora.sh ### Specific Changes > Just 2 line code fix as explain in the "Test" module below ### Usage Example adds actor_rollout_ref.model.lora_rank=8 \ into the "recipe/dapo/test_dapo_7b_math_lora.sh" file to enable lora RL training. ```bash bash recipe/dapo/test_dapo_7b_math_lora.sh ``` ### Test a test .sh file: recipe/dapo/test_dapo_7b_math_lora.sh to test dapo with lora training. Before this change, the training have the error > TypeError: argument of type 'torch.device' is not iterable in the line code below orig_dev = "cpu" if "cpu" in next(model.parameters()).device else "cuda" This error is caused the string "cpu" is not in the class "torch.device" which is not a string or not iterable. After this change, the lora RL training starts normally. --------- Co-authored-by: qichang.dong <dongqichang@ecmas.ai>	2025-06-08 09:14:03 +08:00
Yaowei Zheng	59379539a0	fix qwen2vl grpo for vllm 0.9 and transformers 4.52 (#1880 ) ### What does this PR do? Fixes #1710 ![image](https://github.com/user-attachments/assets/185d37b6-a4fe-4e89-8eed-72f4477937e8) 1. vLLM 0.9.0 does not support `limit_mm_per_prompt=None`; this parameter must be a `dict`. 2. Transformers 4.52.* changes the weight keys in the model state dict, causing mismatches with vLLM's weight loader. See also: https://github.com/huggingface/transformers/pull/38385 https://github.com/vllm-project/vllm/pull/19054 https://github.com/vllm-project/vllm/pull/19151 ### Test run `bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh` ![image](https://github.com/user-attachments/assets/b8137c87-f250-40d0-b9c3-c3f44f1a40a1) ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-07 18:09:06 +08:00
H	897619d738	[tests] chore: add PR title check (#1901 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-07 18:08:14 +08:00
David Klank	5bf69923f2	fix errors in megatron_workers.py (#1906 ) Hey team! Fixed errors in verl/workers/megatron_workers.py `startegy` - `strategy` x3	2025-06-07 16:56:28 +08:00
Blue Space	01ae0198ff	[feat][BREAKING] Megatron: Support learning rate scheduler (#1701 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Support lr scheduler in megatron ### High-Level Design Still got some difference with FSDP's optimizer in APIs ### Specific Changes > List the specific changes. ### API ```yaml optim: lr: 1e-6 clip_grad: 1.0 total_training_steps: -1 # must be override by program lr_warmup_init: 0.0 # initial learning rate for warmup, default to 0.0 lr_warmup_steps: -1 # Prioritized. Negative values mean delegating to lr_warmup_steps_ratio. lr_warmup_steps_ratio: 0. # the total steps will be injected during runtime lr_decay_steps: null lr_decay_style: linear # select from constant/linear/cosine/inverse_square_root min_lr: 0.0 # minimum learning rate, default to 0.0 weight_decay: 0.01 weight_decay_incr_style: constant # select from constant/linear/cosine lr_wsd_decay_style: exponential # select from constant/exponential/cosine lr_wsd_decay_steps: null use_checkpoint_opt_param_scheduler: False # use checkpoint optimizer parameter scheduler ``` Notice that there are some differences in APIs between Megatron optimizer and FSDP optimizer. - Megatron optimizer scheduler names the period after lr_warmup as lr_decay_steps, so the ``warmup_style`` actually means the style of lr decay after warmup. - Megatron optimizer also support weight decay decay mechanism - ``use_checkpoint_opt_param_scheduler`` determines whether to use the checkpoint optimizer parameter scheduler. If set to True, the optimizer parameter scheduler will be saved in the checkpoint and loaded from the checkpoint during resuming training. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-06-07 13:19:09 +08:00
Wayne	01fee0a231	[feat] add validation shuffle (#1886 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? In scenarios involving multiple validation sets, where the difficulty levels of these sets differ significantly and the generated content lengths vary notably, the order in which the validation sets are processed can have a substantial impact on the validation speed. ### High-Level Design add validation shuffle ### Usage Example > Provide usage example(s) for easier usage. ```python validation_shuffle: True ``` ### Test Validation speed increase of over 10%. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-07 13:12:03 +08:00
vickytsang	d02b3d5134	Dockerfile.rocm update tensordict==0.6.2 (#1898 ) ### Checklist Before Starting - [x ] Search for similar PR(s). ### What does this PR do? Update tensordict version Resolve PPO training error + python3 -m verl.trainer.main_ppo algorithm.adv_estimator=gae data.train_files=/root/data/gsm8k/train.parquet data.val_files=/root/data/gsm8k/test.parquet data.train_batch_size=256 data.max_prompt_length=512 data.max_response_length=512 data.return_raw_chat=True actor_rollout_ref.model.path=/root/models/Qwen/Qwen2.5-0.5B actor_rollout_ref.model.use_liger=True actor_rollout_ref.actor.optim.lr=1e-6 actor_rollout_ref.model.use_remove_padding=True actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.1 actor_rollout_ref.actor.ppo_mini_batch_size=128 actor_rollout_ref.actor.use_dynamic_bsz=False actor_rollout_ref.actor.ppo_max_token_len_per_gpu=32768 actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2 actor_rollout_ref.actor.ulysses_sequence_parallel_size=1 actor_rollout_ref.actor.fsdp_config.param_offload=False actor_rollout_ref.actor.fsdp_config.optimizer_offload=False actor_rollout_ref.actor.use_kl_loss=False actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=32768 actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2 actor_rollout_ref.rollout.tensor_model_parallel_size=2 actor_rollout_ref.rollout.name=vllm actor_rollout_ref.rollout.gpu_memory_utilization=0.8 actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=32768 actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=2 critic.optim.lr=1e-5 critic.ulysses_sequence_parallel_size=1 critic.model.use_remove_padding=True critic.optim.lr_warmup_steps_ratio=0.05 critic.model.path=/root/models/Qwen/Qwen2.5-0.5B critic.model.enable_gradient_checkpointing=False critic.use_dynamic_bsz=False critic.ppo_max_token_len_per_gpu=32768 critic.ppo_micro_batch_size_per_gpu=2 critic.model.fsdp_config.param_offload=False critic.model.fsdp_config.optimizer_offload=False reward_model.enable=True reward_model.ulysses_sequence_parallel_size=1 reward_model.model.path=/root/models/Qwen/Qwen2.5-0.5B reward_model.model.use_remove_padding=True reward_model.model.fsdp_config.param_offload=True reward_model.use_dynamic_bsz=False reward_model.forward_max_token_len_per_gpu=32768 reward_model.micro_batch_size_per_gpu=2 algorithm.use_kl_in_reward=False trainer.critic_warmup=0 'trainer.logger=[console]' trainer.project_name=verl-test trainer.experiment_name=qwen2.5-0.5b-model-reward-minimal trainer.nnodes=1 trainer.n_gpus_per_node=8 trainer.val_before_train=False trainer.test_freq=False trainer.save_freq=-1 trainer.resume_mode=disable trainer.total_epochs=2 trainer.total_training_steps=1 Traceback (most recent call last): File "<frozen runpy>", line 189, in _run_module_as_main File "<frozen runpy>", line 112, in _get_module_details File "/sgl-workspace/verl/__init__.py", line 22, in <module> from .protocol import DataProto File "/sgl-workspace/verl/protocol.py", line 30, in <module> import tensordict File "/usr/local/lib/python3.12/dist-packages/tensordict/__init__.py", line 6, in <module> import tensordict._reductions File "/usr/local/lib/python3.12/dist-packages/tensordict/_reductions.py", line 11, in <module> from tensordict._lazy import LazyStackedTensorDict File "/usr/local/lib/python3.12/dist-packages/tensordict/_lazy.py", line 38, in <module> from tensordict.memmap import MemoryMappedTensor File "/usr/local/lib/python3.12/dist-packages/tensordict/memmap.py", line 25, in <module> from torch.multiprocessing.reductions import ForkingPickler ImportError: cannot import name 'ForkingPickler' from 'torch.multiprocessing.reductions' (/usr/local/lib/python3.12/dist-packages/torch/multiprocessing/reductions.py) ### Checklist Before Submitting - [x ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [x ] Rely on existing unit tests on CI that covers the code path. Signed-off-by: Vicky Tsang <vtsang@amd.com>	2025-06-07 08:09:12 +08:00
H	69c2a1a81f	[release] chore: bump version to v0.4 (#1897 )	2025-06-07 07:49:37 +08:00
H	043c72bc7b	[docs] moe: add docs for deepseek 671b and qwen-236b (#1896 )	2025-06-07 07:49:01 +08:00
Joel	457f4d2a20	[rollout] feat: follow OpenAI tool calling schema in chat scheduler (#1831 )	2025-06-07 07:47:47 +08:00
Blue Space	70bd3d3d6b	[feat] Wandb Timing: Add more detailed timing of gen_sequence and weights resharding (#1834 )	2025-06-07 07:45:50 +08:00
Chi Zhang	c0f5ccbe5d	[recipe] retool: add retool sft (#1828 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? - Add retool qwen3 dataset and sft - The original retool doesn't follow standard qwen multiturn chat template. In this PR, we recompile the dataset and add a SFT script to train QWen-8b ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-06-06 10:39:51 -07:00
Chi Zhang	cfead14adf	[misc] fix: fix indent (#1891 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-06 21:11:40 +08:00
TechZhu	2b9a440bb6	update dapo trainer process (#1888 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. To handle the process bar update frequency when training in DAPO. ### Specific Changes > List the specific changes. 1.When we set algorithm.filter_groups.enable=true, the DAPO training process will skip samples whose advantages are all 0 or 1. 2.However, the progress bar does not update simultaneously, which can confuse users. 3.This merge request addresses the issue by updating the progress bar before filtering the samples. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path. Co-authored-by: techzhu <techzhu@tencent.com>	2025-06-06 20:30:50 +08:00
CurryRice233	2038048184	DAPO npu support (#1858 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Support DAPO algorithm on npu ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes 1. change `cuda` hardcode to get_torch_device() 2. add `device_name` parameter to RayDAPOTrainer ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-06 20:28:59 +08:00
OC	fe23634116	[rollout] feat: ChatScheduler requests sglang fully async (#1769 ) Changed sglang rollout pipeline to async method to have better performance. resolved issue #1721 ### Checklist Before Starting - [ done ] Search for similar PR(s). ### What does this PR do? In previous version, the sglang async_generate is called with a sync ray actor with lots of sync functions, and resulted poor performance ( GPU SM is 20% in TP2) This PR changed the while pipeline to async method. Performance comparsion to previous "sglang_async" mode: \| sglang_async (old) \| async （new） \| % faster -- \| -- \| -- \| -- timing_s/gen \| 95 \| 25 \| 73.68% timing_s/step \| 170 \| 90 \| 47.06% perf/throughput \| 2700 \| 4000 \| 48.15% ### High-Level Design see https://github.com/volcengine/verl/pull/1698 This is a follow up task from above PR. ### Usage Example examples/grpo_trainer/run_qwen2-7b_seq_balance.sh ### Test .github/workflows/e2e_ppo_trainer.yml ### Additional Info. - Issue Number: Fixes issue #1721 ### Checklist Before Submitting - [ done ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ done ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ done ] Add `[BREAKING]` to the PR title if it breaks any API. - [ done ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ done ] Add CI test(s) if necessary.	2025-06-06 16:46:30 +08:00
OC	22da46bc1f	Add rollout Module Development Progress & Roadmap (#1884 ) Updated readme for rollout related ppcoming features and changes.	2025-06-06 16:01:29 +08:00
omahs	4653f82fa5	fix: typos (#1879 ) fix: typos	2025-06-06 15:25:26 +08:00
OC	9afa8d6dff	fix error when ci failed by incorrect sgl-kernel version (#1872 ) ### Checklist Before Starting - [ done ] Search for similar PR(s). ### What does this PR do? Fix ci failure from incorrect sgl-kernel version in docker image: ``` File "/usr/local/lib/python3.10/dist-packages/sglang/srt/utils.py", line 647, in assert_pkg_version raise Exception( Exception: sgl-kernel is installed with version 0.1.0, which is less than the minimum required version 0.1.1. Please reinstall the latest version with `pip install sgl-kernel --force-reinstall` ```	2025-06-06 13:55:08 +08:00
Chi Zhang	bd94bd61fe	[misc] fix: fix flops for H200 (#1877 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-05 22:29:35 -07:00
OC	dafd33de59	[ray] profiler: add timeline option for performance analyse (#1768 ) ### Checklist Before Starting - [ done ] Search for similar PR(s). ### What does this PR do? Add an option to generate ray timeline for performance analysing. ### Usage Example Run a job with this option. It can generate the trace file at the end of training. You can view it from https://ui.perfetto.dev/ ``` python3 -m verl.trainer.main_ppo \ ray_init.timeline_json_file=/tmp/timeline.json \ ... ``` <img width="1347" alt="截屏2025-05-30 13 13 56" src="https://github.com/user-attachments/assets/ec57ef94-3ecd-467e-b33f-ae0da3a54c49" />	2025-06-05 20:03:34 -07:00
Blue Space	78240de7dd	[DeepSeek][Docker Image] Update dpsk image (#1870 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Split docker image used by CI and deepseek-V3 running, using cudnn 9.8 to support MLA. New Image is ``whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.1-te2.3-deepseekv3``. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-06 09:35:29 +08:00
Chi Zhang	f1fd0f095d	[single controller] feat: mitigate pickle cost (#1862 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? ray put all the args in advance to avoid duplicate serialization cost for megatron dispatch. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-06 09:34:58 +08:00
Yuhua Jiang	aa11dd19b3	[ppo critic] fix EOS token value to zero (#1850 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. For PPO critic training, the value of EOS tokens should be zero and should not be fitted. However, the current implementation does not mask the EOS token values, resulting in non-zero EOS token values. Although the learning target is zero, when PPO GAE lambda < 1, this affects the advantage calculation for tokens preceding EOS, thereby impacting performance. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>	2025-06-06 01:05:18 +08:00
shizhediao	45aec859d6	Fixed URL for ProRL in README.md (#1866 ) Fixed URL for ProRL in README.md	2025-06-05 22:43:52 +08:00
黄石	3870869cc0	Make DeepSeek 671B GRPO example more GPU memory friendly (#1867 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - Run on 512 GPUs with TP1PP16EP32, 2k input + 4k output - Add some tips on memory saving ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-05 22:43:29 +08:00
ChangyueLiao	b23829704f	[CI]feat:Add NPU CI action and fallback SFT's e2e test defaults to FSDP1 (#1823 ) ### Checklist Before Starting - [done] Search for similar PR(s). ### What does this PR do? Mirror the CI for VeRL to run on the NPU and fallback the e2e test of the SFT to FSDP1, as the NPU is not currently adapted for FSDP2 ### Specific Changes Add `.github/workflows/e2e_ascend.yml` Change `tests/e2e/sft/run_sft.sh` ### Checklist Before Submitting - [ done ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ done ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). --------- Co-authored-by: liaochangyue <liaochangyue@bytedance.com>	2025-06-05 22:03:20 +08:00
黄石	a6f15ae0ad	Add DeepSeek 671B GRPO example (#1771 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Add an example for DeepSeek 671B GRPO ### Specific Changes - Need https://github.com/volcengine/verl/pull/1694 - Set `torch._dynamo.config.suppress_errors = True` at entrypoint, if ``` ray.exceptions.RaySystemError: System error: Failed to unpickle serialized exception traceback: Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/ray/exceptions.py", line 46, in from_ray_exception return pickle.loads(ray_exception.serialized_exception) TypeError: BackendCompilerFailed.__init__() missing 1 required positional argument: 'inner_exception' ``` ### Additional Info. - vllm as backend, sglang working in process (https://github.com/sgl-project/sglang/issues/6762). Merged when both backends are ready. - For DeepSeek-V3-0324 at `gsm8k`, the reward starts from 0.8 and saturated at around 0.95 using only 3 steps. - Memory peaks around 90GB during actor update (1.5k input + 2.5k output), consider using TP/ETP for a lower requirement. - For gsm8k training using this yaml, ![image](https://github.com/user-attachments/assets/d16cf959-5845-4dd0-95af-07fc35820f18) ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-06-05 22:00:49 +08:00
Yan Bai	2f050a8516	[Mcore] dpskv3 671B (#1694 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? support training with deepseekv3 671B support MTP on top of #1284 now it is functional ready for 671B, still lacking of practice > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-06-05 21:55:04 +08:00
Yan Bai	28587336a9	[megatron] moonlight fix per_tensor_generator (#1772 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? there is a tricky bug in per_tensor_generator with model.named_parameter(). "decoder.layers[n].mlp.router.expert_bias" in GPTModel is not registered in named_parameter, but in state_dict(). Before this fix, the router_bias or `model.layers.{layer_number}.mlp.gate.e_score_correction_bias` is not transfered from m-core to infer engine. > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-06-05 19:56:52 +08:00
rj42	81acbb2cc5	[bugfix] Force create checkpoint directory before saving dataloader state. (#1625 ) Fix training crash due to missing checkpoint directory We encountered a training crash with error: "RuntimeError: Parent directory /workspace/ckpts/global_step_20 does not exist". It appears that `self.actor_rollout_wg.save_checkpoint`, which should create the checkpoint directory, might be running asynchronously and doesn't complete creating the folder in time. This change explicitly forces creation of the directory before saving the dataloader state to prevent this race condition. ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: [1657](https://github.com/volcengine/verl/issues/1657) - Training: FSDP/Megatron - Inference: vLLM ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-06-05 19:33:45 +08:00
thelongestusernameofall	f7f8b042d5	[feat] Add support for FSDP2 in GRPO-LoRA (#1844 ) 1. Add: Add support for FSDP2 in GRPO-LoRa 2. Format: Automatic code formatting changes initiated by the pre-commit tool 3. Add: Integrate the end-to-end (e2e) testing of GRPO-LoRA + fsdp2 into the CI pipeline.	2025-06-05 19:32:53 +08:00
shizhediao	2fe47f71ab	Add ProRL to README.md (#1855 ) ProRL is a novel training methodology that incorporates KL divergence control, reference policy resetting, and a diverse suite of tasks. The empirical analysis reveals that RL-trained models consistently outperform base models across a wide range of pass@k evaluations, including scenarios where base models fail entirely regardless of the number of attempts. It is developed based on Verl. Link: https://arxiv.org/abs/2505.24864	2025-06-05 17:51:11 +08:00
Blue Space	2a386cf0e9	[BugFix][CI] Megatron: add ep CI (#1726 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Fix ep bug and try to add CI with 15B model, finding smaller models which are more convenient to test. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-06-05 14:02:00 +08:00
Hongpeng Guo	5b66489b52	[refactor] Align name_prefix same behavior for pool and wg (#1851 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Follow-up of #1838, make the `name_prefix` mechanism same for `RayWorkerGroup` and `RayResourcePool`, default to be `None` and will be initialized randomly. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path. Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-06-05 11:52:28 +08:00
黄石	565c496f87	fix batch size validation for Megatron (#1811 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Fix batch size validation for Megatron. `real_train_batch_size` should be divisible by dpmbs. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference*: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-06-05 11:18:12 +08:00
Hongpeng Guo	2ed63bbf39	[fix] Adding a default value for `RayWorkerGroup.from_detached(name_prefix=None)` (#1838 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? In #1443, a new argument `name_prefix` was introduced for function `from_detached` without setting a default value. This PR sets its default value as `None`, in which case, `RayWorkerGroup` will generate a random string as the prefix. This fix makes the API compatible with existing usage, and the users don't need to worry about this new args when a `name_prefix` is not not context necessary. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path. Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-06-04 20:47:58 +08:00
none0663	5580b0b057	Add log_generations_to_tensorboard Function (#1841 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - This PR introduces the `log_generations_to_tensorboard` function. When` trainer.log_val_generations` is greater than 0 and `tensorboard` is selected in` trainer.logger,` the function writes the generations to`generations/text_summary` in TensorBoard. - I have already tested this in the experiment, and the resulting TensorBoard is shown in the image below: <img width="1652" alt="WeChatWorkScreenshot_d6ac53ed-253e-44a1-b641-4542f3eb1db0" src="https://github.com/user-attachments/assets/78dcb226-0ada-4af6-9231-f40c558eb3d5" /> - The training scripts is shown below: ``` set -x model_path=$MODEL_PATH python3 -m verl.trainer.main_ppo \ algorithm.adv_estimator=grpo \ data.train_files=/data/gsm8k/train.parquet \ data.val_files=/data/gsm8kest.parquet \ data.train_batch_size=1024 \ data.max_prompt_length=512 \ data.max_response_length=256 \ data.filter_overlong_prompts=True \ data.truncation='left' \ actor_rollout_ref.model.path=$model_path \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=256 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=64 \ actor_rollout_ref.actor.use_kl_loss=True \ actor_rollout_ref.actor.kl_loss_coef=0.001 \ actor_rollout_ref.actor.kl_loss_type=low_var_kl \ actor_rollout_ref.actor.entropy_coeff=0 \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=128 \ actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.8 \ actor_rollout_ref.rollout.n=6 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=64 \ actor_rollout_ref.ref.fsdp_config.param_offload=True \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.log_val_generations=10 \ trainer.logger=['console','tensorboard'] \ trainer.project_name='verl_grpo_example_gsm8k' \ trainer.experiment_name='qwen2_7b_function_rm' \ trainer.n_gpus_per_node=8 \ trainer.nnodes=1 \ trainer.save_freq=20 \ trainer.test_freq=5 \ trainer.total_epochs=15 $@ ```	2025-06-04 20:41:56 +08:00
Nile Zhou	fdf7d513e4	fix fsdp train save checkpoint bug (#1843 ) ### Checklist Before Starting - [ Y] Search for similar PR(s). ### What does this PR do? Fix the save_checkpoint logic (otherwise it will save checkpoint at every step !) ### Checklist Before Submitting - [ Y] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [Y ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). Co-authored-by: zhouyi9 <zhouyi9@APBBS24115035.local>	2025-06-04 20:39:53 +08:00
Zhiwei He	7c49f7098a	Add DeepMath to awesome work list (#1847 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Adding DeepMath to the README as a list of work that used veRL ### High-Level Design > 1-line update. ### Specific Changes > only changed the readme.md (1-line update). ### API > No. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-04 20:38:50 +08:00
Xiang Long	0a5c491639	[sglang] Fix for broadcast_pyobj nccl timeout in sgl rollout with larger model (e.g. 32B) (#1846 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. A method proposed by Congkai Xie to avoid sglang_rollout broadcast_pyobj with nccl timeout error. ### Specific Changes > List the specific changes. - Add dist.barrier() before `broadcast_pyobj` to avoid nccl communication waiting happens at same time TP0 start rollout ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: close #1420 - Training: none - Inference: SGLang ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-04 20:38:01 +08:00
OC	fba8f3463a	fix sglang e2e_sppo test (#1832 ) ### Checklist Before Starting - [done] Search for similar PR(s). ### What does this PR do? Fix error in e2e_sppo CI test Exception: sgl-kernel is installed with version 0.0.9.post2, which is less than the minimum required version 0.1.1. Please reinstall the latest version with `pip install sgl-kernel --force-reinstall` For example: https://github.com/volcengine/verl/actions/runs/15431843178/job/43430980736?pr=1769	2025-06-04 11:44:34 +08:00
Jianhao Yan	996b945e74	Add LUFFY to awesome work list #1608 (#1816 ) ### What does this PR do? > Adding LUFFY to the README as a list of work that used veRL ### High-Level Design > 1-line update. ### Specific Changes > only changed the readme.md (1-line update). ### API > No. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. Co-authored-by: H <linhaibin.eric@gmail.com>	2025-06-04 08:38:57 +08:00
H	adf775c43b	[logging] misc: update PR template and fix lint (#1806 )	2025-06-04 07:53:12 +08:00
Hugh Liu	15ca90ceaa	[sft] trainer: port features (dtype, save_freq, test_freq) from PPO config to SFT config (#1451 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? enable (dtype, save_freq, test_freq) config to sync SFTTrainer with PPOTrainer ### Specific Changes 1. add new config items 2. sync `defaul_local_dir`, `default_hdfs_dir` and `logger` with PPO config ```yaml model: fsdp_config: model_dtype: fp32 trainer: save_freq: -1 # unit: iteration test_freq: -1 ``` ### Usage Example Just works same as `main_ppo` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). --------- Co-authored-by: H <linhaibin.eric@gmail.com>	2025-06-03 11:50:24 -07:00
Dongfu Jiang	299dde1f86	[docs] Add verl-tool in the list of "awesome works using verl" (#1829 )	2025-06-04 00:44:01 +08:00
Guangming Sheng	ed3aec22df	[misc] fix fsdp2 has no _fsdp_wrapped_module in lora collect param (#1822 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? [misc] fix fsdp2 has no _fsdp_wrapped_module in lora collect param ### Additional Info. - Training: FSDP - Inference: vllm ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-06-03 19:20:27 +08:00
OC	de5b2f1ca7	[Tracking]feat: support wandb proxy (#1817 ) ### Checklist Before Starting - [ done ] Search for similar PR(s). ### What does this PR do? For environment that can not access wandb directly, you can add a proxy setting to wandb without impact to other https requests. ### Usage Example see docs/faq/faq.rst	2025-06-03 16:58:25 +08:00
Chi Zhang	8540d6ce5f	[config] feat: Hardcode moe_router_load_balancing_type to none (#1814 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? - Hardcode moe_router_load_balancing_type to none as it hurts perf in QWen3MoE - We can provide a config to set it ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-06-03 16:36:34 +08:00
Chi Zhang	2fbdcb38fb	[script] feat: upload qwen3 236b script (#1813 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Upload a script that uses QWen3 236b to train on DAPO dataset. Note that we set the response length to 4k. This results in many truncations at the beginning. So the training dynamic acts as using RL to compress the math capabilities of QWen3 236b into 4k response instead of verbose thinking. We can achieve 0.5 on AIME'24 after 30 steps. Didn't train for longer. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-06-03 15:55:27 +08:00
Chi Zhang	99100867be	[optimization] feat: move kv cache wakeup after model weights release (#1810 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Details are in the comments inside the code. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-06-03 14:48:57 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	668e5f617b	[megatron] fix: critic and reward model load tokenizer from config (#301 ) Currently, the worker will fail if the critic or reward model path doesn't contain a tokenizer. This PR tries to fix this by loading tokenizer from the config for the previously mentioned case. - For the critic model, we fall back to load from `critic.model.tokenizer_path`. - For the reward model, we first fall back to load from `reward_model.model.rm_tokenizer`, throw an error if that is not set. --------- Signed-off-by: Hollow Man <hollowman@opensuse.org> Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>	2025-06-03 13:26:00 +08:00
Shawn/Yuxuan Tong	263115cd9d	[dev] fix: note that DP balancing doesn't affect advantage calculation (#1809 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR fixes the comments about DP balancing. btw, it adds the DP balancing option in the PRIME trainer, while keeping the default value as `False`. ### Additional Info. - Issue Number: #1718 - Training: none - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-06-03 10:20:54 +08:00
Zefan Wang	7695b8db43	[recipe] prime: Code example for PRIME (#1714 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add running example for PRIME algorithm on coding data of [Eurus-2-RL-Data](https://huggingface.co/datasets/PRIME-RL/Eurus-2-RL-Data) ### Specific Changes > Runing example > Log ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>	2025-06-02 19:08:11 -07:00
Yaru Hao	a4b1bb7fb9	[algo] OPO: add implementations and descriptions for OPO (On-Policy RL with Optimal Reward Baseline) (#1796 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add implementations and descriptions for OPO (On-Policy RL with Optimal Reward Baseline) ### Specific Changes > Add docs of OPO in `docs/algo/opo.md`. > Add the addvantage estimation function of OPO in `verl/trainer/ppo/core_algos.py`. > Add `opo` option for addvantage estimation in `verl/trainer/ppo/ray_trainer.py`. ### Usage Example ```bash export GLOBAL_BSZ=256 python3 -m verl.trainer.main_ppo \ algorithm.adv_estimator=grpo \ data.train_batch_size=${GLOBAL_BSZ} \ actor_rollout_ref.actor.ppo_mini_batch_size=${GLOBAL_BSZ} \ actor_rollout_ref.actor.use_kl_loss=False \ actor_rollout_ref.actor.kl_loss_coef=0.0 \ actor_rollout_ref.actor.entropy_coeff=0.0 \ algorithm.kl_ctrl.kl_coef=0.0 \ ... ``` ### Tests Have tested the changes locally in the provided docker. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Co-authored-by: H <linhaibin.eric@gmail.com> Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>	2025-06-02 14:46:06 -07:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	07897f84e5	[AMD] fix: Add support for RAY_EXPERIMENTAL_NOSET__VISIBLE_DEVICES (Fix AMD support) (#1465 ) ### Checklist Before Starting - [X] Search for similar PR(s). ### What does this PR do? Add support for RAY_EXPERIMENTAL_NOSET__VISIBLE_DEVICES, also Fix AMD support ### High-Level Design Current approach for supporting AMD in verl is fundamentally not correct, and is just working out of the luck: Calls such as `torch.cuda.is_available()` or `torch.cuda.get_device_name()` will initialize the CUDA/ROCm environment: `c65ee728f0/torch/cuda/__init__.py (L342-L392)` Setting CUDA/HIP/ROCR_VISIBLE_DEVICES after CUDA/ROCm is initialized will not take effect (Please check https://github.com/pytorch/pytorch/issues/141678), which means that all current code that wrapped inside `[SUPPORT AMD: torch]` are mostly noops. CUDA_VISIBLE_DEVICES also works for AMD, but it's because that a lot of AMD migrated software call those `torch.cuda.` during importing, e.g.: - https://github.com/ROCm/TransformerEngine/pull/183 - https://github.com/vllm-project/vllm/pull/15246 While ray/vllm manipulates those _VISIBLE_DEVICES during runtime, which cause those `torch.cuda.` to poison the current process if the CUDA/ROCm environment is initialized before the manipulation happens. So, here, it would be a good solution to use only one environment variable for all (`CUDA_VISIBLE_DEVICES`) for consistency and hardware-agnostic, move all the other `_VISIBLE_DEVICES` to the CUDA one. Note that we must pay attention if both HIP/CUDA and ROCR env vars are set as they have different meanings. Both env vars accept either a list of ints or a list of UUIDs. The ROCR env var is processed first which then reduces the number of GPUs that HIP can select from. (Refering to https://github.com/pytorch/pytorch/pull/144026) To avoid the complexity of this, we simply gives out error if both are set (Also to keep consistency with ray's practice with 2.45.0). For the poisoning issue, before those 2 PRs are merged, we will need to ask the users to set `RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES` or `RAY_EXPERIMENTAL_NOSET_HIP_VISIBLE_DEVICES`, so that ray no longer manipulates these variables, and make verl workable when there is no `_VISIBLE_DEVICES`. Note that for latest ray (after their switch to `HIP_VISIBLE_DEVICES`), we also need this patch: https://github.com/ray-project/ray/pull/52794 ### Test Tested manually on both megatron and fsdp beckend with vllm. ### Additional Info. - Issue Number: none - Training: both FSDP and Megatron - Inference*: both vLLM and SGLang ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [X] Add CI test(s) if neccessary. Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-06-02 10:12:45 -07:00
Haoran Sun	ea81658b5f	[bugfix] fix select_idxs function in DataProto (#1794 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Fix the batch_size type error when using DataProto.select_idxs, which originally causes the TypeError when using DataProto.chunk ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes Fix the batch_size type of select_idxs func. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this import torch from verl import DataProto import numpy as np data = {"random_array": torch.randn(4, 2)} batch = DataProto.from_dict(data) valid_mask = np.array([True, False, True, False]) batch_select_idxs = batch[valid_mask] batch.chunk(2) # correct batch_select_idxs.chunk(2) # incorrect, raising TypeError # with tensordict version == 0.6.2 # Traceback (most recent call last): # File "<stdin>", line 1, in <module> # File "./verl/verl/protocol.py", line 667, in chunk # batch_lst = self.batch.chunk(chunks=chunks, dim=0) # File "/opt/conda/envs/verl/lib/python3.10/site-packages/tensordict/base.py", line 2134, in chunk # return self.split(split_size, dim=dim) # File "/opt/conda/envs/verl/lib/python3.10/site-packages/tensordict/_td.py", line 1715, in split # raise TypeError(WRONG_TYPE) # TypeError: split(): argument 'split_size' must be int or list of ints ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: N/A. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary. --------- Co-authored-by: keithsun <keithsun@tencent.com>	2025-06-03 00:09:06 +08:00
sukrucildirr	0e127b208b	chore: fix typos across codebase (#1805 ) Fixed typos across codebase.	2025-06-02 21:05:07 +08:00
Zhenyi Zheng	6e0e860f37	[feat] worker_group: support custom label for specific devices (#1773 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Place `worker_group` in specific devices when using heterogeneous GPUs in a ray cluster. ### Specific Changes Add `accelerator_type` in `RayResourcePool` to set custom label in bundle to identify specific devices. Refer to https://docs.ray.io/en/latest/ray-core/scheduling/resources.html#custom-resources ### API > Demonstrate how the API changes if any. ### Usage Example 1. set custom label when start ray cluster ```bash # H20 Node ray start --head --port 6379 --resources='{"H20": 1}' # 4090 Node ray start --address='<master_ip>:6379' --resources='{"4090": 1}' ``` 2. specify the accelerator type when creating RayResourcePool ```python pool_h20 = RayResourcePool([4], use_gpu=True, accelerator_type='H20') pool_4090 = RayResourcePool([4], use_gpu=True, accelerator_type='4090') ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-06-02 12:47:48 +08:00
H	b93b9bc2cb	[CI] test: disable unstable test temporarily (#1799 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Disable the always failing test	2025-06-02 08:34:10 +08:00
Chi Zhang	366d29c084	[eval] fix: fix main_eval (#1797 )	2025-06-01 08:40:22 -07:00
vllbc02	3126c8b428	remove redundant 'get_custom_reward_fn' function (#1791 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > remove redundant 'get_custom_reward_fn' function. ### High-Level Design > None. ### Specific Changes > "from verl.trainer.ppo.reward import get_custom_reward_fn" instead of 'get_custom_reward_fn' function in verl/recipe/dapo/main_dapo.py verl/recipe/r1/main_eval.py verl/recipe/spin/main_spin.py verl/verl/trainer/main_eval.py verl/verl/trainer/main_eval.py > remove 'get_custom_reward_fn' function in verl/verl/trainer/main_ppo.py ### Additional Info. - [Issue Number](https://github.com/volcengine/verl/issues/1716): Fixes issue # or discussion # if any. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).	2025-06-01 21:45:54 +08:00
jinqinn	1fd729c25e	fix import issue for mcore package (#1775 ) fix import issue for mcore package in `patch_v012.py`	2025-06-01 16:23:04 +08:00
黄石	ad9470068e	fix freeze router weights for Qwen2MoE (#1792 )	2025-06-01 16:03:57 +08:00
Hongpeng Guo	0ae50562cc	[doc] fix: Fix `doc_test`ci workflow pipeline (#1767 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? The existing doc test ci won't fail, because `SPHINX` doc system only raise on `fatal`, Error and Warning won't block the doc build process. This PR tries to fix the problem by grep `Error` messages in the building log. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-05-31 23:49:58 -07:00
Yuzhen Zhou	4de247fe4d	[sglang] refactor: Unify async rollout under SGLangRollout, and support sglang==0.4.6.post5 (#1717 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - Unify the functionality of SGLangRollout and AsyncSGLangRollout, remove original SGLangRollout and rename AsyncSGLangRollout to SGLangRollout. - Make trivial changes due to modification in sglang==0.4.6.post5. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary. --------- Co-authored-by: zyzshishui <@qq.com> Co-authored-by: Xiang Long <mindsculptor@yeah.net> Co-authored-by: ocss884 <ocss.lin@gmail.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: H <linhaibin.eric@gmail.com>	2025-05-31 19:47:25 -07:00
H	cef6361def	[docs] lora: fix lora image and add GRPO docs (#1788 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Fix image rendering	2025-06-01 09:49:42 +08:00
thelongestusernameofall	ab97d9b290	[docs] LORA: Train RL(HF) algorithms with LoRA support (#1755 ) ### Checklist Before Starting - [done] Search for similar PR(s). ### What does this PR do? > This PR adds documentation on how to train RL (HF) algorithms with LoRA support, including configuration parameters and an example script for practical training. --------- Co-authored-by: H <linhaibin.eric@gmail.com>	2025-05-31 10:00:40 -07:00
H	106d33f9ec	[docs] ppo: add a page for PPO algorithm (#1781 ) ### Checklist Before Starting - [x] Search for similar PR(s). This PR includes contribution and suggestions from [richardodliu](https://github.com/richardodliu) in https://github.com/volcengine/verl/pull/979 ### What does this PR do? Update documentation page, include key configs for PPO and other recipes. Pending docs: - GRPO - DrGRPO - DAPO, etc TODO: let config.rst directly show the content of ppo_trainer.yaml and other related yaml files. In the yaml file, colocate the comment and explanation with the option. This way the yaml is always consistent with the documentation page. For critical feature or algorithms, we list the core configs in a self-contained page like PPO.md ### High-Level Design None ### Specific Changes - use k1, k2, k3 for the kl calculation, still backward compatible - changed ppo.rst to baseline.md - added ppo.md to explain core options for ppo ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-31 09:03:12 -07:00
Changling	c5bc81b692	[sglang] Feat: Search Tool Invocation in Multi-Turn RL Training (#1682 )	2025-05-31 12:51:19 +08:00
Hongpeng Guo	e23e67ba53	[feat] dataproto: Supporting new operations (sample_level_repeat, unfold_column_chunks) for `DataProto` (#1761 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Adding/ Enriching new operations on `DataProto` data class: 1. Making `DataProto` compitable with `self.batch is None`, this is useful when we are using a `DataProto` to contain non-tensor data only, i.e., images for vlm use cases; 2. `sample_level_repeat`： this function repeat the rows in DataProto multiple times in sample level; 3. `unfold_column_chunks`: this function split along the second dim into `n_splits` folds. Useful in passing grouped tensors that doesn't want to be shuffled in dataset. ### API & Test Please check the usage from the added unit test files: `tests/test_protocol.py`. There are three unit tests added, which are: `test_dataproto_no_batch`, `test_sample_level_repeat`, and `test_dataproto_unfold_column_chunks`. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-05-30 13:11:12 -07:00
leo-pony	316644dc8f	[docs] Add linux-arm64 platform tensordict package version problem handle FAQ (#1776 ) ### What does this PR do? > Add linux-arm64 platform tensordict package version problem handle FAQ. > Besides me, there are other people in the community who have encountered this problem(issue #919 ) ### Detailed reason for change: The Linux-arm64 platform does not have a suitable version of the tensordict package. The verl requirement for tensordict is <=0.6.2.0. The version that can be installed on the Linux-arm64 platform is 0.1.2, but the `"key" in tensordict_var ` syntax is not supported by 0.1.2, so error take place. The error message is as follows: ``` File "/home/mnj/models/code/verl/verl/verl/trainer/main_ppo.py", line 191, in run trainer.fit() File "/home/mnj/models/code/verl/verl/verl/trainer/ppo/ray_trainer.py", line 1043, in fit old_log_prob = self.actor_rollout_wg.compute_log_prob(batch) File "/home/mnj/models/code/verl/verl/verl/single_controller/ray/base.py", line 50, in func output = ray.get(output) ray.exceptions.RayTaskError(NotImplementedError): ray::WorkerDict.actor_rollout_compute_log_prob() (pid=152918, ip=172.17.0.5, actor_id=64244f99243c810c9e882f3101000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0xfffc41ca44c0>) File "/home/mnj/models/code/verl/verl/verl/single_controller/ray/base.py", line 635, in func return getattr(self.worker_dict[key], name)(args, kwargs) File "/home/mnj/models/code/verl/verl/verl/single_controller/base/decorator.py", line 534, in inner return func(args, *kwargs) File "/home/mnj/models/code/verl/verl/verl/workers/fsdp_workers.py", line 739, in compute_log_prob output, entropys = self.actor.compute_log_prob(data=data, calculate_entropy=True) File "/home/mnj/models/code/verl/verl/verl/utils/debug/performance.py", line 80, in f return self.log(decorated_function, args, *kwargs) File "/home/mnj/models/code/verl/verl/verl/utils/debug/performance.py", line 90, in log output = func(args, **kwargs) File "/home/mnj/models/code/verl/verl/verl/workers/actor/dp_actor.py", line 289, in compute_log_prob entropy, log_probs = self._forward_micro_batch(micro_batch, temperature=temperature, calculate_entropy=calculate_entropy) File "/home/mnj/models/code/verl/verl/verl/workers/actor/dp_actor.py", line 83, in _forward_micro_batch if "multi_modal_inputs" in micro_batch: File "/usr/local/python3.10.17/lib/python3.10/site-packages/tensordict/tensordict.py", line 2932, in __contains__ raise NotImplementedError( NotImplementedError: TensorDict does not support membership checks with the `in` keyword. If you want to check if a particular key is in your TensorDict, please use `key in tensordict.keys()` instead. ``` ### Platform linux-arm64 available version listing as follows: `pip install tensordict==0.6.2` Output information: ``` ERROR: Could not find a version that satisfies the requirement tensordict==0.6.2 (from versions: 0.0.1a0, 0.0.1b0, 0.0.1rc0, 0.0.2a0, 0.0.2b0, 0.0.3, 0.1.0, 0.1.1, 0.1.2, 0.8.0, 0.8.1, 0.8.2, 0.8.3) ERROR: No matching distribution found for tensordict==0.6.2 ``` Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-05-30 20:56:58 +08:00
H	2aed8d0a45	[BREAKING] config: set the default value of actor.entropy_coeff to 0 (#1770 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? entropy_coeff shall be set carefully during RL. When enabled, inappropriate coefficient may case training to collapse. You can see more empirical experiments from Skywork Open Reasoner 1 Technical Report (https://arxiv.org/pdf/2505.22312). In this PR, the default value of entropy_coeff is set to 0. This is a breaking change that may affect your experiment, although majority of verl example scripts set it to 0 manually already. We let most example script just pick up the default value of 0 for entropy_coeff. For a few documentation page where the reference model performance and commands are provided, we modify the doc so that the experiment result is consistent with the config setup. ### Usage Example To enable entropy loss coefficient, use ```bash actor_rollout_ref.actor.entropy_coeff=0.001 # or other values ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-30 14:42:53 +08:00
Hongpeng Guo	ed3767dcb3	[refactor] update func generator implementation to improve its observability (#1762 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Making the `func_generator` return type as a subclass of `Functor` with `__call__` method, whose name is `method_name`. Comparing to the previous implementation. This PR will makes the log record the `method_name` explicitly, instead of the previous `<class 'function'>` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary. Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-05-30 13:19:45 +08:00
iTao	2981aa26db	fix a bug that Moe's GPU memory offload is not properly handled (#1766 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? This PR fixes the issue of improper memory offloading for Moe. When expert parallelism is enabled in Megatron's MoE, additional expert_parallel_buffers are used to store the buffers, which occupy a significant amount of GPU memory. The current code fails to offload and onload these expert_parallel_buffers, resulting in incomplete memory offloading of the model. This may lead to out-of-memory (OOM) problem. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-30 13:17:39 +08:00
leo-pony	562ac53d05	[docs][NPU] Document optimize: added the dataset preparation step to ascend quick start guide (#1763 ) Document optimize: added the dataset preparation step to ascend quick start guide Without this content, error will take place, error "FileNotFoundError: Unable to find '/root/data/gsm8k/train.parquet'", error stack as follows: ``` Traceback (most recent call last): File "/home/mnj/models/code/verl/verl/verl/trainer/main_ppo.py", line 63, in main run_ppo(config) File "/home/mnj/models/code/verl/verl/verl/trainer/main_ppo.py", line 76, in run_ppo ray.get(runner.run.remote(config)) File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper return fn(args, kwargs) File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper return func(args, **kwargs) File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/worker.py", line 2822, in get values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout) File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/worker.py", line 930, in get_objects raise value.as_instanceof_cause() ray.exceptions.RayTaskError(FileNotFoundError): ray::TaskRunner.run() (pid=34827, ip=172.17.0.5, actor_id=6de3b5cdc4feb78723a1aa2901000000, repr=<main_ppo.TaskRunner object at 0xfffc17a22fe0>) File "/home/mnj/models/code/verl/verl/verl/trainer/main_ppo.py", line 172, in run train_dataset = create_rl_dataset(config.data.train_files, config.data, tokenizer, processor) File "/home/mnj/models/code/verl/verl/verl/trainer/main_ppo.py", line 219, in create_rl_dataset dataset = dataset_cls( File "/home/mnj/models/code/verl/verl/verl/utils/dataset/rl_dataset.py", line 119, in __init__ self._read_files_and_tokenize() File "/home/mnj/models/code/verl/verl/verl/utils/dataset/rl_dataset.py", line 132, in _read_files_and_tokenize dataframe = datasets.load_dataset("parquet", data_files=parquet_file)["train"] File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/load.py", line 2062, in load_dataset builder_instance = load_dataset_builder( File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/load.py", line 1782, in load_dataset_builder dataset_module = dataset_module_factory( File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/load.py", line 1497, in dataset_module_factory ).get_module() File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/load.py", line 913, in get_module data_files = DataFilesDict.from_patterns( File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/data_files.py", line 689, in from_patterns else DataFilesList.from_patterns( File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/data_files.py", line 582, in from_patterns resolve_pattern( File "/usr/local/python3.10.17/lib/python3.10/site-packages/datasets/data_files.py", line 383, in resolve_pattern raise FileNotFoundError(error_msg) FileNotFoundError: Unable to find '/root/data/gsm8k/train.parquet' ``` Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-05-30 13:15:36 +08:00
Yi-Chen Li	28a31d1b55	[fix] self.reward_module._handle.reshard(True) not required for fsdp2 (#1765 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Fix RewardModelWorker when using FSDP2, where self.reward_module_handle.reshare(True) is not required. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-30 13:11:24 +08:00
ShareLer	96903e0e97	[feat] support kimi_vl VLM model (#1639 ) ### Checklist Before Starting - [x] Search for similar PR(s). Some code will conflict with this PR #1613 ### What does this PR do? Add initial support for Kimi_vl; Add sp patch for kimi_vl. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes - Add some minor changes to be compatible with kimi_vl - Add patch to support ulysses_sequence_parallel ### API > Demonstrate how the API changes if any. ### Usage Example ```bash python3 -m verl.trainer.main_ppo \ algorithm.adv_estimator=grpo \ data.train_files=$DATA_PATH/geo3k/test.parquet \ data.val_files=$DATA_PATH/geo3k/test.parquet \ data.train_batch_size=16 \ data.max_prompt_length=2048 \ data.max_response_length=4096 \ data.filter_overlong_prompts=True \ data.truncation='error' \ data.image_key=images \ data.shuffle=False \ +data.trust_remote_code=True \ actor_rollout_ref.model.path=moonshotai/Kimi-VL-A3B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \ actor_rollout_ref.actor.ppo_mini_batch_size=8 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \ actor_rollout_ref.actor.use_kl_loss=True \ actor_rollout_ref.actor.kl_loss_coef=0.01 \ actor_rollout_ref.actor.kl_loss_type=low_var_kl \ actor_rollout_ref.actor.entropy_coeff=0 \ actor_rollout_ref.model.enable_gradient_checkpointing=False \ actor_rollout_ref.model.trust_remote_code=True \ actor_rollout_ref.actor.fsdp_config.param_offload=True \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \ actor_rollout_ref.rollout.tensor_model_parallel_size=8\ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \ actor_rollout_ref.rollout.enable_chunked_prefill=False \ actor_rollout_ref.rollout.enforce_eager=False \ actor_rollout_ref.rollout.free_cache_engine=False \ actor_rollout_ref.rollout.n=8 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \ actor_rollout_ref.ref.fsdp_config.param_offload=True \ algorithm.use_kl_in_reward=False \ trainer.val_before_train=False \ trainer.critic_warmup=0 \ trainer.logger=['console','wandb'] \ trainer.project_name='Kimi_VL_test' \ trainer.experiment_name='kimi_vl_grpo_geo3k_cp2' \ trainer.n_gpus_per_node=8\ trainer.nnodes=1\ trainer.save_freq=50 \ trainer.test_freq=5 \ trainer.total_epochs=15 $@ ``` ### Test & Problem During the dev, I discovered some issues, but they did not affect the code for this PR. Existing problems：（with vllm==0.8.5.post1） - Occasional errors of vllm ```python File "/home/sharele/anaconda3/lib/python3.11/site-packages/vllm/v1/attention/backends/mla/common.py", line 504, in build self.page_size) ^^^^^^^^^^^^^^ AttributeError: 'MLACommonMetadataBuilder' object has no attribute 'page_size' ``` releated: https://github.com/vllm-project/vllm/issues/16908 Reference this method to avoid the problem temporarily: https://github.com/vllm-project/vllm/issues/16908#issuecomment-2820504215 - Garbled output from vllm under specific circumstances During test, I found that when SamplingParams.n > 1，vllm's output is some meaningless characters or keeps repeating. This will affect grpo. releated: https://github.com/vllm-project/vllm/issues/18378 Note: Using the Hopper architecture gpu can avoid this problem, but it is not clear whether there are still potential issues. Training curve: The training curve will comming soon after I solve the second problem. ### Additional Info. - Issue Number: #1428 - Training: FSDP - Inference: vLLM ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary. --------- Signed-off-by: ShareLer <ShareLe@163.com>	2025-05-30 11:16:06 +08:00
Nan Zhe	9c50ffd0cb	[vlm] Support ulysses sequence parallelism for vlm (#1739 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Only apply Ulysses sequence parallel to the LLM part of the VLM model, which is the main component, to avoid `the Image features and image tokens do not match` issue from occurring before `masked_scatter`. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes 1. For the VLM model, we only pad the inputs before forward pass without slicing them; instead, we perform slicing after the embedding stage. 2. In cases where ViT and LLM share/reuse FlashAttention, distinguish the ViT scenario and skip the Ulysses logic. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test ``` python -m verl.trainer.main_ppo \ algorithm.adv_estimator=grpo \ data.train_files=/mnt/hdfs/zhudelin123/data/geo3k/train.parquet \ data.val_files=/mnt/hdfs/zhudelin123/data/geo3k/test.parquet \ data.train_batch_size=64 \ data.max_prompt_length=2048 \ data.max_response_length=2048 \ data.filter_overlong_prompts=True \ data.truncation=error \ data.image_key=images \ actor_rollout_ref.model.path=/mnt/hdfs/Qwen2.5-VL-7B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=64 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \ actor_rollout_ref.actor.use_kl_loss=True \ actor_rollout_ref.actor.kl_loss_coef=0.01 \ actor_rollout_ref.actor.kl_loss_type=low_var_kl \ actor_rollout_ref.actor.entropy_coeff=0 \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ actor_rollout_ref.model.use_fused_kernels=True \ actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \ actor_rollout_ref.rollout.tensor_model_parallel_size=4 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \ actor_rollout_ref.rollout.enable_chunked_prefill=False \ actor_rollout_ref.rollout.enforce_eager=False \ actor_rollout_ref.rollout.free_cache_engine=False \ actor_rollout_ref.rollout.n=4 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16 \ actor_rollout_ref.ref.fsdp_config.param_offload=True \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.logger=[console,wandb] \ trainer.project_name=nanzhe_verl_grpo_example_geo3k \ trainer.experiment_name=qwen2_5_vl_7b_sp2_test \ trainer.n_gpus_per_node=8 \ trainer.nnodes=2 \ trainer.save_freq=-1 \ trainer.test_freq=-1 \ trainer.default_hdfs_dir=null \ trainer.total_epochs=1 \ trainer.resume_mode=disable ``` <img width="481" alt="image" src="https://github.com/user-attachments/assets/066db41d-46cf-4bc8-9d50-b9a8189c7654" /> ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-30 11:12:49 +08:00
Blue Space	55f13ff16f	[fix] moonlight runnable with trust_remote_code (#1749 )	2025-05-29 22:25:28 +08:00
Wayne	195f61b0f5	[feat] add fsdp2 to fsdp_sft_trainer (#1713 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Add fsdp2 to fsdp_sft_trainer. Resolve issue #1504. ### High-Level Design Refer to the implementation of #1026. ### Usage Example ```python model.strategy=fsdp2 ``` ### Test <img width="1095" alt="image" src="https://github.com/user-attachments/assets/1f70db1c-9ac3-448e-abca-fd302480f0c7" /> ### Additional Info. - Issue Number: #1504 - Training: [Note which backend this PR will affect: FSDP] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-29 21:36:57 +08:00
Frank Qing	7853292336	Fix copy_to_local function calls with incorrect argument usage (#1756 ) - Fixed two copy_to_local calls where use_shm was passed as positional argument - Changed to use keyword argument use_shm=use_shm to prevent TypeError - This resolves the 'expected str, bytes or os.PathLike object, not bool' error - Affects lines 566 and 607 in verl/workers/fsdp_workers.py ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Changed `copy_to_local(self.config.model.path, use_shm)` to `copy_to_local(self.config.model.path, use_shm=use_shm)` ### Specific Changes Problem: The `copy_to_local` function was being called with `use_shm` as a positional argument instead of a keyword argument, causing `cache_dir` to receive a boolean value instead of a string path. This resulted in: ``` TypeError: expected str, bytes or os.PathLike object, not bool ``` Solution: - Changed `copy_to_local(self.config.model.path, use_shm)` to `copy_to_local(self.config.model.path, use_shm=use_shm)` - Fixed two instances in `verl/workers/fsdp_workers.py` (lines 566 and 607) Testing: - Error no longer occurs during model initialization - Function calls now correctly pass parameters according to the function signature Files Changed: - `verl/workers/fsdp_workers.py` ``` Co-authored-by: qingyuhao <qingyuhao@bytedance.com>	2025-05-29 17:01:02 +08:00
DtYXs	904a252379	Add an example script for PF-PPO training (#1753 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add an example script for PF-PPO training ### Specific Changes > Add an example script `run_deepseek7b_llm_pfppo.sh` in `examples/ppo_trainer/` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-29 15:53:43 +08:00
OC	b8ae4a1fba	[rollout] feat: Implement sglang async rollout and multi-turn using AsyncServerBase (#1698 ) …sing AsyncServerBase Implemented AsyncSglangServer similar with AsyncvLLMServer. Tested run_qwen2-7b_seq_balance_sglang.sh with TP=1, but still has some todos: TODO - [ ] improve performance when TP>1. Current implementation is slow because sglang_engine.async_generate is called in sequence for each request. - [ ] test in multi node deployment. - [ ] add an unit test ### Checklist Before Starting - [done] Search for similar PR(s). ### What does this PR do? resolve issue: https://github.com/volcengine/verl/issues/1636 ### High-Level Design <img width="462" alt="截屏2025-05-26 20 22 25" src="https://github.com/user-attachments/assets/f07b218d-8e6e-4ccb-b266-2c514d7b4370" /> https://github.com/volcengine/verl/issues/1636 ### Specific Changes add AsyncSglangServer ### API N/A ### Usage Example actor_rollout_ref.rollout.name=sglang \ actor_rollout_ref.rollout.mode=async \ ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue 1636 - Training: [none] - Inference: [SGLang] ### Checklist Before Submitting - [done ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ done] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ done] Add `[BREAKING]` to the PR title if it breaks any API. - [ done] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ done] Add CI test(s) if necessary.	2025-05-29 14:41:27 +08:00
Liang Tang	1b17bb6f92	fix: show last step progress bar (#1750 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Update last step progress bar ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary. Signed-off-by: shinytang6 <shinytang6@gmail.com>	2025-05-29 14:19:20 +08:00
wang	de553a2eba	feat: sandbox fusion for multi-turn (#1525 ) - As users of veRL, we want to allow the model to call certain tools during Actor rollout, incorporating the results into the training process. - We aim to support tool-calling capabilities of inference engines using `sandbox-fusion` as the code execution system, providing the community with a reimplementation of `retools`.	2025-05-29 12:12:17 +08:00
OC	bb4f97b754	[ray] fix: error when bind async method in create_colocated_worker (#1745 ) ### Checklist Before Starting - [ done ] Search for similar PR(s). ### What does this PR do? fix a bug when register async method to fsdp worker. When use async method in fsdp worker, it fails with: ``` > raise value.as_instanceof_cause() E ray.exceptions.RayTaskError(TypeError): ray::WorkerDict.critic_sub() (pid=232160, ip=192.168.111.50, actor_id=ca29f2b51caa8e56243d6b8e01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f8c50729270>) E File "/usr/local/lib/python3.10/dist-packages/ray/cloudpickle/cloudpickle.py", line 1479, in dumps E cp.dump(obj) E File "/usr/local/lib/python3.10/dist-packages/ray/cloudpickle/cloudpickle.py", line 1245, in dump E return super().dump(obj) E TypeError: cannot pickle 'coroutine' object ``` /usr/local/lib/python3.10/dist-packages/ray/_private/worker.py:919: RayTaskError(TypeError) You can reproduce this error in tests/ray_gpu/test_colocated_workers.py with async method. ### High-Level Design wrap async method if the original method is coroutine ### Specific Changes changed _bind_workers_method_to_parent ### API n\a ### Usage Example tests/ray_gpu/test_colocated_workers.py ### Test tests/ray_gpu/test_colocated_workers.py ### Additional Info. - Issue Number: required by https://github.com/volcengine/verl/issues/1721 ### Checklist Before Submitting - [done ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ done] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ done] Add `[BREAKING]` to the PR title if it breaks any API. - [ done] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ done] Add CI test(s) if necessary.	2025-05-29 11:32:47 +08:00
H	abb87bc147	[docs] readme: add lora and move social icons (#1743 )	2025-05-28 17:02:07 -07:00
xichengpro	913ca6ee24	Improve run_qwen3moe-30b_megatron training script (#1742 )	2025-05-29 07:53:01 +08:00
thelongestusernameofall	16f6c1ee65	[feat] lora: new feature -- LoRA support for PPO (#1127 ) Co-Authored-By: Stephen Xie <stephenx@berkeley.edu> Co-Authored-By: Tony Lian <longlian@berkeley.edu> Co-Authored-By: Jiayi Pan <jiayipan@berkeley.edu> Co-Authored-By: Simon Huang <thelongestusernameofall@gmail.com> 测试脚本如下： ``` #!/bin/bash # # Author : simon huang # Date : 2025年04月15日14:20:30 # # For GRPO LoRA Support Dev # set -x ## master: # ray start --head --port=6379 ## slave: # ray start --address='localhost:6379' # export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 export WANDB_DIR=wandb-kkr1-lora-4p3bv1 export WANDB_PROJECT=simon-kkr1-lora-4p3bv1 # wandb server start --port 9090 export WANDB_BASE_URL=http://wandblocal:9000 export WANDB_API_KEY=local-5239e89783ebebea9bac5509e2bd1a8e734f55f7 # wandb login --relogin --host=http://wandblocal:9000 # export WANDB_MODE=offline MODEL_PATH=/data1/models/Qwen/Qwen2.5-0.5B-Instruct export VLLM_ATTENTION_BACKEND=XFORMERS nproc_per_gpu=1 nnodes=1 nproc_per_node=2 total_procs=$(( nproc_per_gpu * nnodes * nproc_per_node )) mini_batch_size=$(( total_procs )) python3 -m verl.trainer.main_ppo \ --config-name=lora-ppo_trainer.yaml \ algorithm.adv_estimator=grpo \ data.train_files=data/kk/parquet/train.parquet \ data.val_files=data/kk/parquet/val.parquet \ data.train_batch_size=${total_procs} \ data.val_batch_size=${total_procs} \ data.max_prompt_length=2000 \ data.max_response_length=600 \ actor_rollout_ref.model.path=$MODEL_PATH\ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.model.lora_rank=8 \ actor_rollout_ref.model.lora_alpha=16 \ actor_rollout_ref.model.target_modules=[k_proj,v_proj] \ actor_rollout_ref.actor.optim.lr=3e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=${mini_batch_size} \ actor_rollout_ref.actor.ppo_micro_batch_size=${mini_batch_size} \ actor_rollout_ref.actor.use_kl_loss=False \ actor_rollout_ref.actor.kl_loss_coef=0.001 \ actor_rollout_ref.actor.kl_loss_type=low_var_kl \ actor_rollout_ref.actor.fsdp_config.fsdp_size=-1 \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \ actor_rollout_ref.rollout.log_prob_micro_batch_size=${mini_batch_size} \ actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.1 \ actor_rollout_ref.rollout.n=2 \ actor_rollout_ref.rollout.max_num_seqs=4 \ actor_rollout_ref.rollout.max_model_len=4000 \ actor_rollout_ref.rollout.max_num_batched_tokens=4000 \ actor_rollout_ref.rollout.enable_chunked_prefill=False \ actor_rollout_ref.ref.log_prob_micro_batch_size=${mini_batch_size} \ actor_rollout_ref.ref.fsdp_config.param_offload=False \ actor_rollout_ref.actor.ulysses_sequence_parallel_size=1 \ actor_rollout_ref.actor.entropy_coeff=0.001 \ algorithm.kl_ctrl.kl_coef=0.001 \ reward_model.reward_manager=naive \ trainer.critic_warmup=0 \ trainer.logger=['console','wandb'] \ trainer.project_name=$WANDB_PROJECT \ trainer.experiment_name=$WANDB_PROJECT \ trainer.n_gpus_per_node=${nproc_per_node} \ trainer.nnodes=${nnodes} \ trainer.default_local_dir=$WANDB_PROJECT \ trainer.default_hdfs_dir=null \ trainer.save_freq=1 \ trainer.test_freq=1 \ trainer.total_epochs=8 $@ 2>&1 \| tee ${WANDB_PROJECT}.log ``` 输出log如下： ``` (TaskRunner pid=2931272) [Error] </answer> appears 0 times (expected 1) (TaskRunner pid=2931272) [Error] Incorrect tag order: Expected <think>...</think><answer>...</answer> (TaskRunner pid=2931272) (TaskRunner pid=2931272) Format validation: FAIL (TaskRunner pid=2931272) Format score: -2 (TaskRunner pid=2931272) (TaskRunner pid=2931272) [Content Validation] Skipped due to format errors or missing answer (TaskRunner pid=2931272) (TaskRunner pid=2931272) -------------------------------------------------------------------------------- (TaskRunner pid=2931272) --------------------------------- Final Score ---------------------------------- (TaskRunner pid=2931272) Format: -2 (TaskRunner pid=2931272) Answer: -2 (TaskRunner pid=2931272) Total: -4 (TaskRunner pid=2931272) ================================================================================ (TaskRunner pid=2931272) (TaskRunner pid=2931272) local_global_step_folder: simon-kkr1-lora-4p3bv1/global_step_10 (WorkerDict pid=2948236) [rank-0]: LoRA adapter saved to simon-kkr1-lora-4p3bv1/global_step_10/actor/lora_adapter Training Progress: 0%\| \| 10/47200 [05:16<308:34:14, 23.54s/it] (WorkerDict pid=2948236) [rank-0]: Saving model to /mnt/h800fast/simon/research/Train/RL/volcengine/simonverl/simon-kkr1-lora-4p3bv1/global_step_10/actor/model_world_size_2_rank_0.pt (WorkerDict pid=2948236) [rank-0]: Saving checkpoint to /mnt/h800fast/simon/research/Train/RL/volcengine/simonverl/simon-kkr1-lora-4p3bv1/global_step_10/actor/model_world_size_2_rank _0.pt (WorkerDict pid=2948236) [rank-0]: Saving extra_state to /mnt/h800fast/simon/research/Train/RL/volcengine/simonverl/simon-kkr1-lora-4p3bv1/global_step_10/actor/extra_state_world_size _2_rank_0.pt (TaskRunner pid=2931272) step:10 - global_seqlen/min:1981.000 - global_seqlen/max:4883.000 - global_seqlen/minmax_diff:2902.000 - global_seqlen/balanced_min:3417.000 - global_seqlen/bal anced_max:3447.000 - global_seqlen/mean:3432.000 - actor/entropy:1.657 - actor/pg_loss:0.000 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_ norm:1.258 - perf/mfu/actor:0.034 - perf/max_memory_allocated_gb:12.799 - perf/max_memory_reserved_gb:13.301 - perf/cpu_memory_used_gb:49.778 - actor/lr:0.000 - val-core/simon-kkr1/rewar d/mean@1:-5.278 - val-aux/simon-kkr1/reward/std@1:0.000 - val-core/simon-kkr1/reward/best@1/mean:-5.278 - val-core/simon-kkr1/reward/best@1/std:0.000 - val-aux/simon-kkr1/reward/worst@1/mea n:-5.278 - val-aux/simon-kkr1/reward/worst@1/std:0.000 - critic/score/mean:-3.658 - critic/score/max:-1.638 - critic/score/min:-5.734 - critic/rewards/mean:-3.658 - critic/rewards/max:-1 .638 - critic/rewards/min:-5.734 - critic/advantages/mean:-0.174 - critic/advantages/max:0.707 - critic/advantages/min:-0.707 - critic/returns/mean:-0.174 - critic/returns/max:0.707 - c ritic/returns/min:-0.707 - response_length/mean:81.500 - response_length/max:150.000 - response_length/min:28.000 - response_length/clip_ratio:0.000 - prompt_length/mean:1634.500 - prom pt_length/max:2319.000 - prompt_length/min:950.000 - prompt_length/clip_ratio:0.000 - timing_s/gen:3.607 - timing_s/old_log_prob:0.482 - timing_s/adv:0.015 - timing_s/update_actor:1.428 - timing_s/testing:5.142 - timing_s/save_checkpoint:2.504 - timing_s/step:13.183 - timing_per_token_ms/adv:0.002 - timing_per_token_ms/update_actor:0.208 - timing_per_token_ms/gen:11.0 65 - perf/total_num_tokens:6864.000 - perf/time_per_step:13.183 - perf/throughput:260.329 (TaskRunner pid=2931272) (TaskRunner pid=2931272) ================================================================================ (TaskRunner pid=2931272) ============================ Processing New Sample ============================= (TaskRunner pid=2931272) [Warnning] Failed to locate model response header (TaskRunner pid=2931272) ``` LoRA adapter会和Checkpoint一同保存，截图如下： <img width="831" alt="image" src="https://github.com/user-attachments/assets/5b8b2283-decc-499a-b08c-62dcaa961c9f" /> 少量训练后的reward@worst曲线： <img width="511" alt="image" src="https://github.com/user-attachments/assets/d3253782-50b8-4f42-b203-38a09685dc24" /> --------- Co-authored-by: Stephen Xie <stephenx@berkeley.edu> Co-authored-by: Tony Lian <longlian@berkeley.edu> Co-authored-by: Jiayi Pan <jiayipan@berkeley.edu> Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>	2025-05-28 10:53:47 -07:00
Blue Space	18fa5c7e87	[Docker Image] hot fix moonlight tokenizer request (#1740 )	2025-05-28 23:30:10 +08:00
DtYXs	75d2b361c2	Add support for PF-PPO (#1719 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add support for [PF-PPO](https://arxiv.org/abs/2409.06957) in verl. ### Specific Changes > `verl/trainer/config/ppo_trainer.yaml`: Add config for PF-PPO `verl/trainer/ppo/core_algos.py`: Add `compute_pf_ppo_reweight_data` function. `verl/trainer/ppo/ray_trainer.py`: Do PF-PPO in `compute_advantage` when `config.algorithm.use_pf_ppo` is `True` `README.md`: Update PF-PPO in README ### Usage Example ```bash set -x python3 -m verl.trainer.main_ppo \ algorithm.adv_estimator=gae \ algorithm.use_pf_ppo=True \ algorithm.pf_ppo.reweight_method=pow \ algorithm.pf_ppo.weight_pow=2.0 \ data.train_files=$HOME/data/gsm8k/train.parquet \ data.val_files=$HOME/data/gsm8k/test.parquet \ data.train_batch_size=1024 \ data.max_prompt_length=512 \ data.max_response_length=512 \ data.filter_overlong_prompts=True \ data.truncation='error' \ actor_rollout_ref.model.path=deepseek-ai/deepseek-llm-7b-chat \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=256 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16 \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ actor_rollout_ref.actor.use_kl_loss=False \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=32 \ actor_rollout_ref.rollout.tensor_model_parallel_size=4 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ actor_rollout_ref.rollout.n=5 \ critic.optim.lr=1e-5 \ critic.model.use_remove_padding=True \ critic.model.path=deepseek-ai/deepseek-llm-7b-chat \ critic.model.enable_gradient_checkpointing=True \ critic.ppo_micro_batch_size_per_gpu=32 \ critic.model.fsdp_config.param_offload=False \ critic.model.fsdp_config.optimizer_offload=False \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.logger=['console','wandb'] \ trainer.project_name='verl_example_gsm8k' \ trainer.experiment_name='deepseek_llm_7b_function_rm' \ trainer.n_gpus_per_node=8 \ trainer.nnodes=1 \ trainer.save_freq=20 \ trainer.test_freq=1 \ trainer.total_epochs=15 $@ ``` ### Test Simple gsm8k test. <img width="502" alt="image" src="https://github.com/user-attachments/assets/4298ce20-a691-4edb-8e4a-ef68fb0fb6be" /> ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>	2025-05-28 21:23:48 +08:00
Guangming Sheng	7c91b103f5	[misc] fix: reduce training iter in spin and sppo ci (#1738 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Reduce training iterations in spin and sppo ci to reduce ci time. ### Test SPIN and SPPO CI ### Additional Info. No ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-28 19:39:49 +08:00
Maozhou Ge	cdad2e6504	trainer: do not repeat "multi_modal_inputs" in generate_sequences() (#1604 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? "multi_modal_inputs" is not used in generate_sequences() stage, there's no need to pass this field. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-28 18:23:04 +08:00
Yan Bai	be47ac44b2	[mcore] moonlight (small model with deepseekv3 arch) (#1284 ) achieve 74.3 at gsm8k, while moonlight reported as 77.4 still WIP with the performance diff	2025-05-28 17:10:29 +08:00
Blue Space	8fe4950061	[BugFix] fix freeze_moe_router typo to enable the config option (#1732 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Fix freeze_moe_router typo to enable the config option as @duomicoding in #1540 and @vermouth1992 pointed out. Maybe freeze is better than fix to describe this function. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-28 17:05:57 +08:00
Blue Space	432f9e91f1	[feat][BREAKING] Megatron support dynamic batch size, to rebalance the workloads (#1617 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? 1. Megatron support dynamic batch size, to rebalance the workloads. 2. Fix missing critic metrics. ### High-Level Design Follow the FSDP's dynamic batch size. ### Specific Changes Use the `rearrange_micro_batches` API, but compatible with Megatron VPP constraints. ```py vpp_size = mpu.get_virtual_pipeline_model_parallel_world_size() if vpp_size is not None and vpp_size > 1: microbatch_group_size_per_vp_stage = self.tf_config.microbatch_group_size_per_vp_stage micro_batches, indices = rearrange_micro_batches(batch=mini_batch.batch, num_batches_devided_by=microbatch_group_size_per_vp_stage, max_token_len=max_token_len) assert len(micro_batches) % self.tf_config.microbatch_group_size_per_vp_stage == 0, f"micro_batches {micro_batches} must be divisible by microbatch_group_size_per_vp_stage {microbatch_group_size_per_vp_stage} for megatron backend" else: micro_batches, indices = rearrange_micro_batches(batch=mini_batch.batch, max_token_len=max_token_len) ``` @vermouth1992 please check whether it makes sense. Megatron's constraint when using interleaving pipeline: ```py # If the final micro-batch group has fewer micro-batches than pipeline-parallel size, # the pipeline will have dependency bubbles. final_microbatch_group_size = num_microbatches % config.microbatch_group_size_per_vp_stage if 0 < final_microbatch_group_size < pipeline_parallel_size: msg = 'The remainder of M (the total micro-batches) divided by N (number of ' msg += 'contiguous micro-batches in a virtual pipeline stage) should be 0, ' msg += 'or larger than or equal to the pipeline-parallel size, but it is ' msg += f'{final_microbatch_group_size}. ' msg += 'Otherwise, it introduces dependency bubbles in the pipeline ' msg += 'and reduces throughput.' raise RuntimeError(msg) ``` ### API Megatron forward_backward_batch has changed input, and the output has become a dict, containing original `output` and the `indices` needed for compute_old_log_probs. ### Usage Example ```bash actor_rollout_ref.actor.use_dynamic_bsz=${USE_DYNAMIC_BSZ} \ actor_rollout_ref.actor.ppo_max_token_len_per_gpu=${ppo_max_token_len_per_gpu} \ critic.ppo_max_token_len_per_gpu=${forward_max_token_len_per_gpu} \ ``` Other models will directly copy the config. ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-28 10:52:36 +08:00
none0663	99e749a1f7	Fix Configuration for Micro Batch Size in Megatron's Ref Policy (#1700 ) ### What does this PR do? Fix Configuration for Micro Batch Size in Megatron's Ref Policy ### High-Level Design This pull request addresses an issue with the micro batch size configuration in the ref policy of Megatron. The default ppo_megatron_trainer.yaml only includes two configurations: log_prob_micro_batch_size and log_prob_micro_batch_size_per_gpu. `54c9b7364c/verl/trainer/config/ppo_megatron_trainer.yaml (L119-L120)` However, in `megatron_workers.py`, the required configuration is ref.log_prob_micro_batch_size_per_gpu `54c9b7364c/verl/workers/megatron_workers.py (L517-L518)` or in `megatron_actor.py ` the required configuration is ref.ppo_micro_batch_size_per_gpu, `54c9b7364c/verl/workers/actor/megatron_actor.py (L271-L274)` which are not directly related to ppo_micro_batch_size. To resolve this, I have made modifications to the configuration calculations and added raise ValueError statements to ensure that the necessary parameters are correctly defined. This update ensures that the required parameters are properly handled, preventing runtime errors and improving the overall robustness of the training process. ### Changes Made: - Modified the configuration calculations in megatron_workers.py. - Added raise ValueError statements to check for the presence of log_prob_micro_batch_size_per_gpu and ppo_micro_batch_size_per_gpu.	2025-05-28 10:51:46 +08:00
Zixiang Chen	9b186eda34	Update README.md (#1731 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR updates the README.md for the SPIN recipe to improve accuracy and completeness. Key changes include corrections and additions to the method description, the inclusion of related Works, and a more concise introduction. ### High-Level Design N/A - Focuses on documentation improvements for clarity and accuracy. ### Specific Changes - Corrected and supplemented the description of the SPIN methodology. - Inclusion of related Works along with concise introductions to relevant papers/concepts. - Refined and clarified the introductory sections of the README. ### API N/A - Changes are limited to README.md documentation. ### Usage Example N/A - This PR does not primarily focus on usage examples, but rather on descriptive content. ```python # No new standalone code snippets are part of this PR itself.	2025-05-28 10:39:31 +08:00
Hongpeng Guo	d5570c40ef	[mics][fix] Deprecate legacy `_default_compute_score` API and fix ray utils test (#1729 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Handle comments after #1397 being merged: 1. Add back `_default_compute_score` API and mark it as deprecated; 2. Fix a broken ci test `ray_utils_test` on `parallel_put`; ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-05-28 09:37:03 +08:00
Chi Zhang	16a13d836e	[misc] feat: support logging rollout prob vs. actor probs for debugging purpose (#1712 ) ### Checklist Before Starting - [X] Search for similar PR(s). ### What does this PR do? - Support logging rollout probs vs. actor probs for debugging purpose - Support both vllm and sglang async ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-28 08:14:31 +08:00
Hongpeng Guo	34e409b683	[docs] refactor: Adding doc strings and doc pages for public methods in `trainer` and `utils` (#1397 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? * This PR adds doc string for the public methods inside `trainer` and `utils` module, so that these methods can be reused and referenced better. * Two new doc page `PPO Trainer Interface` and `Utilities` were also provided under the API Reference section. * Renamed one function `verl.utils._default_compute_score` to `verl.utils.default_compute_score`, as it was an external function used by other modules, i.e., trainer and recipe; <img width="1093" alt="Screenshot 2025-05-26 at 9 20 31 PM" src="https://github.com/user-attachments/assets/e361e6bd-a33b-426b-85b4-9fe93ab1e398" /> ### TODO This is the second of a series of PRs to improve and stabilize the docs and API. Stacked on top of #1396 TODO includes adding more useful utility functions to the doc with improved doc strings. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu> Co-authored-by: H <linhaibin.eric@gmail.com>	2025-05-27 14:39:52 -07:00
Blue Space	4d3ca21288	[CI] disable e2e_prime, always hang for 50 minutes (#1728 )	2025-05-27 22:39:27 +08:00
Bihan Rana	54b2677f72	Add dstack example (#2 ) (#1706 ) Co-authored-by: Bihan Rana <bihan@Bihans-MacBook-Pro.local> Co-authored-by: peterschmidt85 <andrey.cheptsov@gmail.com>	2025-05-27 08:44:03 +08:00
Casper	9846360ee0	fix TimeoutError in aiohttp (#1702 )	2025-05-27 08:09:04 +08:00
hoshi-hiyouga	4583e4c27d	[Doc] Add a visual explanation of the configuration to the documentation (#1709 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Add a visual explanation of the configuration to the documentation ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-27 02:04:59 +08:00
Blue Space	5fe1839223	[CI] fix some tests scope (#1689 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Refactor and reduce some tests scope to reduce unrelated tests. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-26 09:46:30 -07:00
Blue Space	8298f7d267	[Bugfix] Fix for non_fused_kernels passing arguments (#1687 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Non_fused_kernels passing arguments error causes Qwen2_5_VL failed. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>	2025-05-26 22:09:49 +08:00
Chunyu	54c9b7364c	update ascend_quick_start doc (#1685 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? update ascend_quick_start.rst ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes 1. rename ascend_quick_start.rst 2. add the accuracy and throughput data of GRPO. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-26 15:53:07 +08:00
Baiqing Lyu	3d5f15fa9a	[fix] use correct variable for saving hf model (#1681 )	2025-05-25 18:49:43 +08:00
Chi Zhang	c60546d305	[misc] fix: fix device (#1671 ) ### Checklist Before Starting - [X] Search for similar PR(s). ### What does this PR do? Currently, the device to run on depends on whether `is_cuda_available` is True on the driver process. However, the driver process may be a CPU process that can't see cuda devices even when cuda devices are available. Thus, it's not appropriate to use `is_cuda_available` to set the device. Instead, we should set the device explicitly. In the future, we may have a ray cluster with both NPU and GPU, and we can use different devices for different workloads. Thus, setting device explicitly would be a better choice in the long run. Why CI can't trigger this problem: because we directly run `python3 xxx` on CI machine instead of using a standard ray cluster that has dedicated CPUs for head. CI machines all have GPUs. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-25 00:06:22 +08:00
Chi Zhang	45323080ea	[misc] fix: fix megatron entropy (#1672 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? In megatron-core, `vocab_parallel_log_probs_from_logits` is an inplace operator that would modify the logits in place to save memory. This makes the `vocab_parallel_entropy` produces incorrect results if `vocab_parallel_entropy` is computed after `vocab_parallel_log_probs_from_logits`. We swap the order to make sure the result is correct. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-25 00:04:23 +08:00
Cheetah	7d26d7359e	modify the installation method of vllm on different architectures and hyperlink (#1673 ) …res and hyperlink ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? modify the installation method of vllm on different architectures and hyperlink ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes 1、modify the installation method of vllm on different architectures 2、modify syntax of hyperlink ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-24 21:54:32 +08:00
Xiang Long	cf731e84d9	[sglang] Fix megatron support in sglang and add sglang_async support & CI tasks (#1602 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. - Fix sglang megatron support - Add sglang_async megatron support - Add CI task to protect megatron-sglang impl ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. https://wandb.ai/swordfaith/gsm8k_async_rl/runs/6h7apmbn?nw=nwuserswordfaith ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: SGLang ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Co-authored-by: BlueSpace <gaoziyuan19@mails.ucas.ac.cn>	2025-05-24 18:37:41 +08:00
Lang Feng	69582dc177	Add verl-agent and GiGPO to the awesome work list (#1660 )	2025-05-24 18:33:47 +08:00
Cheetah	3c048ac750	modify the instructions for using verl on ASCEND NPU (#1670 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? modify the instructions for using verl on ASCEND NPU ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes 1、Modify table format 2、Modify the installation method of vllm and vllm-ascend ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-24 18:31:53 +08:00
Shawn/Yuxuan Tong	5dc64391fe	[CI] fix: DAPO CI & response_mask (#1666 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR fixes: - DAPO CI triggering path patterns outdated since #1392 - `response_mask` computation missing but skipping the CI test in #1652 ### Tests - [x] DAPO CI is correctly triggered and passed, e.g., https://github.com/volcengine/verl/actions/runs/15223958183/job/42823610223?pr=1666 ### Additional Info. - Issue Number: #1392 , #1652 - Training: none - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-24 14:18:57 +08:00
mingruimingrui	4779f26164	[Refactor] fused kernel in forward (#1624 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Shifts fused_linear_for_ppo into model.forward for FSDP ### High-Level Design Self explaining ### Specific Changes - Update monkey patch to return log_probs and entropy instead of last_hidden_state. ### API No changes ### Usage Example ```sh actor_rollout_ref.model.use_fused_kernels=True ``` ### Test ![image](https://github.com/user-attachments/assets/c6af68fb-0200-4aee-9596-0b445afdc562) ### Additional Info. - This is to fix #1565 - The original bug arises because we tried to access model.lm_head.weight from outside of the FSDP wrapped context. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-24 13:50:57 +08:00
Bong	02862103ba	[Megatron] Support optimizer offload for moe when ep > 1 (#1638 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This simple PR adds support for [ChainedOptimizer](`75b1ca1361/megatron/core/optimizer/optimizer.py (L938)`) offloading in the Megatron-LM training environment. In Megatron-LM, ChainedOptimizer is used when expert parallelism (expert_parallel > 1, related to #1467 ) is enabled—commonly in Mixture-of-Experts (MoE) models. This has been tested and validated with the Qwen3-235B-22A model configuration. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python ... actor_rollout_ref.actor.megatron.optimizer_offload=True \ actor_rollout_ref.actor.megatron.expert_model_parallel_size=16 \ ... ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Megatron] - Inference: [none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Co-authored-by: charlie.cs <charlie.cs@kakaocorp.com> Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>	2025-05-24 12:42:10 +08:00
Yanbin Jiang	72255445f2	[SGLang Async Rollout] Validate prompt_len + max_resp_len <= max_mode… (#1627 ) …l_len before generation ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR adds a validation step to prevent generation requests that exceed the model’s maximum context length in SGLang. Without this check, multi-turn RL training can fail when the combined length of the prompt and the maximum response exceeds the model limit. The new validation ensures `prompt_len + max_resp_len <= max_model_len` before sending requests to the SGLang engine. ### Test Successfully tested with my multiturn RL dataset with `max_turns==30` which keeps failing with the following error before this change(Qwen2.5-32B-instruct + GRPO): ``` Traceback (most recent call last): File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 64, in main run_ppo(config) File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 76, in run_ppo ray.get(runner.run.remote(config)) File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper return fn(args, kwargs) File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper return func(args, *kwargs) File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 2822, in get values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout) File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 930, in get_objects raise value.as_instanceof_cause() ray.exceptions.RayTaskError(ValueError): ray::TaskRunner.run() (pid=1150536, ip=100.96.248.206, actor_id=85b22be1ed8ef671c739638a01000000, repr=<main_ppo.TaskRunner object at 0x796b0bba7010>) File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 183, in run trainer.fit() File "/home/jobuser/resources/verl/trainer/ppo/ray_trainer.py", line 872, in fit val_metrics = self._validate() File "/home/jobuser/resources/verl/trainer/ppo/ray_trainer.py", line 607, in _validate test_output_gen_batch_padded = self.actor_rollout_wg.generate_sequences(test_gen_batch_padded) File "/home/jobuser/resources/verl/single_controller/ray/base.py", line 49, in func output = ray.get(output) ray.exceptions.RayTaskError(ValueError): ray::WorkerDict.actor_rollout_generate_sequences() (pid=1169888, ip=100.96.248.206, actor_id=6deb9fd4b4ff01530920ada301000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7e41e90afa90>) File "/home/jobuser/resources/verl/single_controller/ray/base.py", line 625, in func return getattr(self.worker_dict[key], name)(args, *kwargs) File "/home/jobuser/resources/verl/single_controller/base/decorator.py", line 534, in inner return func(args, *kwargs) File "/home/jobuser/resources/verl/workers/fsdp_workers.py", line 630, in generate_sequences output = self.rollout.generate_sequences_with_tools(prompts=prompts) File "/home/jobuser/resources/verl/utils/debug/performance.py", line 78, in f return self.log(decorated_function, args, *kwargs) File "/home/jobuser/resources/verl/utils/debug/performance.py", line 88, in log output = func(args, *kwargs) File "/home/jobuser/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(args, kwargs) File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/async_sglang_rollout.py", line 613, in generate_sequences_with_tools output_req_list = loop.run_until_complete( File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/async_sglang_rollout.py", line 529, in _async_rollout_a_request output = await self._engine.async_generate( File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/entrypoints/engine.py", line 265, in async_generate return await generator.__anext__() File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 403, in generate_request tokenized_obj = await self._tokenize_one_request(obj) File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 450, in _tokenize_one_request self._validate_token_len(obj, input_ids) File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 482, in _validate_token_len raise ValueError(error_msg) ValueError: Requested token count exceeds the model's maximum context length of 32768 tokens. You requested a total of 34009 tokens: 23769 tokens from the input messages and 10240 tokens for the completion. Please reduce the number of tokens in the input messages or the completion to fit within the limit. ``` ### Additional Info. - Inference**: SGLang, ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-24 08:45:36 +08:00
Yuzhen Zhou	96c181a2e6	chore(ci): support FSDP2 for multi-turn SGLangRollout with tool calling (#1650 )	2025-05-23 22:52:04 +08:00
Cheetah	0528ba1185	[NPU] feat: Support FSDP worker and vLLM Ascend (#332 ) For developers, you can follow the docs: docs/ascend/ascend.rst This pr is committed for supporting Ascend NPU backend. Co-authored-by: Chendong98 [chendong136@huawei.com](mailto:chendong136@huawei.com) Co-authored-by: zheliuyu <15750543867@163.com> Co-authored-by: celestialli [celestialli@outlook.com](mailto:celestialli@outlook.com) In this pr, we add the capability to determine the type of NPU device and we also add a new script for training on NPU. These are change lists: 1. pyproject.toml change verison of vllm 2. requirements-npu.txt requirements for NPU 3. verl/bert_padding.py Adapted from https://github.com/mlcommons/training_results_v1.1/blob/main/NVIDIA/benchmarks/bert/implementations/pytorch/padding.py 4. verl/single_controller/ray/base.py 5. verl/third_party/vllm/vllm_spmd/dtensor_weight_loaders.py 6. verl/trainer/fsdp_sft_trainer.py 7. verl/utils/flops_counter.py 8. verl/utils/fsdp_utils.py 9. verl/workers/actor/dp_actor.py 10. verl/workers/critic/dp_critic.py 11. verl/workers/fsdp_workers.py 12. verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py 13. verl/workers/sharding_manager/fsdp_vllm.py 14. verl/utils/device.py get device type for different device 15. docs/ascend/ascend.md Here are our roadmap: RoadMap - [x] sft - [x] ppo - [x] grpo News [2025.03.31] Add result of SFT and GRPO. Qwen2-7B-Instruct was tested on 2*8 devices, and many params related to batch_size need to be reduced. So this result is only for reference. We will announce the reward results of the default params as soon as sleep mode is supported. [2025.03.03] Modify the adaptation method of Ray [2025.02.25] The PPO algorithm is supported for training on NPU with the FSDP backend. [2025.02.23] The SFT algorithm is supported for training on NPU with the FSDP backend. [2025.02.21] The GRPO algorithm is supported for training on NPU with the FSDP backend. Requirements We use this PR testing on Ascend NPU and GPU to ensure the same codes can run on different devices. The device information is 8 Atlas 800T A2 and 8 A100. Other software information is shown in the following table. \| Software \| Version \| \|:-------\|-------:\| \| transformers \| 4.47.1 \| \| accelerate \| 1.3.0 \| \| torch_npu \| 2.5.1.rc1\| \|CANN \| 8.1.RC1 (Not Released)\| About mean error Due to differences in hardware structure, we cannot guarantee that the loss of Ascend NPU is exactly the same as that of the GPU. According to our experience, the loss differences less than 2% is acceptable. If the loss difference is greater than 2%, we will try to fix it. The calculation formula is as follows. ![loss_comparison](https://github.com/user-attachments/assets/4f62f713-9240-4324-bf7d-3ae59fc85b05) N represents the number of training steps. For more information, please refer to [Calculation accuracy description](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/LMaccuracy_0001.html) --------- Co-authored-by: Chendong98 <chendong136@huawei.com> Co-authored-by: zheliuyu <15750543867@163.com>	2025-05-23 21:28:57 +08:00
Shawn/Yuxuan Tong	a7b2e29cb6	fix: entropy in DAPO (#1652 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR adds entropy computation and logging to DAPO trainer, aligning with other trainers. ### Additional Info. - Issue Number: #1455 - Training: none - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-23 20:15:55 +08:00
Shawn/Yuxuan Tong	c4faf5c94a	[CI] feat: add ignore for CI of SPIN & SPPO (#1653 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR adds ignore patterns to CI for SPIN & SPPO. ### Additional Info. - Issue Number: none - Training: none - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-23 20:15:32 +08:00
Shawn/Yuxuan Tong	cdee00d628	fix: only load reference policy when needed in DAPO (#1651 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR fixes wrong initialization so that verl only loads reference policy when needed. ### Additional Info. - Issue Number: none - Training: none - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-23 19:32:19 +08:00
Shawn/Yuxuan Tong	9ddc72520e	fix: add `loss_agg_mode` to critics (#1340 ) # What does this PR do? This PR adds `loss_agg_mode` to critics. # Before submitting - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)? - [x] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs) especially for breaking config etc? - [x] Did you write any test cases if neccessary? Please add CI tests to your new feature. # Additional Info - Issue Number: none - Training: both - Inference: none	2025-05-23 16:09:21 +08:00
imh966	aaaaaab900	Activation Offloading (#1220 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR supports activation offloading, and currently it's only for FSDP backend. ### High-Level Design Our implementation is based on the [one](https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/cpu_offload.py) in TransformerEngine. For efficiency, it groups activations by TransformerLayer and offloads activation groups asynchronously. This means that the offloading of the i-th activation group and the computation of the i+1-th activation group happen at the same time, and there are at most two activation groups in GPU memory. ### Specific Changes 1. Add activation offloading support. ### API ### Usage Example ``` export VLLM_ATTENTION_BACKEND=XFORMERS python3 -m verl.trainer.main_ppo \ algorithm.adv_estimator=grpo \ data.train_files=./data/gsm8k/train.parquet \ data.val_files=./data/gsm8k/test.parquet \ data.train_batch_size=512 \ data.max_prompt_length=512 \ data.max_response_length=1024 \ data.filter_overlong_prompts=True \ data.truncation='error' \ actor_rollout_ref.model.path=./huggingface.co/Qwen/Qwen2-7B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=256 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=64 \ actor_rollout_ref.actor.use_kl_loss=True \ actor_rollout_ref.actor.kl_loss_coef=0.001 \ actor_rollout_ref.actor.kl_loss_type=low_var_kl \ actor_rollout_ref.actor.entropy_coeff=0 \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.model.enable_activation_offload=True \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=64 \ actor_rollout_ref.rollout.tensor_model_parallel_size=2 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \ actor_rollout_ref.rollout.n=5 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=64 \ actor_rollout_ref.ref.fsdp_config.param_offload=True \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.logger=['console','tensorboard'] \ trainer.project_name='verl_grpo_example_gsm8k' \ trainer.experiment_name='qwen2_7b_function_rm' \ trainer.n_gpus_per_node=8 \ trainer.val_before_train=False \ trainer.nnodes=1 \ trainer.save_freq=-1 \ trainer.test_freq=5 \ trainer.total_epochs=15 ``` ### Test We conducted experiments on the Qwen2 7B model based on the above script. The memory and throughput data are shown in the figures below, where the blue line represents activation offloading. <img width="351" alt="image" src="https://github.com/user-attachments/assets/207576a1-3f47-4b40-bf19-60cf8105d609" /> <img width="361" alt="image" src="https://github.com/user-attachments/assets/d58f0f8b-eb5f-4e19-a892-4d778ff26135" /> ### Additional Info. - Issue Number: none - Training: This PR will affect FSDP backend - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-23 15:55:02 +08:00
Qunhong Zeng	54a5e6ee6d	[megatron] feat: save hf model config in megatron checkpoint manager (#1562 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR enables the Megatron backend checkpoint manager to save hf model config into verl checkpoints, and simplify our CI since the `--hf_model_path` has been deprecated in https://github.com/volcengine/verl/pull/1468, fixes the comment https://github.com/volcengine/verl/pull/1468#issuecomment-2883541227. Note: several changed lines in `verl/utils/megatron_utils.py` are unrelated to this PR; they were automatically reformatted by pre-commit hooks. ### Test The current CI e2e tests should sufficient cover for this PR. ### Additional Info. - Training: Megatron - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-23 14:50:48 +08:00
Geaming	2c179dae23	Add explicit position_ids to model.generate in hf rollout (#1637 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Added position_ids parameter to the model.generate method call to provide explicit control over token positions during text generation. I don't quite understand why have obtained position ids above but not passed them to generate, so I modified this.😂 ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-23 09:43:49 +08:00
Mert Unsal	01abd3c77c	Mert/passatk advantage (#1621 )	2025-05-23 07:15:27 +08:00
Chayenne	6dfa11adb1	[docs] recipe: fix spin README (#1647 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-22 13:54:07 -07:00
Yan Bai	04acd09d65	[megatron] optimization: avoid padding to logits (#1629 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Avoid a huge memory overhead (bsseq_lenvocab_size) when training with megatron given bs=4, seq_len=4k, vocab_size=150k, the memory overhead is about 4.8GB ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes calculate the log_p and entropy right after the sequence packed logits, avoid the sequence unpack of logits > List the specific changes. add a logit_processor callback to forward_function, let megatron_actor give the logit_processor to caluculate the log_p and entropy ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test recipe: qwen2-7B PPO gsm8k machine: 8H100 Before changes: TP2: OOM TP4: OOM TP2PP2: about 56s/step, actor MFU of first step is 0.133 After changes: TP2: about 40s/step, actor MFU of first step is 0.165 > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference*: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-22 23:28:41 +08:00
Chendong Wang	867d3024bf	[Recipe] SPIN: spin algorithm implementation (#1407 ) This PR introduces an implementation of the Self-Play Fine-Tuning (SPIN) algorithm, adapting the existing PPO framework within verl. You can find more information about SPIN here: https://github.com/uclaml/SPIN This implementation adapts the PPO framework for SPIN/Online DPO, involving these core changes: * Objective & Loss: * PPO maximizes cumulative reward via policy gradients and value estimates. * SPIN/Online DPO directly optimizes preference likelihood using a DPO-specific loss function (e.g., sigmoid loss). * Code Change: The primary logic change is in the actor's update step (`dp_actor.py: SPINDataParallelPPOActor.update_policy_dpo_with_ref`, `fsdp_workers.py: SPINRolloutRefWorker.update_actor_dpo`) and the loss calculation (`core_algos.py: compute_online_dpo_loss`). * Model Requirements: * PPO uses Actor, Critic, and optionally Reward/Reference models. * SPIN/Online DPO uses Actor and a mandatory Reference Model for the loss calculation, plus a reward source for preference labeling. No Critic is needed. * Code Change: The `ray_trainer.py` logic was modified to manage/update the reference model and not initialize/use the critic worker. `fsdp_workers.py` was updated for reference model initialization and checkpointing. * Update Signal: * PPO relies on Advantage estimates derived from rewards and the critic's value function. * SPIN/Online DPO uses the log probability difference between chosen/rejected pairs under the policy and reference models. * Code Change: Advantage calculation (`compute_advantage`) was removed from the training loop in `ray_trainer.py`. Preference determination (`compute_onlineDPO_pref` in `core_algos.py`) was added. * Data: * PPO uses (prompt, response, reward, value) tuples. * SPIN/Online DPO effectively uses (prompt, chosen_response, rejected_response) tuples, requiring preference data generation. * Code Change: Data processing in `ray_trainer.py` (`fit_dpo`) was adapted to handle preference pairs and prepare the specific inputs needed for `update_policy_dpo`. --------- Co-authored-by: H <linhaibin.eric@gmail.com> Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>	2025-05-22 07:08:59 -07:00
Blue Space	c803b1f769	[BugFix] Fix sglang and vllm engine args (#1634 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? #1616 causes vllm engine arg init failed, not know why CI of that PR fail to detect. Some errors have shown up. ![image](https://github.com/user-attachments/assets/ac6bb86e-1576-458e-b341-0e949724ac12) We may better separate engine args for different inference systems ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API ```yml engine_kwargs: # inference engine parameters vllm: swap_space: null # null means "use the engine default value" (usually 4 GB), setting it to, e.g., 32 means 32 GB sglang: attention_backend: null # null means use the engine default value, available options: flashinfer, triton, flashmla ``` ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-22 20:41:17 +08:00
CajZella	6cbb051753	[PRIME]bug fix: reward scoring hangs after progress reaches 100% (#1466 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Fixes a hang issue during reward scoring where the progress bar reaches 100% but the program does not continue. Adds robust support for asynchronous reward computation with subprocess cleanup. ### High-Level Design > This PR refactors the reward scoring pipeline (`PRIMERewardManager`) to: Adds forced cleanup of lingering subprocesses using psutil to avoid deadlocks. ### Specific Changes - Replaces `asyncio.run()` inside `verify()` with an external event loop using `new_event_loop()` for better compatibility with Ray and training frameworks. - Replaces `proc.kill()` with `psutil.terminate()` and `.wait()`, and move from `exception` to `finally` to clean up worker subprocesses safely and avoid zombie processes. ### Additional Info. - Issue Number: #288(maybe) - Training: none - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-22 16:43:08 +08:00
Blue Space	1cfa2be530	[Megatron][BREAKING] Allow override of transformer config to enable custom megatron features like variable PP layers distribution, with CI tests (#1555 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Allow to override of transformer config to enable custom megatron features like variable PP layers distribution, with CI tests, which is in need for larger moe models with 94 layers (Qwen3 moe) or 61 layers (DeepSeek V3) We will first fix e2e_prime CI by use fused kernels. Notice that now the imbalance PP layers distribution only compatible with dist_ckpt load and save, not support huggingface direct load/save. Also, other megatron arguments can be passed through scripts. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API Breaking APIs: ```py class MegatronWorker(Worker): def _init_hf_config_and_tf_config(self, model_path, dtype, override_model_config, override_transformer_config): # and the models building ``` ```yaml actor: megatron: override_transformer_config: {} # common transformer config for all models ``` To avoid trouble of input same transformer config arguments, other models will reuse actor's config, so just need to input once. ### Usage Example ```bash run_ppo_trainer_megatron.sh \ +actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_first_pipeline_stage=13 \ +actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_last_pipeline_stage=11 ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: Megatron - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-22 13:38:34 +08:00
spacegoing	be215d7b08	[docker] aws efa driver dockerfile (#1631 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Add sample dockerfile to support aws efa driver. Otherwise NCCL raise system error on such aws instances (like sagemaker ai pod).	2025-05-21 21:53:21 -07:00
zhangyongxin121	c07013ea39	[vllm] feat: add `engine_kwargs` in vllm_rollout_spmd to set params in vllm engine (#1442 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > add `engine_kwargs` in vllm_rollout_spmd to set swap_space in vllm engine. ### Specific Changes > add `engine_kwargs` in vllm_rollout_spmd, which can be set in config file. Same changes has been made in vllm_rollout. As the version of vllm is update to 0.8, the default vllm_rollout worker becomes vllm_rollout_spmd, which does not have `engine_kwargs` as in vllm_rollout, so this RP want to complete it. ### Usage Example > users can set vllm engine param such as `swap_space`, `seed` through the `engine_kwargs` in config file. For example, if one want to set the swap_space=32 in vllm, he can set the item in config like this ```bash actor_rollout_ref.rollout.engine_kwargs.swap_space=32 ```	2025-05-21 20:23:09 -07:00
Oliver Li	821689e0e9	docs: Add RM-R1 to awesome work list (#1608 ) ### What does this PR do? > Adding RM-R1 to the README as a list of work that used veRL ### High-Level Design > 1-line update ### Specific Changes > only changed the readme.md (1-line update) ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-21 16:09:03 -07:00
Stefan He	72683a7ced	Expose `engine_kwargs` from SGLang to Verl configuration (#1616 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Expose `engine_kwargs` from SGLang to Verl configuration This PR enables RL users to configure `engine_kwargs` directly through Verl, providing more control and flexibility over inference behavior. ### High-Level Design One key motivation is the choice of attention backend, which can significantly affect rollout performance. The SGLang team has observed that different attention backends perform better in different phases: FA3 tends to be more efficient during the prefill stage. FlashInfer or Triton generally offer better performance during decode. Moreover, the optimal backend may change across versions of SGLang. By exposing these parameters, we allow users to tune their setup based on the specific use case and version, ultimately improving performance and adaptability. > Add one-line overview of what this PR aims to achieve or accomplish. > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test In my setup about QWen 2.5 7B Instruct on H200: ``` timing_s/step:106.761 (flash infer) timing_s/step:100.520 （fa3) timing_s/step:100.364 (triton) ``` Hence, I would suggest our team to use fa3 or triton for now. > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-21 08:49:05 -07:00
Qunhong Zeng	d475654a2b	[minor] fix: use init_empty_weights instead of torch.device("meta") (#1587 )	2025-05-21 23:35:40 +08:00
Wang Zhang	cbc02ebc37	[fix] img not displaying on single controller doc (#1622 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? fix the img missing in single controller doc ### High-Level Design NA ### Specific Changes - add `?raw=true` to img link in single_controller doc - move the single_controller doc along with hybridflow programming guide in index ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-21 20:23:40 +08:00
Chi Zhang	2dc3e0ebad	[recipe] feat: support running dapo using main_ppo (#1612 ) ### Checklist Before Starting - [X] Search for similar PR(s). ### What does this PR do? - Add two scripts that run dapo using main_ppo with FSDP and megatron backend - Fix val reward manager init issue - Fix missing keys in ppo_trainer_megatron.yaml - Fix megatron optimizer offload when the optimizer state has not been initialized. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-21 16:42:29 +08:00
Stefan He	8970cb05f3	Simplify FSDP SGLang pre and post process (#1609 ) ### Checklist Before Starting Suggested by @BearBiscuit05, we can follow fsdp_vllm to simplify the device mesh and tp rank logic. - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test ``` Training Progress: 7%\|▋ \| 15/210 [25:27<5:27:08, 100.66s/it] (WorkerDict pid=1408128) update_weights_from_tensor time: 0.8615233898162842 seconds (WorkerDict pid=1408128) self.sampling_params={'n': 5, 'max_new_tokens': 1024, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'repetition_penalty': 1.0, 'temperature': 1.0, 'top_k': -1, 'top_p': 1, 'ignore_eos': False} (TaskRunner pid=1406018) list(reward_extra_infos_dict.keys())=[] (WorkerDict pid=1424002) update_weights_from_tensor time: 1.0217373371124268 seconds [repeated 7x across cluster] (WorkerDict pid=1424002) self.sampling_params={'n': 5, 'max_new_tokens': 1024, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'repetition_penalty': 1.0, 'temperature': 1.0, 'top_k': -1, 'top_p': 1, 'ignore_eos': False} [repeated 7x across cluster] (TaskRunner pid=1406018) step:16 - global_seqlen/min:215129.000 - global_seqlen/max:226140.000 - global_seqlen/minmax_diff:11011.000 - global_seqlen/balanced_min:220875.000 - global_seqlen/balanced_max:220876.000 - global_seqlen/mean:220875.250 - actor/entropy_loss:0.116 - actor/kl_loss:0.008 - actor/kl_coef:0.001 - actor/pg_loss:-0.013 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:0.088 - perf/mfu/actor:1.919 - perf/max_memory_allocated_gb:26.795 - perf/max_memory_reserved_gb:52.268 - perf/cpu_memory_used_gb:77.170 - actor/lr:0.000 - training/global_step:16.000 - training/epoch:2.000 - critic/score/mean:0.957 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.957 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.003 - critic/advantages/max:1.789 - critic/advantages/min:-1.789 - critic/returns/mean:-0.003 - critic/returns/max:1.789 - critic/returns/min:-1.789 - response_length/mean:241.269 - response_length/max:725.000 - response_length/min:55.000 - response_length/clip_ratio:0.000 - prompt_length/mean:103.849 - prompt_length/max:232.000 - prompt_length/min:66.000 - prompt_length/clip_ratio:0.000 - timing_s/gen:43.595 - timing_s/reward:0.898 - timing_s/old_log_prob:8.819 - timing_s/ref:8.876 - timing_s/adv:0.125 - timing_s/update_actor:36.083 - timing_s/step:98.803 - timing_per_token_ms/gen:0.035 - timing_per_token_ms/update_actor:0.020 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.005 - perf/total_num_tokens:1767002.000 - perf/time_per_step:98.803 - perf/throughput:2235.507 ``` > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-21 12:45:31 +08:00
Qunhong Zeng	80af51b609	[constant scheduler] fix: model won't be updated on first training step (#1463 )	2025-05-20 20:57:15 -07:00
Yan Bai	add17f029e	[megatron] support megatron expert parallel (#1467 ) ### Checklist Before Starting ### What does this PR do? support expert parallel in megatron ### High-Level Design introduce EPsize and ETPsize ETPsize is the TPsize for MoE parts, recommended to set 1, meaning that MoE parts not use TP ### Specific Changes 1. mcore model initilize 2. megatron vllm parameter transfer ### API ### Usage Example ```bash LLM=models/Qwen1.5-MoE-A2.7B-Chat NODES=1 PP=2 TP=4 VLLM_TP=4 EP=4 ETP=1 python3 -m verl.trainer.main_ppo --config-path=./config --config-name='ppo_megatron_trainer'\ algorithm.adv_estimator=gae \ data.train_files="$train_files" \ data.val_files="$test_files" \ data.train_batch_size=128 \ data.max_prompt_length=1024 \ data.max_response_length=512 \ data.filter_overlong_prompts=True \ data.truncation='error' \ actor_rollout_ref.model.path=$LLM \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.actor.ppo_mini_batch_size=32 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \ actor_rollout_ref.actor.use_kl_loss=False \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \ critic.optim.lr=1e-5 \ critic.model.path=$LLM \ critic.model.enable_gradient_checkpointing=False \ critic.ppo_micro_batch_size_per_gpu=1 \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.logger=['console','wandb'] \ trainer.project_name='verl_megatron_gsm8k_examples' \ trainer.experiment_name='qwen_moe_instruct_1node_ep' \ trainer.n_gpus_per_node=8 \ trainer.nnodes=$NODES \ trainer.save_freq=-1 \ trainer.test_freq=5 \ actor_rollout_ref.rollout.tensor_model_parallel_size=$VLLM_TP \ actor_rollout_ref.actor.megatron.pipeline_model_parallel_size=$PP \ actor_rollout_ref.ref.megatron.pipeline_model_parallel_size=$PP \ critic.megatron.pipeline_model_parallel_size=$PP \ actor_rollout_ref.actor.megatron.tensor_model_parallel_size=$TP \ actor_rollout_ref.ref.megatron.tensor_model_parallel_size=$TP \ critic.megatron.tensor_model_parallel_size=$TP \ actor_rollout_ref.actor.megatron.expert_model_parallel_size=$EP \ actor_rollout_ref.ref.megatron.expert_model_parallel_size=$EP \ critic.megatron.expert_model_parallel_size=$EP \ actor_rollout_ref.actor.megatron.expert_tensor_parallel_size=$ETP \ actor_rollout_ref.ref.megatron.expert_tensor_parallel_size=$ETP \ critic.megatron.expert_tensor_parallel_size=$ETP \ actor_rollout_ref.actor.megatron.use_dist_checkpointing=True \ actor_rollout_ref.ref.megatron.use_dist_checkpointing=True \ critic.megatron.use_dist_checkpointing=True \ actor_rollout_ref.actor.megatron.dist_checkpointing_path=$DIST_CKPT_PATH \ actor_rollout_ref.ref.megatron.dist_checkpointing_path=$DIST_CKPT_PATH \ critic.megatron.dist_checkpointing_path=$DIST_CKPT_PATH \ actor_rollout_ref.actor.megatron.param_offload=True \ actor_rollout_ref.ref.megatron.param_offload=True \ critic.megatron.param_offload=True \ trainer.total_epochs=100 $@ ``` ### Test ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. --------- Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>	2025-05-21 11:05:11 +08:00
Blue Space	7b0426a738	[Docker Image] update images and fix sglang installation (#1606 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? update images and fix sglang installation, the latest image: `whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6-mcore0.12.0-te2.3` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes - vLLM: 0.8.5.post1 - SGLang: 0.4.6.post4, fix installation - Megatron: core_v0.12.0 announcement - TransformerEngine: 2.3 ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-21 09:13:51 +08:00
H	d13507229a	docs: update adoption and doc index (#1607 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? update adoption and doc index ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-21 09:12:25 +08:00
Stefan He	0d3360a12d	Remove unnecessary broadcast (#1597 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Remove unnecessary broadcast > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes Removed one line > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test ```bash (WorkerDict pid=1099764) Before broadcast: TensorDict( (WorkerDict pid=1099764) fields={ (WorkerDict pid=1099764) attention_mask: Tensor(shape=torch.Size([330, 2048]), device=cuda:0, dtype=torch.int64, is_shared=True), (WorkerDict pid=1099764) input_ids: Tensor(shape=torch.Size([330, 2048]), device=cuda:0, dtype=torch.int64, is_shared=True), (WorkerDict pid=1099764) position_ids: Tensor(shape=torch.Size([330, 2048]), device=cuda:0, dtype=torch.int64, is_shared=True), (WorkerDict pid=1099764) prompts: Tensor(shape=torch.Size([330, 1024]), device=cuda:0, dtype=torch.int64, is_shared=True), (WorkerDict pid=1099764) responses: Tensor(shape=torch.Size([330, 1024]), device=cuda:0, dtype=torch.int64, is_shared=True)}, (WorkerDict pid=1099764) batch_size=torch.Size([330]), (WorkerDict pid=1099764) device=None, (WorkerDict pid=1099764) is_shared=False) (WorkerDict pid=1099764) After broadcast: TensorDict( (WorkerDict pid=1099764) fields={ (WorkerDict pid=1099764) attention_mask: Tensor(shape=torch.Size([330, 2048]), device=cuda:0, dtype=torch.int64, is_shared=True), (WorkerDict pid=1099764) input_ids: Tensor(shape=torch.Size([330, 2048]), device=cuda:0, dtype=torch.int64, is_shared=True), (WorkerDict pid=1099764) position_ids: Tensor(shape=torch.Size([330, 2048]), device=cuda:0, dtype=torch.int64, is_shared=True), (WorkerDict pid=1099764) prompts: Tensor(shape=torch.Size([330, 1024]), device=cuda:0, dtype=torch.int64, is_shared=True), (WorkerDict pid=1099764) responses: Tensor(shape=torch.Size([330, 1024]), device=cuda:0, dtype=torch.int64, is_shared=True)}, (WorkerDict pid=1099764) batch_size=torch.Size([330]), (WorkerDict pid=1099764) device=None, (WorkerDict pid=1099764) is_shared=False) ``` > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-20 10:15:28 -07:00
Wang Zhang	f41a57a827	[misc] docs: add design doc of single_controller (#1549 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Add the design doc for `verl.single_controller`. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes - `docs/single_controller.rst` - `docs/imgs/call_generate_sequences.png` - `docs/imgs/worker_group_init.png` ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-20 09:24:00 -07:00
Blue Space	1d4e23a562	[BugFix] Megatron: fix missing grad_norm and lr calculation, and fix fsdp grad_norm storage (#1601 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? fix Megatron missing grad_norm and lr calculation, and fix fsdp grad_norm storage ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test Tested Qwen2-7b with FSDP. Different configuration makes the divergence. <img width="387" alt="image" src="https://github.com/user-attachments/assets/183c62d0-a86a-4f4b-8168-d98c98961f7b" /> ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-20 22:04:12 +08:00
Yanbin Jiang	15b1b15f99	Support multi-turn rollout with Qwen chat template (#1593 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Support multi-turn rollout with Qwen chat template ### Specific Changes Currently, Verl's multi-turn rollout only supports ChatML-style messages. However, Qwen uses a different chat formatting template with the following key differences: 1. Qwen uses the `user` role tag to wrap tool responses. 2. Qwen merges consecutive tool responses into a single message. For example, for parallel tool calls, ChatML renders consecutive tool responses like this: ``` <\|im_start\|>tool tool response content 1<\|im_end\|> <\|im_start\|>tool tool response content 2<\|im_end\|> ``` In contrast, the Qwen chat template renders them as: ``` <\|im_start\|>user <tool_response> tool response content 1 </tool_response> <tool_response> tool response content 2 </tool_response><\|im_end\|> ``` This PR introduces a new `qwen` format option in the config to support this tool message style. ### Usage Example Set the multi-turn format to `qwen`: ``` multi_turn: enable: True max_turns: 5 format: qwen ``` ### Test Verified the rendered messages via print output to ensure: 1. ChatML format remains unchanged. 2. Qwen format aligns with the Qwen chat template as defined in its HuggingFace tokenizer config. Tested across the following scenarios: 1. Assistant message without tool calls. 2. Assistant message with one tool call + one tool response message. 3. Assistant message with parallel tool calls + multiple consecutive tool response messages. ### Additional Info. - Inference: SGLang ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-20 16:56:57 +08:00
Shizhan Lu	a3c4cb386c	Disable fused kernels in prime (#1598 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Currently, the `e2e_prime` test encounters the error` AttributeError: 'NoneType' object has no attribute 'squeeze'`, which is caused by [ #1212]. In PR [#1568], the parameter `use_fused_kernel` in `ppo_trainer.yaml` was set to `false`, but the corresponding parameter in `prime_trainer.yaml` was not updated. This is preventing the CI from passing. Before the root cause of `use_fused_kernel` is fully resolved , I guess we should temporarily set `use_fused_kernel` to `false` in `prime_trainer.yaml` ### High-Level Design Not needed ### Specific Changes - Default use_fused_kernels = False ### API Not needed ### Usage Example Not needed ### Test Not needed ### Additional Info. Not needed ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-20 16:27:33 +08:00
Joel	3eaaf24d5a	[rollout] perf: replace AsyncOpenAI to aiohttp client in ChatCompletionScheduler (#1588 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? AsyncOpenAI has very severe performance issue due to httpx, replace it to aiohttp client. For train_batch_size=1024, AsyncOpenAI introduces ~25s per generation phase. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-20 11:31:19 +08:00
Pavel Gein	457ccd9962	[feat] support logging to ClearML (#1582 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Support logging to [ClearML](https://clear.ml/) system ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-20 10:27:40 +08:00
Blue Space	88527e6aa5	[BugFix] Megatron: fix checkpoint manager, some states only rank 0 need to save to avoid errors (#1586 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Megatron: fix checkpoint manager, some states only rank 0 need to save to avoid errors ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-20 09:47:06 +08:00
Xiang Long	8160ec6a58	Bump to sglang 0.4.6.post4 & unified generate sequences ability between sgl and sgl async (#1577 ) ### Checklist Before Starting - [x] Search for similar PR(s). - Thanks to: - close #1558 due to mix of prs - close #1449 due to partial fix sgl new version issue - close #1300 which is part of current pr - This pr is co-authored with @ocss884 ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. - bump sglang to 0.4.6.post4 - unified sglang and sglang_async `generate_sequences` api behavior, e.g. image support - fix warning for cuda barrier at start of fsdp_workers ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Co-authored-by: ocss884 <ocss.lin@gmail.com>	2025-05-20 09:39:07 +08:00
Hongpeng Guo	8788e55807	[doc] single_controller: Adding doc strings and doc pages for public methods in `single_controller` (#1396 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR adds doc string for the public methods inside `single_controller` module, so that these methods can be reused and referenced better. A new doc page `Single Controller Interface` was also added under the API Reference section. ![Screenshot 2025-05-04 at 4 58 23 PM](https://github.com/user-attachments/assets/3848b0d3-fbab-4023-915f-47620ed2676a) ### TODO: This is the first of a series of PRs to improve and stabilize the docs and API. TODOs include: * `verl/trainer` docs * `verl/utils` docs * Generally refine doc string of the whole repo Next PR to review is #1397 ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-05-19 16:44:05 -07:00
Patrick Jiang	877e097f74	README: add back DeepRetrieval and add a new work s3 (#1592 ) ### Checklist Before Starting - [✅ ] Search for similar PR(s). ### What does this PR do? > (1) Add back DeepRetrieval (the first search agent framework interacting with search engine) to the "awesome work" of main page, and (2) add a new work s3 (much more efficient way (70x less data) to train an powerful search agent!) ### High-Level Design > Only updates two readme files. ### Specific Changes > (1) Added "- [DeepRetrieval](https://github.com/pat-jj/DeepRetrieval): RL Training of Search Agent with Search/Retrieval Outcome ![GitHub Repo stars](https://img.shields.io/github/stars/pat-jj/DeepRetrieval)" to the main page's README.md. (2) Added "- [s3](https://github.com/pat-jj/s3) Efficient Yet Effective Search Agent Training via RL ![GitHub Repo stars](https://img.shields.io/github/stars/pat-jj/s3)" to the recipe/README.md ### API > N/A ### Usage Example > N/A ### Test > N/A ### Additional Info. N/A ### Checklist Before Submitting - [✅] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [✅] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [N/A] Add `[BREAKING]` to the PR title if it breaks any API. - [N/A] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [N/A] Add CI test(s) if necessary.	2025-05-19 16:28:31 -07:00
Blue Space	ab24d7b5bb	[BugFix] fix sglang CI, use stable way to download Qwen 7B model (#1585 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Fix sglang CI, use stable way to download Qwen 7B model ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-19 22:47:13 +08:00
Hongpeng Guo	5b24e01d56	[misc] use lazy import on megatron utils components (#1551 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR moves the `from megatron.core import ModelParallelConfig, tensor_parallel` into lazy import, so that when utilities who don't use these modules are imported, we don't always import `megatron.core.xxx` by default. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-05-19 13:08:32 +08:00
Blue Space	8176d3b96d	[Megatron] Qwen3moe-part3: fix mcore qwen3 moe config, no need for patching now, offer option to freeze moe router (#1540 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? fix mcore qwen3 moe config `moe_router_pre_softmax`, no need for patching now, and offer option to freeze moe router ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API Moe models initialization: ```py def initialize(self, kwargs): ``` ### Usage Example ```python moe_router_pre_softmax=False, ``` ```yaml override_config: moe_config: freeze_moe_router: False ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-19 11:26:43 +08:00
Wang Zhang	8845a33d6c	[misc] ci: fix typo in PULL_REQUEST_TEMPLATE.md (#1571 )	2025-05-19 09:58:30 +08:00
Guangming Sheng	8653b1b200	[misc] feat: support return full prompt with chat template in RLHFDataset (#1567 )	2025-05-19 01:13:21 +08:00
mingruimingrui	b9a6890ff3	disable fused kernels by default (#1568 )	2025-05-18 23:27:34 +08:00
Qunhong Zeng	530154e153	[merger] fix: avoid setting torch's global device to meta (#1564 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR fixes several issues (https://github.com/volcengine/verl/issues/1484, https://github.com/volcengine/verl/issues/1255) that cause the error: "Cannot copy out of meta tensor; no data!". The related code in our part is: `d36b5e81d6/scripts/model_merger.py (L131-L132)` The `torch.device("meta")` context manager sets the current global torch device to "meta". During `auto_model_class.from_config`, various import statements load third-party libraries, whose `__init__.py` files may contain global statements that use torch for calculations. For example, transformers imports [[torchao](`5549da8af9/torchao/optim/subclass_4bit.py (L33)`), which executes the following during initialization: ```python QMAP_UNSIGNED = torch.linspace(0, 1, 17)[1:].tolist() # no zero ``` In this case, when using the `torch.device("meta")` context manager, `torch.linspace(0, 1, 17)` gets created on the meta device, which only assigns metadata and cannot be moved to CPU. This causes the `.tolist()` call to fail with the error "Cannot copy out of meta tensor; no data!" To fix this, we're now using `init_empty_weights` from `accelerate`, which patches `nn.Module.register_parameter` instead of patching torch's global device (`417bc52965/src/accelerate/big_modeling.py (L96-L170)`), thus avoiding this issue. Here's a simple illustration: ```python >>> import torch >>> from accelerate import init_empty_weights >>> with init_empty_weights(): ... QMAP_UNSIGNED = torch.linspace(0, 1, 17)[1:].tolist() ... >>> QMAP_UNSIGNED [0.0625, 0.125, 0.1875, 0.25, 0.3125, 0.375, 0.4375, 0.5, 0.5625, 0.625, 0.6875, 0.75, 0.8125, 0.875, 0.9375, 1.0] >>> with torch.device("meta"): ... QMAP_UNSIGNED = torch.linspace(0, 1, 17)[1:].tolist() ... Traceback (most recent call last): File "<stdin>", line 2, in <module> File "/usr/local/lib/python3.10/dist-packages/torch/utils/_device.py", line 104, in __torch_function__ return func(args, kwargs) NotImplementedError: Cannot copy out of meta tensor; no data! ``` cc @ETOgaosion ### Additional Info. - Issue Number: Fixes issue https://github.com/volcengine/verl/issues/1484, https://github.com/volcengine/verl/issues/1255, https://github.com/volcengine/verl/pull/1468#issuecomment-2886345570 - Training: both - Inference*: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-18 19:17:27 +08:00
Edward Z. Yang	d36b5e81d6	Add missing fi to install script (#1559 )	2025-05-18 11:15:57 +08:00
Lei	40dcabec38	[BUG] Fix silent bug of using dtype from previous loop scope in build_memory_reference_from_module() (#1553 ) This pull request includes a minor fix in the `build_memory_reference_from_module` function within `verl/utils/memory_buffer.py`. The change ensures that the correct data type is passed when calculating the padded number of elements. * Bug Fix: - Updated the `calc_padded_numel` function call to use `param.dtype` instead of `dtype`, ensuring compatibility with the parameter's actual data type. (`[verl/utils/memory_buffer.pyL107-R107](diffhunk://#diff-77d53102508293685e0b9a1281dbacf7720fb8070db73157aa90157d516004a4L107-R107)`) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-17 10:55:28 +08:00
Blue Space	b8bd596811	[Docker Image] use latest vLLM (0.8.5) to fully support Qwen3 moe (#1544 )	2025-05-17 07:28:55 +08:00
Qunhong Zeng	3f4647f9bc	[model merger] refactor model merger for better usage and maintainability (#1468 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR refactors `model_merge`, making the code cleaner and more maintainable: - now verl checkpointer manager will save model config and processor/tokenizer (introduced in https://github.com/volcengine/verl/pull/1288), so there is no need for `hf_model_path`. This PR deprecates this argument and keeps it for backward compatibility. - the current `model_merge` has two purposes, merge checkpoints and test checkpoints (mainly for CI). This PR separates these two purposes into two sub-commands to better manage user input argument for improved user experience. - generally cleans up the code and makes it look better. ### Test Our current CI hasn't tested DDP+FSDP e2e training. This PR also adds DDP+FSDP e2e into CI and tests merging DDP+FSDP checkpoints. The current CI should test this PR correctly. ### Additional Info. - Training: both - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-16 23:53:08 +08:00
mingruimingrui	eb077f66e5	Feat/memory optimized loss (#1212 ) # What does this PR do? This PR implements fused losses for alignment. #710 It reduces the memory required for loss calculation to a small constant amount. # ChangeLog: - added the option use_fused_kernels - monkey patch to make model.forward return last_hidden_state and not calculate logits - Added FusedLinearForPPO to verl/utils/experimental/torch_functional.py # Usage Simply add the following option ``` actor_rollout_ref.model.use_fused_kernels=True ``` ## Before submitting - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)? - [ ] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs) especially for breaking config etc? - [ ] Did you write any test cases if neccessary? Please add CI tests to your new feature. # Additional Info: - The current implementation uses chunking to reduce the memory consumption to a constant value. - It works by splitting the loss calculations into chunks of 512 tokens. Calculating the log_probs / entropy values / gradients for each chunk and accumulating them. - However the current implementation can be slow. It processes each chunk sequentially in a python for loop. - In the future we should consider converting the fused functions into triton or some other JIT solution. - Compared to FusedPPOLossFunction, optimizing hidden_states -> entropy & log_probs is much better for algorithm developers as the memory heavy part is optimized away for them and they are free to combine the values for their own custom loss functions. --------- Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com> Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>	2025-05-16 22:52:54 +08:00
Blue Space	b52956409c	[megatron] Qwen3moe-part 2: Allow Infer and train tp to be different with CI tests, Fix vllm resharding process (#1444 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? 1. This PR eliminates the micro-dp group as the article says, and support train-infer tp to be different. 2. Side Effect: able to run Qwen3moe on megatron aligned with FSDP. 3. CI tests have been added to check the effect. ### High-Level Design This PR eliminates the micro-dp group as the article says, since the `generate_sequence` process only relates to inference engine, there is no need for us to consider the training side. The only problem now is that the `dispatch/collect` function cannot directly use the inference parallel size, so current solution is that we define a new `MEGATRON_ALL_DP` dispatch method to view all ranks as Data Parallel rank, which is the same as FSDP. So we follow the way of FSDP to pre/post-process the data. ### Specific Changes Mainly in `megatron_vllm.py` ### API None ### Usage Example ```sh actor_rollout_ref.actor.megatron.tensor_model_parallel_size=2 \ actor_rollout_ref.rollout.tensor_model_parallel_size=4 \ # or actor_rollout_ref.actor.megatron.tensor_model_parallel_size=4 \ actor_rollout_ref.rollout.tensor_model_parallel_size=2 \ ``` ### Test Added CI tests. For e2e test with Qwen 2.5 7B, please refer to `examples/grpo_trainer/run_qwen2_5-7b_math_megatron_diff_tp.sh` ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: Megatron - Inference: vLLM ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-16 16:39:01 +08:00
gaokaiz2	12bb85777d	[Refactor] Add middle truncation (#1488 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR adds support for a new truncation mode, middle, for loading datasets. It enables data that exceed the `max_prompt_length` to retain both the beginning and the end of the prompt, instead of truncating content only from the left or only from the right. ### High-Level Design The implementation introduces a `"middle"` option, alongside the existing truncation modes, making changes in both `rl_dataset.py` and `torch_functional.py`. When selected, the logic splits the allowed max length roughly in half and keeps the head and tail of the sequence, effectively discarding the middle section. ### Specific Changes In `verl/utils/dataset/rl_dataset.py`: - Added support for `self.truncation == "middle"` at line ~233. - Performs symmetric truncation from both ends of the prompt: ```python elif self.truncation == "middle": left_half = raw_prompt_ids[: self.max_prompt_length // 2] right_half = raw_prompt_ids[-self.max_prompt_length // 2 :] raw_prompt_ids = left_half + right_half ``` In `verl/utils/torch_functional.py`: - Added support for `"middle"` truncation mode in the `postprocess_data` function. - Updated truncation assertion to include `"middle"`: ```python assert truncation in ["left", "right", "middle", "error"] ``` - Implemented middle truncation logic: ```python elif truncation == "middle": left_half = max_length // 2 right_half = max_length - left_half input_ids = torch.cat([input_ids[:, :left_half], input_ids[:, -right_half:]], dim=-1) attention_mask = torch.cat([attention_mask[:, :left_half], attention_mask[:, -right_half:]], dim=-1) ``` ### API - Adds `"middle"` as a valid option to the `truncation` argument in the API. ### Usage Example ```python # Example usage when loading prompts with middle truncation from verl.utils.dataset.rl_dataset import RLDataset # Assume tokenizer and other necessary args are already initialized rl_dataset = RLDataset( ..., # other args truncation="middle" ) ``` ### Test This change aligns with precedents from long-context evaluation benchmarks, where middle truncation is the default/preferred method for handling overly long inputs: - [LongBench implementation](`2e00731f8d/LongBench/pred.py (L56)`) ([paper](https://arxiv.org/pdf/2308.14508)) - [InfiniteBench implementation](`51d9b37b0f/src/eval_utils.py (L413)`) ([paper](https://arxiv.org/pdf/2402.13718)) Both benchmarks favor middle truncation for long inputs, as it better preserves relevant context information from both the beginning and end of the sequence. ### Additional Info. - Issue Number: N/A (no linked issue yet) - Training: None affected - Inference: None affected ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary. --------- Co-authored-by: Wang Siyuan <v-siywang@microsoft.com> Co-authored-by: Wang Siyuan <wsy0227@sjtu.edu.cn>	2025-05-16 11:31:57 +08:00
ShareLer	2c991f6ca2	[megatron] fix head_dim in GQA model when load from hf ckpt (#1513 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? fix head_dim in GQA model when load from hf ckpt ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes - Change the acquisition methods of q and kv head_dim to be compatible with GQA. - Add the conversions of q_layernorm and k_layernorm in convert_megatron_model_to_transformers_model for Qwen3. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue #1510 ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. --------- Signed-off-by: ShareLer <ShareLe@163.com>	2025-05-16 10:21:57 +08:00
H	771bd756b3	[misc] docs: move dev folder to scripts. add sandbox documentation to index.rst. (#1539 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - move dev folder to scripts @ETOgaosion - add sandbox documentation to index.rst @chenhaiq - installation docs have been updated ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-16 08:12:31 +08:00
湛露先生	a43db53bb5	[chore] refactor: clean utils code. (#1290 ) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-05-15 16:20:34 -07:00
hijkzzz	4e9586a3a0	Fix reinforce_plus_plus_baseline advantage mask (#1527 )	2025-05-15 23:39:33 +08:00
linxxx3	6de40fcdfa	fix #1534 , sglang_async missing offload_param config (#1536 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-15 22:43:00 +08:00
Guangming Sheng	146676091f	[misc] fix: no need to use world_size to decide whether to use full_tensor in FSDP2 (#1529 ) [misc] fix: no need to use world_size to decide whether to use full_tensor() for FSDP2 state_dict() when world_size==1 ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR simplifies the parameter loading logic within the `FSDPVLLMShardingManager` by removing an unnecessary `world_size` check when determining whether to call `full_tensor()` on parameters obtained from an FSDP2 model's `state_dict()`. As the FSDP2 parameters are all `DTensor`. ### High-Level Design The change modifies the update_params method. When loading weights into the vLLM model, parameters from the FSDP state_dict() (which might be ShardedTensor or DTensor instances under FSDP2 when world_size == 1) are converted to full tensors using param.full_tensor(). This PR ensures this conversion happens if the full_tensor() method is available on the parameter, without an additional, potentially incorrect, check against world_size == 1. ### Specific Changes Skip. See file changes ### API No ### Usage Example No ### Test No CI changes ### Additional Info. - Issue Number: No - Training: [Note which backend this PR will affect: FSDP - Inference: [Note which backend this PR will affect: vLLM ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-15 19:18:00 +08:00
Yuyu Zhang	11622fc72f	Add Seed-Coder project in README (#1532 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Update README to add Seed-Coder as an example project using verl. ### High-Level Design N/A ### Specific Changes Add one line in README about the Seed-Coder project. ### API N/A ### Usage Example N/A ### Test N/A ### Additional Info. N/A ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-15 18:01:06 +08:00
OC	2c8b2b995f	[feat] Sandbox: support sandbox fusion on FaaS & localhost (#1429 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Implement sandbox fusion backend on FaaS. For example, reward score using a FaaS instance on volcengine.com. It have better performance and security comparing to local sandbox. ### Specific Changes Added a code branch in _default_compute_score to choose sandbox according to sandbox_fusion_url configuration. ### Usage Example examples/ppo_trainer/run_deepseek7b_llm_sandbox_fusion.sh ### Test tests/reward_score/test_sandbox_fusion.py However, the new testcase requires to setting Sandbox API URL in env SANDBOX_FUSION_URL. If the env is not set, most testcases will be skipped. ### Additional Info. Using sandbox on Faas have save 60% time on reward process comparing local sandbox: <img width="273" alt="截屏2025-05-07 20 37 05" src="https://github.com/user-attachments/assets/fc9c0e23-6afe-4f34-a28a-a1756e85d45f" /> ### Checklist Before Submitting - [] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [] Add `[BREAKING]` to the PR title if it breaks any API. - [] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [] Add CI test(s) if neccessary.	2025-05-15 17:53:47 +08:00
Qunhong Zeng	e12edc7f35	[lr_schedular] fix: implement proper min_lr_ratio support in cosine scheduler (#1400 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Fix the cosine learning rate scheduler to properly respect `min_lr_ratio` parameter during both warmup and decay phases. Update warmup phase to start from `min_lr_ratio` instead of 0, ensure decay phase never goes below `min_lr_ratio`, and add explicit `num_cycles` parameter to scheduler config. Set default values in configuration files and handle `null` values since in some example yaml config, the `min_lr_ratio` is set to `null`. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: fix https://github.com/volcengine/verl/issues/1376 - Training: FSDP - Inference: None ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-15 09:49:11 +08:00
llkn-2	537003548d	[bugfix] correct retrieval of `max_position_embeddings` from config (#1520 )	2025-05-15 07:06:23 +08:00
Necolizer	2d16173baa	[doc] update docs for custom tool config (#1523 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Update the docs for Custom Tool Configuration, fixing one broken link and providing more instructions. ### High-Level Design N/A ### Specific Changes - fix broken link to gsm8k_tool_config.yaml - update docs about custom tool config ### API N/A ### Usage Example N/A ### Test N/A ### Additional Info. - Issue Number: #1511 - Training: none - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-14 21:25:28 +08:00
Mansoor	9b45fc14f7	Skip max_position_embeddings > sequence length check for vLLM rollouts if RoPE scaling is used (#1522 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Allow usage of sequence lengths longer than model's `max_position_embeddings` when RoPE scaling is used. Added documentation on how to override RoPE scaling config for models that support RoPE scaling, but don't have it in its config.json file. ### Specific Changes Skip context length greater than sequence length check for vLLM rollouts if RoPE scaling is used. ### API No API changes ### Usage Example Please see the updated docs for example usage. ### Test I didn't capture any metrics, but I verified this works for my own training run with Qwen/Qwen2.5-7B-Instruct with long contexts. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: This affects vLLM, but I can also update SGLang. I've only tested vLLM for my use case ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-14 19:42:07 +08:00
Xiang Long	258a0d92ed	[metrics] Add diversed reduce metrics method according to key name (#1497 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Add support to reduce max and min with np.max and np.min ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-14 16:12:12 +08:00
Blue Space	43782a24bd	[Doc/Docker Image] Update mcore image to use vLLM which support qwen3 and rewrite installation from conda (#1505 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Update mcore image to use vLLM which support qwen3 and rewrite installation from conda ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes Docker image and docs ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: both - Inference: both ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-14 14:40:13 +08:00
Andrew Zhao	0a4d54551f	Add Absolute Zero to awesome work list (#1514 ) ### What does this PR do? > Just adding Absolute Zero work to the README as a list of work that used veRL ### High-Level Design > just information ### Specific Changes > only changed README.md, added Absolute Zero ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-14 14:38:20 +08:00
Hongpeng Guo	21e3acd6d4	[fix][DataProto] Make classmethod `from_single_dict` return a cls not the class name (#1509 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? `from_single_dict` is a classmethod, right now, it directly returns `DataProto.from_dict(.....)`. This PR changes it to `cls.from_dict(.....)`, In this way any subclass of `DataProto` may reuse this classmethod to instantiate a subclass. In the current implementation, when subclass, i.e., `MyDataProto.from_single_dict()` is called, it returns a parent class instance, i.e., a `DataProto`, but not a `MyDataProto` instance. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-05-14 09:12:16 +08:00
Hongpeng Guo	d4a11ebb44	[utils] Enrich and fix utils from `fsdp_utils` and `seqlen_balancing` (#1495 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Enrich and fix utility functions in `verl/utils/fsdp_utils.py` and `verl/utils/seqlen_balancing.py`. * In `get_fsdp_wrap_policy`, introduce a unified `_get_attr` helper so both dict‑based (OmegaConf) and dataclass‑style configs can work. * In `rearrange_micro_batches`, add two new parameters (`same_micro_num_in_dp`, `min_num_micro_batch`). * Also re-organized the workflow pipeline structure to make it align better with the verl file structure. ### API In `verl.utils.seqlen_balancing.rearrange_micro_batches`, add two new parameters (`same_micro_num_in_dp`, `min_num_micro_batch`). ### Usage Example ```python # A very toy example dataproto = DataProto.from_single_dict({"input_ids": input_ids, "attention_mask": attention_mask}) micros, idx_map = rearrange_micro_batches(batch, max_token_len=300, same_micro_num_in_dp=False, min_num_micro_batch=2) ``` ### Test * Added in `tests/utils/gpu_tests/test_seqlen_balancing.py` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-05-13 17:01:16 +08:00
Lumeng Wu	9a956c01b3	[doc] Clarifying `gpu_memory_utilization` for different engines (#1491 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Fix the outdated description of `gpu_memory_utilization`. Clarify its definition for different engines( vLLM<v0.7.0, vLLM>=v0.7.0, SGLang) ### Additional Info. - Reference: - for vLLM v0.5.4 and v0.6.3: `cb1adda924/verl/third_party/vllm/vllm_v_0_5_4/worker.py (L208)` and `cb1adda924/verl/third_party/vllm/vllm_v_0_6_3/worker.py (L205)` - for vLLM v0.7.0 and later: `d6484ef3c3/vllm/worker/worker.py (L247-L257)`, and https://docs.vllm.ai/en/latest/api/vllm/vllm.config.html#vllm.config.CacheConfig.gpu_memory_utilization - SGLang: `6b8706cd4f/verl/workers/rollout/sglang_rollout/sglang_rollout.py (L176)`, and https://docs.sglang.ai/backend/server_arguments.html#memory-and-scheduling ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-13 10:52:24 +08:00
Hongpeng Guo	033853168a	[refactor][single_controller] Small refactor and fixes in `worker.py` and `ray.base.py` (#1470 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? 1. The existing `WorkerHelper::_get_pid()` returns nothing; this PR fix this nit by returning `os.getpid()`; 2. The `WorkerMeta` class in `worker.py` is only used in `Worker.init()` and no where else. This class just maintains a list of env keys and wraps a dict named `store`. This PR delete this class and move the contents inside the `Worker` class. In this way, it would be easier if a user want to subclass Worker, with different env keys; 3. In `merge_resource_pool` function, instead of directly return a `RayResourcePool`, this PR changes the return type to be `type(rp1)`. In this way, the function can be applied to not only `RayResourcePool`, but also any subclass of `RayResourcePool`. ### Note This PR splits off some small nits and refactors from #1454, so that the small things here could be reviewed and merged sooner before we decide on the structural PR #1454 ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-05-13 10:27:13 +08:00
Wei Wu	cb1adda924	[Bug] Fix the problem of long inference timeouts when using Async rollout (#1483 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? In Async rollout, `AsyncOpenAI` has a default 600-second timeout, which can lead to timeouts during longer inference. See details at https://github.com/volcengine/verl/pull/1138#issuecomment-2869686490. ### High-Level Design See details at https://github.com/volcengine/verl/pull/1138#issuecomment-2869686490. ### Specific Changes See details at https://github.com/volcengine/verl/pull/1138#issuecomment-2869686490. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-12 17:01:50 +08:00
H	c3b20575d2	[util] docs: add docstrings to metric util functions that recipes reuse (#1395 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? In `/recipes`, a few functions under `trainer/ppo/metric_utils` are imported and reused. Right now many of them are task dependent and assume specific keys in the input metric dict. To make these functions more robust and backward compatible, a few tests are added. Additionally, one method is moved to verl.utils as a public API due to its general purpose nature. A API doc page is added correspondingly. In order to make it easy for others to customize verl trainers, many more other classes require further documentations, such as: - AdvantageEstimator, RayPPOTrainer, apply_kl_penalty, compute_advantage - from verl.single_controller.ray import RayWorkerGroup - from verl.trainer.ppo.core_algos import agg_loss - from verl.trainer.ppo.ray_trainer import ResourcePoolManager, Role, WorkerType - from verl.utils.checkpoint.checkpoint_manager import find_latest_ckpt_path They shall be enhanced in future PRs. ### High-Level Design None ### Specific Changes - added tests - added verl.utils.metric namespace ### API `verl.trainer.ppo.metric_utils.reduce_metrics` changed to `verl.utils.metric.reduce_metrics`. deprecation warnings are added. ### Usage Example None ### Test Added ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. https://github.com/volcengine/verl/issues/1354 - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. --------- Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-12 08:49:14 +08:00
H	f88e2ec4ca	[distro] chore: fix incorrect verl main branch version (#1480 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? The main branch version is still 0.2.x, should have been 0.3.x instead. ### Test Relying on existing tests. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-12 08:48:39 +08:00
Xiang Long	bc9062d74f	[sglang] Fix tool format and response position ids padding in AsyncSGLangRollout (#1475 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Resolved the tool formatting issue: Previously, arguments were stored as strings, causing iterative addition of `\\` due to multiple calls to `json.dumps`. Fixed the `response_position_ids` mismatch between `generate_sequences` and `generate_sequences_with_tools`: In the earlier implementation, `generate_sequences_with_tools` used zero padding for positions where `attention mask == 0`, which resulted in NaN values during the training phase. ### Specific Changes > List the specific changes. - Introduced a new schema, `OpenAIFunctionCallSchema`, to store converted tool calls. - Updated the `AsyncSGLangRollout` tool to skip non-dict type arguments instead of handling any string at the arguments position. - Aligned `response_position_ids` in `generate_sequences_with_tools` with the behavior of `generate_sequences`. - Enhanced tool descriptions to prevent misleading parse errors, as returning 0.0 caused the model to incorrectly modify answers. ### API > Demonstrate how the API changes if any. - Revise the `execute` interface of the tool to directly accept `dict[str, Any]` instead of a JSON string. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-11 08:01:36 -07:00
swtheing	db83855616	[ReadMe] Add Seed Paper Explore Paper Data Scale in ReadMe (#1479 ) ### Checklist Before Starting None ### What does this PR do? Add Seed Paper Explore Data Scale in ReadMe ### High-Level Design Add Seed Paper Explore Data Scale in ReadMe ### Specific Changes Add Seed Paper Explore Data Scale in ReadMe ### API None ### Usage Example None ### Additional Info. None ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-11 20:20:27 +08:00
Yuhua Jiang	f147ede208	[BUG] fix value mask bug in dp_critic (#1440 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. When critic.use_dynamic_size is enabled, values rearrange indices but attention_mask does not, causing values * attention_mask to produce unpredictable bugs. This bug may have affected nearly all previous PPO-based experiments if critic.use_dynamic_size was turned on. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-10 18:29:04 +08:00
H	249c26fdc8	[tests] BREAKING: move recipe.dapo.src to recipe.dapo; move test files to their own namespaces (tests/verl/xxx -> tests/xxx) (#1392 )	2025-05-10 11:21:53 +08:00
Qunhong Zeng	17f283b1e8	[vllm rollout] minor fix: make vllm version determination stronger (#1401 )	2025-05-09 18:11:30 -07:00
H	2d81677ac8	[docs] refactor: use verl consistently in the codebase (#1390 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Always use verl instead of veRL in the codebase, and add a CI check for this. ### Specific Changes mostly doc changes ### Test Added to sanity tests. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. cc @ShaohonChen	2025-05-10 08:54:57 +08:00
Hongpeng Guo	c06b9624b3	[utils] Enrich features to a few `verl` utilities (#1421 )	2025-05-09 16:50:59 -07:00
Ethan (Yusheng) Su	6b8706cd4f	[Hardware] Support AMD (ROCMm Kernel) - hardware-agnostic (remove the redundant code) (#1453 ) ### Checklist Before Starting - [X] Search for similar PR(s): [PR#1369](https://github.com/volcengine/verl/pull/1369), [issue#1488](https://github.com/volcengine/verl/issues/1448) ### What does this PR do? - Complete [issue#1488](https://github.com/volcengine/verl/issues/1448) ### High-Level Design - New PR for hardware-agnostic sglang rollout ### Specific Changes - `verl/workers/rollout/sglang_rollout/async_sglang_rollout.py` - `verl/workers/rollout/sglang_rollout/sglang_rollout.py` > We've already submitted the PR to `ray>=2.45`. Actually, in that version, it's been already supported hardware-agnostic rollout implementation within verl codebase. Just need to assign `HIP_VISIBLE_DEVICES` in the training script. Thus, I discard the patch part that I added last time in verl codebase. ### Usage Example [amd_tutorial](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst) ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst). - [ ] Add CI test(s) if neccessary. --------- Co-authored-by: Yusheng Su <yushensu@pduks-slu000010.amd.com>	2025-05-09 09:22:34 -07:00
H	b2ca3c855f	docs: include SPPO, qwen3, FSDP2 in readme (#1450 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Update news ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-09 09:38:59 +08:00
OC	d3b6c7052e	add pip dependances (#1439 )	2025-05-08 23:35:55 +08:00
Xiang Long	325c028ad2	[sglang] Fix data preprocess mismatch in sgl_multiturn example (#1445 )	2025-05-08 23:14:23 +08:00
Joel	f90b717653	[ray] fix: make spawn worker group hold strong reference to actors (#1443 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Spawned RayWorkerGroup get actors by name, which holds a weak reference to the actor and causes actors garbage collected unexpectedly. Pass actor handle explicitly in spawn to make RayWorkerGroup have strong reference to these actors. close #1365 https://github.com/volcengine/verl/pull/1138#issuecomment-2862087324 ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-08 23:08:36 +08:00
Rihong Qiu	c59ab2f478	[BUG] fix swanlab init bug when config is None (#1441 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. When user choose swanlab logger and not set config, original code `config={"FRAMEWORK": "verl", config}` would raise error. This PR try to fix this by init config as an empty dict if it is None ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ``` if config is None: config = {} # make sure config is not None, otherwise config will raise error swanlab.init( project=project_name, experiment_name=experiment_name, config={"FRAMEWORK": "verl", config}, # this is the cause of error when config is None logdir=SWANLAB_LOG_DIR, mode=SWANLAB_MODE, ) ``` ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-08 18:14:57 +08:00
Qunhong Zeng	4ae9a0fdab	[rollout] fix: missing trust_remote_code option in rollout initialization (#1423 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Add missing `trust_remote_code` option in customized vllm rollout and sglang rollout, fix https://github.com/volcengine/verl/issues/1412. ### Additional Info. - Issue Number: https://github.com/volcengine/verl/issues/1412 - Training: FSDP - Inference: both ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-08 14:10:53 +08:00
wangfuchun-fc	8a158a50a6	feat: add qwen3 grpo example (#1435 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Tested successfully on the hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.4-flashinfer0.2.2-cxx11abi0 image. It outperforms the Qwen2 7B base model by two percentage points on the test set of GSM8K. <img width="786" alt="image" src="https://github.com/user-attachments/assets/a753a383-5fc0-42a8-92a8-be4f8eddec60" /> > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-08 14:10:32 +08:00
Hongpeng Guo	8cac3f8efe	[single_controller][decorator] Define a `DynamicEnum` class to make `Dispatch` and `Execute` extensible. (#1424 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Today, extending `verl` in proprietary usage large requires forking it, and padding code changes in the private fork. For example, the current `verl` API doesn't support adding new `Dispatch` and `Execute` mode in runtime. The only way to achieve it is to make a new private fork. This PR replace the static `Enum` type of `Dispatch` and `Execute` into a new `"DynamicEnum"` type, that the users can use new APIs `register_dispatch_mode` and `update_dispatch_mode` to adding and define new distributed mode at runtime using native `verl` API, instead of making a fork. ### Specific Changes * Defined `DynamicEnum` class in `utils.py_functional.py`; * Re-defined `Dispatch` and `Execute` classes, all existing Enum API and usage are still usbale; * Added `register_dispatch_mode` and `update_dispatch_mode` for users to register new dispatch modes at runtime; * nit: `pre-commit` automatically fixed part of code format in another PR #1331 ### Usage Example > Provide usage example(s) for easier usage. ```python def test_register_new_dispatch_mode(): # Test registration def dummy_dispatch(worker_group, args, *kwargs): return args, kwargs def dummy_collect(worker_group, output): return output register_dispatch_mode("TEST_MODE", dummy_dispatch, dummy_collect) # Verify enum extension _check_dispatch_mode(Dispatch.TEST_MODE) # Verify registry update assert get_predefined_dispatch_fn(Dispatch.TEST_MODE) == {"dispatch_fn": dummy_dispatch, "collect_fn": dummy_collect} ``` ### Test Added `tests/verl/test_decorator.py` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-05-08 12:09:41 +08:00
Xiang Long	5acd5cab11	[sglang] fix format issue and data_preprocess file path issue in sglang multiturn example README.md (#1437 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. Fix format issue and data_preprocess file path in sglang multiturn example README.md ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-08 11:37:46 +08:00
BearBiscuit	312a8cbceb	[SGLang] Add support between mcore0.11 and sglang (#1055 ) Based on the ongoing alignment between mcore and vllm #851 , I believe we can simultaneously advance the alignment between mcore and sglang, as their interfaces are similar. In the end, we will only need to obtain a generator parameter. [link](https://github.com/sgl-project/sglang/pull/5345)	2025-05-07 08:57:03 -07:00
Qunhong Zeng	8d3631168f	docs: update config documentation with validation parameters (#1355 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR update some outdated docs on config: - Add `filter_overlong_prompts_workers` configuration option, which introduced in #890 - Add documentation for `actor_rollout_ref.rollout.val_kwargs` parameters, fix #1352 - Fix attribution of several configuration options to their proper namespaces ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-07 22:38:36 +08:00
Qunhong Zeng	ba6a2e0bb5	[FSDPCheckpointManager] feat: save huggingface model when 'hf_model' in checkpoint_contents (#1288 ) Before, `FSDPCheckpointManager` will not save hf model when `hf_model` is given in `checkpoint_contents`, instead, it only save the hf model's config. This PR correctly save the huggingface model when 'hf_model' is in `checkpoint_contents`.	2025-05-07 20:44:46 +08:00
ShareLer	fd3f21cb0e	[megatron] qwen3 support (#1337 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Support qwen3 to run with megatron backend. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes - Update offline weight convert script(from hf to megatron) for qwen3. - Add config converter from hf config to mcore config for qwen3. - Add qk_layernorm weight load logic in mcore loader for qwen3(dense). - Add model initializer and forward func for qwen3(moe). - Add online weight converter from mcore to hf for qwen3. - Fix typo in megatron CriticWorker.update_critic. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```bash # example for qwen3-8B HF_MODEL_PATH="Your hf ckpt path" DIST_CKPT_PATH="Your mcore ckpt path" # convert ckpt from hf to megatron python3 scripts/converter_hf_to_mcore.py --hf_model_path $HF_MODEL_PATH --output_path $DIST_CKPT_PATH NODES=1 N_PER_NODE=8 PP=1 TP=8 CP=1 VLLM_TP=8 python3 -m verl.trainer.main_ppo --config-path=./config --config-name='ppo_megatron_trainer'\ algorithm.adv_estimator=gae \ data.train_files="$train_files" \ data.val_files="$test_files" \ data.train_batch_size=64 \ data.max_prompt_length=1024 \ data.max_response_length=2048 \ data.filter_overlong_prompts=True \ data.truncation='error' \ actor_rollout_ref.model.path=$HF_MODEL_PATH \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.actor.ppo_mini_batch_size=64 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ actor_rollout_ref.actor.use_kl_loss=False \ actor_rollout_ref.actor.megatron.tensor_model_parallel_size=$TP \ actor_rollout_ref.actor.megatron.pipeline_model_parallel_size=$PP \ actor_rollout_ref.actor.megatron.context_parallel_size=$CP \ actor_rollout_ref.actor.megatron.use_dist_checkpointing=True \ actor_rollout_ref.actor.megatron.dist_checkpointing_path=$DIST_CKPT_PATH \ actor_rollout_ref.actor.megatron.param_offload=True \ actor_rollout_ref.actor.megatron.grad_offload=True \ actor_rollout_ref.actor.megatron.optimizer_offload=True \ actor_rollout_ref.ref.megatron.tensor_model_parallel_size=$TP \ actor_rollout_ref.ref.megatron.pipeline_model_parallel_size=$PP \ actor_rollout_ref.ref.megatron.context_parallel_size=$CP \ actor_rollout_ref.ref.megatron.use_dist_checkpointing=True \ actor_rollout_ref.ref.megatron.dist_checkpointing_path=$DIST_CKPT_PATH \ actor_rollout_ref.ref.megatron.param_offload=True \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \ actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \ actor_rollout_ref.rollout.tensor_model_parallel_size=$VLLM_TP \ critic.optim.lr=1e-5 \ critic.model.path=$HF_MODEL_PATH \ critic.model.enable_gradient_checkpointing=False \ critic.ppo_micro_batch_size_per_gpu=4 \ critic.megatron.tensor_model_parallel_size=$TP \ critic.megatron.pipeline_model_parallel_size=$PP \ critic.megatron.context_parallel_size=$CP \ critic.megatron.use_dist_checkpointing=True \ critic.megatron.dist_checkpointing_path=$DIST_CKPT_PATH \ critic.megatron.param_offload=True \ critic.megatron.grad_offload=True \ critic.megatron.optimizer_offload=True \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.logger=['console','wandb'] \ trainer.project_name='verl_gsm8k_qwen3-8B' \ trainer.experiment_name='qwen3_8b_gsm8k_gae_megatron' \ trainer.n_gpus_per_node=$N_PER_NODE \ trainer.nnodes=$NODES \ trainer.save_freq=50 \ trainer.test_freq=10 \ trainer.total_epochs=100 $@ ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary. --------- Signed-off-by: ShareLer <ShareLe@163.com>	2025-05-07 20:41:41 +08:00
none0663	a43ead6f82	Fix for RM Data Attention Mask Bug (#1411 ) [BUG] This issue addresses the bug related to the RM data attention mask, which was also mentioned in a previous [issue](https://github.com/volcengine/verl/issues/1341). The fix has been implemented to ensure proper functionality.	2025-05-07 14:53:57 +08:00
Shawn/Yuxuan Tong	d6e1c6e3c2	[Metric] fix: boostrap with `n == n_resps` since with replacement (#1419 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR fixes https://github.com/volcengine/verl/pull/1320 since bootstrapping is done with replacement, which makes it still meaningful even when `n == n_resps` ### Additional Info. - Issue Number: https://github.com/volcengine/verl/pull/1320 - Training: none - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-07 13:22:38 +08:00
Blue Space	c05f6c26b6	Qwen2moe[part1]: add cpu converter option, add CI test for current solutions temporarily (#1267 ) Temporarily use CPU to initialize larger models for huggingface to dist_ckpt conversion. And Support GQA Moe model. May not require CI as this function can be dependent to VeRL, but current solution may need.	2025-05-07 13:11:02 +08:00
Ethan (Yusheng) Su	76084d36cb	[AMD] upgrade: Upgrade dockerfile and verl codebase (#1369 ) ## Checklist Before Starting - [x] Search for similar PR(s). ## What does this PR do? 1. Base Docker Image: Upgraded the base sglang docker to `lmsysorg/sglang:v0.4.6.post1-rocm630` along with `torch_memory_saver (hip version)`, which resolves the ROCm/aiter compatibility [issue](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/amd-verl-dev/dev.md). 2. vLLM-0.6.3 Rollout Fix: Adjusted the rollout logic to ensure the latest VeRL upstream codebase remains both compatible with `vLLM versions ≤ 0.6.3`, along with sync mechanism, and `vLLM versions >= 0.6.3`, along with async mechanism. 3. Update the ray version to [2.45.0](https://github.com/ray-project/ray/releases/tag/ray-2.45.0): [PR#52794](https://github.com/ray-project/ray/pull/52794) and also support `ray>=2.45.0` within verl - resolve [verl-issues#1399](https://github.com/volcengine/verl/issues/1399). - [To-do-1] 3rd party lib - `torch_memory_saver` - rocm virtual memory allocator issue should be resolved within the [HIP version](https://github.com/fzyzcjy/torch_memory_saver/issues/9). - [To-do-2] New PR for hardware-agnostic vllm/sglang rollout. ## Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting) - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary. --------- Co-authored-by: Yusheng Su <yushensu@pduks-slu000010.amd.com>	2025-05-06 18:06:05 -07:00
OC	3a7376acfe	fix: ray worker exit with SYSTEM_ERROR caused by SIGALRM from math re… (#1331 ) …ward Since SIGALRM only works in main thread, if it is fired in a sub thread, the ray worker will exit with SYSTEM_ERROR. Fixed this problem by using multiprocessing.Process instead of SIGALRM handling. # What does this PR do? bug fix for ray worke exit with SYSTEM_ERROR when timeout in prime math # ChangeLog: Fixed this problem by using multiprocessing.Process instead of SIGALRM handling. # Usage - see tests/utility/test_timeout_decorator.py	2025-05-07 01:17:52 +08:00
Chayenne	6ae2de6195	Update guidance of sppo (#1415 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Just add a line of code to git clone verl. I do not know why this is missed. lol > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-06 08:49:02 -07:00
Changlong Yu	78c8b2711e	[megatron] support mixtral training with megatron backend (#1325 ) # What does this PR do? Add support of Mixtral MOE model training with Megatron backend including ``Mixtral8x7B`` and ``Mixtral8X22B``. # ChangeLog: It is still labor-heavy to add new type of model to ``mcore`` format including the following changes: - ``hf_to_mcore_config_mixtral``: convert `hf_config` to `TransformerConfig`. some common configs are merged into one function `_get_base_transformer_config`. - ``MixtralModel`` in model_initialzier.py: implement a model initializer class to initialize GPTModel from config. - `McoreToHFWeightConverterMixtral`: model conversion class from mcore to huggingface basically rename - model entry in `registry.py`: add entry function or class in corresponding registries. # Usage - convert [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/) to mcore format ([converted](https://huggingface.co/clyu/Mixtral-8x7B-Instruct-v0.1-mcore/tree/main)) - Run RLOO script as follows: ```bash set -x train_files=$gsm8k_train_path test_files=$gsm8k_test_path export MEGATRON_MODEL="Mixtral-8x7B-Instruct-v0.1-mcore" python3 -m verl.trainer.main_ppo --config-path=./config --config-name='ppo_megatron_trainer' \ algorithm.adv_estimator=rloo \ data.train_files=$train_files \ data.val_files=$test_files \ data.train_batch_size=128 \ data.truncation="left" \ data.max_prompt_length=512 \ data.max_response_length=4096 \ actor_rollout_ref.model.path=Mixtral-8x7B-Instruct-v0.1 \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.actor.use_kl_loss=True \ actor_rollout_ref.actor.ppo_mini_batch_size=128 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \ actor_rollout_ref.actor.megatron.tensor_model_parallel_size=8 \ actor_rollout_ref.actor.megatron.pipeline_model_parallel_size=2 \ actor_rollout_ref.actor.megatron.use_dist_checkpointing=True \ actor_rollout_ref.actor.megatron.dist_checkpointing_path=${MEGATRON_MODEL} \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \ actor_rollout_ref.rollout.tensor_model_parallel_size=8 \ actor_rollout_ref.rollout.max_num_batched_tokens=8192 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ actor_rollout_ref.rollout.n=4 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=128 \ actor_rollout_ref.ref.megatron.tensor_model_parallel_size=8 \ actor_rollout_ref.ref.megatron.pipeline_model_parallel_size=2 \ actor_rollout_ref.ref.megatron.use_dist_checkpointing=True \ actor_rollout_ref.ref.megatron.dist_checkpointing_path=${MEGATRON_MODEL} \ algorithm.use_kl_in_reward=True \ algorithm.kl_ctrl.kl_coef=0.001 \ trainer.critic_warmup=0 \ trainer.val_before_train=True \ trainer.logger=['console','wandb'] \ trainer.log_val_generations=100 \ trainer.project_name='verl_gsm8k_test' \ trainer.experiment_name='mixtral-8x7b-rloo-gsm8k' \ trainer.n_gpus_per_node=8 \ trainer.nnodes=2 \ trainer.save_freq=50 \ trainer.test_freq=10 \ trainer.total_epochs=15 ``` # What is Missing - refactor hf2mcore conversion scripts as https://github.com/volcengine/verl/pull/1267 - Have a good design of onboarding new model class to avoid labor-intensive changes. ## Before submitting - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)? - [x] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs) especially for breaking config etc? - [ ] Did you write any test cases if necessary? Please add CI tests to your new feature. # Additional Info: - Issue Number: None - Training: Megatron - Inference: None --------- Co-authored-by: changlyu <changlyu@ip-10-0-53-184.us-west-2.compute.internal>	2025-05-06 22:38:48 +08:00
BearBiscuit	d60499d170	[misc] add support for qwen3 model (dense/moe) (#1409 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > add mfu compute function for qwen3 model ### Additional Info. - Issue Number: Fixes issue #1313 ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).	2025-05-06 19:45:17 +08:00
Qingquan Song	dd591e8588	docs: Fix readme ppo.rst (#1413 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. Fix readme NVIDIA GPU Results ### Specific Changes Mark down fix > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. Before Fix ![Screenshot 2025-05-05 at 9 47 13 PM](https://github.com/user-attachments/assets/487c6aa4-999f-42af-b47f-e03555d83232) After Fix ![Screenshot 2025-05-05 at 9 47 05 PM](https://github.com/user-attachments/assets/c9db6bd1-5e1c-4614-b82e-7ba74c53dc37) ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-05 22:11:13 -07:00
Shawn/Yuxuan Tong	8bb009bf47	[CI] feat: separate FSDP2 test & fix: CI trigger (#1389 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? 1. Separate the FSDP2 test to avoid blocking other tests. 2. Fix the CI trigger rule to avoid redundant runs (since I find the original PR triggers unrelated tests, so I fix the rule based on [the doc](https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#onpushpull_requestpull_request_targetpathspaths-ignore)) ### Test For 2, I test by commenting out the matching path for workflow `.yml`, and see only related workflows are triggered: Before: <img width="870" alt="image" src="https://github.com/user-attachments/assets/2f7dbe0c-f638-4a75-8cbc-a364081271fc" /> After: <img width="869" alt="image" src="https://github.com/user-attachments/assets/f5a35d85-f03c-452e-abed-3ca3ce22d699" /> ### Additional Info. - Issue Number: https://github.com/volcengine/verl/issues/1388 - Training: FSDP - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary.	2025-05-05 07:20:35 -07:00
yhyang201	ee8c34749d	[recipe] sppo: SPPO algorithm implementation (#1222 ) Here is a version of the SPPO algorithm implementation. You can find more information about SPPO here: [https://github.com/uclaml/SPPO/tree/main](https://github.com/uclaml/SPPO/tree/main) In short, the main differences between SPPO and PPO are: 1. There is no need to use a critic model. 2. SoftMean is used as the AdvantageEstimator in the trainer. 3. Different loss functions. I have made an attempt to implement minimal modifications without altering the code outside the recipe. However, due to the following two issues, the current code is not entirely elegant: 1. To modify the loss function (including both the loss itself and the parameters passed in), it is sufficient to modify the `update_policy` in the `DataParallelSPPOActor`. I attempted to patch this `update_policy`, but it was unsuccessful. Therefore, I created a class that inherits from `DataParallelSPPOActor` to override the `update_policy`. 2. Since `ActorRolloutRefWorker` imports `DataParallelSPPOActor` in the `init_model` function, it also needs to inherit from `ActorRolloutRefWorker` to override `init_model`. However, I encountered an issue during inheritance. The base class of `verl`’s `single controller` calls `actor_rolloutrefworker.super().__init__()`, and the original `super()` in `ActorRolloutRefWorker` is `Worker`. If we inherit, it would become `ActorRolloutRefWorker`, which requires passing parameters to `super().__init__()`, but the `single controller base` code does not provide any parameters, making inheritance impossible. I have now submitted a draft PR and would appreciate any suggestions on code modifications or optimizations! --------- Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-05-05 09:55:49 +08:00
Franz Srambical	91fa2a6b94	[docs] fix: typo (#1391 )	2025-05-04 12:15:07 -07:00
Junrong Lin	ec6843c604	[sglang] Upgrade sglang to 0.4.6.post1 & misc fixes (#1385 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - [x] upgrade required sglang version to 0.4.6.post1 which suports Qwen3 - [x] fix: flush_cache was never awaited - [x] remove unused env - [x] fix: add rank num to port to avoid SGLang picking the same port when random.seed being set - [x] feat: disable SGLang memory inbalance check by default https://github.com/sgl-project/sglang/pull/5426 - [x] update setup.py to avoid old version pip can not resolving deps - [x] fix: tools_kwargs length mismatch with batch #1380 > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-04 11:53:21 -07:00
Shawn/Yuxuan Tong	709796f849	[dev] fix: validation metrics (#1374 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? 1. Fix the error that `metric` is not added when `n == 1`. 2. Remove `std@1`. 3. Add assertation for doing initial validation but `val_metrics` is empty. ### Additional Info. - Issue Number: none - Training: none - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-04 09:06:53 -07:00
Joel	1e47e412a4	[rollout] misc: add demo chat completion scheduler described in ReTool paper (#1297 ) Co-authored-by: shengguangming <shengguangming@bytedance.com>	2025-05-04 19:07:22 +08:00
Hongpeng Guo	96b46d2661	[feat] Enable `update_model_config` to take nested dict to update `AutoConfig` of transformers (#1379 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? * Enable `update_model_config` to take nested dict to update `AutoConfig` of transformers * Added a test pipeline for all the tests under `tests/utils`, Any future unit tests for `verl/utils` should be added here * Re-organized the tests file structure. ### Usage Example For the new `update_model_config`, an example looks like below: ```python override_config_kwargs = { "bos_token_id": self.tokenizer.bos_token_id, ... "nested_config": {k1: v1, k2, v2}, } update_model_config(actor_model_config, override_config_kwargs=override_config_kwargs) ``` ### Test Added `tests/verl/utils/test_model.py::test_update_model_config` ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-05-04 18:07:09 +08:00
Hongpeng Guo	dfb3f70bc5	[fix][ci] fix two pipelines that fails on the main branch (#1378 )	2025-05-04 08:02:08 +08:00
Hongpeng Guo	9e4074b71a	[ci][fix] Enable part of ray test to be run on CPU machine (#1372 )	2025-05-03 18:23:33 +08:00
HL	52437be1a6	[trainer] breaking: pass dataset as required args to SFTTrainer; also change ppo ray trainer to take custom datasets as inputs (#1282 )	2025-05-02 21:03:22 -07:00
Jinn	cee3dca867	docs: Add runllm widget for VeRL Doc sites (#1366 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Add runllm widget for https://app.readthedocs.org/projects/verl/ ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.	2025-05-02 16:28:45 -07:00
Hongpeng Guo	78abf052e8	[ray] feat: Making decorator register available for async function (#1370 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR enables the decorators to be able to be applied onto async functions. ### High-Level Design * Simply added a inner wrapper function available for async func inside the `register` function. ### Usage Example ```python @register(dispatch_mode=Dispatch.ONE_TO_ALL, blocking=False) async def async_fn(self, sleep_time): return await asyncio.sleep(sleep_time * 0.1) ``` ### Test * `tests/ray/test_decorator.py` ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu>	2025-05-02 16:25:14 -07:00
Hongpeng Guo	0035afee9c	[dataproto] feat: Add auto padding for DataProto (#1356 ) ### Checklist Before Starting - [x] Search for similar PR(s). Coming from #577 , credit to @zw0610 ### What does this PR do? Today, users must manually duplicate (repeat) a DataProto so its batch size matches the data‑parallel (dp) size of the target WorkerGroup. This PR enables `auto_padding` to pad the `DataProto` when chunk is called. ### Specific Changes * Enriched the `DataProto` so that it can have context of padding during chunking; * Modified the `decorator.py` that a DataProto can be automatically padded and chunked with `dispatch_dp_compute_data_proto`; * Added unit tests under `tests/ray/test_auto_padding.py`. ### API Two new API under `DataProto` are introduced, which are `padding` and `is_padding_enabled` ### Test Tests added to `tests/ray/test_auto_padding.py` ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu> Co-authored-by: Wang Zhang <zhangwang.nozomi@bytedance.com> Co-authored-by: Wang Zhang <zw199006@gmail.com>	2025-05-02 10:21:27 -07:00
Earl St Sauver	52d8ae3179	[docs] fix: Fix Arxiv Link (#1364 ) Arxiv link is not rendering on github or https://verl.readthedocs.io/en/latest/index.html# ### Checklist Before Starting - [x ] Search for similar PR(s). ### What does this PR do? Makes external link to arxiv paper resolve properly. ### High-Level Design N/A ### Specific Changes Single line doc change ### API N/A ### Usage Example N/A ### Test N/A ### Additional Info. ### Checklist Before Submitting All N/A	2025-05-02 10:04:29 -07:00
lxg2015	db84a40076	[fsdp] feat: support fsdp2 training and inference in fsdp_workers (#1026 ) # What does this PR do? This PR supports fsdp2 for fsdp_worker. Torch version 2.4 or higher is required. # Usage Example ``` sh examples/grpo_trainer/run_qwen2-7b.sh \ actor_rollout_ref.ref.strategy=fsdp2 \ actor_rollout_ref.actor.strategy=fsdp2 ``` To save more memory, you can add the parameter below to enable the fsdp2 OffloadPolicy: ``` actor_rollout_ref.actor.offload_policy=True ``` You can see the profile comparison between fsdp1 and fsdp2 here: https://github.com/volcengine/verl/pull/1026#issuecomment-2824343860 --------- Co-authored-by: lixiaoguang12 <lixiaoguang12@meituan.com> Co-authored-by: shengguangming <shengguangming@bytedance.com>	2025-05-02 21:03:57 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	3f41534ad2	[installation] doc: Fix pip install instructions (#1353 ) ### Checklist Before Starting - [X] Search for similar PR(s). ### What does this PR do? There should be no space between `.` and `[vllm]` or `[sglang]`, or it will result in error: ```logs ERROR: Invalid requirement: '[vllm]': Expected package name at the start of dependency specifier [vllm] ``` In addition, I rewrite this part to make the instructions more clear (as `.. or ..` can't be executed by bash directly) ### Additional Info. - Issue Number: none - Training: none - Inference: none ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [X] Add CI test(s) if neccessary. Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-05-01 15:30:11 -07:00
Franz Srambical	335a79da72	[docs] fix: typo (#1351 )	2025-05-01 11:20:04 -07:00
Shawn/Yuxuan Tong	ed498f9fa5	[recipe] feat: latest reproduction of DAPO (#1336 ) # What does this PR do? This PR updates the latest reproduction results of DAPO. ## Before submitting - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)? - [x] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs) especially for breaking config etc? - [x] Did you write any test cases if neccessary? Please add CI tests to your new feature. # Additional Info: - Issue Number: none - Training: none - Inference: none	2025-05-01 12:03:46 +08:00
Shawn/Yuxuan Tong	0e50afc363	[dev] feat: improve PR template (#1343 ) This PR tries to imporve the PR template itself.	2025-05-01 12:02:36 +08:00
Patrik Bartak	856f902b46	[FIX] metric_utils log best, worst, maj only for n_resps > 1 (#1248 ) Solves #1249 Instead of logging best@1/mean and worst@1/mean, which is identical to mean@1, just do not log it when there is only one validation response per prompt (`n_resps == 1`). Same applies to std. Otherwise we get many duplicated plots that show the same thing. The only change is the addition of the `if n_resps > 1:` statement.	2025-05-01 05:11:34 +08:00
Tianyun Zhao	c9787146e2	[test] fix: test arithmetic_sequence failed to run (#1333 ) # What does this PR do? e2e test `arithmetic_sequence` is currently broken, with error `TypeError: not a string` thrown on code `tokenizer = AutoTokenizer.from_pretrained(local_path)` when running `tests/e2e/run_ray_trainer.sh`. This PR aims to fix it. In the `arithmetic_sequence` task, `tests.e2e.envs.digit_completion` module was imported in the beginning but not used. This import seems meaningless. However, when this library is imported, `AutoTokenizer.register()` will be called to set configurations for `AutoTokenizer`. Only after that can `AutoTokenizer` be successfully initialized in test code to perform subsequent tasks. ## Timeline - In #934 , to improve CI efficiency, the CI corresponding to `arithmetic_sequence` was removed. - In #1010 , according to the `unused_import` rule, this import was deleted, triggering the bug. # ChangeLog - `AutoTokenizer.register` was added explicitly, which ensures the configurations were set before initialization of `AutoTokenizer`. # Usage - the original code `tests/e2e/run_ray_trainer.sh` is available for tests. ```python bash tests/e2e/run_ray_trainer.sh ``` ## Before submitting - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)? - [x] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs) especially for breaking config etc? - [x] Did you write any test cases if neccessary? Please add CI tests to your new feature. # Additional Info: - Issue Number: none - Training: none - Inference: none	2025-04-30 19:46:07 +08:00
Tianyun Zhao	1d66de22e9	[feat] add FusedWorker (#1278 ) on behalf of @zw0610 FusedWorker is designed to enhance the ability of colocated workers. FusedWorker keeps most of the interfaces as colocated workers: Users shall use `create_colocated_worker_cls_fused` to create colocated worker class, use `spawn` to split FusedWorker to dict of workers. In colocated workers, access the methods of child workers is done by using `spawn` then access via worker dict or calling `{worker_group}.{worker}_{method}`. In FusedWorker, the first method was preserved, while the latter was change to a new way: First use `{worker_group}.fuse(prefixes)` to bind workers to the worker group, then use `{worker_group}.{worker}.foo()` to access child workers.	2025-04-30 17:29:19 +08:00
BearBiscuit	d7c3d127ca	[doc] fix dataset path for gsm8k and url error (#1327 ) # What does this PR do? fix dataset path for gsm8k and some url error. # ChangeLog: change the readme file to fix gsm8k download path. # Usage - You can add one use example below. ```python # Add code snippet or script demonstrating how to use this ``` - For algorithm implementation and new model support, you can add training curve plots and evaluatuion results below. ## Before submitting - [ ] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)? - [ ] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs) especially for breaking config etc? - [ ] Did you write any test cases if neccessary? Please add CI tests to your new feature. # Additional Info: - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none]	2025-04-30 15:18:58 +08:00
HL	940caadf72	docs: add community blogs and fix link rendering (#1324 ) # What does this PR do? Add one-line overview of what this PR aims to achieve or accomplish. # ChangeLog: - Add two reference blogs to README # Usage None ## Before submitting - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)? - [x] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs) especially for breaking config etc? - [] Did you write any test cases if neccessary? No tests needed	2025-04-30 09:46:04 +08:00
Mert Unsal	6d58ca6ea0	cancel bootstrapping for n=n_samples (#1320 ) # What does this PR do? The validation metrics currently bootstraps its estimates by randomly sampling 1,2,4,8,16,...,n_samples results out of n_samples results. However, this bootstrapping doesn't make sense for `n=n_samples` as you cannot have more information about the estimate for `pass@n_samples` if you only have `n_samples` samples. This results in weird results when doing RL with only one problem in the validation set (best@N is a value between 0 and 1 instead of 0 or 1) This PR turns off bootstrapping for n=n_samples case and leaves rest of the computations the same.	2025-04-30 09:45:14 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	015db832dc	[fix] Remove grad_offload in rloo example script (#1323 ) # What does this PR do? `grad_offload` option was removed in #284 for fsdp backend, current script will error out due to this. # ChangeLog: - Remove grad_offload in rloo example script # Usage - Run the changed script ## Before submitting - [X] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)? - [X] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs) especially for breaking config etc? - [X] Did you write any test cases if neccessary? Please add CI tests to your new feature. # Additional Info: - Issue Number: N/A - Training: FSDP - Inference: None Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-04-30 08:54:29 +08:00
Xiang Long	e0d035cd4a	[sglang] feat: Add SGLang async multi-turn rollout with tool support (#1037 ) A redesigned version of #917 ## Current Status [Develop log & Tracker](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/113) What Has Been Done - Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking, support async multi-turn conversations in Agentic RL training (with Tool support). - Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml style messages. - Extensible Tools: A modular design for adapt tools in OpenAIFunctionTool format which is both support by SGLang and vLLM, with create separate instance, execute when tool call, calc score according to tool env state and release resource. - Multi-turn support has been implemented for the GSM8K task (new version working on). However, training has not yet converged, and we hope the community could join to investigate the issue. What Is WIP - [x] Merge loss mask to training process from last version - [x] Add more user friendly tool config and e2e tests for gsm8k with tool training - [ ] We are going to validate our multiturn feature in open-source sandbox environments. ## Key Features will be introduced in future version - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management. - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks. Future Plan [Discussion Thread](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/74#issuecomment-2763192625) [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon. ## Contributors & Acknowledgement - Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com) @SwordFaith (Design RFC & core-dev of refactor part) - Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com) @zyzshishui (Core-dev) - Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com) @zhaochenyang20 (PM) - Guanhua Wang @WANG-GH - Junrong Lin @ocss884 (verl-sglang support) - Hanchen Zhang [zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com) - Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com) - Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com) - Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com) - Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com) - Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu) - Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn) @zh-zheng --------- Co-authored-by: zyzshishui <492129152@qq.com> Co-authored-by: guanhua <281484683@qq.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: ocss884 <ocss.lin@gmail.com> Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com> Co-authored-by: HL <linhaibin.eric@gmail.com>	2025-04-29 13:20:06 -07:00
Blue Space	0234d8e3ab	fix reward model and add CI test (#1252 ) Fix bugs related to #1165 . Megatron backend reward model has no CI test, add to current ppo trainer. Fix `micro_batch_size_per_gpu` but not sure whether it is right for reward config. The output format is also not right with current `forward_micro_batch` implementation.	2025-04-29 21:20:21 +08:00
BearBiscuit	7299763c06	[vllm] add moe patch for qwen3-moe (#1316 ) # What does this PR do? Add moe patch for qwen3-moe. Fix the weight loader issue in vLLM MoE models. This isn’t a permanent solution, and we may need to contribute code to vLLM to address the problem caused by FusedMoE. I’m already seeking suggestions for this. # ChangeLog: - Add Qwen3MoeForCausalLM class for moe_patch	2025-04-29 21:18:45 +08:00
Shawn/Yuxuan Tong	93d2ed5ee8	fix: catch any error in math reward function (#1312 ) # What does this PR do? This PR fixes collapse in the math reward function by catch any possible errors. ## Before submitting - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)? - [x] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs) especially for breaking config etc? - [x] Did you write any test cases if neccessary? Please add CI tests to your new feature. # Additional Info: - Issue Number: None - Training: None - Inference: None	2025-04-29 18:58:17 +08:00
Changlong Yu	1e75fc04b5	[docs] add pr template (#1287 ) # What does this PR do? add the PR template to improve the readability of PR. ## Before submitting - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)? - [ ] Did you make sure to update the documentations with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs) especially for breaking config etc? - [ ] Did you write any test cases if neccessary? Please add CI tests to your new feature.	2025-04-29 15:20:39 +08:00
HL	1c66aab162	docs: add DeepWiki and ICLR links (#1283 )	2025-04-29 13:52:42 +08:00
BearBiscuit	1f3cbfcf19	[doc] add the multi modal doc (#1292 ) ## Motivation There is currently no docs support for multimodal task on verl, so I think we need to add a related document.	2025-04-29 13:44:38 +08:00
HL	958eae3523	[example] chore: remove verl_getting_started.ipynb (#1281 ) remove the out-dated notebook	2025-04-29 10:55:27 +08:00
Shawn/Yuxuan Tong	f9dae2bb11	[CI] feat: only check changed files (#1294 )	2025-04-28 20:24:41 +08:00
Mert Unsal	ba38413aa5	Option to make model private when pushing to hub, pushing the tokenizer for convenience (#1259 ) Very small changes to `model_merger.py` so that tokenizer is pushed to hub and model can be pushed privately.	2025-04-28 20:17:42 +08:00
Qunhong Zeng	ea4cd31987	[merger] fix: merged generation config is inconsistent with hf pre-trained model (#1277 ) `afeac9a023/scripts/model_merger.py (L195-L200)` Model created by `from_config` won't load the `generation_config.json` from `args.hf_model_path`, instead it create a generation config separately. This inconsistency will lead to strange generating error when user using vllm/hf rollout without carefully override sampling_params/generation_config, see issue here: https://github.com/volcengine/verl/issues/1246 This PR introduce a `patch_model_generation_config` function which patch the model from config to correctly use the pretrained generation config. Fix https://github.com/volcengine/verl/issues/1246.	2025-04-28 09:23:19 +08:00
Hunter Zhang	1971133d23	[doc] fix: fix 2 minor issues in installation and reward explanation (#1215 ) close - #1214 - #1213 Co-authored-by: HL <linhaibin.eric@gmail.com>	2025-04-27 15:40:17 -07:00
pengsun	b75c4e16d6	[logging] fix: typo of fsdp_checkpoint_manager saving optim path (#1276 ) fix a minor typo of printing optim saving path in fsdp_checkpoint_manager.py	2025-04-27 15:30:30 -07:00
Shawn/Yuxuan Tong	8e5ad4688a	[Lint] fix: linting errors in all files (#1280 ) This PR enables checking on all files after fixing all the errors: ``` examples/data_preprocess/geo3k.py:41:121: E501 Line too long (121 > 120) examples/data_preprocess/multiturn.py:54:121: E501 Line too long (185 > 120) examples/data_preprocess/multiturn.py:59:121: E501 Line too long (210 > 120) examples/data_preprocess/multiturn.py:73:121: E501 Line too long (229 > 120) examples/data_preprocess/multiturn.py:78:121: E501 Line too long (211 > 120) examples/ray/tutorial.ipynb:cell 9:1:121: E501 Line too long (179 > 120) examples/ray/tutorial.ipynb:cell 15:1:121: E501 Line too long (143 > 120) examples/ray/tutorial.ipynb:cell 42:14:1: E402 Module level import not at top of cell recipe/prime/prime_dp_rm.py:145:121: E501 Line too long (153 > 120) recipe/prime/prime_dp_rm.py:156:121: E501 Line too long (137 > 120) recipe/prime/prime_dp_rm.py:292:121: E501 Line too long (148 > 120) recipe/r1/data_process.py:56:121: E501 Line too long (289 > 120) recipe/r1/data_process.py:113:121: E501 Line too long (166 > 120) recipe/r1/data_process.py:118:121: E501 Line too long (137 > 120) recipe/r1/data_process.py:123:121: E501 Line too long (297 > 120) recipe/r1/data_process.py:131:9: E722 Do not use bare `except` recipe/r1/tasks/livecodebench.py:61:5: E722 Do not use bare `except` scripts/diagnose.py:55:9: F841 Local variable `ip` is assigned to but never used scripts/diagnose.py:165:13: B028 No explicit `stacklevel` keyword argument found scripts/model_merger.py:42:121: E501 Line too long (184 > 120) scripts/model_merger.py:146:13: E722 Do not use bare `except` tests/e2e/arithmetic_sequence/model/create_model_tokenizer.py:28:121: E501 Line too long (440 > 120) tests/gpu_utility/test_memory_buffers.py:42:5: F841 Local variable `model_named_params` is assigned to but never used tests/gpu_utility/test_memory_buffers.py:43:5: F841 Local variable `model_copy_named_params` is assigned to but never used tests/gpu_utility/test_memory_buffers.py:53:5: F841 Local variable `model_wrapper` is assigned to but never used tests/model/test_transformers_ulysses.py:102:5: F841 Local variable `response_length` is assigned to but never used tests/model/test_transformers_ulysses.py:181:5: F841 Local variable `response_length` is assigned to but never used tests/ray/detached_worker/server.py:83:13: F841 Local variable `vpp_rank` is assigned to but never used tests/ray/test_check_worker_alive.py:37:121: E501 Line too long (121 > 120) tests/rollout/run_fsdp_vllm.py:22:64: F811 Redefinition of unused `ShardingStrategy` from line 20 tests/rollout/test_sglang_spmd.py:210:121: E501 Line too long (157 > 120) tests/rollout/test_vllm_spmd.py:20:64: F811 Redefinition of unused `ShardingStrategy` from line 18 tests/sandbox/test_sandbox.py:86:121: E501 Line too long (1615 > 120) tests/sandbox/test_sandbox.py:87:121: E501 Line too long (1596 > 120) tests/sanity/check_license.py:22:1: E402 Module level import not at top of file tests/sanity/check_license.py:23:1: E402 Module level import not at top of file tests/verl/utils/dataset/test_rl_dataset.py:23:5: F841 Local variable `url` is assigned to but never used tests/verl/utils/dataset/test_rm_dataset.py:22:5: F841 Local variable `url` is assigned to but never used tests/verl/utils/dataset/test_rm_dataset.py:36:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks tests/verl/utils/dataset/test_sft_dataset.py:22:5: F841 Local variable `url` is assigned to but never used tests/verl/utils/dataset/test_sft_dataset.py:50:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks tests/verl/utils/dataset/test_sft_dataset.py:75:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks verl/__init__.py:22:1: E402 Module level import not at top of file verl/__init__.py:24:1: E402 Module level import not at top of file verl/__init__.py:25:1: E402 Module level import not at top of file verl/__init__.py:29:1: E402 Module level import not at top of file verl/__init__.py:29:15: F401 `.single_controller` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:16:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:18:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:20:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:21:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:22:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:24:5: F401 `.modeling_llama_megatron.ParallelLlamaModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/checkpoint_utils/llama_loader.py:92:121: E501 Line too long (168 > 120) verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:92:121: E501 Line too long (168 > 120) verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:274:121: E501 Line too long (127 > 120) verl/models/llama/megatron/checkpoint_utils/llama_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/llama/megatron/checkpoint_utils/llama_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/llama/megatron/checkpoint_utils/llama_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/llama/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelLlamaAttention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelLlamaDecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelLlamaDecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelLlamaMLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelLlamaRMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/parallel_attention.py:196:121: E501 Line too long (134 > 120) verl/models/llama/megatron/layers/parallel_attention.py:341:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:342:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:343:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:366:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:420:121: E501 Line too long (122 > 120) verl/models/llama/megatron/layers/parallel_linear.py:82:1: E402 Module level import not at top of file verl/models/mcore/loader.py:273:121: E501 Line too long (134 > 120) verl/models/mcore/util.py:26:121: E501 Line too long (202 > 120) verl/models/qwen2/megatron/__init__.py:16:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:18:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:20:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:21:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:22:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:24:5: F401 `.modeling_qwen2_megatron.ParallelQwen2Model` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:90:121: E501 Line too long (169 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:256:121: E501 Line too long (172 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:90:121: E501 Line too long (169 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:272:121: E501 Line too long (127 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/qwen2/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelQwen2Attention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelQwen2DecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelQwen2DecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelQwen2MLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelQwen2RMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/parallel_attention.py:163:121: E501 Line too long (134 > 120) verl/models/qwen2/megatron/layers/parallel_attention.py:282:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:283:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:284:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:307:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:361:121: E501 Line too long (122 > 120) verl/models/qwen2/megatron/modeling_qwen2_megatron.py:630:121: E501 Line too long (130 > 120) verl/models/transformers/llama.py:106:121: E501 Line too long (180 > 120) verl/models/transformers/llama.py:214:121: E501 Line too long (128 > 120) verl/models/transformers/llama.py:215:121: E501 Line too long (135 > 120) verl/models/transformers/monkey_patch.py:145:1: E402 Module level import not at top of file verl/models/transformers/monkey_patch.py:146:1: E402 Module level import not at top of file verl/models/transformers/monkey_patch.py:148:1: E402 Module level import not at top of file verl/models/transformers/monkey_patch.py:157:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling verl/models/transformers/qwen2.py:215:121: E501 Line too long (128 > 120) verl/models/transformers/qwen2.py:216:121: E501 Line too long (135 > 120) verl/protocol.py:303:121: E501 Line too long (125 > 120) verl/protocol.py:352:121: E501 Line too long (171 > 120) verl/protocol.py:578:121: E501 Line too long (142 > 120) verl/protocol.py:580:121: E501 Line too long (150 > 120) verl/protocol.py:583:121: E501 Line too long (167 > 120) verl/protocol.py:715:1: E402 Module level import not at top of file verl/protocol.py:725:121: E501 Line too long (121 > 120) verl/protocol.py:766:1: E402 Module level import not at top of file verl/protocol.py:768:1: E402 Module level import not at top of file verl/single_controller/__init__.py:23:1: E402 Module level import not at top of file verl/single_controller/__init__.py:24:1: E402 Module level import not at top of file verl/single_controller/base/decorator.py:149:16: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks verl/single_controller/base/decorator.py:198:121: E501 Line too long (134 > 120) verl/single_controller/base/decorator.py:310:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks verl/single_controller/base/worker.py:137:121: E501 Line too long (131 > 120) verl/single_controller/base/worker_group.py:89:33: G003 Logging statement uses `+` verl/single_controller/base/worker_group.py:202:21: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling verl/single_controller/ray/__init__.py:15:19: F401 `.base.RayClassWithInitArgs` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/single_controller/ray/__init__.py:15:41: F401 `.base.RayResourcePool` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/single_controller/ray/__init__.py:15:58: F401 `.base.RayWorkerGroup` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/single_controller/ray/__init__.py:15:74: F401 `.base.create_colocated_worker_cls` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/third_party/sglang/parallel_state.py:135:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/__init__.py:40:40: F401 `.vllm_v_0_6_3.llm.LLMEngine` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/third_party/vllm/__init__.py:45:22: F401 `vllm.LLM` imported but unused verl/third_party/vllm/__init__.py:46:34: F401 `vllm.distributed.parallel_state` imported but unused verl/third_party/vllm/__init__.py:50:121: E501 Line too long (141 > 120) verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py:189:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_5_4/llm.py:136:121: E501 Line too long (132 > 120) verl/third_party/vllm/vllm_v_0_5_4/llm.py:196:121: E501 Line too long (161 > 120) verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:174:5: F811 Redefinition of unused `llama_megatron_core_te_weight_loader` from line 90 verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:205:5: F811 Redefinition of unused `llama_megatron_core_weight_loader` from line 121 verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:254:121: E501 Line too long (150 > 120) verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:21: F811 Redefinition of unused `LoadConfig` from line 24 verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:45: F811 Redefinition of unused `ModelConfig` from line 26 verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:323:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:127:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:245:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120) verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120) verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used verl/third_party/vllm/vllm_v_0_5_4/worker.py:220:121: E501 Line too long (127 > 120) verl/third_party/vllm/vllm_v_0_6_3/config.py:46:92: B026 Star-arg unpacking after a keyword argument is strongly discouraged verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py:225:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_6_3/llm.py:141:121: E501 Line too long (132 > 120) verl/third_party/vllm/vllm_v_0_6_3/llm.py:169:121: E501 Line too long (161 > 120) verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:52:24: F811 Redefinition of unused `EngineArgs` from line 35 verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:21: F811 Redefinition of unused `LoadConfig` from line 25 verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:33: F811 Redefinition of unused `ModelConfig` from line 27 verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:354:9: F841 Local variable `distributed_executor_backend` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:360:121: E501 Line too long (152 > 120) verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:199:5: F841 Local variable `params_mapping` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:229:121: E501 Line too long (150 > 120) verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:21: F811 Redefinition of unused `LoadConfig` from line 22 verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:45: F811 Redefinition of unused `ModelConfig` from line 22 verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:312:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:21: F811 Redefinition of unused `LoadConfig` from line 27 verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:33: F811 Redefinition of unused `ModelConfig` from line 29 verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:129:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:247:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120) verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120) verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/worker.py:217:121: E501 Line too long (127 > 120) verl/trainer/fsdp_sft_trainer.py:298:121: E501 Line too long (158 > 120) verl/trainer/fsdp_sft_trainer.py:501:121: E501 Line too long (121 > 120) verl/trainer/fsdp_sft_trainer.py:550:1: E402 Module level import not at top of file verl/trainer/fsdp_sft_trainer.py:551:1: E402 Module level import not at top of file verl/trainer/fsdp_sft_trainer.py:553:1: E402 Module level import not at top of file verl/trainer/fsdp_sft_trainer.py:553:43: F811 Redefinition of unused `FSDPSFTTrainer` from line 82 verl/trainer/fsdp_sft_trainer.py:554:1: E402 Module level import not at top of file verl/utils/__init__.py:16:24: F401 `.tokenizer.hf_processor` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/__init__.py:16:38: F401 `.tokenizer.hf_tokenizer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/checkpoint/checkpoint_manager.py:48:37: B006 Do not use mutable data structures for argument defaults verl/utils/checkpoint/fsdp_checkpoint_manager.py:51:37: B006 Do not use mutable data structures for argument defaults verl/utils/checkpoint/fsdp_checkpoint_manager.py:56:13: B028 No explicit `stacklevel` keyword argument found verl/utils/checkpoint/fsdp_checkpoint_manager.py:81:121: E501 Line too long (121 > 120) verl/utils/checkpoint/fsdp_checkpoint_manager.py:98:121: E501 Line too long (124 > 120) verl/utils/checkpoint/megatron_checkpoint_manager.py:64:37: B006 Do not use mutable data structures for argument defaults verl/utils/checkpoint/megatron_checkpoint_manager.py:219:121: E501 Line too long (124 > 120) verl/utils/dataset/__init__.py:15:25: F401 `.rl_dataset.RLHFDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/dataset/__init__.py:16:25: F401 `.rm_dataset.RMDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/dataset/__init__.py:17:26: F401 `.sft_dataset.SFTDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/dataset/multiturn_sft_dataset.py:96:9: F841 Local variable `current_length` is assigned to but never used verl/utils/dataset/sft_dataset.py:95:79: B023 Function definition does not bind loop variable `key` verl/utils/dataset/sft_dataset.py:103:83: B023 Function definition does not bind loop variable `key` verl/utils/debug/__init__.py:15:26: F401 `.performance.GPUMemoryLogger` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/debug/__init__.py:15:43: F401 `.performance.log_gpu_memory_usage` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/debug/performance.py:68:121: E501 Line too long (127 > 120) verl/utils/debug/performance.py:71:121: E501 Line too long (126 > 120) verl/utils/debug/profile.py:15:1: I001 [] Import block is un-sorted or un-formatted verl/utils/debug/profile.py:19:15: UP039 [] Unnecessary parentheses after class definition verl/utils/debug/profile.py:50:23: F541 [] f-string without any placeholders verl/utils/debug/profile.py:52:49: F541 [] f-string without any placeholders verl/utils/debug/profile.py:53:47: F541 [] f-string without any placeholders verl/utils/debug/profile.py:54:67: F541 [] f-string without any placeholders verl/utils/debug/profile.py:54:121: E501 Line too long (122 > 120) verl/utils/flops_counter.py:175:121: E501 Line too long (124 > 120) verl/utils/hdfs_io.py:135:32: G004 Logging statement uses f-string verl/utils/import_utils.py:78:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling verl/utils/logger/aggregate_logger.py:46:121: E501 Line too long (131 > 120) verl/utils/logger/aggregate_logger.py:64:41: G004 Logging statement uses f-string verl/utils/megatron/tensor_parallel.py:152:121: E501 Line too long (123 > 120) verl/utils/megatron_utils.py:17:1: I001 [] Import block is un-sorted or un-formatted verl/utils/megatron_utils.py:22:20: F401 [] `torch.nn` imported but unused verl/utils/megatron_utils.py:34:38: F401 [] `verl.utils.memory_buffer.build_memory_reference_from_module` imported but unused verl/utils/megatron_utils.py:332:30: B009 [] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access. verl/utils/megatron_utils.py:366:27: B009 [] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access. verl/utils/model.py:464:121: E501 Line too long (124 > 120) verl/utils/rendezvous/ray_backend.py:39:25: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:41:22: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:63:30: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:65:30: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:72:26: G004 Logging statement uses f-string verl/utils/reward_score/gsm8k.py:47:121: E501 Line too long (201 > 120) verl/utils/reward_score/math.py:213:121: E501 Line too long (142 > 120) verl/utils/reward_score/prime_code/__init__.py:16:8: F401 `re` imported but unused verl/utils/reward_score/prime_code/testing_util.py:131:121: E501 Line too long (688 > 120) verl/utils/reward_score/prime_code/testing_util.py:168:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:222:9: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:254:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:255:17: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:259:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:260:17: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:264:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:265:17: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:269:121: E501 Line too long (132 > 120) verl/utils/reward_score/prime_code/testing_util.py:293:21: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:294:25: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:335:121: E501 Line too long (165 > 120) verl/utils/reward_score/prime_code/testing_util.py:386:121: E501 Line too long (209 > 120) verl/utils/reward_score/prime_code/testing_util.py:390:121: E501 Line too long (183 > 120) verl/utils/reward_score/prime_code/testing_util.py:455:121: E501 Line too long (211 > 120) verl/utils/reward_score/prime_code/testing_util.py:459:121: E501 Line too long (185 > 120) verl/utils/reward_score/prime_code/testing_util.py:582:121: E501 Line too long (197 > 120) verl/utils/reward_score/prime_code/testing_util.py:586:121: E501 Line too long (171 > 120) verl/utils/reward_score/prime_math/__init__.py:106:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:119:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:246:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:315:121: E501 Line too long (128 > 120) verl/utils/reward_score/prime_math/__init__.py:331:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:407:1: E402 Module level import not at top of file verl/utils/reward_score/prime_math/__init__.py:429:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading verl/utils/reward_score/prime_math/math_normalize.py:54:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:70:17: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:101:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:181:121: E501 Line too long (142 > 120) verl/utils/tokenizer.py:30:9: B028 No explicit `stacklevel` keyword argument found verl/utils/tokenizer.py:33:9: B028 No explicit `stacklevel` keyword argument found verl/utils/tokenizer.py:55:9: B028 No explicit `stacklevel` keyword argument found verl/utils/torch_functional.py:86:72: E741 Ambiguous variable name: `l` verl/utils/torch_functional.py:177:5: F841 Local variable `total_params` is assigned to but never used verl/utils/torch_functional.py:397:1: E402 Module level import not at top of file verl/utils/torch_functional.py:399:1: E402 Module level import not at top of file verl/utils/torch_functional.py:400:1: E402 Module level import not at top of file verl/utils/ulysses.py:246:5: F841 Local variable `sp_size` is assigned to but never used verl/workers/actor/dp_actor.py:244:13: F841 Local variable `response_mask` is assigned to but never used verl/workers/actor/megatron_actor.py:22:1: I001 [] Import block is un-sorted or un-formatted verl/workers/actor/megatron_actor.py:85:121: E501 Line too long (122 > 120) verl/workers/actor/megatron_actor.py:86:121: E501 Line too long (128 > 120) verl/workers/actor/megatron_actor.py:89:121: E501 Line too long (133 > 120) verl/workers/actor/megatron_actor.py:96:121: E501 Line too long (126 > 120) verl/workers/actor/megatron_actor.py:175:121: E501 Line too long (135 > 120) verl/workers/actor/megatron_actor.py:237:121: E501 Line too long (150 > 120) verl/workers/actor/megatron_actor.py:243:121: E501 Line too long (144 > 120) verl/workers/actor/megatron_actor.py:245:121: E501 Line too long (130 > 120) verl/workers/actor/megatron_actor.py:247:121: E501 Line too long (122 > 120) verl/workers/actor/megatron_actor.py:286:9: F841 Local variable `input_shapes` is assigned to but never used verl/workers/critic/dp_critic.py:227:21: F841 Local variable `input_ids` is assigned to but never used verl/workers/critic/dp_critic.py:230:21: F841 Local variable `position_ids` is assigned to but never used verl/workers/megatron_workers.py:18:1: I001 [*] Import block is un-sorted or un-formatted verl/workers/reward_manager/__init__.py:15:20: F401 `.batch.BatchRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:16:19: F401 `.dapo.DAPORewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:17:20: F401 `.naive.NaiveRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:18:20: F401 `.prime.PrimeRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/prime.py:61:121: E501 Line too long (217 > 120) verl/workers/reward_model/__init__.py:15:19: F401 `.base.BasePPORewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_model/megatron/__init__.py:15:27: F401 `.reward_model.MegatronRewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_model/megatron/reward_model.py:65:9: F841 Local variable `ori_bs` is assigned to but never used verl/workers/reward_model/megatron/reward_model.py:89:121: E501 Line too long (132 > 120) verl/workers/reward_model/megatron/reward_model.py:215:9: F841 Local variable `input_shapes` is assigned to but never used verl/workers/rollout/naive/__init__.py:15:28: F401 `.naive_rollout.NaiveRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/rollout/sglang_rollout/__init__.py:14:29: F401 `.sglang_rollout.SGLangRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:22:121: E501 Line too long (129 > 120) verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:51:121: E501 Line too long (157 > 120) verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:153:13: F841 Local variable `log_probs` is assigned to but never used verl/workers/rollout/vllm_rollout/vllm_rollout.py:22:121: E501 Line too long (129 > 120) verl/workers/rollout/vllm_rollout/vllm_rollout.py:60:121: E501 Line too long (157 > 120) verl/workers/sharding_manager/__init__.py:16:5: F401 `verl.utils.import_utils.is_megatron_core_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:17:5: F401 `verl.utils.import_utils.is_sglang_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:21:19: F401 `.base.BaseShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:22:27: F401 `.fsdp_ulysses.FSDPUlyssesShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:29:121: E501 Line too long (149 > 120) verl/workers/sharding_manager/__init__.py:32:121: E501 Line too long (126 > 120) verl/workers/sharding_manager/fsdp_sglang.py:99:9: F841 Local variable `load_format` is assigned to but never used verl/workers/sharding_manager/fsdp_sglang.py:123:121: E501 Line too long (178 > 120) verl/workers/sharding_manager/fsdp_ulysses.py:59:13: F841 Local variable `sp_size` is assigned to but never used Found 305 errors. ``` --------- Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>	2025-04-27 15:24:30 -07:00
gzpan	0fb0bedb7f	[profile] print cuda system memory and offload actor model after init (#1118 ) Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>	2025-04-28 02:11:38 +08:00
Joel	cea529116f	feat: move AsyncLLM ChatCompletionScheduler to separate thread (#1274 ) Move AsyncLLM ChatCompletionScheduler to separate thread to avoid making PPOTrainer async class.	2025-04-27 22:02:52 +08:00
runluo	cb6fc3951d	Adding GUI-R1 to the Awesome work (#1275 )	2025-04-27 22:00:51 +08:00
BearBiscuit	afeac9a023	[misc] add offload and profile doc, add validate in profile (#1272 )	2025-04-27 17:12:14 +08:00
Shawn/Yuxuan Tong	fbb93e44b1	[CI] feat: only test for push to main (#1271 )	2025-04-27 09:51:09 +08:00
BearBiscuit	cc8fca504d	[mcore] add offload param and opt function for magetron (#1162 ) ## Motivation This is a PR that supports offload in Megatron. Currently, parameters, gradients, and optimizers can be offloaded to the CPU when not needed. I have successfully tested the feasibility of the function using the memory snap tool. Further accuracy testing is still in progress. ## TODO - [x] Accuracy testing	2025-04-27 02:03:34 +08:00
BearBiscuit	85a9b09d85	[profile] add profile for megatron train (#1146 ) ## Motivation This is a new feature that adds the functionality of collecting profiles during the training phase. Since the RL process repeatedly enters the training process, by default, the profile temporarily captures the results of the first `update_policy`. Moreover, this modification should be seamlessly integrated into other training frameworks.	2025-04-27 01:59:32 +08:00
Dai, Weinan	64056835b9	[bugfix] fix: add `await` for `_validate()` (#1269 ) As titled.	2025-04-26 20:32:46 +08:00
Qunhong Zeng	281ed3a41a	[rollout] feat: support rollout.n > 1 in hf_rollout (#1199 ) Currently, the hf rollout backend only support `rollout.n == 1`, when `rollout.n > 1` it will lead to an error (https://github.com/volcengine/verl/issues/1134) This PR make hf rollout support `do_sample` and `is_validate` to make it consistent with vllm and sglang backend, and correctly support `rollout.n > 1`.	2025-04-25 15:03:22 -07:00
湛露先生	5c3802687f	distro: clean req packages. (#1253 ) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-04-25 07:14:00 -07:00
Chen Jie	af15ae12f5	fix: Correct sampling params setting in sglang evaluation (#1181 ) This PR fixes an issue where parameters in `val_kwargs` are not effectively passed during sglang evaluation when `do_sample=True` is set. Additionally, since the validation data has already been repeated in `ray_trainer`, the `n` parameter in `sampling_params` needs to be correctly configured to prevent errors caused by dimension mismatches.	2025-04-25 20:53:54 +08:00
wangfuchun-fc	e8cd4196e3	fix: remove deprecated remove_previous_ckpt key in prime_ray_trainer.py (#1254 ) deprecated remove_previous_ckpt key cause save checkpoint crash. See: https://github.com/volcengine/verl/issues/1183	2025-04-25 18:12:18 +08:00
Yang Wang	5c0426e134	[AMD] Update AMD performance tuning documentation (#1256 ) Update AMD performance tuning documentation according to @yushengsu-thu's suggestion. 1. fix git branch and link 2. fix tab	2025-04-25 18:10:58 +08:00
Joel	aacd3660fc	[rollout] feat: introduce vLLM AsyncLLM to support multi-turn rollout (#1138 ) ### Summary Introduce vLLM AsyncLLM to support multi-turn rollout and #385 #398 #710 ### Architecture ![async_llm_arch](https://github.com/user-attachments/assets/e8cd974c-0c26-4d96-9a9e-b71fd85dd32d) New Components: - AsyncLLMWorker: standalone vllm server instance - FastAPI: provide OpenAI-compatible HTTP server - AsyncLLM: async LLMEngine for online serving, for more details: [AsyncLLM](https://github.com/vllm-project/vllm/pull/9826), [LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine) - ExternalRayDistributedExecutor: custom executor backend manages workers in worker group, it grabs corresponding workers by actor names - AsyncLLManager: manages a group of vllm server instances(AsyncLLMWorker) - AsyncLLM lifecycle: initialization, wake_up, sleep. - FastAPI service discovery - ChatScheduler: schedule multiple chat completion requests with multiple server instances - Least requests load balance - Sticky session with prefix caching - Chat completion callback: tools calling ### TODO - [x] AsyncLLM: intialization/wake_up/sleep - [x] OpenAI API: support `/v1/chat/completions` - [x] RayPPOTrainer integration: replace `generate_sequences` to http call `/v1/chat/completions` - [x] GSM8K e2e training - [ ] Add document --------- Co-authored-by: shengguangming <shengguangming@bytedance.com>	2025-04-25 17:56:34 +08:00
sicer	984e8a96c9	[proto] feat: Add bool-type index selection for DataProto (#1082 ) After the last change, current DataProto cannot use bool-type index due to hard-coded batch_size equal to idxs.shape[0]. This patch changes the new batch_size for bool-type idx to idxs.sum(). It's useful when users filter the batch with bool-type masks.	2025-04-24 22:12:24 -07:00
Junrong Lin	c71b24d2e9	[SGLang] feat: upgrade to 0.4.5.post3 & fix ipv6 (#1203 ) The ipv6 part is picked from https://github.com/volcengine/verl/pull/1184 cc @BearBiscuit05 --------- Co-authored-by: BearBiscuit05 <xiangyongan@bytedance.com> Co-authored-by: Gelee-Q <leege233@gmail.com>	2025-04-24 18:23:53 -07:00
Patrik Bartak	5080f47df0	[logging] feat: Add step and epoch metrics (#1250 ) Solves #1251 Right now the current global step and current epoch are not being logged. This would be a useful feature.	2025-04-24 13:43:58 -07:00
Yang Wang	5bd1ce3f42	[AMD] Add AMD performance tuning documentation (#1240 )	2025-04-24 12:42:56 -07:00
Mantas Bakšys	7341f52ca5	[logging] feat: Add Rollout and Validation dumps to file (#916 ) Co-authored-by: Mert Unsal <mertunsal1905@gmail.com>	2025-04-24 10:31:03 -07:00
BearBiscuit	f315ac3b98	[misc] refactor moe bash (#1245 )	2025-04-24 22:46:47 +08:00
Shawn/Yuxuan Tong	d5a44dabe5	fix: validation top_p=0.7 for DAPO full (#1241 )	2025-04-24 16:15:09 +08:00
Blue Space	a35c044627	Migrate to new image with FlashInfer 0.2.2 + vLLM 0.8.3 + SGLang 0.4.5 + MCore 0.12.0 + TE 2.2 + cuDNN 9.8.0 (#1237 ) As support both, we let TE to choose attention backend now. New Image: `whatcanyousee/verl:ngc-cu124-vllm0.8.3-sglang0.4.5-mcore0.12.0-te2.2`	2025-04-24 16:14:48 +08:00
湛露先生	650115fba9	Fix docs about config page. (#1236 ) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-04-24 13:12:57 +08:00
HL	f01a932f80	[mcore] refactor: remove the mcore patches (#1229 )	2025-04-24 09:40:45 +08:00
BearBiscuit	22f7e2c21c	[vllm] update moe patch for megatron and fsdp (#1200 ) ## Motivation This is a fix for the issue where the `weight_loader` in FusedMoe of the vLLM code could not be used correctly during the resharding phase, addressed in #923, #1137, and #1139 respectively. Currently, the results of these PRs can be used together, allow both FSDP and Megatron to use the same function, reducing code maintenance costs.	2025-04-24 09:40:12 +08:00
aoshen524	7a01e8c4f3	Update ray_debug_tutorial.rst (#1228 )	2025-04-24 09:38:23 +08:00
Baiqing Lyu	f95cc7bb54	docker: update Dockerfile.sglang (#1207 ) Install ray[default] to include missing components	2025-04-23 11:25:04 -07:00
Franz Srambical	7cfd705451	fixt: typo (#1217 ) Alternatively, we should properly expand on the role of the parameter `mapping`	2025-04-23 19:21:11 +08:00
湛露先生	a5a77680b6	fix util reward_score/math_dapo.py notes. (#1185 ) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-04-23 19:19:46 +08:00
aoshen524	65f512bbee	Update the ray debug tutorial (#1204 ) ## Motivation The existing Ray tutorial is difficult to follow and doesn’t explain how to debug across multiple breakpoints. ## Modifications - Updated `multinode.rst` ## Checklist - [x] Created independent `ray_debugger.rst` with step‑by‑step instructions	2025-04-23 19:18:54 +08:00
湛露先生	7b6b7cb5b8	clean codes (#1219 ) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-04-23 18:11:23 +08:00
thibautbar	6a9bef731c	[feat] trainer: compute reward during log_prob for ppo trainer (#1114 ) ### Description Add a new parameter to ppo_trainer that enables asynchronous reward computation during the log_probs phase. This is particularly useful when reward manager is time-consuming and we want to overlap its computation with GPU-intensive operations, improving overall throughput. By default, this parameter is set to False. #### Example: before and after this PR with the parameter set to True (GRPO on a 1.5b model): In the following plot, the CPU reward computation function (taking around 5min in this case) is now called during log prob phases to avoid wasting GPU resources. <img width="617" alt="image" src="https://github.com/user-attachments/assets/eca2ea18-c966-4525-adde-e9cb96878830" /> --------- Co-authored-by: mertunsall <mertunsal1905@gmail.com>	2025-04-22 23:10:26 -07:00
Blue Space	4081d8af1f	refactor example and test scripts to use megatron comm/comp overlap and checkpoint save (#1202 ) Examples megatron scripts are outdated.	2025-04-23 11:30:30 +08:00
HL	f1a18a2785	docs: update iclr news and gair-nlp/cognition-engineering (#1205 )	2025-04-22 18:28:00 -07:00
湛露先生	6501e79589	docker: clean redundant pre-commit dockerfile pip-package (#1195 ) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-04-22 11:45:54 -07:00
Cheetah	ad4881e16b	feat: add torch_compile param for ref model (#1164 )	2025-04-23 01:13:43 +08:00
tao-githup	6cc3afb04c	fix: enable val_batch_size to address the oom issue during validatat… (#986 ) …ion when the val dataset is large in multi-modal scenarios Co-authored-by: 刘涛 <liutao.lt@bytedance.com>	2025-04-22 22:57:26 +08:00
Hao	74d9918568	fix: update assertion of ppo_mini_batch_size and ppo_micro_batch_size_per_gpu (#833 ) original assertion is inside `if`, only executed when `ppo_micro_batch_size` is not None, and otherwise may results in NaN when training Related to [Issue #405](https://github.com/volcengine/verl/issues/405) and [PR #382](https://github.com/volcengine/verl/pull/382).	2025-04-22 21:46:57 +08:00
Blue Space	99fdbf6985	Log gpu mem refactor (#1190 ) Use wrapper to refactor logging GPU memory enter or exit a function. Simply use `VERL_LOGGING_LEVEL=DEBUG` to open current implemented memory logger wrapped around common functions.	2025-04-22 13:28:10 +08:00
hoshi-hiyouga	64672aef34	fix vllm version in setup.py (#1186 ) We have upgrade vllm to 0.8.3 in our docker file: https://github.com/volcengine/verl/blob/main/docker/Dockerfile.ngc.vllm0.8	2025-04-21 21:18:02 +08:00
Shawn/Yuxuan Tong	103f90113f	[dev] fix: instructions about merging from before using ruff (#1180 ) Our pre-commit hook and CI action only check the changes for now. In this PR, 1. We apply `ruff check --fix` and `ruff format`. 2. We remove the unnecessary pipeline from the immigration warning, since directly merging without applying `ruff`, which might cause extra conflicts, is the best way to avoid introducing extra file changes.	2025-04-20 13:51:46 -07:00
Ethan Yusheng Su	b0e3f1361e	[AMD] docker: Support AMD (ROCMm Kernel) - Support SGLang (#1179 ) [Done] - Update the Docker file and Apptainer file to support the SGLang engines - Add the 3rd-party [torch_memory_saver](torch_memory_saver](https://github.com/ExtremeViscent/torch_memory_saver) within the docker file in rocm version	2025-04-20 12:51:10 -07:00
Shawn/Yuxuan Tong	725c67666f	[ray] fix: ray hang due to num_cpus (#1009 ) Fixing #523 according to https://github.com/volcengine/verl/issues/523#issuecomment-2723652147 Concern: will `num_cpus=1` limit the performance of the cluster scheduler?	2025-04-20 12:50:17 -07:00
HL	5313d96f9b	[CI] fix: add additional pre-commit test before ppo trainer tests (#1175 )	2025-04-20 11:16:19 -07:00
SunJin Kim	6d8f2f6ab9	[algo] feat: Add DrGRPO (#990 ) https://github.com/volcengine/verl/issues/742 - Add an option for disabling standard-deviation normalization of advantages in GRPO. - This completes one out of two algorithmic changes made by Dr.GRPO to GRPO, the other one being the removal of sequence-length averaging during loss aggregation.	2025-04-20 08:44:45 -07:00
Yeonwoo Sung	b0e2a0ac88	[logging] refactor: use 'from e' for exception stack trace (#1177 ) Use the 'from e' inside the try-except statement to keep the stack trace of the error	2025-04-20 08:43:43 -07:00
Shawn/Yuxuan Tong	28e45cbde2	[Config] fix: disable XFORMERS by default since we immgrated to newer vLLM versions (#1178 )	2025-04-20 07:46:20 -07:00
BearBiscuit	3c46da551d	[megatron] fix: avoid initialization of Megatron if not use (#1143 ) ## Motivation When using FSDP in an environment that includes Megatron, the components of Megatron will also be loaded, which may lead to some unnecessary issues. Therefore, the initialization of Megatron can be postponed until it is actually used. --------- Co-authored-by: HL <linhaibin.eric@gmail.com>	2025-04-20 06:51:57 -07:00
Changlong Yu	1ab271e1b5	[megatron] fix optimizer config (#1104 )	2025-04-20 06:50:35 -07:00
Yan Bai	4fa7ed6c0d	[mcore] qwen2moe support (#1139 ) support qwen2moe structure to run with megatron-core including: * qwen2moe config converter * qwen2moe model initializer * refactor the online weight converter from mcore to vllm * qwen2moe online weight converter * qwen2moe offline weight conversion script from hf to mcore * a script to run training qwen1.5moe_a2.7b with 4 nodes TODO add option to freeze the MoE router weight during training	2025-04-20 12:48:46 +08:00
HL	c54ec18693	docs: update recent news and logo (#1173 )	2025-04-19 21:42:19 -07:00
none0663	b39c0214c8	Fix ImportError in is_megatron_core_available() and is_vllm_available() Functions (#1131 ) Issue: In a Python 3.10 environment, when using import importlib, calling importlib.util.find_spec('megatron.core') results in the following error: `833e7d7878/verl/utils/import_utils.py (L21-L30)` ``` Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: module 'importlib' has no attribute 'util' ``` This error causes more_spec to be None, which can lead to further issues in the code execution. Proposed Solution: I recommend adding import importlib.util to ensure that the util module is properly imported and available for use. This change will prevent the AttributeError and allow the find_spec function to work as intended. Please see the attached screenshot for reference. <img width="534" alt="Clipboard_Screenshot_1744872935" src="https://github.com/user-attachments/assets/92f63ed5-7a52-43ac-86be-2c9585320234" />	2025-04-20 10:17:59 +08:00
HL	0fd56b2080	docs: add ReTool (#1154 )	2025-04-20 09:20:00 +08:00
湛露先生	f49c5311a4	change deepspeedai url site. (#1171 ) DeepSpeed has moved to `deepspeedai` repo. Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-04-20 09:18:49 +08:00
Shawn/Yuxuan Tong	121f0b034c	[CI] fix: only check changed files in CI (#1168 ) We also remove previous workaround of adding ignores.	2025-04-19 11:55:28 -07:00
Blue Space	8719371949	revert multinode first (#1161 ) Will explore further	2025-04-19 16:00:15 +08:00
Baiqing Lyu	6effd52e16	Update test_dapo_7b.sh to remove extra line (#1155 ) Seems like there's an extra argument here that's causing an error when running	2025-04-19 14:53:56 +08:00
Blue Space	f3dc1d7b78	[BREAKING] ray: rewrite multi-node doc (#1160 ) The way to use ray has changed. Ray related issue: https://github.com/ray-project/ray/issues/52454	2025-04-18 23:14:52 -07:00
HL	568239fb38	CI: limit ruff checks and enable push tests (#1157 )	2025-04-19 13:54:45 +08:00
Shawn/Yuxuan Tong	b00f77d855	[dev] feat: immigrate from yapf & pylint to ruff based on pre-commit (#1010 ) > [!WARNING] > We are [immigrating to `ruff` as the linter and formatter and `pre-commit` as the managing tool](https://github.com/volcengine/verl/pull/1010). > > If your branch is based on a previous commit using `yapf` and `pylint`, simply merging might trigger overwhelming linting errors, while you are only expected to resolve ones in the files related to your PR. > > To resolve this issue, please try the following workaround to only include the files you really changed in the PR: > > 1. In your branch, fix linting and format with `ruff`: `ruff check --fix && ruff-format` > 2. Squash into a single commit in a new branch: `git reset --soft $(git merge-base main HEAD) && git add -A && git commit -m "feat: ..."` > 3. Merge with the latest main: `git merge origin/main` > 4. Force push to your branch: `git push --force` We add the reminder above to the documentation to tell contributors how to avoid overwhelming linting errors. ### Motivation According to dicussion in #896, this PR immigrates from yapf & pylint to ruff based on pre-commit, which allows unified version control and automatic hook on committing. ### Summary The `pre-commit` hook and CI - checks staged / committed files in commits / PR's - checks all files each month (This should fail before we fix all the files by the ruff standard) ### Explanation for the Failing CI Workflow `pre-commit` For now, we only apply `ruff format` and `ruff check --fix` without resolving all the errors, since there are too many errors to resolve, which causes the CI workflow `pre-commit` fails. For resolving the remaining errors, we leave to future commits. Specifically, the `pre-commit` hook and CI will require every commit to fix its related files with `ruff`, which will fix all the files incrementally. ### Reviewing Suggestion The commit `3d93f51ba8` is huge since we apply `ruff` to all the files. To review the main changes, please check the commits before and after it.	2025-04-18 07:49:31 -07:00
mlmz	c98fb3197b	Doc: add a environment to fix that the memory capacity is unbalanced (#1105 ) if we use sglang as the rollout engine, we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK to avoid that the memory capacity is unbalanced, please refer to [#5426 in sglang](https://github.com/sgl-project/sglang/pull/5426) # why we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK when using SGLang as the rollout engine in verl？ 1. verl initializes a SGlangRollout module during rollout, which is used to evaluate/generate samples. 2. SGLangRollout will initialize VerlEngine, further initialize a torch. Distributed. DeviceMesh, used to support the TP. 3. DeviceMesh.init () internally checks the free video memory of all participating devices, and if the difference is too large (more than about 10%), it directly reports an error, preventing initialization failures or communication deadlock. # Why might there be inconsistent graphic memory？ ## Ray Distributed Actor loads the model at different times: verl uses ray multi-process multi-gpu concurrent training, and each `WorkerDict` may be called at different times: `self.rollout = SGLangRollout(...)` different workers initialize the model at different times → different memory usage. ## Delayed initialization causes memory bias Some workers enter the model loading/infer process earlier than others, such as `generate_sequences()` or `compute_log_prob()`. The early-loaded worker video memory has been eaten by the model, and the late-loaded worker video memory is still empty → the graphic memory gap is large. ## Verl+SGLang's TP initialization goes "all device broadcast", but there is no uniform release timing SGLangRollout only needs to involve the part of the graphics card used by the rollout machine, but its VerlEngine initialization calls torch.distribut.init process group() and broadcast a bunch of weights. Result in: Non-rollout cards also participate in communication; Then initialize DeviceMesh, and the error "inconsistent memory" is reported. ## Different loading modes of FSDP/TP models also cause deviations if the following parameters are set ``` actor.fsdp_config.param_offload=True ref.fsdp_config.param_offload=True ``` Some worker parameters are on the CPU, and some parameters are shard to the GPU in advance. This also creates an asymmetric distribution of video memory. --------- Co-authored-by: ocss884 <ocss.lin@gmail.com>	2025-04-17 21:28:17 -07:00
BearBiscuit	ec59b8788c	[misc] add sglang support for hdfs file load (#1060 )	2025-04-18 10:39:08 +08:00
BearBiscuit	0bdf7f4698	[misc] qwen moe patch for not find 'load_weights' func (#1137 )	2025-04-18 08:05:09 +08:00
LiLei	ba988bbeb5	[dapo] fix: fix timer for dapo (#1075 ) When training with Dapo, because there is a continuous filter and dynamic sampling, each iteration involves multiple samplings. The time should be summed up to represent the total sampling time for one iteration. --------- Co-authored-by: lilei <>	2025-04-17 15:00:16 -07:00
Blue Space	25b0f2262f	Move entropy to comput log probs to reduce peak memory when calculating entropy. (#1100 ) Actor do not calculate Entropy loss if `entropy_coeff==0`, and move the calculation of entropy to `compute_log_probs` Tested configuration: ```sh data.max_prompt_length=$((1024 * 2)) \ data.max_response_length=$((1024 * 10)) \ actor_rollout_ref.rollout.max_num_batched_tokens=$((1024 * 12)) \ context_parallel_size=2 \ ```	2025-04-17 17:35:59 +08:00
BearBiscuit	19d0d07329	[mcore] resharding model weights by per tensor (#1107 ) ## Motivation This is an optimization approach using a per-tensor method to reduce the additional memory required for model weights during the resharding phase. Our ultimate goal is to enable mcore to have a method that aligns with the `full_tensor()` function in FSDP and to deprecate the `AllGatherPPModel` class in future versions. Currently, this task may need to be broken down into several subtasks: ## Impact Analysis 1. The model accuracy has been tested on Qwen-7B in vllm version 0.8.2, and it aligns with the accuracy of the previous method. 2. In terms of memory usage, the `pp_cache` in `AllGatherPPModel` has been completely deprecated. 3. In terms of runtime, the performance is comparable to the original method. ## TODO - [x] Deprecate the `AllGatherPPModel` class in version 0.8.2. - [x] Ensure forward compatibility for this method. - [x] Completely deprecate the `AllGatherPPModel` class.	2025-04-17 14:57:10 +08:00
Qunhong Zeng	833e7d7878	refactor: main generation should also use pad/unpad from verl.protocol (#1103 ) The main generation should use the padding/unpad from verl.protocol to align with ray_trainer, instead of a seperate padding/unpad logic. Also make small improvements to make code looks better.	2025-04-17 12:26:04 +08:00
ann-qin-lu	f04a6dbdb7	fix: loading HF model in rank0 for mcore megatron model (#998 ) [bug fix] Loading 72B Qwen model for mcore megatron is causing OOM. Extracted out the HF loading logic to a helper function (and disable `device_map=auto`), and refactored the legacy function `load_megatron_model_weights` and the new function `load_megatron_gptmodel_weights`. `load_megatron_model_weights` should be deprecated once the the class RewardModelWorker is also migrated to Mcore.	2025-04-17 10:20:40 +08:00
Yan Bai	be9def6900	[mcore] refactor (#1064 ) refactor the mcore code, add registry for extensibility for more types of model such as MoE or VLM. clean some deprecated code such as megatron_config. reward model worker uses GPTModel api now.	2025-04-17 09:49:30 +08:00
Xingyao Wang	d6821a051a	[sft] feat: Add WSD (Warmup-Stable-Decay) scheduler for SFT (#1041 ) # Add WSD (Warmup-Stable-Decay) Learning Rate Scheduler ## Overview This PR adds a new learning rate scheduler called WSD (Warmup-Stable-Decay) that provides more control over the learning rate schedule during training. The WSD scheduler extends the traditional cosine scheduler by adding a stable phase where the learning rate remains constant. ## Features - Three-phase schedule: Warmup → Stable → Decay - Configurable stable phase: Control what percentage of training maintains a constant learning rate - Compatible with existing code: Minimal changes to the trainer infrastructure - Default to cosine: Maintains backward compatibility with existing configurations ## Implementation Details 1. Added `get_wsd_schedule_with_warmup` function to `verl/utils/torch_functional.py` 2. Updated the SFT trainer to support the new scheduler type 3. Added `lr_scheduler: cosine` as the default in the SFT trainer config Here's the reference implementation: `6397d56279/pytorch_optimizer/lr_scheduler/wsd.py (L8)` ## Usage To use the WSD scheduler, set the following in your configuration: ```yaml optim: lr_scheduler: wsd # Options: 'cosine' (default) or 'wsd' ``` ## Benefits - Better control over learning rate behavior during training - Potentially improved training stability for certain tasks - Allows experimentation with different learning rate schedules without code changes (trying to get this in to make sure my own branch don't end up with huge chunk of git conflict 😓 ) --------- Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: HL <linhaibin.eric@gmail.com>	2025-04-16 11:26:19 -07:00
mingruimingrui	54fbd156b1	[feat] video inputs (#1116 ) ## What does this MR do? Adds video input support for qwen2 vl models ## Changes - process_image function is moved to vision_utils.py - switch process_image & process_video to both use fetch_image and fetch_video functions from qwen-vl-utils - fixed a mrope bug in vllm rollout	2025-04-17 00:39:02 +08:00
Lumeng Wu	3e3d9372a5	fix: missed loss_agg_mode in dp_actor (#945 ) as titled	2025-04-17 00:28:48 +08:00
Silver	f845a46a17	Update install.rst to fix a typo (#1111 ) Fix a typo	2025-04-16 18:25:54 +08:00
BearBiscuit	635997a5ec	[misc] add dummy load for sglang (#1068 )	2025-04-16 09:46:09 +08:00
hoshi-hiyouga	d814965003	[vlm] data: use hf template for vlm models (#1085 ) ## What does this PR do? In this PR, we use huggingface's chat template (i.e., `processor.apply_chat_template`) to compute input ids for VLMs, it could be generalized to more model architectures compared with the earlier implementation. ## Who can review? @vermouth1992 @eric-haibin-lin	2025-04-15 15:05:58 -07:00
hoshi-hiyouga	588404196b	doc: upgrade to vllm 0.8.3 (#1081 ) ## What does this PR do? - Upgrade docker image to vllm 0.8.3 to avoid memory leakage - Add wake up tags to megatron rollout worker ## Who can review? @vermouth1992 @BearBiscuit05 @ETOgaosion	2025-04-16 01:11:52 +08:00
Shawn/Yuxuan Tong	189c87c37c	[CI] feat: try HF_HUB_OFFLINE to fix network errors (#1098 ) Trying to fix network errors like ``` huggingface_hub.errors.HfHubHTTPError: 429 Client Error: Too Many Requests for url: ... ```	2025-04-15 22:27:27 +08:00
Shawn/Yuxuan Tong	6bfa45c5c8	[doc] feat: adding CI tests (#1099 )	2025-04-15 19:45:13 +08:00
HL	5c984b7748	docs: update awesome work (#1090 )	2025-04-14 22:52:44 -07:00
XinyuanTong	4ec9974735	CI: add vlm CI for sglang rollout (#1088 ) As titled.	2025-04-14 20:21:29 -07:00
pengsun	68958ef877	[feat] vllm: add rollout config swap_space for vllm_rollout (#960 ) When training big model and/or super-long seq len using vLLM rollout, you may encounter the error ``` ... in _swap_out raise RuntimeError( RuntimeError: Aborted due to the lack of CPU swap space. Please increase the swap space to avoid this error ``` (updated)This can be fixed by setting bigger `swap_space` for vLLM. E.g., in your training bash you can do the ``` ... actor_rollout_ref.rollout.engine_kwargs.swap_space=32 \ ... ``` which sets the swap_space to 32GB. Note in most vLLM releases the default value is 4GB.	2025-04-14 14:16:53 -07:00
lei-lei-shanda	ebc3294b7e	[misc] ray: Fix typo in colocate (#1074 ) - Force both usages of `colocate` and `collocate` to `colocate`, to be consistent with [vllm terminology](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_colocate.html). - both `ResourcePool` and `RayResourcePool` use the same default value for `max_colocate_count` to avoid surprises.	2025-04-14 10:22:50 -07:00
Shawn/Yuxuan Tong	1559f62d1e	fix: remove output.txt (#1086 ) Fixing https://github.com/volcengine/verl/pull/1032#discussion_r2042521317	2025-04-14 10:19:40 -07:00
Shawn/Yuxuan Tong	5ba1dbc606	[ci] feat: improve CI speed to 1-2min per test (#1032 ) ### Summary #### Minimize Test Workloads This PR minimizes the test workloads while keeping them meaningful, reducing the time cost of a test from >10 min to 1~2 min. Specifically, we 1. set batch sizes and steps as small but still meaningful numbers: ```bash train_traj_micro_bsz_per_gpu=2 # b n_resp_per_prompt=4 # g train_traj_micro_bsz=$((train_traj_micro_bsz_per_gpu * NUM_GPUS)) # b * n train_traj_mini_bsz=$((train_traj_micro_bsz * 2)) # 2 * b * n train_prompt_mini_bsz=$((train_traj_mini_bsz * n_resp_per_prompt)) # 2 * b * n / g train_prompt_bsz=$((train_prompt_mini_bsz * 2)) # 4 * b * n / g # ... TOT_TRAIN_STEPS=${TOT_TRAIN_STEPS:-1} ``` 2. disable validation (this costs a lot!) / saving / resuming for training tests by default and leave them to specialized tests ```bash # Validation VAL_BEFORE_TRAIN=${VAL_BEFORE_TRAIN:-False} TEST_FREQ=${TEST_FREQ:--1} # Save & Resume RESUME_MODE=${RESUME_MODE:-disable} SAVE_FREQ=${SAVE_FREQ:--1} ``` #### Improve Triggering Mode This PRs introduces a more comprehensive triggering logic mode. Specifically, we 1. consider all Python code by default 2. include related entrypoints (the workflow config, scripts used by it and hydra config, etc.) 3. exclude unrelated Python code from other components (e.g., recipes, examples, Megatron, SFT, generation, evaluation, etc. for FSDP training) An example from `e2e_ppo_trainer`: ```yaml on: paths: - "*/.py" # Entrypoints - ".github/workflows/e2e_ppo_trainer.yml" - "examples/data_preprocess/gsm8k.py" - "examples/data_preprocess/geo3k.py" - "tests/e2e/ppo_trainer" - "verl/trainer/main_ppo.py" - "verl/trainer/config/ppo_trainer.yaml" - "!examples" - "!verl/trainer/main_.py" - "!verl/trainer/fsdp_sft_trainer.py" # Recipes - "!recipe" # Megatron - "!verl/workers//megatron_.py" ``` #### Avoid missing out errors Some test scripts didn't end with the main python command and might miss out the error. To address this issue, this PR introduces the following options: ```bash set -xeuo pipefail ``` , which means - `x`: Print each command before executing it (useful for debugging) - `e`: Exit immediately if any command fails (returns non-zero exit status) - `u`: Treat unset variables as an error - `o pipefail`: Return the exit status of the last command in a pipeline that failed, or zero if all succeeded Together, these options make the script fail fast and provide verbose output, which helps with debugging and ensuring the script doesn't continue after encountering errors. #### Others Besides, we also 1. unify runner labels into `"L20x8"` to enable preemptive scheduling of jobs 2. reduce test scripts of minimal differences, grouping by entrypoint (e.g. `ppo_trainer`, `ppo_megatron_trainer`, recipes, etc.), into a base script with options	2025-04-14 09:48:10 -07:00
Ikko Eltociear Ashimine	d7978b66d9	chore: update diagnose.py (#1078 ) occured -> occurred	2025-04-14 21:35:57 +08:00
Chi Zhang	f6b9bcc359	[logger] fix: fix mlflow (#1073 )	2025-04-14 18:13:17 +08:00
Shawn/Yuxuan Tong	866e9808d4	[CI] feat: unify CI label to enbale preemptive schedule for jobs (#1072 )	2025-04-14 16:52:30 +08:00
Yan Bai	0a4f4b3cc1	mcore readme (#1071 ) add a doc for mcore	2025-04-14 16:29:51 +08:00
Wenjie Zhao	c46d542772	fix: replace '@' with '_at_' in metric names to comply with MLflow naming constraints (#984 ) Fix MLflow metric name errors by replacing '@' with '_at_' during MLflow logging MLflow rejects metric names with '@' as below `mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Invalid metric name: 'val-aux/semantic_matching/reward/mean@1'. Names may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ), and slashes (/).` Co-authored-by: wenjie zhao <aswenjie@amazon.com>	2025-04-14 16:25:09 +08:00
Blue Space	f976b1853d	Update vllm 0.8.2 with megatron 0.11.0 (#1054 ) Parts of #851 Including minimal of upgrade: 1. vllm 0.8.2 with megatron 2. part of per-tensor allgather and load weights 3. fix bugs with context parallel, because of dataloader random seed, seems behavior changed in torch 2.6.0	2025-04-14 09:27:35 +08:00
Blue Space	d9df9bbb5f	Fix megatron default config (#1053 ) #1047 may cause some case fail with tp=1, since megatron prohibit use sequence parallel in that case. Now we still default enable sp for user to write scripts conviniently, and automatically enable and disable sp inside `_validata_config`	2025-04-14 01:33:18 +08:00
Chayenne	c4b5f097af	docs: update sglang_worker author list and image (#1045 )	2025-04-13 07:43:36 -07:00
Yan Bai	d4cae44726	[mcore] option to use dist checkpoint (#1030 ) mcore dist checkpointing is a parallel-invariant weight format, you can save and load in arbitrary parallel settings. e.g. save in tp2pp2 and load in tp4pp1. This PR introduce an option to use dist checkpoint with mcore backend. It is disabled by default for backward compatibility. But future support for mcore MoE models and VLM models will work only when dist ckpt is enabled for a easier implementation. Before this PR, when initing actor and critic workers, each GPU would load the entire huggingface weights and then re-shard to correct mcore model state dict, making the procedure slow and complicated. With this PR, we convert hf weight to dist ckpt by offline scripts, and each GPU will only load its parts from dist ckpt. The speed is faster and no more online resharding needed. When loading `Qwen2-7B-Instruct` for critic worker, the loading time reduced from 109s to 25s, speedup by 4.36x The `converter_hf_to_mcore.py` in this version use existing online resharding function to convert weights. And it should be refactored for better efficiency and MoE/VLM models. Thanks to #998 for the optimization of loading hf weight only at GPU 0. Future TODO: * refactor the converter for efficiency * support converting MoE models * support converting VLM models * re-design `megatron_checkpoint_manager.py` with dist ckpt * implement converter from mcore dist ckpt to hf / `model_merger.py` * add docs and example scripts	2025-04-13 17:59:43 +08:00
NascentAscension	6dd5e39a11	fix: Megatron_workers batch_size config is not processed correctly (#1029 ) The following two batch_sizes don't work correctly when using megatron backend: 1. actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu for update_actor() 2. actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu for compute_ref_log_prob() #1028	2025-04-13 17:27:21 +08:00
Blue Space	9830b17ba2	fix checkpoint rng_states confliction (#1046 ) Only 1 node in a machine save rng_states to avoid conflicts and read properly New version of torch.save can cause races here. FSDP also split the rng_states in extra states	2025-04-13 16:01:06 +08:00
Blue Space	eda9f0e9be	reset default tp size (#1047 )	2025-04-13 05:58:22 +08:00
HL	d882b62b01	tests: add import utils tests (#1042 )	2025-04-11 18:55:54 -07:00
Chayenne	dc1714a428	docs: update sglang_worker authors (#1038 ) Add full authors of SGLang RL team. Thanks!	2025-04-11 11:19:07 -07:00
Shawn/Yuxuan Tong	7a4242324c	[log] fix: val-core pattern (#1012 ) This PR 1. fixes the problem that important metrics like `"mean@{n}"` can not be recognized as `val-core` due to lack to `/...` at the end 2. removes `"std@{n}"` from `val-core`	2025-04-11 09:51:38 -07:00
Happy	379945b0d3	docs: update README.md for TRPA (#1034 )	2025-04-11 09:50:43 -07:00
G.O.D	8491b9c56d	docs: fix doc typo (#1035 )	2025-04-11 09:49:53 -07:00
Qunhong Zeng	6cbfa48a90	fix: use packaging to compre versions instead of str comparing (#1027 ) Use `packaging.version` to compare tensordict's version instead of string comparing, string comparing sometimes will fail, for example, "0.10.0" < "0.5.0" when using string comparing. Also remove a unnecessary return_type since this return_type will always be `DataProtoItem`.	2025-04-11 17:20:47 +08:00
Shawn/Yuxuan Tong	a9bf431075	[recipe] fix: loss_agg_mode for dapo early (#1018 )	2025-04-11 11:09:37 +02:00
Taiwei Shi	d2e602a1a7	docs: add AdaRFT to awesome work using verl (#1024 )	2025-04-10 22:11:33 -07:00
Qunhong Zeng	3256142434	[Breaking] dataset: support customized datasets for RayPPOTrainer (#924 ) This PR enable user to specify their customized dataset for RayPPOTrainer. NOTE: the RLHFDataset interface has been broken into: ``` RLHFDataset( data_files: Union[str, List[str]], tokenizer: PreTrainedTokenizer, config: DictConfig, processor: Optional[ProcessorMixin] = None ) ``` and the custom dataset class MUST also use this interface. cc @eric-haibin-lin	2025-04-10 22:07:42 -07:00
Chi Zhang	c9e3c57cf8	[megatron] feat: optimize entropy loss (#1007 )	2025-04-11 09:37:37 +08:00
HL	3fbb1ad7ed	[sglang] docs: fix README index (#1016 )	2025-04-11 09:29:09 +08:00
BearBiscuit	c62e7ac7bc	[tuning] docs: add more case for grpo train (#983 ) Co-authored-by: HL <linhaibin.eric@gmail.com>	2025-04-10 15:16:47 -07:00
BearBiscuit	aa58617c69	[sglang] docs: add quickstart doc to use sglang in verl (#1001 ) Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Junrong Lin <33685709+ocss884@users.noreply.github.com>	2025-04-10 08:34:12 -07:00
BearBiscuit	550bbbbffe	[vllm] fix oom when vllm wakeup (vllm >=0.8.3) (#987 ) This is a memory optimization method implemented based on this [fix](https://github.com/vllm-project/vllm/pull/15500). I just successfully ran a 72B model on 8*H800 cards. Before the fix, I would encounter an OOM issue. Please note that this fix is only effective for vLLM >= 0.8.3.	2025-04-10 18:07:10 +08:00
Yan Bai	9f405b48a4	[Mcore] context parallel (#970 ) support context parallel for mcore backend. Changes on: * configs * model loader * checkpint * single control dispatcher * forward preprocess and postprocess --------- Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>	2025-04-10 13:05:58 +08:00
Tony Yu Cao	90f5ce15de	Change behaviour during raw prompt extraction (#989 ) This PR suggest a fix on a bug that when `_switch_chat_template()` method is called. According to https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L222 `data.non_tensor_batch['raw_prompt'][i]` is already a list if `data.return_raw_chat=True`. Calling `.tolist()` again will result an error. Now we check if it is a list before run this method.	2025-04-10 09:04:20 +08:00
HL	babd2c183c	docs: update recent talks (#996 )	2025-04-10 09:03:43 +08:00
SunJin Kim	1ee730163f	fix: add seed to vllm spmd 0.8.3 (#912 ) See `8b664706aa` In summary, now when using external launcher in vLLM, a Seed must be set. --------- Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>	2025-04-09 17:40:52 +08:00
Fangkai Jiao	fefe951f2a	Add support to HSDP model merging. (#971 ) Currently the model merger does not support HSDP (the `ddp` mesh dim is not considered). This PR fixes this.	2025-04-09 07:55:39 +08:00
Lumeng Wu	88db554073	fix: wrong pg_clipfrac_lower (#972 ) Currently, `pg_clipfrac_lower` is always 0 by mistake.	2025-04-09 07:55:22 +08:00
BearBiscuit	fa23b696dd	[tuning] docs: record the resource requirements for 70b model (#976 ) Co-authored-by: HL <linhaibin.eric@gmail.com>	2025-04-08 11:33:44 -07:00
Shawn/Yuxuan Tong	1a42f14da0	fix: reward_fn_key for PRIME (#975 )	2025-04-08 20:03:21 +08:00
Shawn/Yuxuan Tong	713e99e6a1	fix: DAPO wandb link (#978 )	2025-04-08 20:02:33 +08:00
Shawn/Yuxuan Tong	6433fd4a97	fix: return list from bootstrap_metric (#969 ) Fixing #950	2025-04-08 16:28:43 +08:00
HL	96f7177972	docs: add open-hands, vagen (#963 )	2025-04-08 14:11:05 +08:00
Zhe Chen	fd0eba03cd	fix: optim.warmup_style do not take effect (#418 ) (#959 ) Support to set warmup_style=='cosine'.	2025-04-08 11:57:40 +08:00
Qunhong Zeng	8400beb87c	[merger] fix: move megatron import into megatron related branch (#958 ) users using fsdp backend may no have megatron installed, directly running this script will lead to an import error.	2025-04-07 09:50:21 -07:00
Hongpeng Guo	c87e9f69e5	[distributed] enhancement: Make `register_center` named actor waiting time configurable & providing better error info (#947 ) ## Summary As mentioned in #491, the `register_center` named actor could be `None` after 2mins waiting time and crash the job for some verl users. This might be due to (1) uncleaned ray resources from previous runs; or (2) too short waiting time of 120s if the `named_actor` launching task is delayed in the cluster. This PR makes the `register_cetner` named actor waiting time configurable and longer by default . This PR also provides better error info to help users to self debug the issue. ## Related issues #491 --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-04-07 09:48:27 -07:00
Altair-Alpha	f20f552873	fix: support non-DTensor when converting fsdp checkpoints to hf model (#925 ) As mentioned in https://github.com/volcengine/verl/issues/903, the model_merger script has some problem when dealing with saved fsdp checkpoint trained with `trainer.n_gpus_per_node=1`. The loaded `weight` is of type `Tensor` instead of `DTensor`. This PR supported this situation.	2025-04-07 15:46:33 +08:00
Changlong Yu	d13434fd7b	[megatron] feat: add gradient checkpointing in megatron backend (#944 ) ### Changes Add gradient checkpointing (aka `activation recomputation`) config and support from Megatron core (`b7ec711cf6/megatron/core/transformer/transformer_config.py (L208-L233)`) to make activation checkpointing more efficient for LLMs with 20B+ parameters. ``` gradient_checkpointing_kwargs: activations_checkpoint_method: null activations_checkpoint_granularity: null activations_checkpoint_num_layers: null ``` ### Test Tested on loading Qwen7b/32b of 16k input prompts and bypass the OOM issues after adding gradient checkpointing. ### Next Step Add one `ppo_trainer for megatron` doc to explain the config details in https://verl.readthedocs.io/en/latest/examples/config.html	2025-04-06 20:49:20 -07:00
Mert Unsal	82cbc43dc7	feat: Batch Rewards (#871 )	2025-04-07 11:15:48 +08:00
XinyuanTong	6efa0181fa	[sglang] feat: enhence sglang_rollout to handle image input (#824 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: GeLee <leege233@gmail.com> Co-authored-by: ocss884 <ocss.lin@gmail.com>	2025-04-06 12:01:55 -07:00
BearBiscuit	f8d19735c5	[vllm] feat: enable FSDP and vLLM 0.8.2 to support DPSK v3 training. (#923 ) This is a solution to an error that occurs when vLLM 0.8.2 loads models in the ds_v3 MoE format. For MoE models, directly calling vLLM's `load_weights· function will result in an error, primarily because the model params lose the related methods of the `FusedMoe` class. Therefore, it is necessary to identify and call the specific parameter loading method based on the model parameter names at runtime. However, the current version is a dirty implementation because, in reality, if invasive changes were made to vLLM, it would only require modifying 2 lines of code and adding a new function to identify the layer count (I have already marked comments in the code). I’m not sure if there’s a better implementation and would like some suggestions.	2025-04-06 10:54:31 -07:00
Shawn/Yuxuan Tong	d2c60642e7	[log] fix: validation metrics of reward and maj voting (#927 ) This PR: 1. calculate metrics for "reward" by default if "acc"s are not avaiblable 2. don't calculate majority voting metrics if "pred"s are not available	2025-04-06 10:42:03 -07:00
Junrong Lin	7753d37d71	[sglang] ci: upgrade to sglang 0.4.4.post4 (#941 )	2025-04-06 10:33:53 -07:00
none0663	7afa6c6225	[rf++] style: eos_mask to response_mask for reinforce++-baseline method (#938 ) [fix] misleading eos_mask->response_mask for reinforce_plus_plus_baseline.	2025-04-06 10:10:07 -07:00
HL	40c00c5d52	[ci] chore: reduce CI load part-2 (#942 )	2025-04-07 01:07:58 +08:00
HL	526c0908be	[ci] chore: reduce CI load (#934 )	2025-04-06 10:06:10 -07:00
Fengqing Jiang	15263cb86a	prompt: Fix computer_score in fsdp_workers.py (#629 ) Fix issue when not switching the chat template, `rm_data` is undefined --------- Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>	2025-04-05 23:00:54 -07:00
HL	7471e015d2	docs: add triton compile err to faq (#809 )	2025-04-06 13:16:20 +08:00
Shawn/Yuxuan Tong	4d722b1768	fix: seq-mean-token-sum loss (#931 )	2025-04-06 06:54:39 +02:00
Junrong Lin	7fc8330d99	[sglang] feat: SGLang rollout multinode support (#915 ) Allow multinode tensor parallel for furture plan --------- Co-authored-by: zobinHuang <zobin1999@gmail.com> Co-authored-by: Jin Pan <jpan236@wisc.edu>	2025-04-05 20:17:35 -07:00
BearBiscuit	de46048420	[vllm] fix: skip vllm initialization with weight loading (#922 ) In version 0.8.2, forgetting to add `dummy` parameter resulted in repeated loading. And it also needs to be compatible with the default parameter `dummy_dtensor`.	2025-04-05 19:42:34 -07:00
Junrong Lin	cc6dd901f7	docker: add verl-sglang dockerfile (#930 ) As stated in #915 , add the dockerfile for building verl-sglang image	2025-04-05 10:03:50 -07:00
Shawn/Yuxuan Tong	8447937cb8	[math-verify] fix: TimeoutException (#929 )	2025-04-05 08:41:05 -07:00
hijkzzz	30259d2c0b	Feat: support REINFORCE++-baseline and add script for REINFORCE++ (#908 ) Refer to the paper REINFORCE++ (https://arxiv.org/abs/2501.03262) and the OpenRLHF project (https://github.com/OpenRLHF/OpenRLHF). We find that the RF++-baseline demonstrates greater stability than GRPO, particularly in mathematical scenarios and reasoning tasks.	2025-04-04 18:59:57 -07:00
Xingyao Wang	fb0394143f	feat: Add multi-turn SFT support (#195 )	2025-04-04 16:17:06 -07:00
HL	4f245a3bd7	tool: add diagnosis script (#918 ) add dependency detector for vllm/sglang, as well as cuda info usage: `python3 scripts/diagnose.py`	2025-04-04 22:51:07 +02:00
HL	0fc8e77b59	docs: update installation and adoption docs (#921 )	2025-04-04 22:35:48 +02:00
Qunhong Zeng	0407cad23b	[dataset] refactor: remove unused filter_prompts parameter from RLHFDataset (#889 ) `filter_prompts` has never been used, I think this parameter has been replaced by `filter_overlong_prompts` so we can simply remove this.	2025-04-04 09:32:49 -07:00
Shawn/Yuxuan Tong	d5a1c810bd	fix: set gen_batch_size based on config (#909 )	2025-04-04 09:31:07 -07:00
Qunhong Zeng	6974bbaeea	[dataset] refactor: use hf Dataset instead of pandas DataFrame in RLHFDataset for speedup (#890 ) HF Dataset provides better memory management and can handle larger datasets. It also supports multi-process acceleration during map/filter operations (while pandas requires version >2.0). Now we can specify `filter_overlong_prompts` on large-scale datasets when set `filter_overlong_prompts_workers` to a appreciate num. --------- Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>	2025-04-03 21:51:53 -07:00
Shawn/Yuxuan Tong	6d931df9ad	[log] fix: log after generate_sequences (#819 )	2025-04-03 21:39:22 -07:00
Lumeng Wu	f9256b8dbf	[algo] misc: remove redundant tile([1, response_length]), efficient broadcast instead (#868 ) as titled	2025-04-04 10:29:11 +08:00
Shawn/Yuxuan Tong	3a27a98647	[recipe] feat: integrate DAPO and provide reproduction script (#623 ) > [!WARNING] > As mentioned in https://github.com/volcengine/verl/pull/623#issuecomment-2733688593, the implementation of gradient accumulation in verl has been only compatible with the sequence-mean loss, but all the DAPO experiments with the token-mean loss were run with the incompatible implementation. > We keep it as is for reproducibility in this branch and will fix it in another PR for the main branch. --------- Co-authored-by: Guangming Sheng <shengguangming@bytedance.com> Co-authored-by: Guangming Sheng <petershengwhu@gmail.com>	2025-04-04 05:46:47 +08:00
Shawn/Yuxuan Tong	cc612dbae6	[dev] feat: default VSCode repo settings to help consistency with CI (#894 ) This PR adds default VSCode repo settings to help keep consistent with the CI, which: 1. enable the `pylint` linter extension 2. set the default formatter as `yapf` 3. but don't organize imports for now (since we haven't got a functionality for this)	2025-04-04 03:36:47 +08:00
BearBiscuit	1a7e53d076	test: update vllm_spmd test for > 0.7.3 (#861 ) I tested the `Deepseek-7B-chat` and `Qwen2-7B` these two models, the former showed a 0% difference in output, while the latter exhibited a 10.25% difference in output, with no significant issues in the output. So I manually adjusted the error tolerance to 15%. I’m not sure if this will work.	2025-04-04 00:53:58 +08:00
Blue Space	0338805954	reuse GPTModel, try to fix CI issue (#884 ) Also try to reduce CI time in this version, grpo hangs too much tasks in L20-1 In current CI device mapping: Ckpt 0 16m Dataset 1 1m dcf 1 6m dc 0 6m Ea 1 8m grpo 0 15m grpo 1 15m Mega 0 15m Gp 1 2m Gsm8k 1 24m Lora 1 1m sft 1 2m sglang 1 3m Vlm 1 5m Model 1 3m Ray 0 3m Sandbox 0 1m Vllm 0 12m 0 16+6+15+15+3+1+12=68 1 1+6+8+15+2+24+1+2+3+5+3=70	2025-04-03 23:52:52 +08:00
JackieWu	b6dc157202	fix: the error is not raised when using both megatron and hf inference (#885 ) Hi there, when using both megatron and `actor_rollout_ref.rollout.name=hf`, the NotImplementedError is not raised. The PR fixes it. ``` (TaskRunner pid=229016) File "/root/verl/verl/workers/megatron_workers.py", line 285, in _build_rollout (TaskRunner pid=229016) return rollout, sharding_manager (TaskRunner pid=229016) UnboundLocalError: local variable 'rollout' referenced before assignment ```	2025-04-03 17:17:04 +08:00
Yuyang Ding	b0e0ac5da7	docs: add config docs for evaluation.yaml (#886 ) https://github.com/volcengine/verl/pull/777#discussion_r2024195591	2025-04-03 17:16:30 +08:00
Lumeng Wu	8cae42dc29	fix: misleading eos_mask->response_mask (#878 ) https://github.com/volcengine/verl/pull/868#discussion_r2024416560	2025-04-03 13:01:07 +08:00
HL	7895c1f472	docs: add megatron grpo qwen2 training logs (#881 )	2025-04-03 13:00:24 +08:00
HL	81a15ed78a	revert: "Use Mcore GPTModel" (#883 ) Reverts volcengine/verl#706 temporarily as it breaks CI https://github.com/volcengine/verl/actions/runs/14220739954/attempts/2 ``` (TaskRunner pid=10086) 'Initial validation metrics: {}' (TaskRunner pid=10086) step:0 (TaskRunner pid=10086) list(reward_extra_infos_dict.keys())=[] (TaskRunner pid=10086) test_gen_batch meta info: {'eos_token_id': 32021, 'pad_token_id': 32014, 'recompute_log_prob': False, 'do_sample': False, 'validate': True} (TaskRunner pid=10086) validation generation end (TaskRunner pid=10086) [prompt] You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer (TaskRunner pid=10086) ### Instruction: (TaskRunner pid=10086) Training Progress: 33%\|███▎ \| 1/3 [02:39<05:18, 159.11s/it] (WorkerDict pid=18977) /root/miniconda3/lib/python3.10/site-packages/torch/autograd/graph.py:768: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.) [repeated 7x across cluster] (WorkerDict pid=18977) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass [repeated 7x across cluster] (TaskRunner pid=10086) Training Progress: 33%\|███▎ \| 1/3 [04:51<09:43, 291.93s/it] (WorkerDict pid=18980) [rank4]:[E402 16:49:38.988158820 ProcessGroupNCCL.cpp:1515] [PG 97 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered (WorkerDict pid=18980) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. (WorkerDict pid=18980) For debugging consider passing CUDA_LAUNCH_BLOCKING=1 (WorkerDict pid=18980) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. (WorkerDict pid=18980) (WorkerDict pid=18980) Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first): (WorkerDict pid=18980) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fc6e4177f86 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10.so) (WorkerDict pid=18980) frame #1: c10::detail::torchCheckFail(char const, char const, unsigned int, std::string const&) + 0x64 (0x7fc6e4126d10 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10.so) (WorkerDict pid=18980) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const, char const, int, bool) + 0x118 (0x7fc6e4594f08 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10_cuda.so) (WorkerDict pid=18980) frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fc6927d2a56 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) (WorkerDict pid=18980) frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7fc6927d7c70 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) (WorkerDict pid=18980) frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7fc6927de92a in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) (WorkerDict pid=18980) frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7fc6927e0d6c in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) (WorkerDict pid=18980) frame #7: <unknown function> + 0xdbbf4 (0x7fc9fd477bf4 in /root/miniconda3/bin/../lib/libstdc++.so.6) (WorkerDict pid=18980) frame #8: <unknown function> + 0x94ac3 (0x7fc9ff2f0ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) (WorkerDict pid=18980) frame #9: clone + 0x44 (0x7fc9ff381a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) (WorkerDict pid=18980) (WorkerDict pid=18980) [2025-04-02 16:49:38,666 E 18980 20767] logging.cc:97: Unhandled exception: N3c1016DistBackendErrorE. what(): [PG 97 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered (WorkerDict pid=18980) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. (WorkerDict pid=18980) For debugging consider passing CUDA_LAUNCH_BLOCKING=1 (WorkerDict pid=18980) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. (WorkerDict pid=18980) (WorkerDict pid=18980) Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first): (WorkerDict pid=18980) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fc6e4177f86 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10.so) (WorkerDict pid=18980) frame #1: c10::detail::torchCheckFail(char const, char const, unsigned int, std::string const&) + 0x64 (0x7fc6e4126d10 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10.so) (WorkerDict pid=18980) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const, char const, int, bool) + 0x118 (0x7fc6e4594f08 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10_cuda.so) (WorkerDict pid=18980) frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fc6927d2a56 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) (WorkerDict pid=18980) frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7fc6927d7c70 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) (WorkerDict pid=18980) frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7fc6927de92a in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) (WorkerDict pid=18980) frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7fc6927e0d6c in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) (WorkerDict pid=18980) frame #7: <unknown function> + 0xdbbf4 (0x7fc9fd477bf4 in /root/miniconda3/bin/../lib/libstdc++.so.6) (WorkerDict pid=18980) frame #8: <unknown function> + 0x94ac3 (0x7fc9ff2f0ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) (WorkerDict pid=18980) frame #9: clone + 0x44 (0x7fc9ff381a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) (WorkerDict pid=18980) (WorkerDict pid=18980) Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1521 (most recent call first): (WorkerDict pid=18980) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fc6e4177f86 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libc10.so) (WorkerDict pid=18980) frame #1: <unknown function> + 0xe1a5e4 (0x7fc6924625e4 in /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) (WorkerDict pid=18980) frame #2: <unknown function> + 0xdbbf4 (0x7fc9fd477bf4 in /root/miniconda3/bin/../lib/libstdc++.so.6) (WorkerDict pid=18980) frame #3: <unknown function> + 0x94ac3 (0x7fc9ff2f0ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) (WorkerDict pid=18980) frame #4: clone + 0x44 (0x7fc9ff381a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) (WorkerDict pid=18980) (WorkerDict pid=18980) [2025-04-02 16:49:38,675 E 18980 20767] logging.cc:104: Stack trace: (WorkerDict pid=18980) /root/miniconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xfe543a) [0x7fc9fe5a143a] ray::operator<<() (WorkerDict pid=18980) /root/miniconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xfe7b78) [0x7fc9fe5a3b78] ray::TerminateHandler() (WorkerDict pid=18980) /root/miniconda3/bin/../lib/libstdc++.so.6(+0xb135a) [0x7fc9fd44d35a] __cxxabiv1::__terminate() (WorkerDict pid=18980) /root/miniconda3/bin/../lib/libstdc++.so.6(+0xb13c5) [0x7fc9fd44d3c5] (WorkerDict pid=18980) /root/miniconda3/bin/../lib/libstdc++.so.6(+0xb134f) [0x7fc9fd44d34f] (WorkerDict pid=18980) /root/miniconda3/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so(+0xe1a695) [0x7fc692462695] c10d::ProcessGroupNCCL::ncclCommWatchdog() (WorkerDict pid=18980) /root/miniconda3/bin/../lib/libstdc++.so.6(+0xdbbf4) [0x7fc9fd477bf4] execute_native_thread_routine (WorkerDict pid=18980) /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fc9ff2f0ac3] (WorkerDict pid=18980) /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7fc9ff381a04] __clone (WorkerDict pid=18980) (WorkerDict pid=18980) * SIGABRT received at time=1743612578 on cpu 118 * (WorkerDict pid=18980) PC: @ 0x7fc9ff2f29fc (unknown) pthread_kill (WorkerDict pid=18980) @ 0x7fc9ff29e520 (unknown) (unknown) (WorkerDict pid=18980) [2025-04-02 16:49:38,675 E 18980 20767] logging.cc:361: * SIGABRT received at time=1743612578 on cpu 118 * (WorkerDict pid=18980) [2025-04-02 16:49:38,675 E 18980 20767] logging.cc:361: PC: @ 0x7fc9ff2f29fc (unknown) pthread_kill (WorkerDict pid=18980) [2025-04-02 16:49:38,675 E 18980 20767] logging.cc:361: @ 0x7fc9ff29e520 (unknown) (unknown) (WorkerDict pid=18980) Fatal Python error: Aborted (WorkerDict pid=18980) (WorkerDict pid=18980) (WorkerDict pid=18980) Extension modules: msgpack._cmsgpack, google._upb._message, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, _brotli, zstandard.backend_c, uvloop.loop, ray._raylet, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, markupsafe._speedups, PIL._imaging, msgspec._core, sentencepiece._sentencepiece, PIL._imagingft, regex._regex, multidict._multidict, yarl._helpers_c, yarl._quoting_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, pyarrow._json, zmq.backend.cython.context, zmq.backend.cython.message, zmq.backend.cython.socket, zmq.backend.cython._device, zmq.backend.cython._poll, zmq.backend.cython._proxy_steerable, zmq.backend.cython._version, zmq.backend.cython.error, zmq.backend.cython.utils (total: 96) Error executing job with overrides: ['algorithm.adv_estimator=gae', 'data.train_files=/github/home/data/gsm8k/train.parquet', 'data.val_files=/github/home/data/gsm8k/test.parquet', 'data.train_batch_size=1024', 'data.max_prompt_length=512', 'data.max_response_length=512', 'actor_rollout_ref.model.path=/github/home/models/deepseek-ai/deepseek-coder-1.3b-instruct', 'actor_rollout_ref.actor.optim.lr=2e-6', 'actor_rollout_ref.actor.ppo_mini_batch_size=256', 'actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4', 'actor_rollout_ref.actor.megatron.pipeline_model_parallel_size=2', 'actor_rollout_ref.actor.megatron.virtual_pipeline_model_parallel_size=2', 'actor_rollout_ref.actor.megatron.tensor_model_parallel_size=4', 'actor_rollout_ref.actor.use_kl_loss=False', 'actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8', 'actor_rollout_ref.rollout.tensor_model_parallel_size=2', 'actor_rollout_ref.rollout.name=vllm', 'actor_rollout_ref.rollout.gpu_memory_utilization=0.5', 'actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16', 'actor_rollout_ref.ref.megatron.pipeline_model_parallel_size=2', 'actor_rollout_ref.ref.megatron.virtual_pipeline_model_parallel_size=2', 'actor_rollout_ref.ref.megatron.tensor_model_parallel_size=2', 'critic.optim.lr=2e-5', 'critic.model.path=/github/home/models/deepseek-ai/deepseek-coder-1.3b-instruct', 'critic.model.enable_gradient_checkpointing=False', 'critic.ppo_micro_batch_size_per_gpu=4', 'critic.megatron.pipeline_model_parallel_size=2', 'critic.megatron.virtual_pipeline_model_parallel_size=2', 'critic.megatron.tensor_model_parallel_size=2', 'algorithm.use_kl_in_reward=True', 'algorithm.kl_penalty=kl', 'algorithm.kl_ctrl.kl_coef=0.001', 'trainer.critic_warmup=0', 'trainer.logger=[console]', 'trainer.project_name=verl_megatron_gsm8k_examples', 'trainer.experiment_name=deepseek_llm_1b3_function_rm', 'trainer.n_gpus_per_node=8', 'trainer.nnodes=1', 'trainer.save_freq=-1', 'trainer.test_freq=1', 'trainer.total_epochs=15', 'trainer.total_training_steps=3'] (TaskRunner pid=10086) Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market? Let's think step by step and output the final answer after "####". (TaskRunner pid=10086) ### Response: (TaskRunner pid=10086) (TaskRunner pid=10086) [response] I'm sorry, but as an AI programming assistant, I'm specialized in answering questions related to computer science. I'm not equipped to provide answers to questions about economics or business calculations. I recommend using a calculator or a business-oriented tool for this type of question. (TaskRunner pid=10086) (TaskRunner pid=10086) [ground_truth] 18 (TaskRunner pid=10086) [score] 0.0 (TaskRunner pid=10086) step:1 - global_seqlen/min:[486](https://github.com/volcengine/verl/actions/runs/14220739954/job/39861249946#step:6:487)35.000 - global_seqlen/max:51694.000 - global_seqlen/minmax_diff:3059.000 - global_seqlen/balanced_min:49636.000 - global_seqlen/balanced_max:49637.000 - global_seqlen/mean:49636.125 - actor/reward_kl_penalty:0.000 - actor/reward_kl_penalty_coeff:0.001 - critic/vf_loss:0.015 - critic/vf_clipfrac:0.001 - critic/vpred_mean:0.007 - perf/mfu/critic:0.105 - actor/entropy_loss:0.550 - actor/pg_loss:-0.000 - actor/pg_clipfrac:0.018 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - perf/mfu/actor:0.106 - critic/score/mean:0.000 - critic/score/max:0.000 - critic/score/min:0.000 - critic/rewards/mean:0.000 - critic/rewards/max:0.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.000 - critic/advantages/max:4.994 - critic/advantages/min:-5.666 - critic/returns/mean:-0.000 - critic/returns/max:0.000 - critic/returns/min:-0.000 - critic/values/mean:-0.164 - critic/values/max:0.785 - critic/values/min:-1.000 - critic/vf_explained_var:-2803.085 - response_length/mean:239.112 - response_length/max:512.000 - response_length/min:11.000 - response_length/clip_ratio:0.029 - prompt_length/mean:148.670 - prompt_length/max:275.000 - prompt_length/min:106.000 - prompt_length/clip_ratio:0.000 - timing_s/gen:18.608 - timing_s/old_log_prob:15.249 - timing_s/ref:14.[488](https://github.com/volcengine/verl/actions/runs/14220739954/job/39861249946#step:6:489) - timing_s/values:16.315 - timing_s/adv:0.264 - timing_s/update_critic:33.651 - timing_s/update_actor:33.472 - timing_s/testing:25.497 - timing_s/step:157.587 - timing_per_token_ms/adv:0.001 - timing_per_token_ms/gen:0.076 - timing_per_token_ms/update_actor:0.084 - timing_per_token_ms/values:0.041 - timing_per_token_ms/update_critic:0.085 - timing_per_token_ms/ref:0.036 - perf/total_num_tokens:397089.000 - perf/time_per_step:157.587 - perf/throughput:314.976 (TaskRunner pid=10086) list(reward_extra_infos_dict.keys())=[] (TaskRunner pid=10086) test_gen_batch meta info: {'eos_token_id': 32021, 'pad_token_id': 32014, 'recompute_log_prob': False, 'do_sample': False, 'validate': True} (WorkerDict pid=18980) WARNING 04-02 16:49:38 model_runner_base.py:143] Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered (WorkerDict pid=18980) WARNING 04-02 16:49:38 model_runner_base.py:143] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. (WorkerDict pid=18980) WARNING 04-02 16:49:38 model_runner_base.py:143] For debugging consider passing CUDA_LAUNCH_BLOCKING=1 (WorkerDict pid=18980) WARNING 04-02 16:49:38 model_runner_base.py:143] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. (WorkerDict pid=18980) WARNING 04-02 16:49:38 model_runner_base.py:143] Traceback (most recent call last): File "/data00/tiger/huggingface/verl/verl/verl/trainer/main_ppo.py", line 54, in main run_ppo(config) File "/data00/tiger/huggingface/verl/verl/verl/trainer/main_ppo.py", line 72, in run_ppo ray.get(runner.run.remote(config)) File "/root/miniconda3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper return fn(args, kwargs) File "/root/miniconda3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper return func(args, *kwargs) File "/root/miniconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 2667, in get values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout) File "/root/miniconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 864, in get_objects raise value.as_instanceof_cause() ray.exceptions.RayTaskError(RuntimeError): ray::TaskRunner.run() (pid=10086, ip=172.20.0.2, actor_id=11bc451866f5759f3a7f540[501](https://github.com/volcengine/verl/actions/runs/14220739954/job/39861249946#step:6:502)000000, repr=<main_ppo.TaskRunner object at 0x7fd00c61a110>) File "/data00/tiger/huggingface/verl/verl/verl/trainer/main_ppo.py", line 184, in run trainer.fit() File "/data00/tiger/huggingface/verl/verl/verl/trainer/ppo/ray_trainer.py", line 950, in fit val_metrics: dict = self._validate() File "/data00/tiger/huggingface/verl/verl/verl/trainer/ppo/ray_trainer.py", line 545, in _validate test_output_gen_batch_padded = self.actor_rollout_wg.generate_sequences(test_gen_batch_padded) File "/data00/tiger/huggingface/verl/verl/verl/single_controller/ray/base.py", line 42, in func output = ray.get(output) ray.exceptions.RayTaskError(RuntimeError): ray::WorkerDict.actor_rollout_generate_sequences() (pid=18980, ip=172.20.0.2, actor_id=4f21075809bd462a5907ebea01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7fc62ae1ce20>) File "/root/miniconda3/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1708, in execute_model output: SamplerOutput = self.model.sample( File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 571, in sample next_tokens = self.sampler(logits, sampling_metadata) File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl return self._call_impl(args, *kwargs) File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl return forward_call(args, **kwargs) File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 231, in forward self._init_sampling_tensors(logits, sampling_metadata) File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 195, in _init_sampling_tensors do_min_p) = SamplingTensors.from_sampling_metadata( File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 471, in from_sampling_metadata sampling_tensors = SamplingTensors.from_lists( File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 529, in from_lists temperatures_t = torch.tensor( RuntimeError: CUDA error: an illegal memory access was encountered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. ```	2025-04-03 07:38:21 +08:00
Yuyang Ding	233c11c173	[recipe] r1: support R1 Benchmark Evaluation (#777 ) https://github.com/volcengine/verl/issues/708 Support Evaluaton: - [x] GPQA Diamond (english) - [x] LiveCodeBench (code) - [x] AIME 2024 (math) - [x] CNMO 2024 (math) Test - [x] DS-R1-Distill-Qwen2.5-1.5B - [x] DS-R1 --- Example Eval Scripts in `recipes/r1/run_r1_distill_qwen.sh` --- Eval Results of DS-R1-Distill-Qwen2.5-1.5B (k=8) Dataset \| Test Results \| Reported -- \| -- \| -- GPQA Diamond \| 35.3 \| 33.8 LiveCodeBench \| 16.9 \| 16.9 AIME 2024 \| 30.4 \| 28.9 CNMO 2024 (en) \| 45.1 \| - CNMO 2024 (zh) \| 41.0 \| - --- Eval Results (DS-R1) Dataset \| Test Results (k=1) \| Test Results (k=4) \| Reported -- \| -- \| -- \| -- GPQA Diamond \| 67.7 \| 69.6 \| 71.5 LiveCodeBench \| 64.7 \| 63.1 \| 65.9 AIME 2024 \| 86.7 \| 79.2 \| 79.8 CNMO 2024 \| 75.0 \| 78.5 \| 78.8 The final eval results will be placed [here](https://huggingface.co/datasets/dyyyyyyyy/r1-benchmark-eval).	2025-04-02 10:02:15 -07:00
Yan Bai	b6cd6b759e	Use Mcore GPTModel (#706 ) Use official GPTModel in megatron worker, supporting actor and critic workers.	2025-04-02 15:19:28 +08:00
BearBiscuit	4fec38c5b3	[fix] add dpsk v3 type in config for mfu compute (#872 ) The MFU has been tested with the relevant model and can be calculated normally.	2025-04-02 11:45:45 +08:00
none0663	6272b8ce1d	Implement Dual-Clip PPO Algorithm (#784 ) Add the [Dual-Clip PPO](https://arxiv.org/pdf/1912.09729) algorithm to enhance the current PPO implementations. The Dual-Clip PPO introduces a approach by applying a lower bound to the policy ratio when the advantage is less than zero, when multiplied by a huge raito, does not exceed a specified lower bound. The concept is illustrated in the figure below: <img width="626" alt="Clipboard_Screenshot_1743047374" src="https://github.com/user-attachments/assets/93952edc-30c8-477e-bc3d-4770fabe55b8" /> So, the finall loss of the ppo is <img width="624" alt="Clipboard_Screenshot_1743047410" src="https://github.com/user-attachments/assets/5900490b-f64a-4bde-87d6-8359615b3337" /> This adjustment leads to a modified final loss calculation for the PPO, which could potentially improve training stability and performance in certain scenarios. I believe integrating this feature could provide significant benefits, and I look forward to feedback on this suggestion.	2025-04-02 10:13:22 +08:00
BearBiscuit	05bdeadc3d	[misc] add trust_remote_code param for loading custom tokenizer (#865 )	2025-04-02 07:49:17 +08:00
Jiacheng Lin	45e02c88e0	Docs: add Rec-R1 in Readme (#869 )	2025-04-01 15:19:39 -07:00
BearBiscuit	437e96bc02	[sglang] doc: Update the SGLang installation instructions to the latest version. (#867 )	2025-04-01 09:47:06 -07:00
Shawn/Yuxuan Tong	9dcac14f1b	[critic] fix: normalize mini batch size for critic (#853 ) To keep consistent with `816dacc7da/verl/workers/fsdp_workers.py (L117)`	2025-04-01 19:20:05 +08:00
HL	776b0a9ddc	docs: improve installation and ulysses docs (#854 )	2025-04-01 10:37:35 +08:00
Lumeng Wu	072fc9feed	feat: support no reference model; fix KL issues (#644 ) ### Before get started Difference between KL penalty in reward and KL loss > [!TIP] > > 1. In-reward KL penalty > > > $$ > r_t = r_{\varphi}(q, o_{\leq t}) - \beta\ \boxed{\log \frac{\pi_{\theta}(o_t \| q, o_{<t})}{\pi_{\text{ref}}(o_t \| q, o_{<t})}} > $$ > > 2. KL Loss > > $$ > L^{\text{PPO}}(\theta) = \mathbb{E}_t [ \min(ratio_t A_t, \text{clip}(ratio_t, 1 - \epsilon, 1 + \epsilon) A_t) ] > $$ > > $$ > \- \beta\ \boxed{D_{\text{KL}}(\pi_{\theta} \|\| \pi_{\text{ref}})} > $$ ### Problems 1. The current code doesn't support not using reference model This feature is half-implemented since the very first commit but never completed, e.g., `RayPPOTrainer` has an attribute `use_reference_policy` but it's always True since role_worker_mapping always has `Role.RefPolicy`. 2. Restriction of `use_kl_loss` Currently, `use_kl_loss` determines whether to use in-reward kl penalty or kl loss. So we can not use both or neither. `87a813658f/verl/trainer/ppo/ray_trainer.py (L875-L879)` `87a813658f/verl/workers/actor/dp_actor.py (L299-L307)` > [!CAUTION] > > ### You may have unintentionally adopted in-reward KL penalty > > For the experiments you've conducted, if you set `actor.use_kl_loss`=False or didn't set it (Default is False),*You unintentionally used in-reward KL penalty.* If you don't want any KL, you should set `actor_rollout_ref.actor.use_kl_loss=False` and `algorithm.use_kl_in_reward=False` (or not to set them because they are the default config) after this commit. 3. Deprecated config After investigation, I guess Critic may used to be responsible for in-reward KL. But this feature seems paralyzed. 1. Line 290, there may used to be `config.algorithm.kl_ctrl.target_kl` and `config.critic.kl_ctrl.horizon` , which are not supported currently. `3ec83117c3/verl/trainer/ppo/ray_trainer.py (L289-L293)` 2. In `verl/workers/critic/megatron_critic.py` : redundant set of `self.kl_ctrl` `3b18b0eb74/verl/workers/critic/megatron_critic.py (L69-L73)` ### What’s Changed? 1. Add support for not using reference model 2. Fixed the incomplete code of the KL controller. 3. A test case for using both kl terms 4. Some other misc issues in the code. ### How to disable reference model * set `actor_rollout_ref.actor.use_kl_loss=False` and `algorithm.use_kl_in_reward=False` (They are by default False, so you can simply not set them)	2025-04-01 10:14:38 +08:00
Joel	c0621e1bcd	[ulysses] fix: repeat kv heads by sp_size//nheads_k if nheads_k is less than sp_size (#850 )	2025-03-31 16:25:53 -07:00
HL	77babf1956	[BREAKING] feat: support custom datasets for SFT trainer (#832 ) This PR breaks the SFTDataset interface, but provides more flexibility on dataset type and arguments passed in. Usage: ``` --data.custom_cls.path=/path/to/dataset.py --data.custom_cls.name=MyDataset ```	2025-04-01 05:36:33 +08:00
Changlong Yu	d5fbf42b67	[doc] add log_val_generations in trainer (#844 )	2025-03-31 12:22:40 -07:00
Shawn/Yuxuan Tong	816dacc7da	[doc] feat: doc for val_before_train (#840 )	2025-03-31 09:38:15 -07:00
BearBiscuit	1f78e8b09c	[fix] Add param to resolve custom model loading failure (#845 )	2025-03-31 19:25:25 +08:00
Shawn/Yuxuan Tong	a03a72a35a	[doc] fix: typo for REINFORCE (#846 )	2025-03-31 19:24:53 +08:00
BearBiscuit	7646e08fca	[example] rollout: add vllm 0.8.2 mutli nodes generation bash (#838 )	2025-03-30 23:07:42 -07:00
Shawn/Yuxuan Tong	64bddb68f5	[BREAKING config] fix: move val_before_train to config yaml. Using trainer.val_before_train instead of +trainer.val_before_train going forward (#820 )	2025-03-30 23:05:48 -07:00
Changlong Yu	7fbf609197	[BREAKING config] feat: add mlflow val generation log and uri config (#822 ) ### Changes - Add mlflow validation generation in `ValidationGenerationsLogger` in the form of MLFlow artifact files. - Add the config of `MLFLOW_TRACKING_URI` in mlflow tracking. - rename `val_generations_to_log_to_wandb` to `log_val_generations` ### Test Tested in the self-host mlflow servers.	2025-03-30 08:44:36 -07:00
Jie Cheng	0e99caa2b3	docs: add PURE to README.md (#826 ) add our work, [PURE](https://tungsten-ink-510.notion.site/Stop-Gamma-Decay-Min-Form-Credit-Assignment-Is-All-Process-Reward-Model-Needs-for-Reasoning-19fcb6ed0184804eb07fd310b38af155?pvs=4), to the "Awesome work using verl" section in README	2025-03-30 15:38:10 +00:00
Xiang Long	5138a22c66	[sglang] fix: add memory saver support to sglang rollout to avoid OOMs (#756 ) as title --------- Co-authored-by: ocss884 <ocss.lin@gmail.com>	2025-03-30 08:36:16 -07:00
Blue Space	ccab83654c	Megatron checkpoint default not save hf_models, and provide model merge tool. (#780 ) Because CI is too slow, combine the features and functions of checkpoint here in 1 PR. # Add Layer idx to decode layers But it seems to be hard to attach a "correct" layer number to each layer, now verl implemented megatron each pp and vpp rank's layers start from index 0, leading to some inconvenience for merging tool. The difficulty mainly comes from `torch.nn.ModuleList` implementation, [it suggests and forces to directly use index rather than custom layer number](`8a40fca9a1/torch/nn/modules/container.py (L302C5-L324C66)`). Current solution is that we modify the layer number to actual number starts from pp and vpp offset when saving megatron checkpoint, and recover when loading. When use merging tool, there is no need for extra scans. # Huggingface Model loader logic simplified Since every rank can have access to state_dict, there is actually no need to broadcast the weights among mp and dp groups at all, and all from rank 0. The implementation before is too costly and may cause OOM issue because each rank can take up whole model space in GPU. And the loader logic is not straight-forward, since everyone only need to load its vpp_size number of layers, why iterate over whole num_layers. So current solution is every rank load itself's sharded weights from `state_dict`. But this requires users having storage nodes available to connect with every calculation nodes. For those who can only use rank 0 to store huggingface model, we move original implementation to deperacated besides new version of file. # Modify test scripts to reuse downloaded huggingface model Avoid errors when connecting with huggingface to access metadata. # Modify CI workflows to enable load-balance of CI machines Currently L20-0 takes up 6 more jobs than L20-1, try reduce the pipeline bubble of each task.	2025-03-30 10:39:40 +08:00
dingyuan	797f9994b7	Fix typo on installation guide (#813 ) Modify the version number of Megatron-llm from ``core_v0.11.0`` to ``core_r0.11.0``	2025-03-29 17:27:10 +08:00
BearBiscuit	0cf4ca4757	[misc] add deepseek v3 flops compute func (#814 )	2025-03-29 17:26:41 +08:00
Mingjie LIU	f3913d0014	[megatron] fix: remove redundant return value for hf_config (#722 )	2025-03-28 21:53:54 -07:00
Blue Space	50cba4aab9	docs: update checkpoint doc (#800 ) Also fix some APIs.	2025-03-28 21:27:01 -07:00
frederrx	4f32b32c99	ci/cd: add pylint to CI (#811 ) * add a workflow to run pylint * add a section to `pyproject.toml` that blacklists all rules which would trigger given the current code * pin a version of pylint in `requirements.txt` for reproducability In a followup PR I will remove some rules from the blacklist and fix some bugs.	2025-03-28 14:59:38 -07:00
Jac Zhao	093e9599dd	[trainer] fix: skip the update step when encountering gradient overflow (#789 ) due to issues such as mixed precision updates or corrupted data, model training may crash. to prevent abnormal updates, you can check grad_norm when updating the model, which might be a temporarily effective solution. however, if similar issues occur frequently, it is necessary to further investigate the data and loss design for a more thorough troubleshooting cover: #637 #747 #751	2025-03-28 09:48:20 -07:00
GeLee	52e80fc143	Fix padding length for sglang rollout in veRL (#773 ) Fixed a portion of the issues encountered during VLM GPTO training as mentioned in the article. https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/veRL-VLM.md When do_sample=False, different models under DP output sequences of inconsistent lengths, which may be padded to different lengths, ultimately causing shape inconsistencies during output and resulting in errors in collect_dp_compute_data_proto. The following situation occurred: ``` DataProto(batch=TensorDict( fields={ attention_mask: Tensor(shape=torch.Size([151, 5120]), device=cpu, dtype=torch.int64, is_shared=False), input_ids: Tensor(shape=torch.Size([151, 5120]), device=cpu, dtype=torch.int64, is_shared=False), position_ids: Tensor(shape=torch.Size([151, 5120]), device=cpu, dtype=torch.int64, is_shared=False), prompts: Tensor(shape=torch.Size([151, 1024]), device=cpu, dtype=torch.int64, is_shared=False), responses: Tensor(shape=torch.Size([151, 4096]), device=cpu, dtype=torch.int64, is_shared=False)}, batch_size=torch.Size([151]), device=cpu, is_shared=False), non_tensor_batch={}, meta_info={}), DataProto(batch=TensorDict( fields={ attention_mask: Tensor(shape=torch.Size([151, 5120]), device=cpu, dtype=torch.int64, is_shared=False), input_ids: Tensor(shape=torch.Size([151, 5120]), device=cpu, dtype=torch.int64, is_shared=False), position_ids: Tensor(shape=torch.Size([151, 5120]), device=cpu, dtype=torch.int64, is_shared=False), prompts: Tensor(shape=torch.Size([151, 1024]), device=cpu, dtype=torch.int64, is_shared=False), responses: Tensor(shape=torch.Size([151, 4096]), device=cpu, dtype=torch.int64, is_shared=False)}, batch_size=torch.Size([151]), device=cpu, is_shared=False), non_tensor_batch={}, meta_info={}), DataProto(batch=TensorDict( fields={ attention_mask: Tensor(shape=torch.Size([151, 3072]), device=cpu, dtype=torch.int64, is_shared=False), input_ids: Tensor(shape=torch.Size([151, 3072]), device=cpu, dtype=torch.int64, is_shared=False), position_ids: Tensor(shape=torch.Size([151, 3072]), device=cpu, dtype=torch.int64, is_shared=False), prompts: Tensor(shape=torch.Size([151, 1024]), device=cpu, dtype=torch.int64, is_shared=False), responses: Tensor(shape=torch.Size([151, 2048]), device=cpu, dtype=torch.int64, is_shared=False)}, batch_size=torch.Size([151]), device=cpu, is_shared=False), non_tensor_batch={}, meta_info={}), DataProto(batch=TensorDict( fields={ attention_mask: Tensor(shape=torch.Size([151, 3072]), device=cpu, dtype=torch.int64, is_shared=False), input_ids: Tensor(shape=torch.Size([151, 3072]), device=cpu, dtype=torch.int64, is_shared=False), position_ids: Tensor(shape=torch.Size([151, 3072]), device=cpu, dtype=torch.int64, is_shared=False), prompts: Tensor(shape=torch.Size([151, 1024]), device=cpu, dtype=torch.int64, is_shared=False), responses: Tensor(shape=torch.Size([151, 2048]), device=cpu, dtype=torch.int64, is_shared=False)}, batch_size=torch.Size([151]), device=cpu, is_shared=False), non_tensor_batch={}, meta_info={})] ``` This modification resolves this issue. --------- Co-authored-by: GeLee-Q <8650386969@qq.com>	2025-03-28 22:28:02 +08:00
PzySeere	36a0f06d8a	Update README.md (#797 ) For bolding some key words in description of MetaSpatial.	2025-03-28 15:18:09 +08:00

1106 changed files with 130713 additions and 32788 deletions

									
										10

.gemini/config.yaml
									
										Normal file
									
												View File
												
				@ -0,0 +1,10 @@

				have_fun: false

				code_review:

				  disable: false

				  comment_severity_threshold: HIGH

				  max_review_comments: -1

				  pull_request_opened:

				    help: false

				    summary: false

				    code_review: true

				ignore_patterns: []

30

.github/CODEOWNERS vendored Normal file

View File

 @ -0,0 +1,30 @@
 /docs @eric-haibin-lin @zhaochenyang20 @hongpeng-guo
 /docs/amd_tutorial @yushengsu-thu
 /docs/slang_multiturn @zhaochenyang20 @SwordFaith
 /docs/ascend_tutorial @FightingZhen
 /recipe/dapo @tongyx361 @PeterSH6 @vermouth1992 @tardis-key @FightingZhen @ji-huazhong
 /recipe/spin @zhaochenyang20
 /recipe/sppo @zhaochenyang20
 /third_party/sglang @zhaochenyang20 @SwordFaith
 /third_party/vllm @PeterSH6 @wuxibin89
 /examples/grpo_trainer @vermouth1992 @PeterSH6 @tardis-key @FightingZhen @ji-huazhong
 /verl/single_controller @zw0610 @wuxibin89 @hongpeng-guo
 /verl/trainer @eric-haibin-lin @vermouth1992 @tongyx361 @PeterSH6
 /verl/models/mcore @ISEEKYAN @vermouth1992
 /verl/models/transformers @vermouth1992 @PeterSH6 @tardis-key @FightingZhen @ji-huazhong
 /verl/workers/engine @eric-haibin-lin @vermouth1992 @ZihengJiang
 /verl/workers/roles @eric-haibin-lin @vermouth1992 @ZihengJiang
 /verl/workers/engine/fsdp @eric-haibin-lin @vermouth1992 @ZihengJiang
 /verl/workers/rollout/vllm_rollout @wuxibin89 @PeterSH6 @chenhaiq
 /verl/workers/rollout/sglang_rollout @zhaochenyang20 @SwordFaith @chenhaiq
 /verl/workers/actor/megatron_actor.py @ISEEKYAN @vermouth1992
 /verl/workers/critic/megatron_critic.py @ISEEKYAN @vermouth1992
 /verl/workers/megatron_workers.py @ISEEKYAN @vermouth1992
 /tests/single_controller @zw0610 @wuxibin89
 /tests/trainer @eric-haibin-lin @vermouth1992 @tongyx361 @PeterSH6
 /tests/workers/rollout/vllm_rollout @wuxibin89 @PeterSH6 @chenhaiq

									
										65

.github/ISSUE_TEMPLATE/bug-report.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,65 @@

				# modified from https://github.com/huggingface/transformers/blob/main/.github/ISSUE_TEMPLATE/bug-report.yml?plain=1

				name: "\U0001F41B Bug Report"

				description: Submit a bug report to help us improve verl

				labels: [ "bug" ]

				body:

				  - type: markdown

				    attributes:

				      value: |

				        Thanks for taking the time to fill out this bug report! 🤗

				  - type: textarea

				    id: system-info

				    attributes:

				      label: System Info

				      description: Please share your system info with us. You can run the command `python scripts/diagnose.py` and copy-paste its output below.

				      placeholder: verl version, platform, python version, ...

				    validations:

				      required: true

				  - type: checkboxes

				    id: information-scripts-examples

				    attributes:

				      label: Information

				      description: 'The problem arises when using:'

				      options:

				        - label: "The official example scripts"

				        - label: "My own modified scripts"

				  - type: checkboxes

				    id: information-tasks

				    attributes:

				      label: Tasks

				      description: "The tasks I am working on are:"

				      options:

				        - label: "An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)"

				        - label: "My own task or dataset (give details below)"

				  - type: textarea

				    id: reproduction

				    validations:

				      required: true

				    attributes:

				      label: Reproduction

				      description: |

				        Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.

				        Please include relevant config information with your code.

				        If you have code snippets, error messages, stack traces please provide them here as well.

				        Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting

				        Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.

				      placeholder: |

				        Steps to reproduce the behavior:

				          1.

				          2.

				          3.

				  - type: textarea

				    id: expected-behavior

				    validations:

				      required: true

				    attributes:

				      label: Expected behavior

				      description: "A clear and concise description of what you would expect to happen."

									
										2

.github/ISSUE_TEMPLATE/config.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,2 @@

				blank_issues_enabled: true

				version: 0.1

									
										32

.github/ISSUE_TEMPLATE/feature-request.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,32 @@

				# modified from https://github.com/huggingface/transformers/blob/main/.github/ISSUE_TEMPLATE/feature-request.yml?plain=1

				name: "\U0001F680 Feature request"

				description: Submit a proposal/request for a new verl feature

				labels: [ "Feature request" ]

				body:

				  - type: textarea

				    id: feature-request

				    validations:

				      required: true

				    attributes:

				      label: Feature request

				      description: |

				        A clear and concise description of the feature proposal. Please provide a link to the paper and code in case they exist.

				  - type: textarea

				    id: motivation

				    validations:

				      required: true

				    attributes:

				      label: Motivation

				      description: |

				        Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.

				  - type: textarea

				    id: contribution

				    validations:

				      required: true

				    attributes:

				      label: Your contribution

				      description: |

				        Is there any way that you could help, e.g. by submitting a PR? Make sure to read the CONTRIBUTING.MD [readme](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md)

									
										40

.github/PULL_REQUEST_TEMPLATE.md
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,40 @@

				### What does this PR do?

				> Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

				### Checklist Before Starting

				- [ ] Search for similar PRs. Paste at least one query link here: ...

				- [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)

				  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`

				  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`

				  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`

				  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.

				  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

				### Test

				> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

				### API and Usage Example

				> Demonstrate how the API changes if any, and provide usage example(s) if possible.

				```python

				# Add code snippet or script demonstrating how to use this

				```

				### Design & Code Changes

				> Demonstrate the high-level design if this PR is complex, and list the specific changes.

				### Checklist Before Submitting

				> [!IMPORTANT]

				> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

				- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).

				- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`

				- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).

				- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...

				- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

									
										147

.github/workflows/.deprecate/e2e_eval_aime24.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,147 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests 

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: e2e_eval_aime24

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  # For push, for now only anti-patterns are specified so it is more conservative

				  # and achieves higher coverage.

				  push:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!*.md"

				      - "!docker/**"

				      - "!docs/**"

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      - "!recipe/**"

				      - "recipe/r1"

				      - "!recipe/r1/README.md"

				  pull_request:

				    branches:

				      - main

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!*.md"

				      - "!docker/**"

				      - "!docs/**"

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Home

				      - "recipe/r1"

				      - "!recipe/r1/README.md"

				      # Other recipes

				      - "!recipe/**"

				      # Entrypoints

				      - ".github/workflows/e2e_eval_aime24.yml"

				      - "tests/special_e2e/run_r1_distill_qwen_aime24_eval.sh"

				      - "verl/trainer/main_generation.py"

				      - "verl/trainer/config/generation.yaml"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1 

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  e2e_eval_aime24:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test,gpu,math]

				          pip3 install math-verify transformers==4.56.2

				      - name: Prepare aime24 dataset

				        run: |

				          ray stop --force

				          python3 recipe/r1/data_process.py --task aime2024

				      - name: Running generation and evaluation in AIME 2024

				        run: |

				          ray stop --force

				          bash tests/special_e2e/run_r1_distill_qwen_aime24_eval.sh

				  cleanup:

				      runs-on: ubuntu-latest

				      needs: [setup, e2e_eval_aime24]

				      if: always()

				      steps:

				        - id: destroy-runner

				          uses: volcengine/vemlp-github-runner@v1

				          with:

				            mode: "destroy"

				            faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				            mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										133

.github/workflows/.deprecate/e2e_ppo_trainer.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,133 @@

				name: e2e_ppo_trainer_deprecate

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  # For push, for now only anti-patterns are specified so it is more conservative

				  # and achieves higher coverage.

				  push:

				    branches:

				      - disabled_ci

				  pull_request:

				    branches:

				      - disabled_ci

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!**/*.md"

				      - "!docker/**"

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Docs

				      - "!docs/**"

				      # Recipes

				      - "!recipe/**"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      # Entrypoints

				      - ".github/workflows/e2e_ppo_trainer.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "examples/data_preprocess/geo3k.py"

				      - "tests/special_e2e/ppo_trainer"

				      - "verl/trainer/main_ppo.py"

				      - "verl/trainer/config/ppo_trainer.yaml"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				jobs:

				  pre_commit_for_ppo:

				    runs-on: ubuntu-latest

				    strategy:

				      matrix:

				        python-version: ["3.12"]

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - name: Set up Python ${{ matrix.python-version }}

				        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0

				        with:

				          python-version: ${{ matrix.python-version }}

				      - name: Install the current repository

				        run: |

				          pip install -e .

				      - name: Set ruff --output-format=github

				        run: |

				          sed -i 's/--output-format=full/--output-format=github/' .pre-commit-config.yaml

				          git add .pre-commit-config.yaml

				      - uses: pre-commit/action@v3.0.1

				        with:

				          extra_args: "" # Overriding default "--all-files"

				  e2e_ppo_trainer_sglang_multiturn_with_tool:

				    runs-on: [L20x8]

				    needs: pre_commit_for_ppo

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    container:

				      image: verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install -e .[test,gpu,sglang]

				      - name: Prepare gsm8k dataset with tool

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/gsm8k_multiturn_w_tool.py --local_save_dir $HOME/data/gsm8k_verl_sgl_multi_turn_preprocessed

				      - name: Running GSM8K with tool E2E training tests on 8 L20 GPUs with rmpad using function rm and save ckpt with sglang

				        run: |

				          ray stop --force

				          bash tests/special_e2e/run_gsm8k_fsdp_sgl_multiturn_w_tool.sh

				      - name: Running GSM8K with tool E2E training tests with FSDP2

				        run: |

				          ray stop --force

				          FSDP_STRATEGY=fsdp2 bash tests/special_e2e/run_gsm8k_fsdp_sgl_multiturn_w_tool.sh

				  e2e_ppo_trainer_sglang_vlm_multiturn_with_tool:

				    runs-on: [L20x8]

				    needs: pre_commit_for_ppo

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    container:

				      image: verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install -e .[test,geo,gpu,sglang]

				      - name: Prepare geo3k dataset with tool

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/geo3k_multiturn_w_tool.py --local_dir $HOME/data/geo3k_verl_sgl_multi_turn_preprocessed

				      - name: Running GEO3K with tool E2E training tests on 8 L20 GPUs with rmpad using function rm and save ckpt with sglang

				        run: |

				          ray stop --force

				          bash tests/special_e2e/run_geo3k_fsdp_sgl_multiturn_w_tool.sh

				      - name: Running GEO3K with tool E2E training tests with FSDP2

				        run: |

				          ray stop --force

				          FSDP_STRATEGY=fsdp2 bash tests/special_e2e/run_geo3k_fsdp_sgl_multiturn_w_tool.sh

									
										155

.github/workflows/.deprecate/e2e_ppo_trainer_megatron_sglang.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,155 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: e2e_ppo_trainer_megatron_sglang_deprecate

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch.

				  # For push, for now only anti-patterns are specified so it is more conservative

				  # and achieves higher coverage.

				  push:

				    branches:

				      - disabled_ci

				  pull_request:

				    branches:

				      - disabled_ci

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!docker/**"

				      # Docs

				      - "!**/*.md"

				      - "!docs/**"

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # FSDP

				      - "!verl/workers/**/*dp_*.py"

				      # Entrypoints

				      - ".github/workflows/e2e_ppo_trainer_megatron_sglang.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "examples/data_preprocess/geo3k.py"

				      - "tests/special_e2e/run_ppo_trainer_megatron.sh"

				      - "verl/trainer/main_ppo.py"

				      - "verl/trainer/config/ppo_megatron_trainer.yaml"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  e2e_ppo_trainer_megatron-qwen3:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen3) with validation and saving

				        run: |

				          ray stop --force

				          ENGINE=sglang ALL_OFFLOAD=True VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen3) testing learning rate scheduler

				        run: |

				          ray stop --force

				          ENGINE=sglang LR_WARMUP_STEPS=1 TOTAL_TRAIN_STEPS=2 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Test Megatron checkpoints merging function (Qwen3 Actor and Critic)

				        run: |

				          exp_name="qwen3-0.6b-megatron-gsm8k-minimal"

				          python -m verl.model_merger test --backend megatron --tie-word-embedding --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface

				          python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        e2e_ppo_trainer_megatron-deepseek,

				        e2e_ppo_trainer_megatron-qwen3,

				        e2e_ppo_trainer_megatron-different-train-infer-tp-qwen-tie-embedding,

				        e2e_ppo_trainer_megatron-qwen-override-transformer-config,

				        e2e_ppo_trainer_megatron-deepseek-override-transformer-config,

				        e2e_ppo_trainer_megatron-moe-expert-parallel,

				        e2e_ppo_trainer_megatron-qwen2_5vl-3b,

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										66

.github/workflows/.deprecate/e2e_prime.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,66 @@

				name: e2e_prime_deprecate

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - disabled_ci

				  pull_request:

				    branches:

				      - disabled_ci

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Other recipes

				      - "!recipe/**"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      # Home

				      - "recipe/prime"

				      # Entrypoints

				      - ".github/workflows/e2e_prime.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "tests/special_e2e/run_prime.sh"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				jobs:

				  e2e_prime:

				    runs-on: [L20x8]

				    timeout-minutes: 50 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    container:

				      image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test,gpu]

				      - name: Prepare gsm8k dataset

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/gsm8k.py

				      - name: Running GSM8K E2E with prime alg

				        run: |

				          ray stop --force

				          bash tests/special_e2e/run_prime.sh

									
										119

.github/workflows/.deprecate/e2e_spin.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,119 @@

				name: e2e_spin

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Other recipes

				      - "!recipe/**"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      # Home

				      - "recipe/spin"

				      # Entrypoints

				      - ".github/workflows/e2e_spin.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "tests/special_e2e/run_spin.sh"

				      - "!examples"

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Other recipes

				      - "!recipe/**"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      # Home

				      - "recipe/spin"

				      # Entrypoints

				      - ".github/workflows/e2e_spin.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "tests/special_e2e/run_spin.sh"

				      - "!examples"

				# Declare permissions just read content.

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  e2e_spin:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install -e .[test,gpu,sglang]

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running the E2E test with the spin algorithm

				        run: |

				          ray stop --force

				          bash tests/special_e2e/run_spin.sh

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        e2e_spin

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										118

.github/workflows/.deprecate/e2e_sppo.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,118 @@

				name: e2e_sppo

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Other recipes

				      - "!recipe/**"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      # Home

				      - "recipe/sppo"

				      # Entrypoints

				      - ".github/workflows/e2e_sppo.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "tests/special_e2e/run_sppo.sh"

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Other recipes

				      - "!recipe/**"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      # Home

				      - "recipe/sppo"

				      # Entrypoints

				      - ".github/workflows/e2e_sppo.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "tests/special_e2e/run_sppo.sh"

				# Declare permissions just read content.

				permissions:

				  contents: read

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				  TRANSFORMERS_VERSION: "4.56.2"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  e2e_sppo:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install -e .[test,gpu,sglang]

				      - name: Prepare MATH dataset

				        run: |

				          python3 examples/data_preprocess/math_dataset.py --local_dataset_path $HOME/models/hf_data/DigitalLearningGmbH/MATH-lighteval

				      - name: Running the E2E test with the SPPO algorithm

				        run: |

				          ray stop --force

				          bash tests/special_e2e/run_sppo.sh

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        e2e_sppo

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										73

.github/workflows/README.md
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,73 @@

				### Adding a New Workflow

				When adding a new workflow for continuous integration (CI), you have two runner options: a fixed runner or a machine from the vemlp.

				- **Fixed Runner**: To use a fixed runner, specify it in your workflow using the `runs-on` keyword, like `runs-on: [L20x8]`. 

				- **Vemlp Runner**: Opting for a Vemlp machine allows you to launch tasks elastically. 

				Here is a template to assist you. This template is designed for using Vemlp machines. Currently, for each workflow, you need to create a `setup` and a `cleanup` job. When using this template, the main parts you need to modify are the `IMAGE` environment variable and the specific `job steps`.

				```yaml

				name: Your Default Workflow

				on:

				  push:

				    branches:

				      - main

				      - v0.*

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      - ".github/workflows/template.yml"

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				permissions:

				  contents: read

				env:

				  IMAGE: "your vemlp image" # e.g. "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.4-vllm0.8.5-mcore0.12.2"

				  DYNAMIC_RUNNER_URL: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner" # public veFaas api

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      task-id: ${{ steps.create-runner.outputs.task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1 

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_URL }}"

				          image: "${{ env.DEFAULT_IMAGE }}"

				  your_job:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'default-runner' }}"]

				    steps:

				      xxxx # your jobs

				  cleanup:

				    runs-on: ubuntu-latest

				    needs: [setup, your_job]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_URL }}"

				          task-id: "${{ needs.setup.outputs.task-id }}"

				```

				### Model and Dataset

				To avoid CI relies on network, we pre-download dataset on a NFS on the CI machine. The path for models are \${HOME}/models and the path for dataset is \${HOME}/models/hf_data.

									
										58

.github/workflows/check-pr-title.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,58 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests 

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				on:

				  pull_request:

				    types: [opened, edited, synchronize]

				jobs:

				  check-title:

				    runs-on: ubuntu-latest

				    steps:

				      - name: Checkout code

				        uses: actions/checkout@v4

				      - name: Set up Python

				        uses: actions/setup-python@v5

				        with:

				          python-version: '3.11'

				      - name: Run PR title checker

				        run: python3 tests/special_sanity/check_pr_title.py

				        env:

				          PR_TITLE: ${{ github.event.pull_request.title }}

				      - name: Run PR description checker

				        run: python3 tests/special_sanity/check_pr_description.py

				        env:

				          PR_TITLE: ${{ github.event.pull_request.title }}

				          GITHUB_EVENT_PATH: ${{ github.event_path }}

									
										175

.github/workflows/checkpoint_converter.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,175 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: checkpoint_converter

				# latest version: Megatron-LM core_r0.11.0 https://github.com/NVIDIA/Megatron-LM/tree/core_r0.11.0

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.*

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # FSDP

				      - "!verl/workers/**/*dp_*.py"

				      # Entrypoints

				      - ".github/workflows/checkpoint_converter.yml"

				      - ".github/workflows/e2e_ppo_trainer_megatron.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "tests/special_e2e/run_ppo_trainer_megatron.sh"

				      - "verl/trainer/main_ppo.py"

				      - "verl/trainer/config/ppo_megatron_trainer.yaml"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  checkpoint_converter:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 20 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install -e .[test]

				#      - name: Download Model to Use

				#        run: |

				#          huggingface-cli download Qwen/Qwen2.5-0.5B --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B

				#          huggingface-cli download deepseek-ai/deepseek-coder-1.3b-instruct --local-dir ${HOME}/models/deepseek-ai/deepseek-coder-1.3b-instruct

				#          export HF_HUB_OFFLINE=1

				      - name: Running Huggingface to Megatron dist_ckpt converter (Qwen/Qwen2.5-0.5B)

				        run: |

				          ray stop --force

				          python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/Qwen/Qwen2.5-0.5B --output_path checkpoints/Qwen/Qwen2.5-0.5B --test

				      - name: Running Huggingface to Megatron dist_ckpt converter (deepseek-ai/deepseek-coder-1.3b-instruct)

				        run: |

				          ray stop --force

				          python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/deepseek-ai/deepseek-coder-1.3b-instruct --output_path checkpoints/deepseek-ai/deepseek-coder-1.3b-instruct --test

				      - name: Clean up

				        run: |

				          rm -rf checkpoints

				  checkpoint_converter_large_moe_models:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 30 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				      HF_ENDPOINT: "https://hf-mirror.com"

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install -e .[test]

				#      - name: Download Model to Use

				#        run: |

				#          huggingface-cli download Qwen/Qwen1.5-MoE-A2.7B-Chat --local-dir ${HOME}/models/Qwen/Qwen1.5-MoE-A2.7B-Chat

				#          export HF_HUB_OFFLINE=1

				      - name: Running Huggingface to Megatron dist_ckpt CPU converter (Qwen/Qwen1.5-MoE-A2.7B-Chat)

				        run: |

				          ray stop --force

				          python scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/Qwen/Qwen1.5-MoE-A2.7B-Chat --output_path checkpoints/Qwen/Qwen1.5-MoE-A2.7B-Chat --use_cpu_initialization

				      - name: Running distributed Huggingface to Megatron dist_ckpt CPU converter (Qwen/Qwen1.5-MoE-A2.7B-Chat)

				        run: |

				          ray stop --force

				          torchrun --nproc_per_node 8 --nnodes 1 scripts/converter_hf_to_mcore.py --hf_model_path=${HOME}/models/Qwen/Qwen1.5-MoE-A2.7B-Chat --output_path checkpoints/Qwen/Qwen1.5-MoE-A2.7B-Chat_dist --use_cpu_initialization

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        checkpoint_converter,

				        checkpoint_converter_large_moe_models

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										64

.github/workflows/checkpoints.yml
									
										vendored
									
												View File
											
				@ -1,64 +0,0 @@

				name: checkpoints

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/checkpoints.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - "verl/trainer/config/*.yaml"

				      - .github/workflows/checkpoints.yml

				      - "tests/e2e/*.sh"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  e2e_gsm8k_megatron:

				    runs-on: [self-hosted, l20-0]

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install -e .[test]

				      - name: Prepare gsm8k dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py

				      - name: Running Checkpoint Integration Test (Qwen Megatron)

				        run: |

				          ray stop --force

				          export PYTHONPATH=$PYTHONPATH:/opt/nvidia/Megatron-LM

				          bash tests/checkpoint/run_qwen_megatron_ckpt.sh

				      - name: Running Checkpoint Integration Test (Deepseek Megatron)

				        run: |

				          ray stop --force

				          export PYTHONPATH=$PYTHONPATH:/opt/nvidia/Megatron-LM

				          bash tests/checkpoint/run_deepseek_megatron_ckpt.sh

									
										89

.github/workflows/cpu_unit_tests.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,89 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests 

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: cpu_unit_tests

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.*

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      - .github/workflows/cpu_unit_tests.yml

				      - "!recipe/**/*.py"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				jobs:

				  cpu_unit_tests:

				    if: github.repository_owner == 'volcengine'

				    runs-on: [L20x8]

				    timeout-minutes: 20 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    container:

				      image: verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip install -e .[test,prime,geo]

				          pip install --upgrade "ray>=2.40.0" pillow

				      - name: Download datasets

				        run: |

				          huggingface-cli download verl-team/gsm8k-v0.4.1 --repo-type dataset --local-dir ~/verl-data/gsm8k

				          python3 examples/data_preprocess/geo3k.py

				      - name: Running CPU unit tests

				        run: |

				          echo '[pytest]' > pytest.ini

				          echo 'python_files = *_on_cpu.py' >> pytest.ini

				          pytest -s -x --asyncio-mode=auto tests/

									
										61

.github/workflows/dataset.yml
									
										vendored
									
												View File
											
				@ -1,61 +0,0 @@

				name: dataset

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/dataset.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/dataset.yml

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  ray:

				    runs-on: [self-hosted, l20-1]

				    timeout-minutes: 10 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip install hf_transfer

				          pip install -e .[test]

				          pip install --upgrade "ray>=2.40.0"

				          pip install cupy-cuda12x

				      - name: Running dataset tests

				        run: |

				          [ ! -d "$HOME/verl-data" ] && git clone --depth 1 https://github.com/eric-haibin-lin/verl-data ~/verl-data

				          pytest -s -x tests/verl/utils/dataset/test_rl_dataset.py

				          pytest -s -x tests/verl/utils/dataset/test_sft_dataset.py

				#          pytest -s -x tests/verl/utils/dataset/test_rm_dataset.py

				      - name: Running ray test using cupy (move it to L20 when dockerfile ready)

				        run: |

				          cd tests/ray

				          pytest -s -x test_rvdz.py

									
										100

.github/workflows/doc.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,100 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests 

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: doc_test

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.*

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      - "docs/**"

				      - .github/workflows/doc.yml

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read      # for checkout

				  pages: write        # for deploy-pages

				  id-token: write     # for deploy-pages

				jobs:

				  doc_test:

				    runs-on: ubuntu-latest

				    timeout-minutes: 5 # Increase this timeout value as needed

				    strategy:

				      matrix:

				        python-version: ["3.10"]

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - name: Set up Python ${{ matrix.python-version }}

				        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0

				        with:

				          python-version: ${{ matrix.python-version }}

				      - name: Install the current repository

				        run: |

				          pip install -e .[test] --no-deps

				          pip install -r docs/requirements-docs.txt

				      - name: Run doc make html

				        run: |

				          cd docs 

				          make clean

				          make html SPHINXOPTS="--keep-going -w _build/sphinx.log"

				          if grep -q ": ERROR:" _build/sphinx.log; then

				            echo "🚨 Sphinx doc build contained ERRORs - see _build/sphinx.log"

				            exit 1

				          fi

				          if grep -q "WARNING: document isn't included in any toctree" _build/sphinx.log; then

				            echo "🚨 Sphinx doc build contained WARNING. Please include newly added docs in index.rst. See _build/sphinx.log for details"

				            exit 1

				          fi

				          if grep -q "WARNING: Inline emphasis" _build/sphinx.log; then

				            echo "🚨 Sphinx doc build contained WARNING. Please check inline emphasis is correct. See _build/sphinx.log for details"

				            exit 1

				          fi

				          if grep -q "WARNING: Definition list ends without a blank line" _build/sphinx.log; then

				            echo "🚨 Sphinx doc build contained WARNING. Please check if the indentation is correct. See _build/sphinx.log for details"

				            exit 1

				          fi

									
										129

.github/workflows/e2e_ascend.yml
									
										vendored
									
												View File
												
				@ -1,3 +1,35 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests 

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: e2e_ascend

				on:

				@ -6,34 +38,47 @@ on:

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_ascend.yml

				      - v0.*

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - ".github/workflows/e2e_ascend.yml"

				      - "**/*.py"

				      - .github/workflows/e2e_ascend.yml

				      - "docs/ascend_tutorial/**"

				      - "examples/**"

				      - "recipe/**"

				      - "tests/special_npu/**"

				      - "tests/special_sanity/**"

				      - "verl/**"

				      - "pyproject.toml"

				      - "requirements-npu.txt"

				      - "setup.py"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				permissions:

				  contents: read

				jobs:

				  test:

				    if: github.repository_owner == 'volcengine'

				    name: verl Ascend test (self-host)

				    runs-on: [self-hosted, npu-0]

				    timeout-minutes: 5 # Increase this timeout value as needed

				    env:

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    timeout-minutes: 40 # Increase this timeout value as needed

				    container:

				      image: quay.io/ascend/cann:8.0.0-910b-ubuntu22.04-py3.10

				      image: crispig/verl_npu:cann8.1rc1-py3.10-torch2.5.1-vllm-ascend0.7.3.post1-mindspeed0121-250731

				      volumes:

				        - /usr/local/dcmi:/usr/local/dcmi

				        - /usr/local/bin/npu-smi:/usr/local/bin/npu-smi

				        - /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/

				        - /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info

				        - /etc/ascend_install.info:/etc/ascend_install.info

				        - /data00/dataset:/github/home/dataset

				        - /data00/models:/github/home/models

				        # Use self-host cache speed up pip and model download

				        # - /home/action/actions-runner/_work/cache:/github/home/.cache/

				      options: >-

				@ -41,8 +86,15 @@ jobs:

				        --device /dev/davinci_manager

				        --device /dev/devmm_svm

				        --device /dev/hisi_hdc

				        --network host

				        --privileged

				        --network "host"

				        --shm-size 16g

				    env: 

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - name: Check npu and CANN info

				        run: |

				@ -50,6 +102,55 @@ jobs:

				          npu-smi info

				      - name: Checkout volcengine/verl repo

				        uses: actions/checkout@v4

				      - name: Run test

				      - name: Install the current repository

				        run: |

				          lscpu

				          pip3 install hf_transfer peft

				          pip3 install -r requirements-npu.txt

				          pip install -e .

				      - name: Install torchvision

				        run: |

				          pip install torchvision==0.20.1+cpu --index-url https://download.pytorch.org/whl/cpu

				      - name: Uninstall Triton

				        run: |

				          pip uninstall -y triton

				      - name: Preprocess gsm8k dataset

				        run: |

				          python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/dataset/openai/gsm8k

				      - name: Preprocess geo3k dataset

				        run: |

				          python examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/dataset/hiyouga/geometry3k

				      - name: Running gsm8k e2e qwen3 training tests with PPO on ASCEND NPU

				        run: |

				          ray stop --force

				          bash tests/special_npu/run_qwen3_06b_ppo.sh

				          rm -rf $HOME/ckpts

				      - name: Running gsm8k e2e training tests with peft sft on ASCEND NPU

				        run: |

				          ray stop --force

				          bash tests/special_npu/run_qwen2_5_05b_sft_peft_sp2.sh

				          rm -rf $HOME/ckpts

				      - name: Running gsm8k e2e training tests with GRPO on ASCEND NPU

				        run: |

				          ray stop --force

				          bash tests/special_npu/run_qwen2_5_05b_grpo.sh

				          rm -rf $HOME/ckpts

				      - name: Running geo3k e2e training tests with GRPO on ASCEND NPU

				        run: |

				          ray stop --force

				          bash tests/special_npu/run_qwen2_5_vl_3b_npu.sh

				          rm -rf $HOME/ckpts

				      - name: Running gsm8k e2e training tests with DAPO on ASCEND NPU

				        run: |

				          ray stop --force

				          bash tests/special_npu/run_qwen2_5_05b_dapo.sh

				          rm -rf $HOME/ckpts

				      - name: Running gsm8k e2e training tests with GRPO MindSpeed on ASCEND NPU

				        run: |

				          ray stop --force

				          USE_DIST_CKPT=True bash tests/special_npu/run_qwen2_5_05b_grpo_mindspeed.sh

				          rm -rf $HOME/dist_ckpt/qwen2_5_05b_grpo_mindspeed

				          rm -rf $HOME/ckpts

				      - name: Running NPU profiling unit tests

				        run: |

				          ray stop --force

				          pytest -s -x tests/utils/test_special_mstx_profile.py

									
										145

.github/workflows/e2e_dapo.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,145 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests 

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: e2e_dapo

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  # For push, for now only anti-patterns are specified so it is more conservative

				  # and achieves higher coverage.

				  push:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "verl/*.py"

				      # Other entrypoints

				      - "!examples/*trainer*"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      - "!recipe/**"

				      - "recipe/dapo"

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Other recipes

				      - "!recipe/**"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      # Home

				      - "recipe/dapo"

				      # Entrypoints

				      - ".github/workflows/e2e_dapo.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "tests/special_e2e/run_dapo.sh"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  e2e_dapo:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test,gpu]

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running the E2E test with the DAPO algorithm

				        run: |

				          ray stop --force

				          bash tests/special_e2e/run_dapo.sh

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        e2e_dapo

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										55

.github/workflows/e2e_digit_completion.yml
									
										vendored
									
												View File
											
				@ -1,55 +0,0 @@

				name: e2e_digit_completion

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_digit_completion.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - "verl/trainer/config/*.yaml"

				      - .github/workflows/e2e_digit_completion.yml

				      - "tests/e2e/*.sh"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  e2e_digit_completion:

				    runs-on: [self-hosted, l20-0]

				    timeout-minutes: 20 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.2

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install -e .[test]

				      - name: Running digit completon e2e training tests on 8 L20 GPUs

				        run: |

				          ray stop --force

				          bash tests/e2e/run_ray_trainer.sh

									
										47

.github/workflows/e2e_digit_completion_fire.yml
									
										vendored
									
												View File
											
				@ -1,47 +0,0 @@

				name: e2e_digit_completion_fire

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_digit_completion_fire.yml

				  pull_request:

				    branches:

				      - main

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_digit_completion_fire.yml

				      - "tests/e2e/*.sh"

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  e2e_digit_completion:

				    runs-on: [self-hosted, l20-0]

				    timeout-minutes: 20 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install -e .[test]

				      - name: Running digit completon e2e training tests on 8 L20 GPUs

				        run: |

				          ray stop --force

				          bash tests/e2e/run_ray_trainer_fire_sampling.sh

									
										141

.github/workflows/e2e_genrm_remote.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,141 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests 

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: e2e_genrm_remote

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      - "tests/**"

				      - "!recipe/**"

				      - "recipe/genrm_remote"

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Other recipes

				      - "!recipe/**"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      # Home

				      - "recipe/genrm_remote"

				      - "!recipe/genrm_remote/README.md"

				      # Entrypoints

				      - ".github/workflows/e2e_genrm_remote.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "tests/special_e2e/run_genrm_remote.sh"

				      - "tests/special_e2e/generation/run_gen_qwen05_server.sh"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  e2e_genrm_remote:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test,gpu]

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running the E2E test with the Generative Reward Model

				        run: |

				          ray stop --force

				          bash tests/special_e2e/run_genrm_remote.sh

				          ray stop --force

				          bash tests/special_e2e/generation/run_gen_qwen05_server.sh

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        e2e_genrm_remote

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										70

.github/workflows/e2e_grpo.yml
									
										vendored
									
												View File
											
				@ -1,70 +0,0 @@

				name: e2e_grpo

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_grpo.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - "verl/trainer/config/*.yaml"

				      - .github/workflows/e2e_grpo.yml

				      - "tests/e2e/*.sh"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  e2e_gsm8k_megatron:

				    runs-on: [self-hosted, l20-0]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install -e .[test]

				      - name: Prepare gsm8k dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py

				      - name: Running GRPO gsm8k e2e training tests with FSDP on 8 L20 GPUs (Deepseek)

				        run: |

				          ray stop --force

				          bash tests/e2e/run_deepseek_grpo.sh

				      - name: Running GRPO gsm8k e2e training tests with 3D parallelism on 8 L20 GPUs with Megatron (Deepseek)

				        run: |

				          ray stop --force

				          bash tests/e2e/run_deepseek_grpo_megatron.sh

				      - name: Running GRPO gsm8k e2e training tests with FSDP on 8 L20 GPUs (Qwen)

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_grpo.sh

				      - name: Running GRPO gsm8k e2e training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_grpo_megatron.sh

									
										97

.github/workflows/e2e_gsm8k.yml
									
										vendored
									
												View File
											
				@ -1,97 +0,0 @@

				name: e2e_gsm8k

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_gsm8k.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - "verl/trainer/config/*.yaml"

				      - .github/workflows/e2e_gsm8k.yml

				      - "tests/e2e/*.sh"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  e2e_gsm8k:

				    runs-on: [self-hosted, l20-1]

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.2

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install -e .[test,gpu]

				      - name: Prepare gsm8k dataset

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/gsm8k.py

				      - name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using function rm and save ckpt

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_gsm8k_function_rm.sh

				      - name: Running gsm8k e2e without rmpad using function rm and load ckpt from previous step

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_gsm8k_function_rm_no_rmpad.sh

				          rm -rf ~/ckpt/*

				      - name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using function rm (GRPO)

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_gsm8k_function_rm_grpo.sh

				      - name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using function rm (ReMax)

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_gsm8k_function_rm_remax.sh

				      - name: Running gsm8k e2e with rmpad using model rm

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_gsm8k_model_rm.sh

				      - name: Running gsm8k e2e without rmpad using model rm

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_gsm8k_model_rm_no_rmpad.sh

				      - name: Running gsm8k e2e with rmpad using model rm and ulysses sp=2

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_gsm8k_model_rm_ulysses.sh

				      - name: Running gsm8k e2e with rmpad using model rm and dynamic batch size

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_gsm8k_model_rm_seq_balance.sh

				      - name: Running gsm8k e2e with rmpad using model rm with Liger Kernel enabled

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh

				      - name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using customized reward function

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_gsm8k_custom_function_rm.sh

									
										63

.github/workflows/e2e_gsm8k_megatron.yml
									
										vendored
									
												View File
											
				@ -1,63 +0,0 @@

				name: e2e_gsm8k_megatron

				# latest version: Megatron-LM core_r0.11.0 https://github.com/NVIDIA/Megatron-LM/tree/core_r0.11.0

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_gsm8k_megatron.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - "verl/trainer/config/*.yaml"

				      - .github/workflows/e2e_gsm8k_megatron.yml

				      - "tests/e2e/*.sh"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  e2e_gsm8k_megatron:

				    runs-on: [self-hosted, l20-0]

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install -e .[test]

				      - name: Prepare gsm8k dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py

				      - name: Running gsm8k e2e training tests with 3D parallelism on 8 L20 GPUs with Megatron (Deepseek)

				        run: |

				          ray stop --force

				          bash tests/e2e/run_deepseek_megatron_parallelism.sh

				      - name: Running gsm8k e2e training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_megatron_parallelism.sh

									
										54

.github/workflows/e2e_gsm8k_prime.yml
									
										vendored
									
												View File
											
				@ -1,54 +0,0 @@

				name: e2e_gsm8k_prime

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_gsm8k_prime.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - "verl/trainer/config/*.yaml"

				      - .github/workflows/e2e_gsm8k_prime.yml

				      - "tests/e2e/*.sh"

				# Declare permissions just read content.

				permissions:

				  contents: read

				jobs:

				  e2e_gsm8k:

				    runs-on: [self-hosted, l20-1]

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.2

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install -e .[test,gpu]

				      - name: Prepare gsm8k dataset

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/gsm8k.py

				      - name: Running gsm8k e2e with prime alg

				        run: | 

				          ray stop --force

				          bash tests/e2e/run_qwen_gsm8k_prime.sh

									
										59

.github/workflows/e2e_lora.yml
									
										vendored
									
												View File
											
				@ -1,59 +0,0 @@

				name: e2e_lora

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_lora.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_lora.yml

				      - "tests/e2e/*.sh"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  e2e_lora:

				    runs-on: [self-hosted, l20-1]

				    timeout-minutes: 5 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer peft

				          pip3 install -e .[test]

				      - name: Prepare gsm8k dataset

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/gsm8k.py

				      - name: Running gsm8k e2e training tests with LoRA

				        run: |

				          ray stop --force

				          bash tests/sft/run_sft_qwen05_peft.sh 8 $HOME/ckpts/

				          rm -rf $HOME/ckpts/*

									
										178

.github/workflows/e2e_one_step_off_policy.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,178 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: e2e_one_step_off_policy

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  # For push, for now only anti-patterns are specified so it is more conservative

				  # and achieves higher coverage.

				  push:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      - "!**/*.md"

				      - "!**/*.sh"

				      # Other entrypoints

				      - "!examples/*trainer*"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      - "!recipe/**"

				      - "recipe/one_step_off_policy"

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      - "!**/*.md"

				      - "!**/*.sh"

				      # Other entrypoints

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Other recipes

				      - "!recipe/**"

				      # Home

				      - "recipe/one_step_off_policy"

				      # Entrypoints

				      - ".github/workflows/e2e_one_step_off_policy.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "tests/special_e2e/run_one_step_off_policy.sh"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				  TRANSFORMERS_VERSION: "4.56.2"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  # Test FSDP2 strategy

				  e2e_one_step_off_policy_fsdp2:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 10 # Increase timeout for async training

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				      ACTOR_STRATEGY: "fsdp2"

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test,gpu]

				          pip3 install transformers==$TRANSFORMERS_VERSION

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running the E2E test with one_step_off_policy algorithm (FSDP2)

				        run: |

				          ray stop --force

				          bash tests/special_e2e/run_one_step_off_policy.sh

				  # Test Megatron strategy

				  e2e_one_step_off_policy_megatron:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 10 # Increase timeout for async training

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				      ACTOR_STRATEGY: "megatron"

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test,gpu]

				          pip3 install transformers==$TRANSFORMERS_VERSION

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running the E2E test with one_step_off_policy algorithm (Megatron)

				        run: |

				          ray stop --force

				          bash tests/special_e2e/run_one_step_off_policy.sh

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        e2e_one_step_off_policy_fsdp2,

				        e2e_one_step_off_policy_megatron

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										79

.github/workflows/e2e_ppo_trainer.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,79 @@

				name: e2e_ppo_trainer

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  # For push, for now only anti-patterns are specified so it is more conservative

				  # and achieves higher coverage.

				  push:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!**/*.md"

				      - "!docker/**"

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Docs

				      - "!docs/**"

				      # Recipes

				      - "!recipe/**"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      # Entrypoints

				      - ".github/workflows/e2e_ppo_trainer.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "examples/data_preprocess/geo3k.py"

				      - "tests/special_e2e/ppo_trainer"

				      - "verl/trainer/main_ppo.py"

				      - "verl/trainer/config/ppo_trainer.yaml"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				jobs:

				  pre_commit_for_ppo:

				    runs-on: ubuntu-latest

				    strategy:

				      matrix:

				        python-version: ["3.12"]

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - name: Set up Python ${{ matrix.python-version }}

				        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0

				        with:

				          python-version: ${{ matrix.python-version }}

				      - name: Install the current repository

				        run: |

				          pip install -e .

				      - name: Set ruff --output-format=github

				        run: |

				          sed -i 's/--output-format=full/--output-format=github/' .pre-commit-config.yaml

				          git add .pre-commit-config.yaml

				      - uses: pre-commit/action@v3.0.1

				        with:

				          extra_args: "" # Overriding default "--all-files"

									
										281

.github/workflows/e2e_ppo_trainer_megatron_sglang.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,281 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: e2e_ppo_trainer_megatron_sglang

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch.

				  # For push, for now only anti-patterns are specified so it is more conservative

				  # and achieves higher coverage.

				  push:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # FSDP

				      - "!verl/workers/**/*dp_*.py"

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!docker/**"

				      # Docs

				      - "!**/*.md"

				      - "!docs/**"

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # FSDP

				      - "!verl/workers/**/*dp_*.py"

				      # Entrypoints

				      - "verl/worksers/rollout/sglang_rollout/*"

				      - ".github/workflows/e2e_ppo_trainer_megatron_sglang.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "examples/data_preprocess/geo3k.py"

				      - "tests/special_e2e/run_ppo_trainer_megatron.sh"

				      - "verl/trainer/main_ppo.py"

				      - "verl/trainer/config/ppo_megatron_trainer.yaml"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  e2e_ppo_trainer_megatron-deepseek:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)

				        run: |

				          ray stop --force

				          OPTIM_MEMORY_EFFICIENT=True ENGINE=sglang SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)

				        run: |

				          ray stop --force

				          export VLLM_USE_V1=1

				          ray start --head

				          ENGINE=sglang MODE=async RESUME_MODE=auto MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct TOTAL_TRAIN_STEPS=2 bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Test Megatron checkpoints merging function (DeepSeek Actor and Critic)

				        run: |

				          exp_name="deepseek-coder-1.3b-instruct-megatron-gsm8k-minimal"

				          python -m verl.model_merger test --backend megatron --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface

				          python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface

				      - name: Profiling GRPO GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Deepseek)

				        run: |

				          ray stop --force

				          PROFILE_ENABLE=True ENGINE=sglang ADV_ESTIMATOR=grpo USE_DYNAMIC_BSZ=False MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/run_ppo_trainer_megatron.sh

				          if [ -z "$( ls -A '/tmp/ray/session_latest/logs/nsight/' )" ]; then

				            echo "[ERROR] not found any profiling files"

				            exit 1

				          else

				            echo "[SUCCESS] profile success"

				          fi

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  e2e_ppo_trainer_megatron-different-train-infer-tp-qwen-tie-embedding:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with tie-embedding Megatron (Qwen) with train tp > infer tp

				        run: |

				          ray stop --force

				          ENGINE=sglang VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 TRAIN_TP=2 INFER_TP=1 MODEL_ID=Qwen/Qwen2.5-1.5B bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen) with  train tp < infer tp

				        run: |

				          ray stop --force

				          ENGINE=sglang VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 TRAIN_TP=1 INFER_TP=2 MODEL_ID=Qwen/Qwen2.5-1.5B bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  e2e_ppo_trainer_megatron-qwen-override-transformer-config:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				#      - name: Download Model to Use

				#        run: |

				#          huggingface-cli download Qwen/Qwen2.5-0.5B --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B

				#          export HF_HUB_OFFLINE=1

				      - name: Prepare dist_ckpt of Qwen2.5-0.5B, uneven layer distribution only supports dist_ckpt

				        run: |

				          python3 scripts/converter_hf_to_mcore.py --hf_model_path ${HOME}/models/Qwen/Qwen2.5-0.5B --output_path checkpoints/verl-test/qwen2.5-0.5b-megatron

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)

				        run: |

				          ray stop --force

				          ENGINE=sglang SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 SKIP_SAVE_HF_MODEL=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh +actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_first_pipeline_stage=8 +actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_last_pipeline_stage=4 actor_rollout_ref.actor.megatron.use_dist_checkpointing=true actor_rollout_ref.actor.megatron.dist_checkpointing_path=checkpoints/verl-test/qwen2.5-0.5b-megatron actor_rollout_ref.ref.megatron.use_dist_checkpointing=true actor_rollout_ref.ref.megatron.dist_checkpointing_path=checkpoints/verl-test/qwen2.5-0.5b-megatron critic.megatron.use_dist_checkpointing=true critic.megatron.dist_checkpointing_path=checkpoints/verl-test/qwen2.5-0.5b-megatron reward_model.megatron.use_dist_checkpointing=true reward_model.megatron.dist_checkpointing_path=checkpoints/verl-test/qwen2.5-0.5b-megatron

				          cp -r checkpoints checkpoints-dut

				          ENGINE=sglang SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Test Megatron checkpoints merging function (Qwen Actor and Critic)

				        run: |

				          exp_name="qwen2.5-0.5b-megatron-gsm8k-minimal"

				          python -m verl.model_merger test --backend megatron --tie-word-embedding --local_dir checkpoints-dut/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface

				          python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints-dut/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  e2e_ppo_trainer_megatron-deepseek-override-transformer-config:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)

				        run: |

				          ray stop --force

				          ENGINE=sglang SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct COMMON_PP=2 COMMON_VPP=null bash tests/special_e2e/run_ppo_trainer_megatron.sh +actor_rollout_ref.actor.megatron.override_transformer_config.account_for_embedding_in_pipeline_split=true +actor_rollout_ref.actor.megatron.override_transformer_config.account_for_loss_in_pipeline_split=true

				      - name: Test Megatron checkpoints merging function (DeepSeek Actor and Critic)

				        run: |

				          exp_name="deepseek-coder-1.3b-instruct-megatron-gsm8k-minimal"

				          python -m verl.model_merger test --backend megatron --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface

				          python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        e2e_ppo_trainer_megatron-deepseek,

				        e2e_ppo_trainer_megatron-different-train-infer-tp-qwen-tie-embedding,

				        e2e_ppo_trainer_megatron-qwen-override-transformer-config,

				        e2e_ppo_trainer_megatron-deepseek-override-transformer-config,

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										275

.github/workflows/e2e_ppo_trainer_megatron_sglang_2.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,275 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: e2e_ppo_trainer_megatron_sglang_2

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch.

				  # For push, for now only anti-patterns are specified so it is more conservative

				  # and achieves higher coverage.

				  push:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # FSDP

				      - "!verl/workers/**/*dp_*.py"

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!docker/**"

				      # Docs

				      - "!**/*.md"

				      - "!docs/**"

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # FSDP

				      - "!verl/workers/**/*dp_*.py"

				      # Entrypoints

				      - "verl/worksers/rollout/sglang_rollout/*"

				      - ".github/workflows/e2e_ppo_trainer_megatron_sglang.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "examples/data_preprocess/geo3k.py"

				      - "tests/special_e2e/run_ppo_trainer_megatron.sh"

				      - "verl/trainer/main_ppo.py"

				      - "verl/trainer/config/ppo_megatron_trainer.yaml"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  e2e_ppo_trainer_megatron-moe-expert-parallel:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)

				        run: |

				          ray stop --force

				          MEGATRON_CI_DISABLE_EXPANDABLE_SEGMENTS=1 \

				          ADV_ESTIMATOR=grpo USE_DUMMY_MODEL=True DUMMY_MODEL_CONFIG_PATH=tests/special_e2e/ppo_trainer/expert_parallel/qwen2moe_minimal.json \

				          PPO_MAX_TOKEN_LEN=512 FWD_MAX_TOKEN_LEN=512 \

				          MAX_PROMPT_LENGTH=256 MAX_RESPONSE_LENGTH=256 \

				          MODEL_ID=Qwen/Qwen1.5-MoE-A2.7B-Chat \

				          ENGINE=sglang COMMON_PP=2 COMMON_VPP=null COMMON_CP=1 COMMON_TP=4 COMMON_EP=4 COMMON_ETP=1 INFER_TP=8 \

				          USE_DIST_CKPT=True ALL_OFFLOAD=True SKIP_SAVE_HF_MODEL=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  e2e_ppo_trainer_megatron-qwen2_5vl-3b:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				      - name: Prepare Geo3k dataset

				        run: |

				          python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k/

				      - name: Prepare dist_ckpt of Qwen2.5-VL-3B, only supports dist_ckpt

				        run: |

				          python3 scripts/converter_hf_to_mcore.py --hf_model_path ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct --output_path checkpoints/verl-test/qwen2.5-vl-3b-megatron

				      - name: Running Geo3k E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)

				        run: |

				          ray stop --force

				          ENGINE=sglang TRAIN_FILES=${HOME}/data/geo3k/train.parquet VAL_FILES=${HOME}/data/geo3k/test.parquet MAX_PROMPT_LENGTH=1024 MAX_RESPONSE_LENGTH=2048  MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct ADV_ESTIMATOR=grpo USE_DYNAMIC_BSZ=False SKIP_SAVE_HF_MODEL=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 COMMON_TP=2 USE_DIST_CKPT=true DIST_CKPT_PATH=checkpoints/verl-test/qwen2.5-vl-3b-megatron bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  e2e_ppo_trainer_sglang:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install -e .[test,gpu,sglang]

				      - name: Prepare gsm8k dataset

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm and save ckpt

				        run: |

				          ray stop --force

				          ENGINE=sglang bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Running GSM8K E2E training tests on sglang async

				        run: |

				          ray stop --force

				          TOTAL_TRAIN_STEPS=2 ENGINE=sglang ROLLOUT_MODE=async bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				  e2e_ppo_trainer_sglang_vlm:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install -e .[test,geo,gpu,sglang] --no-deps

				      # Geo3k

				      - name: Prepare GEO3K dataset

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k/

				      - name: Running GEO3K VLM E2E training tests on 8 L20 GPUs with rmpad using function rm

				        run: |

				          ray stop --force

				          TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \

				            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \

				            MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \

				            ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \

				            ENGINE=sglang GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \

				            ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \

				            bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Running GEO3K VLM E2E with rmpad using torch fused kernel (Qwen2.5-VL)

				        run: |

				          ray stop --force

				          FUSED_KERNELS=True TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \

				            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \

				            MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \

				            ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \

				            ENGINE=sglang GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \

				            ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \

				            bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Running GEO3K VLM E2E with rmpad using triton fused kernel (Qwen2.5-VL)

				        run: |

				          ray stop --force

				          FUSED_KERNELS=True FUSED_KERNEL_BACKEND=triton \

				            TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \

				            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \

				            MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \

				            ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \

				            ENGINE=sglang GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \

				            ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \

				            bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        e2e_ppo_trainer_megatron-moe-expert-parallel,

				        e2e_ppo_trainer_megatron-qwen2_5vl-3b,

				        e2e_ppo_trainer_sglang,

				        e2e_ppo_trainer_sglang_vlm

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										292

.github/workflows/e2e_ppo_trainer_megatron_vllm.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,292 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: e2e_ppo_trainer_megatron_vllm

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch.

				  # For push, for now only anti-patterns are specified so it is more conservative

				  # and achieves higher coverage.

				  push:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # FSDP

				      - "!verl/workers/**/*dp_*.py"

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!docker/**"

				      # Docs

				      - "!**/*.md"

				      - "!docs/**"

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # FSDP

				      - "!verl/workers/**/*dp_*.py"

				      # Entrypoints

				      - ".github/workflows/e2e_ppo_trainer_megatron_vllm.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "examples/data_preprocess/geo3k.py"

				      - "tests/special_e2e/run_ppo_trainer_megatron.sh"

				      - "verl/trainer/main_ppo.py"

				      - "verl/trainer/config/ppo_megatron_trainer.yaml"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				  TRANSFORMERS_VERSION: "4.56.2"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  e2e_ppo_trainer_megatron-deepseek:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				          pip3 install math-verify transformers==$TRANSFORMERS_VERSION

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron, use mbridge e2e to pre-load and save (Deepseek)

				        run: |

				          ray stop --force

				          ALL_OFFLOAD=True SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 USE_MBRIDGE=True USE_DIST_CKPT=False \

				          bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron, use mbridge e2e to pre-load and save (Deepseek)

				        run: |

				          ray stop --force

				          RESUME_MODE=auto MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct TOTAL_TRAIN_STEPS=2 SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 USE_MBRIDGE=True USE_DIST_CKPT=False \

				          bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)

				        run: |

				          ray stop --force

				          export VLLM_USE_V1=1

				          ray start --head

				          MODE=async USE_FUSED_KERNELS=True MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct TOTAL_TRAIN_STEPS=2 SAVE_FREQ=2 bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Test Megatron checkpoints merging function (DeepSeek Actor and Critic)

				        run: |

				          exp_name="deepseek-coder-1.3b-instruct-megatron-gsm8k-minimal"

				          python -m verl.model_merger test --backend megatron --local_dir checkpoints/verl-test/${exp_name}/global_step_2/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_2/actor/huggingface

				          python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_2/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_2/critic/huggingface

				      - name: Test Megatron distributed checkpoints merging function (DeepSeek)

				        run: |

				          exp_name="deepseek-coder-1.3b-instruct-megatron-gsm8k-minimal"

				          torchrun --nproc_per_node 4 --nnodes 1  -m verl.model_merger merge --backend megatron --local_dir checkpoints/verl-test/${exp_name}/global_step_2/actor --target_dir checkpoints/verl-test/${exp_name}/global_step_2/actor/hf_model

				      - name: Running GRPO GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Deepseek)

				        run: |

				          ray stop --force

				          ADV_ESTIMATOR=grpo USE_DYNAMIC_BSZ=False MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  e2e_ppo_trainer_megatron-qwen3:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				          pip3 install math-verify transformers==$TRANSFORMERS_VERSION

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen3) with validation and saving

				        run: |

				          ray stop --force

				          ALL_OFFLOAD=True VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen3) testing learning rate scheduler

				        run: |

				          ray stop --force

				          LR_WARMUP_STEPS=1 TOTAL_TRAIN_STEPS=2 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Test Megatron checkpoints merging function (Qwen3 Actor and Critic)

				        run: |

				          exp_name="qwen3-0.6b-megatron-gsm8k-minimal"

				          python -m verl.model_merger test --backend megatron --tie-word-embedding --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface

				          python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  e2e_ppo_trainer_megatron-different-train-infer-tp-qwen-tie-embedding:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				          pip3 install math-verify transformers==$TRANSFORMERS_VERSION

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with tie-embedding Megatron (Qwen) with train tp > infer tp

				        run: |

				          ray stop --force

				          VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 TRAIN_TP=2 INFER_TP=1 MODEL_ID=Qwen/Qwen2.5-1.5B bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen) with  train tp < infer tp

				        run: |

				          ray stop --force

				          VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 TRAIN_TP=1 INFER_TP=2 ALL_OFFLOAD=True MODEL_ID=Qwen/Qwen2.5-1.5B bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  e2e_ppo_trainer_megatron-qwen-override-transformer-config:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				          pip3 install math-verify transformers==$TRANSFORMERS_VERSION

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				#      - name: Download Model to Use

				#        run: |

				#          huggingface-cli download Qwen/Qwen2.5-0.5B --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B

				#          export HF_HUB_OFFLINE=1

				      - name: Prepare dist_ckpt of Qwen2.5-0.5B, uneven layer distribution only supports dist_ckpt

				        run: |

				          python3 scripts/converter_hf_to_mcore.py --hf_model_path ${HOME}/models/Qwen/Qwen2.5-0.5B --output_path checkpoints/verl-test/qwen2.5-0.5b-megatron

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)

				        run: |

				          ray stop --force

				          SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 SKIP_SAVE_HF_MODEL=1 USE_DIST_CKPT=True DIST_CKPT_PATH=checkpoints/verl-test/qwen2.5-0.5b-megatron \

				          bash tests/special_e2e/run_ppo_trainer_megatron.sh +actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_first_pipeline_stage=8 +actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_last_pipeline_stage=4

				          cp -r checkpoints checkpoints-dut

				          SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: Test Megatron checkpoints merging function (Qwen Actor and Critic)

				        run: |

				          exp_name="qwen2.5-0.5b-megatron-gsm8k-minimal"

				          python -m verl.model_merger test --backend megatron --tie-word-embedding --local_dir checkpoints-dut/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface

				          python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints-dut/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        e2e_ppo_trainer_megatron-deepseek,

				        e2e_ppo_trainer_megatron-qwen3,

				        e2e_ppo_trainer_megatron-different-train-infer-tp-qwen-tie-embedding,

				        e2e_ppo_trainer_megatron-qwen-override-transformer-config,

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										420

.github/workflows/e2e_ppo_trainer_megatron_vllm_2.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,420 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: e2e_ppo_trainer_megatron_vllm_2

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch.

				  # For push, for now only anti-patterns are specified so it is more conservative

				  # and achieves higher coverage.

				  push:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # FSDP

				      - "!verl/workers/**/*dp_*.py"

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!docker/**"

				      # Docs

				      - "!**/*.md"

				      - "!docs/**"

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # FSDP

				      - "!verl/workers/**/*dp_*.py"

				      # Entrypoints

				      - ".github/workflows/e2e_ppo_trainer_megatron_vllm.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "examples/data_preprocess/geo3k.py"

				      - "tests/special_e2e/run_ppo_trainer_megatron.sh"

				      - "verl/trainer/main_ppo.py"

				      - "verl/trainer/config/ppo_megatron_trainer.yaml"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				  TRANSFORMERS_VERSION: "4.56.2"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  e2e_ppo_trainer_megatron-deepseek-override-transformer-config:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				          pip3 install transformers==$TRANSFORMERS_VERSION

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)

				        run: |

				          ray stop --force

				          SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct COMMON_PP=2 COMMON_VPP=null bash tests/special_e2e/run_ppo_trainer_megatron.sh +actor_rollout_ref.actor.megatron.override_transformer_config.account_for_embedding_in_pipeline_split=true +actor_rollout_ref.actor.megatron.override_transformer_config.account_for_loss_in_pipeline_split=true

				      - name: Test Megatron checkpoints merging function (DeepSeek Actor and Critic)

				        run: |

				          exp_name="deepseek-coder-1.3b-instruct-megatron-gsm8k-minimal"

				          python -m verl.model_merger test --backend megatron --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface

				          python -m verl.model_merger test --backend megatron --is-value-model --local_dir checkpoints/verl-test/${exp_name}/global_step_1/critic --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/critic/huggingface

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  e2e_ppo_trainer_megatron-moe-expert-parallel:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				          pip3 install mbridge

				          pip3 install transformers==$TRANSFORMERS_VERSION

				      - name: Prepare GSM8K dataset

				        run: |

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)

				        run: |

				          ray stop --force

				          ADV_ESTIMATOR=grpo USE_DUMMY_MODEL=True DUMMY_MODEL_CONFIG_PATH=tests/special_e2e/ppo_trainer/expert_parallel/qwen2moe_minimal.json \

				          PPO_MAX_TOKEN_LEN=512 FWD_MAX_TOKEN_LEN=512 \

				          MAX_PROMPT_LENGTH=256 MAX_RESPONSE_LENGTH=256 \

				          MODEL_ID=Qwen/Qwen1.5-MoE-A2.7B-Chat USE_MBRIDGE=True \

				          COMMON_PP=2 COMMON_VPP=null COMMON_CP=1 COMMON_TP=4 COMMON_EP=4 COMMON_ETP=1 INFER_TP=8 \

				          USE_DIST_CKPT=True ALL_OFFLOAD=True SKIP_SAVE_HF_MODEL=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  e2e_ppo_trainer_megatron-qwen2_5vl-3b:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				          pip3 install transformers==$TRANSFORMERS_VERSION

				      - name: Prepare Geo3k dataset

				        run: |

				          python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k/

				      - name: Prepare dist_ckpt of Qwen2.5-VL-3B, only supports dist_ckpt

				        run: |

				          python3 scripts/converter_hf_to_mcore.py --hf_model_path ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct --output_path checkpoints/verl-test/qwen2.5-vl-3b-megatron

				      - name: Running Geo3k E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)

				        run: |

				          ray stop --force

				          TRAIN_FILES=${HOME}/data/geo3k/train.parquet VAL_FILES=${HOME}/data/geo3k/test.parquet \

				          MAX_PROMPT_LENGTH=1024 MAX_RESPONSE_LENGTH=2048  MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct ADV_ESTIMATOR=grpo \

				          USE_DYNAMIC_BSZ=False USE_FUSED_KERNELS=True SKIP_SAVE_HF_MODEL=1 \

				          COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 COMMON_TP=2 USE_DIST_CKPT=true \

				          DIST_CKPT_PATH=checkpoints/verl-test/qwen2.5-vl-3b-megatron bash tests/special_e2e/run_ppo_trainer_megatron.sh

				      - name: clean up

				        run: |

				          rm -rf checkpoints

				  e2e_ppo_trainer_vllm:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test,vllm]

				          pip3 install transformers==$TRANSFORMERS_VERSION

				      - name: Prepare GSM8K dataset

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      # HF sanity

				#      - name: Running GSM8K E2E training tests on 1 L20 GPU with hf for sanity

				#        run: |

				#          ray stop --force

				#          bash tests/special_e2e/ppo_trainer/run_single_gpu.sh

				#      # HF sanity

				#      - name: Running GSM8K E2E training tests on 1 L20 GPU with engine interface for sanity.

				#        run: |

				#          ray stop --force

				#          bash tests/special_e2e/ppo_trainer/run_single_gpu_with_engine.sh

				      # Function RM

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with validation and saving (FSDP_SIZE=8)

				        run: |

				          ray stop --force

				          VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp-size8" bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm after resuming

				        run: |

				          ray stop --force

				          RESUME_MODE=auto VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp-size8" bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Test merging FSDP checkpoints (Qwen Actor)

				        run: |

				          exp_name="qwen2.5-0.5b-function-reward-minimal-fsdp-size8"

				          python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with validation and saving (DDP_SIZE=2, FSDP_SIZE=4)

				        run: |

				          ray stop --force

				          VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True FSDP_SIZE=4 VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-ddp-size2-fsdp-size4" bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Test merging DDP+FSDP checkpoints (Qwen Actor)

				        run: |

				          exp_name="qwen2.5-0.5b-function-reward-minimal-ddp-size2-fsdp-size4"

				          python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with validation and saving (FSDP2)

				        run: |

				          ray stop --force

				          VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp2-size8" STRATEGY=fsdp2 bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Test merging FSDP2 checkpoints (Qwen Actor)

				        run: |

				          exp_name="qwen2.5-0.5b-function-reward-minimal-fsdp2-size8"

				          python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface

				      - name: Running GSM8K E2E without rmpad using function rm

				        run: |

				          ray stop --force

				          RM_PAD=False bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm (GRPO)

				        run: |

				          ray stop --force

				          ADV_ESTIMATOR=grpo USE_KL=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm (ReMax)

				        run: |

				          ray stop --force

				          ADV_ESTIMATOR=remax USE_KL=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using customized reward function

				        run: |

				          ray stop --force

				          CUSTOM_REWARD_FN=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with in-reward kl and kl loss

				        run: |

				          ray stop --force

				          USE_KL=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      # LoRA tests

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with grpo lora using function rm with use_shm

				        run: |

				          ray stop --force

				          ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with grpo lora using function rm with use_shm and layered_summon

				        run: |

				          ray stop --force

				          ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors LAYERED_SUMMON=True TOTAL_TRAIN_STEPS=1 SAVE_FREQ=1 FSDP_SIZE=4 VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal" bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Test GRPO LoRA checkpoints merging function

				        run: |

				          export EXP_NAME="qwen2.5-0.5b-function-reward-minimal"

				          ls checkpoints/verl-test/${EXP_NAME}/global_step_1/actor

				          cat checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/huggingface/config.json

				          python3 -m verl.model_merger merge --backend fsdp --local_dir checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/ --target_dir checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/huggingface

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with grpo lora using function rm with use_shm and layered_summon with fsdp2

				        run: |

				          ray stop --force

				          ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors LAYERED_SUMMON=True STRATEGY=fsdp2 bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      # Model RM

				      - name: Running GRPO GSM8K E2E training tests with FSDP on 8 L20 GPUs (DeepSeek)

				        run: |

				          ray stop --force

				          MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Running GSM8K E2E with rmpad using model rm

				        run: |

				          ray stop --force

				          bash tests/special_e2e/ppo_trainer/run_model_reward.sh

				      - name: Running GSM8K E2E without rmpad using model rm

				        run: |

				          ray stop --force

				          RM_PAD=False bash tests/special_e2e/ppo_trainer/run_model_reward.sh

				      - name: Running GSM8K E2E with rmpad using model rm and ulysses sp=2

				        run: |

				          ray stop --force

				          SP_SIZE=2 bash tests/special_e2e/ppo_trainer/run_model_reward.sh

				      - name: Running GSM8K E2E with rmpad using model rm and dynamic batch size

				        run: |

				          ray stop --force

				          SEQ_BALANCE=True bash tests/special_e2e/ppo_trainer/run_model_reward.sh

				      - name: Running GSM8K E2E with rmpad using model rm with Liger Kernel enabled

				        run: |

				          ray stop --force

				          LIGER=True bash tests/special_e2e/ppo_trainer/run_model_reward.sh

				      - name: Running GSM8K E2E with rmpad using model rm with Fused Kernel enabled

				        run: |

				          ray stop --force

				          FUSED_KERNELS=True bash tests/special_e2e/ppo_trainer/run_model_reward.sh

				      - name: Running GSM8K E2E with rmpad using model rm with Fused Kernel enabled

				        run: |

				          ray stop --force

				          FUSED_KERNEL=True FUSED_KERNEL_BACKEND=triton bash tests/special_e2e/ppo_trainer/run_model_reward.sh

				      - name: Running GSM8K E2E training tests on vllm async

				        run: |

				          ray stop --force

				          export VLLM_USE_V1=1

				          ray start --head

				          TOTAL_TRAIN_STEPS=2 ENGINE=vllm ROLLOUT_MODE=async bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				  e2e_ppo_trainer_vllm_vlm:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test,gpu,vllm,geo,trl]

				          pip3 install transformers==$TRANSFORMERS_VERSION

				      # Geo3k

				      - name: Prepare GEO3K dataset

				        run: |

				          python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k/

				      - name: Running GEO3K VLM GRPO E2E training tests on 8 L20 GPUs with rmpad using function rm

				        run: |

				          ray stop --force

				          TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \

				            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \

				            MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \

				            ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \

				            SP_SIZE=2 \

				            bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Running GEO3K VLM PPO E2E training tests on 8 L20 GPUs with rmpad using function rm

				        run: |

				          ray stop --force

				          TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \

				            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \

				            MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \

				            ADV_ESTIMATOR=gae RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \

				            SP_SIZE=2 \

				            bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				      - name: Running GEO3K VLM GRPO E2E lora training tests on 8 L20 GPUs with rmpad using function rm

				        run: |

				          ray stop --force

				          TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \

				            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \

				            MODEL_ID=Qwen/Qwen2-VL-2B-Instruct \

				            ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \

				            SP_SIZE=2 \

				            LORA_RANK=32 LORA_EXCLUDE=".*visual.*" \

				            bash tests/special_e2e/ppo_trainer/run_function_reward.sh

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        e2e_ppo_trainer_megatron-deepseek-override-transformer-config,

				        e2e_ppo_trainer_megatron-moe-expert-parallel,

				        e2e_ppo_trainer_megatron-qwen2_5vl-3b,

				        e2e_ppo_trainer_vllm,

				        e2e_ppo_trainer_vllm_vlm

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										144

.github/workflows/e2e_sft.yml
									
										vendored
									
												View File
												
				@ -1,3 +1,34 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests 

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: e2e_sft

				on:

				@ -6,18 +37,28 @@ on:

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_sft.yml

				      - v0.*

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				      - v0.*

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_sft.yml

				      - "tests/e2e/*.sh"

				      # Other entrypoints

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      # Entrypoints

				      - ".github/workflows/e2e_sft.yml"

				      - "examples/data_preprocess/gsm8k.py"

				      - "tests/special_e2e/sft"

				      - "verl/trainer/fsdp_sft_trainer.py"

				      - "verl/trainer/config/sft_trainer.yaml"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				@ -25,47 +66,96 @@ concurrency:

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				jobs:

				  setup:

				      if: github.repository_owner == 'volcengine'

				      runs-on: ubuntu-latest

				      outputs:

				        runner-label: ${{ steps.create-runner.outputs.runner-label }}

				        mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				      steps:

				        - uses: actions/checkout@v4

				        - id: create-runner

				          uses: volcengine/vemlp-github-runner@v1 

				          with:

				            mode: "create"

				            faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				            mlp-image: "${{ env.IMAGE }}"

				  e2e_sft:

				    runs-on: [self-hosted, l20-1]

				    timeout-minutes: 5 # Increase this timeout value as needed

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 30 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3

				      options: --gpus all --shm-size=10g

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install -e .[test,gpu]

				          pip3 install peft

				          pip3 install --no-deps -e .[test,gpu]

				      - name: Prepare gsm8k dataset

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/gsm8k.py

				      - name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using function rm

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm

				        run: |

				          ray stop --force

				          bash tests/sft/run_sft.sh

				      - name: Running gsm8k e2e training tests on 8 L20 GPUs with sequence parallism

				          bash tests/special_e2e/sft/run_sft.sh

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs w/o rmpad using function rm

				        run: |

				          ray stop --force

				          bash examples/sft/gsm8k/run_qwen_05_sp2.sh 8 $HOME/ckpts/

				          RM_PAD=False bash tests/special_e2e/sft/run_sft.sh

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with sequence parallism

				        run: |

				          ray stop --force

				          SP_SIZE=2 bash tests/special_e2e/sft/run_sft.sh

				      - name: Check loss difference between sequence parallel vs. default implementation

				        run: |

				          ray stop --force

				          bash tests/sft/run_sft_sp_loss_match.sh

				      - name: Running gsm8k e2e training tests on 8 L20 GPUs with sequence parallism and liger

				          ENTRYPOINT="tests/special_e2e/sft/test_sp_loss_match.py" SP_SIZE=2 bash tests/special_e2e/sft/run_sft.sh

				      - name: Running GSM8K E2E training tests on 8 L20 GPUs with sequence parallism and liger

				        run: |

				          ray stop --force

				          bash tests/sft/run_sft_qwen05_sp2_liger.sh 8 $HOME/ckpts/

				          rm -rf $HOME/ckpts/

				          SP_SIZE=2 LIGER=True bash tests/special_e2e/sft/run_sft.sh

				      - name: Running GSM8K E2E training tests with LoRA

				        run: |

				          ray stop --force

				          LORA_RANK=32 bash tests/special_e2e/sft/run_sft.sh

				      - name: Run GSM8K E2E training and resume tests resuming from the checkpoint manager

				        run: |

				          ray stop --force

				          LORA_RANK=32 RESUME_MODE=auto TOTAL_TRAIN_STEP=2 bash tests/special_e2e/sft/run_sft.sh

				      # TODO: multiturn

				      - name: Prepare gsm8k dataset

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/gsm8k_multiturn_sft.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Running GSM8K E2E training tests with multiturn and various configs and compare results

				        run: |

				          bash tests/special_e2e/sft/test_sft_engine_all.sh

				  cleanup:

				    runs-on: ubuntu-latest

				    needs: [setup, e2e_sft]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										60

.github/workflows/e2e_sglang_gsm8k.yml
									
										vendored
									
												View File
											
				@ -1,60 +0,0 @@

				name: e2e_sglang_gsm8k

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_sglang_gsm8k.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - "verl/trainer/config/*.yaml"

				      - .github/workflows/e2e_sglang_gsm8k.yml

				      - "tests/e2e/*.sh"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  e2e_sglang_gsm8k:

				    runs-on: [self-hosted, l20-1]

				    timeout-minutes: 40 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: ocss884/verl-sglang:ngc-th2.5.1-cu126-sglang0.4.3.post3

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install -e .[test,gpu,sglang] --no-deps

				      - name: Prepare gsm8k dataset

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/gsm8k.py

				      - name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using function rm and save ckpt

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen_gsm8k_function_rm.sh sglang

									
										54

.github/workflows/e2e_vlm_geo3k.yml
									
										vendored
									
												View File
											
				@ -1,54 +0,0 @@

				name: e2e_vlm_geo3k

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_vlm_geo3k.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/e2e_vlm_geo3k.yml

				      - "tests/e2e/*.sh"

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  e2e_vlm_geo3k:

				    runs-on: [self-hosted, l20-1]

				    timeout-minutes: 10 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.2

				      options: --gpus all --shm-size=40g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install -e .[test,geo,vllm]

				          python -c "import transformers; print(transformers.__version__)"

				      - name: Prepare geo3k dataset

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/geo3k.py

				      - name: Running geo3k vlm e2e training tests on 8 L20 GPUs with rmpad using function rm

				        run: |

				          ray stop --force

				          bash tests/e2e/run_qwen2vl_geo3k_function_rm.sh

									
										113

.github/workflows/gpu_unit_tests.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,113 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests 

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: GPU unit tests

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.4.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/gpu_unit_tests.yml

				  pull_request:

				    branches:

				      - main

				      - v0.4.x

				    paths:

				      # The order that you define paths patterns matters:

				      # A matching negative pattern (prefixed with !) after a positive match will exclude the path.

				      # A matching positive pattern after a negative match will include the path again.

				      - "**/*.py"

				      # Other entrypoints

				      - "!examples/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      - "!recipe/**"

				      # Entrypoints

				      - .github/workflows/gpu_unit_tests.yml

				      - "tests/**test_*.py"

				      # Ignore CPU tests

				      - "!tests/*_on_cpu.py"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  gpu_unit_tests:

				    if: github.repository_owner == 'volcengine'

				    runs-on: [L20x8]

				    timeout-minutes: 60 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install --no-deps -e .[test]

				          pip3 install --upgrade "ray>=2.40.0"

				          pip3 install cupy-cuda12x

				      - name: Download Model to Use

				        run: |

				          huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct

				          huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct

				          export HF_HUB_OFFLINE=1

				        # Disable requests to avoid network errors

				      - name: Run all GPU unit tests

				        run: |

				          pytest -s -x --ignore-glob="*test_special_*.py" --ignore-glob='*on_cpu.py' --ignore-glob="*test_vllm*" --ignore-glob="*_sglang*" --ignore-glob="*_hf_rollout*" --ignore-glob="tests/models/" --ignore-glob='tests/special*' --ignore-glob="tests/experimental" --ignore-glob="tests/workers/reward_model" tests/

				      - name: Testing LinearCrossEntropyTP Correctness, Computation Time and Memory Consumption

				        run: |

				          LOW_MEMORY=True torchrun --standalone --nnodes=1 --nproc-per-node=8 tests/utils/test_special_linear_cross_entropy_tp.py

				      - name: Testing FSDP2 actor functionality

				        run: |

				          torchrun --standalone --nnodes=1 --nproc-per-node=2 tests/workers/actor/test_special_dp_actor.py

				      - name: Testing FSDP2 critic functionality

				        run: |

				          torchrun --standalone --nnodes=1 --nproc-per-node=2 tests/workers/critic/test_special_dp_critic.py

									
										234

.github/workflows/model.yml
									
										vendored
									
												View File
												
				@ -1,4 +1,36 @@

				name: model_rmpad

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests 

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				# name: Check PR Title

				name: model

				on:

				  # Trigger the workflow on push or pull request,

				@ -6,76 +38,194 @@ on:

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/model.yml

				      - v0.*

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				      - v0.*

				    paths:

				      - "**/*.py"

				      - .github/workflows/model.yml

				      - "verl/**/*.py"

				      # Entrypoints

				      - ".github/workflows/model.yml"

				      - "tests/special_distributed/test_fsdp_ckpt.py"

				      - "tests/special_distributed/test_mcore_config_converter.py"

				      - "tests/special_distributed/test_tensor_dict.py"

				      - "tests/models/**"

				      - "tests/special_distributed/run_all.sh"

				# Declare permissions just read content.

				permissions: 

				permissions:

				  contents: read

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  model_rmpad:

				    runs-on: [self-hosted, l20-1]

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 20 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3

				      options: --gpus all --shm-size=10g

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository and upgrade to latest transformers/flash_attn

				          fetch-depth: 0

				      - name: Install the current repository and upgrade to latest transformers(4.54.0)/flash_attn, transformers 4.55.0 has strange behavior with model backward

				        run: |

				          pip3 install -e .[test]

				          pip3 install --no-deps -e .[test]

				          pip3 install --upgrade transformers

				      - name: Running rmpad model tests on 8 L20 GPUs + flash_attn 2.5.8

				        run: |

				          pytest -s tests/model/test_transformer.py

				          pytest -s tests/models/test_transformer.py

				      - name: Running rmpad model tests on 8 L20 GPUs + latest flash_attn

				        run: |

				          pip3 install --upgrade flash_attn --no-build-isolation

				          pytest -s tests/model/test_transformer.py

				          pytest -s tests/models/test_transformer.py

				      - name: Running FSDP rmpad model tests on 8 L20 GPUs + latest flash_attn

				        run: |

				          pip3 install hf_transfer

				          torchrun --nproc_per_node=8 tests/checkpoint/test_fsdp_ckpt.py

				          STRATEGY=fsdp torchrun --nproc_per_node=8 tests/special_distributed/test_fsdp_ckpt.py

				      - name: Running transformers ulysses tests on 8 L20 GPUs + latest transformers

				        run: |

				          torchrun --nproc_per_node=8 -m pytest tests/model/test_transformers_ulysses.py

				      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.49.0

				          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py

				      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.54.1

				        run: |

				          pip3 install transformers==4.49.0

				          torchrun --nproc_per_node=8 -m pytest tests/model/test_transformers_ulysses.py

				      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.48.0

				          pip3 install transformers==4.54.1

				          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py

				      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.53.2

				        run: |

				          pip3 install transformers==4.48.0

				          torchrun --nproc_per_node=8 -m pytest tests/model/test_transformers_ulysses.py

				      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.47.0

				          pip3 install transformers==4.53.2

				          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py

				      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.52.0

				        run: |

				          pip3 install transformers==4.47.0

				          torchrun --nproc_per_node=8 -m pytest tests/model/test_transformers_ulysses.py

				      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.46.0

				        run: |

				          pip3 install transformers==4.46.0

				          torchrun --nproc_per_node=8 -m pytest tests/model/test_transformers_ulysses.py

				      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.45.0

				        run: |

				          pip3 install transformers==4.45.0

				          torchrun --nproc_per_node=8 -m pytest tests/model/test_transformers_ulysses.py

				          pip3 install transformers==4.52.0

				          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py

				      - name: Run distributed test

				        run: |

				          bash tests/distributed/run_all.sh

				          bash tests/special_distributed/run_all.sh

				  # TODO: Move this back to model_rmpad once FSDP2 is stable.

				  # NOTE: List as an independent job to make rerun easier.

				  model_rmpad_fsdp2_unstable:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 20 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository and upgrade to latest transformers/flash_attn

				        run: |

				          pip3 install --no-deps -e .[test]

				          pip3 install --upgrade transformers

				      - name: Running FSDP2 rmpad model tests on 8 L20 GPUs + latest flash_attn

				        run: |

				          STRATEGY=fsdp2 torchrun --nproc_per_node=8 tests/special_distributed/test_fsdp_ckpt.py

				  mcore_config_converter:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 20 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				          pip install --upgrade "huggingface_hub[cli]"

				#      - name: Download model config files

				#        run: |

				#          hf download Qwen/Qwen2.5-7B config.json --local-dir $HOME/configs/Qwen/Qwen2.5-7B

				#          hf download Qwen/Qwen3-8B config.json --local-dir $HOME/configs/Qwen/Qwen3-8B

				#          hf download deepseek-ai/deepseek-coder-1.3b-instruct config.json --local-dir $HOME/configs/deepseek-ai/deepseek-coder-1.3b-instruct

				#          hf download Qwen/Qwen2-57B-A14B config.json --local-dir $HOME/configs/Qwen/Qwen2-57B-A14B

				#          hf download Qwen/Qwen3-30B-A3B config.json --local-dir $HOME/configs/Qwen/Qwen3-30B-A3B

				#          hf download deepseek-ai/DeepSeek-V3-Base config.json --local-dir $HOME/configs/deepseek-ai/DeepSeek-V3-Base

				      - name: Running mcore config converter tests on 8 L20 GPUs

				        run: |

				          torchrun --nproc_per_node=8 tests/special_distributed/test_mcore_config_converter.py

				  model_engine:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 20 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install --no-deps -e .[test]

				          pip3 install --upgrade tensordict transformers

				          pip install --upgrade "huggingface_hub[cli]"

				      - name: Download model config files

				        run: |

				          hf download Qwen/Qwen2.5-0.5B-Instruct --local-dir $HOME/models/Qwen/Qwen2.5-0.5B-Instruct

				      - name: Running mcore engine tests on 8 L20 GPUs

				        run: |

				          ray stop --force

				          pytest -s -x tests/models/test_engine.py

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        model_rmpad,

				        model_rmpad_fsdp2_unstable,

				        mcore_config_converter,

				        model_engine

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										40

.github/workflows/pre-commit.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,40 @@

				# c.f. https://github.com/pre-commit/action?tab=readme-ov-file#using-this-action

				name: pre-commit

				# No need to avoid / cancel lightweight pre-commit jobs

				on:

				  schedule:

				    - cron: "0 0 * * 0"

				  pull_request:

				  push:

				    branches:

				      - main

				      - v0.*

				  # Allow manual triggering

				  workflow_dispatch:

				# Declare permissions just read content.

				permissions:

				  contents: read

				jobs:

				  pre-commit:

				    runs-on: ubuntu-latest

				    strategy:

				      matrix:

				        python-version: ["3.12"]

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - name: Set up Python ${{ matrix.python-version }}

				        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0

				        with:

				          python-version: ${{ matrix.python-version }}

				      - name: Install the current repository

				        run: |

				          pip install -e .

				      - name: Set ruff --output-format=github

				        run: |

				          sed -i 's/--output-format=full/--output-format=github/' .pre-commit-config.yaml

				          git add .pre-commit-config.yaml

				      # Check "--all-files" by default

				      - uses: pre-commit/action@v3.0.1

									
										54

.github/workflows/ray_test.yml
									
										vendored
									
												View File
											
				@ -1,54 +0,0 @@

				name: ray

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/ray_test.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/ray_test.yml

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  ray:

				    runs-on: [self-hosted, l20-0]

				    timeout-minutes: 5 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip install hf_transfer

				          pip install -e .[test]

				          pip install --upgrade "ray>=2.40.0"

				      - name: Running ray tests that need 8 GPUs

				        run: |

				          cd tests/ray

				          pytest -s -x --ignore=test_check_worker_alive.py --ignore=test_rvdz.py .

									
										131

.github/workflows/reward_model.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,131 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests 

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				# name: Check PR Title

				name: reward_model

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.*

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "verl/**/*.py"

				      # Entrypoints

				      - ".github/workflows/reward_model.yml"

				      - "tests/workers/reward_model/**"

				# Declare permissions just read content.

				permissions:

				  contents: read

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-sglang0.4.10.post2-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				  TRANSFORMERS_VERSION: "4.56.2"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  reward_model:

				    needs: setup

				    runs-on: [ "${{ needs.setup.outputs.runner-label || 'L20x8' }}" ]

				    timeout-minutes: 20 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				      SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK: "True"

				      NCCL_SHM_DISABLE: "1"

				      NCCL_P2P_DISABLE: "1"

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install -e .[test]

				#      - name: Download model config files

				#        run: |

				#          hf download Skywork/Skywork-Reward-V2-Llama-3.2-1B --local-dir $HOME/models/Skywork/Skywork-Reward-V2-Llama-3.2-1B

				#          hf download verl-team/GenRM-CI-Test-1.5B --local-dir $HOME/models/verl-team/GenRM-CI-Test-1.5B

				      - name: Running discriminative reward model tests on 8 L20 GPUs

				        run: |

				          unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY

				          pytest -s -x tests/workers/reward_model/test_discriminative_reward_model.py

				      - name: Running generative reward model tests on 8 L20 GPUs

				        run: |

				          unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY

				          pytest -s -x tests/workers/reward_model/test_generative_reward_model.py

				  cleanup:

				    runs-on: ubuntu-latest

				    needs:

				      [

				        setup,

				        reward_model

				      ]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										54

.github/workflows/sandbox.yml
									
										vendored
									
												View File
											
				@ -1,54 +0,0 @@

				name: sandbox

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/sandbox.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/sandbox.yml

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  sandbox:

				    runs-on: [self-hosted, l20-0]

				    timeout-minutes: 3 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3

				      options: --gpus all --shm-size=10g

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install -e .[test,prime]

				          pip3 install vllm==0.5.4

				      - name: Running sandbox tests on 8 L20 GPUs

				        run: |

				          cd tests/sandbox

				          pytest -s -x .

									
										77

.github/workflows/sanity.yml
									
										vendored
									
												View File
												
				@ -1,3 +1,35 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				# name: Check PR Title

				name: sanity

				on:

				@ -6,17 +38,15 @@ on:

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/sanity.yml

				      - v0.*

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				      - v0.*

				    paths:

				      - "**/*.py"

				      - .github/workflows/sanity.yml

				      - "tests/special_sanity/**"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				@ -24,7 +54,7 @@ concurrency:

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				permissions:

				  contents: read

				jobs:

				@ -42,13 +72,38 @@ jobs:

				          python-version: ${{ matrix.python-version }}

				      - name: Install the current repository

				        run: |

				          pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu

				          pip3 install -r requirements.txt

				          pip install -e .[test]

				      - name: Run sanity test

				        run: |

				          pytest -s -x tests/sanity

				      - name: Run utility test

				        run: |

				          pytest -s -x tests/utility

				          pytest -s -x tests/special_sanity

				      - name: Run license test

				        run: |

				          python3 tests/sanity/check_license.py --directory .

				          python3 tests/special_sanity/check_license.py --directories .

				      - name: Assert naming convention

				        run: |

				          if grep -rIn --exclude-dir=.git --exclude-dir=.github --exclude-dir=venv --exclude-dir=__pycache__ 'veRL' .; then

				            echo "Please use verl instead of veRL in the codebase"

				            exit 1

				          fi

				      - name: Assert SGLang naming convention

				        run: |

				          if grep -rIn --exclude-dir=.git --exclude-dir=.github --exclude-dir=venv --exclude-dir=__pycache__ -E 'Sglang|sgLang|sglAng|sglaNg|sglanG' .; then

				            echo "Please use SGLang or sglang as the formal name of SGLang rollout engine"

				            exit 1

				          fi

				      - name: Validate test folder structure

				        run: python3 tests/special_sanity/validate_structure.py

				      - name: Assert documentation requirement for functions

				        run: python3 tests/special_sanity/validate_imported_docs.py

				      - name: Assert device api usage in verl/recipe

				        run: python3 tests/special_sanity/check_device_api_usage.py --directory ./recipe

				      - name: Assert device api usage in verl/verl

				        run: python3 tests/special_sanity/check_device_api_usage.py --directory ./verl

				      - name: Assert documentation time info

				        run: python3 tests/special_sanity/check_docs_time_info.py

				      - name: Check docstrings for specified files

				        run: python3 tests/special_sanity/check_docstrings.py

				      - name: Check DataProto for specified folders

				        run: python3 tests/special_sanity/check_dataproto_usage.py -d ./verl/workers/engine

									
										6

.github/workflows/scorecard.yml
									
										vendored
									
												View File
												
				@ -10,9 +10,11 @@ on:

				  # To guarantee Maintained check is occasionally updated. See

				  # https://github.com/ossf/scorecard/blob/main/docs/checks.md#maintained

				  schedule:

				    - cron: '27 7 * * 1'

				    - cron: "27 7 * * 1"

				  push:

				    branches: [ "main" ]

				    branches:

				      - main

				      - v0.*

				# Declare default permissions as read only.

				permissions: read-all

									
										17

.github/workflows/secrets_scan.yml
									
										vendored
									
												View File
												
				@ -2,6 +2,7 @@ on:

				  push:

				    branches:

				      - main

				      - v0.*

				  pull_request:

				permissions:

				@ -11,11 +12,11 @@ jobs:

				  test:

				    runs-on: ubuntu-latest

				    steps:

				    - name: Checkout code

				      uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

				      with:

				        fetch-depth: 0

				    - name: Secret Scanning

				      uses: trufflesecurity/trufflehog@7dc056a193116ba8d82154bf0549381c8fb8545c # v3.88.14

				      with:

				        extra_args: --results=verified,unknown

				      - name: Checkout code

				        uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

				        with:

				          fetch-depth: 0

				      - name: Secret Scanning

				        uses: trufflesecurity/trufflehog@7dc056a193116ba8d82154bf0549381c8fb8545c # v3.88.14

				        with:

				          extra_args: --results=verified,unknown

									
										178

.github/workflows/sgl.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,178 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: sgl

				on:

				#  workflow_dispatch: # Manual

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      - .github/workflows/sgl.yml

				  pull_request:

				    branches:

				      - main

				      - v0.*

				    paths:

				      - "**/*.py"

				      # Other entrypoints

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # FSDP

				      - "!verl/workers/**/*dp_*.py"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      # vLLM

				      - "!**/*vllm*"

				      # Recipes

				      - "!recipe/**"

				      # Entrypoints

				      - ".github/workflows/sgl.yml"

				      - "tests/rollout/*sglang*"

				      - "tests/rollout/async_rollout_utils.py"

				      - "tests/workers/rollout/*interaction*"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.6-transformers4.56.1-sglang0.5.2-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  sgl:

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 35 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				      SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK: "True"

				      NCCL_SHM_DISABLE: "1"

				      NCCL_P2P_DISABLE: "1"

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer fastmcp

				          pip3 install -e .[test]

				#      - name: Download Model to Use

				#        run: |

				#          huggingface-cli download Qwen/Qwen2.5-0.5B --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B

				#          huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-1.5B-Instruct

				#          huggingface-cli download Qwen/Qwen2.5-VL-3B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct

				#          export HF_HUB_OFFLINE=1

				      - name: Prepare gsm8k dataset

				        run: |

				          ray stop --force

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Test the latest SGLang Rollout async with agent loop

				        run: |

				          ROLLOUT_NAME=sglang pytest -svvv tests/experimental/agent_loop

				#          huggingface-cli download verl-team/gsm8k-v0.4.1 --repo-type dataset --local-dir ~/verl-data/gsm8k

				      - name: Test the latest SGLang

				        run: |

				          cd tests/workers/rollout

				          torchrun --nnodes=1 --nproc_per_node=2 $(which pytest) -s test_sglang_spmd.py

				      - name: Test the latest SGLang Rollout async with interaction

				        run: |

				          cd tests/workers/rollout

				          torchrun --nnodes=1 --nproc_per_node=2 $(which pytest) -s test_sglang_async_rollout_w_interaction.py

				      - name: Test the latest SGLang Multi Interaction

				        run: |

				          cd tests/workers/rollout

				          torchrun --nnodes=1 --nproc_per_node=2 $(which pytest) -s test_sglang_multi_interaction.py

				      - name: Test the latest SGLang Rollout async with tool

				        run: |

				          cd tests/workers/rollout

				          torchrun --nnodes=1 --nproc_per_node=2 $(which pytest) -s test_sglang_async_rollout_w_tools.py

				      - name: Test the latest SGLang Rollout async with sandbox fusion tool

				        run: |

				          cd tests/workers/rollout

				          pytest -s test_sglang_async_rollout_sf_tools.py

				      - name: Test the latest SGLang Rollout async with search tool

				        run: |

				          cd tests/workers/rollout

				          pytest -s test_sglang_async_rollout_search_tools.py

				      - name: Test the latest SGLang Rollout async with mcp search tool

				        run: |

				          cd tests/workers/rollout

				          pytest -s test_sglang_async_rollout_mcp_tools.py

				      # Note(haibin.lin): for any new test, please update gpu_unit_tests.yaml to avoid repeated tests

				      - name: Test the latest SGLang Rollout async with multimodal delta

				        run: |

				          cd tests/workers/rollout

				          pytest -s test_sglang_async_rollout_multimodal_delta.py

				  cleanup:

				    runs-on: ubuntu-latest

				    needs: [setup, sgl]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										31

.github/workflows/type-coverage-check.yml
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,31 @@

				name: Type Annotation and Docstring Coverage

				on:

				  pull_request:

				    paths:

				      - '**/*.py'

				      - '.github/workflows/type-coverage-check.yml'

				jobs:

				  type-coverage-check:

				    runs-on: ubuntu-latest

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          fetch-depth: 0  # 🚨 Important: fetch full history so `origin/main` is available

				      - name: Set up Python

				        uses: actions/setup-python@v5

				        with:

				          python-version: '3.10'

				      - name: Install dependencies

				        run: |

				          pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu

				          pip3 install -r requirements.txt

				          pip3 install -e . --no-deps

				      - name: Run type annotation coverage check

				        run: |

				          python3 tests/special_sanity/type_coverage_check.py

				      - name: Run docstring coverage check

				        run: |

				          python3 tests/special_sanity/check_api_docs.py verl

									
										141

.github/workflows/vllm.yml
									
										vendored
									
												View File
												
				@ -1,3 +1,34 @@

				# # Tests layout

				# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:

				# - `tests/trainer` for testing functionality related to `verl/trainer`

				# - `tests/models` for testing functionality related to `verl/models`

				# - ...

				# There are a few folders with `special_` prefix, created for special purposes:

				# - `special_distributed`: unit tests that must run with multiple GPUs

				# - `special_e2e`: end-to-end tests with training/generation scripts

				# - `special_npu`: tests for NPUs

				# - `special_sanity`: a suite of quick sanity tests

				# - `special_standalone`: a set of test that are designed to run in dedicated environments

				# Accelerators for tests

				# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.

				# - For test scripts with `on_cpu.py` name suffix would be tested on CPU resources in linux environment.

				# # Workflow layout

				# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:

				# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `check-pr-title,yml`, `pre-commit.yml`, `doc.yml`

				# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`

				# 3. End-to-end tests: `e2e_*.yml`

				# 4. Unit tests

				#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`

				#   - `gpu_unit_tests.yml`, run pytest on all scripts with file without the `on_cpu.py` suffix.

				#   - Since cpu/gpu unit tests by default runs all tests under `tests`, please make sure tests are manually excluded in them when

				#     - new workflow yaml is added to `.github/workflows`

				#     - new tests are added to workflow mentioned in 2.

				name: vllm

				on:

				@ -6,18 +37,32 @@ on:

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/vllm.yml

				      - v0.*

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				      - v0.*

				    paths:

				      - "**/*.py"

				      - "verl/trainer/config/*.yaml"

				      - .github/workflows/vllm.yml

				      # Other entrypoints

				      - "!examples/**"

				      - "!tests/**"

				      - "!verl/trainer/main_*.py"

				      - "!verl/trainer/fsdp_sft_trainer.py"

				      # Recipes

				      - "!recipe/**"

				      # FSDP

				      - "!verl/workers/**/*dp_*.py"

				      # Megatron

				      - "!verl/workers/**/megatron_*.py"

				      # SGLang

				      - "!**/*sglang*"

				      # Entrypoints

				      - ".github/workflows/vllm.yml"

				      - "tests/special_e2e/generation"

				      - "tests/workers/rollout"

				      - "verl/trainer/main_generation.py"

				      - "verl/trainer/config/generation.yaml"

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				@ -25,46 +70,76 @@ concurrency:

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				permissions:

				  contents: read

				env:

				  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2"

				  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

				jobs:

				  setup:

				    if: github.repository_owner == 'volcengine'

				    runs-on: ubuntu-latest

				    outputs:

				      runner-label: ${{ steps.create-runner.outputs.runner-label }}

				      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}

				    steps:

				      - uses: actions/checkout@v4

				      - id: create-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "create"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-image: "${{ env.IMAGE }}"

				  vllm:

				    runs-on: [self-hosted, l20-0]

				    timeout-minutes: 20 # Increase this timeout value as needed

				    needs: setup

				    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]

				    timeout-minutes: 35 # Increase this timeout value as needed

				    env:

				      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}

				      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}

				      NO_PROXY: "localhost,127.0.0.1"

				      HF_HUB_ENABLE_HF_TRANSFER: 1

				    container:

				      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3

				      options: --gpus all --shm-size=10g

				      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"

				      HF_ENDPOINT: "https://hf-mirror.com"

				      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				            fetch-depth: 0

				          fetch-depth: 0

				      - name: Install the current repository

				        run: |

				          pip3 install hf_transfer

				          pip3 install -e .[test]

				          pip3 install vllm==0.5.4

				      - name: Running vllm tests on 8 L20 GPUs

				#      - name: Download Model to Use

				#        run: |

				#          huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B-Instruct

				#          huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-1.5B-Instruct

				#          huggingface-cli download Qwen/Qwen2.5-VL-3B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct

				#          huggingface-cli download OldKingMeister/Qwen2.5-1.5B-Instruct-YaRN --local-dir ${HOME}/models/OldKingMeister/Qwen2.5-1.5B-Instruct-YaRN

				#          export HF_HUB_OFFLINE=1

				      - name: Prepare gsm8k dataset

				        run: |

				          cd tests/rollout

				          torchrun --standalone --nnodes=1 --nproc_per_node=8 $(which pytest) -s test_vllm_hf_loader.py

				          ray stop --force

				          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k

				      - name: Test the latest vLLM Rollout async with agent loop

				        run: |

				          ROLLOUT_NAME=vllm pytest -svvv tests/experimental/agent_loop

				      - name: Test the latest vLLM

				        run: |

				          pip3 install --upgrade vllm==0.7.3

				          cd tests/rollout

				          torchrun --standalone --nnodes=1 --nproc_per_node=4 $(which pytest) -s test_vllm_spmd.py

				      - name: Run Qwen 0.5B generation test

				          torchrun --standalone --nnodes=1 --nproc_per_node=4 $(which pytest) -s tests/workers/rollout/rollout_vllm/test_vllm_spmd.py

				      - name: Test the latest vLLM on model with rope scaling

				        run: |

				          cd tests/generation

				          bash ./run_gen_qwen05.sh 4 $HOME/data/gen/qwen_05_gen_test.parquet 2

				          rm -rf $HOME/data/gen/qwen_05_gen_test.parquet

				      - name: Run Qwen 0.5B generation test when world_size == 1

				        run: |

				          cd tests/generation

				          bash ./run_gen_qwen05.sh 1 $HOME/data/gen/qwen_05_gen_test.parquet 1

				          rm -rf $HOME/data/gen/qwen_05_gen_test.parquet

				          torchrun --standalone --nnodes=1 --nproc_per_node=4 $(which pytest) -s tests/workers/rollout/rollout_vllm/test_vllm_model_rope_scaling.py

				      # Note(haibin.lin): for any new test, please update gpu_unit_tests.yaml to avoid repeated tests

				  cleanup:

				    runs-on: ubuntu-latest

				    needs: [setup, vllm]

				    if: always()

				    steps:

				      - id: destroy-runner

				        uses: volcengine/vemlp-github-runner@v1

				        with:

				          mode: "destroy"

				          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"

				          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"

									
										56

.github/workflows/yapf_format.yml
									
										vendored
									
												View File
											
				@ -1,56 +0,0 @@

				name: yapf

				on:

				  # Trigger the workflow on push or pull request,

				  # but only for the main branch

				  push:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/yapf_format.yml

				  pull_request:

				    branches:

				      - main

				      - v0.2.x

				    paths:

				      - "**/*.py"

				      - .github/workflows/yapf_format.yml

				# Cancel jobs on the same ref if a new one is triggered

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

				# Declare permissions just read content.

				permissions: 

				  contents: read

				jobs:

				  yapf:

				    runs-on: ubuntu-latest

				    strategy:

				      matrix:

				        python-version: ["3.12"]

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      # - name: checkout

				      #   run: |

				      #     commits=${{ github.event.pull_request.commits }}

				      #     if [[ -n "$commits" ]]; then

				      #       # Prepare enough depth for diffs with main

				      #       git fetch --depth="$(( commits + 1 ))"

				      #     fi

				      - name: Set up Python ${{ matrix.python-version }}

				        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0

				        with:

				          python-version: ${{ matrix.python-version }}

				      - name: Install dependencies

				        run: |

				          python -m pip install --upgrade pip

				          pip install --upgrade yapf

				          pip install toml==0.10.2

				      - name: Running yapf

				        run: |

				          yapf -r -vv -d --style=./.style.yapf verl tests examples

7

.gitignore vendored

View File

 @ -33,6 +33,7 @@ lib64/
 parts/
 sdist/
 var/
 tmp/
 *.egg-info/
 .installed.cfg
 *.egg
 @ -57,6 +58,8 @@ nosetests.xml
 coverage.xml
 *,cover
 .hypothesis/
 pytest.ini
 output.txt
 # Translations
 *.mo
 @ -108,9 +111,6 @@ ENV/
 # Mac
 .DS_Store
 # output logs
 tests/e2e/toy_examples/deepspeed/synchronous/output.txt
 # vim
 *.swp
 @ -125,3 +125,4 @@ tests/e2e/toy_examples/deepspeed/synchronous/output.txt
 logs
 log
 outputs
 .history

									
										37

.pre-commit-config.yaml
									
										Normal file
									
												View File
												
				@ -0,0 +1,37 @@

				repos:

				  - repo: https://github.com/astral-sh/ruff-pre-commit

				    rev: "v0.12.2"

				    hooks:

				      - id: ruff

				        args: ["--fix", "--show-fixes", "--output-format=full"]

				        exclude: ^.*\.(ipynb)$

				      - id: ruff-format

				  - repo: https://github.com/pre-commit/mirrors-mypy

				    rev: 'v1.17.0'

				    hooks:

				      - id: mypy

				  - repo: local

				    hooks:

				      - id: autogen-trainer-cfg

				        name: Generate and verify verl/trainer/config/_generated_*.yaml

				        entry: scripts/generate_trainer_config.sh

				        language: script

				        pass_filenames: false

				  - repo: local

				    hooks:

				      - id: check-docstrings

				        name: Check doc string coverage

				        entry: python3 tests/special_sanity/check_docstrings.py

				        language: python

				        pass_filenames: false

				  - repo: local

				    hooks:

				      - id: check-license

				        name: Check license

				        entry: python3 tests/special_sanity/check_license.py --directories examples recipe scripts tests verl setup.py

				        language: python

				        pass_filenames: false

5

.style.yapf

View File

 @ -1,5 +0,0 @@
 [style]
 based_on_style = google
 column_limit = 120
 indent_width = 4
 split_arguments_when_comma_terminated: true

									
										15

.vscode/settings.json
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,15 @@

				{

				    "[python]": {

				        "editor.defaultFormatter": "charliermarsh.ruff",

				        "editor.codeActionsOnSave": {

				            "source.organizeImports": "always",

				        }

				    },

				    "files.associations": {

				        "array": "cpp",

				        "string_view": "cpp",

				        "initializer_list": "cpp",

				        "utility": "cpp"

				    },

				    "iis.configDir": ""

				}

									
										89

CONTRIBUTING.md
									
										Normal file
									
												View File
												
				@ -0,0 +1,89 @@

				# Contributing to verl

				Thank you for considering a contribution to verl! We welcome contributions of any kind - bug fixes, enhancements, documentation improvements, or even just feedback. Whether you're an experienced developer or this is your first open-source project, your help is invaluable.

				Your support can take many forms:

				- Report issues or unexpected behaviors.

				- Suggest or implement new features.

				- Improve or expand documentation.

				- Review pull requests and assist other contributors.

				- Spread the word: share verl in blog posts, social media, or give the repo a ⭐.

				## Finding Issues to Contribute

				Looking for ways to dive in? Check out these issues:

				- [Good first issues](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22)

				- [Call for contribution](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22call%20for%20contribution%22)

				Furthermore, you can learn the development plan and roadmap via [RFC](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3ARFC) and [Roadmap](https://github.com/volcengine/verl/issues?q=state%3Aopen%20label%3A%22roadmap%22).

				## Developing

				- **Python-only**: install verl via `pip install -e .[test,vllm]` or `pip install -e .[test,sglang]` and iterate quickly. For full dependency setup, check out the verl [installation doc](https://verl.readthedocs.io/en/latest/start/install.html).

				## Code Linting and Formatting

				We rely on pre-commit to keep our code consistent. To set it up:

				```bash

				pip install pre-commit

				pre-commit install

				# for staged changes

				pre-commit run

				# for all files in the repo

				pre-commit run --all-files

				# run a specific hook with pre-commit

				# pre-commit run --all-files --show-diff-on-failure --color=always <hood-id>

				pre-commit run --all-files --show-diff-on-failure --color=always ruff

				pre-commit run --all-files --show-diff-on-failure --color=always autogen-trainer-cfg

				```

				## Testing

				Our test suites run on GitHub Actions. Check these workflows for details:

				- [GPU unit tests](https://github.com/volcengine/verl/blob/main/.github/workflows/gpu_unit_tests.yml)

				- [CPU unit tests](https://github.com/volcengine/verl/blob/main/.github/workflows/cpu_unit_tests.yml)

				- [vLLM tests](https://github.com/volcengine/verl/blob/main/.github/workflows/vllm.yml)

				- [SGLang tests](https://github.com/volcengine/verl/blob/main/.github/workflows/sgl.yml)

				### Adding CI tests

				If possible, please add CI test(s) for your new feature:

				1. Find the most relevant workflow yml file, which usually corresponds to a `hydra` default config (e.g. `ppo_trainer`, `ppo_megatron_trainer`, `sft_trainer`, etc).

				2. Add related path patterns to the `paths` section if not already included.

				3. Minimize the workload of the test script(s) (see existing scripts for examples).

				## Building the Docs

				```

				# Ensure verl is on your PYTHONPATH, e.g.:

				pip install -e .[test]

				# Install documentation dependencies

				pip install -r requirements-docs.txt

				# Generate HTML docs

				make clean

				make html

				# Preview locally

				python -m http.server -d _build/html/

				```

				Open your browser at http://localhost:8000 to explore the docs.

				## Pull Requests & Code Reviews

				Thanks for submitting a PR! To streamline reviews:

				- Follow our Pull Request Template for title format and checklist.

				- Adhere to our pre-commit lint rules and ensure all checks pass.

				- Update docs for any user-facing changes.

				- Add or update tests in the CI workflows, or explain why tests aren't applicable.

				## License

				See the [LICENSE](https://github.com/volcengine/verl/blob/main/LICENSE) file for full details.

				## Thank You

				We appreciate your contributions to verl. Your efforts help make the project stronger and more user-friendly. Happy coding!

									
										227

README.md
									
												View File
												
				@ -1,14 +1,25 @@

				<h1 style="text-align: center;">verl: Volcano Engine Reinforcement Learning for LLM</h1>

				<div align="center">

				 👋 Hi, everyone! 

				    verl is a RL training library initiated by <b>ByteDance Seed team</b> and maintained by the verl community.

				    <br>

				    <br>

				</div>

				<div align="center">

				<a href="https://deepwiki.com/volcengine/verl"><img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" style="height:20px;"></a>

				[![GitHub Repo stars](https://img.shields.io/github/stars/volcengine/verl)](https://github.com/volcengine/verl/stargazers)

				![GitHub forks](https://img.shields.io/github/forks/volcengine/verl)

				[![Twitter](https://img.shields.io/twitter/follow/verl_project)](https://twitter.com/verl_project)

				<a href="https://join.slack.com/t/verlgroup/shared_invite/zt-2w5p9o4c3-yy0x2Q56s_VlGLsJ93A6vA"><img src="https://img.shields.io/badge/Slack-verl-blueviolet?logo=slack&amp"></a>

				<a href="https://join.slack.com/t/verl-project/shared_invite/zt-3c6mc2khw-v0lo6NfDPuFP6OnkrZwfqw"><img src="https://img.shields.io/badge/Slack-verl-blueviolet?logo=slack&amp"></a>

				<a href="https://arxiv.org/pdf/2409.19256"><img src="https://img.shields.io/static/v1?label=EuroSys&message=Paper&color=red"></a>

				![GitHub contributors](https://img.shields.io/github/contributors/volcengine/verl)

				[![Documentation](https://img.shields.io/badge/documentation-blue)](https://verl.readthedocs.io/en/latest/)

				<a href="https://raw.githubusercontent.com/eric-haibin-lin/verl-community/refs/heads/main/WeChat.JPG"><img src="https://img.shields.io/badge/微信-green?logo=wechat&amp"></a>

				</div>

				![seed logo](https://github.com/user-attachments/assets/c42e675e-497c-4508-8bb9-093ad4d1f216)

				<h1 style="text-align: center;">verl: Volcano Engine Reinforcement Learning for LLMs</h1>

				verl is a flexible, efficient and production-ready RL training library for large language models (LLMs).

				@ -16,7 +27,7 @@ verl is the open-source version of **[HybridFlow: A Flexible and Efficient RLHF

				verl is flexible and easy to use with:

				- **Easy extension of diverse RL algorithms**: The hybrid-controller programming model enables flexible representation and efficient execution of complex Post-Training dataflows. Build RL dataflows such as GRPO, PPO in a few lines of code.

				- **Easy extension of diverse RL algorithms**: The hybrid-controller programming model enables flexible representation and efficient execution of complex post-training dataflows. Build RL dataflows such as GRPO, PPO in a few lines of code.

				- **Seamless integration of existing LLM infra with modular APIs**: Decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks, such as FSDP, Megatron-LM, vLLM, SGLang, etc

				@ -24,7 +35,6 @@ verl is flexible and easy to use with:

				- Ready integration with popular HuggingFace models

				verl is fast with:

				- **State-of-the-art throughput**: SOTA LLM training and inference engine integrations and SOTA RL throughput.

				@ -34,86 +44,144 @@ verl is fast with:

				</p>

				## News

				- [2025/03] [DAPO](https://dapo-sia.github.io/) is the open-sourced SOTA RL algorithm that achieves 50 points on AIME 2024 based on the Qwen2.5-32B pre-trained model, surpassing the previous SOTA achieved by DeepSeek's GRPO (DeepSeek-R1-Zero-Qwen-32B). DAPO's training is fully powered by verl and the reproduction code is [publicly available](https://github.com/volcengine/verl/tree/gm-tyx/puffin/main/recipe/dapo) now.

				- [2025/03] We will present verl(HybridFlow) at EuroSys 2025. See you in Rotterdam!

				- [2025/03] We introduced the programming model of verl at the [vLLM Beijing Meetup](https://mp.weixin.qq.com/s/n77GibL2corAtQHtVEAzfg) and [verl intro and updates](https://github.com/eric-haibin-lin/verl-community/blob/main/slides/verl-lmsys-meetup.pdf) at the [LMSys Meetup](https://lu.ma/ntjrr7ig) in Sunnyvale mid March.

				- [2025/02] verl v0.2.0.post2 is released! See [release note](https://github.com/volcengine/verl/releases/) for details.

				- [2025/01] [Doubao-1.5-pro](https://team.doubao.com/zh/special/doubao_1_5_pro) is released with SOTA-level performance on LLM & VLM. The RL scaling preview model is trained using verl, reaching OpenAI O1-level performance on math benchmarks (70.0 pass@1 on AIME).

				- [2025/08] verl is presented in the [PyTorch Expert Exchange Webinar](https://www.youtube.com/watch?v=Vd79NmmqY3Q&t=2s). [Slides](https://github.com/eric-haibin-lin/verl-community/blob/main/slides/verl_talk_pytorch_2025_08.pdf) available.

				- [2025/07] The [ReTool](https://arxiv.org/pdf/2504.11536) recipe is fully open sourced. [Blog](https://www.notion.so/verl-reTool-recipe-Using-multi-round-conversations-and-code-sandboxing-to-improve-the-math-of-large-23a8b5b7feba80b386b2e5b5e3c1cde0)

				- [2025/07] The first verl meetup will be held at ICML Vancouver on July 16th! Please [join us](https://lu.ma/0ek2nyao) if you are at ICML! (onsite only)

				- [2025/06] verl with Megatron backend enables large MoE models such as [DeepSeek-671B and Qwen3-235B](https://verl.readthedocs.io/en/latest/perf/dpsk.html).

				- [2025/03] [DAPO](https://dapo-sia.github.io/) is the open-sourced SOTA RL algorithm that achieves 50 points on AIME 2024 based on the Qwen2.5-32B pre-trained model, surpassing the previous SOTA achieved by DeepSeek's GRPO (DeepSeek-R1-Zero-Qwen-32B). DAPO's training is fully powered by verl and the reproduction code is available in `recipe/dapo` now.

				<details><summary> more... </summary>

				<ul>

				  <li>[2025/04] [Seed-Thinking-v1.5](https://github.com/ByteDance-Seed/Seed-Thinking-v1.5/blob/main/seed-thinking-v1.5.pdf) tech report is released! Trained with verl, Seed-Thinking-v1.5 achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains.</li>

				  <li>[2025/07] verl keynote at [AWS AI Hours Singapore](https://pages.awscloud.com/aws-ai-hours-sg.html#agenda) on 7/8, verl & verl-agent project updates at [Agent for SWE meetup](https://lu.ma/e498qhsi) by LF AI & Data Singapore on 7/11.</li>

				  <li>[2025/06] verl team will provide latest project updates at [PyTorch Day China](https://www.lfasiallc.com/pytorch-day-china/) on June 7th. Meet our dev team in Beijing!</li>

				  <li> [2025/04] [VAPO](https://arxiv.org/pdf/2504.05118) (value-based augmented PPO) paper covers our latest RL method for reasoning models. Trained from Qwen-32B-base model, VAPO achieves 60.4 on AIME 2024, outperforming DAPO-32B.</li>

				  <li>[2025/05] [PF-PPO](https://arxiv.org/abs/2409.06957), accepted to ICML 2025, is now supported in verl! PF-PPO enhances policy learning efficiency and robustness by filtering potentially noisy reward signals and reusing high-quality experiences via a replay buffer.</li>

				  <li>[2025/04] We will give a tutorial about latest post-training techniques and programming guide for verl at [ICLR 2025 Expo](https://iclr.cc/virtual/2025/calendar?filter_events=Expo+Talk+Panel&filter_rooms=), [SCI-FM workshop](https://open-foundation-model.github.io/) and [LMSys afterparty](https://lu.ma/d23nyynm). Talk materials available [here](https://github.com/eric-haibin-lin/verl-community/tree/main/iclr25). </li>

				  <li>[2025/03] verl v0.3.0.post1 is released! See [release note](https://github.com/volcengine/verl/releases/) for details. It achieves [~1.4x speedup](https://tongyx361.github.io/blogs/posts/verl-intro/#/verl-flexible-and-efficient-rl-for-llms) compared to prev versions.</li>

				  <li>[2025/05] verl will be presented at [A2M Shanghai](https://a2m.msup.com.cn/home/?aid=4488&city=shanghai) on 5/16 - 5/17.</li>

				  <li>[2025/05] verl will be presented at [GOSIM x PyTorch Day 2025](https://paris2025.gosim.org/). See you in Paris! </li>

				  <li>[2025/03] We introduced the programming model of verl at the [vLLM Beijing Meetup](https://mp.weixin.qq.com/s/n77GibL2corAtQHtVEAzfg) and [verl intro and updates](https://github.com/eric-haibin-lin/verl-community/blob/main/slides/verl-lmsys-meetup.pdf) at the [SGLang-LMSYS Org Meetup](https://lu.ma/ntjrr7ig) in Sunnyvale mid-March.</li>

				  <li>[2025/03] We will present verl(HybridFlow) at EuroSys 2025. See you in Rotterdam!</li>

				  <li>[2025/02] verl v0.2.0.post2 is released!</li>

				  <li>[2025/02] We presented verl in the <a href="https://lu.ma/ji7atxux">Bytedance/NVIDIA/Anyscale Ray Meetup</a>. See you in San Jose!</li>

				  <li>[2025/01] [Doubao-1.5-pro](https://team.doubao.com/zh/special/doubao_1_5_pro) is released with SOTA-level performance on LLM & VLM. The RL scaling preview model is trained using verl, reaching OpenAI O1-level performance on math benchmarks (70.0 pass@1 on AIME).</li>

				  <li>[2024/12] verl is presented at Ray Forward 2024. Slides available <a href="https://github.com/eric-haibin-lin/verl-community/blob/main/slides/Ray_Forward_2024_%E5%B7%AB%E9%94%A1%E6%96%8C.pdf">here</a></li>

				  <li>[2024/10] verl is presented at Ray Summit. <a href="https://www.youtube.com/watch?v=MrhMcXkXvJU&list=PLzTswPQNepXntmT8jr9WaNfqQ60QwW7-U&index=37">Youtube video</a> available.</li>

				  <li>[2024/12] The team presented <a href="https://neurips.cc/Expo/Conferences/2024/workshop/100677">Post-training LLMs: From Algorithms to Infrastructure</a> at NeurIPS 2024. <a href="https://github.com/eric-haibin-lin/verl-data/tree/neurips">Slides</a> and <a href="https://neurips.cc/Expo/Conferences/2024/workshop/100677">video</a> available.</li>

				  <li>[2024/10] verl is presented at Ray Summit. <a href="https://www.youtube.com/watch?v=MrhMcXkXvJU&list=PLzTswPQNepXntmT8jr9WaNfqQ60QwW7-U&index=37">Youtube video</a> available.</li>

				  <li>[2024/08] HybridFlow (verl) is accepted to EuroSys 2025.</li>

				</ul>   

				</details>

				## Key Features

				- **FSDP** and **Megatron-LM** for training.

				- **vLLM**, **SGLang**(experimental) and **HF Transformers** for rollout generation.

				- Compatible with Hugging Face Transformers and Modelscope Hub: Qwen-2.5, Llama3.1, Gemma2, DeepSeek-LLM, etc

				- **FSDP**, **FSDP2** and **Megatron-LM** for training.

				- **vLLM**, **SGLang** and **HF Transformers** for rollout generation.

				- Compatible with Hugging Face Transformers and Modelscope Hub: [Qwen-3](https://github.com/volcengine/verl/blob/main/examples/grpo_trainer/run_qwen3-8b.sh), Qwen-2.5, Llama3.1, Gemma2, DeepSeek-LLM, etc

				- Supervised fine-tuning.

				- Reinforcement learning with [PPO](examples/ppo_trainer/), [GRPO](examples/grpo_trainer/), [ReMax](examples/remax_trainer/), [Reinforce++](https://verl.readthedocs.io/en/latest/examples/config.html#algorithm), [RLOO](examples/rloo_trainer/), [PRIME](recipe/prime/), etc.

				  - Support model-based reward and function-based reward (verifiable reward)

				  - Support vision-language models (VLMs) and [multi-modal RL](examples/grpo_trainer/run_qwen2_5_vl-7b.sh)

				- Reinforcement learning with [PPO](examples/ppo_trainer/), [GRPO](examples/grpo_trainer/), [GSPO](recipe/gspo/), [ReMax](examples/remax_trainer/), [REINFORCE++](https://verl.readthedocs.io/en/latest/examples/config.html#algorithm), [RLOO](examples/rloo_trainer/), [PRIME](recipe/prime/), [DAPO](recipe/dapo/), [DrGRPO](recipe/drgrpo), [KL_Cov & Clip_Cov](recipe/entropy) etc.

				  - Support model-based reward and function-based reward (verifiable reward) for math, [coding](https://github.com/volcengine/verl/tree/main/recipe/dapo), etc

				  - Support vision-language models (VLMs) and [multi-modal RL](examples/grpo_trainer/run_qwen2_5_vl-7b.sh) with Qwen2.5-vl, Kimi-VL

				  - [Multi-turn with tool calling](https://github.com/volcengine/verl/tree/main/examples/sglang_multiturn)

				- LLM alignment recipes such as [Self-play preference optimization (SPPO)](https://github.com/volcengine/verl/tree/main/recipe/sppo)

				- Flash attention 2, [sequence packing](examples/ppo_trainer/run_qwen2-7b_seq_balance.sh), [sequence parallelism](examples/ppo_trainer/run_deepseek7b_llm_sp2.sh) support via DeepSpeed Ulysses, [LoRA](examples/sft/gsm8k/run_qwen_05_peft.sh), [Liger-kernel](examples/sft/gsm8k/run_qwen_05_sp2_liger.sh).

				- Scales up to 70B models and hundreds of GPUs.

				- Scales up to 671B models and hundreds of GPUs with [expert parallelism](https://github.com/volcengine/verl/pull/1467)

				- Multi-gpu [LoRA RL](https://verl.readthedocs.io/en/latest/advance/ppo_lora.html) support to save memory.

				- Experiment tracking with wandb, swanlab, mlflow and tensorboard.

				## Upcoming Features

				- DeepSeek 671b optimizations with Megatron v0.11

				- Multi-turn rollout optimizations

				## Upcoming Features and Changes

				- Q3 Roadmap https://github.com/volcengine/verl/issues/2388

				- DeepSeek 671b optimizations with Megatron https://github.com/volcengine/verl/issues/1033

				- Multi-turn rollout and tools using optimizations https://github.com/volcengine/verl/issues/1882

				- [Agent integration](https://github.com/volcengine/verl/tree/main/verl/experimental/agent_loop)

				- Async and off-policy architecture https://github.com/volcengine/verl/pull/2231

				- List of breaking changes since v0.4 https://github.com/volcengine/verl/discussions/2270

				## Getting Started

				<a href="https://verl.readthedocs.io/en/latest/index.html"><b>Documentation</b></a>

				**Quickstart:**

				- [Installation](https://verl.readthedocs.io/en/latest/start/install.html)

				- [Quickstart](https://verl.readthedocs.io/en/latest/start/quickstart.html)

				- [Programming Guide](https://verl.readthedocs.io/en/latest/hybrid_flow.html)

				- [Programming Guide](https://verl.readthedocs.io/en/latest/hybrid_flow.html) & [Tech Talk](https://hcqnc.xetlk.com/sl/3vACOK) (in Chinese)

				- [PPO in verl](https://verl.readthedocs.io/en/latest/algo/ppo.html)

				- [GRPO in verl](https://verl.readthedocs.io/en/latest/algo/grpo.html)

				**Running a PPO example step-by-step:**

				- Data and Reward Preparation

				  - [Prepare Data for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html)

				  - [Implement Reward Function for Dataset](https://verl.readthedocs.io/en/latest/preparation/reward_function.html)

				- Understanding the PPO Example

				  - [PPO Example Architecture](https://verl.readthedocs.io/en/latest/examples/ppo_code_architecture.html)

				  - [Config Explanation](https://verl.readthedocs.io/en/latest/examples/config.html)

				  - [Run GSM8K Example](https://verl.readthedocs.io/en/latest/examples/gsm8k_example.html)

				- [Prepare Data for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html)

				- [Implement Reward Function for Dataset](https://verl.readthedocs.io/en/latest/preparation/reward_function.html)

				- [PPO Example Architecture](https://verl.readthedocs.io/en/latest/examples/ppo_code_architecture.html)

				- [Config Explanation](https://verl.readthedocs.io/en/latest/examples/config.html)

				**Reproducible algorithm baselines:**

				- [PPO, GRPO, ReMax](https://verl.readthedocs.io/en/latest/experiment/ppo.html)

				- [RL performance on coding, math](https://verl.readthedocs.io/en/latest/algo/baseline.html)

				**For code explanation and advance usage (extension):**

				- PPO Trainer and Workers

				  - [PPO Ray Trainer](https://verl.readthedocs.io/en/latest/workers/ray_trainer.html)

				  - [PyTorch FSDP Backend](https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html)

				  - [Megatron-LM Backend](https://verl.readthedocs.io/en/latest/index.html)

				- Advance Usage and Extension

				  - [Ray API design tutorial](https://verl.readthedocs.io/en/latest/advance/placement.html)

				  - [Extend to Other RL(HF) algorithms](https://verl.readthedocs.io/en/latest/advance/dpo_extension.html)

				- Advanced Usage and Extension

				  - [Add Models with the FSDP Backend](https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html)

				  - [Add Models with the Megatron-LM Backend](https://verl.readthedocs.io/en/latest/advance/megatron_extension.html)

				  - [Multi-turn Rollout Support](https://verl.readthedocs.io/en/latest/sglang_multiturn/multiturn.html)

				  - [Search Tool Integration](https://verl.readthedocs.io/en/latest/sglang_multiturn/search_tool_example.html)

				  - [Sandbox Fusion Integration](https://verl.readthedocs.io/en/latest/examples/sandbox_fusion_example.html)

				  - [Deployment using Separate GPU Resources](https://github.com/volcengine/verl/tree/main/examples/split_placement)

				  - [Extend to Other RL(HF) algorithms](https://verl.readthedocs.io/en/latest/advance/dpo_extension.html)

				  - [Ray API design tutorial](https://verl.readthedocs.io/en/latest/advance/placement.html)

				**Blogs from the community**

				- [使用verl进行GRPO分布式强化学习训练最佳实践](https://www.volcengine.com/docs/6459/1463942)

				- [HybridFlow veRL 原文浅析](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/readme.md)

				- [最高提升20倍吞吐量！豆包大模型团队发布全新 RLHF 框架，现已开源！](https://team.doubao.com/en/blog/%E6%9C%80%E9%AB%98%E6%8F%90%E5%8D%8720%E5%80%8D%E5%90%9E%E5%90%90%E9%87%8F-%E8%B1%86%E5%8C%85%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%9B%A2%E9%98%9F%E5%8F%91%E5%B8%83%E5%85%A8%E6%96%B0-rlhf-%E6%A1%86%E6%9E%B6-%E7%8E%B0%E5%B7%B2%E5%BC%80%E6%BA%90)

				- [When Reasoning Models Break Tokenization: The Hidden Complexity of Multiturn Training](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/fast_tokenization/multiturn_tokenization_and_masking.md)

				- [verl deployment on AWS SageMaker](https://medium.com/@kaige.yang0110/run-verl-on-sagemaker-using-4x8-l40s-gpus-8e6d5c3c61d3)

				- [verl x SGLang Multi-turn Code Walkthrough](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/code-walk-through/readme_EN.md)

				- [Optimizing SGLang Memory Usage in verl](https://hebiao064.github.io/rl-memory-management)

				- [SGLang, verl, OpenBMB and Tsinghua University: Pioneering End-to-End Multi-Turn RLHF](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/verl-multiturn-rollout-Release.md)

				- [Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration](https://rocm.blogs.amd.com/artificial-intelligence/verl-large-scale/README.html)

				- [veMLP x verl ：玩转强化学习训练](https://mp.weixin.qq.com/s/7nbqxk4knMGd-hQE9ls2tA)

				- [使用 verl 进行 GRPO 分布式强化学习训练最佳实践](https://www.volcengine.com/docs/6459/1463942)

				- [HybridFlow verl 原文浅析](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/readme.md)

				- [最高提升 20 倍吞吐量！豆包大模型团队发布全新 RLHF 框架，现已开源！](https://team.doubao.com/en/blog/%E6%9C%80%E9%AB%98%E6%8F%90%E5%8D%8720%E5%80%8D%E5%90%9E%E5%90%90%E9%87%8F-%E8%B1%86%E5%8C%85%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%9B%A2%E9%98%9F%E5%8F%91%E5%B8%83%E5%85%A8%E6%96%B0-rlhf-%E6%A1%86%E6%9E%B6-%E7%8E%B0%E5%B7%B2%E5%BC%80%E6%BA%90)

				## Performance Tuning Guide

				The performance is essential for on-policy RL algorithm. We have written a detailed [performance tuning guide](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html) to help you optimize performance.

				## Use vLLM v0.8

				veRL now supports vLLM>=0.8.0 when using FSDP as the training backend. Please refer to [this document](https://github.com/volcengine/verl/blob/main/docs/README_vllm0.8.md) for installation guide and more information.

				## Upgrade to vLLM >= v0.8.2

				verl now supports vLLM>=0.8.2 when using FSDP as the training backend. Please refer to [this document](https://github.com/volcengine/verl/blob/main/docs/README_vllm0.8.md) for the installation guide and more information. Please avoid vllm 0.7.x, which contains bugs that may lead to OOMs and unexpected errors.

				## Use Latest SGLang

				SGLang is fully supported with verl, and SGLang RL Group is working extensively on building unique features, including multi-turn agentic RL, VLM RLHF, server-based RL, and partial rollout. Please refer to [this document](https://verl.readthedocs.io/en/latest/workers/sglang_worker.html) for the installation guide and more information.

				## Upgrade to FSDP2

				verl is fully embracing FSDP2! FSDP2 is recommended by torch distributed team, providing better throughput and memory usage, and is composible with other features (e.g. torch.compile). To enable FSDP2, simply use verl main and set the following options:

				```

				actor_rollout_ref.ref.strategy=fsdp2

				actor_rollout_ref.actor.strategy=fsdp2

				critic.strategy=fsdp2 

				reward_model.strategy=fsdp2 

				```

				Furthermore, FSDP2 cpu offloading is compatible with gradient accumulation. You can turn it on to save memory with `actor_rollout_ref.actor.fsdp_config.offload_policy=True`. For more details, see https://github.com/volcengine/verl/pull/1026

				## AMD Support (ROCm Kernel)

				verl now supports FSDP as the training engine (Megatron support coming soon) and both integrates with vLLM and SGLang as inference engines. Please refer to [this document](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst) for the installation guide and more information, and [this document](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_vllm_page.rst) for the vLLM performance tuning for ROCm.

				## Citation and acknowledgement

				If you find the project helpful, please cite:

				- [HybridFlow: A Flexible and Efficient RLHF Framework](https://arxiv.org/abs/2409.19256v2)

				- [A Framework for Training Large Language Models for Code Generation via Proximal Policy Optimization](https://i.cs.hku.hk/~cwu/papers/gmsheng-NL2Code24.pdf)

				@ -126,41 +194,74 @@ If you find the project helpful, please cite:

				}

				```

				verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The project is adopted and supported by Anyscale, Bytedance, LMSys.org, Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, University of Hong Kong, and many more.

				verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The project is adopted and contributed by Bytedance, Anyscale, LMSys.org, [Alibaba Qwen team](https://github.com/QwenLM/), Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, University of Hong Kong, ke.com, [All Hands AI](https://www.all-hands.dev/), [ModelBest](http://modelbest.cn/), JD AI Lab, Microsoft Research, [StepFun](https://www.stepfun.com/), Amazon, LinkedIn, Meituan, [Camel-AI](https://www.camel-ai.org/), [OpenManus](https://github.com/OpenManus), Xiaomi, NVIDIA research, [Baichuan](https://www.baichuan-ai.com/home), [RedNote](https://www.xiaohongshu.com/), [SwissAI](https://www.swiss-ai.org/), [Moonshot AI (Kimi)](https://www.moonshot-ai.com/), Baidu, Snowflake, Skywork.ai, JetBrains, [IceSword Lab](https://www.iceswordlab.com), and many more.

				## Awesome work using verl

				- [TinyZero](https://github.com/Jiayi-Pan/TinyZero): a reproduction of **DeepSeek R1 Zero** recipe for reasoning tasks ![GitHub Repo stars](https://img.shields.io/github/stars/Jiayi-Pan/TinyZero)

				- [DAPO](https://dapo-sia.github.io/): the fully open source SOTA RL algorithm that beats DeepSeek-R1-zero-32B ![GitHub Repo stars](https://img.shields.io/github/stars/volcengine/verl)

				- [SkyThought](https://github.com/NovaSky-AI/SkyThought): RL training for Sky-T1-7B by NovaSky AI team. ![GitHub Repo stars](https://img.shields.io/github/stars/NovaSky-AI/SkyThought)

				- [simpleRL-reason](https://github.com/hkust-nlp/simpleRL-reason): SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild ![GitHub Repo stars](https://img.shields.io/github/stars/hkust-nlp/simpleRL-reason)

				- [Easy-R1](https://github.com/hiyouga/EasyR1): **Multi-modal** RL training framework ![GitHub Repo stars](https://img.shields.io/github/stars/hiyouga/EasyR1)

				- [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL): LLM Agents RL tunning framework for multiple agent environments. ![GitHub Repo stars](https://img.shields.io/github/stars/OpenManus/OpenManus-RL)

				- [deepscaler](https://github.com/agentica-project/deepscaler): iterative context scaling with GRPO ![GitHub Repo stars](https://img.shields.io/github/stars/agentica-project/deepscaler)

				- [PRIME](https://github.com/PRIME-RL/PRIME): Process reinforcement through implicit rewards ![GitHub Repo stars](https://img.shields.io/github/stars/PRIME-RL/PRIME)

				- [rllm](https://github.com/agentica-project/rllm): async RL training with [verl-pipeline](https://github.com/agentica-project/verl-pipeline) ![GitHub Repo stars](https://img.shields.io/github/stars/agentica-project/rllm)

				- [RAGEN](https://github.com/ZihanWang314/ragen): a general-purpose reasoning **agent** training framework ![GitHub Repo stars](https://img.shields.io/github/stars/ZihanWang314/ragen)

				- [Logic-RL](https://github.com/Unakar/Logic-RL): a reproduction of DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset. ![GitHub Repo stars](https://img.shields.io/github/stars/Unakar/Logic-RL)

				- [Search-R1](https://github.com/PeterGriffinJin/Search-R1): RL with reasoning and **searching (tool-call)** interleaved LLMs ![GitHub Repo stars](https://img.shields.io/github/stars/PeterGriffinJin/Search-R1)

				- [ReSearch](https://github.com/Agent-RL/ReSearch): Learning to **Re**ason with **Search** for LLMs via Reinforcement Learning ![GitHub Repo stars](https://img.shields.io/github/stars/Agent-RL/ReSearch)

				- [DeepRetrieval](https://github.com/pat-jj/DeepRetrieval): Hacking **Real Search Engines** and **retrievers** with LLMs via RL for **information retrieval** ![GitHub Repo stars](https://img.shields.io/github/stars/pat-jj/DeepRetrieval)

				- [cognitive-behaviors](https://github.com/kanishkg/cognitive-behaviors): Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs ![GitHub Repo stars](https://img.shields.io/github/stars/kanishkg/cognitive-behaviors)

				- [MetaSpatial](https://github.com/PzySeere/MetaSpatial): Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse ![GitHub Repo stars](https://img.shields.io/github/stars/PzySeere/MetaSpatial)

				- [DeepEnlighten](https://github.com/DolbyUUU/DeepEnlighten): Reproduce R1 with **social reasoning** tasks and analyze key findings ![GitHub Repo stars](https://img.shields.io/github/stars/DolbyUUU/DeepEnlighten)

				- [Skywork-OR1](https://github.com/SkyworkAI/Skywork-OR1): Skywork open reaonser series ![GitHub Repo stars](https://img.shields.io/github/stars/SkyworkAI/Skywork-OR1)

				- [ToRL](https://github.com/GAIR-NLP/ToRL): Scaling tool-integrated RL ![GitHub Repo stars](https://img.shields.io/github/stars/GAIR-NLP/ToRL)

				- [Absolute Zero Reasoner](https://github.com/LeapLabTHU/Absolute-Zero-Reasoner): [A no human curated data self-play framework for reasoning](https://arxiv.org/abs/2505.03335) ![GitHub Repo stars](https://img.shields.io/github/stars/LeapLabTHU/Absolute-Zero-Reasoner)

				- [verl-agent](https://github.com/langfengQ/verl-agent): A scalable training framework for **long-horizon LLM/VLM agents**, along with a new algorithm **GiGPO** ![GitHub Repo stars](https://img.shields.io/github/stars/langfengQ/verl-agent)

				- [RL-Factory](https://github.com/Simple-Efficient/RL-Factory): An easy and efficient RL post-training framework for Agentic Learning ![GitHub Repo stars](https://img.shields.io/github/stars/Simple-Efficient/RL-Factory)

				- [ReTool](https://retool-rl.github.io/): ReTool: reinforcement learning for strategic tool use in LLMs. Code release is in progress...

				- [verl-tool](https://github.com/TIGER-AI-Lab/verl-tool): An unified and easy-to-extend tool-agent training framework based on verl![GitHub Repo stars](https://img.shields.io/github/stars/TIGER-AI-Lab/verl-tool)

				- [PRIME](https://github.com/PRIME-RL/PRIME): Process reinforcement through implicit rewards ![GitHub Repo stars](https://img.shields.io/github/stars/PRIME-RL/PRIME)

				- [MemAgent](https://github.com/BytedTsinghua-SIA/MemAgent): MemAgent: Reshaping Long-Context LLM with Multi-Conv RL based Memory Agent ![GitHub Repo stars](https://img.shields.io/github/stars/BytedTsinghua-SIA/MemAgent)

				- [POLARIS](https://github.com/ChenxinAn-fdu/POLARIS): A Post-training recipe for scaling RL on Advanced Reasoning models ![GitHub Repo stars](https://img.shields.io/github/stars/ChenxinAn-fdu/POLARIS)

				- [GUI-R1](https://github.com/ritzz-ai/GUI-R1): **GUI-R1**: A Generalist R1-style Vision-Language Action Model For **GUI Agents** ![GitHub Repo stars](https://img.shields.io/github/stars/ritzz-ai/GUI-R1)

				- [DeepRetrieval](https://github.com/pat-jj/DeepRetrieval): RL Training of **Search Agent** with **Search/Retrieval Outcome** ![GitHub Repo stars](https://img.shields.io/github/stars/pat-jj/DeepRetrieval)

				- [Code-R1](https://github.com/ganler/code-r1): Reproducing R1 for **Code** with Reliable Rewards ![GitHub Repo stars](https://img.shields.io/github/stars/ganler/code-r1)

				- [self-rewarding-reasoning-LLM](https://arxiv.org/pdf/2502.19613): self-rewarding and correction with **generative reward models** ![GitHub Repo stars](https://img.shields.io/github/stars/RLHFlow/Self-rewarding-reasoning-LLM)

				- [critic-rl](https://github.com/HKUNLP/critic-rl): LLM critics for code generation ![GitHub Repo stars](https://img.shields.io/github/stars/HKUNLP/critic-rl)

				- [DQO](https://arxiv.org/abs/2410.09302): Enhancing multi-Step reasoning abilities of language models through direct Q-function optimization

				- [FIRE](https://arxiv.org/abs/2410.21236): Flaming-hot initiation with regular execution sampling for large language models

				- [DeepResearcher](https://github.com/GAIR-NLP/DeepResearcher): Scaling deep research via reinforcement learning in real-world environments ![GitHub Repo stars](https://img.shields.io/github/stars/GAIR-NLP/DeepResearcher)

				- [VAGEN](https://github.com/RAGEN-AI/VAGEN): Training VLM agents with multi-turn reinforcement learning ![GitHub Repo stars](https://img.shields.io/github/stars/RAGEN-AI/VAGEN)

				- [RM-R1](https://arxiv.org/abs/2505.02387): RL training of reasoning reward models ![GitHub Repo stars](https://img.shields.io/github/stars/RM-R1-UIUC/RM-R1)

				- [LUFFY](https://arxiv.org/pdf/2504.14945): Learning to Reason under Off-Policy Guidance![GitHub Repo stars](https://img.shields.io/github/stars/ElliottYan/LUFFY)

				- [DeepMath](https://github.com/zwhe99/DeepMath): DeepMath-103K data and series models for math reasoning![GitHub Repo stars](https://img.shields.io/github/stars/zwhe99/DeepMath)

				- [PACS](https://github.com/ritzz-ai/PACS): Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR ![GitHub Repo stars](https://img.shields.io/github/stars/ritzz-ai/PACS)

				- [Entropy Mechanism of RL](https://github.com/PRIME-RL/Entropy-Mechanism-of-RL): The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning![GitHub Repo stars](https://img.shields.io/github/stars/PRIME-RL/Entropy-Mechanism-of-RL)

				- [LLaSA-TTS-GRPO](https://github.com/channel-io/ch-tts-llasa-rl-grpo): TTS fine-tuning with GRPO optimization based on LLASA models ![GitHub Repo stars](https://img.shields.io/github/stars/channel-io/ch-tts-llasa-rl-grpo)

				- [PF-PPO](https://arxiv.org/abs/2409.06957): Policy Filtration for PPO based on the reliability of reward signals for more efficient and robust RLHF.

				- [RACRO](https://github.com/gyhdog99/RACRO2): Build multi-modal reasoning models via decoupling it into query-conditioned captioning and text-only reasoning ![GitHub Repo stars](https://img.shields.io/github/stars/gyhdog99/RACRO2)

				- [Agent Lightning](https://github.com/microsoft/agent-lightning): A flexible and extensible framework that enables seamless agent optimization for any existing agent framework. ![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/agent-lightning)

				- [VTool-R1](https://github.com/VTOOL-R1/vtool-r1): VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use. ![GitHub Repo stars](https://img.shields.io/github/stars/VTOOL-R1/vtool-r1)

				- [Kimina-Prover-RL](https://github.com/project-numina/kimina-prover-rl/tree/main/recipe/kimina_prover_rl): Training pipeline for formal theorem proving, based on a paradigm inspired by DeepSeek-R1.

				- [RL-PLUS](https://github.com/YihongDong/RL-PLUS): Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization.

				- [rStar2-Agent](https://github.com/microsoft/rStar): Using reinforcement learning with multi-step tool-calling for math tasks, rStar2-Agent-14B reaches frontier-level math reasoning in just 510 RL training steps ![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/rStar)

				- [Vision-SR1](https://github.com/zli12321/Vision-SR1): Self-Rewarding Vision-Language Model via Reasoning Decomposition ![GitHub Repo stars](https://img.shields.io/github/stars/zli12321/Vision-SR1)

				- [SimpleVLA-RL](https://github.com/PRIME-RL/SimpleVLA-RL): SimpleVLA-RL: A Simple yet Effective Vision-Language Action Model for Reinforcement Learning ![GitHub Repo stars](https://img.shields.io/github/stars/PRIME-RL/SimpleVLA-RL)

				- [Table-R1](https://github.com/Table-R1/Table-R1): Table-R1: Inference-Time Scaling for Table Reasoning ![GitHub Repo stars](https://img.shields.io/github/stars/Table-R1/Table-R1)

				- [Revisual-R1](https://github.com/CSfufu/Revisual-R1): Revisual-R1: Advancing Multimodal Reasoning From Optimized Cold Start to Staged Reinforcement Learning ![GitHub Repo stars](https://img.shields.io/github/stars/CSfufu/Revisual-R1)

				- [ARES](https://github.com/shawn0728/ARES): ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping ![GitHub Repo stars](https://img.shields.io/github/stars/shawn0728/ARES)

				- [Meta-Bandit-LLM](https://github.com/sanxing-chen/meta-bandit-llm): Meta-Bandit-LLM: Long-horizon multiturn interactive training for meta-bandit agents ![GitHub Repo stars](https://img.shields.io/github/stars/sanxing-chen/meta-bandit-llm)

				and many more awesome work listed in [recipe](recipe/README.md).

				## Contribution Guide

				Contributions from the community are welcome! Please check out our [project roadmap](https://github.com/volcengine/verl/issues/22) and [release plan](https://github.com/volcengine/verl/issues/354) to see where you can contribute.

				### Code formatting

				We use yapf (Google style) to enforce strict code formatting when reviewing PRs. To reformat your code locally, make sure you have installed the **latest** version of `yapf`

				```bash

				pip3 install yapf --upgrade

				```

				Then, make sure you are at top level of verl repo and run

				```bash

				bash scripts/format.sh

				```

				We are HIRING! Send us an [email](mailto:haibin.lin@bytedance.com) if you are interested in internship/FTE opportunities in MLSys/LLM reasoning/multimodal alignment.

				See [contributions guide](CONTRIBUTING.md)

				## About [ByteDance Seed Team](https://team.doubao.com/)

				Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society. You can get to know Bytedance Seed better through the following channels👇

				<div>

				  <a href="https://team.doubao.com/">

				    <img src="https://img.shields.io/badge/Website-%231e37ff?style=for-the-badge&logo=bytedance&logoColor=white"></a>

				  <a href="https://github.com/user-attachments/assets/469535a8-42f2-4797-acdf-4f7a1d4a0c3e">

				    <img src="https://img.shields.io/badge/WeChat-07C160?style=for-the-badge&logo=wechat&logoColor=white"></a>

				 <a href="https://www.xiaohongshu.com/user/profile/668e7e15000000000303157d?xsec_token=ABl2-aqekpytY6A8TuxjrwnZskU-6BsMRE_ufQQaSAvjc%3D&xsec_source=pc_search">

				    <img src="https://img.shields.io/badge/Xiaohongshu-%23FF2442?style=for-the-badge&logo=xiaohongshu&logoColor=white"></a>

				  <a href="https://www.zhihu.com/org/dou-bao-da-mo-xing-tuan-dui/">

				    <img src="https://img.shields.io/badge/zhihu-%230084FF?style=for-the-badge&logo=zhihu&logoColor=white"></a>

				</div>

				---

				We are HIRING! Send us an [email](mailto:the.verl.project@gmail.com) if you are interested in internship/FTE opportunities in RL for agents.

57

docker/Apptainerfile.rocm Normal file

View File

 @ -0,0 +1,57 @@
 Bootstrap: docker
 # Support - Traing: fsdp; Inference: vllm
 # FROM: rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
 # Support - Traing: fsdp; Inference: vllm, sglang
 FROM lmsysorg/sglang:v0.4.5-rocm630
 %environment
     export PYTORCH_ROCM_ARCH="gfx90a;gfx942"
     export HIPCC_COMPILE_FLAGS_APPEND="--amdgpu-target=gfx90a;gfx942 -D__HIP_PLATFORM_AMD__"
     export CFLAGS="-D__HIP_PLATFORM_AMD__"
     export CXXFLAGS="-D__HIP_PLATFORM_AMD__"
 %post
     # Create source directory
     mkdir -p /opt/src
     # Uninstall and reinstall vllm
     pip uninstall -y vllm
     cd /opt/src
     git clone -b v0.6.3 https://github.com/vllm-project/vllm.git
     cd vllm
     MAX_JOBS=$(nproc) python3 setup.py install
     cd /opt
     rm -rf /opt/src/vllm
     # Install dependencies
     pip install "tensordict<0.6" --no-deps
     pip install accelerate \
         codetiming \
         datasets \
         dill \
         hydra-core \
         liger-kernel \
         numpy \
         pandas \
         peft \
         "pyarrow>=15.0.0" \
         pylatexenc \
         "ray[data,train,tune,serve]" \
         torchdata \
         transformers \
         wandb \
         orjson \
         pybind11
     # Clone and install verl from GitHub
     cd /opt
     git clone https://github.com/volcengine/verl.git
     cd verl
     # Uncomment to use a specific version
     # git checkout v0.3.0.post0
     pip install -e . --no-deps
     # Install torch_memory_saver
     pip install git+https://github.com/ExtremeViscent/torch_memory_saver.git --no-deps

									
										55

docker/Dockerfile.extention.awsefa
									
										Normal file
									
												View File
												
				@ -0,0 +1,55 @@

				# Base Image support aws EFA

				# Build Image with frameworks based on this

				FROM verlai/verl:app-verl0.5-sglang0.4.6.post5-mcore0.12.2

				# For aws instances with EFA net interface (Sagemaker AI Pod)

				#     install EFA driver:

				######## AWS EFA ############

				ENV NCCL_VERSION=2.25.1-1

				ENV DEBIAN_FRONTEND=noninteractive

				ENV EFA_INSTALLER_VERSION=1.40.0

				ENV AWS_OFI_NCCL_VERSION=1.14.2

				ENV FI_EFA_SET_CUDA_SYNC_MEMOPS=0

				ENV FI_PROVIDER=efa

				RUN apt update && apt install -y linux-image-generic libhwloc-dev

				RUN cd /tmp && \

				    curl -O https://efa-installer.amazonaws.com/aws-efa-installer-${EFA_INSTALLER_VERSION}.tar.gz  && \

				    tar -xf aws-efa-installer-${EFA_INSTALLER_VERSION}.tar.gz && \

				    cd aws-efa-installer && \

				    ./efa_installer.sh -y -g --skip-kmod --skip-limit-conf --no-verify && \

				    ldconfig && \

				    rm -rf /tmp/aws-efa-installer /var/lib/apt/lists/*

				# NCCL EFA Plugin

				RUN cd /tmp && \

				    curl -LO https://github.com/aws/aws-ofi-nccl/archive/refs/tags/v${AWS_OFI_NCCL_VERSION}.tar.gz && \

				    tar -xzf /tmp/v${AWS_OFI_NCCL_VERSION}.tar.gz && \

				    rm /tmp/v${AWS_OFI_NCCL_VERSION}.tar.gz && \

				    mv aws-ofi-nccl-${AWS_OFI_NCCL_VERSION} aws-ofi-nccl && \

				    cd /tmp/aws-ofi-nccl && \

				    ./autogen.sh && \

				    ./configure --prefix=/opt/amazon/efa \

				    --with-libfabric=/opt/amazon/efa \

				    --with-cuda=/usr/local/cuda \

				    --enable-platform-aws \

				    --with-mpi=/opt/amazon/openmpi && \

				    make -j$(nproc) install && \

				    rm -rf /tmp/aws-ofi/nccl

				# NCCL

				RUN echo "/usr/local/lib"      >> /etc/ld.so.conf.d/local.conf && \

				    echo "/opt/amazon/openmpi/lib" >> /etc/ld.so.conf.d/efa.conf && \

				    ldconfig

				ENV OMPI_MCA_pml=^cm,ucx            \

				    OMPI_MCA_btl=tcp,self           \

				    OMPI_MCA_btl_tcp_if_exclude=lo,docker0,veth_def_agent \

				    OPAL_PREFIX=/opt/amazon/openmpi \

				    NCCL_SOCKET_IFNAME=^docker,lo,veth_def_agent  \

				    FI_EFA_USE_HUGE_PAGE=0

				# docker build -t verl:awsefa --label "commit=$(git rev-parse --short HEAD)" .

				# on aws:

				# docker run --ipc=host --privileged --name verldev --gpus all --network=host --shm-size=1800gb -itd verl:awsefa

									
										9

docker/Dockerfile.megatron
									
												View File
											
				@ -1,9 +0,0 @@

				FROM verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3

				RUN pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

				RUN cd /opt/nvidia && git clone --single-branch --branch core_r0.11.0 https://github.com/NVIDIA/Megatron-LM.git Megatron-LM

				# only config pip index with https://pypi.tuna.tsinghua.edu.cn/simple if needed

				# unset for now

				RUN cd /opt/nvidia/Megatron-LM && pip3 install --no-deps -e .

									
										17

docker/Dockerfile.ngc.vllm
									
												View File
												
				@ -3,12 +3,12 @@ FROM nvcr.io/nvidia/pytorch:24.05-py3

				# uninstall nv-pytorch fork

				RUN pip3 uninstall pytorch-quantization \

				     pytorch-triton \

				     torch \

				     torch-tensorrt \

				     torchvision \

				     xgboost transformer_engine flash_attn \

				     apex megatron-core -y

				    pytorch-triton \

				    torch \

				    torch-tensorrt \

				    torchvision \

				    xgboost transformer_engine flash_attn \

				    apex megatron-core -y

				RUN pip3 install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124

				@ -35,10 +35,11 @@ RUN pip3 install --no-cache-dir \

				    'tensordict<0.6' \

				    'transformers' \

				    'vllm==0.6.3.post1' \

				    'wandb'

				    'wandb' \

				    'tensorboard'

				# full dependencies

				RUN pip3 install pytest yapf py-spy pyext liger-kernel

				RUN pip3 install pytest pre-commit py-spy pyext liger-kernel

				# =============== Megatron dependencies (optional) =================

				# install Transformer Engine, which requires FA 2.5.8. Do it in a separate step for docker cache

									
										43

docker/Dockerfile.ngc.vllm0.8
									
												View File
												
				@ -1,17 +1,13 @@

				# Start from the NVIDIA official image (ubuntu-22.04 + python-3.10)

				# Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)

				# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html

				FROM nvcr.io/nvidia/pytorch:24.08-py3

				# uninstall nv-pytorch fork

				RUN pip3 uninstall -y pytorch-quantization \

				    pytorch-triton torch torch-tensorrt torchvision \

				    xgboost transformer_engine flash_attn apex megatron-core

				# Define environments

				ENV MAX_JOBS=32

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Define installation arguments

				@ -42,21 +38,34 @@ RUN pip config set global.index-url "${PIP_INDEX}" && \

				    pip config set global.extra-index-url "${PIP_INDEX}" && \

				    python -m pip install --upgrade pip

				# Install torch-2.6.0 + vllm-0.8.2

				RUN pip install --no-cache-dir vllm==0.8.2 torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 tensordict torchdata \

				    transformers>=4.49.0 accelerate datasets peft hf-transfer \

				    ray[default] codetiming hydra-core pandas pyarrow>=15.0.0 pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler \

				    pytest yapf py-spy pyext pre-commit ruff

				# Uninstall nv-pytorch fork

				RUN pip uninstall -y torch torchvision torchaudio \

				    pytorch-quantization pytorch-triton torch-tensorrt \

				    xgboost transformer_engine flash_attn apex megatron-core grpcio

				# Install flash_attn-2.7.4.post1

				RUN pip uninstall -y transformer-engine flash-attn && \

				    wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \

				# Install torch-2.6.0+cu124 + vllm-0.8.3

				# torch-2.6.0+cu124: cxx11abi=False

				# torch-2.6.0+cu126: cxx11abi=True

				# see https://github.com/flashinfer-ai/flashinfer/issues/911

				RUN pip install --no-cache-dir "vllm==0.8.3" "torch==2.6.0" "torchvision==0.21.0" "torchaudio==2.6.0" "tensordict==0.6.2" torchdata \

				    "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=15.0.0" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler \

				    pytest py-spy pyext pre-commit ruff tensorboard

				# Install flash-attn-2.7.4.post1 (cxx11abi=False)

				RUN wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \

				    pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

				# Fix cv2

				# Install flashinfer-0.2.2.post1+cu124 (cxx11abi=False)

				# vllm-0.8.3 does not support flashinfer>=0.2.3

				# see https://github.com/vllm-project/vllm/pull/15777

				RUN wget -nv https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2.post1/flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \

				    pip install --no-cache-dir flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl

				# Fix packages

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --no-cache-dir nvidia-ml-py>=12.560.30 opencv-python-headless==4.8.0.74 fastapi==0.115.6 && \

				    pip install --no-cache-dir --upgrade optree>=0.13.0

				    pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				# Install verl

				RUN pip install --no-cache-dir verl[vllm] -U

									
										2

docker/Dockerfile.ngc.vllm0.8.sagemaker
									
												View File
												
				@ -27,7 +27,7 @@ RUN apt-get update && \

				RUN pip install --no-cache-dir vllm==0.8.2 torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 tensordict torchdata==0.11.0 \

				    transformers>=4.49.0 accelerate datasets peft hf-transfer \

				    ray[default] codetiming hydra-core pandas pyarrow>=15.0.0 pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler \

				    pytest yapf py-spy pyext pre-commit ruff

				    pytest pre-commit py-spy pyext ruff tensorboard

				# Install flash_attn-2.7.4.post1

				RUN pip uninstall -y transformer-engine flash-attn && \

									
										321

docker/Dockerfile.rocm
									
												View File
												
				@ -1,30 +1,296 @@

				#  Build the docker in the repo dir:

				# docker build -f docker/Dockerfile.rocm -t verl-rocm:03.04.2015 .

				# docker images # you can find your built docker

				# FROM "compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-rel-6.4:94_ubuntu22.04_py3.10_pytorch_release-2.7_575e247"

				# FROM "rlfoundation.azurecr.io/rocm6.3.4:vllm-0.8.5-numa-patch-ubuntu-22.04"

				FROM "rlsys/rocm-6.3.4-patch:rocm6.3.4-numa-patch_ubuntu-22.04"

				SHELL ["/bin/bash", "-ceuxo", "pipefail"]

				ENV MAX_JOBS=512

				ENV PATH="/usr/local/python3.12/bin:$PATH"

				RUN ln -sf /usr/bin/python3.12 /usr/bin/python && \

				    ln -sf /usr/bin/pip3.12 /usr/bin/pip

				############################################

				############################################

				RUN apt-get update

				RUN apt-get install -y pkg-config liblzma-dev

				############################################

				############################################

				FROM rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4

				###########################################

				##########Install TransformerEngine########

				###########################################

				WORKDIR /workspace/

				# transformer-engine install

				# https://github.com/ROCm/TransformerEngine

				# Set working directory

				# WORKDIR $PWD/app

				RUN rm -rf TransformerEngine 

				RUN git clone --recursive https://github.com/ROCm/TransformerEngine.git

				WORKDIR /workspace/TransformerEngine

				RUN git checkout 236178e5

				# git checkout bb061ade

				# git checkout 864405c

				ENV NVTE_FRAMEWORK=pytorch 

				ENV NVTE_ROCM_ARCH=gfx942 

				ENV NVTE_USE_HIPBLASLT=1

				ENV NVTE_USE_ROCM=1  

				# export CMAKE_PREFIX_PATH="/opt/rocm:/opt/rocm/hip:/usr/local:/usr:${CMAKE_PREFIX_PATH:-}"

				ENV CMAKE_PREFIX_PATH="/opt/rocm:/opt/rocm/hip:/usr/local:/usr"

				# ENV NVTE_BUILD_MAX_JOBS=$(MAX_JOBS)

				RUN MAX_JOBS=$(MAX_JOBS) pip install . -vvv 

				WORKDIR /workspace/

				###########################################

				###########################################

				###########################################

				####################################################################################

				################Install vllm - sglang require vllm 0.6.7 dependency#################

				####################################################################################

				#### Require vllm 0.6.7 - checkout 113274a0

				WORKDIR /workspace/

				RUN rm -rf vllm

				RUN pip uninstall -y vllm

				# Refer to here (down-grade vllm to 0.6.3): https://docs.vllm.ai/en/v0.6.3/getting_started/amd-installation.html

				RUN git clone https://github.com/ROCm/vllm.git

				# git clone https://github.com/vllm-project/vllm.git

				WORKDIR /workspace/vllm

				RUN git checkout 113274a0

				ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"

				#ENV MAX_JOBS=512

				ENV MAX_JOBS=${MAX_JOBS}

				RUN pip install "boto3>=1.26.0"

				RUN pip install setuptools_scm

				# will add src into py. You can delete the repo

				RUN python3 setup.py install

				WORKDIR /workspace/

				####################################################################################

				####################################################################################

				####################################################################################

				###########################################

				############For hack docker################

				###########################################

				RUN pip install setuptools==75.8.0

				###########################################

				###########################################

				###########################################

				###########################################

				############build sgalng###################

				###########################################

				# Set environment variables

				ENV BASE_DIR=/sgl-workspace

				ENV BUILD_TYPE=all

				ENV SGL_REPO=https://github.com/sgl-project/sglang

				ENV SGL_BRANCH=v0.4.6.post5

				ENV TRITON_REPO=https://github.com/ROCm/triton.git

				ENV TRITON_COMMIT=improve_fa_decode_3.0.0

				ENV AITER_REPO=https://github.com/ROCm/aiter.git

				ENV AITER_COMMIT=v0.1.2

				# v0.1.2 version - commit id: 9d11f47

				# ENV AITER_COMMIT=9d11f47

				ENV HIP_FORCE_DEV_KERNARG=1

				ENV HSA_NO_SCRATCH_RECLAIM=1

				ENV SGLANG_SET_CPU_AFFINITY=1

				ENV SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1

				ENV NCCL_MIN_NCHANNELS=112

				ENV MOE_PADDING=1

				ENV VLLM_FP8_PADDING=1

				ENV VLLM_FP8_ACT_PADDING=1

				ENV VLLM_FP8_WEIGHT_PADDING=1

				ENV VLLM_FP8_REDUCE_CONV=1

				ENV TORCHINDUCTOR_MAX_AUTOTUNE=1

				ENV TORCHINDUCTOR_MAX_AUTOTUNE_POINTWISE=1

				ENV HIPCC_COMPILE_FLAGS_APPEND="--offload-arch=gfx942"

				ENV AMDGPU_TARGETS=gfx942

				ENV ROCM_ARCH=gfx942

				ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"

				# Install vllm

				RUN pip uninstall -y vllm && \

				    rm -rf vllm && \

				    git clone -b v0.6.3 https://github.com/vllm-project/vllm.git && \

				    cd vllm && \

				    MAX_JOBS=$(nproc) python3 setup.py install && \

				    cd .. && \

				    rm -rf vllm

				# Switch to working directory

				WORKDIR /sgl-workspace

				# Copy the entire project directory

				COPY . .

				# Clean and create directory

				RUN rm -rf /sgl-workspace && mkdir -p /sgl-workspace

				# Install dependencies

				RUN pip install "tensordict<0.6" --no-deps && \

				# Clone and build sglang

				RUN git clone ${SGL_REPO} \

				    && cd sglang \

				    && git checkout ${SGL_BRANCH} || echo "Using default branch" \

				    && cd sgl-kernel \

				    && rm -f pyproject.toml \

				    && mv pyproject_rocm.toml pyproject.toml \

				    && python setup_rocm.py install \

				    && cd .. \

				    && if [ "$BUILD_TYPE" = "srt" ]; then \

				         python -m pip --no-cache-dir install -e "python[srt_hip]"; \

				       else \

				         python -m pip --no-cache-dir install -e "python[all_hip]"; \

				       fi \

				    && cd /sgl-workspace \

				    && cp -r /sgl-workspace/sglang /sglang \

				    && python -m pip cache purge

				# Install common Python packages

				RUN pip install IPython orjson python-multipart torchao pybind11

				# Rebuild Triton

				RUN pip uninstall -y triton || true \

				    && git clone ${TRITON_REPO} \

				    && cd triton \

				    && git checkout ${TRITON_COMMIT} \

				    && cd python \

				    && python3 setup.py install \

				    && cd /sgl-workspace

				# ENV HIPCC_COMPILE_FLAGS_APPEND="--offload-arch=gfx942 --amdgpu-lower-module-lds-strategy=1"

				# ENV HIPCC_COMPILE_FLAGS_APPEND="--offload-arch=gfx942"

				# Build aiter

				#version: Commit 9d11f47

				    # && git checkout ${AITER_COMMIT} \

				RUN pip uninstall -y aiter || true

				RUN git clone ${AITER_REPO} \

				    && cd aiter \

				    && git checkout ${AITER_COMMIT} \

				    && git submodule sync \

				    && git submodule update --init --recursive \

				    && PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python3 setup.py install \

				    && cd /sgl-workspace

				    # && PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python3 setup.py develop \

				    # && PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python3 setup.py develop \

				# Copy MI300X config 

				RUN find /sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/ \

				         /sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/configs/ \

				         -type f -name '*MI300X*' | \

				         xargs -I {} sh -c 'vf_config=$(echo "$1" | sed "s/MI300X/MI300X_VF/"); cp "$1" "$vf_config"' -- {}

				# Environment setup complete.

				RUN echo "Environment setup complete."

				WORKDIR /workspace/

				###########################################

				###########################################

				###########################################

				###########################################

				###############vllm v0.8.5#################

				###########################################

				# ENV GITHUB_USERNAME=yushengsu-thu

				# ENV GITHUB_MAIL=yushengsu@gmail.com

				# RUN git config --global user.name "${GITHUB_USERNAME}" \

				#     && git config --global user.email "${GITHUB_MAIL}" 

				WORKDIR /workspace/

				ENV VLLM_TARGET_DEVICE=rocm 

				ENV ROCM_PATH=/opt/rocm 

				ENV SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5.dev

				# Find the repo path in: DockerFile/Dockerfile.rocm_yang

				# RUN git clone https://github.com/RLFoundation/vllm-patch.git

				RUN pip uninstall -y vllm || true

				RUN rm -rf vllm-patch

				RUN git clone https://github.com/RLFoundation/vllm-patch.git \

				    && cd vllm-patch \

				    && git checkout v0.8.5-sleep-numa \

				    && rm -rf build/ dist/ *.egg-info \

				    && ln -sf /opt/rocm/lib/libamdhip64.so /usr/lib/libamdhip64.so \

				    && SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5.dev PYTORCH_ROCM_ARCH="gfx90a;gfx942" MAX_JOBS=${MAX_JOBS} python3 setup.py install

				    # RUN SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5.dev PYTORCH_ROCM_ARCH="gfx90a;gfx942" MAX_JOBS=${MAX_JOBS} python3 setup.py develop

				WORKDIR /workspace/

				###########################################

				###########################################

				###########################################

				#########################################

				#### Install megatron-core###############

				#########################################

				RUN pip uninstall -y megatron-core && \

				    git clone https://github.com/yushengsu-thu/Megatron-LM-amd_version.git && \

				    cd Megatron-LM-amd_version && \

				    pip install -vvv -e . && \

				    cd /workspace/

				#########################################

				#########################################

				#########################################

				#######################################

				################apex###################

				#######################################

				WORKDIR /workspace/

				RUN pip uninstall -y apex && \

				    git clone https://github.com/ROCm/apex.git && \

				    cd apex && \

				    python setup.py install && \

				    cd /workspace/ 

				#######################################

				#######################################

				#######################################

				################################################################################

				###########################Add torch_memory_saver###############################

				################################################################################

				# Set environment variables

				ENV HIPCC_COMPILE_FLAGS_APPEND="--amdgpu-target=gfx90a;gfx942 -D__HIP_PLATFORM_AMD__"

				ENV CFLAGS="-D__HIP_PLATFORM_AMD__"

				ENV CXXFLAGS="-D__HIP_PLATFORM_AMD__"

				RUN pip install "git+https://github.com/YangWang92/torch_memory_saver_numa.git@numa"

				################################################################################

				################################################################################

				################################################################################

				########################################

				######Install ray#######################

				########################################

				# need to add this patch: https://github.com/ray-project/ray/pull/53531/files

				RUN pip uninstall ray -y

				RUN pip install "ray[data,train,tune,serve]>=2.47.0" 

				########################################

				########################################

				########################################

				##########################################

				#######Install other dependencies#########

				##########################################

				RUN pip install "tensordict==0.6.2" --no-deps && \

				    pip install accelerate \

				    codetiming \

				    datasets \

				@ -36,10 +302,21 @@ RUN pip install "tensordict<0.6" --no-deps && \

				    peft \

				    "pyarrow>=15.0.0" \

				    pylatexenc \

				    "ray[data,train,tune,serve]" \

				    torchdata \

				    transformers \

				    wandb \

				    orjson \

				    pybind11 && \

				    pip install -e . --no-deps

				    pybind11

				WORKDIR /workspace/

				RUN git clone https://github.com/volcengine/verl.git && \

				    cd verl && \

				    pip install -e . 

				##########################################

				##########################################

				##########################################

				WORKDIR /workspace/

				CMD ["/usr/bin/bash"]

									
										141

docker/Dockerfile.rocm7
									
										Normal file
									
												View File
												
				@ -0,0 +1,141 @@

				# default base image

				ARG REMOTE_VLLM="1"

				ARG COMMON_WORKDIR=/app

				ARG BASE_IMAGE=rocm/vllm-dev:base_rocm7_0930_rc1_20250916_tuned_20250917

				FROM ${BASE_IMAGE} AS base

				ARG ARG_PYTORCH_ROCM_ARCH

				ENV PYTORCH_ROCM_ARCH=${ARG_PYTORCH_ROCM_ARCH:-${PYTORCH_ROCM_ARCH}}

				# Install some basic utilities

				RUN apt-get update -q -y && apt-get install -q -y \

				    sqlite3 libsqlite3-dev libfmt-dev libmsgpack-dev libsuitesparse-dev \

				    apt-transport-https ca-certificates wget curl

				# Remove sccache

				RUN python3 -m pip install --upgrade pip

				RUN apt-get purge -y sccache; python3 -m pip uninstall -y sccache; rm -f "$(which sccache)"

				ARG COMMON_WORKDIR

				WORKDIR ${COMMON_WORKDIR}

				# -----------------------

				# vLLM fetch stages

				FROM base AS fetch_vllm_0

				ONBUILD COPY ./ vllm/

				FROM base AS fetch_vllm_1

				#ARG VLLM_REPO="https://github.com/ROCm/vllm.git"

				#ARG VLLM_BRANCH="main"

				ARG VLLM_REPO=https://github.com/HollowMan6/vllm.git

				ARG VLLM_BRANCH="sleep_amd"

				ONBUILD RUN git clone ${VLLM_REPO} \

				            && cd vllm \

				            && git checkout ${VLLM_BRANCH}

				FROM fetch_vllm_${REMOTE_VLLM} AS fetch_vllm

				# -----------------------

				# vLLM build stages

				FROM fetch_vllm AS build_vllm

				# Build vLLM

				RUN cd vllm \

				    && python3 -m pip install -r requirements/rocm.txt \

				    && python3 setup.py clean --all  \

				    && ln -sf /opt/rocm/lib/libamdhip64.so /usr/lib/libamdhip64.so \

				    && VLLM_TARGET_DEVICE=rocm ROCM_PATH=/opt/rocm/ VLLM_GPU_LANG=HIP SETUPTOOLS_SCM_PRETEND_VERSION=0.8.4.dev python3 setup.py bdist_wheel --dist-dir=dist

				    #&& python3 setup.py bdist_wheel --dist-dir=dist

				FROM scratch AS export_vllm

				ARG COMMON_WORKDIR

				COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/dist/*.whl /

				COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/requirements /requirements

				COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/benchmarks /benchmarks

				COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/tests /tests

				COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/examples /examples

				COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/.buildkite /.buildkite

				# -----------------------

				# Test vLLM image

				FROM base AS test

				RUN python3 -m pip install --upgrade pip && rm -rf /var/lib/apt/lists/*

				# Install vLLM

				RUN --mount=type=bind,from=export_vllm,src=/,target=/install \

				    cd /install \

				    && pip install -U -r requirements/rocm.txt \

				    && pip install -U -r requirements/rocm-test.txt \

				    && pip uninstall -y vllm \

				    && pip install *.whl

				WORKDIR /vllm-workspace

				ARG COMMON_WORKDIR

				COPY --from=build_vllm ${COMMON_WORKDIR}/vllm /vllm-workspace

				# install development dependencies (for testing)

				RUN cd /vllm-workspace \

				    && rm -rf vllm \

				    && python3 -m pip install -e tests/vllm_test_utils \

				    && python3 -m pip install lm-eval[api]==0.4.4 \

				    && python3 -m pip install pytest-shard

				# -----------------------

				# Final vLLM image

				FROM base AS final

				RUN python3 -m pip install --upgrade pip && rm -rf /var/lib/apt/lists/*

				# Error related to odd state for numpy 1.20.3 where there is no METADATA etc, but an extra LICENSES_bundled.txt.

				# Manually remove it so that later steps of numpy upgrade can continue

				RUN case "$(which python3)" in \

				        *"/opt/conda/envs/py_3.9"*) \

				            rm -rf /opt/conda/envs/py_3.9/lib/python3.9/site-packages/numpy-1.20.3.dist-info/;; \

				        *) ;; esac

				RUN python3 -m pip install --upgrade huggingface-hub[cli]

				# Install vLLM

				RUN --mount=type=bind,from=export_vllm,src=/,target=/install \

				    cd /install \

				    && pip install -U -r requirements/rocm.txt \

				    && pip uninstall -y vllm \

				    && pip install *.whl

				ARG COMMON_WORKDIR

				# Copy over the benchmark scripts as well

				COPY --from=export_vllm /benchmarks ${COMMON_WORKDIR}/vllm/benchmarks

				COPY --from=export_vllm /examples ${COMMON_WORKDIR}/vllm/examples

				ENV RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1

				ENV TOKENIZERS_PARALLELISM=false

				# ENV that can improve safe tensor loading, and end-to-end time

				ENV SAFETENSORS_FAST_GPU=1

				# Performance environment variable.

				ENV HIP_FORCE_DEV_KERNARG=1

				# -----------------------

				# Install verl

				RUN pip install "tensordict==0.6.2" --no-deps && \

				    pip install accelerate \

				    codetiming \

				    datasets \

				    dill \

				    hydra-core \

				    liger-kernel \

				    numpy \

				    pandas \

				    peft \

				    "pyarrow>=15.0.0" \

				    pylatexenc \

				    torchdata \

				    wandb \

				    orjson \

				    pybind11

				WORKDIR /workspace/

				RUN git clone https://github.com/volcengine/verl.git && \

				    cd verl && \

				    pip install -e .

				CMD ["/bin/bash"]

									
										58

docker/Dockerfile.rocm_verl-0.3.0.post1
									
										Normal file
									
												View File
												
				@ -0,0 +1,58 @@

				#  Build the docker in the repo dir:

				# docker build -f docker/Dockerfile.rocm -t verl-rocm:03.04.2015 .

				# docker images # you can find your built docker

				# Support - Traing: fsdp; Inference: vllm

				# FROM rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4

				# Support - Traing: fsdp; Inference: vllm, sglang

				FROM lmsysorg/sglang:v0.4.6.post5-rocm630

				# Set working directory

				# WORKDIR $PWD/app

				# Set environment variables

				ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"

				ENV HIPCC_COMPILE_FLAGS_APPEND="--amdgpu-target=gfx90a;gfx942 -D__HIP_PLATFORM_AMD__"

				ENV CFLAGS="-D__HIP_PLATFORM_AMD__"

				ENV CXXFLAGS="-D__HIP_PLATFORM_AMD__"

				# Install vllm

				RUN pip uninstall -y vllm && \

				    rm -rf vllm && \

				    git clone -b v0.6.3 https://github.com/vllm-project/vllm.git && \

				    cd vllm && \

				    MAX_JOBS=$(nproc) python3 setup.py install && \

				    cd .. && \

				    rm -rf vllm

				# Copy the entire project directory

				COPY . .

				# Install dependencies

				RUN pip install "tensordict==0.6.2" --no-deps && \

				    pip install accelerate \

				    codetiming \

				    datasets \

				    dill \

				    hydra-core \

				    liger-kernel \

				    numpy \

				    pandas \

				    peft \

				    "pyarrow>=15.0.0" \

				    pylatexenc \

				    "ray[data,train,tune,serve]<2.45.0" \

				    torchdata \

				    transformers \

				    wandb \

				    orjson \

				    pybind11

				RUN git clone https://github.com/volcengine/verl.git && \

				    cd verl && \

				    pip install -e . 

				# Install torch_memory_saver

				RUN pip install git+https://github.com/ExtremeViscent/torch_memory_saver.git --no-deps

									
										323

docker/Dockerfile.rocm_verl-0.4.1
									
										Normal file
									
												View File
												
				@ -0,0 +1,323 @@

				# FROM "compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-rel-6.4:94_ubuntu22.04_py3.10_pytorch_release-2.7_575e247"

				# FROM "rlfoundation.azurecr.io/rocm6.3.4:vllm-0.8.5-numa-patch-ubuntu-22.04"

				FROM "rlsys/rocm-6.3.4-patch:rocm6.3.4-numa-patch_ubuntu-22.04"

				SHELL ["/bin/bash", "-ceuxo", "pipefail"]

				ENV MAX_JOBS=512

				ENV PATH="/usr/local/python3.12/bin:$PATH"

				RUN ln -sf /usr/bin/python3.12 /usr/bin/python && \

				    ln -sf /usr/bin/pip3.12 /usr/bin/pip

				############################################

				############################################

				RUN apt-get update

				RUN apt-get install -y pkg-config liblzma-dev

				############################################

				############################################

				###########################################

				##########Install TransformerEngine########

				###########################################

				WORKDIR /workspace/

				# transformer-engine install

				# https://github.com/ROCm/TransformerEngine

				RUN rm -rf TransformerEngine 

				RUN git clone --recursive https://github.com/ROCm/TransformerEngine.git

				WORKDIR /workspace/TransformerEngine

				RUN git checkout 236178e5

				# git checkout bb061ade

				# git checkout 864405c

				ENV NVTE_FRAMEWORK=pytorch 

				ENV NVTE_ROCM_ARCH=gfx942 

				ENV NVTE_USE_HIPBLASLT=1

				ENV NVTE_USE_ROCM=1  

				# export CMAKE_PREFIX_PATH="/opt/rocm:/opt/rocm/hip:/usr/local:/usr:${CMAKE_PREFIX_PATH:-}"

				ENV CMAKE_PREFIX_PATH="/opt/rocm:/opt/rocm/hip:/usr/local:/usr"

				# ENV NVTE_BUILD_MAX_JOBS=$(MAX_JOBS)

				RUN MAX_JOBS=$(MAX_JOBS) pip install . -vvv 

				WORKDIR /workspace/

				###########################################

				###########################################

				###########################################

				####################################################################################

				################Install vllm - sglang require vllm 0.6.7 dependency#################

				####################################################################################

				#### Require vllm 0.6.7 - checkout 113274a0

				WORKDIR /workspace/

				RUN rm -rf vllm

				RUN pip uninstall -y vllm

				# Refer to here (down-grade vllm to 0.6.3): https://docs.vllm.ai/en/v0.6.3/getting_started/amd-installation.html

				RUN git clone https://github.com/ROCm/vllm.git

				# git clone https://github.com/vllm-project/vllm.git

				WORKDIR /workspace/vllm

				RUN git checkout 113274a0

				ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"

				#ENV MAX_JOBS=512

				ENV MAX_JOBS=${MAX_JOBS}

				RUN pip install "boto3>=1.26.0"

				RUN pip install setuptools_scm

				# will add src into py. You can delete the repo

				RUN python3 setup.py install

				WORKDIR /workspace/

				####################################################################################

				####################################################################################

				####################################################################################

				###########################################

				############For hack docker################

				###########################################

				RUN pip install setuptools==75.8.0

				###########################################

				###########################################

				###########################################

				###########################################

				############build sgalng###################

				###########################################

				# Set environment variables

				ENV BASE_DIR=/sgl-workspace

				ENV BUILD_TYPE=all

				ENV SGL_REPO=https://github.com/sgl-project/sglang

				ENV SGL_BRANCH=v0.4.6.post5

				ENV TRITON_REPO=https://github.com/ROCm/triton.git

				ENV TRITON_COMMIT=improve_fa_decode_3.0.0

				ENV AITER_REPO=https://github.com/ROCm/aiter.git

				ENV AITER_COMMIT=v0.1.2

				# v0.1.2 version - commit id: 9d11f47

				# ENV AITER_COMMIT=9d11f47

				ENV HIP_FORCE_DEV_KERNARG=1

				ENV HSA_NO_SCRATCH_RECLAIM=1

				ENV SGLANG_SET_CPU_AFFINITY=1

				ENV SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1

				ENV NCCL_MIN_NCHANNELS=112

				ENV MOE_PADDING=1

				ENV VLLM_FP8_PADDING=1

				ENV VLLM_FP8_ACT_PADDING=1

				ENV VLLM_FP8_WEIGHT_PADDING=1

				ENV VLLM_FP8_REDUCE_CONV=1

				ENV TORCHINDUCTOR_MAX_AUTOTUNE=1

				ENV TORCHINDUCTOR_MAX_AUTOTUNE_POINTWISE=1

				ENV HIPCC_COMPILE_FLAGS_APPEND="--offload-arch=gfx942"

				ENV AMDGPU_TARGETS=gfx942

				ENV ROCM_ARCH=gfx942

				ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"

				# Switch to working directory

				WORKDIR /sgl-workspace

				# Clean and create directory

				RUN rm -rf /sgl-workspace && mkdir -p /sgl-workspace

				# Clone and build sglang

				RUN git clone ${SGL_REPO} \

				    && cd sglang \

				    && git checkout ${SGL_BRANCH} || echo "Using default branch" \

				    && cd sgl-kernel \

				    && rm -f pyproject.toml \

				    && mv pyproject_rocm.toml pyproject.toml \

				    && python setup_rocm.py install \

				    && cd .. \

				    && if [ "$BUILD_TYPE" = "srt" ]; then \

				         python -m pip --no-cache-dir install -e "python[srt_hip]"; \

				       else \

				         python -m pip --no-cache-dir install -e "python[all_hip]"; \

				       fi \

				    && cd /sgl-workspace \

				    && cp -r /sgl-workspace/sglang /sglang \

				    && python -m pip cache purge

				# Install common Python packages

				RUN pip install IPython orjson python-multipart torchao pybind11

				# Rebuild Triton

				RUN pip uninstall -y triton || true \

				    && git clone ${TRITON_REPO} \

				    && cd triton \

				    && git checkout ${TRITON_COMMIT} \

				    && cd python \

				    && python3 setup.py install \

				    && cd /sgl-workspace

				# ENV HIPCC_COMPILE_FLAGS_APPEND="--offload-arch=gfx942 --amdgpu-lower-module-lds-strategy=1"

				# ENV HIPCC_COMPILE_FLAGS_APPEND="--offload-arch=gfx942"

				# Build aiter

				#version: Commit 9d11f47

				    # && git checkout ${AITER_COMMIT} \

				RUN pip uninstall -y aiter || true

				RUN git clone ${AITER_REPO} \

				    && cd aiter \

				    && git checkout ${AITER_COMMIT} \

				    && git submodule sync \

				    && git submodule update --init --recursive \

				    && PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python3 setup.py install \

				    && cd /sgl-workspace

				    # && PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python3 setup.py develop \

				    # && PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python3 setup.py develop \

				# Copy MI300X config 

				RUN find /sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/ \

				         /sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/configs/ \

				         -type f -name '*MI300X*' | \

				         xargs -I {} sh -c 'vf_config=$(echo "$1" | sed "s/MI300X/MI300X_VF/"); cp "$1" "$vf_config"' -- {}

				# Environment setup complete.

				RUN echo "Environment setup complete."

				WORKDIR /workspace/

				###########################################

				###########################################

				###########################################

				###########################################

				###############vllm v0.8.5#################

				###########################################

				# ENV GITHUB_USERNAME=yushengsu-thu

				# ENV GITHUB_MAIL=yushengsu@gmail.com

				# RUN git config --global user.name "${GITHUB_USERNAME}" \

				#     && git config --global user.email "${GITHUB_MAIL}" 

				WORKDIR /workspace/

				ENV VLLM_TARGET_DEVICE=rocm 

				ENV ROCM_PATH=/opt/rocm 

				ENV SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5.dev

				# Find the repo path in: DockerFile/Dockerfile.rocm_yang

				# RUN git clone https://github.com/RLFoundation/vllm-patch.git

				RUN pip uninstall -y vllm || true

				RUN rm -rf vllm-patch

				RUN git clone https://github.com/RLFoundation/vllm-patch.git \

				    && cd vllm-patch \

				    && git checkout v0.8.5-sleep-numa \

				    && rm -rf build/ dist/ *.egg-info \

				    && ln -sf /opt/rocm/lib/libamdhip64.so /usr/lib/libamdhip64.so \

				    && SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5.dev PYTORCH_ROCM_ARCH="gfx90a;gfx942" MAX_JOBS=${MAX_JOBS} python3 setup.py install

				    # RUN SETUPTOOLS_SCM_PRETEND_VERSION=0.8.5.dev PYTORCH_ROCM_ARCH="gfx90a;gfx942" MAX_JOBS=${MAX_JOBS} python3 setup.py develop

				WORKDIR /workspace/

				###########################################

				###########################################

				###########################################

				#########################################

				#### Install megatron-core###############

				#########################################

				RUN pip uninstall -y megatron-core && \

				    git clone https://github.com/yushengsu-thu/Megatron-LM-amd_version.git && \

				    cd Megatron-LM-amd_version && \

				    pip install -vvv -e . && \

				    cd /workspace/

				#########################################

				#########################################

				#########################################

				#######################################

				################apex###################

				#######################################

				WORKDIR /workspace/

				RUN pip uninstall -y apex && \

				    git clone https://github.com/ROCm/apex.git && \

				    cd apex && \

				    python setup.py install && \

				    cd /workspace/ 

				#######################################

				#######################################

				#######################################

				################################################################################

				###########################Add torch_memory_saver###############################

				################################################################################

				# Set environment variables

				ENV HIPCC_COMPILE_FLAGS_APPEND="--amdgpu-target=gfx90a;gfx942 -D__HIP_PLATFORM_AMD__"

				ENV CFLAGS="-D__HIP_PLATFORM_AMD__"

				ENV CXXFLAGS="-D__HIP_PLATFORM_AMD__"

				RUN pip install "git+https://github.com/YangWang92/torch_memory_saver_numa.git@numa"

				################################################################################

				################################################################################

				################################################################################

				########################################

				######Install ray#######################

				########################################

				# need to add this patch: https://github.com/ray-project/ray/pull/53531/files

				RUN pip uninstall ray -y

				RUN pip install "ray[data,train,tune,serve]>=2.47.0" 

				########################################

				########################################

				########################################

				##########################################

				#######Install other dependencies#########

				##########################################

				RUN pip install "tensordict==0.6.2" --no-deps && \

				    pip install accelerate \

				    codetiming \

				    datasets \

				    dill \

				    hydra-core \

				    liger-kernel \

				    numpy \

				    pandas \

				    peft \

				    "pyarrow>=15.0.0" \

				    pylatexenc \

				    torchdata \

				    wandb \

				    orjson \

				    pybind11

				WORKDIR /workspace/

				RUN git clone https://github.com/volcengine/verl.git && \

				    cd verl && \

				    pip install -e . 

				##########################################

				##########################################

				##########################################

				WORKDIR /workspace/

				CMD ["/usr/bin/bash"]

				CMD ["/usr/bin/bash"]

									
										55

docker/Dockerfile.sglang
									
										Normal file
									
												View File
												
				@ -0,0 +1,55 @@

				# Start from the NVIDIA official image (ubuntu-22.04 + python-3.10)

				# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html

				FROM nvcr.io/nvidia/pytorch:24.08-py3

				# Define environments

				ENV MAX_JOBS=32

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				# Define installation arguments

				ARG APT_SOURCE=https://mirrors.ustc.edu.cn/ubuntu/

				# Set apt source

				RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \

				    { \

				    echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \

				    } > /etc/apt/sources.list

				# Install systemctl

				RUN apt-get update && \

				    apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \

				    apt-get clean

				# Install tini

				RUN apt-get update && \

				    apt-get install -y tini && \

				    apt-get clean

				# Change pip source

				ARG PIP_INDEX=https://mirrors.aliyun.com/pypi/simple/

				RUN pip config set global.index-url "${PIP_INDEX}" && \

				    pip config set global.extra-index-url "${PIP_INDEX}" && \

				    python -m pip install --upgrade pip

				# Install sglang-0.4.6.post5 and torch-memory-saver

				RUN pip uninstall -y cuda-python && pip install "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install torch-memory-saver --no-cache-dir

				# Install torch-2.6.0

				RUN pip install --no-cache-dir torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 tensordict torchdata \

				    transformers>=4.49.0 accelerate datasets peft hf_transfer \

				    ray[default] codetiming hydra-core pandas pyarrow>=15.0.0 pylatexenc qwen-vl-utils wandb liger-kernel \

				    pytest pre-commit py-spy pyext

				# Install flash_attn-2.7.4.post1

				RUN pip uninstall -y transformer-engine flash-attn && \

				    wget -v https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \

				    pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

				# Fix cv2

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --no-cache-dir nvidia-ml-py>=12.560.30 opencv-python-headless==4.8.0.74 fastapi==0.115.6

									
										2

docker/Dockerfile.vemlp.vllm.te
									
												View File
												
				@ -23,7 +23,7 @@ RUN pip3 install --no-cache-dir \

				RUN pip3 install --no-cache-dir flash-attn==2.7.0.post2 --no-build-isolation

				# vllm depends on ray, and veRL does not support ray > 2.37

				# vllm depends on ray

				RUN pip3 install --no-cache-dir vllm==0.6.3 ray==2.10

				# install apex

									
										115

docker/Dockerfile.vllm.sglang.megatron.deepseek
									
										Normal file
									
												View File
												
				@ -0,0 +1,115 @@

				# Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)

				# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html

				FROM nvcr.io/nvidia/pytorch:24.08-py3

				# Define environments

				ENV MAX_JOBS=32

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Define installation arguments

				ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/

				ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

				# Set apt source

				RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \

				    { \

				    echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \

				    } > /etc/apt/sources.list

				# Install systemctl

				RUN apt-get update && \

				    apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \

				    apt-get clean

				# Install tini

				RUN apt-get update && \

				    apt-get install -y tini aria2 && \

				    apt-get clean

				# Change pip source

				RUN pip config set global.index-url "${PIP_INDEX}" && \

				    pip config set global.extra-index-url "${PIP_INDEX}" && \

				    python -m pip install --upgrade pip

				# Uninstall nv-pytorch fork

				RUN pip uninstall -y torch torchvision torchaudio \

				    pytorch-quantization pytorch-triton torch-tensorrt \

				    xgboost transformer_engine flash_attn apex megatron-core grpcio

				# Reinstall CUDA 12.4

				RUN aria2c https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \

				    mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600

				RUN aria2c --always-resume=true --max-tries=99999 https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \

				    dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \

				    cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/ && \

				    apt-get update && \

				    apt-get -y install cuda-toolkit-12-4 && \

				    rm cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \

				    update-alternatives --set cuda /usr/local/cuda-12.4 && \

				    rm -rf /usr/local/cuda-12.6

				# Install torch-2.6.0+cu124 + vllm-0.8.5.post1 + sglang-0.4.6.post5

				# torch-2.6.0+cu124: cxx11abi=False

				# torch-2.6.0+cu126: cxx11abi=True

				# see https://github.com/flashinfer-ai/flashinfer/issues/911

				# Install sglang-0.4.6.post1 and torch-memory-saver

				RUN pip install --resume-retries 999 "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install --resume-retries 999 torch-memory-saver --no-cache-dir

				RUN pip install --resume-retries 999 --no-cache-dir "vllm==0.8.5.post1" "torch==2.6.0" "torchvision==0.21.0" "torchaudio==2.6.0" "tensordict==0.6.2" torchdata

				RUN pip install --resume-retries 999 --no-cache-dir "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=15.0.0" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile \

				    pytest py-spy pyext pre-commit ruff

				# Install flash-attn-2.7.4.post1 (cxx11abi=False)

				RUN wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \

				    pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

				# Fix packages

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				# Install cudnn

				RUN aria2c --max-tries=9999 https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \

				    dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \

				    cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/ && \

				    apt-get update && \

				    apt-get -y install cudnn-cuda-12 && \

				    rm cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install Apex

				RUN git clone https://github.com/NVIDIA/apex.git && \

				    cd apex && \

				    pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --no-deps --no-cache-dir git+https://github.com/NVIDIA/TransformerEngine.git@v2.3

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2

				# Fix opencv

				RUN pip install opencv-python

				RUN pip install opencv-fixer && \

				    python -c "from opencv_fixer import AutoFix; AutoFix()"

				# Install verl

				# Reset pip config

				RUN pip config unset global.index-url && \

				    pip config unset global.extra-index-url

				    RUN apt-get update && \

				    apt-get install -y aria2 libfreeimage3 libfreeimage-dev zlib1g

									
										72

docker/README.md
									
										Normal file
									
												View File
												
				@ -0,0 +1,72 @@

				# Dockerfiles of verl

				We provide pre-built Docker images for quick setup. And from this version, we utilize a new image release hierarchy for productivity and stability.

				The image types are divided into three large categories:

				- **Base Image**: Without inference and training frameworks, only basic dependencies are installed. Can directly install vllm or SGLang on top of it, without need of reinstall torch or CUDA.

				- **Application Image**: Stable version with inference and training frameworks installed.

				- **Preview Image**: Unstable version with the latest frameworks and features.

				The first two types of images are hosted on dockerhub [verlai/verl](https://hub.docker.com/r/verlai/verl) repository, while the preview images are hosted on community repository.

				> The image versions are mapped with verl releases, for example, image with tag ``verl0.4`` is built for verl release ``v0.4.x``.

				## Base Image

				The stable base image is ``verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4`` with different CUDA versions.

				The update of base image is not frequent, and the app image can be built on top of it without reinstalling base packages.

				## Application Image

				From this version, we divide images built for vLLM and SGLang as the divergence of dependent packages like FlashInfer.

				There are 2 types of application images available:

				- **vLLM with FSDP and Megatron**: ``verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2``

				- **SGLang with FSDP and Megatron**: `verlai/verl:app-verl0.5-transformers4.55.4-sglang0.4.10.post2-mcore0.13.0-te2.2`

				Docker images with Megatron backends are runnable with large language model like ``Qwen/Qwen3-235B-A22B``, ``deepseek-ai/DeepSeek-V3-0324`` post-training. Refer to the :doc:`Large Language Model Post-Training documentation<../perf/dpsk>` for more details.

				Application images can be updated frequently, and the Dockerfile can be found in ``docker/verl[version]-[packages]/Dockerfile.app.[frameworks]``. Based on the base image, it is easy to build your own application image with the desired inference and training frameworks.

				## Community Image

				For vLLM with FSDP, please refer to [hiyouga/verl](https://hub.docker.com/r/hiyouga/verl) repository and the latest version is ``hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.4-flashinfer0.2.2-cxx11abi0``.

				For SGLang with FSDP, please refer to [ocss884/verl-sglang](https://hub.docker.com/r/ocss884/verl-sglang) repository and the latest version is ``ocss884/verl-sglang:ngc-th2.6.0-cu126-sglang0.4.6.post5`` which is provided by SGLang RL Group.

				For latest vLLM with Megatron, please refer to [iseekyan/verl](https://hub.docker.com/r/iseekyan/verl) repository and the latest version is ``iseekyan/verl:nemo.gptoss_vllm0.11.0``.

				See files under ``docker/`` for NGC-based image or if you want to build your own.

				Note that For aws instances with EFA net interface (Sagemaker AI Pod), you need to install EFA driver as shown in ``docker/Dockerfile.extenstion.awsefa``

				## Installation from Docker

				After pulling the desired Docker image and installing desired inference and training frameworks, you can run it with the following steps:

				1. Launch the desired Docker image and attach into it:

				```sh

				docker create --runtime=nvidia --gpus all --net=host --shm-size="10g" --cap-add=SYS_ADMIN -v .:/workspace/verl --name verl <image:tag> sleep infinity

				docker start verl

				docker exec -it verl bash

				```

				2. If you use the images provided, you only need to install verl itself without dependencies:

				```sh

				# install the nightly version (recommended)

				git clone https://github.com/volcengine/verl && cd verl

				pip3 install --no-deps -e .

				```

				[Optional] If you hope to switch between different frameworks, you can install verl with the following command:

				```sh

				# install the nightly version (recommended)

				git clone https://github.com/volcengine/verl && cd verl

				pip3 install -e .[vllm]

				pip3 install -e .[sglang]

				```

									
										41

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.sglang.vllm.mcore0.12
									
										Normal file
									
												View File
												
				@ -0,0 +1,41 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4

				# Define environments

				ENV MAX_JOBS=32

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install sglang-0.4.6.post5 and torch-memory-saver

				RUN pip install --resume-retries 999 "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install torch-memory-saver --no-cache-dir

				# Some sglang operations in 0.4.6.post5 require vllm

				# [Warning] vllm can have some packages not compatible with sglang, for example, flashinfer

				RUN pip install --resume-retries 999 --no-cache-dir vllm==0.8.5.post1

				# Fix packages

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2

				# Fix for transformers 4.53.0

				RUN pip3 install --no-cache-dir "transformers[hf_xet]<4.52.0"

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge

									
										82

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.sglang.vllm.mcore0.12.deepep
									
										Normal file
									
												View File
												
				@ -0,0 +1,82 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4

				# Define environments

				ENV MAX_JOBS=32

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install sglang-0.4.6.post5 and torch-memory-saver

				RUN pip install --resume-retries 999 "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install torch-memory-saver --no-cache-dir

				# Some sglang operations in 0.4.6.post5 require vllm

				# [Warning] vllm can have some packages not compatible with sglang, for example, flashinfer

				RUN pip install --resume-retries 999 --no-cache-dir vllm==0.8.5.post1

				# Fix packages

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2

				# Fix for transformers 4.53.0

				RUN pip3 install --no-cache-dir "transformers[hf_xet]<4.52.0"

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge

				# Install DeepEP

				## the dependency of IBGDA

				RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so

				## Clone and build deepep and deepep-nvshmem

				RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \

				    git clone https://github.com/deepseek-ai/DeepEP.git  && \

				    cd DeepEP && git checkout a84a248

				# Prepare nvshmem

				RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \

				    tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \

				    cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch

				ENV CUDA_HOME=/usr/local/cuda

				### Set MPI environment variables. Having errors when not set.

				ENV CPATH=/usr/local/mpi/include:$CPATH

				ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH

				ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH

				ENV GDRCOPY_HOME=/workspace/gdrcopy

				## Build deepep-nvshmem

				RUN cd deepep-nvshmem && \

				    NVSHMEM_SHMEM_SUPPORT=0 \

				    NVSHMEM_UCX_SUPPORT=0 \

				    NVSHMEM_USE_NCCL=0 \

				    NVSHMEM_MPI_SUPPORT=0 \

				    NVSHMEM_IBGDA_SUPPORT=1 \

				    NVSHMEM_PMIX_SUPPORT=0 \

				    NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \

				    NVSHMEM_USE_GDRCOPY=1 \

				    cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install

				ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install

				ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH

				ENV PATH=$NVSHMEM_DIR/bin:$PATH

				## Build deepep

				RUN cd DeepEP && \

				    python setup.py install

									
										82

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.sglang.vllm.mcore0.13.preview
									
										Normal file
									
												View File
												
				@ -0,0 +1,82 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4

				# Define environments

				ENV MAX_JOBS=32

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install sglang-0.4.6.post5 and torch-memory-saver

				RUN pip install --resume-retries 999 "sglang[all]==0.4.6.post5" --no-cache-dir --find-links https://flashinfer.ai/whl/cu124/torch2.6/flashinfer-python && pip install torch-memory-saver --no-cache-dir

				# Some sglang operations in 0.4.6.post5 require vllm

				# [Warning] vllm can have some packages not compatible with sglang, for example, flashinfer

				RUN pip install --resume-retries 999 --no-cache-dir vllm==0.8.5.post1

				# Fix packages

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.5

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_r0.13.0

				# Fix for transformers 4.53.0

				RUN pip3 install --no-cache-dir "transformers[hf_xet]<4.52.0"

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge

				# Install DeepEP

				## the dependency of IBGDA

				RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so

				## Clone and build deepep and deepep-nvshmem

				RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \

				    git clone https://github.com/deepseek-ai/DeepEP.git  && \

				    cd DeepEP && git checkout a84a248

				# Prepare nvshmem

				RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \

				    tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \

				    cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch

				ENV CUDA_HOME=/usr/local/cuda

				### Set MPI environment variables. Having errors when not set.

				ENV CPATH=/usr/local/mpi/include:$CPATH

				ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH

				ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH

				ENV GDRCOPY_HOME=/workspace/gdrcopy

				## Build deepep-nvshmem

				RUN cd deepep-nvshmem && \

				    NVSHMEM_SHMEM_SUPPORT=0 \

				    NVSHMEM_UCX_SUPPORT=0 \

				    NVSHMEM_USE_NCCL=0 \

				    NVSHMEM_MPI_SUPPORT=0 \

				    NVSHMEM_IBGDA_SUPPORT=1 \

				    NVSHMEM_PMIX_SUPPORT=0 \

				    NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \

				    NVSHMEM_USE_GDRCOPY=1 \

				    cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install

				ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install

				ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH

				ENV PATH=$NVSHMEM_DIR/bin:$PATH

				## Build deepep

				RUN cd DeepEP && \

				    python setup.py install

									
										47

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.vllm.mcore0.12
									
										Normal file
									
												View File
												
				@ -0,0 +1,47 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4

				# Define environments

				ENV MAX_JOBS=32

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install torch-2.6.0+cu124 + vllm-0.8.5.post1

				# torch-2.6.0+cu124: cxx11abi=False

				# torch-2.6.0+cu126: cxx11abi=True

				# see https://github.com/flashinfer-ai/flashinfer/issues/911

				RUN pip install --resume-retries 999 --no-cache-dir vllm==0.8.5.post1

				# Install flashinfer-0.2.2.post1+cu126 (cxx11abi=True)

				# vllm-0.8.3 does not support flashinfer>=0.2.3

				# see https://github.com/vllm-project/vllm/pull/15777

				RUN aria2c --max-tries=9999 https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2.post1/flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \

				    pip install --no-cache-dir flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \

				    rm flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl

				# Fix packages

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2

				# Fix for transformers 4.53.0

				RUN pip3 install --no-cache-dir "transformers[hf_xet]<4.52.0"

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge

									
										88

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.vllm.mcore0.12.deepep
									
										Normal file
									
												View File
												
				@ -0,0 +1,88 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4

				# Define environments

				ENV MAX_JOBS=32

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install torch-2.6.0+cu124 + vllm-0.8.5.post1

				# torch-2.6.0+cu124: cxx11abi=False

				# torch-2.6.0+cu126: cxx11abi=True

				# see https://github.com/flashinfer-ai/flashinfer/issues/911

				RUN pip install --resume-retries 999 --no-cache-dir vllm==0.8.5.post1

				# Install flashinfer-0.2.2.post1+cu126 (cxx11abi=True)

				# vllm-0.8.3 does not support flashinfer>=0.2.3

				# see https://github.com/vllm-project/vllm/pull/15777

				RUN aria2c --max-tries=9999 https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2.post1/flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \

				    pip install --no-cache-dir flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \

				    rm flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl

				# Fix packages

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2

				# Fix for transformers 4.53.0

				RUN pip3 install --no-cache-dir "transformers[hf_xet]<4.52.0"

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge

				# Install DeepEP

				## the dependency of IBGDA

				RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so

				## Clone and build deepep and deepep-nvshmem

				RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \

				    git clone https://github.com/deepseek-ai/DeepEP.git  && \

				    cd DeepEP && git checkout a84a248

				# Prepare nvshmem

				RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \

				    tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \

				    cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch

				ENV CUDA_HOME=/usr/local/cuda

				### Set MPI environment variables. Having errors when not set.

				ENV CPATH=/usr/local/mpi/include:$CPATH

				ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH

				ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH

				ENV GDRCOPY_HOME=/workspace/gdrcopy

				## Build deepep-nvshmem

				RUN cd deepep-nvshmem && \

				    NVSHMEM_SHMEM_SUPPORT=0 \

				    NVSHMEM_UCX_SUPPORT=0 \

				    NVSHMEM_USE_NCCL=0 \

				    NVSHMEM_MPI_SUPPORT=0 \

				    NVSHMEM_IBGDA_SUPPORT=1 \

				    NVSHMEM_PMIX_SUPPORT=0 \

				    NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \

				    NVSHMEM_USE_GDRCOPY=1 \

				    cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install

				ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install

				ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH

				ENV PATH=$NVSHMEM_DIR/bin:$PATH

				## Build deepep

				RUN cd DeepEP && \

				    python setup.py install

									
										85

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.vllm.mcore0.13.preview
									
										Normal file
									
												View File
												
				@ -0,0 +1,85 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4

				# Define environments

				ENV MAX_JOBS=32

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install torch-2.6.0+cu124 + vllm-0.8.5.post1

				# torch-2.6.0+cu124: cxx11abi=False

				# torch-2.6.0+cu126: cxx11abi=True

				# see https://github.com/flashinfer-ai/flashinfer/issues/911

				RUN pip install --resume-retries 999 --no-cache-dir vllm==0.8.5.post1

				# Install flashinfer-0.2.2.post1+cu126 (cxx11abi=True)

				# vllm-0.8.3 does not support flashinfer>=0.2.3

				# see https://github.com/vllm-project/vllm/pull/15777

				RUN aria2c --max-tries=9999 https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2.post1/flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \

				    pip install --no-cache-dir flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \

				    rm flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl

				# Fix packages

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.5

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge

				# Install DeepEP

				## the dependency of IBGDA

				RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so

				## Clone and build deepep and deepep-nvshmem

				RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \

				    git clone https://github.com/deepseek-ai/DeepEP.git  && \

				    cd DeepEP && git checkout a84a248

				# Prepare nvshmem

				RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \

				    tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \

				    cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch

				ENV CUDA_HOME=/usr/local/cuda

				### Set MPI environment variables. Having errors when not set.

				ENV CPATH=/usr/local/mpi/include:$CPATH

				ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH

				ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH

				ENV GDRCOPY_HOME=/workspace/gdrcopy

				## Build deepep-nvshmem

				RUN cd deepep-nvshmem && \

				    NVSHMEM_SHMEM_SUPPORT=0 \

				    NVSHMEM_UCX_SUPPORT=0 \

				    NVSHMEM_USE_NCCL=0 \

				    NVSHMEM_MPI_SUPPORT=0 \

				    NVSHMEM_IBGDA_SUPPORT=1 \

				    NVSHMEM_PMIX_SUPPORT=0 \

				    NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \

				    NVSHMEM_USE_GDRCOPY=1 \

				    cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install

				ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install

				ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH

				ENV PATH=$NVSHMEM_DIR/bin:$PATH

				## Build deepep

				RUN cd DeepEP && \

				    python setup.py install

									
										113

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.base
									
										Normal file
									
												View File
												
				@ -0,0 +1,113 @@

				# Base Docker Image of verl, with CUDA/Torch/FlashAttn/Apex/TransformerEngine, without other frameworks

				# Target: verlai/verl:base-v2-cu124-cudnn9.8-torch2.6-fa2.8.0-te2.3

				# Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)

				# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html

				FROM nvcr.io/nvidia/pytorch:24.08-py3

				# Define environments

				ENV MAX_JOBS=16

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Define installation arguments

				ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/

				ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

				# Set apt source

				RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \

				    { \

				    echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \

				    } > /etc/apt/sources.list

				# Install systemctl

				RUN apt-get update && \

				    apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \

				    apt-get clean

				# Install tini

				RUN apt-get update && \

				    apt-get install -y tini aria2 && \

				    apt-get clean

				# Change pip source

				RUN pip config set global.index-url "${PIP_INDEX}" && \

				    pip config set global.extra-index-url "${PIP_INDEX}" && \

				    python -m pip install --upgrade pip

				# Uninstall nv-pytorch fork

				RUN pip uninstall -y torch torchvision torchaudio \

				    pytorch-quantization pytorch-triton torch-tensorrt \

				    xgboost transformer_engine flash_attn apex megatron-core grpcio

				# Reinstall CUDA 12.4

				RUN aria2c https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \

				    mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600

				RUN aria2c --always-resume=true --max-tries=99999 https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \

				    dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \

				    cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/ && \

				    apt-get update && \

				    apt-get -y install cuda-toolkit-12-4 && \

				    rm cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb && \

				    update-alternatives --set cuda /usr/local/cuda-12.4 && \

				    rm -rf /usr/local/cuda-12.6

				RUN pip install --resume-retries 999 --no-cache-dir torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0

				RUN pip install --resume-retries 999 --no-cache-dir "tensordict==0.6.2" torchdata "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				# Install flash-attn-2.7.4.post1 (cxx11abi=False)

				RUN wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \

				    pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

				# Fix packages

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				# Install cudnn

				RUN aria2c --max-tries=9999 https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \

				    dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \

				    cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/ && \

				    apt-get update && \

				    apt-get -y install cudnn-cuda-12 && \

				    rm cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb

				# Install Apex

				RUN git clone https://github.com/NVIDIA/apex.git && \

				    cd apex && \

				    pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

				# Profiling tools

				RUN aria2c --always-resume=true --max-tries=99999 https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_3/nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \

				    apt-get update && apt-get install -y libxcb-cursor0 && \

				    dpkg -i ./nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \

				    rm -rf /usr/local/cuda/bin/nsys && \

				    ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys  /usr/local/cuda/bin/nsys && \

				    rm -rf /usr/local/cuda/bin/nsys-ui && \

				    ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \

				    rm nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb

				# Fix opencv

				RUN pip install --resume-retries 999 --no-cache-dir opencv-python

				RUN pip install --resume-retries 999 --no-cache-dir opencv-fixer && \

				    python -c "from opencv_fixer import AutoFix; AutoFix()"

				RUN pip install --resume-retries 999 --no-cache-dir cuda-bindings

				# Reset pip config

				RUN pip config unset global.index-url && \

				    pip config unset global.extra-index-url

				RUN apt-get update && \

				    apt-get install -y libfreeimage3 libfreeimage-dev zlib1g htop

									
										31

docker/verl0.4-cu124-torch2.6-fa2.7.4/README.md
									
										Normal file
									
												View File
												
				@ -0,0 +1,31 @@

				# verl image with verl v0.4.x

				## Important packages version

				```txt

				cuda==12.4

				cudnn==9.8.0

				torch==2.6.0

				flash_attn=2.7.4

				sglang==0.4.6.post5

				vllm==0.8.5.post1

				vidia-cudnn-cu12==9.8.0.87

				transformer_engine==2.3

				megatron.core==core_v0.12.2

				# Preview

				transformer_engine==2.5

				megatron.core==core_r0.13.0

				```

				## Target

				- Base image: 

				    - `verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4`

				- App image:

				    - `verlai/verl:app-verl0.4-sglang0.4.6.post5-vllm0.8.5-mcore0.12.2-te2.2`: SGLang requires vLLM in 0.4.6.post5 version, vLLM can have some package conflicts with SGLang

				    - `verlai/verl:app-verl0.4-sglang0.4.6.post5-vllm0.8.5-mcore0.12.2-te2.2-deepep`: Built with deepep

				    - `verlai/verl:app-verl0.4-vllm0.8.5-mcore0.12.2-te2.2`

				    - `verlai/verl:app-verl0.4-vllm0.8.5-mcore0.12.2-te2.2-deepep`: Built with deepep

				- Preview image:

				    - `verlai/verl:app-verl0.4-sglang0.4.6.post5-vllm0.8.5-mcore0.13.0-te2.2-preview`

				    - `verlai/verl:app-verl0.4-vllm0.8.5-mcore0.13.0-te2.2-preview`

									
										37

docker/verl0.5-cu126-torch2.7-fa2.7.4/Dockerfile.app.sglang0.4.10.post2.mcore0.13
									
										Normal file
									
												View File
												
				@ -0,0 +1,37 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4

				# Define environments

				ENV MAX_JOBS=8

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install sglang-0.4.10

				# Install FlashInfer Python package

				RUN pip install --upgrade pip setuptools packaging

				RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation flashinfer-python==0.2.9rc1

				RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation "sglang[all]==0.4.10.post2"

				# Fix packages

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]==4.55.4" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.13.0

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge

									
										37

docker/verl0.5-cu126-torch2.7-fa2.7.4/Dockerfile.app.sglang0.4.9.post6.mcore0.13
									
										Normal file
									
												View File
												
				@ -0,0 +1,37 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4

				# Define environments

				ENV MAX_JOBS=8

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install sglang-0.4.10

				# Install FlashInfer Python package

				RUN pip install --upgrade pip setuptools packaging

				RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation flashinfer-python==0.2.9rc1

				RUN pip install --resume-retries 999  --no-cache-dir --no-build-isolation "sglang[all]==0.4.9.post6"

				# Fix packages

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]==4.55.4" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.13.0

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge

									
										38

docker/verl0.5-cu126-torch2.7-fa2.7.4/Dockerfile.app.vllm.mcore0.13
									
										Normal file
									
												View File
												
				@ -0,0 +1,38 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4

				# Define environments

				ENV MAX_JOBS=32

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install torch-2.7.1+cu126 + vllm-0.10.0

				RUN pip install --resume-retries 999 --no-cache-dir vllm==0.10.0

				# Fix packages

				# transformers 4.54.0 still not support

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.55.4" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.13.0

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge

				# Fix qwen vl

				RUN pip3 install --no-cache-dir --no-deps trl

									
										39

docker/verl0.5-cu126-torch2.7-fa2.7.4/Dockerfile.app.vllm.mcore0.15
									
										Normal file
									
												View File
												
				@ -0,0 +1,39 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM iseekyan/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4-h100

				# Define environments

				ENV MAX_JOBS=32

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install torch-2.7.1+cu126 + vllm-0.10.0

				RUN pip install --resume-retries 999 --no-cache-dir vllm==0.10.0

				# Fix packages

				# transformers 4.54.0 still not support

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.55.4" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.7

				RUN pip install onnxscript

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.15.0rc4

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge==v0.15.0

				# Fix qwen vl

				RUN pip3 install --no-cache-dir --no-deps trl

									
										133

docker/verl0.5-cu126-torch2.7-fa2.7.4/Dockerfile.base.torch2.7.1
									
										Normal file
									
												View File
												
				@ -0,0 +1,133 @@

				# Base Docker Image of verl, with CUDA/Torch/FlashAttn/Apex/TransformerEngine, without other frameworks

				# Target: verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.8.0-fi0.2.6

				# Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)

				# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html

				FROM nvcr.io/nvidia/pytorch:24.08-py3

				# Define environments

				ENV MAX_JOBS=16

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Define installation arguments

				ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/

				ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

				# Set apt source

				RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \

				    { \

				    echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \

				    } > /etc/apt/sources.list

				# Install systemctl

				RUN apt-get update && \

				    apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \

				    apt-get clean

				# Install tini

				RUN apt-get update && \

				    apt-get install -y tini aria2 libfreeimage3 libfreeimage-dev zlib1g htop && \

				    apt-get clean

				# Change pip source

				RUN pip config set global.index-url "${PIP_INDEX}" && \

				    pip config set global.extra-index-url "${PIP_INDEX}" && \

				    python -m pip install --upgrade pip

				# Uninstall nv-pytorch fork

				RUN pip uninstall -y torch torchvision torchaudio \

				    pytorch-quantization pytorch-triton torch-tensorrt \

				    xgboost transformer_engine flash_attn apex megatron-core grpcio

				RUN pip install --resume-retries 999 --no-cache-dir torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1

				# Install flash-attn-2.7.4.post1, although built with torch2.6, it is compatible with torch2.7

				# https://github.com/Dao-AILab/flash-attention/issues/1644#issuecomment-2899396361

				RUN ABI_FLAG=$(python -c "import torch; print('TRUE' if torch._C._GLIBCXX_USE_CXX11_ABI else 'FALSE')") && \

				    URL="https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abi${ABI_FLAG}-cp310-cp310-linux_x86_64.whl" && \

				    FILE="flash_attn-2.7.4.post1+cu12torch2.6cxx11abi${ABI_FLAG}-cp310-cp310-linux_x86_64.whl" && \

				    wget -nv "${URL}" && \

				    pip install --no-cache-dir "${FILE}"

				# Fix packages

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				# Install cudnn

				RUN aria2c --max-tries=9999 https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \

				    dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \

				    cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/ && \

				    apt-get update && \

				    apt-get -y install cudnn-cuda-12 && \

				    rm cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb

				# Install Apex

				RUN pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" --resume-retries 999 git+https://github.com/NVIDIA/apex.git

				# Profiling tools

				RUN aria2c --always-resume=true --max-tries=99999 https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_3/nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \

				    apt-get update && apt-get install -y libxcb-cursor0

				RUN apt-get install -y ./nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \

				    rm -rf /usr/local/cuda/bin/nsys && \

				    ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys  /usr/local/cuda/bin/nsys && \

				    rm -rf /usr/local/cuda/bin/nsys-ui && \

				    ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \

				    rm nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb

				RUN pip install --resume-retries 999 --no-cache-dir "tensordict==0.6.2" torchdata "transformers[hf_xet]>=4.52.3" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas cuda-bindings \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				# Install DeepEP

				## the dependency of IBGDA

				RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so

				## Clone and build deepep and deepep-nvshmem

				RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \

				    git clone https://github.com/deepseek-ai/DeepEP.git  && \

				    cd DeepEP && git checkout a84a248

				# Prepare nvshmem

				RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \

				    tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \

				    cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch

				ENV CUDA_HOME=/usr/local/cuda

				### Set MPI environment variables. Having errors when not set.

				ENV CPATH=/usr/local/mpi/include:$CPATH

				ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH

				ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH

				ENV GDRCOPY_HOME=/workspace/gdrcopy

				## Build deepep-nvshmem

				RUN cd deepep-nvshmem && \

				    NVSHMEM_SHMEM_SUPPORT=0 \

				    NVSHMEM_UCX_SUPPORT=0 \

				    NVSHMEM_USE_NCCL=0 \

				    NVSHMEM_MPI_SUPPORT=0 \

				    NVSHMEM_IBGDA_SUPPORT=1 \

				    NVSHMEM_PMIX_SUPPORT=0 \

				    NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \

				    NVSHMEM_USE_GDRCOPY=1 \

				    cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install

				ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install

				ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH

				ENV PATH=$NVSHMEM_DIR/bin:$PATH

				## Build deepep

				RUN cd DeepEP && \

				    python setup.py install

				# Reset pip config

				RUN pip config unset global.index-url && \

				    pip config unset global.extra-index-url

									
										27

docker/verl0.5-cu126-torch2.7-fa2.7.4/README.md
									
										Normal file
									
												View File
												
				@ -0,0 +1,27 @@

				# verl image with verl v0.5

				## Important packages version

				```txt

				cuda==12.6

				cudnn==9.8.0

				torch==2.7.1

				flash_attn=2.7.4.post1

				sglang==0.4.9.post6

				vllm==0.8.5.post1

				vidia-cudnn-cu12==9.8.0.87

				transformer_engine==2.3

				megatron.core==core_v0.12.2

				# Preview

				transformer_engine==2.5

				megatron.core==core_r0.13.0

				```

				## Target

				- Base image:

				  - `verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4`: We offer a base image with deep ep built in, for vllm/sglang

				- App image:

				  - `verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2`

				  - `verlai/verl:app-verl0.5-transformers4.55.4-sglang0.4.10.post2-mcore0.13.0-te2.2`

				  - `iseekyan/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.15.0-te2.7`

									
										37

docker/verl0.5-cu126-torch2.7.1-fa2.8.0/Dockerfile.app.sglang.mcore0.12
									
										Normal file
									
												View File
												
				@ -0,0 +1,37 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.8.0

				# Define environments

				ENV MAX_JOBS=8

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install sglang-0.4.8 and torch-memory-saver

				# Install FlashInfer Python package

				RUN pip install --upgrade pip setuptools packaging

				RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation flashinfer-python==0.2.6.post1

				RUN pip install --resume-retries 999  --no-cache-dir "sglang[all]==0.4.8" && pip install torch-memory-saver --no-cache-dir

				# Fix packages

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.3

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge

									
										37

docker/verl0.5-cu126-torch2.7.1-fa2.8.0/Dockerfile.app.sglang.mcore0.13.preview
									
										Normal file
									
												View File
												
				@ -0,0 +1,37 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.8.0

				# Define environments

				ENV MAX_JOBS=8

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install sglang-0.4.8 and torch-memory-saver

				# Install FlashInfer Python package

				RUN pip install --upgrade pip setuptools packaging

				RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation flashinfer-python==0.2.6.post1

				RUN pip install --resume-retries 999  --no-cache-dir "sglang[all]==0.4.8" && pip install torch-memory-saver --no-cache-dir

				# Fix packages

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.5

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge

									
										132

docker/verl0.5-cu126-torch2.7.1-fa2.8.0/Dockerfile.base
									
										Normal file
									
												View File
												
				@ -0,0 +1,132 @@

				# Base Docker Image of verl, with CUDA/Torch/FlashAttn/Apex/TransformerEngine, without other frameworks

				# Target: verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.8.0-fi0.2.6

				# Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)

				# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html

				FROM nvcr.io/nvidia/pytorch:24.08-py3

				# Define environments

				ENV MAX_JOBS=16

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Define installation arguments

				ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/

				ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

				# Set apt source

				RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \

				    { \

				    echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \

				    } > /etc/apt/sources.list

				# Install systemctl

				RUN apt-get update && \

				    apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \

				    apt-get clean

				# Install tini

				RUN apt-get update && \

				    apt-get install -y tini aria2 libfreeimage3 libfreeimage-dev zlib1g htop && \

				    apt-get clean

				# Change pip source

				RUN pip config set global.index-url "${PIP_INDEX}" && \

				    pip config set global.extra-index-url "${PIP_INDEX}" && \

				    python -m pip install --upgrade pip

				# Uninstall nv-pytorch fork

				RUN pip uninstall -y torch torchvision torchaudio \

				    pytorch-quantization pytorch-triton torch-tensorrt \

				    xgboost transformer_engine flash_attn apex megatron-core grpcio

				RUN pip install --resume-retries 999 --no-cache-dir torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1

				# Install flash-attn-2.8.0.post2 (cxx11abi=True)

				RUN ABI_FLAG=$(python -c "import torch; print('TRUE' if torch._C._GLIBCXX_USE_CXX11_ABI else 'FALSE')") && \

				    URL="https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2+cu12torch2.7cxx11abi${ABI_FLAG}-cp310-cp310-linux_x86_64.whl" && \

				    FILE="flash_attn-2.8.0.post2+cu12torch2.7cxx11abi${ABI_FLAG}-cp310-cp310-linux_x86_64.whl" && \

				    wget -nv "${URL}" && \

				    pip install --no-cache-dir "${FILE}"

				# Fix packages

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				# Install cudnn

				RUN aria2c --max-tries=9999 https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \

				    dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \

				    cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/ && \

				    apt-get update && \

				    apt-get -y install cudnn-cuda-12 && \

				    rm cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb

				# Install Apex

				RUN pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" --resume-retries 999 git+https://github.com/NVIDIA/apex.git

				# Profiling tools

				RUN aria2c --always-resume=true --max-tries=99999 https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_3/nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \

				    apt-get update && apt-get install -y libxcb-cursor0

				RUN apt-get install -y ./nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \

				    rm -rf /usr/local/cuda/bin/nsys && \

				    ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys  /usr/local/cuda/bin/nsys && \

				    rm -rf /usr/local/cuda/bin/nsys-ui && \

				    ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \

				    rm nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb

				RUN pip install --resume-retries 999 --no-cache-dir "tensordict==0.6.2" torchdata "transformers[hf_xet]>=4.53" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas cuda-bindings \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pyext pre-commit ruff

				# Install DeepEP

				## the dependency of IBGDA

				RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so

				## Clone and build deepep and deepep-nvshmem

				RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \

				    git clone https://github.com/deepseek-ai/DeepEP.git  && \

				    cd DeepEP && git checkout a84a248

				# Prepare nvshmem

				RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \

				    tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \

				    cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch

				ENV CUDA_HOME=/usr/local/cuda

				### Set MPI environment variables. Having errors when not set.

				ENV CPATH=/usr/local/mpi/include:$CPATH

				ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH

				ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH

				ENV GDRCOPY_HOME=/workspace/gdrcopy

				## Build deepep-nvshmem

				RUN cd deepep-nvshmem && \

				    NVSHMEM_SHMEM_SUPPORT=0 \

				    NVSHMEM_UCX_SUPPORT=0 \

				    NVSHMEM_USE_NCCL=0 \

				    NVSHMEM_MPI_SUPPORT=0 \

				    NVSHMEM_IBGDA_SUPPORT=1 \

				    NVSHMEM_PMIX_SUPPORT=0 \

				    NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \

				    NVSHMEM_USE_GDRCOPY=1 \

				    cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install

				ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install

				ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH

				ENV PATH=$NVSHMEM_DIR/bin:$PATH

				## Build deepep

				RUN cd DeepEP && \

				    python setup.py install

				# Reset pip config

				RUN pip config unset global.index-url && \

				    pip config unset global.extra-index-url

									
										27

docker/verl0.5-cu126-torch2.7.1-fa2.8.0/README.md
									
										Normal file
									
												View File
												
				@ -0,0 +1,27 @@

				# verl image with verl v0.5

				## Important packages version

				```txt

				cuda==12.6

				cudnn==9.8.0

				torch==2.7.1

				flash_attn=2.8.0    ##

				sglang==0.4.8

				vllm==0.8.5.post1

				vidia-cudnn-cu12==9.8.0.87

				transformer_engine==2.3

				megatron.core==core_v0.12.2

				# Preview

				transformer_engine==2.5

				megatron.core==core_r0.13.0

				```

				## Target

				- Base image:

				    - `verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.8.0`: We offer a base image with deep ep built in

				- App image:

				    - `verlai/verl:app-verl0.5-sglang0.4.9-mcore0.12.2`

				    - `verlai/verl:app-verl0.5-sglang0.4.9-mcore0.13.0-preview`

				- vllm temporarily not support latest version

									
										36

docker/verl0.5-preview-cu128-torch2.7.1-fa2.8.0/Dockerfile.app.sglang.megatron
									
										Normal file
									
												View File
												
				@ -0,0 +1,36 @@

				# Start from the verl base image

				# Dockerfile.base

				FROM verlai/verl:base-verl0.5-preview-cu128-cudnn9.8-torch2.7.1-fa2.8.0-fi0.2.6

				# Define environments

				ENV MAX_JOBS=8

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Install sglang-0.4.8 and torch-memory-saver

				# Install FlashInfer Python package

				RUN pip install --resume-retries 999 --no-cache-dir --no-build-isolation flashinfer-python==0.2.6.post1

				RUN pip install --resume-retries 999  --no-cache-dir "sglang[all]==0.4.8" && pip install torch-memory-saver --no-cache-dir

				# Fix packages

				RUN pip install --no-cache-dir "tensordict==0.6.2" "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pre-commit ruff

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --resume-retries 999 --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				RUN pip install --resume-retries 999 --no-cache-dir nvidia-cudnn-cu12==9.8.0.87

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.5

				# Install Megatron-LM

				RUN pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/Megatron-LM.git@core_r0.13.0

				# Install mbridge

				RUN pip3 install --no-cache-dir mbridge

									
										91

docker/verl0.5-preview-cu128-torch2.7.1-fa2.8.0/Dockerfile.base
									
										Normal file
									
												View File
												
				@ -0,0 +1,91 @@

				# Base Docker Image of verl, with CUDA/Torch/FlashAttn/Apex/TransformerEngine, without other frameworks

				# Target: verlai/verl:base-verl0.5-preview-cu128-cudnn9.8-torch2.7.1-fa2.8.0-fi0.2.6

				# Start from the NVIDIA official image (ubuntu-22.04 + cuda-12.6 + python-3.10)

				# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html

				FROM nvcr.io/nvidia/pytorch:25.02-py3

				# Define environments

				ENV MAX_JOBS=16

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				# Define installation arguments

				ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/

				ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

				# Set apt source

				RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \

				    { \

				    echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \

				    echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \

				    } > /etc/apt/sources.list

				# Install systemctl

				RUN apt-get update && \

				    apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \

				    apt-get clean

				# Install tini

				RUN apt-get update && \

				    apt-get install -y tini aria2 libfreeimage3 libfreeimage-dev zlib1g htop && \

				    apt-get clean

				# Change pip source

				RUN pip config set global.index-url "${PIP_INDEX}" && \

				    pip config set global.extra-index-url "${PIP_INDEX}" && \

				    python -m pip install --upgrade pip

				# Uninstall nv-pytorch fork

				RUN pip uninstall -y torch torchvision torchaudio \

				    pytorch-quantization pytorch-triton torch-tensorrt \

				    xgboost transformer_engine flash_attn apex megatron-core grpcio

				RUN pip install --resume-retries 999 --no-cache-dir torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128

				# Install flash-attn-2.8.0.post2 (cxx11abi=True)

				RUN ABI_FLAG=$(python -c "import torch; print('TRUE' if torch._C._GLIBCXX_USE_CXX11_ABI else 'FALSE')") && \

				    URL="https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2+cu12torch2.7cxx11abi${ABI_FLAG}-cp312-cp312-linux_x86_64.whl" && \

				    FILE="flash_attn-2.8.0.post2+cu12torch2.7cxx11abi${ABI_FLAG}-cp312-cp312-linux_x86_64.whl" && \

				    wget -nv "${URL}" && \

				    pip install --no-cache-dir "${FILE}"

				# Fix packages

				RUN pip uninstall -y pynvml nvidia-ml-py && \

				    pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"

				# Install cudnn

				RUN aria2c --max-tries=9999 https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \

				    dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb && \

				    cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/ && \

				    apt-get update && \

				    apt-get -y install cudnn-cuda-12 && \

				    rm cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb

				# Install Apex

				RUN pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" --resume-retries 999 git+https://github.com/NVIDIA/apex.git

				# Profiling tools

				RUN aria2c --always-resume=true --max-tries=99999 https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_3/nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \

				    apt-get update && apt-get install -y libxcb-cursor0

				RUN apt-get install -y ./nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb && \

				    rm -rf /usr/local/cuda/bin/nsys && \

				    ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys  /usr/local/cuda/bin/nsys && \

				    rm -rf /usr/local/cuda/bin/nsys-ui && \

				    ln -s /opt/nvidia/nsight-systems/2025.3.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \

				    rm nsight-systems-2025.3.1_2025.3.1.90-1_amd64.deb

				RUN pip install --resume-retries 999 --no-cache-dir "tensordict==0.6.2" torchdata "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas cuda-bindings \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pre-commit ruff

				# Reset pip config

				RUN pip config unset global.index-url && \

				    pip config unset global.extra-index-url

									
										26

docker/verl0.5-preview-cu128-torch2.7.1-fa2.8.0/README.md
									
										Normal file
									
												View File
												
				@ -0,0 +1,26 @@

				# verl image with verl v0.5

				## Important packages version

				```txt

				cuda==12.8

				cudnn==9.8.0

				torch==2.7.1

				flash_attn=2.8.0    ##

				sglang==0.4.8

				transformer_engine==2.5

				megatron.core==core_r0.13.0

				vidia-cudnn-cu12==9.8.0.87

				```

				## Target

				- Base image:

				    - `verlai/verl:base-verl0.5-preview-cu128-cudnn9.8-torch2.7.1-fa2.8.0`: We offer a base image with flash infer 0.2.6.post1 built in

				- App image:

				    - `verlai/verl:app-verl0.5-preview-sglang0.4.8-mcore0.13.0-preview`

				- vllm temporarily not support latest version

				## !!!Notice!!!

				- pyext is lack of maintainace and cannot work with python 3.12, consider using replacement and deprecating this package.

									
										4

docker/verl0.6-cu128-torch2.8.0-fa2.7.4/Dockerfile.app.sglang
									
										Normal file
									
												View File
												
				@ -0,0 +1,4 @@

				FROM verlai/verl:base-verl0.6-cu128-cudnn9.8-torch2.8.0-fa2.7.4

				RUN pip install --no-cache-dir "sglang[all]==0.5.2"

				RUN pip install --no-cache-dir "torch-memory-saver==0.0.9rc1"

									
										108

docker/verl0.6-cu128-torch2.8.0-fa2.7.4/Dockerfile.base
									
										Normal file
									
												View File
												
				@ -0,0 +1,108 @@

				# Start from the NVIDIA official image (ubuntu-24.04 + cuda-12.8 + python-3.12)

				# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-03.html

				FROM nvcr.io/nvidia/pytorch:25.03-py3

				# Define environments

				ENV MAX_JOBS=32

				ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

				ENV DEBIAN_FRONTEND=noninteractive

				ENV NODE_OPTIONS=""

				ENV PIP_ROOT_USER_ACTION=ignore

				ENV HF_HUB_ENABLE_HF_TRANSFER="1"

				ENV PIP_CONSTRAINT=""

				ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

				# Change pip source

				RUN pip config set global.index-url "${PIP_INDEX}" && \

				    pip config set global.extra-index-url "${PIP_INDEX}" && \

				    pip config set global.no-cache-dir "true" && \

				    python -m pip install --upgrade pip

				# Install systemctl

				RUN apt-get update && \

				    apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \

				    apt-get clean

				# Install libxml2

				RUN apt-get update && \

				    apt-get install -y libxml2 aria2 && \

				    apt-get clean

				# Uninstall nv-pytorch fork

				RUN pip uninstall -y torch torchvision torchaudio \

				    pytorch-quantization pytorch-triton torch-tensorrt \

				    transformer_engine flash_attn apex megatron-core \

				    xgboost opencv grpcio

				# Fix packages

				RUN pip install --no-cache-dir tensordict torchdata "transformers[hf_xet]==4.55.4" accelerate datasets peft hf-transfer \

				    "numpy<2.0.0" "pyarrow>=19.0.1" pandas \

				    ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler blobfile xgrammar \

				    pytest py-spy pre-commit ruff

				# Fix cv2

				RUN rm -rf /usr/local/lib/python3.11/dist-packages/cv2

				# Install torch

				RUN pip install --no-cache-dir torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128

				# Install flash-attn

				RUN pip install --no-cache-dir --no-build-isolation flash_attn==2.7.4.post1

				# Install DeepEP

				# the dependency of IBGDA

				RUN ln -s /usr/lib/x86_64-linux-gnu/libmlx5.so.1 /usr/lib/x86_64-linux-gnu/libmlx5.so

				# Clone and build deepep and deepep-nvshmem

				RUN git clone -b v2.3.1 https://github.com/NVIDIA/gdrcopy.git && \

				    git clone https://github.com/deepseek-ai/DeepEP.git  && \

				    cd DeepEP && git checkout a84a248

				# Prepare nvshmem

				RUN wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz && \

				    tar -xvf nvshmem_src_3.2.5-1.txz && mv nvshmem_src deepep-nvshmem && \

				    cd deepep-nvshmem && git apply ../DeepEP/third-party/nvshmem.patch

				## Build deepep-nvshmem

				RUN apt-get install -y ninja-build cmake

				ENV CUDA_HOME=/usr/local/cuda

				### Set MPI environment variables. Having errors when not set.

				ENV CPATH=/usr/local/mpi/include:$CPATH

				ENV LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH

				ENV LD_LIBRARY_PATH=/usr/local/x86_64-linux-gnu:$LD_LIBRARY_PATH

				ENV GDRCOPY_HOME=/workspace/gdrcopy

				ENV GDRCOPY_INCLUDE=/workspace/gdrcopy/include

				RUN cd deepep-nvshmem && \

				    NVSHMEM_SHMEM_SUPPORT=0 \

				    NVSHMEM_UCX_SUPPORT=0 \

				    NVSHMEM_USE_NCCL=0 \

				    NVSHMEM_MPI_SUPPORT=0 \

				    NVSHMEM_IBGDA_SUPPORT=1 \

				    NVSHMEM_PMIX_SUPPORT=0 \

				    NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \

				    NVSHMEM_USE_GDRCOPY=1 \

				    cmake -G Ninja -S . -B build/ -DCMAKE_INSTALL_PREFIX=/workspace/deepep-nvshmem/install && cmake --build build/ --target install

				ENV NVSHMEM_DIR=/workspace/deepep-nvshmem/install

				ENV LD_LIBRARY_PATH=$NVSHMEM_DIR/lib:$LD_LIBRARY_PATH

				ENV PATH=$NVSHMEM_DIR/bin:$PATH

				## Build deepep

				RUN cd DeepEP && \

				    python setup.py install

				# Install Apex

				RUN pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git

				# Install TransformerEngine

				RUN export NVTE_FRAMEWORK=pytorch && pip3 install --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.2.1

				# Install Megatron-LM

				RUN git clone -b core_v0.13.0 https://github.com/NVIDIA/Megatron-LM.git && \

				    cd Megatron-LM && pip3 install --no-deps -e .

				# Install mbridge

				RUN pip3 install --no-cache-dir git+https://github.com/ISEEKYAN/mbridge.git

									
										15

docker/verl0.6-cu128-torch2.8.0-fa2.7.4/Dockerfile.vllm011.mcore_gpt-oss
									
										Normal file
									
												View File
												
				@ -0,0 +1,15 @@

				FROM nvcr.io/nvidia/nemo:25.07.gpt_oss

				RUN git clone -b v0.11.0 --depth 1 https://github.com/vllm-project/vllm.git /opt/vllm

				RUN pip install setuptools_scm

				RUN cd /opt/vllm && pip install --no-deps --no-build-isolation --no-cache-dir -e .

				RUN pip install cbor2 setproctitle blake3 openai_harmony pybase64 msgspec partial_json_parser py-cpuinfo diskcache gguf

				RUN pip install --upgrade transformers tokenizers

				RUN pip install codetiming tensordict mathruler pylatexenc

				RUN pip3 install --no-cache-dir mbridge

									
										9

docs/README.md
									
												View File
												
				@ -1,9 +1,12 @@

				# verl documents

				# verl documentations

				## Build the docs

				```bash

				# Install dependencies.

				# If you want to view auto-generated API docstring, please make sure verl is available in python path. For instance, install verl via:

				# pip install .. -e[test]

				# Install dependencies needed for building docs.

				pip install -r requirements-docs.txt

				# Build the docs.

				@ -16,4 +19,4 @@ make html

				```bash

				python -m http.server -d _build/html/

				```

				Launch your browser and navigate to http://localhost:8000 to view the documentation.

				Launch your browser and navigate to http://localhost:8000 to view the documentation. Alternatively you could drag the file `_build/html/index.html` to your local browser and view directly.

									
										10

docs/README_vllm0.7.md
									
												View File
												
				@ -1,8 +1,10 @@

				# Upgrading to vllm >= 0.7

				Note: verl+vllm 0.8.3 is now stable. Please see ``docs/README_vllm0.8.md`` for upgrade guide.

				## Installation

				Note: This version of veRL+vllm 0.7+ supports **FSDP** for training and **vLLM** for rollout.

				Note: At time of writing, verl+vllm 0.7.x supports **FSDP** for training and **vLLM** for rollout.

				```

				# Create the conda environment

				@ -47,11 +49,11 @@ After installation, examples using FSDP as training backends can be used. By def

				```

				actor_rollout_ref.rollout.enforce_eager=False \

				actor_rollout_ref.rollout.free_cache_engine=False \

				actor_rollout_ref.rollout.free_cache_engine=True \

				```

				For a typical job like examples/ppo_trainer/run_qwen2-7b_seq_balance.sh, the rollout generation time is 115 seconds with vLLM0.6.3, while it is 85 seconds with vLLM0.7.0. By enabling the cudagraph, the generation duration is further reduced to 62 seconds.

				For a typical job like examples/ppo_trainer/run_qwen2-7b_seq_balance.sh, the rollout generation time is 85 seconds with vLLM0.7.0. By enabling the cudagraph, the generation duration is further reduced to 62 seconds.

				**Note:** Currently, if the `n` is greater than 1 in `SamplingParams` in vLLM>=0.7, there is a potential performance issue on the stability of rollout generation time (Some iterations would see generation time bursts) using vLLM's V0 Engine.

				@ -68,4 +70,4 @@ VLLM_USE_PRECOMPILED=1 pip install --editable .

				```

				Then you can enable the V1 engine by setting `export VLLM_USE_V1=1`. In some benchmark tests, the V1 engine demonstrates a 1.5x speed improvement over the vLLM V0 engine.

				The stable support of the vLLM V1 engine will come soon.

				The stable support of the vLLM V1 engine is available on verl main.

									
										18

docs/README_vllm0.8.md
									
												View File
												
				@ -1,8 +1,10 @@

				# Upgrading to vLLM >= 0.8

				Last updated: 05/04/2025.

				## Installation

				Note: This version of veRL+vLLM 0.8+ supports **FSDP** for training and **vLLM** for rollout.

				Note: This version of verl+vLLM 0.8+ supports **FSDP** for training and **vLLM** for rollout.

				```bash

				# Create the conda environment

				@ -15,34 +17,30 @@ cd verl

				pip3 install -e .

				# Install the latest stable version of vLLM

				pip3 install vllm==0.8.2

				pip3 install vllm==0.8.3

				# Install flash-attn

				pip3 install flash-attn --no-build-isolation

				```

				We have a pre-built docker image for veRL+vLLM 0.8.2. You can direct import it with the following command:

				We have a pre-built docker image for verl+vLLM 0.8.3. You can direct import it with the following command:

				```bash

				docker pull hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.2

				docker pull hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.3-flashinfer0.2.2-cxx11abi0

				```

				## Features

				vLLM 0.8+ supports cuda graph and V1 engine by default in veRL. To enable these features, remember to add the following lines to the bash script:

				vLLM 0.8+ supports cuda graph and V1 engine by default in verl. To enable these features, remember to add the following lines to the bash script:

				```bash

				actor_rollout_ref.rollout.enforce_eager=False \

				actor_rollout_ref.rollout.free_cache_engine=False \

				actor_rollout_ref.rollout.free_cache_engine=True \

				```

				and also **remove** the environment variable if it exists:

				```bash

				export VLLM_ATTENTION_BACKEND=XFORMERS

				```

				## Notes

				When you just directly upgrade vllm>=0.8, some dependency packages may undergo version changes. If you encounter the following problems:

									
										217

docs/_static/custom.css
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,217 @@

				/* Make the documentation use full screen width */

				.wy-nav-content {

				    max-width: none !important;

				    width: 100% !important;

				    padding: 1.618em 3.236em !important;

				}

				/* Adjust the content wrapper - will be set by JavaScript */

				.wy-nav-content-wrap {

				    margin-left: 300px;

				    transition: margin-left 0.2s ease;

				    width: auto !important;

				    position: relative !important;

				    background: white !important;

				    min-height: 100vh !important;

				}

				/* Make the main content area responsive */

				.rst-content {

				    max-width: none !important;

				    width: 100% !important;

				}

				/* Optional: Adjust table widths to prevent overflow */

				.rst-content table.docutils {

				    width: 100% !important;

				    table-layout: auto !important;

				}

				/* Optional: Better code block width handling */

				.rst-content .highlight {

				    width: 100% !important;

				}

				/* Content area positioning already handled above */

				/* Optional: Improve readability with some margin on very wide screens */

				@media (min-width: 1400px) {

				    .wy-nav-content {

				        max-width: none !important;

				        margin: 0 auto !important;

				    }

				}

				/* Resizable sidebar styles */

				.wy-nav-side {

				    position: fixed !important;

				    top: 0 !important;

				    bottom: 0 !important;

				    left: 0 !important;

				    width: 300px;

				    min-width: 200px;

				    max-width: 600px;

				    display: flex;

				    flex-direction: column;

				    z-index: 200 !important;

				}

				/* Ensure sidebar header (logo, search) adapts to width */

				.wy-side-nav-search {

				    width: 100% !important;

				    box-sizing: border-box !important;

				    padding: 0.809em 0.809em !important;

				}

				.wy-side-nav-search input[type="text"] {

				    width: 100% !important;

				    box-sizing: border-box !important;

				}

				/* Make logo/title area responsive */

				.wy-side-nav-search > div.version {

				    width: 100% !important;

				}

				.wy-side-nav-search > a {

				    width: 100% !important;

				    display: block !important;

				    white-space: nowrap !important;

				    overflow: hidden !important;

				    text-overflow: ellipsis !important;

				}

				/* Responsive adjustments for narrow sidebar */

				@media (max-width: 300px) {

				    .wy-side-nav-search > a {

				        font-size: 0.9em !important;

				    }

				    .wy-side-nav-search input[type="text"] {

				        font-size: 0.8em !important;

				    }

				}

				/* Ensure search input doesn't overflow */

				.wy-side-nav-search form {

				    width: 100% !important;

				    margin: 0 !important;

				}

				/* Make search icon responsive */

				.wy-side-nav-search .wy-dropdown {

				    width: 100% !important;

				}

				/* Adjust search results dropdown width */

				.wy-side-nav-search .wy-dropdown-menu {

				    width: 100% !important;

				    max-width: none !important;

				    left: 0 !important;

				    right: 0 !important;

				}

				/* Resize handle is created by JavaScript */

				/* Make sure the sidebar content doesn't overflow */

				.wy-side-scroll {

				    width: 100% !important;

				    flex: 1 !important;

				    overflow-y: auto !important;

				    overflow-x: hidden !important;

				    padding-right: 10px !important;

				    box-sizing: border-box !important;

				    scroll-behavior: auto !important; /* Prevent smooth scrolling on sidebar itself */

				}

				/* Ensure proper scroll behavior for main content area */

				html {

				    scroll-behavior: smooth !important;

				}

				/* Ensure anchor links work properly in main content */

				.wy-nav-content-wrap {

				    scroll-behavior: smooth !important;

				}

				/* Fix scroll to target for anchor links */

				.rst-content {

				    scroll-behavior: smooth !important;

				}

				/* Fix anchor scroll offset to account for fixed header */

				.rst-content .section {

				    scroll-margin-top: 60px;

				}

				/* Fix anchor scroll offset for headers */

				.rst-content h1, .rst-content h2, .rst-content h3, .rst-content h4, .rst-content h5, .rst-content h6 {

				    scroll-margin-top: 60px;

				}

				/* Fix anchor scroll offset for specific scroll targets */

				.rst-content .headerlink {

				    scroll-margin-top: 60px;

				}

				/* Fix sidebar navigation styling */

				.wy-menu-vertical {

				    width: 100% !important;

				}

				.wy-menu-vertical li {

				    width: 100% !important;

				}

				.wy-menu-vertical a {

				    width: 100% !important;

				    word-wrap: break-word !important;

				    white-space: normal !important;

				}

				/* Content area margin is handled by JavaScript */

				/* Custom drag handle (more visible) */

				.resize-handle {

				    position: absolute;

				    top: 0;

				    right: 0;

				    width: 8px;

				    height: 100%;

				    background: #ccc;

				    cursor: col-resize;

				    z-index: 1001;

				    opacity: 0.3;

				    transition: opacity 0.2s ease;

				}

				.resize-handle:hover {

				    opacity: 0.8;

				    background: #999;

				}

				.resize-handle::before {

				    content: '';

				    position: absolute;

				    top: 50%;

				    left: 50%;

				    width: 2px;

				    height: 20px;

				    background: #666;

				    transform: translate(-50%, -50%);

				    border-radius: 1px;

				}

				.resize-handle:hover::before {

				    background: #333;

				}

				/* Ensure smooth resizing */

				.wy-nav-side.resizing {

				    user-select: none;

				    pointer-events: none;

				}

				.wy-nav-side.resizing .wy-side-scroll {

				    overflow: hidden;

				}

									
										251

docs/_static/js/resizable-sidebar.js
									
										vendored
									
										Normal file
									
												View File
												
				@ -0,0 +1,251 @@

				// Resizable sidebar functionality

				document.addEventListener('DOMContentLoaded', function() {

				    const sidebar = document.querySelector('.wy-nav-side');

				    const content = document.querySelector('.wy-nav-content-wrap');

				    if (!sidebar || !content) return;

				    // Create resize handle

				    const resizeHandle = document.createElement('div');

				    resizeHandle.className = 'resize-handle';

				    sidebar.appendChild(resizeHandle);

				    let isResizing = false;

				    let startX = 0;

				    let startWidth = 0;

				    // Get initial width

				    const getInitialWidth = () => {

				        return 300; // Default width

				    };

				    // Save width to localStorage

				    const saveWidth = (width) => {

				        localStorage.setItem('sidebar-width', width);

				    };

				    // Load width from localStorage

				    const loadWidth = () => {

				        const savedWidth = localStorage.getItem('sidebar-width');

				        if (savedWidth) {

				            const width = parseInt(savedWidth, 10);

				            if (width >= 200 && width <= 600) {

				                return width;

				            }

				        }

				        return getInitialWidth();

				    };

				    // Apply width to sidebar and content

				    const applyWidth = (width) => {

				        // Update sidebar width

				        sidebar.style.width = width + 'px';

				        // Update content margin with !important to override any CSS

				        content.style.setProperty('margin-left', width + 'px', 'important');

				        // Also update any other content wrapper that might exist

				        const contentInner = document.querySelector('.wy-nav-content');

				        if (contentInner) {

				            contentInner.style.setProperty('margin-left', '0px', 'important');

				        }

				        // Force reflow and repaint

				        sidebar.offsetHeight;

				        content.offsetHeight;

				        // Trigger window resize event to notify other components

				        window.dispatchEvent(new Event('resize'));

				    };

				    // Initialize with saved width

				    const initialWidth = loadWidth();

				    applyWidth(initialWidth);

				    // Mouse down on resize handle

				    resizeHandle.addEventListener('mousedown', (e) => {

				        isResizing = true;

				        startX = e.clientX;

				        startWidth = parseInt(window.getComputedStyle(sidebar).width, 10);

				        sidebar.classList.add('resizing');

				        document.body.style.cursor = 'col-resize';

				        document.body.style.userSelect = 'none';

				        // Add overlay to prevent iframe issues

				        const overlay = document.createElement('div');

				        overlay.style.cssText = `

				            position: fixed;

				            top: 0;

				            left: 0;

				            width: 100%;

				            height: 100%;

				            z-index: 9999;

				            cursor: col-resize;

				        `;

				        overlay.id = 'resize-overlay';

				        document.body.appendChild(overlay);

				        e.preventDefault();

				    });

				    // Mouse move

				    document.addEventListener('mousemove', (e) => {

				        if (!isResizing) return;

				        const width = startWidth + e.clientX - startX;

				        const clampedWidth = Math.max(200, Math.min(600, width));

				        applyWidth(clampedWidth);

				    });

				    // Mouse up

				    document.addEventListener('mouseup', () => {

				        if (!isResizing) return;

				        isResizing = false;

				        sidebar.classList.remove('resizing');

				        document.body.style.cursor = '';

				        document.body.style.userSelect = '';

				        // Remove overlay

				        const overlay = document.getElementById('resize-overlay');

				        if (overlay) {

				            overlay.remove();

				        }

				        // Save the current width

				        const currentWidth = parseInt(window.getComputedStyle(sidebar).width, 10);

				        saveWidth(currentWidth);

				    });

				    // Handle window resize - removed to prevent infinite loop

				    // The sidebar width is fixed and managed by drag functionality, no need to recalculate on window resize

				    // Double-click to reset to default width

				    resizeHandle.addEventListener('dblclick', () => {

				        const defaultWidth = 300;

				        applyWidth(defaultWidth);

				        saveWidth(defaultWidth);

				    });

				});

				// Fix navigation issues - Using MutationObserver for reliable initialization

				document.addEventListener('DOMContentLoaded', function() {

				    let navigationFixed = false;

				    function setupNavigationFix() {

				        if (navigationFixed) return;

				        // Find all links in the sidebar

				        const sidebarLinks = document.querySelectorAll('.wy-menu-vertical a');

				        // Only proceed if we have sidebar links

				        if (sidebarLinks.length === 0) return;

				        console.log('Setting up navigation fix...');

				        sidebarLinks.forEach(function(link) {

				            const href = link.getAttribute('href');

				            // Clone the link to remove all existing event listeners

				            const newLink = link.cloneNode(true);

				            // Add our own click handler

				            newLink.addEventListener('click', function(e) {

				                console.log('Link clicked:', href);

				                // If it's an anchor link within the same page

				                if (href && href.startsWith('#') && href !== '#') {

				                    e.preventDefault();

				                    e.stopPropagation();

				                    const targetId = href.substring(1);

				                    const targetElement = document.getElementById(targetId);

				                    if (targetElement) {

				                        // Calculate offset for fixed header

				                        const headerHeight = 60;

				                        const elementPosition = targetElement.getBoundingClientRect().top;

				                        const offsetPosition = elementPosition + window.pageYOffset - headerHeight;

				                        window.scrollTo({

				                            top: offsetPosition,

				                            behavior: 'smooth'

				                        });

				                        // Update URL hash

				                        if (history.pushState) {

				                            history.pushState(null, null, '#' + targetId);

				                        } else {

				                            location.hash = '#' + targetId;

				                        }

				                    }

				                }

				                // For external links, navigate normally

				                else if (href && !href.startsWith('#') && !href.startsWith('javascript:')) {

				                    console.log('Navigating to external link:', href);

				                    window.location.href = href;

				                }

				            });

				            // Replace the old link with the new one

				            link.parentNode.replaceChild(newLink, link);

				        });

				        navigationFixed = true;

				        // Handle initial page load with hash

				        if (window.location.hash) {

				            // Use requestAnimationFrame for better timing

				            requestAnimationFrame(() => {

				                const targetId = window.location.hash.substring(1);

				                const targetElement = document.getElementById(targetId);

				                if (targetElement) {

				                    const headerHeight = 60;

				                    const elementPosition = targetElement.getBoundingClientRect().top;

				                    const offsetPosition = elementPosition + window.pageYOffset - headerHeight;

				                    window.scrollTo({

				                        top: offsetPosition,

				                        behavior: 'smooth'

				                    });

				                }

				            });

				        }

				    }

				    // Try to set up navigation fix immediately

				    setupNavigationFix();

				    // If it didn't work, use MutationObserver to watch for when sidebar links are added

				    if (!navigationFixed) {

				        const observer = new MutationObserver(function(mutations) {

				            mutations.forEach(function(mutation) {

				                if (mutation.type === 'childList' && mutation.addedNodes.length > 0) {

				                    // Check if sidebar links were added

				                    const sidebarLinks = document.querySelectorAll('.wy-menu-vertical a');

				                    if (sidebarLinks.length > 0) {

				                        setupNavigationFix();

				                        if (navigationFixed) {

				                            observer.disconnect();

				                        }

				                    }

				                }

				            });

				        });

				        // Start observing the document for changes

				        observer.observe(document.body, {

				            childList: true,

				            subtree: true

				        });

				        // Fallback timeout in case MutationObserver doesn't work

				        setTimeout(function() {

				            if (!navigationFixed) {

				                setupNavigationFix();

				            }

				            observer.disconnect();

				        }, 5000);

				    }

				});

Compare commits

1147 Commits v0.3.x ... 061535208c

10 .gemini/config.yaml Normal file Unescape Escape View File

30 .github/CODEOWNERS vendored Normal file Unescape Escape View File

65 .github/ISSUE_TEMPLATE/bug-report.yml vendored Normal file Unescape Escape View File

2 .github/ISSUE_TEMPLATE/config.yml vendored Normal file Unescape Escape View File

32 .github/ISSUE_TEMPLATE/feature-request.yml vendored Normal file Unescape Escape View File

40 .github/PULL_REQUEST_TEMPLATE.md vendored Normal file Unescape Escape View File

147 .github/workflows/.deprecate/e2e_eval_aime24.yml vendored Normal file Unescape Escape View File

133 .github/workflows/.deprecate/e2e_ppo_trainer.yml vendored Normal file Unescape Escape View File

155 .github/workflows/.deprecate/e2e_ppo_trainer_megatron_sglang.yml vendored Normal file Unescape Escape View File

66 .github/workflows/.deprecate/e2e_prime.yml vendored Normal file Unescape Escape View File

119 .github/workflows/.deprecate/e2e_spin.yml vendored Normal file Unescape Escape View File

118 .github/workflows/.deprecate/e2e_sppo.yml vendored Normal file Unescape Escape View File

73 .github/workflows/README.md vendored Normal file Unescape Escape View File

58 .github/workflows/check-pr-title.yml vendored Normal file Unescape Escape View File

175 .github/workflows/checkpoint_converter.yml vendored Normal file Unescape Escape View File

64 .github/workflows/checkpoints.yml vendored Unescape Escape View File

89 .github/workflows/cpu_unit_tests.yml vendored Normal file Unescape Escape View File

61 .github/workflows/dataset.yml vendored Unescape Escape View File

100 .github/workflows/doc.yml vendored Normal file Unescape Escape View File

129 .github/workflows/e2e_ascend.yml vendored Unescape Escape View File

145 .github/workflows/e2e_dapo.yml vendored Normal file Unescape Escape View File

55 .github/workflows/e2e_digit_completion.yml vendored Unescape Escape View File

47 .github/workflows/e2e_digit_completion_fire.yml vendored Unescape Escape View File

141 .github/workflows/e2e_genrm_remote.yml vendored Normal file Unescape Escape View File

70 .github/workflows/e2e_grpo.yml vendored Unescape Escape View File

97 .github/workflows/e2e_gsm8k.yml vendored Unescape Escape View File

63 .github/workflows/e2e_gsm8k_megatron.yml vendored Unescape Escape View File

54 .github/workflows/e2e_gsm8k_prime.yml vendored Unescape Escape View File

59 .github/workflows/e2e_lora.yml vendored Unescape Escape View File

178 .github/workflows/e2e_one_step_off_policy.yml vendored Normal file Unescape Escape View File

79 .github/workflows/e2e_ppo_trainer.yml vendored Normal file Unescape Escape View File

281 .github/workflows/e2e_ppo_trainer_megatron_sglang.yml vendored Normal file Unescape Escape View File

275 .github/workflows/e2e_ppo_trainer_megatron_sglang_2.yml vendored Normal file Unescape Escape View File

292 .github/workflows/e2e_ppo_trainer_megatron_vllm.yml vendored Normal file Unescape Escape View File

420 .github/workflows/e2e_ppo_trainer_megatron_vllm_2.yml vendored Normal file Unescape Escape View File

144 .github/workflows/e2e_sft.yml vendored Unescape Escape View File

60 .github/workflows/e2e_sglang_gsm8k.yml vendored Unescape Escape View File

54 .github/workflows/e2e_vlm_geo3k.yml vendored Unescape Escape View File

113 .github/workflows/gpu_unit_tests.yml vendored Normal file Unescape Escape View File

234 .github/workflows/model.yml vendored Unescape Escape View File

40 .github/workflows/pre-commit.yml vendored Normal file Unescape Escape View File

54 .github/workflows/ray_test.yml vendored Unescape Escape View File

131 .github/workflows/reward_model.yml vendored Normal file Unescape Escape View File

54 .github/workflows/sandbox.yml vendored Unescape Escape View File

77 .github/workflows/sanity.yml vendored Unescape Escape View File

6 .github/workflows/scorecard.yml vendored Unescape Escape View File

17 .github/workflows/secrets_scan.yml vendored Unescape Escape View File

178 .github/workflows/sgl.yml vendored Normal file Unescape Escape View File

31 .github/workflows/type-coverage-check.yml vendored Normal file Unescape Escape View File

141 .github/workflows/vllm.yml vendored Unescape Escape View File

56 .github/workflows/yapf_format.yml vendored Unescape Escape View File

7 .gitignore vendored Unescape Escape View File

37 .pre-commit-config.yaml Normal file Unescape Escape View File

5 .style.yapf Unescape Escape View File

15 .vscode/settings.json vendored Normal file Unescape Escape View File

89 CONTRIBUTING.md Normal file Unescape Escape View File

227 README.md Unescape Escape View File

57 docker/Apptainerfile.rocm Normal file Unescape Escape View File

55 docker/Dockerfile.extention.awsefa Normal file Unescape Escape View File

9 docker/Dockerfile.megatron Unescape Escape View File

17 docker/Dockerfile.ngc.vllm Unescape Escape View File

43 docker/Dockerfile.ngc.vllm0.8 Unescape Escape View File

2 docker/Dockerfile.ngc.vllm0.8.sagemaker Unescape Escape View File

321 docker/Dockerfile.rocm Unescape Escape View File

141 docker/Dockerfile.rocm7 Normal file Unescape Escape View File

58 docker/Dockerfile.rocm_verl-0.3.0.post1 Normal file Unescape Escape View File

323 docker/Dockerfile.rocm_verl-0.4.1 Normal file Unescape Escape View File

55 docker/Dockerfile.sglang Normal file Unescape Escape View File

2 docker/Dockerfile.vemlp.vllm.te Unescape Escape View File

115 docker/Dockerfile.vllm.sglang.megatron.deepseek Normal file Unescape Escape View File

72 docker/README.md Normal file Unescape Escape View File

41 docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.sglang.vllm.mcore0.12 Normal file Unescape Escape View File

82 docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.sglang.vllm.mcore0.12.deepep Normal file Unescape Escape View File

82 docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.sglang.vllm.mcore0.13.preview Normal file Unescape Escape View File

47 docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.vllm.mcore0.12 Normal file Unescape Escape View File

88 docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.vllm.mcore0.12.deepep Normal file Unescape Escape View File

85 docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.vllm.mcore0.13.preview Normal file Unescape Escape View File

113 docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.base Normal file Unescape Escape View File

1147 Commits

v0.3.x ... 061535208c

10

.gemini/config.yaml Normal file

View File

30

.github/CODEOWNERS vendored Normal file

View File

65

.github/ISSUE_TEMPLATE/bug-report.yml vendored Normal file

View File

2

.github/ISSUE_TEMPLATE/config.yml vendored Normal file

View File

32

.github/ISSUE_TEMPLATE/feature-request.yml vendored Normal file

View File

40

.github/PULL_REQUEST_TEMPLATE.md vendored Normal file

View File

147

.github/workflows/.deprecate/e2e_eval_aime24.yml vendored Normal file

View File

133

.github/workflows/.deprecate/e2e_ppo_trainer.yml vendored Normal file

View File

155

.github/workflows/.deprecate/e2e_ppo_trainer_megatron_sglang.yml vendored Normal file

View File

66

.github/workflows/.deprecate/e2e_prime.yml vendored Normal file

View File

119

.github/workflows/.deprecate/e2e_spin.yml vendored Normal file

View File

118

.github/workflows/.deprecate/e2e_sppo.yml vendored Normal file

View File

73

.github/workflows/README.md vendored Normal file

View File

58

.github/workflows/check-pr-title.yml vendored Normal file

View File

175

.github/workflows/checkpoint_converter.yml vendored Normal file

View File

64

.github/workflows/checkpoints.yml vendored

View File

89

.github/workflows/cpu_unit_tests.yml vendored Normal file

View File

61

.github/workflows/dataset.yml vendored

View File

100

.github/workflows/doc.yml vendored Normal file

View File

129

.github/workflows/e2e_ascend.yml vendored

View File

145

.github/workflows/e2e_dapo.yml vendored Normal file

View File

55

.github/workflows/e2e_digit_completion.yml vendored

View File

47

.github/workflows/e2e_digit_completion_fire.yml vendored

View File

141

.github/workflows/e2e_genrm_remote.yml vendored Normal file

View File

70

.github/workflows/e2e_grpo.yml vendored

View File

97

.github/workflows/e2e_gsm8k.yml vendored

View File

63

.github/workflows/e2e_gsm8k_megatron.yml vendored

View File

54

.github/workflows/e2e_gsm8k_prime.yml vendored

View File

59

.github/workflows/e2e_lora.yml vendored

View File

178

.github/workflows/e2e_one_step_off_policy.yml vendored Normal file

View File

79

.github/workflows/e2e_ppo_trainer.yml vendored Normal file

View File

281

.github/workflows/e2e_ppo_trainer_megatron_sglang.yml vendored Normal file

View File

275

.github/workflows/e2e_ppo_trainer_megatron_sglang_2.yml vendored Normal file

View File

292

.github/workflows/e2e_ppo_trainer_megatron_vllm.yml vendored Normal file

View File

420

.github/workflows/e2e_ppo_trainer_megatron_vllm_2.yml vendored Normal file

View File

144

.github/workflows/e2e_sft.yml vendored

View File

60

.github/workflows/e2e_sglang_gsm8k.yml vendored

View File

54

.github/workflows/e2e_vlm_geo3k.yml vendored

View File

113

.github/workflows/gpu_unit_tests.yml vendored Normal file

View File

234

.github/workflows/model.yml vendored

View File

40

.github/workflows/pre-commit.yml vendored Normal file

View File

54

.github/workflows/ray_test.yml vendored

View File

131

.github/workflows/reward_model.yml vendored Normal file

View File

54

.github/workflows/sandbox.yml vendored

View File

77

.github/workflows/sanity.yml vendored

View File

6

.github/workflows/scorecard.yml vendored

View File

17

.github/workflows/secrets_scan.yml vendored

View File

178

.github/workflows/sgl.yml vendored Normal file

View File

31

.github/workflows/type-coverage-check.yml vendored Normal file

View File

141

.github/workflows/vllm.yml vendored

View File

56

.github/workflows/yapf_format.yml vendored

View File

7

.gitignore vendored

View File

37

.pre-commit-config.yaml Normal file

View File

5

.style.yapf

View File

15

.vscode/settings.json vendored Normal file

View File

89

CONTRIBUTING.md Normal file

View File

227

README.md

View File

57

docker/Apptainerfile.rocm Normal file

View File

55

docker/Dockerfile.extention.awsefa Normal file

View File

9

docker/Dockerfile.megatron

View File

17

docker/Dockerfile.ngc.vllm

View File

43

docker/Dockerfile.ngc.vllm0.8

View File

2

docker/Dockerfile.ngc.vllm0.8.sagemaker

View File

321

docker/Dockerfile.rocm

View File

141

docker/Dockerfile.rocm7 Normal file

View File

58

docker/Dockerfile.rocm_verl-0.3.0.post1 Normal file

View File

323

docker/Dockerfile.rocm_verl-0.4.1 Normal file

View File

55

docker/Dockerfile.sglang Normal file

View File

2

docker/Dockerfile.vemlp.vllm.te

View File

115

docker/Dockerfile.vllm.sglang.megatron.deepseek Normal file

View File

72

docker/README.md Normal file

View File

41

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.sglang.vllm.mcore0.12 Normal file

View File

82

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.sglang.vllm.mcore0.12.deepep Normal file

View File

82

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.sglang.vllm.mcore0.13.preview Normal file

View File

47

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.vllm.mcore0.12 Normal file

View File

88

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.vllm.mcore0.12.deepep Normal file

View File

85

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.vllm.mcore0.13.preview Normal file

View File

113

docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.base Normal file

View File

31

docker/verl0.4-cu124-torch2.6-fa2.7.4/README.md Normal file

View File