frozenleaves/verl - verl - Gitea: Git for Me

mirror of https://github.com/volcengine/verl.git synced 2025-10-20 13:43:50 +08:00

Author	SHA1	Message	Date
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	acfcf98ed0	[doc] fix: `actor_rollout_ref.critic` is not correct (#3778 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. They should start directly with `critic` ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-10-16 11:12:45 +08:00
EduardDurech	26a734e740	[algo, perf] feat: Vectorize RLOO Advantage Estimator - 20x Speedup (#3555 ) Vectorize RLOO advantage estimator 130ms -> 6ms Similar method can be done for other advantage estimators, I just don't have time Implements $$r_i - \frac{\sum_{j\ne i} r_j}{G-1} = \frac{(G-1)r_i - \sum_{j\ne i} r_j}{G-1} = \frac{G r_i - \sum_{j\in g} r_j}{G-1}$$ <img width="2199" height="628" alt="image" src="https://github.com/user-attachments/assets/339e5bd2-6949-4460-a297-34268ffc1764" />	2025-09-24 17:36:41 +08:00
Chi Zhang	83205fdae0	[ci] feat: using local dataset to avoid network issue (#3533 ) ### What does this PR do? - As title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-19 16:21:55 +08:00
xvxuopop	f4e2047074	[model, ci] feat: add qwen3-8b ppo script on ASCEND NPU (#3502 ) ### What does this PR do? add examples/ppo_trainer/run_qwen3-8b_npu.sh > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-09-17 18:48:24 +08:00
X. HU	dfa3933ac4	[tool] feat: support local gsm8k dataset in example/data_preprocess (#3362 )	2025-09-09 22:29:56 +08:00
Feng Yao	b8dc5377c6	[BREAKING][vllm, fsdp] feat: add Rollout-Training Mismatch Fix -- Truncated importance sampling (#2953 ) ### What does this PR do? Support [vLLM-FSDP off-policy importance sampling correction](https://fengyao.notion.site/off-policy-rl) using Truncated Importance Sampling (TIS): <img width="859" height="382" alt="TIS" src="https://github.com/user-attachments/assets/adc8f797-aa14-4b29-b265-a682c281d08e" /> ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python python3 -m verl.trainer.main_ppo \ algorithm.adv_estimator=gae \ data.train_files="$train_files" \ data.val_files="$test_files" \ data.train_batch_size=1024 \ data.max_prompt_length=1024 \ data.max_response_length=1024 \ data.filter_overlong_prompts=True \ data.truncation='error' \ actor_rollout_ref.model.path=Qwen/Qwen2.5-32B-Instruct \ actor_rollout_ref.model.enable_gradient_checkpointing=False \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=256 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ actor_rollout_ref.actor.use_kl_loss=False \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \ actor_rollout_ref.rollout.tensor_model_parallel_size=4 \ actor_rollout_ref.rollout.name=vllm \ actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \ critic.optim.lr=1e-5 \ critic.model.use_remove_padding=True \ critic.model.path=Qwen/Qwen2.5-32B-Instruct \ critic.model.enable_gradient_checkpointing=False \ critic.ppo_micro_batch_size_per_gpu=8 \ critic.model.fsdp_config.param_offload=False \ critic.model.fsdp_config.optimizer_offload=False \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.logger='["console","wandb"]' \ trainer.project_name='verl_example' \ trainer.experiment_name='Qwen2.5-32B-Instruct_function_rm' \ trainer.n_gpus_per_node=8 \ trainer.nnodes=4 \ trainer.save_freq=20 \ trainer.test_freq=10 \ trainer.total_epochs=15 \ actor_rollout_ref.rollout.calculate_log_probs=True \ # add this config to return rollout prob +actor_rollout_ref.actor.behav_imp_weight_cap=10.0$@ # add this config to set up C value in TIS ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: Narsil-Dinghuai Zhang 张鼎怀 <dinghuai233@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: LiyuanLucasLiu <llychinalz@gmail.com>	2025-08-26 14:06:07 -07:00
Blue Space	b79263ad60	[perf] refactor: part 2 - Profiler ci test and fixes (#3001 ) ### What does this PR do? [perf] refactor part 2: Profiler ci test and fixes ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-12 08:59:39 +08:00
Blue Space	545f899844	[BREAKING] [perf] refactor: Profiler api refactor (#2894 ) ### What does this PR do? Refactor profiler CI to a unified way. TODO: - nsys use `save_path` - nsys descrete tests are disabled - torch profiler cc: @davidmlw ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example Global profiler config: ```yaml global_profiler: _target_: verl.utils.profiler.ProfilerConfig tool: null steps: null profile_continuous_steps: false save_path: outputs/profile tool_config: nsys: _target_: verl.utils.profiler.config.NsightToolConfig discrete: false npu: _target_: verl.utils.profiler.config.NPUToolConfig discrete: false contents: [] level: level1 analysis: true torch: _target_: verl.utils.profiler.config.TorchProfilerToolConfig step_start: 0 step_end: null ``` Local profiler config: ```yaml profiler: # Required when using verl.utils.omega_conf_to_dataclass to instantiate dataclass configs _target_: verl.utils.profiler.ProfilerConfig # profiler tool, default same as profiler.tool in global config # choices: nsys, npu, torch tool: ${oc.select:global_profiler.tool,null} # whether enable profile on critic enable: False # Whether to profile all ranks. all_ranks: False # The ranks that will be profiled. [] or [0,1,...] ranks: [] # profile results saving path save_path: ${oc.select:global_profiler.save_path,null} # specific tool config tool_config: ${oc.select:global_profiler.tool_config,null} ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-11 09:52:41 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	0da1a3de06	[megatron] fix: remove the demising critic.model.enable_gradient_checkpointing flags in the scripts (#2864 ) ### What does this PR do? They were removed in #2651, but #2691 overlooked some of them. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] (CI is not needed for this change) Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-08-01 20:51:33 +08:00
Liwei Ma	4a651f5425	[perf, doc] feat: Add profiling continous steps in one database (#2695 ) ### What does this PR do? Some customers would like to observe continuous steps in one database, so the gap between steps can be eliminated. The feature will dump the continuous steps in `profile_steps` into one database controlled by a new config, `trainer.profile_continous_steps`. For example [1, 2, 5], 1 and 2 will be in one database, 5 will be in another. Also add warning when nvtx is not available in cuda platform. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-31 12:26:10 +08:00
H	8888122a89	[megatron] fix: remove the demising model.enable_gradient_checkpointing flags in the script (#2691 ) ### What does this PR do? They were removed in https://github.com/volcengine/verl/pull/2651 ... @ETOgaosion ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-23 09:25:30 +08:00
Blue Space	c5b189a1af	[BREAKING][megatron] refactor: activation checkpointing APIs (#2651 ) ### What does this PR do? Since we directly offer `override_transformer_config` option, we directly use it to recompute activations. Default settings are the same with `megatron.training`. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-22 10:24:28 +08:00
Ziheng Jiang	9d7cba4e12	[trainer] refactor: Training Engine Interface and Development Plan (#1977 ) # [Refactor] Training Engine Interface and Development Plan ## Motivation See the original RFC for background: https://github.com/volcengine/verl/issues/1371 Modernizing our training loop requires that we: - Decouple training-backend implementation from algorithm code so each can evolve independently - Unify on a single, well-defined `Engine` interface across FSDP/Megatron/etc backends - Enable unit-testing of each backend implementation in isolation - Guarantee algorithm “roles” (Critic, Actor, Rollout, Ref) remain completely engine-agnostic. --- ## Current Implementation This PR: - Introduces an abstract `BaseEngine` class that defines a unified training‐engine interface. - Implements `FSDPEngine`, a concrete `BaseEngine` using PyTorch FullyShardedDataParallel. - Provides a `CriticWorker` based on `FSDPEngine` that plugs seamlessly into existing PPO training code without any changes. ### Classic Training Loop with the New Interface ```python # 1. Build and initialize engine engine = FSDPEngine(config) engine.init_model() engine.set_loss_fn(loss_fn) # 2. Training loop for epoch in range(config.num_epochs): for batch in train_loader: # a) zero gradients engine.optimizer_zero_grad() # b) forward + backward with engine.train_mode(): preds, loss, ctx = engine.forward_backward_step( batch, ctx, forward_only=False, preprocess_fn=preprocess_fn, postprocess_fn=postprocess_fn ) # c) update and schedule grad_norm = engine.optimizer_step() current_lr = engine.lr_scheduler_step() # 3. Evaluation with engine.eval_mode(): for micro_batch in data: preds, ctx = engine.forward_backward_step( micro_batch, ctx, forward_only=True, preprocess_fn=preprocess_fn, postprocess_fn=postprocess_fn ) ``` ### Detailed BaseEngine Interface We now introduce an abstract base class, `BaseEngine`, which defines our unified training-engine interface. Key enhancements over the original RFC: - `train_mode()` / `eval_mode()` Context managers to control parameter and activation load/offload at the start and end of each loop. - `shard_data()` / `unshard_data()` APIs for partitioning and gathering data across devices or workers. - `preprocess_fn` / `postprocess_fn` in `forward_backward_step()` Hooks to apply custom transformations before and after each micro-batch pass. Below are the detailed signatures for each core method. ```python class BaseEngine(object): """ Abstract base class defining the interface for model training engines. Engine implementations must subclass BaseEngine and provide concrete behavior for all methods. """ def __init__(self, config): """ Initialize the BaseEngine. Args: config: Configuration object containing parameters for engine setup. """ raise NotImplementedError def init_model(self): """ Instantiate or load the model, optimizer, and learning rate scheduler. Should prepare all components necessary for training or evaluation. """ raise NotImplementedError def train_mode(self): """ Context manager entry for switching the engine and model into training mode. Usage: with engine.train_mode(): # runs in training mode """ raise NotImplementedError def eval_mode(self): """ Context manager entry for switching the engine and model into evaluation mode. Usage: with engine.eval_mode(): # runs in evaluation mode """ raise NotImplementedError def forward_backward_step(self, batch, ctx=None, forward_only=False, preprocess_fn=None, postprocess_fn=None): """ Execute a forward pass (and optional backward pass) over a batch of data. Args: batch: Raw batch data (e.g., tensors or mappings) to process. ctx: Optional context dict passed to preprocess/postprocess functions. forward_only: If True, skip gradient computation and backward pass. preprocess_fn: Function(batch, ctx) -> (inputs, ctx), applied before model call. postprocess_fn: Function(outputs, ctx) -> (predictions, ctx), applied after model call. Returns: If forward_only: (predictions, ctx) Else: (predictions, loss, ctx) """ raise NotImplementedError def optimizer_zero_grad(self): """ Zero out gradients of all parameters before starting a new backward pass. """ raise NotImplementedError def optimizer_step(self): """ Perform an optimization step to update model parameters based on accumulated gradients. Returns: grad_norm (float): The norm of the gradients before clipping or update. """ raise NotImplementedError def lr_scheduler_step(self): """ Advance the learning rate scheduler by one step. Returns: current_lr (float or list[float]): Updated learning rate(s). """ raise NotImplementedError def shard_data(self, data): """ Shard or partition data for distributed training or parallel execution. Args: data: Data structure to be sharded across devices/workers. Returns: Sharded data in the same format as input. """ raise NotImplementedError def unshard_data(self, data): """ Reconstruct or gather sharded data back to a unified format. Args: data: Sharded data structure to reconstruct. Returns: Unsharded, combined data. """ raise NotImplementedError def set_loss_fn(self, loss_fn): """ Set the loss function to be used during training. Args: loss_fn: Callable(data, predictions, ctx) -> (loss_tensor, new_ctx) """ raise NotImplementedError def to(self, device: str, model: bool = True, optimizer: bool = True): """ Move model parameters, optimizer states, or both to the specified device. Args: device: Target device identifier (e.g., "cuda" or "cpu"). model: If True, move the model. optimizer: If True, move the optimizer states. """ raise NotImplementedError def save_checkpoint(self, local_path, hdfs_path=None, global_step=0, max_ckpt_to_keep=None): """ Save model, optimizer, and scheduler states to a checkpoint. Args: local_path: Local filesystem path to save checkpoint. hdfs_path: Optional HDFS path to copy checkpoint. global_step: Integer training step number for naming. max_ckpt_to_keep: Maximum number of recent checkpoints to retain. """ raise NotImplementedError def load_checkpoint(self, local_path, hdfs_path=None, del_local_after_load=True): """ Load model, optimizer, and scheduler states from a checkpoint. Args: local_path: Local filesystem path of the checkpoint. hdfs_path: Optional HDFS path where checkpoint is stored. del_local_after_load: Whether to delete local copy after loading. """ raise NotImplementedError ``` ### FSDPEngine Implementaion A concrete `FSDPEngine` implements all methods using PyTorch FullyShardedDataParallel, supporting all the features that FSDP DPCritic Worker support: - Multi-GPU/model sharding - Activation- and optimizer-offload - LoRA & sequence parallelism - Dynamic batch size and remove padding ### CriticWorker Implementation based on the FSDPEngine - Unchanged public API - Each role calls only BaseEngine methods (init_model, train_mode/eval_mode, forward_backward_step, etc.) - No modifications needed in existing algorithms (e.g., PPOTraining) - New roles can be plugged in identically to legacy code ## Development Plan We’ll roll this out in three gated phases, controlled by a feature-flag (`use_legacy_worker_impl`). ### Phase 1: Engine Development > Flag: use_legacy_worker_impl = True (default) > New interface under active development - Refactor Critic, Actor, Rollout, Ref to use only BaseEngine APIs - Design a hierarchical, immutable config system for engine/backends - Ensure PPO training curves and final accuracy match legacy implementation ### Phase 2: Migration > Flag: use_legacy_worker_impl = False (default) – legacy path logs a deprecation warning > All new code targets the new interface; 2–3 months of integration/stress testing - Enforce new interface for all feature work - Gather benchmarks, bug reports, and performance data ### Phase 3: Cleanup > After Phase 2 validation: - Remove legacy worker code and flags - Finalize documentation, update changelogs, close deprecation notices Please review this refactor and share any feedback or concerns! Contributions are welcome.	2025-07-17 22:05:21 -07:00
Kai-Hsun Chen	a31a8f251f	[doc] fix: quickstart example can't work on zsh (#2509 ) ### What does this PR do? I followed the instructions at https://verl.readthedocs.io/en/latest/start/quickstart.html to run the PPO example on my devbox, which uses zsh. However, I got the error zsh: no matches found: `trainer.logger=[console]` because `[]` is interpreted as a glob pattern in zsh. ``` (verl) ➜ verl git:(20250713-devbox-2-tmux0-verl-2) ✗ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ data.train_files=$HOME/data/gsm8k/train.parquet \ data.val_files=$HOME/data/gsm8k/test.parquet \ data.train_batch_size=256 \ data.max_prompt_length=512 \ data.max_response_length=256 \ actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.actor.ppo_mini_batch_size=64 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \ actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \ critic.optim.lr=1e-5 \ critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \ critic.ppo_micro_batch_size_per_gpu=4 \ algorithm.kl_ctrl.kl_coef=0.001 \ trainer.logger=['console'] \ trainer.val_before_train=False \ trainer.n_gpus_per_node=1 \ trainer.nnodes=1 \ trainer.save_freq=10 \ trainer.test_freq=10 \ trainer.total_epochs=15 2>&1 \| tee verl_demo.log zsh: no matches found: trainer.logger=[console] ``` This PR has 3 changes: * `trainer.logger=['console']` -> `trainer.logger=console` * `trainer.logger=['console','wandb']` -> `trainer.logger='["console","wandb"]'` * `trainer.logger=['console','tensorboard']` -> `trainer.logger='["console","tensorboard"]'` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test * `trainer.logger=console` (zsh) <img width="898" height="564" alt="image" src="https://github.com/user-attachments/assets/a957a493-75e6-462b-9974-6b1c4cdf5a80" /> * ``trainer.logger='["console","wandb"]'`` (zsh) <img width="870" height="565" alt="image" src="https://github.com/user-attachments/assets/e20613bf-2ccc-4653-b23f-90edc3d568d1" /> * `trainer.logger=console` (bash) ```bash ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ > data.train_files=$HOME/data/gsm8k/train.parquet \ > data.val_files=$HOME/data/gsm8k/test.parquet \ > data.train_batch_size=256 \ > data.max_prompt_length=512 \ > data.max_response_length=256 \ > actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ > actor_rollout_ref.actor.optim.lr=1e-6 \ > actor_rollout_ref.actor.ppo_mini_batch_size=64 \ > actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ > actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \ > actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ > actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ > actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \ > critic.optim.lr=1e-5 \ > critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \ > critic.ppo_micro_batch_size_per_gpu=4 \ > algorithm.kl_ctrl.kl_coef=0.001 \ > trainer.logger=console \ > trainer.val_before_train=False \ > trainer.n_gpus_per_node=1 \ > trainer.nnodes=1 \ > trainer.save_freq=10 \ > trainer.test_freq=10 \ > trainer.total_epochs=15 2>&1 \| tee verl_demo.log 2025-07-14 02:52:27,669 INFO worker.py:1908 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 (TaskRunner pid=1799248) TaskRunner hostname: ip-172-31-9-244, PID: 1799248 (TaskRunner pid=1799248) {'actor_rollout_ref': {'actor': {'checkpoint': {'load_contents': ['model', (TaskRunner pid=1799248) 'optimizer', (TaskRunner pid=1799248) 'extra'], (TaskRunner pid=1799248) 'save_contents': ['model', (TaskRunner pid=1799248) 'optimizer', (TaskRunner pid=1799248) 'extra']}, ``` * `trainer.logger='["console","wandb"]'` (bash) ```bash ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ > data.train_files=$HOME/data/gsm8k/train.parquet \ > data.val_files=$HOME/data/gsm8k/test.parquet \ > data.train_batch_size=256 \ > data.max_prompt_length=512 \ > data.max_response_length=256 \ > actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ > actor_rollout_ref.actor.optim.lr=1e-6 \ > actor_rollout_ref.actor.ppo_mini_batch_size=64 \ > actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ > actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \ > actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ > actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ > actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \ > critic.optim.lr=1e-5 \ > critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \ > critic.ppo_micro_batch_size_per_gpu=4 \ > algorithm.kl_ctrl.kl_coef=0.001 \ > trainer.logger='["console","wandb"]' \ > trainer.val_before_train=False \ > trainer.n_gpus_per_node=1 \ > trainer.nnodes=1 \ > trainer.save_freq=10 \ > trainer.test_freq=10 \ > trainer.total_epochs=15 2>&1 \| tee verl_demo.log 2025-07-14 02:54:13,989 INFO worker.py:1908 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 (TaskRunner pid=1805000) TaskRunner hostname: ip-172-31-9-244, PID: 1805000 (TaskRunner pid=1805000) {'actor_rollout_ref': {'actor': {'checkpoint': {'load_contents': ['model', (TaskRunner pid=1805000) 'optimizer', (TaskRunner pid=1805000) 'extra'], (TaskRunner pid=1805000) 'save_contents': ['model', (TaskRunner pid=1805000) 'optimizer', (TaskRunner pid=1805000) 'extra']}, ``` ### API and Usage Example No ### Design & Code Changes No ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>	2025-07-14 13:26:32 +08:00
Liwei Ma	1dfc1359da	[perf] feat: add range tag to start/stop profile; clean actor_rollout_ref.profiler (#2456 ) ### What does this PR do? I found the cost of workers start/stop profile is not negligible, there are big gap between steps which is annoying. So I add range tag to them, making it clear. Another change, I realize that `actor_rollout_ref` needs only one `profiler` config, and needn't redundant for each role. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-11 10:12:56 +08:00
shuyhere	281ecd4cc1	[doc] fix: Fix document config.rst (#2369 ) ### What does this PR do? > Fix document config.rst: the parameter“gemma” -> “gamma”. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pull/2322 - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-07-05 09:26:42 -07:00
H	52065c6405	[BREAKING][rollout] refactor: drop vllm v0.5.4 and v0.6.3 support (#2257 ) ### What does this PR do? This PR removes support for vLLM versions 0.5.4 and 0.6.3 from the verl repository, completing a comprehensive cleanup of legacy version-specific code branches. The changes simplify the codebase by eliminating conditional logic and version-specific implementations, requiring users to upgrade to vLLM 0.7.0 or later (recommended: vLLM 0.8.3+). Key Changes: - Deleted legacy rollout implementations (`fire_vllm_rollout.py`, `vllm_rollout.py`, `test_vllm_hf_loader.py`) - Removed version-specific directories (`vllm_v_0_5_4`, `vllm_v_0_6_3`) - Simplified sharding managers by removing `customized_vllm` flag conditionals - Updated configuration files to remove deprecated options (`use_fire_sampling`) - Cleaned up documentation and environment variable exports ### Checklist Before Starting - [x] Search for similar PRs: No similar PRs found for this specific cleanup - [x] Format the PR title as `[BREAKING][vllm, rollout, worker] refactor: Remove vLLM 0.5.4 and 0.6.3 support` - Modules: `vllm`, `rollout`, `worker` (primary affected components) - Type: `refactor` (code cleanup and simplification) - Breaking: Yes, requires vLLM version upgrade ### Test This PR has been validated through: - CI Pipeline: All existing tests pass with vLLM 0.7.0+ (27 checks pending/running) - Version Detection: New version check logic properly rejects vLLM 0.5.4/0.6.3 with clear error messages - Merge Conflict Resolution: Successfully resolved complex conflicts during main branch merge - Pre-commit Checks: All linting and formatting requirements satisfied ### API and Usage Example Breaking Changes: - vLLM Version Requirement: Minimum supported version is now 0.7.0 (recommended: 0.8.3+) - Removed Configuration Options: `use_fire_sampling` no longer available in config files - Environment Variables: `VLLM_ATTENTION_BACKEND=XFORMERS` exports removed (not needed for vLLM 0.7.0+) Migration Guide: ```bash # Before: vLLM 0.5.4/0.6.3 with custom flags pip install vllm==0.6.3 export VLLM_ATTENTION_BACKEND=XFORMERS # After: vLLM 0.8.3+ with V1 API pip install vllm>=0.8.3 export VLLM_USE_V1=1 # Recommended for optimal performance ``` Updated Configuration: ```yaml # generation.yaml - removed use_fire_sampling option rollout: name: vllm_rollout # use_fire_sampling: False # <- REMOVED # Use standard vLLM rollout without legacy options ``` ### High-Level Design ```mermaid graph TB subgraph "Before: Multi-Version Support" A1[vLLM Version Check] --> B1{Version 0.5.4?} A1 --> B2{Version 0.6.3?} A1 --> B3{Version 0.7.0+?} B1 --> C1[Legacy vllm_v_0_5_4 Code] B2 --> C2[Legacy vllm_v_0_6_3 Code] B3 --> C3[Modern vLLM Code] end subgraph "After: Simplified Support" A2[vLLM Version Check] --> B4{Version >= 0.7.0?} B4 -->\|Yes\| C4[Modern vLLM Code Only] B4 -->\|No\| C5[Clear Error Message] end ``` ### Specific Changes Deleted Files: - `verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py` - `verl/workers/rollout/vllm_rollout/vllm_rollout.py` - `tests/workers/rollout/rollout_vllm/test_vllm_hf_loader.py` - `verl/third_party/vllm/vllm_v_0_5_4/` (entire directory) - `verl/third_party/vllm/vllm_v_0_6_3/` (entire directory) - `pytest.ini` Modified Core Files: - `verl/third_party/vllm/__init__.py`: Simplified version detection with clear error messages - `verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py`: Removed cache engine management and version conditionals - `verl/workers/sharding_manager/fsdp_vllm.py`: Dropped `customized_vllm` flag logic - `verl/workers/sharding_manager/megatron_vllm.py`: Simplified weight loading and cache management Configuration Updates: - `verl/trainer/config/generation.yaml`: Removed `use_fire_sampling` option - `verl/trainer/config/ppo_trainer.yaml`: Removed `use_fire_sampling` option - `tests/special_sanity/check_api_docs.py`: Removed `LLMEngine` from whitelist Documentation Updates: - `docs/start/install.rst`: Updated to recommend vLLM 0.8.3+ with `VLLM_USE_V1=1` - `docs/perf/perf_tuning.rst`: Updated performance recommendations - Removed 42+ `VLLM_ATTENTION_BACKEND=XFORMERS` exports from bash scripts Reverted Changes: - `.github/workflows/vllm.yml`: Restored original container image names - `docs/faq/faq.rst`: Restored original apptainer commands - `docs/ascend_tutorial/ascend_quick_start.rst`: Reverted all modifications - `examples/tuning//`: Restored original `nproc_per_gpu` settings ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs): Updated install and performance tuning docs - [x] Add unit or end-to-end test(s): Existing CI tests validate the changes; legacy-specific tests were removed as intended - [x] CI Request*: Once PR is ready, message will be sent to `ci-request` channel in verl Slack workspace --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-06-29 19:27:22 -07:00
xichengpro	ccefcf05ca	[doc] fix: Fix mismatched config description for `ppo_epochs` in critic (#2102 ) ### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > Fix mismatched config description for `ppo_epochs` in critic ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ![image](https://github.com/user-attachments/assets/72df0d9a-3ac8-418c-b1c0-aa6e6daaccfd) > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.	2025-06-19 18:19:31 +08:00
Liwei Ma	e48292f698	[perf] feat: Add verl profiling support from Nvidia Nsight System (#1820 ) Add verl profiling support from Nvidia Nsight System ### Checklist Before Starting - [X] Search for similar PR(s). ### What does this PR do? Add verl profiling support from Nvidia Nsight System ### High-Level Design This PR add config fileds to trigger Nsight profiling. If `trainer.profile_steps` is set, Nsight system will be triggered to profiling the corresponding steps. In each task role, other config fields control also control the profiling details. The profiling tasks include the single_controller process and the worker process. Single_controller process uses the re-designed `marked_timer` to record each task range in NVTX. The worker processes dumps the GPU execution details. Since veRL has hybrid-engine mode and supports split mode, there are two profiling modes, discrete or not. Discrete mode means each task will generate a dedicate database; otherwise a whole giant database will be generated. Nsight system supports to import and align multiple databases automatically. ### Specific Changes `verl.utils.debug.profile` add general profling interface and `verl.utils.debug.nvtx_profile` implements the interface. ### API `verl.utils.debug.performance._timer` has been changed to `simple_timer`, and `marked_timer` is added to support profiler range marker. `verl.utils.debug.profile` wrappers the basic profiler interfaces, including mark__range, mark_annotate, ProfilerConfig, WorkerProfiler, and WorkerProfilerExtension. `verl.utils.debug.nvtx_profile` implements the interfaces when nvtx is available. ### Usage Example Two examples are added in `/examples/ppo_trainer/run_deepseek_math_gsm8k_megatron_nsys.sh` `/examples/ppo_trainer/run_qwen2-7b_rm_seq_balance_nsys.sh` ### Test There should be no functional changes and performance changes. ### Additional Info. - Training: both FSDP, Megatron will be affected. - Inference*: both vLLM, SGLang will be affected. ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [X] Add CI test(s) if necessary.	2025-06-17 11:05:16 -07:00
Jianbing-D	c8908e197c	[fsdp] feat: Memory efficient cross entropy with a linear layer fused (#462 ) Implemented forward and backward of the following compute logics, which eliminated many intermediate storage tensors, and resulted in reduced peak memory usage. ## Equivalent compute logic: ```python def run_torch_entropy(hidden: torch.Tensor, weight: torch.Tensor, labels: torch.Tensor) -> typing.List[torch.Tensor]: logits = torch.matmul(hidden.to(torch.float32), weight.to(torch.float32)) # [num_tokens, vocab_size] pd = torch.nn.functional.softmax(logits, dim=-1) # [num_tokens, vocab_size] entropy_a = torch.logsumexp(logits, dim=-1) # [num_tokens] entropy_b = torch.sum(pd * logits, dim=-1) # [num_tokens] entropy = entropy_a - entropy_b logprobs = torch.nn.functional.cross_entropy(logits, labels) # [1] logprobs = torch.neg(logprobs) return logprobs, entropy ``` ## API ```python from verl.utils.kernel import linear_cross_entropy hidden = torch.randn(num_tokens, hidden_size, dtype=torch.bfloat16, device="cuda") weight = torch.randn(hidden_size, vocab_size, dtype=torch.bfloat16, device="cuda") labels = torch.randint(0, vocab_size, (num_tokens,), device="cuda") loss, entropy = linear_cross_entropy(hidden, weight, labels, reduction="mean") ``` ## Storage and latency <img width="636" alt="image" src="https://github.com/user-attachments/assets/396b7303-a46a-46b1-a261-917fda034b02" /> ## Unit test ```shell $ cd verl/ $ python3 tests/kernel/test_memory_efficient_entropy.py ``` # NOTE For compatibility, `torch.library.triton_op` was not applied to those APIs, so that `torch.compile` might not be able to be enabled on top of it. --------- Signed-off-by: Jianbing Dong <jianbingd@nvidia.com> Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn> Co-authored-by: gaoziyuan.955 <gaoziyuan.955@bytedance.com> Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>	2025-06-11 19:48:47 +08:00
Joel	457f4d2a20	[rollout] feat: follow OpenAI tool calling schema in chat scheduler (#1831 )	2025-06-07 07:47:47 +08:00
H	cef6361def	[docs] lora: fix lora image and add GRPO docs (#1788 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Fix image rendering	2025-06-01 09:49:42 +08:00
Blue Space	55f13ff16f	[fix] moonlight runnable with trust_remote_code (#1749 )	2025-05-29 22:25:28 +08:00
DtYXs	904a252379	Add an example script for PF-PPO training (#1753 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add an example script for PF-PPO training ### Specific Changes > Add an example script `run_deepseek7b_llm_pfppo.sh` in `examples/ppo_trainer/` ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-05-29 15:53:43 +08:00
Yan Bai	be47ac44b2	[mcore] moonlight (small model with deepseekv3 arch) (#1284 ) achieve 74.3 at gsm8k, while moonlight reported as 77.4 still WIP with the performance diff	2025-05-28 17:10:29 +08:00
Joel	3eaaf24d5a	[rollout] perf: replace AsyncOpenAI to aiohttp client in ChatCompletionScheduler (#1588 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? AsyncOpenAI has very severe performance issue due to httpx, replace it to aiohttp client. For train_batch_size=1024, AsyncOpenAI introduces ~25s per generation phase. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-20 11:31:19 +08:00
OC	2c8b2b995f	[feat] Sandbox: support sandbox fusion on FaaS & localhost (#1429 ) ### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Implement sandbox fusion backend on FaaS. For example, reward score using a FaaS instance on volcengine.com. It have better performance and security comparing to local sandbox. ### Specific Changes Added a code branch in _default_compute_score to choose sandbox according to sandbox_fusion_url configuration. ### Usage Example examples/ppo_trainer/run_deepseek7b_llm_sandbox_fusion.sh ### Test tests/reward_score/test_sandbox_fusion.py However, the new testcase requires to setting Sandbox API URL in env SANDBOX_FUSION_URL. If the env is not set, most testcases will be skipped. ### Additional Info. Using sandbox on Faas have save 60% time on reward process comparing local sandbox: <img width="273" alt="截屏2025-05-07 20 37 05" src="https://github.com/user-attachments/assets/fc9c0e23-6afe-4f34-a28a-a1756e85d45f" /> ### Checklist Before Submitting - [] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [] Add `[BREAKING]` to the PR title if it breaks any API. - [] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [] Add CI test(s) if neccessary.	2025-05-15 17:53:47 +08:00
Joel	1e47e412a4	[rollout] misc: add demo chat completion scheduler described in ReTool paper (#1297 ) Co-authored-by: shengguangming <shengguangming@bytedance.com>	2025-05-04 19:07:22 +08:00
HL	958eae3523	[example] chore: remove verl_getting_started.ipynb (#1281 ) remove the out-dated notebook	2025-04-29 10:55:27 +08:00
Shawn/Yuxuan Tong	8e5ad4688a	[Lint] fix: linting errors in all files (#1280 ) This PR enables checking on all files after fixing all the errors: ``` examples/data_preprocess/geo3k.py:41:121: E501 Line too long (121 > 120) examples/data_preprocess/multiturn.py:54:121: E501 Line too long (185 > 120) examples/data_preprocess/multiturn.py:59:121: E501 Line too long (210 > 120) examples/data_preprocess/multiturn.py:73:121: E501 Line too long (229 > 120) examples/data_preprocess/multiturn.py:78:121: E501 Line too long (211 > 120) examples/ray/tutorial.ipynb:cell 9:1:121: E501 Line too long (179 > 120) examples/ray/tutorial.ipynb:cell 15:1:121: E501 Line too long (143 > 120) examples/ray/tutorial.ipynb:cell 42:14:1: E402 Module level import not at top of cell recipe/prime/prime_dp_rm.py:145:121: E501 Line too long (153 > 120) recipe/prime/prime_dp_rm.py:156:121: E501 Line too long (137 > 120) recipe/prime/prime_dp_rm.py:292:121: E501 Line too long (148 > 120) recipe/r1/data_process.py:56:121: E501 Line too long (289 > 120) recipe/r1/data_process.py:113:121: E501 Line too long (166 > 120) recipe/r1/data_process.py:118:121: E501 Line too long (137 > 120) recipe/r1/data_process.py:123:121: E501 Line too long (297 > 120) recipe/r1/data_process.py:131:9: E722 Do not use bare `except` recipe/r1/tasks/livecodebench.py:61:5: E722 Do not use bare `except` scripts/diagnose.py:55:9: F841 Local variable `ip` is assigned to but never used scripts/diagnose.py:165:13: B028 No explicit `stacklevel` keyword argument found scripts/model_merger.py:42:121: E501 Line too long (184 > 120) scripts/model_merger.py:146:13: E722 Do not use bare `except` tests/e2e/arithmetic_sequence/model/create_model_tokenizer.py:28:121: E501 Line too long (440 > 120) tests/gpu_utility/test_memory_buffers.py:42:5: F841 Local variable `model_named_params` is assigned to but never used tests/gpu_utility/test_memory_buffers.py:43:5: F841 Local variable `model_copy_named_params` is assigned to but never used tests/gpu_utility/test_memory_buffers.py:53:5: F841 Local variable `model_wrapper` is assigned to but never used tests/model/test_transformers_ulysses.py:102:5: F841 Local variable `response_length` is assigned to but never used tests/model/test_transformers_ulysses.py:181:5: F841 Local variable `response_length` is assigned to but never used tests/ray/detached_worker/server.py:83:13: F841 Local variable `vpp_rank` is assigned to but never used tests/ray/test_check_worker_alive.py:37:121: E501 Line too long (121 > 120) tests/rollout/run_fsdp_vllm.py:22:64: F811 Redefinition of unused `ShardingStrategy` from line 20 tests/rollout/test_sglang_spmd.py:210:121: E501 Line too long (157 > 120) tests/rollout/test_vllm_spmd.py:20:64: F811 Redefinition of unused `ShardingStrategy` from line 18 tests/sandbox/test_sandbox.py:86:121: E501 Line too long (1615 > 120) tests/sandbox/test_sandbox.py:87:121: E501 Line too long (1596 > 120) tests/sanity/check_license.py:22:1: E402 Module level import not at top of file tests/sanity/check_license.py:23:1: E402 Module level import not at top of file tests/verl/utils/dataset/test_rl_dataset.py:23:5: F841 Local variable `url` is assigned to but never used tests/verl/utils/dataset/test_rm_dataset.py:22:5: F841 Local variable `url` is assigned to but never used tests/verl/utils/dataset/test_rm_dataset.py:36:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks tests/verl/utils/dataset/test_sft_dataset.py:22:5: F841 Local variable `url` is assigned to but never used tests/verl/utils/dataset/test_sft_dataset.py:50:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks tests/verl/utils/dataset/test_sft_dataset.py:75:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks verl/__init__.py:22:1: E402 Module level import not at top of file verl/__init__.py:24:1: E402 Module level import not at top of file verl/__init__.py:25:1: E402 Module level import not at top of file verl/__init__.py:29:1: E402 Module level import not at top of file verl/__init__.py:29:15: F401 `.single_controller` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:16:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:18:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:20:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:21:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:22:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:24:5: F401 `.modeling_llama_megatron.ParallelLlamaModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/checkpoint_utils/llama_loader.py:92:121: E501 Line too long (168 > 120) verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:92:121: E501 Line too long (168 > 120) verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:274:121: E501 Line too long (127 > 120) verl/models/llama/megatron/checkpoint_utils/llama_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/llama/megatron/checkpoint_utils/llama_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/llama/megatron/checkpoint_utils/llama_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/llama/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelLlamaAttention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelLlamaDecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelLlamaDecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelLlamaMLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelLlamaRMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/parallel_attention.py:196:121: E501 Line too long (134 > 120) verl/models/llama/megatron/layers/parallel_attention.py:341:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:342:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:343:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:366:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:420:121: E501 Line too long (122 > 120) verl/models/llama/megatron/layers/parallel_linear.py:82:1: E402 Module level import not at top of file verl/models/mcore/loader.py:273:121: E501 Line too long (134 > 120) verl/models/mcore/util.py:26:121: E501 Line too long (202 > 120) verl/models/qwen2/megatron/__init__.py:16:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:18:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:20:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:21:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:22:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:24:5: F401 `.modeling_qwen2_megatron.ParallelQwen2Model` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:90:121: E501 Line too long (169 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:256:121: E501 Line too long (172 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:90:121: E501 Line too long (169 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:272:121: E501 Line too long (127 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/qwen2/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelQwen2Attention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelQwen2DecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelQwen2DecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelQwen2MLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelQwen2RMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/parallel_attention.py:163:121: E501 Line too long (134 > 120) verl/models/qwen2/megatron/layers/parallel_attention.py:282:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:283:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:284:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:307:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:361:121: E501 Line too long (122 > 120) verl/models/qwen2/megatron/modeling_qwen2_megatron.py:630:121: E501 Line too long (130 > 120) verl/models/transformers/llama.py:106:121: E501 Line too long (180 > 120) verl/models/transformers/llama.py:214:121: E501 Line too long (128 > 120) verl/models/transformers/llama.py:215:121: E501 Line too long (135 > 120) verl/models/transformers/monkey_patch.py:145:1: E402 Module level import not at top of file verl/models/transformers/monkey_patch.py:146:1: E402 Module level import not at top of file verl/models/transformers/monkey_patch.py:148:1: E402 Module level import not at top of file verl/models/transformers/monkey_patch.py:157:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling verl/models/transformers/qwen2.py:215:121: E501 Line too long (128 > 120) verl/models/transformers/qwen2.py:216:121: E501 Line too long (135 > 120) verl/protocol.py:303:121: E501 Line too long (125 > 120) verl/protocol.py:352:121: E501 Line too long (171 > 120) verl/protocol.py:578:121: E501 Line too long (142 > 120) verl/protocol.py:580:121: E501 Line too long (150 > 120) verl/protocol.py:583:121: E501 Line too long (167 > 120) verl/protocol.py:715:1: E402 Module level import not at top of file verl/protocol.py:725:121: E501 Line too long (121 > 120) verl/protocol.py:766:1: E402 Module level import not at top of file verl/protocol.py:768:1: E402 Module level import not at top of file verl/single_controller/__init__.py:23:1: E402 Module level import not at top of file verl/single_controller/__init__.py:24:1: E402 Module level import not at top of file verl/single_controller/base/decorator.py:149:16: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks verl/single_controller/base/decorator.py:198:121: E501 Line too long (134 > 120) verl/single_controller/base/decorator.py:310:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks verl/single_controller/base/worker.py:137:121: E501 Line too long (131 > 120) verl/single_controller/base/worker_group.py:89:33: G003 Logging statement uses `+` verl/single_controller/base/worker_group.py:202:21: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling verl/single_controller/ray/__init__.py:15:19: F401 `.base.RayClassWithInitArgs` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/single_controller/ray/__init__.py:15:41: F401 `.base.RayResourcePool` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/single_controller/ray/__init__.py:15:58: F401 `.base.RayWorkerGroup` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/single_controller/ray/__init__.py:15:74: F401 `.base.create_colocated_worker_cls` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/third_party/sglang/parallel_state.py:135:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/__init__.py:40:40: F401 `.vllm_v_0_6_3.llm.LLMEngine` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/third_party/vllm/__init__.py:45:22: F401 `vllm.LLM` imported but unused verl/third_party/vllm/__init__.py:46:34: F401 `vllm.distributed.parallel_state` imported but unused verl/third_party/vllm/__init__.py:50:121: E501 Line too long (141 > 120) verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py:189:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_5_4/llm.py:136:121: E501 Line too long (132 > 120) verl/third_party/vllm/vllm_v_0_5_4/llm.py:196:121: E501 Line too long (161 > 120) verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:174:5: F811 Redefinition of unused `llama_megatron_core_te_weight_loader` from line 90 verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:205:5: F811 Redefinition of unused `llama_megatron_core_weight_loader` from line 121 verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:254:121: E501 Line too long (150 > 120) verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:21: F811 Redefinition of unused `LoadConfig` from line 24 verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:45: F811 Redefinition of unused `ModelConfig` from line 26 verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:323:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:127:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:245:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120) verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120) verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used verl/third_party/vllm/vllm_v_0_5_4/worker.py:220:121: E501 Line too long (127 > 120) verl/third_party/vllm/vllm_v_0_6_3/config.py:46:92: B026 Star-arg unpacking after a keyword argument is strongly discouraged verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py:225:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_6_3/llm.py:141:121: E501 Line too long (132 > 120) verl/third_party/vllm/vllm_v_0_6_3/llm.py:169:121: E501 Line too long (161 > 120) verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:52:24: F811 Redefinition of unused `EngineArgs` from line 35 verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:21: F811 Redefinition of unused `LoadConfig` from line 25 verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:33: F811 Redefinition of unused `ModelConfig` from line 27 verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:354:9: F841 Local variable `distributed_executor_backend` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:360:121: E501 Line too long (152 > 120) verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:199:5: F841 Local variable `params_mapping` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:229:121: E501 Line too long (150 > 120) verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:21: F811 Redefinition of unused `LoadConfig` from line 22 verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:45: F811 Redefinition of unused `ModelConfig` from line 22 verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:312:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:21: F811 Redefinition of unused `LoadConfig` from line 27 verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:33: F811 Redefinition of unused `ModelConfig` from line 29 verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:129:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:247:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120) verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120) verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/worker.py:217:121: E501 Line too long (127 > 120) verl/trainer/fsdp_sft_trainer.py:298:121: E501 Line too long (158 > 120) verl/trainer/fsdp_sft_trainer.py:501:121: E501 Line too long (121 > 120) verl/trainer/fsdp_sft_trainer.py:550:1: E402 Module level import not at top of file verl/trainer/fsdp_sft_trainer.py:551:1: E402 Module level import not at top of file verl/trainer/fsdp_sft_trainer.py:553:1: E402 Module level import not at top of file verl/trainer/fsdp_sft_trainer.py:553:43: F811 Redefinition of unused `FSDPSFTTrainer` from line 82 verl/trainer/fsdp_sft_trainer.py:554:1: E402 Module level import not at top of file verl/utils/__init__.py:16:24: F401 `.tokenizer.hf_processor` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/__init__.py:16:38: F401 `.tokenizer.hf_tokenizer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/checkpoint/checkpoint_manager.py:48:37: B006 Do not use mutable data structures for argument defaults verl/utils/checkpoint/fsdp_checkpoint_manager.py:51:37: B006 Do not use mutable data structures for argument defaults verl/utils/checkpoint/fsdp_checkpoint_manager.py:56:13: B028 No explicit `stacklevel` keyword argument found verl/utils/checkpoint/fsdp_checkpoint_manager.py:81:121: E501 Line too long (121 > 120) verl/utils/checkpoint/fsdp_checkpoint_manager.py:98:121: E501 Line too long (124 > 120) verl/utils/checkpoint/megatron_checkpoint_manager.py:64:37: B006 Do not use mutable data structures for argument defaults verl/utils/checkpoint/megatron_checkpoint_manager.py:219:121: E501 Line too long (124 > 120) verl/utils/dataset/__init__.py:15:25: F401 `.rl_dataset.RLHFDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/dataset/__init__.py:16:25: F401 `.rm_dataset.RMDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/dataset/__init__.py:17:26: F401 `.sft_dataset.SFTDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/dataset/multiturn_sft_dataset.py:96:9: F841 Local variable `current_length` is assigned to but never used verl/utils/dataset/sft_dataset.py:95:79: B023 Function definition does not bind loop variable `key` verl/utils/dataset/sft_dataset.py:103:83: B023 Function definition does not bind loop variable `key` verl/utils/debug/__init__.py:15:26: F401 `.performance.GPUMemoryLogger` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/debug/__init__.py:15:43: F401 `.performance.log_gpu_memory_usage` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/debug/performance.py:68:121: E501 Line too long (127 > 120) verl/utils/debug/performance.py:71:121: E501 Line too long (126 > 120) verl/utils/debug/profile.py:15:1: I001 [] Import block is un-sorted or un-formatted verl/utils/debug/profile.py:19:15: UP039 [] Unnecessary parentheses after class definition verl/utils/debug/profile.py:50:23: F541 [] f-string without any placeholders verl/utils/debug/profile.py:52:49: F541 [] f-string without any placeholders verl/utils/debug/profile.py:53:47: F541 [] f-string without any placeholders verl/utils/debug/profile.py:54:67: F541 [] f-string without any placeholders verl/utils/debug/profile.py:54:121: E501 Line too long (122 > 120) verl/utils/flops_counter.py:175:121: E501 Line too long (124 > 120) verl/utils/hdfs_io.py:135:32: G004 Logging statement uses f-string verl/utils/import_utils.py:78:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling verl/utils/logger/aggregate_logger.py:46:121: E501 Line too long (131 > 120) verl/utils/logger/aggregate_logger.py:64:41: G004 Logging statement uses f-string verl/utils/megatron/tensor_parallel.py:152:121: E501 Line too long (123 > 120) verl/utils/megatron_utils.py:17:1: I001 [] Import block is un-sorted or un-formatted verl/utils/megatron_utils.py:22:20: F401 [] `torch.nn` imported but unused verl/utils/megatron_utils.py:34:38: F401 [] `verl.utils.memory_buffer.build_memory_reference_from_module` imported but unused verl/utils/megatron_utils.py:332:30: B009 [] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access. verl/utils/megatron_utils.py:366:27: B009 [] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access. verl/utils/model.py:464:121: E501 Line too long (124 > 120) verl/utils/rendezvous/ray_backend.py:39:25: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:41:22: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:63:30: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:65:30: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:72:26: G004 Logging statement uses f-string verl/utils/reward_score/gsm8k.py:47:121: E501 Line too long (201 > 120) verl/utils/reward_score/math.py:213:121: E501 Line too long (142 > 120) verl/utils/reward_score/prime_code/__init__.py:16:8: F401 `re` imported but unused verl/utils/reward_score/prime_code/testing_util.py:131:121: E501 Line too long (688 > 120) verl/utils/reward_score/prime_code/testing_util.py:168:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:222:9: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:254:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:255:17: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:259:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:260:17: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:264:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:265:17: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:269:121: E501 Line too long (132 > 120) verl/utils/reward_score/prime_code/testing_util.py:293:21: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:294:25: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:335:121: E501 Line too long (165 > 120) verl/utils/reward_score/prime_code/testing_util.py:386:121: E501 Line too long (209 > 120) verl/utils/reward_score/prime_code/testing_util.py:390:121: E501 Line too long (183 > 120) verl/utils/reward_score/prime_code/testing_util.py:455:121: E501 Line too long (211 > 120) verl/utils/reward_score/prime_code/testing_util.py:459:121: E501 Line too long (185 > 120) verl/utils/reward_score/prime_code/testing_util.py:582:121: E501 Line too long (197 > 120) verl/utils/reward_score/prime_code/testing_util.py:586:121: E501 Line too long (171 > 120) verl/utils/reward_score/prime_math/__init__.py:106:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:119:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:246:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:315:121: E501 Line too long (128 > 120) verl/utils/reward_score/prime_math/__init__.py:331:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:407:1: E402 Module level import not at top of file verl/utils/reward_score/prime_math/__init__.py:429:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading verl/utils/reward_score/prime_math/math_normalize.py:54:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:70:17: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:101:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:181:121: E501 Line too long (142 > 120) verl/utils/tokenizer.py:30:9: B028 No explicit `stacklevel` keyword argument found verl/utils/tokenizer.py:33:9: B028 No explicit `stacklevel` keyword argument found verl/utils/tokenizer.py:55:9: B028 No explicit `stacklevel` keyword argument found verl/utils/torch_functional.py:86:72: E741 Ambiguous variable name: `l` verl/utils/torch_functional.py:177:5: F841 Local variable `total_params` is assigned to but never used verl/utils/torch_functional.py:397:1: E402 Module level import not at top of file verl/utils/torch_functional.py:399:1: E402 Module level import not at top of file verl/utils/torch_functional.py:400:1: E402 Module level import not at top of file verl/utils/ulysses.py:246:5: F841 Local variable `sp_size` is assigned to but never used verl/workers/actor/dp_actor.py:244:13: F841 Local variable `response_mask` is assigned to but never used verl/workers/actor/megatron_actor.py:22:1: I001 [] Import block is un-sorted or un-formatted verl/workers/actor/megatron_actor.py:85:121: E501 Line too long (122 > 120) verl/workers/actor/megatron_actor.py:86:121: E501 Line too long (128 > 120) verl/workers/actor/megatron_actor.py:89:121: E501 Line too long (133 > 120) verl/workers/actor/megatron_actor.py:96:121: E501 Line too long (126 > 120) verl/workers/actor/megatron_actor.py:175:121: E501 Line too long (135 > 120) verl/workers/actor/megatron_actor.py:237:121: E501 Line too long (150 > 120) verl/workers/actor/megatron_actor.py:243:121: E501 Line too long (144 > 120) verl/workers/actor/megatron_actor.py:245:121: E501 Line too long (130 > 120) verl/workers/actor/megatron_actor.py:247:121: E501 Line too long (122 > 120) verl/workers/actor/megatron_actor.py:286:9: F841 Local variable `input_shapes` is assigned to but never used verl/workers/critic/dp_critic.py:227:21: F841 Local variable `input_ids` is assigned to but never used verl/workers/critic/dp_critic.py:230:21: F841 Local variable `position_ids` is assigned to but never used verl/workers/megatron_workers.py:18:1: I001 [*] Import block is un-sorted or un-formatted verl/workers/reward_manager/__init__.py:15:20: F401 `.batch.BatchRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:16:19: F401 `.dapo.DAPORewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:17:20: F401 `.naive.NaiveRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:18:20: F401 `.prime.PrimeRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/prime.py:61:121: E501 Line too long (217 > 120) verl/workers/reward_model/__init__.py:15:19: F401 `.base.BasePPORewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_model/megatron/__init__.py:15:27: F401 `.reward_model.MegatronRewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_model/megatron/reward_model.py:65:9: F841 Local variable `ori_bs` is assigned to but never used verl/workers/reward_model/megatron/reward_model.py:89:121: E501 Line too long (132 > 120) verl/workers/reward_model/megatron/reward_model.py:215:9: F841 Local variable `input_shapes` is assigned to but never used verl/workers/rollout/naive/__init__.py:15:28: F401 `.naive_rollout.NaiveRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/rollout/sglang_rollout/__init__.py:14:29: F401 `.sglang_rollout.SGLangRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:22:121: E501 Line too long (129 > 120) verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:51:121: E501 Line too long (157 > 120) verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:153:13: F841 Local variable `log_probs` is assigned to but never used verl/workers/rollout/vllm_rollout/vllm_rollout.py:22:121: E501 Line too long (129 > 120) verl/workers/rollout/vllm_rollout/vllm_rollout.py:60:121: E501 Line too long (157 > 120) verl/workers/sharding_manager/__init__.py:16:5: F401 `verl.utils.import_utils.is_megatron_core_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:17:5: F401 `verl.utils.import_utils.is_sglang_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:21:19: F401 `.base.BaseShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:22:27: F401 `.fsdp_ulysses.FSDPUlyssesShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:29:121: E501 Line too long (149 > 120) verl/workers/sharding_manager/__init__.py:32:121: E501 Line too long (126 > 120) verl/workers/sharding_manager/fsdp_sglang.py:99:9: F841 Local variable `load_format` is assigned to but never used verl/workers/sharding_manager/fsdp_sglang.py:123:121: E501 Line too long (178 > 120) verl/workers/sharding_manager/fsdp_ulysses.py:59:13: F841 Local variable `sp_size` is assigned to but never used Found 305 errors. ``` --------- Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>	2025-04-27 15:24:30 -07:00
Joel	aacd3660fc	[rollout] feat: introduce vLLM AsyncLLM to support multi-turn rollout (#1138 ) ### Summary Introduce vLLM AsyncLLM to support multi-turn rollout and #385 #398 #710 ### Architecture ![async_llm_arch](https://github.com/user-attachments/assets/e8cd974c-0c26-4d96-9a9e-b71fd85dd32d) New Components: - AsyncLLMWorker: standalone vllm server instance - FastAPI: provide OpenAI-compatible HTTP server - AsyncLLM: async LLMEngine for online serving, for more details: [AsyncLLM](https://github.com/vllm-project/vllm/pull/9826), [LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine) - ExternalRayDistributedExecutor: custom executor backend manages workers in worker group, it grabs corresponding workers by actor names - AsyncLLManager: manages a group of vllm server instances(AsyncLLMWorker) - AsyncLLM lifecycle: initialization, wake_up, sleep. - FastAPI service discovery - ChatScheduler: schedule multiple chat completion requests with multiple server instances - Least requests load balance - Sticky session with prefix caching - Chat completion callback: tools calling ### TODO - [x] AsyncLLM: intialization/wake_up/sleep - [x] OpenAI API: support `/v1/chat/completions` - [x] RayPPOTrainer integration: replace `generate_sequences` to http call `/v1/chat/completions` - [x] GSM8K e2e training - [ ] Add document --------- Co-authored-by: shengguangming <shengguangming@bytedance.com>	2025-04-25 17:56:34 +08:00
BearBiscuit	f315ac3b98	[misc] refactor moe bash (#1245 )	2025-04-24 22:46:47 +08:00
Blue Space	4081d8af1f	refactor example and test scripts to use megatron comm/comp overlap and checkpoint save (#1202 ) Examples megatron scripts are outdated.	2025-04-23 11:30:30 +08:00
Shawn/Yuxuan Tong	28e45cbde2	[Config] fix: disable XFORMERS by default since we immgrated to newer vLLM versions (#1178 )	2025-04-20 07:46:20 -07:00
Yan Bai	4fa7ed6c0d	[mcore] qwen2moe support (#1139 ) support qwen2moe structure to run with megatron-core including: * qwen2moe config converter * qwen2moe model initializer * refactor the online weight converter from mcore to vllm * qwen2moe online weight converter * qwen2moe offline weight conversion script from hf to mcore * a script to run training qwen1.5moe_a2.7b with 4 nodes TODO add option to freeze the MoE router weight during training	2025-04-20 12:48:46 +08:00
Shawn/Yuxuan Tong	b00f77d855	[dev] feat: immigrate from yapf & pylint to ruff based on pre-commit (#1010 ) > [!WARNING] > We are [immigrating to `ruff` as the linter and formatter and `pre-commit` as the managing tool](https://github.com/volcengine/verl/pull/1010). > > If your branch is based on a previous commit using `yapf` and `pylint`, simply merging might trigger overwhelming linting errors, while you are only expected to resolve ones in the files related to your PR. > > To resolve this issue, please try the following workaround to only include the files you really changed in the PR: > > 1. In your branch, fix linting and format with `ruff`: `ruff check --fix && ruff-format` > 2. Squash into a single commit in a new branch: `git reset --soft $(git merge-base main HEAD) && git add -A && git commit -m "feat: ..."` > 3. Merge with the latest main: `git merge origin/main` > 4. Force push to your branch: `git push --force` We add the reminder above to the documentation to tell contributors how to avoid overwhelming linting errors. ### Motivation According to dicussion in #896, this PR immigrates from yapf & pylint to ruff based on pre-commit, which allows unified version control and automatic hook on committing. ### Summary The `pre-commit` hook and CI - checks staged / committed files in commits / PR's - checks all files each month (This should fail before we fix all the files by the ruff standard) ### Explanation for the Failing CI Workflow `pre-commit` For now, we only apply `ruff format` and `ruff check --fix` without resolving all the errors, since there are too many errors to resolve, which causes the CI workflow `pre-commit` fails. For resolving the remaining errors, we leave to future commits. Specifically, the `pre-commit` hook and CI will require every commit to fix its related files with `ruff`, which will fix all the files incrementally. ### Reviewing Suggestion The commit `3d93f51ba8` is huge since we apply `ruff` to all the files. To review the main changes, please check the commits before and after it.	2025-04-18 07:49:31 -07:00
Junrong Lin	7fc8330d99	[sglang] feat: SGLang rollout multinode support (#915 ) Allow multinode tensor parallel for furture plan --------- Co-authored-by: zobinHuang <zobin1999@gmail.com> Co-authored-by: Jin Pan <jpan236@wisc.edu>	2025-04-05 20:17:35 -07:00
Lumeng Wu	072fc9feed	feat: support no reference model; fix KL issues (#644 ) ### Before get started Difference between KL penalty in reward and KL loss > [!TIP] > > 1. In-reward KL penalty > > > $$ > r_t = r_{\varphi}(q, o_{\leq t}) - \beta\ \boxed{\log \frac{\pi_{\theta}(o_t \| q, o_{<t})}{\pi_{\text{ref}}(o_t \| q, o_{<t})}} > $$ > > 2. KL Loss > > $$ > L^{\text{PPO}}(\theta) = \mathbb{E}_t [ \min(ratio_t A_t, \text{clip}(ratio_t, 1 - \epsilon, 1 + \epsilon) A_t) ] > $$ > > $$ > \- \beta\ \boxed{D_{\text{KL}}(\pi_{\theta} \|\| \pi_{\text{ref}})} > $$ ### Problems 1. The current code doesn't support not using reference model This feature is half-implemented since the very first commit but never completed, e.g., `RayPPOTrainer` has an attribute `use_reference_policy` but it's always True since role_worker_mapping always has `Role.RefPolicy`. 2. Restriction of `use_kl_loss` Currently, `use_kl_loss` determines whether to use in-reward kl penalty or kl loss. So we can not use both or neither. `87a813658f/verl/trainer/ppo/ray_trainer.py (L875-L879)` `87a813658f/verl/workers/actor/dp_actor.py (L299-L307)` > [!CAUTION] > > ### You may have unintentionally adopted in-reward KL penalty > > For the experiments you've conducted, if you set `actor.use_kl_loss`=False or didn't set it (Default is False),*You unintentionally used in-reward KL penalty.* If you don't want any KL, you should set `actor_rollout_ref.actor.use_kl_loss=False` and `algorithm.use_kl_in_reward=False` (or not to set them because they are the default config) after this commit. 3. Deprecated config After investigation, I guess Critic may used to be responsible for in-reward KL. But this feature seems paralyzed. 1. Line 290, there may used to be `config.algorithm.kl_ctrl.target_kl` and `config.critic.kl_ctrl.horizon` , which are not supported currently. `3ec83117c3/verl/trainer/ppo/ray_trainer.py (L289-L293)` 2. In `verl/workers/critic/megatron_critic.py` : redundant set of `self.kl_ctrl` `3b18b0eb74/verl/workers/critic/megatron_critic.py (L69-L73)` ### What’s Changed? 1. Add support for not using reference model 2. Fixed the incomplete code of the KL controller. 3. A test case for using both kl terms 4. Some other misc issues in the code. ### How to disable reference model * set `actor_rollout_ref.actor.use_kl_loss=False` and `algorithm.use_kl_in_reward=False` (They are by default False, so you can simply not set them)	2025-04-01 10:14:38 +08:00
Shawn/Yuxuan Tong	64bddb68f5	[BREAKING config] fix: move val_before_train to config yaml. Using trainer.val_before_train instead of +trainer.val_before_train going forward (#820 )	2025-03-30 23:05:48 -07:00
Blue Space	c34206925e	Support for GRPO with Megatron backend (#592 ) Support for GRPO with Megatron backend and fix a configuration bug when not using virtual pipeline. Calibrated with FSDP backend.	2025-03-14 23:39:20 +08:00
Guangming Sheng	386cfabed2	[misc] feat: make filter long prompt an option (#506 ) # Background In RLHFDataset, we filter out prompts that are too long. This requires apply_chat_template to the whole dataset, which is not scalable when the dataset is large. https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L132 Instead of performing filtering online, we probably want to move this process offline and add an assertion to avoid truncation or simply perform truncation Reference: #502 # Key Changes - Add an option `data.filter_overlong_prompts=True \` to enable the above data filtering. The default value is set to False, but we enable it for all the example scripts. - Add an option `data.truncation` to truncate the input_ids or prompt length if they exceed max_prompt_length. The default is 'error', which does not allow the max_prompt_length to be exceeded. The users should increase the max_prompt_length if throwing the error. You can also set `left` and `right`. ### Suggestion for large-scale dataset. For large-scale datasets, filtering overlong prompts could be time-consuming. You should set `data.filtering_overlong_prompts=False` and set `truncation='left'`. Also, please note that you should increase `data.max_prompt_length` to avoid over-truncation of the prompts.	2025-03-07 19:27:25 +08:00
Blue Space	35555d8ae9	Verl's megatron core_r0.11.0 backend successfully tested with 3D parallelism with multiple bug fixed (#495 ) This PR combines multiple modifications. # QWen2.5 checkpoint saver bug fix Thanks for the efforts @uygnef contributed to #368 , we use the new saver for model loader and saver for 3D parallelism support. # Megatron backend 3D-parallelism test benches We modify the scripts in `examples/ppo_trainer` and `tests/e2e`, as well as the CI workflows, all tested. # Bug Fix for 3D-parallelism Including configuration bugs as well as the module packing. Original TP VocabParallelEntropy can lead to CUDA OOM, we refactor the implementation with `torch.bmm`. # Fully migration to Megatron Core Now we only use Megatron core in verl, fully get rid of calling other components. If they are in need, please integrate them into `utils/megatron`. --------- Co-authored-by: uygnef <admin@fengyu.org>	2025-03-07 13:38:58 +08:00
Hong Zhang	7a5e9496bd	support speed up downloading model from modelscope (#463 ) Add support for downloading models from modelscope by setting `VERL_USE_MODELSCOPE=True` --------- Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>	2025-03-05 16:13:56 +08:00
Shawn/Yuxuan Tong	4011f407b0	[Fix] Deprecate `val_batch_size` (#353 ) Validation datasets are sent to inference engines as a whole batch, which will schedule the memory themselves. - [x] Remove `val_batch_size` from examples - [x] Set default values of `val_batch_size` in configs as `null` and add DEPRECATED comments - [x] Add deprecation warnings about `val_batch_size` in `_validate_config`	2025-02-24 10:24:24 +08:00
Kinman Lei	9448762515	[megatron] feat: support qwen2 megatron backend (#261 ) Support Qwen2 Megatron backend The code is primarily adapted from the llama folder, with modifications to use QKV bias and remove the rope_scaling of RoPE in `verl/models/qwen2/megatron/layers/parallel_attention.py`. - Train using Qwen2-7B-Instruct with PPO, GSM8k score can reach 0.87 at step 75. - not support saver now	2025-02-19 22:21:14 +08:00
HL	77f065ea9d	example: fix the gemma2 example, update NGC dockerfile (#291 )	2025-02-18 11:09:50 +08:00
Guangming Sheng	9db52329f6	[misc] feat: support offload parameter and optimizer during rollout (#284 ) - Fixed FSDP1 model offload - With `actor_rollout_ref.actor.fsdp_config.param_offload=True \` and `actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \ `. The GPU memory utilization can increase to 0.9 - With actor, critic and reference offload all enabled, there will only be one model copy at a time in the GPU memory. Therefore, we can further increase the `micro_batch_size_per_gpu` or `max_token_per_gpu` Specifically: - During rollout, only rollout model and KVCache are in the GPU memory. - During critic compute values, only the critic model will stay in the GPU memory while its optimizer and other model states are in CPU main memory - During actor update, the actor model, optimizer are stored on GPU while the reference model and critic model, critic optimizer are offloaded to CPU.	2025-02-17 14:07:43 +08:00
HL	ced8ecbf39	example: switch the default model ckpt for Megatron, add wandb logs (#210 ) use the general purpose LLM for the math task instead of code LLM. --------- Co-authored-by: Your Name <you@example.com>	2025-02-05 19:53:59 -08:00
HL	818e4de2fb	megatron: fix config error and add compute log prob interface (#186 )	2025-02-02 18:32:59 -08:00
HL	677e120afa	data: fix the math dataset source (#175 ) since 'lighteval/MATH' is no longer available on huggingface.	2025-02-01 10:02:23 -08:00

1 2

65 Commits