mirror of
https://github.com/volcengine/verl.git
synced 2025-10-20 21:53:50 +08:00
main
49 Commits
Author | SHA1 | Message | Date
---|---|---|---
60d8a62c0d |
[data, trainer] feat: add support for limiting samples from dataset (#3812)
### What does this PR do? For `RLHFDataset`, `filter_overlong_prompts` can be very expensive, so it is useful to limit the sample size before that step when the dataset is very large. This PR adds support for limiting the number of samples taken from a dataset, and extends the same option to the other dataset kinds for unification. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) |
|||
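The idea behind #3812 can be sketched as follows. This is an illustrative sketch only: the function and parameter names (`limit_then_filter`, `max_samples`, `max_prompt_len`) are hypothetical and are not verl's actual API; the point is that truncating the dataset first makes the expensive filter run on far fewer rows.

```python
def limit_then_filter(samples, max_samples=None, max_prompt_len=512):
    """Cap the dataset size first, then run the costly per-sample filter."""
    if max_samples is not None:
        # Truncate before filtering, so the expensive check below only
        # touches max_samples rows instead of the whole dataset.
        samples = samples[:max_samples]
    return [s for s in samples if len(s["prompt"]) <= max_prompt_len]


# Toy dataset: prompts of varying lengths (two are "overlong").
data = [{"prompt": "x" * n} for n in (10, 600, 20, 700, 30)]
subset = limit_then_filter(data, max_samples=3)
```

With `max_samples=3`, only the first three rows are considered, and the overlong second row is then dropped by the filter.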
65eb019a81 |
[trainer] fix: Add data.seed to config (#3815)
|
|||
ae5d8504d4 |
[trainer] feat: ReMax support using reward model for baseline (#3780)
### What does this PR do? Beyond reward functions, ReMax should also be able to use a reward model (RM) to compute the reward baseline; this PR adds that support. Signed-off-by: Hollow Man <hollowman@opensuse.org> |
|||
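The ReMax baseline extended in #3780 can be sketched in a few lines. This is a hedged illustration, not verl's implementation: in ReMax the baseline is the reward of a greedy rollout, and with this change that reward may come from a reward model score as well as a reward function. `score` values below are stand-ins.

```python
def remax_advantages(sampled_rewards, greedy_reward):
    """ReMax advantage: sampled-rollout reward minus the greedy-rollout baseline.

    greedy_reward may be produced by a reward function or, per #3780,
    by a reward model scoring the greedy response.
    """
    return [r - greedy_reward for r in sampled_rewards]


# Three sampled rollouts scored against a greedy baseline of 0.5.
adv = remax_advantages([1.0, 0.2, 0.8], greedy_reward=0.5)
```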
7f27789961 |
[fsdp,doc] refactor: rename warmup_style@FSDPOptimizerConfig -> lr_scheduler_type (#3739)
### What does this PR do? Rename `warmup_style` in `FSDPOptimizerConfig` to `lr_scheduler_type` to align with the Hugging Face Trainer API. The optimizer refactor in https://github.com/volcengine/verl/pull/3656 left this naming issue in place. --------- Co-authored-by: weiqi.li <weiqi.li@bytedance.com> |
|||
f346f96d29 |
[training_utils] fix: stop using math naming under reward score (#3378)
|
|||
6469be213e |
[recipe] fix: make compute of step consistent across all trainers (#3132)
### What does this PR do? Follow-up to #3117: make the computation of the training step consistent across all trainers. |
|||
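One way to keep the step count consistent across trainers (the concern behind #3132) is to derive the global step from the epoch and batch index, rather than incrementing a counter ad hoc in each trainer loop. A minimal illustrative sketch, not verl's code:

```python
def global_step(epoch, batch_idx, steps_per_epoch):
    """1-based global training step, identical however the loop is written."""
    return epoch * steps_per_epoch + batch_idx + 1


# Two epochs of four batches each yield steps 1..8 in every trainer.
steps = [global_step(e, b, steps_per_epoch=4) for e in range(2) for b in range(4)]
```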
2bbd09245c |
[ray] feat: add support for ray init kwargs (#3049)
### What does this PR do? This PR adds support for passing parameters to `ray.init`. Users can now dynamically configure settings such as `address`, `port`, `_temp_dir`, and more based on their specific needs. ### Test ```bash # when /tmp/ray/ is used by others # when ray is initialized at 6379 by others # when the dashboard is not accessible at localhost # ... bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh \ +ray_kwargs.ray_init._temp_dir=/tmp/ray/my_dir \ +ray_kwargs.ray_init.address=127.0.0.1:6378 \ +ray_kwargs.ray_init.dashboard_host=0.0.0.0 ``` |
|||
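The pattern #3049 enables can be sketched as merging user-supplied overrides over defaults before calling `ray.init(**kwargs)`. The helper name and the default values below are assumptions for illustration, not verl code; the override keys mirror the test command above.

```python
def build_ray_init_kwargs(user_kwargs=None):
    """Merge user-configured ray.init overrides over assumed defaults."""
    defaults = {"ignore_reinit_error": True}  # assumed default, for illustration
    merged = dict(defaults)
    merged.update(user_kwargs or {})
    return merged  # would then be passed as ray.init(**merged)


kwargs = build_ray_init_kwargs(
    {"_temp_dir": "/tmp/ray/my_dir", "address": "127.0.0.1:6378"}
)
```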
c0f99f3da2 |
[BREAKING] [ray, megatron] feat: remove RayMegatronWorker (#2895)
### What does this PR do? Following https://github.com/volcengine/verl/pull/2893, dispatch and collect functions can now be registered directly inside the worker, so there is no need to maintain `RayMegatronWorker` and `RayMegatronWorkerGroup`, which were a workaround. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> |
|||
d640f99219 | [recipe] fix: fix issue when running split ppo (#2745) | |||
7459131411 |
[hardware] refactor: replace device_name with config.trainer.device (#2542)
### What does this PR do? In some methods the `get_device()` call is redundant; this PR replaces `get_device()` with `config.trainer.device`. --------- Co-authored-by: H <linhaibin.eric@gmail.com> |
|||
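The refactor amounts to reading the device once from configuration rather than re-deriving it in many places. A minimal sketch, assuming a dict-like config; the fallback value and function name are assumptions, not verl's API:

```python
def resolve_device(config):
    """Read the target device from config.trainer.device, with an assumed fallback."""
    return config.get("trainer", {}).get("device", "cuda")


device = resolve_device({"trainer": {"device": "npu"}})
```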
a31a8f251f |
[doc] fix: quickstart example can't work on zsh (#2509)
### What does this PR do? I followed the instructions at https://verl.readthedocs.io/en/latest/start/quickstart.html to run the PPO example on my devbox, which uses zsh. However, I got the error `zsh: no matches found: trainer.logger=[console]`, because `[]` is interpreted as a glob pattern in zsh. ``` (verl) ➜ verl git:(20250713-devbox-2-tmux0-verl-2) ✗ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ data.train_files=$HOME/data/gsm8k/train.parquet \ data.val_files=$HOME/data/gsm8k/test.parquet \ data.train_batch_size=256 \ data.max_prompt_length=512 \ data.max_response_length=256 \ actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.actor.ppo_mini_batch_size=64 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \ actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \ critic.optim.lr=1e-5 \ critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \ critic.ppo_micro_batch_size_per_gpu=4 \ algorithm.kl_ctrl.kl_coef=0.001 \ trainer.logger=['console'] \ trainer.val_before_train=False \ trainer.n_gpus_per_node=1 \ trainer.nnodes=1 \ trainer.save_freq=10 \ trainer.test_freq=10 \ trainer.total_epochs=15 2>&1 | tee verl_demo.log zsh: no matches found: trainer.logger=[console] ``` This PR has 3 changes: * `trainer.logger=['console']` -> `trainer.logger=console` * `trainer.logger=['console','wandb']` -> `trainer.logger='["console","wandb"]'` * `trainer.logger=['console','tensorboard']` -> `trainer.logger='["console","tensorboard"]'` ### Test The same quickstart command was rerun with the updated syntax; all four combinations start training successfully (screenshots and full logs are attached to the PR): * `trainer.logger=console` (zsh) * `trainer.logger='["console","wandb"]'` (zsh) * `trainer.logger=console` (bash) * `trainer.logger='["console","wandb"]'` (bash) ### API and Usage Example No ### Design & Code Changes No --------- Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com> |
|||
4aa02fe166 |
[trainer] fix: Allow FSDP2 when doing strategy check (#2497)
### What does this PR do? Allow FSDP2 when doing the strategy check: the `strategy` field now accepts both "fsdp" and "fsdp2" as valid values. Signed-off-by: Hollow Man <hollowman@opensuse.org> |
|||
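The loosened check from #2497 can be sketched as simple set membership; the function and set names below are illustrative, not verl's actual validation code:

```python
# Both FSDP variants are now considered valid strategy values.
VALID_FSDP_STRATEGIES = {"fsdp", "fsdp2"}


def is_valid_fsdp_strategy(strategy):
    """Return True iff the strategy string names a supported FSDP variant."""
    return strategy in VALID_FSDP_STRATEGIES
```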
00a10a8ef3 |
[ci] refactor: reduce ruff line-length from 300 to 120 (#2287)
### What does this PR do? Previously the ruff line-length was too large (300), making code hard to read; keeping that config would also let manually created short lines be auto-formatted into long ones. This PR contains 3 commits: - df4bbfca62f41d972c48c8a76088ae2ac29691cf set line length to 120 and run pre-commit auto-format - 9d03f183edd9fff4e22215cacacf62c06b7b41d3 let devin fix the multi-line code - 9fc8d436f5007535fad3dc49983b01d0d457be9c skip lint for test_sglang_async_rollout_sf_tools.py and manually adjust the format of rope_utils.py - last two commits: merge with main, run lint after the merge, and add test_sglang_async_rollout_sf_tools.py and scripts/legacy_model_merger.py to lint.exclude ### Test This PR relies on CI for testing. --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> |
|||
86ef66ebe6 | [trainer] fix: fix split placement (#2227) | |||
ff750e2472 |
[trainer] fix: indentation error leading to critic_output.get() failure (#2143)
### What does this PR do? This PR addresses an `IndentationError` that was causing the `critic_output.get()` call to fail when `self.use_critic` was false. ### Checklist Before Starting - [x] Search for similar PRs. [The PR cause the problem](https://github.com/volcengine/verl/pull/281) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > None. This is just a simple bug fix involving a few lines of code. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > This is just a simple bug fix involving a few lines of code. ### Specific Changes > This is just a simple bug fix involving a few lines of code. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). 
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). |
|||
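The bug class fixed in this commit is easy to reproduce in miniature. A hedged sketch (illustrative names, not verl's actual trainer code) of why the indentation of the `critic_output` access matters:

```python
def collect_metrics(use_critic: bool) -> dict:
    """Gather training metrics; critic metrics exist only when a critic runs."""
    metrics = {}
    critic_output = None
    if use_critic:
        critic_output = {"metrics": {"critic/vf_loss": 0.1}}
        # Correctly indented: critic_output is only read while the guard holds.
        metrics.update(critic_output.get("metrics", {}))
    # The buggy version had the update() call dedented to this level, so it
    # ran with critic_output=None when use_critic was False and crashed.
    return metrics
```

With the call back inside the guard, the no-critic path returns cleanly instead of dereferencing a missing output.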
68d62518ce |
[misc] fix: fix timer import error in split_placement (#2169)
### What does this PR do? Fix a timer import error in split_placement: the code should use `from verl.trainer.ppo.ray_trainer import marked_timer`, but currently uses `from verl.trainer.ppo.ray_trainer import _timer`. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Not related. ### API and Usage Example Not related. ### High-Level Design Not related. ### Specific Changes Fix the timer import in split_placement: use `from verl.trainer.ppo.ray_trainer import marked_timer` instead of the current `import _timer`. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. 
If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). |
|||
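For context, `marked_timer` records the duration of a named block into a shared timing dict. A minimal stand-in with the behavior assumed here (verl's real implementation may differ in detail):

```python
import time
from contextlib import contextmanager


@contextmanager
def marked_timer(name: str, timing_raw: dict):
    # Record the wall-clock duration of the wrapped block under `name`.
    start = time.perf_counter()
    try:
        yield
    finally:
        timing_raw[name] = time.perf_counter() - start


timing_raw = {}
with marked_timer("gen", timing_raw):
    time.sleep(0.01)  # stands in for a rollout / generation step
```

Importing the private `_timer` name instead of `marked_timer` fails once the helper is renamed, which is exactly the break this commit repairs.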
ca65c363fb |
[hardware] refactor: refactor part of device management (#1974)
### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [x] In format of: [modules] type: Title - [x] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [x] type is in `feat, fix, refactor, chore` - [x] can involve multiple modules, separated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Refactor device management, such as the `torch.cuda` and `nccl` usage, across most of the code in `verl/recipe` and `verl/verl`, which makes it more convenient to support other devices or platforms. ### Test Not related. ### High-Level Design Not related. ### Specific Changes 1. Use `get_torch_device()` to get the corresponding `torch.device()` object for the specific device. 2. Use `get_device_id()` to get the corresponding device rank index for the specific device. 3. Use `get_nccl_backend()` to get the corresponding nccl backend for the specific device. ### API Not related. ### Usage Example Modifications in this PR should not be perceptible to users. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that cover the code path. |
|||
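The dispatch pattern behind these helpers can be sketched as follows. This is a simplified illustration with assumed platform names (`npu`, `hccl`), not verl's actual utility code:

```python
from typing import Optional


def detect_device() -> str:
    """Return the active accelerator name, falling back to CPU."""
    try:
        import torch

        if torch.cuda.is_available():
            return "cuda"
        # Ascend NPU support is assumed to follow the same is_available() shape.
        if getattr(torch, "npu", None) is not None and torch.npu.is_available():
            return "npu"
    except ImportError:
        pass
    return "cpu"


def get_nccl_backend(device: Optional[str] = None) -> str:
    # NCCL only exists for CUDA; other platforms bring their own collective
    # backend (HCCL on NPU, gloo as the CPU fallback).
    backends = {"cuda": "nccl", "npu": "hccl"}
    return backends.get(device or detect_device(), "gloo")
```

Centralizing the lookup means call sites never hard-code `torch.cuda` or `"nccl"`, so adding a platform touches one module instead of the whole codebase.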
0bd03d7c05 |
[FSDP] feat: Add FSDP forward prefetch and recompute chunking entropy (#1927)
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? 1. Add fsdp1 forward prefetch configuration. 2. Add chunked entropy computation. 3. Add torch.checkpoint to the entropy computation. 4. Move data to device from `ActorRolloutRefWorker.update_actor` to `DataParallelPPOActor.update_policy`. 5. Add `npu_cross_entropy_loss` fusion kernel. ### High-Level Design 1. For more detail, see [FSDP forward_prefetch](https://docs.pytorch.org/docs/stable/fsdp.html#module-torch.distributed.fsdp) 2. `logits` is usually a large tensor [bsz\*seq_len, voc]; `compute_entropy_from_logits` will use [bsz\*seq_len, voc] * (4 (float32) + 2 (autocast of softmax+logsumexp) + 1 (output of softmax)) memory. To reduce this memory peak, we can compute in chunks, changing [bsz*seq_len, voc] to [chunk_size(2048), voc]. 3. During the training phase, `enable_gradient_checkpointing=True` does not apply to the entropy calculation, so this PR adds entropy recomputation to reduce the memory peak during training. 4. In `ActorRolloutRefWorker.update_actor`, all batch data is moved to the device, but this is unnecessary: `DataParallelPPOActor.update_policy` moves the data to the device for each micro batch. ### Specific Changes > List the specific changes. ### API Add 3 new configurations in actor/ref, 1 new configuration in critic/reward. - actor_rollout_ref.actor.fsdp_config.forward_prefetch: False - actor_rollout_ref.actor.entropy_from_logits_with_chunking: False - actor_rollout_ref.actor.entropy_checkpointing: False - actor_rollout_ref.ref.fsdp_config.forward_prefetch: False - actor_rollout_ref.ref.entropy_from_logits_with_chunking: False - actor_rollout_ref.ref.entropy_checkpointing: False - critic.model.fsdp_config.forward_prefetch: False - reward_model.model.fsdp_config.forward_prefetch: False ### Usage Example > Provide usage example(s) for easier usage. 
```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that cover the code path. |
|||
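The chunked entropy calculation from point 2 can be illustrated in pure Python. verl's version operates on GPU tensors; this sketch only shows the row-block loop that keeps softmax intermediates from materializing for the whole batch at once:

```python
import math


def entropy_from_logits(row):
    # Per-row entropy, computed stably: H = logsumexp(z) - sum(softmax(z) * z).
    m = max(row)
    exps = [math.exp(z - m) for z in row]
    s = sum(exps)
    lse = m + math.log(s)
    return lse - sum((e / s) * z for e, z in zip(exps, row))


def entropy_chunked(logits, chunk_size=2048):
    # Walk the [bsz*seq_len, voc] rows in blocks of chunk_size, so the softmax
    # intermediates only ever exist for one [chunk_size, voc] block at a time.
    out = []
    for i in range(0, len(logits), chunk_size):
        out.extend(entropy_from_logits(row) for row in logits[i : i + chunk_size])
    return out
```

The result is identical to the unchunked computation; only the peak working-set size changes, which is the whole point of the config flag.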
70bd3d3d6b | [feat] Wandb Timing: Add more detailed timing of gen_sequence and weights resharding (#1834) | |||
263115cd9d |
[dev] fix: note that DP balancing doesn't affect advantage calculation (#1809)
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR fixes the comments about DP balancing. Additionally, it adds the DP balancing option to the PRIME trainer, while keeping the default value as `False`. ### Additional Info. - **Issue Number**: #1718 - **Training**: none - **Inference**: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. |
|||
2aed8d0a45 |
[BREAKING] config: set the default value of actor.entropy_coeff to 0 (#1770)
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? entropy_coeff shall be set carefully during RL. When enabled, an inappropriate coefficient may cause training to collapse. You can see more empirical experiments in the Skywork Open Reasoner 1 Technical Report (https://arxiv.org/pdf/2505.22312). In this PR, the default value of entropy_coeff is set to 0. This is a breaking change that may affect your experiment, although the majority of verl example scripts already set it to 0 manually. We let most example scripts just pick up the default value of 0 for entropy_coeff. For the few documentation pages where the reference model performance and commands are provided, we modify the docs so that the experiment results are consistent with the config setup. ### Usage Example To enable the entropy loss coefficient, use ```bash actor_rollout_ref.actor.entropy_coeff=0.001 # or other values ``` ### Test > For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary. |
|||
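Schematically, the coefficient scales an entropy bonus subtracted from the policy-gradient loss, which is why a poorly chosen value can destabilize training. A simplified sketch (not verl's exact loss code):

```python
def actor_loss(pg_loss: float, entropy: float, entropy_coeff: float = 0.0) -> float:
    # Subtracting the entropy bonus rewards a more uniform policy; with the
    # new default entropy_coeff=0.0, the bonus vanishes and the loss reduces
    # to the policy-gradient term alone.
    return pg_loss - entropy_coeff * entropy
```

Setting `actor_rollout_ref.actor.entropy_coeff=0.001` restores the pre-change behavior of a small exploration bonus.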
8653b1b200 | [misc] feat: support return full prompt with chat template in RLHFDataset (#1567) | |||
c3b20575d2 |
[util] docs: add docstrings to metric util functions that recipes reuse (#1395)
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? In `/recipes`, a few functions under `trainer/ppo/metric_utils` are imported and reused. Right now many of them are task-dependent and assume specific keys in the input metric dict. To make these functions more robust and backward compatible, a few tests are added. Additionally, one method is moved to verl.utils as a public API due to its general-purpose nature. An API doc page is added correspondingly. In order to make it easy for others to customize verl trainers, many other classes require further documentation, such as: - AdvantageEstimator, RayPPOTrainer, apply_kl_penalty, compute_advantage - from verl.single_controller.ray import RayWorkerGroup - from verl.trainer.ppo.core_algos import agg_loss - from verl.trainer.ppo.ray_trainer import ResourcePoolManager, Role, WorkerType - from verl.utils.checkpoint.checkpoint_manager import find_latest_ckpt_path They shall be enhanced in future PRs. ### High-Level Design None ### Specific Changes - added tests - added verl.utils.metric namespace ### API `verl.trainer.ppo.metric_utils.reduce_metrics` changed to `verl.utils.metric.reduce_metrics`. Deprecation warnings are added. ### Usage Example None ### Test Added ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. https://github.com/volcengine/verl/issues/1354 - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. 
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Co-authored-by: openhands <openhands@all-hands.dev> |
|||
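As a rough picture of what the relocated helper does: `reduce_metrics` collapses lists of per-step metric values into scalars. The mean-reduction below is an assumption for illustration, not a copy of verl's implementation:

```python
def reduce_metrics(metrics: dict) -> dict:
    # Collapse each list of per-step values into its mean, so downstream
    # loggers receive one scalar per metric key.
    return {key: sum(vals) / len(vals) for key, vals in metrics.items()}
```

After this PR the canonical import is `verl.utils.metric.reduce_metrics`; the old `verl.trainer.ppo.metric_utils` path still works but emits a deprecation warning.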
8e5ad4688a |
[Lint] fix: linting errors in all files (#1280)
This PR enables checking on all files after fixing all the errors: ``` examples/data_preprocess/geo3k.py:41:121: E501 Line too long (121 > 120) examples/data_preprocess/multiturn.py:54:121: E501 Line too long (185 > 120) examples/data_preprocess/multiturn.py:59:121: E501 Line too long (210 > 120) examples/data_preprocess/multiturn.py:73:121: E501 Line too long (229 > 120) examples/data_preprocess/multiturn.py:78:121: E501 Line too long (211 > 120) examples/ray/tutorial.ipynb:cell 9:1:121: E501 Line too long (179 > 120) examples/ray/tutorial.ipynb:cell 15:1:121: E501 Line too long (143 > 120) examples/ray/tutorial.ipynb:cell 42:14:1: E402 Module level import not at top of cell recipe/prime/prime_dp_rm.py:145:121: E501 Line too long (153 > 120) recipe/prime/prime_dp_rm.py:156:121: E501 Line too long (137 > 120) recipe/prime/prime_dp_rm.py:292:121: E501 Line too long (148 > 120) recipe/r1/data_process.py:56:121: E501 Line too long (289 > 120) recipe/r1/data_process.py:113:121: E501 Line too long (166 > 120) recipe/r1/data_process.py:118:121: E501 Line too long (137 > 120) recipe/r1/data_process.py:123:121: E501 Line too long (297 > 120) recipe/r1/data_process.py:131:9: E722 Do not use bare `except` recipe/r1/tasks/livecodebench.py:61:5: E722 Do not use bare `except` scripts/diagnose.py:55:9: F841 Local variable `ip` is assigned to but never used scripts/diagnose.py:165:13: B028 No explicit `stacklevel` keyword argument found scripts/model_merger.py:42:121: E501 Line too long (184 > 120) scripts/model_merger.py:146:13: E722 Do not use bare `except` tests/e2e/arithmetic_sequence/model/create_model_tokenizer.py:28:121: E501 Line too long (440 > 120) tests/gpu_utility/test_memory_buffers.py:42:5: F841 Local variable `model_named_params` is assigned to but never used tests/gpu_utility/test_memory_buffers.py:43:5: F841 Local variable `model_copy_named_params` is assigned to but never used tests/gpu_utility/test_memory_buffers.py:53:5: F841 Local variable `model_wrapper` is 
assigned to but never used tests/model/test_transformers_ulysses.py:102:5: F841 Local variable `response_length` is assigned to but never used tests/model/test_transformers_ulysses.py:181:5: F841 Local variable `response_length` is assigned to but never used tests/ray/detached_worker/server.py:83:13: F841 Local variable `vpp_rank` is assigned to but never used tests/ray/test_check_worker_alive.py:37:121: E501 Line too long (121 > 120) tests/rollout/run_fsdp_vllm.py:22:64: F811 Redefinition of unused `ShardingStrategy` from line 20 tests/rollout/test_sglang_spmd.py:210:121: E501 Line too long (157 > 120) tests/rollout/test_vllm_spmd.py:20:64: F811 Redefinition of unused `ShardingStrategy` from line 18 tests/sandbox/test_sandbox.py:86:121: E501 Line too long (1615 > 120) tests/sandbox/test_sandbox.py:87:121: E501 Line too long (1596 > 120) tests/sanity/check_license.py:22:1: E402 Module level import not at top of file tests/sanity/check_license.py:23:1: E402 Module level import not at top of file tests/verl/utils/dataset/test_rl_dataset.py:23:5: F841 Local variable `url` is assigned to but never used tests/verl/utils/dataset/test_rm_dataset.py:22:5: F841 Local variable `url` is assigned to but never used tests/verl/utils/dataset/test_rm_dataset.py:36:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks tests/verl/utils/dataset/test_sft_dataset.py:22:5: F841 Local variable `url` is assigned to but never used tests/verl/utils/dataset/test_sft_dataset.py:50:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks tests/verl/utils/dataset/test_sft_dataset.py:75:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks verl/__init__.py:22:1: E402 Module level import not at top of file verl/__init__.py:24:1: E402 Module level import not at top of file verl/__init__.py:25:1: E402 Module level import not at top of file verl/__init__.py:29:1: E402 Module level import 
not at top of file verl/__init__.py:29:15: F401 `.single_controller` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:16:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:18:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:20:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:21:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:22:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:24:5: F401 `.modeling_llama_megatron.ParallelLlamaModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/checkpoint_utils/llama_loader.py:92:121: E501 Line too long (168 > 120) verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:92:121: E501 Line too long (168 > 120) verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:274:121: E501 Line too long (127 > 120) verl/models/llama/megatron/checkpoint_utils/llama_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/llama/megatron/checkpoint_utils/llama_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/llama/megatron/checkpoint_utils/llama_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but 
never used verl/models/llama/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelLlamaAttention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelLlamaDecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelLlamaDecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelLlamaMLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelLlamaRMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/parallel_attention.py:196:121: E501 Line too long (134 > 120) verl/models/llama/megatron/layers/parallel_attention.py:341:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:342:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:343:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:366:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:420:121: E501 Line too long (122 > 120) verl/models/llama/megatron/layers/parallel_linear.py:82:1: E402 Module level import not at top of file verl/models/mcore/loader.py:273:121: E501 Line too long (134 > 120) verl/models/mcore/util.py:26:121: E501 Line too long (202 > 120) verl/models/qwen2/megatron/__init__.py:16:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant 
alias verl/models/qwen2/megatron/__init__.py:18:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:20:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:21:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:22:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:24:5: F401 `.modeling_qwen2_megatron.ParallelQwen2Model` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:90:121: E501 Line too long (169 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:256:121: E501 Line too long (172 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:90:121: E501 Line too long (169 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:272:121: E501 Line too long (127 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/qwen2/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelQwen2Attention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias 
verl/models/qwen2/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelQwen2DecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelQwen2DecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelQwen2MLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelQwen2RMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/parallel_attention.py:163:121: E501 Line too long (134 > 120) verl/models/qwen2/megatron/layers/parallel_attention.py:282:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:283:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:284:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:307:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:361:121: E501 Line too long (122 > 120) verl/models/qwen2/megatron/modeling_qwen2_megatron.py:630:121: E501 Line too long (130 > 120) verl/models/transformers/llama.py:106:121: E501 Line too long (180 > 120) verl/models/transformers/llama.py:214:121: E501 Line too long (128 > 120) verl/models/transformers/llama.py:215:121: E501 Line too long (135 > 120) verl/models/transformers/monkey_patch.py:145:1: E402 Module level import not at top of file verl/models/transformers/monkey_patch.py:146:1: E402 Module level import not at top of file verl/models/transformers/monkey_patch.py:148:1: E402 Module level import not at top of file 
verl/models/transformers/monkey_patch.py:157:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling verl/models/transformers/qwen2.py:215:121: E501 Line too long (128 > 120) verl/models/transformers/qwen2.py:216:121: E501 Line too long (135 > 120) verl/protocol.py:303:121: E501 Line too long (125 > 120) verl/protocol.py:352:121: E501 Line too long (171 > 120) verl/protocol.py:578:121: E501 Line too long (142 > 120) verl/protocol.py:580:121: E501 Line too long (150 > 120) verl/protocol.py:583:121: E501 Line too long (167 > 120) verl/protocol.py:715:1: E402 Module level import not at top of file verl/protocol.py:725:121: E501 Line too long (121 > 120) verl/protocol.py:766:1: E402 Module level import not at top of file verl/protocol.py:768:1: E402 Module level import not at top of file verl/single_controller/__init__.py:23:1: E402 Module level import not at top of file verl/single_controller/__init__.py:24:1: E402 Module level import not at top of file verl/single_controller/base/decorator.py:149:16: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks verl/single_controller/base/decorator.py:198:121: E501 Line too long (134 > 120) verl/single_controller/base/decorator.py:310:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks verl/single_controller/base/worker.py:137:121: E501 Line too long (131 > 120) verl/single_controller/base/worker_group.py:89:33: G003 Logging statement uses `+` verl/single_controller/base/worker_group.py:202:21: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... 
from None` to distinguish them from errors in exception handling verl/single_controller/ray/__init__.py:15:19: F401 `.base.RayClassWithInitArgs` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/single_controller/ray/__init__.py:15:41: F401 `.base.RayResourcePool` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/single_controller/ray/__init__.py:15:58: F401 `.base.RayWorkerGroup` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/single_controller/ray/__init__.py:15:74: F401 `.base.create_colocated_worker_cls` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/third_party/sglang/parallel_state.py:135:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/__init__.py:40:40: F401 `.vllm_v_0_6_3.llm.LLMEngine` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/third_party/vllm/__init__.py:45:22: F401 `vllm.LLM` imported but unused verl/third_party/vllm/__init__.py:46:34: F401 `vllm.distributed.parallel_state` imported but unused verl/third_party/vllm/__init__.py:50:121: E501 Line too long (141 > 120) verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py:189:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_5_4/llm.py:136:121: E501 Line too long (132 > 120) verl/third_party/vllm/vllm_v_0_5_4/llm.py:196:121: E501 Line too long (161 > 120) verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:174:5: F811 Redefinition of unused `llama_megatron_core_te_weight_loader` from line 90 verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:205:5: F811 Redefinition of unused `llama_megatron_core_weight_loader` from line 121 verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:254:121: E501 Line too long (150 > 120) verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:21: F811 
Redefinition of unused `LoadConfig` from line 24 verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:45: F811 Redefinition of unused `ModelConfig` from line 26 verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:323:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:127:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:245:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120) verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120) verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used verl/third_party/vllm/vllm_v_0_5_4/worker.py:220:121: E501 Line too long (127 > 120) verl/third_party/vllm/vllm_v_0_6_3/config.py:46:92: B026 Star-arg unpacking after a keyword argument is strongly discouraged verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py:225:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_6_3/llm.py:141:121: E501 Line too long (132 > 120) verl/third_party/vllm/vllm_v_0_6_3/llm.py:169:121: E501 Line too long (161 > 120) verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:52:24: F811 Redefinition of unused `EngineArgs` from line 35 verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:21: F811 Redefinition of unused `LoadConfig` from line 25 verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:33: F811 Redefinition of unused `ModelConfig` from line 27 verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:354:9: F841 Local variable `distributed_executor_backend` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:360:121: E501 Line too long (152 > 120) verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:199:5: F841 Local variable `params_mapping` is assigned to but never 
used verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:229:121: E501 Line too long (150 > 120) verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:21: F811 Redefinition of unused `LoadConfig` from line 22 verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:45: F811 Redefinition of unused `ModelConfig` from line 22 verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:312:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:21: F811 Redefinition of unused `LoadConfig` from line 27 verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:33: F811 Redefinition of unused `ModelConfig` from line 29 verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:129:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:247:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120) verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120) verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/worker.py:217:121: E501 Line too long (127 > 120) verl/trainer/fsdp_sft_trainer.py:298:121: E501 Line too long (158 > 120) verl/trainer/fsdp_sft_trainer.py:501:121: E501 Line too long (121 > 120) verl/trainer/fsdp_sft_trainer.py:550:1: E402 Module level import not at top of file verl/trainer/fsdp_sft_trainer.py:551:1: E402 Module level import not at top of file verl/trainer/fsdp_sft_trainer.py:553:1: E402 Module level import not at top of file verl/trainer/fsdp_sft_trainer.py:553:43: F811 Redefinition of unused `FSDPSFTTrainer` from line 82 verl/trainer/fsdp_sft_trainer.py:554:1: E402 Module level import not at top of file verl/utils/__init__.py:16:24: F401 `.tokenizer.hf_processor` imported but unused; consider removing, adding to `__all__`, or 
using a redundant alias verl/utils/__init__.py:16:38: F401 `.tokenizer.hf_tokenizer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/checkpoint/checkpoint_manager.py:48:37: B006 Do not use mutable data structures for argument defaults verl/utils/checkpoint/fsdp_checkpoint_manager.py:51:37: B006 Do not use mutable data structures for argument defaults verl/utils/checkpoint/fsdp_checkpoint_manager.py:56:13: B028 No explicit `stacklevel` keyword argument found verl/utils/checkpoint/fsdp_checkpoint_manager.py:81:121: E501 Line too long (121 > 120) verl/utils/checkpoint/fsdp_checkpoint_manager.py:98:121: E501 Line too long (124 > 120) verl/utils/checkpoint/megatron_checkpoint_manager.py:64:37: B006 Do not use mutable data structures for argument defaults verl/utils/checkpoint/megatron_checkpoint_manager.py:219:121: E501 Line too long (124 > 120) verl/utils/dataset/__init__.py:15:25: F401 `.rl_dataset.RLHFDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/dataset/__init__.py:16:25: F401 `.rm_dataset.RMDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/dataset/__init__.py:17:26: F401 `.sft_dataset.SFTDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/dataset/multiturn_sft_dataset.py:96:9: F841 Local variable `current_length` is assigned to but never used verl/utils/dataset/sft_dataset.py:95:79: B023 Function definition does not bind loop variable `key` verl/utils/dataset/sft_dataset.py:103:83: B023 Function definition does not bind loop variable `key` verl/utils/debug/__init__.py:15:26: F401 `.performance.GPUMemoryLogger` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/debug/__init__.py:15:43: F401 `.performance.log_gpu_memory_usage` imported but unused; consider removing, adding to `__all__`, or using a 
redundant alias verl/utils/debug/performance.py:68:121: E501 Line too long (127 > 120) verl/utils/debug/performance.py:71:121: E501 Line too long (126 > 120) verl/utils/debug/profile.py:15:1: I001 [*] Import block is un-sorted or un-formatted verl/utils/debug/profile.py:19:15: UP039 [*] Unnecessary parentheses after class definition verl/utils/debug/profile.py:50:23: F541 [*] f-string without any placeholders verl/utils/debug/profile.py:52:49: F541 [*] f-string without any placeholders verl/utils/debug/profile.py:53:47: F541 [*] f-string without any placeholders verl/utils/debug/profile.py:54:67: F541 [*] f-string without any placeholders verl/utils/debug/profile.py:54:121: E501 Line too long (122 > 120) verl/utils/flops_counter.py:175:121: E501 Line too long (124 > 120) verl/utils/hdfs_io.py:135:32: G004 Logging statement uses f-string verl/utils/import_utils.py:78:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling verl/utils/logger/aggregate_logger.py:46:121: E501 Line too long (131 > 120) verl/utils/logger/aggregate_logger.py:64:41: G004 Logging statement uses f-string verl/utils/megatron/tensor_parallel.py:152:121: E501 Line too long (123 > 120) verl/utils/megatron_utils.py:17:1: I001 [*] Import block is un-sorted or un-formatted verl/utils/megatron_utils.py:22:20: F401 [*] `torch.nn` imported but unused verl/utils/megatron_utils.py:34:38: F401 [*] `verl.utils.memory_buffer.build_memory_reference_from_module` imported but unused verl/utils/megatron_utils.py:332:30: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access. verl/utils/megatron_utils.py:366:27: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access. 
verl/utils/model.py:464:121: E501 Line too long (124 > 120) verl/utils/rendezvous/ray_backend.py:39:25: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:41:22: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:63:30: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:65:30: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:72:26: G004 Logging statement uses f-string verl/utils/reward_score/gsm8k.py:47:121: E501 Line too long (201 > 120) verl/utils/reward_score/math.py:213:121: E501 Line too long (142 > 120) verl/utils/reward_score/prime_code/__init__.py:16:8: F401 `re` imported but unused verl/utils/reward_score/prime_code/testing_util.py:131:121: E501 Line too long (688 > 120) verl/utils/reward_score/prime_code/testing_util.py:168:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:222:9: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:254:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:255:17: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:259:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:260:17: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:264:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:265:17: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:269:121: E501 Line too long (132 > 120) verl/utils/reward_score/prime_code/testing_util.py:293:21: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:294:25: B018 Found useless expression. Either assign it to a variable or remove it. 
verl/utils/reward_score/prime_code/testing_util.py:335:121: E501 Line too long (165 > 120) verl/utils/reward_score/prime_code/testing_util.py:386:121: E501 Line too long (209 > 120) verl/utils/reward_score/prime_code/testing_util.py:390:121: E501 Line too long (183 > 120) verl/utils/reward_score/prime_code/testing_util.py:455:121: E501 Line too long (211 > 120) verl/utils/reward_score/prime_code/testing_util.py:459:121: E501 Line too long (185 > 120) verl/utils/reward_score/prime_code/testing_util.py:582:121: E501 Line too long (197 > 120) verl/utils/reward_score/prime_code/testing_util.py:586:121: E501 Line too long (171 > 120) verl/utils/reward_score/prime_math/__init__.py:106:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:119:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:246:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:315:121: E501 Line too long (128 > 120) verl/utils/reward_score/prime_math/__init__.py:331:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:407:1: E402 Module level import not at top of file verl/utils/reward_score/prime_math/__init__.py:429:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading verl/utils/reward_score/prime_math/math_normalize.py:54:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:70:17: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:101:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:181:121: E501 Line too long (142 > 120) verl/utils/tokenizer.py:30:9: B028 No explicit `stacklevel` keyword argument found verl/utils/tokenizer.py:33:9: B028 No explicit `stacklevel` keyword argument 
found verl/utils/tokenizer.py:55:9: B028 No explicit `stacklevel` keyword argument found verl/utils/torch_functional.py:86:72: E741 Ambiguous variable name: `l` verl/utils/torch_functional.py:177:5: F841 Local variable `total_params` is assigned to but never used verl/utils/torch_functional.py:397:1: E402 Module level import not at top of file verl/utils/torch_functional.py:399:1: E402 Module level import not at top of file verl/utils/torch_functional.py:400:1: E402 Module level import not at top of file verl/utils/ulysses.py:246:5: F841 Local variable `sp_size` is assigned to but never used verl/workers/actor/dp_actor.py:244:13: F841 Local variable `response_mask` is assigned to but never used verl/workers/actor/megatron_actor.py:22:1: I001 [*] Import block is un-sorted or un-formatted verl/workers/actor/megatron_actor.py:85:121: E501 Line too long (122 > 120) verl/workers/actor/megatron_actor.py:86:121: E501 Line too long (128 > 120) verl/workers/actor/megatron_actor.py:89:121: E501 Line too long (133 > 120) verl/workers/actor/megatron_actor.py:96:121: E501 Line too long (126 > 120) verl/workers/actor/megatron_actor.py:175:121: E501 Line too long (135 > 120) verl/workers/actor/megatron_actor.py:237:121: E501 Line too long (150 > 120) verl/workers/actor/megatron_actor.py:243:121: E501 Line too long (144 > 120) verl/workers/actor/megatron_actor.py:245:121: E501 Line too long (130 > 120) verl/workers/actor/megatron_actor.py:247:121: E501 Line too long (122 > 120) verl/workers/actor/megatron_actor.py:286:9: F841 Local variable `input_shapes` is assigned to but never used verl/workers/critic/dp_critic.py:227:21: F841 Local variable `input_ids` is assigned to but never used verl/workers/critic/dp_critic.py:230:21: F841 Local variable `position_ids` is assigned to but never used verl/workers/megatron_workers.py:18:1: I001 [*] Import block is un-sorted or un-formatted verl/workers/reward_manager/__init__.py:15:20: F401 `.batch.BatchRewardManager` imported but unused; 
consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:16:19: F401 `.dapo.DAPORewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:17:20: F401 `.naive.NaiveRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:18:20: F401 `.prime.PrimeRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/prime.py:61:121: E501 Line too long (217 > 120) verl/workers/reward_model/__init__.py:15:19: F401 `.base.BasePPORewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_model/megatron/__init__.py:15:27: F401 `.reward_model.MegatronRewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_model/megatron/reward_model.py:65:9: F841 Local variable `ori_bs` is assigned to but never used verl/workers/reward_model/megatron/reward_model.py:89:121: E501 Line too long (132 > 120) verl/workers/reward_model/megatron/reward_model.py:215:9: F841 Local variable `input_shapes` is assigned to but never used verl/workers/rollout/naive/__init__.py:15:28: F401 `.naive_rollout.NaiveRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/rollout/sglang_rollout/__init__.py:14:29: F401 `.sglang_rollout.SGLangRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:22:121: E501 Line too long (129 > 120) verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:51:121: E501 Line too long (157 > 120) verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:153:13: F841 Local variable `log_probs` is assigned to but never used 
verl/workers/rollout/vllm_rollout/vllm_rollout.py:22:121: E501 Line too long (129 > 120) verl/workers/rollout/vllm_rollout/vllm_rollout.py:60:121: E501 Line too long (157 > 120) verl/workers/sharding_manager/__init__.py:16:5: F401 `verl.utils.import_utils.is_megatron_core_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:17:5: F401 `verl.utils.import_utils.is_sglang_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:21:19: F401 `.base.BaseShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:22:27: F401 `.fsdp_ulysses.FSDPUlyssesShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:29:121: E501 Line too long (149 > 120) verl/workers/sharding_manager/__init__.py:32:121: E501 Line too long (126 > 120) verl/workers/sharding_manager/fsdp_sglang.py:99:9: F841 Local variable `load_format` is assigned to but never used verl/workers/sharding_manager/fsdp_sglang.py:123:121: E501 Line too long (178 > 120) verl/workers/sharding_manager/fsdp_ulysses.py:59:13: F841 Local variable `sp_size` is assigned to but never used Found 305 errors. ``` --------- Co-authored-by: Haibin Lin <haibin.lin@bytedance.com> |
|||
103f90113f |
[dev] fix: instructions about merging from before using ruff (#1180)
Our pre-commit hook and CI action only check the changed files for now. In this PR: 1. We apply `ruff check --fix` and `ruff format`. 2. We remove the unnecessary pipeline from the migration warning, since directly merging, without first applying `ruff` (which might cause extra conflicts), is the best way to avoid introducing extra file changes. |
|||
725c67666f |
[ray] fix: ray hang due to num_cpus (#1009)
Fixing #523 according to https://github.com/volcengine/verl/issues/523#issuecomment-2723652147 Concern: will `num_cpus=1` limit the performance of the cluster scheduler? |
|||
6d8f2f6ab9 |
[algo] feat: Add DrGRPO (#990)
https://github.com/volcengine/verl/issues/742 - Add an option for disabling standard-deviation normalization of advantages in GRPO. - This completes one out of two algorithmic changes made by Dr.GRPO to GRPO, the other one being the removal of sequence-length averaging during loss aggregation. |
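The change above toggles only whether group advantages are divided by the group's reward standard deviation. A minimal sketch of the two variants (function and argument names are illustrative, not verl's actual API):

```python
import numpy as np

def grpo_advantages(rewards, norm_adv_by_std=True, eps=1e-6):
    """Group-relative advantages for responses to the same prompt.

    norm_adv_by_std=True  -> original GRPO (mean-centered, std-normalized)
    norm_adv_by_std=False -> Dr.GRPO (mean-centered only)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    adv = rewards - rewards.mean()  # center within the group
    if norm_adv_by_std:
        adv = adv / (rewards.std() + eps)  # the step Dr.GRPO removes
    return adv
```

For a group of rewards `[1.0, 0.0]`, both variants center to `[0.5, -0.5]`; the GRPO variant then rescales by the group std (0.5), while Dr.GRPO leaves the centered values as the advantages.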
|||
b00f77d855 |
[dev] feat: migrate from yapf & pylint to ruff based on pre-commit (#1010)
> [!WARNING]
> We are [migrating to `ruff` as the linter and formatter and
`pre-commit` as the managing
tool](https://github.com/volcengine/verl/pull/1010).
>
> If your branch is based on a previous commit using `yapf` and
`pylint`, simply merging might trigger overwhelming linting errors,
while **you are only expected to resolve the ones in the files related to
your PR**.
>
> To resolve this issue, please try the following workaround to only
include the files you **really changed** in the PR:
>
> 1. In your branch, fix linting and formatting with `ruff`: `ruff check
--fix && ruff format`
> 2. Squash into a single commit in a new branch: `git reset --soft
$(git merge-base main HEAD) && git add -A && git commit -m "feat: ..."`
> 3. Merge with the latest main: `git merge origin/main`
> 4. Force push to your branch: `git push --force`
We add the reminder above to the documentation to tell contributors how
to avoid overwhelming linting errors.
### Motivation
According to the discussion in #896, this PR migrates from yapf & pylint to
ruff, managed by pre-commit, which allows unified version control and
an automatic hook on committing.
### Summary
The `pre-commit` hook and CI
- check staged / committed files in commits / PRs
- check all files each month (this should fail until we fix all the
files to the ruff standard)
### Explanation for the Failing CI Workflow `pre-commit`
For now, we only apply `ruff format` and `ruff check --fix` **without
resolving all the errors**, since there are too many to resolve at once,
which causes the CI workflow `pre-commit` to fail.
We leave resolving the remaining errors to future commits.
Specifically, the `pre-commit` hook and CI will require every commit to
fix its related files with `ruff`, which will fix all the files
incrementally.
### Reviewing Suggestion
The commit
|
|||
3a27a98647 |
[recipe] feat: integrate DAPO and provide reproduction script (#623)
> [!WARNING] > As mentioned in https://github.com/volcengine/verl/pull/623#issuecomment-2733688593, the implementation of gradient accumulation in verl has been only compatible with the sequence-mean loss, but all the DAPO experiments with the token-mean loss were run with the incompatible implementation. > **We keep it as is for reproducibility in this branch** and will fix it in another PR for the main branch. --------- Co-authored-by: Guangming Sheng <shengguangming@bytedance.com> Co-authored-by: Guangming Sheng <petershengwhu@gmail.com> |
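The incompatibility noted in the warning above comes from how a token-mean loss interacts with gradient accumulation: averaging per-micro-batch token-means is not the same as the global token-mean when micro-batches carry different token counts. A hedged sketch (illustrative only, not verl's loss code):

```python
import numpy as np

def global_token_mean(micro_batches):
    # correct token-mean: average over all tokens jointly
    return np.concatenate(micro_batches).mean()

def accumulated_token_mean(micro_batches):
    # what naive gradient accumulation effectively computes:
    # average the per-micro-batch means, ignoring token counts
    return np.mean([mb.mean() for mb in micro_batches])
```

With micro-batches of unequal token counts, e.g. `[np.array([2.0]), np.array([1.0, 1.0, 1.0])]`, the global token-mean is 1.25 while the accumulated mean is 1.5; the two only agree when every micro-batch contributes the same number of tokens.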
|||
072fc9feed |
feat: support no reference model; fix KL issues (#644)
### Before we get started: difference between the in-reward KL penalty and the KL loss

> [!TIP]
> 1. In-reward KL penalty
>
> $$
> r_t = r_{\varphi}(q, o_{\leq t}) - \beta\ \boxed{\log \frac{\pi_{\theta}(o_t | q, o_{<t})}{\pi_{\text{ref}}(o_t | q, o_{<t})}}
> $$
>
> 2. KL loss
>
> $$
> L^{\text{PPO}}(\theta) = \mathbb{E}_t [ \min(ratio_t A_t, \text{clip}(ratio_t, 1 - \epsilon, 1 + \epsilon) A_t) ] - \beta\ \boxed{D_{\text{KL}}(\pi_{\theta} \,\|\, \pi_{\text{ref}})}
> $$

### Problems

1. The current code doesn't support training without a reference model. This feature has been half-implemented since the very first commit but never completed; e.g., `RayPPOTrainer` has an attribute `use_reference_policy`, but it is always `True` since `role_worker_mapping` always contains `Role.RefPolicy`.
2. Restriction of `use_kl_loss`: currently, `use_kl_loss` determines whether to use the in-reward KL penalty or the KL loss, so we cannot use **both or neither**. |
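The two β-weighted terms boxed above can be sketched per token; the names are illustrative, and the KL here is the simple log-ratio estimate rather than verl's exact implementation:

```python
import numpy as np

def in_reward_kl_penalty(token_rewards, logp, ref_logp, beta):
    # penalty folded into the reward: r_t - beta * log(pi_theta / pi_ref)
    kl = np.asarray(logp) - np.asarray(ref_logp)
    return np.asarray(token_rewards) - beta * kl

def kl_loss_term(logp, ref_logp, beta):
    # separate loss term: beta * mean KL estimate over tokens
    kl = np.asarray(logp) - np.asarray(ref_logp)
    return beta * kl.mean()
```

Supporting "both or neither" then amounts to applying each term independently, rather than letting a single `use_kl_loss` flag pick exactly one of them.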
|||
7fbf609197 |
[BREAKING config] feat: add mlflow val generation log and uri config (#822)
### Changes - Add mlflow validation generation in `ValidationGenerationsLogger` in the form of MLFlow artifact files. - Add the config of `MLFLOW_TRACKING_URI` in mlflow tracking. - rename `val_generations_to_log_to_wandb` to `log_val_generations` ### Test Tested in the self-host mlflow servers. |
|||
50cba4aab9 |
docs: update checkpoint doc (#800)
Also fix some APIs. |
|||
22657bade5 |
[config] feat: lr_warmup_steps (#564)
This PR adds the `lr_warmup_steps` configuration. Note that `lr_warmup_steps` takes priority over `lr_warmup_steps_ratio`. |
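The precedence described above (an explicit step count overriding the ratio) can be sketched as follows; the resolution helper is hypothetical, not verl's actual config code:

```python
def resolve_warmup_steps(total_steps, lr_warmup_steps=None,
                         lr_warmup_steps_ratio=0.0):
    # an explicit non-negative step count wins over the ratio
    if lr_warmup_steps is not None and lr_warmup_steps >= 0:
        return lr_warmup_steps
    return int(lr_warmup_steps_ratio * total_steps)
```

For example, with 1000 total steps and a ratio of 0.1, leaving `lr_warmup_steps` unset yields 100 warmup steps, while setting it to 50 overrides the ratio.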
|||
386cfabed2 |
[misc] feat: make filter long prompt an option (#506)
# Background In RLHFDataset, we filter out prompts that are too long. This requires applying apply_chat_template to the whole dataset, which is not scalable when the dataset is large. https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L132 Instead of performing filtering online, we probably want to move this process offline and add an assertion to avoid truncation, or simply perform truncation. Reference: #502 # Key Changes - Add an option `data.filter_overlong_prompts=True \` to enable the above data filtering. The default value is False, but we enable it in all the example scripts. - Add an option `data.truncation` to truncate the input_ids or prompt length if they exceed max_prompt_length. The default is 'error', which does not allow max_prompt_length to be exceeded; users should increase max_prompt_length if the error is thrown. You can also set `left` or `right`. ### Suggestion for large-scale datasets For large-scale datasets, filtering overlong prompts could be time-consuming. You should set `data.filter_overlong_prompts=False` and set `truncation='left'`. Also, please note that you should increase `data.max_prompt_length` to avoid over-truncating the prompts. |
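The three `data.truncation` modes can be sketched on raw token ids; this is a simplified stand-in for the dataset logic, not the actual implementation:

```python
def truncate_prompt(token_ids, max_prompt_length, truncation="error"):
    if len(token_ids) <= max_prompt_length:
        return token_ids
    if truncation == "left":   # drop the oldest tokens, keep the tail
        return token_ids[-max_prompt_length:]
    if truncation == "right":  # keep the beginning, drop the tail
        return token_ids[:max_prompt_length]
    raise ValueError(
        f"prompt length {len(token_ids)} exceeds {max_prompt_length}; "
        "increase data.max_prompt_length or choose left/right truncation"
    )
```

`left` is the suggested mode for large datasets above because it keeps the most recent context next to the generation.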
|||
3165d98894 |
fix: (1) skipped last step (2) redundant validation and logging (#409)
This PR solves the two following problems. 1. Last step skipped: incrementing `self.global_steps += 1` before the check `if self.global_steps >= self.total_training_steps` makes the last step skipped. We start from step 1 and expect `self.total_training_steps` steps in total. |
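The off-by-one can be reproduced with a toy loop; `increment_before_check` mirrors the buggy ordering described above (names are illustrative):

```python
def executed_steps(total_training_steps, increment_before_check):
    executed, global_steps = 0, 0
    while True:
        if increment_before_check:
            global_steps += 1
            if global_steps >= total_training_steps:
                return executed  # the final training step never runs
        executed += 1  # one actual training step
        if not increment_before_check:
            global_steps += 1
            if global_steps >= total_training_steps:
                return executed
```

With the increment before the check, a run configured for N steps executes only N-1 of them; checking after the step executes all N.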
|||
2440aa6993 | apis: add data proto to documentation page. use copy_to_local instead of copy_local_path_from_hdfs (#358) | |||
3b1aef2f5e |
[Fix] Using an enumeration class to avoid spelling errors in adv_esti… (#377)
#369 --------- Co-authored-by: Thom <zhangyi@zhangyideMacBook-Pro.local> |
|||
4011f407b0 |
[Fix] Deprecate val_batch_size (#353)
Validation datasets are sent to inference engines as a whole batch, which will schedule the memory themselves. - [x] Remove `val_batch_size` from examples - [x] Set default values of `val_batch_size` in configs as `null` and add DEPRECATED comments - [x] Add deprecation warnings about `val_batch_size` in `_validate_config` |
|||
9db52329f6 |
[misc] feat: support offload parameter and optimizer during rollout (#284)
- Fixed FSDP1 model offload - With `actor_rollout_ref.actor.fsdp_config.param_offload=True \` and `actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \ `. The GPU memory utilization can increase to 0.9 - With actor, critic and reference offload all enabled, there will only be one model copy at a time in the GPU memory. Therefore, we can further increase the `micro_batch_size_per_gpu` or `max_token_per_gpu` **Specifically:** - During rollout, only rollout model and KVCache are in the GPU memory. - During critic compute values, only the critic model will stay in the GPU memory while its optimizer and other model states are in CPU main memory - During actor update, the actor model, optimizer are stored on GPU while the reference model and critic model, critic optimizer are offloaded to CPU. |
|||
c8b9c3559a |
fix the split placement example (#281)
The split placement example is outdated; I tried it and encountered some errors. To address this, the following changes were made in this PR: 1. Copied the content from `verl/trainer/config/ppo_trainer.yaml` to `examples/split_placement/config/ppo_trainer_split.yaml` 2. Copied the `RayPPOTrainer.fit` method into the `fit` function in `examples/split_placement/split_monkey_patch.py` and modified it to get the futures of `critic_output` and `actor_output` |
|||
e842b73dae | docs: add programming model guide (#230) | |||
f2a76acd94 |
[BREAKING][misc] feat: change micro_batch_size to micro_batch_size_per_gpu (#136)
## Summary This PR changes all the micro_batch_size to micro_batch_size_per_gpu. **The Core logic of setting batch size:** - **All algorithmic metrics** (train batch size, ppo mini batch size): are global (from the perspective of single-controller), which will be normalized in each Worker. - **All performance-related parameters** (micro batch size, max token length in dynamic batch size) are local parameters, which represent the data sizes per GPU (i.e., each Worker). ## Main Changes 1. Change the scripts and config and delete the normalization for micro_bsz 2. Fix CI for SFT |
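The split described above, global algorithmic batch sizes normalized per worker versus per-GPU performance knobs left untouched, can be sketched with a hypothetical helper (not verl's config code):

```python
def normalize_config(train_batch_size, ppo_mini_batch_size,
                     micro_batch_size_per_gpu, world_size):
    # algorithmic sizes are global (single-controller view) and are
    # divided evenly across workers
    assert train_batch_size % world_size == 0
    assert ppo_mini_batch_size % world_size == 0
    return {
        "train_batch_size_per_gpu": train_batch_size // world_size,
        "ppo_mini_batch_size_per_gpu": ppo_mini_batch_size // world_size,
        # performance sizes are already per-GPU: no normalization
        "micro_batch_size_per_gpu": micro_batch_size_per_gpu,
    }
```

For example, a global train batch of 256 on 8 GPUs becomes 32 per GPU, while `micro_batch_size_per_gpu` passes through unchanged, which is exactly why this PR could delete the old micro-batch normalization logic.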
|||
2997e5dee8 | [rollout] feat: support best-of-n generation in vLLM (#80) | |||
99d2c19b0e |
[misc] feat: remove @ray.remote on workers to allow inheritance (#61)
Co-authored-by: Haibin Lin <haibin.lin@bytedance.com> |
|||
f53597729e |
[BREAKING][refact]: move actor/critic/hybrid_engine/reward_model/rollout/workers out of ppo directory for reuse (#52)
* [refact] move actor/critic/hybrid_engine/reward_model/rollout/workers out of verl/trainer/ppo directory to verl/trainer directory for reuse * [misc] fix import * [refact] move actor/critic and others inside workers * fix uncommitted files * fix uncommitted files * [refact] move workers out of trainers |
|||
d60f843091 |
[sft] feat: fix sft dataset with latest preprocess code (#49)
* api: rename tracking logger to wandb logger type * [sft] feat: add tests for sft dataset * refresh dataset * force refresh * use ds model for tokenizer * add option for trainer.val_only * fix path * fix lint * add sft test for cot and raw q&a * add hf_tokenizer api to patch gemma tokenizer * fix test |
|||
c2296e80df | api: rename tracking logger to wandb logger type (#47) | |||
cd6cef609e |
[BREAKING][core] move single_controller into verl directory (#45)
* [BREAKING][core] move single_controller into verl directory * fix blocking flag in fsdp workers |
|||
6e8667bd66 |
[example] add a split placement tutorial (#43)
* [example] add a split placement tutorial * lint |