frozenleaves/verl - verl - Gitea: Git for Me

mirror of https://github.com/volcengine/verl.git synced 2025-10-20 21:53:50 +08:00

Author	SHA1	Message	Date
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	60d8a62c0d	[data, trainer] feat: add support for limiting samples from dataset (#3812 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. e.g.: For RLHFDataset, `filter_overlong_prompts` can be very expensive and it will be good to add support to limit the sample size before we do this when the dataset is very large. Also add support for other kinds of datasets for unification. - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-10-20 20:28:48 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	65eb019a81	[trainer] fix: Add `data.seed` to config (#3815 )	2025-10-20 09:57:14 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	ae5d8504d4	[trainer] feat: ReMax support using reward model for baseline (#3780 ) ### What does this PR do? > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. Not only limited to reward functions, we should also support using rm to calculate the reward baseline. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-10-17 12:07:05 +08:00
yangbaoxing	7f27789961	[fsdp,doc] refactor: rename warmup_style@FSDPOptimizerConfig -> lr_scheduler_type (#3739 ) ### What does this PR do? > Rename `warmup_style` in FSDPOptimizerConfig to `lr_scheduler_type` to align with Hugging Face Trainer API。 The following pull request is for refactoring the optimizer, however, the naming issue persists. https://github.com/volcengine/verl/pull/3656 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: weiqi.li <weiqi.li@bytedance.com>	2025-10-13 15:58:59 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	f346f96d29	[training_utils] fix: stop using `math` naming under reward score" (#3378 )	2025-09-07 09:57:24 +08:00
Tialo	6469be213e	[recipe] fix: make compute of `step` consistent across all trainers (#3132 ) ### What does this PR do? follow-up to #3117 > Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-20 09:54:29 +08:00
Qizhi Chen	2bbd09245c	[ray] feat: add support for ray init kwargs (#3049 ) ### What does this PR do? This PR adds support for passing parameters to `ray.init`. Users can now dynamically configure settings such as `address`, `port`, `_temp_dir`, and more based on their specific needs. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test ```bash # when /tmp/ray/ is used by others # when ray is initialized at 6379 by others # when the dashboard is not accessible at localhost # ... bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh \ +ray_kwargs.ray_init._temp_dir=/tmp/ray/my_dir \ +ray_kwargs.ray_init.address=127.0.0.1:6378 \ +ray_kwargs.ray_init.dashboard_host=0.0.0.0 ``` ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)	2025-08-15 20:02:56 +08:00
Chi Zhang	c0f99f3da2	[BREAKING] [ray, megatron] feat: remove RayMegatronWorker (#2895 ) ### What does this PR do? - Following https://github.com/volcengine/verl/pull/2893, we can now directly register dispatch and collect function inside the worker. So, there is no need to maintain RayMegatronWorker and RayMegatronWorkerGroup, which is a hacking solution ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-05 11:05:38 +08:00
Cheetah	d640f99219	[recipe] fix: fix issue when running split ppo (#2745 )	2025-07-29 07:32:59 +08:00
Cheetah	7459131411	[hardware] refactor: replace device_name with config.trainer.device (#2542 ) ### What does this PR do? In some methods, the get_device() method is redundant, and we plan to replace get_deivce with config.trainer.device ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: H <linhaibin.eric@gmail.com>	2025-07-17 13:29:01 -07:00
Kai-Hsun Chen	a31a8f251f	[doc] fix: quickstart example can't work on zsh (#2509 ) ### What does this PR do? I followed the instructions at https://verl.readthedocs.io/en/latest/start/quickstart.html to run the PPO example on my devbox, which uses zsh. However, I got the error zsh: no matches found: `trainer.logger=[console]` because `[]` is interpreted as a glob pattern in zsh. ``` (verl) ➜ verl git:(20250713-devbox-2-tmux0-verl-2) ✗ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ data.train_files=$HOME/data/gsm8k/train.parquet \ data.val_files=$HOME/data/gsm8k/test.parquet \ data.train_batch_size=256 \ data.max_prompt_length=512 \ data.max_response_length=256 \ actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.actor.ppo_mini_batch_size=64 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \ actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \ critic.optim.lr=1e-5 \ critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \ critic.ppo_micro_batch_size_per_gpu=4 \ algorithm.kl_ctrl.kl_coef=0.001 \ trainer.logger=['console'] \ trainer.val_before_train=False \ trainer.n_gpus_per_node=1 \ trainer.nnodes=1 \ trainer.save_freq=10 \ trainer.test_freq=10 \ trainer.total_epochs=15 2>&1 \| tee verl_demo.log zsh: no matches found: trainer.logger=[console] ``` This PR has 3 changes: * `trainer.logger=['console']` -> `trainer.logger=console` * `trainer.logger=['console','wandb']` -> `trainer.logger='["console","wandb"]'` * `trainer.logger=['console','tensorboard']` -> `trainer.logger='["console","tensorboard"]'` ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test * `trainer.logger=console` (zsh) <img width="898" height="564" alt="image" src="https://github.com/user-attachments/assets/a957a493-75e6-462b-9974-6b1c4cdf5a80" /> * ``trainer.logger='["console","wandb"]'`` (zsh) <img width="870" height="565" alt="image" src="https://github.com/user-attachments/assets/e20613bf-2ccc-4653-b23f-90edc3d568d1" /> * `trainer.logger=console` (bash) ```bash ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ > data.train_files=$HOME/data/gsm8k/train.parquet \ > data.val_files=$HOME/data/gsm8k/test.parquet \ > data.train_batch_size=256 \ > data.max_prompt_length=512 \ > data.max_response_length=256 \ > actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ > actor_rollout_ref.actor.optim.lr=1e-6 \ > actor_rollout_ref.actor.ppo_mini_batch_size=64 \ > actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ > actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \ > actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ > actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ > actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \ > critic.optim.lr=1e-5 \ > critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \ > critic.ppo_micro_batch_size_per_gpu=4 \ > algorithm.kl_ctrl.kl_coef=0.001 \ > trainer.logger=console \ > trainer.val_before_train=False \ > trainer.n_gpus_per_node=1 \ > trainer.nnodes=1 \ > trainer.save_freq=10 \ > trainer.test_freq=10 \ > trainer.total_epochs=15 2>&1 \| tee verl_demo.log 2025-07-14 02:52:27,669 INFO worker.py:1908 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 (TaskRunner pid=1799248) TaskRunner hostname: ip-172-31-9-244, PID: 1799248 (TaskRunner pid=1799248) {'actor_rollout_ref': {'actor': {'checkpoint': {'load_contents': ['model', (TaskRunner pid=1799248) 'optimizer', (TaskRunner pid=1799248) 'extra'], (TaskRunner pid=1799248) 'save_contents': ['model', (TaskRunner pid=1799248) 'optimizer', (TaskRunner pid=1799248) 'extra']}, ``` * `trainer.logger='["console","wandb"]'` (bash) ```bash ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ > data.train_files=$HOME/data/gsm8k/train.parquet \ > data.val_files=$HOME/data/gsm8k/test.parquet \ > data.train_batch_size=256 \ > data.max_prompt_length=512 \ > data.max_response_length=256 \ > actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ > actor_rollout_ref.actor.optim.lr=1e-6 \ > actor_rollout_ref.actor.ppo_mini_batch_size=64 \ > actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \ > actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \ > actor_rollout_ref.rollout.tensor_model_parallel_size=1 \ > actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \ > actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \ > critic.optim.lr=1e-5 \ > critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \ > critic.ppo_micro_batch_size_per_gpu=4 \ > algorithm.kl_ctrl.kl_coef=0.001 \ > trainer.logger='["console","wandb"]' \ > trainer.val_before_train=False \ > trainer.n_gpus_per_node=1 \ > trainer.nnodes=1 \ > trainer.save_freq=10 \ > trainer.test_freq=10 \ > trainer.total_epochs=15 2>&1 \| tee verl_demo.log 2025-07-14 02:54:13,989 INFO worker.py:1908 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 (TaskRunner pid=1805000) TaskRunner hostname: ip-172-31-9-244, PID: 1805000 (TaskRunner pid=1805000) {'actor_rollout_ref': {'actor': {'checkpoint': {'load_contents': ['model', (TaskRunner pid=1805000) 'optimizer', (TaskRunner pid=1805000) 'extra'], (TaskRunner pid=1805000) 'save_contents': ['model', (TaskRunner pid=1805000) 'optimizer', (TaskRunner pid=1805000) 'extra']}, ``` ### API and Usage Example No ### Design & Code Changes No ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>	2025-07-14 13:26:32 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	4aa02fe166	[trainer] fix: Allow FSDP2 when doing strategy check (#2497 ) ### What does this PR do? Allow FSDP2 when doing strategy check ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. For `strategy` field, now both "fsdp" and "fsdp2" are considered valid. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-07-12 16:31:31 -07:00
H	00a10a8ef3	[ci] refactor: reduce ruff line-length from 300 to 120 (#2287 ) ### What does this PR do? Previously the ruff line-len is too large, making it hard for users to view code. If we keep the config, manually created short lines will be formatted to long lines as well. This PR contains 3 commits: - df4bbfca62f41d972c48c8a76088ae2ac29691cf set line len to 120 and run pre-commit auto-format - 9d03f183edd9fff4e22215cacacf62c06b7b41d3 let devin fix the multi-line code - 9fc8d436f5007535fad3dc49983b01d0d457be9c skip lint for test_sglang_async_rollout_sf_tools.py. manually adjust format for rope_utils.py - last two commits: 1. merge with main 2. run lint after merge. add test_sglang_async_rollout_sf_tools.py and scripts/legacy_model_merger.py to lint.exclude ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test This PR relies on CI for testing. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2025-07-01 09:54:40 +08:00
Chi Zhang	86ef66ebe6	[trainer] fix: fix split placement (#2227 )	2025-06-29 12:42:51 -07:00
xingyunjohn1	ff750e2472	[trainer] fix: indentation error leading to critic_output.get() failure (#2143 ) ### What does this PR do? This PR addresses an `IndentationError` that was causing the `critic_output.get()` call to fail when `self.use_critic` was false. ### Checklist Before Starting - [x] Search for similar PRs. [The PR cause the problem](https://github.com/volcengine/verl/pull/281) - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > None. This is just a simple bug fix involving a few lines of code. ```python # Add code snippet or script demonstrating how to use this ``` ### High-Level Design > This is just a simple bug fix involving a few lines of code. ### Specific Changes > This is just a simple bug fix involving a few lines of code. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-26 14:50:09 -07:00
Zhen	68d62518ce	[misc] fix: fix timer importance error in split_placement (#2169 ) ### What does this PR do? fix timer importance error in split_placement, should use `from verl.trainer.ppo.ray_trainer import marked_timer`, but got `from verl.trainer.ppo.ray_trainer import _timer` now. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test Not related. ### API and Usage Example Not related. ### High-Level Design Not related. ### Specific Changes fix timer importance error in split_placement, should use `from verl.trainer.ppo.ray_trainer import marked_timer`, but got `import _timer` now. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).	2025-06-24 17:06:12 +08:00
Zhen	ca65c363fb	[hardware] refactor: refactor part of device management (#1974 ) ### Checklist Before Starting - [x] Searched for similar PR(s). - [x] Checked PR Title format - [x] In format of: [modules] type: Title - [x] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc` - [x] type is in `feat, fix, refactor, chore` - [x] can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? Refactor device management such as `torch.cuda` and `nccl` in most part of code in `verl/recipe` and `verl/verl`, which is more convinent for supporting other devices or platforms. ### Test Not related. ### High-Level Design Not related. ### Specific Changes 1. use `get_torch_device()` to get corresponding `torch.device()` object based on specific device. 2. use `get_device_id()` to get corresponding device rank index based on specific device. 3. use `get_nccl_backend()` to get corresponding nccl backend based on specific device. ### API Not related. ### Usage Example Monifications in this PR should not be perceived. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-14 20:53:47 +08:00
CurryRice233	0bd03d7c05	[FSDP] feat: Add FSDP forward pefetch and recompute chunking entropy (#1927 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? 1. Add fsdp1 forward pefetch configuration. 2. Add chunk entropy computation. 3. Add torch.checkpoint to entropy computation. 4. Move data to device from `ActorRolloutRefWorker.update_actor` to `DataParallelPPOActor.update_policy`. 5. Add `npu_cross_entropy_loss` fusion kernel. ### High-Level Design 1. More detail see [FSDP forward_pefetch](https://docs.pytorch.org/docs/stable/fsdp.html#module-torch.distributed.fsdp) 2. `logits` usually is a large tensor [bsz\seq_len, voc], on `compute_entropy_from_logits` will use [bsz\seq_len, voc] * (4(float32) + 2(autocast of softmax+logsumexp) + 1(output of softmax)) memory. To reduce this memory peak, we can use chunk calculation, changing [bszseq_len, voc] to [chunk_size(2048), voc]. 3. During the training phase, `enable_gradient_checkpointing=True` is not applicable to entropy calculation, so add the recomputation function of entropy to reduce the memory peak during the training phase. 4. On `ActorRolloutRefWorker.update_actor` all batch data is moved to the device, but this is unnecessary, `DataParallelPPOActor.update_policy` will move the data to the device for each micro batch. ### Specific Changes > List the specific changes. ### API Add 3 new configurations in actor/ref, 1 new configuration in critic/reward. - actor_rollout_ref.actor.fsdp_config.forward_prefetch: False - actor_rollout_ref.actor.entropy_from_logits_with_chunking: False - actor_rollout_ref.actor.entropy_checkpointing: False - actor_rollout_ref.ref.fsdp_config.forward_prefetch: False - actor_rollout_ref.ref.entropy_from_logits_with_chunking: False - actor_rollout_ref.ref.entropy_checkpointing: False - critic.model.fsdp_config.forward_prefetch: False - reward_model.model.fsdp_config.forward_prefetch: False ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference*: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.	2025-06-11 12:52:19 +08:00
Blue Space	70bd3d3d6b	[feat] Wandb Timing: Add more detailed timing of gen_sequence and weights resharding (#1834 )	2025-06-07 07:45:50 +08:00
Shawn/Yuxuan Tong	263115cd9d	[dev] fix: note that DP balancing doesn't affect advantage calculation (#1809 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR fixes the comments about DP balancing. btw, it adds the DP balancing option in the PRIME trainer, while keeping the default value as `False`. ### Additional Info. - Issue Number: #1718 - Training: none - Inference: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.	2025-06-03 10:20:54 +08:00
H	2aed8d0a45	[BREAKING] config: set the default value of actor.entropy_coeff to 0 (#1770 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? entropy_coeff shall be set carefully during RL. When enabled, inappropriate coefficient may case training to collapse. You can see more empirical experiments from Skywork Open Reasoner 1 Technical Report (https://arxiv.org/pdf/2505.22312). In this PR, the default value of entropy_coeff is set to 0. This is a breaking change that may affect your experiment, although majority of verl example scripts set it to 0 manually already. We let most example script just pick up the default value of 0 for entropy_coeff. For a few documentation page where the reference model performance and commands are provided, we modify the doc so that the experiment result is consistent with the config setup. ### Usage Example To enable entropy loss coefficient, use ```bash actor_rollout_ref.actor.entropy_coeff=0.001 # or other values ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.	2025-05-30 14:42:53 +08:00
Guangming Sheng	8653b1b200	[misc] feat: support return full prompt with chat template in RLHFDataset (#1567 )	2025-05-19 01:13:21 +08:00
H	c3b20575d2	[util] docs: add docstrings to metric util functions that recipes reuse (#1395 ) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? In `/recipes`, a few functions under `trainer/ppo/metric_utils` are imported and reused. Right now many of them are task dependent and assume specific keys in the input metric dict. To make these functions more robust and backward compatible, a few tests are added. Additionally, one method is moved to verl.utils as a public API due to its general purpose nature. A API doc page is added correspondingly. In order to make it easy for others to customize verl trainers, many more other classes require further documentations, such as: - AdvantageEstimator, RayPPOTrainer, apply_kl_penalty, compute_advantage - from verl.single_controller.ray import RayWorkerGroup - from verl.trainer.ppo.core_algos import agg_loss - from verl.trainer.ppo.ray_trainer import ResourcePoolManager, Role, WorkerType - from verl.utils.checkpoint.checkpoint_manager import find_latest_ckpt_path They shall be enhanced in future PRs. ### High-Level Design None ### Specific Changes - added tests - added verl.utils.metric namespace ### API `verl.trainer.ppo.metric_utils.reduce_metrics` changed to `verl.utils.metric.reduce_metrics`. deprecation warnings are added. ### Usage Example None ### Test Added ### Additional Info. - Issue Number: Fixes issue # or discussion # if any. https://github.com/volcengine/verl/issues/1354 - Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if neccessary. --------- Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-12 08:49:14 +08:00
Shawn/Yuxuan Tong	8e5ad4688a	[Lint] fix: linting errors in all files (#1280 ) This PR enables checking on all files after fixing all the errors: ``` examples/data_preprocess/geo3k.py:41:121: E501 Line too long (121 > 120) examples/data_preprocess/multiturn.py:54:121: E501 Line too long (185 > 120) examples/data_preprocess/multiturn.py:59:121: E501 Line too long (210 > 120) examples/data_preprocess/multiturn.py:73:121: E501 Line too long (229 > 120) examples/data_preprocess/multiturn.py:78:121: E501 Line too long (211 > 120) examples/ray/tutorial.ipynb:cell 9:1:121: E501 Line too long (179 > 120) examples/ray/tutorial.ipynb:cell 15:1:121: E501 Line too long (143 > 120) examples/ray/tutorial.ipynb:cell 42:14:1: E402 Module level import not at top of cell recipe/prime/prime_dp_rm.py:145:121: E501 Line too long (153 > 120) recipe/prime/prime_dp_rm.py:156:121: E501 Line too long (137 > 120) recipe/prime/prime_dp_rm.py:292:121: E501 Line too long (148 > 120) recipe/r1/data_process.py:56:121: E501 Line too long (289 > 120) recipe/r1/data_process.py:113:121: E501 Line too long (166 > 120) recipe/r1/data_process.py:118:121: E501 Line too long (137 > 120) recipe/r1/data_process.py:123:121: E501 Line too long (297 > 120) recipe/r1/data_process.py:131:9: E722 Do not use bare `except` recipe/r1/tasks/livecodebench.py:61:5: E722 Do not use bare `except` scripts/diagnose.py:55:9: F841 Local variable `ip` is assigned to but never used scripts/diagnose.py:165:13: B028 No explicit `stacklevel` keyword argument found scripts/model_merger.py:42:121: E501 Line too long (184 > 120) scripts/model_merger.py:146:13: E722 Do not use bare `except` tests/e2e/arithmetic_sequence/model/create_model_tokenizer.py:28:121: E501 Line too long (440 > 120) tests/gpu_utility/test_memory_buffers.py:42:5: F841 Local variable `model_named_params` is assigned to but never used tests/gpu_utility/test_memory_buffers.py:43:5: F841 Local variable `model_copy_named_params` is assigned to but never used tests/gpu_utility/test_memory_buffers.py:53:5: F841 Local variable `model_wrapper` is assigned to but never used tests/model/test_transformers_ulysses.py:102:5: F841 Local variable `response_length` is assigned to but never used tests/model/test_transformers_ulysses.py:181:5: F841 Local variable `response_length` is assigned to but never used tests/ray/detached_worker/server.py:83:13: F841 Local variable `vpp_rank` is assigned to but never used tests/ray/test_check_worker_alive.py:37:121: E501 Line too long (121 > 120) tests/rollout/run_fsdp_vllm.py:22:64: F811 Redefinition of unused `ShardingStrategy` from line 20 tests/rollout/test_sglang_spmd.py:210:121: E501 Line too long (157 > 120) tests/rollout/test_vllm_spmd.py:20:64: F811 Redefinition of unused `ShardingStrategy` from line 18 tests/sandbox/test_sandbox.py:86:121: E501 Line too long (1615 > 120) tests/sandbox/test_sandbox.py:87:121: E501 Line too long (1596 > 120) tests/sanity/check_license.py:22:1: E402 Module level import not at top of file tests/sanity/check_license.py:23:1: E402 Module level import not at top of file tests/verl/utils/dataset/test_rl_dataset.py:23:5: F841 Local variable `url` is assigned to but never used tests/verl/utils/dataset/test_rm_dataset.py:22:5: F841 Local variable `url` is assigned to but never used tests/verl/utils/dataset/test_rm_dataset.py:36:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks tests/verl/utils/dataset/test_sft_dataset.py:22:5: F841 Local variable `url` is assigned to but never used tests/verl/utils/dataset/test_sft_dataset.py:50:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks tests/verl/utils/dataset/test_sft_dataset.py:75:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks verl/__init__.py:22:1: E402 Module level import not at top of file verl/__init__.py:24:1: E402 Module level import not at top of file verl/__init__.py:25:1: E402 Module level import not at top of file verl/__init__.py:29:1: E402 Module level import not at top of file verl/__init__.py:29:15: F401 `.single_controller` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:16:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:18:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:20:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:21:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:22:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/__init__.py:24:5: F401 `.modeling_llama_megatron.ParallelLlamaModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/checkpoint_utils/llama_loader.py:92:121: E501 Line too long (168 > 120) verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:92:121: E501 Line too long (168 > 120) verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:274:121: E501 Line too long (127 > 120) verl/models/llama/megatron/checkpoint_utils/llama_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/llama/megatron/checkpoint_utils/llama_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/llama/megatron/checkpoint_utils/llama_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/llama/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelLlamaAttention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelLlamaDecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelLlamaDecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelLlamaMLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelLlamaRMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/llama/megatron/layers/parallel_attention.py:196:121: E501 Line too long (134 > 120) verl/models/llama/megatron/layers/parallel_attention.py:341:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:342:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:343:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:366:1: E402 Module level import not at top of file verl/models/llama/megatron/layers/parallel_attention.py:420:121: E501 Line too long (122 > 120) verl/models/llama/megatron/layers/parallel_linear.py:82:1: E402 Module level import not at top of file verl/models/mcore/loader.py:273:121: E501 Line too long (134 > 120) verl/models/mcore/util.py:26:121: E501 Line too long (202 > 120) verl/models/qwen2/megatron/__init__.py:16:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:18:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:20:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:21:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:22:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/__init__.py:24:5: F401 `.modeling_qwen2_megatron.ParallelQwen2Model` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:90:121: E501 Line too long (169 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:256:121: E501 Line too long (172 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:90:121: E501 Line too long (169 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:272:121: E501 Line too long (127 > 120) verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used verl/models/qwen2/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelQwen2Attention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelQwen2DecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelQwen2DecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelQwen2MLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelQwen2RMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/models/qwen2/megatron/layers/parallel_attention.py:163:121: E501 Line too long (134 > 120) verl/models/qwen2/megatron/layers/parallel_attention.py:282:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:283:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:284:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:307:1: E402 Module level import not at top of file verl/models/qwen2/megatron/layers/parallel_attention.py:361:121: E501 Line too long (122 > 120) verl/models/qwen2/megatron/modeling_qwen2_megatron.py:630:121: E501 Line too long (130 > 120) verl/models/transformers/llama.py:106:121: E501 Line too long (180 > 120) verl/models/transformers/llama.py:214:121: E501 Line too long (128 > 120) verl/models/transformers/llama.py:215:121: E501 Line too long (135 > 120) verl/models/transformers/monkey_patch.py:145:1: E402 Module level import not at top of file verl/models/transformers/monkey_patch.py:146:1: E402 Module level import not at top of file verl/models/transformers/monkey_patch.py:148:1: E402 Module level import not at top of file verl/models/transformers/monkey_patch.py:157:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling verl/models/transformers/qwen2.py:215:121: E501 Line too long (128 > 120) verl/models/transformers/qwen2.py:216:121: E501 Line too long (135 > 120) verl/protocol.py:303:121: E501 Line too long (125 > 120) verl/protocol.py:352:121: E501 Line too long (171 > 120) verl/protocol.py:578:121: E501 Line too long (142 > 120) verl/protocol.py:580:121: E501 Line too long (150 > 120) verl/protocol.py:583:121: E501 Line too long (167 > 120) verl/protocol.py:715:1: E402 Module level import not at top of file verl/protocol.py:725:121: E501 Line too long (121 > 120) verl/protocol.py:766:1: E402 Module level import not at top of file verl/protocol.py:768:1: E402 Module level import not at top of file verl/single_controller/__init__.py:23:1: E402 Module level import not at top of file verl/single_controller/__init__.py:24:1: E402 Module level import not at top of file verl/single_controller/base/decorator.py:149:16: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks verl/single_controller/base/decorator.py:198:121: E501 Line too long (134 > 120) verl/single_controller/base/decorator.py:310:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks verl/single_controller/base/worker.py:137:121: E501 Line too long (131 > 120) verl/single_controller/base/worker_group.py:89:33: G003 Logging statement uses `+` verl/single_controller/base/worker_group.py:202:21: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling verl/single_controller/ray/__init__.py:15:19: F401 `.base.RayClassWithInitArgs` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/single_controller/ray/__init__.py:15:41: F401 `.base.RayResourcePool` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/single_controller/ray/__init__.py:15:58: F401 `.base.RayWorkerGroup` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/single_controller/ray/__init__.py:15:74: F401 `.base.create_colocated_worker_cls` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/third_party/sglang/parallel_state.py:135:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/__init__.py:40:40: F401 `.vllm_v_0_6_3.llm.LLMEngine` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/third_party/vllm/__init__.py:45:22: F401 `vllm.LLM` imported but unused verl/third_party/vllm/__init__.py:46:34: F401 `vllm.distributed.parallel_state` imported but unused verl/third_party/vllm/__init__.py:50:121: E501 Line too long (141 > 120) verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py:189:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_5_4/llm.py:136:121: E501 Line too long (132 > 120) verl/third_party/vllm/vllm_v_0_5_4/llm.py:196:121: E501 Line too long (161 > 120) verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:174:5: F811 Redefinition of unused `llama_megatron_core_te_weight_loader` from line 90 verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:205:5: F811 Redefinition of unused `llama_megatron_core_weight_loader` from line 121 verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:254:121: E501 Line too long (150 > 120) verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:21: F811 Redefinition of unused `LoadConfig` from line 24 verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:45: F811 Redefinition of unused `ModelConfig` from line 26 verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:323:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:127:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:245:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120) verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120) verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used verl/third_party/vllm/vllm_v_0_5_4/worker.py:220:121: E501 Line too long (127 > 120) verl/third_party/vllm/vllm_v_0_6_3/config.py:46:92: B026 Star-arg unpacking after a keyword argument is strongly discouraged verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py:225:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_6_3/llm.py:141:121: E501 Line too long (132 > 120) verl/third_party/vllm/vllm_v_0_6_3/llm.py:169:121: E501 Line too long (161 > 120) verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:52:24: F811 Redefinition of unused `EngineArgs` from line 35 verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:21: F811 Redefinition of unused `LoadConfig` from line 25 verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:33: F811 Redefinition of unused `ModelConfig` from line 27 verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:354:9: F841 Local variable `distributed_executor_backend` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:360:121: E501 Line too long (152 > 120) verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:199:5: F841 Local variable `params_mapping` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:229:121: E501 Line too long (150 > 120) verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:21: F811 Redefinition of unused `LoadConfig` from line 22 verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:45: F811 Redefinition of unused `ModelConfig` from line 22 verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:312:1: E402 Module level import not at top of file verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:21: F811 Redefinition of unused `LoadConfig` from line 27 verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:33: F811 Redefinition of unused `ModelConfig` from line 29 verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:129:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:247:5: F841 Local variable `rank` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120) verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120) verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used verl/third_party/vllm/vllm_v_0_6_3/worker.py:217:121: E501 Line too long (127 > 120) verl/trainer/fsdp_sft_trainer.py:298:121: E501 Line too long (158 > 120) verl/trainer/fsdp_sft_trainer.py:501:121: E501 Line too long (121 > 120) verl/trainer/fsdp_sft_trainer.py:550:1: E402 Module level import not at top of file verl/trainer/fsdp_sft_trainer.py:551:1: E402 Module level import not at top of file verl/trainer/fsdp_sft_trainer.py:553:1: E402 Module level import not at top of file verl/trainer/fsdp_sft_trainer.py:553:43: F811 Redefinition of unused `FSDPSFTTrainer` from line 82 verl/trainer/fsdp_sft_trainer.py:554:1: E402 Module level import not at top of file verl/utils/__init__.py:16:24: F401 `.tokenizer.hf_processor` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/__init__.py:16:38: F401 `.tokenizer.hf_tokenizer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/checkpoint/checkpoint_manager.py:48:37: B006 Do not use mutable data structures for argument defaults verl/utils/checkpoint/fsdp_checkpoint_manager.py:51:37: B006 Do not use mutable data structures for argument defaults verl/utils/checkpoint/fsdp_checkpoint_manager.py:56:13: B028 No explicit `stacklevel` keyword argument found verl/utils/checkpoint/fsdp_checkpoint_manager.py:81:121: E501 Line too long (121 > 120) verl/utils/checkpoint/fsdp_checkpoint_manager.py:98:121: E501 Line too long (124 > 120) verl/utils/checkpoint/megatron_checkpoint_manager.py:64:37: B006 Do not use mutable data structures for argument defaults verl/utils/checkpoint/megatron_checkpoint_manager.py:219:121: E501 Line too long (124 > 120) verl/utils/dataset/__init__.py:15:25: F401 `.rl_dataset.RLHFDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/dataset/__init__.py:16:25: F401 `.rm_dataset.RMDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/dataset/__init__.py:17:26: F401 `.sft_dataset.SFTDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/dataset/multiturn_sft_dataset.py:96:9: F841 Local variable `current_length` is assigned to but never used verl/utils/dataset/sft_dataset.py:95:79: B023 Function definition does not bind loop variable `key` verl/utils/dataset/sft_dataset.py:103:83: B023 Function definition does not bind loop variable `key` verl/utils/debug/__init__.py:15:26: F401 `.performance.GPUMemoryLogger` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/debug/__init__.py:15:43: F401 `.performance.log_gpu_memory_usage` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/utils/debug/performance.py:68:121: E501 Line too long (127 > 120) verl/utils/debug/performance.py:71:121: E501 Line too long (126 > 120) verl/utils/debug/profile.py:15:1: I001 [] Import block is un-sorted or un-formatted verl/utils/debug/profile.py:19:15: UP039 [] Unnecessary parentheses after class definition verl/utils/debug/profile.py:50:23: F541 [] f-string without any placeholders verl/utils/debug/profile.py:52:49: F541 [] f-string without any placeholders verl/utils/debug/profile.py:53:47: F541 [] f-string without any placeholders verl/utils/debug/profile.py:54:67: F541 [] f-string without any placeholders verl/utils/debug/profile.py:54:121: E501 Line too long (122 > 120) verl/utils/flops_counter.py:175:121: E501 Line too long (124 > 120) verl/utils/hdfs_io.py:135:32: G004 Logging statement uses f-string verl/utils/import_utils.py:78:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling verl/utils/logger/aggregate_logger.py:46:121: E501 Line too long (131 > 120) verl/utils/logger/aggregate_logger.py:64:41: G004 Logging statement uses f-string verl/utils/megatron/tensor_parallel.py:152:121: E501 Line too long (123 > 120) verl/utils/megatron_utils.py:17:1: I001 [] Import block is un-sorted or un-formatted verl/utils/megatron_utils.py:22:20: F401 [] `torch.nn` imported but unused verl/utils/megatron_utils.py:34:38: F401 [] `verl.utils.memory_buffer.build_memory_reference_from_module` imported but unused verl/utils/megatron_utils.py:332:30: B009 [] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access. verl/utils/megatron_utils.py:366:27: B009 [] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access. verl/utils/model.py:464:121: E501 Line too long (124 > 120) verl/utils/rendezvous/ray_backend.py:39:25: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:41:22: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:63:30: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:65:30: G004 Logging statement uses f-string verl/utils/rendezvous/ray_backend.py:72:26: G004 Logging statement uses f-string verl/utils/reward_score/gsm8k.py:47:121: E501 Line too long (201 > 120) verl/utils/reward_score/math.py:213:121: E501 Line too long (142 > 120) verl/utils/reward_score/prime_code/__init__.py:16:8: F401 `re` imported but unused verl/utils/reward_score/prime_code/testing_util.py:131:121: E501 Line too long (688 > 120) verl/utils/reward_score/prime_code/testing_util.py:168:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:222:9: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:254:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:255:17: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:259:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:260:17: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:264:13: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:265:17: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:269:121: E501 Line too long (132 > 120) verl/utils/reward_score/prime_code/testing_util.py:293:21: E722 Do not use bare `except` verl/utils/reward_score/prime_code/testing_util.py:294:25: B018 Found useless expression. Either assign it to a variable or remove it. verl/utils/reward_score/prime_code/testing_util.py:335:121: E501 Line too long (165 > 120) verl/utils/reward_score/prime_code/testing_util.py:386:121: E501 Line too long (209 > 120) verl/utils/reward_score/prime_code/testing_util.py:390:121: E501 Line too long (183 > 120) verl/utils/reward_score/prime_code/testing_util.py:455:121: E501 Line too long (211 > 120) verl/utils/reward_score/prime_code/testing_util.py:459:121: E501 Line too long (185 > 120) verl/utils/reward_score/prime_code/testing_util.py:582:121: E501 Line too long (197 > 120) verl/utils/reward_score/prime_code/testing_util.py:586:121: E501 Line too long (171 > 120) verl/utils/reward_score/prime_math/__init__.py:106:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:119:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:246:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:315:121: E501 Line too long (128 > 120) verl/utils/reward_score/prime_math/__init__.py:331:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/__init__.py:407:1: E402 Module level import not at top of file verl/utils/reward_score/prime_math/__init__.py:429:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading verl/utils/reward_score/prime_math/math_normalize.py:54:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:70:17: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:101:5: E722 Do not use bare `except` verl/utils/reward_score/prime_math/math_normalize.py:181:121: E501 Line too long (142 > 120) verl/utils/tokenizer.py:30:9: B028 No explicit `stacklevel` keyword argument found verl/utils/tokenizer.py:33:9: B028 No explicit `stacklevel` keyword argument found verl/utils/tokenizer.py:55:9: B028 No explicit `stacklevel` keyword argument found verl/utils/torch_functional.py:86:72: E741 Ambiguous variable name: `l` verl/utils/torch_functional.py:177:5: F841 Local variable `total_params` is assigned to but never used verl/utils/torch_functional.py:397:1: E402 Module level import not at top of file verl/utils/torch_functional.py:399:1: E402 Module level import not at top of file verl/utils/torch_functional.py:400:1: E402 Module level import not at top of file verl/utils/ulysses.py:246:5: F841 Local variable `sp_size` is assigned to but never used verl/workers/actor/dp_actor.py:244:13: F841 Local variable `response_mask` is assigned to but never used verl/workers/actor/megatron_actor.py:22:1: I001 [] Import block is un-sorted or un-formatted verl/workers/actor/megatron_actor.py:85:121: E501 Line too long (122 > 120) verl/workers/actor/megatron_actor.py:86:121: E501 Line too long (128 > 120) verl/workers/actor/megatron_actor.py:89:121: E501 Line too long (133 > 120) verl/workers/actor/megatron_actor.py:96:121: E501 Line too long (126 > 120) verl/workers/actor/megatron_actor.py:175:121: E501 Line too long (135 > 120) verl/workers/actor/megatron_actor.py:237:121: E501 Line too long (150 > 120) verl/workers/actor/megatron_actor.py:243:121: E501 Line too long (144 > 120) verl/workers/actor/megatron_actor.py:245:121: E501 Line too long (130 > 120) verl/workers/actor/megatron_actor.py:247:121: E501 Line too long (122 > 120) verl/workers/actor/megatron_actor.py:286:9: F841 Local variable `input_shapes` is assigned to but never used verl/workers/critic/dp_critic.py:227:21: F841 Local variable `input_ids` is assigned to but never used verl/workers/critic/dp_critic.py:230:21: F841 Local variable `position_ids` is assigned to but never used verl/workers/megatron_workers.py:18:1: I001 [*] Import block is un-sorted or un-formatted verl/workers/reward_manager/__init__.py:15:20: F401 `.batch.BatchRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:16:19: F401 `.dapo.DAPORewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:17:20: F401 `.naive.NaiveRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/__init__.py:18:20: F401 `.prime.PrimeRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_manager/prime.py:61:121: E501 Line too long (217 > 120) verl/workers/reward_model/__init__.py:15:19: F401 `.base.BasePPORewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_model/megatron/__init__.py:15:27: F401 `.reward_model.MegatronRewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/reward_model/megatron/reward_model.py:65:9: F841 Local variable `ori_bs` is assigned to but never used verl/workers/reward_model/megatron/reward_model.py:89:121: E501 Line too long (132 > 120) verl/workers/reward_model/megatron/reward_model.py:215:9: F841 Local variable `input_shapes` is assigned to but never used verl/workers/rollout/naive/__init__.py:15:28: F401 `.naive_rollout.NaiveRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/rollout/sglang_rollout/__init__.py:14:29: F401 `.sglang_rollout.SGLangRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:22:121: E501 Line too long (129 > 120) verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:51:121: E501 Line too long (157 > 120) verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:153:13: F841 Local variable `log_probs` is assigned to but never used verl/workers/rollout/vllm_rollout/vllm_rollout.py:22:121: E501 Line too long (129 > 120) verl/workers/rollout/vllm_rollout/vllm_rollout.py:60:121: E501 Line too long (157 > 120) verl/workers/sharding_manager/__init__.py:16:5: F401 `verl.utils.import_utils.is_megatron_core_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:17:5: F401 `verl.utils.import_utils.is_sglang_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:21:19: F401 `.base.BaseShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:22:27: F401 `.fsdp_ulysses.FSDPUlyssesShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias verl/workers/sharding_manager/__init__.py:29:121: E501 Line too long (149 > 120) verl/workers/sharding_manager/__init__.py:32:121: E501 Line too long (126 > 120) verl/workers/sharding_manager/fsdp_sglang.py:99:9: F841 Local variable `load_format` is assigned to but never used verl/workers/sharding_manager/fsdp_sglang.py:123:121: E501 Line too long (178 > 120) verl/workers/sharding_manager/fsdp_ulysses.py:59:13: F841 Local variable `sp_size` is assigned to but never used Found 305 errors. ``` --------- Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>	2025-04-27 15:24:30 -07:00
Shawn/Yuxuan Tong	103f90113f	[dev] fix: instructions about merging from before using ruff (#1180 ) Our pre-commit hook and CI action only check the changes for now. In this PR, 1. We apply `ruff check --fix` and `ruff format`. 2. We remove the unnecessary pipeline from the immigration warning, since directly merging without applying `ruff`, which might cause extra conflicts, is the best way to avoid introducing extra file changes.	2025-04-20 13:51:46 -07:00
Shawn/Yuxuan Tong	725c67666f	[ray] fix: ray hang due to num_cpus (#1009 ) Fixing #523 according to https://github.com/volcengine/verl/issues/523#issuecomment-2723652147 Concern: will `num_cpus=1` limit the performance of the cluster scheduler?	2025-04-20 12:50:17 -07:00
SunJin Kim	6d8f2f6ab9	[algo] feat: Add DrGRPO (#990 ) https://github.com/volcengine/verl/issues/742 - Add an option for disabling standard-deviation normalization of advantages in GRPO. - This completes one out of two algorithmic changes made by Dr.GRPO to GRPO, the other one being the removal of sequence-length averaging during loss aggregation.	2025-04-20 08:44:45 -07:00
Shawn/Yuxuan Tong	b00f77d855	[dev] feat: immigrate from yapf & pylint to ruff based on pre-commit (#1010 ) > [!WARNING] > We are [immigrating to `ruff` as the linter and formatter and `pre-commit` as the managing tool](https://github.com/volcengine/verl/pull/1010). > > If your branch is based on a previous commit using `yapf` and `pylint`, simply merging might trigger overwhelming linting errors, while you are only expected to resolve ones in the files related to your PR. > > To resolve this issue, please try the following workaround to only include the files you really changed in the PR: > > 1. In your branch, fix linting and format with `ruff`: `ruff check --fix && ruff-format` > 2. Squash into a single commit in a new branch: `git reset --soft $(git merge-base main HEAD) && git add -A && git commit -m "feat: ..."` > 3. Merge with the latest main: `git merge origin/main` > 4. Force push to your branch: `git push --force` We add the reminder above to the documentation to tell contributors how to avoid overwhelming linting errors. ### Motivation According to dicussion in #896, this PR immigrates from yapf & pylint to ruff based on pre-commit, which allows unified version control and automatic hook on committing. ### Summary The `pre-commit` hook and CI - checks staged / committed files in commits / PR's - checks all files each month (This should fail before we fix all the files by the ruff standard) ### Explanation for the Failing CI Workflow `pre-commit` For now, we only apply `ruff format` and `ruff check --fix` without resolving all the errors, since there are too many errors to resolve, which causes the CI workflow `pre-commit` fails. For resolving the remaining errors, we leave to future commits. Specifically, the `pre-commit` hook and CI will require every commit to fix its related files with `ruff`, which will fix all the files incrementally. ### Reviewing Suggestion The commit `3d93f51ba8` is huge since we apply `ruff` to all the files. To review the main changes, please check the commits before and after it.	2025-04-18 07:49:31 -07:00
Shawn/Yuxuan Tong	3a27a98647	[recipe] feat: integrate DAPO and provide reproduction script (#623 ) > [!WARNING] > As mentioned in https://github.com/volcengine/verl/pull/623#issuecomment-2733688593, the implementation of gradient accumulation in verl has been only compatible with the sequence-mean loss, but all the DAPO experiments with the token-mean loss were run with the incompatible implementation. > We keep it as is for reproducibility in this branch and will fix it in another PR for the main branch. --------- Co-authored-by: Guangming Sheng <shengguangming@bytedance.com> Co-authored-by: Guangming Sheng <petershengwhu@gmail.com>	2025-04-04 05:46:47 +08:00
Lumeng Wu	072fc9feed	feat: support no reference model; fix KL issues (#644 ) ### Before get started Difference between KL penalty in reward and KL loss > [!TIP] > > 1. In-reward KL penalty > > > $$ > r_t = r_{\varphi}(q, o_{\leq t}) - \beta\ \boxed{\log \frac{\pi_{\theta}(o_t \| q, o_{<t})}{\pi_{\text{ref}}(o_t \| q, o_{<t})}} > $$ > > 2. KL Loss > > $$ > L^{\text{PPO}}(\theta) = \mathbb{E}_t [ \min(ratio_t A_t, \text{clip}(ratio_t, 1 - \epsilon, 1 + \epsilon) A_t) ] > $$ > > $$ > \- \beta\ \boxed{D_{\text{KL}}(\pi_{\theta} \|\| \pi_{\text{ref}})} > $$ ### Problems 1. The current code doesn't support not using reference model This feature is half-implemented since the very first commit but never completed, e.g., `RayPPOTrainer` has an attribute `use_reference_policy` but it's always True since role_worker_mapping always has `Role.RefPolicy`. 2. Restriction of `use_kl_loss` Currently, `use_kl_loss` determines whether to use in-reward kl penalty or kl loss. So we can not use both or neither. `87a813658f/verl/trainer/ppo/ray_trainer.py (L875-L879)` `87a813658f/verl/workers/actor/dp_actor.py (L299-L307)` > [!CAUTION] > > ### You may have unintentionally adopted in-reward KL penalty > > For the experiments you've conducted, if you set `actor.use_kl_loss`=False or didn't set it (Default is False),*You unintentionally used in-reward KL penalty.* If you don't want any KL, you should set `actor_rollout_ref.actor.use_kl_loss=False` and `algorithm.use_kl_in_reward=False` (or not to set them because they are the default config) after this commit. 3. Deprecated config After investigation, I guess Critic may used to be responsible for in-reward KL. But this feature seems paralyzed. 1. Line 290, there may used to be `config.algorithm.kl_ctrl.target_kl` and `config.critic.kl_ctrl.horizon` , which are not supported currently. `3ec83117c3/verl/trainer/ppo/ray_trainer.py (L289-L293)` 2. In `verl/workers/critic/megatron_critic.py` : redundant set of `self.kl_ctrl` `3b18b0eb74/verl/workers/critic/megatron_critic.py (L69-L73)` ### What’s Changed? 1. Add support for not using reference model 2. Fixed the incomplete code of the KL controller. 3. A test case for using both kl terms 4. Some other misc issues in the code. ### How to disable reference model * set `actor_rollout_ref.actor.use_kl_loss=False` and `algorithm.use_kl_in_reward=False` (They are by default False, so you can simply not set them)	2025-04-01 10:14:38 +08:00
Changlong Yu	7fbf609197	[BREAKING config] feat: add mlflow val generation log and uri config (#822 ) ### Changes - Add mlflow validation generation in `ValidationGenerationsLogger` in the form of MLFlow artifact files. - Add the config of `MLFLOW_TRACKING_URI` in mlflow tracking. - rename `val_generations_to_log_to_wandb` to `log_val_generations` ### Test Tested in the self-host mlflow servers.	2025-03-30 08:44:36 -07:00
Blue Space	50cba4aab9	docs: update checkpoint doc (#800 ) Also fix some APIs.	2025-03-28 21:27:01 -07:00
Shawn/Yuxuan Tong	22657bade5	[config] feat: lr_warmup_steps (#564 ) This PR adds the `lr_warmup_steps` configuration. Note the `num_warmup_steps` is prior to `lr_warmup_steps_ratio`.	2025-03-14 16:09:12 +08:00
Guangming Sheng	386cfabed2	[misc] feat: make filter long prompt an option (#506 ) # Background In RLHFDataset, we filter out prompts that are too long. This requires apply_chat_template to the whole dataset, which is not scalable when the dataset is large. https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L132 Instead of performing filtering online, we probably want to move this process offline and add an assertion to avoid truncation or simply perform truncation Reference: #502 # Key Changes - Add an option `data.filter_overlong_prompts=True \` to enable the above data filtering. The default value is set to False, but we enable it for all the example scripts. - Add an option `data.truncation` to truncate the input_ids or prompt length if they exceed max_prompt_length. The default is 'error', which does not allow the max_prompt_length to be exceeded. The users should increase the max_prompt_length if throwing the error. You can also set `left` and `right`. ### Suggestion for large-scale dataset. For large-scale datasets, filtering overlong prompts could be time-consuming. You should set `data.filtering_overlong_prompts=False` and set `truncation='left'`. Also, please note that you should increase `data.max_prompt_length` to avoid over-truncation of the prompts.	2025-03-07 19:27:25 +08:00
Lumeng Wu	3165d98894	fix: (1) skipped last step (2) redundant validation and logging (#409 ) This PR solves these 2 following problems. 1. Last step skipped `self.global_steps += 1` before if `self.global_steps >= self.total_training_steps` makes the last step skipped. We start from step 1, and we expect `self.total_training_steps` in total. `82b38e25c7/verl/trainer/ppo/ray_trainer.py (L999-L1001)` When `self.global_steps == self.total_training_steps-1`: * we have only executed `self.total_training_steps-1` steps * `self.global_steps` is updated to `self.total_training_steps` * `self.global_steps >= self.total_training_steps` is satisfied, and the training ends. Therefore, we should put `self.global_steps += 1` at last 2. redundant validation and logging If `self.total_training_steps % self.config.trainer.test_freq == 0` : * `self._validate()` will be executed twice 1. `82b38e25c7/verl/trainer/ppo/ray_trainer.py (L984)` 2. `82b38e25c7/verl/trainer/ppo/ray_trainer.py (L1005)` * logging will also be executed twice 1. `82b38e25c7/verl/trainer/ppo/ray_trainer.py (L985)` and `82b38e25c7/verl/trainer/ppo/ray_trainer.py (L997)` 2. `82b38e25c7/verl/trainer/ppo/ray_trainer.py (L1007)`	2025-03-06 23:35:04 +08:00
HL	2440aa6993	apis: add data proto to documentation page. use copy_to_local instead of copy_local_path_from_hdfs (#358 )	2025-02-26 14:16:44 +08:00
_T_L_R_	3b1aef2f5e	[Fix] Using an enumeration class to avoid spelling errors in adv_esti… (#377 ) #369 --------- Co-authored-by: Thom <zhangyi@zhangyideMacBook-Pro.local>	2025-02-25 19:01:34 +08:00
Shawn/Yuxuan Tong	4011f407b0	[Fix] Deprecate `val_batch_size` (#353 ) Validation datasets are sent to inference engines as a whole batch, which will schedule the memory themselves. - [x] Remove `val_batch_size` from examples - [x] Set default values of `val_batch_size` in configs as `null` and add DEPRECATED comments - [x] Add deprecation warnings about `val_batch_size` in `_validate_config`	2025-02-24 10:24:24 +08:00
Guangming Sheng	9db52329f6	[misc] feat: support offload parameter and optimizer during rollout (#284 ) - Fixed FSDP1 model offload - With `actor_rollout_ref.actor.fsdp_config.param_offload=True \` and `actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \ `. The GPU memory utilization can increase to 0.9 - With actor, critic and reference offload all enabled, there will only be one model copy at a time in the GPU memory. Therefore, we can further increase the `micro_batch_size_per_gpu` or `max_token_per_gpu` Specifically: - During rollout, only rollout model and KVCache are in the GPU memory. - During critic compute values, only the critic model will stay in the GPU memory while its optimizer and other model states are in CPU main memory - During actor update, the actor model, optimizer are stored on GPU while the reference model and critic model, critic optimizer are offloaded to CPU.	2025-02-17 14:07:43 +08:00
zhou fan	c8b9c3559a	fix the split placement example (#281 ) The split placement example is outdated, I tried it and encountered some errors. To address this, the following changes were made in this PR 1. Copied the content from `verl/trainer/config/ppo_trainer.yaml` to `examples/split_placement/config/ppo_trainer_split.yaml` 2. Copied `RayPPOTrainer.fit` method into the `fit` func in `examples/split_placement/split_monkey_patch.py` and modified it to get the futures of `critic_output` and `actor_output`	2025-02-16 00:18:34 +08:00
HL	e842b73dae	docs: add programming model guide (#230 )	2025-02-09 19:10:47 +08:00
Guangming Sheng	f2a76acd94	[BREAKING][misc] feat: change micro_batch_size to micro_batch_size_per_gpu (#136 ) ## Summary This PR changes all the micro_batch_size to micro_batch_size_per_gpu. The Core logic of setting batch size: - All algorithmic metrics (train batch size, ppo mini batch size): are global (from the perspective of single-controller), which will be normalized in each Worker. - All performance-related parameters (micro batch size, max token length in dynamic batch size) are local parameters, which represent the data sizes per GPU (i.e., each Worker). ## Main Changes 1. Change the scripts and config and delete the normalization for micro_bsz 2. Fix CI for SFT	2025-01-27 11:26:52 +08:00
Guangming Sheng	2997e5dee8	[rollout] feat: support best-of-n generation in vLLM (#80 )	2025-01-05 21:28:33 +08:00
Guangming Sheng	99d2c19b0e	[misc] feat: remove @ray.remote on workers to allow inheritance (#61 ) Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>	2024-12-21 11:06:11 -07:00
Guangming Sheng	f53597729e	[BREAKING][refact]: move actor/critic/hybrid_engine/reward_model/rollout/workers out of ppo directory for reuse (#52 ) * [refact] move actor/critic/hybrid_engine/reward_model/rollout/workers out of verl/trainer/ppo directory to verl/trainer directory for reusage * [misc] fix import * [refact] move actor/critic and others inside workers * fix uncommit files * fix uncommit files * [refact] move workers out of trainers	2024-12-18 14:34:19 +08:00
HL	d60f843091	[sft] feat: fix sft dataset with latest preprocess code (#49 ) * api: rename tracking logger to wandb logger type * [sft] feat: add tests for sft dataset * refresh dataset * force refresh * use ds model for tokenizer * add option for trainer.val_only * fix path * fix lint * add sft test for cot and raw q&a * add hf_tokenizer api to patch gemma tokenizer * fix test	2024-12-16 16:19:46 -08:00
HL	c2296e80df	api: rename tracking logger to wandb logger type (#47 )	2024-12-14 21:57:55 -08:00
Guangming Sheng	cd6cef609e	[BREAKING][core] move single_controller into verl directory (#45 ) * [BREAKING][core] move single_controller into verl directory * fix blocking flag in fsdp workers	2024-12-11 23:48:24 +08:00
Guangming Sheng	6e8667bd66	[example] add a split placement tutorial (#43 ) * [example] add a split placement tutorial * lint	2024-12-11 22:41:22 +08:00

49 Commits