49 Commits

Author SHA1 Message Date
60d8a62c0d [data, trainer] feat: add support for limiting samples from dataset (#3812)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

e.g.: For RLHFDataset, `filter_overlong_prompts` can be very expensive
and it will be good to add support to limit the sample size before we do
this when the dataset is very large.

Also add support for other kinds of datasets for unification.

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-10-20 20:28:48 +08:00
65eb019a81 [trainer] fix: Add data.seed to config (#3815) 2025-10-20 09:57:14 +08:00
ae5d8504d4 [trainer] feat: ReMax support using reward model for baseline (#3780)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

Not only limited to reward functions, we should also support using rm to
calculate the reward baseline.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-17 12:07:05 +08:00
7f27789961 [fsdp,doc] refactor: rename warmup_style@FSDPOptimizerConfig -> lr_scheduler_type (#3739)
### What does this PR do?

> Rename `warmup_style` in FSDPOptimizerConfig to `lr_scheduler_type` to
align with Hugging Face Trainer API。

The following pull request is for refactoring the optimizer, however,
the naming issue persists.
https://github.com/volcengine/verl/pull/3656 
### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: weiqi.li <weiqi.li@bytedance.com>
2025-10-13 15:58:59 +08:00
f346f96d29 [training_utils] fix: stop using math naming under reward score" (#3378) 2025-09-07 09:57:24 +08:00
6469be213e [recipe] fix: make compute of step consistent across all trainers (#3132)
### What does this PR do?
follow-up to #3117
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-20 09:54:29 +08:00
2bbd09245c [ray] feat: add support for ray init kwargs (#3049)
### What does this PR do?

This PR adds support for passing parameters to `ray.init`.
Users can now dynamically configure settings such as `address`, `port`,
`_temp_dir`, and more based on their specific needs.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

```bash
# when /tmp/ray/ is used by others
# when ray is initialized at 6379 by others
# when the dashboard is not accessible at localhost
# ...
bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh \
    +ray_kwargs.ray_init._temp_dir=/tmp/ray/my_dir \
    +ray_kwargs.ray_init.address=127.0.0.1:6378 \
    +ray_kwargs.ray_init.dashboard_host=0.0.0.0
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-15 20:02:56 +08:00
c0f99f3da2 [BREAKING] [ray, megatron] feat: remove RayMegatronWorker (#2895)
### What does this PR do?

- Following https://github.com/volcengine/verl/pull/2893, we can now
directly register dispatch and collect function inside the worker. So,
there is no need to maintain RayMegatronWorker and
RayMegatronWorkerGroup, which is a hacking solution

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-05 11:05:38 +08:00
d640f99219 [recipe] fix: fix issue when running split ppo (#2745) 2025-07-29 07:32:59 +08:00
7459131411 [hardware] refactor: replace device_name with config.trainer.device (#2542)
### What does this PR do?

In some methods, the get_device() method is redundant, and we plan to
replace get_deivce with config.trainer.device

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
2025-07-17 13:29:01 -07:00
a31a8f251f [doc] fix: quickstart example can't work on zsh (#2509)
### What does this PR do?

I followed the instructions at
https://verl.readthedocs.io/en/latest/start/quickstart.html to run the
PPO example on my devbox, which uses zsh. However, I got the error zsh:
no matches found: `trainer.logger=[console]` because `[]` is interpreted
as a glob pattern in zsh.

```
(verl) ➜  verl git:(20250713-devbox-2-tmux0-verl-2) ✗ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
 data.train_files=$HOME/data/gsm8k/train.parquet \
 data.val_files=$HOME/data/gsm8k/test.parquet \
 data.train_batch_size=256 \
 data.max_prompt_length=512 \
 data.max_response_length=256 \
 actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 actor_rollout_ref.actor.optim.lr=1e-6 \
 actor_rollout_ref.actor.ppo_mini_batch_size=64 \
 actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
 actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
 actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
 actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
 actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
 critic.optim.lr=1e-5 \
 critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 critic.ppo_micro_batch_size_per_gpu=4 \
 algorithm.kl_ctrl.kl_coef=0.001 \
 trainer.logger=['console'] \
 trainer.val_before_train=False \
 trainer.n_gpus_per_node=1 \
 trainer.nnodes=1 \
 trainer.save_freq=10 \
 trainer.test_freq=10 \
 trainer.total_epochs=15 2>&1 | tee verl_demo.log
zsh: no matches found: trainer.logger=[console]
```

This PR has 3 changes:
* `trainer.logger=['console']` -> `trainer.logger=console`
* `trainer.logger=['console','wandb']` ->
`trainer.logger='["console","wandb"]'`
* `trainer.logger=['console','tensorboard']` ->
`trainer.logger='["console","tensorboard"]'`

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

* `trainer.logger=console` (zsh)
<img width="898" height="564" alt="image"
src="https://github.com/user-attachments/assets/a957a493-75e6-462b-9974-6b1c4cdf5a80"
/>

* ``trainer.logger='["console","wandb"]'`` (zsh)
<img width="870" height="565" alt="image"
src="https://github.com/user-attachments/assets/e20613bf-2ccc-4653-b23f-90edc3d568d1"
/>

* `trainer.logger=console` (bash)
  ```bash
ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m
verl.trainer.main_ppo \
  >  data.train_files=$HOME/data/gsm8k/train.parquet \
  >  data.val_files=$HOME/data/gsm8k/test.parquet \
  >  data.train_batch_size=256 \
  >  data.max_prompt_length=512 \
  >  data.max_response_length=256 \
  >  actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  actor_rollout_ref.actor.optim.lr=1e-6 \
  >  actor_rollout_ref.actor.ppo_mini_batch_size=64 \
  >  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
  >  actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
  >  actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
  >  actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
  >  actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
  >  critic.optim.lr=1e-5 \
  >  critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  critic.ppo_micro_batch_size_per_gpu=4 \
  >  algorithm.kl_ctrl.kl_coef=0.001 \
  >  trainer.logger=console \
  >  trainer.val_before_train=False \
  >  trainer.n_gpus_per_node=1 \
  >  trainer.nnodes=1 \
  >  trainer.save_freq=10 \
  >  trainer.test_freq=10 \
  >  trainer.total_epochs=15 2>&1 | tee verl_demo.log
2025-07-14 02:52:27,669 INFO worker.py:1908 -- Started a local Ray
instance. View the dashboard at 127.0.0.1:8265
(TaskRunner pid=1799248) TaskRunner hostname: ip-172-31-9-244, PID:
1799248
(TaskRunner pid=1799248) {'actor_rollout_ref': {'actor': {'checkpoint':
{'load_contents': ['model',
(TaskRunner pid=1799248) 'optimizer',
(TaskRunner pid=1799248) 'extra'],
(TaskRunner pid=1799248) 'save_contents': ['model',
(TaskRunner pid=1799248) 'optimizer',
(TaskRunner pid=1799248) 'extra']},
  ```

* `trainer.logger='["console","wandb"]'` (bash)
  ```bash
ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m
verl.trainer.main_ppo \
  >  data.train_files=$HOME/data/gsm8k/train.parquet \
  >  data.val_files=$HOME/data/gsm8k/test.parquet \
  >  data.train_batch_size=256 \
  >  data.max_prompt_length=512 \
  >  data.max_response_length=256 \
  >  actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  actor_rollout_ref.actor.optim.lr=1e-6 \
  >  actor_rollout_ref.actor.ppo_mini_batch_size=64 \
  >  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
  >  actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
  >  actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
  >  actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
  >  actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
  >  critic.optim.lr=1e-5 \
  >  critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  critic.ppo_micro_batch_size_per_gpu=4 \
  >  algorithm.kl_ctrl.kl_coef=0.001 \
  >  trainer.logger='["console","wandb"]' \
  >  trainer.val_before_train=False \
  >  trainer.n_gpus_per_node=1 \
  >  trainer.nnodes=1 \
  >  trainer.save_freq=10 \
  >  trainer.test_freq=10 \
  >  trainer.total_epochs=15 2>&1 | tee verl_demo.log
2025-07-14 02:54:13,989 INFO worker.py:1908 -- Started a local Ray
instance. View the dashboard at 127.0.0.1:8265
(TaskRunner pid=1805000) TaskRunner hostname: ip-172-31-9-244, PID:
1805000
(TaskRunner pid=1805000) {'actor_rollout_ref': {'actor': {'checkpoint':
{'load_contents': ['model',
(TaskRunner pid=1805000) 'optimizer',
(TaskRunner pid=1805000) 'extra'],
(TaskRunner pid=1805000) 'save_contents': ['model',
(TaskRunner pid=1805000) 'optimizer',
(TaskRunner pid=1805000) 'extra']},
  ```

### API and Usage Example

No

### Design & Code Changes

No

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
2025-07-14 13:26:32 +08:00
4aa02fe166 [trainer] fix: Allow FSDP2 when doing strategy check (#2497)
### What does this PR do?

Allow FSDP2 when doing strategy check

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

For `strategy` field, now both "fsdp" and "fsdp2" are considered valid.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-07-12 16:31:31 -07:00
H
00a10a8ef3 [ci] refactor: reduce ruff line-length from 300 to 120 (#2287)
### What does this PR do?

Previously the ruff line-len is too large, making it hard for users to
view code. If we keep the config, manually created short lines will be
formatted to long lines as well. This PR contains 3 commits:
- df4bbfca62f41d972c48c8a76088ae2ac29691cf set line len to 120 and run
pre-commit auto-format
- 9d03f183edd9fff4e22215cacacf62c06b7b41d3 let devin fix the multi-line
code
- 9fc8d436f5007535fad3dc49983b01d0d457be9c skip lint for
test_sglang_async_rollout_sf_tools.py. manually adjust format for
rope_utils.py
- last two commits:
  1. merge with main
2. run lint after merge. add test_sglang_async_rollout_sf_tools.py and
scripts/legacy_model_merger.py to lint.exclude

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

This PR relies on CI for testing.


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-07-01 09:54:40 +08:00
86ef66ebe6 [trainer] fix: fix split placement (#2227) 2025-06-29 12:42:51 -07:00
ff750e2472 [trainer] fix: indentation error leading to critic_output.get() failure (#2143)
### What does this PR do?

This PR addresses an `IndentationError` that was causing the
`critic_output.get()` call to fail when `self.use_critic` was false.

### Checklist Before Starting

- [x] Search for similar PRs. [The PR cause the
problem](https://github.com/volcengine/verl/pull/281)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> None. This is just a simple bug fix involving a few lines of code.

```python
# Add code snippet or script demonstrating how to use this
```

### High-Level Design

> This is just a simple bug fix involving a few lines of code.

### Specific Changes

> This is just a simple bug fix involving a few lines of code.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-26 14:50:09 -07:00
68d62518ce [misc] fix: fix timer importance error in split_placement (#2169)
### What does this PR do?

fix timer importance error in split_placement, should use `from
verl.trainer.ppo.ray_trainer import marked_timer`, but got `from
verl.trainer.ppo.ray_trainer import _timer` now.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

Not related.

### API and Usage Example

Not related.

### High-Level Design

Not related.

### Specific Changes

fix timer importance error in split_placement, should use `from
verl.trainer.ppo.ray_trainer import marked_timer`, but got `import
_timer` now.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-24 17:06:12 +08:00
ca65c363fb [hardware] refactor: refactor part of device management (#1974)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [x] In format of: [modules] type: Title
- [x] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [x] type is in `feat, fix, refactor, chore`
- [x] can involve multiple modules, seperated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Refactor device management such as `torch.cuda` and `nccl` in most part
of code in `verl/recipe` and `verl/verl`, which is more convinent for
supporting other devices or platforms.

### Test

Not related.

### High-Level Design

Not related.

### Specific Changes

1. use `get_torch_device()` to get corresponding `torch.device()` object
based on specific device.
2. use `get_device_id()` to get corresponding device rank index based on
specific device.
3. use `get_nccl_backend()` to get corresponding nccl backend based on
specific device.

### API

Not related.

### Usage Example

Monifications in this PR should not be perceived.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-14 20:53:47 +08:00
0bd03d7c05 [FSDP] feat: Add FSDP forward pefetch and recompute chunking entropy (#1927)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

1. Add fsdp1 forward pefetch configuration.
2. Add chunk entropy computation.
3. Add torch.checkpoint to entropy computation.
4. Move data to device from `ActorRolloutRefWorker.update_actor` to
`DataParallelPPOActor.update_policy`.
5. Add `npu_cross_entropy_loss` fusion kernel.

### High-Level Design

1. More detail see [FSDP
forward_pefetch](https://docs.pytorch.org/docs/stable/fsdp.html#module-torch.distributed.fsdp)
2. `logits` usually is a large tensor [bsz\*seq_len, voc], on
`compute_entropy_from_logits` will use [bsz\*seq_len, voc] * (4(float32)
+ 2(autocast of softmax+logsumexp) + 1(output of softmax)) memory. To
reduce this memory peak, we can use chunk calculation, changing
[bsz*seq_len, voc] to [chunk_size(2048), voc].
3. During the training phase, `enable_gradient_checkpointing=True` is
not applicable to entropy calculation, so add the recomputation function
of entropy to reduce the memory peak during the training phase.
4. On `ActorRolloutRefWorker.update_actor` all batch data is moved to
the device, but this is unnecessary,
`DataParallelPPOActor.update_policy` will move the data to the device
for each micro batch.


### Specific Changes

> List the specific changes.

### API

Add 3 new configurations in actor/ref, 1 new configuration in
critic/reward.

- actor_rollout_ref.actor.fsdp_config.forward_prefetch: False
- actor_rollout_ref.actor.entropy_from_logits_with_chunking: False
- actor_rollout_ref.actor.entropy_checkpointing: False
- actor_rollout_ref.ref.fsdp_config.forward_prefetch: False
- actor_rollout_ref.ref.entropy_from_logits_with_chunking: False
- actor_rollout_ref.ref.entropy_checkpointing: False
- critic.model.fsdp_config.forward_prefetch: False
- reward_model.model.fsdp_config.forward_prefetch: False


### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluatuion results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-11 12:52:19 +08:00
70bd3d3d6b [feat] Wandb Timing: Add more detailed timing of gen_sequence and weights resharding (#1834) 2025-06-07 07:45:50 +08:00
263115cd9d [dev] fix: note that DP balancing doesn't affect advantage calculation (#1809)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR fixes the comments about DP balancing.

btw, it adds the DP balancing option in the PRIME trainer, while keeping
the default value as `False`.

### Additional Info.

- **Issue Number**: #1718 
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-06-03 10:20:54 +08:00
H
2aed8d0a45 [BREAKING] config: set the default value of actor.entropy_coeff to 0 (#1770)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

entropy_coeff shall be set carefully during RL. When enabled,
inappropriate coefficient may case training to collapse. You can see
more empirical experiments from Skywork Open Reasoner 1 Technical Report
(https://arxiv.org/pdf/2505.22312).

In this PR, the default value of entropy_coeff is set to 0. This is a
breaking change that may affect your experiment, although majority of
verl example scripts set it to 0 manually already.

We let most example script just pick up the default value of 0 for
entropy_coeff. For a few documentation page where the reference model
performance and commands are provided, we modify the doc so that the
experiment result is consistent with the config setup.

### Usage Example

To enable entropy loss coefficient, use 
```bash
actor_rollout_ref.actor.entropy_coeff=0.001 # or other values
```

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluatuion results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-30 14:42:53 +08:00
8653b1b200 [misc] feat: support return full prompt with chat template in RLHFDataset (#1567) 2025-05-19 01:13:21 +08:00
H
c3b20575d2 [util] docs: add docstrings to metric util functions that recipes reuse (#1395)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

In `/recipes`, a few functions under `trainer/ppo/metric_utils` are
imported and reused. Right now many of them are task dependent and
assume specific keys in the input metric dict.

To make these functions more robust and backward compatible, a few tests
are added. Additionally, one method is moved to verl.utils as a public
API due to its general purpose nature. A API doc page is added
correspondingly.

In order to make it easy for others to customize verl trainers, many
more other classes require further documentations, such as:
- AdvantageEstimator, RayPPOTrainer, apply_kl_penalty, compute_advantage
- from verl.single_controller.ray import RayWorkerGroup
- from verl.trainer.ppo.core_algos import agg_loss
- from verl.trainer.ppo.ray_trainer import ResourcePoolManager, Role,
WorkerType
- from verl.utils.checkpoint.checkpoint_manager import
find_latest_ckpt_path

They shall be enhanced in future PRs. 

### High-Level Design

None

### Specific Changes

- added tests
- added verl.utils.metric namespace

### API

`verl.trainer.ppo.metric_utils.reduce_metrics` changed to
`verl.utils.metric.reduce_metrics`. deprecation warnings are added.

### Usage Example

None

### Test

Added

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
https://github.com/volcengine/verl/issues/1354
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if neccessary.

---------

Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-12 08:49:14 +08:00
8e5ad4688a [Lint] fix: linting errors in all files (#1280)
This PR enables checking on all files after fixing all the errors:

```
examples/data_preprocess/geo3k.py:41:121: E501 Line too long (121 > 120)
examples/data_preprocess/multiturn.py:54:121: E501 Line too long (185 > 120)
examples/data_preprocess/multiturn.py:59:121: E501 Line too long (210 > 120)
examples/data_preprocess/multiturn.py:73:121: E501 Line too long (229 > 120)
examples/data_preprocess/multiturn.py:78:121: E501 Line too long (211 > 120)
examples/ray/tutorial.ipynb:cell 9:1:121: E501 Line too long (179 > 120)
examples/ray/tutorial.ipynb:cell 15:1:121: E501 Line too long (143 > 120)
examples/ray/tutorial.ipynb:cell 42:14:1: E402 Module level import not at top of cell
recipe/prime/prime_dp_rm.py:145:121: E501 Line too long (153 > 120)
recipe/prime/prime_dp_rm.py:156:121: E501 Line too long (137 > 120)
recipe/prime/prime_dp_rm.py:292:121: E501 Line too long (148 > 120)
recipe/r1/data_process.py:56:121: E501 Line too long (289 > 120)
recipe/r1/data_process.py:113:121: E501 Line too long (166 > 120)
recipe/r1/data_process.py:118:121: E501 Line too long (137 > 120)
recipe/r1/data_process.py:123:121: E501 Line too long (297 > 120)
recipe/r1/data_process.py:131:9: E722 Do not use bare `except`
recipe/r1/tasks/livecodebench.py:61:5: E722 Do not use bare `except`
scripts/diagnose.py:55:9: F841 Local variable `ip` is assigned to but never used
scripts/diagnose.py:165:13: B028 No explicit `stacklevel` keyword argument found
scripts/model_merger.py:42:121: E501 Line too long (184 > 120)
scripts/model_merger.py:146:13: E722 Do not use bare `except`
tests/e2e/arithmetic_sequence/model/create_model_tokenizer.py:28:121: E501 Line too long (440 > 120)
tests/gpu_utility/test_memory_buffers.py:42:5: F841 Local variable `model_named_params` is assigned to but never used
tests/gpu_utility/test_memory_buffers.py:43:5: F841 Local variable `model_copy_named_params` is assigned to but never used
tests/gpu_utility/test_memory_buffers.py:53:5: F841 Local variable `model_wrapper` is assigned to but never used
tests/model/test_transformers_ulysses.py:102:5: F841 Local variable `response_length` is assigned to but never used
tests/model/test_transformers_ulysses.py:181:5: F841 Local variable `response_length` is assigned to but never used
tests/ray/detached_worker/server.py:83:13: F841 Local variable `vpp_rank` is assigned to but never used
tests/ray/test_check_worker_alive.py:37:121: E501 Line too long (121 > 120)
tests/rollout/run_fsdp_vllm.py:22:64: F811 Redefinition of unused `ShardingStrategy` from line 20
tests/rollout/test_sglang_spmd.py:210:121: E501 Line too long (157 > 120)
tests/rollout/test_vllm_spmd.py:20:64: F811 Redefinition of unused `ShardingStrategy` from line 18
tests/sandbox/test_sandbox.py:86:121: E501 Line too long (1615 > 120)
tests/sandbox/test_sandbox.py:87:121: E501 Line too long (1596 > 120)
tests/sanity/check_license.py:22:1: E402 Module level import not at top of file
tests/sanity/check_license.py:23:1: E402 Module level import not at top of file
tests/verl/utils/dataset/test_rl_dataset.py:23:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_rm_dataset.py:22:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_rm_dataset.py:36:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
tests/verl/utils/dataset/test_sft_dataset.py:22:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_sft_dataset.py:50:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
tests/verl/utils/dataset/test_sft_dataset.py:75:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/__init__.py:22:1: E402 Module level import not at top of file
verl/__init__.py:24:1: E402 Module level import not at top of file
verl/__init__.py:25:1: E402 Module level import not at top of file
verl/__init__.py:29:1: E402 Module level import not at top of file
verl/__init__.py:29:15: F401 `.single_controller` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:16:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:18:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:20:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:21:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:22:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:24:5: F401 `.modeling_llama_megatron.ParallelLlamaModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/checkpoint_utils/llama_loader.py:92:121: E501 Line too long (168 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:92:121: E501 Line too long (168 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:274:121: E501 Line too long (127 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelLlamaAttention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelLlamaDecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelLlamaDecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelLlamaMLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelLlamaRMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/parallel_attention.py:196:121: E501 Line too long (134 > 120)
verl/models/llama/megatron/layers/parallel_attention.py:341:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:342:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:343:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:366:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:420:121: E501 Line too long (122 > 120)
verl/models/llama/megatron/layers/parallel_linear.py:82:1: E402 Module level import not at top of file
verl/models/mcore/loader.py:273:121: E501 Line too long (134 > 120)
verl/models/mcore/util.py:26:121: E501 Line too long (202 > 120)
verl/models/qwen2/megatron/__init__.py:16:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:18:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:20:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:21:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:22:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:24:5: F401 `.modeling_qwen2_megatron.ParallelQwen2Model` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:90:121: E501 Line too long (169 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:256:121: E501 Line too long (172 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:90:121: E501 Line too long (169 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:272:121: E501 Line too long (127 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelQwen2Attention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelQwen2DecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelQwen2DecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelQwen2MLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelQwen2RMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/parallel_attention.py:163:121: E501 Line too long (134 > 120)
verl/models/qwen2/megatron/layers/parallel_attention.py:282:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:283:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:284:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:307:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:361:121: E501 Line too long (122 > 120)
verl/models/qwen2/megatron/modeling_qwen2_megatron.py:630:121: E501 Line too long (130 > 120)
verl/models/transformers/llama.py:106:121: E501 Line too long (180 > 120)
verl/models/transformers/llama.py:214:121: E501 Line too long (128 > 120)
verl/models/transformers/llama.py:215:121: E501 Line too long (135 > 120)
verl/models/transformers/monkey_patch.py:145:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:146:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:148:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:157:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/models/transformers/qwen2.py:215:121: E501 Line too long (128 > 120)
verl/models/transformers/qwen2.py:216:121: E501 Line too long (135 > 120)
verl/protocol.py:303:121: E501 Line too long (125 > 120)
verl/protocol.py:352:121: E501 Line too long (171 > 120)
verl/protocol.py:578:121: E501 Line too long (142 > 120)
verl/protocol.py:580:121: E501 Line too long (150 > 120)
verl/protocol.py:583:121: E501 Line too long (167 > 120)
verl/protocol.py:715:1: E402 Module level import not at top of file
verl/protocol.py:725:121: E501 Line too long (121 > 120)
verl/protocol.py:766:1: E402 Module level import not at top of file
verl/protocol.py:768:1: E402 Module level import not at top of file
verl/single_controller/__init__.py:23:1: E402 Module level import not at top of file
verl/single_controller/__init__.py:24:1: E402 Module level import not at top of file
verl/single_controller/base/decorator.py:149:16: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/single_controller/base/decorator.py:198:121: E501 Line too long (134 > 120)
verl/single_controller/base/decorator.py:310:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/single_controller/base/worker.py:137:121: E501 Line too long (131 > 120)
verl/single_controller/base/worker_group.py:89:33: G003 Logging statement uses `+`
verl/single_controller/base/worker_group.py:202:21: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/single_controller/ray/__init__.py:15:19: F401 `.base.RayClassWithInitArgs` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:41: F401 `.base.RayResourcePool` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:58: F401 `.base.RayWorkerGroup` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:74: F401 `.base.create_colocated_worker_cls` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/third_party/sglang/parallel_state.py:135:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/__init__.py:40:40: F401 `.vllm_v_0_6_3.llm.LLMEngine` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/third_party/vllm/__init__.py:45:22: F401 `vllm.LLM` imported but unused
verl/third_party/vllm/__init__.py:46:34: F401 `vllm.distributed.parallel_state` imported but unused
verl/third_party/vllm/__init__.py:50:121: E501 Line too long (141 > 120)
verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py:189:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_5_4/llm.py:136:121: E501 Line too long (132 > 120)
verl/third_party/vllm/vllm_v_0_5_4/llm.py:196:121: E501 Line too long (161 > 120)
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:174:5: F811 Redefinition of unused `llama_megatron_core_te_weight_loader` from line 90
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:205:5: F811 Redefinition of unused `llama_megatron_core_weight_loader` from line 121
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:254:121: E501 Line too long (150 > 120)
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:21: F811 Redefinition of unused `LoadConfig` from line 24
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:45: F811 Redefinition of unused `ModelConfig` from line 26
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:323:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:127:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:245:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120)
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120)
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/worker.py:220:121: E501 Line too long (127 > 120)
verl/third_party/vllm/vllm_v_0_6_3/config.py:46:92: B026 Star-arg unpacking after a keyword argument is strongly discouraged
verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py:225:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_6_3/llm.py:141:121: E501 Line too long (132 > 120)
verl/third_party/vllm/vllm_v_0_6_3/llm.py:169:121: E501 Line too long (161 > 120)
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:52:24: F811 Redefinition of unused `EngineArgs` from line 35
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:21: F811 Redefinition of unused `LoadConfig` from line 25
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:33: F811 Redefinition of unused `ModelConfig` from line 27
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:354:9: F841 Local variable `distributed_executor_backend` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:360:121: E501 Line too long (152 > 120)
verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:199:5: F841 Local variable `params_mapping` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:229:121: E501 Line too long (150 > 120)
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:21: F811 Redefinition of unused `LoadConfig` from line 22
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:45: F811 Redefinition of unused `ModelConfig` from line 22
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:312:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:21: F811 Redefinition of unused `LoadConfig` from line 27
verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:33: F811 Redefinition of unused `ModelConfig` from line 29
verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:129:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:247:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120)
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120)
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/worker.py:217:121: E501 Line too long (127 > 120)
verl/trainer/fsdp_sft_trainer.py:298:121: E501 Line too long (158 > 120)
verl/trainer/fsdp_sft_trainer.py:501:121: E501 Line too long (121 > 120)
verl/trainer/fsdp_sft_trainer.py:550:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:551:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:553:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:553:43: F811 Redefinition of unused `FSDPSFTTrainer` from line 82
verl/trainer/fsdp_sft_trainer.py:554:1: E402 Module level import not at top of file
verl/utils/__init__.py:16:24: F401 `.tokenizer.hf_processor` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/__init__.py:16:38: F401 `.tokenizer.hf_tokenizer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/checkpoint/checkpoint_manager.py:48:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/fsdp_checkpoint_manager.py:51:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/fsdp_checkpoint_manager.py:56:13: B028 No explicit `stacklevel` keyword argument found
verl/utils/checkpoint/fsdp_checkpoint_manager.py:81:121: E501 Line too long (121 > 120)
verl/utils/checkpoint/fsdp_checkpoint_manager.py:98:121: E501 Line too long (124 > 120)
verl/utils/checkpoint/megatron_checkpoint_manager.py:64:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/megatron_checkpoint_manager.py:219:121: E501 Line too long (124 > 120)
verl/utils/dataset/__init__.py:15:25: F401 `.rl_dataset.RLHFDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/__init__.py:16:25: F401 `.rm_dataset.RMDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/__init__.py:17:26: F401 `.sft_dataset.SFTDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/multiturn_sft_dataset.py:96:9: F841 Local variable `current_length` is assigned to but never used
verl/utils/dataset/sft_dataset.py:95:79: B023 Function definition does not bind loop variable `key`
verl/utils/dataset/sft_dataset.py:103:83: B023 Function definition does not bind loop variable `key`
verl/utils/debug/__init__.py:15:26: F401 `.performance.GPUMemoryLogger` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/debug/__init__.py:15:43: F401 `.performance.log_gpu_memory_usage` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/debug/performance.py:68:121: E501 Line too long (127 > 120)
verl/utils/debug/performance.py:71:121: E501 Line too long (126 > 120)
verl/utils/debug/profile.py:15:1: I001 [*] Import block is un-sorted or un-formatted
verl/utils/debug/profile.py:19:15: UP039 [*] Unnecessary parentheses after class definition
verl/utils/debug/profile.py:50:23: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:52:49: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:53:47: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:54:67: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:54:121: E501 Line too long (122 > 120)
verl/utils/flops_counter.py:175:121: E501 Line too long (124 > 120)
verl/utils/hdfs_io.py:135:32: G004 Logging statement uses f-string
verl/utils/import_utils.py:78:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/utils/logger/aggregate_logger.py:46:121: E501 Line too long (131 > 120)
verl/utils/logger/aggregate_logger.py:64:41: G004 Logging statement uses f-string
verl/utils/megatron/tensor_parallel.py:152:121: E501 Line too long (123 > 120)
verl/utils/megatron_utils.py:17:1: I001 [*] Import block is un-sorted or un-formatted
verl/utils/megatron_utils.py:22:20: F401 [*] `torch.nn` imported but unused
verl/utils/megatron_utils.py:34:38: F401 [*] `verl.utils.memory_buffer.build_memory_reference_from_module` imported but unused
verl/utils/megatron_utils.py:332:30: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access.
verl/utils/megatron_utils.py:366:27: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access.
verl/utils/model.py:464:121: E501 Line too long (124 > 120)
verl/utils/rendezvous/ray_backend.py:39:25: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:41:22: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:63:30: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:65:30: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:72:26: G004 Logging statement uses f-string
verl/utils/reward_score/gsm8k.py:47:121: E501 Line too long (201 > 120)
verl/utils/reward_score/math.py:213:121: E501 Line too long (142 > 120)
verl/utils/reward_score/prime_code/__init__.py:16:8: F401 `re` imported but unused
verl/utils/reward_score/prime_code/testing_util.py:131:121: E501 Line too long (688 > 120)
verl/utils/reward_score/prime_code/testing_util.py:168:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:222:9: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:254:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:255:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:259:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:260:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:264:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:265:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:269:121: E501 Line too long (132 > 120)
verl/utils/reward_score/prime_code/testing_util.py:293:21: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:294:25: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:335:121: E501 Line too long (165 > 120)
verl/utils/reward_score/prime_code/testing_util.py:386:121: E501 Line too long (209 > 120)
verl/utils/reward_score/prime_code/testing_util.py:390:121: E501 Line too long (183 > 120)
verl/utils/reward_score/prime_code/testing_util.py:455:121: E501 Line too long (211 > 120)
verl/utils/reward_score/prime_code/testing_util.py:459:121: E501 Line too long (185 > 120)
verl/utils/reward_score/prime_code/testing_util.py:582:121: E501 Line too long (197 > 120)
verl/utils/reward_score/prime_code/testing_util.py:586:121: E501 Line too long (171 > 120)
verl/utils/reward_score/prime_math/__init__.py:106:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:119:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:246:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:315:121: E501 Line too long (128 > 120)
verl/utils/reward_score/prime_math/__init__.py:331:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:407:1: E402 Module level import not at top of file
verl/utils/reward_score/prime_math/__init__.py:429:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading
verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading
verl/utils/reward_score/prime_math/math_normalize.py:54:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:70:17: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:101:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:181:121: E501 Line too long (142 > 120)
verl/utils/tokenizer.py:30:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/tokenizer.py:33:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/tokenizer.py:55:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/torch_functional.py:86:72: E741 Ambiguous variable name: `l`
verl/utils/torch_functional.py:177:5: F841 Local variable `total_params` is assigned to but never used
verl/utils/torch_functional.py:397:1: E402 Module level import not at top of file
verl/utils/torch_functional.py:399:1: E402 Module level import not at top of file
verl/utils/torch_functional.py:400:1: E402 Module level import not at top of file
verl/utils/ulysses.py:246:5: F841 Local variable `sp_size` is assigned to but never used
verl/workers/actor/dp_actor.py:244:13: F841 Local variable `response_mask` is assigned to but never used
verl/workers/actor/megatron_actor.py:22:1: I001 [*] Import block is un-sorted or un-formatted
verl/workers/actor/megatron_actor.py:85:121: E501 Line too long (122 > 120)
verl/workers/actor/megatron_actor.py:86:121: E501 Line too long (128 > 120)
verl/workers/actor/megatron_actor.py:89:121: E501 Line too long (133 > 120)
verl/workers/actor/megatron_actor.py:96:121: E501 Line too long (126 > 120)
verl/workers/actor/megatron_actor.py:175:121: E501 Line too long (135 > 120)
verl/workers/actor/megatron_actor.py:237:121: E501 Line too long (150 > 120)
verl/workers/actor/megatron_actor.py:243:121: E501 Line too long (144 > 120)
verl/workers/actor/megatron_actor.py:245:121: E501 Line too long (130 > 120)
verl/workers/actor/megatron_actor.py:247:121: E501 Line too long (122 > 120)
verl/workers/actor/megatron_actor.py:286:9: F841 Local variable `input_shapes` is assigned to but never used
verl/workers/critic/dp_critic.py:227:21: F841 Local variable `input_ids` is assigned to but never used
verl/workers/critic/dp_critic.py:230:21: F841 Local variable `position_ids` is assigned to but never used
verl/workers/megatron_workers.py:18:1: I001 [*] Import block is un-sorted or un-formatted
verl/workers/reward_manager/__init__.py:15:20: F401 `.batch.BatchRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:16:19: F401 `.dapo.DAPORewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:17:20: F401 `.naive.NaiveRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:18:20: F401 `.prime.PrimeRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/prime.py:61:121: E501 Line too long (217 > 120)
verl/workers/reward_model/__init__.py:15:19: F401 `.base.BasePPORewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_model/megatron/__init__.py:15:27: F401 `.reward_model.MegatronRewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_model/megatron/reward_model.py:65:9: F841 Local variable `ori_bs` is assigned to but never used
verl/workers/reward_model/megatron/reward_model.py:89:121: E501 Line too long (132 > 120)
verl/workers/reward_model/megatron/reward_model.py:215:9: F841 Local variable `input_shapes` is assigned to but never used
verl/workers/rollout/naive/__init__.py:15:28: F401 `.naive_rollout.NaiveRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/rollout/sglang_rollout/__init__.py:14:29: F401 `.sglang_rollout.SGLangRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:22:121: E501 Line too long (129 > 120)
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:51:121: E501 Line too long (157 > 120)
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:153:13: F841 Local variable `log_probs` is assigned to but never used
verl/workers/rollout/vllm_rollout/vllm_rollout.py:22:121: E501 Line too long (129 > 120)
verl/workers/rollout/vllm_rollout/vllm_rollout.py:60:121: E501 Line too long (157 > 120)
verl/workers/sharding_manager/__init__.py:16:5: F401 `verl.utils.import_utils.is_megatron_core_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:17:5: F401 `verl.utils.import_utils.is_sglang_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:21:19: F401 `.base.BaseShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:22:27: F401 `.fsdp_ulysses.FSDPUlyssesShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:29:121: E501 Line too long (149 > 120)
verl/workers/sharding_manager/__init__.py:32:121: E501 Line too long (126 > 120)
verl/workers/sharding_manager/fsdp_sglang.py:99:9: F841 Local variable `load_format` is assigned to but never used
verl/workers/sharding_manager/fsdp_sglang.py:123:121: E501 Line too long (178 > 120)
verl/workers/sharding_manager/fsdp_ulysses.py:59:13: F841 Local variable `sp_size` is assigned to but never used
Found 305 errors.
```

---------

Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
2025-04-27 15:24:30 -07:00
103f90113f [dev] fix: instructions about merging from before using ruff (#1180)
Our pre-commit hook and CI action only check the changes for now.

In this PR,

1. We apply `ruff check --fix` and `ruff format`.
2. We remove the unnecessary pipeline from the immigration warning,
since directly merging without applying `ruff`, which might cause extra
conflicts, is the best way to avoid introducing extra file changes.
2025-04-20 13:51:46 -07:00
725c67666f [ray] fix: ray hang due to num_cpus (#1009)
Fixing #523 according to
https://github.com/volcengine/verl/issues/523#issuecomment-2723652147

Concern: will `num_cpus=1` limit the performance of the cluster
scheduler?
2025-04-20 12:50:17 -07:00
6d8f2f6ab9 [algo] feat: Add DrGRPO (#990)
https://github.com/volcengine/verl/issues/742

- Add an option for disabling standard-deviation normalization of
advantages in GRPO.
- This completes one out of two algorithmic changes made by Dr.GRPO to
GRPO, the other one being the removal of sequence-length averaging
during loss aggregation.
2025-04-20 08:44:45 -07:00
b00f77d855 [dev] feat: immigrate from yapf & pylint to ruff based on pre-commit (#1010)
> [!WARNING]
> We are [immigrating to `ruff` as the linter and formatter and
`pre-commit` as the managing
tool](https://github.com/volcengine/verl/pull/1010).
>
> If your branch is based on a previous commit using `yapf` and
`pylint`, simply merging might trigger overwhelming linting errors,
while **you are only expected to resolve ones in the files related to
your PR**.
>
> To resolve this issue, please try the following workaround to only
include the files you **really changed** in the PR:
>
> 1. In your branch, fix linting and format with `ruff`: `ruff check
--fix && ruff-format`
> 2. Squash into a single commit in a new branch: `git reset --soft
$(git merge-base main HEAD) && git add -A && git commit -m "feat: ..."`
> 3. Merge with the latest main: `git merge origin/main`
> 4. Force push to your branch: `git push --force`

We add the reminder above to the documentation to tell contributors how
to avoid overwhelming linting errors.

### Motivation

According to dicussion in #896, this PR immigrates from yapf & pylint to
ruff based on pre-commit, which allows unified version control and
automatic hook on committing.

### Summary

The `pre-commit` hook and CI

- checks staged / committed files in commits / PR's
- checks all files each month (This should fail before we fix all the
files by the ruff standard)

### Explanation for the Failing CI Workflow `pre-commit`

For now, we only apply `ruff format` and `ruff check --fix` **without
resolving all the errors**, since there are too many errors to resolve,
which causes the CI workflow `pre-commit` fails.

For resolving the remaining errors, we leave to future commits.
Specifically, the `pre-commit` hook and CI will require every commit to
fix its related files with `ruff`, which will fix all the files
incrementally.

### Reviewing Suggestion

The commit
3d93f51ba8
is huge since we apply `ruff` to all the files. To review the main
changes, please check the commits before and after it.
2025-04-18 07:49:31 -07:00
3a27a98647 [recipe] feat: integrate DAPO and provide reproduction script (#623)
> [!WARNING]
> As mentioned in
https://github.com/volcengine/verl/pull/623#issuecomment-2733688593, the
implementation of gradient accumulation in verl has been only compatible
with the sequence-mean loss, but all the DAPO experiments with the
token-mean loss were run with the incompatible implementation.
> **We keep it as is for reproducibility in this branch** and will fix
it in another PR for the main branch.

---------

Co-authored-by: Guangming Sheng <shengguangming@bytedance.com>
Co-authored-by: Guangming Sheng <petershengwhu@gmail.com>
2025-04-04 05:46:47 +08:00
072fc9feed feat: support no reference model; fix KL issues (#644)
### Before get started

Difference between KL penalty in reward and KL loss

>  [!TIP]
>
>  1. In-reward KL penalty
>
>
>  $$
> r_t = r_{\varphi}(q, o_{\leq t}) - \beta\ \boxed{\log
\frac{\pi_{\theta}(o_t | q, o_{<t})}{\pi_{\text{ref}}(o_t | q, o_{<t})}}
>  $$
>
>  2. KL Loss
>
>  $$
> L^{\text{PPO}}(\theta) = \mathbb{E}_t [ \min(ratio_t A_t,
\text{clip}(ratio_t, 1 - \epsilon, 1 + \epsilon) A_t) ]
>  $$
>
>  $$
>  \- \beta\ \boxed{D_{\text{KL}}(\pi_{\theta} || \pi_{\text{ref}})}
>  $$

### Problems

1. The current code doesn't support not using reference model

This feature is half-implemented since the very first commit but never
completed, e.g., `RayPPOTrainer` has an attribute `use_reference_policy`
but it's always True since role_worker_mapping always has
`Role.RefPolicy`.

2. Restriction of `use_kl_loss` 

Currently, `use_kl_loss` determines whether to use in-reward kl penalty
or kl loss. So we can not use **both or neither**.


87a813658f/verl/trainer/ppo/ray_trainer.py (L875-L879)


87a813658f/verl/workers/actor/dp_actor.py (L299-L307)

>  [!CAUTION]  
>
>  ### You may have unintentionally adopted in-reward KL penalty
>
> For the experiments you've conducted, if you set
`actor.use_kl_loss`=False or didn't set it (Default is False),***You
unintentionally used in-reward KL penalty.*** If you don't want any KL,
you should set `actor_rollout_ref.actor.use_kl_loss=False` and
`algorithm.use_kl_in_reward=False` (or not to set them because they are
the default config) after this commit.

3. Deprecated config

After investigation, I guess Critic may used to be responsible for
in-reward KL. But this feature seems paralyzed.

1. Line 290, there may used to be `config.algorithm.kl_ctrl.target_kl`
and `config.critic.kl_ctrl.horizon` , which are not supported currently.


3ec83117c3/verl/trainer/ppo/ray_trainer.py (L289-L293)

2. In `verl/workers/critic/megatron_critic.py` : redundant set of
`self.kl_ctrl`


3b18b0eb74/verl/workers/critic/megatron_critic.py (L69-L73)


### What’s Changed?

1. Add support for not using reference model
2. Fixed the incomplete code of the KL controller.
3. A test case for using both kl terms
4. Some other misc issues in the code.

### How to disable reference model

* set `actor_rollout_ref.actor.use_kl_loss=False` and
`algorithm.use_kl_in_reward=False` (They are by default False, so you
can simply not set them)
2025-04-01 10:14:38 +08:00
7fbf609197 [BREAKING config] feat: add mlflow val generation log and uri config (#822)
### Changes

- Add mlflow validation generation in `ValidationGenerationsLogger` in
the form of MLFlow artifact files.
- Add the config of `MLFLOW_TRACKING_URI` in mlflow tracking. 
- rename `val_generations_to_log_to_wandb` to `log_val_generations`

### Test
Tested in the self-host mlflow servers.
2025-03-30 08:44:36 -07:00
50cba4aab9 docs: update checkpoint doc (#800)
Also fix some APIs.
2025-03-28 21:27:01 -07:00
22657bade5 [config] feat: lr_warmup_steps (#564)
This PR adds the `lr_warmup_steps` configuration.

Note the `num_warmup_steps` is prior to `lr_warmup_steps_ratio`.
2025-03-14 16:09:12 +08:00
386cfabed2 [misc] feat: make filter long prompt an option (#506)
# Background

In RLHFDataset, we filter out prompts that are too long. This requires
apply_chat_template to the whole dataset, which is not scalable when the
dataset is large.
https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L132

Instead of performing filtering online, we probably want to move this
process offline and add an assertion to avoid truncation or simply
perform truncation

Reference: #502 

# Key Changes

- Add an option `data.filter_overlong_prompts=True \` to enable the
above data filtering. The default value is set to False, but we enable
it for all the example scripts.
- Add an option `data.truncation` to truncate the input_ids or prompt
length if they
exceed max_prompt_length. The default is 'error', which does not allow
the
max_prompt_length to be exceeded. The users should increase the
max_prompt_length if
  throwing the error. You can also set `left` and `right`.

### Suggestion for large-scale dataset.
For large-scale datasets, filtering overlong prompts could be
time-consuming. You should set `data.filtering_overlong_prompts=False`
and set `truncation='left'`. Also, please note that you should increase
`data.max_prompt_length` to avoid over-truncation of the prompts.
2025-03-07 19:27:25 +08:00
3165d98894 fix: (1) skipped last step (2) redundant validation and logging (#409)
This PR solves these 2 following problems.

1. Last step skipped

`self.global_steps += 1` before if `self.global_steps >=
self.total_training_steps` makes the last step skipped.

We start from step 1, and we expect `self.total_training_steps` in
total.


82b38e25c7/verl/trainer/ppo/ray_trainer.py (L999-L1001)

   When `self.global_steps == self.total_training_steps-1`:

   * we have only executed `self.total_training_steps-1` steps

   * `self.global_steps` is updated to `self.total_training_steps`
* `self.global_steps >= self.total_training_steps` is satisfied, and the
training ends.

   Therefore, we should put `self.global_steps += 1` at last

2. redundant validation and logging

If `self.total_training_steps % self.config.trainer.test_freq == 0` :

   * `self._validate()` will be executed twice 

1.
82b38e25c7/verl/trainer/ppo/ray_trainer.py (L984)

2.
82b38e25c7/verl/trainer/ppo/ray_trainer.py (L1005)

   * logging will also be executed twice

1.
82b38e25c7/verl/trainer/ppo/ray_trainer.py (L985)
and
82b38e25c7/verl/trainer/ppo/ray_trainer.py (L997)
2.
82b38e25c7/verl/trainer/ppo/ray_trainer.py (L1007)
2025-03-06 23:35:04 +08:00
HL
2440aa6993 apis: add data proto to documentation page. use copy_to_local instead of copy_local_path_from_hdfs (#358) 2025-02-26 14:16:44 +08:00
3b1aef2f5e [Fix] Using an enumeration class to avoid spelling errors in adv_esti… (#377)
#369

---------

Co-authored-by: Thom <zhangyi@zhangyideMacBook-Pro.local>
2025-02-25 19:01:34 +08:00
4011f407b0 [Fix] Deprecate val_batch_size (#353)
Validation datasets are sent to inference engines as a whole batch,
which will schedule the memory themselves.

- [x] Remove `val_batch_size` from examples
- [x] Set default values of `val_batch_size` in configs as `null` and
add DEPRECATED comments
- [x] Add deprecation warnings about `val_batch_size` in
`_validate_config`
2025-02-24 10:24:24 +08:00
9db52329f6 [misc] feat: support offload parameter and optimizer during rollout (#284)
- Fixed FSDP1 model offload
- With `actor_rollout_ref.actor.fsdp_config.param_offload=True \` and
`actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \ `. The GPU
memory utilization can increase to 0.9
- With actor, critic and reference offload all enabled, there will only
be one model copy at a time in the GPU memory. Therefore, we can further
increase the `micro_batch_size_per_gpu` or `max_token_per_gpu`

**Specifically:**
- During rollout, only rollout model and KVCache are in the GPU memory.
- During critic compute values, only the critic model will stay in the
GPU memory while its optimizer and other model states are in CPU main
memory
- During actor update, the actor model, optimizer are stored on GPU
while the reference model and critic model, critic optimizer are
offloaded to CPU.
2025-02-17 14:07:43 +08:00
c8b9c3559a fix the split placement example (#281)
The split placement example is outdated, I tried it and encountered some
errors. To address this, the following changes were made in this PR
1. Copied the content from `verl/trainer/config/ppo_trainer.yaml` to
`examples/split_placement/config/ppo_trainer_split.yaml`
2. Copied `RayPPOTrainer.fit` method into the `fit` func in
`examples/split_placement/split_monkey_patch.py` and modified it to get
the futures of `critic_output` and `actor_output`
2025-02-16 00:18:34 +08:00
HL
e842b73dae docs: add programming model guide (#230) 2025-02-09 19:10:47 +08:00
f2a76acd94 [BREAKING][misc] feat: change micro_batch_size to micro_batch_size_per_gpu (#136)
## Summary

This PR changes all the micro_batch_size to micro_batch_size_per_gpu.

**The Core logic of setting batch size:**
- **All algorithmic metrics** (train batch size, ppo mini batch size):
are global (from the perspective of single-controller), which will be
normalized in each Worker.
- **All performance-related parameters** (micro batch size, max token
length in dynamic batch size) are local parameters, which represent the
data sizes per GPU (i.e., each Worker).

## Main Changes

1. Change the scripts and config and delete the normalization for
micro_bsz
2. Fix CI for SFT
2025-01-27 11:26:52 +08:00
2997e5dee8 [rollout] feat: support best-of-n generation in vLLM (#80) 2025-01-05 21:28:33 +08:00
99d2c19b0e [misc] feat: remove @ray.remote on workers to allow inheritance (#61)
Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
2024-12-21 11:06:11 -07:00
f53597729e [BREAKING][refact]: move actor/critic/hybrid_engine/reward_model/rollout/workers out of ppo directory for reuse (#52)
* [refact] move actor/critic/hybrid_engine/reward_model/rollout/workers out of verl/trainer/ppo directory to verl/trainer directory for reusage

* [misc] fix import

* [refact] move actor/critic and others inside workers

* fix uncommit files

* fix uncommit files

* [refact] move workers out of trainers
2024-12-18 14:34:19 +08:00
HL
d60f843091 [sft] feat: fix sft dataset with latest preprocess code (#49)
* api: rename tracking logger to wandb logger type

* [sft] feat: add tests for sft dataset

* refresh dataset

* force refresh

* use ds model for tokenizer

* add option for trainer.val_only

* fix path

* fix lint

* add sft test for cot and raw q&a

* add hf_tokenizer api to patch gemma tokenizer

* fix test
2024-12-16 16:19:46 -08:00
HL
c2296e80df api: rename tracking logger to wandb logger type (#47) 2024-12-14 21:57:55 -08:00
cd6cef609e [BREAKING][core] move single_controller into verl directory (#45)
* [BREAKING][core] move single_controller into verl directory

* fix blocking flag in fsdp workers
2024-12-11 23:48:24 +08:00
6e8667bd66 [example] add a split placement tutorial (#43)
* [example] add a split placement tutorial

* lint
2024-12-11 22:41:22 +08:00