59 Commits

Author SHA1 Message Date
65eb019a81 [trainer] fix: Add data.seed to config (#3815) 2025-10-20 09:57:14 +08:00
ae5d8504d4 [trainer] feat: ReMax support using reward model for baseline (#3780)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

The reward baseline should not be limited to reward functions; we should
also support using the reward model (RM) to calculate it.
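
ReMax uses the reward of a greedy (`do_sample=False`) response to the same prompt as the baseline, so a sampled response's advantage is its reward minus that baseline. The minimal sketch below shows why the baseline scorer should be pluggable. The hypothetical `remax_advantages` (not verl's API) takes the scorer as a callable, so a rule-based reward function and a reward model (here only a toy stand-in) can be swapped without changing the advantage computation.

```python
from typing import Callable, List

# A scorer maps (prompt, response) to a scalar reward. In practice it could be a
# rule-based reward function or a wrapper around a reward model's forward pass.
Scorer = Callable[[str, str], float]


def remax_advantages(
    prompts: List[str], sampled: List[str], greedy: List[str], score: Scorer
) -> List[float]:
    """ReMax-style advantages: reward(sampled response) - reward(greedy baseline)."""
    # The baseline is scored with the same scorer as the sampled responses.
    baselines = [score(p, g) for p, g in zip(prompts, greedy)]
    return [score(p, s) - b for p, s, b in zip(prompts, sampled, baselines)]


def exact_match(prompt: str, response: str) -> float:
    # Toy rule-based reward: 1.0 for the expected answer, else 0.0.
    return 1.0 if response.strip() == "4" else 0.0


def toy_reward_model(prompt: str, response: str) -> float:
    # Stand-in for a learned reward model; it simply prefers shorter answers.
    return 1.0 / (1 + len(response))


prompts = ["2 + 2 = ?", "2 + 2 = ?"]
sampled = ["4", "five"]
greedy = ["5", "5"]
print(remax_advantages(prompts, sampled, greedy, exact_match))
print(remax_advantages(prompts, sampled, greedy, toy_reward_model))
```

Only the `score` argument changes between the two calls; the baseline logic stays the same.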

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-17 12:07:05 +08:00
a80ed95e70 [trainer] fix: batch size mismatch with n>1 when gen_max for ReMax (#3779)
### What does this PR do?

Resolves #3408

We should not repeat `gen_batch` in place: in the `gen_max` step we need
the original, un-repeated `gen_batch` so that we can disable `do_sample`
for rollout to calculate the reward baseline.

```log
  File "verl/trainer/main_ppo.py", line 317, in run
    trainer.fit()
  File "verl/trainer/ppo/ray_trainer.py", line 1065, in fit
    batch = batch.union(gen_baseline_output)
  File "verl/protocol.py", line 802, in union
    self.batch = union_tensor_dict(self.batch, other.batch)
  File "verl/protocol.py", line 110, in union_tensor_dict
    assert tensor_dict1.batch_size == tensor_dict2.batch_size, (
AssertionError: Two tensor dict must have identical batch size. Got torch.Size([4096]) and torch.Size([16384])
```
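
The ordering problem can be sketched in plain Python. This is a toy model under stated assumptions, not verl's code: lists stand in for `DataProto` batches, `union` only mimics the batch-size assertion from the traceback, `greedy_generate` stands in for rollout with `do_sample=False`, and the sampled rollout is omitted. If the baseline is generated from an already-repeated `gen_batch`, it has B * n rows, which cannot be joined with the B original prompts. This is the 4096 vs. 16384 mismatch above.

```python
from typing import Any, List


def repeat_interleave(rows: List[Any], n: int) -> List[Any]:
    # [a, b] -> [a, a, b, b] for n = 2: each row is repeated n times in place.
    return [row for row in rows for _ in range(n)]


def union(batch: List[Any], other: List[Any]) -> List[tuple]:
    # Mimics the check in the traceback: both sides need the same batch size.
    if len(batch) != len(other):
        raise ValueError(
            "Two tensor dict must have identical batch size. "
            f"Got {len(batch)} and {len(other)}"
        )
    return list(zip(batch, other))


def greedy_generate(prompts: List[str]) -> List[str]:
    # Stand-in for rollout with do_sample=False: one deterministic response per row.
    return [p + " -> greedy answer" for p in prompts]


def attach_baseline(batch: List[str], n: int, repeat_first: bool) -> List[tuple]:
    gen_batch = list(batch)
    if repeat_first:
        # Buggy order: gen_batch is repeated before the baseline is generated.
        gen_batch = repeat_interleave(gen_batch, n)
    # Join the baseline with the original batch; this only works with B rows.
    with_baseline = union(batch, greedy_generate(gen_batch))
    # Only now expand to n samples per prompt (B * n rows).
    return repeat_interleave(with_baseline, n)


prompts = ["p0", "p1"]
print(len(attach_baseline(prompts, n=4, repeat_first=False)))  # 8
try:
    attach_baseline(prompts, n=4, repeat_first=True)
except ValueError as err:
    print(err)  # Two tensor dict must have identical batch size. Got 2 and 8
```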

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-17 10:05:12 +08:00
7f27789961 [fsdp,doc] refactor: rename warmup_style@FSDPOptimizerConfig -> lr_scheduler_type (#3739)
### What does this PR do?

> Rename `warmup_style` in `FSDPOptimizerConfig` to `lr_scheduler_type` to
align with the Hugging Face Trainer API.

The following pull request refactored the optimizer; however, the naming
issue persists: https://github.com/volcengine/verl/pull/3656

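
The renamed key selects how the learning rate evolves after warmup. The following is a minimal sketch under stated assumptions, not verl's implementation. It assumes the two values `constant` and `cosine` (the styles the old `warmup_style` key is understood to offer), a linear warmup, and a Hugging Face-style cosine decay. The function name and the `min_lr_ratio`/`num_cycles` parameters are illustrative.

```python
import math


def lr_multiplier(
    step: int,
    lr_scheduler_type: str,
    warmup_steps: int,
    total_steps: int,
    min_lr_ratio: float = 0.0,
    num_cycles: float = 0.5,
) -> float:
    """Factor to multiply the base learning rate by at `step` (illustrative sketch)."""
    # Linear warmup from 0 to 1, shared by both schedule types.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    if lr_scheduler_type == "constant":
        return 1.0
    if lr_scheduler_type == "cosine":
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        # With num_cycles = 0.5 this is half a cosine wave: 1.0 at progress 0, 0.0 at progress 1.
        cosine = 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress))
        # Rescale so the multiplier decays from 1.0 down to min_lr_ratio.
        return min_lr_ratio + (1.0 - min_lr_ratio) * cosine
    raise ValueError(f"unsupported lr_scheduler_type: {lr_scheduler_type!r}")


for step in (0, 5, 10, 55, 100):
    print(step, lr_multiplier(step, "constant", 10, 100),
          lr_multiplier(step, "cosine", 10, 100, min_lr_ratio=0.1))
```

Unknown values raise instead of silently falling back, which mirrors the fail-fast spirit of structured configs.
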
### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: weiqi.li <weiqi.li@bytedance.com>
2025-10-13 15:58:59 +08:00
7592d69cbb [trainer] refactor: PPO config validation fast fail (#3187)
### What does this PR do?

Make the main PPO script validate the config as soon as all the needed
information is available, so that the script fails as early as possible
when the config contains a bug. With these changes, the config is
validated before the tokenizer is downloaded and loaded and before the
data is loaded.

Solves #3182

### Design & Code Changes

Moved config validation out of `RayPPOTrainer` into utils, and call it
from `main_ppo` as early as possible.
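
The fail-fast ordering can be illustrated with a small sketch. Everything here is hypothetical: `PPOConfigSketch`, `validate_config`, `run`, and the two divisibility checks are illustrative stand-ins, not verl's validation code. The point is only that pure config checks run before any slow tokenizer or data loading.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PPOConfigSketch:
    # Hypothetical subset of fields; real verl configs have many more.
    train_batch_size: int
    ppo_mini_batch_size: int
    n_gpus: int


def validate_config(cfg: PPOConfigSketch) -> None:
    """Cheap checks that only need the config itself, so they can run first."""
    if cfg.train_batch_size % cfg.ppo_mini_batch_size != 0:
        raise ValueError(
            f"train_batch_size ({cfg.train_batch_size}) must be divisible by "
            f"ppo_mini_batch_size ({cfg.ppo_mini_batch_size})"
        )
    if cfg.ppo_mini_batch_size % cfg.n_gpus != 0:
        raise ValueError(
            f"ppo_mini_batch_size ({cfg.ppo_mini_batch_size}) must be divisible "
            f"by n_gpus ({cfg.n_gpus})"
        )


def run(cfg: PPOConfigSketch, load_tokenizer: Callable[[], object],
        load_data: Callable[[], object]) -> None:
    validate_config(cfg)          # fail fast: nothing expensive has happened yet
    tokenizer = load_tokenizer()  # e.g. download from a model hub (slow)
    data = load_data()            # e.g. read and tokenize the dataset (slow)
    print("starting training with", tokenizer, data)


loaded = []
bad = PPOConfigSketch(train_batch_size=100, ppo_mini_batch_size=64, n_gpus=8)
try:
    run(bad, lambda: loaded.append("tokenizer"), lambda: loaded.append("data"))
except ValueError as err:
    print("rejected:", err)
print("loaded:", loaded)  # loaded: [] because validation failed before any loading
```
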
2025-08-26 10:31:39 +08:00
2bbd09245c [ray] feat: add support for ray init kwargs (#3049)
### What does this PR do?

This PR adds support for passing parameters to `ray.init`.
Users can now dynamically configure settings such as `address`, `port`,
`_temp_dir`, and more based on their specific needs.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

```bash
# when /tmp/ray/ is used by others
# when ray is initialized at 6379 by others
# when the dashboard is not accessible at localhost
# ...
bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh \
    +ray_kwargs.ray_init._temp_dir=/tmp/ray/my_dir \
    +ray_kwargs.ray_init.address=127.0.0.1:6378 \
    +ray_kwargs.ray_init.dashboard_host=0.0.0.0
```

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
2025-08-15 20:02:56 +08:00
c0f99f3da2 [BREAKING] [ray, megatron] feat: remove RayMegatronWorker (#2895)
### What does this PR do?

- Following https://github.com/volcengine/verl/pull/2893, we can now
register dispatch and collect functions directly inside the worker, so
there is no need to maintain `RayMegatronWorker` and
`RayMegatronWorkerGroup`, which were a hacky workaround.
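
To illustrate the idea of a worker that carries its own dispatch and collect logic, here is a toy, single-process sketch. The `register` decorator, `Worker`, and `WorkerGroup` below are hypothetical stand-ins, not verl's actual API or the mechanism from #2893. Each decorated method records how its input is split across workers and how the per-worker outputs are merged. A generic worker group can then run any such method without a backend-specific subclass.

```python
from typing import Any, Callable, List


def register(dispatch_fn: Callable, collect_fn: Callable) -> Callable:
    """Attach dispatch/collect functions to a worker method (toy version)."""
    def decorator(method: Callable) -> Callable:
        method.dispatch_fn = dispatch_fn
        method.collect_fn = collect_fn
        return method
    return decorator


def dispatch_chunks(data: List[Any], world_size: int) -> List[List[Any]]:
    # Split the input into contiguous shards, one per worker.
    size = -(-len(data) // world_size)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(world_size)]


def collect_concat(outputs: List[List[Any]]) -> List[Any]:
    # Merge per-worker outputs back in rank order.
    return [item for out in outputs for item in out]


class Worker:
    def __init__(self, rank: int):
        self.rank = rank

    @register(dispatch_fn=dispatch_chunks, collect_fn=collect_concat)
    def double(self, shard: List[int]) -> List[int]:
        return [2 * x for x in shard]


class WorkerGroup:
    """Generic group: reads dispatch/collect from the method, not from a subclass."""

    def __init__(self, workers: List[Worker]):
        self.workers = workers

    def execute(self, method_name: str, data: List[Any]) -> List[Any]:
        method = getattr(type(self.workers[0]), method_name)
        shards = method.dispatch_fn(data, len(self.workers))
        outputs = [getattr(w, method_name)(s) for w, s in zip(self.workers, shards)]
        return method.collect_fn(outputs)


group = WorkerGroup([Worker(0), Worker(1)])
print(group.execute("double", [1, 2, 3, 4, 5]))  # [2, 4, 6, 8, 10]
```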

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-05 11:05:38 +08:00
H
4de3ecf0f0 [cfg] refactor: add ActorConfig, EngineConfig, and ActorWorker unit test, refactor validation code (#2621)
As initially mentioned in
https://github.com/volcengine/verl/discussions/1941, having structured
configuration classes in verl makes argument passing easier for testing
and validation.

This PR continues the discussion on the current implementation of the
configuration schema in verl. Related PRs:
- https://github.com/volcengine/verl/pull/2117
- https://github.com/volcengine/verl/pull/2621

# Motivation 
By moving from loose `omegaconf.DictConfig`-based parameters to
structured dataclasses, we gain:
- Type safety & IDE support when accessing fields (e.g., `cfg.optim.lr`).
- Validation hooks via `__post_init__` in each class.
- Immutable defaults with controlled mutability (e.g., the `extra` field).
- Seamless Hydra/OmegaConf integration and easy per-recipe extension.

# Core: BaseConfig

Hydra natively supports converting a `DictConfig` to a dataclass, but
dataclasses do not support accessing attributes via `get()`. We
introduce a base class to provide backward compatibility and make the
change less abrupt for existing users.

All config dataclasses inherit from `BaseConfig`, which:
- Implements `collections.abc.Mapping` → dict-like iteration/access.
- Freezes attributes once set, unless listed in `_mutable_fields`.
- Provides an `extra: dict[str, Any]` for unchecked extensions.

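```python
import collections.abc
from dataclasses import FrozenInstanceError, dataclass, field, fields
from typing import Any, ClassVar


@dataclass
class BaseConfig(collections.abc.Mapping):
    """Dict-like, frozen dataclass with opt-in mutability."""
    # ClassVar keeps this out of the dataclass fields (a plain set default would be rejected).
    _mutable_fields: ClassVar[set[str]] = {"extra"}
    extra: dict[str, Any] = field(default_factory=dict)

    def __setattr__(self, name: str, value):
        # The first assignment (in __init__) succeeds; reassigning a frozen field fails.
        if name in self.__dict__ and name not in self._mutable_fields:
            raise FrozenInstanceError(f"Field '{name}' is frozen")
        super().__setattr__(name, value)

    # Mapping methods; get(), keys(), items() and `in` come from collections.abc.Mapping.
    def __getitem__(self, key: str):
        try:
            return getattr(self, key)
        except AttributeError as e:
            raise KeyError(key) from e

    def __iter__(self):
        return iter(f.name for f in fields(self))

    def __len__(self):
        return len(fields(self))
```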

# Example Config Classes (verl/trainer/config)

Each sub-component of the trainer has its own dataclass, inheriting from
`BaseConfig`.

```yaml
critic:
  checkpoint:
    _target_: verl.trainer.config.CheckpointConfig
    save_contents: ["model","optimizer","extra"]
    load_contents: ["model","optimizer","extra"]
    async_save: false
```
Definition: 
```python
from dataclasses import dataclass, field

from verl.utils.config import omegaconf_to_dataclass


@dataclass
class CheckpointConfig(BaseConfig):  # BaseConfig as defined above
    """What to save/load and async behavior."""
    save_contents: list[str] = field(default_factory=lambda: ["model", "optimizer", "extra"])
    load_contents: list[str] = field(default_factory=lambda: ["model", "optimizer", "extra"])
    async_save: bool = False

    def __post_init__(self):
        # Validation checks go here, after initialization, for example:
        if not isinstance(self.async_save, bool):
            raise TypeError("async_save must be a bool")


ckpt_cfg = CheckpointConfig(async_save=True)
print(ckpt_cfg.save_contents)               # attribute access
print(ckpt_cfg.get("save_contents", None))  # dict-style get() with a default
print(ckpt_cfg["save_contents"])            # dict-style indexing

# Converting the Hydra-generated omegaconf.DictConfig to the dataclass config
# (`config` is the DictConfig that Hydra passes to the entrypoint):
ckpt_cfg_from_cli = omegaconf_to_dataclass(config.critic.checkpoint)
```

# Extending existing config classes
Because configs are now structured, unexpected keys raise exceptions.
There are two ways to add new keys:

## Explicit class extensions
```python
from dataclasses import dataclass

from verl.workers.config import FSDPActorConfig


@dataclass
class SPPOActorConfig(FSDPActorConfig):
    """Add SPPO-specific temperature/penalty."""
    sppo_eta: float = 1.0
```
When configuring via YAML or the command line, set `_target_` to the new config class:
```yaml
hydra:
  searchpath:
    - file://verl/trainer/config
defaults:
  - ppo_trainer      # base trainer config
  - _self_               # then apply these overrides

actor_rollout_ref:
  actor:
    _target_: recipe.sppo.config.SPPOActorConfig  # new target dataclass required for extension
    sppo_eta: 1.0
```
or directly from the command line:
```bash
python main_sppo.py \
  actor_rollout_ref.actor._target_=recipe.sppo.config.SPPOActorConfig \
  actor_rollout_ref.actor.sppo_eta=1.0
```

## Leverage the `extra` field
Adding more keys to the `extra` field of any dataclass that inherits
from `BaseConfig` also works. This way, there is no need to define your
own dataclass in Python:
```yaml
hydra:
  searchpath:
    - file://verl/trainer/config
defaults:
  - ppo_trainer      # base trainer config
  - _self_               # then apply these overrides

actor_rollout_ref:
  actor:
    extra:
      sppo_eta: 1.0
```

# Declaring mutable fields
For historical reasons, some config fields, such as batch sizes for
data/sequence parallelism, are mutated in place in the codebase. We are
in the process of deprecating this behavior. However, if you
intentionally want to mutate a field, declare it in the
`_mutable_fields` attribute:
```python
from dataclasses import dataclass, field


@dataclass
class CheckpointConfig(BaseConfig):
    """What to save/load and async behavior."""
    _mutable_fields = BaseConfig._mutable_fields | {"save_contents"} # mark save_contents as mutable.

    save_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"])
    load_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"])
    async_save: bool = False
```
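
The freeze-after-init behavior can be checked with a small self-contained stand-in. The sketch below re-implements only the `__setattr__` logic described above in a hypothetical `FreezableConfig`. It omits the `Mapping` methods and is not verl's `BaseConfig`. It shows that a field listed in `_mutable_fields` can be reassigned while any other field raises `FrozenInstanceError`. In-place mutation of a container such as `extra` never passes through `__setattr__`, so it is not blocked.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any, ClassVar


@dataclass
class FreezableConfig:
    """Toy stand-in for BaseConfig: only the freeze-after-init logic."""

    # ClassVar keeps the set out of the dataclass fields.
    _mutable_fields: ClassVar[set] = {"extra"}
    extra: dict = field(default_factory=dict)

    def __setattr__(self, name: str, value: Any) -> None:
        # First assignment (inside __init__) is allowed; later reassignment is not.
        if name in self.__dict__ and name not in self._mutable_fields:
            raise FrozenInstanceError(f"Field '{name}' is frozen")
        super().__setattr__(name, value)


@dataclass
class CheckpointConfig(FreezableConfig):
    _mutable_fields = FreezableConfig._mutable_fields | {"save_contents"}

    save_contents: list = field(default_factory=lambda: ["model", "optimizer", "extra"])
    async_save: bool = False


cfg = CheckpointConfig()
cfg.save_contents = ["model"]  # allowed: listed in _mutable_fields
cfg.extra["sppo_eta"] = 1.0    # in-place dict update; __setattr__ is not involved
try:
    cfg.async_save = True      # not listed, so reassignment is rejected
except FrozenInstanceError as err:
    print(err)                 # Field 'async_save' is frozen
```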

# Other helpful resources
The default verl trainer config combines the following config files,
listed in the `defaults` list of
https://github.com/volcengine/verl/blob/main/verl/trainer/config/ppo_trainer.yaml#L1-L36:
- verl/trainer/config/ppo_trainer.yaml (main config for the entrypoint)
- verl/trainer/config/actor/dp_actor.yaml
- verl/trainer/config/critic/dp_critic.yaml
- verl/trainer/config/reward_model/dp_reward_model.yaml
- verl/trainer/config/rollout/rollout.yaml

To quickly peek at the full default config in a single file, check the
auto-generated config at
https://github.com/volcengine/verl/blob/main/verl/trainer/config/_generated_ppo_trainer.yaml

# Change log and impact on existing code
This PR converts the following fields to structured dataclasses in the
training pipeline. More can be done in future PRs (contributions from
the community are welcome):
- [x] actor_rollout_ref.actor
- [x] critic 
- [ ] actor_rollout_ref.rollout
- [ ] actor_rollout_ref.ref
- [ ] reward_model
- [ ] data
- [ ] trainer

Changes needed for existing code that added new fields to the config:
- See recipe/sppo for an example.
- `OmegaConf.to_container(self.config.model.get("override_config", OmegaConf.create()))`
now has to be manually changed to
`self.config.model.get("override_config", {})`, because
`OmegaConf.to_container` expects a `DictConfig`, but
`config.model.override_config` is already a dict.

# Other Breaking Changes
`critic.optim.lr` for Megatron changed from 1e-6 to 1e-5.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Joel <wuxibin@bytedance.com>
Co-authored-by: Cheetah <1659275352@qq.com>
Co-authored-by: 杨睿 <yangruipis@163.com>
Co-authored-by: X. HU <huxiaobo@zju.edu.cn>
Co-authored-by: Le Xue <48175490+ShareLer@users.noreply.github.com>
Co-authored-by: Ziheng Jiang <ziheng@apache.org>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-23 11:45:14 -07:00
7459131411 [hardware] refactor: replace device_name with config.trainer.device (#2542)
### What does this PR do?

In some methods, calling `get_device()` is redundant, so we replace
`get_device()` with `config.trainer.device`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: H <linhaibin.eric@gmail.com>
2025-07-17 13:29:01 -07:00
10f4eb8cfc [misc] chore: fix typo in function name (#2525)
### What does this PR do?

fix typo `gather_outpus_and_unpad` -> `gather_outputs_and_unpad`

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Signed-off-by: ShareLer <ShareLe@163.com>
2025-07-15 19:06:20 +08:00
a31a8f251f [doc] fix: quickstart example can't work on zsh (#2509)
### What does this PR do?

I followed the instructions at
https://verl.readthedocs.io/en/latest/start/quickstart.html to run the
PPO example on my devbox, which uses zsh. However, I got the error
`zsh: no matches found: trainer.logger=[console]`, because zsh
interprets `[]` as a glob pattern.

```
(verl) ➜  verl git:(20250713-devbox-2-tmux0-verl-2) ✗ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
 data.train_files=$HOME/data/gsm8k/train.parquet \
 data.val_files=$HOME/data/gsm8k/test.parquet \
 data.train_batch_size=256 \
 data.max_prompt_length=512 \
 data.max_response_length=256 \
 actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 actor_rollout_ref.actor.optim.lr=1e-6 \
 actor_rollout_ref.actor.ppo_mini_batch_size=64 \
 actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
 actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
 actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
 actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
 actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
 critic.optim.lr=1e-5 \
 critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 critic.ppo_micro_batch_size_per_gpu=4 \
 algorithm.kl_ctrl.kl_coef=0.001 \
 trainer.logger=['console'] \
 trainer.val_before_train=False \
 trainer.n_gpus_per_node=1 \
 trainer.nnodes=1 \
 trainer.save_freq=10 \
 trainer.test_freq=10 \
 trainer.total_epochs=15 2>&1 | tee verl_demo.log
zsh: no matches found: trainer.logger=[console]
```

This PR has 3 changes:
* `trainer.logger=['console']` -> `trainer.logger=console`
* `trainer.logger=['console','wandb']` ->
`trainer.logger='["console","wandb"]'`
* `trainer.logger=['console','tensorboard']` ->
`trainer.logger='["console","tensorboard"]'`
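
If you build these command lines from a Python launcher script, the standard-library `shlex.quote` produces an equivalent single-quoted form that neither bash nor zsh will glob-expand. Unlike the hand-written form in this PR, it quotes the whole argument rather than only the value. The `overrides` list is just an example taken from this PR.

```python
import shlex

overrides = [
    "trainer.logger=console",              # no shell metacharacters: left unquoted
    'trainer.logger=["console","wandb"]',  # [] and " would be interpreted by the shell
]

# Quote each Hydra override so the shell passes it through verbatim.
quoted = [shlex.quote(arg) for arg in overrides]
print(" ".join(["python3", "-m", "verl.trainer.main_ppo", *quoted]))
```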

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

* `trainer.logger=console` (zsh)
<img width="898" height="564" alt="image"
src="https://github.com/user-attachments/assets/a957a493-75e6-462b-9974-6b1c4cdf5a80"
/>

* ``trainer.logger='["console","wandb"]'`` (zsh)
<img width="870" height="565" alt="image"
src="https://github.com/user-attachments/assets/e20613bf-2ccc-4653-b23f-90edc3d568d1"
/>

* `trainer.logger=console` (bash)
  ```bash
  ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
  >  data.train_files=$HOME/data/gsm8k/train.parquet \
  >  data.val_files=$HOME/data/gsm8k/test.parquet \
  >  data.train_batch_size=256 \
  >  data.max_prompt_length=512 \
  >  data.max_response_length=256 \
  >  actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  actor_rollout_ref.actor.optim.lr=1e-6 \
  >  actor_rollout_ref.actor.ppo_mini_batch_size=64 \
  >  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
  >  actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
  >  actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
  >  actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
  >  actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
  >  critic.optim.lr=1e-5 \
  >  critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  critic.ppo_micro_batch_size_per_gpu=4 \
  >  algorithm.kl_ctrl.kl_coef=0.001 \
  >  trainer.logger=console \
  >  trainer.val_before_train=False \
  >  trainer.n_gpus_per_node=1 \
  >  trainer.nnodes=1 \
  >  trainer.save_freq=10 \
  >  trainer.test_freq=10 \
  >  trainer.total_epochs=15 2>&1 | tee verl_demo.log
2025-07-14 02:52:27,669 INFO worker.py:1908 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(TaskRunner pid=1799248) TaskRunner hostname: ip-172-31-9-244, PID: 1799248
(TaskRunner pid=1799248) {'actor_rollout_ref': {'actor': {'checkpoint': {'load_contents': ['model',
(TaskRunner pid=1799248) 'optimizer',
(TaskRunner pid=1799248) 'extra'],
(TaskRunner pid=1799248) 'save_contents': ['model',
(TaskRunner pid=1799248) 'optimizer',
(TaskRunner pid=1799248) 'extra']},
  ```

* `trainer.logger='["console","wandb"]'` (bash)
  ```bash
ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
  >  data.train_files=$HOME/data/gsm8k/train.parquet \
  >  data.val_files=$HOME/data/gsm8k/test.parquet \
  >  data.train_batch_size=256 \
  >  data.max_prompt_length=512 \
  >  data.max_response_length=256 \
  >  actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  actor_rollout_ref.actor.optim.lr=1e-6 \
  >  actor_rollout_ref.actor.ppo_mini_batch_size=64 \
  >  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
  >  actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
  >  actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
  >  actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
  >  actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
  >  critic.optim.lr=1e-5 \
  >  critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  critic.ppo_micro_batch_size_per_gpu=4 \
  >  algorithm.kl_ctrl.kl_coef=0.001 \
  >  trainer.logger='["console","wandb"]' \
  >  trainer.val_before_train=False \
  >  trainer.n_gpus_per_node=1 \
  >  trainer.nnodes=1 \
  >  trainer.save_freq=10 \
  >  trainer.test_freq=10 \
  >  trainer.total_epochs=15 2>&1 | tee verl_demo.log
2025-07-14 02:54:13,989 INFO worker.py:1908 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(TaskRunner pid=1805000) TaskRunner hostname: ip-172-31-9-244, PID: 1805000
(TaskRunner pid=1805000) {'actor_rollout_ref': {'actor': {'checkpoint': {'load_contents': ['model',
(TaskRunner pid=1805000) 'optimizer',
(TaskRunner pid=1805000) 'extra'],
(TaskRunner pid=1805000) 'save_contents': ['model',
(TaskRunner pid=1805000) 'optimizer',
(TaskRunner pid=1805000) 'extra']},
  ```

### API and Usage Example

No

### Design & Code Changes

No

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
2025-07-14 13:26:32 +08:00
4aa02fe166 [trainer] fix: Allow FSDP2 when doing strategy check (#2497)
### What does this PR do?

Allow FSDP2 when doing strategy check

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

For the `strategy` field, both "fsdp" and "fsdp2" are now considered valid.
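A minimal sketch of what such a check can look like, assuming a hypothetical `select_worker_backend` helper (the real check in verl may be structured differently): both strategy names are routed to the same FSDP worker path, and anything unknown fails loudly.

```python
# Hypothetical helper: both FSDP variants share one worker backend.
FSDP_STRATEGIES = frozenset({"fsdp", "fsdp2"})


def select_worker_backend(strategy: str) -> str:
    """Map a `strategy` config value to a worker backend name."""
    if strategy in FSDP_STRATEGIES:
        return "fsdp"
    if strategy == "megatron":
        return "megatron"
    raise NotImplementedError(f"unsupported strategy: {strategy!r}")


print(select_worker_backend("fsdp2"))  # fsdp
```

Using a set membership test keeps the accepted values in one place, so adding another FSDP variant later only touches the constant.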

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-07-12 16:31:31 -07:00
fc35956543 [BREAKING][rollout] feat: repeat DataProto when n>1 in driver instead of rollout workers (#2324)
### What does this PR do?

Before this PR, when `generate_sequences` is called with sampling param n>1,
DataProto is repeated in inconsistent places depending on the code path:
- validation: DataProto is repeated by `n` in the driver, then chunked and
dispatched to rollout workers.
- training
  - batch mode: DataProto is chunked and dispatched to rollout workers,
  then repeated in the rollout workers.
  - server mode: DataProto is repeated by `n` in the driver, then chunked and
  dispatched to rollout workers.

In batch mode, the `chunk-dispatch-repeat` pattern restricts GRPO
training when there are more GPUs than `batch_size`. For example, with
`batch_size=128, n=16, world_size=256`:
- `chunk-dispatch-repeat`: DataProto(batch_size=128) can't be chunked to
256 shards.
- `repeat-chunk-dispatch`: after repeat, DataProto(batch_size=2048) can
be successfully chunked.
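The arithmetic of that example can be sketched with plain lists. Here `repeat` and `chunk` are hypothetical stand-ins for the DataProto operations, and `chunk` assumes an even split across workers is required; this illustrates the ordering problem, not verl's implementation.

```python
from typing import List, Sequence


def repeat(samples: Sequence[int], n: int) -> List[int]:
    """Interleaved repeat: the n copies of each prompt stay adjacent."""
    return [s for s in samples for _ in range(n)]


def chunk(samples: Sequence[int], world_size: int) -> List[List[int]]:
    """Split into world_size equal shards (assumes an even split is required)."""
    if len(samples) % world_size != 0:
        raise ValueError(f"cannot split {len(samples)} samples into {world_size} equal shards")
    size = len(samples) // world_size
    return [list(samples[i * size:(i + 1) * size]) for i in range(world_size)]


batch = list(range(128))  # batch_size=128 prompts
n, world_size = 16, 256

try:
    chunk(batch, world_size)  # chunk-dispatch-repeat: 128 prompts over 256 workers fails
except ValueError as err:
    print(err)

shards = chunk(repeat(batch, n), world_size)  # repeat-chunk-dispatch: 2048 samples
print(len(shards), len(shards[0]))  # 256 8
```

Because the repeat is interleaved, each shard holds copies of a single prompt, which keeps a prompt's responses together for GRPO's group statistics once they are gathered back.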

After this PR, DataProto is always repeated in the driver, whether for
validation or training, and in both batch and server mode.

> [!IMPORTANT]
> This change breaks almost all recipes and projects using verl GRPO as
submodules.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-07-07 14:57:01 +08:00
c936ec7d5c [trainer, cfg] feat: add BaseConfig for all dataclass configs. Introduce dataclass for algorithm related configs (#2147)
### What does this PR do?

This PR introduces a BaseConfig class that bridges dataclass and hydra's
DictConfig in the codebase. In this PR, the algorithm related configs
and profiler related configs are instantiated as dataclass upfront for
both main_ppo and main_dapo. The config related changes are expected to
be backward compatible (supporting xx_config.get() API)

Besides, this PR also moves the profiler related files under
verl.utils.debug to verl.utils.profiler.xx. The
`verl.utils.debug.performance.py` is kept for backward compatibility
purpose and we'll drop it in later versions.

Main principle:
- users are not forced to use dataclass configs. All changes are
backward compatible.
- dataclass configs are converted upfront on a per entrypoint basis.
Here we target main_ppo.py and main_dapo.py, and the other recipes'
entrypoints are left intact.
- the new dataclasses are intentionally frozen. Configs should not be
mutable; whenever a new field is needed, make a copy of the config instead.
- whenever a dataclass config is introduced, we encourage adding simple
CPU-based unit tests for the basic functionality of functions that rely
on it (e.g. the GRPO advantage estimation in core_algorithm.py), and
updating the type annotations of all impacted functions.
- in the yaml file, `_target_` field should be specified for dataclass
conversion. e.g. `_target_: verl.xxx.XXConfig`
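The "frozen, but still DictConfig-like" principle can be illustrated with a small sketch. `DictLikeMixin` and `ProfilerConfigSketch` are hypothetical names, not verl's `BaseConfig`; the sketch only assumes the behaviors described in this PR: `.get()` with a default, `[]` access, rejected mutation, and copying to change a field.

```python
from dataclasses import FrozenInstanceError, dataclass, fields, replace
from typing import Any


class DictLikeMixin:
    """Hypothetical mixin giving a dataclass DictConfig-style read access."""

    def get(self, key: str, default: Any = None) -> Any:
        # Mirrors DictConfig.get(): missing keys return the default.
        return getattr(self, key, default)

    def __getitem__(self, key: str) -> Any:
        # Only declared fields are subscriptable.
        if key not in {f.name for f in fields(self)}:
            raise KeyError(key)
        return getattr(self, key)


@dataclass(frozen=True)
class ProfilerConfigSketch(DictLikeMixin):
    discrete: bool = False
    all_ranks: bool = False
    ranks: tuple = ()


cfg = ProfilerConfigSketch(discrete=True)
print(cfg.get("discrete"), cfg["discrete"], cfg.get("non_existing_key", 1))  # True True 1
try:
    cfg.discrete = False  # frozen: mutation is rejected
except FrozenInstanceError:
    print("frozen")
# Changing a value means building a new config from the old one.
cfg2 = replace(cfg, ranks=(0, 1))
```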

The PR is built on top of @liuzhenhai93 's contribution.

### Checklist Before Describing the Details

- [x] Searched for similar PR(s).
- [x] PR title is in the format of: `[modules] type: Title`
  - modules: `trainer, cfg`
  - type: `feat`

### Test

- Added comprehensive unit tests in
`tests/trainer/config/test_algorithm_config_on_cpu.py`,
`test_base_config_on_cpu.py`
- Tests cover dataclass creation, nested configuration handling,
backward compatibility, and integration with core algorithms
- All tests pass successfully, validating the functionality and
integration with existing code

### High-Level Design

The design introduces three dataclasses:
1. **`KLControlConfig`**: Handles KL control parameters (type, kl_coef,
horizon, target_kl)
2. **`PFPPOConfig`**: Manages preference feedback PPO parameters
(reweight_method, weight_pow)
3. **`AlgorithmConfig`**: Main algorithm configuration containing all
fields from the YAML config

The conversion uses the existing `verl.utils.omega_conf_to_dataclass`
utility to seamlessly convert from OmegaConf DictConfig to typed
dataclasses.
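The PR relies on `verl.utils.omega_conf_to_dataclass` for this. As a rough illustration of the nested conversion idea only, the stand-in below works on plain dicts rather than OmegaConf objects. `KLControlSketch`, `AlgorithmSketch` and `to_dataclass` are hypothetical, the field names follow the list above, the defaults are illustrative placeholders, and `PFPPOConfig` is omitted for brevity.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, get_type_hints


@dataclass(frozen=True)
class KLControlSketch:
    # Field names follow the PR; the defaults are illustrative placeholders.
    type: str = "fixed"
    kl_coef: float = 0.001
    horizon: int = 10000
    target_kl: float = 0.1


@dataclass(frozen=True)
class AlgorithmSketch:
    gamma: float = 1.0
    use_kl_in_reward: bool = False
    kl_penalty: str = "kl"
    kl_ctrl: KLControlSketch = field(default_factory=KLControlSketch)


def to_dataclass(cls: Any, data: Mapping) -> Any:
    """Recursively build `cls` from a nested mapping; missing keys keep their defaults."""
    hints = get_type_hints(cls)  # resolves field annotations to real classes
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue
        value = data[f.name]
        if is_dataclass(hints[f.name]) and isinstance(value, Mapping):
            value = to_dataclass(hints[f.name], value)  # nested config section
        kwargs[f.name] = value
    return cls(**kwargs)


cfg = to_dataclass(AlgorithmSketch, {"gamma": 0.99, "kl_ctrl": {"kl_coef": 0.05}})
print(cfg.kl_ctrl)  # KLControlSketch(type='fixed', kl_coef=0.05, horizon=10000, target_kl=0.1)
```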


### API and Usage Example

The API maintains backward compatibility while providing type-safe
access:

```python
from dataclasses import FrozenInstanceError

import pytest

from verl.utils import omega_conf_to_dataclass

# Before (DictConfig)
if config.algorithm.use_kl_in_reward:
    kl_penalty = config.algorithm.kl_penalty
    kl_coef = config.algorithm.kl_ctrl.get("kl_coef", 0.001)

# After (Dataclass) - Type-safe with IDE support
algorithm_config = omega_conf_to_dataclass(config.algorithm)
if algorithm_config.use_kl_in_reward:
    kl_penalty = algorithm_config.kl_penalty  # Type-safe access
    kl_coef = algorithm_config.kl_ctrl.kl_coef  # Nested config access

# Backward compatibility maintained
gamma = algorithm_config.get("gamma", 1.0)  # Still works


# Other cases (test-style checks; here `config` is the profiler DictConfig
# and `ProfilerConfig` is its dataclass)
profiler_config = omega_conf_to_dataclass(config)
assert profiler_config.discrete == config.discrete
assert profiler_config.all_ranks == config.all_ranks
assert profiler_config.ranks == config.ranks
assert isinstance(profiler_config, ProfilerConfig)
with pytest.raises(AttributeError):
    _ = profiler_config.non_existing_key
assert config.get("non_existing_key") == profiler_config.get("non_existing_key")
assert config.get("non_existing_key", 1) == profiler_config.get("non_existing_key", 1)
assert config["discrete"] == profiler_config["discrete"]

# Frozen dataclass: mutation is rejected
with pytest.raises(FrozenInstanceError):
    profiler_config.discrete = False
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit run --show-diff-on-failure --color=always --all-files`
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

**Note**: This change is fully backward compatible and does not break
any existing APIs. The dataclass provides the same interface as the
original DictConfig while adding type safety and better structure.

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-07-04 08:12:09 -07:00
00a10a8ef3 [ci] refactor: reduce ruff line-length from 300 to 120 (#2287)
### What does this PR do?

Previously, ruff's line length was set too high, which made the code hard
to read; with that setting, manually wrapped short lines would also be
reformatted into long ones. This PR contains three main commits plus two
follow-ups:
- df4bbfca62f41d972c48c8a76088ae2ac29691cf set line length to 120 and run
the pre-commit auto-format
- 9d03f183edd9fff4e22215cacacf62c06b7b41d3 let Devin fix the multi-line
code
- 9fc8d436f5007535fad3dc49983b01d0d457be9c skip lint for
test_sglang_async_rollout_sf_tools.py and manually adjust the format of
rope_utils.py
- last two commits:
  1. merge with main
  2. run lint after the merge; add test_sglang_async_rollout_sf_tools.py
  and scripts/legacy_model_merger.py to `lint.exclude`

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

This PR relies on CI for testing.


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-07-01 09:54:40 +08:00
52065c6405 [BREAKING][rollout] refactor: drop vllm v0.5.4 and v0.6.3 support (#2257)
### What does this PR do?

This PR removes support for vLLM versions 0.5.4 and 0.6.3 from the verl
repository, completing a comprehensive cleanup of legacy
version-specific code branches. The changes simplify the codebase by
eliminating conditional logic and version-specific implementations,
requiring users to upgrade to vLLM 0.7.0 or later (recommended: vLLM
0.8.3+).

**Key Changes:**
- Deleted legacy rollout implementations (`fire_vllm_rollout.py`,
`vllm_rollout.py`, `test_vllm_hf_loader.py`)
- Removed version-specific directories (`vllm_v_0_5_4`, `vllm_v_0_6_3`) 
- Simplified sharding managers by removing `customized_vllm` flag
conditionals
- Updated configuration files to remove deprecated options
(`use_fire_sampling`)
- Cleaned up documentation and environment variable exports

### Checklist Before Starting

- [x] Search for similar PRs: No similar PRs found for this specific
cleanup
- [x] Format the PR title as `[BREAKING][vllm, rollout, worker]
refactor: Remove vLLM 0.5.4 and 0.6.3 support`
  - Modules: `vllm`, `rollout`, `worker` (primary affected components)
  - Type: `refactor` (code cleanup and simplification)
  - Breaking: Yes, requires vLLM version upgrade

### Test

This PR has been validated through:
- **CI Pipeline**: All existing tests pass with vLLM 0.7.0+ (27 checks
pending/running)
- **Version Detection**: New version check logic properly rejects vLLM
0.5.4/0.6.3 with clear error messages
- **Merge Conflict Resolution**: Successfully resolved complex conflicts
during main branch merge
- **Pre-commit Checks**: All linting and formatting requirements
satisfied

### API and Usage Example

**Breaking Changes:**
- **vLLM Version Requirement**: Minimum supported version is now 0.7.0
(recommended: 0.8.3+)
- **Removed Configuration Options**: `use_fire_sampling` no longer
available in config files
- **Environment Variables**: `VLLM_ATTENTION_BACKEND=XFORMERS` exports
removed (not needed for vLLM 0.7.0+)

**Migration Guide:**
```bash
# Before: vLLM 0.5.4/0.6.3 with custom flags
pip install vllm==0.6.3
export VLLM_ATTENTION_BACKEND=XFORMERS

# After: vLLM 0.8.3+ with V1 API
pip install "vllm>=0.8.3"  # quoted so the shell does not treat >= as a redirect
export VLLM_USE_V1=1  # Recommended for optimal performance
```

**Updated Configuration:**
```yaml
# generation.yaml - removed use_fire_sampling option
rollout:
  name: vllm_rollout
  # use_fire_sampling: False  # <- REMOVED
  
# Use standard vLLM rollout without legacy options
```

### High-Level Design

```mermaid
graph TB
    subgraph "Before: Multi-Version Support"
        A1[vLLM Version Check] --> B1{Version 0.5.4?}
        A1 --> B2{Version 0.6.3?}
        A1 --> B3{Version 0.7.0+?}
        B1 --> C1[Legacy vllm_v_0_5_4 Code]
        B2 --> C2[Legacy vllm_v_0_6_3 Code]
        B3 --> C3[Modern vLLM Code]
    end
    
    subgraph "After: Simplified Support"
        A2[vLLM Version Check] --> B4{Version >= 0.7.0?}
        B4 -->|Yes| C4[Modern vLLM Code Only]
        B4 -->|No| C5[Clear Error Message]
    end
```
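The "after" branch of the diagram can be sketched as a simple version gate. `parse_version` and `check_vllm_version` are hypothetical names, and the hand-rolled parsing is an assumption kept to the standard library; production code would more likely use a proper version parser such as `packaging.version`.

```python
import re
from typing import Tuple

MIN_VLLM_VERSION = (0, 7, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '0.8.3' or '0.7.0.post1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized vLLM version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def check_vllm_version(version: str) -> None:
    """Raise a clear error for vLLM releases older than 0.7.0."""
    if parse_version(version) < MIN_VLLM_VERSION:
        raise ValueError(
            f"vLLM {version} is no longer supported; please upgrade to vLLM >= 0.7.0 "
            "(0.8.3+ recommended)"
        )


check_vllm_version("0.8.3")  # passes silently
```

Comparing integer tuples (rather than strings) keeps releases such as `0.10.x` ordered correctly after `0.9.x`.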

### Specific Changes

**Deleted Files:**
- `verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py`
- `verl/workers/rollout/vllm_rollout/vllm_rollout.py` 
- `tests/workers/rollout/rollout_vllm/test_vllm_hf_loader.py`
- `verl/third_party/vllm/vllm_v_0_5_4/` (entire directory)
- `verl/third_party/vllm/vllm_v_0_6_3/` (entire directory)
- `pytest.ini`

**Modified Core Files:**
- `verl/third_party/vllm/__init__.py`: Simplified version detection with
clear error messages
- `verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py`: Removed
cache engine management and version conditionals
- `verl/workers/sharding_manager/fsdp_vllm.py`: Dropped
`customized_vllm` flag logic
- `verl/workers/sharding_manager/megatron_vllm.py`: Simplified weight
loading and cache management

**Configuration Updates:**
- `verl/trainer/config/generation.yaml`: Removed `use_fire_sampling`
option
- `verl/trainer/config/ppo_trainer.yaml`: Removed `use_fire_sampling`
option
- `tests/special_sanity/check_api_docs.py`: Removed `LLMEngine` from
whitelist

**Documentation Updates:**
- `docs/start/install.rst`: Updated to recommend vLLM 0.8.3+ with
`VLLM_USE_V1=1`
- `docs/perf/perf_tuning.rst`: Updated performance recommendations
- Removed 42+ `VLLM_ATTENTION_BACKEND=XFORMERS` exports from bash
scripts

**Reverted Changes:**
- `.github/workflows/vllm.yml`: Restored original container image names
- `docs/faq/faq.rst`: Restored original apptainer commands
- `docs/ascend_tutorial/ascend_quick_start.rst`: Reverted all
modifications
- `examples/tuning/*/`: Restored original `nproc_per_gpu` settings

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs):
Updated install and performance tuning docs
- [x] Add unit or end-to-end test(s): Existing CI tests validate the
changes; legacy-specific tests were removed as intended
- [x] **CI Request**: Once PR is ready, message will be sent to
`ci-request` channel in verl Slack workspace

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-06-29 19:27:22 -07:00
4f1ece8bed [recipe] fix: parameter order in RayPRIMETrainer super().__init__() call (#2172)
### What does this PR do?

- Fixes incorrect parameter order in `RayPRIMETrainer.__init__()` when
calling `super().__init__()`.
- The missing `processor` parameter was causing all subsequent
positional arguments to be passed to wrong parameters, leading to
`reward_fn` being passed as `processor` and `val_reward_fn` being passed
as `reward_fn`.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+PRIME
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

- No breaking changes to existing API

### High-Level Design

- Simple parameter alignment fix, no design changes

### Specific Changes

- Added `reward_fn=my_reward_fn` and `val_reward_fn=my_val_reward_fn` to
the `super().__init__()` call in `RayPRIMETrainer.__init__()` to
maintain correct parameter alignment with parent class RayPPOTrainer
- Ensures `reward_fn` and `val_reward_fn` are passed to their intended
parameters instead of being shifted due to missing processor argument
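The failure mode can be reproduced with a toy class hierarchy. `ParentTrainer`, `BuggyChild` and `FixedChild` are hypothetical stand-ins (the real `RayPPOTrainer` takes more parameters); the sketch only shows how a skipped positional parameter shifts every later argument and how keyword arguments avoid it.

```python
class ParentTrainer:
    """Hypothetical stand-in for a parent trainer with an optional `processor`."""

    def __init__(self, config, processor=None, reward_fn=None, val_reward_fn=None):
        self.config = config
        self.processor = processor
        self.reward_fn = reward_fn
        self.val_reward_fn = val_reward_fn


class BuggyChild(ParentTrainer):
    def __init__(self, config, reward_fn, val_reward_fn):
        # Positional call that skips `processor`: every later value shifts one slot left.
        super().__init__(config, reward_fn, val_reward_fn)


class FixedChild(ParentTrainer):
    def __init__(self, config, reward_fn, val_reward_fn):
        # Keyword arguments bind each value to its intended parameter.
        super().__init__(config, reward_fn=reward_fn, val_reward_fn=val_reward_fn)


buggy = BuggyChild("cfg", "rm", "val_rm")
print(buggy.processor, buggy.reward_fn, buggy.val_reward_fn)  # rm val_rm None
```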

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-26 19:37:36 +08:00
e48292f698 [perf] feat: Add verl profiling support from Nvidia Nsight System (#1820)
Add verl profiling support from Nvidia Nsight System

### Checklist Before Starting

- [X] Search for similar PR(s).

### What does this PR do?

Add verl profiling support from Nvidia Nsight System

### High-Level Design

This PR adds config fields to trigger Nsight profiling. If
`trainer.profile_steps` is set, Nsight Systems is triggered to profile
the corresponding steps. In each task role, additional config fields
control the profiling details.

The profiling tasks include the single_controller process and the worker
process. Single_controller process uses the re-designed `marked_timer`
to record each task range in NVTX.

The worker processes dump the GPU execution details. Since veRL supports
both hybrid-engine mode and split mode, there are two profiling modes:
discrete or not. In discrete mode, each task generates a dedicated
database; otherwise, a single large database is generated. Nsight Systems
can import and align multiple databases automatically.
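The interplay of `profile_steps` and the two modes can be modeled with a small sketch. `run_step` is hypothetical, not verl's `WorkerProfiler`; `start`/`stop` stand in for whatever opens and closes a capture range (for example, `torch.cuda.profiler.start()`/`stop()` when running under `nsys profile --capture-range=cudaProfilerApi`).

```python
from typing import Callable, Iterable, Tuple

Task = Tuple[str, Callable[[], None]]


def run_step(step: int, tasks: Iterable[Task], profile_steps: Iterable[int],
             discrete: bool, start: Callable[[str], None], stop: Callable[[str], None]) -> None:
    """Run one training step, opening capture ranges only on selected steps."""
    profiling = step in set(profile_steps)
    if profiling and not discrete:
        start(f"step{step}")  # one capture (one database) covering the whole step
    for name, fn in tasks:
        if profiling and discrete:
            start(f"step{step}/{name}")  # one capture (one database) per task
        fn()
        if profiling and discrete:
            stop(f"step{step}/{name}")
    if profiling and not discrete:
        stop(f"step{step}")


events = []
tasks = [("generate", lambda: None), ("update_actor", lambda: None)]
run_step(3, tasks, [3], True, lambda n: events.append(("start", n)), lambda n: events.append(("stop", n)))
print(events)
```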

### Specific Changes

`verl.utils.debug.profile` adds a general profiling interface, and
`verl.utils.debug.nvtx_profile` implements it.

### API

`verl.utils.debug.performance._timer` has been changed to
`simple_timer`, and `marked_timer` is added to support profiler range
marker.
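A timer that doubles as a range marker can be sketched as a context manager. `marked_timer_sketch` is hypothetical and its signature is an assumption, not verl's `marked_timer`; the range hooks default to no-ops so the sketch runs anywhere.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


def _noop_push(name: str) -> None:
    pass


def _noop_pop() -> None:
    pass


@contextmanager
def marked_timer_sketch(
    name: str,
    timings: Dict[str, float],
    range_push: Callable[[str], None] = _noop_push,
    range_pop: Callable[[], None] = _noop_pop,
) -> Iterator[None]:
    """Time a block into `timings[name]` and bracket it with a named profiler range."""
    range_push(name)  # open the named range before timing starts
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start
        range_pop()  # close the range even if the block raised


timings: Dict[str, float] = {}
with marked_timer_sketch("gen", timings):
    sum(range(1000))
print(sorted(timings))  # ['gen']
```

On a CUDA machine, passing `torch.cuda.nvtx.range_push` and `torch.cuda.nvtx.range_pop` as the hooks would make each timed block show up as a named NVTX range in the Nsight Systems timeline.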

`verl.utils.debug.profile` wraps the basic profiler interfaces,
including mark_*_range, mark_annotate, ProfilerConfig, WorkerProfiler,
and WorkerProfilerExtension. `verl.utils.debug.nvtx_profile` implements
these interfaces when NVTX is available.

### Usage Example

Two examples are added in
`/examples/ppo_trainer/run_deepseek_math_gsm8k_megatron_nsys.sh`
`/examples/ppo_trainer/run_qwen2-7b_rm_seq_balance_nsys.sh`

### Test

There should be no functional or performance changes.

### Additional Info.

- **Training**: both FSDP, Megatron will be affected.
- **Inference**: both vLLM, SGLang will be affected.

### Checklist Before Submitting

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title if it breaks any API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add CI test(s) if necessary.
2025-06-17 11:05:16 -07:00
ca65c363fb [hardware] refactor: refactor part of device management (#1974)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [x] In format of: [modules] type: Title
- [x] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - [x] type is in `feat, fix, refactor, chore`
- [x] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Refactor device management (such as `torch.cuda` and `nccl` usage) in most
of the code in `verl/recipe` and `verl/verl`, making it more convenient to
support other devices or platforms.

### Test

Not related.

### High-Level Design

Not related.

### Specific Changes

1. use `get_torch_device()` to get corresponding `torch.device()` object
based on specific device.
2. use `get_device_id()` to get corresponding device rank index based on
specific device.
3. use `get_nccl_backend()` to get corresponding nccl backend based on
specific device.
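The idea behind these helpers, dispatching on a device-type string instead of hard-coding `torch.cuda` and `nccl`, can be sketched as follows. Only the three helper names come from this PR; the bodies, the explicit `torch_module`/`device_name` parameters, and the `gloo` CPU fallback are assumptions made so the sketch runs without torch installed (the real helpers likely take no arguments and query torch directly).

```python
from types import SimpleNamespace
from typing import Any

# Hypothetical device-type -> collective backend table (hccl is Ascend's NCCL counterpart).
_COLLECTIVE_BACKENDS = {"cuda": "nccl", "npu": "hccl"}


def get_device_name(cuda_available: bool, npu_available: bool) -> str:
    """Pick the accelerator type, preferring CUDA, then NPU, else CPU."""
    if cuda_available:
        return "cuda"
    if npu_available:
        return "npu"
    return "cpu"


def get_torch_device(torch_module: Any, device_name: str) -> Any:
    """Return the per-device namespace, e.g. torch.cuda or torch.npu."""
    return getattr(torch_module, device_name)


def get_device_id(torch_module: Any, device_name: str) -> int:
    """Index of the current device, via the namespace's current_device()."""
    return get_torch_device(torch_module, device_name).current_device()


def get_nccl_backend(device_name: str) -> str:
    """Collective-communication backend for the device; gloo as a CPU fallback."""
    return _COLLECTIVE_BACKENDS.get(device_name, "gloo")


# Usage with a stand-in for the torch module.
fake_torch = SimpleNamespace(npu=SimpleNamespace(current_device=lambda: 3))
device = get_device_name(cuda_available=False, npu_available=True)
print(device, get_device_id(fake_torch, device), get_nccl_backend(device))  # npu 3 hccl
```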

### API

Not related.

### Usage Example

Modifications in this PR should not be perceptible to users.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
2025-06-14 20:53:47 +08:00
c8908e197c [fsdp] feat: Memory efficient cross entropy with a linear layer fused (#462)
Implements the forward and backward passes of the following compute logic,
eliminating many intermediate storage tensors and reducing peak memory
usage.

## Equivalent compute logic:
```python
import typing

import torch


def run_torch_entropy(hidden: torch.Tensor,
                      weight: torch.Tensor,
                      labels: torch.Tensor) -> typing.Tuple[torch.Tensor, torch.Tensor]:
    logits = torch.matmul(hidden.to(torch.float32), weight.to(torch.float32))  # [num_tokens, vocab_size]
    pd = torch.nn.functional.softmax(logits, dim=-1)  # [num_tokens, vocab_size]
    entropy_a = torch.logsumexp(logits, dim=-1)  # [num_tokens]
    entropy_b = torch.sum(pd * logits, dim=-1)  # [num_tokens]
    entropy = entropy_a - entropy_b  # per-token entropy: logsumexp(z) - sum(p * z)
    logprobs = torch.nn.functional.cross_entropy(logits, labels)  # scalar (mean over tokens)
    logprobs = torch.neg(logprobs)
    return logprobs, entropy
```
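The reference above materializes the full `[num_tokens, vocab_size]` logits and softmax. The fused kernel's internals are not described here, but one common way to avoid storing a full row is an online (streaming) log-sum-exp over vocabulary chunks. The pure-Python sketch below illustrates that swapped-in technique per token; it is not the PR's Triton kernel, and it returns per-token log-probabilities (the reference returns their mean) plus per-token entropy.

```python
import math
from typing import List, Sequence, Tuple


def _logit(h: Sequence[float], weight: Sequence[Sequence[float]], j: int) -> float:
    # Logit for vocab entry j: hidden vector dotted with column j of weight [hidden_size][vocab_size].
    return sum(h[k] * weight[k][j] for k in range(len(h)))


def streaming_logprob_entropy(
    hidden: Sequence[Sequence[float]],
    weight: Sequence[Sequence[float]],
    labels: Sequence[int],
    vocab_chunk: int = 2,
) -> Tuple[List[float], List[float]]:
    """Per-token log p(label) and entropy without storing a full row of logits."""
    vocab_size = len(weight[0])
    logprobs: List[float] = []
    entropies: List[float] = []
    for h, label in zip(hidden, labels):
        m = -math.inf  # running max logit
        s = 0.0  # running sum of exp(z - m)
        t = 0.0  # running sum of exp(z - m) * z
        z_label = 0.0
        for start in range(0, vocab_size, vocab_chunk):
            zs = [_logit(h, weight, j) for j in range(start, min(start + vocab_chunk, vocab_size))]
            new_m = max(m, max(zs))
            scale = math.exp(m - new_m)  # rescale old sums to the new max (0.0 on the first chunk)
            s *= scale
            t *= scale
            for offset, z in enumerate(zs):
                w = math.exp(z - new_m)
                s += w
                t += w * z
                if start + offset == label:
                    z_label = z
            m = new_m
        lse = m + math.log(s)  # log-sum-exp over the whole vocabulary
        logprobs.append(z_label - lse)
        entropies.append(lse - t / s)  # H = logsumexp(z) - sum(p * z)
    return logprobs, entropies


# Uniform logits over 4 entries: logprob = -log(4), entropy = log(4).
print(streaming_logprob_entropy([[1.0]], [[0.0, 0.0, 0.0, 0.0]], [1]))
```

Only the running max and two running sums are kept per token, so memory no longer grows with `vocab_size`; a GPU kernel would apply the same recurrence per tile.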

## API
```python
import torch

from verl.utils.kernel import linear_cross_entropy

num_tokens, hidden_size, vocab_size = 1024, 4096, 32000  # example sizes

hidden = torch.randn(num_tokens, hidden_size, dtype=torch.bfloat16, device="cuda")
weight = torch.randn(hidden_size, vocab_size, dtype=torch.bfloat16, device="cuda")
labels = torch.randint(0, vocab_size, (num_tokens,), device="cuda")

loss, entropy = linear_cross_entropy(hidden, weight, labels, reduction="mean")
```

## Storage and latency
<img width="636" alt="image"
src="https://github.com/user-attachments/assets/396b7303-a46a-46b1-a261-917fda034b02"
/>

## Unit test
```shell
$ cd verl/
$ python3 tests/kernel/test_memory_efficient_entropy.py
```

# NOTE
For compatibility, `torch.library.triton_op` is not applied to these
APIs, so `torch.compile` may not work on top of them.

---------

Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>
Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
Co-authored-by: gaoziyuan.955 <gaoziyuan.955@bytedance.com>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
2025-06-11 19:48:47 +08:00
b4aa2dce8f [fsdp] fix: fsdp entropy metrics (#1943)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - [ ] In format of: [modules] type: Title
- [ ] modules are in `fsdp, megatron, sglang, vllm, rollout, trainer,
tests, training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt`
  - [ ] type is in `feat, fix, doc, refactor, chore`
- [ ] can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp] feat: xxx`

### What does this PR do?

The FSDP entropy calculation did not revert the sample indices when dynamic
batch size is enabled.
This does not affect the training loss or gradients, only the metrics
displayed on TensorBoard/wandb.
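
Dynamic batch size regroups samples into token-budgeted micro-batches. As a result, per-sample outputs come back in a permuted order and have to be scattered back to their original positions. Otherwise, values later combined with data in the original order are misaligned. The sketch below uses a hypothetical greedy packer (`rearrange_by_length`) as a stand-in for verl's own rearrangement logic. Only `revert_order` illustrates the step the fix restores.

```python
def rearrange_by_length(lengths, max_tokens):
    """Hypothetical dynamic batching: pack samples, longest first, into micro-batches
    of at most max_tokens tokens (a single oversized sample gets its own batch)."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    micro_batches, current, used = [], [], 0
    for i in order:
        if current and used + lengths[i] > max_tokens:
            micro_batches.append(current)
            current, used = [], 0
        current.append(i)
        used += lengths[i]
    if current:
        micro_batches.append(current)
    return micro_batches


def revert_order(values, micro_batches):
    """Scatter per-sample outputs (in micro-batch order) back to the original order."""
    indices = [i for mb in micro_batches for i in mb]  # k-th output came from sample indices[k]
    restored = [None] * len(indices)
    for k, original_idx in enumerate(indices):
        restored[original_idx] = values[k]
    return restored


lengths = [5, 1, 7, 3]
mbs = rearrange_by_length(lengths, max_tokens=8)
per_sample = [lengths[i] * 10 for mb in mbs for i in mb]  # stand-in for per-sample entropy
print(mbs, revert_order(per_sample, mbs))
```

A sum or mean over all tokens is unchanged by the permutation. This is consistent with the loss and gradients being unaffected.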

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that cover the code path.
2025-06-10 11:28:48 -07:00
70bd3d3d6b [feat] Wandb Timing: Add more detailed timing of gen_sequence and weights resharding (#1834) 2025-06-07 07:45:50 +08:00
adf775c43b [logging] misc: update PR template and fix lint (#1806) 2025-06-04 07:53:12 +08:00
263115cd9d [dev] fix: note that DP balancing doesn't affect advantage calculation (#1809)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR fixes the comments about DP balancing.

It also adds the DP balancing option to the PRIME trainer, keeping
the default value as `False`.
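
The sketch below shows why reordering for DP balancing leaves advantages unchanged. It assumes the advantage depends only on each sample's own reward and its prompt group. Balancing merely permutes samples across data-parallel ranks, so the advantages are permuted along with them. The GRPO-style group-mean baseline keyed by `uid` is a hypothetical illustration, not verl's `compute_advantage`.

```python
from collections import defaultdict


def group_mean_advantages(uids, rewards):
    # Hypothetical baseline: subtract the mean reward of each prompt group (uid)
    totals, counts = defaultdict(float), defaultdict(int)
    for uid, r in zip(uids, rewards):
        totals[uid] += r
        counts[uid] += 1
    return [r - totals[uid] / counts[uid] for uid, r in zip(uids, rewards)]


def permute(seq, order):
    # Reorder a batch the way a balancer might, e.g. by sequence length
    return [seq[i] for i in order]


uids = ["a", "a", "b", "b"]
rewards = [1.0, 0.0, 0.5, 0.5]
order = [3, 0, 2, 1]
adv = group_mean_advantages(uids, rewards)
adv_balanced = group_mean_advantages(permute(uids, order), permute(rewards, order))
print(adv, adv_balanced)
```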

### Additional Info.

- **Issue Number**: #1718 
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-06-03 10:20:54 +08:00
7695b8db43 [recipe] prime: Code example for PRIME (#1714)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add a running example for the PRIME algorithm on the coding data of
[Eurus-2-RL-Data](https://huggingface.co/datasets/PRIME-RL/Eurus-2-RL-Data)

### Specific Changes

> Running example
> Log

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
2025-06-02 19:08:11 -07:00
4779f26164 [Refactor] fused kernel in forward (#1624)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Moves `fused_linear_for_ppo` into `model.forward` for FSDP.

### High-Level Design

Self-explanatory.

### Specific Changes

- Update the monkey patch to return `log_probs` and `entropy` instead of
`last_hidden_state`.

### API

No changes

### Usage Example

```sh
actor_rollout_ref.model.use_fused_kernels=True
```

### Test


![image](https://github.com/user-attachments/assets/c6af68fb-0200-4aee-9596-0b445afdc562)


### Additional Info.

- This fixes #1565.
- The original bug arose because `model.lm_head.weight` was accessed
from outside the FSDP-wrapped context.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-24 13:50:57 +08:00
a3c4cb386c Disable fused kernels in prime (#1598)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?
Currently, the `e2e_prime` test fails with `AttributeError:
'NoneType' object has no attribute 'squeeze'`, which is caused by
#1212.

In PR #1568, the parameter `use_fused_kernels` in `ppo_trainer.yaml`
was set to `false`, but the corresponding parameter in
`prime_trainer.yaml` was not updated, which prevents the CI from
passing. Until the root cause of the `use_fused_kernels` failure is fully
resolved, we should temporarily set `use_fused_kernels` to `false` in
`prime_trainer.yaml`.

### High-Level Design

Not needed

### Specific Changes

- Set `use_fused_kernels` to `False` by default

### API

Not needed

### Usage Example

Not needed

### Test

Not needed

### Additional Info.

Not needed

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-20 16:27:33 +08:00
eb077f66e5 Feat/memory optimized loss (#1212)
# What does this PR do?

This PR implements fused losses for alignment (#710).
It reduces the memory required for the loss calculation to a small,
constant amount.

# ChangeLog:

- Added the option `use_fused_kernels`.
- Monkey-patched `model.forward` to return `last_hidden_state` without
computing logits.
- Added `FusedLinearForPPO` to `verl/utils/experimental/torch_functional.py`.

# Usage

Simply add the following option:
```sh
actor_rollout_ref.model.use_fused_kernels=True
```

## Before submitting

- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [ ] Did you make sure to update the documentation with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs),
especially for breaking config changes?
- [ ] Did you write any test cases if necessary? Please add CI tests to
your new feature.

# Additional Info:
- The current implementation uses chunking to reduce memory
consumption to a constant value.
- It splits the loss calculation into chunks of 512 tokens, computing
the log_probs, entropy values, and gradients for each chunk and
accumulating them.
- However, the current implementation can be slow because it processes
each chunk sequentially in a Python `for` loop.
- In the future, we should consider converting the fused functions to
Triton or another JIT solution.
- Compared to a `FusedPPOLossFunction`, optimizing only the
`hidden_states -> entropy & log_probs` step is much friendlier to algorithm
developers: the memory-heavy part is optimized away for them, and they are
free to combine the values in their own custom loss functions.
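
The chunking idea described above can be sketched in pure Python. This sketch covers the forward pass only. The PR also accumulates gradients per chunk, which is omitted here. `chunked_logprob_entropy` is a hypothetical name rather than the `FusedLinearForPPO` API, and the lists of lists stand in for real framework tensors.

```python
import math


def chunked_logprob_entropy(hidden, weight, labels, chunk_size=512):
    """Forward-only sketch: per-token log-prob of `labels` and entropy, chunk by chunk.

    hidden: num_tokens rows of length hidden_size
    weight: hidden_size rows of length vocab_size
    """
    vocab_size = len(weight[0])
    logprobs, entropies = [], []
    for start in range(0, len(hidden), chunk_size):
        chunk = hidden[start:start + chunk_size]
        chunk_labels = labels[start:start + chunk_size]
        # Materialize logits for this chunk only: [len(chunk), vocab_size]
        chunk_logits = [
            [sum(h[k] * weight[k][j] for k in range(len(h))) for j in range(vocab_size)]
            for h in chunk
        ]
        for logits, label in zip(chunk_logits, chunk_labels):
            m = max(logits)
            lse = m + math.log(sum(math.exp(z - m) for z in logits))
            probs = [math.exp(z - lse) for z in logits]
            logprobs.append(logits[label] - lse)
            entropies.append(lse - sum(p * z for p, z in zip(probs, logits)))
        # chunk_logits is replaced on the next iteration, so the full
        # [num_tokens, vocab_size] logits matrix is never built
    return logprobs, entropies


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]]
labels = [0, 1, 2]
lp_small, ent_small = chunked_logprob_entropy(hidden, weight, labels, chunk_size=1)
lp_big, ent_big = chunked_logprob_entropy(hidden, weight, labels, chunk_size=512)
```

Peak memory scales with `chunk_size * vocab_size` instead of `num_tokens * vocab_size`. The cost is one extra Python-level loop iteration per chunk, which matches the slowness noted above.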

---------

Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>
2025-05-16 22:52:54 +08:00
c3b20575d2 [util] docs: add docstrings to metric util functions that recipes reuse (#1395)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

In `/recipes`, a few functions under `trainer/ppo/metric_utils` are
imported and reused. Many of them are currently task-dependent and
assume specific keys in the input metric dict.

To make these functions more robust and backward compatible, a few tests
are added. Additionally, one method is moved to `verl.utils` as a public
API because it is general-purpose, and a corresponding API doc page is
added.

To make it easy for others to customize verl trainers, many other
classes and functions need further documentation, such as:
- `AdvantageEstimator`, `RayPPOTrainer`, `apply_kl_penalty`, `compute_advantage`
- `from verl.single_controller.ray import RayWorkerGroup`
- `from verl.trainer.ppo.core_algos import agg_loss`
- `from verl.trainer.ppo.ray_trainer import ResourcePoolManager, Role, WorkerType`
- `from verl.utils.checkpoint.checkpoint_manager import find_latest_ckpt_path`

They will be improved in future PRs.

### High-Level Design

None

### Specific Changes

- Added tests.
- Added the `verl.utils.metric` namespace.

### API

`verl.trainer.ppo.metric_utils.reduce_metrics` has moved to
`verl.utils.metric.reduce_metrics`. Deprecation warnings are added.
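
The sketch below illustrates the move-plus-deprecation pattern. It assumes `reduce_metrics` averages each list of metric values, and the real function may differ in details. `legacy_reduce_metrics` is a hypothetical stand-in for whatever remains at the old import path.

```python
import warnings
from statistics import mean


def reduce_metrics(metrics):
    """Sketch of the new public helper: reduce each list of values to its mean."""
    return {key: mean(values) for key, values in metrics.items()}


def legacy_reduce_metrics(metrics):
    """Hypothetical shim at the old path: warn, then delegate to the new helper."""
    warnings.warn(
        "verl.trainer.ppo.metric_utils.reduce_metrics is deprecated; "
        "use verl.utils.metric.reduce_metrics instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this shim
    )
    return reduce_metrics(metrics)


print(reduce_metrics({"actor/loss": [1.0, 2.0, 3.0]}))
```

Delegating instead of copying keeps a single implementation. The warning gives downstream recipes time to switch imports before the old path is removed.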

### Usage Example

None

### Test

Added

### Additional Info.

- **Issue Number**: https://github.com/volcengine/verl/issues/1354
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-12 08:49:14 +08:00
709796f849 [dev] fix: validation metrics (#1374)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

1. Fix a bug where `metric` is not added when `n == 1`.
2. Remove `std@1`.
3. Add an assertion that `val_metrics` is non-empty when initial
validation is performed.
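
The sketch below makes the `n == 1` case concrete with a hypothetical per-prompt summary over `n` sampled responses. The `@n` naming follows the `std@1` name mentioned above. The function name and the `mean@n`/`best@n` keys are assumptions. The mean is always recorded, while the spread is only added for `n > 1`, since a single sample carries no spread information.

```python
from statistics import mean, pstdev


def summarize_samples(scores):
    """Hypothetical per-prompt validation summary over n sampled responses."""
    n = len(scores)
    metrics = {f"mean@{n}": mean(scores)}  # always recorded, including n == 1
    if n > 1:
        # std@1 would always be 0, so spread-style metrics are skipped for n == 1
        metrics[f"std@{n}"] = pstdev(scores)
        metrics[f"best@{n}"] = max(scores)
    return metrics


print(summarize_samples([1.0]))
print(summarize_samples([1.0, 0.0, 1.0, 0.0]))
```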

### Additional Info.

- **Issue Number**: none
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-04 09:06:53 -07:00
52437be1a6 [trainer] breaking: pass dataset as required args to SFTTrainer; also change ppo ray trainer to take custom datasets as inputs (#1282) 2025-05-02 21:03:22 -07:00
8e5ad4688a [Lint] fix: linting errors in all files (#1280)
This PR enables lint checking on all files after fixing all of the following errors:

```
examples/data_preprocess/geo3k.py:41:121: E501 Line too long (121 > 120)
examples/data_preprocess/multiturn.py:54:121: E501 Line too long (185 > 120)
examples/data_preprocess/multiturn.py:59:121: E501 Line too long (210 > 120)
examples/data_preprocess/multiturn.py:73:121: E501 Line too long (229 > 120)
examples/data_preprocess/multiturn.py:78:121: E501 Line too long (211 > 120)
examples/ray/tutorial.ipynb:cell 9:1:121: E501 Line too long (179 > 120)
examples/ray/tutorial.ipynb:cell 15:1:121: E501 Line too long (143 > 120)
examples/ray/tutorial.ipynb:cell 42:14:1: E402 Module level import not at top of cell
recipe/prime/prime_dp_rm.py:145:121: E501 Line too long (153 > 120)
recipe/prime/prime_dp_rm.py:156:121: E501 Line too long (137 > 120)
recipe/prime/prime_dp_rm.py:292:121: E501 Line too long (148 > 120)
recipe/r1/data_process.py:56:121: E501 Line too long (289 > 120)
recipe/r1/data_process.py:113:121: E501 Line too long (166 > 120)
recipe/r1/data_process.py:118:121: E501 Line too long (137 > 120)
recipe/r1/data_process.py:123:121: E501 Line too long (297 > 120)
recipe/r1/data_process.py:131:9: E722 Do not use bare `except`
recipe/r1/tasks/livecodebench.py:61:5: E722 Do not use bare `except`
scripts/diagnose.py:55:9: F841 Local variable `ip` is assigned to but never used
scripts/diagnose.py:165:13: B028 No explicit `stacklevel` keyword argument found
scripts/model_merger.py:42:121: E501 Line too long (184 > 120)
scripts/model_merger.py:146:13: E722 Do not use bare `except`
tests/e2e/arithmetic_sequence/model/create_model_tokenizer.py:28:121: E501 Line too long (440 > 120)
tests/gpu_utility/test_memory_buffers.py:42:5: F841 Local variable `model_named_params` is assigned to but never used
tests/gpu_utility/test_memory_buffers.py:43:5: F841 Local variable `model_copy_named_params` is assigned to but never used
tests/gpu_utility/test_memory_buffers.py:53:5: F841 Local variable `model_wrapper` is assigned to but never used
tests/model/test_transformers_ulysses.py:102:5: F841 Local variable `response_length` is assigned to but never used
tests/model/test_transformers_ulysses.py:181:5: F841 Local variable `response_length` is assigned to but never used
tests/ray/detached_worker/server.py:83:13: F841 Local variable `vpp_rank` is assigned to but never used
tests/ray/test_check_worker_alive.py:37:121: E501 Line too long (121 > 120)
tests/rollout/run_fsdp_vllm.py:22:64: F811 Redefinition of unused `ShardingStrategy` from line 20
tests/rollout/test_sglang_spmd.py:210:121: E501 Line too long (157 > 120)
tests/rollout/test_vllm_spmd.py:20:64: F811 Redefinition of unused `ShardingStrategy` from line 18
tests/sandbox/test_sandbox.py:86:121: E501 Line too long (1615 > 120)
tests/sandbox/test_sandbox.py:87:121: E501 Line too long (1596 > 120)
tests/sanity/check_license.py:22:1: E402 Module level import not at top of file
tests/sanity/check_license.py:23:1: E402 Module level import not at top of file
tests/verl/utils/dataset/test_rl_dataset.py:23:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_rm_dataset.py:22:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_rm_dataset.py:36:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
tests/verl/utils/dataset/test_sft_dataset.py:22:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_sft_dataset.py:50:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
tests/verl/utils/dataset/test_sft_dataset.py:75:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/__init__.py:22:1: E402 Module level import not at top of file
verl/__init__.py:24:1: E402 Module level import not at top of file
verl/__init__.py:25:1: E402 Module level import not at top of file
verl/__init__.py:29:1: E402 Module level import not at top of file
verl/__init__.py:29:15: F401 `.single_controller` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:16:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:18:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:20:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:21:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:22:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:24:5: F401 `.modeling_llama_megatron.ParallelLlamaModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/checkpoint_utils/llama_loader.py:92:121: E501 Line too long (168 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:92:121: E501 Line too long (168 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:274:121: E501 Line too long (127 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelLlamaAttention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelLlamaDecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelLlamaDecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelLlamaMLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelLlamaRMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/parallel_attention.py:196:121: E501 Line too long (134 > 120)
verl/models/llama/megatron/layers/parallel_attention.py:341:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:342:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:343:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:366:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:420:121: E501 Line too long (122 > 120)
verl/models/llama/megatron/layers/parallel_linear.py:82:1: E402 Module level import not at top of file
verl/models/mcore/loader.py:273:121: E501 Line too long (134 > 120)
verl/models/mcore/util.py:26:121: E501 Line too long (202 > 120)
verl/models/qwen2/megatron/__init__.py:16:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:18:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:20:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:21:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:22:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:24:5: F401 `.modeling_qwen2_megatron.ParallelQwen2Model` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:90:121: E501 Line too long (169 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:256:121: E501 Line too long (172 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:90:121: E501 Line too long (169 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:272:121: E501 Line too long (127 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelQwen2Attention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelQwen2DecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelQwen2DecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelQwen2MLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelQwen2RMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/parallel_attention.py:163:121: E501 Line too long (134 > 120)
verl/models/qwen2/megatron/layers/parallel_attention.py:282:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:283:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:284:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:307:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:361:121: E501 Line too long (122 > 120)
verl/models/qwen2/megatron/modeling_qwen2_megatron.py:630:121: E501 Line too long (130 > 120)
verl/models/transformers/llama.py:106:121: E501 Line too long (180 > 120)
verl/models/transformers/llama.py:214:121: E501 Line too long (128 > 120)
verl/models/transformers/llama.py:215:121: E501 Line too long (135 > 120)
verl/models/transformers/monkey_patch.py:145:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:146:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:148:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:157:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/models/transformers/qwen2.py:215:121: E501 Line too long (128 > 120)
verl/models/transformers/qwen2.py:216:121: E501 Line too long (135 > 120)
verl/protocol.py:303:121: E501 Line too long (125 > 120)
verl/protocol.py:352:121: E501 Line too long (171 > 120)
verl/protocol.py:578:121: E501 Line too long (142 > 120)
verl/protocol.py:580:121: E501 Line too long (150 > 120)
verl/protocol.py:583:121: E501 Line too long (167 > 120)
verl/protocol.py:715:1: E402 Module level import not at top of file
verl/protocol.py:725:121: E501 Line too long (121 > 120)
verl/protocol.py:766:1: E402 Module level import not at top of file
verl/protocol.py:768:1: E402 Module level import not at top of file
verl/single_controller/__init__.py:23:1: E402 Module level import not at top of file
verl/single_controller/__init__.py:24:1: E402 Module level import not at top of file
verl/single_controller/base/decorator.py:149:16: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/single_controller/base/decorator.py:198:121: E501 Line too long (134 > 120)
verl/single_controller/base/decorator.py:310:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/single_controller/base/worker.py:137:121: E501 Line too long (131 > 120)
verl/single_controller/base/worker_group.py:89:33: G003 Logging statement uses `+`
verl/single_controller/base/worker_group.py:202:21: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/single_controller/ray/__init__.py:15:19: F401 `.base.RayClassWithInitArgs` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:41: F401 `.base.RayResourcePool` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:58: F401 `.base.RayWorkerGroup` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:74: F401 `.base.create_colocated_worker_cls` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/third_party/sglang/parallel_state.py:135:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/__init__.py:40:40: F401 `.vllm_v_0_6_3.llm.LLMEngine` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/third_party/vllm/__init__.py:45:22: F401 `vllm.LLM` imported but unused
verl/third_party/vllm/__init__.py:46:34: F401 `vllm.distributed.parallel_state` imported but unused
verl/third_party/vllm/__init__.py:50:121: E501 Line too long (141 > 120)
verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py:189:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_5_4/llm.py:136:121: E501 Line too long (132 > 120)
verl/third_party/vllm/vllm_v_0_5_4/llm.py:196:121: E501 Line too long (161 > 120)
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:174:5: F811 Redefinition of unused `llama_megatron_core_te_weight_loader` from line 90
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:205:5: F811 Redefinition of unused `llama_megatron_core_weight_loader` from line 121
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:254:121: E501 Line too long (150 > 120)
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:21: F811 Redefinition of unused `LoadConfig` from line 24
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:45: F811 Redefinition of unused `ModelConfig` from line 26
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:323:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:127:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:245:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120)
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120)
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/worker.py:220:121: E501 Line too long (127 > 120)
verl/third_party/vllm/vllm_v_0_6_3/config.py:46:92: B026 Star-arg unpacking after a keyword argument is strongly discouraged
verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py:225:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_6_3/llm.py:141:121: E501 Line too long (132 > 120)
verl/third_party/vllm/vllm_v_0_6_3/llm.py:169:121: E501 Line too long (161 > 120)
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:52:24: F811 Redefinition of unused `EngineArgs` from line 35
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:21: F811 Redefinition of unused `LoadConfig` from line 25
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:33: F811 Redefinition of unused `ModelConfig` from line 27
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:354:9: F841 Local variable `distributed_executor_backend` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:360:121: E501 Line too long (152 > 120)
verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:199:5: F841 Local variable `params_mapping` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:229:121: E501 Line too long (150 > 120)
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:21: F811 Redefinition of unused `LoadConfig` from line 22
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:45: F811 Redefinition of unused `ModelConfig` from line 22
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:312:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:21: F811 Redefinition of unused `LoadConfig` from line 27
verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:33: F811 Redefinition of unused `ModelConfig` from line 29
verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:129:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:247:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120)
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120)
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/worker.py:217:121: E501 Line too long (127 > 120)
verl/trainer/fsdp_sft_trainer.py:298:121: E501 Line too long (158 > 120)
verl/trainer/fsdp_sft_trainer.py:501:121: E501 Line too long (121 > 120)
verl/trainer/fsdp_sft_trainer.py:550:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:551:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:553:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:553:43: F811 Redefinition of unused `FSDPSFTTrainer` from line 82
verl/trainer/fsdp_sft_trainer.py:554:1: E402 Module level import not at top of file
verl/utils/__init__.py:16:24: F401 `.tokenizer.hf_processor` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/__init__.py:16:38: F401 `.tokenizer.hf_tokenizer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/checkpoint/checkpoint_manager.py:48:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/fsdp_checkpoint_manager.py:51:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/fsdp_checkpoint_manager.py:56:13: B028 No explicit `stacklevel` keyword argument found
verl/utils/checkpoint/fsdp_checkpoint_manager.py:81:121: E501 Line too long (121 > 120)
verl/utils/checkpoint/fsdp_checkpoint_manager.py:98:121: E501 Line too long (124 > 120)
verl/utils/checkpoint/megatron_checkpoint_manager.py:64:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/megatron_checkpoint_manager.py:219:121: E501 Line too long (124 > 120)
verl/utils/dataset/__init__.py:15:25: F401 `.rl_dataset.RLHFDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/__init__.py:16:25: F401 `.rm_dataset.RMDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/__init__.py:17:26: F401 `.sft_dataset.SFTDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/multiturn_sft_dataset.py:96:9: F841 Local variable `current_length` is assigned to but never used
verl/utils/dataset/sft_dataset.py:95:79: B023 Function definition does not bind loop variable `key`
verl/utils/dataset/sft_dataset.py:103:83: B023 Function definition does not bind loop variable `key`
verl/utils/debug/__init__.py:15:26: F401 `.performance.GPUMemoryLogger` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/debug/__init__.py:15:43: F401 `.performance.log_gpu_memory_usage` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/debug/performance.py:68:121: E501 Line too long (127 > 120)
verl/utils/debug/performance.py:71:121: E501 Line too long (126 > 120)
verl/utils/debug/profile.py:15:1: I001 [*] Import block is un-sorted or un-formatted
verl/utils/debug/profile.py:19:15: UP039 [*] Unnecessary parentheses after class definition
verl/utils/debug/profile.py:50:23: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:52:49: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:53:47: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:54:67: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:54:121: E501 Line too long (122 > 120)
verl/utils/flops_counter.py:175:121: E501 Line too long (124 > 120)
verl/utils/hdfs_io.py:135:32: G004 Logging statement uses f-string
verl/utils/import_utils.py:78:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/utils/logger/aggregate_logger.py:46:121: E501 Line too long (131 > 120)
verl/utils/logger/aggregate_logger.py:64:41: G004 Logging statement uses f-string
verl/utils/megatron/tensor_parallel.py:152:121: E501 Line too long (123 > 120)
verl/utils/megatron_utils.py:17:1: I001 [*] Import block is un-sorted or un-formatted
verl/utils/megatron_utils.py:22:20: F401 [*] `torch.nn` imported but unused
verl/utils/megatron_utils.py:34:38: F401 [*] `verl.utils.memory_buffer.build_memory_reference_from_module` imported but unused
verl/utils/megatron_utils.py:332:30: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access.
verl/utils/megatron_utils.py:366:27: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access.
verl/utils/model.py:464:121: E501 Line too long (124 > 120)
verl/utils/rendezvous/ray_backend.py:39:25: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:41:22: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:63:30: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:65:30: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:72:26: G004 Logging statement uses f-string
verl/utils/reward_score/gsm8k.py:47:121: E501 Line too long (201 > 120)
verl/utils/reward_score/math.py:213:121: E501 Line too long (142 > 120)
verl/utils/reward_score/prime_code/__init__.py:16:8: F401 `re` imported but unused
verl/utils/reward_score/prime_code/testing_util.py:131:121: E501 Line too long (688 > 120)
verl/utils/reward_score/prime_code/testing_util.py:168:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:222:9: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:254:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:255:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:259:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:260:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:264:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:265:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:269:121: E501 Line too long (132 > 120)
verl/utils/reward_score/prime_code/testing_util.py:293:21: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:294:25: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:335:121: E501 Line too long (165 > 120)
verl/utils/reward_score/prime_code/testing_util.py:386:121: E501 Line too long (209 > 120)
verl/utils/reward_score/prime_code/testing_util.py:390:121: E501 Line too long (183 > 120)
verl/utils/reward_score/prime_code/testing_util.py:455:121: E501 Line too long (211 > 120)
verl/utils/reward_score/prime_code/testing_util.py:459:121: E501 Line too long (185 > 120)
verl/utils/reward_score/prime_code/testing_util.py:582:121: E501 Line too long (197 > 120)
verl/utils/reward_score/prime_code/testing_util.py:586:121: E501 Line too long (171 > 120)
verl/utils/reward_score/prime_math/__init__.py:106:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:119:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:246:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:315:121: E501 Line too long (128 > 120)
verl/utils/reward_score/prime_math/__init__.py:331:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:407:1: E402 Module level import not at top of file
verl/utils/reward_score/prime_math/__init__.py:429:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading
verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading
verl/utils/reward_score/prime_math/math_normalize.py:54:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:70:17: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:101:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:181:121: E501 Line too long (142 > 120)
verl/utils/tokenizer.py:30:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/tokenizer.py:33:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/tokenizer.py:55:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/torch_functional.py:86:72: E741 Ambiguous variable name: `l`
verl/utils/torch_functional.py:177:5: F841 Local variable `total_params` is assigned to but never used
verl/utils/torch_functional.py:397:1: E402 Module level import not at top of file
verl/utils/torch_functional.py:399:1: E402 Module level import not at top of file
verl/utils/torch_functional.py:400:1: E402 Module level import not at top of file
verl/utils/ulysses.py:246:5: F841 Local variable `sp_size` is assigned to but never used
verl/workers/actor/dp_actor.py:244:13: F841 Local variable `response_mask` is assigned to but never used
verl/workers/actor/megatron_actor.py:22:1: I001 [*] Import block is un-sorted or un-formatted
verl/workers/actor/megatron_actor.py:85:121: E501 Line too long (122 > 120)
verl/workers/actor/megatron_actor.py:86:121: E501 Line too long (128 > 120)
verl/workers/actor/megatron_actor.py:89:121: E501 Line too long (133 > 120)
verl/workers/actor/megatron_actor.py:96:121: E501 Line too long (126 > 120)
verl/workers/actor/megatron_actor.py:175:121: E501 Line too long (135 > 120)
verl/workers/actor/megatron_actor.py:237:121: E501 Line too long (150 > 120)
verl/workers/actor/megatron_actor.py:243:121: E501 Line too long (144 > 120)
verl/workers/actor/megatron_actor.py:245:121: E501 Line too long (130 > 120)
verl/workers/actor/megatron_actor.py:247:121: E501 Line too long (122 > 120)
verl/workers/actor/megatron_actor.py:286:9: F841 Local variable `input_shapes` is assigned to but never used
verl/workers/critic/dp_critic.py:227:21: F841 Local variable `input_ids` is assigned to but never used
verl/workers/critic/dp_critic.py:230:21: F841 Local variable `position_ids` is assigned to but never used
verl/workers/megatron_workers.py:18:1: I001 [*] Import block is un-sorted or un-formatted
verl/workers/reward_manager/__init__.py:15:20: F401 `.batch.BatchRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:16:19: F401 `.dapo.DAPORewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:17:20: F401 `.naive.NaiveRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:18:20: F401 `.prime.PrimeRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/prime.py:61:121: E501 Line too long (217 > 120)
verl/workers/reward_model/__init__.py:15:19: F401 `.base.BasePPORewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_model/megatron/__init__.py:15:27: F401 `.reward_model.MegatronRewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_model/megatron/reward_model.py:65:9: F841 Local variable `ori_bs` is assigned to but never used
verl/workers/reward_model/megatron/reward_model.py:89:121: E501 Line too long (132 > 120)
verl/workers/reward_model/megatron/reward_model.py:215:9: F841 Local variable `input_shapes` is assigned to but never used
verl/workers/rollout/naive/__init__.py:15:28: F401 `.naive_rollout.NaiveRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/rollout/sglang_rollout/__init__.py:14:29: F401 `.sglang_rollout.SGLangRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:22:121: E501 Line too long (129 > 120)
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:51:121: E501 Line too long (157 > 120)
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:153:13: F841 Local variable `log_probs` is assigned to but never used
verl/workers/rollout/vllm_rollout/vllm_rollout.py:22:121: E501 Line too long (129 > 120)
verl/workers/rollout/vllm_rollout/vllm_rollout.py:60:121: E501 Line too long (157 > 120)
verl/workers/sharding_manager/__init__.py:16:5: F401 `verl.utils.import_utils.is_megatron_core_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:17:5: F401 `verl.utils.import_utils.is_sglang_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:21:19: F401 `.base.BaseShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:22:27: F401 `.fsdp_ulysses.FSDPUlyssesShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:29:121: E501 Line too long (149 > 120)
verl/workers/sharding_manager/__init__.py:32:121: E501 Line too long (126 > 120)
verl/workers/sharding_manager/fsdp_sglang.py:99:9: F841 Local variable `load_format` is assigned to but never used
verl/workers/sharding_manager/fsdp_sglang.py:123:121: E501 Line too long (178 > 120)
verl/workers/sharding_manager/fsdp_ulysses.py:59:13: F841 Local variable `sp_size` is assigned to but never used
Found 305 errors.
```
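The log above includes several B006 findings (mutable data structures as argument defaults), for example in the checkpoint managers. The minimal sketch below shows what that rule flags and the usual `None`-sentinel fix. The function names are hypothetical and are not taken from verl.

```python
from typing import List, Optional


def append_bad(item, bucket=[]):  # B006: this default list is created once and shared by every call
    bucket.append(item)
    return bucket


def append_good(item, bucket: Optional[List] = None):
    # Use None as a sentinel and build a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```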

---------

Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
2025-04-27 15:24:30 -07:00
cea529116f feat: move AsyncLLM ChatCompletionScheduler to separate thread (#1274)
Move AsyncLLM ChatCompletionScheduler to a separate thread to avoid making
PPOTrainer an async class.
2025-04-27 22:02:52 +08:00
64056835b9 [bugfix] fix: add await for _validate() (#1269)
As titled.
2025-04-26 20:32:46 +08:00
e8cd4196e3 fix: remove deprecated remove_previous_ckpt key in prime_ray_trainer.py (#1254)
The deprecated `remove_previous_ckpt` key causes checkpoint saving to crash.
See: https://github.com/volcengine/verl/issues/1183
2025-04-25 18:12:18 +08:00
aacd3660fc [rollout] feat: introduce vLLM AsyncLLM to support multi-turn rollout (#1138)
### Summary
Introduce vLLM AsyncLLM to support multi-turn rollout (related: #385, #398, #710).

### Architecture


![async_llm_arch](https://github.com/user-attachments/assets/e8cd974c-0c26-4d96-9a9e-b71fd85dd32d)



**New Components**:
- AsyncLLMWorker: a standalone vLLM server instance
  - FastAPI: provides an OpenAI-compatible HTTP server
- AsyncLLM: an async LLMEngine for online serving; for more details, see
  [AsyncLLM](https://github.com/vllm-project/vllm/pull/9826) and
  [LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine)
- ExternalRayDistributedExecutor: a custom executor backend that manages the
  workers in a worker group; it grabs the corresponding workers by actor name

- AsyncLLManager: manages a group of vLLM server instances (AsyncLLMWorker)
  - AsyncLLM lifecycle: initialization, wake_up, sleep
  - FastAPI service discovery

- ChatScheduler: schedules multiple chat completion requests across
  multiple server instances
  - Least-requests load balancing
  - Sticky sessions with prefix caching
  - Chat completion callback: tool calling

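The ChatScheduler's "least requests" load balancing and "sticky session" routing can be illustrated with a small sketch. It assumes that a conversation is identified by a session id and that keeping every turn of a session on the same server is what lets that server reuse its prefix cache. The class and method names are hypothetical, and this is not the ChatScheduler implementation from the PR.

```python
from typing import Dict, List


class LeastRequestsScheduler:
    """Hypothetical sketch: least-requests load balancing with sticky sessions."""

    def __init__(self, server_addresses: List[str]):
        # Number of in-flight requests per server address.
        self.in_flight: Dict[str, int] = {addr: 0 for addr in server_addresses}
        # Session id -> server address, so later turns hit the same prefix cache.
        self.sticky: Dict[str, str] = {}

    def acquire(self, session_id: str) -> str:
        addr = self.sticky.get(session_id)
        if addr is None:
            # New session: pick the server with the fewest in-flight requests
            # (ties go to the server listed first).
            addr = min(self.in_flight, key=self.in_flight.__getitem__)
            self.sticky[session_id] = addr
        self.in_flight[addr] += 1
        return addr

    def release(self, addr: str) -> None:
        # Call when a request to `addr` completes.
        self.in_flight[addr] -= 1
```

Stickiness takes priority over load here, which trades some balance for cache reuse. A real scheduler would also need to expire old sessions.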
### TODO
- [x] AsyncLLM: initialization/wake_up/sleep
- [x] OpenAI API: support `/v1/chat/completions`
- [x] RayPPOTrainer integration: replace `generate_sequences` with HTTP
calls to `/v1/chat/completions`
- [x] GSM8K e2e training
- [ ] Add document

---------

Co-authored-by: shengguangming <shengguangming@bytedance.com>
2025-04-25 17:56:34 +08:00
99fdbf6985 Log gpu mem refactor (#1190)
Refactor GPU memory logging to use a wrapper that logs memory usage when a
function is entered or exited.

Set `VERL_LOGGING_LEVEL=DEBUG` to enable the memory logger that is
currently wrapped around common functions.
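
A wrapper of this kind can be sketched as a decorator. The sketch assumes that the logger level is read from `VERL_LOGGING_LEVEL` and that the memory reading comes from `torch.cuda.memory_allocated()` when CUDA is available. The decorator name `log_gpu_memory` and its `mem_reader` parameter are hypothetical. The sketch does not reproduce verl's own `GPUMemoryLogger` / `log_gpu_memory_usage` in `verl/utils/debug`.

```python
import functools
import logging
import os

logger = logging.getLogger("gpu_mem_sketch")
# Assumption: the level name comes from VERL_LOGGING_LEVEL (e.g. "DEBUG").
logger.setLevel(os.environ.get("VERL_LOGGING_LEVEL", "WARNING").upper())


def _default_mem_reader() -> float:
    """Allocated GPU memory in GB, or 0.0 when torch/CUDA is unavailable."""
    try:
        import torch

        if torch.cuda.is_available():
            return torch.cuda.memory_allocated() / 1024**3
    except ImportError:
        pass
    return 0.0


def log_gpu_memory(role: str, mem_reader=_default_mem_reader):
    """Hypothetical decorator: log memory before and after the wrapped call at DEBUG level."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if not logger.isEnabledFor(logging.DEBUG):
                return fn(*args, **kwargs)  # no overhead when debug logging is off
            logger.debug("%s: before %s, allocated=%.2f GB", role, fn.__name__, mem_reader())
            try:
                return fn(*args, **kwargs)
            finally:
                logger.debug("%s: after %s, allocated=%.2f GB", role, fn.__name__, mem_reader())

        return wrapper

    return decorator
```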
2025-04-22 13:28:10 +08:00
725c67666f [ray] fix: ray hang due to num_cpus (#1009)
Fixing #523 according to
https://github.com/volcengine/verl/issues/523#issuecomment-2723652147

Concern: will `num_cpus=1` limit the performance of the cluster
scheduler?
2025-04-20 12:50:17 -07:00
28e45cbde2 [Config] fix: disable XFORMERS by default since we migrated to newer vLLM versions (#1178) 2025-04-20 07:46:20 -07:00
568239fb38 CI: limit ruff checks and enable push tests (#1157) 2025-04-19 13:54:45 +08:00
b00f77d855 [dev] feat: migrate from yapf & pylint to ruff based on pre-commit (#1010)
> [!WARNING]
> We are [migrating to `ruff` as the linter and formatter and
> `pre-commit` as the managing
> tool](https://github.com/volcengine/verl/pull/1010).
>
> If your branch is based on a previous commit using `yapf` and
> `pylint`, simply merging might trigger overwhelming linting errors,
> whereas **you are only expected to resolve those in the files related to
> your PR**.
>
> To resolve this issue, please try the following workaround to only
> include the files you **really changed** in the PR:
>
> 1. In your branch, fix linting and format with `ruff`: `ruff check --fix && ruff format`
> 2. Squash into a single commit in a new branch: `git reset --soft $(git merge-base main HEAD) && git add -A && git commit -m "feat: ..."`
> 3. Merge with the latest main: `git merge origin/main`
> 4. Force push to your branch: `git push --force`

We add the reminder above to the documentation to tell contributors how
to avoid overwhelming linting errors.

### Motivation

Following the discussion in #896, this PR migrates from yapf & pylint to
ruff, managed by pre-commit, which allows unified version control and
automatic hooks on commit.

### Summary

The `pre-commit` hook and CI

- check staged/committed files in commits/PRs
- check all files each month (this is expected to fail until all
files conform to the ruff standard)

### Explanation for the Failing CI Workflow `pre-commit`

For now, we only apply `ruff format` and `ruff check --fix` **without
resolving all the errors**, since there are too many of them to resolve;
as a result, the `pre-commit` CI workflow fails.

Resolving the remaining errors is left to future commits.
Specifically, the `pre-commit` hook and CI will require every commit to
fix its related files with `ruff`, so all files will be fixed
incrementally.

### Reviewing Suggestion

The commit 3d93f51ba8 is huge because it applies `ruff` to all files.
To review the main changes, please check the commits before and after it.
2025-04-18 07:49:31 -07:00
25b0f2262f Move entropy to compute log probs to reduce peak memory when calculating entropy. (#1100)
The actor no longer computes the entropy loss when `entropy_coeff == 0`,
and the entropy calculation is moved into `compute_log_probs`.

Tested configuration:

```sh
    data.max_prompt_length=$((1024 * 2)) \
    data.max_response_length=$((1024 * 10)) \
    actor_rollout_ref.rollout.max_num_batched_tokens=$((1024 * 12)) \
    context_parallel_size=2 \
```
2025-04-17 17:35:59 +08:00
3256142434 [Breaking] dataset: support customized datasets for RayPPOTrainer (#924)
This PR enables users to specify their own customized dataset for
RayPPOTrainer.

NOTE: this breaks the RLHFDataset interface, which is now:
```
RLHFDataset(
    data_files: Union[str, List[str]],
    tokenizer: PreTrainedTokenizer,
    config: DictConfig,
    processor: Optional[ProcessorMixin] = None
)
```

A custom dataset class MUST also use this interface.

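The sketch below shows a custom dataset class that accepts the constructor arguments listed above. To keep it self-contained, the `PreTrainedTokenizer`, `DictConfig` and `ProcessorMixin` annotations are replaced with `Any`, and the torch `Dataset` base class is omitted. The JSON Lines file format, the `"prompt"` field, and the class name `MyCustomDataset` are all hypothetical.

```python
import json
from typing import Any, Dict, List, Optional, Union


class MyCustomDataset:
    """Hypothetical custom dataset mirroring the RLHFDataset constructor."""

    def __init__(
        self,
        data_files: Union[str, List[str]],
        tokenizer: Any,  # a PreTrainedTokenizer in verl
        config: Any,  # a DictConfig in verl
        processor: Optional[Any] = None,  # a ProcessorMixin in verl
    ):
        if isinstance(data_files, str):
            data_files = [data_files]
        self.tokenizer = tokenizer
        self.config = config
        self.processor = processor
        # Assumption: each file is JSON Lines with a "prompt" field.
        self.rows: List[Dict[str, Any]] = []
        for path in data_files:
            with open(path, encoding="utf-8") as f:
                self.rows.extend(json.loads(line) for line in f if line.strip())

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        prompt = self.rows[idx]["prompt"]
        return {"prompt": prompt, "input_ids": self.tokenizer.encode(prompt)}
```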
cc @eric-haibin-lin
2025-04-10 22:07:42 -07:00
0407cad23b [dataset] refactor: remove unused filter_prompts parameter from RLHFDataset (#889)
`filter_prompts` has never been used, I think this parameter has been
replaced by `filter_overlong_prompts` so we can simply remove this.
2025-04-04 09:32:49 -07:00
6974bbaeea [dataset] refactor: use hf Dataset instead of pandas DataFrame in RLHFDataset for speedup (#890)
HF Dataset provides better memory management and can handle larger
datasets. It also supports multi-process acceleration during map/filter
operations (while pandas requires version >2.0).

`filter_overlong_prompts` can now be enabled on large-scale datasets
by setting `filter_overlong_prompts_workers` to an appropriate number.

---------

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
2025-04-03 21:51:53 -07:00
8cae42dc29 fix: misleading eos_mask->response_mask (#878)
https://github.com/volcengine/verl/pull/868#discussion_r2024416560
2025-04-03 13:01:07 +08:00
072fc9feed feat: support no reference model; fix KL issues (#644)
### Before get started

Difference between KL penalty in reward and KL loss

> [!TIP]
>
> 1. In-reward KL penalty
>
> $$
> r_t = r_{\varphi}(q, o_{\leq t}) - \beta\ \boxed{\log \frac{\pi_{\theta}(o_t | q, o_{<t})}{\pi_{\text{ref}}(o_t | q, o_{<t})}}
> $$
>
> 2. KL Loss
>
> $$
> L^{\text{PPO}}(\theta) = \mathbb{E}_t [ \min(ratio_t A_t, \text{clip}(ratio_t, 1 - \epsilon, 1 + \epsilon) A_t) ] - \beta\ \boxed{D_{\text{KL}}(\pi_{\theta} || \pi_{\text{ref}})}
> $$

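The two boxed terms enter training at different places. The first is subtracted from each token's reward before advantages are computed. The second is added to the policy loss. The sketch below shows both. It assumes per-token log-probs from the actor and reference policies and uses the simple log-ratio estimator of KL, which is one of several possible estimators. The function names are hypothetical.

```python
from typing import List


def in_reward_kl_penalty(
    token_scores: List[float], logp: List[float], ref_logp: List[float], beta: float
) -> List[float]:
    """r_t = r_phi(q, o_<=t) - beta * (log pi_theta(o_t) - log pi_ref(o_t)), per token."""
    return [s - beta * (lp - rlp) for s, lp, rlp in zip(token_scores, logp, ref_logp)]


def kl_loss_term(logp: List[float], ref_logp: List[float], beta: float) -> float:
    """beta * mean_t(log pi_theta - log pi_ref), added to the loss being minimized."""
    kl = [lp - rlp for lp, rlp in zip(logp, ref_logp)]
    return beta * sum(kl) / len(kl)
```

Because the two terms act on different quantities, nothing prevents enabling both or neither. This is the behavior the commit restores with the separate `algorithm.use_kl_in_reward` and `actor_rollout_ref.actor.use_kl_loss` switches.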
### Problems

1. The current code doesn't support running without a reference model

This feature has been half-implemented since the very first commit but was
never completed. For example, `RayPPOTrainer` has an attribute
`use_reference_policy`, but it's always True since `role_worker_mapping`
always contains `Role.RefPolicy`.

2. Restriction of `use_kl_loss`

Currently, `use_kl_loss` selects between the in-reward KL penalty
and the KL loss, so we cannot use **both or neither**.


87a813658f/verl/trainer/ppo/ray_trainer.py (L875-L879)


87a813658f/verl/workers/actor/dp_actor.py (L299-L307)

> [!CAUTION]
>
> ### You may have unintentionally adopted the in-reward KL penalty
>
> For the experiments you've conducted, if you set
> `actor.use_kl_loss=False` or didn't set it (the default is False), ***you
> unintentionally used the in-reward KL penalty.*** After this commit, if you
> don't want any KL, you should set `actor_rollout_ref.actor.use_kl_loss=False` and
> `algorithm.use_kl_in_reward=False` (or leave them unset, because these are
> the defaults).

3. Deprecated config

After investigation, it seems the Critic may once have been responsible for
the in-reward KL, but this feature appears to be non-functional.

1. At line 290, there may once have been `config.algorithm.kl_ctrl.target_kl`
and `config.critic.kl_ctrl.horizon`, which are currently unsupported.


3ec83117c3/verl/trainer/ppo/ray_trainer.py (L289-L293)

2. In `verl/workers/critic/megatron_critic.py`: redundant assignment of
`self.kl_ctrl`


3b18b0eb74/verl/workers/critic/megatron_critic.py (L69-L73)

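The `target_kl` and `horizon` keys mentioned above are the parameters of the adaptive KL controller of Ziegler et al. (2019). That controller nudges the KL coefficient β up when the measured KL exceeds the target and down when it falls below the target. The sketch below shows that standard update rule next to a fixed controller. It is an illustration of the technique and not the verl code this commit fixes.

```python
class AdaptiveKLController:
    """Adaptive KL coefficient (Ziegler et al., 2019), as commonly implemented."""

    def __init__(self, init_kl_coef: float, target_kl: float, horizon: int):
        self.value = init_kl_coef  # current beta
        self.target = target_kl
        self.horizon = horizon

    def update(self, current_kl: float, n_steps: int) -> None:
        # Relative error w.r.t. the target, clipped to [-0.2, 0.2] to avoid large jumps.
        proportional_error = min(max(current_kl / self.target - 1.0, -0.2), 0.2)
        # A larger horizon means beta adapts more slowly.
        self.value *= 1.0 + proportional_error * n_steps / self.horizon


class FixedKLController:
    """Constant beta; update() is a no-op."""

    def __init__(self, kl_coef: float):
        self.value = kl_coef

    def update(self, current_kl: float, n_steps: int) -> None:
        pass
```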

### What’s Changed?

1. Add support for running without a reference model
2. Fix the incomplete KL controller code
3. Add a test case that uses both KL terms
4. Fix some other miscellaneous issues in the code

### How to disable the reference model

* Set `actor_rollout_ref.actor.use_kl_loss=False` and
`algorithm.use_kl_in_reward=False` (both default to False, so you
can simply leave them unset)
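
The rule implied above can be summarized as follows: a reference policy is only needed when at least one KL term is active. The sketch below encodes that rule. The dataclass and function names are hypothetical. Only the two flag names, shown in the comments, come from this commit.

```python
from dataclasses import dataclass


@dataclass
class KLSettings:
    use_kl_loss: bool = False  # actor_rollout_ref.actor.use_kl_loss
    use_kl_in_reward: bool = False  # algorithm.use_kl_in_reward


def needs_reference_policy(cfg: KLSettings) -> bool:
    """Assumed rule: the reference model is required only if some KL term uses it."""
    return cfg.use_kl_loss or cfg.use_kl_in_reward
```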
2025-04-01 10:14:38 +08:00
c0621e1bcd [ulysses] fix: repeat kv heads by sp_size//nheads_k if nheads_k is less than sp_size (#850) 2025-03-31 16:25:53 -07:00
50cba4aab9 docs: update checkpoint doc (#800)
Also fix some APIs.
2025-03-28 21:27:01 -07:00
afb9f9f66f [Feat] add max_ckpt_to_keep for old ckpt removal (#724)
Saving too many old checkpoints can consume a lot of disk space.
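
Pruning old checkpoints down to a configured maximum can be sketched as follows. The sketch assumes that checkpoints live in sibling directories named `global_step_<N>` under one root, and that a value of `None` or 0 means "keep everything". The naming scheme, that convention, and the function name are all hypothetical.

```python
import os
import re
import shutil
from typing import List, Optional

# Assumption: checkpoint directories are named global_step_<N>.
_STEP_DIR = re.compile(r"^global_step_(\d+)$")


def prune_old_checkpoints(ckpt_root: str, max_ckpt_to_keep: Optional[int]) -> List[str]:
    """Delete all but the newest `max_ckpt_to_keep` checkpoint dirs; return removed paths."""
    if not max_ckpt_to_keep or max_ckpt_to_keep <= 0:
        return []  # assumed convention: None/0 keeps every checkpoint
    steps = []
    for name in os.listdir(ckpt_root):
        match = _STEP_DIR.match(name)
        path = os.path.join(ckpt_root, name)
        if match and os.path.isdir(path):
            steps.append((int(match.group(1)), path))
    steps.sort()  # sort numerically by step, oldest first
    to_remove = [path for _, path in steps[:-max_ckpt_to_keep]]
    for path in to_remove:
        shutil.rmtree(path)
    return to_remove
```

Sorting on the parsed integer step, rather than on the directory name, keeps `global_step_5` older than `global_step_10`.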
2025-03-26 15:03:03 +08:00