### What does this PR do?
> Rename `warmup_style` in `FSDPOptimizerConfig` to `lr_scheduler_type` to
align with the Hugging Face Trainer API.
The following pull request refactors the optimizer, but the naming issue
persists there as well:
https://github.com/volcengine/verl/pull/3656
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
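A minimal sketch of the change (the import path and the `lr` field are assumptions for illustration; only the rename `warmup_style` -> `lr_scheduler_type` comes from this PR):
```python
# Hedged sketch: the import path and the extra fields are assumptions.
from verl.workers.config import FSDPOptimizerConfig

# before:
# optim_cfg = FSDPOptimizerConfig(lr=1e-6, warmup_style="cosine")

# after (aligned with the Hugging Face Trainer naming):
optim_cfg = FSDPOptimizerConfig(lr=1e-6, lr_scheduler_type="cosine")
print(optim_cfg.lr_scheduler_type)
```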
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: weiqi.li <weiqi.li@bytedance.com>
As initially mentioned in
https://github.com/volcengine/verl/discussions/1941, having structured
configuration classes in verl makes argument passing easier for testing
and validation.
This is an extended thread on the current implementation of
configuration schema in verl. Related PRs:
- https://github.com/volcengine/verl/pull/2117
- https://github.com/volcengine/verl/pull/2621
# Motivation
By moving from loose `omegaconf.DictConfig`-based parameters to
structured dataclasses, we gain:
- Type safety & IDE support when accessing fields (e.g. `cfg.optim.lr`).
- Validation hooks via `__post_init__` in each class.
- Immutable defaults with controlled mutability (e.g., an `extra` field).
- Seamless Hydra/OmegaConf integration and easy per-recipe extension.
# Core: BaseConfig
Hydra natively supports converting a `DictConfig` to a dataclass, but a
dataclass does not support accessing attributes via `get()`. We
introduce a base class to provide backward compatibility and make the
change less abrupt for existing users.
All config dataclasses inherit from BaseConfig, which:
- Implements collections.abc.Mapping → dict-like iteration/access.
- Freezes attributes once set, unless listed in _mutable_fields.
- Provides an `extra: dict[str, Any]` for unchecked extensions.
```python
import collections.abc
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any, ClassVar

@dataclass
class BaseConfig(collections.abc.Mapping):
    """Dict-like, frozen dataclass with opt-in mutability."""
    _mutable_fields: ClassVar[set[str]] = {"extra"}
    extra: dict[str, Any] = field(default_factory=dict)

    def __setattr__(self, name: str, value):
        if name in self.__dict__ and name not in self._mutable_fields:
            raise FrozenInstanceError(f"Field '{name}' is frozen")
        super().__setattr__(name, value)

    # Mapping methods: get, __getitem__, __iter__, __len__ …
```
# Example Config Classes (verl/trainer/config)
Each sub-component of the trainer has its own dataclass, inheriting
BaseConfig.
```yaml
critic:
  checkpoint:
    _target_: verl.trainer.config.CheckpointConfig
    save_contents: ["model", "optimizer", "extra"]
    load_contents: ["model", "optimizer", "extra"]
    async_save: false
```
Definition:
```python
from dataclasses import dataclass, field

@dataclass
class CheckpointConfig(BaseConfig):
    """What to save/load and async behavior."""
    save_contents: list[str] = field(default_factory=lambda: ["model", "optimizer", "extra"])
    load_contents: list[str] = field(default_factory=lambda: ["model", "optimizer", "extra"])
    async_save: bool = False

    def __post_init__(self):
        # validation checks go here after initialization
        pass

ckpt_cfg = CheckpointConfig(async_save=True)
print(ckpt_cfg.save_contents)                    # attribute access
print(ckpt_cfg.get("save_contents", ["model"]))  # dict-style access with a default
print(ckpt_cfg["save_contents"])                 # dict-style access

# converting a hydra-generated omegaconf.DictConfig to the dataclass config:
from verl.utils.config import omegaconf_to_dataclass
ckpt_cfg_from_cli = omegaconf_to_dataclass(config.critic.checkpoint)  # `config` is the composed Hydra config
```
# Extending existing config classes
Because configs are now structured, unexpected keys raise exceptions.
To add new keys, there are two options:
## Explicit class extensions:
```python
from dataclasses import dataclass

from verl.workers.config import FSDPActorConfig

@dataclass
class SPPOActorConfig(FSDPActorConfig):
    """Add SPPO-specific temperature/penalty."""
    sppo_eta: float = 1.0
```
When using YAML or the command line, update the target config class:
```yaml
hydra:
  searchpath:
    - file://verl/trainer/config

defaults:
  - ppo_trainer  # base trainer config
  - _self_       # then apply these overrides

actor_rollout_ref:
  actor:
    _target_: recipe.sppo.config.SPPOActorConfig  # new target dataclass required for extension
    sppo_eta: 1.0
```
or directly from the command line:
```bash
python main_sppo.py \
    actor_rollout_ref.actor._target_=recipe.sppo.config.SPPOActorConfig \
    actor_rollout_ref.actor.sppo_eta=1.0
```
## Leverage the `extra` field
Adding new keys under the `extra` field of any dataclass that inherits
from `BaseConfig` also works. This way there is no need to define your
own dataclass in Python:
```yaml
hydra:
  searchpath:
    - file://verl/trainer/config

defaults:
  - ppo_trainer  # base trainer config
  - _self_       # then apply these overrides

actor_rollout_ref:
  actor:
    extra:
      sppo_eta: 1.0
```
# Declaring mutable fields
For historical reasons, some config fields are mutated in place in the
codebase, such as batch sizes for data/sequence parallelism. We are in
the process of deprecating this behavior. However, if you intentionally
need a mutable field, declare it in the `_mutable_fields` attribute:
```python
from dataclasses import dataclass, field

@dataclass
class CheckpointConfig(BaseConfig):
    """What to save/load and async behavior."""
    _mutable_fields = BaseConfig._mutable_fields | {"save_contents"}  # mark save_contents as mutable

    save_contents: list[str] = field(default_factory=lambda: ["model", "optimizer", "extra"])
    load_contents: list[str] = field(default_factory=lambda: ["model", "optimizer", "extra"])
    async_save: bool = False
```
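For illustration, a short sketch of what this buys (assuming the `CheckpointConfig` above; the error type comes from `BaseConfig.__setattr__`):
```python
from dataclasses import FrozenInstanceError

ckpt_cfg = CheckpointConfig()

ckpt_cfg.save_contents = ["model"]  # allowed: listed in _mutable_fields

try:
    ckpt_cfg.async_save = True      # not listed -> rejected after initialization
except FrozenInstanceError as e:
    print(e)
```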
# Other helpful resources
verl's default trainer config is composed from the following config
files, specified in the `defaults` list of `ppo_trainer.yaml`:
https://github.com/volcengine/verl/blob/main/verl/trainer/config/ppo_trainer.yaml#L1-L36
- verl/trainer/config/ppo_trainer.yaml # main config for entrypoint
- verl/trainer/config/actor/dp_actor.yaml
- verl/trainer/config/critic/dp_critic.yaml
- verl/trainer/config/reward_model/dp_reward_model.yaml
- verl/trainer/config/rollout/rollout.yaml
To quickly view the full default config in a single file, check the
auto-generated config at
https://github.com/volcengine/verl/blob/main/verl/trainer/config/_generated_ppo_trainer.yaml
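You can also compose and print the full config locally. A minimal sketch using Hydra's compose API (assumes Hydra >= 1.2 and that `verl.trainer.config` is the importable config module, per the list above):
```python
from hydra import compose, initialize_config_module
from omegaconf import OmegaConf

# Compose the default trainer config the same way the entrypoint does,
# then dump it as YAML for inspection.
with initialize_config_module(config_module="verl.trainer.config", version_base=None):
    cfg = compose(config_name="ppo_trainer")
print(OmegaConf.to_yaml(cfg))
```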
# Change log and impact on existing code
This PR converts the following config sections to structured dataclasses
in the training pipeline. More can be done in future PRs (contributions
from the community are welcome):
- [x] actor_rollout_ref.actor
- [x] critic
- [ ] actor_rollout_ref.rollout
- [ ] actor_rollout_ref.ref
- [ ] reward_model
- [ ] data
- [ ] trainer
Changes needed for existing code that added new fields to the config:
- See `recipe/sppo` for an example.
- `OmegaConf.to_container(self.config.model.get("override_config",
OmegaConf.create()))` now has to be manually changed to
`self.config.model.get("override_config", {})`, because
`OmegaConf.to_container` expects a `DictConfig` but
`config.model.override_config` is already a plain dict.
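For clarity, the same migration as a minimal sketch (the helper name and the `model_cfg` argument are illustrative, not verl API):
```python
from omegaconf import OmegaConf

def get_override_config(model_cfg):
    # before: override_config was an omegaconf.DictConfig and had to be converted:
    #   OmegaConf.to_container(model_cfg.get("override_config", OmegaConf.create()))
    # after: with structured configs it is already a plain dict
    return model_cfg.get("override_config", {})
```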
# Other Breaking Changes
`critic.optim.lr` for Megatron changed from `1e-6` to `1e-5`.
---------
Signed-off-by: ShareLer <ShareLe@163.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Joel <wuxibin@bytedance.com>
Co-authored-by: Cheetah <1659275352@qq.com>
Co-authored-by: 杨睿 <yangruipis@163.com>
Co-authored-by: X. HU <huxiaobo@zju.edu.cn>
Co-authored-by: Le Xue <48175490+ShareLer@users.noreply.github.com>
Co-authored-by: Ziheng Jiang <ziheng@apache.org>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
Currently, the `e2e_prime` test fails with `AttributeError: 'NoneType'
object has no attribute 'squeeze'`, which is caused by [#1212].
In PR [#1568], the parameter `use_fused_kernels` in `ppo_trainer.yaml`
was set to `false`, but the corresponding parameter in
`prime_trainer.yaml` was not updated, which is preventing the CI from
passing. Until the root cause of the `use_fused_kernels` issue is fully
resolved, we should temporarily set `use_fused_kernels` to `false` in
`prime_trainer.yaml`.
### High-Level Design
Not needed
### Specific Changes
- Set the default `use_fused_kernels = False` in `prime_trainer.yaml`.
### API
Not needed
### Usage Example
Not needed
### Test
Not needed
### Additional Info.
Not needed
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
# What does this PR do?
This PR implements fused losses for alignment (#710). It reduces the
memory required for loss calculation to a small constant amount.
# ChangeLog:
- Added the option `use_fused_kernels`.
- Added a monkey patch that makes `model.forward` return
`last_hidden_state` without computing the logits.
- Added `FusedLinearForPPO` to `verl/utils/experimental/torch_functional.py`.
# Usage
Simply add the following option:
```bash
actor_rollout_ref.model.use_fused_kernels=True
```
## Before submitting
- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [ ] Did you make sure to update the documentation with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs),
especially for breaking config changes?
- [ ] Did you write any test cases if necessary? Please add CI tests for
your new feature.
# Additional Info:
- The current implementation uses chunking to reduce memory
consumption to a constant value.
- It works by splitting the loss calculation into chunks of 512 tokens,
computing the log_probs / entropy values / gradients for each chunk,
and accumulating them (see the sketch after this list).
- However, the current implementation can be slow: it processes each
chunk sequentially in a Python for loop.
- In the future we should consider converting the fused functions to
Triton or some other JIT solution.
- Compared to a `FusedPPOLossFunction`, fusing the hidden_states ->
entropy & log_probs computation is much better for algorithm developers:
the memory-heavy part is optimized away for them, and they are free to
combine the resulting values in their own custom loss functions.
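A simplified, hedged sketch of the chunking idea (not the actual `FusedLinearForPPO` code; function and variable names are illustrative):
```python
import torch
import torch.nn.functional as F

def chunked_logprobs_and_entropy(hidden_states, lm_head_weight, labels, chunk_size=512):
    """Per-token log-probs and entropy without materializing the full logits tensor.

    hidden_states: (num_tokens, hidden_dim), lm_head_weight: (vocab_size, hidden_dim),
    labels: (num_tokens,). A real fused implementation would also accumulate gradients
    chunk by chunk (e.g. in a custom autograd.Function) so that backward memory stays
    constant as well; that part is omitted here.
    """
    log_probs, entropies = [], []
    for start in range(0, hidden_states.size(0), chunk_size):
        h = hidden_states[start : start + chunk_size]
        logits = h @ lm_head_weight.T                   # (chunk, vocab_size)
        logp = F.log_softmax(logits.float(), dim=-1)
        lbl = labels[start : start + chunk_size]
        log_probs.append(logp.gather(-1, lbl.unsqueeze(-1)).squeeze(-1))
        entropies.append(-(logp.exp() * logp).sum(-1))  # entropy of each token's distribution
    return torch.cat(log_probs), torch.cat(entropies)
```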
---------
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>
Refactor and merge the PRIME algorithm into verl main:
https://github.com/PRIME-RL/PRIME
Breaking changes:
`trainer.fsdp_config.min_num_params` is now moved to `trainer.fsdp_config.wrap_policy.min_num_params`.