8 Commits

Author SHA1 Message Date
7f27789961 [fsdp,doc] refactor: rename warmup_style@FSDPOptimizerConfig -> lr_scheduler_type (#3739)
### What does this PR do?

> Rename `warmup_style` in `FSDPOptimizerConfig` to `lr_scheduler_type` to
align with the Hugging Face Trainer API.

The pull request below refactors the optimizer; however, the naming
issue persists there as well:
https://github.com/volcengine/verl/pull/3656
### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,`, like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```
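A minimal sketch of the change (the import path and scheduler value are assumptions for illustration, not taken from this PR):

```python
# Hedged sketch: `warmup_style` on FSDPOptimizerConfig is renamed to
# `lr_scheduler_type`; the import path and value below are illustrative.
from verl.workers.config import FSDPOptimizerConfig  # assumed import path

optim_cfg = FSDPOptimizerConfig(
    lr=1e-6,
    lr_scheduler_type="cosine",  # previously: warmup_style="cosine"
)
```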

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: weiqi.li <weiqi.li@bytedance.com>
2025-10-13 15:58:59 +08:00
4de3ecf0f0 [cfg] refactor: add ActorConfig, EngineConfig, and ActorWorker unit test, refactor validation code (#2621)
As initially mentioned in
https://github.com/volcengine/verl/discussions/1941, having structured
configuration classes in verl makes argument passing easier for testing
and validation.

This is an extended thread on the current implementation of
configuration schema in verl. Related PRs:
-  https://github.com/volcengine/verl/pull/2117
- https://github.com/volcengine/verl/pull/2621 

# Motivation 
By moving from loose `omegaconf.DictConfig`-based parameters to
structured dataclasses, we gain:
- Type safety & IDE support when accessing fields (e.g. `cfg.optim.lr`).
- Validation hooks via `__post_init__` in each class.
- Immutable defaults with controlled mutability (e.g., an `extra` field).
- Seamless Hydra/OmegaConf integration and easy per-recipe extension.

# Core: BaseConfig

Hydra natively supports converting a DictConfig into a dataclass, but a
dataclass does not support attribute access via `get()`. We introduce a
base class to provide backward compatibility and make the change less
abrupt for existing users.

All config dataclasses inherit from BaseConfig, which:
- Implements collections.abc.Mapping → dict-like iteration/access.
- Freezes attributes once set, unless listed in _mutable_fields.
- Provides an `extra: dict[str, Any]` for unchecked extensions.

```python
import collections.abc
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any, ClassVar


@dataclass
class BaseConfig(collections.abc.Mapping):
    """Dict-like, frozen dataclass with opt-in mutability."""
    # ClassVar so it is treated as a class attribute, not a dataclass field
    _mutable_fields: ClassVar[set[str]] = {"extra"}
    extra: dict[str, Any] = field(default_factory=dict)

    def __setattr__(self, name: str, value):
        # freeze fields once set, unless explicitly marked as mutable
        if name in self.__dict__ and name not in self._mutable_fields:
            raise FrozenInstanceError(f"Field '{name}' is frozen")
        super().__setattr__(name, value)

    # Mapping methods: get, __getitem__, __iter__, __len__ …
```

# Example Config Classes (verl/trainer/config)

Each sub-component of the trainer has its own dataclass, inheriting
BaseConfig.
```yaml
critic:
  checkpoint:
    _target_: verl.trainer.config.CheckpointConfig
    save_contents: ["model","optimizer","extra"]
    load_contents: ["model","optimizer","extra"]
    async_save: false
```
Definition: 
```python
from dataclasses import dataclass, field


@dataclass
class CheckpointConfig(BaseConfig):
    """What to save/load and async behavior."""
    save_contents: list[str] = field(default_factory=lambda: ["model", "optimizer", "extra"])
    load_contents: list[str] = field(default_factory=lambda: ["model", "optimizer", "extra"])
    async_save: bool = False

    def __post_init__(self):
        # validation checks go here after initialization
        pass


ckpt_cfg = CheckpointConfig(async_save=True)
print(ckpt_cfg.save_contents)
print(ckpt_cfg.get("save_contents", None))  # dict-style get with a default
print(ckpt_cfg["save_contents"])

# converting the hydra-generated omegaconf.DictConfig to the dataclass config
# (`config` is the DictConfig produced by the Hydra entrypoint):
from verl.utils.config import omegaconf_to_dataclass
ckpt_cfg_from_cli = omegaconf_to_dataclass(config.critic.checkpoint)
```

# Extending existing config classes
Because configs are now structured, unexpected keys raise exceptions.
There are two ways to add new keys:
## Explicit class extensions:
```python
from dataclasses import dataclass

from verl.workers.config import FSDPActorConfig


@dataclass
class SPPOActorConfig(FSDPActorConfig):
    """Add SPPO-specific temperature/penalty."""
    sppo_eta: float = 1.0
```
When using YAML or the command line, update the target config class:
```yaml
hydra:
  searchpath:
    - file://verl/trainer/config
defaults:
  - ppo_trainer      # base trainer config
  - _self_               # then apply these overrides

actor_rollout_ref:
  actor:
    _target_: recipe.sppo.config.SPPOActorConfig  # new target dataclass required for extension
    sppo_eta: 1.0  
```
or directly from command line:
```bash
python main_sppo.py \
  actor_rollout_ref.actor._target_=recipe.sppo.config.SPPOActorConfig \
  actor_rollout_ref.actor.sppo_eta=1.0
```
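With `_target_` pointing at the subclass, the conversion helper shown earlier should yield an `SPPOActorConfig` instance (a sketch, assuming `omegaconf_to_dataclass` dispatches on the `_target_` key):

```python
# Hedged sketch: materialize the extended dataclass from the Hydra config.
# `config` is the DictConfig produced by the Hydra entrypoint.
from verl.utils.config import omegaconf_to_dataclass

actor_cfg = omegaconf_to_dataclass(config.actor_rollout_ref.actor)
assert actor_cfg.sppo_eta == 1.0
```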

## Leverage the `extra` field
Adding keys under the `extra` field of any dataclass that inherits
from `BaseConfig` also works, so there is no need to define your own
dataclass in Python:
```yaml
hydra:
  searchpath:
    - file://verl/trainer/config
defaults:
  - ppo_trainer      # base trainer config
  - _self_               # then apply these overrides

actor_rollout_ref:
  actor:
    extra:
      sppo_eta: 1.0
```
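On the Python side, the value is then available from the `extra` dict of the converted config (a short sketch; `actor_cfg` stands for the dataclass obtained via `omegaconf_to_dataclass`):

```python
# Hedged sketch: read a key passed through `extra`, with a default fallback.
sppo_eta = actor_cfg.extra.get("sppo_eta", 1.0)
```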

# Declaring mutable fields
For historical reasons, some config fields are mutated in place in the
codebase, such as batch sizes for data/sequence parallelism. We are in
the process of deprecating this behavior. However, if you intentionally
want to mutate a field, mark it in the `_mutable_fields` attribute:
```python
@dataclass
class CheckpointConfig(BaseConfig):
    """What to save/load and async behavior."""
    _mutable_fields = BaseConfig._mutable_fields | {"save_contents"} # mark save_contents as mutable.

    save_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"])
    load_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"])
    async_save: bool = False
```
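Given the definitions above, the marked field can be reassigned after construction while every other field stays frozen (a sketch of the expected behavior):

```python
from dataclasses import FrozenInstanceError

ckpt_cfg = CheckpointConfig()
ckpt_cfg.save_contents = ["model"]  # allowed: listed in _mutable_fields

try:
    ckpt_cfg.async_save = True      # not marked mutable, so this raises
except FrozenInstanceError as err:
    print(err)
```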

# Other helpful resources
verl's default trainer config combines the following config files,
specified in the `defaults` list:
https://github.com/volcengine/verl/blob/main/verl/trainer/config/ppo_trainer.yaml#L1-L36
- verl/trainer/config/ppo_trainer.yaml  # main config for entrypoint 
- verl/trainer/config/actor/dp_actor.yaml 
- verl/trainer/config/critic/dp_critic.yaml 
- verl/trainer/config/reward_model/dp_reward_model.yaml 
- verl/trainer/config/rollout/rollout.yaml 

To quickly inspect the full default config in a single file, check the
auto-generated config at
https://github.com/volcengine/verl/blob/main/verl/trainer/config/_generated_ppo_trainer.yaml

# Change log and impact on existing code
This PR converts the following fields to structured dataclasses in the
training pipeline. More can be done in future PRs (contributions from
the community are welcome):
- [x] actor_rollout_ref.actor
- [x] critic 
- [ ] actor_rollout_ref.rollout
- [ ] actor_rollout_ref.ref
- [ ] reward_model
- [ ] data
- [ ] trainer

Changes needed for existing code that added new fields to the config:
- see recipe/sppo for an example
- `OmegaConf.to_container(self.config.model.get("override_config",
OmegaConf.create()))` now has to be manually changed to
`self.config.model.get("override_config", {})`, because
`OmegaConf.to_container` expects a `DictConfig` but
`config.model.override_config` is already a plain dict (see the
before/after sketch below).
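In code, the migration looks like this (a before/after sketch; `model_config` stands in for `self.config.model` inside a worker):

```python
from omegaconf import OmegaConf

# before: override_config arrived as an OmegaConf DictConfig and needed conversion
override_config = OmegaConf.to_container(model_config.get("override_config", OmegaConf.create()))

# after: with structured configs it is already a plain dict
override_config = model_config.get("override_config", {})
```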

# Other Breaking Changes
The default `critic.optim.lr` for Megatron changed from 1e-6 to 1e-5.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Joel <wuxibin@bytedance.com>
Co-authored-by: Cheetah <1659275352@qq.com>
Co-authored-by: 杨睿 <yangruipis@163.com>
Co-authored-by: X. HU <huxiaobo@zju.edu.cn>
Co-authored-by: Le Xue <48175490+ShareLer@users.noreply.github.com>
Co-authored-by: Ziheng Jiang <ziheng@apache.org>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-23 11:45:14 -07:00
c8908e197c [fsdp] feat: Memory efficient cross entropy with a linear layer fused (#462)
Implemented the forward and backward passes of the compute logic below,
eliminating many intermediate tensors and reducing peak memory usage.

## Equivalent compute logic:
```python
import typing

import torch


def run_torch_entropy(hidden: torch.Tensor,
                      weight: torch.Tensor,
                      labels: torch.Tensor) -> typing.List[torch.Tensor]:
    logits = torch.matmul(hidden.to(torch.float32), weight.to(torch.float32))  # [num_tokens, vocab_size]
    pd = torch.nn.functional.softmax(logits, dim=-1)  # [num_tokens, vocab_size]
    entropy_a = torch.logsumexp(logits, dim=-1)  # [num_tokens]
    entropy_b = torch.sum(pd * logits, dim=-1)  # [num_tokens]
    entropy = entropy_a - entropy_b
    logprobs = torch.nn.functional.cross_entropy(logits, labels)  # scalar (mean reduction)
    logprobs = torch.neg(logprobs)
    return logprobs, entropy
```

## API
```python
import torch

from verl.utils.kernel import linear_cross_entropy

num_tokens, hidden_size, vocab_size = 1024, 4096, 32768  # example shapes

hidden = torch.randn(num_tokens, hidden_size, dtype=torch.bfloat16, device="cuda")
weight = torch.randn(hidden_size, vocab_size, dtype=torch.bfloat16, device="cuda")
labels = torch.randint(0, vocab_size, (num_tokens,), device="cuda")

loss, entropy = linear_cross_entropy(hidden, weight, labels, reduction="mean")
```

## Storage and latency
<img width="636" alt="image"
src="https://github.com/user-attachments/assets/396b7303-a46a-46b1-a261-917fda034b02"
/>

## Unit test
```shell
$ cd verl/
$ python3 tests/kernel/test_memory_efficient_entropy.py
```

# NOTE
For compatibility, `torch.library.triton_op` was not applied to these
APIs, so `torch.compile` may not work on top of them.

---------

Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>
Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
Co-authored-by: gaoziyuan.955 <gaoziyuan.955@bytedance.com>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
2025-06-11 19:48:47 +08:00
4779f26164 [Refactor] fused kernel in forward (#1624)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Shifts `fused_linear_for_ppo` into `model.forward` for FSDP.

### High-Level Design

Self-explanatory.

### Specific Changes

- Update the monkey patch to return `log_probs` and `entropy` instead of
`last_hidden_state`.

### API

No changes

### Usage Example

```sh
actor_rollout_ref.model.use_fused_kernels=True
```

### Test


![image](https://github.com/user-attachments/assets/c6af68fb-0200-4aee-9596-0b445afdc562)


### Additional Info.

- This fixes #1565.
- The original bug arose because `model.lm_head.weight` was accessed
from outside of the FSDP-wrapped context; see the sketch below.
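A minimal sketch of what moving the computation into `model.forward` means (the patched-forward signature and the `fused_linear_for_ppo` call shape are assumptions for illustration, not the PR's exact code):

```python
# Hedged sketch: compute log_probs/entropy inside the (monkey-patched) forward,
# where lm_head.weight is still inside the FSDP-wrapped module, instead of
# returning last_hidden_state and doing the projection outside.
def patched_forward(self, input_ids, labels=None, **kwargs):
    outputs = self.model(input_ids, **kwargs)   # backbone forward
    hidden = outputs.last_hidden_state          # [batch, seq, hidden_size]
    log_probs, entropy = fused_linear_for_ppo(hidden, self.lm_head.weight, labels)
    return log_probs, entropy
```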

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-24 13:50:57 +08:00
a3c4cb386c Disable fused kernels in prime (#1598)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?
Currently, the `e2e_prime` test fails with `AttributeError: 'NoneType'
object has no attribute 'squeeze'`, which is caused by #1212.

In PR #1568, the parameter `use_fused_kernel` in `ppo_trainer.yaml` was
set to `false`, but the corresponding parameter in `prime_trainer.yaml`
was not updated, which prevents the CI from passing. Until the root
cause of `use_fused_kernel` is fully resolved, we should temporarily set
`use_fused_kernel` to `false` in `prime_trainer.yaml`.
### High-Level Design

Not needed

### Specific Changes

- Default `use_fused_kernels = False`

### API

Not needed

### Usage Example

Not needed

### Test

Not needed

### Additional Info.

Not needed

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-20 16:27:33 +08:00
eb077f66e5 Feat/memory optimized loss (#1212)
# What does this PR do?

This PR implements fused losses for alignment. #710
It reduces the memory required for loss calculation to a small constant
amount.

# ChangeLog:

- Added the option `use_fused_kernels`
- Monkey patch to make `model.forward` return `last_hidden_state` and not
compute logits
- Added `FusedLinearForPPO` to `verl/utils/experimental/torch_functional.py`

# Usage

Simply add the following option
```
actor_rollout_ref.model.use_fused_kernels=True
```

## Before submitting

- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [ ] Did you make sure to update the documentation with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs),
especially for breaking config changes?
- [ ] Did you write any test cases if necessary? Please add CI tests for
your new feature.

# Additional Info:
- The current implementation uses chunking to reduce the memory
consumption to a constant amount.
- It works by splitting the loss calculation into chunks of 512 tokens,
computing the log_probs / entropy values / gradients for each chunk, and
accumulating them (see the sketch after this list).
- However, the current implementation can be slow: it processes each
chunk sequentially in a Python for loop.
- In the future we should consider converting the fused functions into
Triton or some other JIT solution.
- Compared to `FusedPPOLossFunction`, optimizing hidden_states -> entropy
& log_probs is much better for algorithm developers, as the memory-heavy
part is optimized away for them and they are free to combine the values
in their own custom loss functions.
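A forward-only sketch of the chunking idea (illustrative code, not the PR's implementation; the actual kernel also accumulates gradients chunk by chunk so the full logits never need to be kept for backward):

```python
import torch

def chunked_logprobs_entropy(hidden, weight, labels, chunk_size=512):
    """Per-token log-probs and entropy computed chunk by chunk, so the full
    [num_tokens, vocab_size] logits tensor is never materialized at once."""
    logprob_chunks, entropy_chunks = [], []
    for start in range(0, hidden.shape[0], chunk_size):
        h = hidden[start:start + chunk_size].float()
        logits = h @ weight.float()                  # [chunk, vocab_size]
        log_z = torch.logsumexp(logits, dim=-1)      # [chunk]
        probs = torch.softmax(logits, dim=-1)
        entropy_chunks.append(log_z - (probs * logits).sum(-1))
        chunk_labels = labels[start:start + chunk_size]
        picked = logits.gather(-1, chunk_labels[:, None]).squeeze(-1)
        logprob_chunks.append(picked - log_z)        # log p(label)
    return torch.cat(logprob_chunks), torch.cat(entropy_chunks)
```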

---------

Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
Co-authored-by: gaoziyuan <gaoziyuan.955@bytedance.com>
2025-05-16 22:52:54 +08:00
22657bade5 [config] feat: lr_warmup_steps (#564)
This PR adds the `lr_warmup_steps` configuration.

Note that `num_warmup_steps` takes precedence over `lr_warmup_steps_ratio`.
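A hedged sketch of the precedence described above (function and argument names are illustrative, not verl's actual API):

```python
# Assumption: an explicit non-negative warmup step count wins; otherwise the
# warmup length is derived from the ratio of total training steps.
def resolve_warmup_steps(lr_warmup_steps, lr_warmup_steps_ratio, total_training_steps):
    if lr_warmup_steps is not None and lr_warmup_steps >= 0:
        return lr_warmup_steps
    return int(lr_warmup_steps_ratio * total_training_steps)
```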
2025-03-14 16:09:12 +08:00
f0e7f9fcbe recipe: PRIME algorithm (#362)
Refactor and merge the PRIME algorithm (https://github.com/PRIME-RL/PRIME) into verl main.

Breaking changes:
`trainer.fsdp_config.min_num_params` has moved to `trainer.fsdp_config.wrap_policy.min_num_params`.
2025-03-10 11:31:43 -07:00