Files
verl/verl/trainer/config/critic/critic.yaml
Commit 4de3ecf0f0: [cfg] refactor: add ActorConfig, EngineConfig, and ActorWorker unit test, refactor validation code (#2621)
As initially mentioned in
https://github.com/volcengine/verl/discussions/1941, having structured
configuration classes in verl makes argument passing easier for testing
and validation.

This is an extended thread on the current implementation of
configuration schema in verl. Related PRs:
- https://github.com/volcengine/verl/pull/2117
- https://github.com/volcengine/verl/pull/2621

# Motivation 
By moving from loose `omegaconf.DictConfig`-based parameters to
structured dataclasses, we gain:
- Type safety & IDE support when accessing fields (e.g. `cfg.optim.lr`).
- Validation hooks via `__post_init__` in each class.
- Immutable defaults with controlled mutability (e.g., an `extra` field).
- Seamless Hydra/OmegaConf integration and easy per-recipe extension.

# Core: BaseConfig

Hydra natively supports converting a `DictConfig` to a dataclass, but
plain dataclasses do not support dict-style attribute access via
`get()`. We introduce a base class to provide backward compatibility and
make the change less abrupt for existing users.

All config dataclasses inherit from `BaseConfig`, which:
- Implements `collections.abc.Mapping` → dict-like iteration/access.
- Freezes attributes once set, unless listed in `_mutable_fields`.
- Provides an `extra: dict[str, Any]` field for unchecked extensions.

```python
import collections.abc
from dataclasses import FrozenInstanceError, dataclass, field, fields
from typing import Any, ClassVar


@dataclass
class BaseConfig(collections.abc.Mapping):
    """Dict-like, frozen dataclass with opt-in mutability."""

    # ClassVar keeps this out of the dataclass fields (a mutable set
    # default would otherwise raise at class-creation time).
    _mutable_fields: ClassVar[set[str]] = {"extra"}
    extra: dict[str, Any] = field(default_factory=dict)

    def __setattr__(self, name: str, value):
        if name in self.__dict__ and name not in self._mutable_fields:
            raise FrozenInstanceError(f"Field '{name}' is frozen")
        super().__setattr__(name, value)

    # Mapping methods so existing dict-style call sites keep working
    def get(self, key, default=None):
        return getattr(self, key, default)

    def __getitem__(self, key):
        return getattr(self, key)

    def __iter__(self):
        return iter(f.name for f in fields(self))

    def __len__(self):
        return len(fields(self))
```

# Example Config Classes (verl/trainer/config)

Each sub-component of the trainer has its own dataclass, inheriting
BaseConfig.
```yaml
critic:
  checkpoint:
    _target_: verl.trainer.config.CheckpointConfig
    save_contents: ["model","optimizer","extra"]
    load_contents: ["model","optimizer","extra"]
    async_save: false
```
Definition: 
```python
@dataclass
class CheckpointConfig(BaseConfig):
    """What to save/load and async behavior."""
    save_contents: list[str] = field(default_factory=lambda: ["model", "optimizer", "extra"])
    load_contents: list[str] = field(default_factory=lambda: ["model", "optimizer", "extra"])
    async_save: bool = False

    def __post_init__(self):
        # validation checks go here after initialization
        pass


ckpt_cfg = CheckpointConfig(async_save=True)
print(ckpt_cfg.save_contents)               # attribute access
print(ckpt_cfg.get("save_contents", None))  # dict-style access with a default
print(ckpt_cfg["save_contents"])            # item access

# Converting a hydra-generated omegaconf.DictConfig to the dataclass config:
from verl.utils.config import omegaconf_to_dataclass

ckpt_cfg_from_cli = omegaconf_to_dataclass(config.critic.checkpoint)
```

# Extending existing config classes
Because configs are now structured, unexpected keys raise exceptions at
construction time. There are two ways to add new keys:
## Explicit class extensions:
```python
from verl.workers.config import FSDPActorConfig

@dataclass
class SPPOActorConfig(FSDPActorConfig):
    """Add SPPO-specific temperature/penalty."""
    sppo_eta: float = 1.0

```
When configuring via YAML or the command line, point `_target_` at the new config class:
```yaml
hydra:
  searchpath:
    - file://verl/trainer/config
defaults:
  - ppo_trainer      # base trainer config
  - _self_               # then apply these overrides

actor_rollout_ref:
  actor:
    _target_: recipe.sppo.config.SPPOActorConfig  # new target dataclass required for extension
    sppo_eta: 1.0
```
or directly from command line:
```bash
python main_sppo.py \
  actor_rollout_ref.actor._target_=recipe.sppo.config.SPPOActorConfig \
  actor_rollout_ref.actor.sppo_eta=1.0
```

## Leverage the `extra` field
Adding new keys under the `extra` field of any dataclass that inherits
from `BaseConfig` also works. This way there is no need to define your
own dataclass in Python:
```yaml
hydra:
  searchpath:
    - file://verl/trainer/config
defaults:
  - ppo_trainer      # base trainer config
  - _self_               # then apply these overrides

actor_rollout_ref:
  actor:
    extra:
      sppo_eta: 1.0
```
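On the consuming side, recipe code reads the knob back out of `extra` with a plain dict lookup. A sketch with a stand-in dataclass (the class and field names are assumptions for illustration):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ActorLikeConfig:
    """Stand-in for a BaseConfig subclass with the `extra` escape hatch."""
    lr: float = 1e-6
    extra: dict[str, Any] = field(default_factory=dict)


# hydra would populate `extra` from the yaml override above
cfg = ActorLikeConfig(extra={"sppo_eta": 1.0})

# Unknown knobs live under `extra`; use .get() to supply a default.
eta = cfg.extra.get("sppo_eta", 1.0)
missing = cfg.extra.get("not_set", 0.5)
print(eta, missing)
```

The trade-off versus an explicit subclass: no type checking or IDE support for keys under `extra`, but also no extra Python code to maintain.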

# Declaring mutable fields
For historical reasons, some config fields are mutated in place in the
codebase, such as batch sizes for data/sequence parallelism. We are in
the process of deprecating this behavior. However, if you intentionally
need to mutate a field, declare it in the `_mutable_fields` attribute:
```python
@dataclass
class CheckpointConfig(BaseConfig):
    """What to save/load and async behavior."""
    _mutable_fields = BaseConfig._mutable_fields | {"save_contents"} # mark save_contents as mutable.

    save_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"])
    load_contents: list[str] = field(default_factory=lambda: ["model","optimizer","extra"])
    async_save: bool = False
```
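A self-contained sketch of the resulting behavior, re-stating a minimal `BaseConfig` so the snippet runs on its own (field names are illustrative):

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any, ClassVar


@dataclass
class BaseConfig:
    """Minimal freeze-after-init base, as described above."""
    _mutable_fields: ClassVar[set[str]] = {"extra"}
    extra: dict[str, Any] = field(default_factory=dict)

    def __setattr__(self, name: str, value):
        # First assignment (during __init__) passes; re-assignment is
        # rejected unless the field is declared mutable.
        if name in self.__dict__ and name not in self._mutable_fields:
            raise FrozenInstanceError(f"Field '{name}' is frozen")
        super().__setattr__(name, value)


@dataclass
class CheckpointConfig(BaseConfig):
    _mutable_fields: ClassVar[set[str]] = BaseConfig._mutable_fields | {"save_contents"}
    save_contents: list[str] = field(default_factory=lambda: ["model"])
    async_save: bool = False


cfg = CheckpointConfig()
cfg.save_contents = ["model", "optimizer"]  # allowed: declared mutable
try:
    cfg.async_save = True  # not declared mutable -> raises
except FrozenInstanceError as e:
    print(e)
```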

# Other helpful resources
The verl default trainer config combines the following config files,
specified in the `defaults` list:
https://github.com/volcengine/verl/blob/main/verl/trainer/config/ppo_trainer.yaml#L1-L36
- verl/trainer/config/ppo_trainer.yaml  # main config for entrypoint 
- verl/trainer/config/actor/dp_actor.yaml 
- verl/trainer/config/critic/dp_critic.yaml 
- verl/trainer/config/reward_model/dp_reward_model.yaml 
- verl/trainer/config/rollout/rollout.yaml 

For a quick look at the complete default config in a single file, see
the auto-generated config at
https://github.com/volcengine/verl/blob/main/verl/trainer/config/_generated_ppo_trainer.yaml

# Change log and impact on existing code
This PR converts the following fields to structured dataclasses in the
training pipeline. More can be done in future PRs (contributions from
the community are welcome):
- [x] actor_rollout_ref.actor
- [x] critic 
- [ ] actor_rollout_ref.rollout
- [ ] actor_rollout_ref.ref
- [ ] reward_model
- [ ] data
- [ ] trainer

Changes needed for existing code that added new fields to the config:
- see recipe/sppo for an example
- `OmegaConf.to_container(self.config.model.get("override_config",
OmegaConf.create()))` now has to be changed manually to
`self.config.model.get("override_config", {})`, because
`OmegaConf.to_container` expects a `DictConfig` while
`config.model.override_config` is already a plain dict.
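Illustrated with a plain-dict stand-in for the structured model config (the `model_cfg` variable and the override key are assumptions for the sketch):

```python
# Stand-in: after the migration, override_config arrives as a plain dict.
model_cfg = {"override_config": {"attention_dropout": 0.0}}

# Before (DictConfig), an OmegaConf round-trip was needed:
#   override = OmegaConf.to_container(cfg.model.get("override_config", OmegaConf.create()))
# After (structured dataclass), the value is already a dict:
override = model_cfg.get("override_config", {})
print(override)
```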

# Other Breaking Changes
`critic.optim.lr` for Megatron changed from `1e-6` to `1e-5`.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Joel <wuxibin@bytedance.com>
Co-authored-by: Cheetah <1659275352@qq.com>
Co-authored-by: 杨睿 <yangruipis@163.com>
Co-authored-by: X. HU <huxiaobo@zju.edu.cn>
Co-authored-by: Le Xue <48175490+ShareLer@users.noreply.github.com>
Co-authored-by: Ziheng Jiang <ziheng@apache.org>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-23 11:45:14 -07:00


```yaml
# Required when using verl.utils.omega_conf_to_dataclass to instantiate dataclass configs
_target_: verl.workers.config.CriticConfig

# Number of rollouts per update (mirrors actor rollout_n)
rollout_n: ${oc.select:actor_rollout_ref.rollout.n,1}

# fsdp or fsdp2 strategy used for critic model training
strategy: ???

# whether to enable the critic worker.
# by default it is only enabled if advantage estimator is gae
# set it to True manually if you always want to enable critic worker
enable: null

# optimizer configs
optim:
  # Learning rate
  lr: 1e-5
  # Warmup steps ratio; total steps will be injected at runtime
  lr_warmup_steps_ratio: 0.0
  # Total training steps (must be overridden at runtime)
  total_training_steps: -1
  # Weight decay
  weight_decay: 0.01
  # Prioritized. None, 0 or negative values mean delegating to lr_warmup_steps_ratio.
  lr_warmup_steps: -1

# model config for the critic
model:
  # Path to pretrained model weights
  path: ~/models/deepseek-llm-7b-chat
  # Tokenizer path (defaults to actor's model path)
  tokenizer_path: ${oc.select:actor_rollout_ref.model.path,"~/models/deepseek-llm-7b-chat"}
  # Hugging Face config override
  override_config: {}
  # External model implementation (optional)
  external_lib: ${oc.select:actor_rollout_ref.model.external_lib,null}
  # Whether to trust remote code from Hugging Face models
  trust_remote_code: ${oc.select:actor_rollout_ref.model.trust_remote_code,false}

# PPO mini-batch size per update
ppo_mini_batch_size: ${oc.select:actor_rollout_ref.actor.ppo_mini_batch_size,256}

# [Deprecated] Global micro batch size
ppo_micro_batch_size: null

# Local per-GPU micro batch size
ppo_micro_batch_size_per_gpu: ${oc.select:.ppo_micro_batch_size,null}

# Whether to automatically adjust batch size at runtime
use_dynamic_bsz: ${oc.select:actor_rollout_ref.actor.use_dynamic_bsz,false}

# Max tokens per GPU in one PPO batch (doubled for critic)
ppo_max_token_len_per_gpu: 32768

# Max token length per GPU in forward pass
forward_max_token_len_per_gpu: ${.ppo_max_token_len_per_gpu}

# Number of PPO epochs per batch
ppo_epochs: ${oc.select:actor_rollout_ref.actor.ppo_epochs,1}

# Shuffle training data across PPO epochs
shuffle: ${oc.select:actor_rollout_ref.actor.shuffle,false}

# PPO value function clipping range
cliprange_value: 0.5

# Loss aggregation mode: "token-mean", "seq-mean-token-sum", or "seq-mean-token-mean"
loss_agg_mode: ${oc.select:actor_rollout_ref.actor.loss_agg_mode,token-mean}

# checkpoint configs
checkpoint:
  # Target dataclass for this configuration
  _target_: verl.trainer.config.CheckpointConfig
  # What to include in saved checkpoints
  # with 'hf_model' you can save whole model as hf format, now only use sharded model checkpoint to save space
  save_contents: ['model', 'optimizer', 'extra']
  # What to include when loading checkpoints
  load_contents: ${.save_contents}
  # Whether to save checkpoints asynchronously. Only effective for Megatron as of now.
  async_save: False

# profiler configs
# the corresponding dataclass is verl.utils.profiler.ProfilerConfig.
profiler:
  # Required when using verl.utils.omega_conf_to_dataclass to instantiate dataclass configs
  _target_: verl.utils.profiler.ProfilerConfig
  # True for each task has its own database, False for all tasks in one training step share one database.
  discrete: False
  # Whether to profile all ranks.
  all_ranks: False
  # The ranks that will be profiled. [] or [0,1,...]
  ranks: []
```