### What does this PR do?

This PR introduces a `BaseConfig` class that bridges dataclasses and Hydra's `DictConfig` in the codebase. In this PR, the algorithm-related and profiler-related configs are instantiated as dataclasses upfront for both `main_ppo` and `main_dapo`. The config-related changes are expected to be backward compatible (supporting the `xx_config.get()` API).

Besides, this PR also moves the profiler-related files under `verl.utils.debug` to `verl.utils.profiler.xx`. `verl.utils.debug.performance.py` is kept for backward compatibility and will be dropped in a later version.

Main principles:

- Users are not forced to use dataclass configs. All changes are backward compatible.
- Dataclass configs are converted upfront on a per-entrypoint basis. Here we target `main_ppo.py` and `main_dapo.py`; the other recipes' entrypoints are left intact.
- The new dataclasses are intentionally frozen. Configs should not be mutable: whenever a new field is needed, we make a copy of the config as a new one.
- Whenever a dataclass config is introduced, we encourage adding simple CPU-based unit tests for the basic functionality of the functions that rely on it (e.g. the GRPO advantage estimation in `core_algorithm.py`), and updating the type annotations of the impacted functions.
- In the YAML file, the `_target_` field should be specified for dataclass conversion, e.g. `_target_: verl.xxx.XXConfig`.

The PR is built on top of @liuzhenhai93's contribution.

### Checklist Before Describing the Details

- [x] Searched for similar PR(s).
- [x] PR title is in the format of `[modules] type: Title`:
  - modules: `trainer, cfg`
  - type: `feat`

### Test

- Added comprehensive unit tests in `tests/trainer/config/test_algorithm_config_on_cpu.py` and `test_base_config_on_cpu.py`.
- Tests cover dataclass creation, nested configuration handling, backward compatibility, and integration with core algorithms.
- All tests pass, validating the functionality and the integration with existing code.

### High-Level Design

The design introduces three dataclasses:

1. **`KLControlConfig`**: handles KL control parameters (`type`, `kl_coef`, `horizon`, `target_kl`)
2. **`PFPPOConfig`**: manages preference-feedback PPO parameters (`reweight_method`, `weight_pow`)
3. **`AlgorithmConfig`**: the main algorithm configuration, containing all fields from the YAML config

The conversion uses the existing `verl.utils.omega_conf_to_dataclass` utility to seamlessly convert an OmegaConf `DictConfig` into typed dataclasses.
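As an illustration of the bridging idea, here is a minimal sketch of a frozen dataclass config that keeps the `DictConfig`-style `get()`/`[]` access described above. The class names, default values, and helper bodies are assumptions for this example, not the actual verl implementation:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BaseConfigSketch:
    """Hypothetical minimal BaseConfig: frozen, but DictConfig-compatible."""

    def get(self, key, default=None):
        # Mirror DictConfig.get(): field value if present, else the default.
        return getattr(self, key, default)

    def __getitem__(self, key):
        # Mirror DictConfig["key"]; missing keys raise AttributeError,
        # matching the test shown in the usage example below.
        return getattr(self, key)


@dataclass(frozen=True)
class KLControlConfigSketch(BaseConfigSketch):
    # Field names follow the PR description; the defaults are made up.
    type: str = "fixed"
    kl_coef: float = 0.001
    horizon: int = 10000
    target_kl: float = 0.1


cfg = KLControlConfigSketch()
assert cfg.get("kl_coef", 0.001) == cfg["kl_coef"]
# Frozen means mutate-by-copy: derive a new config instead of editing one.
new_cfg = replace(cfg, kl_coef=0.01)
```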
### API and Usage Example

The API maintains backward compatibility while providing type-safe access:

```python
# Before (DictConfig)
if config.algorithm.use_kl_in_reward:
    kl_penalty = config.algorithm.kl_penalty
    kl_coef = config.algorithm.kl_ctrl.get("kl_coef", 0.001)

# After (dataclass) - type-safe with IDE support
algorithm_config = omega_conf_to_dataclass(config.algorithm)
if algorithm_config.use_kl_in_reward:
    kl_penalty = algorithm_config.kl_penalty  # Type-safe access
    kl_coef = algorithm_config.kl_ctrl.kl_coef  # Nested config access

# Backward compatibility maintained
gamma = algorithm_config.get("gamma", 1.0)  # Still works

# Other cases
profiler_config = omega_conf_to_dataclass(config)
self.assertEqual(profiler_config.discrete, config.discrete)
self.assertEqual(profiler_config.all_ranks, config.all_ranks)
self.assertEqual(profiler_config.ranks, config.ranks)
assert isinstance(profiler_config, ProfilerConfig)
with self.assertRaises(AttributeError):
    _ = profiler_config.non_existing_key
assert config.get("non_existing_key") == profiler_config.get("non_existing_key")
assert config.get("non_existing_key", 1) == profiler_config.get("non_existing_key", 1)
assert config["discrete"] == profiler_config["discrete"]

from dataclasses import FrozenInstanceError

with self.assertRaises(FrozenInstanceError):
    profiler_config.discrete = False
```

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit run --show-diff-on-failure --color=always --all-files`
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that cover the code path.

**Note**: This change is fully backward compatible and does not break any existing APIs. The dataclasses provide the same interface as the original DictConfig while adding type safety and better structure.

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The accompanying DAPO config (29 lines, YAML) shows the `_target_` convention in context:
```yaml
hydra:
  searchpath:
    - file://verl/trainer/config

defaults:
  - ppo_trainer
  - _self_

data:
  gen_batch_size: ${data.train_batch_size}

reward_model:
  reward_manager: dapo
  overlong_buffer:
    enable: False # We try to avoid forgetting to set enable
    len: 0
    penalty_factor: 0.0
    log: False

algorithm:
  filter_groups:
    _target_: verl.trainer.config.FilterGroupsConfig
    enable: False # We try to avoid forgetting to set enable
    metric: null # acc / score / seq_reward / seq_final_reward / ...
    max_num_gen_batches: 0 # Non-positive values mean no upper limit

trainer:
  project_name: verl-dapo
```
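To tie the YAML to the conversion flow, here is a hedged sketch of how the `_target_` field above selects the dataclass. The filename is hypothetical, and the assertions assume the converter simply maps the remaining keys onto `FilterGroupsConfig` fields:

```python
from omegaconf import OmegaConf

from verl.utils import omega_conf_to_dataclass

# Hypothetical filename for the YAML shown above.
cfg = OmegaConf.load("dapo_trainer.yaml")

# `_target_: verl.trainer.config.FilterGroupsConfig` tells the converter
# which dataclass to instantiate for this sub-config.
filter_groups = omega_conf_to_dataclass(cfg.algorithm.filter_groups)

assert filter_groups.enable is False
assert filter_groups.max_num_gen_batches == 0
```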