[doc] fix: Fix mismatched config description for ppo_epochs in critic (#2102)

### Checklist Before Starting - [ ] Searched for similar PR(s). - [ ] Checked PR Title format - In format of: [modules] type: Title - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data` - type is in `feat, fix, refactor, chore, test` - can involve multiple modules, seperated by `,` or space, like `[megatron, fsdp, doc] feat: xxx` ### What does this PR do? > Fix mismatched config description for `ppo_epochs` in critic ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ![image](https://github.com/user-attachments/assets/72df0d9a-3ac8-418c-b1c0-aa6e6daaccfd) > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
2025-10-20 13:43:50 +08:00 · 2025-06-19 18:19:31 +08:00
parent 42f612dc15
commit ccefcf05ca
2 changed files with 2 additions and 2 deletions
--- a/docs/algo/ppo.md
+++ b/docs/algo/ppo.md
@ -37,7 +37,7 @@ Most critic configs are similar to those of actors. Note that the critic model i

 - `actor_rollout_ref.actor.ppo_epochs`: Number of epochs for PPO updates on one set of sampled trajectories for actor

- `actor_rollout_ref.actor.ppo_epochs`: Number of epochs for PPO updates on one set of sampled trajectories for critic
+- `critic.ppo_epochs`: Number of epochs for PPO updates on one set of sampled trajectories for critic. Defaults to `actor_rollout_ref.actor.ppo_epochs`

 - `algorithm.gemma`: discount factor

--- a/examples/ppo_trainer/README.md
+++ b/examples/ppo_trainer/README.md
@ -37,7 +37,7 @@ Most critic configs are similar to those of actors. Note that the critic model i

 - `actor_rollout_ref.actor.ppo_epochs`: Number of epochs for PPO updates on one set of sampled trajectories for actor

- `actor_rollout_ref.actor.ppo_epochs`: Number of epochs for PPO updates on one set of sampled trajectories for critic
+- `critic.ppo_epochs`: Number of epochs for PPO updates on one set of sampled trajectories for critic. Defaults to `actor_rollout_ref.actor.ppo_epochs`

 - `algorithm.gemma`: discount factor