[doc] fix: Fix mismatched config description for ppo_epochs in critic (#2102)

### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
  - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
  - can involve multiple modules, separated by `,` or space, like `[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> Fix mismatched config description for `ppo_epochs` in critic

### Test

> For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.


![image](https://github.com/user-attachments/assets/72df0d9a-3ac8-418c-b1c0-aa6e6daaccfd)

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that cover the code path.
Authored by xichengpro on 2025-06-19 18:19:31 +08:00, committed by GitHub.
Commit ccefcf05ca (parent 42f612dc15): 2 changed files with 2 additions and 2 deletions.


@@ -37,7 +37,7 @@ Most critic configs are similar to those of actors. Note that the critic model i
 - `actor_rollout_ref.actor.ppo_epochs`: Number of epochs for PPO updates on one set of sampled trajectories for actor
-- `actor_rollout_ref.actor.ppo_epochs`: Number of epochs for PPO updates on one set of sampled trajectories for critic
+- `critic.ppo_epochs`: Number of epochs for PPO updates on one set of sampled trajectories for critic. Defaults to `actor_rollout_ref.actor.ppo_epochs`
 - `algorithm.gemma`: discount factor


@@ -37,7 +37,7 @@ Most critic configs are similar to those of actors. Note that the critic model i
 - `actor_rollout_ref.actor.ppo_epochs`: Number of epochs for PPO updates on one set of sampled trajectories for actor
-- `actor_rollout_ref.actor.ppo_epochs`: Number of epochs for PPO updates on one set of sampled trajectories for critic
+- `critic.ppo_epochs`: Number of epochs for PPO updates on one set of sampled trajectories for critic. Defaults to `actor_rollout_ref.actor.ppo_epochs`
 - `algorithm.gemma`: discount factor
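The corrected description says the critic's epoch count falls back to the actor's when `critic.ppo_epochs` is not set. Below is a minimal sketch of that fallback rule in plain Python, assuming the config is a nested dict. `resolve_critic_ppo_epochs` is a hypothetical helper written only for illustration, not a verl API. verl itself may wire this default differently in its config files (for example, through Hydra/OmegaConf interpolation).

```python
from typing import Any, Mapping, Optional


def resolve_critic_ppo_epochs(config: Mapping[str, Any]) -> int:
    """Return how many PPO epochs the critic runs on one set of sampled trajectories.

    Hypothetical helper: mirrors the documented rule that `critic.ppo_epochs`
    defaults to `actor_rollout_ref.actor.ppo_epochs` when it is not set.
    """
    actor_epochs = config["actor_rollout_ref"]["actor"]["ppo_epochs"]
    critic_epochs: Optional[int] = config.get("critic", {}).get("ppo_epochs")
    # Fall back to the actor's value only when the critic leaves it unset (None).
    return actor_epochs if critic_epochs is None else critic_epochs


# The critic inherits the actor's setting.
shared = {"actor_rollout_ref": {"actor": {"ppo_epochs": 1}}, "critic": {}}
# The critic overrides it independently of the actor.
override = {"actor_rollout_ref": {"actor": {"ppo_epochs": 1}}, "critic": {"ppo_epochs": 4}}

print(resolve_critic_ppo_epochs(shared))    # 1
print(resolve_critic_ppo_epochs(override))  # 4
```

Keeping the two keys separate lets the critic run more (or fewer) update epochs than the actor on the same batch. In the common case, however, a single actor-side setting controls both.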