mirror of
https://github.com/volcengine/verl.git
synced 2025-10-20 13:43:50 +08:00
[rollout, sglang] feat: Add sync mode for bash (#3186)
### What does this PR do?

- Use `sync` mode for `dapo`, `gsm8k` and `geo`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
@@ -40,6 +40,7 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.rollout.n=5 \
     actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=20 \
     actor_rollout_ref.ref.fsdp_config.param_offload=True \
+    actor_rollout_ref.rollout.mode=sync \
     algorithm.use_kl_in_reward=False \
     trainer.critic_warmup=0 \
     trainer.logger='["console","wandb"]' \
@@ -48,7 +48,8 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.rollout.n=16 \
     actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=32 \
     actor_rollout_ref.ref.fsdp_config.param_offload=True \
-    actor_rollout_ref.rollout.over_sample_rate=0 \
+    actor_rollout_ref.rollout.over_sample_rate=0.1 \
+    actor_rollout_ref.rollout.mode=sync \
     algorithm.use_kl_in_reward=False \
     trainer.critic_warmup=0 \
     trainer.logger='["console","wandb"]' \
@@ -35,6 +35,8 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.rollout.name=sglang \
     actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
     actor_rollout_ref.rollout.n=16 \
+    actor_rollout_ref.rollout.over_sample_rate=0.1 \
+    actor_rollout_ref.rollout.mode=sync \
     actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=32 \
     actor_rollout_ref.ref.fsdp_config.param_offload=True \
     algorithm.use_kl_in_reward=False \
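
As a rough illustration of what the hunks above add, the rollout overrides can be collected and appended to the `verl.trainer.main_ppo` command line as dotted `key=value` arguments. This is a minimal sketch and not part of the PR: the `ROLLOUT_MODE` variable and the use of positional parameters to hold the overrides are assumptions made for illustration, and the command is only printed, not executed, so it runs without verl installed.

```shell
#!/bin/sh
set -eu

# Hypothetical variable: the PR hard-codes `sync` in each script.
ROLLOUT_MODE="${ROLLOUT_MODE:-sync}"

# Override values copied from the third hunk of the diff.
# Holding them in the positional parameters is an illustrative choice.
set -- \
    actor_rollout_ref.rollout.name=sglang \
    actor_rollout_ref.rollout.n=16 \
    actor_rollout_ref.rollout.over_sample_rate=0.1 \
    "actor_rollout_ref.rollout.mode=${ROLLOUT_MODE}"

# Assemble the command as a string and print it instead of running it.
cmd="python3 -m verl.trainer.main_ppo $*"
printf '%s\n' "$cmd"
```

Leaving `ROLLOUT_MODE` unset reproduces the value the PR adds (`sync`); setting it to another value only changes the printed string, and whether the trainer accepts that value depends on the verl version in use.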