6 Commits

Author SHA1 Message Date
a31a8f251f [doc] fix: quickstart example can't work on zsh (#2509)
### What does this PR do?

I followed the instructions at
https://verl.readthedocs.io/en/latest/start/quickstart.html to run the
PPO example on my devbox, which uses zsh. However, I got the error
`zsh: no matches found: trainer.logger=[console]` because zsh
interprets `[]` as a glob pattern.

```
(verl) ➜  verl git:(20250713-devbox-2-tmux0-verl-2) ✗ PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
 data.train_files=$HOME/data/gsm8k/train.parquet \
 data.val_files=$HOME/data/gsm8k/test.parquet \
 data.train_batch_size=256 \
 data.max_prompt_length=512 \
 data.max_response_length=256 \
 actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 actor_rollout_ref.actor.optim.lr=1e-6 \
 actor_rollout_ref.actor.ppo_mini_batch_size=64 \
 actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
 actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
 actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
 actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
 actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
 critic.optim.lr=1e-5 \
 critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
 critic.ppo_micro_batch_size_per_gpu=4 \
 algorithm.kl_ctrl.kl_coef=0.001 \
 trainer.logger=['console'] \
 trainer.val_before_train=False \
 trainer.n_gpus_per_node=1 \
 trainer.nnodes=1 \
 trainer.save_freq=10 \
 trainer.test_freq=10 \
 trainer.total_epochs=15 2>&1 | tee verl_demo.log
zsh: no matches found: trainer.logger=[console]
```

This PR has 3 changes:
* `trainer.logger=['console']` -> `trainer.logger=console`
* `trainer.logger=['console','wandb']` -> `trainer.logger='["console","wandb"]'`
* `trainer.logger=['console','tensorboard']` -> `trainer.logger='["console","tensorboard"]'`
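As a rough illustration of why the quoted form works: `shlex` only models POSIX tokenization (it does no globbing, so it can't reproduce zsh's error), but it shows that the single-quoted form reaches the program as one literal argument with the brackets intact:

```python
import shlex

# The outer single quotes suppress shell interpretation of [ ] and " ",
# so the program receives the bracketed list verbatim as one argument.
arg = shlex.split("""trainer.logger='["console","wandb"]'""")
print(arg)  # ['trainer.logger=["console","wandb"]']
```

The unquoted `trainer.logger=console` form needs no protection because it contains no characters special to the shell.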

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

* `trainer.logger=console` (zsh)
<img width="898" height="564" alt="image"
src="https://github.com/user-attachments/assets/a957a493-75e6-462b-9974-6b1c4cdf5a80"
/>

* ``trainer.logger='["console","wandb"]'`` (zsh)
<img width="870" height="565" alt="image"
src="https://github.com/user-attachments/assets/e20613bf-2ccc-4653-b23f-90edc3d568d1"
/>

* `trainer.logger=console` (bash)
  ```bash
ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m
verl.trainer.main_ppo \
  >  data.train_files=$HOME/data/gsm8k/train.parquet \
  >  data.val_files=$HOME/data/gsm8k/test.parquet \
  >  data.train_batch_size=256 \
  >  data.max_prompt_length=512 \
  >  data.max_response_length=256 \
  >  actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  actor_rollout_ref.actor.optim.lr=1e-6 \
  >  actor_rollout_ref.actor.ppo_mini_batch_size=64 \
  >  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
  >  actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
  >  actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
  >  actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
  >  actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
  >  critic.optim.lr=1e-5 \
  >  critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  critic.ppo_micro_batch_size_per_gpu=4 \
  >  algorithm.kl_ctrl.kl_coef=0.001 \
  >  trainer.logger=console \
  >  trainer.val_before_train=False \
  >  trainer.n_gpus_per_node=1 \
  >  trainer.nnodes=1 \
  >  trainer.save_freq=10 \
  >  trainer.test_freq=10 \
  >  trainer.total_epochs=15 2>&1 | tee verl_demo.log
2025-07-14 02:52:27,669 INFO worker.py:1908 -- Started a local Ray
instance. View the dashboard at 127.0.0.1:8265
(TaskRunner pid=1799248) TaskRunner hostname: ip-172-31-9-244, PID:
1799248
(TaskRunner pid=1799248) {'actor_rollout_ref': {'actor': {'checkpoint':
{'load_contents': ['model',
(TaskRunner pid=1799248) 'optimizer',
(TaskRunner pid=1799248) 'extra'],
(TaskRunner pid=1799248) 'save_contents': ['model',
(TaskRunner pid=1799248) 'optimizer',
(TaskRunner pid=1799248) 'extra']},
  ```

* `trainer.logger='["console","wandb"]'` (bash)
  ```bash
ubuntu@ip-xxx-xx-x-xxx:~/verl$ PYTHONUNBUFFERED=1 python3 -m
verl.trainer.main_ppo \
  >  data.train_files=$HOME/data/gsm8k/train.parquet \
  >  data.val_files=$HOME/data/gsm8k/test.parquet \
  >  data.train_batch_size=256 \
  >  data.max_prompt_length=512 \
  >  data.max_response_length=256 \
  >  actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  actor_rollout_ref.actor.optim.lr=1e-6 \
  >  actor_rollout_ref.actor.ppo_mini_batch_size=64 \
  >  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
  >  actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
  >  actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
  >  actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
  >  actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
  >  critic.optim.lr=1e-5 \
  >  critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  >  critic.ppo_micro_batch_size_per_gpu=4 \
  >  algorithm.kl_ctrl.kl_coef=0.001 \
  >  trainer.logger='["console","wandb"]' \
  >  trainer.val_before_train=False \
  >  trainer.n_gpus_per_node=1 \
  >  trainer.nnodes=1 \
  >  trainer.save_freq=10 \
  >  trainer.test_freq=10 \
  >  trainer.total_epochs=15 2>&1 | tee verl_demo.log
2025-07-14 02:54:13,989 INFO worker.py:1908 -- Started a local Ray
instance. View the dashboard at 127.0.0.1:8265
(TaskRunner pid=1805000) TaskRunner hostname: ip-172-31-9-244, PID:
1805000
(TaskRunner pid=1805000) {'actor_rollout_ref': {'actor': {'checkpoint':
{'load_contents': ['model',
(TaskRunner pid=1805000) 'optimizer',
(TaskRunner pid=1805000) 'extra'],
(TaskRunner pid=1805000) 'save_contents': ['model',
(TaskRunner pid=1805000) 'optimizer',
(TaskRunner pid=1805000) 'extra']},
  ```

### API and Usage Example

None.

### Design & Code Changes

None.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
2025-07-14 13:26:32 +08:00
cbeb3f4dae [rollout] fix: fix hf rollout and add single gpu test (#2371) 2025-07-05 18:51:26 +08:00
c5bc81b692 [sglang] Feat: Search Tool Invocation in Multi-Turn RL Training (#1682) 2025-05-31 12:51:19 +08:00
5ba1dbc606 [ci] feat: improve CI speed to 1-2min per test (#1032)
### Summary

#### Minimize Test Workloads

This PR minimizes the test workloads while keeping them meaningful,
reducing the time cost of a test from >10 min to 1~2 min. Specifically,
we

1. set batch sizes and steps as small but still meaningful numbers:

```bash
train_traj_micro_bsz_per_gpu=2 # b
n_resp_per_prompt=4 # g

train_traj_micro_bsz=$((train_traj_micro_bsz_per_gpu * NUM_GPUS)) # b * n
train_traj_mini_bsz=$((train_traj_micro_bsz * 2)) # 2 * b * n
train_prompt_mini_bsz=$((train_traj_mini_bsz / n_resp_per_prompt)) # 2 * b * n / g
train_prompt_bsz=$((train_prompt_mini_bsz * 2)) # 4 * b * n / g
# ...
TOT_TRAIN_STEPS=${TOT_TRAIN_STEPS:-1}
```
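Plugging in an assumed `NUM_GPUS=8` and following the `/ g` arithmetic in the comments, the sizes work out as:

```python
NUM_GPUS = 8                      # assumed value for an 8-GPU runner
train_traj_micro_bsz_per_gpu = 2  # b
n_resp_per_prompt = 4             # g

train_traj_micro_bsz = train_traj_micro_bsz_per_gpu * NUM_GPUS    # b * n = 16
train_traj_mini_bsz = train_traj_micro_bsz * 2                    # 2 * b * n = 32
train_prompt_mini_bsz = train_traj_mini_bsz // n_resp_per_prompt  # 2 * b * n / g = 8
train_prompt_bsz = train_prompt_mini_bsz * 2                      # 4 * b * n / g = 16
print(train_traj_micro_bsz, train_traj_mini_bsz,
      train_prompt_mini_bsz, train_prompt_bsz)  # 16 32 8 16
```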

2. disable validation (this costs a lot!) / saving / resuming for
training tests by default and leave them to specialized tests

```bash
# Validation
VAL_BEFORE_TRAIN=${VAL_BEFORE_TRAIN:-False}
TEST_FREQ=${TEST_FREQ:--1}
# Save & Resume
RESUME_MODE=${RESUME_MODE:-disable}
SAVE_FREQ=${SAVE_FREQ:--1}
```

#### Improve Triggering Mode

This PR introduces a more comprehensive triggering logic. Specifically,
we

1. consider all Python code by default
2. include related entrypoints (the workflow config, scripts used by it
and hydra config, etc.)
3. exclude unrelated Python code from other components (e.g., recipes,
examples, Megatron, SFT, generation, evaluation, etc. for FSDP training)

An example from `e2e_ppo_trainer`:

```yaml
on:
    paths:
      - "**/*.py"
      # Entrypoints
      - ".github/workflows/e2e_ppo_trainer.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "examples/data_preprocess/geo3k.py"
      - "tests/e2e/ppo_trainer"
      - "verl/trainer/main_ppo.py"
      - "verl/trainer/config/ppo_trainer.yaml"
      - "!examples"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Recipes
      - "!recipe"
      # Megatron
      - "!verl/workers/**/megatron_*.py"
```

#### Avoid missing errors

Some test scripts didn't end with the main Python command, so a failure
in that command could go unnoticed.

To address this issue, this PR introduces the following options:

```bash
set -xeuo pipefail
```

These options mean:

- `x`: Print each command before executing it (useful for debugging)
- `e`: Exit immediately if any command fails (returns non-zero exit
status)
- `u`: Treat unset variables as an error
- `o pipefail`: Make a pipeline return the exit status of the rightmost
command that failed, or zero if all succeeded

Together, these options make the script fail fast and provide verbose
output, which helps with debugging and ensuring the script doesn't
continue after encountering errors.
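The effect of `pipefail` can be checked in isolation; a quick sketch, assuming `bash` is on the PATH:

```python
import subprocess

# With pipefail, the failing 'false' on the left makes the pipeline fail.
with_flag = subprocess.run(
    ["bash", "-c", "set -o pipefail; false | true"]).returncode
# Without it, only the last command ('true') determines the exit status,
# so the failure of 'false' is silently swallowed.
without_flag = subprocess.run(
    ["bash", "-c", "false | true"]).returncode
print(with_flag, without_flag)  # 1 0
```

This is exactly the failure mode the CI scripts hit: a `| tee` at the end of a training command would mask a non-zero exit status unless `pipefail` is set.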

#### Others

Besides, we also

1. unify runner labels into `"L20x8"` to enable preemptive scheduling of
jobs
2. consolidate test scripts that differ only minimally, grouped by
entrypoint (e.g. `ppo_trainer`, `ppo_megatron_trainer`, recipes, etc.),
into a base script with options
2025-04-14 09:48:10 -07:00
f2a76acd94 [BREAKING][misc] feat: change micro_batch_size to micro_batch_size_per_gpu (#136)
## Summary

This PR changes all the micro_batch_size to micro_batch_size_per_gpu.

**The core logic of setting batch sizes:**
- **All algorithmic metrics** (train batch size, ppo mini batch size):
are global (from the perspective of single-controller), which will be
normalized in each Worker.
- **All performance-related parameters** (micro batch size, max token
length in dynamic batch size) are local parameters, which represent the
data sizes per GPU (i.e., each Worker).
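As a hedged sketch of that split (hypothetical function and key names, not verl's actual code), the per-worker normalization of global batch sizes looks roughly like:

```python
def normalize_batch_sizes(config: dict, world_size: int) -> dict:
    """Divide global (algorithmic) batch sizes by world size inside each
    worker; per-GPU (performance) sizes are used as-is."""
    cfg = dict(config)
    for key in ("train_batch_size", "ppo_mini_batch_size"):
        assert cfg[key] % world_size == 0, f"{key} must divide by world_size"
        cfg[key] //= world_size
    # keys ending in _per_gpu are already local; no normalization needed
    return cfg

local = normalize_batch_sizes(
    {"train_batch_size": 256, "ppo_mini_batch_size": 64,
     "ppo_micro_batch_size_per_gpu": 4},
    world_size=8,
)
print(local)  # train 32, mini 8 per worker; micro still 4
```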

## Main Changes

1. Change the scripts and config and delete the normalization for
micro_bsz
2. Fix CI for SFT
2025-01-27 11:26:52 +08:00
6d96fda3d4 [SFT] feat: Add LoRA support for SFT (#127)
This PR adds support for LoRA (Low-Rank Adaptation) for efficient model
fine-tuning.

### Changes

1. Added LoRA configuration support in trainer config
2. Modified FSDP wrapping policy to handle LoRA modules
3. Integrated with existing FSDP training infrastructure
4. Added peft dependency
5. Removed unused ring_attn_utils.py

### Features

- Configurable LoRA rank and alpha parameters
- Target module specification for selective adaptation
- Compatible with FSDP sharding strategy
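For background, the low-rank update itself is simple; a toy pure-Python sketch of a LoRA forward pass (illustrative only, not verl's peft-based implementation):

```python
def matvec(M, x):
    # plain dense matrix-vector product
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    # y = W x + (alpha / r) * B (A x); W stays frozen, only A and B train
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # rank-r bottleneck
    return [b + (alpha / r) * u for b, u in zip(base, update)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight
A = [[0.5, 0.5]]              # r=1 down-projection (1x2)
B = [[0.0], [0.0]]            # up-projection, zero-initialized (2x1)
x = [2.0, 4.0]
# With B zero-initialized (the standard LoRA init), the adapter is a
# no-op, so training starts from the frozen model's behavior:
print(lora_forward(W, A, B, x, alpha=1, r=1))  # [2.0, 4.0]
```

`alpha` and `r` here correspond to the configurable rank and alpha parameters mentioned above.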

### Testing

Tested with Qwen2.5-0.5B-Instruct model on GSM8K dataset using the
provided example script.

### Dependencies

- Added `peft` package to requirements.txt

This PR is based on commit 902ddbe6 and has been merged with the latest
upstream main branch.

---------

Co-authored-by: Jiayi Pan <i@jiayipan.me>
Co-authored-by: openhands <openhands@all-hands.dev>
2025-01-25 14:53:47 +08:00