### What does this PR do?
Refactor the profiler CI into a unified scheme.
TODO:
- nsys use `save_path`
- nsys discrete tests are disabled
- torch profiler
cc: @davidmlw
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
Global profiler config:
```yaml
global_profiler:
  _target_: verl.utils.profiler.ProfilerConfig
  tool: null
  steps: null
  profile_continuous_steps: false
  save_path: outputs/profile
  tool_config:
    nsys:
      _target_: verl.utils.profiler.config.NsightToolConfig
      discrete: false
    npu:
      _target_: verl.utils.profiler.config.NPUToolConfig
      discrete: false
      contents: []
      level: level1
      analysis: true
    torch:
      _target_: verl.utils.profiler.config.TorchProfilerToolConfig
      step_start: 0
      step_end: null
```
Local profiler config:
```yaml
profiler:
  # Required when using verl.utils.omega_conf_to_dataclass to instantiate dataclass configs
  _target_: verl.utils.profiler.ProfilerConfig
  # Profiler tool; defaults to profiler.tool in the global config.
  # Choices: nsys, npu, torch
  tool: ${oc.select:global_profiler.tool,null}
  # Whether to enable profiling on the critic
  enable: False
  # Whether to profile all ranks
  all_ranks: False
  # The ranks that will be profiled: [] or [0,1,...]
  ranks: []
  # Path where profiling results are saved
  save_path: ${oc.select:global_profiler.save_path,null}
  # Tool-specific config
  tool_config: ${oc.select:global_profiler.tool_config,null}
```
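For reference, a minimal sketch of how such a section could be turned into a dataclass via `verl.utils.omega_conf_to_dataclass` (named in the config comment above); the file name and surrounding wiring are illustrative, not verl's actual entry point:
```python
# Illustrative only: load a YAML config and instantiate the profiler dataclass.
from omegaconf import OmegaConf

from verl.utils import omega_conf_to_dataclass  # import path assumed from the comment above

cfg = OmegaConf.load("ppo_trainer.yaml")  # hypothetical config file
profiler_cfg = omega_conf_to_dataclass(cfg.profiler)

if profiler_cfg.enable:
    ranks = "all" if profiler_cfg.all_ranks else profiler_cfg.ranks
    print(f"Profiling ranks {ranks} with {profiler_cfg.tool}; saving to {profiler_cfg.save_path}")
```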
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
> Remove the redundant `get_custom_reward_fn` function.
### High-Level Design
> None.
### Specific Changes
> "from verl.trainer.ppo.reward import get_custom_reward_fn" instead of
'get_custom_reward_fn' function in verl/recipe/dapo/main_dapo.py
verl/recipe/r1/main_eval.py verl/recipe/spin/main_spin.py
verl/verl/trainer/main_eval.py verl/verl/trainer/main_eval.py
> remove 'get_custom_reward_fn' function in
verl/verl/trainer/main_ppo.py
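A minimal sketch of the resulting call sites (the `config` argument is assumed from the trainer's usage):
```python
# Call sites now import the shared helper instead of redefining it locally.
from verl.trainer.ppo.reward import get_custom_reward_fn

reward_fn = get_custom_reward_fn(config)  # `config` is the trainer's OmegaConf config (assumed)
```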
### Additional Info.
- **Issue Number**: Fixes issue [#1716](https://github.com/volcengine/verl/issues/1716).
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
Because CI is slow, the checkpoint-related features and fixes are combined into this single PR.
# Add layer idx to decoder layers
It is hard to attach a "correct" global layer number to each layer: in verl's current Megatron implementation, each pp and vpp rank's layers start from index 0, which is inconvenient for the merging tool.
The difficulty mainly comes from the `torch.nn.ModuleList` implementation, which [suggests, and in fact forces, using the index directly rather than a custom layer number](8a40fca9a1/torch/nn/modules/container.py (L302C5-L324C66)).
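To illustrate the constraint: `nn.ModuleList` registers its children under positional keys, so checkpoint keys always carry the local index.
```python
import torch.nn as nn

layers = nn.ModuleList([nn.Linear(2, 2), nn.Linear(2, 2)])
# State-dict keys are positional: ['0.weight', '0.bias', '1.weight', '1.bias'].
# There is no supported way to rename them to a global layer number in place.
print(list(layers.state_dict().keys()))
```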
The current solution is to rewrite the layer number to the actual number, offset by the pp and vpp rank, when saving a Megatron checkpoint, and to recover it when loading. The merging tool then needs no extra scans.
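A minimal sketch of the renumbering idea, with a hypothetical helper name (the offset mirrors the interleaved pipeline layout; verl's actual checkpoint code may differ):
```python
def layer_offset(num_layers: int, pp_rank: int, pp_size: int,
                 vpp_rank: int, vpp_size: int) -> int:
    """Global index of the first layer owned by this (pp, vpp) rank."""
    per_pp = num_layers // pp_size
    if vpp_size <= 1:
        return pp_rank * per_pp
    per_chunk = per_pp // vpp_size         # layers in one virtual chunk
    chunk_stride = num_layers // vpp_size  # distance between consecutive vpp chunks
    return vpp_rank * chunk_stride + pp_rank * per_chunk

# Saving: local layer i is stored under global index layer_offset(...) + i.
# Loading: subtract the same offset to recover the ModuleList-local index.
```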
# Huggingface Model loader logic simplified
Since every rank has access to the state_dict, there is no need to broadcast the weights from rank 0 across the mp and dp groups at all. The previous implementation was too costly and could cause OOM issues, because each rank could occupy the whole model's space on the GPU.
The loader logic was also not straightforward: each rank only needs to load its own vpp_size chunks of layers, so there is no reason to iterate over the whole num_layers.
The current solution is that every rank loads its own sharded weights from `state_dict`.
However, this requires that storage nodes be reachable from every compute node. For users who can only store the Hugging Face model on rank 0, we keep the original implementation in a deprecated module beside the new version of the file.
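A rough sketch of the per-rank loading idea, with hypothetical names; the checkpoint key layout is assumed to be HF-style and to match the target layers:
```python
def load_own_shard(model_chunks, state_dict, offsets):
    """Each rank loads only the layers it owns.

    model_chunks: this rank's vpp model chunks (attribute layout assumed).
    offsets[v]: global index of the first layer in chunk v (see the sketch above).
    """
    for v, chunk in enumerate(model_chunks):
        for local_idx, layer in enumerate(chunk.decoder.layers):
            g = offsets[v] + local_idx
            prefix = f"model.layers.{g}."  # HF-style key prefix (assumed)
            shard = {k[len(prefix):]: w for k, w in state_dict.items()
                     if k.startswith(prefix)}
            layer.load_state_dict(shard, strict=False)
```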
# Modify test scripts to reuse downloaded huggingface model
This avoids errors from contacting the Hugging Face Hub for metadata.
# Modify CI workflows to enable load-balance of CI machines
Currently the L20-0 runner takes up 6 more jobs than L20-1; this change tries to reduce the pipeline bubble of each task.
This PR combines multiple modifications.
# QWen2.5 checkpoint saver bug fix
Thanks to the efforts @uygnef contributed in #368, we use the new model loader and saver with 3D-parallelism support.
# Megatron backend 3D-parallelism test benches
We modify the scripts in `examples/ppo_trainer` and `tests/e2e`, as well as the CI workflows; all are tested.
# Bug Fix for 3D-parallelism
This includes configuration bugs as well as module packing. The original TP `VocabParallelEntropy` could lead to CUDA OOM; we refactor the implementation with `torch.bmm`.
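A minimal sketch of the `torch.bmm` trick (assumed from the description above, not necessarily verl's exact code): the expectation term of the entropy is computed with a batched matmul, so no extra `[tokens, vocab]` product tensor is materialized.
```python
import torch
import torch.nn.functional as F

def entropy_from_logits(logits: torch.Tensor) -> torch.Tensor:
    """Per-token entropy: H = logsumexp(logits) - sum_v p_v * logits_v.

    logits: [tokens, vocab]. The dot product p . logits is computed with
    torch.bmm on [T, 1, V] x [T, V, 1], avoiding a full [T, V] intermediate
    for p * logits.
    """
    pd = F.softmax(logits, dim=-1)                              # [T, V]
    expected = torch.bmm(pd.unsqueeze(1), logits.unsqueeze(2))  # [T, 1, 1]
    return torch.logsumexp(logits, dim=-1) - expected.view(-1)
```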
# Full migration to Megatron Core
verl now only uses Megatron Core and no longer calls other Megatron components. If any are needed, please integrate them into `utils/megatron`.
---------
Co-authored-by: uygnef <admin@fengyu.org>
## What does this PR do?
This PR migrates the RL-on-VLMs feature from our
[EasyR1](https://github.com/hiyouga/EasyR1) fork back to veRL. We have
validated this feature using the Qwen2.5-VL 7B model on 8*H100 GPUs. The
configuration and data processing script are provided along with this PR for
easy reproduction.
## How to reproduce?
1. Download and preprocess the dataset
```bash
python3 examples/data_preprocess/geo3k.py --local_dir ~/data/geo3k
```
2. Start GRPO training
```bash
bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh
```
## Dependencies
- vllm>=0.7.3
- transformers>=4.49.0
- [qwen-vl-utils](https://pypi.org/project/qwen-vl-utils/)
- [mathruler](https://pypi.org/project/mathruler/)
## Major Changes
### New dataflow for multimodal RL
In this PR, we introduce two new concepts in the dataflow,
`multi_modal_data` and `multi_modal_inputs`. The former means the
multi-modal features required by the **rollout** worker (such as vLLM),
while the latter means the multi-modal features required by the
**actor/critic** worker (such as an HF model). They are different
because the rollout and actor workers have their own data format
requirements.
Taking Qwen2-VL + huggingface + vLLM as an example, the data structure
should be:
- **multi_modal_data**: {"image": [PIL.Image, PIL.Image, ...]}
- **multi_modal_inputs**: {"pixel_values": torch.Tensor,
"image_grid_thw": torch.Tensor}
Both of them are converted to numpy objects and placed in the non-tensor
batch in DataProto.
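For illustration, the two dicts for a single sample might look like this (shapes are illustrative only, not exact Qwen2-VL values):
```python
import torch
from PIL import Image

# Consumed by the rollout worker (vLLM):
multi_modal_data = {"image": [Image.new("RGB", (448, 448))]}

# Consumed by the actor/critic worker (HF model); shapes illustrative:
multi_modal_inputs = {
    "pixel_values": torch.randn(256, 1176),       # one row per visual patch
    "image_grid_thw": torch.tensor([[1, 16, 16]]),  # (t, h, w) patch grid
}
# Both dicts are converted to numpy objects and stored in DataProto's non-tensor batch.
```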
This design can be extended to other modalities/VLMs easily because it is
model-agnostic.
### Other changes
- Data
  - Support pre-processing the [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k) dataset.
  - Support `config.data.image_key`, which should be **a list of Pillow images**.
- Actor/Ref/Critic
  - Support `multi_modal_inputs`.
  - Process position ids to adapt to the m-rope.
- Rollout
  - Update the dtensor weight loader to adapt to the Qwen2-VL architecture in vLLM 0.7+.
  - Support `multi_modal_data`.
  - Use `raw_prompt_ids` as the vLLM inputs to **avoid unpadding** the input ids.
- Reward Manager
  - Add **mathruler** for more accurate math scores on the Geometry3k dataset.
- Models
  - Support calculating the position ids for the m-rope in Qwen2-VL.
  - Support removing padding in flash attention 2 for the m-rope (transformers itself **does not support it**).
- Sharding Manager
  - Support all-gathering the non-tensor batch.
- FSDP Workers / Checkpoint Merger
  - Support `AutoModelForVision2Seq` at model initialization.
Note: The Ulysses parallelism is not completed yet. We will support it
in the next update.
## Performance
We provide the estimated MFU of the language model part for H100 GPUs.
These values are lower than the actual ones because **we did not compute
the FLOPs of the vision tower part**.
- `remove_padding=False`: MFU ~7%
- `remove_padding=True`: MFU ~20%
The training and test reward score curves are presented as follows.

## Who can review?
@vermouth1992 @PeterSH6