### What does this PR do?
- As title
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
### What does this PR do?
- Add TensorDict utilities and tests to cover the current DataProto
functionalities.
- Add nested tensor example to remove padding throughout the system
- Add image example
- Upgrade tensordict to v0.10
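The nested-tensor idea can be sketched without tensordict: instead of padding every sequence to the batch max length, keep one flat value buffer plus per-sequence offsets (the layout jagged/nested tensors use under the hood). A minimal pure-Python illustration; the helper names are hypothetical, not verl or tensordict APIs:

```python
# Padding-free batch layout: one flat buffer plus per-sequence offsets,
# the same idea behind nested/jagged tensors. Illustrative only; these
# helpers are not part of verl's or tensordict's API.

def pack(sequences):
    """Pack variable-length sequences into (values, offsets)."""
    values, offsets = [], [0]
    for seq in sequences:
        values.extend(seq)
        offsets.append(len(values))
    return values, offsets

def unpack(values, offsets):
    """Recover the original sequences from the packed layout."""
    return [values[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

seqs = [[1, 2, 3], [4], [5, 6]]
values, offsets = pack(seqs)  # stores 6 values, zero padding tokens
```

Because only real tokens are stored, downstream ops never waste compute on pad positions.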
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
### What does this PR do?
> This PR adds tensorboard as a dependency in requirements.txt, several
Dockerfiles (Dockerfile.ngc.vllm, Dockerfile.ngc.vllm0.8,
Dockerfile.ngc.vllm0.8.sagemaker), a setup script
(install_vllm_sglang_mcore.sh), and the main setup.py. It ensures the
tensorboard package is installed consistently, enabling visualization of
training metrics across configurations and deployment environments. This
is a maintenance change that improves the project's observability without
altering core functionality.
### Test
> This change is a dependency update and doesn't require specific
testing beyond confirming the installation is successful.
### API and Usage Example
> No API changes are introduced. The usage of TensorBoard would be
initiated by the user after installing the requirements.
```python
# No code snippet is applicable for this change
```
### What does this PR do?
Bump to tensordict 0.9.1 and ban 0.9.0, per the discussion in #2460.
This bug, https://github.com/pytorch/tensordict/issues/1374, affects
dp_actor, making it crash because of an incorrect batch size.
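A pin along these lines expresses "bump to 0.9.1, ban 0.9.0" (illustrative; the exact bounds in verl's requirements may differ):

```txt
# requirements.txt sketch: accept the fixed release, exclude the broken one
tensordict<=0.9.1,!=0.9.0
```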
### What does this PR do?
Upgrade tensordict to latest
### Checklist Before Starting
- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
- In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
- type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`
### What does this PR do?
Migrate images to verlai, upgrade CUDA support to 12.6, and support the
latest flash attention.
```txt
docker
├── README.md
├── verl0.4-cu124-torch2.6-fa2.7.4
│ ├── Dockerfile.app.sglang.vllm.mcore0.12
│ ├── Dockerfile.app.sglang.vllm.mcore0.13.preview
│ ├── Dockerfile.app.vllm.mcore0.12
│ ├── Dockerfile.app.vllm.mcore0.13.preview
│ ├── Dockerfile.base
│ └── README.md
├── verl0.5-cu126-torch2.7.1-fa2.8.0
│ ├── Dockerfile.app.sglang.mcore0.12
│ ├── Dockerfile.app.sglang.mcore0.13.preview
│ ├── Dockerfile.base.fi0.2.6
│ └── README.md
└── verl0.5-preview-cu128-torch2.7.1-fa2.8.0
├── Dockerfile.app.sglang.megatron
├── Dockerfile.base.fi0.2.6
└── README.md
```
- verlai/verl
  - verl0.4
    - base
    - app.sglang.vllm.mcore
    - app.vllm.mcore
  - verl0.5
    - base
    - app.sglang.mcore
    - app.vllm.mcore [may not be supported yet; for debugging]
  - verl0.5-preview
    - base
    - app.sglang.mcore
    - app.vllm.mcore [may not be supported yet; for debugging]
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
> Provide usage example(s) for easier usage.
```python
# Add code snippet or script demonstrating how to use this
```
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
### What does this PR do?
#### Fix https://github.com/volcengine/verl/issues/2216
#### 1 Fix Config Reference in entropy_trainer.yaml
#### 2 Fix TypeError When Merging `reward_kwargs` and
`cfg_reward_kwargs`
### Specific Changes
> List the specific changes.
#### 1 Fix Config Reference in entropy_trainer.yaml
- Modified File : `recipe.entropy.config.entropy_trainer.yaml`
- Change:
```diff
- reward_model.reward_kwargs.overlong_buffer_cfg: $reward_model.overlong_buffer
+ reward_model.reward_kwargs.overlong_buffer_cfg: ${reward_model.overlong_buffer}
```
- Purpose: Ensures OmegaConf correctly resolves the reference as a
DictConfig object instead of interpreting it as a string.
#### 2 Fix TypeError When Merging `reward_kwargs` and
`cfg_reward_kwargs`
- Modified File : `recipe.entropy.main_entropy.py`
- Change :
```diff
- reward_fn = load_reward_manager(config, tokenizer, num_examine=0, **(merge_dict(reward_kwargs, cfg_reward_kwargs)))
+ reward_fn = load_reward_manager(config, tokenizer, num_examine=0, **OmegaConf.merge(OmegaConf.create(reward_kwargs), cfg_reward_kwargs))
```
- Purpose: Use `OmegaConf.merge()` to safely merge `dict` and `DictConfig`
types.
> Background:
> The DAPORewardManager class accesses the `enable` attribute from
`overlong_buffer_cfg`.
> This fails if `overlong_buffer_cfg` is a regular dict.
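The failure mode is easy to reproduce without OmegaConf: a plain `dict` has no attribute access, so `cfg.enable` raises `AttributeError`, while an attribute-addressable object (the role `DictConfig` plays) works. A pure-Python illustration, not the actual verl code:

```python
# Why a plain-dict merge breaks attribute-style access like cfg.enable.
from types import SimpleNamespace

plain = {"enable": True, "max_len": 512}
try:
    plain.enable  # attribute access on a regular dict raises
    raised = False
except AttributeError:
    raised = True

# An attribute-addressable object (DictConfig plays this role in verl).
cfg = SimpleNamespace(**plain)
```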
---------
Co-authored-by: H <linhaibin.eric@gmail.com>
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
Update the mcore image to use a vLLM that supports Qwen3, and rewrite the
installation from conda.
### Specific Changes
Docker image and docs
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: both
- **Inference**: both
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
> [!WARNING]
> We are [migrating to `ruff` as the linter and formatter and
`pre-commit` as the managing
tool](https://github.com/volcengine/verl/pull/1010).
>
> If your branch is based on a previous commit using `yapf` and
`pylint`, simply merging might trigger overwhelming linting errors,
while **you are only expected to resolve ones in the files related to
your PR**.
>
> To resolve this issue, please try the following workaround to only
include the files you **really changed** in the PR:
>
> 1. In your branch, fix linting and formatting with `ruff`: `ruff check
--fix && ruff format`
> 2. Squash into a single commit in a new branch: `git reset --soft
$(git merge-base main HEAD) && git add -A && git commit -m "feat: ..."`
> 3. Merge with the latest main: `git merge origin/main`
> 4. Force push to your branch: `git push --force`
We add the reminder above to the documentation to tell contributors how
to avoid overwhelming linting errors.
### Motivation
Following the discussion in #896, this PR migrates from yapf & pylint to
ruff based on pre-commit, which allows unified version control and an
automatic hook on committing.
### Summary
The `pre-commit` hook and CI
- check staged / committed files in commits / PRs
- check all files each month (this is expected to fail until all files
conform to the ruff standard)
### Explanation for the Failing CI Workflow `pre-commit`
For now, we only apply `ruff format` and `ruff check --fix` **without
resolving all the errors**, since there are too many to resolve at once,
which causes the CI workflow `pre-commit` to fail.
Resolving the remaining errors is left to future commits.
Specifically, the `pre-commit` hook and CI will require every commit to
fix its related files with `ruff`, which will fix all the files
incrementally.
### Reviewing Suggestion
The commit
3d93f51ba8
is huge since we apply `ruff` to all the files. To review the main
changes, please check the commits before and after it.
HF Dataset provides better memory management and can handle larger
datasets. It also supports multi-process acceleration for map/filter
operations (whereas pandas requires version >2.0 for this).
Now `filter_overlong_prompts` can be enabled on large-scale datasets by
setting `filter_overlong_prompts_workers` to an appropriate number.
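The overlong-prompt filter itself is just a length predicate; HF Datasets parallelizes it across worker processes via `Dataset.filter(..., num_proc=N)`. A standalone sketch of the predicate (whitespace splitting stands in for the real tokenizer, and the names are illustrative):

```python
# Sketch of the overlong-prompt predicate. In verl, HF Datasets runs it
# via Dataset.filter(..., num_proc=filter_overlong_prompts_workers);
# whitespace splitting stands in for real tokenization here.

def keep_prompt(example, max_prompt_length):
    return len(example["prompt"].split()) <= max_prompt_length

rows = [
    {"prompt": "short question"},
    {"prompt": "a very very very long prompt that exceeds the limit"},
]
kept = [r for r in rows if keep_prompt(r, max_prompt_length=5)]
```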
---------
Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
* add a workflow to run pylint
* add a section to `pyproject.toml` that blacklists all rules which
would trigger given the current code
* pin a version of pylint in `requirements.txt` for reproducibility
In a follow-up PR I will remove some rules from the blacklist and fix
some bugs.
https://github.com/volcengine/verl/issues/680
Changes:
- Move math-verify to the optional dependencies. Now it can be installed
via `cd verl && pip install -e .[math]`
- Revert using the naive verifier for the math dataset. Users can switch
to math-verify or provide a custom `compute_score` function.
Try to resolve this
[issue](https://github.com/volcengine/verl/issues/356).
As suggested in the issue discussion, I replace the default DataLoader
with StatefulDataLoader, which provides `state_dict` and
`load_state_dict` methods that support resuming the iterator position
when checkpointing mid-epoch.
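The mechanism can be sketched in pure Python: the loader tracks how far its iterator has advanced and exposes `state_dict()` / `load_state_dict()`, so a restored run resumes mid-epoch instead of restarting the epoch. (torchdata's StatefulDataLoader also restores sampler and worker state; this toy class models only the position.)

```python
# Minimal sketch of the state_dict/load_state_dict contract that
# StatefulDataLoader provides; only the iterator position is modeled.

class TinyStatefulLoader:
    def __init__(self, data):
        self.data = list(data)
        self._pos = 0

    def __iter__(self):
        while self._pos < len(self.data):
            item = self.data[self._pos]
            self._pos += 1
            yield item

    def state_dict(self):
        return {"pos": self._pos}

    def load_state_dict(self, state):
        self._pos = state["pos"]

loader = TinyStatefulLoader("abcd")
it = iter(loader)
first_two = [next(it), next(it)]   # consume part of the epoch
ckpt = loader.state_dict()         # checkpoint mid-epoch

restored = TinyStatefulLoader("abcd")
restored.load_state_dict(ckpt)
rest = list(restored)              # resumes at the saved position
```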
This PR aims to integrate vllm>=0.7.0 while preserving:
**Backward compatibility**: 0.3.1, 0.4.2, 0.5.4, and 0.6.3 are still
supported
**Forward compatibility**: future versions of vllm (>=0.7.0) will be
supported without requiring manual maintenance for each new release.
The readme of this Beta version is located at docs/README_vllm0.7.md,
where users can find the installation method and related features. This
readme is copied as below.
---
# Readme for verl(vllm>=0.7) version
## Installation
Note: This version of veRL supports **FSDP** for training and **vLLM**
for rollout. (Megatron-LM is not supported yet.)
```bash
# Create the conda environment
conda create -n verl python==3.10
conda activate verl
# Install verl
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .
# Install vLLM>=0.7
pip3 install vllm==0.7.0
# Install flash-attn
pip3 install flash-attn --no-build-isolation
```
For existing stable vllm versions (<=0.7.2), you also need to apply a few
small manual patches to vllm (/path/to/site-packages/vllm after
installation) after the above steps:
- vllm/distributed/parallel_state.py: Remove the assertion below:
```python
if (world_size
!= tensor_model_parallel_size * pipeline_model_parallel_size):
raise RuntimeError(
f"world_size ({world_size}) is not equal to "
f"tensor_model_parallel_size ({tensor_model_parallel_size}) x "
f"pipeline_model_parallel_size ({pipeline_model_parallel_size})")
```
- vllm/executor/uniproc_executor.py: change `local_rank = rank` to
`local_rank = int(os.environ["LOCAL_RANK"])`
- vllm/model_executor/model_loader/weight_utils.py: remove the
`torch.cuda.empty_cache()` in `pt_weights_iterator`
These modifications have already been merged into the main branch of
vLLM. To avoid modifying these files manually, you can directly build
vLLM from source.
## Features
### Use CUDA graphs
After installation, the examples that use FSDP as the training backend
can be run. By default, `enforce_eager` is set to True, which disables
CUDA graphs. To enable CUDA graphs and the sleep mode of vLLM>=0.7, add
the following lines to the bash script:
```bash
actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=False \
```
For a typical job like examples/ppo_trainer/run_qwen2-7b_seq_balance.sh,
rollout generation takes 115 seconds with vLLM 0.6.3 versus 85 seconds
with vLLM 0.7.0. Enabling CUDA graphs further reduces the generation
time to 62 seconds.
**Note:** Currently, if `n` is greater than 1 in `SamplingParams` in
vLLM>=0.7, there is a potential stability issue with rollout generation
time (some iterations see bursts in generation time). We are working
with the vLLM team to investigate this issue.
### Other features in vLLM
1. **num_scheduler_step>1:** not supported yet (weight loading has not
been aligned with `MultiStepModelRunner`)
2. **Prefix caching:** not supported yet (vLLM sleep mode does not
support prefix caching)
3. **Chunked prefill:** supported
---------
Co-authored-by: zhangshulai <zhangshulai@bytedance.com>
## Summary
This PR enables using Liger Kernel's `_apply_liger_kernel_to_instance`
to initialize an FSDP worker model.
## Main Changes
1. Add an option to use
`liger_kernel.transformers.AutoLigerKernelForCausalLM` to load a model
from pretrained weights, instead of the default
`transformers.AutoModelForCausalLM`
2. Add a test case using the configuration file
`tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh`
## Related Issue
#96
## TODO
#97 optimize the memory usage when computing entropy & log_probs
6d96fda3d4/verl/workers/actor/dp_actor.py (L94-L106)
---------
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
This PR adds support for LoRA (Low-Rank Adaptation) for efficient model
fine-tuning.
### Changes
1. Added LoRA configuration support in trainer config
2. Modified FSDP wrapping policy to handle LoRA modules
3. Integrated with existing FSDP training infrastructure
4. Added peft dependency
5. Removed unused ring_attn_utils.py
### Features
- Configurable LoRA rank and alpha parameters
- Target module specification for selective adaptation
- Compatible with FSDP sharding strategy
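The rank and alpha parameters map directly onto the low-rank update LoRA applies: the frozen weight W is augmented with (alpha/r)·B·A, where A is r×in and B is out×r. A tiny pure-Python sketch of that arithmetic (illustrative only; verl delegates the real implementation to `peft`):

```python
# LoRA's effective weight: W' = W + (alpha / r) * B @ A, with A (r x in)
# and B (out x r) trainable while W stays frozen. Pure-Python matmul
# keeps the sketch dependency-free; peft does the real work in verl.

def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

r, alpha = 2, 4
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]          # frozen base weight, out=2 x in=3
A = [[0.1, 0.0, 0.0],
     [0.0, 0.1, 0.0]]          # r x in
B = [[1.0, 0.0],
     [0.0, 1.0]]               # out x r

scale = alpha / r              # the alpha/rank scaling factor
delta = matmul(B, A)           # out x in, rank <= r
W_eff = [[w + scale * d for w, d in zip(wr, dr)]
         for wr, dr in zip(W, delta)]
```

Only A and B (r·(in+out) values) are trained, versus in·out for full fine-tuning, which is where the memory savings come from.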
### Testing
Tested with Qwen2.5-0.5B-Instruct model on GSM8K dataset using the
provided example script.
### Dependencies
- Added `peft` package to requirements.txt
This PR is based on commit 902ddbe6 and has been merged with the latest
upstream main branch.
---------
Co-authored-by: Jiayi Pan <i@jiayipan.me>
Co-authored-by: openhands <openhands@all-hands.dev>
* init commit of rmpad
* add rmpad test
* support rmpad in actor model
* add test for value model
* support rmpad in critic and rm
* fix actor return and fix num_labels and clean not used rmpad
* fix critic and benchmark
* update script
* fix critic
* lint
* fix util issue
* fix unnecessary unpad
* address issues
* fix args
* update test and update rmpad support model list
* fix typo
* fix typo and fix name
* rename rmpad to remove padding
* fix arch to model_type
* add ci for e2e rmpad and fix typo
* lint
* fix ci
* fix typo
* update tests for customize tokenizer in actor
* fix rmpad test
* update requirement of transformers as hf_rollout may have issue
* [megatron] style: clean up unused code in megatron
* update docs
* add install from docker section for docs
---------
Co-authored-by: Your Name <you@example.com>
* [ci] upload several tests
* [ci] add sanity and tensordict utility workflow
* [ci] fix workflow
* try fix import ci
* [dataproto] update repeat and unpad/pad
* fix rollout test to 2GPU
* add a fsdp vllm hybridengine script, which can be launched by torchrun
* fix import test
* update requirement.txt
* draft vllm fsdp test
* update label
* fix
* upload conda
* test conda
* test ci
* use docker
* test ci
* test ci
* test ci
* update ci
* test ci
* fix model loader
* fix model loader
* test ci
* test
* upload e2e digit completion test
* update running script for e2e test
* update test config
* fix path
* test
* fix import to register autotokenizer
* fix tokenizer
* fix create dataset
* fix
* fix reward model validate
* fix reward module of digit_completion
* fix reward module of digit_completion
* fix reward module of digit_completion
* fix reward module of digit_completion
* fix reward module of digit_completion
* can run but seems to have some test issue
* no problem, add check results
* add e2e training
* l20-0 seems has docker permission problem, test later
* fix
* test l20-0 and torchrun
* test l20-0 and torchrun
* fix
* fix
* fix
* fix
* fix
* tolerate difference
* tolerate difference with levenshtein
* lint
* add more test for ray
* delete
* use docker on l20
* use docker on l20
* add upgrade
* update ci
* delete code
* ignore test
* upgrade ray
* fix workerhelper method
* lint
* revert worker changes
* fix
* fix
* fix
* fix worker missing func