Commit Graph

43 Commits

Author SHA1 Message Date
d0c7bbbc05 [cfg] refactor: support +extra.any_key usage for the base dataclass config in verl (#2502)
### What does this PR do?

This PR updates the base config in verl:
- support `+extra.any_key` usage for the base config in verl
- allow selected subfields to be frozen
- add an auto-generated config YAML file
`verl/trainer/config/_generated_ppo_trainer.yaml` for reference, in case
the nested inheritance structure makes the config information too
scattered

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

- added frozen field tests

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

You can now pass `--xx.profiler.extra.any_new_key=any_plain_value` on the
command line to a dataclass inheriting `verl.BaseConfig`. This way we
can still pass dataclass configs inside verl while allowing some
flexibility in accepting new keys from users' ad-hoc usage.


### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Lin <haibin@Lins-Laptop.hsd1.wa.comcast.net>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-15 09:06:56 +08:00
4c37c97495 [rollout] fix: sglang async fail with Multi-stage Awake feature (#2365)
### What does this PR do?

Fix a regression from https://github.com/volcengine/verl/pull/1911,
which did not update the sglang async branch.

CI did not catch this error because it only runs 1 step, while the error
occurs in the second step. So I updated the test cases to run 2 steps.

To reproduce the bug, run the test:
```
TOTAL_TRAIN_STEPS=2 ENGINE=sglang ROLLOUT_MODE=async bash tests/special_e2e/ppo_trainer/run_function_reward.sh
```

It fails with:
```
(WorkerDict pid=1257286) Total steps: 2, num_warmup_steps: 0
(WorkerDict pid=1257286) Actor use_remove_padding=True
(WorkerDict pid=1257286) Actor use_fused_kernels=False
(AsyncSglangServer pid=1260392) FastAPI listen on 192.168.111.48:40451
(WorkerDict pid=1257286) terminate called after throwing an instance of 'c10::Error'
(WorkerDict pid=1257286)   what():  CUDA error: an illegal memory access was encountered
(WorkerDict pid=1257286) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(WorkerDict pid=1257286) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(WorkerDict pid=1257286) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(WorkerDict pid=1257286)
(WorkerDict pid=1257286) Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
(WorkerDict pid=1257286) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fbf6036c1b6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
(WorkerDict pid=1257286) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fbf60315a76 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
(WorkerDict pid=1257286) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fbf6080d918 in
```



### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20an%20illegal%20memory%20access%20was%20encountered
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test
```
(TaskRunner pid=1647269) step:2 - global_seqlen/min:13075 - global_seqlen/max:14837 - global_seqlen/minmax_diff:1762 - global_seqlen/balanced_min:14231 - global_seqlen/balanced_max:14232 - global_seqlen/mean:14231.5 - actor/entropy:2.0606913566589355 - critic/vf_loss:8.7157882153
```
### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-07-07 13:56:07 +08:00
cbeb3f4dae [rollout] fix: fix hf rollout and add single gpu test (#2371) 2025-07-05 18:51:26 +08:00
ebb21b7fc7 [docker] refactor: Migrate images to verlai, support latest flash attention and newer CUDA versions in future (#2085)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Migrate images to verlai, upgrade CUDA support to 12.6, and support the
latest flash attention

```txt
docker
├── README.md
├── verl0.4-cu124-torch2.6-fa2.7.4
│   ├── Dockerfile.app.sglang.vllm.mcore0.12
│   ├── Dockerfile.app.sglang.vllm.mcore0.13.preview
│   ├── Dockerfile.app.vllm.mcore0.12
│   ├── Dockerfile.app.vllm.mcore0.13.preview
│   ├── Dockerfile.base
│   └── README.md
├── verl0.5-cu126-torch2.7.1-fa2.8.0
│   ├── Dockerfile.app.sglang.mcore0.12
│   ├── Dockerfile.app.sglang.mcore0.13.preview
│   ├── Dockerfile.base.fi0.2.6
│   └── README.md
└── verl0.5-preview-cu128-torch2.7.1-fa2.8.0
    ├── Dockerfile.app.sglang.megatron
    ├── Dockerfile.base.fi0.2.6
    └── README.md
```

- verlai/verl
  - verl0.4
    - base
    - app.sglang.vllm.mcore
    - app.vllm.mcore
  - verl0.5
    - base
    - app.sglang.mcore
    - app.vllm.mcore [may not support now, for debug]
  - verl0.5-preview
    - base
    - app.sglang.mcore
    - app.vllm.mcore [may not support now, for debug]


### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-07-04 14:32:02 +08:00
7ac0d98f09 [trainer, vllm] feat: add lora exclude_modules to support VL model lora training (#2182)
### What does this PR do?

Regarding multimodal models, vLLM currently only supports adding LoRA to
the language model. We can use `exclude_modules` in the LoRA config to
exclude the ViT part from LoRA finetuning. This prepares the feature for
any possible use.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+lora+exclude+is%3Aopen.
you will see my closed pr
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test
Qwen2.5-VL-7B + GRPO + geo3k:
- blue: full parameters
- yellow: all-linear + exclude visual, lora rank 64
- red: all-linear w/o exclude visual, lora rank 64
- purple: `["q_proj","gate_proj"]` + exclude visual, lora rank 16

- The red run failed immediately, as expected, with KeyError:
'blocks.0.attn.qkv.base_layer.weight'. Any mismatch in module names
will also fail immediately, so successful runs validate correctness.
- The val generations of lora VLM all look normal.
- Add "Running GEO3K VLM GRPO E2E lora training tests on 8 L20 GPUs with
rmpad using function rm" test to e2e_ppo_trainer_vllm_vlm


![WeCom screenshot 1](https://github.com/user-attachments/assets/336261f0-5260-45e2-8312-86eb1ae375a5)

![WeCom screenshot 2](https://github.com/user-attachments/assets/eafae66e-6b61-4db4-853b-a3a0425be2aa)

![WeCom screenshot 3](https://github.com/user-attachments/assets/f01b098a-b383-4cc6-8f14-d51978121b59)

![WeCom screenshot 4](https://github.com/user-attachments/assets/75b4d566-cb63-4b63-9b85-300e02711739)

![WeCom screenshot 5](https://github.com/user-attachments/assets/8ed2979d-30b7-4d4f-85ad-0fee6aded619)


### API and Usage Example

For Qwen2.5-VL, set
`actor_rollout_ref.model.exclude_modules='.*visual.*'`.
It should be similar for other VLMs, e.g.
`actor_rollout_ref.model.exclude_modules='.*vision_tower.*'` for
kimi-vl.
To avoid failures with special architectures, explicitly specifying
`actor_rollout_ref.model.target_modules` is recommended over setting
`actor_rollout_ref.model.target_modules=all-linear`.


### High-Level Design

The main conflict is that unlike PEFT, which only adds `base_layer` to
validated target_modules, vLLM adds `base_layer` to all-linear modules
(q/k/v/gate/up/down) of the LLM with LoRA applied. When dealing with
modules to be stacked in vLLM (qkv, gate_up), `base_layer` must be added
to their module names, which can unexpectedly involve all-linear modules
of the visual architecture, as in the current
`FSDPVLLMShardingManager.update_params.replace_lora_wrapper`.
My solution prioritizes LoRA `exclude_modules` to ensure that LoRA and
`base_layer` are never added to the ViT, while the remaining cases stay
unchanged.

### Specific Changes

- add the `exclude_modules` field to ppo_trainer.yaml
- adapt `check_target_module_exists` from PEFT for standard
target/exclude_modules checking
- refactor `replace_lora_wrapper` in sharding_manager/fsdp_vllm.py to
correctly match `base_layer` modules
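The exclusion-first matching described above can be sketched as follows. This is an illustrative simplification in the spirit of PEFT's `check_target_module_exists`, not the exact verl code; the function names here are hypothetical.

```python
import re


def check_module_match(patterns, module_name):
    """Return True if the dotted module path matches `patterns`.

    A single string is treated as a full-match regex (e.g. '.*visual.*');
    a list is treated as module-name suffixes (e.g. ['q_proj']).
    """
    if patterns is None:
        return False
    if isinstance(patterns, str):
        return re.fullmatch(patterns, module_name) is not None
    return any(
        module_name == p or module_name.endswith("." + p) for p in patterns
    )


def should_apply_lora(module_name, target_modules, exclude_modules=None):
    # exclusion takes priority, so excluded (e.g. ViT) modules never
    # receive LoRA or a base_layer rename
    if check_module_match(exclude_modules, module_name):
        return False
    return check_module_match(target_modules, module_name)


print(should_apply_lora("visual.blocks.0.attn.qkv", ".*",
                        exclude_modules=".*visual.*"))          # False
print(should_apply_lora("model.layers.0.self_attn.q_proj",
                        ["q_proj"]))                            # True
```

With `exclude_modules='.*visual.*'`, every module under the vision tower is filtered out before target matching, which is exactly what keeps `base_layer` out of the ViT.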

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-30 12:15:48 +08:00
3b3e597042 [megatron] feat: Support of dist checkpoint (#2125)
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Support of dist checkpoint in saving, loading and model merger.

### Test

Algorithm:

<img width="783" alt="image"
src="https://github.com/user-attachments/assets/9a200b47-5937-426a-8da6-c601d2d8328f"
/>

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: H <linhaibin.eric@gmail.com>
2025-06-25 17:17:29 +08:00
dc805c7897 [ci] fix: enable e2e ppo trainer test (#2174)
### What does this PR do?

Fix bugs introduced by https://github.com/volcengine/verl/pull/2113

Do not skip the e2e tests when pushing changes to the main branch

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-24 20:16:17 +08:00
2ac410f001 [fsdp] feat: support fsdp2 save hugging face model (#2138)
### What does this PR do?

Support saving a Hugging Face model with FSDP2. Previously only FSDP1
was supported; FSDP2 led to the error reported in
https://github.com/volcengine/verl/issues/1703.

Fixes https://github.com/volcengine/verl/issues/1703.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
2025-06-23 15:15:36 +08:00
644aaa76bc [sglang] feat: add multimodal input to multiturn async rollout (#2014)
### Checklist Before Starting

- [X] Searched for similar PR(s).

### What does this PR do?
This PR adds image input to the sglang async rollout, which previously
supported only text. There is also a placeholder for video data, to be
added as an input once the SGLang engine supports it.

### High-Level Design

Since the sglang engine already handles image input, we only need to
handle the tokenization properly.

### Specific Changes

Change `self.tokenizer.apply_chat_template()` to
`self.processing_class.apply_chat_template()`. `processing_class` can be
a `tokenizer` or a `processor`.


### Usage Example
It will automatically use the processor to process images when the
model's processor supports that, and fall back to the tokenizer when no
processor is available.
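Because tokenizers and processors share the `apply_chat_template` API, the call site can stay identical for text-only and multimodal models. A sketch of the pattern (hypothetical helper and stub, not the actual verl code):

```python
class StubTokenizer:
    """Stand-in exposing the one method the rollout relies on."""

    def apply_chat_template(self, messages, tokenize=False,
                            add_generation_prompt=True):
        return "".join(m["content"] for m in messages)


def build_generate_request(processing_class, messages, image_data=None):
    # tokenizer and processor expose the same apply_chat_template API,
    # so this code does not care which one it was given
    prompt = processing_class.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    request = {"prompt": prompt}
    if image_data is not None:
        # image data rides alongside the prompt; video stays a
        # placeholder until the engine supports it
        request["image_data"] = image_data
    return request


req = build_generate_request(
    StubTokenizer(),
    [{"role": "user", "content": "describe the image"}],
    image_data=["<image bytes or url>"],
)
print(req["prompt"])  # describe the image
```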

### Checklist Before Submitting

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] New CI unit test(s) are added to cover the code path.
- [X] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: xieck13 <xieck13@gmail.com>
2025-06-22 15:43:46 -07:00
c87e91b2ef [ci] test: inspect the type annotation of newly added code, focusing on func defs (#2113)
### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Per https://github.com/volcengine/verl/discussions/2112, type
annotations should be encouraged to increase readability.
In previous PRs, the type check script did not really take effect
(it was either too strict or too loose). In this PR, the check is
limited to function definitions only, with a default threshold. By
default, CI only inspects the files changed in the current PR. For
reference, below is a glimpse of the failure cases if we force it to
inspect all files under `verl`.

Upon failure, it prints:
```
f"Please add type annotations for inputs and outputs to meet threshold {args.threshold}. 
Cases exempt from checking:"
"1. Private methods."
"2. Args with name in ('self', 'cls'), or *args / **kwargs"
"3. Files under tests/"
```

```
verl/trainer/main_generation.py:44: def main(config):
verl/trainer/main_generation.py:48: def run_generation(config) -> None:
verl/trainer/main_generation.py:60: def main_task(config):
verl/trainer/main_eval.py:33: def process_item(reward_fn, data_source, response_lst, reward_data):
verl/trainer/main_eval.py:40: def main(config):
verl/trainer/main_ppo.py:26: def main(config):
verl/trainer/main_ppo.py:31: def run_ppo(config) -> None:
verl/trainer/main_ppo.py:182: def create_rl_dataset(data_paths, data_config, tokenizer, processor):
verl/trainer/main_ppo.py:224: def create_rl_sampler(data_config, dataset):
verl/trainer/main_ppo.py:57: def run(self, config):
verl/trainer/fsdp_sft_trainer.py:71: def extract_step(path):
verl/trainer/fsdp_sft_trainer.py:549: def run_sft(config):
verl/trainer/fsdp_sft_trainer.py:572: def main(config):
verl/trainer/fsdp_sft_trainer.py:576: def create_sft_dataset(data_paths, data_config, tokenizer):
verl/trainer/fsdp_sft_trainer.py:384: def training_step(self, batch: TensorDict):
verl/trainer/fsdp_sft_trainer.py:433: def validation_step(self, batch: TensorDict):
verl/trainer/fsdp_sft_trainer.py:444: def save_checkpoint(self, step):
verl/trainer/fsdp_sft_trainer.py:486: def fit(self):
verl/trainer/ppo/reward.py:25: def get_custom_reward_fn(config):
verl/trainer/ppo/reward.py:60: def load_reward_manager(config, tokenizer, num_examine, **reward_kwargs):
verl/trainer/ppo/reward.py:111: def compute_reward(data: DataProto, reward_fn):
verl/trainer/ppo/reward.py:133: def compute_reward_async(data: DataProto, config, tokenizer):
verl/trainer/ppo/reward.py:54: def wrapped_fn(*args, **kwargs):
verl/trainer/ppo/ray_trainer.py:132: def apply_kl_penalty(data: DataProto, kl_ctrl: core_algos.AdaptiveKLController, kl_penalty="kl", multi_turn=Fals
verl/trainer/ppo/ray_trainer.py:181: def compute_response_mask(data: DataProto):
verl/trainer/ppo/ray_trainer.py:199: def compute_advantage(data: DataProto, adv_estimator, gamma=1.0, lam=1.0, num_repeat=1, multi_turn=False, norm_a
verl/trainer/ppo/ray_trainer.py:89: def create_resource_pool(self):
verl/trainer/ppo/ray_trainer.py:710: def init_workers(self):
verl/trainer/ppo/ray_trainer.py:892: def fit(self):
verl/trainer/ppo/ray_trainer.py:381: def check_mutually_exclusive(mbs, mbs_per_gpu, name: str):
verl/trainer/ppo/core_algos.py:34: def register_adv_est(name_or_enum):
verl/trainer/ppo/core_algos.py:53: def get_adv_estimator_fn(name_or_enum):
verl/trainer/ppo/core_algos.py:116: def get_kl_controller(kl_ctrl):
verl/trainer/ppo/core_algos.py:127: def compute_gae_advantage_return(
verl/trainer/ppo/core_algos.py:174: def compute_grpo_outcome_advantage(
verl/trainer/ppo/core_algos.py:231: def compute_grpo_passk_outcome_advantage(
verl/trainer/ppo/core_algos.py:291: def compute_reinforce_plus_plus_baseline_outcome_advantage(token_level_rewards: torch.Tensor, response_mask: torch.Tensor, 
verl/trainer/ppo/core_algos.py:336: def compute_rloo_outcome_advantage(token_level_rewards: torch.Tensor, response_mask: torch.Tensor, index: np.ndarray, 
verl/trainer/ppo/core_algos.py:379: def compute_opo_outcome_advantage(token_level_rewards: torch.Tensor, response_mask: torch.Tensor, index: np.ndarray, 
verl/trainer/ppo/core_algos.py:426: def compute_reinforce_plus_plus_outcome_advantage(token_level_rewards: torch.Tensor, response_mask: torch.Tensor, 
verl/trainer/ppo/core_algos.py:463: def compute_remax_outcome_advantage(token_level_rewards: torch.Tensor, reward_baselines: torch.Tensor, response_mask: 
verl/trainer/ppo/core_algos.py:492: def compute_rewards(token_level_scores, old_log_prob, ref_log_prob, kl_ratio):
verl/trainer/ppo/core_algos.py:497: def agg_loss(loss_mat: torch.Tensor, loss_mask: torch.Tensor, loss_agg_mode: str):
verl/trainer/ppo/core_algos.py:533: def compute_policy_loss(
verl/trainer/ppo/core_algos.py:599: def compute_entropy_loss(logits, response_mask, loss_agg_mode: str = "token-mean"):
verl/trainer/ppo/core_algos.py:616: def compute_value_loss(vpreds: torch.Tensor, returns: torch.Tensor, values: torch.Tensor, response_mask: torch.Tensor, 
verl/trainer/ppo/core_algos.py:651: def kl_penalty(logprob: torch.FloatTensor, ref_logprob: torch.FloatTensor, kl_penalty) -> torch.FloatTensor:
verl/trainer/ppo/core_algos.py:689: def compute_pf_ppo_reweight_data(
verl/trainer/ppo/core_algos.py:43: def decorator(fn):
verl/trainer/ppo/core_algos.py:99: def update(self, current_kl, n_steps):
verl/trainer/ppo/core_algos.py:112: def update(self, current_kl, n_steps):
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:329: def init_cache_engine(self):
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:334: def free_cache_engine(self):
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:355: def from_engine_args(
```

### Usage Example

For current git diffs compared to `main`:
```
python3 tests/special_sanity/type_coverage_check.py 
```
For inspecting all files under `verl/`
```
find verl -type f -name "*.py" | xargs -n 1 python3 tests/special_sanity/type_coverage_check.py --all-lines --debug --target-file
```
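The core of such an annotation-coverage check can be sketched with the standard `ast` module (a simplified illustration, not the actual `tests/special_sanity/type_coverage_check.py`):

```python
import ast


def annotation_coverage(source: str) -> float:
    """Fraction of annotated args/returns on public function defs.

    Private functions (leading underscore) are exempt; self/cls are
    skipped; *args/**kwargs never appear in args.args, so they are
    naturally excluded, matching the exemptions the CI check lists.
    """
    total = annotated = 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("_"):  # private methods are exempt
            continue
        for arg in node.args.args + node.args.kwonlyargs:
            if arg.arg in ("self", "cls"):
                continue
            total += 1
            annotated += arg.annotation is not None
        total += 1  # the return annotation counts too
        annotated += node.returns is not None
    return annotated / total if total else 1.0


src = "def f(x: int) -> int:\n    return x\n\ndef g(y):\n    return y\n"
print(annotation_coverage(src))  # 0.5
```

A CI script would compare this ratio against a threshold and, on failure, print the offending `file:line: def ...` locations, as shown in the list above.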

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-21 00:47:23 +08:00
6642bb2eae [rollout] fix: error in sglang async mode (#2098)
Fixed a regression from:
 - https://github.com/volcengine/verl/pull/1668
 - https://github.com/volcengine/verl/pull/1933

Added e2e tests for both sglang and vllm async modes
2025-06-19 19:22:57 -07:00
34342365e6 [doc] test: ensure new docs are included in TOC tree (#2070)
### Checklist Before Starting

- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`
- [x] Searched for similar PR(s).

### What does this PR do?

Add docs to the ToC tree of the documentation website.

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.



### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
2025-06-17 20:27:35 -07:00
H
5fa911b3ce [ci] refactor: setup testing guidance (#1958) 2025-06-12 06:16:58 -07:00
c8908e197c [fsdp] feat: Memory efficient cross entropy with a linear layer fused (#462)
Implemented the forward and backward passes of the following compute
logic, eliminating many intermediate storage tensors and reducing peak
memory usage.

## Equivalent compute logic:
```python
def run_torch_entropy(hidden: torch.Tensor,
                    weight: torch.Tensor,
                    labels: torch.Tensor) -> typing.List[torch.Tensor]:
    logits = torch.matmul(hidden.to(torch.float32), weight.to(torch.float32)) # [num_tokens, vocab_size]
    pd = torch.nn.functional.softmax(logits, dim=-1) # [num_tokens, vocab_size]
    entropy_a = torch.logsumexp(logits, dim=-1) # [num_tokens]
    entropy_b = torch.sum(pd * logits, dim=-1) # [num_tokens]
    entropy = entropy_a - entropy_b
    logprobs = torch.nn.functional.cross_entropy(logits, labels) # [1]
    logprobs = torch.neg(logprobs)
    return logprobs, entropy
```
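
The entropy identity used above, `H = logsumexp(logits) - sum(softmax(logits) * logits)`, can be sanity-checked on a toy example with the standard library alone (a minimal sketch, independent of the fused kernel):

```python
import math

def entropy_from_logits(logits):
    # H(p) = logsumexp(z) - sum_i p_i * z_i, with p = softmax(z)
    lse = math.log(sum(math.exp(z) for z in logits))
    probs = [math.exp(z - lse) for z in logits]
    return lse - sum(p * z for p, z in zip(probs, logits))

# Uniform logits give the maximum entropy, log(vocab_size).
print(entropy_from_logits([0.0, 0.0, 0.0, 0.0]))  # ≈ log(4) ≈ 1.3863
```

This matches the `entropy_a - entropy_b` computation in `run_torch_entropy` term by term; the fused kernel is stated to produce the same quantity without materializing the intermediate tensors in full.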

## API
```python
from verl.utils.kernel import linear_cross_entropy

hidden = torch.randn(num_tokens, hidden_size, dtype=torch.bfloat16, device="cuda")
weight = torch.randn(hidden_size, vocab_size, dtype=torch.bfloat16, device="cuda")
labels = torch.randint(0, vocab_size, (num_tokens,), device="cuda")

loss, entropy = linear_cross_entropy(hidden, weight, labels, reduction="mean")
```

## Storage and latency
<img width="636" alt="image"
src="https://github.com/user-attachments/assets/396b7303-a46a-46b1-a261-917fda034b02"
/>

## Unit test
```shell
$ cd verl/
$ python3 tests/kernel/test_memory_efficient_entropy.py
```

# NOTE
For compatibility, `torch.library.triton_op` was not applied to these
APIs, so `torch.compile` may not be usable on top of them.

---------

Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>
Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
Co-authored-by: gaoziyuan.955 <gaoziyuan.955@bytedance.com>
Co-authored-by: Blue Space <57280232+ETOgaosion@users.noreply.github.com>
2025-06-11 19:48:47 +08:00
f880ec4c72 [ckpt] feat: model_merger.py support processing checkpoints with LoRA adapters (#1821) 2025-06-10 20:29:16 +08:00
5aa1b046b4 [ppo] feat: add critic valuehead model support for multi-modal PPO (#1839)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

- Support multi-modal PPO, mainly by reusing trl's
`AutoModelForCausalLMWithValueHead` as the critic value-head model
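
As a rough illustration of what the value head adds (a pure-Python stand-in, not trl's actual implementation): the value head is essentially a scalar linear projection over the backbone's last hidden states, producing one value estimate per token.

```python
# Pure-Python stand-in for a value head: a scalar linear projection over
# the last hidden states, one value estimate per token. (Illustrative
# only; trl's AutoModelForCausalLMWithValueHead wraps a real LM.)
def value_head(hidden_states, weight, bias=0.0):
    # hidden_states: [seq_len][hidden_size] -> [seq_len] values
    return [sum(h * w for h, w in zip(token, weight)) + bias
            for token in hidden_states]

hidden = [[1.0, 2.0], [0.5, -1.0]]  # two tokens, hidden_size = 2
print(value_head(hidden, [0.1, 0.2]))  # first token: 1.0*0.1 + 2.0*0.2 = 0.5
```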

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-06-09 10:22:54 +08:00
OC
9afa8d6dff fix: error when CI failed due to incorrect sgl-kernel version (#1872)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix CI failure caused by an incorrect sgl-kernel version in the Docker image:

```
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/utils.py", line 647, in assert_pkg_version
    raise Exception(
Exception: sgl-kernel is installed with version 0.1.0, which is less than the minimum required version 0.1.1. Please reinstall the latest version with `pip install sgl-kernel --force-reinstall`
```
2025-06-06 13:55:08 +08:00
f7f8b042d5 [feat] Add support for FSDP2 in GRPO-LoRA (#1844)
1. Add: support for FSDP2 in GRPO-LoRA
2. Format: automatic code formatting changes initiated by the pre-commit
tool
3. Add: integrate end-to-end (e2e) testing of GRPO-LoRA + FSDP2 into
the CI pipeline.
2025-06-05 19:32:53 +08:00
4de247fe4d [sglang] refactor: Unify async rollout under SGLangRollout, and support sglang==0.4.6.post5 (#1717)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

- Unify the functionality of SGLangRollout and AsyncSGLangRollout:
remove the original SGLangRollout and rename AsyncSGLangRollout to
SGLangRollout.
- Make minor changes required by sglang==0.4.6.post5.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Co-authored-by: zyzshishui <@qq.com>
Co-authored-by: Xiang Long <mindsculptor@yeah.net>
Co-authored-by: ocss884 <ocss.lin@gmail.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: H <linhaibin.eric@gmail.com>
2025-05-31 19:47:25 -07:00
9c50ffd0cb [vlm] Support ulysses sequence parallelism for vlm (#1739)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Only apply Ulysses sequence parallelism to the LLM part of the VLM
model, which is the main component, to avoid the `Image features and
image tokens do not match` error from occurring before `masked_scatter`.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

1. For the VLM model, we only pad the inputs before the forward pass
without slicing them; slicing is instead performed after the embedding
stage.
2. Where the ViT and LLM share/reuse FlashAttention, distinguish the
ViT case and skip the Ulysses logic.
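
As a hedged sketch of the pad-then-slice-after-embedding idea (function and variable names here are illustrative, not verl's actual API):

```python
# Illustrative pad-then-slice-after-embedding flow for sequence
# parallelism (names are hypothetical, not verl's actual API).
def pad_to_multiple(seq, sp_size, pad_value=0):
    # Pad so the sequence length divides evenly across SP ranks.
    remainder = len(seq) % sp_size
    if remainder:
        seq = seq + [pad_value] * (sp_size - remainder)
    return seq

def slice_for_rank(embedded, sp_size, rank):
    # After the full-length embedding (and masked_scatter of image
    # features), each rank keeps only its contiguous shard.
    chunk = len(embedded) // sp_size
    return embedded[rank * chunk:(rank + 1) * chunk]

tokens = pad_to_multiple(list(range(10)), sp_size=4)
shards = [slice_for_rank(tokens, sp_size=4, rank=r) for r in range(4)]
print(len(tokens), [len(s) for s in shards])  # 12 [3, 3, 3, 3]
```

Because the image-to-token matching happens on the full (padded, unsliced) sequence, the mismatch error cannot occur before the shard step.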

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

```
python -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=/mnt/hdfs/zhudelin123/data/geo3k/train.parquet \
    data.val_files=/mnt/hdfs/zhudelin123/data/geo3k/test.parquet \
    data.train_batch_size=64 \
    data.max_prompt_length=2048 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation=error \
    data.image_key=images \
    actor_rollout_ref.model.path=/mnt/hdfs/Qwen2.5-VL-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=64 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.model.use_fused_kernels=True \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.n=4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=[console,wandb] \
    trainer.project_name=nanzhe_verl_grpo_example_geo3k \
    trainer.experiment_name=qwen2_5_vl_7b_sp2_test \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=2 \
    trainer.save_freq=-1 \
    trainer.test_freq=-1 \
    trainer.default_hdfs_dir=null \
    trainer.total_epochs=1 \
    trainer.resume_mode=disable
```

<img width="481" alt="image"
src="https://github.com/user-attachments/assets/066db41d-46cf-4bc8-9d50-b9a8189c7654"
/>


### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-30 11:12:49 +08:00
16f6c1ee65 [feat] lora: new feature -- LoRA support for PPO (#1127)
Co-Authored-By: Stephen Xie <stephenx@berkeley.edu>
Co-Authored-By: Tony Lian <longlian@berkeley.edu>
Co-Authored-By: Jiayi Pan <jiayipan@berkeley.edu>
Co-Authored-By: Simon Huang <thelongestusernameofall@gmail.com>

The test script is as follows:


```
#!/bin/bash
#
#   Author  :   simon huang
#   Date    :   2025年04月15日14:20:30
#   
#   For GRPO LoRA Support Dev 
#

set -x
## master:
# ray start --head --port=6379

## slave:
# ray start --address='localhost:6379'


# export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export WANDB_DIR=wandb-kkr1-lora-4p3bv1
export WANDB_PROJECT=simon-kkr1-lora-4p3bv1

# wandb server start --port 9090
export WANDB_BASE_URL=http://wandblocal:9000
export WANDB_API_KEY=local-5239e89783ebebea9bac5509e2bd1a8e734f55f7
# wandb login --relogin --host=http://wandblocal:9000
# export WANDB_MODE=offline

MODEL_PATH=/data1/models/Qwen/Qwen2.5-0.5B-Instruct

export VLLM_ATTENTION_BACKEND=XFORMERS

nproc_per_gpu=1
nnodes=1
nproc_per_node=2
total_procs=$(( nproc_per_gpu * nnodes * nproc_per_node ))
mini_batch_size=$(( total_procs ))

python3 -m verl.trainer.main_ppo \
    --config-name=lora-ppo_trainer.yaml \
    algorithm.adv_estimator=grpo \
    data.train_files=data/kk/parquet/train.parquet \
    data.val_files=data/kk/parquet/val.parquet \
    data.train_batch_size=${total_procs} \
    data.val_batch_size=${total_procs} \
    data.max_prompt_length=2000 \
    data.max_response_length=600 \
    actor_rollout_ref.model.path=$MODEL_PATH\
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.model.lora_rank=8 \
    actor_rollout_ref.model.lora_alpha=16 \
    actor_rollout_ref.model.target_modules=[k_proj,v_proj] \
    actor_rollout_ref.actor.optim.lr=3e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=${mini_batch_size} \
    actor_rollout_ref.actor.ppo_micro_batch_size=${mini_batch_size} \
    actor_rollout_ref.actor.use_kl_loss=False \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.fsdp_config.fsdp_size=-1 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=${mini_batch_size} \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.1 \
    actor_rollout_ref.rollout.n=2 \
    actor_rollout_ref.rollout.max_num_seqs=4 \
    actor_rollout_ref.rollout.max_model_len=4000 \
    actor_rollout_ref.rollout.max_num_batched_tokens=4000 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.ref.log_prob_micro_batch_size=${mini_batch_size} \
    actor_rollout_ref.ref.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=1 \
    actor_rollout_ref.actor.entropy_coeff=0.001 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    reward_model.reward_manager=naive \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name=$WANDB_PROJECT \
    trainer.experiment_name=$WANDB_PROJECT \
    trainer.n_gpus_per_node=${nproc_per_node} \
    trainer.nnodes=${nnodes} \
    trainer.default_local_dir=$WANDB_PROJECT \
    trainer.default_hdfs_dir=null \
    trainer.save_freq=1 \
    trainer.test_freq=1 \
    trainer.total_epochs=8 $@ 2>&1 | tee ${WANDB_PROJECT}.log

```


The output log is as follows:

```
(TaskRunner pid=2931272)   [Error] </answer> appears 0 times (expected 1)
(TaskRunner pid=2931272)   [Error] Incorrect tag order: Expected <think>...</think><answer>...</answer>
(TaskRunner pid=2931272)
(TaskRunner pid=2931272)   Format validation: FAIL
(TaskRunner pid=2931272)   Format score: -2
(TaskRunner pid=2931272)
(TaskRunner pid=2931272) [Content Validation] Skipped due to format errors or missing answer
(TaskRunner pid=2931272)
(TaskRunner pid=2931272) --------------------------------------------------------------------------------
(TaskRunner pid=2931272) --------------------------------- Final Score ----------------------------------
(TaskRunner pid=2931272)   Format: -2
(TaskRunner pid=2931272)   Answer: -2
(TaskRunner pid=2931272)   Total: -4
(TaskRunner pid=2931272) ================================================================================
(TaskRunner pid=2931272)
(TaskRunner pid=2931272) local_global_step_folder: simon-kkr1-lora-4p3bv1/global_step_10
(WorkerDict pid=2948236) [rank-0]: LoRA adapter saved to simon-kkr1-lora-4p3bv1/global_step_10/actor/lora_adapter
Training Progress:   0%|          | 10/47200 [05:16<308:34:14, 23.54s/it]
(WorkerDict pid=2948236) [rank-0]: Saving model to /mnt/h800fast/simon/research/Train/RL/volcengine/simonverl/simon-kkr1-lora-4p3bv1/global_step_10/actor/model_world_size_2_rank_0.pt
(WorkerDict pid=2948236) [rank-0]: Saving checkpoint to /mnt/h800fast/simon/research/Train/RL/volcengine/simonverl/simon-kkr1-lora-4p3bv1/global_step_10/actor/model_world_size_2_rank
_0.pt
(WorkerDict pid=2948236) [rank-0]: Saving extra_state to /mnt/h800fast/simon/research/Train/RL/volcengine/simonverl/simon-kkr1-lora-4p3bv1/global_step_10/actor/extra_state_world_size
_2_rank_0.pt
(TaskRunner pid=2931272) step:10 - global_seqlen/min:1981.000 - global_seqlen/max:4883.000 - global_seqlen/minmax_diff:2902.000 - global_seqlen/balanced_min:3417.000 - global_seqlen/bal
anced_max:3447.000 - global_seqlen/mean:3432.000 - actor/entropy:1.657 - actor/pg_loss:0.000 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_
norm:1.258 - perf/mfu/actor:0.034 - perf/max_memory_allocated_gb:12.799 - perf/max_memory_reserved_gb:13.301 - perf/cpu_memory_used_gb:49.778 - actor/lr:0.000 - val-core/simon-kkr1/rewar
d/mean@1:-5.278 - val-aux/simon-kkr1/reward/std@1:0.000 - val-core/simon-kkr1/reward/best@1/mean:-5.278 - val-core/simon-kkr1/reward/best@1/std:0.000 - val-aux/simon-kkr1/reward/worst@1/mea
n:-5.278 - val-aux/simon-kkr1/reward/worst@1/std:0.000 - critic/score/mean:-3.658 - critic/score/max:-1.638 - critic/score/min:-5.734 - critic/rewards/mean:-3.658 - critic/rewards/max:-1
.638 - critic/rewards/min:-5.734 - critic/advantages/mean:-0.174 - critic/advantages/max:0.707 - critic/advantages/min:-0.707 - critic/returns/mean:-0.174 - critic/returns/max:0.707 - c
ritic/returns/min:-0.707 - response_length/mean:81.500 - response_length/max:150.000 - response_length/min:28.000 - response_length/clip_ratio:0.000 - prompt_length/mean:1634.500 - prom
pt_length/max:2319.000 - prompt_length/min:950.000 - prompt_length/clip_ratio:0.000 - timing_s/gen:3.607 - timing_s/old_log_prob:0.482 - timing_s/adv:0.015 - timing_s/update_actor:1.428
 - timing_s/testing:5.142 - timing_s/save_checkpoint:2.504 - timing_s/step:13.183 - timing_per_token_ms/adv:0.002 - timing_per_token_ms/update_actor:0.208 - timing_per_token_ms/gen:11.0
65 - perf/total_num_tokens:6864.000 - perf/time_per_step:13.183 - perf/throughput:260.329
(TaskRunner pid=2931272)
(TaskRunner pid=2931272) ================================================================================
(TaskRunner pid=2931272) ============================ Processing New Sample =============================
(TaskRunner pid=2931272) [Warnning] Failed to locate model response header
(TaskRunner pid=2931272)
```

The LoRA adapter is saved together with the checkpoint; screenshot below:
<img width="831" alt="image"
src="https://github.com/user-attachments/assets/5b8b2283-decc-499a-b08c-62dcaa961c9f"
/>


The reward@worst curve after a small amount of training:
<img width="511" alt="image"
src="https://github.com/user-attachments/assets/d3253782-50b8-4f42-b203-38a09685dc24"
/>

---------

Co-authored-by: Stephen Xie <stephenx@berkeley.edu>
Co-authored-by: Tony Lian <longlian@berkeley.edu>
Co-authored-by: Jiayi Pan <jiayipan@berkeley.edu>
Co-authored-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-05-28 10:53:47 -07:00
8298f7d267 [Bugfix] Fix for non_fused_kernels passing arguments (#1687)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

An argument-passing error in the non-fused-kernels path caused
Qwen2_5_VL to fail.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
2025-05-26 22:09:49 +08:00
96c181a2e6 chore(ci): support FSDP2 for multi-turn SGLangRollout with tool calling (#1650) 2025-05-23 22:52:04 +08:00
7b0426a738 [Docker Image] update images and fix sglang installation (#1606)
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Update images and fix the SGLang installation. The latest image is:
`whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6-mcore0.12.0-te2.3`

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- vLLM: 0.8.5.post1
- SGLang: 0.4.6.post4, fix installation
- Megatron: core_v0.12.0 announcement
- TransformerEngine: 2.3

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-21 09:13:51 +08:00
8160ec6a58 Bump to sglang 0.4.6.post4 & unify `generate_sequences` behavior between sgl and sgl async (#1577)
### Checklist Before Starting

- [x] Search for similar PR(s).
- Thanks to:
  - close #1558 due to mix of prs
  - close #1449 due to partial fix sgl new version issue
  - close #1300 which is part of current pr
- This pr is co-authored with @ocss884 

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

- bump sglang to 0.4.6.post4
- unify the sglang and sglang_async `generate_sequences` API behavior,
e.g. image support
- fix the CUDA barrier warning at the start of fsdp_workers

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: ocss884 <ocss.lin@gmail.com>
2025-05-20 09:39:07 +08:00
b8bd596811 [Docker Image] use latest vLLM (0.8.5) to fully support Qwen3 moe (#1544) 2025-05-17 07:28:55 +08:00
3f4647f9bc [model merger] refactor model merger for better usage and maintainability (#1468)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR refactors `model_merge`, making the code cleaner and more
maintainable:

- The verl checkpoint manager now saves the model config and
processor/tokenizer (introduced in
https://github.com/volcengine/verl/pull/1288), so `hf_model_path` is no
longer needed. This PR deprecates the argument and keeps it for
backward compatibility.
- The current `model_merge` serves two purposes: merging checkpoints
and testing checkpoints (mainly for CI). This PR separates them into
two sub-commands to better manage user input arguments for an improved
user experience.
- Generally cleans up the code and makes it more readable.
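
The merge/test split can be pictured with a minimal argparse sketch; the sub-command and flag names below are illustrative, not necessarily verl's actual CLI:

```python
import argparse

def build_parser():
    # Illustrative two-sub-command layout (names are hypothetical).
    parser = argparse.ArgumentParser(prog="model_merger")
    sub = parser.add_subparsers(dest="command", required=True)

    merge = sub.add_parser("merge", help="merge sharded checkpoints into one model")
    merge.add_argument("--local_dir", required=True)

    test = sub.add_parser("test", help="compare a merged model against a reference (CI)")
    test.add_argument("--local_dir", required=True)
    test.add_argument("--ref_model_path", required=True)
    return parser

args = build_parser().parse_args(["merge", "--local_dir", "ckpts/global_step_10"])
print(args.command, args.local_dir)  # merge ckpts/global_step_10
```

Keeping the CI-only flags under `test` means `merge` users never see arguments that do not apply to them.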

### Test
Our current CI hasn't tested DDP+FSDP e2e training. This PR also adds
DDP+FSDP e2e into CI and tests merging DDP+FSDP checkpoints.

The current CI should test this PR correctly.


### Additional Info.

- **Training**: both
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-16 23:53:08 +08:00
43782a24bd [Doc/Docker Image] Update mcore image to use vLLM which supports qwen3 and rewrite installation from conda (#1505)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Update the mcore image to use a vLLM version that supports Qwen3, and
rewrite the conda installation instructions.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

Docker image and docs

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: both
- **Inference**: both

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-14 14:40:13 +08:00
ba6a2e0bb5 [FSDPCheckpointManager] feat: save huggingface model when 'hf_model' in checkpoint_contents (#1288)
Previously, `FSDPCheckpointManager` did not save the HF model when
`hf_model` was given in `checkpoint_contents`; it only saved the HF
model's config.

This PR correctly saves the Hugging Face model when `hf_model` is in
`checkpoint_contents`.
2025-05-07 20:44:46 +08:00
8bb009bf47 [CI] feat: separate FSDP2 test & fix: CI trigger (#1389)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

1. Separate the FSDP2 test to avoid blocking other tests.
2. Fix the CI trigger rule to avoid redundant runs (the original rule
triggered unrelated tests; the fix follows [the
doc](https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#onpushpull_requestpull_request_targetpathspaths-ignore))

### Test

For 2, I tested by commenting out the matching path for the workflow
`.yml` files, and verified that only related workflows are triggered:

Before: <img width="870" alt="image"
src="https://github.com/user-attachments/assets/2f7dbe0c-f638-4a75-8cbc-a364081271fc"
/>

After: <img width="869" alt="image"
src="https://github.com/user-attachments/assets/f5a35d85-f03c-452e-abed-3ca3ce22d699"
/>

### Additional Info.

- **Issue Number**: https://github.com/volcengine/verl/issues/1388
- **Training**: FSDP
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
2025-05-05 07:20:35 -07:00
ec6843c604 [sglang] Upgrade sglang to 0.4.6.post1 & misc fixes (#1385)
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?
- [x] upgrade required sglang version to 0.4.6.post1, which supports
Qwen3
- [x] fix: `flush_cache` was never awaited
- [x] remove unused env
- [x] fix: add the rank number to the port so SGLang does not pick the
same port on every rank when `random.seed` is set
- [x] feat: disable the SGLang memory imbalance check by default
https://github.com/sgl-project/sglang/pull/5426
- [x] update setup.py so that older pip versions can resolve deps
- [x] fix: tools_kwargs length mismatch with batch #1380
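
The port fix can be sketched as follows, assuming (as an illustration, not verl's exact code) that each rank offsets a shared base port by its rank number:

```python
import random

def pick_server_port(base_port, rank):
    # With a fixed global seed, every rank draws the same "random"
    # offset; adding the rank keeps ports distinct per process.
    # (Illustrative only.)
    random.seed(42)  # stands in for the seed set globally during training
    drawn = random.randint(0, 100)  # identical on every rank
    return base_port + drawn + rank

ports = [pick_server_port(30000, r) for r in range(4)]
print(len(set(ports)))  # 4 distinct ports despite the shared seed
```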

> Add one-line overview of what this PR aims to achieve or accomplish. 

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
2025-05-04 11:53:21 -07:00
e0d035cd4a [sglang] feat: Add SGLang async multi-turn rollout with tool support (#1037)
A redesigned version of #917 

## Current Status
[Develop log &
Tracker](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/113)

**What Has Been Done**
- Async Rollout Refactoring: integrate with the tool server to
coordinate tool calls during generation, leveraging request IDs for
state and progress tracking; support async multi-turn conversations in
Agentic RL training (with tool support).
- Async Request Management: encapsulate rollout requests into a unified
structure, enabling efficient tracking and handling of concurrent
multi-turn dialogues with chatml-style messages.
- Extensible Tools: a modular design for adapting tools in the
OpenAIFunctionTool format, which is supported by both SGLang and vLLM;
each tool creates a separate instance, executes on tool calls, computes
a score from the tool/environment state, and releases its resources.
- Multi-turn support has been implemented for the GSM8K task (a new
version is in progress). However, training has not yet converged, and
we hope the community will help investigate the issue.
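
The tool lifecycle described above (create a separate instance per request, execute on tool call, compute a score from the tool/environment state, release resources) can be sketched with hypothetical class and method names:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class ToolInstance:
    # Per-request state, keyed by the rollout request ID.
    request_id: str
    state: dict = field(default_factory=dict)

class EchoTool:
    # Hypothetical tool following the create/execute/score/release
    # lifecycle; not verl's actual tool classes.
    async def create(self, request_id):
        return ToolInstance(request_id=request_id)

    async def execute(self, inst, args):
        # Run the tool call and track progress in per-request state.
        inst.state["last_args"] = args
        return f"echo: {args}"

    def calc_score(self, inst):
        # Score derived from the tool/environment state.
        return 1.0 if "last_args" in inst.state else 0.0

    async def release(self, inst):
        inst.state.clear()

async def demo():
    tool = EchoTool()
    inst = await tool.create("req-0")
    await tool.execute(inst, {"expr": "1+1"})
    score = tool.calc_score(inst)
    await tool.release(inst)
    return score

print(asyncio.run(demo()))  # 1.0
```

Keeping state on the instance (rather than the tool) is what lets many concurrent multi-turn requests share one tool definition.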

**What Is WIP**
- [x] Merge loss mask to training process from last version
- [x] Add more user friendly tool config and e2e tests for gsm8k with
tool training
- [ ] We are going to validate our multiturn feature in open-source
sandbox environments.

## Key Features will be introduced in future version

- Integrate a Ray-based agent trainer to enable explicit separation of
the rollout and training pipeline. Provide support for partial rollout
handling and fine-grained request state management.
- Extend the framework to support simulated user interactions (e.g.,
roleplay, interactive feedback) and more complex environment-in-the-loop
RL tasks.

**Future Plan**
[Discussion
Thread](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/74#issuecomment-2763192625)
[RFC
doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md)
will be updated soon.

## Contributors & Acknowledgement

- Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com)
@SwordFaith (Design RFC & core-dev of refactor part)
- Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com)
@zyzshishui (Core-dev)
- Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com)
@zhaochenyang20 (PM)
- Guanhua Wang @WANG-GH 
- Junrong Lin @ocss884 (verl-sglang support)
- Hanchen Zhang
[zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com)
- Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com)
- Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com)
- Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com)
- Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com)
- Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu)
- Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn)
@zh-zheng

---------

Co-authored-by: zyzshishui <492129152@qq.com>
Co-authored-by: guanhua <281484683@qq.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: ocss884 <ocss.lin@gmail.com>
Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
Co-authored-by: HL <linhaibin.eric@gmail.com>
2025-04-29 13:20:06 -07:00
0234d8e3ab fix reward model and add CI test (#1252)
Fix bugs related to #1165.

The Megatron backend reward model had no CI test; this adds one to the
current PPO trainer.

Fix `micro_batch_size_per_gpu`, though it is unclear whether this is
correct for the reward config.

The output format is also incorrect with the current
`forward_micro_batch` implementation.
2025-04-29 21:20:21 +08:00
f9dae2bb11 [CI] feat: only check changed files (#1294) 2025-04-28 20:24:41 +08:00
8e5ad4688a [Lint] fix: linting errors in all files (#1280)
This PR enables checking on all files after fixing all the errors:

```
examples/data_preprocess/geo3k.py:41:121: E501 Line too long (121 > 120)
examples/data_preprocess/multiturn.py:54:121: E501 Line too long (185 > 120)
examples/data_preprocess/multiturn.py:59:121: E501 Line too long (210 > 120)
examples/data_preprocess/multiturn.py:73:121: E501 Line too long (229 > 120)
examples/data_preprocess/multiturn.py:78:121: E501 Line too long (211 > 120)
examples/ray/tutorial.ipynb:cell 9:1:121: E501 Line too long (179 > 120)
examples/ray/tutorial.ipynb:cell 15:1:121: E501 Line too long (143 > 120)
examples/ray/tutorial.ipynb:cell 42:14:1: E402 Module level import not at top of cell
recipe/prime/prime_dp_rm.py:145:121: E501 Line too long (153 > 120)
recipe/prime/prime_dp_rm.py:156:121: E501 Line too long (137 > 120)
recipe/prime/prime_dp_rm.py:292:121: E501 Line too long (148 > 120)
recipe/r1/data_process.py:56:121: E501 Line too long (289 > 120)
recipe/r1/data_process.py:113:121: E501 Line too long (166 > 120)
recipe/r1/data_process.py:118:121: E501 Line too long (137 > 120)
recipe/r1/data_process.py:123:121: E501 Line too long (297 > 120)
recipe/r1/data_process.py:131:9: E722 Do not use bare `except`
recipe/r1/tasks/livecodebench.py:61:5: E722 Do not use bare `except`
scripts/diagnose.py:55:9: F841 Local variable `ip` is assigned to but never used
scripts/diagnose.py:165:13: B028 No explicit `stacklevel` keyword argument found
scripts/model_merger.py:42:121: E501 Line too long (184 > 120)
scripts/model_merger.py:146:13: E722 Do not use bare `except`
tests/e2e/arithmetic_sequence/model/create_model_tokenizer.py:28:121: E501 Line too long (440 > 120)
tests/gpu_utility/test_memory_buffers.py:42:5: F841 Local variable `model_named_params` is assigned to but never used
tests/gpu_utility/test_memory_buffers.py:43:5: F841 Local variable `model_copy_named_params` is assigned to but never used
tests/gpu_utility/test_memory_buffers.py:53:5: F841 Local variable `model_wrapper` is assigned to but never used
tests/model/test_transformers_ulysses.py:102:5: F841 Local variable `response_length` is assigned to but never used
tests/model/test_transformers_ulysses.py:181:5: F841 Local variable `response_length` is assigned to but never used
tests/ray/detached_worker/server.py:83:13: F841 Local variable `vpp_rank` is assigned to but never used
tests/ray/test_check_worker_alive.py:37:121: E501 Line too long (121 > 120)
tests/rollout/run_fsdp_vllm.py:22:64: F811 Redefinition of unused `ShardingStrategy` from line 20
tests/rollout/test_sglang_spmd.py:210:121: E501 Line too long (157 > 120)
tests/rollout/test_vllm_spmd.py:20:64: F811 Redefinition of unused `ShardingStrategy` from line 18
tests/sandbox/test_sandbox.py:86:121: E501 Line too long (1615 > 120)
tests/sandbox/test_sandbox.py:87:121: E501 Line too long (1596 > 120)
tests/sanity/check_license.py:22:1: E402 Module level import not at top of file
tests/sanity/check_license.py:23:1: E402 Module level import not at top of file
tests/verl/utils/dataset/test_rl_dataset.py:23:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_rm_dataset.py:22:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_rm_dataset.py:36:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
tests/verl/utils/dataset/test_sft_dataset.py:22:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_sft_dataset.py:50:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
tests/verl/utils/dataset/test_sft_dataset.py:75:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/__init__.py:22:1: E402 Module level import not at top of file
verl/__init__.py:24:1: E402 Module level import not at top of file
verl/__init__.py:25:1: E402 Module level import not at top of file
verl/__init__.py:29:1: E402 Module level import not at top of file
verl/__init__.py:29:15: F401 `.single_controller` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:16:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:18:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:20:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:21:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:22:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:24:5: F401 `.modeling_llama_megatron.ParallelLlamaModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/checkpoint_utils/llama_loader.py:92:121: E501 Line too long (168 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:92:121: E501 Line too long (168 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:274:121: E501 Line too long (127 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelLlamaAttention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelLlamaDecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelLlamaDecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelLlamaMLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelLlamaRMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/parallel_attention.py:196:121: E501 Line too long (134 > 120)
verl/models/llama/megatron/layers/parallel_attention.py:341:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:342:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:343:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:366:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:420:121: E501 Line too long (122 > 120)
verl/models/llama/megatron/layers/parallel_linear.py:82:1: E402 Module level import not at top of file
verl/models/mcore/loader.py:273:121: E501 Line too long (134 > 120)
verl/models/mcore/util.py:26:121: E501 Line too long (202 > 120)
verl/models/qwen2/megatron/__init__.py:16:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:18:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:20:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:21:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:22:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:24:5: F401 `.modeling_qwen2_megatron.ParallelQwen2Model` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:90:121: E501 Line too long (169 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:256:121: E501 Line too long (172 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:90:121: E501 Line too long (169 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:272:121: E501 Line too long (127 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelQwen2Attention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelQwen2DecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelQwen2DecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelQwen2MLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelQwen2RMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/parallel_attention.py:163:121: E501 Line too long (134 > 120)
verl/models/qwen2/megatron/layers/parallel_attention.py:282:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:283:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:284:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:307:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:361:121: E501 Line too long (122 > 120)
verl/models/qwen2/megatron/modeling_qwen2_megatron.py:630:121: E501 Line too long (130 > 120)
verl/models/transformers/llama.py:106:121: E501 Line too long (180 > 120)
verl/models/transformers/llama.py:214:121: E501 Line too long (128 > 120)
verl/models/transformers/llama.py:215:121: E501 Line too long (135 > 120)
verl/models/transformers/monkey_patch.py:145:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:146:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:148:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:157:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/models/transformers/qwen2.py:215:121: E501 Line too long (128 > 120)
verl/models/transformers/qwen2.py:216:121: E501 Line too long (135 > 120)
verl/protocol.py:303:121: E501 Line too long (125 > 120)
verl/protocol.py:352:121: E501 Line too long (171 > 120)
verl/protocol.py:578:121: E501 Line too long (142 > 120)
verl/protocol.py:580:121: E501 Line too long (150 > 120)
verl/protocol.py:583:121: E501 Line too long (167 > 120)
verl/protocol.py:715:1: E402 Module level import not at top of file
verl/protocol.py:725:121: E501 Line too long (121 > 120)
verl/protocol.py:766:1: E402 Module level import not at top of file
verl/protocol.py:768:1: E402 Module level import not at top of file
verl/single_controller/__init__.py:23:1: E402 Module level import not at top of file
verl/single_controller/__init__.py:24:1: E402 Module level import not at top of file
verl/single_controller/base/decorator.py:149:16: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/single_controller/base/decorator.py:198:121: E501 Line too long (134 > 120)
verl/single_controller/base/decorator.py:310:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/single_controller/base/worker.py:137:121: E501 Line too long (131 > 120)
verl/single_controller/base/worker_group.py:89:33: G003 Logging statement uses `+`
verl/single_controller/base/worker_group.py:202:21: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/single_controller/ray/__init__.py:15:19: F401 `.base.RayClassWithInitArgs` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:41: F401 `.base.RayResourcePool` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:58: F401 `.base.RayWorkerGroup` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:74: F401 `.base.create_colocated_worker_cls` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/third_party/sglang/parallel_state.py:135:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/__init__.py:40:40: F401 `.vllm_v_0_6_3.llm.LLMEngine` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/third_party/vllm/__init__.py:45:22: F401 `vllm.LLM` imported but unused
verl/third_party/vllm/__init__.py:46:34: F401 `vllm.distributed.parallel_state` imported but unused
verl/third_party/vllm/__init__.py:50:121: E501 Line too long (141 > 120)
verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py:189:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_5_4/llm.py:136:121: E501 Line too long (132 > 120)
verl/third_party/vllm/vllm_v_0_5_4/llm.py:196:121: E501 Line too long (161 > 120)
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:174:5: F811 Redefinition of unused `llama_megatron_core_te_weight_loader` from line 90
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:205:5: F811 Redefinition of unused `llama_megatron_core_weight_loader` from line 121
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:254:121: E501 Line too long (150 > 120)
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:21: F811 Redefinition of unused `LoadConfig` from line 24
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:45: F811 Redefinition of unused `ModelConfig` from line 26
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:323:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:127:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:245:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120)
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120)
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/worker.py:220:121: E501 Line too long (127 > 120)
verl/third_party/vllm/vllm_v_0_6_3/config.py:46:92: B026 Star-arg unpacking after a keyword argument is strongly discouraged
verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py:225:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_6_3/llm.py:141:121: E501 Line too long (132 > 120)
verl/third_party/vllm/vllm_v_0_6_3/llm.py:169:121: E501 Line too long (161 > 120)
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:52:24: F811 Redefinition of unused `EngineArgs` from line 35
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:21: F811 Redefinition of unused `LoadConfig` from line 25
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:33: F811 Redefinition of unused `ModelConfig` from line 27
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:354:9: F841 Local variable `distributed_executor_backend` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:360:121: E501 Line too long (152 > 120)
verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:199:5: F841 Local variable `params_mapping` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:229:121: E501 Line too long (150 > 120)
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:21: F811 Redefinition of unused `LoadConfig` from line 22
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:45: F811 Redefinition of unused `ModelConfig` from line 22
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:312:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:21: F811 Redefinition of unused `LoadConfig` from line 27
verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:33: F811 Redefinition of unused `ModelConfig` from line 29
verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:129:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:247:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120)
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120)
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/worker.py:217:121: E501 Line too long (127 > 120)
verl/trainer/fsdp_sft_trainer.py:298:121: E501 Line too long (158 > 120)
verl/trainer/fsdp_sft_trainer.py:501:121: E501 Line too long (121 > 120)
verl/trainer/fsdp_sft_trainer.py:550:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:551:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:553:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:553:43: F811 Redefinition of unused `FSDPSFTTrainer` from line 82
verl/trainer/fsdp_sft_trainer.py:554:1: E402 Module level import not at top of file
verl/utils/__init__.py:16:24: F401 `.tokenizer.hf_processor` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/__init__.py:16:38: F401 `.tokenizer.hf_tokenizer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/checkpoint/checkpoint_manager.py:48:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/fsdp_checkpoint_manager.py:51:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/fsdp_checkpoint_manager.py:56:13: B028 No explicit `stacklevel` keyword argument found
verl/utils/checkpoint/fsdp_checkpoint_manager.py:81:121: E501 Line too long (121 > 120)
verl/utils/checkpoint/fsdp_checkpoint_manager.py:98:121: E501 Line too long (124 > 120)
verl/utils/checkpoint/megatron_checkpoint_manager.py:64:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/megatron_checkpoint_manager.py:219:121: E501 Line too long (124 > 120)
verl/utils/dataset/__init__.py:15:25: F401 `.rl_dataset.RLHFDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/__init__.py:16:25: F401 `.rm_dataset.RMDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/__init__.py:17:26: F401 `.sft_dataset.SFTDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/multiturn_sft_dataset.py:96:9: F841 Local variable `current_length` is assigned to but never used
verl/utils/dataset/sft_dataset.py:95:79: B023 Function definition does not bind loop variable `key`
verl/utils/dataset/sft_dataset.py:103:83: B023 Function definition does not bind loop variable `key`
verl/utils/debug/__init__.py:15:26: F401 `.performance.GPUMemoryLogger` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/debug/__init__.py:15:43: F401 `.performance.log_gpu_memory_usage` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/debug/performance.py:68:121: E501 Line too long (127 > 120)
verl/utils/debug/performance.py:71:121: E501 Line too long (126 > 120)
verl/utils/debug/profile.py:15:1: I001 [*] Import block is un-sorted or un-formatted
verl/utils/debug/profile.py:19:15: UP039 [*] Unnecessary parentheses after class definition
verl/utils/debug/profile.py:50:23: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:52:49: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:53:47: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:54:67: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:54:121: E501 Line too long (122 > 120)
verl/utils/flops_counter.py:175:121: E501 Line too long (124 > 120)
verl/utils/hdfs_io.py:135:32: G004 Logging statement uses f-string
verl/utils/import_utils.py:78:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/utils/logger/aggregate_logger.py:46:121: E501 Line too long (131 > 120)
verl/utils/logger/aggregate_logger.py:64:41: G004 Logging statement uses f-string
verl/utils/megatron/tensor_parallel.py:152:121: E501 Line too long (123 > 120)
verl/utils/megatron_utils.py:17:1: I001 [*] Import block is un-sorted or un-formatted
verl/utils/megatron_utils.py:22:20: F401 [*] `torch.nn` imported but unused
verl/utils/megatron_utils.py:34:38: F401 [*] `verl.utils.memory_buffer.build_memory_reference_from_module` imported but unused
verl/utils/megatron_utils.py:332:30: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access.
verl/utils/megatron_utils.py:366:27: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access.
verl/utils/model.py:464:121: E501 Line too long (124 > 120)
verl/utils/rendezvous/ray_backend.py:39:25: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:41:22: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:63:30: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:65:30: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:72:26: G004 Logging statement uses f-string
verl/utils/reward_score/gsm8k.py:47:121: E501 Line too long (201 > 120)
verl/utils/reward_score/math.py:213:121: E501 Line too long (142 > 120)
verl/utils/reward_score/prime_code/__init__.py:16:8: F401 `re` imported but unused
verl/utils/reward_score/prime_code/testing_util.py:131:121: E501 Line too long (688 > 120)
verl/utils/reward_score/prime_code/testing_util.py:168:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:222:9: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:254:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:255:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:259:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:260:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:264:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:265:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:269:121: E501 Line too long (132 > 120)
verl/utils/reward_score/prime_code/testing_util.py:293:21: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:294:25: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:335:121: E501 Line too long (165 > 120)
verl/utils/reward_score/prime_code/testing_util.py:386:121: E501 Line too long (209 > 120)
verl/utils/reward_score/prime_code/testing_util.py:390:121: E501 Line too long (183 > 120)
verl/utils/reward_score/prime_code/testing_util.py:455:121: E501 Line too long (211 > 120)
verl/utils/reward_score/prime_code/testing_util.py:459:121: E501 Line too long (185 > 120)
verl/utils/reward_score/prime_code/testing_util.py:582:121: E501 Line too long (197 > 120)
verl/utils/reward_score/prime_code/testing_util.py:586:121: E501 Line too long (171 > 120)
verl/utils/reward_score/prime_math/__init__.py:106:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:119:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:246:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:315:121: E501 Line too long (128 > 120)
verl/utils/reward_score/prime_math/__init__.py:331:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:407:1: E402 Module level import not at top of file
verl/utils/reward_score/prime_math/__init__.py:429:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading
verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading
verl/utils/reward_score/prime_math/math_normalize.py:54:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:70:17: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:101:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:181:121: E501 Line too long (142 > 120)
verl/utils/tokenizer.py:30:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/tokenizer.py:33:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/tokenizer.py:55:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/torch_functional.py:86:72: E741 Ambiguous variable name: `l`
verl/utils/torch_functional.py:177:5: F841 Local variable `total_params` is assigned to but never used
verl/utils/torch_functional.py:397:1: E402 Module level import not at top of file
verl/utils/torch_functional.py:399:1: E402 Module level import not at top of file
verl/utils/torch_functional.py:400:1: E402 Module level import not at top of file
verl/utils/ulysses.py:246:5: F841 Local variable `sp_size` is assigned to but never used
verl/workers/actor/dp_actor.py:244:13: F841 Local variable `response_mask` is assigned to but never used
verl/workers/actor/megatron_actor.py:22:1: I001 [*] Import block is un-sorted or un-formatted
verl/workers/actor/megatron_actor.py:85:121: E501 Line too long (122 > 120)
verl/workers/actor/megatron_actor.py:86:121: E501 Line too long (128 > 120)
verl/workers/actor/megatron_actor.py:89:121: E501 Line too long (133 > 120)
verl/workers/actor/megatron_actor.py:96:121: E501 Line too long (126 > 120)
verl/workers/actor/megatron_actor.py:175:121: E501 Line too long (135 > 120)
verl/workers/actor/megatron_actor.py:237:121: E501 Line too long (150 > 120)
verl/workers/actor/megatron_actor.py:243:121: E501 Line too long (144 > 120)
verl/workers/actor/megatron_actor.py:245:121: E501 Line too long (130 > 120)
verl/workers/actor/megatron_actor.py:247:121: E501 Line too long (122 > 120)
verl/workers/actor/megatron_actor.py:286:9: F841 Local variable `input_shapes` is assigned to but never used
verl/workers/critic/dp_critic.py:227:21: F841 Local variable `input_ids` is assigned to but never used
verl/workers/critic/dp_critic.py:230:21: F841 Local variable `position_ids` is assigned to but never used
verl/workers/megatron_workers.py:18:1: I001 [*] Import block is un-sorted or un-formatted
verl/workers/reward_manager/__init__.py:15:20: F401 `.batch.BatchRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:16:19: F401 `.dapo.DAPORewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:17:20: F401 `.naive.NaiveRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:18:20: F401 `.prime.PrimeRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/prime.py:61:121: E501 Line too long (217 > 120)
verl/workers/reward_model/__init__.py:15:19: F401 `.base.BasePPORewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_model/megatron/__init__.py:15:27: F401 `.reward_model.MegatronRewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_model/megatron/reward_model.py:65:9: F841 Local variable `ori_bs` is assigned to but never used
verl/workers/reward_model/megatron/reward_model.py:89:121: E501 Line too long (132 > 120)
verl/workers/reward_model/megatron/reward_model.py:215:9: F841 Local variable `input_shapes` is assigned to but never used
verl/workers/rollout/naive/__init__.py:15:28: F401 `.naive_rollout.NaiveRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/rollout/sglang_rollout/__init__.py:14:29: F401 `.sglang_rollout.SGLangRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:22:121: E501 Line too long (129 > 120)
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:51:121: E501 Line too long (157 > 120)
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:153:13: F841 Local variable `log_probs` is assigned to but never used
verl/workers/rollout/vllm_rollout/vllm_rollout.py:22:121: E501 Line too long (129 > 120)
verl/workers/rollout/vllm_rollout/vllm_rollout.py:60:121: E501 Line too long (157 > 120)
verl/workers/sharding_manager/__init__.py:16:5: F401 `verl.utils.import_utils.is_megatron_core_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:17:5: F401 `verl.utils.import_utils.is_sglang_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:21:19: F401 `.base.BaseShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:22:27: F401 `.fsdp_ulysses.FSDPUlyssesShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:29:121: E501 Line too long (149 > 120)
verl/workers/sharding_manager/__init__.py:32:121: E501 Line too long (126 > 120)
verl/workers/sharding_manager/fsdp_sglang.py:99:9: F841 Local variable `load_format` is assigned to but never used
verl/workers/sharding_manager/fsdp_sglang.py:123:121: E501 Line too long (178 > 120)
verl/workers/sharding_manager/fsdp_ulysses.py:59:13: F841 Local variable `sp_size` is assigned to but never used
Found 305 errors.
```

---------

Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
2025-04-27 15:24:30 -07:00
fbb93e44b1 [CI] feat: only test for push to main (#1271) 2025-04-27 09:51:09 +08:00
c71b24d2e9 [SGLang] feat: upgrade to 0.4.5.post3 & fix ipv6 (#1203)
The ipv6 part is picked from
https://github.com/volcengine/verl/pull/1184 cc @BearBiscuit05

---------

Co-authored-by: BearBiscuit05 <xiangyongan@bytedance.com>
Co-authored-by: Gelee-Q <leege233@gmail.com>
2025-04-24 18:23:53 -07:00
a35c044627 Migrate to new image with FlashInfer 0.2.2 + vLLM 0.8.3 + SGLang 0.4.5 + MCore 0.12.0 + TE 2.2 + cuDNN 9.8.0 (#1237)
Since both are supported, we now let TE choose the attention backend.

New Image:
`whatcanyousee/verl:ngc-cu124-vllm0.8.3-sglang0.4.5-mcore0.12.0-te2.2`
2025-04-24 16:14:48 +08:00
HL
5313d96f9b [CI] fix: add additional pre-commit test before ppo trainer tests (#1175) 2025-04-20 11:16:19 -07:00
HL
568239fb38 CI: limit ruff checks and enable push tests (#1157) 2025-04-19 13:54:45 +08:00
588404196b doc: upgrade to vllm 0.8.3 (#1081)
## What does this PR do?

- Upgrade docker image to vllm 0.8.3 to avoid memory leakage
- Add wake up tags to megatron rollout worker

## Who can review?

@vermouth1992 @BearBiscuit05 @ETOgaosion
2025-04-16 01:11:52 +08:00
4ec9974735 CI: add vlm CI for sglang rollout (#1088)
As titled.
2025-04-14 20:21:29 -07:00
5ba1dbc606 [ci] feat: improve CI speed to 1-2min per test (#1032)
### Summary

#### Minimize Test Workloads

This PR minimizes the test workloads while keeping them meaningful,
reducing the time cost of a test from >10 min to 1~2 min. Specifically,
we

1. set batch sizes and steps as small but still meaningful numbers:

```bash
train_traj_micro_bsz_per_gpu=2 # b
n_resp_per_prompt=4 # g

train_traj_micro_bsz=$((train_traj_micro_bsz_per_gpu * NUM_GPUS)) # b * n
train_traj_mini_bsz=$((train_traj_micro_bsz * 2)) # 2 * b * n
train_prompt_mini_bsz=$((train_traj_mini_bsz / n_resp_per_prompt)) # 2 * b * n / g
train_prompt_bsz=$((train_prompt_mini_bsz * 2)) # 4 * b * n / g
# ...
TOT_TRAIN_STEPS=${TOT_TRAIN_STEPS:-1}
```

2. disable validation (this costs a lot!) / saving / resuming for
training tests by default and leave them to specialized tests

```bash
# Validation
VAL_BEFORE_TRAIN=${VAL_BEFORE_TRAIN:-False}
TEST_FREQ=${TEST_FREQ:--1}
# Save & Resume
RESUME_MODE=${RESUME_MODE:-disable}
SAVE_FREQ=${SAVE_FREQ:--1}
```
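For reference, the batch-size arithmetic in step 1 can be sketched in plain Python (hypothetical values; following the `# ... / g` comments, which convert trajectory counts back into prompt counts):

```python
NUM_GPUS = 8
b = 2  # train_traj_micro_bsz_per_gpu
g = 4  # n_resp_per_prompt

train_traj_micro_bsz = b * NUM_GPUS               # b * n trajectories
train_traj_mini_bsz = train_traj_micro_bsz * 2    # 2 * b * n
train_prompt_mini_bsz = train_traj_mini_bsz // g  # 2 * b * n / g prompts
train_prompt_bsz = train_prompt_mini_bsz * 2      # 4 * b * n / g
print(train_prompt_bsz)  # 16 with these values
```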

#### Improve Triggering Mode

This PR introduces a more comprehensive triggering logic.
Specifically, we

1. consider all Python code by default
2. include related entrypoints (the workflow config, scripts used by it
and hydra config, etc.)
3. exclude unrelated Python code from other components (e.g., recipes,
examples, Megatron, SFT, generation, evaluation, etc. for FSDP training)

An example from `e2e_ppo_trainer`:

```yaml
on:
    paths:
      - "**/*.py"
      # Entrypoints
      - ".github/workflows/e2e_ppo_trainer.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "examples/data_preprocess/geo3k.py"
      - "tests/e2e/ppo_trainer"
      - "verl/trainer/main_ppo.py"
      - "verl/trainer/config/ppo_trainer.yaml"
      - "!examples"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Recipes
      - "!recipe"
      # Megatron
      - "!verl/workers/**/megatron_*.py"
```

#### Avoid missing errors

Some test scripts didn't end with the main Python command, so a failure
in it could go unnoticed.

To address this issue, this PR introduces the following options:

```bash
set -xeuo pipefail
```

These options mean:

- `x`: Print each command before executing it (useful for debugging)
- `e`: Exit immediately if any command fails (returns non-zero exit
status)
- `u`: Treat unset variables as an error
- `o pipefail`: Make a pipeline return the exit status of the rightmost
command that failed, or zero if all succeeded

Together, these options make the script fail fast and produce verbose
output, which helps with debugging and ensures the script doesn't
continue after an error.
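A minimal standalone demonstration of the `pipefail` behavior (not part of the PR itself):

```shell
# A pipeline's exit status defaults to that of its LAST command, so an
# early failure can be silently masked; `pipefail` surfaces it.
bash -c 'false | true'
echo "default: $?"     # prints "default: 0" -- the failure of `false` is hidden

bash -c 'set -o pipefail; false | true'
echo "pipefail: $?"    # prints "pipefail: 1" -- the failure propagates
```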

#### Others

Besides, we also

1. unify runner labels into `"L20x8"` to enable preemptive scheduling of
jobs
2. consolidate test scripts that differ only minimally, grouped by
entrypoint (e.g. `ppo_trainer`, `ppo_megatron_trainer`, recipes, etc.),
into a base script with options
2025-04-14 09:48:10 -07:00