### What does this PR do?
- As title
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results such as training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
### What does this PR do?
This is the first part of supporting the vLLM/SGLang native HTTP server in
server-mode rollout. In native HTTP server mode, the inference services
are launched separately from the training engine; the model runner can
share GPUs with the training engine, but it runs in separate processes.
We're going to support three deployment modes:
- **hybrid mode**: The training engine and the model runner share GPUs but
run in different processes. To sync weights, a server adapter in the
training process acts as an HTTP client that sends
`wake_up`/`sleep`/`update_weights` requests to the inference server. This
is used for on-policy training.
- **standalone mode**: The training engine and the inference services have
separate GPU resources (a disaggregated architecture). This is used for
off-policy training.
- **colocated mode**: Like hybrid mode, but without a server adapter, since
there is no need to sync weights. This is mainly used for GRM services
(LLM as a judge).
<img width="2644" height="1276" alt="image"
src="https://github.com/user-attachments/assets/2c1adf2d-adb5-4563-8a1a-8948f93b09b7"
/>
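The hybrid-mode server adapter described above is, at its core, an HTTP client in the training process. Below is a minimal sketch under stated assumptions: the class `ServerAdapter`, the URL paths (`/wake_up`, `/update_weights`, `/sleep`), and the payload shape are all hypothetical illustrations, not verl's adapter or the real vLLM/SGLang endpoints. The transport is injectable so the request sequence can be exercised without a live server.

```python
import json
from typing import Any, Callable, Dict, Optional
from urllib import request

# A transport takes (url, JSON payload) and returns the decoded JSON reply.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def http_post(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON body using only the standard library."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=60) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if body else {}


class ServerAdapter:
    """Hypothetical HTTP client living in the training process (hybrid mode)."""

    def __init__(self, base_url: str, transport: Transport = http_post) -> None:
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def _post(self, endpoint: str, payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        return self.transport(f"{self.base_url}/{endpoint}", payload or {})

    def wake_up(self) -> Dict[str, Any]:
        # Ask the inference server to reclaim its GPU memory before rollout.
        return self._post("wake_up")

    def update_weights(self, weights: Dict[str, Any]) -> Dict[str, Any]:
        # Send the server whatever it needs to load the latest policy weights.
        return self._post("update_weights", {"weights": weights})

    def sleep(self) -> Dict[str, Any]:
        # Ask the inference server to release GPU memory so training can run.
        return self._post("sleep")
```

In this sketch, one training step would call `wake_up()`, then `update_weights(...)`, run the rollout against the server, and finally call `sleep()` before the training engine resumes. Keeping the adapter a thin client is what lets the inference server run as an independent process.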
Follow-up PRs will be:
- [2/N] support DP+EP
- [3/N] standalone rollout with weight transfer by NCCL/UCX
- [4/N] colocated GRM service with wake_up/sleep (without weight
synchronization)
- [5/N] switch to the token-in-token-out `/generate` HTTP API: SGLang
already has a `/generate` API but may need some effort to support
multimodal inputs, while vLLM still lacks a `/generate` API
- [6/N] switch to the SGLang/vLLM router for better KV-cache-aware load
balancing
The native HTTP server is inspired by the design of
[slime](https://github.com/THUDM/slime); thanks to them for their prior work.
Also credit to @ChangyiYang @zhaochenyang20
https://github.com/volcengine/verl/pull/3090 @SuperCB
https://github.com/volcengine/verl/pull/3102 for their prior
contributions.
### What does this PR do?
Compute the reward score for each prompt as soon as its agent loop
finishes. This overlaps reward computation with rollouts that are still
running, which can significantly hide the reward computation time.
https://github.com/volcengine/verl/issues/2618
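The overlap idea above can be sketched with `asyncio`: each prompt's coroutine awaits its own agent loop and then scores the result immediately, instead of waiting for the whole batch. `agent_loop` and `compute_reward` are hypothetical stand-ins (sleeps in place of generation and scoring), not verl's agent loop or reward manager.

```python
import asyncio
from typing import List, Tuple


async def agent_loop(prompt_id: int, gen_seconds: float) -> str:
    # Stand-in for a multi-turn agent loop (generation plus tool calls).
    await asyncio.sleep(gen_seconds)
    return f"response-{prompt_id}"


async def compute_reward(response: str, prompt_id: int, finished: List[int]) -> float:
    # Stand-in for a reward function or reward model call.
    await asyncio.sleep(0.01)
    finished.append(prompt_id)  # Record the order in which rewards complete.
    return float(len(response))


async def run_prompt(prompt_id: int, gen_seconds: float, finished: List[int]) -> Tuple[int, float]:
    response = await agent_loop(prompt_id, gen_seconds)
    # Score right away: this overlaps with prompts that are still generating.
    reward = await compute_reward(response, prompt_id, finished)
    return prompt_id, reward


async def run_batch(gen_seconds: List[float]) -> Tuple[List[Tuple[int, float]], List[int]]:
    finished: List[int] = []
    # gather() returns results in input order even though they finish out of order.
    results = await asyncio.gather(*(run_prompt(i, s, finished) for i, s in enumerate(gen_seconds)))
    return list(results), finished


results, reward_order = asyncio.run(run_batch([0.2, 0.0, 0.1]))
print(results, reward_order)
```

Here the fast prompt (index 1) is scored while prompt 0 is still generating, so by the time the slowest rollout ends, almost all reward work is already done.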
### What does this PR do?
This PR introduces a complete training recipe for [DeepEyes:
Incentivizing "Thinking with Images" via Reinforcement
Learning](https://arxiv.org/abs/2505.14362).
The core feature is the support for multi-turn visual tools,
specifically the `ImageZoomInTool`, integrated with a custom reward
function based on the "LLM-as-a-Judge" pattern to evaluate model
performance.
Additionally, to better monitor and analyze the model's tool-use
behavior, this PR adds functionality to track tool call counts during
the training process and reports these metrics to logging systems like
wandb.
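As a rough illustration of turning per-sample tool call counts into per-step metrics, here is a minimal sketch. The function `tool_call_metrics` and the metric key names are hypothetical, not the names this PR actually logs.

```python
from statistics import mean
from typing import Dict, List


def tool_call_metrics(num_tool_calls: List[int]) -> Dict[str, float]:
    """Aggregate per-sample tool call counts into per-step scalar metrics."""
    if not num_tool_calls:
        return {}
    return {
        "tool_calls/mean": float(mean(num_tool_calls)),
        "tool_calls/max": float(max(num_tool_calls)),
        "tool_calls/min": float(min(num_tool_calls)),
        # Fraction of samples in the step that used a tool at least once.
        "tool_calls/frac_nonzero": sum(n > 0 for n in num_tool_calls) / len(num_tool_calls),
    }


# Per-sample counts for one training step (made-up numbers for illustration).
metrics = tool_call_metrics([0, 1, 3, 0])
print(metrics)
```

A flat dict of scalars like this can be handed to a logger as-is, for example `wandb.log(metrics, step=step)`.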
### API and Usage Example
The primary change is the new training recipe for DeepEyes. Users can
start a training run by using the provided configuration file.
1. Preprocess the dataset, which adds some tool-related fields to `extra_info`:
```bash
python recipe/deepeyes/deepeyes47k_preprocess.py --dataset_dir <path_to_raw_dataset> --save_dir <path_to_processed_data>
```
2. Start the PPO training:
```bash
bash recipe/deepeyes/run_deepeyes_grpo.sh
```
The training process will automatically load the ImageZoomInTool and the
custom reward function as defined in the recipe.
### Design & Code Changes
- **DeepEyes Recipe Integration**: Added a new recipe directory with
data preprocessing, tool config, and a custom reward function for
DeepEyes.
- **Visual Tool Support**: Implemented `ImageZoomInTool` with robust
bbox validation and resizing.
- **Tool Call Statistics**: Modified the rollout and metrics code to
track and log tool call counts per sample and per step.
- **Bug Fixes**: Fixed image byte handling and ensured special tokens
are preserved during decoding for tool call formatting.
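The "bbox validation and resizing" step for a zoom-in tool can be sketched as follows. This is a hypothetical illustration, not the recipe's `ImageZoomInTool` code. It assumes `(x1, y1, x2, y2)` pixel boxes and a minimum crop side of 28 pixels; that threshold is an assumption chosen for the example, not a value stated in the PR.

```python
from typing import Sequence, Tuple


def _expand_axis(lo: float, hi: float, limit: float, min_size: float) -> Tuple[float, float]:
    """Grow [lo, hi] to at least min_size around its center, staying in [0, limit]."""
    if hi - lo >= min_size:
        return lo, hi
    if limit <= min_size:
        return 0.0, limit  # The image itself is too small: use the whole axis.
    center = (lo + hi) / 2
    lo, hi = center - min_size / 2, center + min_size / 2
    # Shift the window back inside the image without shrinking it.
    if lo < 0:
        lo, hi = 0.0, min_size
    elif hi > limit:
        lo, hi = limit - min_size, limit
    return lo, hi


def validate_bbox(
    bbox: Sequence[float], width: int, height: int, min_size: int = 28
) -> Tuple[int, int, int, int]:
    """Validate an (x1, y1, x2, y2) box, clamp it to the image, and enforce a minimum size."""
    if len(bbox) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(bbox)}")
    x1, y1, x2, y2 = (float(v) for v in bbox)
    # Clamp every coordinate into the image.
    x1, x2 = min(max(x1, 0.0), width), min(max(x2, 0.0), width)
    y1, y2 = min(max(y1, 0.0), height), min(max(y2, 0.0), height)
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"empty or inverted bbox after clamping: {list(bbox)}")
    x1, x2 = _expand_axis(x1, x2, width, min_size)
    y1, y2 = _expand_axis(y1, y2, height, min_size)
    return round(x1), round(y1), round(x2), round(y2)
```

The returned tuple has the shape Pillow's `Image.crop((left, upper, right, lower))` expects. Expanding around the center rather than rejecting tiny boxes keeps the model's intent while guaranteeing a usable crop.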
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
---------
Co-authored-by: Maxwell-Jia <mr.minghui.jia@gamil.com>
Co-authored-by: xieck13 <xieck13@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
### What does this PR do?
feat: Upgrade sglang 0.4.9 + transformers 4.53.2
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results such as training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
### What does this PR do?
[docker] feat: upgrade to torch 2.7, sglang 0.4.8
Stage 2: vllm 0.9.1
Stage 3: mcore 0.13.0
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results such as training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
---------
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
### What does this PR do?
Downgrade the TransformerEngine version so that the mcore image can use
RoPE fusion, and provide another set of v0.5 images.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results such as training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
### What does this PR do?
Before this PR, when `generate_sequences` was called with sampling param
`n > 1`, where the DataProto was repeated differed between code paths:
- validation: DataProto is repeated by `n` in the driver, then chunked and
dispatched to rollout workers.
- training
  - batch mode: DataProto is chunked and dispatched to rollout workers,
    then repeated in the rollout workers.
  - server mode: DataProto is repeated by `n` in the driver, then chunked
    and dispatched to rollout workers.
In batch mode, the `chunk-dispatch-repeat` pattern restricts GRPO
training when there are more GPUs than prompts in a batch. For example,
with `batch_size=128, n=16, world_size=256`:
- `chunk-dispatch-repeat`: DataProto(batch_size=128) can't be chunked into
256 shards.
- `repeat-chunk-dispatch`: after repeating, DataProto(batch_size=2048) can
be chunked successfully.
After this PR, DataProto is always repeated in the driver, whether for
validation or training, and in both batch mode and server mode.
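The arithmetic behind the two orderings can be checked with a simplified model. Plain lists stand in for DataProto here, and `chunk` and `repeat_interleave` are hypothetical helpers that assume chunking requires an even split; this is a sketch of the idea, not verl's `DataProto` implementation.

```python
from typing import List, TypeVar

T = TypeVar("T")


def repeat_interleave(items: List[T], n: int) -> List[T]:
    """Repeat each item n times, keeping all copies of one prompt adjacent."""
    return [item for item in items for _ in range(n)]


def chunk(items: List[T], num_chunks: int) -> List[List[T]]:
    """Split into equal shards; raise if the length is not divisible (simplified model)."""
    if len(items) % num_chunks != 0:
        raise ValueError(f"cannot split {len(items)} items into {num_chunks} equal shards")
    size = len(items) // num_chunks
    return [items[i * size : (i + 1) * size] for i in range(num_chunks)]


batch_size, n, world_size = 128, 16, 256
prompts = list(range(batch_size))

# chunk-dispatch-repeat: 128 prompts cannot be split across 256 workers.
try:
    chunk(prompts, world_size)
except ValueError as err:
    print("chunk-dispatch-repeat:", err)

# repeat-chunk-dispatch: 128 * 16 = 2048 samples split into 256 shards of 8.
shards = chunk(repeat_interleave(prompts, n), world_size)
print("repeat-chunk-dispatch:", len(shards), "shards of", len(shards[0]))
```

Repeating in interleaved order keeps each prompt's `n` samples contiguous, which is convenient for GRPO-style group statistics after the results are gathered back in the driver.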
> [!IMPORTANT]
> This change breaks almost all recipes and projects that use verl GRPO as
a submodule.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results such as training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
---------
Co-authored-by: Chayenne <zhaochen20@outlook.com>
### What does this PR do?
This PR adds support for tools to create and return multimodal data
(images and videos) during rollout. It enhances the framework to
properly handle multimodal inputs that are dynamically generated by
tools during multi-turn conversations.
### Key Features
- Tools can now return images and videos as part of their response
- Added support for processing multimodal inputs in the rollout system
- Introduced a new configuration option `return_multi_modal_inputs` to
control how multimodal inputs are processed
- Updated documentation with examples of how to implement tools that
generate multimodal data
### API and Usage Example
```python
from typing import Any, Dict, Tuple

from verl.utils.dataset.vision_utils import process_image, process_video


async def execute(self, ...) -> Tuple[str | Dict[str, Any], float, dict]:
    ...  # Tool logic that produces the raw `img1` and `video1`.
    # Process images or videos with verl's vision utils.
    img1 = process_image(img1)
    video1 = process_video(video1)
    # Return multimodal data, followed by the reward and a metrics dict.
    return {"image": [img1, ...], "video": [video1, ...], "text": "..."}, 0, {}
```
In your dataset config, set:
```yaml
data:
  return_multi_modal_inputs: False
```
### Specific Changes
- Enhanced `AsyncRolloutRequest` to handle multimodal data from tools
- Updated `add_tool_response_messages` to process multimodal content
- Added documentation for multimodal tool support in the RST docs
- Fixed configuration in example YAML files
- Added proper handling of multimodal inputs in the rollout system
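To make the response format concrete, here is a hypothetical sketch of how a tool response (either a plain string or the `{"image": ..., "video": ..., "text": ...}` dict shown above) could be split and turned into a chat message. The helper names and the content-list layout (`{"type": "image"}` placeholders followed by a text part, as used by Hugging Face multimodal chat templates) are assumptions for illustration, not the actual `AsyncRolloutRequest`/`add_tool_response_messages` code. The actual media would still be passed separately to the processor.

```python
from typing import Any, Dict, List, Tuple, Union

ToolResponse = Union[str, Dict[str, Any]]


def split_tool_response(response: ToolResponse) -> Tuple[str, List[Any], List[Any]]:
    """Separate a tool response into its text, images, and videos."""
    if isinstance(response, str):
        return response, [], []  # Plain-text tools keep working unchanged.
    return (
        response.get("text", ""),
        list(response.get("image", [])),
        list(response.get("video", [])),
    )


def build_tool_message(response: ToolResponse) -> Dict[str, Any]:
    """Build a chat message with one placeholder per image/video, then the text."""
    text, images, videos = split_tool_response(response)
    content: List[Dict[str, Any]] = [{"type": "image"} for _ in images]
    content += [{"type": "video"} for _ in videos]
    content.append({"type": "text", "text": text})
    return {"role": "tool", "content": content}
```

Treating a bare string as text-only keeps existing text tools backward compatible while multimodal tools opt in by returning a dict.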
### Checklist Before Submitting
- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] New CI unit test(s) are added to cover the code path.
- [X] Rely on existing unit tests on CI that covers the code path.
### Checklist Before Starting
- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
- In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
- type is in `feat, fix, refactor, chore, test`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`
### What does this PR do?
Migrate images to `verlai`, upgrade CUDA support to 12.6, and support the
latest FlashAttention.
```txt
docker
├── README.md
├── verl0.4-cu124-torch2.6-fa2.7.4
│ ├── Dockerfile.app.sglang.vllm.mcore0.12
│ ├── Dockerfile.app.sglang.vllm.mcore0.13.preview
│ ├── Dockerfile.app.vllm.mcore0.12
│ ├── Dockerfile.app.vllm.mcore0.13.preview
│ ├── Dockerfile.base
│ └── README.md
├── verl0.5-cu126-torch2.7.1-fa2.8.0
│ ├── Dockerfile.app.sglang.mcore0.12
│ ├── Dockerfile.app.sglang.mcore0.13.preview
│ ├── Dockerfile.base.fi0.2.6
│ └── README.md
└── verl0.5-preview-cu128-torch2.7.1-fa2.8.0
    ├── Dockerfile.app.sglang.megatron
    ├── Dockerfile.base.fi0.2.6
    └── README.md
```
- verlai/verl
  - verl0.4
    - base
    - app.sglang.vllm.mcore
    - app.vllm.mcore
  - verl0.5
    - base
    - app.sglang.mcore
    - app.vllm.mcore [may not be supported yet; for debugging]
  - verl0.5-preview
    - base
    - app.sglang.mcore
    - app.vllm.mcore [may not be supported yet; for debugging]
### Test
> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results such as training curve plots, evaluation results, etc.
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
> Provide usage example(s) for easier usage.
```python
# Add code snippet or script demonstrating how to use this
```
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
This PR implements multi-interaction support in `SGLangRollout`, enabling
sample-level interaction selection similar to the existing tools system.
The implementation includes a new interaction registry that allows
multiple named interactions to be configured and used within a single
rollout instance.
#1630
Core Implementation
- New Interaction Registry System: Created
`verl/interactions/utils/interaction_registry.py` with functions to
dynamically load and manage multiple interaction instances from
configuration files
- Enhanced `SGLangRollout`:
  - Replaced the single interaction attribute with
    `interaction_map: dict[str, BaseInteraction]`
  - Updated the `_initialize_interactions()` method to support multiple
    interactions via the registry
  - Modified the interaction selection logic to use
    `interaction_kwargs.name` for sample-level binding
- Configuration Updates: Added `name` field support in the interaction
config format, with automatic name generation as a fallback
Data Processing
- Updated GSM8K Preprocessing: Modified
`examples/data_preprocess/gsm8k_multiturn_w_interaction.py` to inject the
`name` field into `interaction_kwargs`
- Enhanced Configuration: Updated
`examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml`
with an explicit `name` field
Testing & Quality
- Comprehensive Test Suite: Added
`tests/interactions/test_interaction_registry.py` with full coverage of
registry functionality
- Integration Tests: Created
`tests/workers/rollout/test_sglang_multi_interaction.py` for
multi-interaction scenarios
- Updated Existing Tests: Modified existing interaction tests to support
the new `name` attribute and configuration format
- Error Handling: Added validation for duplicate names, missing
interactions, and edge cases
Backward Compatibility
- Graceful Degradation: When no interaction config is provided, the
system works without interactions (empty `interaction_map`)
- Default Name Handling: Falls back to `"gsm8k"` when no name is
specified in `interaction_kwargs`
- Existing API Preservation: All existing interaction functionality
remains unchanged
Key Features
1. Sample-Level Selection: Each sample can specify which interaction to
use via `interaction_kwargs.name`
2. Registry Pattern: Similar architecture to the existing tools system for
consistency
3. Automatic Naming: Name generation from class names (e.g.,
`Gsm8kInteraction` → `gsm8k`)
4. Duplicate Prevention: Runtime validation prevents naming conflicts
5. Flexible Configuration: Supports both explicit names and automatic
derivation
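The registry behavior listed above (automatic naming, duplicate prevention, and sample-level selection with a `"gsm8k"` fallback) can be sketched as follows. Everything here is a hypothetical stand-in for `interaction_registry.py`: the real registry loads classes from the YAML config, whereas this sketch passes classes directly, and the CamelCase-to-snake_case rule for multi-word names is an assumption.

```python
import re
from typing import Any, Dict, List, Type


class BaseInteraction:
    """Hypothetical stand-in for verl's interaction base class."""

    def __init__(self, name: str) -> None:
        self.name = name


class Gsm8kInteraction(BaseInteraction):
    pass


class MathTutorInteraction(BaseInteraction):  # Hypothetical second interaction.
    pass


def derive_name(cls: Type[BaseInteraction]) -> str:
    """Gsm8kInteraction -> gsm8k, MathTutorInteraction -> math_tutor."""
    base = cls.__name__
    if base.endswith("Interaction"):
        base = base[: -len("Interaction")]
    return re.sub(r"(?<!^)(?=[A-Z])", "_", base).lower()


def build_interaction_map(configs: List[Dict[str, Any]]) -> Dict[str, BaseInteraction]:
    """Instantiate each configured interaction under a unique name."""
    interaction_map: Dict[str, BaseInteraction] = {}
    for cfg in configs:
        cls = cfg["class"]
        name = cfg.get("name") or derive_name(cls)  # Explicit name wins.
        if name in interaction_map:
            raise ValueError(f"duplicate interaction name: {name!r}")
        interaction_map[name] = cls(name)
    return interaction_map


def select_interaction(
    interaction_map: Dict[str, BaseInteraction],
    interaction_kwargs: Dict[str, Any],
    default: str = "gsm8k",
) -> BaseInteraction:
    """Pick the interaction named in a sample's interaction_kwargs."""
    name = interaction_kwargs.get("name", default)
    if name not in interaction_map:
        raise KeyError(f"no interaction named {name!r}; available: {sorted(interaction_map)}")
    return interaction_map[name]
```

Building the map once at rollout initialization and resolving a name per sample keeps selection a cheap dictionary lookup during generation.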
1. An MCP client manager that manages connections to the MCP server,
including session multiplexing and rate limiting.
2. A search tool built on the MCP client and the
[Tavily](https://app.tavily.com/home) MCP server, which delivers the
same capability as the Search R1 tool.
3. A general MCP tool base class that handles the tool execution
logic.
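One simple way to bound load on an MCP server, in the spirit of the client manager above, is to cap in-flight requests with an `asyncio.Semaphore`. This hypothetical sketch covers only that piece: it limits concurrency (not requests per second) and does not implement session multiplexing. `MCPClientManager` and the `call_tool` callable are illustrative names, not the PR's classes or the MCP SDK API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

# Hypothetical signature of the underlying MCP call: (tool name, arguments) -> result.
CallTool = Callable[[str, Dict[str, Any]], Awaitable[Any]]


class MCPClientManager:
    """Hypothetical manager that caps in-flight requests to one MCP server."""

    def __init__(self, call_tool: CallTool, max_concurrency: int = 4) -> None:
        self._call_tool = call_tool
        # Construct the manager inside a running event loop (as demo() does)
        # so the semaphore binds to the right loop on older Python versions.
        self._semaphore = asyncio.Semaphore(max_concurrency)

    async def call(self, tool_name: str, arguments: Dict[str, Any]) -> Any:
        # Requests beyond max_concurrency wait here until a slot frees up.
        async with self._semaphore:
            return await self._call_tool(tool_name, arguments)


async def demo() -> Tuple[List[Any], int]:
    in_flight = 0
    peak = 0

    async def fake_call_tool(name: str, args: Dict[str, Any]) -> str:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # Stand-in for network latency.
        in_flight -= 1
        return f"{name}:{args['query']}"

    manager = MCPClientManager(fake_call_tool, max_concurrency=3)
    results = await asyncio.gather(*(manager.call("search", {"query": str(i)}) for i in range(10)))
    return list(results), peak


results, peak = asyncio.run(demo())
print(peak, results[:3])
```

Ten concurrent rollout requests are funneled through at most three simultaneous server calls, which is the behavior a per-server limit is meant to guarantee.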
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
> Provide usage example(s) for easier usage.
1. Register a [Tavily](https://app.tavily.com/home) account
2. Edit the `mcp_server.json` file by replacing `url` and `auth_token`.
Alternatively, you can use your own MCP server by following the
instructions provided by
[FastMCP](https://gofastmcp.com/clients/transports#configuration-based-transports)
(supporting SSEServer, stdioServer, and streamHTTP)
3. Configure the `mcp_tool_config.yaml` file:
   - `mcp_server_config_path` should point to the JSON file from step 2
   - `tool_selected_list` specifies the tools you need to register from
     the MCP server
4. *(Optional)* Implement a concrete instance based on `MCPBaseTool` to
parse the results returned by the server
Details are listed in the
[tutorial](https://github.com/AlecHenx/ml-recipe/blob/main/Tutorial%20for%20MCP%20Tool%20in%20veRL.md).
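Step 2 (replacing `url` and `auth_token` in `mcp_server.json`) can also be scripted. The sketch below is hypothetical and deliberately avoids assuming the file's exact layout: it replaces every `url` and `auth_token` value it finds anywhere in the JSON tree and leaves all other keys untouched.

```python
import json
from pathlib import Path
from typing import Any


def fill_server_config(node: Any, url: str, auth_token: str) -> Any:
    """Return a copy of a JSON-like tree with every `url` and `auth_token` value replaced."""
    if isinstance(node, dict):
        replacements = {"url": url, "auth_token": auth_token}
        return {
            key: replacements[key] if key in replacements else fill_server_config(value, url, auth_token)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [fill_server_config(item, url, auth_token) for item in node]
    return node


def update_config_file(path: str, url: str, auth_token: str) -> None:
    """Rewrite the JSON file at `path` in place with the given url and auth_token."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    updated = fill_server_config(config, url, auth_token)
    config_path.write_text(json.dumps(updated, indent=2), encoding="utf-8")
```

For example, `update_config_file("mcp_server.json", "<your server URL>", "<your token>")` would apply step 2 in one call. Reading the token from an environment variable instead of hard-coding it keeps credentials out of scripts.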
### Test
> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results such as training curve plots, evaluation results, etc.
### Additional Info.
- **Issue Number**: Fixes part of issue #1837
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
- Unify the functionality of SGLangRollout and AsyncSGLangRollout:
remove the original SGLangRollout and rename AsyncSGLangRollout to
SGLangRollout.
- Make minor changes to accommodate modifications in sglang==0.4.6.post5.
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
> Provide usage example(s) for easier usage.
```python
# Add code snippet or script demonstrating how to use this
```
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
---------
Co-authored-by: zyzshishui <@qq.com>
Co-authored-by: Xiang Long <mindsculptor@yeah.net>
Co-authored-by: ocss884 <ocss.lin@gmail.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: H <linhaibin.eric@gmail.com>
- As users of veRL, we want the model to be able to call certain tools
during actor rollout and to incorporate the tool results into the
training process.
- We aim to support the tool-calling capabilities of inference engines,
using `sandbox-fusion` as the code execution system, and to provide the
community with a reimplementation of `retools`.
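Calling tools during rollout and feeding the results back can be sketched as a simple multi-turn loop. This is an illustration under stated assumptions, not veRL's SGLang rollout: `rollout_with_tools`, `generate`, `parse_tool_call`, and `run_tool` are hypothetical (in practice `run_tool` would send code to a sandbox such as `sandbox-fusion`), and masking tool output out of the loss with 0s is a common convention assumed here.

```python
import re
from typing import Callable, List, Optional, Tuple


def rollout_with_tools(
    prompt: str,
    generate: Callable[[str], str],
    parse_tool_call: Callable[[str], Optional[str]],
    run_tool: Callable[[str], str],
    max_turns: int = 4,
) -> Tuple[List[str], List[int]]:
    """Alternate model turns and tool results; mask 1 = model text, 0 = tool output."""
    context = prompt
    segments: List[str] = []
    loss_mask: List[int] = []
    for _ in range(max_turns):
        reply = generate(context)
        segments.append(reply)
        loss_mask.append(1)  # model-generated text is trained on
        context += reply
        call = parse_tool_call(reply)
        if call is None:  # no tool call: the trajectory is finished
            break
        result = run_tool(call)
        segments.append(result)
        loss_mask.append(0)  # tool output is context only, not trained on
        context += result
    return segments, loss_mask


# Toy stand-ins: the "model" asks for code execution once, then answers.
SANDBOX_RESULTS = {"1+1": "2"}


def fake_generate(context: str) -> str:
    return "The answer is 2." if "<result>" in context else "<code>1+1</code>"


def parse_code(reply: str) -> Optional[str]:
    match = re.search(r"<code>(.*?)</code>", reply)
    return match.group(1) if match else None


def fake_sandbox(code: str) -> str:
    return f"<result>{SANDBOX_RESULTS[code]}</result>"


segments, mask = rollout_with_tools("Q: 1+1?", fake_generate, parse_code, fake_sandbox)
print(segments, mask)
```

The `max_turns` cap bounds the trajectory length even if the model keeps emitting tool calls.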
### Checklist Before Starting
- [x] Search for similar PR(s).
- Thanks to:
- close #1558, since it mixed several PRs
- close #1449, since it only partially fixed the new SGLang version issue
- close #1300, which is now part of this PR
- This PR is co-authored with @ocss884
### What does this PR do?
> Add one-line overview of what this PR aims to achieve or accomplish.
- bump sglang to 0.4.6.post4
- unify the `generate_sequences` API behavior of sglang and sglang_async,
e.g., image support
- fix the CUDA barrier warning at the start of fsdp_workers
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
---------
Co-authored-by: ocss884 <ocss.lin@gmail.com>
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
Fix SGLang CI by using a more stable way to download the Qwen 7B model.
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
> Provide usage example(s) for easier usage.
```python
# Add code snippet or script demonstrating how to use this
```
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
- [x] upgrade the required sglang version to 0.4.6.post1, which supports Qwen3
- [x] fix: flush_cache was never awaited
- [x] remove unused env
- [x] fix: add the rank number to the port to avoid SGLang picking the same port
when random.seed is set
- [x] feat: disable the SGLang memory imbalance check by default
https://github.com/sgl-project/sglang/pull/5426
- [x] update setup.py so that older pip versions can resolve dependencies
- [x] fix: tools_kwargs length mismatch with batch #1380
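The port fix above addresses an interaction with seeding: if every rank calls `random.seed` with the same value, every rank draws the same "random" port. A minimal sketch of the idea, assuming a hypothetical `pick_port` helper and port range (the actual change may differ in detail), offsets the seeded draw by the rank:

```python
import random


def pick_port(rank: int, seed: int, low: int = 30000, high: int = 40000) -> int:
    """Hypothetical helper: seeded draw plus a per-rank offset."""
    rng = random.Random(seed)  # same seed on every rank -> same base draw
    base = rng.randint(low, high)
    return base + rank  # the rank offset keeps ports distinct across ranks


ports = [pick_port(rank, seed=42) for rank in range(8)]
print(len(set(ports)))  # 8
```

Without the `+ rank` term, all eight entries would be equal. A real implementation would still need to check that the chosen port is actually free, which this sketch does not do.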
> Add one-line overview of what this PR aims to achieve or accomplish.
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
> Provide usage example(s) for easier usage.
```python
# Add code snippet or script demonstrating how to use this
```
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.