### What does this PR do?

This PR introduces a complete training recipe for [DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning](https://arxiv.org/abs/2505.14362). The core feature is support for multi-turn visual tools, specifically the `ImageZoomInTool`, integrated with a custom reward function based on the "LLM-as-a-Judge" pattern to evaluate model performance. Additionally, to better monitor and analyze the model's tool-use behavior, this PR adds functionality to track tool call counts during training and report these metrics to logging systems such as wandb.

### API and Usage Example

The primary change is the new training recipe for DeepEyes. Users can start a training run with the provided configuration file.

1. Preprocess the dataset. We need to add some tool-related `extra_info`:

```bash
python recipe/deepeyes/deepeyes47k_preprocess.py --dataset_dir <path_to_raw_dataset> --save_dir <path_to_processed_data>
```

2. Start the PPO training:

```bash
bash recipe/deepeyes/run_deepeyes_grpo.sh
```

The training process will automatically load the `ImageZoomInTool` and the custom reward function as defined in the recipe.

### Design & Code Changes

- **DeepEyes Recipe Integration**: Added a new recipe directory with data preprocessing, tool config, and a custom reward function for DeepEyes.
- **Visual Tool Support**: Implemented `ImageZoomInTool` with robust bbox validation and resizing.
- **Tool Call Statistics**: Modified the rollout and metrics code to track and log tool call counts per sample and per step.
- **Bug Fixes**: Fixed image byte handling and ensured special tokens are preserved during decoding for tool call formatting.

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Maxwell-Jia <mr.minghui.jia@gamil.com>
Co-authored-by: xieck13 <xieck13@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
# DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
This directory contains the implementation for reproducing the DeepEyes paper within the verl framework, supporting multi-turn visual tool calls. This implementation is based on the original DeepEyes paper and its official implementation, integrated with the multi-modal and multi-turn capabilities of the verl framework.
## Reproducing the Experiment
**Note on the 'Chart' dataset:** The provided preprocessing script intentionally excludes `data_v0.8_visual_toolbox_v2.parquet`, which contains the 'Chart' data. This subset consists of very high-resolution images, often resembling large figures composed of multiple sub-plots, much like those found in academic papers. Consequently, even after using the zoom-in tool, the resulting cropped images remain large. This poses a significant risk of Out-of-Memory (OOM) errors, which can abruptly terminate the training process.

**We strongly recommend against training on the 'Chart' dataset on a single node.**
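As a hedged sketch of how this exclusion can be applied when gathering input shards (only the parquet filename comes from the recipe; the helper name is illustrative):

```python
from pathlib import Path

# Shards excluded from preprocessing; the 'Chart' subset lives in this file.
EXCLUDED_SHARDS = {"data_v0.8_visual_toolbox_v2.parquet"}

def collect_shards(dataset_dir: str) -> list[Path]:
    """Return all parquet shards under dataset_dir except the excluded ones."""
    return sorted(
        p for p in Path(dataset_dir).glob("*.parquet")
        if p.name not in EXCLUDED_SHARDS
    )
```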
**Note on the 'thinklite' dataset:** Many images in the `thinklite` dataset have a very low resolution, with either a height or width below 28 pixels. This fails to meet the minimum input size required by the Qwen-2.5VL image processor and would cause errors during data loading. To mitigate this, we upscale these low-resolution images to satisfy the processor's requirements. However, please be aware that because the original resolution is low, subsequent `crop` operations by the zoom-in tool might frequently trigger exceptions, which could in turn affect the model's tool-use performance.
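A minimal sketch of the upscaling step described above, assuming a 28-pixel minimum side (the function name and rounding choice are illustrative, not the recipe's exact code); the returned size can be passed to `PIL.Image.resize`:

```python
MIN_SIDE = 28  # assumed minimum input side for the Qwen-2.5VL image processor

def upscaled_size(width: int, height: int, min_side: int = MIN_SIDE) -> tuple[int, int]:
    """Scale (width, height) up, preserving aspect ratio, so the shorter
    side reaches at least min_side. Sizes that already qualify are unchanged."""
    short = min(width, height)
    if short >= min_side:
        return (width, height)
    scale = min_side / short
    return (max(min_side, round(width * scale)),
            max(min_side, round(height * scale)))
```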
First, launch an inference service to act as a judge for reward calculation. You can use the following script as a reference:
```bash
python -m sglang.launch_server --model-path /path/to/Qwen2.5-72B-Instruct \
    --port 18901 \
    --tp-size 8 \
    --context-length 32768 \
    --trust-remote-code \
    --log-requests false
```
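To sketch how an LLM-as-a-Judge reward can query this server (hedged: the endpoint follows sglang's OpenAI-compatible API, but the judge prompt, model name, and function below are illustrative, not the recipe's actual reward code):

```python
import json
import urllib.request

JUDGE_URL = "http://127.0.0.1:18901/v1/chat/completions"  # matches --port above

def judge_reward(question: str, prediction: str, reference: str,
                 fetch=None) -> float:
    """Return 1.0 if the judge model deems the prediction correct, else 0.0.

    `fetch` may be injected (payload -> response dict) for testing; by
    default the OpenAI-compatible chat endpoint is called over HTTP.
    """
    payload = {
        "model": "Qwen2.5-72B-Instruct",
        "messages": [{
            "role": "user",
            "content": (
                f"Question: {question}\nPrediction: {prediction}\n"
                f"Reference: {reference}\n"
                "Reply with exactly 'yes' or 'no': is the prediction correct?"
            ),
        }],
        "temperature": 0.0,
    }
    if fetch is None:
        req = urllib.request.Request(
            JUDGE_URL, data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
    else:
        body = fetch(payload)
    verdict = body["choices"][0]["message"]["content"].strip().lower()
    return 1.0 if verdict.startswith("yes") else 0.0
```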
Next, you can start the training:
bash recipe/deepeyes/run_deepeyes_grpo.sh
## Performance
See Comment for more details.
Note: `AgentLoop` does not directly record `num_tool_calls`, but records `num_turns`. In our scenario, you can calculate the number of tool calls as `num_tool_calls = num_turns / 2 - 1`.
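The conversion is a one-liner; a sketch (the function name is illustrative, the formula is the one stated in the note above):

```python
def num_tool_calls(num_turns: int) -> int:
    """Derive the tool-call count from AgentLoop's num_turns metric,
    using num_tool_calls = num_turns / 2 - 1."""
    return num_turns // 2 - 1
```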
## References and Acknowledgements
If you need further details for reproduction or encounter any issues, feel free to open an issue or contact the maintainers.