### What does this PR do?

This PR introduces a complete training recipe for [DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning](https://arxiv.org/abs/2505.14362). The core feature is support for multi-turn visual tools, specifically the `ImageZoomInTool`, integrated with a custom reward function that follows the "LLM-as-a-Judge" pattern to evaluate model performance.

To make the model's tool-use behavior easier to monitor and analyze, this PR also tracks tool call counts during training and reports these metrics to logging systems such as wandb.

### API and Usage Example

The primary change is the new training recipe for DeepEyes. Users can start a training run with the provided scripts and configuration.

1. Preprocess the dataset. This step adds the tool-related `extra_info` fields:

```bash
python recipe/deepeyes/deepeyes47k_preprocess.py --dataset_dir <path_to_raw_dataset> --save_dir <path_to_processed_data>
```

2. Start the GRPO training run:

```bash
bash recipe/deepeyes/run_deepeyes_grpo.sh
```

The training process automatically loads the `ImageZoomInTool` and the custom reward function as defined in the recipe.

### Design & Code Changes

- **DeepEyes Recipe Integration**: Added a new recipe directory with data preprocessing, tool config, and a custom reward function for DeepEyes.
- **Visual Tool Support**: Implemented `ImageZoomInTool` with robust bbox validation and resizing.
- **Tool Call Statistics**: Modified the rollout and metrics code to track and log tool call counts per sample and per step.
- **Bug Fixes**: Fixed image byte handling and ensured special tokens are preserved during decoding for tool call formatting.

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Maxwell-Jia <mr.minghui.jia@gamil.com>
Co-authored-by: xieck13 <xieck13@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>