mirror of
https://github.com/volcengine/verl.git
synced 2025-10-20 13:43:50 +08:00
## What does this PR do?

This PR migrates the RL-on-VLMs feature from our implementation in the [EasyR1](https://github.com/hiyouga/EasyR1) fork back to veRL. We have validated this feature with the Qwen2.5-VL 7B model on 8*H100 GPUs. The configuration and data processing script are provided along with this PR for easy reproduction.

## How to reproduce?

1. Download and preprocess the dataset

```bash
python3 examples/data_preprocess/geo3k.py --local_dir ~/data/geo3k
```

2. Start GRPO training

```bash
bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh
```

## Dependencies

- vllm>=0.7.3
- transformers>=4.49.0
- [qwen-vl-utils](https://pypi.org/project/qwen-vl-utils/)
- [mathruler](https://pypi.org/project/mathruler/)

## Major Changes

### New dataflow for multimodal RL

In this PR, we introduce two new concepts in the dataflow: `multi_modal_data` and `multi_modal_inputs`. The former holds the multi-modal features required by the **rollout** worker (such as vLLM), while the latter holds the multi-modal features required by the **actor/critic** worker (such as an HF model). They differ because the rollout and actor workers have their own data format requirements. Taking Qwen2-VL + Hugging Face + vLLM as an example, the data structures are:

- **multi_modal_data**: {"image": [PIL.Image, PIL.Image, ...]}
- **multi_modal_inputs**: {"pixel_values": torch.Tensor, "image_grid_thw": torch.Tensor}

Both are converted to numpy objects and placed in the non-tensor batch of DataProto. Because this design is model-agnostic, it can easily be extended to other modalities/VLMs.

### Other changes

- Data
  - Support pre-processing the [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k) dataset.
  - Support `config.data.image_key`, which should be **a list of Pillow images**.
- Actor/Ref/Critic
  - Support `multi_modal_inputs`.
  - Process position ids to adapt to the m-rope.
- Rollout
  - Update the dtensor weight loader to adapt to the Qwen2-VL architecture in vLLM 0.7+.
  - Support `multi_modal_data`.
  - Use `raw_prompt_ids` as the vLLM inputs to **avoid unpadding** the input ids.
- Reward Manager
  - Add **mathruler** for more accurate math scores on the Geometry3k dataset.
- Models
  - Support calculating the position ids for the m-rope in Qwen2-VL.
  - Support removing padding in flash attention 2 for the m-rope (transformers itself **does not support it**).
- Sharding Manager
  - Support all-gathering the non-tensor batch.
- FSDP Workers / Checkpoint Merger
  - Support `AutoModelForVision2Seq` at model initialization.

Note: Ulysses parallelism is not supported yet. We will add it in the next update.

## Performance

We provide the estimated MFU of the language model part on H100 GPUs. These values are lower than the actual ones because **we did not compute the FLOPs of the vision tower part**.

- `remove_padding=False`: MFU ~7%
- `remove_padding=True`: MFU ~20%

The training and test reward score curves are presented as follows.

## Who can review?

@vermouth1992 @PeterSH6
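To make the dataflow above concrete, here is a minimal sketch of the two data structures and how they would sit in a non-tensor batch. This is illustrative only, not verl's actual DataProto API: the blank image, the tensor shapes, and the `non_tensor_batch` dict are placeholders for what the rollout worker (raw PIL images) and the actor/critic worker (processed tensors) each consume.

```python
import numpy as np
import torch
from PIL import Image

# Hypothetical sample: one prompt containing a single image.
image = Image.new("RGB", (224, 224))

# Rollout-side features (e.g. what vLLM expects): raw PIL images.
multi_modal_data = {"image": [image]}

# Actor/critic-side features (e.g. what an HF Qwen2-VL model expects):
# tensors produced by the model's image processor. Shapes here are
# placeholders, not the real output for a 224x224 image.
multi_modal_inputs = {
    "pixel_values": torch.zeros(4, 1176),
    "image_grid_thw": torch.tensor([[1, 2, 2]]),  # (temporal, height, width) grid
}

# Both structures are stored as numpy object arrays, since an object
# array can carry one arbitrary Python object per sample alongside the
# regular tensor batch.
non_tensor_batch = {
    "multi_modal_data": np.array([multi_modal_data], dtype=object),
    "multi_modal_inputs": np.array([multi_modal_inputs], dtype=object),
}
```

Keeping the raw images and the processed tensors as separate per-sample objects is what lets the rollout and actor workers each pull the format they need without converting between the two.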
120 lines
1.3 KiB
Plaintext
**/*.pt
**/checkpoints
**/wget-log
**/_build/
**/*.ckpt
**/outputs
**/*.tar.gz
**/playground
**/wandb

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
dataset/*
tensorflow/my_graph/*
.idea/

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# IPython Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# dotenv
.env

# virtualenv
venv/
ENV/

# Spyder project settings
.spyderproject

# Rope project settings
.ropeproject

# vscode
.vscode

# Mac
.DS_Store

# output logs
tests/e2e/toy_examples/deepspeed/synchronous/output.txt

# vim
*.swp

# ckpt
*.lock

# data
*.parquet