mirror of https://github.com/volcengine/verl.git synced 2025-10-20 13:43:50 +08:00

Files

H 52065c6405 [BREAKING][rollout] refactor: drop vllm v0.5.4 and v0.6.3 support (#2257 )

### What does this PR do?

This PR removes support for vLLM versions 0.5.4 and 0.6.3 from the verl
repository, completing a comprehensive cleanup of legacy
version-specific code branches. The changes simplify the codebase by
eliminating conditional logic and version-specific implementations,
requiring users to upgrade to vLLM 0.7.0 or later (recommended: vLLM
0.8.3+).

**Key Changes:**
- Deleted legacy rollout implementations (`fire_vllm_rollout.py`,
`vllm_rollout.py`, `test_vllm_hf_loader.py`)
- Removed version-specific directories (`vllm_v_0_5_4`, `vllm_v_0_6_3`) 
- Simplified sharding managers by removing `customized_vllm` flag
conditionals
- Updated configuration files to remove deprecated options
(`use_fire_sampling`)
- Cleaned up documentation and environment variable exports

### Checklist Before Starting

- [x] Search for similar PRs: No similar PRs found for this specific
cleanup
- [x] Format the PR title as `[BREAKING][vllm, rollout, worker]
refactor: Remove vLLM 0.5.4 and 0.6.3 support`
  - Modules: `vllm`, `rollout`, `worker` (primary affected components)
  - Type: `refactor` (code cleanup and simplification)
  - Breaking: Yes, requires vLLM version upgrade

### Test

This PR has been validated through:
- **CI Pipeline**: All existing tests pass with vLLM 0.7.0+ (27 checks
pending/running)
- **Version Detection**: New version check logic properly rejects vLLM
0.5.4/0.6.3 with clear error messages
- **Merge Conflict Resolution**: Successfully resolved complex conflicts
during main branch merge
- **Pre-commit Checks**: All linting and formatting requirements
satisfied

### API and Usage Example

**Breaking Changes:**
- **vLLM Version Requirement**: Minimum supported version is now 0.7.0
(recommended: 0.8.3+)
- **Removed Configuration Options**: `use_fire_sampling` no longer
available in config files
- **Environment Variables**: `VLLM_ATTENTION_BACKEND=XFORMERS` exports
removed (not needed for vLLM 0.7.0+)

**Migration Guide:**
```bash
# Before: vLLM 0.5.4/0.6.3 with custom flags
pip install vllm==0.6.3
export VLLM_ATTENTION_BACKEND=XFORMERS

# After: vLLM 0.8.3+ with V1 API
pip install vllm>=0.8.3
export VLLM_USE_V1=1  # Recommended for optimal performance
```

**Updated Configuration:**
```yaml
# generation.yaml - removed use_fire_sampling option
rollout:
  name: vllm_rollout
  # use_fire_sampling: False  # <- REMOVED
  
# Use standard vLLM rollout without legacy options
```

### High-Level Design

```mermaid
graph TB
    subgraph "Before: Multi-Version Support"
        A1[vLLM Version Check] --> B1{Version 0.5.4?}
        A1 --> B2{Version 0.6.3?}
        A1 --> B3{Version 0.7.0+?}
        B1 --> C1[Legacy vllm_v_0_5_4 Code]
        B2 --> C2[Legacy vllm_v_0_6_3 Code]
        B3 --> C3[Modern vLLM Code]
    end
    
    subgraph "After: Simplified Support"
        A2[vLLM Version Check] --> B4{Version >= 0.7.0?}
        B4 -->|Yes| C4[Modern vLLM Code Only]
        B4 -->|No| C5[Clear Error Message]
    end
```

### Specific Changes

**Deleted Files:**
- `verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py`
- `verl/workers/rollout/vllm_rollout/vllm_rollout.py` 
- `tests/workers/rollout/rollout_vllm/test_vllm_hf_loader.py`
- `verl/third_party/vllm/vllm_v_0_5_4/` (entire directory)
- `verl/third_party/vllm/vllm_v_0_6_3/` (entire directory)
- `pytest.ini`

**Modified Core Files:**
- `verl/third_party/vllm/__init__.py`: Simplified version detection with
clear error messages
- `verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py`: Removed
cache engine management and version conditionals
- `verl/workers/sharding_manager/fsdp_vllm.py`: Dropped
`customized_vllm` flag logic
- `verl/workers/sharding_manager/megatron_vllm.py`: Simplified weight
loading and cache management

**Configuration Updates:**
- `verl/trainer/config/generation.yaml`: Removed `use_fire_sampling`
option
- `verl/trainer/config/ppo_trainer.yaml`: Removed `use_fire_sampling`
option
- `tests/special_sanity/check_api_docs.py`: Removed `LLMEngine` from
whitelist

**Documentation Updates:**
- `docs/start/install.rst`: Updated to recommend vLLM 0.8.3+ with
`VLLM_USE_V1=1`
- `docs/perf/perf_tuning.rst`: Updated performance recommendations
- Removed 42+ `VLLM_ATTENTION_BACKEND=XFORMERS` exports from bash
scripts

**Reverted Changes:**
- `.github/workflows/vllm.yml`: Restored original container image names
- `docs/faq/faq.rst`: Restored original apptainer commands
- `docs/ascend_tutorial/ascend_quick_start.rst`: Reverted all
modifications
- `examples/tuning/*/`: Restored original `nproc_per_gpu` settings

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs):
Updated install and performance tuning docs
- [x] Add unit or end-to-end test(s): Existing CI tests validate the
changes; legacy-specific tests were removed as intended
- [x] **CI Request**: Once PR is ready, message will be sent to
`ci-request` channel in verl Slack workspace

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

2025-06-29 19:27:22 -07:00

3.0 KiB

Raw Permalink Blame History

Upgrading to vllm >= 0.7

Note: verl+vllm 0.8.3 is now stable. Please see docs/README_vllm0.8.md for upgrade guide.

Installation

Note: At time of writing, verl+vllm 0.7.x supports FSDP for training and vLLM for rollout.

# Create the conda environment
conda create -n verl python==3.10
conda activate verl

# Install verl
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .

# Install the latest stable version of vLLM
pip3 install vllm==0.7.3 

# Install flash-attn
pip3 install flash-attn --no-build-isolation

Note that if you are installing lower versions of vLLM (0.7.0, 0.7.1, 0.7.2), you need to make some tiny patches manually on vllm (/path/to/site-packages/vllm after installation) after the above steps:

vllm/distributed/parallel_state.py: Remove the assertion below:

if (world_size
        != tensor_model_parallel_size * pipeline_model_parallel_size):
    raise RuntimeError(
        f"world_size ({world_size}) is not equal to "
        f"tensor_model_parallel_size ({tensor_model_parallel_size}) x "
        f"pipeline_model_parallel_size ({pipeline_model_parallel_size})")

vllm/executor/uniproc_executor.py: change local_rank = rank to local_rank = int(os.environ["LOCAL_RANK"])
vllm/model_executor/model_loader/weight_utils.py: remove the torch.cuda.empty_cache() in pt_weights_iterator

Features

Use cuda graph

After installation, examples using FSDP as training backends can be used. By default, the enforce_eager is set to True, which disables the cuda graph. To enjoy cuda graphs and the sleep mode of vLLM>=0.7, add the following lines to the bash script:

actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=True \

For a typical job like examples/ppo_trainer/run_qwen2-7b_seq_balance.sh, the rollout generation time is 85 seconds with vLLM0.7.0. By enabling the cudagraph, the generation duration is further reduced to 62 seconds.

Note: Currently, if the n is greater than 1 in SamplingParams in vLLM>=0.7, there is a potential performance issue on the stability of rollout generation time (Some iterations would see generation time bursts) using vLLM's V0 Engine.

Use vLLM V1 Engine

Using the vLLM V1 engine can avoid instability issues and achieve additional performance improvements. To use the V1 engine, you can first uninstall the previously installed vLLM and then follow the steps below to install the newer version.

git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout 2275784
sed -i "903a\    data_parallel_size = world_size // pipeline_model_parallel_size // tensor_model_parallel_size" ./vllm/distributed/parallel_state.py
VLLM_USE_PRECOMPILED=1 pip install --editable .

Then you can enable the V1 engine by setting export VLLM_USE_V1=1. In some benchmark tests, the V1 engine demonstrates a 1.5x speed improvement over the vLLM V0 engine. The stable support of the vLLM V1 engine is available on verl main.

3.0 KiB Raw Permalink Blame History