# docs: add vllm 0.8 page (#694)
## What does this PR do?

Add a document for using vLLM 0.8 in verl.

## Who can review?

@eric-haibin-lin
@@ -108,8 +108,8 @@ verl is fast with:
## Performance Tuning Guide
Performance is essential for on-policy RL algorithms. We provide a detailed performance tuning guide to help users tune performance. See [here](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html) for more details.
## vLLM v0.7 integration preview
We have released a testing version of veRL that supports vLLM>=0.7.0. Please refer to [this document](https://github.com/volcengine/verl/blob/main/docs/README_vllm0.7.md) for the installation guide and more information.
## Use vLLM v0.8
veRL now supports vLLM>=0.8.0 when using FSDP as the training backend. Please refer to [this document](docs/README_vllm0.8.md) for the installation guide and more information.
## Citation and acknowledgement
**docker/Dockfile.ngc.vllm0.8** (new file, 59 lines)

@@ -0,0 +1,59 @@
```dockerfile
# Start from the NVIDIA official image (ubuntu-22.04 + python-3.10)
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html
FROM nvcr.io/nvidia/pytorch:24.08-py3

# Uninstall the NVIDIA fork of PyTorch and related packages
RUN pip3 uninstall -y pytorch-quantization \
    pytorch-triton torch torch-tensorrt torchvision \
    xgboost transformer_engine flash_attn apex megatron-core

# Define environment variables
ENV MAX_JOBS=32
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
ENV DEBIAN_FRONTEND=noninteractive
ENV NODE_OPTIONS=""
ENV HF_HUB_ENABLE_HF_TRANSFER="1"

# Define installation arguments
ARG APT_SOURCE=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/
ARG PIP_INDEX=https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Set the apt source
RUN cp /etc/apt/sources.list /etc/apt/sources.list.bak && \
    { \
    echo "deb ${APT_SOURCE} jammy main restricted universe multiverse"; \
    echo "deb ${APT_SOURCE} jammy-updates main restricted universe multiverse"; \
    echo "deb ${APT_SOURCE} jammy-backports main restricted universe multiverse"; \
    echo "deb ${APT_SOURCE} jammy-security main restricted universe multiverse"; \
    } > /etc/apt/sources.list

# Install systemd (provides systemctl)
RUN apt-get update && \
    apt-get install -y -o Dpkg::Options::="--force-confdef" systemd && \
    apt-get clean

# Install tini
RUN apt-get update && \
    apt-get install -y tini && \
    apt-get clean

# Change the pip source
RUN pip config set global.index-url "${PIP_INDEX}" && \
    pip config set global.extra-index-url "${PIP_INDEX}" && \
    python -m pip install --upgrade pip

# Install torch-2.6.0 + vllm-0.8.1 (version specifiers with ">=" are quoted
# so the shell does not treat them as output redirections)
RUN pip install --no-cache-dir vllm==0.8.1 torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 tensordict torchdata \
    "transformers>=4.49.0" accelerate datasets peft hf-transfer \
    ray codetiming hydra-core pandas "pyarrow>=15.0.0" pylatexenc qwen-vl-utils wandb dill pybind11 liger-kernel mathruler \
    pytest yapf py-spy pyext pre-commit ruff

# Install flash_attn-2.7.4.post1 from the prebuilt wheel
RUN pip uninstall -y transformer-engine flash-attn && \
    wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \
    pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

# Fix cv2
RUN pip uninstall -y pynvml nvidia-ml-py && \
    pip install --no-cache-dir "nvidia-ml-py>=12.560.30" opencv-python-headless==4.8.0.74 fastapi==0.115.6 && \
    pip install -U "optree>=0.13.0"
```
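
For reference, an image can be built from this file with a command along the following lines; the tag name here is an arbitrary example, not an official one:

```bash
# Build from the repository root; the tag is an illustrative placeholder.
# APT_SOURCE and PIP_INDEX can be overridden with --build-arg if desired.
docker build -f docker/Dockfile.ngc.vllm0.8 -t verl:ngc-vllm0.8 .
```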
**docs/README_vllm0.8.md** (new file, 44 lines)

@@ -0,0 +1,44 @@
# Upgrading to vLLM >= 0.8
## Installation
Note: veRL with vLLM 0.8+ supports **FSDP** for training and **vLLM** for rollout.
```bash
# Create the conda environment
conda create -n verl python==3.10
conda activate verl

# Install verl
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .

# Install the latest stable version of vLLM
pip3 install vllm==0.8.1

# Install flash-attn
pip3 install flash-attn --no-build-isolation
```
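
As a quick, optional sanity check that the expected versions were installed:

```bash
# Should print torch 2.6.0 and vllm 0.8.1 if the steps above succeeded
python3 -c "import torch, vllm; print(torch.__version__, vllm.__version__)"
```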
We provide a pre-built Docker image for veRL+vLLM 0.8.0. You can pull it directly with the following command:
```bash
docker pull hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.0
```
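To start a container from this image, an invocation along these lines should work; the GPU, shared-memory, and mount flags below are illustrative and depend on your setup:

```bash
# Illustrative: interactive shell with all GPUs and the current checkout
# mounted at /workspace/verl; adjust flags for your environment.
docker run --rm -it --gpus all --shm-size=10g \
    -v "$PWD":/workspace/verl -w /workspace/verl \
    hiyouga/verl:ngc-th2.6.0-cu120-vllm0.8.0 bash
```
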
## Features
vLLM 0.8+ supports CUDA graphs and the V1 engine by default in veRL. To enable these features, remember to add the following lines to your bash script:
```bash
actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=False \
```
and also **remove** the environment variable if it exists:
```bash
export VLLM_ATTENTION_BACKEND=XFORMERS
```
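
Put together, a sketch of a launch command with both rollout flags in place is shown below, assuming the standard `verl.trainer.main_ppo` entry point; the data and model arguments are illustrative placeholders:

```bash
# Sketch only: data/model paths and most settings are placeholders.
python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    actor_rollout_ref.model.path=Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False
```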
@@ -74,6 +74,7 @@ verl is fast with:
```rst
   perf/perf_tuning
   README_vllm0.7.md
   README_vllm0.8.md

.. toctree::
   :maxdepth: 1
```