Mirror of https://github.com/volcengine/verl.git (synced 2025-10-20 13:43:50 +08:00)
### What does this PR do?

> Update Dockerfile/Docker Image

### Checklist Before Starting

- [X] Search for similar PRs.
- [X] Format the PR title (This will be checked by the CI)

### Test

> Done

### API and Usage Example

> Usage example(s): [AMD_tutorial](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst).

### Design & Code Changes

> Dockerfile/Docker Image dependency:

- ROCm: 6.3.4 (patch version)
- PyTorch: 2.7.0
- vllm: >=0.8.5
- sglang: >=v0.4.6.post4
- megatron-lm: TransformerEngine==1.14.0, megatron-core==0.12.0
- Ray: >=2.45

Also allows VLM training.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst).
- [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
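
The minimum versions listed under Design & Code Changes can be spot-checked inside a built image. The following is a minimal sketch and not part of the PR: the helper names `version_ge` and `check_min` are hypothetical, and it assumes `bash`, GNU `sort -V`, `awk`, and `pip` are available in the container. Pre-release and local version suffixes are not compared according to PEP 440, so treat the result as a rough check only.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when version $1 is >= version $2,
# using GNU sort's version ordering (an assumption, not PEP 440).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: read the installed version of a pip package
# and compare it with a required minimum.
check_min() {
    local pkg="$1" min="$2" found
    found="$(pip show "$pkg" 2>/dev/null | awk '/^Version:/ {print $2}')"
    if [ -z "$found" ]; then
        echo "$pkg: not installed"
        return 1
    fi
    if version_ge "$found" "$min"; then
        echo "$pkg $found (>= $min): ok"
    else
        echo "$pkg $found (< $min): too old"
        return 1
    fi
}
```

Inside the container, calls such as `check_min vllm 0.8.5`, `check_min sglang 0.4.6.post4`, and `check_min ray 2.45` would then report whether each package meets the stated minimum.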
59 lines
1.5 KiB
Docker
# Build the docker in the repo dir:
# docker build -f docker/Dockerfile.rocm -t verl-rocm:03.04.2015 .
# docker images # you can find your built docker


# Support - Training: fsdp; Inference: vllm
# FROM rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
# Support - Training: fsdp; Inference: vllm, sglang
FROM lmsysorg/sglang:v0.4.6.post5-rocm630

# Set working directory
# WORKDIR $PWD/app

# Set environment variables
ENV PYTORCH_ROCM_ARCH="gfx90a;gfx942"

ENV HIPCC_COMPILE_FLAGS_APPEND="--amdgpu-target=gfx90a;gfx942 -D__HIP_PLATFORM_AMD__"
ENV CFLAGS="-D__HIP_PLATFORM_AMD__"
ENV CXXFLAGS="-D__HIP_PLATFORM_AMD__"

# Install vllm
RUN pip uninstall -y vllm && \
    rm -rf vllm && \
    git clone -b v0.6.3 https://github.com/vllm-project/vllm.git && \
    cd vllm && \
    MAX_JOBS=$(nproc) python3 setup.py install && \
    cd .. && \
    rm -rf vllm

# Copy the entire project directory
COPY . .

# Install dependencies
RUN pip install "tensordict==0.6.2" --no-deps && \
    pip install accelerate \
    codetiming \
    datasets \
    dill \
    hydra-core \
    liger-kernel \
    numpy \
    pandas \
    peft \
    "pyarrow>=15.0.0" \
    pylatexenc \
    "ray[data,train,tune,serve]<2.45.0" \
    torchdata \
    transformers \
    wandb \
    orjson \
    pybind11

RUN git clone https://github.com/volcengine/verl.git && \
    cd verl && \
    pip install -e .

# Install torch_memory_saver
RUN pip install git+https://github.com/ExtremeViscent/torch_memory_saver.git --no-deps