--8<-- [start:installation]
vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform.
The ARM CPU backend currently supports Float32, FP16 and BFloat16 data types.
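
For example, once vLLM is installed, the data type can be chosen at serve time via `--dtype` (the model name below is only an example):

```bash
# Serve a model on the ARM CPU backend using bfloat16
vllm serve meta-llama/Llama-3.2-1B-Instruct --dtype bfloat16
```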
!!! warning
    There are no pre-built wheels or images for this device, so you must build vLLM from source.
--8<-- [end:installation]
--8<-- [start:requirements]
- OS: Linux
- Compiler: `gcc/g++ >= 12.3.0` (optional, recommended)
- Instruction Set Architecture (ISA): NEON support is required (see the check below)
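
NEON support can be verified before building: on ARM64 Linux it is reported as the `asimd` (Advanced SIMD) feature flag in `/proc/cpuinfo`:

```bash
# NEON is reported as the "asimd" (Advanced SIMD) feature on ARM64 Linux;
# this prints "asimd" once if the CPU supports it
grep -m1 -o asimd /proc/cpuinfo
```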
--8<-- [end:requirements]
--8<-- [start:set-up-using-python]
--8<-- [end:set-up-using-python]
--8<-- [start:pre-built-wheels]
--8<-- [end:pre-built-wheels]
--8<-- [start:build-wheel-from-source]
--8<-- "docs/getting_started/installation/cpu/build.inc.md"
Testing has been conducted on AWS Graviton3 instances to verify compatibility.
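
As a quick sanity check after the build (a minimal suggestion, not part of the build steps), confirm that the installed wheel imports cleanly:

```bash
# Print the installed vLLM version to confirm the build succeeded
python -c "import vllm; print(vllm.__version__)"
```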
--8<-- [end:build-wheel-from-source]
--8<-- [start:pre-built-images]
--8<-- [end:pre-built-images]
--8<-- [start:build-image-from-source]
```bash
docker build -f docker/Dockerfile.cpu \
    --tag vllm-cpu-env .

# Launching OpenAI server
docker run --rm \
    --privileged=true \
    --shm-size=4g \
    -p 8000:8000 \
    -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
    -e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
    vllm-cpu-env \
    --model=meta-llama/Llama-3.2-1B-Instruct \
    --dtype=bfloat16 \
    other vLLM OpenAI server arguments
```
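
Here `VLLM_CPU_KVCACHE_SPACE` is the KV cache size in GiB and `VLLM_CPU_OMP_THREADS_BIND` is the list of CPU cores to pin the OpenMP inference threads to. As an illustration only (the host size and the values below are hypothetical), the placeholders could be filled in as:

```bash
# Hypothetical example: 40 GiB KV cache, threads pinned to cores 0-29
docker run --rm \
    --privileged=true \
    --shm-size=4g \
    -p 8000:8000 \
    -e VLLM_CPU_KVCACHE_SPACE=40 \
    -e VLLM_CPU_OMP_THREADS_BIND=0-29 \
    vllm-cpu-env \
    --model=meta-llama/Llama-3.2-1B-Instruct \
    --dtype=bfloat16
```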
!!! tip
    An alternative to `--privileged=true` is `--cap-add SYS_NICE --security-opt seccomp=unconfined`.
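
Once the server is up, it can be exercised through vLLM's OpenAI-compatible API; the prompt and `max_tokens` below are arbitrary examples:

```bash
# Send a test request to the OpenAI-compatible completions endpoint
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "prompt": "Hello, my name is",
        "max_tokens": 32
    }'
```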